Description
It’s return guest season here at Latent Space! We last talked to Kanjun in October and Jonathan in May (and again in December, after the Databricks acquisition):
Imbue and Databricks are back for a rare treat: a double-header interview talking about DBRX from Databricks and Imbue 70B, a new internal LLM that “outperforms GPT-4o” zero-shot on a range of reasoning and coding-related benchmarks and datasets, while using 7x less data than Llama 3 70B.
While Imbue, being an agents company rather than a model provider, are not releasing their models today, they are releasing almost everything else:
* Cleaned-up and extended versions of 11 of the most popular NLP reasoning benchmarks
* An entirely new code-focused reasoning benchmark
* A fine-tuned 70B model, built with Meta Llama 3, to identify ambiguity
* A new dataset of 450,000 human judgments about ambiguity
* Infrastructure scripts for bringing a cluster from bare metal to robust, high-performance training
* CARBS, their cost-aware hyperparameter optimizer, which automatically and systematically tunes all hyperparameters to derive optimal performance for models of any size (see the illustrative sketch below)
They are also publishing EXTREMELY detailed posts on their infrastructure setup, their hyperparameter search, and the sorry state of industry-standard benchmarks (together with their cleaned-up versions). This means that, for the FIRST TIME (perhaps since Meta’s OPT-175B in 2022?), you have this level of educational detail into the hardware and ML nitty-gritty of training extremely large LLMs. And if you are in fact training LLMs at this scale, you now have evals, optimizers, scripts, and human data/benchmarks you can use to move the industry forward together with Imbue.
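To make the “cost-aware” idea behind CARBS concrete, here is a minimal sketch of a cost-aware hyperparameter search loop. This is not Imbue’s implementation (the released CARBS is a Bayesian optimizer that works around the cost/performance Pareto frontier); the toy objective, the search space, and the perturbation scheme below are all illustrative assumptions.

```python
"""Minimal sketch of cost-aware hyperparameter search, in the spirit of CARBS.
NOT Imbue's code: the objective, search space, and proposal rule are toy assumptions."""
import math
import random

def train_and_eval(lr: float, width: int) -> tuple[float, float]:
    # Hypothetical stand-in for a real training run: returns (loss, cost).
    # Loss improves with width and prefers lr ~ 1e-3; cost grows with width.
    loss = 1.0 / math.log2(width) + 0.05 * (math.log10(lr) + 3.0) ** 2
    cost = width * 1e-3  # pretend cost in GPU-hours
    return loss, cost

def pareto_frontier(observations):
    # Keep only points that no other point beats on BOTH loss and cost.
    return [
        (cfg, loss, cost)
        for cfg, loss, cost in observations
        if not any(
            o_loss <= loss and o_cost <= cost and (o_loss, o_cost) != (loss, cost)
            for _, o_loss, o_cost in observations
        )
    ]

def search(num_steps: int = 50, seed: int = 0):
    rng = random.Random(seed)
    cfg = {"lr": 10 ** rng.uniform(-4, -2), "width": rng.choice([256, 512])}
    observations = [(cfg, *train_and_eval(**cfg))]
    for _ in range(num_steps):
        # Propose a local perturbation of a random point on the current frontier,
        # so the search expands along the cost/performance trade-off curve
        # instead of clustering around the single cheapest or single best run.
        base_cfg, _, _ = rng.choice(pareto_frontier(observations))
        cand = {
            "lr": base_cfg["lr"] * 10 ** rng.uniform(-0.3, 0.3),
            "width": max(128, int(base_cfg["width"] * rng.choice([0.5, 1.0, 2.0]))),
        }
        observations.append((cand, *train_and_eval(**cand)))
    return pareto_frontier(observations)

if __name__ == "__main__":
    for cfg, loss, cost in sorted(search(), key=lambda x: x[2]):
        print(f"cost={cost:8.3f}  loss={loss:.4f}  cfg={cfg}")
```

The point of the sketch is the bookkeeping, not the proposal rule: every observation carries a cost as well as a loss, and new candidates are generated near the current cost/performance frontier, which is what lets a cost-aware tuner extrapolate good hyperparameters toward larger, more expensive model sizes.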
We are busy running the sold-out AI Engineer World’s Fair today, so we are unable to do our usual quality writeup; please enjoy our show notes and the excellent conversation instead! Thanks also to Kanjun, Ashley, Tom, and the rest of team Imbue for setting up this interview behind the scenes.
Video pod
Timestamps
* [00:00:00] Introduction and catch up with guests
* [00:01:55] Databricks' text-to-image model release
* [00:03:46] Details about the DBRX model
* [00:05:26] Imbue's infrastructure, evaluation, and hyperparameter optimizer releases
* [00:09:18] Challenges of training foundation models and getting infrastructure to work
* [00:12:03] Details of Imbue's cluster setup
* [00:18:53] Process of bringing machines online and common failures
* [00:22:52] Health checks and monitoring for the cluster
* [00:25:06] Typical timelines and team composition for setting up a cluster
* [00:27:24] Monitoring GPU utilization and performance
* [00:29:39] Open source tools and libraries used
* [00:32:33] Reproducibility and portability of cluster setup
* [00:35:57] Infrastructure changes needed for different model architectures
* [00:40:49] Imbue's focus on text-only models for coding and reasoning
* [00:42:26] CARBS hyperparameter tuner and cost-aware optimization
* [00:51:01] Emergence and CARBS
* [00:53:18] Evaluation datasets and reproducing them with high quality
* [00:58:40] Challenges of evaluating on more realistic tasks
* [01:06:01] Abstract reasoning benchmarks like ARC
* [01:10:13] Long context evaluation and needle-in-a-haystack tasks
* [01:13:50] Function calling and tool use evaluation
* [01:19:19] Imbue's future plans for coding and reasoning applications
* [01:20:14] Databricks' future plans for useful applications and upcoming blog posts
Transcript
SWYX [00:00:00]: Welcome to the Latent Space Podcast, another super special edition. Today, we have sort of like a two-header. Jonathan Frankle from Mosaic Databricks, or Databricks Mosaic, and Josh Albrecht from Imbue. Welcome.
JOSH [00:00:12]: Hey, glad to be here.
SWYX [00:00:14]: Thank you for having us. Hey, so both of you are kind of past guests. Jonathan, you were actually one of the most popular episodes from last year talking about MPT-7B. Remember the days when we trained large models and there was 7B?
JONATHAN [00:00:30]: Yeah, back