🧨 ThursdAI - July 25 - Open-source GPT-4 intelligence has arrived - Meta LLaMA 3.1 405B beats GPT-4o! Mistral Large 2 also, DeepSeek Coder V2 ALSO - THIS WEEK
Holy s**t, folks! I was off for two weeks. Last week OpenAI released GPT-4o-mini and everyone was in my mentions saying, "Alex, how are you missing this??" And I'm so glad I missed that week and not this one, because while GPT-4o-mini is incredible (a GPT-4o-level distill with incredible speed and almost 99% cost reduction from 2 years ago?), it's not open source. So welcome back to ThursdAI, and buckle up, because we're diving into what might just be the craziest week in open-source AI since... well, ever! This week, we saw Meta drop LLaMA 3.1 405B like it's hot (including updated 70B and 8B models), Mistral join the party with their Large V2, and DeepSeek quietly update their Coder V2 to blow our minds. Oh, and did I mention Google DeepMind casually solving Math Olympiad problems at silver-medal level 🥈? Yeah, it's been that kind of week.

TL;DR of all topics covered:

* Open Source
  * Meta LLaMA 3.1 updated models (405B, 70B, 8B) - Happy LLaMA Day! (X, Announcement, Zuck, Try It, Try it Faster, Evals, Provider evals)
  * Mistral Large V2 123B (X, HF, Blog, Try It)
  * DeepSeek-Coder-V2-0724 update (API only)
* Big CO LLMs + APIs
  * 🥈 Google DeepMind wins silver medal at Math Olympiad - AlphaGeometry 2 (X)
  * OpenAI teases SearchGPT - their reimagined search experience (Blog)
  * OpenAI opens GPT-4o-mini finetunes + 2 months free (X)
* This week's Buzz
  * I compare 5 LLaMA API providers for speed and quantization using Weave (X)
* Voice & Audio
  * Daily announces a new open standard for real-time voice and video, RTVI-AI (X, Try it, GitHub)

Meta LLaMA 3.1: The 405B Open-Weights Frontier Model Beating GPT-4 👑

Let's start with the star of the show: Meta's LLaMA 3.1. This isn't just a 0.1 update; it's a whole new beast. We're talking about a 405-billion-parameter model that's not just knocking on GPT-4's door – it's kicking it down. Here's the kicker: you can actually download this internet-scale intelligence (if you have 820GB free).
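That 820GB figure checks out with back-of-the-envelope math: 405 billion parameters at 16-bit precision is roughly 810GB of raw weights before any metadata. A quick sketch (the function name is mine, not from any release):

```python
def checkpoint_size_gb(n_params: float, bytes_per_param: float) -> float:
    """Approximate raw weight size in decimal gigabytes."""
    return n_params * bytes_per_param / 1e9

# 405B parameters at common precisions:
print(checkpoint_size_gb(405e9, 2.0))  # bf16: 810.0 GB, roughly the 820GB download
print(checkpoint_size_gb(405e9, 1.0))  # int8: 405.0 GB
print(checkpoint_size_gb(405e9, 0.5))  # int4: 202.5 GB
```

Which is also why most of us will be running the quantized versions.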
That's right, a state-of-the-art model beating GPT-4 on multiple benchmarks, and you can click a download button. As I said during the show, "This is not only refreshing, it's quite incredible."

Some highlights:

* 128K context window (finally!)
* MMLU score of 88.6
* Beats GPT-4 on several benchmarks like IFEval (88.6%), GSM8K (96.8%), and ARC Challenge (96.9%)
* Has tool-use capabilities (also beating GPT-4) and is multilingual (ALSO BEATING GPT-4)

But that's just scratching the surface. Let's dive deeper into what makes LLaMA 3.1 so special.

The Power of Open Weights

Mark Zuckerberg himself dropped an exclusive interview with our friend Rowan Cheung from Rundown AI. And let me tell you, Zuck's commitment to open-source AI is no joke. He talked about distillation, technical details, and even released a manifesto on why open AI (the concept, not the company) is "the way forward". As I mentioned during the show, "The fact that this dude, like my age, I think he's younger than me... knows what they released to this level of technical detail, while running a multi billion dollar company is just incredible to me."

Evaluation Extravaganza

The evaluation results for LLaMA 3.1 are mind-blowing. We're not just talking about standard benchmarks here. The model is crushing it on multiple fronts:

* MMLU (Massive Multitask Language Understanding): 88.6%
* IFEval (Instruction Following): 88.6%
* GSM8K (Grade School Math): 96.8%
* ARC Challenge: 96.9%

But it doesn't stop there. For the first time, the fine folks at Meta also added new categories like Tool Use (BFCL 88.5) and Multilinguality (Multilingual MGSM 91.6) - not to be confused with multimodality, which is not here yet, but coming soon.

Now, these are official evaluations from Meta themselves, which, as we know, often don't really represent the quality of the model, so let's take a look at other, more vibey results, shall we?
On the SEAL leaderboards from Scale (held back so they can't be trained on), LLaMA 405B is beating ALL other models on Instruction Following, getti
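Speaking of this week's Buzz: the provider speed comparison I ran boils down to repeated wall-clock timing around a completion call. A minimal, provider-agnostic sketch (no Weave required; `call` is a stand-in for whatever client request you're benchmarking):

```python
import time

def mean_latency(call, n_runs: int = 3) -> float:
    """Average wall-clock seconds for an LLM API call over n_runs."""
    timings = []
    for _ in range(n_runs):
        start = time.perf_counter()
        call()  # e.g. lambda: client.chat.completions.create(...)
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)
```

Run this against the same prompt on each provider and you have the skeleton of the comparison; Weave adds the logging and side-by-side views on top.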