Hey everyone, sending a quick one today, no deep dive, as I'm still in the middle of AI Engineer World's Fair 2024 in San Francisco (in fact, I'm writing this from the incredible 32nd-floor presidential suite that the team here got for interviews, media and podcasting). And hey to all the new folks I've met over the last two days!
It's been an incredible few days meeting so many ThursdAI community members, listeners and folks who came on the pod! The list is honestly too long, but I got to meet friends of the pod Maxime Labonne, Wing Lian, João Moura (CrewAI), Vik from Moondream and Stefania Druga, not to mention the countless folks who came up to give high fives and introduce themselves. It was honestly a LOT of fun (and it's still not over: if you're here, please come say hi, and let's take an LLM-judge selfie together!)
On today's show, we recorded extra early because I had to run and play dress up, and boy am I relieved now that both the show and the talk are behind me and I can go and enjoy the rest of the conference (which I'll bring you here in full once I get the recording!)
We had the awesome pleasure of hosting Surya Bhupatiraju, a research engineer at Google DeepMind, who talked to us about their newly released Gemma 2 models! It was very technical and a super great conversation to check out!
Gemma 2 comes in two sizes, 9B and 27B parameters, both with an 8K context window (we addressed this on the show). The 27B model's performance is incredible: it beats Llama-3 70B on several benchmarks and even beats NVIDIA's Nemotron 340B!
The models are available to play with in Google AI Studio, and also on the Hugging Face Hub!
We also covered the revamp of the Hugging Face Open LLM Leaderboard, with new benchmarks in the mix and normalized scores, and how Qwen 2 once again comes out as the best model tested!
It was a very insightful conversation; if you're interested in benchmarks, definitely give it a listen.
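A quick illustration of what "normalized scores" means: as I understand the leaderboard's announcement, raw scores are rescaled so that random guessing maps to 0 and a perfect score maps to 100, so a model that merely guesses on a multiple-choice benchmark no longer looks like it learned something. Here's a minimal Python sketch assuming that scheme; the `normalize_score` helper and the clipping of below-random scores to 0 are my own illustration, not the leaderboard's actual code.

```python
def normalize_score(raw: float, random_baseline: float, max_score: float = 1.0) -> float:
    """Rescale a raw accuracy so random guessing -> 0 and a perfect score -> 100.

    Hypothetical helper for illustration only. Scores at or below the random
    baseline are clipped to 0 (an assumption made for this sketch).
    """
    if raw <= random_baseline:
        return 0.0
    return (raw - random_baseline) / (max_score - random_baseline) * 100.0


# Example: a 4-option multiple-choice benchmark has a random baseline of 0.25.
print(normalize_score(0.25, 0.25))   # pure guessing
print(normalize_score(0.625, 0.25))  # halfway between guessing and perfect
print(normalize_score(1.0, 0.25))    # perfect score
```

With this rescaling, 62.5% raw accuracy on a 4-choice task becomes 50, which makes scores comparable across benchmarks with different numbers of answer options.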
Last but not least, we had a conversation with Ethan Sutin, the co-founder of Bee Computer. At the AI Engineer speakers' dinner, all the speakers received a wearable AI device as a gift; I onboarded (because Swyx asked me to) and kinda forgot about it. On the way back to my hotel, I walked with a friend and chatted about my life.
When I got back to my hotel, the app prompted me with "hey, I now know 7 new facts about you", and it was incredible to see how much of the conversation it had picked up, extracting facts and even TODOs!
So I had to have Ethan on the show to dig a little into the privacy questions and use cases of these AI hardware devices, and it was a great chat!
Sorry for the quick one today! If this is your first newsletter after meeting me and registering: usually there's a deeper dive here, so expect more in-depth write-ups in the next editions. Right now I have to run down and enjoy the rest of the conference!
Here's the TL;DR and my RAW show notes for the full show, in case it's helpful!
* AI Engineer is happening right now in SF
    * Tracks include Multimodality, Open Models, RAG & LLM Frameworks, Agents, AI Leadership, Evals & LLM Ops, CodeGen & Dev Tools, AI in the Fortune 500, GPUs & Inference
* Open Source LLMs
    * HuggingFace - LLM Leaderboard v2 - (Blog)
        * Old Benchmarks sucked and it's time to renew
        * New Benchmarks
            * MMLU-Pro (Massive Multitask Language Understanding - Pro version, paper)
            * GPQA (Google-Proof Q&A Benchmark, paper). GPQA is an extremely hard knowledge dataset
            * MuSR (Multistep Soft Reasoning, paper)
            * MATH (Mathematics Aptitude Test of Heuristics, Level 5 subset, paper)
            * IFEval (Instruction Following Evaluation, paper)
            * BBH (Big Bench Hard, paper). BBH is a subset of 23 challenging tasks from the BigBench dataset
        * The community will be able to vote for models, and we will prioritize running models with the most votes first
* Mozilla announces Builders Accelerator @ AI E