📆 🎂 - ThursdAI #52 - Moshi Voice, Qwen2 finetunes, GraphRag deep dive and more AI news on this celebratory 1yr ThursdAI
Hey everyone! Happy 4th of July to everyone who celebrates! I celebrated today by having an intimate conversation with 600 of my closest X friends 😂 Joking aside, today is a celebratory episode, the 52nd consecutive weekly ThursdAI show! I've been doing this as a podcast for a year now!
Which means, there are some of you, who've been subscribed for a year 😮 Thank you! Couldn't have done this without you. In the middle of my talk at AI Engineer (I still don't have the video!) I had to plug ThursdAI, and I asked the 300+ audience who is a listener of ThursdAI, and I saw a LOT of hands go up, which is honestly, still quite humbling. So again, thank you for tuning in, listening, subscribing, learning together with me and sharing with your friends!
This week, we covered a new (soon to be) open source voice model from KyutAI and a LOT of open source LLMs, from InternLM, Cognitive Computations (Eric Hartford joined us) and Arcee AI (Lucas Atkins joined as well). We also have a deep dive into GraphRAG with Emil Eifrem, CEO of Neo4j (who shares why it was called Neo4j in the first place, and that he's a ThursdAI listener, whaaat? 🤯). This is definitely a conversation you don't want to miss, so tune in, and read a breakdown below:
TL;DR of all topics covered:
* Voice & Audio
* KyutAI releases Moshi - first ever 7B end to end voice capable model (Try it)
* Open Source LLMs
* Microsoft Updated Phi-3-mini - almost a new model
* InternLM 2.5 - best open source model under 12B on Hugging Face (HF, Github)
* Microsoft open sources GraphRAG (Announcement, Github, Paper)
* OpenAutoCoder-Agentless - SOTA on SWE Bench - 27.33% (Code, Paper)
* Arcee AI - Arcee Agent 7B - from Qwen2 - Function / Tool use finetune (HF)
* LMsys announces RouteLLM - a new Open Source LLM Router (Github)
* DeepSeek Chat got a significant upgrade (Announcement)
* Nomic GPT4all 3.0 - Local LLM (Download, Github)
* This weeks Buzz
* New free Prompts course from WandB in 4 days (pre sign up)
* Big CO LLMs + APIs
* Perplexity announces their new pro research mode (Announcement)
* X rolled out a "Grok Analysis" button, it's BAD in "fun mode", and the rollout was then paused
* Figma pauses the rollout of their AI text to design tool "Make Design" (X)
* Vision & Video
* Cognitive Computations drops DolphinVision-72b - VLM (HF)
* Chat with Emil Eifrem - CEO Neo4J about GraphRAG, AI Engineer
Voice & Audio
KyutAI Moshi - a 7B end to end voice model (Try It, See Announcement)
Seemingly out of nowhere, another French AI juggernaut decided to drop a major announcement. A company called KyutAI, backed by Eric Schmidt, called themselves "the first European private-initiative laboratory dedicated to open research in artificial intelligence" in a press release back in November of 2023. They have quite a few rockstar co-founders, ex-DeepMind and Meta AI, and have Yann LeCun on their science committee.
This week they showed their first, and honestly quite mind-blowing, release, called Moshi (Japanese for Hello, Moshi Moshi), which is an end to end voice and text model, similar to the GPT-4o demos we've seen, except this one is 7B parameters, and can run on your Mac!
The utility of the model right now is not the greatest, not remotely close to anything resembling the amazing GPT-4o (which was demoed live to me and all of AI Engineer by Romain Huet), but Moshi shows very, very impressive stats!
In only 6 months or so of work, a small team trained an LLM (Helium 7B) and an audio codec (Mimi), built a Rust inference stack and a lot more, to give insane performance.
Model latency is 160ms and mic-to-speakers latency is 200ms, which is so fast it seems like it's too fast. The demo often responds faster than I'm able to finish my sentence, and it results in an uncanny, "reading my thoughts" type feeling.
The most important part is this though, a quote from KyutAI's post after the announcement:
Developing Moshi required significant contributions to audio codecs, multimodal LLMs, multimo