📅 ThursdAI - Aug 29 - AI Plays DOOM, Cerebras breaks inference records, Google gives new Geminis, OSS vision SOTA & 100M context windows!?
Description
Hey, for the last time this summer of 2024, welcome to yet another edition of ThursdAI, and happy Skynet self-awareness day for those who keep track :)
This week, Cerebras broke the world record for the fastest Llama 3.1 70B/8B inference (and came on the show to talk about it), Google shipped 3 updated Geminis, Anthropic opened Artifacts to everyone, 100M context windows are apparently possible, and Qwen beat SOTA on open vision models + much more!
As always, this week's newsletter is brought to you by Weights & Biases. Did I mention we're doing a hackathon in SF on September 21/22, and that we have an upcoming free RAG course w/ Cohere & Weaviate?
TL;DR
* Open Source LLMs
* Nous DisTrO - Distributed Training (X, Report)
* NousResearch/hermes-function-calling-v1 open sourced (X, HF)
* LinkedIn Liger-Kernel - one line to make training 20% faster & 60% more memory efficient (Github)
* Cartesia - Rene 1.3B LLM SSM + Edge Apache 2 acceleration (X, Blog)
* Big CO LLMs + APIs
* Cerebras launches the fastest AI inference - 447 t/s Llama 3.1 70B (X, Blog, Try It)
* Google - Gemini 1.5 Flash 8B & new Gemini 1.5 Pro/Flash (X, Try it)
* Google adds Gems & Imagen to Gemini paid tier
* Anthropic artifacts available to all users + on mobile (Blog, Try it)
* Anthropic publishes their system prompts with model releases (release notes)
* OpenAI has project Strawberry coming this fall (via The Information)
* This week's Buzz
* WandB Hackathon hackathon hackathon (Register, Join)
* Also, we have a new RAG course w/ Cohere and Weaviate (RAG Course)
* Vision & Video
* Zhipu AI CogVideoX - 5B video model w/ less than 10GB of VRAM (X, HF, Try it)
* Qwen-2 VL 72B, 7B, 2B - new SOTA vision models from Qwen (X, Blog, HF)
* AI Art & Diffusion & 3D
* GameNGen - completely generated (not rendered) DOOM with SD1.4 (project)
* FAL new LORA trainer for FLUX - trains under 5 minutes (Trainer, Coupon for ThursdAI)
* Tools & Others
* SimpleBench from AI Explained - closely matches human experience (simple-bench.com)
Open Source
Let's be honest - ThursdAI is a love letter to the open-source AI community, and this week was packed with reasons to celebrate.
Nous Research DisTrO + Function Calling V1
Nous Research was on fire this week (aren't they always?) and they kicked off the week with the release of DisTrO, a breakthrough in distributed training. You see, LLM training doesn't just require a lot of hardware; it also requires a lot of network bandwidth between the different GPUs, even within the same data center.
Proprietary interconnects like Nvidia NVLink and more open standards like Ethernet work well within a single datacenter, but training across different GPU clouds has been unimaginable until now.
Enter DisTrO, a new decentralized training approach from the mad geniuses at Nous Research, in which they reduced the bandwidth required to train a 1.2B param model from 74.4GB to just 86MB (857x)!
This can have massive implications for training across compute clusters, doing shared training runs, optimizing costs and efficiency and democratizing LLM training access! So don't sell your old GPUs just yet, someone may just come up with a folding@home but for training the largest open source LLM, and it may just be Nous!
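Nous has only put out a preliminary report so far, so the sketch below is emphatically not their algorithm. It's just a toy illustration (in PyTorch, with a made-up `topk_compress` helper and an assumed 0.1% keep rate) of the bottleneck DisTrO attacks: in naive data-parallel training, every node ships a full gradient over the wire each step, and shrinking that payload is what makes training over ordinary internet links conceivable.

```python
# Toy illustration only - NOT DisTrO's actual method (still unpublished beyond
# a preliminary report). It shows why per-step communication volume is the
# thing to attack when training across clusters.
import torch

def topk_compress(grad: torch.Tensor, keep_frac: float = 0.001):
    """Hypothetical compressor: keep only the largest-magnitude gradient
    entries (real schemes also track the error they leave behind)."""
    k = max(1, int(grad.numel() * keep_frac))
    _, indices = torch.topk(grad.abs().flatten(), k)
    return grad.flatten()[indices], indices  # what a node would transmit

# quick usage check on a small fake gradient
vals, idx = topk_compress(torch.randn(10_000))

# Back-of-envelope bandwidth for a ~1.2B-parameter model:
n_params = 1_200_000_000
dense_bytes = n_params * 4               # full fp32 gradient, every step
kept = int(n_params * 0.001)
sparse_bytes = kept * (4 + 4)            # fp32 value + int32 index per entry
print(f"dense  : {dense_bytes / 1e9:.1f} GB per exchange")
print(f"sparse : {sparse_bytes / 1e6:.1f} MB per exchange "
      f"({dense_bytes / sparse_bytes:.0f}x smaller)")
```

The real 74.4GB-to-86MB numbers in the report come from whatever Nous is actually doing to gradients and optimizer state, not from naive top-k, but the order-of-magnitude story is the same: the less each node has to send per step, the less you need a datacenter-grade interconnect.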
Nous Research also released their function-calling-v1 dataset (HF) that was used to train Hermes-2, and we had InterstellarNinja, who authored that dataset, join the show and chat about it. This is an incredible unlock for the open-source community, as function calling has become a de-facto standard now. Shout out to the Glaive team as well for their pioneering work that paved the way!
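If you haven't looked at function-calling data before, here's roughly the shape of a Hermes-style example. The `get_weather` tool, the field names, and the XML-ish tags below are my own illustrative assumptions; the authoritative schema is on the dataset card at NousResearch/hermes-function-calling-v1.

```python
# Illustrative shape of a function-calling training example (Hermes-style).
# Tool definition and tags are assumptions for this sketch - check the
# NousResearch/hermes-function-calling-v1 dataset card for the exact schema.
import json

weather_tool = {
    "name": "get_weather",                        # hypothetical tool
    "description": "Get the current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

conversation = [
    {"role": "system",
     "content": "You are a function-calling AI. You may call these tools:\n"
                f"<tools>{json.dumps([weather_tool])}</tools>"},
    {"role": "user", "content": "What's the weather in Denver right now?"},
    {"role": "assistant",
     "content": '<tool_call>{"name": "get_weather", '
                '"arguments": {"city": "Denver"}}</tool_call>'},
]

print(json.dumps(conversation, indent=2))
```

Training on thousands of exchanges in this shape is what teaches an open model to emit parseable tool calls instead of prose, which is exactly the capability that's becoming table stakes for agents.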
LinkedIn's Liger Kernel: Unleashing the Need for Speed (with One Line of Code)
What if I told you that you could add one line of code to your LLM training setup and it would run 20% faster and require 60% less memory?
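That's the pitch of LinkedIn's Liger Kernel for Llama-style training in the Hugging Face ecosystem. Here's a minimal sketch of that "one line", assuming the patching entry point shown in the repo's README at release time (double-check the repo for the current API before relying on it):

```python
# "One line" patching sketch, assuming the liger_kernel API as documented in
# the project README - verify against the repo before relying on it.
import torch
from transformers import AutoModelForCausalLM
from liger_kernel.transformers import apply_liger_kernel_to_llama

apply_liger_kernel_to_llama()   # the one line: monkey-patches HF's Llama
                                # modules with fused Triton kernels
                                # (RMSNorm, RoPE, SwiGLU, cross-entropy, ...)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B",   # any Llama-architecture checkpoint
    torch_dtype=torch.bfloat16,
)
# Train exactly as before - the throughput and memory gains come from the
# fused kernels, not from any change to your training loop.
```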