📅 ThursdAI Feb 22nd - Groq near instant LLM calls, SDXL Lightning near instant SDXL, Google gives us GEMMA open weights and refuses to draw white people, Stability announces SD3 & more AI news
Hey, this is Alex. OK, let's start with the big news: holy crap, this week was a breakthrough week for speed! We had Groq explode in popularity, and ByteDance released an updated SDXL model called Lightning that can generate full-blown 1024px SDXL images in 300ms. I've been excited to see what real-time LLMs and diffusion can bring, and with both of these releases landing the same week, I just had to go and test them out together.

Additionally, Google stepped into a big open-weights role and gave us Gemma, two open-weights models at 2B and 7B (the latter is closer to 9B, per Junyang). It was great to see Google committing to releasing at least some models in the open.

We also had breaking news: Emad from Stability announced SD3, which looks really great. Plus, Google will pay Reddit $200M for AI training on Reddit's data, and a few more things.

TL;DR of all topics covered:

* Big CO LLMs + APIs
  * Groq's custom LPU inference does 400T/s Llama/Mistral generation (X, Demo)
  * Google's image generation is in hot water and was reportedly paused (it refuses to generate white people)
  * Gemini 1.5's long context is very impressive to folks (Matt Shumer, Ethan Mollick)
* Open Weights LLMs
  * Google releases GEMMA, open weights 2B and 7B models (Announcement, Models)
  * Teknium releases Nous Hermes DPO (Announcement, HF)
* Vision & Video
  * YOLOv9 - SOTA real-time object detector is out (Announcement, Code)
* This Week's Buzz (What I learned in WandB this week)
  * Went to SF to cohost an event with A16Z, Nous, Mistral (Thread, My Report)
* AI Art & Diffusion & 3D
  * ByteDance presents SDXL-Lightning (Try here, Model)
  * Stability announces Stable Diffusion 3 (Announcement)
* Tools
  * Replit releases a new experimental Figma plugin for UI → Code (Announcement)
  * Arc browser adds "AI pinch to understand" summarization (Announcement)

Big CO LLMs + APIs

Groq's new LPU shows extreme performance for LLMs, up to 400T/s (example)

* Groq created a novel processing unit known as the Tensor Streaming Processor (TSP), which
they categorize as a Language Processing Unit (LPU). Unlike traditional GPUs, which are parallel processors with hundreds of cores designed for graphics rendering, LPUs are architected to deliver deterministic performance for AI computations.
* Analogy: they know where all the cars are going when everyone wakes up for work (when they compile) and how fast they all drive (compute latency), so they can get rid of traffic lights (routers) and turn lanes (backpressure) by telling everyone when to leave the house.
* Why would we need something like this? Some folks are saying that average human reading speed is only about 30T/s. To show what this speed unlocks, I created an example that uses near-instant Groq Mixtral + SDXL Lightning to generate images, with Mixtral as my prompt manager.

Open Weights LLMs

Google Gemma - 2B and 7B open weights models (demo)

* 4 hours after release, llama.cpp, Ollama, and LM Studio added support, and Tri Dao added FlashAttention support
* Vocab size is 256K
* 8K context window
* Tokenizer similar to Llama's
* Folks are... not that impressed, as far as I've seen
* Trained on 6 trillion tokens
* Google also released gemma.cpp (local CPU inference) - Announcement

Nous/Teknium re-release Nous Hermes with a DPO fine-tune (Announcement)

* The DPO fine-tune brings improvements across the board over the previous models
* The models are available as GGUF and can be found here

This Week's Buzz (What I learned with WandB this week)

* Alex was in SF last week
* A16Z plus 20-something co-hosts, including Weights & Biases, talked about the importance of open source
* Huge shoutout to Rajko and Marco from A16Z, and to tons of open source folks who joined
* Nous, Ollama, LlamaIndex, LMSys folks, Replicate, Perplexity, Mistral, GitHub, Hugging Face, as well as Eric Hartford, Jon Durbin, Haotian Liu, tons of other great folks from Mozilla and the Linux Foundation, and Percy from Together/Stanford

I also had a chance to check out one of the smol dinners in SF. They go really hard, and I had a great time showing folks th…
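To put this episode's speed numbers in perspective, here is a back-of-the-envelope sketch in Python. It only reuses figures quoted above (400T/s for Groq, ~30T/s for human reading, ~300ms per SDXL-Lightning image); the 500-token answer and 75-token image prompt are our own illustrative assumptions, and it ignores network latency and time-to-first-token.

```python
# Back-of-the-envelope latency math using figures quoted in this episode.
# Assumptions (ours): a 500-token answer, a 75-token image prompt, constant
# throughput, and no network latency or time-to-first-token.
GROQ_TPS = 400.0        # tokens/s quoted for Groq
READING_TPS = 30.0      # tokens/s quoted as average human reading speed
SDXL_LIGHTNING_S = 0.3  # seconds per 1024px SDXL-Lightning image


def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to emit num_tokens at a constant rate."""
    return num_tokens / tokens_per_second


def prompt_then_image_seconds(prompt_tokens: int) -> float:
    """LLM writes the image prompt, then the diffusion model renders it, sequentially."""
    return generation_seconds(prompt_tokens, GROQ_TPS) + SDXL_LIGHTNING_S


if __name__ == "__main__":
    print(f"500 tokens on Groq:       {generation_seconds(500, GROQ_TPS):.2f} s")
    print(f"500 tokens at read speed: {generation_seconds(500, READING_TPS):.2f} s")
    print(f"prompt + image:           {prompt_then_image_seconds(75):.2f} s")
```

Under those assumptions, a 500-token answer streams in about 1.25 s at 400T/s versus roughly 16.7 s at reading speed, and a prompt-then-image round trip lands around half a second, which is why the Mixtral + SDXL Lightning combo feels near instant.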
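The traffic analogy for the LPU boils down to static scheduling: when every operation's latency is known at compile time, start times can be fixed up front instead of being arbitrated at run time. The toy Python sketch below illustrates only that idea. It is not Groq's compiler, the op names and cycle counts are hypothetical, and it assumes unlimited hardware units (no resource conflicts).

```python
# Toy static scheduler: NOT Groq's compiler, just the idea behind the analogy.
# Op names and cycle latencies are hypothetical; unlimited hardware units assumed.
from graphlib import TopologicalSorter  # Python 3.9+


def static_schedule(latency: dict[str, int],
                    deps: dict[str, list[str]]) -> dict[str, int]:
    """Compute each op's start cycle ahead of time from fixed latencies.

    deps maps an op to the ops it must wait for. Because every latency is known,
    an op can start exactly when its last input is ready, with no runtime
    arbitration (no "traffic lights").
    """
    start: dict[str, int] = {}
    # static_order() yields every op only after all of its predecessors.
    for op in TopologicalSorter(deps).static_order():
        start[op] = max((start[p] + latency[p] for p in deps.get(op, [])), default=0)
    return start


if __name__ == "__main__":
    latency = {"load_w": 4, "load_x": 4, "matmul": 10, "add_bias": 2, "store": 3}
    deps = {"matmul": ["load_w", "load_x"], "add_bias": ["matmul"], "store": ["add_bias"]}
    for op, cycle in sorted(static_schedule(latency, deps).items(), key=lambda kv: kv[1]):
        print(f"cycle {cycle:>3}: start {op}")
```

The schedule is computed once; at run time each op simply starts at its precomputed cycle, which is the "telling everyone when to leave the house" part of the analogy.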
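The 256K vocabulary is a big part of why Gemma's parameter counts run high. Here is a quick sketch of the embedding-table arithmetic, assuming (our assumption; check the released model configs) a vocabulary of exactly 256,000 tokens, hidden sizes of 3072 for the 7B and 2048 for the 2B, and tied input/output embeddings counted once.

```python
# Embedding-table size arithmetic for Gemma (illustrative).
# Assumptions (ours, not from the episode): vocab of exactly 256,000 tokens,
# hidden size 3072 for Gemma 7B and 2048 for Gemma 2B, tied input/output embeddings.
VOCAB_SIZE = 256_000
HIDDEN_SIZES = {"gemma-7b": 3072, "gemma-2b": 2048}


def embedding_params(vocab_size: int, hidden_size: int) -> int:
    """One row of hidden_size weights per vocabulary entry."""
    return vocab_size * hidden_size


if __name__ == "__main__":
    for name, hidden in HIDDEN_SIZES.items():
        n = embedding_params(VOCAB_SIZE, hidden)
        print(f"{name}: {n:,} embedding params (~{n / 1e9:.2f}B)")
```

Under those assumptions, the embedding table alone is about 0.79B parameters for the 7B model and about 0.52B for the 2B, which helps explain why the "7B" lands closer to 9B once everything is counted.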
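For readers wondering what "DPO" changes: Direct Preference Optimization skips the separate reward model and RL loop of classic RLHF and trains directly on (chosen, rejected) response pairs against a frozen reference model. Below is a minimal Python sketch of the standard per-pair DPO loss from the original DPO paper, written with plain floats. It is not Nous's training code, and beta=0.1 is just an illustrative default.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log(sigmoid(beta * (chosen log-ratio - rejected log-ratio))).

    Each *_logp is the summed log-probability of a full response under the
    policy being trained or the frozen reference model.
    """
    chosen_log_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_log_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_log_ratio - rejected_log_ratio)
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch so exp() never overflows.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Policy identical to the reference: loss is log(2).
    print(dpo_loss(-12.0, -15.0, -12.0, -15.0))
    # Policy shifted probability toward the chosen answer: loss drops below log(2).
    print(dpo_loss(-10.0, -16.0, -12.0, -15.0))
```

Minimizing this loss pushes the policy to raise the chosen response's likelihood relative to the reference more than it raises the rejected one's, with beta controlling how far it may drift from the reference.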