ThursdAI - June 13th, 2024 - Apple Intelligence recap, Elon's reaction, Luma's Dream Machine, AI Engineer invite, SD3 & more AI news from this past week
Description
Happy Apple AI week everyone (well, those of us who celebrate, some don't), as this week we finally got to see what Apple is planning to do with this whole generative AI wave, presenting Apple Intelligence (which is AI, get it? They're trying to rebrand AI!)
This week's pod and newsletter will mainly focus on Apple Intelligence, of course, as it did for most people, judging by how the market reacted ($AAPL added over $360B in market cap in the few days after the announcement) and how many people watched each live stream (10M at the time of this writing have watched the WWDC keynote on YouTube, compared to 4.5M for OpenAI's GPT-4o announcement and 1.8M for Google I/O)
On the pod we also geeked out on new eval frameworks and benchmarks, including a chat with the authors of MixEval (which I wrote about last week) and a new benchmark called LiveBench from Abacus and Yann LeCun
Plus a new video model from Luma and finally SD3, let's go! 👇
TL;DR of all topics covered:
* Apple WWDC recap and Apple Intelligence (X)
* This Weeks Buzz
* AI Engineer expo in SF (June 25-27) come see my talk, it's going to be Epic (X, Schedule)
* Open Source LLMs
* Microsoft Samba - 3.8B Mamba + Sliding Window Attention hybrid beating Phi-3 (X, Paper)
* Sakana AI releases LLM squared - LLMs coming up with preference algorithms to train better LLMs (X, Blog)
* Abacus + Yann LeCun release LiveBench.AI - an impossible-to-game benchmark (X, Bench)
* Interview with the MixEval folks about achieving 96% arena accuracy at 5000x lower cost
* Big CO LLMs + APIs
* Mistral announced a €600M Series B round
* Revenue at OpenAI DOUBLED in the last 6 months and is now at $3.4B annualized (source)
* Elon drops lawsuit vs OpenAI
* Vision & Video
* Luma drops Dream Machine - Sora-like short video generation, free to access (X, TRY IT)
* AI Art & Diffusion & 3D
* Stable Diffusion 3 Medium weights are here (X, HF, FAL)
* Tools
* Google releases GenType - create an alphabet with diffusion models (X, Try It)
Apple Intelligence
Technical LLM details
Let's dive right into what wasn't shown in the keynote. In a 6-minute deep dive video during the developer State of the Union and in a follow-up post on their machine learning blog, Apple shared some very exciting technical details about the on-device models and orchestration that will become Apple Intelligence.
Namely, on device they have trained a bespoke 3B parameter LLM, trained on licensed data, which uses a bunch of very cutting-edge techniques to achieve quite incredible on-device performance: stuff like GQA, speculative decoding, and a unique type of quantization (which they claim is almost lossless).
To maintain model quality, we developed a new framework using LoRA adapters that incorporates a mixed 2-bit and 4-bit configuration strategy — averaging 3.5 bits-per-weight — to achieve the same accuracy as the uncompressed models [...] on iPhone 15 Pro we are able to reach time-to-first-token latency of about 0.6 millisecond per prompt token, and a generation rate of 30 tokens per second
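To make those numbers a bit more concrete, here's a quick back-of-the-envelope calculation. The prompt and reply lengths below are my own assumptions, not Apple's; only the 0.6 ms/token and 30 tokens/sec figures come from their post:

```swift
// Rough latency math using Apple's published figures.
// The prompt/reply sizes are assumed for illustration only.
let promptTokens = 800.0   // e.g. a longish email thread being summarized
let replyTokens  = 120.0   // a short generated summary

let timeToFirstToken = promptTokens * 0.0006  // ~0.6 ms per prompt token -> ~0.48 s
let generationTime   = replyTokens / 30.0     // 30 tokens/sec -> ~4 s
print("TTFT ≈ \(timeToFirstToken)s, generation ≈ \(generationTime)s, total ≈ \(timeToFirstToken + generationTime)s")
```

So even a fairly long prompt should feel near-instant to first token, with the full reply streaming out over a few seconds.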
These small models (they also have a bespoke image diffusion model) are going to be fine-tuned with a lot of LoRA adapters for specific tasks like summarization, query handling, mail replies, urgency detection and more, which gives their foundation model the ability to specialize itself on the fly for the task at hand, with adapters cached in memory as well for optimal performance.
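To illustrate the pattern (and only the pattern; Apple hasn't published an adapter API, so every type and method below is a hypothetical stub): one shared base model, with small per-task LoRA adapters loaded on demand and kept cached:

```swift
// Hypothetical sketch of per-task LoRA adapter routing. None of these types
// are Apple APIs; they just illustrate the idea described above.
enum AdapterTask: String { case summarization, mailReply, queryHandling, urgency }

struct LoRAAdapter {
    let task: AdapterTask
    // In the real system this would hold the small task-specific weights.
    static func load(for task: AdapterTask) -> LoRAAdapter { LoRAAdapter(task: task) }
}

struct OnDeviceLLM {
    func generate(_ prompt: String, applying adapter: LoRAAdapter) -> String {
        "[\(adapter.task.rawValue)] response to: \(prompt)"  // stubbed output
    }
}

final class AdapterRouter {
    private let base = OnDeviceLLM()
    private var cache: [AdapterTask: LoRAAdapter] = [:]  // adapters stay resident once used

    func respond(to prompt: String, task: AdapterTask) -> String {
        if cache[task] == nil {
            cache[task] = LoRAAdapter.load(for: task)  // load once, then reuse from cache
        }
        return base.generate(prompt, applying: cache[task]!)
    }
}
```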
Personal and Private (including in the cloud)
While these models are small, they also benefit from two more things on device: a vector store of your stuff (contacts, recent chats, calendar, photos) that Apple calls the Semantic Index, and a new thing Apple is calling App Intents, which developers can expose (and the OS apps already do) and which let the LLM use tools to move files, extract data across apps, and take actions. This already makes the AI much more personal and helpful, as it has in its context things about me and what my apps can do on my phone.
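App Intents is a real framework apps can already adopt; here's a rough sketch of what exposing one action could look like. The specific intent, its parameters, and its behavior are made up for illustration:

```swift
import AppIntents

// A made-up example intent; the AppIntents framework is real, but this
// particular action and its parameters are illustrative only.
struct MoveFileIntent: AppIntent {
    static var title: LocalizedStringResource = "Move File"

    @Parameter(title: "File Name")
    var fileName: String

    @Parameter(title: "Destination Folder")
    var destination: String

    func perform() async throws -> some IntentResult & ProvidesDialog {
        // A real app would actually move the file here;
        // we just return a confirmation dialog.
        return .result(dialog: "Moved \(fileName) to \(destination).")
    }
}
```

Once an app exposes intents like this, the system (and, presumably, Apple Intelligence) can discover and invoke them on the user's behalf.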
Handoff to the Private Cloud (and then to OpenAI)