ThursdAI - May 2nd - New GPT2? Copilot Workspace, Evals and Vibes from Reka, LLama3 1M context (+ Nous finetune) & more AI news
Description
Hey 👋 Look, it May or May not be the first AI newsletter you get in May, but it's for sure going to be a very information-dense one. We had an amazing conversation on the live recording today, and over 1K folks joined to listen to the first May updates from ThursdAI.

As you May know by now, I just love giving the stage to folks who are the creators of the actual news I get to cover from week to week, and this week we again had 2 of those conversations. First we chatted with Piotr Padlewski from Reka, the author of the new Vibe-Eval paper & dataset, which they published this week. We've had Yi and Max from Reka on the show before, but it was Piotr's first time; he was super knowledgeable and really fun to chat with.

Specifically, as we at Weights & Biases launch a new product called Weave (which you should check out at https://wandb.me/weave), I'm getting a LOT more interested in Evaluations and LLM scoring. In fact, we started the whole show today with a full segment on Evals and Vibe checks, and covered a new paper from Scale about overfitting.

The second deep dive was with my friend Idan Gazit, from GithubNext, about the new iteration of Github Copilot, called Copilot Workspace.
It was a great one, and you should definitely give that one a listen as well.

TL;DR of all topics covered + show notes

* Scores and Evals
  * No notable changes, LLama-3 is still #6 on LMsys
  * gpt2-chat came and went (in depth chan writeup)
  * Scale checked for Data Contamination on GSM8K using GSM-1K (Announcement, Paper)
  * Vibes-Eval from Reka - a set of multimodal evals (Announcement, Paper, HF dataset)
* Open Source LLMs
  * Gradient releases 1M context window LLama-3 finetune (X)
  * MaziyarPanahi/Llama-3-70B-Instruct-DPO-v0.4 (X, HF)
  * Nous Research - Hermes Pro 2 - LLama 3 8B (X, HF)
  * AI Town is running on Macs thanks to Pinokio (X)
  * LMStudio releases their CLI - LMS (X, Github)
* Big CO LLMs + APIs
  * Github releases Copilot Workspace (Announcement)
  * AI21 - releases Jamba Instruct w/ 256K context (Announcement)
  * Google shows Med-Gemini with some great results (Announcement)
  * Claude releases iOS app and Team accounts (X)
* This week's Buzz
  * We're heading to SF to sponsor the biggest LLama-3 hackathon ever with Cerebral Valley (X)
  * Check out my video for Weave, our new product, it's just 3 minutes (Youtube)
* Vision & Video
  * Intern LM open sourced a bunch of LLama-3 and Phi based VLMs (HUB)
  * And they are MLXd by the "The Bloke" of MLX, Prince Canuma (X)
* AI Art & Diffusion & 3D
  * ByteDance releases Hyper-SD - Stable Diffusion in a single inference step (Demo)
* Tools & Hardware
  * Still haven't opened the AI Pin, and Rabbit R1 just arrived, will open later today
* Co-Hosts and Guests
  * Piotr Padlewski (@PiotrPadlewski) from Reka AI
  * Idan Gazit (@idangazit) from Github Next
  * Wing Lian (@winglian)
  * Nisten Tahiraj (@nisten)
  * Yam Peleg (@yampeleg)
  * LDJ (@ldjconfirmed)
  * Wolfram Ravenwolf (@WolframRvnwlf)
  * Ryan Carson (@ryancarson)

Scores and Evaluations

New corner in today's pod and newsletter, given the focus this week on new models and comparing them to existing models.

What is GPT2-chat and who put it on LMSys? (and how do we even know it's good?)
For a very brief period this week, a mysterious new model called gpt2-chat appeared on LMSys. It only appeared in the Arena and did not show up on the leaderboard, and yet tons of sleuths from 4chan to reddit to X started trying to figure out what this model was and wasn't. Folks started analyzing the tokenizer and the output schema, and tried to get the system prompt and gauge the context length. Many folks were hoping that this was an early example of GPT4.5 or something else entirely. It did NOT help that uncle SAMA first posted a tweet and then edited it to remove the "-", and it was unclear if he was trolling again, foreshadowing a completely new release, or hinting at an old GPT-2 retrained on newer data or something. The model was really surprisingly good, s