ThursdAI - The top AI news from the past week

🎄ThursdAI - LAION down, OpenChat beats GPT3.5, Apple is showing where it's going, Midjourney v6 is here & Suno can make music!

Description

Hey everyone, happy ThursdAI! As always, here's a list of things we covered this week, including show notes and links, to prepare you for the holidays.

TL;DR of all topics covered:

* Open Source AI
  * OpenChat-3.5-1210 - a top-performing open source 7B model from the OpenChat team, beating GPT3.5 and Grok (link, HF, Demo)
  * LAION 5B dataset taken down due to CSAM allegations from Stanford (link, full report pdf)
  * FLASK - new skillset-based evaluation framework from KAIST (link)
    * Shows a larger difference between open/closed source
  * Open leaderboard reliability issues, vibes benchmarks and more
  * HF releases a bunch of MLX-ready models (LLama, Phi, Mistral, Mixtral) (link)
  * New transformer alternative architectures - Hyena & Mamba are heating up (link)
* Big CO LLMs + APIs
  * Apple - "LLM in a flash" paper is making the rounds (AK, Takeaways thread)
  * Anthropic adheres to the messages API format (X)
  * Microsoft Copilot finally has plugins (X)
* Voice & Audio
  * AI music generation: Suno is now part of Microsoft Copilot plugins and creates long, beautiful songs (link)
* AI Art & Diffusion
  * Midjourney v6 is out - better text, great at following instructions (link)

Open Source AI

We start today with a topic I didn't expect to be covering: the LAION 5B dataset was taken down after a report from the Stanford Internet Observatory found instances of CSAM (child sexual abuse material) in the vast dataset. The report identified hundreds to thousands of images of this sort, and used Microsoft's PhotoDNA to identify the images by hash, working from a sample of images marked NSFW. LAION 5B was used to train Stable Diffusion: versions 1.4 and 1.5 were trained on a lot of images from that dataset, whereas SD2, for example, was only trained on images not marked as NSFW. The report is very thorough, going through the methodology used to find and check those types of images.
Worth noting that LAION 5B itself is not an image dataset, as it only contains links to images and their descriptions from alt tags. Obviously this is a very touchy topic, given the way this dataset was scraped from the web and how many image models were trained on it. The report doesn't allege anything close to influence on the models trained on it, and it outlines a few methods of preventing issues like this in the future.

One unfortunate outcome of such a discovery is that this type of work can only be done on open datasets like LAION 5B, while closed source datasets don't get anywhere near this level of scrutiny. This can slow down the advancement of open source multimodal models, while closed source ones will continue having these issues and still prevail. The report says they found and validated between hundreds and a few thousand CSAM images, which, considering the size of the dataset, is a vanishingly small fraction; however, it still shouldn't exist at all, and better techniques for cleaning scraped datasets should exist. The dataset has been taken down for now from HuggingFace and other places.

New version of a 7B model that beats chatGPT from OpenChat collective (link, HF, Demo)

Friend of the pod Alpay Aryak and team released an update to OpenChat, one of the top models in the 7B world. The new version, OpenChat 7B (1210), scores strongly against chatGPT 3.5 and Grok, with very high benchmark results (63.4% on HumanEval compared to 64% for GPT3.5).

Scrutiny of open source benchmarks and leaderboards being gamed

We've covered state-of-the-art models on ThursdAI, and every time we did, we covered the benchmarks and evaluation scores, whether that's the popular MMLU (Massive Multitask Language Understanding) or HumanEval (Python coding questions). Almost always, we've referred to the HuggingFace Open LLM leaderboard for the latest and greatest models.
This week, there's a long thread on the Hugging Face forums, which HF eventually had to shut down, alleging that a new c
