The AI-First Graphics Editor - with Suhail Doshi of Playground AI

Latent Space: Founders, Engineers, and News on...

Description

We are running an end of year survey for our listeners! Please let us know any feedback you have, what episodes resonated with you, and guest requests for 2024! Survey link here!

Before language models became all the rage in November 2022, image generation was the hottest space in AI (it was the subject of our first piece on Latent Space!). In our interview with Sharif Shameem from Lexica, we talked through the launch of Stable Diffusion and the early days of that space. At the time, the toolkit was still pretty rudimentary: Lexica made it easy to search images, you had the AUTOMATIC1111 Web UI to generate locally, some HuggingFace spaces that offered inference, and eventually DALL-E 2 through OpenAI’s platform, but not much beyond basic text-to-image workflows.

Today’s guest, Suhail Doshi, is trying to solve this with Playground AI, an image editor reimagined with AI in mind. Some of the differences compared to traditional text-to-image workflows:

* Real-time preview rendering using consistency: as you change your prompt, you can see changes in real time before doing a final rendering.
* Style filtering: rather than having to prompt exactly how you’d like an image to look, you can pick from a whole range of filters, both from Playground’s model and from Stable Diffusion (like RealVis, Starlight XL, etc). We talk about this at 25:46 in the podcast.
* Expand prompt: similar to DALL-E 3, Playground will do some prompt tuning for you to get better results in generation. Unlike DALL-E 3, you can turn this off at any time if you are a prompting wizard.
* Image editing: after generation, you have tools like a magic eraser, inpainting pencil, etc. This makes it easier to do a full workflow in Playground rather than switching to another tool like Photoshop.

Outside of the product, they have also trained a new model from scratch, Playground v2, which is fully open source and open weights and allows for commercial usage.
They benchmarked the model against SDXL across 1,000 prompts and found that humans preferred the Playground generation 70% of the time. They had similar results on PartiPrompts.

They also created a new benchmark, MJHQ-30K, for “aesthetic quality”:

“We introduce a new benchmark, MJHQ-30K, for automatic evaluation of a model’s aesthetic quality. The benchmark computes FID on a high-quality dataset to gauge aesthetic quality. We curate the high-quality dataset from Midjourney with 10 common categories, each category with 3K samples. Following common practice, we use aesthetic score and CLIP score to ensure high image quality and high image-text alignment. Furthermore, we take extra care to make the data diverse within each category.”

Suhail was pretty open in saying that Midjourney is currently the best product for image generation out there, and that’s why they used it as the base for this benchmark:

“I think it's worth comparing yourself to maybe the best thing and try to find like a really fair way of doing that. So I think more people should try to do that. I definitely don't think you should be kind of comparing yourself on like some Google model or some old SD, Stable Diffusion model and be like, look, we beat Stable Diffusion 1.5. I think users ultimately care, how close are you getting to the thing that people mostly agree with?” [00:23:47]

We also talked a lot about Suhail’s founder journey from starting Mixpanel in 2009, then going through YC again with Mighty, and eventually sunsetting that to pivot into Playground. Enjoy!
Show Notes

* Suhail’s Twitter
* “Starting my road to learn AI”
* Bill Gates book trip
* Playground
* Playground v2 Announcement
* $40M raise announcement
* “Running infra dev ops for 24 A100s”
* Mixpanel
* Mighty
* “I decided to stop working on Mighty”
* Fast.ai
* Civit

Timestamps

* [00:00:00] Intros
* [00:02:59] Being early in ML at Mixpanel
* [00:04:16] Pivoting from Mighty to Playground and focusing on generative AI
* [00:07:54] How DALL-E 2 inspir
