Jonathan Frankle: From Lottery Tickets to LLMs
Description
In episode 96 of The Gradient Podcast, Daniel Bashir speaks to Jonathan Frankle.

Jonathan is the Chief Scientist at MosaicML (as of release). He completed his PhD at MIT, where his work on the lottery ticket hypothesis investigated the properties that allow sparse neural networks to train effectively. He also spends a portion of his time working on technology policy, and currently works with the OECD to implement the AI principles he helped develop in 2019.

Have suggestions for future podcast guests (or other feedback)? Let us know here or reach us at [email protected]

Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS
Follow The Gradient on Twitter

Outline:
* (00:00) Intro
* (02:35) Jonathan’s background and work
* (04:25) Origins of the Lottery Ticket Hypothesis
* (06:00) Jonathan’s empiricism and approach to science
* (08:25) More Karl Popper discourse + hot takes
* (09:45) Walkthrough of the Lottery Ticket Hypothesis
* (12:00) Issues with the Lottery Ticket Hypothesis as a statement
* (12:30) Jonathan’s advice for PhD students, on asking good questions
* (15:55) Strengths and Promise of the Lottery Ticket Hypothesis
* (18:55) More Lottery Ticket Hypothesis Papers
* (19:10) Comparing Rewinding and Fine-tuning
* (23:00) Care in making experimental choices
* (25:05) Linear Mode Connectivity and the Lottery Ticket Hypothesis
* (27:50) On what is being measured and how
* (28:50) “The outcome of optimization is determined to a linearly connected region”
* (31:15) On good metrics
* (32:54) On the Predictability of Pruning Across Scales: scaling laws for pruning
* (34:40) The paper’s takeaway
* (38:45) Pruning Neural Networks at Initialization: on a scientific disagreement
* (45:00) On making takedown papers useful
* (46:15) On what can be known early in training
* (49:15) Jonathan’s perspective on important research questions today
* (54:40) MosaicML
* (55:19) How Mosaic got started
* (56:17) Mosaic highlights
* (57:33) Customer stories
* (1:00:30) Jonathan’s work and perspectives on AI policy
* (1:05:45) The key question: what we want
* (1:07:35) Outro

Links:
* Jonathan’s homepage and Twitter
* Papers
  * The Lottery Ticket Hypothesis and follow-up work
  * Comparing Rewinding and Fine-tuning in Neural Network Pruning
  * Linear Mode Connectivity and the LTH
  * On the Predictability of Pruning Across Scales
  * Pruning Neural Networks at Initialization: Why Are We Missing The Mark?
  * Desirable Inefficiency

Get full access to The Gradient at thegradientpub.substack.com/subscribe