Episode 12: Jacob Steinhardt, UC Berkeley, on machine learning safety, alignment and measurement
Description
Jacob Steinhardt (Google Scholar) (Website) is an assistant professor at UC Berkeley. His main research interest is designing machine learning systems that are reliable and aligned with human values. His specific research directions include robustness, reward specification and reward hacking, and scalable alignment. Highlights: 📜 “Test accuracy is a very limited metric.” 👨‍👩‍👧‍👦 “You might not be able to get lots of feedback on human values.” 📊 “I’m interested in measuring the progress in AI capabilities.”
More Episodes
Percy Liang is an associate professor of computer science and statistics at Stanford. These days, he’s interested in understanding how foundation models work, how to make them more efficient, modular, and robust, and how they shift the way people interact with AI—although he’s been working on...
Published 05/09/24
Seth Lazar is a professor of philosophy at the Australian National University, where he leads the Machine Intelligence and Normative Theory (MINT) Lab. His unique perspective bridges moral and political philosophy with AI, introducing much-needed rigor to the question of what will make for a good...
Published 03/12/24