87. Evan Hubinger - The Inner Alignment Problem
Description
How can you know that a super-intelligent AI is trying to do what you asked it to do? The answer, it turns out, is: not easily. And an increasing number of AI safety researchers are warning that this is a problem we’re going to have to solve sooner rather than later if we want to avoid bad outcomes — which may include a species-level catastrophe. In AI safety, the failure mode in which an AI ends up optimizing for something other than what we asked it to is known as an inner alignment failure. It’s distinct from an outer alignment failure, which happens when you ask your AI to do something that turns out to be dangerous, and it was only recognized by AI safety researchers as its own category of risk in 2019. The researcher who led that effort is my guest for this episode of the podcast, Evan Hubinger. Evan is an AI safety veteran who’s done research at leading AI labs like OpenAI, and whose experience also includes stints at Google, Ripple and Yelp. He currently works at the Machine Intelligence Research Institute (MIRI) as a Research Fellow, and he joined me to talk about his views on AI safety, the alignment problem, and whether humanity is likely to survive the advent of superintelligent AI.
More Episodes
On the last episode of the Towards Data Science Podcast, host Jeremie Harris offers his perspective on the last two years of AI progress and what he thinks it means for everything from AI safety to the future of humanity. Going forward, Jeremie will be exploring these topics on the new...
Published 10/19/22
Progress in AI has been accelerating dramatically in recent years, and even in recent months. It seems like every other day, a world-leading lab achieves a feat of AI that was previously believed to be impossible. And increasingly, these breakthroughs have been driven by the same, simple...
Published 10/12/22