Joe Carlsmith on Scheming AI
Description
Joe Carlsmith is a writer, researcher, and philosopher. He works as a senior research analyst at Open Philanthropy, where he focuses on existential risk from advanced artificial intelligence. He also writes independently about various topics in philosophy and futurism, and holds a doctorate in philosophy from the University of Oxford.

You can find links and a transcript at www.hearthisidea.com/episodes/carlsmith

In this episode we talked about a report Joe recently authored, titled ‘Scheming AIs: Will AIs fake alignment during training in order to get power?’. The report “examines whether advanced AIs that perform well in training will be doing so in order to gain power later”, a behaviour Carlsmith calls scheming.

We talk about:
- Distinguishing ways AI systems can be deceptive and misaligned
- Why powerful AI systems might acquire goals that go beyond what they’re trained to do, and how those goals could lead to scheming
- Why scheming goals might perform better (or worse) in training than less worrying goals
- The ‘counting argument’ for scheming AI
- Why goals that lead to scheming might be simpler than the goals we intend
- Things Joe is still confused about, and research project ideas

You can get in touch through our website or on Twitter. Consider leaving us an honest review wherever you're listening to this; it's the best free way to support the show. Thanks for listening!
More Episodes
Published 03/16/24
Eric Schwitzgebel is a professor of philosophy at the University of California, Riverside. His main interests include connections between empirical psychology and philosophy of mind and the nature of belief. His book The Weirdness of the World can be found here. We talk about: The possibility...
Published 02/04/24
Sonia Ben Ouagrham-Gormley is an associate professor at George Mason University and Deputy Director of their Biodefence Programme. In this episode we talk about: Where the belief that 'bioweapons are easy to make' came from and why it has been difficult to change Why transferring tacit...
Published 12/19/23