12 - AI Existential Risk with Paul Christiano - Listen - AXRP -

12 - AI Existential Risk with Paul Christiano

Listen now

Description

Why would advanced AI systems pose an existential risk, and what would it look like to develop safer systems? In this episode, I interview Paul Christiano about his views of how AI could be so dangerous, what bad AI scenarios could look like, and what he thinks about various techniques to reduce this risk. Topics we discuss, and timestamps (due to mp3 compression, the timestamps may be tens of seconds off): 00:00:38 - How AI may pose an existential threat 00:13:36 - AI timelines 00:24:49 - Why we might build risky AI 00:33:58 - Takeoff speeds 00:51:33 - Why AI could have bad motivations 00:56:33 - Lessons from our current world 01:08:23 - "Superintelligence" 01:15:21 - Technical causes of AI x-risk 01:19:32 - Intent alignment 01:33:52 - Outer and inner alignment 01:43:45 - Thoughts on agent foundations 01:49:35 - Possible technical solutions to AI x-risk 01:49:35 - Imitation learning, inverse reinforcement learning, and ease of evaluation 02:00:34 - Paul's favorite outer alignment solutions 02:01:20 - Solutions researched by others 02:06:13 - Decoupling planning from knowledge 02:17:18 - Factored cognition 02:25:34 - Possible solutions to inner alignment 02:31:56 - About Paul 02:31:56 - Paul's research style 02:36:36 - Disagreements and uncertainties 02:46:08 - Some favorite organizations 02:48:21 - Following Paul's work The transcript Paul's blog posts on AI alignment Material that we mention: Cold Takes - The Most Important Century Open Philanthropy reports on: Modeling the human trajectory The computational power of the human brain AI timelines (draft) Whether AI could drive explosive economic growth Takeoff speeds Superintelligence: Paths, Dangers, Strategies Wei Dai on metaphilosophical competence: Two neglected problems in human-AI safety The argument from philosophical difficulty Some thoughts on metaphilosophy AI safety via debate Iterated distillation and amplification Scalable agent alignment via reward modeling: a research direction Learning the prior Imitative generalisation (AKA 'learning the prior') When is unaligned AI morally valuable?

More Episodes

See all »

33 - RLHF Problems with Scott Emmons

Reinforcement Learning from Human Feedback, or RLHF, is one of the main ways that makers of large language models make them 'aligned'. But people have long noted that there are difficulties with this approach when the models are smarter than the humans providing feedback. In this episode, I talk...

Published 06/12/24

AXRP - the AI X-risk Research Podcast

Published 06/12/24

32 - Understanding Agency with Jan Kulveit

What's the difference between a large language model and the human brain? And what's wrong with our theories of agency? In this episode, I chat about these questions with Jan Kulveit, who leads the Alignment of Complex Systems research group. Patreon: patreon.com/axrpodcast Ko-fi:...

Published 05/30/24