12 - AI Existential Risk with Paul Christiano
Listen now
Description
Why would advanced AI systems pose an existential risk, and what would it look like to develop safer systems? In this episode, I interview Paul Christiano about his views of how AI could be so dangerous, what bad AI scenarios could look like, and what he thinks about various techniques to reduce this risk. Topics we discuss, and timestamps (due to mp3 compression, the timestamps may be tens of seconds off): 00:00:38 - How AI may pose an existential threat 00:13:36 - AI timelines 00:24:49 - Why we might build risky AI 00:33:58 - Takeoff speeds 00:51:33 - Why AI could have bad motivations 00:56:33 - Lessons from our current world 01:08:23 - "Superintelligence" 01:15:21 - Technical causes of AI x-risk 01:19:32 - Intent alignment 01:33:52 - Outer and inner alignment 01:43:45 - Thoughts on agent foundations 01:49:35 - Possible technical solutions to AI x-risk 01:49:35 - Imitation learning, inverse reinforcement learning, and ease of evaluation 02:00:34 - Paul's favorite outer alignment solutions 02:01:20 - Solutions researched by others 02:06:13 - Decoupling planning from knowledge 02:17:18 - Factored cognition 02:25:34 - Possible solutions to inner alignment 02:31:56 - About Paul 02:31:56 - Paul's research style 02:36:36 - Disagreements and uncertainties 02:46:08 - Some favorite organizations 02:48:21 - Following Paul's work The transcript Paul's blog posts on AI alignment Material that we mention: Cold Takes - The Most Important Century Open Philanthropy reports on: Modeling the human trajectory The computational power of the human brain AI timelines (draft) Whether AI could drive explosive economic growth Takeoff speeds Superintelligence: Paths, Dangers, Strategies Wei Dai on metaphilosophical competence: Two neglected problems in human-AI safety The argument from philosophical difficulty Some thoughts on metaphilosophy AI safety via debate Iterated distillation and amplification Scalable agent alignment via reward modeling: a research direction Learning the prior Imitative generalisation (AKA 'learning the prior') When is unaligned AI morally valuable?
More Episodes
In 2022, it was announced that a fairly simple method can be used to extract the true beliefs of a language model on any given topic, without having to actually understand the topic at hand. Earlier, in 2021, it was announced that neural networks sometimes 'grok': that is, when training them on...
Published 04/25/24