20 - 'Reform' AI Alignment with Scott Aaronson
How should we scientifically think about the impact of AI on human civilization, and whether or not it will doom us all? In this episode, I speak with Scott Aaronson about his views on how to make progress in AI alignment, as well as his work on watermarking the output of language models, and how he moved from a background in quantum complexity theory to working on AI.

Note: this episode was recorded before this story emerged of a man committing suicide after discussions with a language-model-based chatbot, discussions that included the possibility of him killing himself.

Patreon: https://www.patreon.com/axrpodcast
Store: https://store.axrp.net/
Ko-fi: https://ko-fi.com/axrpodcast

Topics we discuss, and timestamps:
0:00:36 - 'Reform' AI alignment
0:01:52 - Epistemology of AI risk
0:20:08 - Immediate problems and existential risk
0:24:35 - Aligning deceitful AI
0:30:59 - Stories of AI doom
0:34:27 - Language models
0:43:08 - Democratic governance of AI
0:59:35 - What would change Scott's mind
1:14:45 - Watermarking language model outputs
1:41:41 - Watermark key secrecy and backdoor insertion
1:58:05 - Scott's transition to AI research
2:03:48 - Theoretical computer science and AI alignment
2:14:03 - AI alignment and formalizing philosophy
2:22:04 - How Scott finds AI research
2:24:53 - Following Scott's research

The transcript

Links to Scott's things:
Personal website
Book: Quantum Computing Since Democritus
Blog: Shtetl-Optimized

Writings we discuss:
Reform AI Alignment
Planting Undetectable Backdoors in Machine Learning Models