25 - Cooperative AI with Caspar Oesterheld
Description
Imagine a world where there are many powerful AI systems, working at cross purposes. You could suppose that different governments use AIs to manage their militaries, or simply that many powerful AIs have their own wills. At any rate, it seems valuable for them to be able to cooperatively work together and minimize pointless conflict. How do we ensure that AIs behave this way - and what do we need to learn about how rational agents interact to make that more clear? In this episode, I'll be speaking with Caspar Oesterheld about some of his research on this very topic.

Patreon: patreon.com/axrpodcast
Ko-fi: ko-fi.com/axrpodcast

Topics we discuss, and timestamps:
0:00:34 - Cooperative AI
0:06:21 - Cooperative AI vs standard game theory
0:19:45 - Do we need cooperative AI if we get alignment?
0:29:29 - Cooperative AI and agent foundations
0:34:59 - A Theory of Bounded Inductive Rationality
0:50:05 - Why it matters
0:53:55 - How the theory works
1:01:38 - Relationship to logical inductors
1:15:56 - How fast does it converge?
1:19:46 - Non-myopic bounded rational inductive agents?
1:24:25 - Relationship to game theory
1:30:39 - Safe Pareto Improvements
1:30:39 - What they try to solve
1:36:15 - Alternative solutions
1:40:46 - How safe Pareto improvements work
1:51:19 - Will players fight over which safe Pareto improvement to adopt?
2:06:02 - Relationship to program equilibrium
2:11:25 - Do safe Pareto improvements break themselves?
2:15:52 - Similarity-based Cooperation
2:23:07 - Are similarity-based cooperators overly cliqueish?
2:27:12 - Sensitivity to noise
2:29:41 - Training neural nets to do similarity-based cooperation
2:50:25 - FOCAL, Caspar's research lab
2:52:52 - How the papers all relate
2:57:49 - Relationship to functional decision theory
2:59:45 - Following Caspar's research

The transcript: axrp.net/episode/2023/10/03/episode-25-cooperative-ai-caspar-oesterheld.html

Links for Caspar:
FOCAL at CMU: www.cs.cmu.edu/~focal/
Caspar on X, formerly known as Twitter: twitter.com/C_Oesterheld
Caspar's blog: casparoesterheld.com/
Caspar on Google Scholar: scholar.google.com/citations?user=xeEcRjkAAAAJ&hl=en&oi=ao

Research we discuss:
A Theory of Bounded Inductive Rationality: arxiv.org/abs/2307.05068
Safe Pareto improvements for delegated game playing: link.springer.com/article/10.1007/s10458-022-09574-6
Similarity-based Cooperation: arxiv.org/abs/2211.14468
Logical Induction: arxiv.org/abs/1609.03543
Program Equilibrium: citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=e1a060cda74e0e3493d0d81901a5a796158c8410
Formalizing Objections against Surrogate Goals: www.alignmentforum.org/posts/K4FrKRTrmyxrw5Dip/formalizing-objections-against-surrogate-goals
Learning with Opponent-Learning Awareness: arxiv.org/abs/1709.04326