Michael Cohen on Input Tampering in Advanced RL Agents
Description
Michael Cohen is a DPhil student at the University of Oxford with Mike Osborne. He will be starting a postdoc with Professor Stuart Russell at UC Berkeley, with the Center for Human-Compatible AI. His research considers the expected behaviour of generally intelligent artificial agents, with a view to designing agents that we can expect to behave safely.

You can see more links and a full transcript at www.hearthisidea.com/episodes/cohen.

We discuss:
What is reinforcement learning, and how is it different from supervised and unsupervised learning?
Michael's recently co-authored paper, 'Advanced artificial agents intervene in the provision of reward'
Why might it be hard to convey what we really want to RL learners, even when we know exactly what we want?
Why might advanced RL systems tamper with their sources of input, and why could this be very bad?
What assumptions need to hold for this "input tampering" outcome?
Is reward really the optimisation target? Do models "get reward"?
What's wrong with the analogy between RL systems and evolution?

Key links:
Michael's personal website
'Advanced artificial agents intervene in the provision of reward' by Michael K. Cohen, Marcus Hutter, and Michael A. Osborne
'Pessimism About Unknown Unknowns Inspires Conservatism' by Michael Cohen and Marcus Hutter
'Intelligence and Unambitiousness Using Algorithmic Information Theory' by Michael Cohen, Badri Vellambi, and Marcus Hutter
'Quantilizers: A Safer Alternative to Maximizers for Limited Optimization' by Jessica Taylor
'RAMBO-RL: Robust Adversarial Model-Based Offline Reinforcement Learning' by Marc Rigter, Bruno Lacerda, and Nick Hawes
Season 40 of Survivor