9 - Finite Factored Sets with Scott Garrabrant
Being an agent can get loopy quickly. For instance, imagine that we're playing chess and I'm trying to decide what move to make. Your next move influences the outcome of the game, and my guess of that influences my move, which influences your next move, which influences the outcome of the game. How can we model these dependencies in a general way, without baking in primitive notions of 'belief' or 'agency'? Today, I talk with Scott Garrabrant about his recent work on finite factored sets, which aims to answer this question.

Topics we discuss:
00:00:43 - finite factored sets' relation to Pearlian causality and abstraction
00:16:00 - partitions and factors in finite factored sets
00:26:45 - orthogonality and time in finite factored sets
00:34:49 - using finite factored sets
00:37:53 - why not infinite factored sets?
00:45:28 - limits of, and follow-up work on, finite factored sets
01:00:59 - relevance to embedded agency and x-risk
01:10:40 - how Scott researches
01:28:34 - relation to Cartesian frames
01:37:36 - how to follow Scott's work

Link to the transcript
Link to a transcript of Scott's talk on finite factored sets
Scott's LessWrong account

Other work mentioned in the discussion:
Causality, by Judea Pearl
Scott's work on Cartesian frames
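For listeners who want a concrete handle on the definition behind the partitions-and-factors segment: a finite factored set is a set S together with a collection of partitions of S (the "factors") such that choosing one part from each factor pins down exactly one element of S. Below is a minimal Python sketch of that condition; the function name is my own illustration, not from Scott's work, and it assumes each factor passed in is a genuine partition of S.

```python
from itertools import product

def is_factorization(S, factors):
    """Check whether `factors` (each assumed to be a partition of S,
    given as a list of disjoint subsets covering S) factorizes S:
    picking one part from each factor must pin down exactly one
    element of S."""
    if not factors:
        # Convention: the empty factorization only factorizes a singleton.
        return len(S) == 1
    for choice in product(*factors):
        # Intersect the chosen parts, one per factor.
        common = set.intersection(*(set(part) for part in choice))
        if len(common) != 1:
            return False
    return True

# Example: four states {0, 1, 2, 3}, factored into two binary questions.
S = {0, 1, 2, 3}
low_bit = [{0, 2}, {1, 3}]   # partition by parity (value mod 2)
high_bit = [{0, 1}, {2, 3}]  # partition by value // 2
print(is_factorization(S, [low_bit, high_bit]))  # True
print(is_factorization(S, [low_bit, low_bit]))   # False: same question twice
```

In the full framework, notions like orthogonality and time are then defined combinatorially on top of this structure, roughly in terms of which factors are needed to determine a given partition, rather than by positing causal arrows up front.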