“Hierarchical Agency: A Missing Piece in AI Alignment” by Jan_Kulveit

Description

The most ambitious direction I've worked on in the past few years is a theory of hierarchical agency (as mentioned in the ACS announcement). Recently, I've become worried that the inferential distance between "the median of current AI safety discourse" and "what I and my colleagues at ACS work on" has become too large. After multiple attempts to write a good introduction to hierarchical agency ended up in perpetual draft form, I decided to try something else: explain it to Claude. This is a lightly edited transcript of the first part of our conversation.

Claude: Shall we begin? Could you start by explaining what you mean by "hierarchical agency" and why you think it's important for AI safety?

Jan: Broadly, there is a pattern in the world where you can often see agents composed of other agents. Like, corporations and their departments, states and citizens, and [...]

The original text contained 1 footnote which was omitted from this narration.

First published: November 27th, 2024

Source: https://www.lesswrong.com/posts/xud7Mti9jS4tbWqQE/hierarchical-agency-a-missing-piece-in-ai-alignment

Narrated by TYPE III AUDIO.

More Episodes

“Deep Causal Transcoding: A Framework for Mechanistically Eliciting Latent Behaviors in Language Models” by Andrew Mack, TurnTrout

Audio note: this article contains 449 uses of LaTeX notation, so the narration may be difficult to follow. There's a link to the original text in the episode description. Based off research performed in the MATS 5.1 extension program, under the mentorship of Alex Turner (TurnTrout). Research...

Published 12/04/24

“Cognitive Biases Contributing to AI X-risk — a deleted excerpt from my 2018 ARCHES draft” by Andrew_Critch

Preface Several friends have asked me about what psychological effects I think could affect human judgement about x-risk. This isn't a complete answer, but in 2018 I wrote a draft of "AI Research Considerations for Human Existential Safety" (ARCHES) that included an overview of cognitive biases...

Published 12/04/24

“Book a Time to Chat about Interp Research” by Logan Riggs

In the spirit of the season, you can book a call with me to help w/ your interp project (no large coding though) Would you like someone to: Review your paper or code? Brainstorm ideas on next steps? How to best communicate your results? Discuss conceptual problems Obvious Advice (e.g. being...

Published 12/03/24