“Catastrophic sabotage as a major threat model for human-level AI systems” by evhub
Description
Thanks to Holden Karnofsky, David Duvenaud, and Kate Woolverton for useful discussions and feedback. Following up on our recent “Sabotage Evaluations for Frontier Models” paper, I wanted to share more of my personal thoughts on why I think catastrophic sabotage is an important threat model and why I care about it. Note that this isn't in any way intended to be a reflection of Anthropic's views, or for that matter anyone's views but my own; it's just a collection of some of my personal thoughts...
More Episodes
In contract law, there's this thing called a “representation”. Example: as part of a contract to sell my house, I might “represent that” the house contains no asbestos. How is this different from me just, y’know, telling someone that the house contains no asbestos? Well, if it later turns out...
Published 11/27/24
Epistemic Status: 13 years working as a therapist for a wide variety of populations, 5 of them spent working with rationalist and EA clients. 7 years teaching and directing at over 20 rationality camps and workshops. This is an extremely short and colloquially written form of points that could be...
Published 11/27/24