1 - Adversarial Policies with Adam Gleave
Description
Link to the paper: Adversarial Policies: Attacking Deep Reinforcement Learning
Link to the transcript
Adam's website
Adam's Twitter account