1 - Adversarial Policies with Adam Gleave
Description
Link to the paper: Adversarial Policies: Attacking Deep Reinforcement Learning
Link to the transcript
Adam's website
Adam's Twitter account