1 - Adversarial Policies with Adam Gleave
Description
Link to the paper: Adversarial Policies: Attacking Deep Reinforcement Learning
Link to the transcript
Adam's website
Adam's Twitter account