Ep 14 - Interp, latent robustness, RLHF limitations w/ Stephen Casper (PhD AI researcher, MIT)
Description
We speak with Stephen Casper, or "Cas" as his friends call him. Cas is a PhD student at MIT in the Computer Science (EECS) department, in the Algorithmic Alignment Group advised by Prof Dylan Hadfield-Menell. Formerly, he worked with the Harvard Kreiman Lab and the Center for Human-Compatible AI (CHAI) at Berkeley. His work focuses on better understanding the internal workings of AI models (better known as “interpretability”), making them robust to various kinds of adversarial attacks, and ca...
We speak with Katja Grace. Katja is the co-founder and lead researcher at AI Impacts, a research group trying to answer key questions about the future of AI — when certain capabilities will arise, what will AI look like, how it will all go for humanity.We talk to Katja about:* How AI Impacts...
Published 06/19/24
We speak with Rob Miles. Rob is the host of the “Robert Miles AI Safety” channel on YouTube, the single most popular AI alignment video series out there — he has 145,000 subscribers and his top video has ~600,000 views. He goes much deeper than many educational resources out there on alignment,...
Published 03/08/24