Episodes
Epoch AI is the premier organization that tracks the trajectory of AI - how much compute is used, the role of algorithmic improvements, the growth in data used, and when these trends might come to an end. In this episode, I speak with the director of Epoch AI, Jaime Sevilla, about how compute, data, and algorithmic improvements are impacting AI, and whether continuing to scale can get us AGI.
Patreon: https://www.patreon.com/axrpodcast
Ko-fi: https://ko-fi.com/axrpodcast
The transcript: ...
Published 10/04/24
Sometimes, people talk about transformers as having "world models" as a result of being trained to predict text data on the internet. But what does this even mean? In this episode, I talk with Adam Shai and Paul Riechers about their work applying computational mechanics, a sub-field of physics studying how to predict random processes, to neural networks.
Patreon: https://www.patreon.com/axrpodcast
Ko-fi: https://ko-fi.com/axrpodcast
The transcript: ...
Published 09/29/24
Patreon: https://www.patreon.com/axrpodcast
MATS: https://www.matsprogram.org
Note: I'm employed by MATS, but they're not paying me to make this video.
Published 09/28/24
How do we figure out what large language models believe? In fact, do they even have beliefs? Do those beliefs have locations, and if so, can we edit those locations to change the beliefs? Also, how are we going to get AI to perform tasks so hard that we can't figure out if they succeeded at them? In this episode, I chat with Peter Hase about his research into these questions.
Patreon: https://www.patreon.com/axrpodcast
Ko-fi: https://ko-fi.com/axrpodcast
The transcript: ...
Published 08/24/24
How can we figure out if AIs are capable enough to pose a threat to humans? When should we make a big effort to mitigate risks of catastrophic AI misbehaviour? In this episode, I chat with Beth Barnes, founder of and head of research at METR, about these questions and more.
Patreon: patreon.com/axrpodcast
Ko-fi: ko-fi.com/axrpodcast
The transcript: https://axrp.net/episode/2024/07/28/episode-34-ai-evaluations-beth-barnes.html
Topics we discuss, and timestamps:
0:00:37 - What is...
Published 07/28/24
Reinforcement Learning from Human Feedback, or RLHF, is one of the main ways that makers of large language models make them 'aligned'. But people have long noted that there are difficulties with this approach when the models are smarter than the humans providing feedback. In this episode, I talk with Scott Emmons about his work categorizing the problems that can show up in this setting.
Patreon: patreon.com/axrpodcast
Ko-fi: ko-fi.com/axrpodcast
The transcript: ...
Published 06/12/24
What's the difference between a large language model and the human brain? And what's wrong with our theories of agency? In this episode, I chat about these questions with Jan Kulveit, who leads the Alignment of Complex Systems research group.
Patreon: patreon.com/axrpodcast
Ko-fi: ko-fi.com/axrpodcast
The transcript: axrp.net/episode/2024/05/30/episode-32-understanding-agency-jan-kulveit.html
Topics we discuss, and timestamps:
0:00:47 - What is active inference?
0:15:14 - Preferences...
Published 05/30/24
What's going on with deep learning? What sorts of models get learned, and what are the learning dynamics? Singular learning theory, a theory of Bayesian statistics broad enough in scope to encompass deep neural networks, may help answer these questions. In this episode, I speak with Daniel Murfet about this research program and what it tells us.
Patreon: patreon.com/axrpodcast
Ko-fi: ko-fi.com/axrpodcast
Topics we discuss, and timestamps:
0:00:26 - What is singular learning theory?
...
Published 05/07/24
Top labs use various forms of "safety training" on models before their release to make sure they don't do nasty stuff - but how robust is that? How can we ensure that the weights of powerful AIs don't get leaked or stolen? And what can AI even do these days? In this episode, I speak with Jeffrey Ladish about security and AI.
Patreon: patreon.com/axrpodcast
Ko-fi: ko-fi.com/axrpodcast
Topics we discuss, and timestamps:
0:00:38 - Fine-tuning away safety training
0:13:50 - Dangers of open...
Published 04/30/24
In 2022, it was announced that a fairly simple method can be used to extract the true beliefs of a language model on any given topic, without having to actually understand the topic at hand. Earlier, in 2021, it was announced that neural networks sometimes 'grok': that is, when training them on certain tasks, they initially memorize their training data (achieving their training goal in a way that doesn't generalize), but then suddenly switch to understanding the 'real' solution in a way that...
Published 04/25/24
How should the law govern AI? Those concerned about existential risks often push either for bans or for regulations meant to ensure that AI is developed safely - but another approach is possible. In this episode, Gabriel Weil talks about his proposal to modify tort law to enable people to sue AI companies for disasters that are "nearly catastrophic".
Patreon: patreon.com/axrpodcast
Ko-fi: ko-fi.com/axrpodcast
Topics we discuss, and timestamps:
0:00:35 - The basic idea
0:20:36 - Tort...
Published 04/17/24
A lot of work to prevent AI existential risk takes the form of ensuring that AIs don't want to cause harm or take over the world - or in other words, ensuring that they're aligned. In this episode, I talk with Buck Shlegeris and Ryan Greenblatt about a different approach, called "AI control": ensuring that AI systems couldn't take over the world, even if they were trying to.
Patreon: patreon.com/axrpodcast
Ko-fi: ko-fi.com/axrpodcast
Topics we discuss, and timestamps:
0:00:31 - What is...
Published 04/11/24
The events of this year have highlighted important questions about the governance of artificial intelligence. For instance, what does it mean to democratize AI? And how should we balance benefits and dangers of open-sourcing powerful AI systems such as large language models? In this episode, I speak with Elizabeth Seger about her research on these questions.
Patreon: patreon.com/axrpodcast
Ko-fi: ko-fi.com/axrpodcast
Topics we discuss, and timestamps:
0:00:40 - What kinds of AI?
0:01:30...
Published 11/26/23
Imagine a world where there are many powerful AI systems, working at cross purposes. You could suppose that different governments use AIs to manage their militaries, or simply that many powerful AIs have their own wills. At any rate, it seems valuable for them to be able to work together cooperatively and minimize pointless conflict. How do we ensure that AIs behave this way - and what do we need to learn about how rational agents interact to make that clearer? In this episode, I'll be...
Published 10/03/23
Recently, OpenAI made a splash by announcing a new "Superalignment" team. Led by Jan Leike and Ilya Sutskever, the team would consist of top researchers, attempting to solve alignment for superintelligent AIs in four years by figuring out how to build a trustworthy human-level AI alignment researcher, and then using it to solve the rest of the problem. But what does this plan actually involve? In this episode, I talk to Jan Leike about the plan and the challenges it faces.
Patreon:...
Published 07/27/23
Is there some way we can detect bad behaviour in our AI system without having to know exactly what it looks like? In this episode, I speak with Mark Xu about mechanistic anomaly detection: a research direction based on the idea of detecting strange things happening in neural networks, in the hope that they will alert us to potential treacherous turns. We talk both about the core problems of relating these mechanistic anomalies to bad behaviour, and about the paper "Formalizing the...
Published 07/27/23
Very brief survey: bit.ly/axrpsurvey2023
Store is closing in a week! Link: store.axrp.net/
Patreon: patreon.com/axrpodcast
Ko-fi: ko-fi.com/axrpodcast
Published 06/28/23
What can we learn about advanced deep learning systems by understanding how humans learn and form values over their lifetimes? Will superhuman AI look like ruthless coherent utility optimization, or more like a mishmash of contextually activated desires? This episode's guest, Quintin Pope, has been thinking about these questions as a leading researcher in the shard theory community. We talk about what shard theory is, what it says about humans and neural networks, and what the implications...
Published 06/15/23
Lots of people in the field of machine learning study 'interpretability', developing tools that they say give us useful information about neural networks. But how do we know if meaningful progress is actually being made? What should we want out of these tools? In this episode, I speak to Stephen Casper about these questions, as well as about a benchmark he's co-developed to evaluate whether interpretability tools can find 'Trojan horses' hidden inside neural nets.
Patreon:...
Published 05/02/23
How should we scientifically think about the impact of AI on human civilization, and whether or not it will doom us all? In this episode, I speak with Scott Aaronson about his views on how to make progress in AI alignment, as well as his work on watermarking the output of language models, and how he moved from a background in quantum complexity theory to working on AI.
Note: this episode was recorded before this story emerged of a man committing suicide after discussions with a...
Published 04/12/23
Store: https://store.axrp.net/
Patreon: https://www.patreon.com/axrpodcast
Ko-fi: https://ko-fi.com/axrpodcast
Video: https://www.youtube.com/watch?v=kmPFjpEibu0
Published 02/07/23
How good are we at understanding the internal computation of advanced machine learning models, and do we have a hope at getting better? In this episode, Neel Nanda talks about the sub-field of mechanistic interpretability research, as well as papers he's contributed to that explore the basics of transformer circuits, induction heads, and grokking.
Topics we discuss, and timestamps:
00:01:05 - What is mechanistic interpretability?
00:24:16 - Types of AI cognition
00:54:27 - Automating...
Published 02/04/23
I have a new podcast, where I interview whoever I want about whatever I want. It's called "The Filan Cabinet", and you can find it wherever you listen to podcasts. The first three episodes are about pandemic preparedness, God, and cryptocurrency. For more details, check out the podcast website, or search "The Filan Cabinet" in your podcast app.
Published 10/13/22