Episode 08: Giancarlo Kerg, Mila, on approaching deep learning from mathematical foundations
Description
Giancarlo Kerg (Google Scholar) is a PhD student at Mila, supervised by Yoshua Bengio and Guillaume Lajoie. He is working on out-of-distribution generalization and modularity in memory-augmented neural networks. Prior to his PhD, he studied pure mathematics at Cambridge and the Université Libre de Bruxelles.

His most recent paper at NeurIPS is "Untangling tradeoffs between recurrence and self-attention in neural networks." It presents a proof of how self-attention mitigates the vanishing gradient problem when capturing long-term dependencies. Building on this, it proposes a way to scalably combine sparse self-attention with recurrence, via a relevancy screening mechanism that mirrors the cognitive process of memory consolidation.

Highlights from our conversation:

🧮 Pure math foundations as an approach to progress and structural understanding in deep learning research

🧠 How a formal proof of the way self-attention mitigates vanishing gradients when capturing long-term dependencies in RNNs led to a relevancy screening mechanism resembling human memory consolidation

🎯 Out-of-distribution generalization through modularity and inductive biases

🏛 Working at Mila with Yoshua Bengio and other collaborators
More Episodes
Percy Liang is an associate professor of computer science and statistics at Stanford. These days, he’s interested in understanding how foundation models work, how to make them more efficient, modular, and robust, and how they shift the way people interact with AI—although he’s been working on...
Published 05/09/24
Seth Lazar is a professor of philosophy at the Australian National University, where he leads the Machine Intelligence and Normative Theory (MINT) Lab. His unique perspective bridges moral and political philosophy with AI, introducing much-needed rigor to the question of what will make for a good...
Published 03/12/24