This episode explores reinforcement learning and its relationship to Markov decision processes (MDPs). Also mentioned: exploration vs. exploitation, multi-armed bandits, model-free learning, and Q-learning.
Disclosure: This episode was generated using NotebookLM by uploading Professor Chris Callison-Burch's lecture notes and slides.
Published 10/22/24
This episode explores MDPs, covering stochastic environments, transition functions, reward functions, policies, value iteration, policy iteration, expected utility, finite vs. infinite horizons, discount factors, etc.
Disclosure: This episode was generated using NotebookLM by uploading Professor...
Published 10/05/24