Explore advanced reinforcement learning concepts in this graduate-level course from the University of Waterloo's Computer Science department. Master the design of algorithms that enable machines to learn from partial, implicit, and delayed feedback rather than direct supervision. Begin with foundational topics, including Markov decision processes, value iteration, and multi-armed bandits, before progressing to deep reinforcement learning, hierarchical reinforcement learning, and partially observable environments. Study both model-free and model-based approaches, policy gradient methods, actor-critic algorithms, and trust region optimization techniques. Examine practical applications through case studies in robotic control, autonomous vehicles, game playing (including Go, chess, and first-person shooter games), conversational agents, and visual navigation. Learn about specialized topics such as inverse reinforcement learning, memory-augmented networks, safe multi-agent reinforcement learning, and lifelong learning in complex environments like Minecraft. Gain hands-on experience with OpenAI environments and TensorFlow implementations of deep Q-networks while understanding the theoretical foundations of Bayesian reinforcement learning, contextual bandits, and semi-Markov decision processes.

Syllabus

CS885 Lecture 1a: Course Introduction
CS885 Lecture 1b: Markov Processes
CS885 Lecture 2a: Markov Decision Processes
CS885 Lecture 2b: Value Iteration
CS885 Lecture 3a: Policy Iteration
CS885 Lecture 3b: Introduction to RL
CS885 Lecture 4a: Deep Neural Networks
CS885 Lecture 4b: Deep Q-Networks
CS885 Lecture 5: Conversational Agents (Nabiha Asghar)
CS885 Lecture 6a: OpenAI Environments (Mike Rudd)
CS885 Lecture 6b: DQN and TensorFlow (Timmy Tse)
CS885 Lecture 7a: Policy Gradient
CS885 Lecture 7b: Actor Critic
CS885 Lecture 8a: Multi-Armed Bandits
CS885 Lecture 8b: Bayesian and Contextual Bandits
CS885 Lecture 9: Model-based RL
CS885 Lecture 10: Bayesian RL
CS885 Lecture 11a: Hidden Markov Models
CS885 Lecture 11b: Partially Observable RL
CS885 Lecture 12: Deep Recurrent Q-Networks
CS885 Lecture 13a: Playing FPS Games with Deep RL (Presenter: Mark Iwanchyshyn)
CS885 Lecture 13b: Lifelong Learning in Minecraft (Presenter: Yetian Wang)
CS885 Lecture 13c: Adversarial Search
CS885 Lecture 14a: Mastering the Game of Go (Presenter: Henry Chen)
CS885 Lecture 14b: Mastering Chess and Shogi (Presenter: Kira Selby)
CS885 Lecture 14c: Trust Region Methods
CS885 Lecture 15a: Trust Region Policy Optimization (Presenter: Shivam Kalra)
CS885 Lecture 15b: Proximal Policy Optimization (Presenter: Ruifan Yu)
CS885 Lecture 15c: Semi-Markov Decision Processes
CS885 Lecture 16a: The Option-Critic Architecture (Presenter: Zebin Kang)
CS885 Lecture 16b: FeUdal Networks for Hierarchical RL (Presenter: Rene Bidart)
CS885 Lecture 17a: Target-Driven Visual Navigation (Presenter: James Cagalawan)
CS885 Lecture 17b: Control of a Quadrotor (Presenter: Nicole McNabb)
CS885 Lecture 17c: Inverse Reinforcement Learning
CS885 Lecture 18a: Safe multi-agent RL for autonomous driving (Presenter: Ashish Gaurav)
CS885 Lecture 18b: Learning Driving Styles for Autonomous Vehicles (Presenter: Marko Ilievski)
CS885 Lecture 19a: End-to-end LSTM based dialog control (Presenter: Hamidreza Shahidi)
CS885 Lecture 19b: Learning cooperative visual dialog agents (Presenter: Nalin Chhibber)
CS885 Lecture 19c: Memory Augmented Networks
CS885 Lecture 20a: Neural map: structured memory for deep RL (Presenter: Andreas Stöckel)
CS885 Lecture 20b: Memory augmented control networks (Presenter: Aravind Balakrishnan)