Overview
This lecture explores the challenges and solutions in scaling deep Q-learning for robotics applications. It covers the instability that arises when training Q-functions with deep networks: each gradient update shifts both the predicted and the target Q-values, which can cause training to diverge. Stabilization techniques are introduced, including target networks, delayed copies of the Q-network that keep the target values temporarily fixed. The presentation then addresses the overestimation bias caused by the maximization operation in Q-learning and introduces double Q-learning as a remedy, in which the online Q-function selects the action while the target network evaluates it. It also examines the "deadly triad" of combining off-policy learning, bootstrapping, and function approximation, and shows how n-step returns can reduce bias and improve training. The lecture concludes with modern Q-learning applications, featuring the QT-Opt algorithm for robotic grasping, which trains on data from multiple robot arms and uses the cross-entropy method to handle continuous action spaces, and the PQN algorithm, which aims to reduce the dependence on target networks and replay buffers.
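As a rough illustration of the ideas summarized above, the sketch below (a minimal NumPy version, not code from the lecture; all function names and array shapes are assumptions) contrasts the standard DQN target, where the target network both selects and evaluates the next action, with the double Q-learning target, where the online network selects and the target network evaluates, and shows a simple n-step bootstrapped return.

```python
import numpy as np

def dqn_target(rewards, next_q_target, dones, gamma=0.99):
    """Standard DQN target: max over the target network's Q-values.

    The same (target) network both selects and evaluates the next
    action, which is the source of the overestimation bias.
    """
    max_next = next_q_target.max(axis=1)
    return rewards + gamma * (1.0 - dones) * max_next

def double_dqn_target(rewards, next_q_online, next_q_target, dones, gamma=0.99):
    """Double Q-learning target: online net selects, target net evaluates.

    Decoupling selection from evaluation reduces the bias introduced
    by maximizing over noisy Q-estimates.
    """
    best_actions = next_q_online.argmax(axis=1)
    evaluated = next_q_target[np.arange(len(best_actions)), best_actions]
    return rewards + gamma * (1.0 - dones) * evaluated

def n_step_return(rewards, bootstrap_value, gamma=0.99):
    """n-step return: discounted sum of n rewards plus a bootstrapped
    value (e.g. from the target network) at the n-th step."""
    g = bootstrap_value
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

Because the evaluated action's target-network value can never exceed the max over the target network's values, the double Q-learning target is always less than or equal to the standard DQN target for the same batch, which is the mechanism behind its reduced overestimation.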
Syllabus
Robot Learning: Scaling Deep Q-Learning Part 2
Taught by
Montreal Robotics