

RobotLearning: Scaling Offline Reinforcement Learning

Montreal Robotics via YouTube

Overview

This lecture explores offline reinforcement learning, focusing on its ability to learn from existing datasets without the data inefficiency and divergence issues common in online RL. Learn how to train policies from offline data without divergence, similar to supervised learning approaches. Discover the concept of "stitching" trajectories—a unique RL advantage that allows optimal paths to be constructed from separate data segments using the Markov property—while understanding the practical challenges this presents, especially with partial observations.

Examine model-based RL as a potential solution, along with its limitations regarding error accumulation in long-horizon planning. The lecture introduces the Decision Transformer, a supervised learning approach that uses returns as input to generate trajectories while minimizing sequence errors, and discusses its limitations in stitching and handling stochasticity.

The presentation also covers recent research on adapting offline RL methods to large transformers, incorporating offline data to improve early training performance, and performing offline-to-online RL without maintaining the original offline dataset. This 1-hour and 47-minute talk from Montreal Robotics provides comprehensive insights into scaling offline reinforcement learning for robotics applications.
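The return-conditioned supervised learning idea behind the Decision Transformer can be sketched in a few lines. This is a toy illustration, not the lecture's code: the trajectories are invented, and a nearest-neighbour lookup stands in for the transformer. The point it shows is the core mechanism the overview describes—treating the desired return as an input and fitting actions by supervised learning on offline data.

```python
# Minimal sketch of return-conditioned supervised learning, the idea behind
# the Decision Transformer discussed in the talk. All names and data here
# (toy_trajectories, policy, ...) are illustrative assumptions.
import numpy as np

def returns_to_go(rewards):
    """Suffix sums: rtg[t] = sum of rewards from step t to the end."""
    return np.cumsum(rewards[::-1])[::-1]

# Toy offline dataset: each trajectory is (states, actions, rewards).
# One high-return behaviour and one low-return behaviour.
toy_trajectories = [
    (np.array([0.0, 1.0, 2.0]), np.array([1, 1, 0]), np.array([1.0, 1.0, 0.0])),
    (np.array([0.0, 1.0, 2.0]), np.array([0, 0, 1]), np.array([0.0, 0.0, 1.0])),
]

# Build a supervised dataset mapping (return-to-go, state) -> action,
# exactly as the Decision Transformer conditions its action head.
X, y = [], []
for states, actions, rewards in toy_trajectories:
    rtg = returns_to_go(rewards)
    for s, a, g in zip(states, actions, rtg):
        X.append((g, s))
        y.append(a)

def policy(target_return, state):
    """1-nearest-neighbour stand-in for the sequence model: return the
    action whose training context (rtg, state) is closest to the query."""
    dists = [(g - target_return) ** 2 + (s - state) ** 2 for g, s in X]
    return y[int(np.argmin(dists))]

# Conditioning on a high target return recovers the high-return behaviour.
print(policy(target_return=2.0, state=0.0))
```

Because the policy only imitates (return, state, action) tuples seen in the data, it inherits the limitations the lecture raises: it cannot stitch segments from different trajectories into a better path, and a lucky reward in a stochastic environment is indistinguishable from a good action.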

Syllabus

RobotLearning: Scaling Offline Reinforcement Learning

Taught by

Montreal Robotics

