Reinforcement Learning in Non-Stationary Environments

Explore the challenges of reinforcement learning in dynamic environments, and approaches to addressing them, in this online lecture by Prof. Pranay Sharma of IIT Bombay's Centre for Machine Intelligence and Data Science. The lecture focuses on non-stationary reinforcement learning in the infinite-horizon average-reward setting, where the rewards and transition probabilities of the Markov decision process (MDP) change over time. Learn about the limitations of existing model-based methods and model-free value-based methods in non-stationary settings, and why policy-based methods, despite their flexibility in practice, remain theoretically underexplored in this domain. Examine the Non-Stationary Natural Actor-Critic (NS-NAC) algorithm, the first model-free policy-based approach designed specifically for non-stationary environments. NS-NAC uses restart-based exploration to detect changes and interprets learning rates as adapting factors. Understand the development of BORL-NS-NAC, a parameter-free algorithm built on a bandit-over-RL (BORL) approach that removes the need for prior knowledge of the variation budget.

Gain insights from Prof. Sharma's research background in federated learning, collaborative learning, stochastic optimization, reinforcement learning, and differential privacy. This background draws on his experience as a Research Scientist at Carnegie Mellon University and his academic training at Syracuse University and IIT Kanpur.