
YouTube

Achieving Instance-Dependent Sample Complexity for Constrained Markov Decision Process

BIMSA via YouTube

Overview

Explore advanced reinforcement learning techniques for constrained Markov decision processes (CMDPs) in this 44-minute conference talk, which addresses the challenge of satisfying safety and resource constraints in sequential learning. The talk presents a framework that achieves optimal problem-dependent guarantees for CMDP problems, in which an agent must manage finite resources while learning the unknown transition probabilities of a Markov decision process. The presented algorithm operates in primal space, resolving the primal linear program for the CMDP online with adaptively updated remaining resource capacities. Three key algorithmic elements are examined: a characterization of instance hardness via the LP basis, an elimination procedure for identifying the optimal LP basis, and an adaptive resolving procedure that maintains the optimal basis while adjusting to the remaining resources. This approach improves upon existing worst-case sample complexity bounds for CMDP problems, particularly in the dependence on the accuracy level, and represents a first step toward deriving optimal problem-dependent guarantees in this domain.
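To make the "primal linear program" concrete, here is a minimal sketch of the standard occupancy-measure LP for a discounted CMDP, solved with `scipy.optimize.linprog`. This is an illustrative reconstruction, not the algorithm from the talk: the function name, the discounted formulation, and all numbers in the toy example are assumptions for demonstration only.

```python
import numpy as np
from scipy.optimize import linprog

def solve_cmdp_lp(P, r, c, budget, gamma=0.9, rho=None):
    """Solve the primal occupancy-measure LP for a discounted CMDP (sketch).

    P: (S, A, S) transition probabilities; r: (S, A) rewards;
    c: (S, A) resource costs; budget: scalar resource capacity.
    Returns the optimal occupancy measure mu (shape (S, A)) and its value.
    """
    S, A, _ = P.shape
    if rho is None:
        rho = np.full(S, 1.0 / S)  # uniform initial state distribution
    n = S * A                      # one variable mu(s, a) per state-action pair

    # Flow-conservation equalities for each next state s':
    #   sum_a mu(s', a) - gamma * sum_{s,a} P(s'|s,a) mu(s, a) = (1 - gamma) rho(s')
    A_eq = np.zeros((S, n))
    for sp in range(S):
        for s in range(S):
            for a in range(A):
                idx = s * A + a
                if s == sp:
                    A_eq[sp, idx] += 1.0
                A_eq[sp, idx] -= gamma * P[s, a, sp]
    b_eq = (1.0 - gamma) * rho

    # Resource constraint: sum_{s,a} c(s,a) mu(s,a) <= budget
    A_ub = c.reshape(1, n)
    b_ub = np.array([budget])

    # Maximize expected reward = minimize its negation
    res = linprog(-r.reshape(n), A_ub=A_ub, b_ub=b_ub,
                  A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return res.x.reshape(S, A), -res.fun

# Tiny 2-state, 2-action example (hypothetical numbers, for illustration only)
P = np.zeros((2, 2, 2))
P[0, 0] = [0.9, 0.1]; P[0, 1] = [0.2, 0.8]
P[1, 0] = [0.7, 0.3]; P[1, 1] = [0.1, 0.9]
r = np.array([[1.0, 0.5], [0.2, 0.8]])
c = np.array([[0.0, 1.0], [0.0, 1.0]])  # action 1 consumes the resource
mu, value = solve_cmdp_lp(P, r, c, budget=0.3)
```

In the approach described in the talk, an LP of roughly this form would be re-solved online, with the `budget` argument replaced by the adaptively tracked remaining resource capacity at each step.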

Syllabus

Jiashuo Jiang: Achieving Instance-Dependent Sample Complexity for Constrained Markov Decision Process

Taught by

BIMSA
