Markov Decision Processes of the Third Kind - Learning Distributions by Policy Gradient Methods

This 30-minute conference lecture explores distributional Markov Decision Processes, a class of control problems in which the goal is not to optimize an expected value but to learn a policy that steers the distribution of cumulative reward toward a prescribed target law. It sets out the mathematical framework of these "third kind" Markov Decision Processes and explains how they differ from traditional approaches: the objective is the distribution of outcomes itself rather than a risk functional of it. The talk then presents a model-free policy-gradient algorithm that uses neural networks to parameterize randomized Markov policies on an augmented state space and evaluates a characteristic-function loss from samples. On the theoretical side, it covers a proof, based on stochastic approximation techniques, that the algorithm converges to stationary points under mild regularity and growth assumptions. Numerical experiments show the method matching complex target distributions. The talk draws on collaborative research in distributional control theory and its applications in machine learning and stochastic optimization.
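As a rough illustration of what a sample-based characteristic-function loss can look like, the sketch below compares the empirical characteristic function of simulated cumulative rewards, φ̂(t) = (1/n) Σ_j exp(i t G_j), with a target characteristic function on a finite grid of frequencies, using a weighted squared distance. The Gaussian target, the frequency grid, the weights, and all function names (`empirical_cf`, `gaussian_cf`, `cf_loss`) are hypothetical choices for illustration only; the lecture's actual loss, weighting, and policy-gradient estimator may differ, and the policy network is omitted entirely.

```python
import cmath
import math
import random


def empirical_cf(samples, t):
    """Empirical characteristic function: (1/n) * sum_j exp(i * t * x_j)."""
    return sum(cmath.exp(1j * t * x) for x in samples) / len(samples)


def gaussian_cf(t, mu, sigma):
    """Characteristic function of N(mu, sigma^2): exp(i*mu*t - sigma^2 * t^2 / 2)."""
    return cmath.exp(1j * mu * t - 0.5 * (sigma * t) ** 2)


def cf_loss(samples, target_cf, freqs, weights):
    """Weighted squared distance between the empirical and target CFs,
    evaluated on a finite frequency grid (a hypothetical discretization)."""
    return sum(
        w * abs(empirical_cf(samples, t) - target_cf(t)) ** 2
        for t, w in zip(freqs, weights)
    )


if __name__ == "__main__":
    random.seed(0)

    # Hypothetical target law for the cumulative reward: N(1, 0.5^2).
    def target(t):
        return gaussian_cf(t, 1.0, 0.5)

    # Frequency grid t in [-3, 3] with Gaussian weights w(t) = exp(-t^2 / 2).
    freqs = [k * 0.25 for k in range(-12, 13)]
    weights = [math.exp(-0.5 * t * t) for t in freqs]

    # Stand-ins for cumulative rewards produced by two different policies.
    close = [random.gauss(1.0, 0.5) for _ in range(2000)]
    far = [random.gauss(0.0, 1.0) for _ in range(2000)]
    print("loss (close to target):", cf_loss(close, target, freqs, weights))
    print("loss (far from target):", cf_loss(far, target, freqs, weights))
```

In a policy-gradient setting, the samples would instead be cumulative rewards from rollouts of the current policy, and the loss would be differentiated with respect to the policy parameters; that step is not shown here.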