Markov Decision Processes of the Third Kind - Learning Distributions by Policy Gradient Methods
Erwin Schrödinger International Institute for Mathematics and Physics (ESI) via YouTube
Overview
Explore distributional Markov Decision Processes in this 30-minute conference lecture, which examines a class of control problems where the objective is not to optimize an expected value but to learn a policy that steers the distribution of the cumulative reward toward a prescribed target law. Delve into the mathematical framework of these "third kind" Markov Decision Processes and see how they differ from traditional approaches: the target is the full distribution of the outcome rather than a risk functional of it. Learn about a proposed model-free policy-gradient algorithm that parameterizes randomized Markov policies on an augmented state space with neural networks and evaluates a characteristic-function loss from samples. Examine the theoretical foundations, including a proof of convergence to stationary points obtained with stochastic approximation techniques under mild regularity and growth assumptions. Review numerical experiments demonstrating that the method can match complex target distributions. The lecture presents collaborative research that connects distributional control theory with applications in machine learning and stochastic optimization.
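To make the idea of a sample-based characteristic-function loss concrete, here is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not the algorithm from the lecture: the description does not give the exact loss, so the squared-modulus discrepancy on a fixed frequency grid, the Gaussian target law, and the names `empirical_cf`, `cf_loss`, and `gaussian_cf` are all hypothetical choices made for this example. The samples stand in for cumulative rewards collected by running a policy.

```python
import numpy as np

def empirical_cf(samples, freqs):
    """Empirical characteristic function: mean of exp(i * t * X) for each t in freqs."""
    # Outer product gives a (len(freqs), len(samples)) array of phases t * x.
    return np.exp(1j * np.outer(freqs, samples)).mean(axis=1)

def cf_loss(samples, target_cf, freqs, weights=None):
    """Weighted mean of |empirical CF - target CF|^2 over a frequency grid.

    Assumption: this squared-modulus form and the grid are illustrative;
    the lecture's actual loss may use a different weighting or integration.
    """
    sq_err = np.abs(empirical_cf(samples, freqs) - target_cf(freqs)) ** 2
    if weights is None:
        return float(sq_err.mean())
    return float(np.sum(weights * sq_err) / np.sum(weights))

def gaussian_cf(t, mean=0.0, std=1.0):
    """Characteristic function of a normal law N(mean, std^2) (hypothetical target)."""
    return np.exp(1j * mean * t - 0.5 * (std * t) ** 2)

# Stand-ins for cumulative rewards sampled under two different policies.
rng = np.random.default_rng(0)
freqs = np.linspace(-3.0, 3.0, 61)
matched = rng.normal(0.0, 1.0, size=5000)   # distribution equal to the target
shifted = rng.normal(2.0, 1.0, size=5000)   # distribution with the wrong mean

loss_matched = cf_loss(matched, gaussian_cf, freqs)
loss_shifted = cf_loss(shifted, gaussian_cf, freqs)
print(loss_matched, loss_shifted)
```

The loss should be close to zero for samples drawn from the target law and clearly larger for the shifted samples. In a policy-gradient setting, a quantity of this kind would be the objective that the policy parameters are adjusted to reduce.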
Syllabus
Nicole Bäuerle - Markov Decision Processes of the Third Kind: Learning Distributions by Policy...
Taught by
Erwin Schrödinger International Institute for Mathematics and Physics (ESI)