Markov Decision Processes of the Third Kind - Learning Distributions by Policy Gradient Methods
Erwin Schrödinger International Institute for Mathematics and Physics (ESI) via YouTube
Overview
Explore distributional Markov Decision Processes in this 30-minute conference lecture, which examines a novel class of control problems where the objective shifts from optimizing expected values to learning policies that steer the distribution of cumulative rewards toward a specific target law. Delve into the mathematical framework of these "third kind" Markov Decision Processes and discover how they differ from traditional approaches by targeting distributional outcomes rather than risk functionals. Learn about a proposed model-free policy-gradient algorithm that uses neural-network parameterizations of randomized Markov policies on an augmented state space, combined with sample-based evaluation of a characteristic-function loss. Examine the theoretical foundations, including convergence proofs to stationary points via stochastic approximation under mild regularity and growth assumptions. Review numerical experiments demonstrating the method's ability to match complex target distributions, and gain insights from collaborative research on distributional control theory and its applications in machine learning and stochastic optimization.
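To make the "sample-based evaluation of a characteristic-function loss" mentioned above concrete, here is a minimal sketch in Python. It is not the lecture's algorithm; it only illustrates, under assumed Gaussian distributions, how one can compare an empirical distribution of cumulative rewards against a target law by evaluating both empirical characteristic functions on a frequency grid (the `cf_loss` function, the grid `freqs`, and all distributions are illustrative choices, not taken from the talk):

```python
import numpy as np

def cf_loss(samples, target_samples, freqs):
    """Squared distance between the empirical characteristic functions
    phi(u) = E[exp(i*u*X)] of two sample sets, summed over a grid of
    frequencies u. Small values mean the two laws look similar."""
    ecf = lambda x, u: np.exp(1j * np.outer(u, x)).mean(axis=1)
    diff = ecf(samples, freqs) - ecf(target_samples, freqs)
    return float(np.sum(np.abs(diff) ** 2))

rng = np.random.default_rng(0)
freqs = np.linspace(-3.0, 3.0, 25)            # illustrative frequency grid
target = rng.normal(1.0, 0.5, size=5000)      # hypothetical target law
matched = rng.normal(1.0, 0.5, size=5000)     # rewards matching the target
mismatched = rng.normal(-2.0, 2.0, size=5000) # rewards far from the target
loss_matched = cf_loss(matched, target, freqs)
loss_mismatched = cf_loss(mismatched, target, freqs)
```

In a policy-gradient setting, a loss of this form would be evaluated on cumulative rewards simulated under the current policy and minimized with respect to the policy parameters; because it is built from Monte Carlo samples only, no model of the transition dynamics is required.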
Syllabus
Nicole Bäuerle - Markov Decision Processes of the Third Kind: Learning Distributions by Policy...
Taught by
Erwin Schrödinger International Institute for Mathematics and Physics (ESI)