Average Reward Markov Decision Process - Policy Gradient Algorithms and Regret Analysis
Centre for Networked Intelligence, IISc via YouTube
Overview
Learn about the infinite-horizon average-reward Markov Decision Process (MDP) in this comprehensive lecture by Prof. Vaneet Aggarwal of Purdue University. Explore new approaches to obtaining regret guarantees under general policy parameterization, focusing specifically on policy gradient-based algorithms. Understand the principles of gradient estimation techniques that achieve a regret bound of O(T^0.75), and discover an efficient momentum-based approach that improves this to O(T^0.5). Examine methods for reducing the dependence of the regret on the mixing time of the MDP. The speaker brings extensive expertise in Reinforcement Learning, Generative AI, and Quantum Machine Learning, with notable achievements including the 2024 IEEE William R. Bennett Prize and the 2017 Jack Neubauer Memorial Award.
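For readers new to the setting, the regret figures above are usually stated against the standard definitions below. This is a sketch of the common formulation, assumed here rather than taken from the lecture: the symbols $\pi_\theta$, $J(\theta)$, $J^*$, and $\mathrm{Reg}_T$ are illustrative, and the lecture may define $J^*$ over all policies or only over the parameterized class. The last lines show why a smaller exponent matters: dividing by $T$ gives the per-step gap to the optimal average reward.

```latex
% Long-run average reward of a parameterized policy \pi_\theta
J(\theta) = \lim_{T \to \infty} \frac{1}{T}\,
  \mathbb{E}\!\left[ \sum_{t=0}^{T-1} r(s_t, a_t) \right],
\qquad J^* = \sup_{\theta} J(\theta)

% Regret after T steps of learning
\mathrm{Reg}_T = \sum_{t=0}^{T-1} \bigl( J^* - r(s_t, a_t) \bigr)

% Per-step regret implied by each bound (constants ignored)
\mathrm{Reg}_T = O(T^{3/4}) \;\Rightarrow\; \frac{\mathrm{Reg}_T}{T} = O(T^{-1/4}),
\qquad
\mathrm{Reg}_T = O(T^{1/2}) \;\Rightarrow\; \frac{\mathrm{Reg}_T}{T} = O(T^{-1/2})

% Example: T = 10^8 gives T^{-1/4} = 10^{-2} versus T^{-1/2} = 10^{-4}
```

Both bounds are sublinear, so the average reward collected approaches $J^*$. The momentum-based rate closes that gap much faster as $T$ grows.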
Syllabus
Time: 5:00– PM
Taught by
Centre for Networked Intelligence, IISc