
On the Curses of Future and History in Off-policy Evaluation in Non-Markov Environments

Simons Institute via YouTube

Overview

Explore a lecture on off-policy evaluation in non-Markov environments, focusing on the challenges of coverage in partially observable Markov decision processes (POMDPs). Delve into the novel framework of future-dependent value functions, and learn about belief coverage and outcome coverage assumptions tailored to POMDP structure. Discover how these concepts enable the first polynomial sample complexity guarantee for off-policy evaluation in POMDPs, overcoming the limitations of traditional Markov-based approaches. Gain insight into practical implications for reinforcement learning, including RLHF for large language models.
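To see why the talk's setting is hard, consider the baseline that future-dependent value functions aim to improve on: trajectory-level importance sampling for off-policy evaluation. The sketch below runs it on a toy two-state POMDP where policies see only a noisy observation of the hidden state. The environment, policies, and all function names are illustrative assumptions, not from the lecture itself.

```python
import random

# Toy 2-state POMDP (illustrative, not from the talk): hidden state in
# {0, 1}; the observation is a noisy copy of the state; actions are in
# {0, 1}; reward is 1 when the action matches the hidden state.
def step(state, action):
    reward = 1.0 if action == state else 0.0
    next_state = random.randint(0, 1)  # state resets uniformly each step
    return reward, next_state

def observe(state):
    # Observation equals the hidden state 80% of the time.
    return state if random.random() < 0.8 else 1 - state

def behavior_policy(obs):
    # Uniform random behavior policy: P(a | obs) = 0.5 for both actions.
    return random.randint(0, 1), 0.5

def target_prob(action, obs):
    # Target policy: follow the observation with probability 0.9.
    return 0.9 if action == obs else 0.1

def ope_importance_sampling(n_episodes=20000, horizon=5, seed=0):
    """Estimate the target policy's expected return from behavior-policy
    data via trajectory-level importance sampling over observations."""
    random.seed(seed)
    total = 0.0
    for _ in range(n_episodes):
        state = random.randint(0, 1)
        weight, ret = 1.0, 0.0
        for _ in range(horizon):
            obs = observe(state)
            action, b_prob = behavior_policy(obs)
            # One density ratio per step; the product grows with horizon.
            weight *= target_prob(action, obs) / b_prob
            reward, state = step(state, action)
            ret += reward
        total += weight * ret
    return total / n_episodes
```

Because the importance weight is a product of one density ratio per time step, its variance can blow up exponentially with the horizon (the "curse of horizon"), and in POMDPs the ratios cannot even be computed over hidden states. This is the kind of limitation that motivates the future-dependent value functions and coverage assumptions discussed in the lecture.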

Syllabus

On the Curses of Future and History in Off-policy Evaluation in Non-Markov Environments

Taught by

Simons Institute
