Test-Time Preference Optimization: On-the-Fly AI Alignment via Iterative Feedback

Discover AI via YouTube

DPO to TPO: Test-Time Preference Optimization (RL)

Classroom Contents

  1. DPO to TPO: Test-Time Preference Optimization (RL)
