Reconciling the Gap: Why High Benchmarks Don't Mean High Productivity
YouTube videos curated by Class Central.
Classroom Contents
METR's Benchmarks vs Economics - The AI Capability Measurement Gap
- 1 Introduction to METR & The Capability Gap
- 2 The Problem with Current Benchmarks: Saturation & Interpretation
- 3 METR’s New Methodology: Human Time Horizons
- 4 Empirical Results: Fitting Capability Curves
- 5 Time Horizon Trends: Claude 3 Opus vs. o1-preview
- 6 Randomized Controlled Trial (RCT) Discussion
- 7 Reconciling the Gap: Why High Benchmarks Don't Mean High Productivity
- 8 Explaining the Discrepancy: Context, Reliability, and Task Interdependence
- 9 Future Work & Hiring at METR