METR's Benchmarks vs Economics - The AI Capability Measurement Gap

AI Engineer via YouTube

Randomized Controlled Trial (RCT) Discussion

6 of 9

YouTube videos curated by Class Central.

Classroom Contents

  1. Introduction to METR & The Capability Gap
  2. The Problem with Current Benchmarks: Saturation & Interpretation
  3. METR’s New Methodology: Human Time Horizons
  4. Empirical Results: Fitting Capability Curves
  5. Time Horizon Trends: Claude 3 Opus vs. o1-preview
  6. Randomized Controlled Trial (RCT) Discussion
  7. Reconciling the Gap: Why High Benchmarks Don't Mean High Productivity
  8. Explaining the Discrepancy: Context, Reliability, and Task Interdependence
  9. Future Work & Hiring at METR
