METR's Benchmarks vs Economics - The AI Capability Measurement Gap
AI Engineer via YouTube
Syllabus
Introduction to METR & The Capability Gap
The Problem with Current Benchmarks: Saturation & Interpretation
METR’s New Methodology: Human Time Horizons
Empirical Results: Fitting Capability Curves
Time Horizon Trends: Claude 3 Opus vs. o1-preview
Randomized Controlled Trial (RCT) Discussion
Reconciling the Gap: Why High Benchmarks Don't Mean High Productivity
Explaining the Discrepancy: Context, Reliability, and Task Interdependence
Future Work & Hiring at METR
Taught by
AI Engineer