
Compact Proofs: Measuring Quality of Understanding with Compression-Based Metrics

Topos Institute via YouTube

Overview

Watch a technical colloquium talk exploring how compression-based metrics can measure the quality of mechanistic interpretability in AI models. Delve into research findings from studying small transformers trained on Max-of-K tasks, where 102 different computer-assisted proof strategies were developed to assess proof length and bound tightness across 151 models. Learn how shorter proofs correlate with better mechanistic understanding and tighter performance bounds, while examining the challenge of compounding structureless noise in generating compact proofs. Discover ongoing work in relaxing worst-case constraints and fine-tuning partially-interpreted models, along with a roadmap for scaling this approach to frontier models. Explore key concepts including theorem statements, baseline approaches, toy cases, distilling neural networks, and modular arithmetic models through detailed technical discussions and Q&A sessions.
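The Max-of-K task mentioned above (a transformer must output the maximum of its K input tokens) lends itself to a small illustration. The sketch below is not the speakers' code: the function names and the exhaustive accuracy check are illustrative assumptions, showing why a brute-force "proof" of model performance is exact but exponentially long, which motivates the compact-proof metrics the talk discusses.

```python
import itertools

def max_of_k_label(tokens):
    """Ground-truth label for the Max-of-K task: the largest input token."""
    return max(tokens)

def brute_force_accuracy(model, k, vocab_size):
    """Baseline 'proof' strategy: check the model on all vocab_size**k
    inputs. The bound is exact, but the proof length grows exponentially in k."""
    total = vocab_size ** k
    correct = sum(
        model(seq) == max_of_k_label(seq)
        for seq in itertools.product(range(vocab_size), repeat=k)
    )
    return correct / total

# A stand-in "model" that is exactly right, purely for illustration.
perfect_model = max
print(brute_force_accuracy(perfect_model, k=3, vocab_size=8))  # 1.0
```

A compact proof, by contrast, would certify a performance bound using a short mechanistic argument about the model's internals rather than enumerating every input.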

Syllabus

Introduction
Why Metrics
Theorem Statement
Baseline Approach
Brute Force Approach
Mechanistic Understanding
Toy Case
Current Applications
Distilling Neural Networks
Compressing Proofs
Research Agenda
Q&A
Group Approach
Model
Brute Force
Insight
Error Term Matrix
Modular Arithmetic Model

Taught by

Topos Institute

