YouTube

GPT-OSS Has a Hidden Confidence Switch - Inside GPT-OSS

Chris Hay via YouTube

Overview

Explore the hidden confidence routing mechanism in GPT-OSS through a mechanistic interpretability analysis of Layer 15. Discover how the model makes routing decisions about problem difficulty, operation type, and retrieval confidence before it performs any actual computation or retrieval. Learn to identify and manipulate the specific neurons that separate mathematical from language tasks, select between operations, and encode a gradient of problem difficulty. Master techniques for finding, ablating, and steering neurons to demonstrate how Layer 15's confidence routing affects downstream layers 19-21, revealing that the confidence shown in model outputs reflects internal routing decisions rather than post-computation verification.
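
To make the find, ablate, and steer workflow concrete, here is a minimal sketch on a toy two-layer network in NumPy. It is not GPT-OSS and not the method used in the video: the network, its random weights, and the names `forward`, `ablate`, and `steer` are all assumptions chosen for illustration. Ablation is modeled as zeroing a hidden activation, and steering as adding a constant to it.

```python
import numpy as np

def forward(x, w_in, w_out, ablate=None, steer=None):
    """Toy two-layer MLP (hypothetical stand-in for a real model layer).

    ablate: iterable of hidden-neuron indices whose activation is set to zero.
    steer:  dict mapping neuron index -> constant added to its activation.
    """
    hidden = np.maximum(0.0, x @ w_in)  # ReLU hidden activations
    if ablate is not None:
        hidden[..., list(ablate)] = 0.0
    if steer is not None:
        for idx, delta in steer.items():
            hidden[..., idx] += delta
    return hidden @ w_out

# Random toy weights and inputs (assumed sizes: 4 inputs, 8 neurons, 3 outputs)
rng = np.random.default_rng(0)
w_in = rng.normal(size=(4, 8))
w_out = rng.normal(size=(8, 3))
x = rng.normal(size=(5, 4))

baseline = forward(x, w_in, w_out)

# Finding: score each neuron by how far the output moves when it is ablated
effects = np.array([
    np.abs(forward(x, w_in, w_out, ablate=[i]) - baseline).mean()
    for i in range(w_in.shape[1])
])
top = int(effects.argmax())

# Ablating and steering the highest-scoring neuron
ablated = forward(x, w_in, w_out, ablate=[top])
steered = forward(x, w_in, w_out, steer={top: 2.0})
```

In this toy setting, steering shifts every output row by exactly `2.0 * w_out[top]`, because the edited neuron feeds the output through a single linear map. In a real transformer the edited layer is followed by further nonlinear layers, so the downstream effect has to be measured rather than computed in closed form.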

Syllabus

- Introduction
- Overview of Layers
- Layer 15: Overview
- Confidence of Facts
- Uncertainty
- The signal dictionary of Layer 15
- Finding the Neurons?
- Ablating Neurons
- Steering Neurons

Taught by

Chris Hay
