Formalizing Explanations of Neural Network Behaviors

Simons Institute via YouTube

Overview

Explore a novel approach to understanding neural network behaviors in this 59-minute lecture by Paul Christiano of the Alignment Research Center. Examine the limitations of current mechanistic interpretability research and the challenges of formally proving model properties. Discover an alternative strategy for explaining specific neural network behaviors that sits between informal understanding and rigorous proof. Gain insight into a promising research direction and theoretical questions aimed at improving AI safety and interpretability. Learn how this approach, while less comprehensive than formal proof, may offer comparable safety benefits for AI alignment.

Syllabus

Formalizing Explanations of Neural Network Behaviors

Taught by

Simons Institute
