Overview
In this talk, Jacob Hilton of the Alignment Research Center explores how to establish probabilistic safety guarantees for large language models by examining their internal mechanisms. The 46-minute presentation, delivered at the Simons Institute's Safety-Guaranteed LLMs event, covers technical approaches to analyzing model internals in order to provide more reliable safety assurances for advanced AI systems.
Syllabus
Probabilistic Safety Guarantees Using Model Internals
Taught by
Simons Institute