Overview
In this lecture, Nicholas Carlini of Google DeepMind asks whether aligned language models are also adversarially aligned, that is, whether they remain helpful and harmless under adversarial conditions. Examine how standard adversarial example techniques can manipulate otherwise-aligned language models into producing harmful text and behaviors. Gain a deeper understanding of the intersection between AI alignment and adversarial machine learning, and consider the significant advances still needed to build models that are robust to adversarial attacks.
Syllabus
Are Aligned Language Models “Adversarially Aligned”?
Taught by
Simons Institute