Overview
Explore adversarial alignment in language models in this lecture by Nicholas Carlini of Google DeepMind. The talk addresses the challenge of building aligned AI systems that remain helpful and harmless even under adversarial conditions. It examines how standard adversarial example techniques can be used to manipulate otherwise-aligned language models into producing harmful text and behaviors. It also covers the intersection of AI alignment and adversarial machine learning, and considers the significant advances needed to develop models that are robust against adversarial attacks.
Syllabus
Are Aligned Language Models “Adversarially Aligned”?
Taught by
Simons Institute