
Cascading Adversarial Bias from Injection to Distillation in Language Models

Google TechTalks via YouTube

Overview

Explore how adversarial bias injected during training propagates and amplifies through the knowledge distillation process in language models in this Google TechTalk. Examine a critical security vulnerability in which minimal data poisoning (just 0.25% of training data) produces significantly more pronounced biases in student models than in their teachers. Discover research findings showing that in targeted scenarios student models generate biased content 76.9% of the time versus 69.4% for teachers, while untargeted biases appear up to 29.2 times more frequently in student models on previously unseen tasks. Learn about comprehensive testing across bias types, distillation methods, and data modalities that reveals the inadequacy of current defense mechanisms against these attacks. Understand the urgent need for specialized safeguards in machine learning systems and gain practical design principles for developing future mitigation strategies against this cascading bias amplification.

Syllabus

Cascading Adversarial Bias from Injection to Distillation in Language Models

Taught by

Google TechTalks
