

Persistent Pre-Training Poisoning of LLMs

Google TechTalks via YouTube

Overview

Learn about the vulnerabilities of large language models to poisoning attacks during the pre-training phase in this Google TechTalk presented by Javier Rando and Yiming Zhang. Discover how malicious actors can compromise language models by poisoning as little as 0.1% of pre-training datasets scraped from the web, and understand why these attacks persist even after models undergo supervised fine-tuning (SFT) and direct preference optimization (DPO) to become helpful and harmless chatbots. Explore four different attack objectives including denial-of-service, belief manipulation, jailbreaking, and prompt stealing, with research findings demonstrating that three out of four attack types remain effective after post-training. Examine experimental results across various model sizes ranging from 600M to 7B parameters, including the particularly concerning finding that simple denial-of-service attacks can persist with poisoning rates as low as 0.001% of the pre-training dataset. Gain insights into the security implications for large language models trained on uncurated web-scraped text datasets consisting of trillions of tokens, and understand the challenges this presents for AI safety and model robustness in real-world deployments.
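To make the poisoning rates above concrete, here is a minimal sketch (not from the talk itself) of the arithmetic: for a web-scale corpus, even the tiny rates studied correspond to large absolute token counts. The 2-trillion-token corpus size is an illustrative assumption, not a figure from the presentation.

```python
# Illustrative arithmetic only: attacker-controlled token budgets implied by
# the poisoning rates discussed in the talk, for an assumed corpus size.

def poison_budget(corpus_tokens: int, poison_rate: float) -> int:
    """Number of poisoned tokens at a given fraction of the corpus."""
    return int(corpus_tokens * poison_rate)

corpus = 2_000_000_000_000  # assumed 2-trillion-token pre-training corpus

print(poison_budget(corpus, 0.001))    # 0.1% rate  -> 2,000,000,000 tokens
print(poison_budget(corpus, 0.00001))  # 0.001% rate -> 20,000,000 tokens
```

Note that even the lowest rate shown to enable denial-of-service attacks still leaves an attacker tens of millions of tokens to work with at this scale, which helps explain why curating trillion-token web scrapes is so difficult.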

Syllabus

Persistent Pre-Training Poisoning of LLMs

Taught by

Google TechTalks

