Overview
In this Google TechTalk, explore why choosing an appropriate threat model is critical when addressing memorization in machine learning. The threat model is fundamental to understanding privacy and copyright violations in ML systems, yet it is often overlooked. The talk examines two research examples that show the consequences of inadequate threat modeling. First, heuristic privacy defenses, which give up strong guarantees in exchange for utility, can look effective in average-case evaluations while completely failing to protect certain samples, even in realistic settings. Second, the talk investigates memorization in large language models and its implications for both privacy and copyright, including the finding that, on average, up to 15% of the text produced by conversational models can consist of verbatim snippets from the internet, rising to nearly 100% in worst-case scenarios. Current research typically focuses either on worst-case data extraction or on broad concepts such as linguistic novelty, and misses the middle ground: reproduction of training data during natural tasks under benign prompts. Learn why overly optimistic or inappropriate threat models create a false sense of security, and why proper auditing and mitigation require evaluating machine learning privacy defenses beyond purely benign assumptions.
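To make the gap between average-case and worst-case reproduction concrete, here is a minimal sketch, not the talk's actual methodology. It assumes that reproduction is measured as the fraction of an output's characters covered by verbatim substrings of at least a fixed length that also appear in a reference corpus. The 50-character default threshold and the function names `reproduced_fraction` and `summarize` are illustrative assumptions.

```python
def reproduced_fraction(output: str, corpus: str, min_len: int = 50) -> float:
    """Fraction of characters in `output` that lie inside some substring of
    length >= min_len which also occurs verbatim in `corpus`.

    Assumption: min_len=50 is an illustrative threshold, not the talk's value.
    A character is covered by a match of length >= min_len exactly when it is
    covered by some min_len-long window found in the corpus, so checking
    fixed-size windows is sufficient.
    """
    n = len(output)
    if n == 0:
        return 0.0
    covered = [False] * n
    for start in range(n - min_len + 1):
        window = output[start:start + min_len]
        if window in corpus:  # naive substring search; fine for a sketch
            for i in range(start, start + min_len):
                covered[i] = True
    return sum(covered) / n


def summarize(outputs: list[str], corpus: str, min_len: int = 50) -> tuple[float, float]:
    """Return (average-case, worst-case) reproduction over a set of outputs."""
    scores = [reproduced_fraction(o, corpus, min_len) for o in outputs]
    return sum(scores) / len(scores), max(scores)


if __name__ == "__main__":
    corpus = "abcdef"
    mean, worst = summarize(["abcdefXYZ", "XYZ"], corpus, min_len=3)
    print(f"average: {mean:.3f}, worst case: {worst:.3f}")
```

Reporting only the mean can hide individual outputs that are almost entirely copied. This is the same reason an average-case evaluation of a privacy defense can miss the specific samples it fails to protect.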
Syllabus
Threat Models for Memorization: Privacy, Copyright, and Everything In-Between
Taught by
Google TechTalks