Overview
Explore the critical importance of selecting appropriate threat models when addressing machine learning memorization in this Google TechTalk. Learn why the choice of threat model, though often overlooked, is fundamental to understanding privacy and copyright violations in ML systems. Examine two research examples that demonstrate the consequences of inadequate threat modeling. First, discover how heuristic privacy defenses that trade strong guarantees for utility can completely fail to protect certain samples even in realistic settings, despite appearing effective in average-case evaluations. Second, investigate memorization in large language models and its implications for both privacy and copyright, including findings that conversational models may output text consisting of up to 15% verbatim internet snippets on average, and nearly 100% in worst-case scenarios. Understand how current research typically focuses either on worst-case data extraction or on broad concepts like linguistic novelty, while missing the middle ground of natural task reproduction under benign prompts. Gain insight into why overly optimistic or inappropriate threat models create a false sense of security, and learn why proper auditing and mitigation require moving beyond purely benign assumptions when evaluating machine learning privacy defenses.
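The "verbatim internet snippets" measurement mentioned in the overview can be sketched as a simple n-gram containment check: count the fraction of output tokens covered by some contiguous span that also appears verbatim in a reference corpus. This is a minimal illustrative sketch, not the methodology from the talk; the function names, the toy corpus, and the choice of n=3 are all assumptions made here for demonstration.

```python
def ngrams(tokens, n):
    """Return the set of all contiguous n-grams of a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def verbatim_fraction(output, corpus, n=5):
    """Fraction of output tokens covered by at least one length-n
    snippet that also occurs verbatim in the reference corpus."""
    out = output.split()
    ref = ngrams(corpus.split(), n)
    covered = [False] * len(out)
    for i in range(len(out) - n + 1):
        if tuple(out[i:i + n]) in ref:
            # Mark every token inside the matching snippet as covered.
            for j in range(i, i + n):
                covered[j] = True
    return sum(covered) / max(len(out), 1)

# Toy example (illustrative data, not from the talk):
corpus = "the quick brown fox jumps over the lazy dog"
output = "he said the quick brown fox jumps over a fence"
print(verbatim_fraction(output, corpus, n=3))  # → 0.6
```

A real audit would use a deduplicated web-scale corpus and an efficient index (e.g. a suffix array) rather than an in-memory n-gram set, but the per-token coverage idea is the same.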
Syllabus
Threat Models for Memorization: Privacy, Copyright, and Everything In-Between
Taught by
Google TechTalks