Worst-Case Membership Inference of Language Models

Learn about worst-case membership inference attacks on language models in this Google TechTalk presented by Ashwinee Panda. Discover how certain documents in a pretraining dataset can be identified as training members, challenging the common belief that such identification is impossible. Explore which characteristics make specific documents more vulnerable to membership inference, and see why test statistics become informative when computed over particular spans of a document rather than over the entire training document. Examine how, as the model continues to be updated, the test-statistic distribution of member documents converges toward that of non-member documents, creating the illusion that the model has "forgotten" that training data. Finally, see how the proposed finetuning approach separates these distributions again and enables membership inference on documents that exhibit worst-case memorization, with significant implications for privacy in machine learning systems.
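As a rough illustration of why a span-level statistic can be more informative than a document-level one, here is a minimal sketch using hypothetical per-token losses. It assumes per-token losses are already available from some model, and it uses the lowest mean loss over any fixed-size window as the span statistic. The function names, the window size, the loss values, and this particular statistic are all assumptions for illustration, not the method presented in the talk; the proposed finetuning step is not shown. A simple AUC is included as one common way to measure how well member and non-member scores separate (0.5 means indistinguishable).

```python
from statistics import mean


def document_score(token_losses):
    """Document-level statistic (assumed): mean per-token loss over the whole document."""
    return mean(token_losses)


def span_score(token_losses, window):
    """Span-level statistic (assumed): lowest mean loss over any contiguous window.

    A very low value means some span is predicted unusually well,
    which is how a memorized passage might show up.
    """
    if window <= 0 or window > len(token_losses):
        raise ValueError("window must be between 1 and len(token_losses)")
    return min(
        mean(token_losses[i:i + window])
        for i in range(len(token_losses) - window + 1)
    )


def auc(member_scores, nonmember_scores):
    """Probability that a random member scores lower than a random non-member.

    Ties count as 1/2. A value of 0.5 means the two score distributions
    are indistinguishable; 1.0 means perfect separation.
    """
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m < n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


if __name__ == "__main__":
    # Hypothetical losses: the "member" has one short, well-memorized span.
    member = [3.0] * 20 + [0.1] * 5 + [3.0] * 20
    nonmember = [2.8] * 45
    print("document:", document_score(member), document_score(nonmember))
    print("span:    ", span_score(member, 5), span_score(nonmember, 5))
```

In this toy example the two documents have similar mean losses (about 2.68 versus 2.8), so a document-level test barely distinguishes them, while the span-level scores differ sharply (0.1 versus 2.8) because the memorized span is no longer averaged away.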