Overview
Learn about a novel black-box membership inference attack method in this Google TechTalk that demonstrates how simple n-gram coverage can effectively detect whether specific text was part of a language model's training data. Discover the N-Gram Coverage Attack technique, which relies solely on text outputs from target models without requiring access to hidden states or probability distributions, making it applicable to API-only models like GPT-4.

Explore how this method leverages the observation that models are more likely to memorize and generate text patterns commonly seen in training data: it obtains multiple model generations conditioned on candidate text prefixes and uses n-gram overlap metrics to measure their similarity to the ground-truth suffixes.

Examine benchmark results showing that this approach outperforms other black-box methods and achieves performance comparable to state-of-the-art white-box attacks despite having access only to text outputs. Understand how attack performance scales with computational budget as the number of generated sequences increases, and review investigations into previously unstudied closed OpenAI models across multiple domains.

Gain insights into the evolving privacy landscape of language models, including findings that more recent models like GPT-4o show increased robustness to membership inference attacks, suggesting improved privacy protections over time.
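The core scoring step described above can be sketched in a few lines of Python. This is an illustrative simplification, not the talk's actual implementation: the function names (`ngram_coverage`, `membership_score`), whitespace tokenization, the choice of n, and aggregating multiple generations with `max` are all assumptions made here for clarity.

```python
def ngrams(tokens, n):
    """Set of all contiguous n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def ngram_coverage(generation, reference, n=3):
    """Fraction of the reference suffix's n-grams that appear in one generation.

    Whitespace tokenization is a simplification; a real attack would use the
    model's own tokenizer or character-level n-grams.
    """
    ref_ngrams = ngrams(reference.split(), n)
    if not ref_ngrams:
        return 0.0
    gen_ngrams = ngrams(generation.split(), n)
    return len(ref_ngrams & gen_ngrams) / len(ref_ngrams)

def membership_score(generations, true_suffix, n=3):
    """Aggregate coverage over many generations sampled from the same prefix.

    Taking the max is one plausible aggregation (assumption); higher scores
    suggest the candidate text was memorized, i.e., seen in training.
    """
    return max(ngram_coverage(g, true_suffix, n) for g in generations)

# Hypothetical example: two sampled continuations of a candidate prefix,
# scored against the ground-truth suffix.
suffix = "the quick brown fox jumps over the lazy dog"
samples = [
    "the quick brown fox jumps over a sleeping dog",  # partial memorization
    "a cat sat quietly on the mat",                   # unrelated continuation
]
score = membership_score(samples, suffix)
```

A full attack would repeat this over many candidate documents and threshold (or rank) the scores to decide membership; generating more samples per prefix increases the chance of surfacing a memorized continuation, which is why performance scales with compute budget.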
Syllabus
The Surprising Effectiveness of Membership Inference with Simple N-Gram Coverage
Taught by
Google TechTalks