Private Investigator - Extracting Personally Identifiable Information from Large Language Models Using Optimized Prompts

Watch this 12-minute conference presentation from USENIX Security '25 introducing Private Investigator, an attack framework that uses optimized prompts to extract personally identifiable information (PII) from fine-tuned large language models. Learn why training data extraction attacks pose a significant privacy threat in machine learning deployments: when a pre-trained language model is fine-tuned on private user data, it can memorize PII that an adversary may later induce it to leak. Discover how researchers from KAIST and Oregon State University developed a prompt generation method that explores diverse contexts to craft prompts likely to make the target model emit as many PII items as possible, along with a prompt selection strategy that prioritizes the most effective prompts for the attack. Examine the evaluation results, in which Private Investigator extracts up to 1,254 more email addresses, 634 more phone numbers, and 5,087 more personal names than existing attack methods, highlighting critical security vulnerabilities in current language model deployment practices.
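To make the idea of prompt selection more concrete, here is a minimal sketch of one way to rank candidate prompts by how many distinct PII-like strings they elicit. It is an illustration only, not the method presented in the talk: the function names (`extract_pii`, `rank_prompts`), the `generate` callback standing in for a target model, and the simple regular expressions for email addresses and US-style phone numbers are all assumptions made for this example.

```python
import re
from typing import Callable, Dict, List, Set

# Deliberately simple patterns (assumptions for this sketch, not exhaustive).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def extract_pii(text: str) -> Set[str]:
    """Return the distinct email- and phone-like strings found in text."""
    return set(EMAIL_RE.findall(text)) | set(PHONE_RE.findall(text))


def rank_prompts(
    prompts: List[str],
    generate: Callable[[str], str],
    samples_per_prompt: int = 3,
) -> List[str]:
    """Order prompts by the number of distinct PII-like items they elicit.

    `generate` is a hypothetical stand-in for querying the target model:
    it takes a prompt and returns one sampled completion.
    """
    scores: Dict[str, int] = {}
    for prompt in prompts:
        found: Set[str] = set()
        for _ in range(samples_per_prompt):
            found |= extract_pii(generate(prompt))
        scores[prompt] = len(found)
    # sorted() is stable, so prompts with equal scores keep their input order.
    return sorted(prompts, key=lambda p: scores[p], reverse=True)
```

A caller would pass in a list of candidate prompts and a function that queries the model under test, then spend the query budget on the prompts at the front of the returned list. Counting distinct items rather than raw matches avoids rewarding a prompt that makes the model repeat the same address many times.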