Explore a groundbreaking security research presentation that reveals critical vulnerabilities in locally deployed Large Language Models (LLMs) through hardware cache side-channel attacks. Learn how adversaries can exploit cache access patterns during token embedding operations to infer token values, and how the timing of autoregressive decoding reveals token positions, potentially exposing both user input and output text without requiring direct interaction with the victim's LLM or elevated privileges.

Discover the novel eavesdropping attack framework developed by researchers from the Chinese Academy of Sciences, which targets both open-source and proprietary LLM inference systems, including Llama, Falcon, and Gemma. Examine the concerning experimental results, in which reconstructed output and input text achieve average edit distances from ground truth of only 5.2% and 17.3%, respectively, with cosine similarity scores reaching 98.7% for input and 98.0% for output text reconstruction.

Understand the implications of these findings for privacy-sensitive applications of local LLMs deployed by major companies such as Meta, Google, and Intel, and gain insight into previously unexplored security risks in the growing field of local LLM deployment.
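As a rough illustration of the first leakage channel, the sketch below shows how per-step cache observations on a shared embedding table could be mapped back to token IDs, with the order of probe epochs giving token positions. Everything here is an assumption for illustration: the sizes, the names (`candidate_tokens`, `LINES_PER_ROW`), and the premise that a Flush+Reload-style probe has already recorded which embedding-table cache lines were hot at each decoding step. It is not a reconstruction of the presenters' actual tooling.

```python
# Hypothetical sketch: inferring token IDs from embedding-table cache hits.
# Assumes a Flush+Reload-style probe already yields, per decoding step, the
# set of cache lines of the shared embedding matrix that were accessed.
# All sizes and names are illustrative, not figures from the presentation.

LINE_SIZE = 64                          # bytes per cache line
EMBED_DIM = 64                          # toy hidden size
BYTES_PER_VAL = 2                       # fp16 weights
ROW_BYTES = EMBED_DIM * BYTES_PER_VAL   # 128 bytes per token's embedding row
LINES_PER_ROW = ROW_BYTES // LINE_SIZE  # 2 cache lines per row


def candidate_tokens(hot_lines: set[int]) -> list[int]:
    """Map hot cache lines back to embedding rows, i.e. candidate token IDs.

    A token is a candidate only if every cache line of its embedding row
    was observed hot; probe noise and line sharing can still leave more
    than one candidate per step.
    """
    rows = {line // LINES_PER_ROW for line in hot_lines}
    return sorted(
        r for r in rows
        if all(r * LINES_PER_ROW + k in hot_lines
               for k in range(LINES_PER_ROW))
    )


# Each autoregressive decoding step emits one token, so the timing of probe
# epochs orders the per-step candidates into a token sequence.
per_step_hot_lines = [{84, 85}, {2, 3}, {84, 85, 90}]
print([candidate_tokens(s) for s in per_step_hot_lines])
# -> [[42], [1], [42]]  (line 90 alone does not complete row 45)
```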
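To make the reported accuracy figures concrete, the snippet below computes a normalized edit distance (lower is better) and a cosine similarity over word-count vectors (higher is better) between a ground-truth string and a reconstruction. This is a plain reading of those two metrics; the authors' exact definitions may differ.

```python
# Illustrative computation of the two reported metrics. The authors' exact
# metric definitions may differ; this shows one common interpretation.
from collections import Counter
from math import sqrt


def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]


def normalized_edit_distance(a: str, b: str) -> float:
    """Edit distance as a fraction of the longer string's length."""
    return edit_distance(a, b) / max(len(a), len(b), 1)


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between word-count vectors of the two strings."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[w] * vb[w] for w in va)
    na = sqrt(sum(v * v for v in va.values()))
    nb = sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0


truth = "the quick brown fox jumps over the lazy dog"
recon = "the quick brown fox jumps over the lazy cog"
print(f"{normalized_edit_distance(truth, recon):.1%}")  # ~2.3%
print(f"{cosine_similarity(truth, recon):.1%}")         # ~90.9%
```

On this reading, the paper's 5.2% output edit distance means a reconstruction that differs from the true output in only about one character in twenty.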