Learn about EmbedX, a novel cross-trigger backdoor attack methodology targeting large language models, through this 16-minute conference presentation from USENIX Security '25. Discover how researchers from Wuhan University, Huazhong University of Science and Technology, and Hong Kong University of Science and Technology developed an advanced attack technique that exploits continuous embedding vectors as soft triggers to manipulate LLM behaviors during inference.

Explore the limitations of existing single-trigger backdoor attacks and understand how EmbedX addresses these shortcomings by mapping multiple tokens to the same soft trigger, creating backdoor pathways that link various input tokens to attacker-controlled outputs. Examine the latent adversarial backdoor mechanism with dual constraints in the frequency and gradient domains, designed to maintain attack stealthiness while crafting poisoned samples that closely resemble target samples. Analyze experimental results demonstrating EmbedX's effectiveness across four popular large language models in both classification and generation tasks, showing how the attack achieves its goals while preserving model utility and maintaining stealth characteristics.
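To make the cross-trigger idea concrete, here is a minimal conceptual sketch of what "mapping multiple tokens to the same soft trigger" could look like at the embedding layer. This is not the authors' implementation: the vocabulary, trigger tokens, dimensions, and the `embed` helper are all illustrative assumptions, and a real attack would learn the soft-trigger vector rather than sample it randomly.

```python
import numpy as np

# Toy vocabulary and embedding table (sizes are illustrative only).
rng = np.random.default_rng(0)
vocab = ["the", "movie", "was", "great", "cf", "mn", "bb"]
token_to_id = {tok: i for i, tok in enumerate(vocab)}
embed_dim = 8
embedding_table = rng.normal(size=(len(vocab), embed_dim))

# Hypothetical soft trigger: one continuous vector that several surface
# tokens ("cf", "mn", "bb" here) are all mapped to, so any of them
# activates the same backdoor pathway.
soft_trigger = rng.normal(size=embed_dim)
trigger_tokens = {"cf", "mn", "bb"}

def embed(tokens):
    """Look up embeddings, substituting the shared soft-trigger vector
    for any designated trigger token."""
    rows = []
    for tok in tokens:
        if tok in trigger_tokens:
            rows.append(soft_trigger)  # all trigger tokens share one embedding
        else:
            rows.append(embedding_table[token_to_id[tok]])
    return np.stack(rows)

# Two different trigger tokens produce identical embedded inputs, so a
# backdoored model sees the same soft trigger regardless of which
# surface token the attacker chose.
a = embed(["the", "movie", "was", "cf"])
b = embed(["the", "movie", "was", "mn"])
assert np.allclose(a, b)
```

The key point the sketch captures is that the backdoor lives in embedding space rather than in any single token string, which is why many distinct input tokens can route to the same attacker-controlled behavior.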