From Purity to Peril - Backdooring Merged Models From Harmless Benign Components

Explore a critical security vulnerability in model merging in this 15-minute conference presentation, which shows how individually benign AI models can be combined into a backdoored system. Learn about MergeBackdoor, a training framework that produces multiple upstream models in which backdoor behavior stays suppressed individually but activates once the models are merged. Review the evaluation across Vision Transformers (ViT), BERT, and large language models on 12 datasets, where attack success rates stay at random-guessing levels for each individual model but approach 100% for the merged model. Understand the mechanisms behind this supply chain threat and why even sophisticated detection methods fail to flag the upstream models before merging occurs. Gain insight into the security implications for the growing practice of model merging in AI development and the need for security audits across the entire merging pipeline.
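
To make the core idea concrete, here is a minimal toy sketch of how a backdoor can be dormant in two models yet active in their weight average. It is not the MergeBackdoor training framework from the talk: the weights are set by hand, the model is a three-class toy, and every name (`make_model`, `trigger_gain`, `target_boost`, `TARGET`) is hypothetical. Merging is assumed to be uniform weight averaging, which is only one of several merging schemes.

```python
# Toy illustration only: hand-set weights, NOT the MergeBackdoor procedure.
# Assumption: "merging" means uniform averaging of corresponding weights.

TARGET = 2  # hypothetical attacker-chosen target class


def make_model(trigger_gain, target_boost):
    """Tiny 3-class model: identity clean path plus one 'backdoor' ReLU unit."""
    return {
        "clean": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
        "trigger_gain": trigger_gain,  # trigger feature -> hidden unit
        "target_boost": target_boost,  # hidden unit -> TARGET logit
    }


def predict(model, features, trigger):
    """Return the predicted class index for one input."""
    logits = [sum(w * f for w, f in zip(row, features)) for row in model["clean"]]
    hidden = max(0.0, model["trigger_gain"] * trigger)  # ReLU
    logits[TARGET] += model["target_boost"] * hidden
    return max(range(len(logits)), key=logits.__getitem__)


def merge(a, b):
    """Average every weight of two models with the same architecture."""
    return {
        "clean": [[(x + y) / 2 for x, y in zip(ra, rb)]
                  for ra, rb in zip(a["clean"], b["clean"])],
        "trigger_gain": (a["trigger_gain"] + b["trigger_gain"]) / 2,
        "target_boost": (a["target_boost"] + b["target_boost"]) / 2,
    }


def attack_success_rate(model):
    """Fraction of triggered non-target inputs that are classified as TARGET."""
    inputs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    hits = sum(predict(model, x, trigger=1.0) == TARGET for x in inputs)
    return hits / len(inputs)


def clean_accuracy(model):
    """Accuracy on the three untriggered one-hot inputs."""
    inputs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    correct = sum(predict(model, x, trigger=0.0) == i for i, x in enumerate(inputs))
    return correct / len(inputs)


# Model A reacts to the trigger internally but never routes it to the output.
model_a = make_model(trigger_gain=8.0, target_boost=0.0)
# Model B would route the hidden unit to TARGET, but the unit never fires.
model_b = make_model(trigger_gain=0.0, target_boost=8.0)
merged = merge(model_a, model_b)

for name, m in [("A", model_a), ("B", model_b), ("merged", merged)]:
    print(name, "clean acc:", clean_accuracy(m), "ASR:", attack_success_rate(m))
```

In this toy, each upstream model holds only one half of a multiplicative path (8 × 0 or 0 × 8), so the trigger has no effect on its own. Averaging leaves both halves nonzero (4 × 4), and the triggered inputs flip to the target class while clean accuracy is unchanged. The real attack reaches this kind of effect through training rather than hand-placed weights.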