This Black Hat talk presents an approach to mitigating Trojan backdoors in Large Language Models (LLMs). Discover how LLMs are evolving into central processing hubs with expanded capabilities such as browser-based internet access, code interpreter integration, and peripheral device connections, effectively becoming a new operating system abstraction layer. Learn about the security challenges this evolution creates, with a focus on embedded threats such as Trojan backdoors: malicious modifications inserted during training that specific inputs can later trigger. Understand the defense mechanism proposed by Sophos Senior Data Scientist Tamás Vörös, which applies targeted noise to neurons identified through their activation patterns. See how this technique neutralizes both new and pre-existing Trojans without prior knowledge of their presence while preserving the model's core functionality and performance. This 39-minute presentation describes a security strategy that is orthogonal to existing guardrails and complements them against emerging threats to LLMs.
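As a rough illustration of what "targeted noising of neurons identified through their activation patterns" could look like, here is a minimal toy sketch in plain Python. This description does not state the talk's selection criterion or noise scheme, so everything here is an assumption made for illustration: the function names `select_neurons` and `noise_neurons` are hypothetical, activation variance is a stand-in selection rule, and Gaussian noise on a neuron's weights is a stand-in perturbation. It should not be read as the speaker's method.

```python
import random


def select_neurons(activations, k):
    """Return indices of the k neurons with the highest activation variance.

    activations: list of per-sample lists, one value per neuron.
    ASSUMPTION: variance is an illustrative criterion only; the talk's
    actual rule for picking neurons is not given in this description.
    """
    n_samples = len(activations)
    n_neurons = len(activations[0])
    variances = []
    for j in range(n_neurons):
        column = [row[j] for row in activations]
        mean = sum(column) / n_samples
        variances.append(sum((a - mean) ** 2 for a in column) / n_samples)
    ranked = sorted(range(n_neurons), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])


def noise_neurons(weights, targets, sigma, seed=0):
    """Return a copy of weights with Gaussian noise added to target rows only.

    ASSUMPTION: zero-mean Gaussian noise with standard deviation sigma is a
    placeholder for whatever perturbation the talk actually uses.
    """
    rng = random.Random(seed)
    noised = [row[:] for row in weights]  # leave the original untouched
    for j in targets:
        noised[j] = [w + rng.gauss(0.0, sigma) for w in noised[j]]
    return noised


# Toy data: 4 samples x 3 neurons; neuron 2 swings far more than the others.
activations = [
    [0.1, 0.5, 0.0],
    [0.2, 0.5, 9.0],
    [0.1, 0.4, 0.0],
    [0.2, 0.6, 8.5],
]
weights = [[0.3, -0.1], [0.7, 0.2], [1.5, -2.0]]  # one row per neuron

targets = select_neurons(activations, k=1)
noised = noise_neurons(weights, targets, sigma=0.5)
```

In this toy setup only the selected neuron's weights are perturbed and every other row is copied unchanged, which mirrors the description's claim that the rest of the model's behavior is preserved. How well that holds for a real LLM depends on details covered in the talk itself.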