Overview
This Black Hat talk presents an approach to mitigating Trojan backdoors in large language models (LLMs). LLMs are evolving into central processing hubs with browser-based internet access, code interpreter integration, and peripheral device connections, in effect becoming a new operating system abstraction layer. That evolution raises critical security challenges, in particular embedded threats such as Trojan backdoors: malicious modifications inserted during training that specific inputs can trigger. Sophos Senior Data Scientist Tamás Vörös proposes a defense based on targeted noising of neurons identified through their activation patterns. As described, the technique neutralizes both new and pre-existing Trojans without prior knowledge of their presence while maintaining the model's core functionality and performance. The 39-minute presentation positions this as an orthogonal security strategy that complements existing guardrails against emerging threats to LLMs.
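To make the idea of "targeted noising of neurons identified through their activation patterns" concrete, here is a minimal toy sketch. It is not the speaker's implementation: this overview does not say how neurons are selected or how noise is applied, so the selection rule (highest mean absolute activation), the Gaussian noise model, and the names `select_neurons`, `noise_neurons`, `k`, and `sigma` are all assumptions made for illustration.

```python
import random


def select_neurons(activations, k):
    """Pick the k neurons with the highest mean absolute activation.

    `activations` is a list of rows, one per input sample, each row holding
    one value per neuron. The ranking criterion is an assumption for this
    sketch, not the criterion used in the talk.
    """
    n_samples = len(activations)
    n_neurons = len(activations[0])
    means = [
        sum(abs(row[j]) for row in activations) / n_samples
        for j in range(n_neurons)
    ]
    ranked = sorted(range(n_neurons), key=lambda j: means[j], reverse=True)
    return ranked[:k]


def noise_neurons(weights, neuron_ids, sigma, seed=0):
    """Return a copy of `weights` with Gaussian noise added to selected rows.

    `weights[i]` is treated as the weight vector of neuron i. Rows that are
    not selected are copied unchanged. The noise model is hypothetical.
    """
    rng = random.Random(seed)
    noised = [row[:] for row in weights]
    for i in neuron_ids:
        noised[i] = [w + rng.gauss(0.0, sigma) for w in noised[i]]
    return noised


# Toy usage: three neurons observed on two inputs.
activations = [[0.1, 2.0, 0.0], [0.2, -3.0, 0.1]]
weights = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]

targets = select_neurons(activations, k=1)
noised = noise_neurons(weights, targets, sigma=0.5)
```

In a real model the activations would be recorded from a chosen layer while running reference inputs, and the perturbed model would then be re-evaluated to check that normal task performance is preserved. Both steps are outside the scope of this sketch.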
Syllabus
LLMbotomy: Shutting the Trojan Backdoors
Taught by
Black Hat