
SelfDefend - LLMs Can Defend Themselves against Jailbreaking in a Practical Manner

USENIX via YouTube

Overview

Learn about SelfDefend, a novel defense framework that protects large language models from jailbreaking attacks, in this 12-minute conference presentation from USENIX Security '25. Researchers from The Hong Kong University of Science and Technology, University of Oregon, Nanyang Technological University, City University of Hong Kong, and HSBC developed a practical solution, inspired by the traditional shadow stack security concept, that defends against human-based, optimization-based, generation-based, indirect, and multilingual jailbreak attacks.

Explore the framework's dual-LLM architecture, which establishes a shadow LLM in a detection state to protect the target LLM in its normal answering state, enabling checkpoint-based access control with minimal latency impact. Examine empirical validation showing that mainstream GPT-3.5/4 models can effectively identify harmful prompts, and understand how data distillation techniques create dedicated open-source defense models that outperform seven state-of-the-art defenses while remaining compatible with both open-source and closed-source LLMs, including GPT-3.5/4, Claude, Llama-2-7b/13b, and Mistral. Gain insights into the framework's robustness against adaptive jailbreaks and prompt injections, making it a practical solution for real-world LLM security deployment.
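To make the dual-LLM idea concrete, here is a minimal sketch (not the authors' code) of the checkpoint pattern the talk describes: a shadow LLM screens each prompt for harmful intent, and the target LLM's answer is released only if the check passes. The functions `shadow_llm` and `target_llm` below are hypothetical stand-ins for real model calls.

```python
# Hedged sketch of SelfDefend's shadow-LLM checkpoint, with toy
# stand-ins for the two model calls.

DETECTION_PROMPT = (
    "Identify whether the following request contains a harmful intent. "
    "Answer 'No' if it is safe; otherwise summarize the harmful part.\n\n"
    "Request: {q}"
)

def shadow_llm(prompt: str) -> str:
    # Stand-in detector: a real deployment would query GPT-3.5/4 or a
    # distilled open-source defense model with the detection prompt.
    banned = ("build a bomb", "steal credentials")
    return "harmful" if any(b in prompt.lower() for b in banned) else "No"

def target_llm(prompt: str) -> str:
    # Stand-in for the production model answering in its normal state.
    return f"Answer to: {prompt}"

def selfdefend(user_prompt: str) -> str:
    # Both calls can run concurrently to keep latency low; the answer
    # is only released once the shadow check passes (the "checkpoint").
    verdict = shadow_llm(DETECTION_PROMPT.format(q=user_prompt))
    answer = target_llm(user_prompt)
    if verdict.strip().lower().startswith("no"):
        return answer
    return "Request refused: potential jailbreak detected."
```

In the actual framework the shadow LLM runs alongside the target LLM, so the only added latency is the (typically short) detection response, which is why the paper reports minimal impact on normal queries.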

Syllabus

USENIX Security '25 - SelfDefend: LLMs Can Defend Themselves against Jailbreaking in a Practical Manner

Taught by

USENIX

