Overview
Explore the inner workings of large language model (LLM) jailbreaks through an introspection framework that functions like an MRI for artificial neural networks. CyberArk vulnerability researchers Mark Cherp and Shaked Reiner demonstrate how to decode the internal neural states of LLMs to understand how adversarial attacks actually operate. They present techniques that have surfaced previously unknown jailbreak methods and offer insight into the mechanics behind these vulnerabilities. The introspection approach moves beyond traditional black-box analysis, giving visibility into the neural patterns that enable successful attacks against LLM systems. Learn methodologies for analyzing adversarial neural patterns and apply these insights to defend against sophisticated AI security threats.
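The talk's own tooling is not described here, but one common way to "look inside" an LLM is to train a linear probe on its internal activations and check whether a property (e.g., benign vs. jailbreak prompt) is linearly decodable from the hidden states. The sketch below is a hypothetical illustration using synthetic activation vectors in place of real per-layer LLM hidden states; the dimensions, class separation, and probe choice are all assumptions, not the speakers' method.

```python
# Hypothetical sketch: probe synthetic "hidden states" for a jailbreak
# signal. Real analysis would use actual LLM activations (e.g., residual-
# stream vectors captured per layer during a forward pass).
import numpy as np

rng = np.random.default_rng(0)
d = 64   # assumed hidden-state dimensionality
n = 400  # assumed number of prompts per class

# Simulate activations: "jailbreak" prompts are shifted along one
# direction in activation space relative to "benign" prompts.
direction = rng.normal(size=d)
direction /= np.linalg.norm(direction)
benign = rng.normal(size=(n, d))
jailbreak = rng.normal(size=(n, d)) + 2.0 * direction

X = np.vstack([benign, jailbreak])
y = np.array([0] * n + [1] * n)

# Fit a linear probe (least-squares classifier) on the activations.
Xb = np.hstack([X, np.ones((2 * n, 1))])  # append a bias column
w, *_ = np.linalg.lstsq(Xb, 2 * y - 1, rcond=None)
preds = (Xb @ w > 0).astype(int)
accuracy = (preds == y).mean()
print(f"probe accuracy: {accuracy:.2f}")
```

If such a probe classifies well above chance, the property is encoded in the model's internal state, which is the kind of signal that white-box introspection exploits and black-box testing cannot see.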
Syllabus
Beyond the Black Box: Revealing Adversarial Neural Patterns in LLMs
Taught by
RSA Conference