Bypassing AI Security Controls with Prompt Formatting

Learn how to exploit vulnerabilities in AI security systems through a conference talk that demonstrates the prompt formatting technique used to bypass AWS Bedrock Guardrails' Sensitive Information Filter functionality. Discover how cybersecurity expert Nathan Kirk from NR Labs successfully circumvented security controls designed to prevent AI systems from returning sensitive data like names and email addresses by instructing AI models to format responses using programmatic, SQL-like queries. Explore the parallels between this AI security bypass and traditional WAF evasion techniques, understand the implications for AI system security, and examine the mitigation strategies developed to help AWS customers protect against this vulnerability. Gain insights from over a decade of penetration testing experience focused on hardware and web applications, drawing from expertise developed at Mandiant's Offensive Services division and Hilton's security programs.