A Quick Stop at the HostileShop - LLM Agent Hacking and Prompt Injection Framework
media.ccc.de via YouTube
Free courses from frontend to fullstack and AI
AI Engineer - Learn how to integrate AI into software applications
Overview
Coursera Flash Sale
40% Off Coursera Plus for 3 Months!
Grab it
Explore a comprehensive conference talk on LLM security vulnerabilities through the lens of HostileShop, a Python-based framework designed to generate prompt injections and jailbreaks against large language model agents. Learn how this innovative tool uses LLMs to attack other LLMs in a simulated web shopping environment, where an attacker agent attempts to manipulate a target shopping agent into performing unauthorized actions. Discover the technical foundations of LLM agent hacking, including context window formats, agent vulnerability surfaces, and the prompting insights that enabled HostileShop's success in OpenAI's GPT-OSS-20B RedTeam Contest. Understand how the framework automatically determines attack success without requiring LLM judgment, reducing costs and enabling rapid continual learning. Examine HostileShop's capabilities in discovering prompt injections that induce improper tool calls and its ability to enhance and mutate universal jailbreaks for cross-LLM adaptation. Gain insights into the current state of LLM security and the ongoing challenges in developing privacy-preserving systems without relying on extensive surveillance measures.
Syllabus
39C3 - A Quick Stop at the HostileShop
Taught by
media.ccc.de