A Quick Stop at the HostileShop - LLM Agent Hacking and Prompt Injection Framework

Explore a comprehensive conference talk on LLM security vulnerabilities through the lens of HostileShop, a Python-based framework designed to generate prompt injections and jailbreaks against large language model agents. Learn how this innovative tool uses LLMs to attack other LLMs in a simulated web shopping environment, where an attacker agent attempts to manipulate a target shopping agent into performing unauthorized actions. Discover the technical foundations of LLM agent hacking, including context window formats, agent vulnerability surfaces, and the prompting insights that enabled HostileShop's success in OpenAI's GPT-OSS-20B RedTeam Contest. Understand how the framework automatically determines attack success without requiring LLM judgment, reducing costs and enabling rapid continual learning. Examine HostileShop's capabilities in discovering prompt injections that induce improper tool calls and its ability to enhance and mutate universal jailbreaks for cross-LLM adaptation. Gain insights into the current state of LLM security and the ongoing challenges in developing privacy-preserving systems without relying on extensive surveillance measures.