Class Central is learner-supported. When you buy through links on our site, we may earn an affiliate commission.

YouTube

A Quick Stop at the HostileShop - LLM Agent Hacking and Prompt Injection Framework

media.ccc.de via YouTube

Overview

Coursera Flash Sale
40% Off Coursera Plus for 3 Months!
Grab it
Explore a comprehensive conference talk on LLM security vulnerabilities through the lens of HostileShop, a Python-based framework designed to generate prompt injections and jailbreaks against large language model agents. Learn how this innovative tool uses LLMs to attack other LLMs in a simulated web shopping environment, where an attacker agent attempts to manipulate a target shopping agent into performing unauthorized actions. Discover the technical foundations of LLM agent hacking, including context window formats, agent vulnerability surfaces, and the prompting insights that enabled HostileShop's success in OpenAI's GPT-OSS-20B RedTeam Contest. Understand how the framework automatically determines attack success without requiring LLM judgment, reducing costs and enabling rapid continual learning. Examine HostileShop's capabilities in discovering prompt injections that induce improper tool calls and its ability to enhance and mutate universal jailbreaks for cross-LLM adaptation. Gain insights into the current state of LLM security and the ongoing challenges in developing privacy-preserving systems without relying on extensive surveillance measures.

Syllabus

39C3 - A Quick Stop at the HostileShop

Taught by

media.ccc.de

Reviews

Start your review of A Quick Stop at the HostileShop - LLM Agent Hacking and Prompt Injection Framework

Never Stop Learning.

Get personalized course recommendations, track subjects and courses with reminders, and more.

Someone learning on their laptop while sitting on the floor.