The Dark Side of AI: Jailbreaking, Injections, Hallucinations & more

Overview

Step over to the dark side and learn about the vulnerabilities, exploits, and unintended consequences that affect AI models such as LLMs, through hands-on prompting and exercises.

What jailbreaking a model involves and how to do it yourself
The vulnerabilities inherent to models, including prompt and data leakage
The risks of exposing LLMs to proprietary or sensitive data
The toxicity and bias built into different models
Real-world tests using ChatGPT, DeepSeek, and other models
An experiment in steering an LLM's neurons to prevent hallucinations

Syllabus

Introduction

Welcome to The Dark Side (Intro to Guardrails and Jailbreaking)
Exercise: Meet Your Classmates and Instructor
Course Resources

The Dark Side of AI

Jailbreak! (The DAN Prompt)
Exercise: Create Your Own Jailbreak
Many Shot Jailbreaking
Prompt Injections - Part 1
Prompt Injections - Part 2
Thinking Like LLMs - Multi-Modal Injection
Leaking - Part 1 (Prompt Leaking)
Leaking - Part 2 (Data Leaking)
Exposure
Poisoning
Toxicity
Hallucinations
Thinking Like LLMs - Big vs Small
Challenge: Conduct Your Own Mechanistic Interpretability Research on Hallucinations
Challenge Instructions
Leaderboard: Mechanistic Interpretability
The Model Card
Model Cards Deep Dive
Exercise: Explore the Model Card for GPT-o3-mini and Learn Something New!