Class Central is learner-supported. When you buy through links on our site, we may earn an affiliate commission.

Coursera

Secure AI: Red-Teaming & Safety Filters

Coursera via Coursera

Overview

Coursera Flash Sale
40% Off Coursera Plus for 3 Months!
Grab it
As large language models revolutionize business operations, sophisticated attackers exploit AI systems through prompt injection, jailbreaking, and content manipulation—vulnerabilities that traditional security tools cannot detect. This intensive course empowers AI developers, cybersecurity professionals, and IT managers to systematically identify and mitigate LLM-specific threats before deployment. Master red-teaming methodologies using industry-standard tools like PyRIT, NVIDIA Garak, and Promptfoo to uncover hidden vulnerabilities through adversarial testing. Learn to design and implement multi-layered content-safety filters that block sophisticated bypass attempts while maintaining system functionality. Through hands-on labs, you'll establish resilience baselines, implement continuous monitoring systems, and create adaptive defenses that strengthen over time. This course is designed for AI engineers, security professionals, data scientists, and developers interested in ensuring the safety and robustness of AI models. It’s also ideal for technology leaders seeking to implement secure, responsible AI frameworks within their organizations. Learners should have a basic understanding of machine learning, AI model architecture, and programming concepts. No prior experience with AI red-teaming or safety systems is required. By end of this course, you'll confidently conduct professional AI security assessments, deploy robust safety mechanisms, and protect LLM applications from evolving attack vectors in production environments.

Syllabus

  • Red-Teaming Scenarios for LLM Vulnerabilities
    • This module introduces participants to the systematic creation and execution of red-teaming scenarios targeting large language models. Students learn to identify common vulnerability categories including prompt injection, jailbreaking, and data extraction attacks. The module demonstrates how to design realistic adversarial scenarios that mirror real-world attack patterns, using structured methodologies to probe LLM weaknesses. Hands-on demonstrations show how red-teamers simulate malicious user behavior to uncover security gaps before deployment.
  • Content-Safety Filters: Implementation and Testing
    • This module covers the design, implementation, and evaluation of content-safety filters for LLM applications. Participants explore multi-layered defense strategies including input sanitization, output filtering, and behavioral monitoring systems. The module demonstrates how to configure safety mechanisms that balance security with functionality, and shows practical testing methods to validate filter effectiveness against sophisticated bypass attempts. Real-world examples illustrate the challenges of maintaining robust content filtering while preserving user experience.
  • Testing LLM Resilience and Improving AI Robustness
    • This module focuses on comprehensive resilience testing and systematic improvement of AI system robustness. Students learn to conduct thorough security assessments that measure LLM resistance to adversarial inputs, evaluate defense mechanism effectiveness, and identify areas for improvement. The module demonstrates how to establish baseline security metrics, implement iterative hardening processes, and validate improvements through continuous testing. Participants gain skills in developing robust AI systems that maintain integrity under real-world adversarial conditions.

Taught by

Brian Newman and Starweaver

Reviews

Start your review of Secure AI: Red-Teaming & Safety Filters

Never Stop Learning.

Get personalized course recommendations, track subjects and courses with reminders, and more.

Someone learning on their laptop while sitting on the floor.