Overview
This course teaches you to build adaptive AI agents. You'll learn to transform static, "frozen" LLMs into dynamic systems that can learn, reason, and act. We cover two key fine-tuning methods: Supervised Fine-Tuning (SFT) for reliable, structured outputs and Parameter-Efficient Fine-Tuning (PEFT) for building specialized models efficiently. You'll design agent "brains" using ReAct reasoning loops and learn to generate training data using a "Teacher-Student" workflow. Finally, you'll tackle advanced AI alignment, learning to prevent "specification gaming" and use Direct Preference Optimization (DPO) to teach agents complex human preferences.
Syllabus
- Introduction to Agentic Reinforcement Learning
- Learn foundational and advanced techniques to build, train, and align autonomous AI agents using agentic reinforcement learning and fine-tuning methods.
- AI Agents and Reinforcement Learning
- Explore how AI agents learn by trial and error, using reinforcement learning to develop adaptive strategies and solve complex tasks beyond manual programming.
- Supervised Fine Tuning for Agentic Reinforcement Learning
- Learn how supervised fine-tuning transforms general models into specialized agents for precise, structured outputs, minimizing creative variability for consistent task performance.
- Generating Supervised Fine Tuning Datasets
- Learn how to use agents and LLMs to automate the creation of high-quality, structured supervised fine-tuning datasets for Q&A and clinical trial eligibility tasks.
- Practical Fine Tuning with PEFT
- Learn how PEFT and adapter layers enable efficient fine-tuning for structured outputs while preserving base-model knowledge, with emphasis on LoRA, consistent training data, and scalability across tasks.
- Implementing Practical Fine Tuning with PEFT
- Learn to fine-tune language models efficiently using PEFT and LoRA adapters, applying agents for data labeling and creating specialized models for sentiment and clinical tasks.
- Agent Architecture Fundamentals
- Learn the fundamentals of agent architecture, focusing on the reasoning loop, ReAct frameworks, and how dynamic planning enables adaptive, transparent, and debuggable autonomous agents.
- Applying Agent Architecture
- Learn to design agent architectures by specifying objectives, tools, state/action spaces, and reasoning traces for reliable task execution; apply these by designing and testing agents.
- Generating Agentic Training Data
- Learn how agent trajectory data captures full step-by-step reasoning and actions, enabling deeper training across domains from simple to complex through comprehensive decision records.
- Agentic Training Data Generation
- Learn to generate and record agent trajectory data, capturing detailed decision-making steps, to train effective agentic models for complex, multi-step tasks.
- Theory of AI Alignment
- Explore the challenge of aligning AI with complex human values, highlighting risks of specification gaming and the gap between mathematical goals and true human intent.
- Implementing Alignment
- Learn to implement alignment by using evaluator agents to create preference pairs, scoring responses for quality and safety, and encoding principles for effective model training.
- Practical Alignment with Direct Preference Optimization
- Learn how Direct Preference Optimization (DPO) aligns models with human values by directly optimizing on preference pairs, simplifying training, and capturing nuanced human judgments.
- Implementing Practical Alignment with DPO
- Learn to align language models with human preferences using Direct Preference Optimization (DPO), focusing on concise answers and clinical safety through preference pairs and LoRA adapters.
- Course Review
- Review the journey from static LLMs to agentic reinforcement learning: building, training, and aligning agents with SFT/PEFT, ReAct, and DPO for safe, adaptive AI systems.
- Project: MeetMind AI Agent
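To make the syllabus topics concrete, the sketches below illustrate the core ideas in miniature. First, the "Generating Supervised Fine Tuning Datasets" lessons describe a "Teacher-Student" workflow in which a stronger teacher model's answers become SFT examples for the student. A minimal sketch, with a hypothetical `teacher` lookup standing in for a real LLM call:

```python
import json

# Hypothetical "Teacher-Student" dataset generation step: the teacher model's
# answer to a question becomes one supervised fine-tuning (SFT) record.
def teacher(question):
    # Stand-in for a call to a larger teacher LLM.
    answers = {"Is a 70-year-old eligible for a trial capped at age 65?": "No"}
    return answers[question]

def make_sft_example(question):
    # Pair the prompt with the teacher's completion.
    return {"prompt": question, "completion": teacher(question)}

example = make_sft_example("Is a 70-year-old eligible for a trial capped at age 65?")
line = json.dumps(example)  # one JSONL record for an SFT dataset
```

In practice the course applies this to Q&A and clinical-trial-eligibility tasks; the record format here is illustrative.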
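The PEFT lessons center on LoRA adapters: the frozen base weight W gets a trainable low-rank update (alpha/r)·B·A, so only a small fraction of parameters is trained. A toy numeric sketch with illustrative dimensions, not a real model:

```python
import numpy as np

# Toy LoRA adapter: W stays frozen; only the low-rank factors A and B train.
d_out, d_in, r = 64, 64, 4
rng = np.random.default_rng(0)

W = rng.normal(size=(d_out, d_in))      # frozen base weight
A = rng.normal(size=(r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                # trainable up-projection, zero-initialized
alpha = 8.0

def lora_forward(x):
    # Base output plus the scaled low-rank update.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
base = W @ x
adapted = lora_forward(x)       # identical to base until B is trained away from zero

full_params = W.size            # 4096 parameters in the full weight
lora_params = A.size + B.size   # 512 trainable parameters in the adapter
```

Zero-initializing B means the adapter starts as a no-op, preserving base-model behavior, which is why LoRA fine-tuning is stable and cheap relative to full fine-tuning.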
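The agent-architecture lessons build on the ReAct loop: the model alternates Thought and Action steps, tools return Observations, and the loop ends with a Final answer. A minimal sketch with a scripted stand-in for the LLM and a hypothetical calculator tool:

```python
# Minimal ReAct-style loop. `scripted_model` is a stand-in for an LLM; a real
# agent would prompt a model with the accumulated history at each step.
def calculator(expression: str) -> str:
    # Toy tool: evaluate simple arithmetic (eval is fine for this sketch only).
    return str(eval(expression, {"__builtins__": {}}, {}))

TOOLS = {"calculator": calculator}

def scripted_model(history):
    # Emits a Thought/Action pair, then a Final answer once it sees an Observation.
    if not any(step.startswith("Observation") for step in history):
        return "Thought: I need to compute 6 * 7.\nAction: calculator[6 * 7]"
    return "Final: The answer is 42."

def react_loop(question, model, tools, max_steps=5):
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        output = model(history)
        history.append(output)
        if output.startswith("Final:"):
            return output.removeprefix("Final:").strip(), history
        # Parse "Action: tool[args]", run the tool, record the Observation.
        action_line = [l for l in output.splitlines() if l.startswith("Action:")][0]
        name, args = action_line.removeprefix("Action:").strip().split("[", 1)
        history.append(f"Observation: {tools[name](args.rstrip(']'))}")
    return None, history

answer, trace = react_loop("What is 6 * 7?", scripted_model, TOOLS)
```

The `trace` list is exactly the kind of reasoning trace the lessons describe as making agents transparent and debuggable.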
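The trajectory-data lessons record every decision step, not just the final answer. One plausible shape for a single trajectory record (field names are illustrative, not the course's schema):

```python
import json

# Hypothetical agent trajectory record: the full sequence of thoughts,
# actions, and observations, plus the outcome, serialized for training.
trajectory = {
    "task": "Find the release year of Python 3.0.",
    "steps": [
        {
            "thought": "I should search for the release date.",
            "action": {"tool": "search", "input": "Python 3.0 release year"},
            "observation": "Python 3.0 was released in 2008.",
        },
    ],
    "final_answer": "2008",
    "success": True,
}
record = json.dumps(trajectory)  # one line of a trajectory dataset
```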
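The "Implementing Alignment" lesson uses evaluator agents to turn scored responses into preference pairs. A sketch with a deliberately crude stand-in scorer (a real pipeline would score with an evaluator model for quality and safety):

```python
# Build a preference pair from scored candidate responses: the best-scoring
# response becomes "chosen", the worst becomes "rejected".
def make_preference_pair(prompt, responses, score_fn):
    ranked = sorted(responses, key=score_fn, reverse=True)
    return {"prompt": prompt, "chosen": ranked[0], "rejected": ranked[-1]}

pair = make_preference_pair(
    "Summarize the trial eligibility criteria.",
    [
        "Patients aged 18-65 with condition X qualify.",
        "idk, probably anyone can join",
    ],
    score_fn=len,  # crude stand-in for an evaluator agent's quality score
)
```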
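Finally, the DPO lessons optimize directly on those preference pairs, with no separate reward model. A per-example version of the DPO loss on toy log-probabilities (values are made up for illustration):

```python
import math

# Per-example DPO loss: negative log-sigmoid of the scaled difference between
# the policy-vs-reference log-ratios of the chosen and rejected responses.
def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When the policy matches the reference, the margin is 0 and the loss is log 2.
neutral = dpo_loss(-10.0, -12.0, -10.0, -12.0)
# Raising the chosen response's log-probability relative to the reference
# (here from -10.0 to -8.0) lowers the loss.
improved = dpo_loss(-8.0, -12.0, -10.0, -12.0)
```

The `beta` parameter controls how far the policy may drift from the reference model while learning the preference, mirroring the KL constraint the lessons discuss.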
Taught by
Christopher Agostino