Overview
This course teaches you to build adaptive AI agents. You'll learn to transform static, "frozen" LLMs into dynamic systems that can learn, reason, and act. We cover two key fine-tuning methods: Supervised Fine-Tuning (SFT) for reliable, structured outputs and Parameter-Efficient Fine-Tuning (PEFT) for building specialized models efficiently. You'll design agent "brains" using ReAct reasoning loops and learn to generate training data using a "Teacher-Student" workflow. Finally, you'll tackle advanced AI alignment, learning to prevent "specification gaming" and use Direct Preference Optimization (DPO) to teach agents complex human preferences.
Syllabus
- Introduction to Agentic Reinforcement Learning
- Learn foundational and advanced techniques to build, train, and align autonomous AI agents using agentic reinforcement learning and fine-tuning methods.
- AI Agents and Reinforcement Learning
- Explore how AI agents learn by trial and error, using reinforcement learning to develop adaptive strategies and solve complex tasks beyond manual programming.
- Supervised Fine Tuning for Agentic Reinforcement Learning
- Learn how supervised fine-tuning transforms general models into specialized agents for precise, structured outputs, minimizing creative variability for consistent task performance.
- Generating Supervised Fine Tuning Datasets
- Learn how to use agents and LLMs to automate the creation of high-quality, structured supervised fine-tuning datasets for Q&A and clinical trial eligibility tasks.
- Practical Fine Tuning with PEFT
- Learn how PEFT and adapter layers enable efficient fine-tuning for structured outputs while preserving base-model knowledge, with emphasis on LoRA, consistent training data, and scalability across tasks.
- Implementing Practical Fine Tuning with PEFT
- Learn to fine-tune language models efficiently using PEFT and LoRA adapters, applying agents for data labeling and creating specialized models for sentiment and clinical tasks.
- Agent Architecture Fundamentals
- Learn the fundamentals of agent architecture, focusing on the reasoning loop, ReAct frameworks, and how dynamic planning enables adaptive, transparent, and debuggable autonomous agents.
- Applying Agent Architecture
- Learn to design agent architectures by specifying objectives, tools, state/action spaces, and reasoning traces for reliable task execution; apply these by designing and testing agents.
- Generating Agentic Training Data
- Learn how agent trajectory data captures full step-by-step reasoning and actions, enabling deeper training across domains from simple to complex through comprehensive decision records.
- Agentic Training Data Generation
- Learn to generate and record agent trajectory data, capturing detailed decision-making steps, to train effective agentic models for complex, multi-step tasks.
- Theory of AI Alignment
- Explore the challenge of aligning AI with complex human values, highlighting risks of specification gaming and the gap between mathematical goals and true human intent.
- Implementing Alignment
- Learn to implement alignment by using evaluator agents to create preference pairs, scoring responses for quality and safety, and encoding principles for effective model training.
- Practical Alignment with Direct Preference Optimization
- Learn how Direct Preference Optimization (DPO) aligns models with human values by directly optimizing on preference pairs, simplifying training, and capturing nuanced human judgments.
- Implementing Practical Alignment with DPO
- Learn to align language models with human preferences using Direct Preference Optimization (DPO), focusing on concise answers and clinical safety through preference pairs and LoRA adapters.
- Course Review
- Review the journey from static LLMs to agentic reinforcement learning: building, training, and aligning agents with SFT/PEFT, ReAct, and DPO for safe, adaptive AI systems.
- Project: MeetMind AI Agent
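To make the syllabus topics concrete, the sketches below illustrate the core ideas in miniature. First, the "Generating Supervised Fine Tuning Datasets" lessons describe a "Teacher-Student" workflow in which a stronger teacher model's answers become SFT examples for the student. A minimal sketch, with a hypothetical `teacher` lookup standing in for a real LLM call:

```python
import json

# Hypothetical "Teacher-Student" dataset generation step: the teacher model's
# answer to a question becomes one supervised fine-tuning (SFT) record.
def teacher(question):
    # Stand-in for a call to a larger teacher LLM.
    answers = {"Is a 70-year-old eligible for a trial capped at age 65?": "No"}
    return answers[question]

def make_sft_example(question):
    # Pair the prompt with the teacher's completion.
    return {"prompt": question, "completion": teacher(question)}

example = make_sft_example("Is a 70-year-old eligible for a trial capped at age 65?")
line = json.dumps(example)  # one JSONL record for an SFT dataset
```

In practice the course applies this to Q&A and clinical-trial-eligibility tasks; the record format here is illustrative.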
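The PEFT lessons center on LoRA adapters: the frozen base weight W gets a trainable low-rank update (alpha/r)·B·A, so only a small fraction of parameters is trained. A toy numeric sketch with illustrative dimensions, not a real model:

```python
import numpy as np

# Toy LoRA adapter: W stays frozen; only the low-rank factors A and B train.
d_out, d_in, r = 64, 64, 4
rng = np.random.default_rng(0)

W = rng.normal(size=(d_out, d_in))      # frozen base weight
A = rng.normal(size=(r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                # trainable up-projection, zero-initialized
alpha = 8.0

def lora_forward(x):
    # Base output plus the scaled low-rank update.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
base = W @ x
adapted = lora_forward(x)       # identical to base until B is trained away from zero

full_params = W.size            # 4096 parameters in the full weight
lora_params = A.size + B.size   # 512 trainable parameters in the adapter
```

Zero-initializing B means the adapter starts as a no-op, preserving base-model behavior, which is why LoRA fine-tuning is stable and cheap relative to full fine-tuning.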
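The agent-architecture lessons build on the ReAct loop: the model alternates Thought and Action steps, tools return Observations, and the loop ends with a Final answer. A minimal sketch with a scripted stand-in for the LLM and a hypothetical calculator tool:

```python
# Minimal ReAct-style loop. `scripted_model` is a stand-in for an LLM; a real
# agent would prompt a model with the accumulated history at each step.
def calculator(expression: str) -> str:
    # Toy tool: evaluate simple arithmetic (eval is fine for this sketch only).
    return str(eval(expression, {"__builtins__": {}}, {}))

TOOLS = {"calculator": calculator}

def scripted_model(history):
    # Emits a Thought/Action pair, then a Final answer once it sees an Observation.
    if not any(step.startswith("Observation") for step in history):
        return "Thought: I need to compute 6 * 7.\nAction: calculator[6 * 7]"
    return "Final: The answer is 42."

def react_loop(question, model, tools, max_steps=5):
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        output = model(history)
        history.append(output)
        if output.startswith("Final:"):
            return output.removeprefix("Final:").strip(), history
        # Parse "Action: tool[args]", run the tool, record the Observation.
        action_line = [l for l in output.splitlines() if l.startswith("Action:")][0]
        name, args = action_line.removeprefix("Action:").strip().split("[", 1)
        history.append(f"Observation: {tools[name](args.rstrip(']'))}")
    return None, history

answer, trace = react_loop("What is 6 * 7?", scripted_model, TOOLS)
```

The `trace` list is exactly the kind of reasoning trace the lessons describe as making agents transparent and debuggable.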
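The trajectory-data lessons record every decision step, not just the final answer. One plausible shape for a single trajectory record (field names are illustrative, not the course's schema):

```python
import json

# Hypothetical agent trajectory record: the full sequence of thoughts,
# actions, and observations, plus the outcome, serialized for training.
trajectory = {
    "task": "Find the release year of Python 3.0.",
    "steps": [
        {
            "thought": "I should search for the release date.",
            "action": {"tool": "search", "input": "Python 3.0 release year"},
            "observation": "Python 3.0 was released in 2008.",
        },
    ],
    "final_answer": "2008",
    "success": True,
}
record = json.dumps(trajectory)  # one line of a trajectory dataset
```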
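The "Implementing Alignment" lesson uses evaluator agents to turn scored responses into preference pairs. A sketch with a deliberately crude stand-in scorer (a real pipeline would score with an evaluator model for quality and safety):

```python
# Build a preference pair from scored candidate responses: the best-scoring
# response becomes "chosen", the worst becomes "rejected".
def make_preference_pair(prompt, responses, score_fn):
    ranked = sorted(responses, key=score_fn, reverse=True)
    return {"prompt": prompt, "chosen": ranked[0], "rejected": ranked[-1]}

pair = make_preference_pair(
    "Summarize the trial eligibility criteria.",
    [
        "Patients aged 18-65 with condition X qualify.",
        "idk, probably anyone can join",
    ],
    score_fn=len,  # crude stand-in for an evaluator agent's quality score
)
```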
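Finally, the DPO lessons optimize directly on those preference pairs, with no separate reward model. A per-example version of the DPO loss on toy log-probabilities (values are made up for illustration):

```python
import math

# Per-example DPO loss: negative log-sigmoid of the scaled difference between
# the policy-vs-reference log-ratios of the chosen and rejected responses.
def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When the policy matches the reference, the margin is 0 and the loss is log 2.
neutral = dpo_loss(-10.0, -12.0, -10.0, -12.0)
# Raising the chosen response's log-probability relative to the reference
# (here from -10.0 to -8.0) lowers the loss.
improved = dpo_loss(-8.0, -12.0, -10.0, -12.0)
```

The `beta` parameter controls how far the policy may drift from the reference model while learning the preference, mirroring the KL constraint the lessons discuss.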
Taught by
Christopher Agostino