Fine-Tuning AI Agents with Reinforcement Learning

via Udacity

Overview

This course teaches you to build adaptive AI agents. You'll learn to transform static, "frozen" LLMs into dynamic systems that can learn, reason, and act. The course covers two key fine-tuning methods: Supervised Fine-Tuning (SFT) for reliable, structured outputs and Parameter-Efficient Fine-Tuning (PEFT) for building specialized models efficiently. You'll design agent "brains" using ReAct reasoning loops and generate training data with a "Teacher-Student" workflow. Finally, you'll tackle advanced AI alignment, learning to prevent "specification gaming" and to use Direct Preference Optimization (DPO) to teach agents complex human preferences.
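The ReAct reasoning loop mentioned above alternates between reasoning steps ("thoughts") and tool calls ("actions"), feeding each tool's observation back into the next step until the agent emits an answer. A minimal sketch in plain Python (the `calculator` tool and the scripted thought/action sequence are hypothetical illustrations, not the course's code — a real agent would obtain each step from an LLM):

```python
# Minimal ReAct-style reasoning loop (illustrative sketch).
# The agent alternates Thought -> Action -> Observation until it finishes.

def calculator(expression: str) -> str:
    """Hypothetical tool: evaluate a simple arithmetic expression."""
    return str(eval(expression, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def react_loop(steps, max_steps=5):
    """Run a scripted list of (thought, action, argument) steps.
    A real agent would generate each step with an LLM; here they are
    hard-coded so the control flow is easy to follow."""
    trace = []
    for thought, action, arg in steps[:max_steps]:
        trace.append(f"Thought: {thought}")
        if action == "finish":
            trace.append(f"Answer: {arg}")
            return arg, trace
        observation = TOOLS[action](arg)
        trace.append(f"Action: {action}({arg}) -> Observation: {observation}")
        # In a real loop, the observation is appended to the next LLM prompt.
    return None, trace

answer, trace = react_loop([
    ("I need to compute 17 * 23 first.", "calculator", "17 * 23"),
    ("The tool returned the product; I can answer now.", "finish", "391"),
])
print(answer)  # 391
```

The recorded `trace` is exactly the kind of step-by-step trajectory data the later syllabus modules use for agentic training.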

Syllabus

  • Introduction to Agentic Reinforcement Learning
    • Learn foundational and advanced techniques to build, train, and align autonomous AI agents using agentic reinforcement learning and fine-tuning methods.
  • AI Agents and Reinforcement Learning
    • Explore how AI agents learn by trial and error, using reinforcement learning to develop adaptive strategies and solve complex tasks beyond manual programming.
  • Supervised Fine-Tuning for Agentic Reinforcement Learning
    • Learn how supervised fine-tuning transforms general models into specialized agents for precise, structured outputs, minimizing creative variability for consistent task performance.
  • Generating Supervised Fine-Tuning Datasets
    • Learn how to use agents and LLMs to automate the creation of high-quality, structured supervised fine-tuning datasets for Q&A and clinical trial eligibility tasks.
  • Practical Fine-Tuning with PEFT
    • Learn how PEFT and adapter layers enable efficient fine-tuning for structured outputs while preserving base-model knowledge, with emphasis on LoRA, consistency in training data, and scalability across tasks.
  • Implementing Practical Fine-Tuning with PEFT
    • Learn to fine-tune language models efficiently using PEFT and LoRA adapters, applying agents for data labeling and creating specialized models for sentiment and clinical tasks.
  • Agent Architecture Fundamentals
    • Learn the fundamentals of agent architecture, focusing on the reasoning loop, ReAct frameworks, and how dynamic planning enables adaptive, transparent, and debuggable autonomous agents.
  • Applying Agent Architecture
    • Learn to design agent architectures by specifying objectives, tools, state/action spaces, and reasoning traces for reliable task execution; apply these by designing and testing agents.
  • Generating Agentic Training Data
    • Learn how agent trajectory data captures full step-by-step reasoning and actions, enabling deeper training for agents across simple to complex domains through comprehensive decision records.
  • Agentic Training Data Generation
    • Learn to generate and record agent trajectory data, capturing detailed decision-making steps, to train effective agentic models for complex, multi-step tasks.
  • Theory of AI Alignment
    • Explore the challenge of aligning AI with complex human values, highlighting risks of specification gaming and the gap between mathematical goals and true human intent.
  • Implementing Alignment
    • Learn to implement alignment by using evaluator agents to create preference pairs, scoring responses for quality and safety, and encoding principles for effective model training.
  • Practical Alignment with Direct Preference Optimization
    • Learn how Direct Preference Optimization (DPO) aligns models with human values by directly optimizing on preference pairs, simplifying training, and capturing nuanced human judgments.
  • Implementing Practical Alignment with DPO
    • Learn to align language models with human preferences using Direct Preference Optimization (DPO), focusing on concise answers and clinical safety through preference pairs and LoRA adapters.
  • Course Review
    • Review the journey from static LLMs to agentic reinforcement learning: building, guiding, and aligning agents with SFT/PEFT, ReAct, and DPO to create safe, adaptive AI systems.
  • Project: MeetMind AI Agent
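
The DPO objective covered in the alignment modules operates directly on preference pairs: it increases the policy's log-probability margin for the chosen response over the rejected one, relative to a frozen reference model, with no separate reward model. A minimal single-pair sketch (the log-probability values and `beta` setting are illustrative, not from the course):

```python
import math

# Illustrative DPO loss on one preference pair (a sketch, not the course's
# implementation). Inputs are summed token log-probabilities of the chosen
# (preferred) and rejected responses, under the policy being trained and
# under a frozen reference model.

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """-log sigmoid(beta * (policy_logratio - reference_logratio))."""
    policy_logratio = policy_chosen_logp - policy_rejected_logp
    ref_logratio = ref_chosen_logp - ref_rejected_logp
    logits = beta * (policy_logratio - ref_logratio)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# When the policy already prefers the chosen response by a wider margin
# than the reference does, the loss is small; when it prefers the
# rejected response, the loss is large.
loss_good = dpo_loss(-10.0, -20.0, -12.0, -18.0)  # policy margin > reference margin
loss_bad  = dpo_loss(-20.0, -10.0, -12.0, -18.0)  # policy prefers rejected
print(loss_good < loss_bad)  # True
```

At equal margins the loss sits at log(2), so gradient descent on this quantity pushes the policy away from the reference only in the direction the preference pairs indicate, which is why DPO needs no explicit reward-model stage.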

Taught by

Christopher Agostino

