How to Fine-tune LLMs with RLVR - OpenAI's RFT API

Shaw Talebi via YouTube

Overview

Learn to fine-tune large language models using Reinforcement Learning with Verifiable Rewards (RLVR) through OpenAI's Reinforcement Fine-Tuning (RFT) API in this 26-minute tutorial. Explore the fundamentals of reinforcement learning with LLMs and understand how RLVR differs from traditional supervised fine-tuning. Follow a hands-on example that fine-tunes o4-mini for HDFS log anomaly detection, covering the essential steps: data preparation with a train-validation split, data formatting, grader creation, fine-tuning, and evaluation. Discover practical applications of RLVR for improving model performance on specific classification tasks, along with the limitations and considerations of the approach. Access the complete GitHub repository and dataset to implement the techniques yourself, and gain insight into advanced reinforcement learning techniques for language model optimization as part of a broader series on RL with LLMs.
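The data-preparation steps described above (train-validation split, formatting records for the RFT API) can be sketched in Python. This is a minimal illustration and not the video's actual code: the record fields (`messages`, `label`), the prompt wording, and the 80/20 split ratio are assumptions, and the exact JSONL schema OpenAI's RFT API expects depends on the grader you configure.

```python
import json
import random

def make_record(log_block, label):
    """Build one RFT-style training record: a prompt plus a reference
    label the grader can check the model's answer against.
    Field names are illustrative, not the official schema."""
    return {
        "messages": [
            {"role": "user",
             "content": f"Classify this HDFS log block as 'normal' or 'anomaly':\n{log_block}"}
        ],
        "label": label,  # reference value the grader compares against
    }

def train_val_split(examples, val_frac=0.2, seed=42):
    """Shuffle and split examples into train and validation sets."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_frac)
    return shuffled[n_val:], shuffled[:n_val]

def write_jsonl(records, path):
    """Write one JSON object per line, the usual fine-tuning file format."""
    with open(path, "w") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")
```

A fixed seed keeps the split reproducible, so the validation set stays constant across runs when you compare fine-tuned and baseline models.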

Syllabus

Introduction
RL with LLMs
RLVR
SFT vs RLVR
Example: HDFS Classification with RLVR
Step 0: Imports
Step 1: Train-Validation Split
Step 2: Format Data
Step 3: Create Grader
Step 4: Fine-tune Model
Step 5: Evaluate Model
Limitations
What's Next?
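The grader and evaluation steps above hinge on a verifiable reward: a deterministic check of the model's answer against a reference label. With the RFT API this check runs server-side; the sketch below mirrors that logic locally so the idea is concrete. The case-insensitive normalization and the plain accuracy metric are assumptions, not necessarily the video's exact grader configuration.

```python
def string_check_grade(model_output, reference):
    """Return 1.0 if the model's answer matches the reference label
    (case-insensitive, whitespace-trimmed), else 0.0 -- the kind of
    binary, verifiable reward signal RLVR optimizes."""
    return 1.0 if model_output.strip().lower() == reference.strip().lower() else 0.0

def accuracy(predictions, references):
    """Mean grader score over a validation set: a simple evaluation metric
    for comparing the fine-tuned model against a baseline."""
    scores = [string_check_grade(p, r) for p, r in zip(predictions, references)]
    return sum(scores) / len(scores) if scores else 0.0
```

Because the reward is a programmatic check rather than a learned preference model, it cannot be gamed by stylistic tricks, which is the core distinction between RLVR and RLHF-style fine-tuning.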

Taught by

Shaw Talebi
