Overview
This conference talk compares the effectiveness of Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) for training reasoning capabilities in large vision-language models. In this 33-minute presentation from Discover AI, explore the solutions the research uncovers and their implications for AI reasoning, and learn why findings on SFT+RL versus RL-only approaches have such a short validity window in the rapidly evolving AI landscape. The talk presents research from a collaborative team of scholars from the University of California, Santa Cruz, the University of Texas at Dallas, Pennsylvania State University, and Amazon Research, who authored "SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models."
Syllabus
TEXAS: Fine-Tuning Is for Cowards - Do RL
Taught by
Discover AI