Chain-of-Thought Reasoning in Large Language Models - Exploring SFT and RL Relevance
Discover AI via YouTube
Overview
Explore a technical analysis of the Chain-of-Thought (CoT) reasoning mechanisms of OpenAI's o3 model during inference time in this 27-minute research presentation. Dive deep into the relationship between Supervised Fine-Tuning (SFT), Reinforcement Learning (RL), and test-time reasoning capabilities. Learn how language models are taught to explicitly reason over safety specifications through CoT, breaking complex problems into intermediate steps rather than relying solely on pattern recognition. Examine the Alignment Fine-Tuning (AFT) paradigm, which addresses assessment misalignment through a three-step process: CoT training, response generation, and score calibration. Understand the implications of explicit reasoning versus implicit pattern learning in language models, drawing on research by Hugging Face and OpenAI on scaling test-time compute and deliberative alignment strategies.
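The three-step AFT process described above can be sketched in miniature. This is a hypothetical illustration only: the function names, the toy "model", and the reward rule are assumptions for demonstration, not the actual implementation from the presentation or the underlying paper.

```python
# Hypothetical sketch of the three-step AFT pipeline: (1) CoT training
# data assembly, (2) candidate response generation, (3) score calibration.
# All names and the toy model are illustrative assumptions.

def build_cot_dataset(examples):
    """Step 1: assemble (question, chain-of-thought, answer) triples
    as supervised fine-tuning data."""
    return [{"question": q, "cot": c, "answer": a} for q, c, a in examples]

def generate_candidates(model, question, n=3):
    """Step 2: sample several candidate responses for each question."""
    return [model(question, seed) for seed in range(n)]

def calibrate(candidates, reward):
    """Step 3: rank candidates by a reward signal so that better
    responses receive strictly higher calibrated scores."""
    ranked = sorted(candidates, key=reward, reverse=True)
    return {resp: len(ranked) - i for i, resp in enumerate(ranked)}

if __name__ == "__main__":
    # Toy stand-in for a language model: deterministic per-seed strings.
    toy_model = lambda q, seed: f"{q}-answer-v{seed}"
    cands = generate_candidates(toy_model, "2+2", n=3)
    scores = calibrate(cands, reward=lambda r: int(r[-1]))
    print(scores)
```

The calibration step is where AFT departs from plain SFT: rather than treating every sampled response as equally correct, the model's scores are adjusted so that its own ranking of candidates agrees with the reward signal.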
Syllabus
o3 Inference Time CoT Reasoning: How relevant is SFT and RL?
Taught by
Discover AI