Overview
Watch a detailed video explanation of the DeepSeek R1 research paper, focusing on how reinforcement learning can improve reasoning in Large Language Models (LLMs). Learn how DeepSeek R1 Zero is trained with reinforcement learning alone, without Supervised Fine-Tuning (SFT) as a preliminary step, an approach the paper presents as the first open research to show that reasoning can be incentivized purely through RL. The video covers how LLMs are usually trained, the R1 Zero training methodology including Group Relative Policy Optimization (GRPO), reward modeling, and training performance. It then walks through the self-evolution process and the reported results. Additional resources include the official DeepSeek platform, the API documentation, and related research papers.
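For readers who want a concrete picture of the "group relative" part of GRPO before watching, here is a minimal sketch. It assumes the commonly described formulation in which several responses are sampled for the same prompt and each response's reward is normalized by the group's mean and standard deviation. The function name `group_relative_advantages`, the `eps` term, and the use of the population standard deviation are illustrative assumptions, not details taken from the video or from DeepSeek's code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group of sampled responses.

    rewards: one scalar reward per response sampled for the same prompt.
    eps: small constant (an assumption here) that avoids division by zero
         when every response in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; the choice is an assumption
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one prompt:
# 1.0 for a correct final answer, 0.0 otherwise.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Because the baseline is the group's own mean, this formulation needs no separate value (critic) model: responses that beat their group's average get a positive advantage and the rest get a negative one.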
Syllabus
0:00 - Intro
2:38 - Training LLMs
5:05 - DeepSeek R1 Zero Training
5:54 - Group Relative Policy Optimization
8:45 - Reward Modelling
10:21 - Training Performance
11:33 - Self-evolution
17:20 - Results
Taught by
AI Bites