Self-Principled Critique Tuning with DeepSeek-GRM-27B

Discover AI via YouTube

Details

Provider: YouTube
Pricing: Free Video
Languages: English
Duration & workload: 19 minutes
Sessions: On-Demand
Level: Intermediate

Found in: DeepSeek Courses

This 19-minute explanatory video introduces DeepSeek's training method "Self-Principled Critique Tuning" (SPCT) and the generative reward model trained with it, DeepSeek-GRM-27B. It explains how the approach works, in which the reward model learns to generate its own principles and critiques when judging responses, and why it might form the foundation for a future DeepSeek R2. The video covers the paper "Inference-Time Scaling for Generalist Reward Modeling" by researchers from DeepSeek-AI and Tsinghua University, and discusses what it suggests about the future of AI reasoning models and reward systems.