
Self-Principled Critique Tuning with DeepSeek-GRM-27B

Discover AI via YouTube

Overview

Learn about DeepSeek's training method "Self-Principled Critique Tuning" (SPCT) and its new generalist reward model, DeepSeek-GRM-27B, in this 19-minute explanatory video. Discover how the approach works and why it might form the foundation for DeepSeek R2. The video covers the paper "Inference-Time Scaling for Generalist Reward Modeling" by researchers from DeepSeek-AI and Tsinghua University, and offers insights into the future of AI reasoning models and reward systems.

Syllabus

NEW by DeepSeek: SPCT w/ DeepSeek-GRM-27B

Taught by

Discover AI

