Classroom Contents
Understanding R1-Zero-Like Training with Dr. GRPO Algorithm
- 1 - intro:
- 2 - start of the interview:
- 3 - background of Zichen:
- 4 - LLM post-training:
- 5 - summary of R1-Zero-like training:
- 6 - V3 base model aha moment:
- 7 - is self-reflection real?:
- 8 - what would happen if we penalized self-reflection keywords:
- 9 - fusing of keyword/LLM-based detection:
- 10 - can you trust the LLM-as-a-judge:
- 11 - what's up with Qwen:
- 12 - Dr. GRPO overview:
- 13 - why is that term there at all?:
- 14 - GRPO Nature paper removed the bias term?:
- 15 - how compatible is Dr. GRPO with GSPO?:
- 16 - is there a drawback of Dr. GRPO?:
- 17 - are there other terms we can remove?:
- 18 - balance in algorithm engineering:
- 19 - next research for the lab:
- 20 - conclusion: