Understanding R1-Zero-Like Training with Dr. GRPO Algorithm

Yacine Mahdid via YouTube

YouTube videos curated by Class Central.

Classroom Contents

  1. 1 - intro:
  2. 2 - start of the interview:
  3. 3 - background of Zichen:
  4. 4 - LLM post-training:
  5. 5 - summarization of R1-Zero-like training:
  6. 6 - v3 base model aha moment:
  7. 7 - is self-reflection real?:
  8. 8 - what would happen if we penalize self-reflection keywords?:
  9. 9 - fusing keyword- and LLM-based detection:
  10. 10 - can you trust the LLM-as-a-judge?:
  11. 11 - what's up with Qwen?:
  12. 12 - Dr. GRPO overview:
  13. 13 - why is that term there at all?:
  14. 14 - did the GRPO Nature paper remove the bias term?:
  15. 15 - how compatible is Dr. GRPO with GSPO?:
  16. 16 - are there drawbacks to Dr. GRPO?:
  17. 17 - are there other terms we can remove?:
  18. 18 - balance in algorithm engineering:
  19. 19 - next research for the lab:
  20. 20 - conclusion:
