Code Chain of Thought with Self-Evolution LLM: rStar-Math Mathematical Reasoning Framework
Discover AI via YouTube
Overview
Watch a 34-minute technical video exploring Microsoft Research Asia's rStar-Math framework for developing mathematical reasoning capabilities in Small Language Models (SLMs). Learn how, in Round 1, DeepSeek 236B generates high-quality, step-by-step reasoning trajectories for mathematical tasks, which are then used to fine-tune a 7B Qwen SLM. Discover the self-evolution framework, which employs Monte Carlo Tree Search (MCTS) and a Process Preference Model (PPM) to iteratively improve the smaller policy model's performance. Examine an open research question: could a pure 7B policy model achieve high-performance mathematical reasoning through self-evolution techniques such as MCTS and code-augmented verification? Based on research by Xinyu Guan, Li Lyna Zhang, Yifei Liu, Ning Shang, Youran Sun, Yi Zhu, Fan Yang, and Mao Yang from Microsoft Research Asia.
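To make the idea of code-augmented verification more concrete, here is a minimal Python sketch. It assumes that each candidate reasoning step pairs a natural-language rationale with a short Python snippet, and that a step is kept only if its snippet executes without error after the code of the earlier steps. The names `Step`, `code_runs`, and `filter_candidates` are hypothetical and chosen for illustration; this is not the rStar-Math implementation, and it leaves out MCTS and the PPM entirely.

```python
from dataclasses import dataclass


@dataclass
class Step:
    rationale: str  # natural-language reasoning for this step
    code: str       # Python snippet that carries out the step


def code_runs(prefix_code: str, step: Step) -> bool:
    """Return True if the step's code executes after the earlier steps' code.

    Illustration only: exec() on model-generated code is unsafe outside
    a sandbox, so a real system would isolate this call.
    """
    namespace: dict = {}
    try:
        exec(prefix_code + "\n" + step.code, namespace)
        return True
    except Exception:
        return False


def filter_candidates(prefix_code: str, candidates: list[Step]) -> list[Step]:
    """Keep only the candidate steps whose code executes successfully."""
    return [s for s in candidates if code_runs(prefix_code, s)]


# Code accumulated from the steps already accepted on this trajectory.
prefix = "total = 3 * 4"

candidates = [
    Step("Add 5 to the product.", "answer = total + 5"),
    Step("Divide by an undefined quantity.", "answer = total / missing"),
]

kept = filter_candidates(prefix, candidates)
for step in kept:
    print(step.rationale)
```

In this sketch the second candidate is discarded because its code raises a `NameError`. Execution success only screens out steps that cannot run; it does not by itself show that a surviving step is mathematically correct.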
Syllabus
Code CoT w/ Self-Evolution LLM: rStar-Math Explained
Taught by
Discover AI