Code Chain of Thought with Self-Evolution LLM: rStar-Math Mathematical Reasoning Framework
Discover AI via YouTube
Overview
Watch a 34-minute technical video exploring Microsoft Research Asia's rStar-Math framework for developing mathematical reasoning capabilities in Small Language Models (SLMs). Learn how DeepSeek 236B generates high-quality, step-by-step reasoning trajectories for mathematical tasks in Round 1, which are then used to fine-tune a 7B Qwen SLM. Discover the self-evolution framework that employs Monte Carlo Tree Search (MCTS) and Process Preference Model (PPM) to iteratively enhance the smaller policy model's performance. Examine an open research question about whether a pure 7B policy model could achieve high-performance mathematical reasoning through self-evolution techniques like MCTS and code-augmented verification. Based on research by Xinyu Guan, Li Lyna Zhang, Yifei Liu, Ning Shang, Youran Sun, Yi Zhu, Fan Yang, and Mao Yang from Microsoft Research Asia.
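The loop described above — a policy model proposing candidate reasoning steps, a Process Preference Model (PPM) scoring partial trajectories, and Monte Carlo Tree Search selecting which branch to expand — can be sketched in miniature. This is an illustrative toy, not the rStar-Math implementation: `policy_propose` and `ppm_score` are hypothetical stand-ins for the fine-tuned SLM and trained PPM, and steps are just labels rather than code-augmented reasoning.

```python
import math
import random

def policy_propose(steps, n=3):
    """Stand-in policy SLM: propose n candidate next steps (labels only)."""
    return [f"step{len(steps)}_{i}" for i in range(n)]

def ppm_score(steps):
    """Stand-in PPM: pseudo-random score for a partial trajectory."""
    rng = random.Random(len(steps) * 31 + sum(map(len, steps)))
    return rng.random()

class Node:
    def __init__(self, steps, parent=None):
        self.steps = steps        # partial reasoning trajectory
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0

    def ucb(self, c=1.4):
        # Unvisited children are explored first; otherwise standard UCB1.
        if self.visits == 0:
            return float("inf")
        return (self.value / self.visits
                + c * math.sqrt(math.log(self.parent.visits) / self.visits))

def mcts_best_trajectory(depth=3, iters=50):
    root = Node([])
    for _ in range(iters):
        # Selection: descend by UCB until reaching a leaf.
        node = root
        while node.children:
            node = max(node.children, key=Node.ucb)
        # Expansion: grow the tree with policy proposals below the depth cap.
        if len(node.steps) < depth:
            for step in policy_propose(node.steps):
                node.children.append(Node(node.steps + [step], parent=node))
            node = random.choice(node.children)
        # Evaluation: the PPM stand-in scores the partial trajectory.
        reward = ppm_score(node.steps)
        # Backpropagation: propagate the score to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Extract the most-visited trajectory.
    node = root
    while node.children:
        node = max(node.children, key=lambda n: n.visits)
    return node.steps
```

In the actual framework, trajectories that survive this search (with code execution verifying each step) become training data for the next round of policy and PPM fine-tuning, which is what makes the loop self-evolving.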
Syllabus
Code CoT w/ Self-Evolution LLM: rStar-Math Explained
Taught by
Discover AI