Code Chain of Thought with Self-Evolution LLM: rStar-Math Mathematical Reasoning Framework
Discover AI via YouTube
Overview
Watch a 34-minute technical video exploring Microsoft Research Asia's rStar-Math framework for developing mathematical reasoning capabilities in Small Language Models (SLMs). Learn how, in Round 1, DeepSeek 236B generates high-quality, step-by-step reasoning trajectories for mathematical tasks, which are then used to fine-tune a 7B Qwen SLM. Discover the self-evolution framework that combines Monte Carlo Tree Search (MCTS) with a Process Preference Model (PPM) to iteratively improve the smaller policy model's performance. Examine an open research question: could a pure 7B policy model reach high-performance mathematical reasoning through self-evolution techniques such as MCTS and code-augmented verification alone? Based on research by Xinyu Guan, Li Lyna Zhang, Yifei Liu, Ning Shang, Youran Sun, Yi Zhu, Fan Yang, and Mao Yang from Microsoft Research Asia.
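To make the core idea concrete, here is a minimal, self-contained sketch of MCTS with code-augmented verification on a toy arithmetic problem. It is not rStar-Math's implementation: the fixed `CANDIDATE_STEPS` list stands in for a policy SLM proposing reasoning steps, and successful execution of the generated Python (plus matching a known answer) stands in for the verifier that filters bad trajectories.

```python
import math
import random

# Toy problem: compute 3*4 + 5. The "policy" is a fixed pool of candidate
# steps (illustrative stand-in for an SLM's proposals); a step sequence is
# verified by executing it as Python, echoing code-augmented CoT.
CANDIDATE_STEPS = [
    "a = 3 * 4",        # correct first step
    "a = 3 + 4",        # wrong first step
    "answer = a + 5",   # correct second step
    "answer = a - 5",   # wrong second step
]
TARGET = 17
MAX_DEPTH = 2

def execute(steps):
    """Run the partial solution; return its namespace, or None on error."""
    env = {}
    try:
        exec("\n".join(steps), {}, env)
        return env
    except Exception:
        return None

class Node:
    def __init__(self, steps, parent=None):
        self.steps = steps          # reasoning trajectory so far
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0

def reward(steps):
    """1.0 only if the code runs and produces the target answer."""
    env = execute(steps)
    if env is None:
        return 0.0                  # crashing code is pruned by 0 reward
    return 1.0 if env.get("answer") == TARGET else 0.0

def ucb(child, parent, c=1.4):
    if child.visits == 0:
        return float("inf")
    return (child.value / child.visits
            + c * math.sqrt(math.log(parent.visits) / child.visits))

def mcts(iterations=200):
    root = Node([])
    for _ in range(iterations):
        node = root
        # Selection: descend while the node is fully expanded.
        while node.children and len(node.children) == len(CANDIDATE_STEPS):
            node = max(node.children, key=lambda ch: ucb(ch, node))
        # Expansion: add one untried candidate step, up to MAX_DEPTH.
        tried = {ch.steps[-1] for ch in node.children}
        untried = [s for s in CANDIDATE_STEPS if s not in tried]
        if untried and len(node.steps) < MAX_DEPTH:
            node = Node(node.steps + [random.choice(untried)], parent=node)
            node.parent.children.append(node)
        # Simulation + backpropagation of the execution-verified reward.
        r = reward(node.steps)
        while node is not None:
            node.visits += 1
            node.value += r
            node = node.parent
    # Extract the most-visited trajectory as the final solution.
    best, node = [], root
    while node.children:
        node = max(node.children, key=lambda ch: ch.visits)
        best.append(node.steps[-1])
    return best

if __name__ == "__main__":
    random.seed(0)
    trajectory = mcts(300)
    print(trajectory)
```

In rStar-Math, the per-step quality scores accumulated by MCTS (here, visit counts and values) are what train the PPM, which in later self-evolution rounds replaces exact-answer checking as the reward signal.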
Syllabus
Code CoT w/ Self-Evolution LLM: rStar-Math Explained
Taught by
Discover AI