YouTube

Code Chain of Thought with Self-Evolution LLM: rStar-Math Mathematical Reasoning Framework

Discover AI via YouTube

Overview

Watch a 34-minute technical video exploring Microsoft Research Asia's rStar-Math framework for developing mathematical reasoning capabilities in Small Language Models (SLMs). Learn how, in Round 1, DeepSeek 236B generates high-quality, step-by-step reasoning trajectories for mathematical tasks, which are then used to fine-tune a 7B Qwen SLM. Discover the self-evolution framework, which employs Monte Carlo Tree Search (MCTS) and a Process Preference Model (PPM) to iteratively improve the smaller policy model's performance. Examine an open research question: whether a pure 7B policy model could achieve high-performance mathematical reasoning through self-evolution techniques such as MCTS and code-augmented verification. Based on research by Xinyu Guan, Li Lyna Zhang, Yifei Liu, Ning Shang, Youran Sun, Yi Zhu, Fan Yang, and Mao Yang from Microsoft Research Asia.
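
To give a feel for what code-augmented verification of a step-by-step trajectory could look like, here is a minimal Python sketch. It is not the rStar-Math implementation: the function names `step_executes` and `filter_trajectory` and the toy steps are hypothetical, and it assumes each reasoning step carries a short Python snippet that must run without error to be kept. A real system would run model-generated code in a sandbox rather than calling `exec` directly.

```python
def step_executes(code_so_far: str, step_code: str) -> bool:
    """Return True if the accumulated code plus the new step runs without error."""
    namespace = {}
    try:
        # Unsafe for untrusted code; shown only to illustrate the idea.
        exec(code_so_far + "\n" + step_code, namespace)
    except Exception:
        return False
    return True


def filter_trajectory(steps):
    """Keep the longest prefix of steps whose code executes successfully."""
    kept = []
    code_so_far = ""
    for step in steps:
        if not step_executes(code_so_far, step):
            break  # discard this step and everything that depends on it
        kept.append(step)
        code_so_far += "\n" + step
    return kept


# Hypothetical candidate steps for a toy problem; the third one fails.
candidate_steps = [
    "x = 3 + 4",
    "y = x * 2",
    "z = y / undefined_name",
    "answer = z + 1",
]
valid = filter_trajectory(candidate_steps)
print(valid)
```

In this sketch, execution only filters out steps that crash; it says nothing about whether a step is mathematically useful, which is the kind of judgment the description attributes to MCTS and the PPM.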

Syllabus

Code CoT w/ Self-Evolution LLM: rStar-Math Explained

Taught by

Discover AI
