Evaluating Language Models for Mathematics Through Interactive Problem-Solving
Harvard CMSA via YouTube
Overview
Watch a Harvard CMSA seminar presentation where Katherine Collins and Albert Jiang from the University of Cambridge discuss their research on evaluating large language models (LLMs) for mathematical problem-solving through interactive assessment. Explore the development of CheckMate, a prototype platform designed to facilitate human-LLM interactions and evaluation in mathematical contexts. Learn about their comparative study of InstructGPT, ChatGPT, and GPT-4 as mathematical proof assistants, involving participants ranging from undergraduate students to mathematics professors. Discover key insights from their MathConverse dataset, including a taxonomy of human behaviors and the relationship between correctness and perceived helpfulness in LLM responses. Gain valuable perspectives on the practical applications and limitations of LLMs in mathematical reasoning, with particular attention to GPT-4's capabilities as analyzed through expert mathematician case studies. Understand important considerations for both machine learning practitioners and mathematicians, including the benefits of models that effectively communicate uncertainty, respond to corrections, and maintain interpretability and conciseness.
Syllabus
Katherine Collins & Albert Jiang | Evaluating Language Models for Mathematics through Interactions
Taught by
Harvard CMSA