

Quantifying Generalization Complexity for Large Language Models

Massachusetts Institute of Technology via YouTube

Overview

This 15-minute conference talk by Hongyin Luo from MIT CSAIL introduces SCYLLA, an innovative evaluation framework designed to distinguish between generalization and memorization in Large Language Models (LLMs). Explore how this dynamic evaluation approach quantifies LLMs' generalization capabilities across different complexity levels, revealing significant insights about performance gaps between in-distribution and out-of-distribution data. Learn about the fascinating "generalization valley" phenomenon—a non-monotonic relationship between task complexity and performance that indicates when LLMs begin to rely too heavily on non-generalizable behavior. Discover how critical complexity thresholds shift as model size increases, suggesting larger models can handle more complex reasoning tasks before defaulting to memorization. The presentation covers benchmarking results across 28 popular LLMs, including both open-source models like LLaMA and Qwen, and closed models such as Claude and GPT, providing valuable insights for researchers and practitioners interested in robust evaluation methods for language models.
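The core idea described above — comparing in-distribution and out-of-distribution accuracy at each complexity level to find where memorization takes over — can be sketched in a few lines. This is a hypothetical illustration of the gap analysis, not the authors' code; the function names and the accuracy numbers are invented for the example.

```python
# Hypothetical sketch of a SCYLLA-style gap analysis (not the authors' code).
# Given per-complexity accuracies on in-distribution (ID) and
# out-of-distribution (OOD) task instances, compute the ID-OOD gap and
# locate the complexity level where the gap peaks -- the bottom of the
# "generalization valley", where reliance on memorized (non-generalizable)
# behavior is strongest.

def generalization_gap(id_acc, ood_acc):
    """Per-complexity-level gap between ID and OOD accuracy."""
    return [i - o for i, o in zip(id_acc, ood_acc)]

def critical_complexity(id_acc, ood_acc):
    """Index of the complexity level with the largest ID-OOD gap."""
    gaps = generalization_gap(id_acc, ood_acc)
    return max(range(len(gaps)), key=gaps.__getitem__)

# Illustrative numbers only: accuracy at five increasing complexity levels.
id_acc  = [0.98, 0.95, 0.90, 0.80, 0.60]
ood_acc = [0.97, 0.92, 0.75, 0.70, 0.55]

print(generalization_gap(id_acc, ood_acc))  # gap at each complexity level
print(critical_complexity(id_acc, ood_acc))  # index of the peak gap
```

Under this framing, the talk's observation that the critical complexity shifts rightward with model size corresponds to the peak-gap index moving to higher complexity levels for larger models.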

Syllabus

Hongyin Luo - Quantifying Generalization Complexity for Large Language Models

Taught by

MIT Embodied Intelligence

