AI's Potemkin Understanding - The Illusion of Comprehension in Large Language Models
Discover AI via YouTube
Overview
Explore a critical AI research concept in this 22-minute video examining "Potemkin Understanding" in large language models. Delve into research from MIT, Harvard, and the University of Chicago revealing how AI systems can appear to understand concepts while lacking genuine comprehension. Learn about the phenomenon in which language models can state rules perfectly and apply them in familiar contexts, yet fail catastrophically in novel situations because their underlying understanding is incoherent. Discover the implications for AI safety and alignment strategies, particularly the risk of "Potemkin Alignment," where models seem to follow safety principles but may violate them unpredictably in new contexts. Examine findings from Marina Mancoridis (MIT), Bec Weeks (University of Chicago), Keyon Vafa (Harvard), and Sendhil Mullainathan (MIT) that challenge assumptions about AI comprehension and highlight critical vulnerabilities in current alignment approaches.
Syllabus
Harvard, MIT: AI's Potemkin Understanding
Taught by
Discover AI