Overview
Explore groundbreaking research from Princeton, Harvard, and Google that challenges the visual reasoning capabilities of Vision-Language Models (VLMs) in this 28-minute video. Examine two pivotal studies that question whether current VLMs truly possess visual reasoning abilities or merely rely on semantic complexity transfer. Delve into "VLMs have Tunnel Vision: Evaluating Nonlocal Visual Reasoning in Leading VLMs" by Shmuel Berman and Jia Deng from Princeton University, which reveals limitations in how VLMs process visual information beyond localized areas. Analyze the Harvard University research "Does visualization help AI understand data?" by Victoria R. Li, Johnathan L. Sun, and Martin Wattenberg, investigating whether visual representations genuinely enhance AI data comprehension. Gain insights into the current state of visual AI reasoning models and understand the implications of these findings for the development of more sophisticated vision-language systems.
Syllabus
VLMs are almost blind? (Princeton, Harvard, Google)
Taught by
Discover AI