YouTube

VLMs Are Almost Blind - Visual Reasoning in Vision-Language Models

Discover AI via YouTube

Overview

Explore groundbreaking research from Princeton, Harvard, and Google that challenges the visual reasoning capabilities of Vision-Language Models (VLMs) in this 28-minute video. Examine two pivotal studies that question whether current VLMs truly possess visual reasoning abilities or merely rely on semantic complexity transfer. Delve into "VLMs have Tunnel Vision: Evaluating Nonlocal Visual Reasoning in Leading VLMs" by Shmuel Berman and Jia Deng from Princeton University, which reveals limitations in how VLMs process visual information beyond localized areas. Analyze the Harvard University research "Does visualization help AI understand data?" by Victoria R. Li, Johnathan L. Sun, and Martin Wattenberg, investigating whether visual representations genuinely enhance AI data comprehension. Gain insights into the current state of visual AI reasoning models and understand the implications of these findings for the development of more sophisticated vision-language systems.

Syllabus

VLMs are almost blind? (Princeton, Harvard, Google)

Taught by

Discover AI
