Scale Can't Overcome Pragmatics - The Impact of Reporting Bias on Vision-Language Reasoning

Attend this 59-minute research seminar exploring how reporting bias in training data fundamentally limits the reasoning capabilities of Vision-Language Models (VLMs), despite their massive scale. Examine the theoretical foundations from pragmatics that explain why people naturally omit tacit information when describing visual content, leaving reasoning skills underrepresented in web-scale and synthetically generated training corpora. Discover how this communication pattern affects VLM performance across model and data scales, and learn about potential solutions based on more intentional training data curation rather than on scale alone for emergent reasoning capabilities. Gain insights from PhD candidate Amita Kamath's research at UCLA and the University of Washington, conducted in collaboration with the Allen Institute for AI, as she presents findings that challenge conventional approaches to developing reasoning abilities in vision-language systems.