LLMs Ignoring New Context - SIN-Bench for Tracing Native Evidence Chains in Long-Context Multimodal Scientific Literature

Explore a research presentation examining why large language models struggle to incorporate new contextual information, based on collaborative work by researchers from Tsinghua University, Shanghai AI Laboratory, 2077AI, KuaiShou Inc., Stanford University, and Harvard University. Delve into the SIN-Bench framework, which evaluates LLM performance by tracing native evidence chains through long-context, multimodal scientific literature in which text and visual elements are interleaved. Understand the methodology for measuring whether LLMs can process and reason with newly introduced contextual information in extended scientific documents, and discover the implications for AI reasoning capabilities in academic and research settings.