Overview
Explore current research in visual intelligence through an in-depth analysis of RSVP (Reasoning Segmentation via Visual Prompting and Multi-modal Chain-of-Thought), a method that challenges conventional approaches to visual reasoning. Examine whether visual reasoning requires large language models or whether reasoning can be embedded directly within vision models. See how this vision-language system is reported to outperform existing AI systems on complex reasoning tasks. Learn the technical methodology behind RSVP's visual prompting and multi-modal chain-of-thought processing, including how it segments and processes visual information to support reasoning. Gain insight into the research conducted by teams from Opus AI Research, University of Toronto, Southeast University, Brown University, and City University of Hong Kong, and consider what the work implies for future visual AI systems and their use in complex problem-solving.
Syllabus
VISUAL Intelligence - Latest Research
Taught by
Discover AI