Overview
Learn about cutting-edge developments in robotics foundation models through this 49-minute conference talk, which explores how multimodal large language models can be adapted for robotic reasoning and control. Discover three research projects that bridge the gap between the semantic world knowledge in AI models and practical robot applications.

Explore AHA, a vision-language model designed to analyze and learn from robotic manipulation failures in order to improve system robustness. Examine SAM2Act, a 3D generalist robotic model whose memory-centric architecture enables high-precision manipulation while retaining reasoning capabilities over historical observations. Investigate MolmoAct, AI2's flagship robotic foundation model, engineered for spatial reasoning and designed as a versatile system for a range of downstream manipulation tasks.

Understand the current challenges in scaling foundation models for robotics, including data scarcity and the complexity of generalizing to real-world scenarios. Gain insights into how recent advances in generative AI, particularly in language and visual understanding, are being leveraged to give robots open-world visual comprehension and reasoning abilities, especially for deployment in household environments.
Syllabus
Jiafei Duan - Towards Robotics Foundation Models that can Reason
Taught by
Montreal Robotics