Learning for Physical Interaction - From Pixels to Machines that See, Reason and Act - Day 4 Afternoon
Center for Language & Speech Processing (CLSP), JHU via YouTube
Overview
Learn about the intersection of artificial intelligence and physical interaction in this comprehensive lecture from JSALT 2025. Explore how large-scale neural networks have revolutionized natural language processing and computer vision, yet face unique challenges when applied to physical environments, where data collection is constrained by real-world limitations. Discover recent advances in visually guided robotic manipulation, where Internet-scale datasets and transformer architectures must adapt to the slower pace of physical robot data collection. Examine cutting-edge research addressing the fundamental problem of enabling machines to see, reason, and act in physical spaces, moving beyond pixels to practical robotic applications. Gain insights into self-supervised learning techniques, supercomputing infrastructure, and neural architectures designed specifically for physical interaction tasks. Follow explanations of key computer vision and robotics concepts, presented for students with machine learning and speech/language backgrounds but limited expertise in these specialized fields. Benefit from the expertise of Josef Sivic, a distinguished researcher at the Czech Institute of Informatics, Robotics and Cybernetics, who leads the Intelligent Machine Perception team and the ELLIS Unit Prague, bringing over a decade of experience from Inria Paris and recognition including ERC grants and test-of-time awards at major computer vision conferences.
Syllabus
[camera] Day 4 afternoon - JSALT 2025 - Šivic: Learning for physical interaction
Taught by
Center for Language & Speech Processing (CLSP), JHU