What Is a Humanoid Foundation Model? An Introduction to GR00T N1

Explore NVIDIA's GR00T N1, the first open Vision-Language-Action (VLA) foundation model designed specifically for humanoid robots, in this beginner-friendly conference talk recorded at the AI Engineer World's Fair. Learn how foundation models are evolving beyond text and image generation to control physical movement through GR00T N1's dual-system architecture, which pairs a System 2 module for high-level reasoning with a System 1 module for real-time motor control. Discover what distinguishes robot foundation models from traditional LLMs and vision models. Understand how GR00T N1's cognition-inspired design enables end-to-end training on diverse data, including human videos, robot trajectories, and synthetic simulation data, and see how integrating language, vision, and action unlocks generalist robot behaviors. Watch demonstrations of a full-sized humanoid robot performing complex bimanual manipulation tasks in real-world environments, and gain insight into how large-scale AI is moving from digital to physical applications. Senior Solutions Architect Annika Brundyn and Solutions Architect Aastha Jhunjhunwala from NVIDIA share their perspectives on deploying generative AI systems in robotics and make the technology accessible without requiring advanced robotics knowledge.