Llama-Nemotron - Efficient Open Reasoning Models
MLOps World: Machine Learning in Production via YouTube
Overview
Learn about Llama-Nemotron, an open-source family of reasoning models that delivers state-of-the-art reasoning capabilities with industry-leading inference efficiency in this 30-minute conference talk. Discover how these models, available in three sizes—Nano (8B), Super (49B), and Ultra (253B)—surpass existing open reasoning models such as DeepSeek-R1 while offering substantial improvements in inference throughput and memory efficiency.

Explore the specialized training methodology behind these models, including a two-stage post-training pipeline that combines supervised fine-tuning (SFT) on carefully curated synthetic datasets to distill advanced reasoning behaviors with large-scale reinforcement learning (RL) driven by curriculum-based self-learning, enabling the models to exceed their teacher's performance. Examine key innovations such as neural architecture search (NAS) for improved model efficiency, targeted inference-time optimizations, and a dynamic toggle for switching reasoning on or off, with emphasis on their practical importance in real-world enterprise deployments.

Gain insights from NVIDIA Research Scientist Soumye Singhal, who specializes in LLM post-training and alignment for Nemotron models and has contributed to the development of both the Llama-Nemotron reasoning models and Nemotron-Hybrid models.
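The reasoning toggle mentioned above is controlled at inference time through the system prompt. A minimal sketch of how a chat request might be assembled, assuming the "detailed thinking on/off" system-prompt convention described in the published Llama-Nemotron model cards (the `build_messages` helper is hypothetical; verify the exact prompt string against the model card for your checkpoint):

```python
def build_messages(user_prompt: str, reasoning: bool) -> list[dict]:
    """Build a chat request that toggles Llama-Nemotron's reasoning mode.

    Assumption: the models follow a system prompt of "detailed thinking on"
    or "detailed thinking off", per the published model cards.
    """
    system = "detailed thinking on" if reasoning else "detailed thinking off"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_prompt},
    ]


# With reasoning on, the model emits its chain of thought before answering;
# with it off, it responds directly, reducing latency and token cost.
messages = build_messages("Prove that sqrt(2) is irrational.", reasoning=True)
```

Keeping the toggle in the system prompt, rather than in a separate API flag, lets the same checkpoint serve both fast direct answers and slower deliberate reasoning without redeployment.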
Syllabus
Llama-Nemotron: Efficient Open Reasoning Models
Taught by
MLOps World: Machine Learning in Production