Learn about NVIDIA's breakthrough MASTERS distillation protocol, which tackles the "capacity gap" problem that arises when distilling a large 72B vision-language model into a much smaller 3B one. Discover why traditional direct distillation fails at this scale: the size difference is so extreme that the student cannot resolve the teacher's high-dimensional manifold, so it learns a noisy, blurred approximation instead of precise logic. Explore NVIDIA's solution, which dynamically prunes the teacher model to match the student's capacity and then progressively restores the teacher's complexity as the student learns. Understand the reverse curriculum strategy and the dual-reward offline reinforcement learning loop (GRPO) that enable compact local models to outperform massive 72B systems. Examine the mathematical foundations of this approach and its implications for efficient AI model deployment, based on the research paper "Masking Teacher and Reinforcing Student for Distilling Vision-Language Models" by NVIDIA researchers.