Overview
Learn how to dramatically reduce large language model training time through advanced quantization techniques in this conference talk from AI in Production 2025. Discover ZeRO++, a groundbreaking approach that quantizes both weights and gradients during training to achieve a 4x reduction in communication volume, resulting in over 50% faster end-to-end training times. Explore the technical implementation details of this Microsoft DeepSpeed innovation that addresses communication bottlenecks in large-scale LLM training. Gain insights from Guanhua Wang, Senior Researcher on the DeepSpeed team at Microsoft, who led the ZeRO++ project and contributed to Microsoft Phi-3 model training, as he shares practical strategies for optimizing distributed training workflows and reducing communication overhead in production environments.
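The core trade the talk describes — lower-precision gradients in exchange for a smaller communication payload — can be illustrated with a minimal block-wise quantization sketch. This is an assumption-laden toy in NumPy, not the DeepSpeed/ZeRO++ implementation (which fuses quantization into custom CUDA kernels and hierarchical collectives); the function names and the block size of 256 are illustrative choices.

```python
import numpy as np

def quantize_blockwise_int4(grad_fp16, block_size=256):
    """Symmetric block-wise quantization of a gradient tensor to 4-bit range.

    Each block keeps its own FP16 scale, so a large outlier in one block
    does not destroy precision everywhere else in the tensor.
    """
    flat = grad_fp16.astype(np.float32).ravel()
    pad = (-len(flat)) % block_size
    blocks = np.pad(flat, (0, pad)).reshape(-1, block_size)
    # Map each block's max magnitude onto the INT4 range [-7, 7].
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 7.0
    scales[scales == 0] = 1.0  # avoid division by zero for all-zero blocks
    q = np.clip(np.round(blocks / scales), -7, 7).astype(np.int8)
    return q, scales.astype(np.float16), grad_fp16.size

def dequantize(q, scales, n):
    """Recover an approximate FP16 gradient from quantized blocks."""
    return (q.astype(np.float32) * scales).ravel()[:n].astype(np.float16)

# Payload accounting: raw FP16 sends 2 bytes per element, while packed
# INT4 sends half a byte per element plus one FP16 scale per block --
# roughly a 4x smaller message before any collective is launched.
grad = np.random.randn(1_000_000).astype(np.float16)
q, scales, n = quantize_blockwise_int4(grad)
fp16_bytes = n * 2
int4_bytes = q.size // 2 + scales.size * 2
```

In a distributed run this quantize/communicate/dequantize pattern would wrap the gradient collective; per-block scales keep the rounding error bounded by half a quantization step within each block.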
Syllabus
Quantized LLM Training at Scale with ZeRO++ // Guanhua Wang // AI in Production 2025
Taught by
MLOps.community