Overview
Learn about neural network quantization techniques in this 33-minute conference talk from MLOps World. Explore how low-precision quantization reduces computational costs by constraining neural networks to narrower data formats during storage and computation. Discover the fundamental concepts underlying state-of-the-art quantization methods, and get an introduction to Brevitas, AMD's PyTorch quantization library. The speaker, an AI Research Staff member at AMD who has published in top-tier conferences such as ICCV and ICML, shares insights into active research areas in quantization and explains how quantization addresses the rising cost of querying ever-larger neural network models as training datasets and architectures continue to scale.
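The core operation behind the techniques described above is mapping floating-point values onto a narrow integer range. As a minimal sketch (not taken from the talk or from Brevitas, which provides this via quantized PyTorch layers), here is uniform affine quantization in plain Python; the `scale` and `zero_point` values are illustrative assumptions:

```python
def quantize(x, scale, zero_point, num_bits=8):
    """Uniform affine quantization: map a float to an integer in [qmin, qmax]."""
    qmin, qmax = 0, 2 ** num_bits - 1
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))  # clamp to the representable range

def dequantize(q, scale, zero_point):
    """Approximate reconstruction of the original float value."""
    return scale * (q - zero_point)

# Example: quantize values in roughly [-1, 1] to unsigned 8-bit integers.
scale = 2.0 / 255   # step size covering the target range (assumed here)
zero_point = 128    # integer that represents 0.0 (assumed here)

x = 0.5
q = quantize(x, scale, zero_point)
x_hat = dequantize(q, scale, zero_point)
# x_hat is close to x; the gap is the quantization error, bounded by scale/2.
```

Storing and computing with the 8-bit integers instead of 32-bit floats is what yields the memory and compute savings, at the cost of a bounded rounding error per value.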
Syllabus
Quantizing Neural Networks
Taught by
MLOps World: Machine Learning in Production