Overview
This weekly AI seminar from the "Random Samples" series explores the evolving landscape of Large Language Model (LLM) compression techniques, bridging research innovations with practical implementation. Dive into the challenges of managing massive LLMs and learn about cutting-edge compression methods including quantization and sparsity, with clear explanations of their accuracy-performance tradeoffs. Understand the significant differences between academic benchmarks and real-world applications, discover which compression techniques are production-ready versus those still in research phases, and explore strategies for optimizing LLM deployment across various computing environments. The presentation includes comprehensive session slides and is designed for AI developers, data scientists, and researchers looking to implement more efficient generative AI systems. Part of a weekly series that keeps participants at the forefront of AI advancements.
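To make the quantization idea concrete, here is a minimal sketch (not taken from the seminar) of symmetric int8 weight quantization, the simplest form of the compression the talk covers. All names and values are illustrative:

```python
# Illustrative sketch: symmetric per-tensor int8 quantization.
# Float weights are mapped to 8-bit integers via a single scale factor,
# shrinking storage 4x (1 byte vs 4 for float32) at the cost of rounding error.

def quantize_int8(weights):
    """Map float weights to integers in [-127, 127] with one scale factor."""
    scale = max(abs(w) for w in weights) / 127.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float weights from the int8 representation."""
    return [q * scale for q in quantized]

weights = [0.81, -1.27, 0.05, 2.54, -0.4]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The per-weight error is bounded by scale / 2 -- the accuracy-performance
# tradeoff the seminar discusses, in its smallest possible form.
print(max(abs(w, ) if False else abs(w - r) for w, r in zip(weights, restored)))
```

Sparsity-based methods instead zero out low-magnitude weights; both approaches trade a small, measurable accuracy loss for lower memory and faster inference.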
Syllabus
Random Samples: The State of LLM Compression — From Research to Production
Taught by
Neural Magic