Overview
Learn advanced techniques for optimizing large language models (LLMs) through weight and key-value (KV) cache quantization in this technical lecture presented by Tianyi Zhang. Explore methods for making LLMs both faster and cheaper to run while maintaining performance, with detailed insights into quantization techniques that reduce memory requirements and computational overhead. Dive into practical approaches for implementing these optimizations, understanding their impact on model efficiency, and balancing speed against resource usage in LLM deployments.
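To illustrate the general idea behind weight quantization (this is a minimal symmetric int8 sketch for intuition, not the specific method covered in the lecture), each weight tensor can be stored as 8-bit integers plus a single floating-point scale, cutting memory roughly 4x versus float32:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization: map floats to [-127, 127]
    using one shared scale, so storage drops from 32 bits to 8 bits per weight."""
    scale = float(np.max(np.abs(w))) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float32 tensor from the int8 codes."""
    return q.astype(np.float32) * scale

# Hypothetical example weights, just to show the round-trip error bound.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
q, s = quantize_int8(w)
max_err = float(np.max(np.abs(dequantize(q, s) - w)))
```

Because values are rounded to the nearest quantization level, the per-weight reconstruction error is bounded by half the scale; the same trade-off (fewer bits, bounded error, less memory traffic) is what makes quantized weights and KV caches faster and cheaper to serve.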
Syllabus
Guest Lecture by Tianyi Zhang: Faster & Cheaper LLMs with Weight and Key-value Cache Quantization
Taught by
UofU Data Science