
YouTube

Scaling of Quantized Large Language Models for Efficient Inference

MLOps World: Machine Learning in Production via YouTube

Overview

Explore the intersection of network quantization and large language models in this 33-minute conference talk, which revisits decade-old quantization theory from a fresh perspective in the LLM era. Scaling laws reliably predict the model-quality returns on training compute, yet considerable uncertainty remains about how quality scales after post-training quantization for inference deployment. The talk examines the additional factors that govern LLM scaling once models are quantized and asks whether empirical scaling laws can shed light on the effectiveness of LLM quantization. It offers theoretical insight into the challenges and opportunities of network compression, drawing on recent research findings, and covers practical considerations for deploying quantized LLMs efficiently in production, with implications for AI accelerators and algorithm-hardware codesign.
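To make the topic concrete, here is a minimal sketch of symmetric per-tensor post-training quantization, the kind of technique whose scaling behavior the talk analyzes. This is an illustrative example using NumPy, not the speaker's method: the tensor shape and distribution are hypothetical, and production schemes typically use per-channel scales and calibration data.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor post-training quantization to int8.

    Returns the quantized integer tensor and its scale factor.
    """
    scale = np.abs(w).max() / 127.0  # map the largest magnitude to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float tensor from int8 values."""
    return q.astype(np.float32) * scale

# Toy weight matrix standing in for an LLM layer (hypothetical size/init).
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=(256, 256)).astype(np.float32)

q, s = quantize_int8(w)
w_hat = dequantize(q, s)

# Quantization error: rounding contributes at most scale/2 per element.
mse = float(np.mean((w - w_hat) ** 2))
max_err = float(np.abs(w - w_hat).max())
```

Measuring how this reconstruction error translates into end-task quality loss as model size grows is exactly the kind of question an empirical scaling law for quantization tries to answer.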

Syllabus

Scaling of Quantized Large Language Models for Efficient Inference

Taught by

MLOps World: Machine Learning in Production

