Class Central Classrooms beta
YouTube videos curated by Class Central.
Classroom Contents
Hacking the Inference Pareto Frontier
- 1 00:00 Introduction to Breaking the Inference Pareto Frontier
- 2 00:33 Introduction of Kyle Kranen and NVIDIA Dynamo
- 3 01:31 The Three Pillars of Deployment: Quality, Latency, Cost
- 4 02:11 Understanding the Pareto Frontier
- 5 03:06 Application-Specific Prioritization of Quality, Latency, and Cost
- 6 04:32 Common Techniques to Manipulate the Pareto Frontier: Quantization, RAG, Reasoning
- 7 05:19 Compounding Techniques
- 8 06:04 Three Drivers for Modifying the Pareto Frontier: Scale, Structure, Dynamism
- 9 06:20 Scale: Disaggregation
- 10 11:02 Scale: Routing
- 11 13:00 Structure: Inference Time Scaling
- 12 16:14 Structure: KV Manipulation
- 13 17:43 Dynamism: Worker Specialization
- 14 18:42 Dynamism: Dynamic Load Balancing
- 15 19:55 Conclusion and NVIDIA Dynamo Resources