Hacking the Inference Pareto Frontier

AI Engineer via YouTube

Classroom Contents

  1. 00:00 Introduction to Breaking the Inference Pareto Frontier
  2. 00:33 Introduction of Kyle Kranen and NVIDIA Dynamo
  3. 01:31 The Three Pillars of Deployment: Quality, Latency, Cost
  4. 02:11 Understanding the Pareto Frontier
  5. 03:06 Application-Specific Prioritization of Quality, Latency, and Cost
  6. 04:32 Common Techniques to Manipulate the Pareto Frontier: Quantization, RAG, Reasoning
  7. 05:19 Compounding Techniques
  8. 06:04 Three Drivers for Modifying the Pareto Frontier: Scale, Structure, Dynamism
  9. 06:20 Scale: Disaggregation
  10. 11:02 Scale: Routing
  11. 13:00 Structure: Inference-Time Scaling
  12. 16:14 Structure: KV Manipulation
  13. 17:43 Dynamism: Worker Specialization
  14. 18:42 Dynamism: Dynamic Load Balancing
  15. 19:55 Conclusion and NVIDIA Dynamo Resources
