
AI Agent Inference Performance Optimizations - vLLM vs SGLang vs TensorRT

Generative AI on AWS via YouTube

Overview

Explore cutting-edge AI inference optimization techniques in this webinar, which features two expert presentations on high-performance LLM deployment and agentic AI systems. The first presentation surveys the evolving landscape of LLM engines that manage weight caches, batch scheduling, and hardware-accelerated operations, with detailed performance comparisons of vLLM, SGLang, and TensorRT. It also shows how Modal's LLM Engine Advisor streamlines deployment by providing throughput and latency benchmarks across different configurations, complete with ready-to-use code snippets for serverless cloud infrastructure.

The second presentation examines why high-performance LLM inference is critical to the mass adoption of AI agents, demonstrating how highly tuned inference stacks such as vLLM and NVIDIA Dynamo can capture the full capabilities of GPU hardware. It closes with the software-hardware co-design principles that address the scaling challenges of ultra-scale inference environments required by modern AI agents, drawing on recent breakthroughs in AI systems performance engineering and GPU optimization.
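
To make the benchmarking discussion concrete, here is a minimal sketch of an offline throughput measurement using vLLM's batch API. The model id, prompt set, and batch size are illustrative placeholders (not from the webinar); a tool like Modal's LLM Engine Advisor automates this kind of sweep across engines and configurations.

```python
# Minimal throughput sketch using vLLM's offline batch API.
# Assumption: the model id and batch size below are placeholders;
# swap in the model and workload you actually want to benchmark.
import time

from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # placeholder model id
params = SamplingParams(temperature=0.0, max_tokens=128)

# A uniform batch; real benchmarks vary prompt lengths and arrival rates.
prompts = ["Summarize the benefits of batched LLM inference."] * 64

start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

generated = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{generated / elapsed:.1f} output tokens/s over {len(prompts)} requests")
```

Running comparable measurements against SGLang or a TensorRT-LLM engine, across batch sizes and sequence lengths, produces the kind of throughput/latency trade-off curves the presentations compare.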

Syllabus

AI Agent Inference Performance Optimizations + vLLM vs. SGLang vs. TensorRT w/ Charles Frye (Modal)

Taught by

Generative AI on AWS

