
AI Agent Inference Performance Optimizations - vLLM vs SGLang vs TensorRT

Generative AI on AWS via YouTube

Overview

Explore cutting-edge AI inference optimization techniques in this webinar, which features two expert presentations on high-performance LLM deployment and agentic AI systems. The first presentation surveys the evolving landscape of LLM engines that manage weight caches, batch scheduling, and hardware-accelerated operations, with detailed performance comparisons of vLLM, SGLang, and TensorRT. It also shows how Modal's LLM Engine Advisor streamlines deployment by providing throughput and latency benchmarks across different configurations, complete with ready-to-use code snippets for serverless cloud infrastructure.

The second presentation examines why high-performance LLM inference is critical to the mass adoption of AI agents, demonstrating how highly tuned inference stacks such as vLLM and NVIDIA Dynamo can capture the full capabilities of GPU hardware. It closes with the software-hardware co-design principles that address the scaling challenges of ultra-scale inference environments required by modern AI agents, drawing on recent breakthroughs in AI systems performance engineering and GPU optimization.
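
To make the benchmarking discussion concrete, here is a minimal sketch of an offline throughput measurement using vLLM's batch API. The model id, prompt set, and batch size are illustrative placeholders (not from the webinar); a tool like Modal's LLM Engine Advisor automates this kind of sweep across engines and configurations.

```python
# Minimal throughput sketch using vLLM's offline batch API.
# Assumption: the model id and batch size below are placeholders;
# swap in the model and workload you actually want to benchmark.
import time

from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # placeholder model id
params = SamplingParams(temperature=0.0, max_tokens=128)

# A uniform batch; real benchmarks vary prompt lengths and arrival rates.
prompts = ["Summarize the benefits of batched LLM inference."] * 64

start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

generated = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{generated / elapsed:.1f} output tokens/s over {len(prompts)} requests")
```

Running comparable measurements against SGLang or a TensorRT-LLM engine, across batch sizes and sequence lengths, produces the kind of throughput/latency trade-off curves the presentations compare.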

Syllabus

AI Agent Inference Performance Optimizations + vLLM vs. SGLang vs. TensorRT w/ Charles Frye (Modal)

Taught by

Generative AI on AWS

