
Introducing LLM Instance Gateways for Efficient Inference Serving

CNCF [Cloud Native Computing Foundation] via YouTube

Overview

Lightning talk that introduces LLM Instance Gateways for efficient inference serving in cloud native environments. Learn about the unique challenges of serving Large Language Models (LLMs) in production compared to traditional HTTP/gRPC traffic. Discover why LLM Instance Gateways are crucial for efficiently managing multiple LLM use cases with varying demands on shared infrastructure. Understand the core complexities of LLM inference serving, including resource allocation, traffic management, and performance optimization. Explore how these gateways work to route requests, manage resources, and ensure fairness among different LLM applications. Presented by Abdel Sghiouar from Google Cloud and Daneyon Hansen from solo.io at a CNCF event, this 16-minute talk provides essential insights for organizations looking to optimize their LLM deployment strategies.
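For illustration only (the talk itself does not share an implementation), the following is a minimal Go sketch of the routing and fairness ideas described above: an HTTP gateway that forwards each inference request to the least-loaded LLM instance and caps in-flight requests per use case. The backend URLs and the X-Use-Case header are hypothetical.

// Minimal sketch, not the speakers' implementation: an HTTP gateway
// that routes inference requests to the least-loaded LLM backend and
// caps in-flight requests per use case for basic fairness.
package main

import (
	"fmt"
	"net/http"
	"net/http/httputil"
	"net/url"
	"sync"
	"sync/atomic"
)

type backend struct {
	proxy    *httputil.ReverseProxy
	inFlight int64 // current in-flight requests on this instance
}

type gateway struct {
	backends []*backend
	mu       sync.Mutex
	perUse   map[string]int // in-flight requests per use case
	useCap   int            // fairness cap per use case
}

func newGateway(targets []string, useCap int) (*gateway, error) {
	g := &gateway{perUse: map[string]int{}, useCap: useCap}
	for _, t := range targets {
		u, err := url.Parse(t)
		if err != nil {
			return nil, err
		}
		g.backends = append(g.backends, &backend{proxy: httputil.NewSingleHostReverseProxy(u)})
	}
	return g, nil
}

// pick returns the backend with the fewest in-flight requests
// (least-loaded routing).
func (g *gateway) pick() *backend {
	best := g.backends[0]
	for _, b := range g.backends[1:] {
		if atomic.LoadInt64(&b.inFlight) < atomic.LoadInt64(&best.inFlight) {
			best = b
		}
	}
	return best
}

func (g *gateway) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	use := r.Header.Get("X-Use-Case") // hypothetical header naming the LLM use case
	g.mu.Lock()
	if g.perUse[use] >= g.useCap {
		// Fairness: reject once a use case exceeds its share.
		g.mu.Unlock()
		http.Error(w, "use case over fair-share limit", http.StatusTooManyRequests)
		return
	}
	g.perUse[use]++
	g.mu.Unlock()

	b := g.pick()
	atomic.AddInt64(&b.inFlight, 1)
	b.proxy.ServeHTTP(w, r)
	atomic.AddInt64(&b.inFlight, -1)

	g.mu.Lock()
	g.perUse[use]--
	g.mu.Unlock()
}

func main() {
	// Two hypothetical LLM serving instances behind the gateway.
	gw, err := newGateway([]string{"http://localhost:9001", "http://localhost:9002"}, 8)
	if err != nil {
		panic(err)
	}
	fmt.Println("LLM instance gateway listening on :8080")
	http.ListenAndServe(":8080", gw)
}

A production gateway would also weigh model-specific signals (KV-cache usage, queue depth, token throughput) rather than raw request counts, which is part of what makes LLM traffic harder to balance than ordinary HTTP/gRPC.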

Syllabus

Lightning Talk: Introducing LLM Instance Gateways for Efficient Inference Serving - Abdel Sghiouar & Daneyon Hansen

Taught by

CNCF [Cloud Native Computing Foundation]

