Overcoming Challenges in Serving Large Language Models - SREcon23 Europe/Middle East/Africa

Explore the intricacies of hosting GPT-type models in a Kubernetes cluster with multi-GPU nodes in this 31-minute conference talk from SREcon23 Europe/Middle East/Africa. Delve into the challenges SREs face when providing custom GPT model capabilities within their organizations, including managing large model sizes, sharding models across GPUs, and using tensor parallelism. Learn about model file formats, quantization techniques, and open-source tools such as Hugging Face Accelerate. Gain insight into balancing serving latency, prediction accuracy, and distributed serving, and into best practices for resource allocation. Watch a live demonstration of the performance and trade-offs of a GPT-based model, and come away with practical knowledge for hosting and managing large language models in your own environment.
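
To make the link between model size, quantization, and GPU sharding concrete, here is a minimal back-of-the-envelope sketch. It is not taken from the talk: the 20B parameter count, the 24 GiB GPU size, and the function names are illustrative assumptions, and the estimate covers weights only (no activations, KV cache, or framework overhead).

```python
import math

# Bytes per parameter for common weight precisions (standard storage sizes).
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gib(num_params: float, precision: str) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[precision] / 2**30


def min_gpus_for_weights(num_params: float, precision: str, gpu_mem_gib: float) -> int:
    """Smallest GPU count whose combined memory holds the evenly sharded weights.

    Weights only: real deployments need extra headroom per GPU.
    """
    return math.ceil(weight_memory_gib(num_params, precision) / gpu_mem_gib)


if __name__ == "__main__":
    params = 20e9  # hypothetical 20B-parameter model (assumption)
    for precision in BYTES_PER_PARAM:
        gib = weight_memory_gib(params, precision)
        gpus = min_gpus_for_weights(params, precision, gpu_mem_gib=24.0)
        print(f"{precision}: {gib:.1f} GiB of weights, at least {gpus} x 24 GiB GPU(s)")
```

Under these assumptions, lowering the precision shrinks the weight footprint proportionally, which is why quantization can reduce the number of GPUs a model must be sharded across, at a possible cost in prediction accuracy.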

Syllabus

SREcon23 Europe/Middle East/Africa - Overcoming Challenges in Serving Large Language Models

Taught by

USENIX
