Class Central is learner-supported. When you buy through links on our site, we may earn an affiliate commission.

How to Deploy LLMs - LLMOps Stack with vLLM, Docker, Grafana and MLflow

Venelin Valkov via YouTube

Start learning Write review

Details

Start learning

Provider

YouTube
Pricing

Free Video
Languages

English
Effort

19 minutes
Sessions

Self-Paced
Level

Intermediate

Found in

Learn to build a production-ready LLM deployment stack that goes beyond basic Python scripts wrapped in Docker containers. Explore the challenges of deploying Large Language Models to production environments, including issues with high latency, security vulnerabilities, and lack of monitoring visibility. Discover how to construct a comprehensive inference stack using consumer GPUs with vLLM for efficient model serving, nginx for load balancing and reverse proxy functionality, and Grafana with Prometheus for comprehensive monitoring and observability. Master the configuration of Docker Compose for orchestrating multiple services, implement proper nginx configuration for production traffic handling, and set up robust monitoring systems to track performance metrics and system health. Follow along with a practical virtual instance setup and witness live load testing using LangChain client to validate the deployment's performance under realistic conditions. Gain insights into why simple containerized Python scripts fail in production scenarios and understand the architectural decisions needed for scalable, secure, and observable LLM deployments.

Syllabus

00:00 - Why Python script fail in production
01:47 - The stack architecture vLLM, nginx, Grafana
04:42 - Docker compose definition
08:35 - Nginx config
09:08 - Monitoring with Prometheus and Grafana config
10:13 - Virtual instance setup
13:54 - Live load test with LangChain client