Overview
Explore how to architect and implement a scalable LLM inference service on Amazon EKS in this 33-minute conference talk from the Linux Foundation. Dive into workload orchestration with Kubernetes as the foundation, integrating NVIDIA NIM microservices for efficient GPU utilization, LangChain for flexible LLM operations, and Milvus for vector storage. Learn how to use FluxCD for GitOps-driven deployments, Karpenter for node autoscaling, and Prometheus with Grafana for comprehensive observability. Discover best practices for building production-ready large language model inference systems that scale effectively in cloud-native environments, combining modern AI tooling with robust Kubernetes orchestration patterns.
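To make the stack concrete, here is a minimal sketch of the application layer it describes: LangChain retrieving context from Milvus and calling an LLM served by NVIDIA NIM through its OpenAI-compatible API. The in-cluster service URLs, model ids, and the talk_docs collection name are all hypothetical placeholders, and the sketch assumes the langchain-openai and langchain-milvus packages are installed.

```python
# Minimal RAG sketch: LangChain querying a NIM-served LLM with Milvus retrieval.
# All endpoints, model ids, and collection names below are hypothetical
# placeholders for whatever the cluster actually exposes.
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_milvus import Milvus

# NVIDIA NIM exposes an OpenAI-compatible API, so the standard OpenAI client
# classes can point at the in-cluster Service instead of api.openai.com.
llm = ChatOpenAI(
    base_url="http://nim-llm.inference.svc.cluster.local:8000/v1",  # hypothetical Service DNS
    api_key="not-needed-in-cluster",
    model="meta/llama-3.1-8b-instruct",  # example NIM model id
)

embeddings = OpenAIEmbeddings(
    base_url="http://nim-embed.inference.svc.cluster.local:8000/v1",  # hypothetical embedding NIM
    api_key="not-needed-in-cluster",
    model="nvidia/nv-embedqa-e5-v5",  # example embedding model id
)

# Milvus holds the document vectors; the collection name is illustrative.
vectorstore = Milvus(
    embedding_function=embeddings,
    collection_name="talk_docs",
    connection_args={"uri": "http://milvus.vector.svc.cluster.local:19530"},
)
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

def answer(question: str) -> str:
    """Retrieve relevant chunks from Milvus, then ask the NIM-hosted LLM."""
    docs = retriever.invoke(question)
    context = "\n\n".join(d.page_content for d in docs)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return llm.invoke(prompt).content

if __name__ == "__main__":
    print(answer("How does the service scale under load?"))
```

FluxCD and Karpenter operate a layer below this code: FluxCD reconciles the Kubernetes manifests from Git, while Karpenter provisions GPU nodes when pending NIM pods request more capacity than the cluster currently has.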
Syllabus
Scalable LLM Inference on Kubernetes With NVIDIA NIMS, LangChain, Milvus and FluxCD - Riccardo Freschi
Taught by
Linux Foundation