Unlocking the Potential of Large Language Models in Production - Best Practices and Solutions
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Explore a conference talk that delves into the challenges and solutions of deploying large language models (LLMs) in production environments. Learn about the paradigm shift from traditional machine learning to generative AI and LLMs, with a focus on the complex LLMOps challenges of deployment, scaling, and operations. Discover best practices for building scalable inference platforms using cloud native technologies such as Kubernetes, Kubeflow, KServe, and Knative. Gain insights into essential aspects of LLM operations, including benchmarking tools, storage solutions for efficient auto-scaling, model optimization for specialized accelerators, A/B testing under limited compute resources, and monitoring strategies. Follow a detailed KServe case study that demonstrates practical solutions to these production challenges, presented by experts from Red Hat and NVIDIA.
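For readers who want a concrete starting point before watching, the sketch below shows how an LLM might be deployed as a KServe InferenceService using the KServe Python SDK. This is a minimal illustration, not material from the talk itself: the service name, storage URI, resource figures, and replica bounds are all hypothetical, and it assumes a Kubernetes cluster with KServe installed and GPU nodes available.

```python
# Minimal sketch: deploying an LLM as a KServe InferenceService via the
# KServe Python SDK. All names, URIs, and resource figures are hypothetical.
from kubernetes import client
from kserve import (
    KServeClient,
    V1beta1InferenceService,
    V1beta1InferenceServiceSpec,
    V1beta1PredictorSpec,
    V1beta1ModelSpec,
    V1beta1ModelFormat,
)

isvc = V1beta1InferenceService(
    api_version="serving.kserve.io/v1beta1",
    kind="InferenceService",
    metadata=client.V1ObjectMeta(name="llm-demo", namespace="default"),
    spec=V1beta1InferenceServiceSpec(
        predictor=V1beta1PredictorSpec(
            # Knative-backed auto-scaling bounds; tune for expected traffic.
            min_replicas=1,
            max_replicas=3,
            model=V1beta1ModelSpec(
                model_format=V1beta1ModelFormat(name="huggingface"),
                # Hypothetical model location; KServe pulls weights from here.
                storage_uri="gs://example-bucket/models/llama-7b",
                resources=client.V1ResourceRequirements(
                    limits={"nvidia.com/gpu": "1", "memory": "24Gi"},
                ),
            ),
        )
    ),
)

# Submit the InferenceService to the cluster; KServe provisions the
# serving runtime and exposes an inference endpoint.
KServeClient().create(isvc)
```

Once the service reports Ready, inference requests can be sent to its HTTP endpoint, and Knative scales replicas between the configured bounds based on load.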
Syllabus
Unlocking Potential of Large Models in Production - Yuan Tang, Red Hat & Adam Tetelman, NVIDIA
Taught by
CNCF [Cloud Native Computing Foundation]