Overview
Learn advanced production strategies for deploying large language models at enterprise scale in this 47-minute conference talk. Discover how to overcome the astronomical inference costs and performance bottlenecks that keep 90% of LLM prototypes from reaching production. Explore proven techniques and open-source tooling that can dramatically reduce operational expenses, illustrated by a real-world case study in which monthly LLM costs were cut from $1 million to under $500. Learn how to balance the three critical factors in large language model deployments: latency, accuracy, and cost. The production-proven strategies covered are aimed at software architects, engineering leaders, and ML engineers working on enterprise-scale GenAI implementations, and show which technologies and methodologies make generative AI viable and sustainable at that scale.
Syllabus
GenAI at Scale: Red Hat CTO's $1M to $500/Mo Cost Secret
Taught by
InfoQ