
Building a Production-Grade AI/ML Inference Platform on Kubernetes

DevOpsDays Tel Aviv via YouTube

Overview

Learn to design and operate AI/ML inference workloads on Kubernetes in production through this 34-minute conference talk from DevOpsDays Tel Aviv. The talk follows the complete lifecycle of an AI/ML platform that analyzes and summarizes massive volumes of unstructured medical data in real time, focusing not on the models themselves but on the production-grade infrastructure needed to run complex inference workloads at scale. Topics include the model handoff from data science teams, validation and readiness assessments, deployment architecture, and scheduling and scaling strategies. It also covers inference traffic management, observability practices, performance tuning, and the optimizations needed to keep the system reliable under heavy computational load. An Amazon EKS example demonstrates practical approaches to running high-performance inference systems, and the talk closes with the operational challenges, architectural trade-offs, and engineering solutions involved in scaling AI/ML inference platforms efficiently and responsibly in production environments.
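The page does not detail the deployment and scaling strategies the talk covers, but as a rough illustration of the kind of setup it describes, a GPU-backed inference Deployment with a HorizontalPodAutoscaler on Kubernetes might look like the sketch below. All names, the image, and the thresholds are hypothetical placeholders, not taken from the talk:

```yaml
# Hypothetical GPU-backed inference Deployment (names and image are placeholders).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: summarizer-inference
spec:
  replicas: 2
  selector:
    matchLabels:
      app: summarizer-inference
  template:
    metadata:
      labels:
        app: summarizer-inference
    spec:
      containers:
        - name: server
          image: registry.example.com/summarizer:latest   # placeholder image
          ports:
            - containerPort: 8080
          resources:
            limits:
              nvidia.com/gpu: 1        # one GPU per replica
          readinessProbe:              # gate traffic until the model is loaded
            httpGet:
              path: /healthz
              port: 8080
---
# Scale on CPU utilization as a simple proxy; production inference platforms
# often scale on custom metrics (queue depth, GPU utilization) via a metrics adapter.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: summarizer-inference
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: summarizer-inference
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

On EKS, the `nvidia.com/gpu` resource requires GPU nodes with the NVIDIA device plugin installed; the readiness probe keeps traffic off replicas that are still loading model weights.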

Syllabus

Building a Production-Grade AI/ML Inference Platform on Kubernetes, Liad Drori

Taught by

DevOpsDays Tel Aviv

