Agent-Driven MCP for AI Workloads on Kubernetes

Explore how to build an end-to-end AI Platform-as-a-Service on Kubernetes by combining cloud-native tools, Model Context Protocol (MCP) servers, and intelligent agents in this 28-minute conference talk. Learn to address the complexities of managing AI inference workloads on Kubernetes, including GPU instance type selection, service configuration, cost-performance optimization, YAML management, and continuous monitoring of utilization and inference latency. Discover how an intelligent agent can interpret a simple text command such as "deploy llama-3-70b-chat" and automatically call external MCP metadata services such as Hugging Face, calculate an optimal GPU topology, provision nodes through the Kubernetes AI Toolchain Operator, deploy the model, and scale it automatically based on real-time metrics, all without manual manifest editing. Gain insights into handling details that such a command leaves underspecified, such as the model quantization level and the cost-versus-latency tradeoff, and understand the guardrails needed to validate a deployment before it is applied.
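
To make the "calculate optimal GPU topology" step more concrete, here is a minimal Python sketch of the kind of sizing arithmetic an agent might perform once it knows a model's parameter count. It is not the talk's implementation. The function name, the bytes-per-parameter table, and the 1.2 overhead factor are all hypothetical values chosen for illustration, and a real planner would also account for KV cache size, batch size, and tensor-parallel constraints. The sketch also shows one simple guardrail: it refuses to guess when the quantization level is missing or unknown.

```python
import math

# Illustrative bytes per parameter for a few common precisions (assumed values).
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def estimate_gpu_count(params_billion, quantization, gpu_memory_gib, overhead=1.2):
    """Estimate how many GPUs are needed to hold the model weights.

    overhead is a hypothetical multiplier that stands in for activations,
    KV cache, and runtime buffers; it is not a measured value.
    """
    # Guardrail: an underspecified request is rejected instead of guessed.
    if quantization not in BYTES_PER_PARAM:
        raise ValueError(
            f"unknown or missing quantization {quantization!r}; "
            f"expected one of {sorted(BYTES_PER_PARAM)}"
        )
    if params_billion <= 0 or gpu_memory_gib <= 0:
        raise ValueError("params_billion and gpu_memory_gib must be positive")

    weight_bytes = params_billion * 1e9 * BYTES_PER_PARAM[quantization]
    required_gib = weight_bytes * overhead / 2**30
    # Round up: a fraction of a GPU still requires a whole device.
    return math.ceil(required_gib / gpu_memory_gib)


if __name__ == "__main__":
    # A 70-billion-parameter model on GPUs with 80 GiB of memory each.
    for quant in ("fp16", "int8", "int4"):
        print(quant, estimate_gpu_count(70, quant, gpu_memory_gib=80))
```

Under these assumptions, lowering the precision reduces the estimated GPU count, which is one way the cost-versus-latency tradeoff mentioned above shows up in practice. An agent could ask the user to confirm the quantization level whenever the guardrail raises an error.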