Orchestrating AI Models in Kubernetes: Deploying Ollama as a Native Container Runtime
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
This conference talk explores how to simplify AI model deployment in Kubernetes by implementing Ollama as a native container runtime. Learn how Samuel Veloso from Cast AI and Lucas Fernández from Red Hat address the challenges of complex AI model serving workflows through a custom container runtime solution. The presentation demonstrates how this approach extends standard container execution capabilities to enable more straightforward deployment and management of open-source AI models within Kubernetes environments. Drawing parallels with security-focused solutions like gVisor and Kata Containers, the speakers show how similar principles can be applied to AI model serving, creating a more user-friendly experience for developers working with AI in cloud-native environments.
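The gVisor/Kata parallel mentioned above works through Kubernetes' RuntimeClass mechanism: a cluster can register additional container runtimes and let individual Pods opt into them. A minimal sketch of what registering an Ollama-backed runtime might look like is shown below — the handler name `ollama` and the image reference are illustrative assumptions, not the speakers' actual configuration:

```yaml
# Hypothetical RuntimeClass registering an Ollama-backed runtime handler,
# analogous to how gVisor (runsc) or Kata Containers are registered.
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: ollama
# Must match a runtime handler configured in the node's container
# runtime (e.g. containerd) -- assumed name for illustration.
handler: ollama
---
# A Pod opting into that runtime via runtimeClassName, so the runtime,
# rather than the application container, is responsible for serving the model.
apiVersion: v1
kind: Pod
metadata:
  name: llama-model
spec:
  runtimeClassName: ollama
  containers:
  - name: model
    image: ollama/llama3   # hypothetical model reference
```

This is the same opt-in pattern gVisor and Kata use for sandboxed workloads, which is what makes the approach feel native to Kubernetes rather than requiring a separate model-serving control plane.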
Syllabus
Orchestrating AI Models in Kubernetes: Deploying Ollama as a Native Container Runtime - Samuel Veloso & Lucas Fernández