Overview
Learn to deploy and run large language models locally using Docker Model Runner in this 45-minute episode featuring Principal Engineer Jacob Howard and host Oleg. Discover how to install and configure Docker Model Runner on Docker CE and Docker Desktop, exploring both GPU and CPU support along with the underlying container-based architecture.

Master running LLMs in CI environments such as GitHub Actions, and see performance benchmarks on lightweight setups with minimal hardware requirements. Explore model selection strategies, including choosing appropriate model sizes and quantizations for your hardware, and learn to deploy Model Runner in production using Kubernetes and Google Cloud Run.

Gain practical debugging skills using logs, Docker Desktop's request inspector, and the OpenAI-compatible API. Get insights into upcoming features, including vLLM backend support and multimodal inference, with coverage of essential tools and concepts such as llama.cpp, quantized LLMs, VRAM sizing, and OCI model artifacts for building agentic applications and production-scale AI deployments.
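Because Model Runner exposes an OpenAI-compatible API, any OpenAI-style client can talk to a locally running model. The sketch below builds a standard chat-completions request; the base URL and the `ai/smollm2` model name are assumptions for illustration (the host-side TCP port and available models depend on your Docker Desktop configuration), not details from the episode.

```python
import json

# Assumed default host-side endpoint for Docker Model Runner's
# OpenAI-compatible API; verify against your own setup.
BASE_URL = "http://localhost:12434/engines/v1"

# Standard OpenAI chat-completions payload shape; "ai/smollm2" is an
# example model reference from Docker's ai/ namespace (an assumption).
payload = {
    "model": "ai/smollm2",
    "messages": [
        {"role": "user", "content": "Say hello in one short sentence."}
    ],
}

# Serialize the request body as the API expects.
body = json.dumps(payload).encode("utf-8")

# To actually send the request (requires Model Runner to be running):
# import urllib.request
# req = urllib.request.Request(
#     f"{BASE_URL}/chat/completions",
#     data=body,
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```

Because the request format matches OpenAI's, swapping a cloud model for a local one is typically just a matter of changing the base URL and model name.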
Syllabus
AI Guide to the Galaxy Episode 2: Running Local LLMs with Docker Model Runner
Taught by
Docker