Microsoft

Fast and Flexible Inference on Open-Source AI Models at Scale - BRK117

Microsoft via YouTube

Overview

Learn to deploy and scale open-source AI models efficiently across local and cloud environments in this 43-minute conference talk from Microsoft Ignite 2025. Discover how to run custom AI models with flexibility using Azure Container Apps and serverless GPUs for cost-effective inferencing, while exploring how Azure Kubernetes Service (AKS) enables scalable, high-performance large language model operations with fine-tuned control. Explore use cases including hybrid model architecture, LLM agents, and data boundary control, then dive into GPU-intensive workloads for physics and video processing. Master Docker Compose for AI agents and simplified cloud deployment, followed by hands-on demonstrations of dashboard generators and log streaming visualization. Examine AKS investment areas covering scale, security, cost optimization, and AI support, plus enhanced workload scheduling and configuration specifically designed for AI workloads. Understand inference traffic management using Gateway API through live demos, and learn from real-world case studies including RBC's CI/CD pipeline for secure GPU resource provisioning and their strategy for building Canada's largest AI farm within compliance boundaries. Gain practical knowledge for deploying models with agility and cost clarity, whether you're working in local development environments or enterprise-scale cloud deployments.

Syllabus

00:00:00 - Use cases: hybrid model architecture, LLM agents, data boundary control
00:09:09 - Introduction to GPU-intensive workloads like physics and video processing
00:11:47 - Docker Compose for AI agents and simplified cloud deployment
00:16:00 - Live testing of the dashboard generator and log streaming visualization
00:20:31 - AKS investment areas: scale, security, cost optimization and AI support
00:25:04 - Enhanced workload scheduling and configuration for AI workloads
00:30:25 - Inference traffic management using Gateway API and Ignite demo preview
00:35:11 - RBC’s CI/CD pipeline accelerating secure GPU resource provisioning
00:38:01 - RBC strategy: building Canada’s largest AI farm within compliance boundaries

Taught by

Microsoft Ignite

