Secure and Optimize AI and ML Workloads with the Cross-Cloud Network

In this presentation, Vaibhav Katkade, Product Manager at Google Cloud Networking, covers cloud networking infrastructure enhancements designed specifically for AI/ML workloads. He walks through the complete AI/ML lifecycle (training, fine-tuning, and inference) and explains the network requirements of each phase. Google Cloud's interconnect solutions enable fast, secure data transfer from on-premises environments, and GKE clusters now support up to 65,000 nodes to accommodate large models like Gemini. The GKE Inference Gateway optimizes LLM serving through intelligent load balancing based on KV cache utilization, resulting in 60% lower latency and 40% higher throughput. The gateway also enables autoscaling based on model server metrics, supports multiplexing of LoRA fine-tuned adapters, and integrates with security tools like Google's Model Armor. Finally, the presentation addresses GPU/TPU capacity constraints across regions: a single inference gateway routes requests to wherever capacity is available, while giving platform teams centralized control and consistent security coverage across all models. Recorded at AI Infrastructure Field Day in Santa Clara on April 22, 2025.
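
To make the idea of load balancing on KV cache utilization more concrete, here is a minimal Python sketch. It is not the GKE Inference Gateway's actual algorithm or API. The `Replica` type, its fields, and the `pick_replica` function are hypothetical names invented for illustration, and the selection rule (prefer replicas that already have the requested LoRA adapter loaded, then choose the lowest KV cache utilization, breaking ties by queue depth) is an assumed simplification.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical snapshot of metrics reported by one model server."""
    name: str
    kv_cache_utilization: float  # fraction of KV cache in use, 0.0 to 1.0
    queue_depth: int             # requests waiting on this server
    loaded_adapters: set[str] = field(default_factory=set)


def pick_replica(replicas: list[Replica], adapter: str | None = None) -> Replica:
    """Pick a replica for the next request (illustrative rule, an assumption)."""
    if not replicas:
        raise ValueError("no replicas available")

    candidates = replicas
    if adapter is not None:
        # Prefer servers that already hold the adapter, if any exist.
        with_adapter = [r for r in replicas if adapter in r.loaded_adapters]
        if with_adapter:
            candidates = with_adapter

    # Lowest KV cache utilization wins; queue depth breaks ties.
    return min(candidates, key=lambda r: (r.kv_cache_utilization, r.queue_depth))


if __name__ == "__main__":
    pool = [
        Replica("server-a", 0.90, 1),
        Replica("server-b", 0.20, 5, {"adapter-x"}),
        Replica("server-c", 0.50, 0, {"adapter-y"}),
    ]
    print(pick_replica(pool).name)
    print(pick_replica(pool, adapter="adapter-y").name)
```

A plain round-robin balancer would ignore these signals. The point of the sketch is only that routing on model server metrics can steer requests away from servers whose KV cache is nearly full.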