Scaling the AI Infrastructure to Data Center Regions

Learn how to scale artificial intelligence infrastructure across data center regions in this 21-minute conference talk by Dan Rabinovitsj, VP of Data Center Infrastructure at Meta, presented at the Open Compute Project. Discover the strategic approaches and technical considerations involved in expanding AI computing capabilities beyond a single data center to regional deployments. Explore the challenges of distributed AI infrastructure, including the network architecture, resource allocation, and coordination mechanisms needed to support large-scale machine learning workloads across multiple geographic locations. Gain insights into Meta's experience building and managing AI infrastructure at scale, including best practices for maintaining performance, reliability, and efficiency when deploying AI systems across regions. Understand what regional scaling implies for latency optimization, data locality, fault tolerance, and operational complexity in modern cloud computing environments.