This conference talk from SREcon25 Americas explains how Meta's Monetization team changed the way it manages machine learning training infrastructure, using governance to improve efficiency and innovation. Presented by Anamaya Sullerey and Brian Hansen of Meta, the 32-minute talk addresses the challenge of allocating resources as AI model sizes and deployment footprints grow exponentially. It covers strategies for accurately measuring and attributing ML training costs so that investment goes to high-ROI work, for getting more out of existing resources to unlock hidden capacity, and for streamlining ML development to shorten time-to-market. A detailed case study of a successful ML training workload governance system illustrates the complexities of cost attribution in ML training projects and shares insights from Meta's experience bridging the gap between research and production environments.