How Workday Achieved 50× Cheaper Model Serving with Ray Serve

Learn how Workday rebuilt its ML model-serving architecture on Ray Serve to achieve a 50× reduction in serving cost while maintaining low latency and high reliability across tens of thousands of models. In early 2023, serving a dedicated ML model for every tenant across multiple environments was becoming increasingly expensive and difficult to scale. Workday responded with a ground-up redesign built on Ray Serve, which now powers models across more than a dozen environments using built-in autoscaling and efficient request routing. These usage patterns pushed Ray Serve beyond its original design limits: early deployments hit scalability ceilings at just dozens of applications, and Workday contributed critical improvements back to the open-source Ray community so that a single cluster can support thousands of applications. Gain deep insight into Ray Serve internals, practical patterns for building complex serving systems, the architectural challenges encountered during scaling, and the specific contributions made to overcome them. Come away understanding how to use Ray Serve for large-scale model serving, and find inspiration to contribute to the Ray ecosystem yourself.