Towards Optimal Rack-scale μs-level CPU Scheduling through In-Network Workload Shaping

Learn about Pallas, an application-aware rack-scale CPU scheduling solution designed for microsecond-level services, in this 16-minute conference presentation from USENIX ATC '25. Discover how this research addresses the limitations of existing rack-scale CPU scheduling approaches, which are application-agnostic and therefore suffer from inaccurate load balancing and suboptimal scheduling. Explore the core innovation, in-network workload shaping, which partitions workloads into homogeneous shards based on CPU demands and thereby enables simple yet near-optimal inter-server load balancing and intra-server scheduling. Examine the experimental results showing Pallas outperforming state-of-the-art solutions such as RackSched, with an 8.5× reduction in tail latency at medium load and up to two orders of magnitude improvement at high load, while maintaining stable performance during workload shifts and transient bursts. Gain insight into the implementation details and how this solution advances microsecond-level service scheduling in modern data centers.
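
To make the idea of homogeneous shards concrete, here is a minimal Python sketch. It is not Pallas's actual mechanism, which the description above does not detail. The power-of-two bucketing rule, the function names `shard_by_cpu_demand` and `spread`, and the sample service times are all assumptions chosen for illustration. The sketch groups requests so that CPU demands within a shard differ by less than a factor of two.

```python
import math
from collections import defaultdict


def shard_by_cpu_demand(requests, base_us=1.0):
    """Group (name, service_time_us) pairs into shards of similar CPU demand.

    Assumption for illustration: a shard is a power-of-two bucket of service
    time, so demands inside one shard differ by less than 2x. Requests faster
    than base_us all fall into bucket 0.
    """
    shards = defaultdict(list)
    for name, service_us in requests:
        if service_us <= 0:
            raise ValueError("service time must be positive")
        bucket = max(0, math.floor(math.log2(service_us / base_us)))
        shards[bucket].append((name, service_us))
    return dict(shards)


def spread(shard):
    """Ratio of the largest to the smallest service time in one shard."""
    times = [service_us for _, service_us in shard]
    return max(times) / min(times)


# Hypothetical mix of short and long requests (service times in microseconds).
requests = [
    ("get", 2.0),
    ("get", 3.0),
    ("put", 5.0),
    ("scan", 600.0),
    ("scan", 700.0),
]

shards = shard_by_cpu_demand(requests)
for bucket in sorted(shards):
    print(bucket, shards[bucket], round(spread(shards[bucket]), 2))
```

In the unsharded mix the longest request is 350 times the shortest, while each shard here has a spread below 2. That low variance within a shard is the property that lets simple balancing and scheduling policies perform well.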