Overview
Explore a 17-minute conference presentation from OSDI '25 that introduces Kamino, a high-performance scheduling system designed to optimize virtual machine (VM) allocation in large-scale cloud environments. Learn how researchers from Rutgers University, Microsoft Research, and Microsoft Azure developed a novel latency-driven, cache-aware request scheduling algorithm that addresses critical performance bottlenecks in VM allocation systems.

Discover the theoretical foundations behind Kamino's scheduling approach, which uses partial cache-state indicators to assign allocation requests to the agents with the lowest estimated latency. Examine the system's architecture and understand how it overcomes the limitations of traditional load-balancing mechanisms that ignore cache state and latency.

Review comprehensive evaluation results from high-fidelity simulations on production workloads, showing a 42% reduction in average request latency, along with real-world deployment outcomes in a major public cloud demonstrating a 33% decrease in cache miss rates and a 17% reduction in memory usage. Gain insight into the challenges of scaling VM allocation systems under strict latency requirements, and learn how cache-aware scheduling can significantly improve performance in distributed cloud infrastructure.
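The core idea described above can be sketched in a few lines: instead of balancing purely on load, the scheduler uses partial knowledge of each agent's cache state to estimate per-agent latency and routes the request to the minimum. This is an illustrative sketch only; the agent structure, latency constants, and function names are assumptions for exposition, not Kamino's actual implementation.

```python
from dataclasses import dataclass, field

# Assumed illustrative costs: serving a request whose state is already cached
# is much cheaper than rebuilding that state on a cold agent.
HIT_LATENCY_MS = 5.0
MISS_LATENCY_MS = 50.0

@dataclass
class Agent:
    name: str
    queue_delay_ms: float                          # backlog already assigned to this agent
    cached_keys: set = field(default_factory=set)  # partial view of the agent's cache state

def estimated_latency(agent: Agent, request_key: str) -> float:
    """Estimate completion latency for this request on this agent:
    queued work plus a service cost that depends on the (partial) cache state."""
    service = HIT_LATENCY_MS if request_key in agent.cached_keys else MISS_LATENCY_MS
    return agent.queue_delay_ms + service

def schedule(agents: list, request_key: str) -> Agent:
    """Assign the request to the agent with the lowest estimated latency,
    then fold the request's cost into that agent's backlog."""
    estimates = {a.name: estimated_latency(a, request_key) for a in agents}
    best = min(agents, key=lambda a: estimates[a.name])
    best.queue_delay_ms = estimates[best.name]
    return best

agents = [
    Agent("warm", queue_delay_ms=10.0, cached_keys={"vm-type-x"}),
    Agent("cold", queue_delay_ms=0.0),
]
# A cache-oblivious load balancer would pick "cold" (shorter queue); the
# cache-aware estimate (10 + 5 = 15 ms vs. 0 + 50 = 50 ms) picks "warm".
chosen = schedule(agents, "vm-type-x")
```

The contrast in the final example captures the limitation of cache-oblivious load balancing that the talk highlights: the least-loaded agent is not necessarily the fastest one once cache state is taken into account.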
Syllabus
OSDI '25 - Kamino: Efficient VM Allocation at Scale with Latency-Driven Cache-Aware Scheduling
Taught by
USENIX