Learn about addressing performance bottlenecks in RDMA-based container networks through this 15-minute conference presentation from NSDI '25. Explore how RDMA-offloaded container networks (RCNs) face unexpected performance degradation when scaling to millions of containers in data centers, with researchers identifying RDMA NICs (RNICs) as the primary source of these scalability walls. Discover how combinatorial causal testing is used to infer RNIC architecture models and performance characteristics despite limited visibility into hardware internals. Examine the ScalaCN system design, which proactively optimizes network function offloading schedules, achieving a 1.4× improvement in end-to-end network bandwidth and a 31% reduction in packet forwarding latency while resolving 82% of the identified causes of performance degradation. Understand how this research methodology identified RNIC performance issues that were reported to vendors, leading to confirmed fixes and ongoing collaboration on hardware improvements for large-scale container networking environments.