Co-Designing for Scale - CXL Based Memory Solution for Data Centric Workloads

Explore hardware-software co-optimized solutions for memory-bound workloads in this 30-minute conference talk from the Open Compute Project. Learn how disaggregated, composable CXL resources can augment memory-constrained AI accelerators through flexible system architectures designed for data-intensive applications, including billion-scale vector search, cloud-native and in-memory databases, data analytics, and large-scale AI inference. Discover approaches featuring switch-based disaggregated memory, Near Memory Compute accelerators, and memory pooling appliances that improve efficiency, reduce total cost of ownership, and enhance system performance. Examine how open-source frameworks such as NVIDIA Dynamo, along with KV-cache-focused solutions like Mooncake and LMCache, address the software challenges introduced by heterogeneity and lower adoption barriers for cost-effective, scalable, and energy-efficient AI infrastructure. The presentation features Gaurav Agarwal from Marvell, Anil Godbole from Intel, Jianping Jiang from Xconn Technologies Holdings, and Xinjun Yang from Alibaba, who offer perspectives on CXL-based composable memory systems.