Overview
Explore the evolution of the NVMe storage protocol in this 49-minute conference presentation, which examines the challenges and opportunities in enabling efficient storage access directly from GPUs. Learn how NVMe, designed since 2011 around latency-sensitive CPU access, must adapt to the distinct requirements of GPU-based AI workloads. Discover the fundamental differences between CPU and GPU architectures, including how they execute code, access I/O, and handle parallel processing. Examine the key bottlenecks in the current NVMe I/O protocol that hinder efficient GPU storage access, given that CPUs prioritize low latency while GPUs are high-throughput parallel execution engines with greater latency tolerance. Understand how having many command initiators on a GPU differs from traditional CPU-driven I/O access patterns in AI workloads. Gain insight into the specific areas where the NVMe standard should focus protocol improvements to address the next generation of storage challenges posed by GPU computing. The presentation concludes with actionable recommendations for enhancing the NVMe standard to better support AI training and inference applications that require direct GPU-to-storage communication.
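To make the bottleneck concrete, the sketch below is a toy model (not real driver code, and not from the talk) of one NVMe submission/completion queue pair. It mirrors only the command flow — enqueue an entry, ring a doorbell, reap completions — and counts doorbell writes, since in real hardware each doorbell is a serialized MMIO write by the initiator, a pattern that scales poorly when thousands of GPU threads each want to submit I/O. All names here (`QueuePair`, `submit`, `reap`) are illustrative inventions.

```python
from dataclasses import dataclass, field
from collections import deque

@dataclass
class QueuePair:
    """Toy model of one NVMe submission/completion queue pair.

    Real NVMe uses MMIO doorbell registers and DMA; this sketch only
    mimics the command flow so the serialization point is visible.
    """
    depth: int = 64
    sq: deque = field(default_factory=deque)   # submission queue entries
    cq: deque = field(default_factory=deque)   # completion queue entries
    doorbell_writes: int = 0                   # each one is a serialized MMIO write

    def submit(self, cmd_id: int, lba: int) -> bool:
        if len(self.sq) >= self.depth:
            return False                       # queue full: initiator must back off
        self.sq.append((cmd_id, lba))
        self.doorbell_writes += 1              # one doorbell per command in this model
        return True

    def device_process(self) -> None:
        # The device drains the SQ and posts one completion per command.
        while self.sq:
            cmd_id, _lba = self.sq.popleft()
            self.cq.append(cmd_id)

    def reap(self) -> list:
        done, self.cq = list(self.cq), deque()
        return done

qp = QueuePair()
for i in range(8):
    qp.submit(i, lba=i * 8)
qp.device_process()
print(qp.reap())            # completions for commands 0..7
print(qp.doorbell_writes)   # 8 doorbell writes for 8 commands
```

A single CPU thread owning this queue pair amortizes doorbell writes easily; a GPU kernel with thousands of concurrent threads cannot share one doorbell without serializing, which is one reason the talk argues the protocol's initiator model needs rethinking.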
Syllabus
SNIA SDC 2025 - Why does NVMe Need to Evolve for Efficient Storage Access from GPUs?
Taught by
SNIAVideo