Conceptualizing Next Generation Memory and Storage Optimized for AI Inference
Open Compute Project via YouTube
Overview
Explore the evolving landscape of memory and storage systems designed specifically for AI inference in this conference talk. Discover how the unprecedented growth of large language models (LLMs) and expanding context lengths are driving massive data access demands, particularly for model weights and Key-Value (KV) cache data.

Learn about the resulting paradigm shift in memory and storage architecture, where traditional designs face new challenges and compute must be offloaded into or near memory to improve energy efficiency and reduce power consumption. Examine the distinctive access patterns of AI inference workloads, including their read-skewed nature relative to general-purpose applications and their semi-sequential storage access behaviors, which defy conventional expectations.

Gain insights into next-generation memory concepts, including Processing-In-Memory (PIM) technology that improves energy efficiency and bandwidth by reducing excessive data movement. Understand pathfinding developments in read-skewed high-capacity memory systems and in high-performance storage with semi-random access capabilities. Finally, delve into the selection of appropriate interfaces and semantics for these emerging technologies, which promise significant improvements in energy efficiency, bandwidth, and capacity for AI systems. Presented by Thomas Won Ha Choi, Director and Memory Systems Architect at SK hynix.
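To make the KV cache pressure described above concrete, the sketch below estimates how cache size grows linearly with context length for a transformer decoder. The model dimensions (32 layers, 32 KV heads, head dimension 128, FP16) are illustrative assumptions for a roughly 7B-parameter-class LLM, not figures from the talk; real deployments vary with architecture, grouped-query attention, and quantization.

```python
# Illustrative KV cache size estimate for a transformer decoder.
# All model dimensions below are assumptions, not from the talk.

def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2,
                   batch: int = 1) -> int:
    """Bytes of KV cache: 2 tensors (K and V) per layer, per token."""
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem
    return per_token * context_len * batch

# Hypothetical 7B-class model: 32 layers, 32 KV heads, head_dim 128, FP16.
per_token = kv_cache_bytes(32, 32, 128, context_len=1)
total_4k = kv_cache_bytes(32, 32, 128, context_len=4096)
print(f"{per_token / 1024:.0f} KiB per token")      # 512 KiB
print(f"{total_4k / 2**30:.0f} GiB at 4K context")  # 2 GiB
```

Even under these modest assumptions the cache reaches gibibytes per request at 4K context, and it is written once per generated token but read on every decoding step afterward, which is one source of the read-skewed access pattern the talk highlights.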
Syllabus
Conceptualizing Next Generation Memory & Storage Optimized for AI Inference
Taught by
Open Compute Project