Compression Enabled MRAM Memory Chiplet Subsystems for LLM Inference Accelerators
Open Compute Project via YouTube
Overview
Google, IBM & Meta Certificates — All 10,000+ Courses at 40% Off
One annual plan covers every course and certificate on Coursera. 40% off for a limited time.
Get Full Access
This 16-minute talk from the Open Compute Project features Nilesh Shah (VP of Business Development, Zeropoint Technologies), Angelos Arelakis (CTO, Zeropoint Technologies), and Andy Green (Numem UK) discussing memory solutions for large language model (LLM) inference. Explore why LLM inference is memory-bound, with a roughly 6:1 read-to-write access ratio, while current HBM-based GPUs are designed for balanced access patterns. Learn how compression-enabled MRAM memory chiplet subsystems could address the specific memory requirements of LLM inference accelerators.
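As a rough illustration of the memory-bound claim, here is a minimal back-of-envelope sketch (my own illustration, not material from the talk; the 7B model size and H100-class hardware figures are illustrative assumptions) comparing the arithmetic intensity of batch-1 decode against a GPU's compute-to-bandwidth balance point.

```python
# Back-of-envelope sketch (illustrative assumptions, not from the talk):
# why batch-1 LLM decode is memory-bound. Each fp16 weight is read once
# per generated token and used for ~2 FLOPs (one multiply-add), so the
# arithmetic intensity is about 1 FLOP per byte moved.

n_params = 7e9           # assumed 7B-parameter model
bytes_per_param = 2      # fp16

flops_per_token = 2 * n_params              # one MAC per weight
bytes_per_token = n_params * bytes_per_param

intensity = flops_per_token / bytes_per_token
print(f"decode arithmetic intensity ~ {intensity:.1f} FLOP/byte")

# An H100-class GPU offers on the order of ~1000 TFLOP/s fp16 against
# ~3.35 TB/s of HBM bandwidth, i.e. a balance point near 300 FLOP/byte.
balance_point = 1000e12 / 3.35e12
print(f"GPU balance point ~ {balance_point:.0f} FLOP/byte")
```

Under these assumptions decode streams far more bytes than it computes on, and that traffic is overwhelmingly reads (weights plus the KV cache) with only small KV-cache appends written back, which is the gap the talk's compression-plus-MRAM chiplet approach targets.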
Syllabus
Compression Enabled MRAM Memory Chiplet Subsystems for LLM Inference Accelerators
Taught by
Open Compute Project