Turbocharge ANNS on Real Processing-in-Memory by Enabling Fine-Grained Per-PIM-Core Scheduling

Learn how to improve Approximate Nearest Neighbor Search (ANNS) performance on Processing-in-Memory (PIM) hardware through fine-grained scheduling in this 19-minute conference presentation from USENIX ATC '25. Discover the fundamental challenges of memory-intensive ANNS workloads and understand why traditional batch scheduling leads to severe underutilization on PIM systems such as UPMEM hardware. Explore the PIMANN system architecture, which uses an undocumented control interface in PIM cores to enable per-core scheduling instead of conventional batching. See how a persistent PIM kernel eliminates idle states between batches and how per-PU query dispatching distributes load based on real-time PIM core status. Examine experimental results demonstrating 2.4× to 10.4× throughput improvements over existing CPU- and GPU-based ANNS systems, and gain insights into overcoming the memory wall problem in database and AI infrastructure applications.
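To make the scheduling contrast concrete, here is a minimal toy simulation in Python. It is not PIMANN's implementation and does not use any UPMEM API. The function names, the abstract per-query costs, and the one-query-per-core batch model are all assumptions made for illustration. It compares a batch scheduler, where every core waits at a barrier for the slowest query in the batch, with per-core dispatching, where the next query goes to whichever core becomes idle first.

```python
import heapq


def batch_makespan(costs, num_cores):
    """Toy batch scheduling: each batch gives one query to each core,
    and the next batch starts only after the slowest core finishes."""
    total = 0
    for start in range(0, len(costs), num_cores):
        batch = costs[start:start + num_cores]
        total += max(batch)  # barrier: all cores wait for the slowest query
    return total


def per_core_makespan(costs, num_cores):
    """Toy per-core dispatch: each query is handed to the core that
    becomes idle earliest, so no core waits at a batch boundary."""
    free_at = [0] * num_cores  # time at which each (hypothetical) core is idle
    heapq.heapify(free_at)
    for cost in costs:
        earliest = heapq.heappop(free_at)
        heapq.heappush(free_at, earliest + cost)
    return max(free_at)


# Hypothetical query costs in arbitrary time units; two slow queries
# stall their whole batch under batch scheduling.
costs = [1, 5, 1, 1, 5, 1, 1, 1]
print(batch_makespan(costs, 2))     # 12
print(per_core_makespan(costs, 2))  # 8
```

In this toy model the two schedulers process the same queries in the same order. The difference in completion time comes only from the idle time that batch barriers impose on cores that finish early, which is the underutilization the presentation attributes to batch scheduling.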