GPREEMPT - GPU Preemptive Scheduling Made General and Efficient

Learn about GPREEMPT, a GPU preemptive scheduling mechanism that addresses the fundamental trade-off between generality and efficiency in GPU resource management. Discover how this research from Tsinghua University and Renmin University of China tackles the challenge of co-locating workloads with different service-level agreement (SLA) requirements on the same GPU, including latency-critical and best-effort tasks. Explore the limitations of existing preemption strategies: wait-based approaches suffer from significant preemption latency, and reset-based approaches require kernel idempotence, which limits their applicability. Understand how GPREEMPT implements a timeslice-based yield mechanism to enable context-switch preemption on GPUs while maintaining broad generality. Examine the hint-based pre-preemption technique, which overlaps the preemption process with data preparation to minimize context-switching overhead. Analyze the evaluation results showing that GPREEMPT achieves low-latency preemption within 40 microseconds, comparable to executing only latency-critical tasks, while remaining applicable to non-idempotent workloads where reset-based mechanisms fail.
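
To make the benefit of overlapping preemption with data preparation concrete, here is a minimal back-of-the-envelope sketch in Python. It is not GPREEMPT's implementation: the function `launch_delay_us` and the 100-microsecond data-preparation time are hypothetical, chosen only for illustration, and the 40-microsecond preemption time is the upper bound quoted above. The model assumes the two phases are fully independent, so they cost their sum when run back to back and the longer of the two when overlapped.

```python
def launch_delay_us(data_prep_us: float, preempt_us: float, overlap: bool) -> float:
    """Delay before a latency-critical kernel can start, in microseconds.

    Hypothetical model: without overlap, preemption begins only after data
    preparation finishes, so the costs add. With overlap (the idea behind
    hint-based pre-preemption), both proceed concurrently, so only the
    longer phase is on the critical path.
    """
    if overlap:
        return max(data_prep_us, preempt_us)
    return data_prep_us + preempt_us


# Assumed numbers: 100 us of data preparation (illustrative only),
# 40 us of preemption (the upper bound reported in the description).
sequential = launch_delay_us(100.0, 40.0, overlap=False)
overlapped = launch_delay_us(100.0, 40.0, overlap=True)
print(f"sequential: {sequential} us, overlapped: {overlapped} us")
```

Under this simplified model, preemption drops off the critical path entirely whenever data preparation takes at least as long as the context switch.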