
YouTube

Boosting vLLM Inference on Huawei NPU with Ray Compiled Graphs

Anyscale via YouTube

Overview

Learn how to accelerate vLLM inference on Huawei Ascend NPUs using Ray Compiled Graphs in this 11-minute conference talk from Ray Summit 2025. Huawei Canada engineers present a new extension to Ray Compiled Graphs that delivers more than 50% higher performance than existing NPU-based solutions. The work serves both as a production-grade optimization and as a proof of concept for SPMD-mode support in the upcoming vLLM V1 integration with Ray.

The central design element is a new NPU Store, modeled on Ray's GPU Store, which streamlines tensor movement and improves cross-device efficiency in heterogeneous pipelines. The talk examines three key contributions. The Multi-Accelerator Support Layer is a generic abstraction layer, compatible with the GPU Store, for NCCL-style peer-to-peer tensor transfers across accelerators. The SPMD-Mode NPU Backend uses advanced operator fusion and optimized memory scheduling for high-performance inference on Huawei Ascend NPUs. The Optimized Cross-Device Tensor Transfer system includes a prototype NPU Store that maximizes throughput and minimizes latency when moving tensors between CPU and NPU.

Finally, see how this design simplifies future integration of other hardware backends, such as TPUs, other NPUs, and emerging accelerators, while unlocking substantial speedups for large-scale LLM inference in hybrid inference and post-training pipelines.

Syllabus

Boosting vLLM Inference on Huawei NPU with Ray Compiled Graphs — Huawei | Ray Summit 2025

Taught by

Anyscale
