QiMeng-Xpiler: Transcompiling Tensor Programs for Deep Learning Systems with a Neural-Symbolic Approach

Learn about QiMeng-Xpiler in this 11-minute conference talk from OSDI '25. QiMeng-Xpiler is a novel transcompiler that automatically translates tensor programs across heterogeneous deep learning systems. Discover how it addresses a practical burden: each platform, such as a GPU or an ASIC, normally needs its own hand-written low-level tensor program. QiMeng-Xpiler tackles this by combining large language models (LLMs) with symbolic program synthesis in a neural-symbolic approach.

Explore the key insight: the code-generation capability of LLMs makes search-based symbolic synthesis computationally tractable. Translation proceeds through multiple LLM-assisted compilation passes, and each pass uses a pre-defined meta-prompt to drive a specific program transformation. Understand how efficient symbolic program synthesis then repairs the incorrect code snippets, which are limited in scale. Examine the hierarchical auto-tuning approach, which systematically explores both tuning parameters and sequences of transformation passes to achieve high performance.

Review experimental results showing an average accuracy of 95% in correctly translating tensor programs across four distinct deep learning systems: Intel DL Boost with VNNI, NVIDIA GPU with CUDA, AMD MI with HIP, and Cambricon MLU with BANG. These results make "Write Once, Run Anywhere" for tensor programs a practical reality.