Overview
Learn about a novel transcompiler that automatically translates tensor programs across heterogeneous deep learning systems in this 11-minute conference talk from OSDI '25. Discover how QiMeng-Xpiler addresses the challenge of developing multiple low-level tensor programs for different platforms, such as GPUs and ASICs, by combining large language models (LLMs) with symbolic program synthesis in a neural-symbolic approach. Explore the key insight: leveraging the code generation capabilities of LLMs makes search-based symbolic synthesis computationally tractable. The translation runs as multiple LLM-assisted compilation passes, each driven by pre-defined meta-prompts for program transformation. Understand how symbolic program synthesis efficiently repairs incorrect code snippets at a limited scale, and examine the hierarchical auto-tuning approach that systematically explores parameters and sequences of transformation passes for high performance. Review experimental results demonstrating 95% average accuracy in correctly translating tensor programs across four distinct deep learning systems: Intel DL Boost with VNNI, NVIDIA GPU with CUDA, AMD MI with HIP, and Cambricon MLU with BANG, making "Write Once, Run Anywhere" for tensor programs a practical reality.
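The propose-then-repair idea in the overview can be pictured with a toy Python sketch. This is not QiMeng-Xpiler's implementation: `mock_llm_propose`, `make_candidate`, `repair`, and the single tile-size "hole" are all hypothetical names and simplifications chosen for illustration. A stand-in for the LLM proposes a tiled vector add with a wrong tile size, unit tests against the source program detect the error, and a small enumerative search over the one hole repairs it.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Vec = List[int]
Test = Tuple[Vec, Vec]

def reference_add(a: Vec, b: Vec) -> Vec:
    """Source program: element-wise vector addition."""
    return [x + y for x, y in zip(a, b)]

def make_candidate(tile: int) -> Callable[[Vec, Vec], Vec]:
    """Hypothetical target-program template with one hole: the inner tile width."""
    def tiled_add(a: Vec, b: Vec) -> Vec:
        n = len(a)
        out = [0] * n
        for start in range(0, n, 4):  # outer loop advances 4 elements per step
            # Inner loop covers `tile` elements; too small a tile skips elements.
            for i in range(start, min(start + tile, n)):
                out[i] = a[i] + b[i]
        return out
    return tiled_add

def mock_llm_propose() -> int:
    """Stand-in for an LLM pass (assumption): returns a plausible but wrong tile."""
    return 3

def passes_tests(candidate: Callable[[Vec, Vec], Vec], tests: List[Test]) -> bool:
    """Check the candidate against the source program on every test input."""
    return all(candidate(a, b) == reference_add(a, b) for a, b in tests)

def repair(tests: List[Test], search_space: Iterable[int]) -> Optional[int]:
    """Small-scale enumerative search over the hole; first passing value wins."""
    for tile in search_space:
        if passes_tests(make_candidate(tile), tests):
            return tile
    return None

def translate(tests: List[Test]) -> Optional[int]:
    """Accept the proposal if it verifies, otherwise fall back to repair."""
    guess = mock_llm_propose()
    if passes_tests(make_candidate(guess), tests):
        return guess
    return repair(tests, range(1, 9))

TESTS: List[Test] = [(list(range(8)), [10] * 8)]

if __name__ == "__main__":
    print("repaired tile size:", translate(TESTS))
```

The search stays cheap here because the proposal fixes everything except one small parameter, which mirrors the overview's point that LLM output narrows what the symbolic step has to explore.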
Syllabus
OSDI '25 - QiMeng-Xpiler: Transcompiling Tensor Programs for Deep Learning Systems with a...
Taught by
USENIX