Enabling Heterogeneous Compute on Edge-AI Systems

Explore a conference talk on the challenges of deploying AI models on heterogeneous edge computing systems. Learn where current deployment frameworks fall short on modern edge-AI chips, which integrate multiple compute cores (CPUs, GPUs, and NPUs) within a single system. Discover a flexible, lightweight SDK designed to bridge this gap by supporting ahead-of-time compilation of models in several formats: PyTorch (both custom and Hugging Face models), TensorFlow Lite, and ONNX. Understand how the SDK abstracts the complexity of heterogeneous systems so that models can be deployed across multiple compute targets without specialized hardware knowledge.

Examine the Python-friendly API that lets developers dynamically switch workloads between cores, for example by balancing inference tasks between the CPU and GPU on ARM-based SoCs. See practical demonstrations of how this capability affects model throughput for both image classification and on-device generative AI workloads. The talk is aimed at developers working on time-critical and data-sensitive edge AI applications.