Supercharging Generative AI with PyTorch and Arm Neoverse

LinaroOrg via YouTube

Overview

Learn how to accelerate Generative AI workloads on Arm® Neoverse™ in this 23-minute talk, which presents an end-to-end solution combining Arm's software-level AI acceleration with KleidiAI's optimizations. Discover how KleidiAI's highly optimized 4-bit weight-only kernels with dynamic activation quantization are integrated directly into PyTorch, making advanced quantization techniques available through the official PyTorch distribution. Explore the new TorchAO quantizer API, which provides a standardized way to quantize any PyTorch model, including large language models and other GenAI models. Coupled with TorchChat for LLM serving, this approach lets developers deploy resource-efficient, high-performance LLMs at scale. The presentation reports generation speeds of over 66 tokens per second on models like Llama 2 (7B), compared to 12 tokens per second in the non-quantized state. The quantized figure far exceeds the human reading speed of 5-7 tokens per second. This performance boost makes running GenAI models on Arm not just viable but highly competitive for cloud applications: it reduces computational cost and energy consumption while enabling real-time, interactive AI applications that can efficiently serve multiple requests in large-scale deployments.
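
To make the quantization scheme named in the overview more concrete, here is a minimal sketch in plain Python of 4-bit weight-only quantization combined with dynamic activation quantization. It is an illustration under stated assumptions, not the KleidiAI kernels or the TorchAO API: the function names, the symmetric scaling, the group size of 4, and the sample values are all hypothetical choices made for this example.

```python
# Illustrative only: hypothetical helper names, not KleidiAI or TorchAO code.

def quantize_weights_int4(row, group_size):
    """Symmetric 4-bit quantization: integers in [-8, 7], one scale per group."""
    q, scales = [], []
    for start in range(0, len(row), group_size):
        group = row[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        scale = max_abs / 7.0 if max_abs > 0 else 1.0
        scales.append(scale)
        q.extend(max(-8, min(7, round(w / scale))) for w in group)
    return q, scales


def quantize_activations_int8(x):
    """Dynamic quantization: the scale is computed at run time from this input."""
    max_abs = max(abs(v) for v in x)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    return [max(-128, min(127, round(v / scale))) for v in x], scale


def quantized_dot(q_w, w_scales, q_x, x_scale, group_size):
    """Accumulate in integers within each group, then rescale to float."""
    total = 0.0
    for g, w_scale in enumerate(w_scales):
        start = g * group_size
        end = start + group_size
        acc = sum(a * b for a, b in zip(q_w[start:end], q_x[start:end]))
        total += acc * w_scale * x_scale
    return total


# Made-up sample data: one weight row and one activation vector.
weights = [0.12, -0.40, 0.33, 0.05, -0.91, 0.27, 0.64, -0.18]
activations = [1.5, -0.3, 0.8, 2.1, -1.2, 0.4, 0.9, -0.6]

q_w, w_scales = quantize_weights_int4(weights, group_size=4)  # done once, offline
q_x, x_scale = quantize_activations_int8(activations)         # done per input

exact = sum(w * x for w, x in zip(weights, activations))
approx = quantized_dot(q_w, w_scales, q_x, x_scale, group_size=4)
print(f"exact={exact:.4f} approx={approx:.4f}")
```

The weights are quantized once ahead of time and stored in 4 bits, while the activation scale is recomputed for every input. Production kernels typically use far larger groups and vectorized integer instructions, but the arithmetic follows the same pattern.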

Syllabus

LIS25 117 Supercharging Generative AI KleidiAI, PyTorch and Arm Neoverse

Taught by

LinaroOrg
