

Aggressive LLMs Optimization - Making Them Work on Tiny Devices

Conf42 via YouTube

Overview

Explore aggressive optimization techniques for deploying large language models on resource-constrained devices in this 25-minute conference talk from Conf42 ML 2025. Learn how to tackle the fundamental challenge of running LLMs efficiently on tiny devices, using GPT-2 as a case study. Discover theoretical optimization approaches including quantization, pruning, and knowledge distillation before diving into practical research methodologies and experimental results. Understand how to combine multiple optimization methods for maximum efficiency gains, and gain insight into the trade-offs between model quality and computational requirements. Master the essential strategies for bringing powerful language models to edge devices and embedded systems, where memory and processing power are severely limited.
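To make the first of those techniques concrete, here is a minimal sketch of symmetric int8 post-training quantization, the idea of mapping float32 weights to 8-bit integers via a single scale factor. This is an illustration of the general concept only, not the speaker's implementation; real deployments would use a framework's quantization tooling.

```python
def quantize_int8(weights):
    """Map float weights to int8 using one symmetric scale factor."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # guard against all-zero weights
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]

weights = [0.42, -1.27, 0.08, 0.99]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
# Each int8 value takes 1 byte instead of 4 for float32: a 4x memory
# reduction, at the cost of a small rounding error per weight.
```

Pruning and distillation attack the same memory/compute budget from different angles (removing weights outright, or training a smaller student model), which is why the talk also covers combining the methods.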

Syllabus

00:00 Introduction and Speaker Introduction
00:31 The Challenge of Optimizing Large Language Models
01:47 Choosing the Right Model: GPT-2
03:48 Optimization Techniques: Theory
09:56 Practical Research and Experiments
18:42 Combining Optimization Methods
21:14 Conclusions and Final Takeaways

Taught by

Conf42

