Overview
This course teaches methods for reducing the size of machine learning models without significantly degrading performance. After an introduction to the main techniques, tools, and real-world applications, the course covers post-training and training-time compression methods. Participants then explore how to build collaborative compression pipelines that improve model efficiency. In the project "UdaciSense - Optimized Mobile Object Recognition," learners apply these methods to build a practical, optimized solution for mobile devices. The course is aimed at AI practitioners who want to advance their skills in model optimization.
Syllabus
- Introduction to Model Compression: Techniques, Tools, and Use Cases
- Explore model compression's importance, major techniques, tools, and real-world applications to make AI models smaller, faster, and more efficient for deployment on diverse devices.
- Post-Training Model Compression Techniques
- Learn methods to compress trained models—quantization, pruning, and graph optimizations—for efficient deployment without retraining or original training data.
- Training-Time Model Compression Techniques
- Learn how training-time compression techniques like quantization-aware training, pruning, distillation, and neural architecture search (NAS) create efficient AI models by embedding optimization directly into training.
- Building Collaborative Compression Pipelines
- Discover how to design multi-stage, collaborative model compression pipelines by combining and sequencing techniques to maximize efficiency for cloud, mobile, and edge deployments.
- UdaciSense - Optimized Mobile Object Recognition
- In this project, you will compress a vision model for mobile deployment using techniques like pruning, quantization, and knowledge distillation to reduce size and latency while preserving accuracy.
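
As a rough illustration of what sequencing two of the techniques named in the syllabus can look like, the sketch below applies magnitude pruning followed by symmetric int8 quantization to a plain list of weights. It is not taken from the course materials: the function names, the 50% sparsity level, and the sample weights are all assumptions chosen for illustration, and real pipelines would normally rely on a framework's own pruning and quantization tools.

```python
# Minimal pure-Python sketch (hypothetical helpers, not course code):
# stage 1 prunes small weights, stage 2 quantizes the survivors to int8.

def magnitude_prune(weights, sparsity):
    """Zero out the given fraction of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)  # number of weights to drop
    by_magnitude = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(by_magnitude[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]

def quantize_int8(weights):
    """Symmetric per-tensor quantization; returns (int values, scale)."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest step and clamp to the symmetric int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map int8 values back to approximate float weights."""
    return [v * scale for v in q]

# Assumed sample weights, for illustration only.
weights = [0.82, -0.05, 0.31, -1.27, 0.02, 0.64]
pruned = magnitude_prune(weights, sparsity=0.5)   # stage 1: prune
q, scale = quantize_int8(pruned)                  # stage 2: quantize
restored = dequantize(q, scale)                   # approximate weights

print(pruned)
print(q)
```

Pruning runs first here so that the quantization scale is computed only from the weights that survive; other orderings are possible and may suit other models better.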
Taught by
Samantha Guerriero