Advanced Model Compression Techniques

Overview

This course teaches methods for reducing the size of machine learning models without significantly degrading accuracy. It begins with an introduction to compression techniques, tools, and real-world applications, then covers post-training and training-time compression methods. Participants also learn how to build collaborative compression pipelines that improve model efficiency. In the project "UdaciSense - Optimized Mobile Object Recognition," learners apply these techniques to build a practical, optimized solution for mobile devices. The course is aimed at AI practitioners who want to deepen their skills in model optimization.

Syllabus

Introduction to Model Compression: Techniques, Tools, and Use Cases

Explore why model compression matters, along with the major techniques, tools, and real-world applications that make AI models smaller, faster, and more efficient to deploy on diverse devices.

Post-Training Model Compression Techniques

Learn methods for compressing already-trained models, including quantization, pruning, and graph optimizations, for efficient deployment without retraining or access to the original training data.
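
To make two of these ideas concrete, the following is a minimal, dependency-free sketch of symmetric int8 quantization and magnitude pruning applied to a flat list of weights. The function names and the per-tensor scaling scheme are illustrative assumptions, not the course's implementation; production toolkits typically add calibration, per-channel scales, and structured sparsity.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127].

    Illustrative helper (hypothetical name), not a library API.
    """
    max_abs = max(abs(w) for w in weights)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the integer codes."""
    return [v * scale for v in q]


def magnitude_prune(weights, sparsity):
    """Zero out the given fraction of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)
    # Indices of the k smallest-magnitude weights.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


weights = [0.5, -1.27, 0.01, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
pruned = magnitude_prune(weights, sparsity=0.5)
```

Neither function needs training data or gradient updates, which is what makes this family of methods "post-training". The rounding error of each restored weight is bounded by half the scale.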

Training-Time Model Compression Techniques

Learn how training-time compression techniques such as quantization-aware training, pruning, knowledge distillation, and neural architecture search (NAS) create efficient AI models by building optimization directly into training.
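
As one illustration of a training-time technique, the sketch below computes a commonly used knowledge distillation loss for a single example: a temperature-softened KL term that pulls the student toward the teacher, blended with ordinary cross-entropy on the true label. The function names and the default values of `temperature` and `alpha` are assumptions chosen for illustration, not values prescribed by the course.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Blend a soft-target KL term with hard-label cross-entropy.

    alpha weights the teacher-matching term; (1 - alpha) weights the label term.
    Assumes moderate logit values so no probability underflows to zero.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) on the softened distributions.
    kl = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    # Standard cross-entropy against the ground-truth class (temperature 1).
    ce = -math.log(softmax(student_logits)[label])
    # The temperature**2 factor keeps the soft-term gradient scale comparable.
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


loss = distillation_loss([3.0, 1.0, 0.2], [0.2, 1.0, 3.0], label=2)
```

In a real training loop this quantity would be computed per batch by an autodiff framework and minimized with respect to the student's parameters only.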

Building Collaborative Compression Pipelines

Discover how to design multi-stage, collaborative model compression pipelines by combining and sequencing techniques to maximize efficiency for cloud, mobile, and edge deployments.
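
The idea of sequencing techniques can be sketched as a list of named stages applied in order, with a simple metric recorded after each one. Everything here (the stage functions, `run_pipeline`, and the use of weight sparsity as the tracked metric) is a hypothetical illustration on a flat, non-empty list of weights, not a real toolkit's API.

```python
def prune_stage(weights, sparsity=0.5):
    """Zero the smallest-magnitude fraction of weights."""
    k = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    smallest = set(order[:k])
    return [0.0 if i in smallest else w for i, w in enumerate(weights)]


def quantize_stage(weights, levels=127):
    """Snap each weight to a symmetric grid of 2 * levels + 1 values."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    if max_abs == 0.0:
        return list(weights)
    scale = max_abs / levels
    return [round(w / scale) * scale for w in weights]


def run_pipeline(weights, stages):
    """Apply stages in order and record the sparsity after each one.

    Assumes a non-empty weight list.
    """
    report = []
    for name, stage in stages:
        weights = stage(weights)
        zeros = sum(1 for w in weights if w == 0.0)
        report.append((name, zeros / len(weights)))
    return weights, report


compressed, report = run_pipeline(
    [0.5, -1.27, 0.01, 0.9],
    [("prune", prune_stage), ("quantize", quantize_stage)],
)
```

Ordering matters in such pipelines. In this sketch, pruning first is safe because symmetric quantization maps zero exactly to zero, so the sparsity introduced by the first stage survives the second.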

UdaciSense - Optimized Mobile Object Recognition

In this project, you will compress a vision model for mobile deployment using techniques like pruning, quantization, and knowledge distillation to reduce size and latency while preserving accuracy.