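In this course, you'll learn to recognize and address class imbalance in datasets. You'll explore practical undersampling techniques (random undersampling, Tomek links) and oversampling techniques (random oversampling, SMOTE, ADASYN), visualize their effects, and combine resampling with model training in a pipeline. By the end, you'll be able to train models that perform better on imbalanced data than a baseline trained without resampling.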
Syllabus
- Unit 1: Identifying and Understanding Data Imbalance
  - Counting Classes to Spot Imbalance
  - Visualizing Imbalance with Bar Plots
  - Quantifying Imbalance with Class Percentages
- Unit 2: Undersampling Techniques for Handling Unbalanced Datasets
  - Balancing Classes with Random Undersampling
  - Cleaning Boundaries with Tomek Links
  - Comparing Undersampling Techniques Side by Side
  - Applying Undersampling to Real Data
- Unit 3: Oversampling Techniques for Handling Unbalanced Datasets
  - Counting the Balance in Random Oversampling
  - Custom SMOTE for Partial Rebalancing
  - Fine-Tuning ADASYN for Subtle Rebalancing
  - SMOTE with Real World Data
- Unit 4: Training a Better Model with Resampling Techniques
  - Training a Baseline Logistic Regression Model
  - Building a Resampling Pipeline
  - Training Models with Resampled Data
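
To give a feel for the first steps in the syllabus (counting classes, computing class percentages, and random oversampling), here is a minimal sketch. It uses only the Python standard library; the course itself may use a dedicated resampling library instead. The function names `class_percentages` and `random_oversample` and the toy 90/10 dataset are hypothetical, chosen only for illustration.

```python
import random
from collections import Counter


def class_percentages(labels):
    """Return each class's share of the dataset as a percentage."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: 100.0 * n / total for cls, n in counts.items()}


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every
    class has as many rows as the largest class (illustrative helper)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Rows that belong to this class in the original data.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data (an assumption for this sketch): 90 majority rows, 10 minority rows.
labels = [0] * 90 + [1] * 10
rows = [[float(i)] for i in range(100)]

print(class_percentages(labels))
new_rows, new_labels = random_oversample(rows, labels)
print(Counter(new_labels))
```

Random oversampling only repeats existing minority rows. Techniques such as SMOTE and ADASYN, covered in Unit 3, generate synthetic minority samples instead.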