The most comprehensive online course on machine learning with imbalanced data. Learn about under-sampling, over-sampling, SMOTE and much more.
Discover the truth about SMOTE and other resampling methods.
If you're disappointed for whatever reason, you'll get a full refund.
Sole is a lead data scientist, instructor and developer of open source software. She created and maintains the Python library for feature engineering Feature-engine, which allows us to impute data, encode categorical variables, transform, create and select features. Sole is also the author of the book "Python Feature engineering Cookbook" by Packt editorial.
Welcome to Machine Learning with Imbalanced Datasets. In this course, you will learn multiple techniques to improve the performance of machine learning models trained with imbalanced datasets.
Imbalanced datasets are those typically used in classification problems where one of the target classes is extremely under-represented. When this happens, we talk about a class imbalance. The class with a small number of samples is called the minority class, and the class or classes with plenty of data are called the majority class or classes.
Imbalanced datasets are a common occurrence in data science. Examples of imbalanced datasets are those used for fraud detection or medical diagnosis.
Most machine learning algorithms assume balanced class distributions. Thus, training classifiers on imbalanced data will naturally bias the model towards the majority class.
In addition, because the number of samples for the minority class is small, rules to accurately predict these classes are hard to find. Thus, observations belonging to the minority class most often end up being misclassified by the classification models.
Fortunately, there are various ways in which we can improve the performance of classifiers trained on data with imbalanced classes, including resampling, cost-sensitive learning, and ensemble methods.
In this course, you will learn multiple methods to improve the performance of machine learning models trained on imbalanced data and decrease the misclassification of the minority class or classes.
The course is divided into the following sections:
You will learn suitable metrics to assess imbalanced classification models trained with imbalanced datasets. You will learn about the roc-curve and the roc-auc. You will create a confusion matrix, find true positives, true negatives, false positives, and false negatives, and then use them to calculate other metrics like precision, recall, and the f1-score. You will also learn about specific performance metrics to assess imbalanced classification models, like the imbalanced accuracy, among others.