Feature Engineering for Machine Learning

Overview

A course on feature engineering for machine learning, and the most comprehensive one on the topic available online.

Transform your data and build better performing models.

If you're disappointed for whatever reason, you'll get a full refund.

Sole is a lead data scientist, instructor, and developer of open source software. She created and maintains the Python library Feature-engine, which allows us to impute data, encode categorical variables, and transform, create, and select features. Sole is also the author of the "Python Feature Engineering Cookbook," published by Packt.

Welcome to the most comprehensive course on feature engineering for machine learning available online.

In this course, you will learn everything you need to preprocess your datasets to train machine learning models like linear regression, logistic regression, decision trees, random forests and gradient boosting machines.

Feature engineering consists of using domain knowledge and statistical methods to create features that make machine learning algorithms work effectively.

Raw data is almost never suitable for training machine learning models. In fact, data scientists devote a lot of effort to data analysis, data engineering and preprocessing, and feature extraction to create the best features for their predictive models.

Feature engineering includes imputation of missing data, encoding of categorical variables, transformation or discretization of continuous variables, combination of variables, extraction of dates and times, and much more.
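To make a few of these steps concrete, here is a minimal sketch using pandas. The dataset and its column names (`age`, `city`, `signup`) are hypothetical and made up for illustration; this is not taken from the course notebooks. It shows median imputation, missing category imputation, one-hot encoding, and extraction of date parts.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset; the column names are assumptions for illustration.
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 31.0],
    "city": ["London", "Paris", None, "London"],
    "signup": pd.to_datetime(
        ["2021-01-15", "2021-06-30", "2022-03-01", "2022-12-25"]
    ),
})

# Imputation of missing data: replace missing ages with the median.
age_median = df["age"].median()
df["age"] = df["age"].fillna(age_median)

# Missing category imputation: flag missing cities with an explicit label.
df["city"] = df["city"].fillna("Missing")

# Encoding of categorical variables: one-hot encode the city.
dummies = pd.get_dummies(df["city"], prefix="city")

# Extraction of dates: derive month and day of week from the timestamp.
df["signup_month"] = df["signup"].dt.month
df["signup_dayofweek"] = df["signup"].dt.dayofweek

# Assemble the engineered features into a single table.
features = pd.concat(
    [df[["age", "signup_month", "signup_dayofweek"]], dummies], axis=1
)
print(features)
```

In a real project, parameters such as the median would be learned from the training set only and then applied to the test set, to avoid leaking information.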

In this course, you will learn about missing data imputation, encoding of categorical features, numerical variable transformation, discretization, and how to create new features from your dataset.

You have probably seen many courses on other learning platforms like Coursera or Udemy. In fact, this is the full version of the Udemy course. Why is this course special?

While most online courses will teach you the very basics of feature engineering, like imputing variables with the mean or transforming categorical features using one hot encoding, this course will teach you all of that, and much more.

Here, you will first learn the most popular techniques for variable engineering, like mean and median imputation, one-hot encoding, transformation with logarithm, and discretization. Then, you will discover more advanced methods that capture information while encoding or transforming your variables, to obtain better features and improve the performance of regression and classification models.
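As a rough illustration of the difference between the two groups of techniques, the sketch below applies a logarithm transformation and equal-frequency discretization, and then mean (target) encoding as one example of a method that captures information about the target while encoding. The data and the column names (`income`, `colour`, `target`) are hypothetical, and the plain pandas implementation is an assumption chosen for brevity rather than the course's own code.

```python
import numpy as np
import pandas as pd

# Hypothetical training data; names and values are made up for illustration.
train = pd.DataFrame({
    "income": [20_000, 35_000, 50_000, 120_000, 400_000, 28_000],
    "colour": ["red", "blue", "red", "green", "blue", "red"],
    "target": [0, 1, 0, 1, 1, 1],
})

# Logarithm transformation: compresses the long right tail of income.
train["income_log"] = np.log(train["income"])

# Equal-frequency discretization: 3 bins with the same number of rows each.
train["income_bin"] = pd.qcut(train["income"], q=3, labels=False)

# Mean encoding: replace each category with the mean target observed
# for that category in the training data.
mean_map = train.groupby("colour")["target"].mean()
train["colour_mean"] = train["colour"].map(mean_map)

print(train)
```

Mean encoding without smoothing can overfit categories with few observations, such as `green` in this toy example.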

Syllabus

Welcome
- Introduction
- Course curriculum
- Course requirements
- How to approach this course
- Setting up your computer
- Refer a friend program
Course material
- Course material
- Download Jupyter notebooks
- Download datasets
- Download presentations
- How did you hear about us?
Variable types
- Variables | Intro
- Numerical variables
- Categorical variables
- Date and time variables
- Mixed variables
- How are we doing?
- Exercise
Variable characteristics
- Variable characteristics
- Missing data
- Cardinality
- Rare labels
- Variable distribution
- Outliers
- Linear models assumptions
- Variable magnitude
- How are we doing?
- Summary table
- Additional reading resources
- Exercise
Missing data imputation - Basic
- Basic imputation methods
- Mean or median imputation
- Arbitrary value imputation
- Frequent category imputation
- Missing category imputation
- Adding a missing indicator
- Basic methods - considerations
- Basic imputation with pandas
- Basic imputation with pandas - demo
- Basic methods with Scikit-learn
- Mean or median imputation with Scikit-learn
- Arbitrary value imputation with Scikit-learn
- Frequent category imputation with Scikit-learn
- Missing category imputation with Scikit-learn
- Adding a missing indicator with Scikit-learn
- Imputation with GridSearch - Scikit-learn
- Basic methods with Feature-engine
- Mean or median imputation with Feature-engine
- Arbitrary value imputation with Feature-engine
- Frequent category imputation with Feature-engine
- Arbitrary string imputation with Feature-engine
- Adding a missing indicator with Feature-engine
- Wrapping up
- How are we doing?
- Exercise
- Added Treat: A Movie We Recommend 🍿
Missing data imputation - Alternative methods
- Alternative imputation methods
- Complete Case Analysis
- CCA - considerations with code demo
- End of distribution imputation
- Random sample imputation
- Random imputation - considerations with code
- Mean or median imputation per group
- CCA with pandas
- End of distribution imputation with pandas
- Random sample imputation with pandas
- Mean imputation per group with pandas
- CCA with Feature-engine
- End of distribution imputation with Feature-engine
- Random sample imputation with Feature-engine
- Wrapping up
- Imputation - Summary table
- Exercise
Multivariate imputation
- Multivariate Imputation
- KNN imputation
- KNN imputation - Demo
- MICE
- missForest
- MICE and missForest - Demo
- Additional reading resources
- Exercise
- Extra Treat: Our Reading Suggestion 📕
Categorical encoding - Basic methods
- Categorical encoding | Introduction
- One hot encoding
- One hot encoding with pandas
- One hot encoding with sklearn
- One hot encoding with Feature-engine
- One hot encoding with Category encoders
- Ordinal encoding
- Ordinal encoding with pandas
- Ordinal encoding with sklearn
- Ordinal encoding with Feature-engine
- Ordinal encoding with Category encoders
- Count or frequency encoding
- Count encoding with pandas
- Count encoding with Feature-engine
- Count encoding with Category encoders
- Unseen categories
- Wrapping up
Categorical encoding - monotonic
- Categorical encoding | Monotonic
- Ordered ordinal encoding
- Ordered ordinal encoding with pandas
- Ordered ordinal encoding with Feature-engine
- Mean encoding
- Mean encoding with pandas
- Mean encoding with Feature-engine
- Mean encoding with Category encoders
- Mean encoding plus smoothing
- Mean encoding plus smoothing - Category encoders
- Mean encoding plus smoothing - Feature-engine
- Weight of evidence (WoE)
- Weight of Evidence with pandas
- Weight of Evidence with Feature-engine
- Weight of Evidence with Category encoders
- Weight of evidence - gotchas
- Unseen categories
- Wrapping up
- Comparison of categorical variable encoding
- Additional reading resources
Categorical encoding - Rare labels
- Grouping rare labels
- One hot encoding of top categories
- OHE of top categories with pandas
- OHE of top categories with Feature-engine
- OHE of top categories with sklearn
- Rare label encoding
- Rare label encoding with pandas
- Rare label encoding with Feature-engine
- Wrapping up
- Categorical encoding - More...
- More Wisdom: Our Chosen Podcast Episode 🎧
Variable transformation
- Variable transformation - Introduction
- Variable transformation
- Box-Cox transformation
- Yeo-Johnson transformation
- Logarithm transformation with Numpy
- Reciprocal transformation with Numpy
- Square-root transformation with Numpy
- Power transformation with Numpy
- Box-Cox transformation with Scipy
- Yeo-Johnson transformation with Scipy
- Arcsin transformation with Numpy
- Logarithm transformation with sklearn
- Reciprocal transformation with sklearn
- Square-root transformation with sklearn
- Power transformation with sklearn
- Box-Cox transformation with sklearn
- Yeo-Johnson transformation with sklearn
- Arcsin transformation with sklearn
- Logarithm transformation with Feature-engine
- Reciprocal transformation with Feature-engine
- Square-root transformation with Feature-engine
- Power transformation with Feature-engine
- Box-Cox transformation with Feature-engine
- Yeo-Johnson transformation with Feature-engine
- Arcsin transformation with Feature-engine
- Wrapping up
- Additional reading resources
- Quiz
Discretization - Basic methods
- Discretization
- Discretization methods
- Equal-width discretization
- Equal-width discretization with pandas
- Equal-width discretization with sklearn
- Equal-width discretization with Feature-engine
- Equal-frequency discretization
- Equal-frequency discretization with pandas
- Equal-frequency discretization with sklearn
- Equal-frequency discretization with Feature-engine
- Arbitrary discretization
- Arbitrary discretization with pandas
- Arbitrary discretization with Feature-engine
- Discretization plus categorical encoding
- Discretization plus encoding | Demo
- Wrapping up
- Additional reading resources
Discretization - Alternative methods
- Discretization - section intro
- K-means discretization
- K-means discretization with sklearn
- Discretization with classification trees
- Discretization with decision trees using Scikit-learn
- Discretization with decision trees using Feature-engine
- Binarization
- Binarization with sklearn
- Additional reading resources
Outliers
- Outlier Engineering
- Outlier trimming with pandas
- Outlier trimming with Feature-engine
- Outlier capping with pandas
- Outlier capping with Feature-engine
- Arbitrary capping with Feature-engine
- Additional reading resources
Datetime variables
- Datetime variables
- Date features with pandas
- Time features with pandas
- Date and time features with Feature-engine
- Cyclical features
- Cyclical features with pandas
- Cyclical features with Feature-engine
Engineering mixed variables
- Mixed variables
- Mixed variables | Demo
Feature creation
- Feature creation
- Math functions
- Math functions with pandas
- Math functions with Feature-engine
- Relative functions with pandas
- Relative functions with Feature-engine
- Polynomial features
- Polynomial features demo
- Features from decision trees
Feature scaling
- Feature scaling
- Scaling and distributions
- Standardisation
- Standardisation | Demo
- Scaling to minimum and maximum values
- MinMaxScaling | Demo
- Mean normalisation
- Mean normalisation | Demo
- Maximum absolute scaling
- MaxAbsScaling | Demo
- Scaling to median and quantiles
- Robust Scaling | Demo
- Scaling to vector unit length
- Scaling to vector unit length | Demo
- Scaling categorical variables
- Additional reading resources
Assembling feature engineering pipelines
- Putting it all together
- Feature Engineering Pipeline
- Classification pipeline
- Regression pipeline
- Feature engineering pipeline with cross-validation
- More examples
Congratulations! You did it!
- Congratulations
- Next steps