Clustering and Dimensionality Reduction

via Train in Data

Overview

Course on Clustering and Dimensionality Reduction in Machine Learning.

Simplify, visualize, and understand high-dimensional data.

Access to video lectures, Jupyter notebooks, quizzes and more.

If you're disappointed for whatever reason, you'll get a full refund.

Dalibor is a data scientist and biostatistician with a Master’s degree in signal processing. He has analyzed complex biological data as well as economic data, where he studied market trends.

At work, he advocates for a balanced approach that combines theoretical learning with practical applications. Find out more about Dalibor on LinkedIn.

Welcome to a course on unsupervised machine learning designed to go deeper than typical introductory material.

While platforms like Udemy and Coursera offer introductory content, this course combines rigorous theory, hands-on implementation, and real-world case studies, including from-scratch implementations of algorithms such as k-means, UMAP, DBSCAN, and Louvain.

Unsupervised learning uncovers hidden patterns and structures in data without relying on pre-labeled examples, a task closely related to data mining. This approach is often essential when labeling data is impractical or impossible.
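To make the idea of finding structure without labels concrete, here is a minimal sketch of k-means, one of the clustering algorithms in the syllabus, written in plain Python. The toy points, the function name `kmeans`, and the choice to start from the first `k` points as centroids are assumptions made for this illustration only; they are not taken from the course materials.

```python
import math

def kmeans(points, k, n_iter=20):
    """Plain k-means on a list of equal-length numeric tuples."""
    # Assumption for this sketch: the first k points serve as starting centroids.
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids

# Hypothetical toy data: two visually separate groups, with no labels supplied.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (1.0, 0.5), (8.5, 9.5)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

The algorithm is told only how many groups to look for, not which point belongs where; the grouping comes from the distances between the points themselves.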

In this course, we’ll focus on two techniques: clustering and dimensionality reduction.

Mastering these methods is key to extracting actionable insights—a must-have skill in data science.

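These techniques power real-world applications across industries. The case studies in this course include clustering cells from single-cell RNA sequencing data, geospatial clustering, and clustering actors.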

We’ll break down unsupervised learning algorithms, exploring how they work, their strengths, and their limitations. But we won’t stop at theory: you’ll implement them yourself in coding demos, both from scratch and with Python libraries, and apply them in case studies.

By the end, you’ll have the skills to apply these techniques in your own projects. Whether you’re a practicing data scientist or a curious learner, this course will deepen your understanding of machine learning’s unsupervised frontier.

From zero to hero—no prior expertise required.

We designed this course so that even with minimal Python experience, you'll finish with the ability to analyze real data using clustering and dimensionality reduction—while advanced learners can dive straight into practical applications.

Syllabus

Introduction
- Introduction
- Unsupervised machine learning
- Course contents
- Learning tips
- Jupyter notebook overview
- Course slides
- How did you hear about us?
- Refer a friend program
Python basics
- Chapter agenda
- ---- Part 1 - basic python data types ----
- Numerical data types
- Boolean data type
- String data type
- Python lists - part 1
- Python lists - part 2
- Sets and tuples
- Dictionaries and "None"
- Truthiness
- ---- Part 2 - basic python functionalities ----
- Copying in python (shallow & deep copying)
- Unpacking iterable data types
- Python functions, *args and **kwargs
- Python functions - demo
- Lambda functions, scopes and decorators
- Python classes
- Python classes - demo
- While loops and loop control statements
- Comprehensions in python
- Chapter summary
- How are we doing?
Python data science libraries
- Chapter agenda
- ---- Part 1 - Numpy ----
- Indexing & slicing in numpy
- Indexing & slicing in numpy - demo
- Operations on single numpy arrays
- Operations on single numpy arrays - demo
- Operations between numpy arrays & broadcasting
- Operations between numpy arrays & broadcasting - demo
- Merging numpy arrays
- Data types in numpy
- Matrix operations in numpy
- ---- Part 2 - pandas ----
- Pandas indexing and slicing
- Creating data frames
- Pandas indexing and slicing - demo
- Operations on single data frames/series
- Operations on single data frames/series - demo
- Operations between data frames/series
- Operations between data frames/series - demo
- Other useful pandas functionalities
- Pandas data types
- Pandas data types - demo
- Pandas group by statement
- Pandas group by statement - demo
- ---- Part 3 - Data visualisations ----
- Matplotlib basics
- Seaborn basics
- Chapter summary
- How are we doing?
K-means clustering - part 1
- Chapter agenda
- K-means clustering algorithm
- Avoiding suboptimal solutions
- Demo: Implementing k-means clustering algorithm from scratch - part 1
- Demo: Implementing k-means clustering algorithm from scratch - part 2
- K-means in sklearn
- Data preprocessing for K-means
- Adjusted rand index
- Demo: Data preprocessing, k-means & adjusted rand index
- Inferring number of clusters with inertia knee method
- Silhouette scores (inferring number of clusters & analyzing cluster quality)
- Demo: inertia knee method & silhouette scores - part 1
- Demo: inertia knee method & silhouette scores - part 2
- Chapter summary
- How are we doing?
Principal component analysis (PCA)
- Chapter agenda
- Feature coordinate systems
- PCA and feature coordinate systems
- Intuition behind Principal Component Analysis
- PCA as a linear transformation of the data - introduction
- Linear transformations
- Eigenvectors and eigenvalues
- Change of basis
- Variance and covariance
- PCA from eigendecomposition perspective
- Principal component analysis for dimensionality reduction
- Demo: Performing PCA by using eigendecomposition
- Principal component analysis in sklearn
- Demo: PCA in sklearn (artificial data)
- Demo: PCA in sklearn (real data)
- Guidelines for choosing number of principal components
- Demo: Choosing number of principal components
- Chapter summary
- How are we doing?
Uniform Manifold Approximation and Projection for Dimension Reduction (UMAP)
- Chapter agenda
- Graph theory basics
- UMAP introduction
- Fuzzy set basics
- Gradient descent & stochastic gradient descent
- Sparse matrices with SciPy
- UMAP theory - part 1
- Demo: Implementing UMAP from scratch - part 1
- UMAP theory - part 2
- Speeding up python code with numba
- Demo: Implementing UMAP from scratch - part 2
- UMAP python package (umap-learn)
- Demo: Running UMAP with umap-learn python package
- Tuning UMAP parameters
- Demo: Tuning UMAP parameters
- UMAP caveats
- Demo: UMAP caveats
- Chapter summary
K-means clustering - part 2
- Chapter agenda
- Yellowbrick python library
- Characterizing clusters using data visualizations
- Demo: K-means, yellowbrick and cluster characterization
- Handling and encoding categorical features
- Encoding categorical data in python
- Measuring distance in categorical data
- Distance measures & distance metrics
- Calculating distance with SciPy & choosing distance measures in other algorithms
- Demo: Calculating distances with SciPy and sklearn (applied to categorical data)
- K-modes clustering algorithm
- K-modes python package
- Demo: Clustering categorical data using the K-means algorithm
- Demo: Clustering categorical data using the K-modes algorithm
- Mixed data & gower distance
- K-prototypes clustering algorithm
- K-prototypes python package
- Clustering customers demo prerequisites
- Demo: Clustering customers (mixed data)
- K-means algorithm pros & cons
- Demo: k-means algorithm limitations
- Chapter summary
Case study - clustering cells based on RNA data
- Case study intro
- Understanding the data - central dogma of molecular biology
- Understanding the data - single cell RNA sequencing
- Analyzing the data - removing low quality cells
- Analyzing the data - normalization, gene selection, PCA, UMAP and clustering
- Demo prerequisite - statistical testing basics
- Demo: analyzing the data - part 1
- Demo: analyzing the data - part 2
- Case study summary
Agglomerative hierarchical clustering
- Chapter agenda
- Hierarchical & agglomerative clustering introduction
- Dendrogram linkages & constructing dendrograms
- Cophenetic distance & cophenetic correlation
- Constructing dendrograms with SciPy
- Demo: Constructing dendrograms with SciPy
- Approaches for extracting clusters from dendrograms
- Agglomerative clustering with SciPy and sklearn
- Demo: Agglomerative clustering with SciPy & dendrogram manipulation
- Demo: Agglomerative clustering with sklearn
- Agglomerative clustering general guidelines
- Demo: Clustering cars (numerical data)
- Demo: Clustering animals (categorical data)
- Demo: Clustering cars (mixed data)
- Chapter summary
Density based clustering
- Chapter agenda
- Density based clustering - introduction
- DBSCAN clustering algorithm
- Nearest neighbors basics
- Nearest neighbors in sklearn
- Demo: implementing DBSCAN from scratch
- DBSCAN in sklearn
- Tuning DBSCAN parameters
- Demo: Tuning DBSCAN parameters
- Density based clustering validation (DBCV) - part 1
- Density based clustering validation (DBCV) - part 2
- Demo: Implementing DBCV from scratch - part 1
- Demo: Implementing DBCV from scratch - part 2 + DBCV python function
- DBSCAN general guidelines
- Demo: Clustering digits (mnist784) with DBSCAN
- Demo: Clustering animals with DBSCAN (categorical data)
- HDBSCAN clustering algorithm - part 1
- HDBSCAN clustering algorithm - part 2
- HDBSCAN clustering algorithm - part 3
- HDBSCAN python library (hdbscan)
- Demo: Implementing HDBSCAN (partial implementation)
- HDBSCAN general guidelines
- Demo: Clustering iris and digits (mnist784) with HDBSCAN
- Demo: Clustering animals with HDBSCAN (categorical data)
- Robust scaler (demo prerequisite)
- Demo: Clustering phones with HDBSCAN (mixed data)
- Case study: Geospatial clustering with DBSCAN and HDBSCAN - introduction
- Case study: Geospatial clustering with DBSCAN and HDBSCAN
- Chapter summary
Graph based clustering
- Chapter agenda
- Graphs, graph layouts and graph communities
- Igraph python library
- Demo: Igraph library capabilities
- Modularity in graph community structures
- Louvain clustering algorithm
- Louvain clustering - resolution parameter
- Demo: Implementing Louvain from scratch
- Analyzing community structure quality
- Igraph - other useful functionalities
- Demo: Community quality metrics
- Case study: Clustering actors
- Using graph clustering with numerical/categorical/mixed data (KNN graph)
- Shared neighbors graph (SNN graph)
- Creating KNN (k nearest neighbors) graphs with sklearn
- Graph clustering guidelines
- Louvain algorithm pros & cons
- Demo: Clustering digits (mnist784) with Louvain
- Demo: Clustering animals with Louvain (categorical data)
- Chapter summary
Wrap-Up
- Chapter agenda
- Single cell RNA case study - part 2
- Clustering and dimensionality reduction - summary
- Course end