Building Convolutional Neural Networks for Computer Vision

Overview

This course introduces Convolutional Neural Networks (CNNs), the most widely used type of neural network for image processing. You will learn the characteristics that make CNNs so useful for image processing, how they work internally, and how to build them from scratch for image classification tasks. You will study the most successful CNN architectures and their main characteristics, and apply them to custom datasets using transfer learning. You will also learn about autoencoders, an important architecture that underlies many modern CNNs, and how to use them for anomaly detection and image denoising. Finally, you will learn how to use CNNs for object detection and semantic segmentation.

Syllabus

Course Overview

Explore the course goals, meet your instructor, and discover how computer vision and CNNs enable computers to interpret images and solve real-world visual tasks.

Introduction to Computer Vision and CNNs

Explore how Convolutional Neural Networks (CNNs) revolutionize computer vision by preserving spatial structure and enabling translation-invariant image recognition.

Convolutions and Feature Extraction

Discover how convolutional neural networks extract image features by applying learnable filters, enabling efficient visual recognition through spatial awareness and translation invariance (the same filter detects a pattern wherever it appears in the image).
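
As a rough illustration of what "applying a filter" means, here is a minimal sketch in plain Python rather than PyTorch, so it runs without dependencies. The function name `conv2d_valid`, the toy image, and the hand-written edge filter are all assumptions made for this example. In a real CNN the filter values are learned, and the operation shown (no kernel flip) is what deep learning libraries usually call convolution.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the image patch under the kernel.
            total = 0
            for di in range(k_h):
                for dj in range(k_w):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Toy 4x4 image (illustrative): dark left half (0), bright right half (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-written vertical-edge filter (illustrative, not learned).
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d_valid(image, kernel)
for row in feature_map:
    print(row)
```

The feature map is nonzero only in the column where the dark-to-bright edge sits, which is the sense in which a filter "detects" a feature at a location.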

Implementing Convolutions with PyTorch

Learn to implement 2D convolutional layers in PyTorch, use custom filters to extract features from images, and visualize feature maps for foundational computer vision skills.

Kernels, Strides, and Padding

Learn how padding and stride control feature map size and detail in CNNs, preserve spatial information, and enable efficient, deeper models for robust computer vision tasks.
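
The effect of padding and stride on feature map size can be sketched with the standard output-size formula for one spatial dimension, floor((n + 2p - k) / s) + 1. The helper name `conv_output_size` and the 32-pixel input are assumptions chosen for illustration.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Output length along one spatial dimension of a convolution.

    Uses floor((n + 2p - k) / s) + 1, where n is the input size,
    p the padding on each side, k the kernel size, and s the stride.
    """
    return (input_size + 2 * padding - kernel_size) // stride + 1


# 3x3 kernel on a 32-pixel-wide input (illustrative sizes).
no_padding = conv_output_size(32, kernel_size=3)                     # shrinks the map
same_padding = conv_output_size(32, kernel_size=3, padding=1)        # keeps the size
strided = conv_output_size(32, kernel_size=3, stride=2, padding=1)   # halves the size

print(no_padding, same_padding, strided)
```

Padding of 1 with a 3x3 kernel keeps the spatial size unchanged, while a stride of 2 roughly halves it.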

Pooling in PyTorch

Learn how pooling layers in PyTorch downsample feature maps, boost efficiency, and build robust CNNs using max and average pooling techniques for image recognition tasks.
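
To make the downsampling concrete, here is a minimal sketch of non-overlapping max and average pooling in plain Python. The helper names `pool2d` and `mean` and the 4x4 feature map are assumptions for this example. The lesson itself uses PyTorch's pooling layers.

```python
def pool2d(feature_map, size=2, reduce=max):
    """Apply non-overlapping pooling (stride equal to window size)."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            # Collect the values inside the current size x size window.
            window = [
                feature_map[i + di][j + dj]
                for di in range(size)
                for dj in range(size)
            ]
            row.append(reduce(window))
        pooled.append(row)
    return pooled


def mean(values):
    """Arithmetic mean, used as the reduction for average pooling."""
    return sum(values) / len(values)


# Illustrative 4x4 feature map.
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 5, 6],
    [1, 1, 7, 8],
]

max_pooled = pool2d(feature_map, size=2, reduce=max)
avg_pooled = pool2d(feature_map, size=2, reduce=mean)
print(max_pooled)
print(avg_pooled)
```

Both variants reduce the 4x4 map to 2x2. Max pooling keeps the strongest response in each window, while average pooling summarizes the whole window.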

CNN Architectures

Learn how CNN architectures stack convolution, activation, and pooling layers to extract features and classify images, powering modern computer vision applications.

Building a CNN from Scratch

Learn to build, train, and visualize a CNN for image classification, interpret feature maps and filters, and refine models using data, architecture, and regularization techniques.

Data Augmentation

Learn how data augmentation expands training data using transformations, helping models generalize better and reducing overfitting by teaching invariance to real-world variations.

Data Augmentation Pipelines with PyTorch

Learn to build and apply data augmentation pipelines in PyTorch, creating robust vision models by chaining transforms for training and evaluation using torchvision and DataLoader.

Advanced CNN Training

Master advanced CNN training by tackling imbalanced data, optimizing learning rates, and using regularization techniques for robust, reliable, and generalizable deep models.

Advanced CNN Training in PyTorch

Master advanced CNN training with PyTorch: tackle overfitting using data augmentation, dropout, batch normalization, learning rate scheduling, and early stopping for robust models.
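
Of the techniques listed, early stopping is the easiest to sketch independently of any framework. The class below is a hypothetical helper, not a PyTorch API, and the validation losses are made-up numbers used only to exercise the logic.

```python
class EarlyStopping:
    """Signal a stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            # Improvement: remember it and reset the counter.
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation losses, one per epoch (illustrative only).
val_losses = [0.90, 0.70, 0.65, 0.66, 0.67, 0.68, 0.60]

stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.step(loss):
        stopped_at = epoch
        break

print(stopped_at, stopper.best_loss)
```

In a real training loop one would typically also save the model weights whenever `best_loss` improves, so that the best checkpoint can be restored after stopping.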

Famous Vision Architectures

Explore key vision architectures like LeNet, AlexNet, VGG, ResNet, ConvNeXt, and Vision Transformers, learning their innovations and how they shaped modern computer vision.

Transfer Learning and Fine-Tuning

Learn to leverage pretrained models with transfer learning through feature extraction and fine-tuning to quickly build accurate computer vision models with limited data.

Transfer Learning in PyTorch

Learn transfer learning in PyTorch: fine-tune pretrained models, adapt classifiers, handle real-world data challenges, and build custom image classifiers efficiently.

Explainable and Interpretable CNNs

Learn how visualization and attribution methods like feature maximization and Grad-CAM make CNNs explainable, interpretable, and more trustworthy in critical applications.

Application: Neural Style Transfer

Learn to blend the content of one image with the style of another using neural style transfer and a pretrained VGG19 network, creating unique, artistic images with deep learning.

Encoder-Decoder Networks

Explore encoder-decoder networks (autoencoders) that compress and reconstruct data, enabling image denoising, anomaly detection, and unsupervised feature learning.

Building and Using Autoencoders with PyTorch

Learn to build and train autoencoders with PyTorch for image reconstruction and denoising, using fully-connected and convolutional architectures on the MNIST dataset.

Object Detection

Discover object detection by learning how models find and classify objects in images using bounding boxes, multi-head networks, and metrics like IoU and mean Average Precision (mAP).
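
Intersection over Union (IoU) is simple enough to sketch directly. The function below assumes boxes are given as `(x1, y1, x2, y2)` corner coordinates with `x1 < x2` and `y1 < y2`. That box format and the example boxes are assumptions for illustration.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes in (x1, y1, x2, y2) form."""
    # Corners of the overlapping rectangle, if any.
    inter_x1 = max(box_a[0], box_b[0])
    inter_y1 = max(box_a[1], box_b[1])
    inter_x2 = min(box_a[2], box_b[2])
    inter_y2 = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes get zero intersection.
    inter_w = max(0, inter_x2 - inter_x1)
    inter_h = max(0, inter_y2 - inter_y1)
    intersection = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


predicted = (0, 0, 2, 2)      # illustrative predicted box
ground_truth = (1, 1, 3, 3)   # illustrative ground-truth box
score = iou(predicted, ground_truth)
print(score)
```

An IoU of 1.0 means the boxes coincide and 0.0 means they do not overlap. Metrics such as mAP count a detection as correct when its IoU with a ground-truth box exceeds a chosen threshold.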

Object Detection with PyTorch

Learn to use and fine-tune pretrained PyTorch models for object detection. Build YOLO from scratch, apply Faster R-CNN, process outputs, and visualize results on real-world images.

Image Segmentation

Learn how semantic segmentation assigns a class to every pixel using encoder-decoder networks like U-Net, combining context and spatial detail via skip connections for precise image mapping.

Image Segmentation with PyTorch

Learn semantic segmentation with PyTorch by building and training U-Net models to classify pixels, using theory, demos, and hands-on exercises for real-world image analysis tasks.