Overview
This three-course specialization is built for engineers who have moved past the basics and are ready to work with large, modern deep learning architectures. You will go under the hood of Transformers and Diffusion Models, learning not just how they work but how to fine-tune and optimize them for specific use cases without needing a million-dollar compute cluster. Starting with advanced architectures, you will work with Vision Transformers, ConvNeXt, and modern training techniques including RMSNorm, SwiGLU activations, and mixed precision training using PyTorch Lightning and timm.
As you progress, you will study decoder-only Transformer internals, KV caching, and Parameter-Efficient Fine-Tuning with LoRA and QLoRA to fine-tune billion-parameter models on consumer GPUs.
Disclaimer: This is an independent educational resource created by Board Infinity for informational and educational purposes only. This course is not affiliated with, endorsed by, sponsored by, or officially associated with any company, organization, or certification body unless explicitly stated. The content provided is based on industry knowledge and best practices but does not constitute official training material for any specific employer or certification program. All company names, trademarks, service marks, and logos referenced are the property of their respective owners and are used solely for educational identification and comparison purposes.
Syllabus
- Course 1: Deep Learning: Advanced Backbones and Efficient GPU Training
- Course 2: Generative AI: Fine-Tuning LLMs and Diffusion Models
- Course 3: Deploying Deep Learning: Quantization, Serving, and Edge AI
Courses
- Master advanced deep learning architectures and efficient training techniques using PyTorch Lightning, timm, ConvNeXt, Vision Transformers, RoPE, SwiGLU, RMSNorm, and Weights & Biases. This course equips you to design, train, and benchmark modern backbones on limited GPU hardware for real-world production use.

  - Module 1 introduces modern backbone architectures, tracing the evolution from ResNets to ConvNeXt and Vision Transformers, covering patch embeddings, multi-head self-attention, and position encodings.
  - Module 2 dives into training dynamics and stabilization techniques including RMSNorm, SwiGLU activations, and Rotary Position Embeddings (RoPE) for stable, scalable training.
  - Module 3 focuses on efficient training on limited GPUs using mixed precision (FP16/BF16), gradient accumulation, efficient data pipelines, and distributed training with DDP/FSDP in Lightning.
  - Module 4 covers experiment tracking with TensorBoard and W&B, profiling FLOPs and throughput, and a hands-on ViT vs. CNN Showdown project with fine-tuning in timm.

  By the end of this course, you will:
  - Build and fine-tune ConvNeXt and Vision Transformer backbones using PyTorch Lightning and timm
  - Apply RMSNorm, SwiGLU, and RoPE to stabilize and scale deep transformer training
  - Implement mixed precision, gradient accumulation, and DDP/FSDP for efficient multi-GPU training
  - Design controlled CNN vs. ViT experiments with W&B tracking and PyTorch profiling
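For readers unfamiliar with two of the techniques named above, the following is a minimal sketch in plain Python (not course material, and not the PyTorch or timm API). The function names `rms_norm`, `silu`, and `swiglu_gate` are hypothetical, and the SwiGLU part shows only the gating step, assuming the gate and up projections have already been computed by linear layers that are omitted here.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Divide a vector by its root mean square, then apply a per-element gain."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [w * v * inv_rms for v, w in zip(x, weight)]


def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def swiglu_gate(gate, up):
    """Gating step of SwiGLU: SiLU(gate) multiplied element-wise with up.

    Assumption: `gate` and `up` are the outputs of two separate linear
    projections of the same input; those projections are not shown.
    """
    return [silu(g) * u for g, u in zip(gate, up)]


hidden = [3.0, 4.0]
normed = rms_norm(hidden, weight=[1.0, 1.0])
gated = swiglu_gate(gate=normed, up=[1.0, 1.0])
```

Unlike LayerNorm, the normalization above does not subtract the mean, which is the main simplification RMSNorm makes.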
- Production Deep Learning: Inference, Quantization & Edge Deployment is designed for ML engineers and developers who want to master the full deployment lifecycle, from compressing and quantizing models to serving them at scale using vLLM, Triton, ONNX, and llama.cpp.

  - Module 1 covers model compression fundamentals, including pruning, distillation, and INT8/INT4 quantization using AWQ and GPTQ, with a focus on the accuracy–latency tradeoff.
  - Module 2 dives into high-throughput serving architectures, exploring vLLM's PagedAttention, NVIDIA Triton, TensorRT, and scaling inference across GPU clusters with autoscaling patterns.
  - Module 3 focuses on CPU and edge deployment using ONNX Runtime, GGUF, and llama.cpp, plus multimodal inference with CLIP and LLaVA on resource-constrained devices.
  - Module 4 is a capstone project where you'll quantize a fine-tuned LLM, build a production API with vLLM, benchmark performance, and containerize your model with Docker for cloud and edge deployment.

  By the end of this course, you will:
  - Apply INT4/INT8 quantization techniques (AWQ, GPTQ, GGUF) to compress LLMs for production
  - Deploy high-throughput inference servers using vLLM, Triton, and ONNX Runtime
  - Run optimized models on GPU, CPU, and edge devices using llama.cpp and TensorRT
  - Build, benchmark, and containerize an end-to-end production-ready inference API
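As a rough illustration of the accuracy cost that INT8 quantization trades for smaller, faster models, here is a minimal sketch of symmetric per-tensor round-to-nearest quantization in plain Python. This is a simplified baseline and is not AWQ or GPTQ, which choose scales and roundings more carefully; the function names and the sample weights are made up for the example.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the stored integers and scale."""
    return [q * scale for q in quantized]


# Hypothetical weights, chosen only for illustration.
weights = [0.02, -0.51, 0.33, 1.27, -0.98]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Worst-case rounding error is bounded by half of one quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Each weight is stored in one byte instead of two or four, at the cost of an error of at most `scale / 2` per value under this scheme.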
- Master Generative AI with hands-on training in Large Language Models (LLMs), PEFT techniques (LoRA, QLoRA), and Diffusion Models using Hugging Face, diffusers, peft, trl, and bitsandbytes. This course takes you from the internals of decoder-only transformers to building a specialist fine-tuned LLM and generating high-quality, controllable images with ControlNet.

  - In Module 1, explore decoder-only transformer architectures, self-attention, causal masking, KV caching, and token flow mechanics.
  - Module 2 focuses on Parameter-Efficient Fine-Tuning (PEFT), where you'll implement LoRA, QLoRA, and 4-bit quantization to fine-tune large models on consumer GPUs using SFT pipelines.
  - Module 3 dives into diffusion models, covering forward/reverse processes, UNet, schedulers (DDIM, Euler, DPM++), and ControlNet conditioning.
  - Module 4 is a capstone where you'll build a Specialist LLM, from dataset creation to adapter export and evaluation.

  By the end of this course, you will:
  - Build and optimize decoder-only transformer pipelines with KV caching
  - Fine-tune 7B+ LLMs using LoRA, QLoRA, and SFT pipelines on limited hardware
  - Configure diffusers pipelines with ControlNet for controllable image generation
  - Train, export, and evaluate a domain-specialized LLM adapter end-to-end
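To give a sense of why LoRA makes fine-tuning feasible on limited hardware, the sketch below counts trainable parameters for one weight matrix and applies the low-rank update y = Wx + (alpha / r) * B(Ax) in plain Python. It does not use the peft library; the helper names and the 4096 x 4096 layer size are assumptions chosen for illustration.

```python
def lora_param_counts(d_out, d_in, rank):
    """Return (full matrix parameters, LoRA adapter parameters)."""
    full = d_out * d_in
    adapter = rank * (d_in + d_out)  # A is rank x d_in, B is d_out x rank
    return full, adapter


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha, rank):
    """Frozen base output plus the scaled low-rank correction."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * d for b, d in zip(base, delta)]


# Hypothetical layer size: one 4096 x 4096 projection with rank 8.
full, adapter = lora_param_counts(4096, 4096, rank=8)
trainable_fraction = adapter / full

# Tiny worked example with a rank-1 adapter.
W = [[1.0, 2.0], [3.0, 4.0]]
A = [[1.0, 0.0]]
B = [[1.0], [2.0]]
y = lora_forward(W, A, B, x=[1.0, 1.0], alpha=2.0, rank=1)
```

Only A and B are trained while W stays frozen. If B is initialized to zeros, which is the usual choice, the adapted layer starts out identical to the base layer.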
Taught by
Board Infinity