Understanding How Large Language Models Generate Images - From Autoencoders to Multimodal LLMs
Neural Breakdown with AVB via YouTube
Overview
Explore how Large Language Models generate images in this 18-minute educational video, which builds from basic to advanced topics. It starts with fundamental concepts like latent space and autoencoders, then progresses through Vector-Quantized Variational Autoencoders (VQ-VAEs) and codebooks to modern multimodal models such as Google's Gemini and Parti and OpenAI's DALL-E. Learn how these text-based models generate images by understanding the underlying architecture and mechanisms. The video is supplemented with references to essential research papers, related educational content, and timestamps for easy navigation between topics, making it suitable for both beginners and those looking to deepen their understanding of AI image generation technology.
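As a rough illustration of the codebook idea mentioned above, the sketch below maps continuous latent vectors to the index of their nearest codebook entry, which is the kind of discrete token a text-based model can then predict. It is not taken from the video: the function names `quantize` and `dequantize`, the two-dimensional vectors, and the three-entry codebook are all hypothetical toy choices, and a real VQ-VAE learns both the encoder and the codebook.

```python
def quantize(vectors, codebook):
    """Return, for each vector, the index of its nearest codebook entry.

    Nearness is measured with squared Euclidean distance (an assumption
    made for this toy example).
    """
    tokens = []
    for v in vectors:
        distances = [
            sum((a - b) ** 2 for a, b in zip(v, entry)) for entry in codebook
        ]
        tokens.append(distances.index(min(distances)))
    return tokens


def dequantize(tokens, codebook):
    """Look each token index back up in the codebook."""
    return [codebook[t] for t in tokens]


# Hypothetical toy codebook with three 2-D entries.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]

# Hypothetical encoder outputs (continuous latents) for three image patches.
latents = [[0.1, -0.2], [0.9, 1.2], [0.2, 0.8]]

tokens = quantize(latents, codebook)
print(tokens)  # [0, 1, 2]
print(dequantize(tokens, codebook))
```

In this picture, an image becomes a short sequence of integer tokens, and generating an image amounts to predicting such a sequence and decoding the looked-up codebook vectors back into pixels.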
Syllabus
- Intro
- Autoencoders
- Latent Spaces
- VQ-VAE
- Codebook Embeddings
- Multimodal LLMs generating images
Taught by
Neural Breakdown with AVB