Overview
This specialization is intended for learners with foundational knowledge in Python and Machine Learning who seek to develop a deep, hands-on understanding of Generative AI and its real-world applications. Across three courses, you’ll explore the full spectrum of GenAI, from understanding what it is and how it works to building and deploying advanced systems responsibly. You’ll begin with the foundations of Large Language Models, learning about tokenization, embeddings, and the Transformer architecture that powers modern AI tools. Then, you’ll dive into multimodal generation, discovering how models such as VAEs, GANs, Transformers, and Diffusion networks create and manipulate audio, image, and video content. In the final course, you’ll move from theory to implementation: building a Transformer from scratch in Python, extending its capabilities with Retrieval-Augmented Generation (RAG), and developing agentic AI systems using Google’s Agent Development Kit (ADK) on GCP. Throughout, you’ll engage with ethical considerations around bias, transparency, copyright, and responsible deployment.
Syllabus
- Course 1: Introduction to Generative AI: Concepts and Techniques
- Course 2: Generative AI for Audio and Images: Models and Applications
- Course 3: Building and Deploying Generative AI Models
Courses
- Introduction to Generative AI: Concepts and Techniques
This four-module course gives you a clear, practical foundation in Generative AI, from what it is and where it’s used to how modern models work and how to apply them responsibly. You’ll start with the big picture: GenAI capabilities across text, image, audio, and video, plus real-world industry applications. Then you’ll dive into the science behind today’s Large Language Models: text representation (tokenization, embeddings) and the Transformer architecture (positional encoding, self-attention, encoder/decoder flow). Next, you’ll get hands-on with LLMs and workflows: crafting effective prompts, calling models via web UIs and APIs, running models locally (e.g., via Ollama), and extending capabilities with Retrieval-Augmented Generation (RAG) and fine-tuning. Finally, you’ll examine challenges and responsible practice, including copyright, privacy and security, explainability, and questions of ownership in the GenAI era. Designed for learners with basic Machine Learning and Python familiarity, the course blends short lessons with labs, quizzes, and exercises. By the end, you’ll understand the core concepts and architectures behind GenAI, along with its limitations and a strong sense of ethical and responsible use. By the end of this course, learners will be able to:
- Explain how generative AI spans text, image, audio, and video, and assess real industry workflows where it creates value.
- Trace the evolution of language modeling from probabilistic/NLP approaches to Transformers, and justify why attention overcomes prior limitations.
- Understand tokenization and word embeddings, and reason about how these representations affect model behavior.
- Decompose a Transformer block and follow tensors through self-attention, MLPs, and normalization to explain how representations are formed and refined.
- Operate LLMs via web UIs, APIs, and locally with Ollama; write minimal inference code; improve outputs using prompt patterns; and become familiar with RAG and fine-tuning as possible next steps.
- Identify, analyze, and explain LLM shortcomings such as bias, hallucination, ownership, and prompt injection by formulating user-level guidelines, organizational processes, and governance policies.
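To give a flavor of the self-attention mechanism covered in this course, here is a toy sketch in plain Python. It uses 2-dimensional token vectors and skips the learned query/key/value projections a real Transformer applies; the names and values are illustrative, not the course's implementation:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def self_attention(queries, keys, values):
    """Scaled dot-product attention: for each query, score every key,
    normalize the scores with softmax, and return the weighted sum
    of the value vectors."""
    d = len(keys[0])  # key dimension, used for scaling
    outputs = []
    for q in queries:
        scores = [dot(q, k) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
        outputs.append(out)
    return outputs

# Three toy "token" embeddings; in a real Transformer the queries,
# keys, and values would be separate learned projections of these.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = self_attention(tokens, tokens, tokens)
```

Because the attention weights for each query sum to 1, every output vector is a convex combination of the value vectors, which is why attention "mixes" information across the sequence.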
- Generative AI for Audio and Images: Models and Applications
Generative AI for Audio and Images: Models and Applications offers an in-depth exploration of how modern generative models such as Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), Transformers, and Diffusion models are used to create, manipulate, and enhance audio, image, and video content. Learners examine the architectures, training processes, and use cases of these models across different modalities, gaining both conceptual understanding and practical insight through hands-on activities. The course also highlights the ethical and societal implications of generative AI, including bias, transparency, intellectual property, and the challenges of deepfake technologies. By covering foundational theory as well as state-of-the-art approaches and applications, this course prepares learners to apply and develop generative AI creatively and responsibly for the audio and image modalities. By the end of this course, learners will be able to:
- Outline core concepts, challenges, and the history of AI-generated audio.
- Analyze important foundational audio generation models, such as variational and vector-quantized autoencoders (VAE and VQ-VAE).
- Examine how these models integrate with the latest GenAI technologies to form hybrid, state-of-the-art Transformer- and diffusion-based audio generation systems.
- Study the architecture and functionality of Generative Adversarial Networks (GANs) and their variations.
- Implement and train GAN models for creating and enhancing visual content.
- Explore cutting-edge techniques such as diffusion models and Transformers for image and video creation.
- Discuss the ethical considerations regarding generative AI for audio and images.
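As a small taste of the VAE material above, the reparameterization trick, the step that keeps latent sampling differentiable with respect to the encoder's outputs during training, fits in a few lines of plain Python. The vectors and values below are illustrative, not drawn from the course:

```python
import math
import random

random.seed(0)

def reparameterize(mu, log_var):
    """VAE reparameterization trick: instead of sampling z directly
    from N(mu, sigma^2), sample eps ~ N(0, 1) and compute
    z = mu + sigma * eps, so gradients can flow through mu and
    log_var during training."""
    sigma = [math.exp(0.5 * lv) for lv in log_var]
    eps = [random.gauss(0.0, 1.0) for _ in mu]
    return [m + s * e for m, s, e in zip(mu, sigma, eps)]

# Toy latent distribution parameters an encoder might output.
mu = [0.5, -1.0]
log_var = [0.0, 0.0]  # log variance 0, i.e. sigma = 1 per dimension
z = reparameterize(mu, log_var)
```

As a sanity check on the formula: driving `log_var` toward a very negative value makes `sigma` vanish, so the sample collapses onto `mu`.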
- Building and Deploying Generative AI Models
Transition from theoretical concepts to production-ready engineering in this hands-on course, the final part of the "Fundamentals of Generative AI" specialization. Designed for learners ready to move beyond theory, this course focuses entirely on construction: you won't just learn about Large Language Models (LLMs); you will build, refine, and deploy them. We start at the foundational level, coding different types of Transformer architectures from scratch using PyTorch. Through high-performance training with Automatic Mixed Precision and ROUGE/BLEU evaluation, you will learn the techniques to scale custom components into optimized systems. By utilizing pre-trained models and weighing performance trade-offs, you will gain the insight needed to select the most efficient path for large-scale deployment. Moving to applied architecture, you will master Retrieval-Augmented Generation (RAG) using LangChain, learning to evaluate pipelines and apply advanced techniques such as different chunking strategies, reranking and compression, and query transformation. You'll also navigate model selection as well as the critical trade-offs between RAG and fine-tuning. Finally, you will step into the future of AI by developing autonomous Agents. You will bridge the gap between development and production by setting up a professional workflow with Poetry and deploying a Summarizer AI Agent directly to the Google Cloud Platform (Vertex AI). By the end of this course, you will possess a tangible portfolio of code and a live deployment, proving your ability to engineer robust Generative AI solutions.
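The RAG ideas above, chunking a document and retrieving the most relevant chunk to augment a prompt, can be illustrated with a toy sketch in plain Python. Word overlap stands in for the embedding similarity a real LangChain pipeline would compute; all names and strings here are illustrative:

```python
def chunk(text, size=40, overlap=10):
    """Split text into fixed-size character chunks with overlap,
    one of the simplest chunking strategies a RAG pipeline can use."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

def retrieve(query, chunks, k=1):
    """Rank chunks by word overlap with the query; a real pipeline
    would rank by vector similarity between embeddings instead."""
    q = set(query.lower().split())
    scored = sorted(chunks,
                    key=lambda c: len(q & set(c.lower().split())),
                    reverse=True)
    return scored[:k]

# Toy document and query; the retrieved chunk is prepended as context.
doc = ("Transformers use self-attention. "
       "RAG augments a prompt with retrieved context. "
       "Diffusion models denoise step by step.")
question = "What does RAG add to a prompt?"
context = retrieve(question, chunk(doc, 60, 0))
prompt = f"Context: {context[0]}\n\nQuestion: {question}"
```

Chunk size and overlap are exactly the kind of knobs the course's chunking-strategy material examines: too small and retrieved chunks lose context, too large and irrelevant text dilutes the prompt.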
Taught by
Amreen Anbar, Anahita Doosti, and Soroush Razavi