Class Central is learner-supported. When you buy through links on our site, we may earn an affiliate commission.

Coursera

Stream & Unify Data Schemas with CDC

Coursera via Coursera

Overview

Imagine deploying schema changes with confidence—knowing your pipeline will handle them gracefully, consumers will stay healthy, and your data will stay consistent. That's the difference between hoping your CDC pipeline works and knowing it will.

In this course you will learn how to build a working, vendor-neutral CDC pipeline and a single, unified table from evolving source schemas. Starting with Debezium streaming changes from Postgres/MySQL into Kafka, you will use Schema Registry to enforce compatibility, then apply streaming SQL in Flink (or ksqlDB) to map, cast, and merge divergent fields into a canonical model. Finally, you will persist results to an Apache Iceberg table and query it instantly with Trino. Along the way, you'll learn practical strategies to manage schema drift, choose compatibility modes (backward/full), and avoid breaking downstream consumers. Everything runs locally with Docker so you can reproduce it anywhere and take the same patterns to your cloud stack later.

This course is designed for engineers working with Kafka, Debezium, and streaming SQL who need reliable schema evolution and canonical modeling skills. Learners should have basic SQL and Docker skills, along with some familiarity with Kafka or streaming concepts. By the end of the course, you will be able to implement a small end-to-end CDC pipeline that streams from a source DB and unifies evolving schemas into a single queryable table.
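The map/cast/merge step described above might look something like the following in streaming SQL. This is only an illustrative sketch, not course material: the table and column names (`orders_v1`, `orders_v2`, `orders_canonical`, `legacy_cust`, `amount_cents`) are hypothetical stand-ins for two divergent source schemas being unified into one canonical model.

```sql
-- Illustrative sketch: unify two divergent source schemas into one canonical table.
-- All table and column names here are hypothetical.
INSERT INTO orders_canonical
SELECT
  CAST(order_id AS BIGINT)                   AS order_id,     -- normalize key type
  COALESCE(customer_id, legacy_cust)         AS customer_id,  -- merge a renamed field
  CAST(amount_cents AS DECIMAL(12, 2)) / 100 AS amount,       -- unify units (cents -> dollars)
  order_ts                                   AS event_time
FROM orders_v1
UNION ALL
SELECT
  order_id,
  customer_id,
  amount,      -- this source already stores dollars
  order_ts
FROM orders_v2;
```

The pattern is the same whether it runs in Flink SQL or ksqlDB: cast types to a canonical shape, coalesce renamed fields, and union the results into one target stream.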

Syllabus

  • CDC Foundations: Building Your First Streaming Pipeline
    • Deploy a local Debezium, Kafka, Schema Registry, and Flink/ksqlDB stack to observe row-level changes in real time. Intentionally modify the source schema, then employ streaming SQL to map, cast, and coalesce fields into a canonical table. Perform upserts using stable keys and verify the data is correctly stored in Iceberg. By the conclusion, you will have established an operational CDC loop and a unified, queryable dataset.
  • Operate the Pipeline: Registry Rules & Recovery
    • Learn to prevent consumer disruptions by enforcing compatibility at both the subject and global levels. We will deliberately deploy an incompatible schema, observe the failure, and proceed safely using defaults and transitive modes. Implement practical safeguards such as CI schema checks, dead-letter queues (DLQs), alerts, and lag probes to ensure issues are promptly identified and contained. The emphasis is on repeatable recovery, not heroics.
  • Canonical Models, Iceberg Sinks & Fast Queries
    • Develop a robust canonical model encompassing naming conventions, data types and units, nullability, and soft delete mechanisms, and store it in Iceberg on MinIO utilizing streaming upserts. Perform immediate queries with Trino and employ time-travel features for validation or debugging regressions. The project involves constructing a denormalized “latest per customer” view for analytical purposes, as well as discussing partitioning strategies, equality deletes, and data compaction. Participants will acquire scalable patterns suitable for deployment from laptops to cloud environments.
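The "latest per customer" view and time-travel validation described in the syllabus could be sketched in Trino SQL against an Iceberg table. The catalog, schema, and table names (`iceberg.cdc.customers`) and the timestamp are illustrative assumptions, not taken from the course:

```sql
-- Illustrative sketch: query the Iceberg sink with Trino (names are hypothetical).
-- Latest status per customer, taking the value with the newest event_time:
SELECT customer_id, max_by(status, event_time) AS latest_status
FROM iceberg.cdc.customers
GROUP BY customer_id;

-- Time travel: re-run a check against an earlier snapshot to debug a regression.
SELECT count(*)
FROM iceberg.cdc.customers
FOR TIMESTAMP AS OF TIMESTAMP '2024-01-15 00:00:00 UTC';
```

Trino's Iceberg connector also supports `FOR VERSION AS OF` with a snapshot ID, which is handy when you want to pin a comparison to an exact snapshot rather than a wall-clock time.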

Taught by

Starweaver and Luca Berton

