Overview
Raw data sitting in disconnected silos is not a data platform; it is a liability. Building systems that ingest, transform, reconcile, version, and serve data reliably at enterprise scale is what separates engineers who prototype from architects who build infrastructure teams depend on. This program teaches you how to do the latter.
Pipeline Architects is an intermediate program designed for data engineers, analytics engineers, and data platform professionals who want to build complete, production-ready data engineering skills. Across ten focused courses, you will master the full data engineering stack: mapping data flows; ingesting from relational databases, streaming platforms, and REST APIs; building and transforming modular pipelines; evaluating storage formats; loading warehouses incrementally; implementing SCD2 historical tracking; applying data lake transactions and versioning; building lakehouse architectures; automating workflows with Apache Airflow; and unifying data through SQL MERGE reconciliation and performance tuning.
You'll work with industry-standard tools including Python, SQL, Apache Airflow, dbt, Snowflake, Apache Kafka, Airbyte, Delta Lake, Iceberg, and Hudi, applying hands-on techniques to realistic production data engineering scenarios.
By the end of the program, you will be equipped to architect, build, and operate data pipelines from raw ingestion through lakehouse delivery with the reliability and performance that modern analytics infrastructure demands.
Syllabus
- Course 1: Map Data Flows Fast
- Course 2: Unify Diverse Data Sources
- Course 3: Evaluate Storage for Data Warehousing Success
- Course 4: Build & Transform Data Pipelines
- Course 5: Update Your Data Warehouse Incrementally
- Course 6: Apply SCD2 to Build Dynamic Data Models
- Course 7: Apply Data Lake Transactions & Versioning
- Course 8: Build & Analyze Your Data Lakehouse
- Course 9: Automate Data Workflows with Airflow Excellence
- Course 10: Unify, Reconcile, and Tune Data Systems
Courses
Course 9: Automate Data Workflows with Airflow Excellence
Transform your data engineering capabilities with production-ready Apache Airflow workflows that eliminate manual intervention and ensure bulletproof reliability. This course empowers data engineers to move beyond simple task scheduling to architecting resilient, maintainable, and configurable automated pipelines that handle real-world complexities. You'll master the art of defining logical task dependencies, implementing automated retry mechanisms for transient failures, configuring Service Level Agreements with proactive alerting, and designing parameterized workflows that adapt to different scenarios. By course completion, you'll confidently create robust DAGs that integrate with monitoring systems like Slack, handle edge cases gracefully, and scale from development to production environments. This course is unique because it focuses on production-grade practices from day one, teaching you to build workflows that data teams actually trust to run unsupervised. You'll work with real-world scenarios involving sales data processing, automated monitoring, and enterprise-level reliability requirements. To be successful in this course, you should have basic Python knowledge and familiarity with data processing concepts.
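As a taste of what these workflows look like in practice, here is a minimal sketch of an Airflow 2.x DAG with retries, an SLA, a failure callback, and a runtime parameter. The dag_id, task logic, and notification stub are illustrative placeholders, not course materials.

```python
# A minimal sketch of a production-style DAG, assuming Airflow 2.4+.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def notify_failure(context):
    # Placeholder: in production this might post to Slack via a webhook hook.
    print(f"Task {context['task_instance'].task_id} failed")


def process_sales(**kwargs):
    # Placeholder step; the parameter arrives via the DAG's params.
    region = kwargs["params"].get("region", "all")
    print(f"Processing sales data for region: {region}")


default_args = {
    "retries": 3,                          # retry transient failures automatically
    "retry_delay": timedelta(minutes=5),   # back off between attempts
    "sla": timedelta(hours=1),             # flag tasks that run past their SLA
    "on_failure_callback": notify_failure,
}

with DAG(
    dag_id="sales_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args=default_args,
    params={"region": "all"},              # parameterized workflow
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=process_sales)
    load = PythonOperator(task_id="load", python_callable=process_sales)
    extract >> load                        # explicit task dependency
```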
Course 7: Apply Data Lake Transactions & Versioning
Transform your raw data files into robust, auditable data lake tables with database-like guarantees. This Short Course was created to help data professionals accomplish reliable data lake management with transactional integrity and versioning capabilities. By completing this course, you'll be able to convert existing data files into transactional formats, execute atomic operations that ensure data integrity during concurrent jobs, query historical versions for auditing and recovery, and manage schema evolution safely, all skills you can apply immediately to your data pipelines. By the end of this course, you will be able to:
- Apply transactional and versioning features to data lake tables
This course is unique because it focuses on hands-on implementation of data lake reliability patterns using open-source tools, bridging the gap between raw cloud storage and enterprise-grade data management. To be successful in this course, you should have a background in basic SQL and data file formats.
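For a sense of the transactional guarantees involved, the sketch below uses the open-source deltalake (delta-rs) Python package to commit atomic writes and query an earlier table version. The path and data are illustrative, and the course may use different tools or table formats.

```python
import pandas as pd
from deltalake import DeltaTable, write_deltalake

path = "/tmp/events_delta"

# Version 0: the initial write is committed atomically.
write_deltalake(path, pd.DataFrame({"id": [1, 2], "status": ["new", "new"]}))

# Version 1: an append creates a new table version, not an in-place edit.
write_deltalake(path, pd.DataFrame({"id": [3], "status": ["new"]}), mode="append")

# Query the current version, then time-travel to version 0 for audit/recovery.
print(DeltaTable(path).to_pandas())             # rows 1, 2, 3
print(DeltaTable(path, version=0).to_pandas())  # rows 1, 2 only

# Inspect the commit log that backs versioning.
for commit in DeltaTable(path).history():
    print(commit.get("version"), commit["operation"])
```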
Course 6: Apply SCD2 to Build Dynamic Data Models
Did you know that without historical data tracking, over 40% of business insights can become inaccurate or misleading? Implementing Slowly Changing Dimension (SCD) Type 2 ensures every change in your data tells the full story over time. This Short Course was created to help professionals in this field implement robust historical data tracking systems that maintain complete audit trails and support accurate trend analysis in enterprise data warehouses. By completing this course, you will be able to apply SCD Type 2 logic to build dynamic data models that capture historical changes, enabling reliable reporting, time-based analysis, and improved business intelligence accuracy. By the end of this 3-hour course, you will be able to:
- Apply slowly changing dimension (SCD) Type 2 logic to build data models that track historical changes.
This course is unique because it connects data modeling theory with practical warehouse implementation, giving you the skills to design scalable, audit-ready models that preserve data integrity across time. To be successful in this project, you should have:
- Basic SQL knowledge
- Understanding of data modeling concepts
- Familiarity with dbt fundamentals
- Data warehouse basics
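To make the pattern concrete, here is an illustrative SCD Type 2 update in plain pandas: changed rows are expired rather than overwritten, and a new current version is appended. The column names and tracked attribute (city) are hypothetical; the course implements this logic with dbt and SQL.

```python
import pandas as pd

def scd2_apply(dim: pd.DataFrame, incoming: pd.DataFrame, today: str) -> pd.DataFrame:
    dim = dim.copy()
    current = dim[dim["is_current"]]
    merged = incoming.merge(
        current[["customer_id", "city"]],
        on="customer_id", how="left", suffixes=("", "_old"),
    )
    # Rows whose tracked attribute changed (or are brand new) need a new version.
    changed = merged[merged["city"] != merged["city_old"]][["customer_id", "city"]]

    # Close out the superseded current rows instead of overwriting them.
    expire = dim["customer_id"].isin(changed["customer_id"]) & dim["is_current"]
    dim.loc[expire, ["effective_to", "is_current"]] = [today, False]

    # Append the new current versions, preserving full history.
    new_rows = changed.assign(effective_from=today, effective_to=None, is_current=True)
    return pd.concat([dim, new_rows], ignore_index=True)

dim = pd.DataFrame({
    "customer_id": [1], "city": ["Boston"],
    "effective_from": ["2024-01-01"], "effective_to": [None], "is_current": [True],
})
incoming = pd.DataFrame({"customer_id": [1, 2], "city": ["Denver", "Austin"]})
print(scd2_apply(dim, incoming, today="2024-06-01"))
```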
Course 4: Build & Transform Data Pipelines
Ready to build data pipelines that power modern analytics? This course transforms you from someone who processes data manually into a data engineer who creates automated, modular pipeline systems. This Short Course was created to help Data Management and Engineering professionals accomplish scalable, maintainable data processing workflows. By completing this course, you'll be able to design and implement production-ready pipelines that seamlessly move data from raw sources to analytics-ready destinations using industry-standard tools. By the end of this course, you will be able to:
- Create modular pipeline stages for data ingestion, cleansing, transformation, and loading
- Implement automated workflows using Python, dbt, and Airflow
- Deploy scalable solutions on cloud platforms like AWS and Snowflake
This course is unique because it focuses on hands-on implementation with real-world scenarios using popular open-source tools that drive today's data infrastructure. To be successful in this project, you should have a background in basic SQL, Python programming, and familiarity with data concepts.
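The sketch below shows the shape of such a modular pipeline in plain Python, with separate ingest, cleanse, transform, and load stages composed into one run. The inline CSV source and stage logic are stand-ins; the course orchestrates comparable stages with dbt and Airflow.

```python
import io
import pandas as pd

RAW_CSV = "order_id,amount\n1,10.5\n2,\n3,7.25\n"  # stand-in for a real source

def ingest() -> pd.DataFrame:
    return pd.read_csv(io.StringIO(RAW_CSV))

def cleanse(df: pd.DataFrame) -> pd.DataFrame:
    return df.dropna(subset=["amount"])            # drop malformed rows

def transform(df: pd.DataFrame) -> pd.DataFrame:
    return df.assign(amount_cents=(df["amount"] * 100).astype(int))

def load(df: pd.DataFrame) -> None:
    print(df.to_string(index=False))               # stand-in for a warehouse write

def run() -> None:
    # Each stage is independently testable and replaceable.
    load(transform(cleanse(ingest())))

if __name__ == "__main__":
    run()
```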
Course 1: Map Data Flows Fast
Transform complex data systems into clear, actionable visual maps that drive better engineering decisions and team collaboration. This Short Course was created to help data management and engineering professionals accomplish systematic visualization of data pipelines from source to destination. By completing this course, you'll be able to design comprehensive data flow diagrams that identify all data sources, map transformation processes, and specify final data destinations. You'll master the essential skill of creating visual blueprints that facilitate team collaboration, ensure system clarity, and accelerate pipeline development timelines. By the end of this course, you will be able to:
- Create end-to-end data flow diagrams that map sources, transformations, and data sinks
This course is unique because it focuses on practical diagram creation using industry-standard tools and real-world data engineering scenarios, emphasizing immediate workplace application over theoretical concepts. To be successful in this project, you should have a background in basic data concepts and familiarity with data systems terminology.
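As one possible approach, the sketch below generates a source-to-sink flow diagram programmatically with the graphviz Python package (an assumption here; the course may use other diagramming tools). Node names are illustrative.

```python
import graphviz

flow = graphviz.Digraph("data_flow", graph_attr={"rankdir": "LR"})
flow.node("pg", "Postgres (source)")
flow.node("kafka", "Kafka topic (source)")
flow.node("stg", "Staging layer")
flow.node("dbt", "dbt transformations")
flow.node("wh", "Warehouse (sink)")
flow.edges([("pg", "stg"), ("kafka", "stg"), ("stg", "dbt"), ("dbt", "wh")])

print(flow.source)  # DOT text; flow.render() would emit an image file
```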
Course 3: Evaluate Storage for Data Warehousing Success
Master the critical decision-making skills for optimizing data warehouse storage architecture. This course equips data professionals with the analytical expertise to evaluate columnar versus row-oriented storage formats based on workload characteristics, query patterns, and performance requirements. You'll learn to analyze compression ratios, assess ingestion performance implications, and conduct systematic benchmarking of formats like Parquet, ORC, and Avro. Transform your ability to make informed storage architecture decisions that directly impact analytical performance and cost-effectiveness in enterprise data warehousing environments. This course is unique because it combines theoretical understanding with hands-on benchmarking practice, giving you real-world experience in evaluating storage formats using actual performance metrics. To be successful in this project, you should have a basic understanding of data warehousing concepts and familiarity with SQL queries.
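A toy version of such a benchmark, assuming pandas with pyarrow installed, might compare on-disk size and scan time for CSV (row-oriented) versus Parquet (columnar); the synthetic data below stands in for a representative workload.

```python
import os
import time

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": np.arange(1_000_000),
    "category": np.random.choice(["a", "b", "c"], size=1_000_000),
    "value": np.random.rand(1_000_000),
})
df.to_csv("bench.csv", index=False)
df.to_parquet("bench.parquet")      # requires pyarrow (or fastparquet)

for path, reader in [("bench.csv", pd.read_csv), ("bench.parquet", pd.read_parquet)]:
    start = time.perf_counter()
    reader(path)                    # full scan of the file
    elapsed = time.perf_counter() - start
    size_mb = os.path.getsize(path) / 1e6
    print(f"{path}: {size_mb:.1f} MB on disk, scanned in {elapsed:.2f}s")
```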
Course 2: Unify Diverse Data Sources
Transform disconnected data silos into unified insights with enterprise-grade connector configuration skills. This Short Course was created to help data management and engineering professionals accomplish seamless integration of diverse data sources into centralized staging environments. By completing this course, you'll be able to configure Airbyte connectors for relational databases with proper authentication, set up real-time streaming connections to Kafka topics, and establish secure REST API endpoints, skills you can apply immediately to modernize your organization's data infrastructure. By the end of this course, you will be able to:
- Configure connector settings for relational databases with connection strings and authentication
- Set up streaming platform connections with proper topic subscriptions and offset management
- Establish REST API connections with authentication methods and endpoint configuration
This course is unique because it provides hands-on experience with Airbyte, a fast-growing open-source data integration platform, using real-world scenarios that mirror actual enterprise data challenges. To be successful in this project, you should have a background in basic database concepts and familiarity with data integration fundamentals.
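To illustrate the kinds of settings involved, the sketch below holds hypothetical connector configurations as Python dicts. The hostnames, credentials, and field names only mirror the spirit of Airbyte's connector specs; consult each connector's actual spec before use.

```python
# Hypothetical connector settings; every value here is a placeholder.
postgres_source = {
    "host": "db.internal.example.com",   # connection-string components
    "port": 5432,
    "database": "sales",
    "username": "airbyte_reader",
    "password": "<secret>",              # in practice, injected from a vault
    "ssl_mode": {"mode": "require"},     # authenticate over an encrypted channel
}

kafka_source = {
    "bootstrap_servers": "kafka1:9092,kafka2:9092",
    "subscription": {"subscription_type": "subscribe", "topic_pattern": "orders.*"},
    "auto_offset_reset": "earliest",     # offset management for replayability
}

api_source = {
    "base_url": "https://api.example.com/v2",
    "endpoint": "/invoices",
    "auth": {"type": "bearer", "token": "<secret>"},
}

for name, cfg in [("postgres", postgres_source), ("kafka", kafka_source), ("rest", api_source)]:
    print(name, "->", sorted(cfg))
```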
Course 10: Unify, Reconcile, and Tune Data Systems
Did you know that inconsistent or poorly synchronized data can derail analytics, disrupt integrations, and slow critical business processes? Effective reconciliation and performance tuning are essential for keeping enterprise systems aligned and efficient. This Short Course was created to help professionals in this field master advanced data synchronization, conflict resolution, and performance optimization techniques for enterprise-scale data pipelines. By completing this course, you will be able to apply SQL MERGE for upsert operations, design field-level reconciliation rules to resolve data conflicts, and evaluate integration performance to recommend tuning actions, skills vital for building accurate, reliable, and high-performing data systems. By the end of this course, you will be able to:
- Apply the SQL MERGE statement to perform upsert operations on a target table.
- Analyze field-level conflicts to design data reconciliation rules.
- Evaluate system integration performance to recommend tuning actions.
This course is unique because it blends advanced SQL techniques with enterprise data governance and optimization strategies, giving you hands-on experience designing robust pipelines that unify datasets while maintaining accuracy and speed. To be successful in this project, you should have:
- Advanced SQL knowledge
- Understanding of database design concepts
- Data integration experience
- Familiarity with performance monitoring practices
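As a flavor of the core technique, here is a Snowflake-style MERGE with one simple field-level reconciliation rule (prefer the newer record, but never replace a value with NULL). Table and column names are illustrative, and the statement is shown as a string rather than executed against a live warehouse.

```python
MERGE_UPSERT = """
MERGE INTO customers AS t
USING staging_customers AS s
  ON t.customer_id = s.customer_id
WHEN MATCHED AND s.updated_at > t.updated_at THEN UPDATE SET
  -- field-level rule: take the newer record, keep the old value if incoming is NULL
  email      = COALESCE(s.email, t.email),
  updated_at = s.updated_at
WHEN NOT MATCHED THEN INSERT (customer_id, email, updated_at)
  VALUES (s.customer_id, s.email, s.updated_at)
"""

print(MERGE_UPSERT)  # in practice, executed through a warehouse connector cursor
```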
Course 8: Build & Analyze Your Data Lakehouse
The modern data landscape demands professionals who can seamlessly bridge the gap between data lakes and data warehouses. This course transforms your ability to architect, implement, and optimize lakehouse platforms that deliver both flexibility and performance. This Short Course was created to help data engineering professionals accomplish scalable data platform implementation using advanced SQL and lakehouse patterns. By completing this course, you'll be able to register massive file-based datasets as queryable external tables, make informed decisions between Delta Lake, Iceberg, and Hudi formats, and automate robust data ingestion pipelines that keep your warehouse synchronized with your lake. By the end of this course, you will be able to:
- Apply configurations to register file-based datasets as external tables
- Analyze the technical capabilities of different open-source table formats
- Create a data ingestion pipeline within a lakehouse architecture
This course is unique because it combines hands-on SQL implementation with strategic architectural decision-making, giving you both the technical skills and analytical framework needed for enterprise-scale data platforms. To be successful in this course, you should have a background in SQL, data warehousing concepts, and distributed systems fundamentals.
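For illustration, the sketch below holds Snowflake-style DDL that registers Parquet files on a stage as an external table. The stage name, columns, and options are hypothetical placeholders, not taken from the course.

```python
CREATE_EXTERNAL_TABLE = """
CREATE OR REPLACE EXTERNAL TABLE events_ext (
  event_id   INT    AS (VALUE:event_id::INT),
  event_type STRING AS (VALUE:event_type::STRING)
)
LOCATION = @lake_stage/events/
FILE_FORMAT = (TYPE = PARQUET)
AUTO_REFRESH = TRUE
"""

print(CREATE_EXTERNAL_TABLE)  # run via a Snowflake session in practice
```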
Course 5: Update Your Data Warehouse Incrementally
Transform your data warehousing efficiency with incremental loading, the strategic approach that processes only what's changed rather than rebuilding everything from scratch. This Short Course was created to help data management and engineering professionals accomplish systematic data synchronization that dramatically reduces processing time and computational costs. By completing this course, you'll be able to implement incremental load strategies using Snowflake's powerful MERGE INTO command, execute staging table workflows that isolate incoming data before integration, and define conditional logic for updating existing records while inserting new ones. You'll master the art of comparing records between staging and target tables using business keys, ensuring your data pipelines are both performant and cost-effective. By the end of this course, you will be able to:
- Apply incremental load strategies to efficiently update data in a data warehouse.
This course is unique because it focuses on hands-on implementation of real-world incremental loading patterns using industry-standard tools and practices that mirror authentic enterprise data engineering workflows. To be successful in this project, you should have a background in basic SQL and an understanding of data warehouse concepts.
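Here is a sketch of that staging-then-merge workflow, expressed as Snowflake-style SQL strings: first copy incoming files into a staging table, then MERGE into the target on a business key. The stage, table names, and key (order_id) are illustrative; in practice these statements run through a warehouse connector.

```python
LOAD_STAGING = """
COPY INTO stg_orders
FROM @ingest_stage/orders/
FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
"""

INCREMENTAL_MERGE = """
MERGE INTO orders AS t
USING stg_orders AS s
  ON t.order_id = s.order_id          -- compare records on the business key
WHEN MATCHED THEN UPDATE SET
  amount = s.amount,
  status = s.status
WHEN NOT MATCHED THEN INSERT (order_id, amount, status)
  VALUES (s.order_id, s.amount, s.status)
"""

for statement in (LOAD_STAGING, INCREMENTAL_MERGE):
    print(statement)
```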
Taught by
Hurix Digital