
Data Modeling and Lakehouse Architecture with SQL

via Coursera

Overview

You will design and implement enterprise-grade data models, from traditional star schemas to modern lakehouse architectures. This comprehensive course equips you with the skills to build cost-effective, scalable data solutions that drive business intelligence and analytics.

You'll gain hands-on experience creating dimensional models with surrogate keys, optimizing database schemas through partitioning and clustering, and implementing slowly changing dimensions for historical data tracking. The course also covers advanced topics such as semantic metrics layers, multi-cluster warehouse architectures, and open-source table formats for data lakes.

What makes this course unique is its end-to-end approach to modern data architecture. You'll work with real-world scenarios, from analyzing storage costs to designing data ingestion pipelines that span from raw files to analytics-ready tables. By the end of the course, you'll confidently architect data solutions that balance performance, cost, and scalability, which are essential skills for senior data engineering and architecture roles in today's data-driven organizations.

Syllabus

  • Analyze Snowflake Schema Redundancies
    • You will examine existing snowflake schemas to pinpoint performance bottlenecks caused by redundant lookup paths and develop systematic approaches for identifying optimization opportunities.
  • Apply Star-Schema Dimensional Modeling
    • You will construct optimized star-schema dimensional models with proper fact and dimension table structures, implementing surrogate keys and design patterns that maximize query performance for analytical workloads.
  • Create Semantic Metrics Layer
    • You will develop standardized semantic metrics layers that ensure consistent business logic across analytics platforms, eliminate metric drift, and provide a unified source of truth for enterprise reporting.
  • Apply Partitioning and Clustering Strategies
    • You will implement advanced partitioning and clustering techniques using SQL DDL commands to optimize query performance for large-scale datasets.
  • Analyze Normalization vs Performance Trade-offs
    • You will evaluate database normalization levels against query performance requirements to make strategic denormalization decisions for optimizing analytical workloads.
  • Create Entity-Relationship Diagrams
    • You will design and document comprehensive Entity-Relationship diagrams that effectively communicate complex data structures and relationships for enterprise data systems.
  • Implement Data Pipelines for Historical Changes
    • You will build automated SCD Type 2 pipelines using MERGE statements and window functions to preserve historical data integrity in enterprise environments.
  • Analyze Storage and Compute Cost Trends
    • You will conduct comprehensive cost analysis of data lifecycle patterns to develop strategic archiving recommendations that balance storage economics with business value.
  • Create Multi-Cluster Warehouse Architecture
    • You will design scalable multi-cluster data warehouse architectures that isolate workloads for optimal performance while implementing comprehensive cost control and resource management policies.
  • External Table Configuration Mastery
    • You will implement external table configurations that enable direct querying of file-based datasets in cloud storage.
  • Open-Source Table Format Analysis
    • You will develop analytical frameworks to evaluate and compare the technical capabilities of Delta Lake, Apache Iceberg, and Apache Hudi for specific business requirements.
  • Data Ingestion Pipeline Implementation
    • You will architect and implement automated data ingestion pipelines that orchestrate data movement across medallion architecture zones within lakehouse platforms.
  • Project: Data Modeling and Lakehouse Architecture with SQL
    • You will design and implement a comprehensive data lakehouse architecture that integrates dimensional modeling, schema optimization, cost management, and multi-format data ingestion. This project synthesizes advanced SQL skills to create a production-ready data engineering solution.
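To give a flavor of the star-schema dimensional modeling covered above, here is a minimal sketch of a fact table joined to dimension tables through integer surrogate keys. It uses SQLite purely for illustration; the table and column names are hypothetical and not taken from the course materials.

```python
import sqlite3

# Minimal star-schema sketch: one fact table linked to dimension
# tables via integer surrogate keys. Names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    customer_sk INTEGER PRIMARY KEY,   -- surrogate key
    customer_id TEXT,                  -- natural/business key
    region      TEXT
);
CREATE TABLE dim_date (
    date_sk  INTEGER PRIMARY KEY,
    date_iso TEXT
);
CREATE TABLE fact_sales (
    customer_sk INTEGER REFERENCES dim_customer(customer_sk),
    date_sk     INTEGER REFERENCES dim_date(date_sk),
    amount      REAL
);
INSERT INTO dim_customer VALUES (1, 'C-100', 'EMEA'), (2, 'C-200', 'APAC');
INSERT INTO dim_date VALUES (20240101, '2024-01-01');
INSERT INTO fact_sales VALUES (1, 20240101, 50.0), (2, 20240101, 75.0);
""")

# A typical analytical query: aggregate the fact table, slicing by a
# dimension attribute through a single join per dimension.
rows = conn.execute("""
    SELECT d.region, SUM(f.amount) AS total
    FROM fact_sales f
    JOIN dim_customer d ON d.customer_sk = f.customer_sk
    GROUP BY d.region
    ORDER BY d.region
""").fetchall()
print(rows)  # -> [('APAC', 75.0), ('EMEA', 50.0)]
```

The surrogate keys keep the fact table narrow and decouple it from changes in the natural business keys, which is what makes slowly changing dimensions possible later on.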
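The SCD Type 2 pipelines mentioned in the syllabus preserve history by closing out the current version of a row and inserting a new one. Cloud warehouses typically express this in a single MERGE statement; SQLite has no MERGE, so this sketch splits the same logic into an UPDATE plus an INSERT. The schema and values are hypothetical.

```python
import sqlite3

# SCD Type 2 sketch: keep full history by closing the current row and
# inserting a new version. Warehouses usually do this with one MERGE;
# SQLite lacks MERGE, so the logic is an UPDATE followed by an INSERT.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer_scd (
    customer_id TEXT,
    region      TEXT,
    valid_from  TEXT,
    valid_to    TEXT,      -- NULL means "current version"
    is_current  INTEGER
);
INSERT INTO dim_customer_scd VALUES ('C-100', 'EMEA', '2023-01-01', NULL, 1);
""")

def apply_scd2_change(conn, customer_id, new_region, change_date):
    """Close the current row and open a new version for a changed attribute."""
    cur = conn.execute("""
        UPDATE dim_customer_scd
        SET valid_to = ?, is_current = 0
        WHERE customer_id = ? AND is_current = 1 AND region <> ?
    """, (change_date, customer_id, new_region))
    if cur.rowcount:  # only insert a new version if something actually changed
        conn.execute(
            "INSERT INTO dim_customer_scd VALUES (?, ?, ?, NULL, 1)",
            (customer_id, new_region, change_date))

apply_scd2_change(conn, 'C-100', 'APAC', '2024-06-01')
history = conn.execute("""
    SELECT region, valid_from, valid_to, is_current
    FROM dim_customer_scd WHERE customer_id = 'C-100'
    ORDER BY valid_from
""").fetchall()
print(history)
# -> [('EMEA', '2023-01-01', '2024-06-01', 0), ('APAC', '2024-06-01', None, 1)]
```

The `valid_from`/`valid_to` interval columns are what let analytical queries reconstruct the dimension as it looked on any past date.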
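The idea behind a semantic metrics layer, also listed above, is to define each business metric's expression exactly once and generate queries from those definitions, so every report uses identical logic. This is a toy sketch of that pattern; the metric names, table names, and helper function are invented for illustration.

```python
# Semantic metrics layer sketch: a single registry of metric
# definitions, compiled into SQL so reports cannot drift apart.
# All names here are illustrative only.
METRICS = {
    "total_revenue":   "SUM(amount)",
    "order_count":     "COUNT(*)",
    "avg_order_value": "SUM(amount) / COUNT(*)",
}

def build_query(metric, table, group_by=None):
    """Compile a metric definition into a SQL string."""
    expr = METRICS[metric]  # single source of truth for the formula
    if group_by:
        return (f"SELECT {group_by}, {expr} AS {metric} "
                f"FROM {table} GROUP BY {group_by}")
    return f"SELECT {expr} AS {metric} FROM {table}"

sql = build_query("total_revenue", "fact_sales", group_by="region")
print(sql)
# -> SELECT region, SUM(amount) AS total_revenue FROM fact_sales GROUP BY region
```

Production metrics layers (dbt's semantic layer, LookML, and similar tools) add joins, time grains, and access control on top of this same core idea: one definition, many queries.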

Taught by

Professionals from the Industry

