Data Warehouses on AWS

via Udacity

Details

Provider: Udacity
Pricing: Paid Course
Languages: English
Certificate: Certificate Available
Effort: 12 hours
Sessions: Self-Paced
Level: Intermediate

Overview

Build cloud-based data warehouses that power analytical workloads. Learn dimensional modeling techniques—including star and snowflake schemas, fact grain, and surrogate keys—to structure data for efficient OLAP queries. Use Python and SQL to build ETL pipelines that extract from diverse source systems like PostgreSQL, Cassandra, and Neo4j, clean and conform data across sources, and load it into Amazon Redshift. Optimize table performance with distribution styles, sort keys, and compression to speed up queries at scale. Create materialized views that pre-compute common aggregations so analysts get fast answers without recalculating. Validate data quality to ensure your warehouse is accurate, complete, and production-ready.

Syllabus

Introduction to Data Warehousing

Explore how data warehouses unify scattered operational systems into a single source of truth, and compare OLTP vs. OLAP, dimensional modeling basics, and star and snowflake schemas.

Dimensional Modeling for Analytics

Define fact grain, build fact and dimension tables with surrogate keys, and write Redshift DDL that encodes distribution styles, sort keys, and compression for performance.

Extracting and Transforming Source Data

Build ETL pipelines in Python that pull data from PostgreSQL, Cassandra, and Neo4j, then clean, conform, and derive dimensions for a consistent warehouse-ready dataset.
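The clean-and-conform step might look something like the following sketch. It assumes customer records have already been extracted as dictionaries, and the field names (`id`, `email`, `Email`, `name`, `full_name`) and helper functions are invented for illustration; the course's own sources and schemas may differ.

```python
from typing import Iterable


def conform_customer(raw: dict, source: str) -> dict:
    """Map one source record (hypothetical field names) onto a common shape."""
    email = (raw.get("email") or raw.get("Email") or "").strip().lower()
    name = (raw.get("name") or raw.get("full_name") or "").strip()
    return {
        "source": source,
        "natural_key": str(raw["id"]),   # the source system's own identifier
        "email": email or None,          # empty strings become NULL-like None
        "name": " ".join(name.split()),  # collapse repeated whitespace
    }


def assign_surrogate_keys(rows: Iterable[dict]) -> list[dict]:
    """Deduplicate on (source, natural_key) and assign integer surrogate keys."""
    key_map: dict[tuple[str, str], int] = {}
    dimension: list[dict] = []
    for row in rows:
        natural = (row["source"], row["natural_key"])
        if natural not in key_map:
            key_map[natural] = len(key_map) + 1
            dimension.append({"customer_key": key_map[natural], **row})
    return dimension
```

Here the first record seen for a given natural key wins; a real pipeline would need an explicit rule for resolving conflicts between sources.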

Loading, Optimization, and Validation in Redshift

Stage data through S3, load it with the COPY command, optimize tables for parallel query performance, create materialized views, and run data quality validation checks.
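For a sense of what loading and validation involve, here is a minimal sketch. `build_copy_sql` assembles a Redshift COPY statement for CSV files staged in S3, and `run_quality_checks` runs three basic checks (row count, NULL keys, duplicate keys) through any DB-API connection. The function names, table names, and the checks chosen are assumptions made for this example, not the course's actual validation suite.

```python
def build_copy_sql(table: str, s3_uri: str, iam_role: str) -> str:
    """Build a Redshift COPY statement for CSV files with a header row."""
    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        "FORMAT AS CSV\n"
        "IGNOREHEADER 1;"
    )


def run_quality_checks(conn, table: str, key_column: str) -> dict:
    """Run simple checks on a dimension table via a DB-API connection.

    Identifiers are interpolated directly, so pass only trusted names.
    """
    cur = conn.cursor()
    cur.execute(f"SELECT COUNT(*) FROM {table}")
    row_count = cur.fetchone()[0]
    cur.execute(f"SELECT COUNT(*) FROM {table} WHERE {key_column} IS NULL")
    null_keys = cur.fetchone()[0]
    # Number of key values that appear on more than one row.
    cur.execute(
        f"SELECT COUNT(*) FROM ("
        f"SELECT {key_column} FROM {table} "
        f"GROUP BY {key_column} HAVING COUNT(*) > 1) AS dupes"
    )
    duplicate_keys = cur.fetchone()[0]
    return {
        "row_count": row_count,
        "null_keys": null_keys,
        "duplicate_keys": duplicate_keys,
    }
```

The checks use only standard SQL, so they can be tried locally against an in-memory `sqlite3` database before being pointed at a Redshift connection.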

Build a Multi-Source E-commerce Analytics Warehouse in Redshift

Design a star schema and end-to-end ETL pipeline that integrates e-commerce data from three source systems into an optimized, validated Redshift warehouse.