

Designing & Ingesting Data into AWS Data Lakes

via CodeSignal

Overview

Build a scalable data lake on AWS. Learn to structure S3 storage with data zones, ingest streaming events with Amazon Kinesis and Firehose, and catalog data using AWS Glue Crawlers to prepare for downstream transformation and querying.
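As a rough illustration of the zone and partition layout described above, the sketch below builds S3 object keys partitioned by event type and date. The zone names (raw, processed, curated), the events prefix, and the build_event_key helper are assumptions made for this example and are not taken from the course. The key=value path segments follow the Hive-style convention that Glue crawlers can recognize as partitions.

```python
from datetime import datetime, timezone

# Hypothetical zone names; the course listing does not say which zones it uses.
ZONES = ("raw", "processed", "curated")


def build_event_key(zone: str, event_type: str, ts: datetime, filename: str) -> str:
    """Return an S3 object key partitioned by event type and UTC date."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone!r}")
    # Normalize to UTC so partitions do not depend on the producer's time zone.
    # Pass timezone-aware datetimes; naive ones are treated as local time.
    ts = ts.astimezone(timezone.utc)
    return (
        f"{zone}/events/"
        f"event_type={event_type}/"
        f"year={ts:%Y}/month={ts:%m}/day={ts:%d}/"
        f"{filename}"
    )


key = build_event_key(
    "raw",
    "click",
    datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc),
    "part-0001.json",
)
print(key)
# raw/events/event_type=click/year=2024/month=01/day=15/part-0001.json
```

Putting the event type before the date is only one possible ordering; which dimension comes first depends on how the data is most often filtered.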

Syllabus

  • Unit 1: Structuring Data Lakes
    • Creating Your First S3 Bucket
    • Setting Up Data Lake Zones
    • Generating Realistic User Event Data
    • Fixing Time Based Data Partitioning
    • Implementing Dual Dimension Data Partitioning
    • Verifying Your Data Lake Structure
  • Unit 2: Streaming Ingestion with Kinesis
    • Configure Your First Kinesis Stream
    • Implement Your First Kinesis Stream
    • Wait for Stream Activation
    • Verify Stream Status and Details
    • Fix the Broken Data Producer
    • Send Events to Kinesis Stream
    • Implement Custom Partition Key Strategy
  • Unit 3: Delivering Streaming Data to S3
    • Complete the ARN Helper Functions
    • Dynamically Build the Firehose IAM Role ARN
    • Connect Kinesis Stream to Firehose Delivery Stream
    • Configure Firehose S3 Destination with Buffering and Compression
    • Verify Your Firehose Delivery Pipeline
  • Unit 4: Creating Glue Databases
    • Complete Your First Glue Database
    • Fix Database Creation Error Handling
    • List Your Database Catalog
    • Search Your Database Catalog
  • Unit 5: Cataloging S3 Data with Glue
    • Setting Up Your Data Crawler
    • Configuring Crawler Targets and Policies
    • Running Your First Data Crawler
    • Exploring Your Discovered Data Catalog
