Overview
This course focuses on building complete solutions that combine Apache Spark with other big data tools to create end-to-end data pipelines.
Syllabus
Introduction
- Driving big data engineering with Apache Spark
- Course prerequisites
- Setting up the exercise files
- What is data engineering?
- Data engineering vs. data analytics vs. data science
- Data engineering functions
- Batch vs. real-time processing
- Data engineering with Spark
- Spark architecture review
- Parallel processing with Spark
- Spark execution plan
- Stateful stream processing
- Spark analytics and ML
- Batch processing use case: Problem statement
- Batch processing use case: Design
- Setting up the local DB
- Uploading stock to a central store
- Aggregating stock across warehouses
- Real-time use case: Problem
- Real-time use case: Design
- Generating a visits data stream
- Building a website analytics job
- Executing the real-time pipeline
- Batch vs. real-time options
- Scaling extraction and loading operations
- Scaling processing operations
- Building resiliency
- Project exercise requirements
- Solution design
- Extracting long last actions
- Building a scorecard
- More about Apache Spark
Taught by
Ben Sullins