
Building Batch Data Pipelines on Google Cloud

Google via Google Skills

Overview

In this intermediate course, you will learn to design, build, and optimize robust batch data pipelines on Google Cloud. Moving beyond fundamental data handling, you will explore large-scale data transformations and efficient workflow orchestration, both essential for timely business intelligence and critical reporting. You will get hands-on practice implementing pipelines with Dataflow (Apache Beam) and Serverless for Apache Spark (Dataproc Serverless), and you will address data quality, monitoring, and alerting to keep pipelines reliable in operation. Basic knowledge of data warehousing, ETL/ELT, SQL, Python, and Google Cloud concepts is recommended.

Syllabus

  • When to choose batch data pipelines
    • Batch data pipelines and their use cases
    • Processing and common challenges
    • Module 1 Quiz: When to choose batch data pipelines
  • Design and build batch data pipelines
    • Design batch pipelines
    • Large scale data transformations
    • Dataflow and Serverless for Apache Spark
    • Module 2 Quiz: Design and transformations
    • Data connections and orchestration
    • Execute an Apache Spark pipeline
    • Optimize batch pipeline performance
    • Use Serverless for Apache Spark to Load BigQuery
    • Build a Simple Batch Pipeline with Dataflow Job Builder UI
  • Control data quality in batch data pipelines
    • Batch data validation and cleansing
    • Log and analyze errors
    • Schema evolution for batch pipelines
    • Module 3 Quiz: Data validation and schema evolution
    • Data integrity and duplication
    • Deduplication with Serverless for Apache Spark
    • Deduplication with Dataflow
    • Validate Data Quality for a Batch Data Pipeline using Serverless for Apache Spark
  • Orchestrate and monitor batch data pipelines
    • Orchestration for batch processing
    • Cloud Composer
    • Module 4 Quiz: Orchestrations and DAGs
    • Unified observability
    • Alerts and troubleshooting
    • Module 4 Quiz: Observability
    • Visual pipeline management
    • Building Batch Pipelines in Cloud Data Fusion
    • Congratulations: Course summary
  • Your Next Steps
    • Course Badge
