In this lab, you (a) build a batch ETL pipeline in Apache Beam that reads raw data from Google Cloud Storage and writes it to Google BigQuery, (b) run the pipeline on Cloud Dataflow, and (c) parameterize the pipeline's execution.

Syllabus

Overview
Setup and requirements
Apache Beam and Cloud Dataflow
Lab part 1. Writing an ETL pipeline from scratch
Task 1. Generate synthetic data
Task 2. Read data from your source
Task 3. Run your pipeline to verify that it works
Task 4. Adding in a transformation
Task 5. Writing to a sink
Task 6. Run your pipeline
Lab part 2. Parameterizing basic ETL
Task 1. Creating a JSON schema file
Task 2. Writing a JavaScript user-defined function
Task 3. Running a Dataflow Template
Task 4. Inspect the Dataflow Template code
End your lab



Serverless Data Processing with Dataflow - Writing an ETL Pipeline using Apache Beam and Cloud Dataflow (Python)

Data Engineering with Google Dataflow and Apache Beam on GCP

Serverless Data Processing with Dataflow: Foundations

Learn Practical Apache Beam in Java | BigData framework
