DataCamp

Serverless Data Processing with Dataflow: Operations

via DataCamp

Overview

Operate Dataflow pipelines in production. Learn monitoring, logging, troubleshooting, performance tuning, CI/CD, reliability, and templates.

Learn to operate and optimize Dataflow pipelines in production. This course covers monitoring with Job Info, metrics, and Metrics Explorer, logging and error reporting, troubleshooting workflows, pipeline design and performance tuning, testing and CI/CD, reliability patterns, disaster recovery, and Dataflow templates.

Syllabus

  • Introduction
    • This module covers the course outline.
  • Monitoring
    • In this module, we learn how to use the Jobs List page to filter for jobs that we want to monitor or investigate. We look at how the Job Graph, Job Info, and Job Metrics tabs collectively provide a comprehensive summary of your Dataflow job. Lastly, we learn how we can use Dataflow’s integration with Metrics Explorer to create alerting policies for Dataflow metrics.
  • Logging and Error Reporting
    • In this module, we learn how to use the Log panel at the bottom of both the Job Graph and Job Metrics pages, and learn about the centralized Error Reporting page.
  • Troubleshooting and Debug
    • In this module, we learn how to troubleshoot and debug Dataflow pipelines. We will also review the four common modes of failure for Dataflow: failure to build the pipeline, failure to start the pipeline on Dataflow, failure during pipeline execution, and performance issues.
  • Performance
    • In this module, we discuss performance considerations for developing batch and streaming pipelines in Dataflow.
  • Testing and CI/CD
    • This module will discuss unit testing your Dataflow pipelines. We also introduce frameworks and features available to streamline your CI/CD workflow for Dataflow pipelines.
  • Reliability
    • In this module, we will discuss methods for building systems that are resilient to corrupted data and data center outages.
  • Flex Templates
    • This module covers Flex Templates, a feature that helps data engineering teams standardize and reuse Dataflow pipeline code. Many operational challenges can be solved with Flex Templates.
  • Summary
    • This module reviews the topics covered in the course.

Taught by

Google Cloud
