- Learn how to design and implement robust data pipelines using Lakehouse architecture principles, medallion architecture, and Lakeflow Spark Declarative Pipelines in Azure Databricks.
By the end of this module, you'll be able to:
- Design the order of operations for data pipelines from ingestion to serving
- Choose between notebooks and Lakeflow Spark Declarative Pipelines based on use cases
- Design task logic, dependencies, and execution patterns for Lakeflow Jobs
- Implement error handling strategies, including retry policies and expectations
- Create data pipelines using both notebooks and Lakeflow Spark Declarative Pipelines
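The ingestion-to-serving ordering above follows the medallion pattern (bronze → silver → gold). As a minimal sketch of that flow, the example below uses plain Python collections instead of Spark tables so the layering is easy to follow; the column names (`order_id`, `amount`) and the "amount must be numeric" quality rule are hypothetical, standing in for a Lakeflow expectation.

```python
# Medallion-style ordering sketched with plain Python collections.
# Column names and the quality rule are illustrative, not from the course.

RAW_EVENTS = [
    {"order_id": "1", "amount": "19.99"},
    {"order_id": "2", "amount": "bad-value"},  # violates the expectation
    {"order_id": "3", "amount": "5.00"},
]

def bronze(raw):
    """Bronze: land raw records as-is, no transformation."""
    return list(raw)

def silver(bronze_rows):
    """Silver: clean and validate; drop rows failing the numeric-amount rule."""
    out = []
    for row in bronze_rows:
        try:
            out.append({"order_id": row["order_id"], "amount": float(row["amount"])})
        except ValueError:
            pass  # a Lakeflow expect_or_drop expectation would drop this row
    return out

def gold(silver_rows):
    """Gold: aggregate for serving, e.g. total revenue over valid orders."""
    return {"total_amount": sum(r["amount"] for r in silver_rows)}

result = gold(silver(bronze(RAW_EVENTS)))
```

In a real Lakeflow Spark Declarative Pipeline, each layer would be a declared table and the validation rule an expectation, so the runtime tracks dropped rows for you.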
- Learn how to create, configure, schedule, and monitor Lakeflow Jobs in Azure Databricks to automate your data pipelines.
By the end of this module, you'll be able to:
- Create and configure Lakeflow Jobs with tasks and compute resources
- Configure job triggers including table updates and file arrivals
- Schedule jobs using intervals and cron expressions
- Configure job alerts and notifications for monitoring
- Configure automatic restarts and retry policies for reliability
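Scheduling, alerts, and retries come together in a single job definition. The fragment below is a hedged sketch of a Databricks Jobs API 2.1 request body; the job name, notebook path, and email address are placeholders. The Quartz cron expression `0 0 2 * * ?` runs the job daily at 02:00 in the given timezone.

```json
{
  "name": "nightly-etl",
  "schedule": {
    "quartz_cron_expression": "0 0 2 * * ?",
    "timezone_id": "UTC",
    "pause_status": "UNPAUSED"
  },
  "email_notifications": {
    "on_failure": ["data-team@example.com"]
  },
  "tasks": [
    {
      "task_key": "ingest",
      "notebook_task": { "notebook_path": "/Repos/etl/ingest" },
      "max_retries": 2,
      "min_retry_interval_millis": 60000,
      "retry_on_timeout": true
    }
  ]
}
```

Note that retry settings (`max_retries`, `min_retry_interval_millis`) apply per task, while the schedule and notifications apply to the job as a whole.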
- Learn how to implement development lifecycle processes in Azure Databricks, including Git version control, branching strategies, testing approaches, and Databricks Asset Bundle deployment using the CLI.
By the end of this module, you'll be able to:
- Apply Git version control best practices using Git folders in Azure Databricks
- Manage branching, pull requests, and conflict resolution for collaborative development
- Implement a testing strategy including unit, integration, end-to-end, and user acceptance tests
- Configure and customize Databricks Asset Bundles for deployment automation
- Deploy bundles using the Databricks CLI across development and production environments
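A Databricks Asset Bundle is driven by a `databricks.yml` file that declares the bundle, its deployment targets, and the resources it manages. The sketch below is a hypothetical minimal bundle; the bundle name, workspace hosts, and notebook path are placeholders.

```yaml
# databricks.yml — minimal illustrative bundle; hosts and paths are placeholders
bundle:
  name: etl-pipelines

targets:
  dev:
    mode: development
    default: true
    workspace:
      host: https://adb-dev.azuredatabricks.net
  prod:
    mode: production
    workspace:
      host: https://adb-prod.azuredatabricks.net

resources:
  jobs:
    nightly_etl:
      name: nightly-etl
      tasks:
        - task_key: ingest
          notebook_task:
            notebook_path: ./notebooks/ingest.py
```

With the Databricks CLI, `databricks bundle validate` checks the configuration, and `databricks bundle deploy -t dev` or `databricks bundle deploy -t prod` deploys to the chosen target.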
- Learn how to monitor cluster consumption, troubleshoot job failures, diagnose Spark performance issues, and implement log streaming to Azure Log Analytics for centralized monitoring of Azure Databricks workloads.
By the end of this module, you'll be able to:
- Monitor and manage cluster consumption using metrics, auto-termination, and budgets
- Troubleshoot and repair failed Lakeflow Jobs using the Jobs UI and repair run feature
- Diagnose Spark job failures and resource bottlenecks using the Spark UI and compute metrics
- Investigate and resolve caching, data skew, memory spill, and shuffle performance issues
- Implement log streaming from Azure Databricks to Azure Log Analytics for centralized monitoring
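One signal for data skew is uneven partition sizes, which the Spark UI surfaces as a few tasks processing far more input than the rest. The sketch below computes that imbalance from per-partition row counts in plain Python; the threshold value is illustrative, not a Databricks default.

```python
# Hedged sketch: flagging data skew from per-partition row counts,
# mirroring the uneven-task-size signal visible in the Spark UI.

def skew_ratio(partition_sizes):
    """Ratio of the largest partition to the mean; ~1.0 means balanced."""
    mean = sum(partition_sizes) / len(partition_sizes)
    return max(partition_sizes) / mean

def is_skewed(partition_sizes, threshold=5.0):
    """Flag a distribution where one partition dominates the mean.

    Skewed partitions are candidates for key salting, repartitioning,
    or Spark's adaptive query execution (AQE) skew-join handling.
    """
    return skew_ratio(partition_sizes) >= threshold

balanced = [100, 110, 95, 105, 98, 102, 97, 101]   # ratio ~1.1
skewed = [10, 10, 10, 10, 10, 10, 10, 10000]       # one hot partition
```

On a real cluster you would read these sizes from the Spark UI's stage detail page or compute metrics rather than hard-coding them.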
Syllabus
- Design and implement data pipelines with Azure Databricks
- Introduction
- Design order of operations for a pipeline
- Choose notebook vs Lakeflow Pipelines
- Design Lakeflow job logic
- Design error handling in pipelines and jobs
- Create pipeline with notebook
- Create pipeline with Lakeflow Spark Declarative Pipelines
- Exercise - Design and Implement Data Pipelines with Azure Databricks
- Module assessment
- Summary
- Implement Lakeflow Jobs with Azure Databricks
- Introduction
- Create job setup and configuration
- Configure job triggers
- Schedule a job
- Configure job alerts
- Configure automatic restarts
- Exercise - Implement Lakeflow Jobs with Azure Databricks
- Module assessment
- Summary
- Implement development lifecycle processes in Azure Databricks
- Introduction
- Apply Git version control best practices
- Manage branching and pull requests
- Implement testing strategy
- Configure and package Databricks Asset Bundles (DABs)
- Deploy bundle with Databricks CLI
- Exercise - Implement Development Lifecycle Processes in Azure Databricks
- Module assessment
- Summary
- Monitor, troubleshoot and optimize workloads in Azure Databricks
- Introduction
- Monitor and manage cluster consumption
- Troubleshoot and repair Lakeflow Jobs
- Troubleshoot Spark jobs and notebooks
- Investigate caching, data skew, memory spill, and shuffle
- Implement log streaming with Azure Log Analytics
- Exercise - Monitor, Troubleshoot and Optimize Workloads in Azure Databricks
- Module assessment
- Summary