Master Data Engineering using Azure Data Analytics

via Udemy

Overview

Learn Azure Storage for Data Lake, ADF for ETL, Synapse for Data Warehouse, Databricks for Big Data Pipelines, and more.

What you'll learn:
  • Data Engineering leveraging Services under Azure Data Analytics such as Azure Storage, Data Factory, Azure SQL, Synapse, Databricks, etc.
  • Setup Development Environment using Visual Studio Code on Windows
  • Build Data Lake using Azure Storage (Blob and ADLS)
  • Build Data Warehouse using Azure Synapse
  • Implement ETL Logic using ADF Data Flow with Azure Storage as Source and Target
  • In Depth Coverage of Orchestration using ADF Pipeline
  • Overview of Azure SQL and Azure Synapse Serverless and Dedicated Pool Features
  • Implement ETL Logic using ADF Data Flow with Azure SQL as Source and Azure Synapse as Target
  • Use ADF Data Copy to copy data between different sources and targets
  • Performance Tuning Scenarios of ADF Data Flow and Pipelines
  • Build Big Data Solutions using Azure Databricks
  • Overview of Spark SQL and PySpark Data Frame APIs
  • Build ELT Pipelines using Databricks Jobs and Workflows
  • Orchestrate Databricks Notebooks using ADF Pipelines

Data Engineering is all about building Data Pipelines to get data from multiple sources into Data Lakes or Data Warehouses, and then from Data Lakes or Data Warehouses to downstream systems. As part of this course, I will walk you through how to build Data Engineering Pipelines using the Azure Data Analytics Stack. It includes services such as Azure Storage (both Blob and ADLS), ADF Data Flow, ADF Pipeline, Azure SQL, Azure Synapse, Azure Databricks, and many more.

  • As part of this course, you will first set up the development environment using VS Code on Windows and Mac.

  • Once the environment is ready, you need to sign up for an Azure Portal account. We will provide all the instructions to sign up, including how to review billing and how to get the USD 200 credit, which is valid for up to a month.

  • We typically use Azure Storage as Data Lake. As part of this course, you will learn how to use Azure Storage as Data Lake along with how to manage the files in Azure Storage using tools such as Azure Storage Explorer.

  • ADF (Azure Data Factory) is used for both ETL and Orchestration. First, you will understand how to perform ETL using ADF Data Flow. The source and target will be files in an Azure Storage Account. As part of this process, you will also learn how to set up Linked Services and Data Sets in ADF.

  • Once the ADF Data Flow is ready, you will build a Pipeline for Orchestration using ADF Pipeline. You will also learn how to parameterize the pipeline and how to take care of baseline loads.

  • You will also understand key performance tuning techniques for ADF Pipelines, such as controlling the number of partitions, using custom integration runtimes (IR), etc.

  • Azure provides RDBMS as different services for Postgres, SQL Server, etc. You will learn how to set up Azure SQL. Once Azure SQL is set up, you will also understand how to create the required tables and run queries against them.

  • ADF provides ADF Data Copy to copy data between different sources and targets. Once the database tables are ready, you will use ADF Data Copy to copy data into the tables.

  • Azure provides Synapse Analytics for Data Warehouse. You will get an overview of both serverless as well as dedicated pools. You will end up setting up a Dedicated Pool for ETL using ADF.

  • Once Azure SQL and Azure Synapse are ready, you will build an ETL Pipeline using ADF Data Flow and orchestrate it using ADF Pipeline.

  • Azure Databricks is the service for Big Data Processing using the Spark Engine. You will learn how to set up Azure Databricks, integrate it with ADLS, and manage secrets.

  • You will also get an overview of Spark SQL and PySpark Data Frame APIs using Azure Databricks.

  • You will also build an ELT Pipeline using Databricks Jobs and Workflows, where tasks are defined using PySpark as well as Spark SQL.

  • You will also understand how to build ADF Pipelines to orchestrate Databricks Notebooks.

Syllabus

  • Udemy Introduction for Data Engineering using Azure
  • Setup Environment for Data Engineering using Azure
  • Getting Started with Azure for Data Engineering
  • Getting Started with Azure Resource Groups
  • Setup Data Sets for Data Engineering
  • Getting Started with Azure Data Factory
  • ADF Data Flow for ETL Logic to Compute Daily Product Revenue
  • Run ADF Pipelines Dynamically using Parameters
  • Run Baseline ETL Loads using ADF Pipeline
  • Performance Tuning of ADF Data Flows and Pipelines
  • Getting Started with Azure SQL Database
  • ADF Data Copy to Copy Data From Files to SQL Server Tables
  • Getting Started with Azure Synapse Analytics
  • Build ADF Data Flow using Azure SQL and Synapse Analytics
  • Getting Started with Azure Databricks
  • Integration of Azure Storage and Databricks
  • Overview of Databricks Secrets
  • Basic Transformations using Spark SQL
  • Ranking using Spark SQL Windowing Functions
  • Getting Started with PySpark Data Frame APIs
  • Build ELT Pipelines using Databricks Jobs and Workflows
  • Orchestrate Azure Databricks Applications using ADF Pipelines
  • Build Data Pipelines using ADF Pipelines and Databricks

Taught by

Durga Viswanatha Raju Gadiraju and Phani Bhushan Bozzam

Reviews

4.5 rating at Udemy based on 360 ratings
