

Azure DataBricks - Data Engineering With Real Time Project

via Udemy

Overview

Real Time Project on Retail Data - PySpark, SQL, Delta/Delta Live Tables, Unity Catalogue, AutoLoader & Performance Tuning

What you'll learn:
  • Medallion Architecture, Dimensional Data Modelling Design, Delta Lakehouse Design, Spark Core Architecture, Unity Catalogue Setup, Spark Cluster Setup
  • PySpark Dataframe Reader, Writer, Transformation Functions, Action Functions, DateTime Functions, Aggregation Functions, Dataframe Joins, Complex Data
  • Spark SQL External Tables, Managed Tables, Delta Lake Tables, Create Table As Select (CTAS), Temp Views, Table Joins, Data Transformation Functions
  • Four Reusable Ingestion Pipelines to Ingest Source Data from Web (HTTP) Service, Database Tables, API Source Systems, Incremental Loading & Job Scheduling
  • Seven Data Transformation Pipelines to Process Source Data in Silver & Gold Layers and Build a Reporting Database and Datalake with Change Data Capture
  • Spark Streaming Reader & Writer Configuration to Process Real-Time Streaming Data, checkpointLocation Setup for Automated Incremental Loading of Streaming Data
  • Delta Live Tables - Materialised Views, Streaming Tables Setup, Delta Live Table Pipeline Configuration, Data Quality Checks, Auto Loader and APPLY CHANGES
  • Monitoring and Logging Setup to Monitor Production Job Runs, Set Up Alerts for Job Failure and Extended Logging of Job Runs and Service Metrics
  • Security Settings in Azure Using Microsoft Entra ID, IAM Role-Based Access Control (RBAC) and Databricks Workspace Admin Settings
  • Configure a GitHub Repository and Git Repos Folders in the Databricks Workspace, Ways of Working with Git Branches, Merging Code & Pull Requests
  • Set Up a Production Environment and a CI/CD Pipeline to Automate Code Deployment Using GitHub Actions
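
As a quick illustration of the PySpark reader, writer, and transformation APIs listed above, here is a minimal sketch. The paths, column names, and table names are illustrative assumptions, not taken from the course, and `spark` is the session Databricks preconfigures in every notebook:

```python
from pyspark.sql import functions as F

# Reader: load raw retail orders from the bronze layer
# (path and schema are assumptions for illustration).
orders = (spark.read
          .format("csv")
          .option("header", "true")
          .option("inferSchema", "true")
          .load("/mnt/bronze/retail/orders/"))

# Transformations: DateTime and aggregation functions.
daily_revenue = (orders
    .withColumn("order_date", F.to_date("order_timestamp"))
    .groupBy("order_date")
    .agg(F.sum("amount").alias("total_revenue"),
         F.count("*").alias("order_count")))

# Writer: persist the result as a Delta table in the silver layer.
(daily_revenue.write
    .format("delta")
    .mode("overwrite")
    .saveAsTable("silver.daily_revenue"))
```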

By completing this course, you will be equipped with the following Data Engineer roles & responsibilities from the real-time project:

• Designing and Configuring Unity Catalogue for Better Access Control & Connecting to External Data Stores

• Designing and Developing Databricks (PySpark) Notebooks to Ingest the data from Web (HTTP) Services

• Designing and Developing Databricks (PySpark) Notebooks to Ingest the data from SQL Databases
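
A hedged sketch of what such a JDBC ingestion notebook might look like. The server, database, table, secret scope, and key names below are placeholder assumptions, not the course's actual configuration:

```python
# Placeholder connection details - replace with your own.
jdbc_url = "jdbc:sqlserver://myserver.database.windows.net:1433;database=retail"

# Credentials are pulled from a secret scope rather than hard-coded.
customers = (spark.read
    .format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "dbo.customers")
    .option("user", dbutils.secrets.get(scope="kv-scope", key="sql-user"))
    .option("password", dbutils.secrets.get(scope="kv-scope", key="sql-password"))
    .load())

# Land the extract as a Delta table in the bronze layer.
customers.write.format("delta").mode("append").saveAsTable("bronze.customers")
```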

• Designing and Developing Databricks (PySpark) Notebooks to Ingest the data from API Source Systems

• Designing and Developing Spark SQL External and Managed Tables
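
The difference between the two table types can be sketched with Spark SQL (the storage account, schema, and table names are assumptions for illustration):

```python
# Managed table: Databricks manages both the metadata and the data files.
spark.sql("""
  CREATE TABLE IF NOT EXISTS silver.products (
    product_id INT, product_name STRING, price DECIMAL(10,2)
  ) USING DELTA
""")

# External table: metadata lives in the metastore, but the data stays
# at a storage path you control; dropping the table keeps the files.
spark.sql("""
  CREATE TABLE IF NOT EXISTS bronze.products_ext
  USING DELTA
  LOCATION 'abfss://bronze@mystorageacct.dfs.core.windows.net/products/'
""")
```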

• Developing Databricks Spark SQL Reusable Notebooks to Create and Populate Delta Lake Tables

• Developing Databricks SQL Code to Populate Reporting Dimension Tables

• Developing Databricks SQL Code to Populate Reporting SCD Type 2 Dimension Tables
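
One common way to implement an SCD Type 2 load is a Delta `MERGE` that expires the current row, followed by an insert of the new version. The sketch below assumes illustrative table and column names (a customer dimension tracking address changes), not the course's actual schema:

```python
# Step 1: expire the current row when a tracked attribute changed.
spark.sql("""
  MERGE INTO gold.dim_customer AS tgt
  USING staging.customer_updates AS src
  ON tgt.customer_id = src.customer_id AND tgt.is_current = true
  WHEN MATCHED AND tgt.address <> src.address THEN
    UPDATE SET is_current = false, end_date = current_date()
""")

# Step 2: insert the new version (and brand-new customers) as current.
spark.sql("""
  INSERT INTO gold.dim_customer
  SELECT src.customer_id, src.address, current_date() AS start_date,
         NULL AS end_date, true AS is_current
  FROM staging.customer_updates src
  LEFT JOIN gold.dim_customer tgt
    ON tgt.customer_id = src.customer_id AND tgt.is_current = true
  WHERE tgt.customer_id IS NULL
""")
```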

• Developing Databricks SQL Code to Populate Reporting Fact Tables

• Designing and Developing Databricks (PySpark) Notebooks to Process and Flatten Semi-Structured JSON Data using the EXPLODE function
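
A minimal flattening sketch using `explode`: each order row is assumed to hold an array of line items (the path and field names are illustrative, not from the course):

```python
from pyspark.sql import functions as F

# Semi-structured source: one row per order, with a line_items array.
orders = spark.read.json("/mnt/silver/retail/orders_json/")

# explode() produces one output row per array element, which
# flattens the nested structure into a tabular layout.
order_lines = (orders
    .select("order_id", F.explode("line_items").alias("item"))
    .select("order_id",
            F.col("item.product_id").alias("product_id"),
            F.col("item.quantity").alias("quantity")))
```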

• Designing and Developing Databricks (PySpark) Notebooks to Integrate (JOIN) Data and Load into the Datalake Gold Layer

• Designing and Developing Databricks (PySpark) Notebooks to Process Semi-Structured JSON Data in the DataLake Silver Layer

• Designing and Developing Databricks (SQL) Notebooks to Integrate Data and Load into the Datalake Gold Layer

• Developing Databricks Jobs for Scheduling the Data Ingestion and Transformation Notebooks

• Designing and Configuring Delta Live Tables in all layers for seamless Data Integration
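
A minimal Delta Live Tables sketch in the spirit of this setup. Note that the `dlt` module resolves only inside a DLT pipeline run, and the paths and table names are assumptions:

```python
import dlt  # available only inside a Delta Live Tables pipeline
from pyspark.sql import functions as F

# Streaming bronze table fed by Auto Loader (landing path is an assumption).
@dlt.table(name="orders_bronze")
def orders_bronze():
    return (spark.readStream
            .format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("/mnt/landing/orders/"))

# Silver table with a data-quality expectation: rows failing
# the condition are dropped rather than propagated downstream.
@dlt.table(name="orders_silver")
@dlt.expect_or_drop("valid_order", "order_id IS NOT NULL")
def orders_silver():
    return dlt.read_stream("orders_bronze").withColumn(
        "ingested_at", F.current_timestamp())
```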

• Setting up Azure Monitor and Log Analytics for Automated Monitoring of Job Runs and Storing Extended Log Details

• Setting up Azure Key Vault and Configuring Key Vault-Backed Secret Scopes in the Databricks Workspace
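
Once a Key Vault-backed secret scope exists (it is created through the Databricks UI or CLI, not in the notebook), reading a secret is a one-liner. The scope, key, and storage account names here are assumptions:

```python
# Fetch a secret from the Key Vault-backed scope; the value is
# redacted if you try to print it in a notebook.
storage_key = dbutils.secrets.get(scope="kv-backed-scope",
                                  key="storage-account-key")

# Example use: authenticate Spark against an ADLS Gen2 account.
spark.conf.set(
    "fs.azure.account.key.mystorageacct.dfs.core.windows.net",
    storage_key)
```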

• Configuring a GitHub Repository and Creating Git Repo Folders in the Databricks Workspace

• Designing and Configuring CI/CD Pipelines to Release the Code into Multiple Environments

• Identifying Performance Bottlenecks and Performing Performance Tuning using ZORDER BY, BROADCAST JOIN, ADAPTIVE QUERY EXECUTION, DATA SALTING and LIQUID CLUSTERING
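
Of these techniques, data salting is the easiest to sketch outside Spark: a random suffix is appended to a skewed join key so rows with the same hot key spread across partitions, while the small side of the join is duplicated once per salt value so every salted key still finds a match. (In PySpark the same idea is typically applied with `F.concat` and `F.floor(F.rand() * n)`.) A minimal plain-Python illustration, with hypothetical helper names:

```python
import random

def salt_key(key: str, num_salts: int = 8) -> str:
    """Append a random salt suffix so rows sharing a hot key are
    spread across num_salts partitions instead of landing in one."""
    return f"{key}_{random.randrange(num_salts)}"

def explode_dim_key(key: str, num_salts: int = 8) -> list[str]:
    """Duplicate the small (dimension) side once per salt value so
    every salted fact-side key still joins successfully."""
    return [f"{key}_{i}" for i in range(num_salts)]
```

The fact side calls `salt_key` once per row; the dimension side is exploded with `explode_dim_key` before the join, trading a small data-size increase for even partition sizes.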


Syllabus

  • Introduction
  • Azure Portal Overview & Create Azure Resources
  • PySpark Introduction
  • SparkSQL Introduction
  • Unity Catalogue Configuration
  • Ingest Source Data From Web (HTTP) Service Into Bronze Layer Using PySpark
  • Ingest Source Data From Database Tables Using PySpark
  • Silver Layer Transformation - Parquet Files & Delta Table Config Using Spark SQL
  • Dimensional Data Modelling (Star Schema) - Reporting Database Design
  • Reporting Dimension (SCD Types 1 & 2) And Fact Tables Load Using Spark SQL
  • Spark Structured Streaming - Real Time Data Processing
  • Delta Live Tables Introduction
  • Datalake Bronze Layer Load - Ingest Geo-Location API Source Data
  • DataLake Silver Layer Transformations - Transform Geo Location API Source Data
  • Datalake Bronze Layer Load - Ingest Weather-Data API Source
  • DataLake Silver Layer Transformations - Transform Weather Data (ASSIGNMENT)
  • DataLake Gold Layer Load - Publish Price Prediction AI Source Data (ASSIGNMENT)
  • Monitoring And Logging - Azure Monitor, Log Analytics & Job Notifications
  • Security Settings - Azure IAM (RBAC) Access Control & Databricks Workspace Admin
  • Git Repository Integration For Databricks Workspace
  • CI/CD (Continuous Integration / Continuous Deployment) Pipeline
  • Performance Tuning

Taught by

Ragunathan Ramanujam

Reviews

4.7 rating at Udemy based on 671 ratings

