What you'll learn:
- Medallion Architecture, Dimensional Data Modelling, Delta Lakehouse Design, Spark Core Architecture, Unity Catalog Setup, Spark Cluster Setup
- PySpark DataFrame Reader, Writer, Transformation Functions, Action Functions, DateTime Functions, Aggregation Functions, DataFrame Joins, Complex Data Types (see the reader/writer sketch after this list)
- Spark SQL External Tables, Managed Tables, Delta Lake Tables, Create Table As Select (CTAS), Temp Views, Table Joins, Data Transformation Functions (see the table-creation sketch after this list)
- Four Reusable Ingestion Pipelines to Ingest Source Data from Web (HTTP) Services, Database Tables, and API Source Systems, with Incremental Loading and Job Scheduling
- Seven Data Transformation Pipelines to Process Source Data in the Silver and Gold Layers and Build a Reporting Database and Data Lake with Change Data Capture
- Spark Streaming Reader and Writer Configuration to Process Real-Time Streaming Data, plus checkpointLocation Setup for Automated Incremental Loading (see the streaming sketch after this list)
- Delta Live Tables: Materialised Views, Streaming Tables Setup, Delta Live Table Pipeline Configuration, Data Quality Checks, Auto Loader, and APPLY CHANGES (see the DLT sketch after this list)
- Monitoring and Logging Setup to Monitor Production Job Runs, Set Up Alerts for Job Failures, and Capture Extended Logging of Job Runs and Service Metrics
- Security Settings in Azure Using Microsoft Entra ID, IAM Role-Based Access Control (RBAC), and Databricks Workspace Admin Settings
- Configure a GitHub Repository and Git Repos Folders in the Databricks Workspace, Ways of Working with Git Branches, Merging Code, and Pull Requests
- Setup of a Production Environment and a CI/CD Pipeline to Automate Code Deployment Using GitHub Actions
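A minimal sketch of the DataFrame reader, transformation, action, and writer pattern listed above, assuming a Databricks notebook where `spark` is predefined; the source path and table names are illustrative:

```python
from pyspark.sql import functions as F

# Reader: load a CSV with a header row and schema inference
# (the /mnt/bronze path is an assumption).
df = (spark.read.format("csv")
      .option("header", "true")
      .option("inferSchema", "true")
      .load("/mnt/bronze/sales/"))

# Transformations are lazy: nothing executes yet.
cleaned = (df.filter(F.col("amount") > 0)
             .withColumn("load_date", F.current_date()))

# Action: count() triggers execution of the plan above.
print(cleaned.count())

# Writer: persist the result as a Delta table in the Silver layer.
cleaned.write.format("delta").mode("overwrite").saveAsTable("silver.sales")
```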
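A sketch of the managed, external, and CTAS table patterns, run here through `spark.sql` from a notebook; the schema names, storage account, and path are assumptions (an external table also expects data and a registered external location at that path):

```python
# Managed table: the catalog owns both the metadata and the data files.
spark.sql("""
    CREATE TABLE IF NOT EXISTS silver.customers (customer_id INT, name STRING)
""")

# External table: metadata in the catalog, data at a path you manage
# (the abfss URI and container are placeholders).
spark.sql("""
    CREATE TABLE IF NOT EXISTS bronze.customers_raw
    LOCATION 'abfss://bronze@mystorageacct.dfs.core.windows.net/customers'
""")

# CTAS (Create Table As Select): create and populate in one statement.
spark.sql("""
    CREATE TABLE IF NOT EXISTS gold.active_customers AS
    SELECT customer_id, name FROM silver.customers
    WHERE customer_id IS NOT NULL
""")
```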
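A sketch of a streaming read/write with Auto Loader, showing where `checkpointLocation` fits: the checkpoint records progress so a restarted job resumes incrementally. The paths and target table are assumptions:

```python
# Auto Loader ("cloudFiles") discovers new files incrementally.
stream = (spark.readStream
          .format("cloudFiles")
          .option("cloudFiles.format", "json")
          .option("cloudFiles.schemaLocation", "/mnt/checkpoints/orders_schema")
          .load("/mnt/landing/orders/"))

# checkpointLocation stores offsets and state; reusing it on restart
# gives automated incremental loading.
(stream.writeStream
       .format("delta")
       .option("checkpointLocation", "/mnt/checkpoints/orders")
       .outputMode("append")
       .toTable("bronze.orders"))
```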
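And a sketch of a Delta Live Tables pipeline with a streaming table, a data-quality expectation, and a downstream materialised view; the source path and column names are assumptions:

```python
import dlt

# Streaming table fed by Auto Loader; the expectation drops rows
# whose order_id is null.
@dlt.table(name="bronze_orders")
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")
def bronze_orders():
    return (spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("/mnt/landing/orders/"))

# Batch-defined table: DLT maintains it as a materialised view.
@dlt.table(name="daily_order_counts")
def daily_order_counts():
    return dlt.read("bronze_orders").groupBy("order_date").count()
```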
By completing this course, you will be equipped to carry out the following Data Engineer roles and responsibilities on real-world projects:
• Designing and Configuring Unity Catalog for Better Access Control and Connecting to External Data Stores
• Designing and Developing Databricks (PySpark) Notebooks to Ingest Data from Web (HTTP) Services
• Designing and Developing Databricks (PySpark) Notebooks to Ingest Data from SQL Databases
• Designing and Developing Databricks (PySpark) Notebooks to Ingest Data from API Source Systems
• Designing and Developing Spark SQL External and Managed Tables
• Developing Reusable Databricks Spark SQL Notebooks to Create and Populate Delta Lake Tables
• Developing Databricks SQL Code to Populate Reporting Dimension Tables
• Developing Databricks SQL Code to Populate Reporting SCD Type 2 Dimension Tables (see the MERGE sketch after this list)
• Developing Databricks SQL Code to Populate Reporting Fact Tables
• Designing and Developing Databricks (PySpark) Notebooks to Process and Flatten Semi-Structured JSON Data Using the EXPLODE Function (see the sketch after this list)
• Designing and Developing Databricks (PySpark) Notebooks to Integrate (Join) Data and Load It into the Data Lake Gold Layer
• Designing and Developing Databricks (PySpark) Notebooks to Process Semi-Structured JSON Data in the Data Lake Silver Layer
• Designing and Developing Databricks (SQL) Notebooks to Integrate Data and Load It into the Data Lake Gold Layer
• Developing Databricks Jobs to Schedule the Data Ingestion and Transformation Notebooks
• Designing and Configuring Delta Live Tables in All Layers for Seamless Data Integration
• Setting Up Azure Monitor and Log Analytics for Automated Monitoring of Job Runs and Storage of Extended Log Details
• Setting Up Azure Key Vault and Configuring Key Vault-Backed Secret Scopes in the Databricks Workspace (see the secrets sketch after this list)
• Configuring a GitHub Repository and Creating Git Repo Folders in the Databricks Workspace
• Designing and Configuring CI/CD Pipelines to Release Code into Multiple Environments
• Identifying Performance Bottlenecks and Tuning Performance Using ZORDER BY, BROADCAST JOIN, Adaptive Query Execution, Data Salting, and Liquid Clustering (see the tuning sketch after this list)
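A minimal sketch of the SCD Type 2 pattern referenced above, using the Delta Lake MERGE API; it assumes `updates_df` already holds only new or changed rows, and the table, key, and tracking columns (`gold.dim_customer`, `customer_id`, `is_current`, `row_hash`) are illustrative:

```python
from delta.tables import DeltaTable
from pyspark.sql import functions as F

dim = DeltaTable.forName(spark, "gold.dim_customer")

# Step 1: expire the current version of every changed business key.
(dim.alias("t")
    .merge(updates_df.alias("s"),
           "t.customer_id = s.customer_id AND t.is_current = true")
    .whenMatchedUpdate(
        condition="t.row_hash <> s.row_hash",
        set={"is_current": "false", "end_date": "current_date()"})
    .execute())

# Step 2: append the incoming rows as the new current versions.
(updates_df
    .withColumn("is_current", F.lit(True))
    .withColumn("start_date", F.current_date())
    .withColumn("end_date", F.lit(None).cast("date"))
    .write.format("delta").mode("append").saveAsTable("gold.dim_customer"))
```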
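A sketch of flattening semi-structured JSON with EXPLODE, assuming each order record carries an `order_id` and an array column named `items`; the source path is a placeholder:

```python
from pyspark.sql import functions as F

# Read raw JSON from the Bronze layer (path is an assumption).
orders = spark.read.json("/mnt/bronze/orders/")

# explode() emits one output row per array element, flattening the data.
flat = (orders
        .withColumn("item", F.explode("items"))
        .select("order_id",
                F.col("item.product_id").alias("product_id"),
                F.col("item.qty").alias("qty")))
```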
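A sketch of reading a secret from a Key Vault-backed scope inside a notebook; the scope name, key name, and JDBC connection details are assumptions:

```python
# dbutils.secrets.get returns the secret value; Databricks redacts it
# if printed in notebook output.
jdbc_password = dbutils.secrets.get(scope="kv-prod", key="sql-db-password")

# Use the secret to read a table over JDBC without hard-coding credentials.
df = (spark.read.format("jdbc")
      .option("url", "jdbc:sqlserver://myserver.database.windows.net:1433;database=sales")
      .option("dbtable", "dbo.orders")
      .option("user", "etl_user")
      .option("password", jdbc_password)
      .load())
```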
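And a sketch of two of the tuning techniques named above, OPTIMIZE ... ZORDER BY and a broadcast join, with Adaptive Query Execution enabled explicitly; the table and column names are assumptions:

```python
from pyspark.sql import functions as F

# Co-locate rows that share customer_id values to speed up filtered scans.
spark.sql("OPTIMIZE gold.fact_sales ZORDER BY (customer_id)")

# Broadcast the small dimension table to avoid shuffling the large fact table.
result = (spark.table("gold.fact_sales")
          .join(F.broadcast(spark.table("gold.dim_customer")), "customer_id"))

# AQE is on by default in recent runtimes; set it explicitly for clarity.
spark.conf.set("spark.sql.adaptive.enabled", "true")
```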