Building Agentic Self-Healing Data Pipeline - End to End Data Engineering Project
CodeWithYu via YouTube
Overview
Learn to build a production-ready, AI-powered data pipeline that automatically detects and heals data quality issues in real time using Apache Airflow 3.0 and Ollama with LLaMA 3.2. Master the creation of intelligent pipelines that diagnose data quality problems, including missing values, wrong data types, and malformed text, and self-heal problematic records without manual intervention. Explore advanced techniques for running sentiment analysis on millions of Yelp reviews with local LLM models, generating comprehensive health reports and metrics, and building graceful degradation patterns for when systems encounter errors.

Discover how to construct agentic workflows in Apache Airflow, integrate local LLMs into data processing pipelines, implement robust self-healing patterns for maintaining data quality, develop effective batch processing strategies for large datasets, and build comprehensive health monitoring and observability systems. Follow along with hands-on implementation covering system architecture design, project setup and configuration, embedding AI agents directly into Airflow workflows, pipeline diagnosis and healing mechanisms, automated health report generation, and a thorough review of the results.
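The diagnose-then-heal loop described above can be sketched in plain Python. This is a minimal, rule-based illustration only: the record schema, field names, and coercion rules here are hypothetical, and the course instead delegates diagnosis to LLaMA 3.2 via Ollama inside Airflow tasks.

```python
# Hypothetical schema for one review record (the course processes Yelp reviews).
EXPECTED_TYPES = {"review_id": str, "stars": float, "text": str}

def diagnose(record: dict) -> list[str]:
    """Return a list of data-quality issues found in one record."""
    issues = []
    for key, expected in EXPECTED_TYPES.items():
        value = record.get(key)
        if value is None or value == "":
            issues.append(f"missing:{key}")
        elif not isinstance(value, expected):
            issues.append(f"wrong_type:{key}")
    return issues

def heal(record: dict, issues: list[str]) -> dict:
    """Apply simple repairs; an agentic pipeline would ask an LLM instead."""
    fixed = dict(record)
    for issue in issues:
        kind, _, key = issue.partition(":")
        if kind == "wrong_type":
            try:
                # Coerce to the expected type, e.g. "4" -> 4.0.
                fixed[key] = EXPECTED_TYPES[key](fixed[key])
            except (TypeError, ValueError):
                fixed[key] = None  # unrecoverable: leave for quarantine
        elif kind == "missing" and key == "text":
            fixed[key] = ""  # degrade gracefully rather than fail the batch

    return fixed

record = {"review_id": "abc", "stars": "4", "text": None}
issues = diagnose(record)   # ["wrong_type:stars", "missing:text"]
healed = heal(record, issues)
```

In the actual pipeline each of these steps would be an Airflow task, with the healed and quarantined records feeding the health report generated at the end of the DAG.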
Syllabus
0:00 Introduction
1:43 System Architecture and Background
5:49 Setting up the project
13:27 The Agentic Self-Healing Pipeline
17:00 Embedding AI Agents in Airflow
40:44 Diagnosing and Healing Pipelines
1:11:44 Generating Health Reports
1:16:12 Results and Review
1:30:00 Outro
Taught by
CodeWithYu