Unifying Human-Curated Data Ingestion and Real-Time Updates with Databricks DLT, Protobuf and BSR
Databricks via YouTube
Overview
Learn how to build a streaming-native data system that unifies file-based ingestion and real-time user edits using Databricks Delta Live Tables (DLT), Protobuf, and Buf Schema Registry (BSR) in this 40-minute conference talk. Discover how the Red Stapler system architecture merges these data sources into a single DLT pipeline, giving near real-time feedback while maintaining data quality and governance. Explore how Protobuf definitions managed in BSR enforce schema and data-quality rules and preserve backward compatibility across system updates. Understand how SCD Type 2 tables store every record, valid or not, so that the complete version history is captured and invalid data appears immediately in quarantine views. See how a configuration-driven approach adapts to evolving survey definitions without production risk, and how DLT Serverless and the Kafka-compatible Bufstream keep costs down by scaling to zero during idle periods. Gain insights into achieving consistent validation, quick updates, and comprehensive audit trails, all essential for building trustworthy and flexible data pipelines in production.
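The SCD Type 2 and quarantine idea described above can be illustrated with a minimal plain-Python sketch. This is not the Red Stapler implementation and does not use the DLT API. The class `Scd2Table`, the rule `is_valid_response`, and the `score` field are all hypothetical names chosen for illustration. The sketch assumes each record has a key, that a new version closes the previous one, and that invalid records are stored and flagged rather than dropped.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Version:
    key: str
    payload: dict
    is_valid: bool
    valid_from: int
    valid_to: Optional[int] = None  # None marks the current version


def is_valid_response(payload: dict) -> bool:
    # Hypothetical data-quality rule: score must be an integer from 1 to 5.
    score = payload.get("score")
    return isinstance(score, int) and 1 <= score <= 5


class Scd2Table:
    """Toy SCD Type 2 table: every version is kept, valid or not."""

    def __init__(self, validate: Callable[[dict], bool]):
        self.validate = validate
        self.rows: List[Version] = []

    def upsert(self, key: str, payload: dict, ts: int) -> None:
        # Close the open version for this key instead of overwriting it.
        for row in self.rows:
            if row.key == key and row.valid_to is None:
                row.valid_to = ts
        # Store the new version even if it fails validation.
        self.rows.append(Version(key, payload, self.validate(payload), ts))

    def current(self) -> List[Version]:
        return [r for r in self.rows if r.valid_to is None]

    def quarantine(self) -> List[Version]:
        # Current versions that fail validation.
        return [r for r in self.current() if not r.is_valid]

    def history(self, key: str) -> List[Version]:
        return [r for r in self.rows if r.key == key]


table = Scd2Table(is_valid_response)
table.upsert("q1", {"score": 4}, ts=1)  # e.g. a row from a file load
table.upsert("q2", {"score": 9}, ts=2)  # invalid, but still stored
print([r.key for r in table.quarantine()])
table.upsert("q2", {"score": 5}, ts=3)  # e.g. a user edit that fixes the row
print([r.key for r in table.quarantine()])
```

Because the invalid version of `q2` is closed rather than deleted, `table.history("q2")` still returns both versions after the correction, which is what provides the audit trail.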
Syllabus
Unifying Human-Curated Data Ingestion and Real-Time Updates with Databricks DLT, Protobuf and BSR
Taught by
Databricks