Unifying Human-Curated Data Ingestion and Real-Time Updates with Databricks DLT, Protobuf and BSR
Databricks via YouTube
Overview
Learn how to build a streaming-native data system that unifies file-based ingestion and real-time user edits using Databricks Delta Live Tables (DLT), Protobuf, and Buf Schema Registry (BSR) in this 40-minute conference talk. Discover the Red Stapler system architecture, which merges both data sources into a single DLT pipeline to give near real-time feedback while maintaining data quality and governance. Explore how Protobuf definitions managed in BSR enforce schema and data-quality rules and keep the schema backward compatible across system updates. Understand how SCD Type 2 tables store every record, valid or not, so that the complete version history is captured and invalid data appears immediately in quarantine views. See how a configuration-driven approach lets the pipeline adapt to evolving survey definitions without putting production at risk, and how DLT Serverless and the Kafka-compatible Bufstream keep costs down by scaling to zero during idle periods. Gain insight into the consistent validation, quick updates, and comprehensive audit trails needed to build trustworthy, flexible data pipelines in production environments.
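The SCD Type 2 and quarantine idea described above can be illustrated with a minimal plain-Python sketch. It is not taken from the talk and uses no DLT, Protobuf, or BSR APIs: the `Version` record, the `validate` rule (a survey score between 1 and 5), and the function names are all hypothetical stand-ins. It only shows the pattern of keeping every version of a record, valid or not, and deriving a quarantine view from the current versions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    """One row of a hypothetical SCD Type 2 history table."""
    key: str
    payload: dict
    is_valid: bool
    valid_from: int                 # sequence number of the change
    valid_to: Optional[int] = None  # None marks the current version


def validate(payload: dict) -> bool:
    # Assumed rule standing in for schema-level data-quality checks.
    score = payload.get("score")
    return isinstance(score, int) and 1 <= score <= 5


def apply_change(history: list, key: str, payload: dict, seq: int) -> None:
    # Close the open version for this key, then append the new one.
    # Invalid records are stored too; they are only flagged.
    for version in history:
        if version.key == key and version.valid_to is None:
            version.valid_to = seq
    history.append(Version(key, payload, validate(payload), seq))


def current(history: list) -> list:
    # Current state: the open version of every key.
    return [v for v in history if v.valid_to is None]


def quarantine(history: list) -> list:
    # Quarantine view: current versions that failed validation.
    return [v for v in current(history) if not v.is_valid]


history = []
apply_change(history, "resp-1", {"score": 4}, seq=1)  # e.g. a file-based load
apply_change(history, "resp-2", {"score": 9}, seq=2)  # invalid, still stored
apply_change(history, "resp-2", {"score": 5}, seq=3)  # user edit fixes it
```

After the third change, the invalid version of `resp-2` remains in `history` with `valid_to` set, so the audit trail is intact, while `quarantine(history)` no longer returns it because the current version passes validation.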
Syllabus
Unifying Human-Curated Data Ingestion and Real-Time Updates with Databricks DLT, Protobuf and BSR
Taught by
Databricks