Unifying Human-Curated Data Ingestion and Real-Time Updates with Databricks DLT, Protobuf and BSR

Learn how to build a streaming-native data system that unifies file-based ingestion and real-time user edits using Databricks Delta Live Tables (DLT), Protobuf, and Buf Schema Registry (BSR) in this 40-minute conference talk. Discover the Red Stapler system architecture, which merges these data sources into a single DLT pipeline to deliver near real-time feedback while maintaining data quality and governance. Explore how Protobuf definitions managed in BSR enforce schema and data-quality rules and keep changes backward compatible across system updates. Understand how slowly changing dimension (SCD) Type 2 tables store every record, valid or not, so that the complete version history is captured and invalid data is immediately visible through quarantine views. See how a configuration-driven approach lets the pipeline adapt to evolving survey definitions without putting production at risk, and how DLT Serverless and Kafka-compatible Bufstream keep costs down by scaling to zero during idle periods. Gain insight into achieving the consistent validation, quick updates, and comprehensive audit trails needed for trustworthy, flexible production data pipelines.
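To make the SCD Type 2 and quarantine idea concrete, here is a minimal plain-Python sketch. It is not the talk's implementation and does not use the DLT or Protobuf APIs. The names `Version`, `validate`, `apply_change`, `quarantine_view`, and `current_valid_view` are hypothetical, and the "score from 1 to 5" rule is an invented stand-in for a Protobuf-defined data-quality rule.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Version:
    """One row of an SCD Type 2 history (hypothetical layout)."""
    key: str
    value: dict
    is_valid: bool
    errors: list = field(default_factory=list)
    start_seq: int = 0
    end_seq: Optional[int] = None  # None marks the current version


def validate(value: dict) -> list:
    """Invented rule standing in for schema-defined validation."""
    errors = []
    score = value.get("score")
    if not isinstance(score, int) or not 1 <= score <= 5:
        errors.append("score must be an integer from 1 to 5")
    return errors


def apply_change(history: list, key: str, value: dict, seq: int) -> None:
    """Close the current version of `key`, then append the new one.

    The new version is stored whether or not it is valid, so the
    history keeps a full audit trail.
    """
    for version in history:
        if version.key == key and version.end_seq is None:
            version.end_seq = seq
    errors = validate(value)
    history.append(Version(key, value, not errors, errors, start_seq=seq))


def quarantine_view(history: list) -> list:
    """Current versions that failed validation."""
    return [v for v in history if v.end_seq is None and not v.is_valid]


def current_valid_view(history: list) -> list:
    """Current versions that passed validation."""
    return [v for v in history if v.end_seq is None and v.is_valid]


# Both a file load and a later user edit go through the same function.
history = []
apply_change(history, "resp-1", {"score": 4}, seq=1)  # file-based ingestion
apply_change(history, "resp-1", {"score": 9}, seq=2)  # user edit, invalid
apply_change(history, "resp-2", {"score": 3}, seq=3)  # file-based ingestion

print([v.key for v in quarantine_view(history)])
print([v.key for v in current_valid_view(history)])
```

Because the invalid edit to `resp-1` is stored rather than dropped, it appears in the quarantine view while the earlier valid version stays in the history with a closed `end_seq`. Routing both sources through one function mirrors the single-pipeline idea described above.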