Scaling Identity Graph Ingestion to 1M Events/Sec with Spark Streaming and Delta Lake
Databricks via YouTube
Overview
Learn how Adobe's Real-Time Customer Data Platform scales identity graph processing in this 30-minute conference talk from Databricks. The talk covers the architecture that connects over 70 billion identities and processes terabytes of data daily across 25+ Databricks deployments spanning Azure and AWS regions. It explains the migration from Flink to Spark Streaming that enabled a 10x increase in ingestion pipeline capacity, allowing the platform to handle over one million records per second during peak traffic events like the Super Bowl. It also covers optimization techniques, including data deduplication strategies, monitoring, and anomaly detection, that support real-time identity resolution at enterprise scale while maintaining compliance and privacy standards. Senior engineers from Adobe show how Spark Streaming and Delta Lake work together to support personalized customer experiences, and share practical strategies for scaling data ingestion pipelines in multi-cloud environments.
Syllabus
Scaling Identity Graph Ingestion to 1M Events/Sec with Spark Streaming & Delta Lake
Taught by
Databricks