Processing 1M Identity Graphs per Second with Spark Structured Streaming

Learn how Adobe Experience Platform processes over 1 million identity graphs per second using Spark Structured Streaming and Delta Lake in this 31-minute conference talk. Discover the architecture, data patterns, and techniques that enabled Adobe to scale its ingestion pipeline 10x over three years while maintaining system stability and regulatory compliance. Explore how micro-batching removes 70-80% of the data through de-duplication, understand the key metrics for tracking query performance, and see how Delta Lake enables rate limiting and filtering of anomalous identities. Gain insights into managing schema evolution, using VACUUM for regulatory compliance, implementing a multi-cloud pipeline abstraction, and optimizing asynchronous task processing for data ingestion into FoundationDB. Learn about Adobe's custom deployment mechanism, which minimizes latency disruption while the platform handles over 70 billion identities across 25 deployments in seven regions on Azure and AWS, enabling personalization at scale while maintaining privacy and compliance standards.