Scaling Data Engineering Pipelines - Preparing Credit Card Transactions Data for Machine Learning
Databricks via YouTube
Overview
This 34-minute conference talk from Databricks shows how to build scalable data engineering pipelines that prepare credit card transaction data for machine learning. It walks through two real-world use cases that demonstrate big data engineering techniques for building stable pipelines and managing petabyte-scale storage. Topics include how implementing Delta Lake optimized pipeline performance, with an 80% reduction in query execution time and a 70% decrease in storage space requirements, and how the Databricks Workflows 'ForEach' operator runs compute-intensive pipelines across multiple clusters, cutting processing times from months to days. The talk also presents a reusable design pattern that isolates notebooks into discrete units of work, so data scientists can test and develop their solutions independently without destabilizing the pipeline. The speakers are Brandon DeShon, Director of Data Science at Mastercard, and Luke Garzia, Lead Data Engineer at Mastercard, who share practical strategies for scaling data engineering in enterprise environments focused on financial transaction processing and machine learning readiness.
Syllabus
Scaling Data Engineering Pipelines: Preparing Credit Card Transactions Data for Machine Learning
Taught by
Databricks