Scaling Data Engineering Pipelines - Preparing Credit Card Transactions Data for Machine Learning

Databricks via YouTube

Learn how to build scalable data engineering pipelines that prepare credit card transaction data for machine learning in this 34-minute conference talk from Databricks. The talk walks through two real-world use cases that demonstrate big data engineering techniques for building stable pipelines and managing petabyte-scale storage. See how implementing Delta Lake reduced query execution time by 80% and storage space requirements by 70%. Learn how the Databricks Workflows 'ForEach' operator runs compute-intensive pipelines across multiple clusters, cutting processing times from months to days. Examine a reusable design pattern that isolates notebooks into discrete units of work, so data scientists can test and develop their solutions independently while the pipeline stays stable. Mastercard's Director of Data Science, Brandon DeShon, and Lead Data Engineer, Luke Garzia, share practical strategies for scaling data engineering operations in enterprise environments focused on financial transaction processing and machine learning readiness.