In this 32-minute conference talk from Data Council, Ciro Greco explains how to design reproducible data workloads over data lakes. The approach decouples code, compute, and data management so that pipelines can be reproduced deterministically. The talk offers practical guidance for Python data pipeline developers and for engineers debugging complex workflows. It focuses on using open-source components such as Iceberg, Arrow, and Docker to build declarative, functional DAGs that run efficiently in cloud environments. It is particularly relevant to technical teams working with data lakes who need their data processing systems to be reproducible and maintainable.