Python Over Data Lakes: Declarative Environments, Data Management and Other Things with Feathers
Data Council via YouTube
Overview
In this 32-minute conference talk from Data Council, Ciro Greco explains how to design reproducible data workloads over data lakes by decoupling code, compute, and data management so that pipelines can be reproduced deterministically. The talk is aimed at Python data pipeline developers and engineers who debug complex workflows. It focuses on using open-source components such as Iceberg, Arrow, and Docker to build declarative, functional DAGs that execute efficiently in cloud environments. It is particularly relevant for technical teams working with data lakes who need reproducible, maintainable data processing systems.
Syllabus
Python Over Data Lakes: Declarative Environments, Data Management & Other Things w/ Feathers
Taught by
Data Council