How to Build a Data Pipeline Using Synthetic Data Generation and Testing with Python
PyCon South Africa via YouTube
Overview
Learn how to overcome the challenges of developing data pipelines when real data is unavailable in this 31-minute conference talk from PyCon South Africa. Discover practical techniques for generating and using synthetic data with Python, from statistical methods to packages such as Faker and SDV, to create realistic test data for customer profiles, transactions, and time series. Explore how to use Flyway to load synthetic data into Postgres databases and manage repeatable deployments. Through code examples and live demonstrations, gain insight into the best practices, benefits, and potential challenges of testing with synthetic data. Designed for intermediate Python developers, the talk covers the skills needed to build and validate robust data pipelines without access to production data.
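As a rough illustration of the workflow described above (not code from the talk), the sketch below uses only Python's standard `random` module in place of Faker or SDV. It draws customer profiles from small hypothetical name and city pools, transaction amounts from a log-normal distribution, and a daily time series from a trend plus a weekly cycle plus Gaussian noise. It then renders the rows as the body of a Flyway repeatable migration. Flyway re-applies `R__`-prefixed migrations whenever their checksum changes, so a fixed seed keeps the file stable until the generator itself changes. The table names (`customers`, `transactions`, `daily_metrics`), their columns, and the file name `R__seed_synthetic_data.sql` are assumptions for illustration.

```python
import math
import random
from datetime import date, timedelta

# Hypothetical value pools for illustration; a library such as Faker
# would normally supply far more varied names and addresses.
FIRST_NAMES = ["Thandi", "Sipho", "Anika", "Pieter", "Lerato", "Johan"]
LAST_NAMES = ["Nkosi", "Dlamini", "Botha", "Mokoena", "O'Neill"]
CITIES = ["Cape Town", "Johannesburg", "Durban", "Pretoria"]


def make_customers(n, rng):
    """Synthetic customer profiles drawn uniformly from the pools above."""
    rows = []
    for cid in range(1, n + 1):
        first, last = rng.choice(FIRST_NAMES), rng.choice(LAST_NAMES)
        rows.append({"id": cid, "name": f"{first} {last}",
                     "email": f"{first.lower()}.{cid}@example.com",
                     "city": rng.choice(CITIES)})
    return rows


def make_transactions(customers, n, start, days, rng):
    """Transactions with log-normally distributed amounts (right-skewed, always > 0)."""
    rows = []
    for tid in range(1, n + 1):
        rows.append({"id": tid,
                     "customer_id": rng.choice(customers)["id"],
                     "txn_date": start + timedelta(days=rng.randrange(days)),
                     "amount": round(rng.lognormvariate(5.0, 0.8), 2)})
    return rows


def make_daily_series(start, days, rng):
    """Daily metric = base level + linear trend + weekly cycle + Gaussian noise."""
    rows = []
    for t in range(days):
        value = 1000 + 2.5 * t + 150 * math.sin(2 * math.pi * t / 7) + rng.gauss(0, 40)
        rows.append({"metric_date": start + timedelta(days=t),
                     "metric_value": round(value, 2)})
    return rows


def sql_literal(value):
    """Render a Python value as a Postgres literal, escaping single quotes."""
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def to_insert_sql(table, rows):
    """Build one multi-row INSERT statement for the given table."""
    cols = list(rows[0])
    tuples = ",\n".join(
        "(" + ", ".join(sql_literal(r[c]) for c in cols) + ")" for r in rows)
    return f"INSERT INTO {table} ({', '.join(cols)}) VALUES\n{tuples};\n"


def build_seed_migration(seed=42, n_customers=20, n_transactions=200, days=90):
    """Return the body of a Flyway repeatable migration.

    With the same seed (and Python version) the output is identical between
    runs, so the migration's checksum only changes when the generator changes.
    """
    rng = random.Random(seed)
    start = date(2023, 1, 1)
    customers = make_customers(n_customers, rng)
    transactions = make_transactions(customers, n_transactions, start, days, rng)
    series = make_daily_series(start, days, rng)
    # Repeatable migrations can run more than once, so clear old rows first.
    # Assumes hypothetical tables created by earlier versioned (V__) migrations.
    return ("TRUNCATE transactions, customers, daily_metrics;\n"
            + to_insert_sql("customers", customers)
            + to_insert_sql("transactions", transactions)
            + to_insert_sql("daily_metrics", series))


if __name__ == "__main__":
    print(build_seed_migration())
```

Redirecting the output of a hypothetical `seed.py` into `R__seed_synthetic_data.sql` inside one of the project's configured Flyway migration locations would let `flyway migrate` apply it after any pending versioned migrations. Swapping the hand-rolled pools for Faker, or for a model fitted with SDV, changes only the generator functions; the rendering and seeding logic stays the same.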
Time: Oct 05 (Thu)
Duration: 31 minutes
Taught by
PyCon South Africa