Bridging Python and Apache Iceberg: The Power of PyIceberg

Discover how to manage large-scale datasets efficiently with this PyCon US talk that explores the integration of Apache Iceberg™ with Python through PyIceberg. Learn how this open table format addresses the challenges of handling terabyte-scale data, evolving schemas over time, and keeping tables consistent across different tools. Follow along as the presentation introduces Iceberg and PyIceberg fundamentals, highlighting features such as schema evolution and transactional guarantees and how PyIceberg brings them to the Python ecosystem. See practical demonstrations of creating, querying, and writing to Iceberg tables while maintaining interoperability with Python-native data structures such as PyArrow tables and pandas DataFrames. Dive deeper into Iceberg's file structure, including metadata files, manifest lists, and manifests, to understand how PyIceberg leverages this architecture for transactional table updates and query optimization. Explore advanced features like hidden partitioning and time travel that make table management more efficient and flexible at scale. This 28-minute talk provides essential knowledge for Python developers working with large datasets.
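
To make the file-structure idea concrete, here is a toy model in plain Python of how a chain of snapshots, manifests, and data-file statistics can support both time travel and skipping files during a query. It is only a sketch under stated assumptions: the names ToyTable, Snapshot, Manifest, DataFile, and plan_files are hypothetical and are not PyIceberg's API, and real Iceberg metadata is stored as JSON and Avro files rather than in-memory objects.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class DataFile:
    """One data file plus column bounds (hypothetical, tracks a single 'id' column)."""
    path: str
    min_id: int
    max_id: int


@dataclass(frozen=True)
class Manifest:
    """A manifest lists data files together with their statistics."""
    files: tuple[DataFile, ...]


@dataclass(frozen=True)
class Snapshot:
    """A snapshot points at the manifests that make up the table at one moment.

    The 'manifests' tuple stands in for Iceberg's manifest list.
    """
    snapshot_id: int
    manifests: tuple[Manifest, ...]


@dataclass
class ToyTable:
    """Stands in for the table metadata file: an ordered log of snapshots."""
    snapshots: list[Snapshot] = field(default_factory=list)

    def append(self, *files: DataFile) -> int:
        # A commit never rewrites old manifests; it creates a new snapshot
        # that reuses them and adds one manifest for the new files.
        previous = self.snapshots[-1].manifests if self.snapshots else ()
        snapshot = Snapshot(len(self.snapshots) + 1, previous + (Manifest(files),))
        self.snapshots.append(snapshot)
        return snapshot.snapshot_id

    def plan_files(self, min_id: int, max_id: int,
                   snapshot_id: Optional[int] = None) -> list[str]:
        # Time travel: read an older snapshot instead of the latest one.
        if not self.snapshots:
            return []
        if snapshot_id is None:
            snapshot = self.snapshots[-1]
        else:
            snapshot = next(s for s in self.snapshots if s.snapshot_id == snapshot_id)
        # Pruning: keep only files whose id range overlaps the requested range.
        return [
            f.path
            for m in snapshot.manifests
            for f in m.files
            if f.max_id >= min_id and f.min_id <= max_id
        ]


table = ToyTable()
first = table.append(DataFile("data/a.parquet", 1, 100))
second = table.append(DataFile("data/b.parquet", 101, 200))

# Current snapshot, ids 150-160: only the second file can contain matches.
current = table.plan_files(150, 160)

# Same table as of the first snapshot: the second file is not visible yet.
as_of_first = table.plan_files(1, 200, snapshot_id=first)
```

Because each commit adds a new snapshot rather than changing existing metadata, readers of an older snapshot keep seeing a consistent table, which is the property the talk's transactional updates and time travel build on.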