Overview
Learn how to streamline data science workflows by using DuckDB to organize the chaotic collection of files that accumulates in a typical data project. DuckDB is an MIT-licensed database management system that fits directly into Python environments. It installs via pip with no external dependencies and works directly with popular dataframe libraries, including Pandas, Polars, and Apache Arrow.

Master DuckDB's file readers for CSV, Parquet, JSON, Excel, and Google Sheets, with particular emphasis on its robust CSV reader, which can efficiently process even messy files. Explore how DuckDB connects to cloud object stores and reads lakehouse formats such as Delta and Iceberg.

Understand how DuckDB's highly compressed columnar file format lets you consolidate several large tables into a single file. The same file can also store processing logic as views and functions and supports partial updates. Gain insight into DuckDB's ACID transactions, parallel processing, and cross-language compatibility, which keeps your data accessible over the long term. Learn about community extensions, which add support for more formats through a pip-like package repository.

Acquire practical skills for installing DuckDB locally, integrating it into existing Python scripts and Jupyter Notebooks, and applying file-management strategies that enable larger-than-memory analyses, so you can solve data problems that would otherwise exceed typical hardware limits.
Syllabus
Taming file zoos: Data science with DuckDB database files - Alex Monahan
Taught by
PyCon US