Accelerating Pandas with NVIDIA's cuDF: Basic Statistical Analysis and Data Cleaning - Episode 6
Python Tutorials for Digital Humanities via YouTube
Overview
Learn how to accelerate data analysis with NVIDIA's cuDF in this 15-minute tutorial from the Python Tutorials for Digital Humanities series. See the performance advantages of GPU acceleration on a large dataset of 4.3 million newspaper articles, with CPU and GPU performance compared for tasks such as word counts and text length calculations on an NVIDIA RTX 5000 GPU. Follow along with essential data cleaning techniques that improve data quality. The tutorial covers an introduction to cuDF, hardware setup details, dataset loading and preparation, statistical analysis on both CPU and GPU, and methods for identifying and cleaning problematic data. A companion notebook is available on GitHub so you can practice these techniques yourself.
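As a rough illustration of the kind of analysis described above, here is a minimal sketch in plain pandas. It is not taken from the companion notebook: the tiny DataFrame and the column names (`title`, `text`, `text_length`, `word_count`) are hypothetical stand-ins for the newspaper dataset. It computes text length and word count per article, then filters out rows with missing or empty text.

```python
import pandas as pd

# Hypothetical stand-in for the newspaper dataset; the column names
# and contents are assumptions made for this illustration only.
df = pd.DataFrame({
    "title": ["a", "b", "c", "d"],
    "text": [
        "The quick brown fox",
        "",
        None,
        "GPU acceleration helps with large datasets",
    ],
})

# Basic statistics: character length and whitespace-delimited word count.
df["text_length"] = df["text"].str.len()
df["word_count"] = df["text"].str.split().str.len()

# Simple cleaning: keep only rows whose text is present and non-blank.
has_text = df["text"].fillna("").str.strip().ne("")
clean = df[has_text].reset_index(drop=True)

# Summary statistics over the cleaned rows.
print(clean[["text_length", "word_count"]].describe())
```

In a Jupyter notebook with cuDF installed and a supported NVIDIA GPU, running `%load_ext cudf.pandas` before `import pandas as pd` lets the same pandas code run on the GPU where cuDF supports the operation, falling back to the CPU otherwise. Whether the video uses this accelerator mode or the `cudf` API directly is not stated in the description above.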
Syllabus
00:00 Introduction to cuDF and Video Overview
00:47 Exciting Hardware Setup for the Series
02:06 Loading and Preparing the Dataset
03:55 Performing Statistical Analysis on CPU
05:20 Accelerating Analysis with GPU
08:54 Identifying and Cleaning Bad Data
14:31 Conclusion and Next Steps
Taught by
Python Tutorials for Digital Humanities