Accelerating Pandas with NVIDIA's cuDF: Basic Statistical Analysis and Data Cleaning - Episode 6
Python Tutorials for Digital Humanities via YouTube
Overview
Learn how to accelerate data analysis with NVIDIA's cuDF in this 15-minute tutorial from the Python Tutorials for Digital Humanities series. See the performance advantages of GPU acceleration on a large dataset of 4.3 million newspaper articles. Compare CPU and GPU performance for tasks such as word counts and text length calculations, with the GPU runs on an NVIDIA RTX 5000. Follow along with essential data cleaning techniques to improve data quality. The tutorial covers an introduction to cuDF, hardware setup details, dataset loading and preparation, statistical analysis on both CPU and GPU, and methods for identifying and cleaning problematic data. Access the companion notebook on GitHub to practice these techniques yourself.
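As a rough illustration of the kind of analysis described above, here is a minimal sketch in plain pandas. It is not the tutorial's notebook code: the tiny DataFrame and the column name `text` are hypothetical stand-ins for the newspaper dataset. cuDF ships a pandas accelerator mode, `cudf.pandas`, so code like this can typically run on a GPU unchanged, for example by running `%load_ext cudf.pandas` in a notebook before importing pandas.

```python
import pandas as pd

# Hypothetical stand-in for the newspaper articles; the real dataset
# and its column names are not given in this listing.
df = pd.DataFrame(
    {"text": ["The quick brown fox", "", None, "GPU acceleration helps"]}
)

# Cleaning step: drop rows whose text is missing or blank.
clean = df.dropna(subset=["text"])
clean = clean[clean["text"].str.strip() != ""].copy()

# Basic statistics: character length and whitespace-delimited word count.
clean["text_length"] = clean["text"].str.len()
clean["word_count"] = clean["text"].str.split().str.len()

# Summary statistics over both derived columns.
summary = clean[["text_length", "word_count"]].describe()
print(summary)
```

On a dataset this small the CPU will be faster; any GPU benefit shows up only at a scale closer to the millions of rows mentioned above.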
Syllabus
00:00 Introduction to cuDF and Video Overview
00:47 Exciting Hardware Setup for the Series
02:06 Loading and Preparing the Dataset
03:55 Performing Statistical Analysis on CPU
05:20 Accelerating Analysis with GPU
08:54 Identifying and Cleaning Bad Data
14:31 Conclusion and Next Steps
Taught by
Python Tutorials for Digital Humanities