Accelerating Pandas with NVIDIA's cuDF: Basic Statistical Analysis and Data Cleaning - Episode 6
Python Tutorials for Digital Humanities via YouTube
Overview
Learn how to accelerate data analysis with NVIDIA's cuDF in this 15-minute tutorial from the Python Tutorials for Digital Humanities series. The video demonstrates the performance advantages of GPU acceleration on a large dataset of 4.3 million newspaper articles, comparing CPU and GPU performance for tasks such as word counts and text length calculations on an NVIDIA RTX 5000 GPU. It then walks through essential data cleaning techniques to improve data quality. The tutorial covers an introduction to cuDF, hardware setup details, dataset loading and preparation, statistical analysis on both CPU and GPU, and methods for identifying and cleaning problematic data. A companion notebook is available on GitHub for practicing these techniques yourself.
Syllabus
00:00 Introduction to cuDF and Video Overview
00:47 Exciting Hardware Setup for the Series
02:06 Loading and Preparing the Dataset
03:55 Performing Statistical Analysis on CPU
05:20 Accelerating Analysis with GPU
08:54 Identifying and Cleaning Bad Data
14:31 Conclusion and Next Steps
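The analysis and cleaning steps listed above can be sketched roughly as follows. This is a minimal illustration in plain pandas, not the code from the video or its companion notebook: the column name `article`, the sample rows, and the `min_words` threshold are all hypothetical stand-ins for the 4.3 million newspaper articles. cuDF provides a pandas accelerator mode (`%load_ext cudf.pandas` in a notebook, or `python -m cudf.pandas script.py`) that can run pandas code like this on a supported NVIDIA GPU; whether the tutorial uses that mode or imports `cudf` directly is not stated in this listing.

```python
import time

import pandas as pd

# Hypothetical stand-in data; the tutorial works with millions of articles.
df = pd.DataFrame({
    "article": [
        "Local council approves new bridge",
        "",                                   # empty text
        None,                                 # missing text
        "Storm damages harbor   overnight",   # irregular whitespace
        "ok",                                 # too short to be a real article
    ]
})

start = time.perf_counter()
# Character length of each article (missing values stay missing).
df["text_length"] = df["article"].str.len()
# Whitespace-delimited word count; runs of spaces count as one separator.
df["word_count"] = df["article"].str.split().str.len()
elapsed = time.perf_counter() - start

# Summary statistics help reveal suspicious rows (e.g. a minimum of 0 words).
print(df[["text_length", "word_count"]].describe())
print(f"Computed in {elapsed:.6f} s")

# Assumed cleaning rule: drop rows that are missing or below a word threshold.
min_words = 3  # hypothetical cutoff, chosen only for this example
clean = df[df["word_count"].fillna(0) >= min_words].reset_index(drop=True)
print(clean)
```

On a five-row frame the timing is meaningless; the CPU versus GPU difference the tutorial describes only becomes visible at the scale of millions of rows.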
Taught by
Python Tutorials for Digital Humanities