Speed up Cleaning Data with Pandas and your GPU - CuDF Episode 5
Python Tutorials for Digital Humanities via YouTube
Overview
This tutorial video demonstrates how to accelerate data cleaning operations using cuDF, NVIDIA's GPU-powered alternative to Pandas. Learn essential techniques for handling messy real-world data, including managing missing values, dropping unnecessary columns, and converting data types, with a special focus on transforming time series data. The instructor works with a dataset of over 4.3 million rows on a Dell Precision 3680 workstation with an NVIDIA RTX 5000 Ada GPU to show the performance advantages of GPU acceleration. The 12-minute guide walks through a complete workflow in Jupyter Notebook, from initial data loading to converting string dates to datetime format, and offers practical solutions to common data cleaning problems that benefit from GPU processing.
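The cleaning steps described above (dropping a column, handling missing values, converting string dates to datetime) might look roughly like the following. This is a minimal sketch, not the instructor's notebook: the column names and sample rows are hypothetical, and the fallback to pandas when cuDF is not installed is an assumption added so the snippet also runs on a machine without an NVIDIA GPU.

```python
try:
    import cudf as xdf  # GPU DataFrame library; needs an NVIDIA GPU
except ImportError:
    import pandas as xdf  # CPU fallback (an assumption for illustration)

# Hypothetical sample data standing in for the video's 4.3-million-row dataset.
raw = xdf.DataFrame(
    {
        "id": [1, 2, 3, 4],
        "date": ["2024-01-05", "2024-01-06", None, "2024-01-08"],
        "value": [10.5, None, 7.0, 3.2],
        "notes": ["a", None, None, "b"],
    }
)


def clean(df):
    """Apply three common cleaning steps and return a new DataFrame."""
    # 1. Drop a column that is not needed for the analysis.
    df = df.drop(columns=["notes"])
    # 2. Remove rows with no date, then fill remaining missing values.
    df = df.dropna(subset=["date"]).reset_index(drop=True)
    df["value"] = df["value"].fillna(0.0)
    # 3. Convert string dates to a datetime column.
    df["date"] = xdf.to_datetime(df["date"], format="%Y-%m-%d")
    return df


cleaned = clean(raw)
print(cleaned.dtypes)
```

Because cuDF mirrors much of the pandas API, the same function body works with either library here; on a large dataset the cuDF path is where GPU acceleration would be expected to matter.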
Syllabus
00:00 Introduction and Series Recap
00:16 Data Manipulation Basics
00:30 Working with Real-World Data
01:02 Dell Workstation Overview
02:21 Setting Up the Jupyter Notebook
04:00 Cleaning Data with Pandas
04:38 Handling Null Values
08:33 Working with Dates in Pandas
11:16 Conclusion and Next Steps
Taught by
Python Tutorials for Digital Humanities