YouTube

Real-World Machine Learning Project with XGBoost and NVIDIA GPU

Python Simplified via YouTube

Overview

Learn to build a complete machine learning workflow using real-world NYC taxi data to predict tipping patterns, leveraging GPU acceleration with XGBoost CUDA and cuDF Pandas for handling massive datasets efficiently. Master professional data science techniques including data cleaning, missing value handling, anomaly detection, and memory optimization while working with 38 million rows of taxi trip data.

Discover how to set up and utilize NVIDIA GPUs both locally and through Google Colab's free T4 GPU, install RAPIDS for GPU-accelerated data processing, and solve common memory limitations and runtime crashes using RMM (RAPIDS Memory Manager). Practice essential data preprocessing steps such as detecting and replacing missing values, investigating ambiguous column names, shuffling and splitting data for training and testing, and handling negative charges and unrealistic transaction values.

Implement XGBoost model training and evaluation on GPU, troubleshoot common errors like DataFrame dtype issues and GPU memory exhaustion, and apply data optimization techniques to improve model performance. Explore advanced topics including feature importance analysis, hyperparameter tuning, date feature extraction, and K-fold cross-validation while developing the analytical mindset needed to approach real-world data science problems systematically rather than through guesswork.
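As a rough illustration of the preprocessing steps named in the overview (detecting missing values, replacing them with zero or the mean, and filtering out negative charges), here is a minimal sketch in plain pandas. It is not taken from the video: the tiny DataFrame and the column names `passenger_count`, `trip_distance`, and `fare_amount` are assumptions made for the example, and the choice of which column gets zero versus the mean is only one reasonable option.

```python
import pandas as pd

# Tiny synthetic stand-in for the taxi data.
# Column names and values are assumptions for illustration only.
trips = pd.DataFrame({
    "passenger_count": [1.0, None, 2.0, 1.0],
    "trip_distance": [2.5, 1.0, None, 4.5],
    "fare_amount": [12.0, 7.5, -3.0, 20.0],
})

# Detect missing values: count NaNs in each column.
missing_per_column = trips.isna().sum()
print(missing_per_column)

# Replace with zero where a missing entry can be read as "none recorded".
trips["passenger_count"] = trips["passenger_count"].fillna(0)

# Replace with the column mean where zero would distort the data.
mean_distance = trips["trip_distance"].mean()
trips["trip_distance"] = trips["trip_distance"].fillna(mean_distance)

# Keep only rows with non-negative charges and positive distances.
clean = trips[(trips["fare_amount"] >= 0) & (trips["trip_distance"] > 0)]
print(clean)
```

On the full dataset the same calls would run against a much larger frame; the overview suggests cuDF Pandas is used there so that this kind of code executes on the GPU.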

Syllabus

01:08 - Download Dataset
01:43 - Solving Big Data Problems with GPU Processing
02:46 - Google Colab Setup with Free T4 GPU
03:02 - Local Setup with NVIDIA GPU
03:43 - RAPIDS Installation Guide
05:07 - Solving Jupyter Kernel Crash with cuDF Pandas
05:29 - Handling Missing Values
05:53 - Detect Missing Values
06:29 - Replace with Zero
07:31 - Replace with Mean
08:57 - Investigate Columns with Ambiguous Names
11:21 - Drop Columns If No Other Option
12:01 - Split Data For Training & Testing
12:07 - Shuffle Data
13:39 - Features & Targets Split
14:02 - Train & Test Split
16:20 - Load XGBoost Model on GPU
17:55 - Train XGBoost Model
18:08 - Test XGBoost Model and Get Predictions
18:45 - Solve ValueError: DataFrame.dtypes must be int, float, bool or category
20:15 - Evaluate Trained Model
22:39 - Data Optimization & Anomalies
22:41 - Detect Data Anomalies with Aggregation
23:47 - Solve XGBoostError: No GPU Memory Left with RMM
25:04 - Handle Negative Charges and Unrealistic Distances
28:19 - Detect and Handle Unrealistic Transactions
30:28 - Second Train Run on Optimized Data
31:45 - Best Practices
31:45 - Plot Training Results & Feature Importance
32:17 - Hyperparameter Tuning
32:49 - Date Extraction: From String to Int or Category
33:05 - K-Fold Validation
33:45 - Thanks for Watching!
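
Several syllabus steps (date extraction, shuffling, the features and targets split, and the train and test split) can be sketched in a few lines of plain pandas. This is a hedged example under stated assumptions, not the video's code: the synthetic data, the column names `pickup_datetime`, `trip_distance`, and `tip_amount`, the 80/20 ratio, and the random seed are all hypothetical. Converting the date string into integer columns is shown because string columns are a common trigger for the dtype error listed at 18:45.

```python
import pandas as pd

# Synthetic stand-in data; names and values are assumptions for illustration.
trips = pd.DataFrame({
    "pickup_datetime": [f"2024-01-{d:02d} {d:02d}:15:00" for d in range(1, 11)],
    "trip_distance": [1.2, 3.4, 0.8, 5.1, 2.2, 7.9, 1.0, 4.4, 3.3, 2.7],
    "tip_amount": [1.0, 2.5, 0.0, 4.0, 1.5, 6.0, 0.5, 3.0, 2.0, 1.8],
})

# Date extraction: turn the string column into integer features,
# then drop the original string column.
pickup = pd.to_datetime(trips["pickup_datetime"])
trips["pickup_hour"] = pickup.dt.hour
trips["pickup_weekday"] = pickup.dt.dayofweek
trips = trips.drop(columns=["pickup_datetime"])

# Shuffle the rows reproducibly.
shuffled = trips.sample(frac=1.0, random_state=42).reset_index(drop=True)

# Features and target split (the target column is an assumption).
X = shuffled.drop(columns=["tip_amount"])
y = shuffled["tip_amount"]

# Train and test split: first 80% of shuffled rows for training.
split_at = int(len(shuffled) * 0.8)
X_train, X_test = X.iloc[:split_at], X.iloc[split_at:]
y_train, y_test = y.iloc[:split_at], y.iloc[split_at:]

print(X_train.shape, X_test.shape)
```

Shuffling before slicing matters when the source file is ordered, for example by pickup time, because an unshuffled slice would give the test set only the latest trips.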

Taught by

Python Simplified
