Overview
Learn how to optimize XGBoost performance on NVIDIA's Grace Blackwell superchip architecture in this 36-minute conference talk from Databricks. Discover how XGBoost's distributed out-of-core implementation uses the ultra-high bandwidth of the NVLink-C2C connection between CPU and GPU to overcome the memory limits that typically constrain gradient boosting on large tabular datasets. Explore the technical implementation that lets XGBoost scale to more than 1.2 TB of data on a single node without performance degradation by relying on the fast chip-to-chip communication of the Grace Blackwell architecture. Understand how this approach extends to GPU clusters using Spark, allowing XGBoost to handle terabytes of data efficiently across distributed systems. See a practical demonstration of integrating XGBoost's out-of-core algorithms with the Connect ML framework in Spark 4.0 for large-scale model training workflows, presented by NVIDIA engineers Bobby Wang and Jiaming Yuan, who share their optimization work and real-world implementation strategies.
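To make the out-of-core idea concrete, here is a minimal Python sketch, assuming a toy setting that is not taken from the talk. It shows why gradient histograms can be built one batch at a time: per-bin sums are additive, so streaming the data in pages gives the same result as holding it all in memory. The names `iter_batches`, `bin_index`, and `build_histogram` are hypothetical helpers, not part of the XGBoost API, and the sketch does not represent how XGBoost or NVLink-C2C transfers are actually implemented.

```python
from typing import Iterator, List, Tuple

Row = Tuple[float, float]  # (feature value, gradient)


def iter_batches(rows: List[Row], batch_size: int) -> Iterator[List[Row]]:
    """Yield fixed-size batches; a stand-in for pages streamed from host memory."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def bin_index(value: float, cuts: List[float]) -> int:
    """Return the index of the first cut point >= value, or len(cuts) if none."""
    for i, cut in enumerate(cuts):
        if value <= cut:
            return i
    return len(cuts)


def build_histogram(batches: Iterator[List[Row]], cuts: List[float]) -> List[float]:
    """Accumulate per-bin gradient sums while holding only one batch at a time."""
    hist = [0.0] * (len(cuts) + 1)
    for batch in batches:
        for value, grad in batch:
            hist[bin_index(value, cuts)] += grad
    return hist


# Hypothetical toy data: five rows, three cut points (four bins).
rows = [(0.1, 1.0), (0.4, -0.5), (0.6, 2.0), (0.9, 0.5), (0.3, 1.5)]
cuts = [0.25, 0.5, 0.75]

streamed = build_histogram(iter_batches(rows, batch_size=2), cuts)
in_memory = build_histogram(iter([rows]), cuts)
print(streamed == in_memory)
```

In a real out-of-core trainer the cost of fetching each batch dominates, which is why the bandwidth of the CPU-to-GPU link described above matters.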
Syllabus
Scaling XGBoost With Spark Connect ML on Grace Blackwell
Taught by
Databricks