Scaling XGBoost With Spark Connect ML on Grace Blackwell
Overview
Learn how to optimize XGBoost performance on NVIDIA's Grace Blackwell superchip architecture in this 36-minute conference talk from Databricks. Discover how XGBoost's distributed out-of-core implementation uses the ultra-high bandwidth of the NVLink-C2C connection between CPU and GPU to overcome the memory limits that typically constrain gradient boosting on large tabular datasets. Explore the technical implementation that lets XGBoost scale to more than 1.2 TB of training data on a single node without performance degradation, taking advantage of the fast chip-to-chip communication the Grace Blackwell architecture provides. Understand how the approach extends to GPU clusters with Spark, allowing XGBoost to handle terabytes of data efficiently across distributed systems. See a practical demonstration of integrating XGBoost's out-of-core algorithms with Spark 4.0's new Connect ML framework for large-scale model training, presented by NVIDIA engineers Bobby Wang and Jiaming Yuan, who share their optimization work and real-world implementation strategies.
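As a concrete illustration of the single-node out-of-core path described above, here is a minimal sketch using XGBoost's external-memory interface (available in XGBoost 3.0+). The shard file names and batch layout are hypothetical placeholders; `xgb.DataIter`, `xgb.ExtMemQuantileDMatrix`, and `xgb.train` are real XGBoost APIs, but this is not necessarily the exact setup shown in the talk.

```python
import numpy as np
import xgboost as xgb

class BatchIter(xgb.DataIter):
    """Streams preprocessed batches from disk, one shard at a time."""

    def __init__(self, shard_names):
        self._shards = shard_names
        self._pos = 0
        # cache_prefix tells XGBoost where to keep its external-memory cache
        super().__init__(cache_prefix="cache")

    def next(self, input_data):
        # Called by XGBoost to fetch the next batch; return False when done.
        if self._pos == len(self._shards):
            return False
        X = np.load(f"{self._shards[self._pos]}_X.npy")  # hypothetical shard files
        y = np.load(f"{self._shards[self._pos]}_y.npy")
        input_data(data=X, label=y)
        self._pos += 1
        return True

    def reset(self):
        # Called before each fresh pass over the data.
        self._pos = 0

it = BatchIter([f"part_{i}" for i in range(8)])  # hypothetical shard names

# ExtMemQuantileDMatrix keeps the quantized pages out of core and streams
# them during tree construction; on Grace Blackwell that stream rides the
# NVLink-C2C link between the Grace CPU and the Blackwell GPU.
Xy = xgb.ExtMemQuantileDMatrix(it, max_bin=256)
booster = xgb.train(
    {"tree_method": "hist", "device": "cuda", "max_depth": 8},
    Xy,
    num_boost_round=100,
)
```

For the cluster path mentioned above, the same GPU hist training can be driven through the PySpark estimators that ship in the `xgboost` package; the DataFrame and column names below are assumptions.

```python
from xgboost.spark import SparkXGBRegressor

regressor = SparkXGBRegressor(
    features_col="features",  # assumed feature vector column
    label_col="label",        # assumed label column
    num_workers=4,            # one XGBoost worker per GPU in the cluster
    device="cuda",
)
model = regressor.fit(train_df)  # train_df: an assumed Spark DataFrame
```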
Syllabus
Scaling XGBoost With Spark Connect ML on Grace Blackwell
Taught by
Databricks