Standardizing GPU Management - Redfish, Telemetry, and Firmware Update Protocols
Open Compute Project via YouTube
Overview
Explore the collaborative efforts of major tech companies to standardize GPU management protocols in this 28-minute panel discussion from the Open Compute Project. Learn about the significant progress made by AMD, Google, Meta, Microsoft, and NVIDIA in developing scalable management and observability solutions for AI and high-performance computing workloads. Discover the successful publication of a DMTF Message Registry and Redfish Interoperability Profile for GPU management, along with recent work to standardize GPU telemetry interfaces for accessing time-series data, detailed crash dumps, and debug logs. Understand how these standardization efforts focus on supporting low-latency and time-sensitive data streams with service level objectives (SLOs) to improve integration and testability. Gain insights into how this foundational work enables interoperability and provides hyperscalers with consistent management capabilities across multi-vendor GPU deployments while reducing fragmented requests to GPU suppliers.
Syllabus
Panel: Standardizing GPU Management - Redfish, Telemetry, and Firmware Update Protocols
Taught by
Open Compute Project