Standardizing GPU Management - Redfish, Telemetry, and Firmware Update Protocols
Open Compute Project via YouTube
Overview
Explore the collaborative efforts of major tech companies to standardize GPU management protocols in this 28-minute panel discussion from the Open Compute Project. Learn about the significant progress made by AMD, Google, Meta, Microsoft, and NVIDIA in developing scalable management and observability solutions for AI and high-performance computing workloads. Discover the successful publication of a DMTF Message Registry and Redfish Interoperability Profile for GPU management, along with recent work to standardize GPU telemetry interfaces for accessing time-series data, detailed crash dumps, and debug logs. Understand how these standardization efforts focus on supporting low-latency and time-sensitive data streams with service level objectives (SLOs) to improve integration and testability. Gain insights into how this foundational work enables interoperability and provides hyperscalers with consistent management capabilities across multi-vendor GPU deployments while reducing fragmented requests to GPU suppliers.
Syllabus
Panel: Standardizing GPU Management - Redfish, Telemetry, and Firmware Update Protocols
Taught by
Open Compute Project