Mitigating Silent Data Corruption: Industry-Academia Collaboration and Progress

Open Compute Project via YouTube

Overview

This 20-minute talk from the Open Compute Project features Emel Goksu (Meta, Ecosystem and Partnerships Lead) discussing the critical challenge of Silent Data Corruption (SDC): hardware errors that produce incorrect results without being detected or reported by the system. Learn how major technology companies, including AMD, ARM, Google, Intel, Meta, Microsoft, and NVIDIA, have collaborated since 2022 to develop the Server Compute Resiliency Specification. Discover how this industry initiative partners with multiple universities through the Open Compute Project to advance research on detecting and mitigating these rare but impactful errors, which become increasingly significant at scale, especially as AI workloads grow. The presentation covers the release of Specification 1.0 at the 2024 OCP Global Summit, a key milestone, and outlines ongoing work: a next version of the specification focused on GPUs, continued university research, and a handbook for AI developers.

Syllabus

Mitigating Silent Data Corruption: Industry-Academia Collaboration and Progress

Taught by

Open Compute Project
