Unified AIOps for Remote Management of Heterogeneous Open Source AI Systems
Open Compute Project via YouTube
Overview
Learn how to implement unified AIOps solutions for managing diverse open-source AI infrastructure in this 18-minute conference talk. Discover why modern AI systems require intelligent coordination that goes beyond hardware to cover open-source firmware, multiple platforms, and fragmented telemetry data. Explore techniques for normalizing multi-vendor telemetry through AIOps pipelines, enabling predictive analytics that identify GPU degradation, thermal hotspots, and system inefficiencies. Master automated remediation strategies, including firmware patching, workload migration, and adaptive cooling. Understand how AI chatbots can simplify operations through natural language interfaces, making complex infrastructure management more accessible. Examine closed-loop optimization approaches that connect workload behavior with infrastructure conditions to improve performance per watt and reduce carbon footprint. Gain insights into scalable, intelligent AI infrastructure management models that align with the Open Compute Project values of openness, modularity, and sustainability for heterogeneous computing environments.
Syllabus
Unified AIOps for Remote Management of Heterogeneous Open Source AI Systems
Taught by
Open Compute Project