Learn about platform efficiency challenges in data center environments and explore AI-ML solutions for optimizing server infrastructure through this 19-minute conference talk from the Open Compute Project. Discover how low infrastructure utilization affects servers, racks, switches, and pods, with GPU utilization hovering around 60% and similar issues impacting CPUs and networks.

Examine the need for standardized metrics, real-time monitoring, and automated corrective systems to improve return on investment in data center operations. Explore context-aware resource management techniques and continuous security assessment methodologies that leverage AI-ML solutions to synthesize contextual metrics into actionable insights.

Review practical implementations, including a thermal management AI-ML model running on DC-SCM that achieves up to 50% cooling power savings, GPU energy prediction capabilities for ML workloads, and an LLM-based model designed for live security threat mitigation. Understand how the Open Compute Project is positioned to drive standardization efforts in post-deployment efficiency optimization and real-time platform adjustments.
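To make the "real-time monitoring plus automated corrective systems" idea concrete, here is a minimal, hypothetical sketch of a threshold-based utilization monitor: it averages recent utilization samples and flags a node whose rolling mean drops below the roughly 60% level the talk cites. The class name, threshold, and window size are illustrative assumptions, not details from the talk.

```python
from dataclasses import dataclass, field
from collections import deque
from statistics import mean

@dataclass
class UtilizationMonitor:
    """Rolling-window utilization monitor with a threshold-based alert.

    Hypothetical sketch: the talk describes real-time monitoring and
    automated corrective action in general, not this specific design.
    """
    threshold: float = 0.60            # flag nodes averaging below 60% busy
    window: int = 5                    # number of recent samples to average
    samples: deque = field(default_factory=deque)

    def record(self, utilization: float) -> None:
        """Append a new sample (0.0-1.0) and drop samples beyond the window."""
        self.samples.append(utilization)
        if len(self.samples) > self.window:
            self.samples.popleft()

    def needs_action(self) -> bool:
        """True once a full window's rolling mean falls below the threshold."""
        if len(self.samples) < self.window:
            return False               # not enough data to decide yet
        return mean(self.samples) < self.threshold

# Example: five samples averaging ~55% busy trip the alert.
monitor = UtilizationMonitor()
for u in [0.55, 0.58, 0.61, 0.52, 0.50]:
    monitor.record(u)
print(monitor.needs_action())  # rolling mean ~0.552 < 0.60, prints True
```

In a real deployment the alert would feed a corrective system (workload consolidation, rightsizing, power capping) rather than a print statement, and the metrics themselves would follow whatever standardized definitions an effort like OCP's converges on.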