Portrait Service - AI-Driven PB-Scale Data Mining for Cost Optimization and Stability Enhancement
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Learn how Kuaishou uses AI-driven data mining to cut costs and improve stability across its massive Kubernetes infrastructure in this 30-minute conference talk. Discover how its Portrait Service processes over 10TB of data per day from 200,000+ machines and 10M+ Pods to support intelligent system management. Explore the stability side of the approach: AI-generated machine health scores feed into Kubernetes scheduling so that unhealthy nodes are evicted automatically, reducing pod creation delays from 20 to 0.1 cases per day and raising service availability from 90% to 99.99%. Examine the performance optimization side, which combines AI with microarchitecture data to analyze 10,000+ services with varying resource sensitivities and build detailed application profiles that guide compute, cache, and memory bandwidth allocation. The reported results are an average 20% increase in IPC (instructions per cycle) and a drop in cache miss rates from over 50% to 10% for cache-sensitive services. Gain insight into the roadmap for integrating AI Agent technology to automate anomaly detection and reduce manual operations by 80%.
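The idea of turning per-machine health scores into a scheduling decision can be illustrated with a minimal sketch. This is not Kuaishou's implementation: the `NodeHealth` type, the 0-100 score scale, the threshold, and the fleet-fraction cap are all assumptions made for illustration, and the step of actually cordoning or draining a node in Kubernetes is left out.

```python
from dataclasses import dataclass


@dataclass
class NodeHealth:
    """Hypothetical record: a node name and a health score (assumed 0-100, higher is healthier)."""
    name: str
    score: float


def select_nodes_to_cordon(nodes, threshold=60.0, max_fraction=0.05):
    """Pick the lowest-scoring nodes below `threshold`.

    The result is capped at `max_fraction` of the fleet (but at least one
    node when the fleet is non-empty), so a bad batch of scores cannot
    remove a large share of capacity at once. Both parameters are
    illustrative assumptions, not values from the talk.
    """
    if not nodes:
        return []
    # Worst nodes first, so the cap keeps the most unhealthy ones.
    unhealthy = sorted(
        (n for n in nodes if n.score < threshold),
        key=lambda n: n.score,
    )
    budget = max(1, int(len(nodes) * max_fraction))
    return [n.name for n in unhealthy[:budget]]


fleet = [
    NodeHealth("node-a", 95.0),
    NodeHealth("node-b", 40.0),
    NodeHealth("node-c", 55.0),
    NodeHealth("node-d", 80.0),
]
to_cordon = select_nodes_to_cordon(fleet, threshold=60.0, max_fraction=0.5)
```

The cap is a design choice worth noting: when scores come from a model, limiting how many nodes one pass can remove keeps a scoring error from becoming an outage.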
Syllabus
Portrait Service: AI-Driven PB-Scale Data Mining for Cost Optimization and... Yuji Liu & Zhiheng Sun
Taught by
CNCF [Cloud Native Computing Foundation]