Portrait Service - AI-Driven PB-Scale Data Mining for Cost Optimization and Stability Enhancement
CNCF [Cloud Native Computing Foundation] via YouTube
Overview
Learn how Kuaishou uses AI-driven data mining to cut costs and improve stability across its massive Kubernetes infrastructure in this 30-minute conference talk. Its Portrait Service processes over 10TB of data per day from 200,000+ machines and 10M+ pods to support intelligent system management. For stability, the service produces AI-generated machine health scores that integrate with Kubernetes scheduling to automatically evict unhealthy nodes, cutting pod creation delay incidents from 20 per day to 0.1 per day and raising service availability from 90% to 99.99%. For performance, it combines AI with microarchitecture data to analyze 10,000+ services with varying resource sensitivities, building detailed application profiles that guide compute, cache, and memory bandwidth allocation. The reported results are a 20% average increase in IPC and a drop in cache miss rate from over 50% to 10% for cache-sensitive services. The talk closes with the team's roadmap for integrating AI Agent technology to automate anomaly detection and reduce manual operations by 80%.
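To make the health-score idea concrete, here is a minimal Python sketch of how a per-machine score might be turned into an eviction decision. It is not Kuaishou's method: the talk describes AI-generated scores, while this sketch substitutes a fixed weighted sum. The signal names, weights, and threshold are all hypothetical, and the sketch stops at selecting node names rather than calling any Kubernetes API.

```python
from dataclasses import dataclass


@dataclass
class NodeSignals:
    """Hypothetical per-machine signals, each normalized to the range 0..1."""
    name: str
    disk_error_rate: float
    mem_ecc_rate: float
    pod_start_failure_rate: float


# Hypothetical weights; a real system would learn these from data.
WEIGHTS = {
    "disk_error_rate": 0.3,
    "mem_ecc_rate": 0.3,
    "pod_start_failure_rate": 0.4,
}


def health_score(node: NodeSignals) -> float:
    """Return a 0..100 score, where 100 means no observed problems."""
    penalty = sum(w * getattr(node, field) for field, w in WEIGHTS.items())
    return round(100 * (1 - penalty), 2)


def nodes_to_evict(nodes: list[NodeSignals], threshold: float = 60.0) -> list[str]:
    """List nodes whose score falls below the (assumed) eviction threshold."""
    return [n.name for n in nodes if health_score(n) < threshold]


fleet = [
    NodeSignals("node-a", 0.0, 0.0, 0.0),
    NodeSignals("node-b", 1.0, 0.5, 0.5),
]
print(nodes_to_evict(fleet))
```

In a real cluster, the names returned here would feed whatever mechanism the scheduler uses to keep new pods off unhealthy machines; that integration is outside the scope of this sketch.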
Syllabus
Portrait Service: AI-Driven PB-Scale Data Mining for Cost Optimization and... Yuji Liu & Zhiheng Sun
Taught by
CNCF [Cloud Native Computing Foundation]