Learn how Netflix manages large-scale, multi-tenant Kubernetes clusters powering both streaming services and batch workloads in this 31-minute conference talk from CNCF. Discover Netflix's approach to building resilient and cost-efficient capacity management systems at cloud scale, including its federated cellular structure for resource sharing across teams. Explore how Netflix handles both latency-sensitive services and throughput-heavy batch jobs while organizing hardware into managed pools of nodes for various users. Understand its implementation of soft capacity reservations that monitor real-time demand with shared buffers to support traffic spikes, extended disruption budgets that use health signals to limit service impact, and automated scaling with predictive resizing to reduce costs. Gain insights into how Netflix fills unused capacity with preemptible, low-priority workloads to minimize waste, along with practical lessons about which strategies have succeeded and which have failed in its infrastructure management journey.