Overview
Explore how Meta monitors Linux kernels across millions of production servers in this 22-minute talk from the Linux Plumbers Conference. Learn about the challenges of maintaining observability at hyperscale, including managing diverse software and hardware environments while running cutting-edge kernel versions that often reveal issues before other organizations encounter them. Discover Meta's approach to kernel event collection and data aggregation, which combines commonly available tools such as netcons, kdump, and drgn with internally developed solutions designed to operate at massive scale. Understand how engineers detect, debug, and prioritize kernel issues based on their spread and severity across the infrastructure. Examine how monitoring data feeds into Meta's kernel release process, and the trade-off between observability granularity and practical implementation. Gain insights into performance overhead, data volume management, and the technical challenges of maintaining comprehensive kernel observability across one of the world's largest infrastructures.
Syllabus
Scaling Kernel Production Monitoring @ Meta - Vlad Poenaru (Meta)
Taught by
Linux Plumbers Conference