Explore Pinterest's journey in scaling observability tools, from metrics to log search and distributed tracing, as the company grew from startup to web-scale platform.
Explore Adaptive Paging, an innovative alert handler that uses tracing and heuristics to identify and notify the team closest to the problem, reducing alert fatigue in complex distributed systems.
Explore distributed tracing in real-time data streaming systems, focusing on challenges and solutions for trading platforms, including session tracking, data flow management, and storage optimization.
Explore principles and tools for safer production environments through automation, safe proxies, and audited break-glass, reducing human errors and insider threats in system operations.
Explore the limitations of machine learning in production engineering, debunking common misconceptions and discussing feasible applications for SREs.
Learn how Squarespace's team adopted SRE practices to transform their unreliable logging platform into a trusted system with 99.9% uptime, and pick up insights and strategies for improving service reliability.
Explore systems thinking for safety and cybersecurity, integrating approaches to manage emergent properties and control problems in complex systems.
Explore strategies for efficient systems data management, including sampling and aggregation techniques, to maintain crucial information while reducing data volume and costs.
Practical strategies for implementing Site Reliability Engineering principles in resource-constrained environments, focusing on gradual improvements and stress reduction for engineering teams.
Explore Stripe's approach to prioritizing technical infrastructure investments, balancing firefighting and innovation, and enabling long-term success through strategic decision-making and resource allocation.
Explore the journey of defining effective SLOs for data-intensive services, focusing on search engines. Learn about monitoring processes, consistency, and automated mitigation strategies for complex systems.
Explore Google's SRE training program, featuring hands-on exercises in a safe environment. Learn how SRE principles were applied to improve the curriculum, minimize toil, and enhance reliability through automation and monitoring.
Learn to quickly estimate system performance using base rates and napkin math, enabling informed decision-making in technical discussions and design processes without building systems first.
Learn to create a PID controller for autoscaling Kubernetes deployments, ensuring smooth scaling based on custom targets. Explore control theory principles and their application in SRE practices.
Transforming engineering culture: One SRE's journey from chaos to improved reliability, featuring practical tips on implementing SLIs, reducing incident response times, and fostering organizational change.