Explore how modeling-driven techniques and TLA+ can enhance postmortem analysis, uncover root causes, and improve system design in distributed database architectures.
Unraveling a complex Kubernetes incident: from DNS suspicions to kernel-level insights, culminating in a surprising three-line code fix. Learn debugging techniques and what the investigation revealed about unexpected system behaviors.
Strategies for effective incident response coordination, focusing on follower roles and organizational preconditions. Insights to improve communication and collaboration during software outages.
Explore sociotechnical engineering strategies for SREs to impact reliability beyond infrastructure, addressing team struggles, burnout, and priorities to enhance overall system performance.
Uncover the truth behind SLO adoption failures, learn to calculate and prove the value of SLOs, and understand key differences in measurement methods for better system reliability insights.
Explore the attributes that affect engineer confidence in handover communications for software operations, based on research and interviews. Highlights the importance of effective information transfer in various scenarios.
Explore the 1979 NORAD nuclear near-miss incident, its causes, and implications for modern distributed systems maintenance and operation. Learn from this historical event to improve current practices.
Explore the transition from incident management to incident analysis, highlighting the distinct skills required and the value of post-incident learning for driving meaningful organizational change.
Explore how SREs can align mental models with system reality using resilience stress testing and decision trees. Learn practical tools for documenting, visualizing, and improving complex software systems.
Insights into SRE management: priorities, decision-making, and career advancement for individual contributors (ICs). Learn to recognize effective leadership and navigate the SRE management landscape.
Discover how Google and Major League Hacking collaborate to create diverse SRE talent through the SRE Fellowship, offering underrepresented groups immersive training and career opportunities in site reliability engineering.
Learn effective alert triage through guided experience in real-world scenarios. Discover how "Alert Triage Hour of Power" fosters camaraderie and system understanding in production environments.
Explore how SREs in government navigated regulatory constraints to rapidly deploy critical software services during the COVID-19 pandemic, showcasing their adaptability and incident response skills.
Explore a CDN's metastable failure, design improvements, and unconventional recovery methods. Learn strategies to prevent and address similar incidents in distributed systems.
Insights into perception gaps between reliability practitioners and management based on The SRE Report, offering strategies to bridge differences and improve organizational alignment.