Explore how SREs can align mental models with system reality using resilience stress testing and decision trees. Learn practical tools for documenting, visualizing, and improving complex software systems.
Insights on platform engineering best practices, including product management, developer retention, trust-building, and reskilling ops staff, drawn from the experiences of diverse organizations over the past decade.
Explore the pitfalls of blindly adopting others' platform strategies. Learn to retain agency, identify needs, and avoid common mistakes in platform engineering for your unique business context.
Explore J.P. Morgan's transition to public cloud, focusing on SRE's role in overcoming regulatory, technical, and organizational challenges while ensuring stability and reliability.
Explore the evolution of Honeycomb's Kafka cluster and telemetry systems, covering scaling strategies, infrastructure choices, and best practices for handling 10x data volume growth.
Demystifying OpenTelemetry metrics: Learn about different metric instruments, their implementation, and how they can enhance your understanding of system performance and error rates.
Explore scaling Prometheus for massive metrics installations, covering field hint indices, query push down, GitOps deployment, and lessons learned from eBay's journey to planet-scale observability.
Discover how Spotify transforms incident reports into valuable insights for improving system operations and work processes, extracting meaningful data from seemingly mundane paperwork.
Explore building an open-source APM using OpenTelemetry, Prometheus, and Jaeger. Learn implementation strategies, risks, and upcoming improvements in the OTel community for cost-effective application performance monitoring.
Explore chaos experimentation as a test-driven development approach for distributed systems, enhancing reliability and validating changes throughout the software lifecycle.
Explore how SRE teams evolve as startups grow, focusing on organizational changes, advocacy strategies, and overcoming technical debt to support rapid scaling and maintain operational excellence.
Exploring SRE as a cultural force for change, examining its evolution and potential to drive reliability in organizations while drawing parallels to social movements and people-centric approaches.
Exploring Azure's implementation of RDMA for storage, addressing challenges in regional deployment and achieving significant performance improvements and CPU savings.
Optimizing ML training with SYNDICATE: A framework that minimizes communication bottlenecks and speeds up training for large-scale models through novel motif abstraction and joint optimization techniques.
Magma: An open system for building affordable wireless networks in underserved areas, leveraging Internet design patterns and SDN principles to reduce costs and complexity while maintaining key cellular features.