Hands-on lab exploring observability tools for cloud-native applications. Install and use Prometheus, Grafana, and Jaeger to monitor Kubernetes clusters and instrument microservices for improved troubleshooting and diagnostics.
Explore diverse SRE implementations across companies in this interactive session. Share experiences and gain insights into varied approaches to Site Reliability Engineering in different organizational contexts.
Explore the challenges of distributed systems, including CAP theorem myths, network partitions, consensus algorithms, and human factors, with cat-illustrated examples.
Explore techniques for tracing and improving web service performance, including black-box and white-box tracing, distributed systems tracing, and various tools to measure and enhance service efficiency.
Explore how SRE techniques can be applied to on-premises software delivery, enhancing construction, packaging, and shipment processes for improved reliability and customer satisfaction.
Lightning talks on diverse SRE topics: incident response, distributed systems, resource management, automation, operational maturity, and monitoring. Insights from industry experts on improving reliability and efficiency.
Explore fault tree analysis techniques to enhance Apache Kafka cluster resilience, drawing insights from Lyft's experience in bulletproofing their deployments.
Explore correct approaches to calculating latency SLOs using histograms and mathematically accurate quantiles, improving service level objectives for more effective monitoring and performance management.
Unified framework for deploying erasure coding solutions in distributed storage systems, simplifying integration and optimizing repair performance.
Explores a novel distributed file system design optimized for non-volatile main memory and RDMA networks, achieving high performance for metadata and data access while maintaining byte addressability.
Explores programmable switches for rapid rerouting using TCP-induced signals to detect failures. Presents Blink, a data-driven system analyzing TCP flows at line rate to quickly recover connectivity in the data plane.
Innovative size-aware sharding technique for in-memory key-value stores, reducing tail latencies and improving throughput by avoiding head-of-line blocking and optimizing request distribution across cores.
Efficient alternative for large-scale ML tasks using SSD-resident matrices, enabling near in-memory performance on a single workstation. Outperforms distributed systems for complex algorithms like eigensolvers.
Explore Netflix's approach to chaos engineering, its impact on system resilience, and lessons learned in building a robust software stack through controlled failure experiments.
Explore eBPF for debugging Linux performance issues, focusing on a real-world case of CPU increase in Kafka after a Debian upgrade. Learn tools and techniques for operational engineers.