Site Reliability Engineering (SRE) Fluency

via Udacity

Overview

In this course, you will learn the core components of Site Reliability Engineering. The course starts by introducing the Zero Trust security model, then covers Service Level Objectives (SLOs) and Service Level Indicators (SLIs), capacity management, on-call effectiveness, and incident management.

Syllabus

  • Zero Trust Security Concepts
    • This lesson reviews the core components required to implement a zero trust security system and shows how policy-based management systems allow us to "Never Trust, Always Verify".
  • An Introduction to SLOs and SLIs
    • In this lesson, we will learn how SREs monitor services using SLOs and SLIs. We will create queries in Prometheus and dashboards in Grafana.
  • Capacity Management: Managing System Capacity
    • System capacity is an essential part of ensuring reliability. This lesson discusses how to balance system capacity with costs to ensure that resources and money are not being wasted.
  • On-call Effectiveness and Incident Management Best Practices
    • A solid on-call rotation is essential to achieving peak reliability. This lesson discusses how to pair balanced on-call shifts with a clear incident management process that your team can follow.
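
To make the SLO and SLI lesson concrete, here is a minimal Python sketch of how an availability SLI and its remaining error budget might be computed. It is not taken from the course materials: the function name `evaluate_slo`, the `SloReport` type, and the sample request counts are all hypothetical, and it assumes a simple "good events divided by total events" SLI.

```python
from dataclasses import dataclass


@dataclass
class SloReport:
    sli: float                     # measured fraction of good events
    error_budget_remaining: float  # 1.0 = untouched, 0.0 = spent, < 0 = overspent
    slo_met: bool


def evaluate_slo(good_events: int, total_events: int, slo_target: float) -> SloReport:
    """Compare a ratio-based SLI against an SLO target (hypothetical helper)."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if not 0 <= good_events <= total_events:
        raise ValueError("good_events must be between 0 and total_events")

    # SLI: the fraction of events that were good.
    sli = good_events / total_events

    # Error budget: the number of bad events the SLO tolerates in this window.
    allowed_bad = (1 - slo_target) * total_events
    actual_bad = total_events - good_events
    remaining = 1 - actual_bad / allowed_bad

    return SloReport(sli=sli, error_budget_remaining=remaining, slo_met=sli >= slo_target)


if __name__ == "__main__":
    # Illustrative numbers only: 100,000 requests, 50 failures, 99.9% target.
    report = evaluate_slo(good_events=99_950, total_events=100_000, slo_target=0.999)
    print(report)
```

With these sample numbers the SLO tolerates about 100 failed requests, so 50 failures consume roughly half of the error budget. In practice the event counts would typically come from a monitoring system such as Prometheus rather than being hard-coded.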

Taught by

Richard Phung, Travis Scotto and Sonny Sevin

Reviews

Rated 5 at Udacity, based on 4 ratings
