
Develop a Site Reliability Engineering (SRE) strategy

Microsoft via Microsoft Learn

Overview

  • Learn about SRE, an engineering discipline that helps you sustainably achieve the appropriate level of reliability in your systems, services, and products.

    In this module you will:

    • Gain a basic understanding of Site Reliability Engineering (SRE).
    • Learn how to get started with this valuable operations practice.
  • Learn how to manage site reliability.

    After completing this module, you'll be able to:

    • Describe how site reliability engineering (SRE) empowers software developers to own the ongoing daily operation of their applications in production.
    • Describe how Application Insights analyzes the performance of your web application and can warn you about potential problems.
    • List the processes that you can implement to monitor site reliability.
    • Build a "just culture" that balances safety and accountability.
  • Cloud Admin course from Dr. Majd Sakr at Carnegie Mellon University. Discover what cloud elasticity means and explore different ways to scale your cloud resources.

    In this module you will:

    • Describe common load patterns and how they drive the need to scale.
    • Enumerate the strategies and considerations in scaling cloud applications.
    • Discuss the advantages of auto-scaling and the mechanisms used to achieve it.
    • Describe the importance of load balancing in cloud applications and enumerate various methods to achieve it.
    • List the primary benefits of serverless computing and explain the concept of serverless functions.

    This content is provided in partnership with Dr. Majd Sakr and Carnegie Mellon University.

  • Carnegie Mellon University's Cloud Developer course. Learn how developers write programs that run on the cloud, including how to deploy applications, build in fault tolerance, balance load, scale, and deal with latency.

    In this module you will:

    • Evaluate different considerations when programming applications that run on clouds.
    • Evaluate different considerations when deploying applications on clouds.
    • Compare and contrast proactive and reactive measures for fault tolerance in cloud applications.
    • Describe the importance of load balancing in cloud applications and enumerate various methods to achieve it.
    • Enumerate the strategies and considerations in scaling cloud applications.
    • Motivate the case for minimizing tail latency and discuss the various strategies to reduce tail latency.
    • Describe the strategies to optimize total operational cost of using cloud services.

    This content is provided in partnership with Dr. Majd Sakr and Carnegie Mellon University.

  • Learn how to monitor your Azure VMs by using Azure Monitor to collect and analyze VM host and client metrics and logs.
    • Understand which monitoring data you need to collect from your VM.
    • Enable and view recommended alerts and diagnostics.
    • Use Azure Monitor to collect and analyze VM host metrics data.
    • Use Azure Monitor Agent to collect VM client performance metrics and event logs.

Syllabus

  • Introduction to Site Reliability Engineering (SRE)
    • Introduction to Site Reliability Engineering
    • What is SRE and why does it matter?
    • SRE in context
    • Key SRE principles and practices: virtuous cycles
    • Key SRE principles and practices: The human side of SRE
    • Getting started with SRE
    • Summary
  • Manage site reliability
    • Introduction
    • What is reliability engineering?
    • What is Application Insights?
    • Perform ongoing tuning to reduce meaningless alerts
    • Analyze alerts to establish a baseline
    • Blameless postmortems
    • Module assessment
    • Summary
  • Scale your cloud resources with elasticity
    • Introduction
    • Compute load patterns
    • Scaling compute resources
    • Automated scaling on the cloud
    • Load balancing
    • Serverless computing
    • Summary
  • Build applications on the cloud
    • Introduction
    • Programming the cloud
    • Deploy applications on the cloud
    • Build fault-tolerant cloud services
    • Load balancing
    • Scale resources
    • How to deal with tail latency
    • Economics for cloud applications
    • Summary
  • Monitor your Azure virtual machines with Azure Monitor
    • Introduction
    • Monitoring for Azure VMs
    • Monitor VM host data
    • Use Metrics Explorer to view detailed host metrics
    • Collect client performance counters by using VM insights
    • Collect VM client event logs
    • Summary
