Overview

Modern cloud-native applications rarely crash outright. Instead, they fail in subtle ways such as latency spikes, partial errors, or noisy dependencies. This course helps you become productive with the open-source trio used across the industry: Prometheus for metrics and PromQL analysis, Grafana for dashboards and alerting, and OpenTelemetry for standard, vendor-neutral instrumentation. You will launch a small local stack, scrape metrics, and build a practical three-panel dashboard that tracks requests, errors, and latency. Then you will create alerts that fire on real problems rather than noise, and instrument a sample service with the OpenTelemetry SDK to produce traces that can be correlated with metrics. Along the way, you will learn key observability patterns such as pull versus push collection, label hygiene, histogram quantiles, and Collector pipelines. Learners should be familiar with basic Docker or Linux and with YAML/JSON, and be comfortable with web apps and HTTP; familiarity with Kubernetes is helpful. This course is designed for software engineers, SREs, and platform engineers who want hands-on experience setting up and using an open-source observability stack to diagnose real production issues. By the end, you will have working configurations, starter queries, and a clear path to production that covers exporters, data retention, SLOs, and burn rate alerts.

Syllabus

Foundations: Signals, Tools, and the Minimal Stack

Get familiar with the three primary observability signals (metrics, logs, and traces) and understand how Prometheus, Grafana, and OpenTelemetry relate to each. We trace the full data path end to end, clarifying pull versus push collection and the roles of exporters versus receivers. You will then set up a small local environment with Docker Compose that is reused throughout the course. By the end of the module, you will have a working lab in which every scrape target shows as healthy (green) and data flows end to end.

Prometheus + Grafana Essentials: PromQL and Dashboards

Learn the parts of PromQL you need day to day: rate(), sum by(), label filters, and histogram quantiles, while avoiding common pitfalls with counters and gauges. Then turn queries into meaningful signals by building a clear three-panel Grafana dashboard that shows requests per second (RPS), error ratio, and 95th percentile latency, each with appropriate units, legends, and variables. Export the dashboard as JSON and configure a noise-aware alert (error rate above 5% over 5 minutes) to practice choosing thresholds together with time windows. The emphasis is on practical panel organization and on queries that you can explain clearly.
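
To make the histogram quantile and the 5% error threshold more concrete, here is a minimal Python sketch of the arithmetic behind them. It is not course material or Prometheus code: the function names and sample numbers are illustrative assumptions, and the quantile estimate follows the general idea of linear interpolation inside cumulative buckets rather than reproducing every edge case of PromQL.

```python
import math


def error_ratio(errors_start, errors_end, total_start, total_end):
    """Fraction of failed requests between two readings of monotonic counters.

    Simplified: assumes no counter reset happened inside the window.
    """
    total = total_end - total_start
    if total <= 0:
        return 0.0
    return (errors_end - errors_start) / total


def histogram_quantile(q, buckets):
    """Estimate quantile q (0 < q <= 1) from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count), sorted by upper_bound,
    with math.inf as the last upper bound.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The quantile falls in the overflow bucket: the best we can
                # report is the highest finite upper bound.
                return prev_bound
            if count == prev_count:
                return bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical counter readings taken 5 minutes apart.
ratio = error_ratio(errors_start=12, errors_end=72,
                    total_start=10_000, total_end=11_000)
print("error ratio:", ratio, "alert fires:", ratio > 0.05)

# Hypothetical latency buckets in seconds (cumulative counts).
latency_buckets = [(0.1, 500), (0.25, 800), (0.5, 900), (1.0, 980), (math.inf, 1000)]
print("estimated p95 latency (s):", histogram_quantile(0.95, latency_buckets))
```

The estimate can only be as precise as the bucket boundaries allow, which is one reason bucket layout matters when you care about a specific latency percentile.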

OpenTelemetry in Practice: Traces, Collector Pipelines, and Correlation

Instrument the demo application with an OpenTelemetry (OTel) SDK, set meaningful resource attributes, and export data over the OpenTelemetry Protocol (OTLP) to a Collector pipeline that you configure (receivers → processors → exporters). You will visualize traces in Grafana/Tempo and learn how to navigate from a “hot” metric dashboard directly to the related spans using exemplars. Along the way, you will validate the health of the pipeline, add attributes and batching, and practice root-cause analysis on induced failures. The module concludes with next steps: label management, Service Level Objectives (SLOs) and burn rates, and retention/export strategies for production environments.
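
As a rough illustration of the burn rate idea mentioned among the next steps, the sketch below shows the basic arithmetic in Python. The function names, the 99.9% target, the sample error ratios, and the 14.4 threshold are all assumptions chosen for illustration, not values taken from the course.

```python
def burn_rate(error_ratio, slo_target):
    """How fast the error budget is being spent.

    A value of 1.0 means the budget would last exactly the SLO period;
    higher values mean it runs out proportionally sooner.
    """
    budget = 1.0 - slo_target  # allowed fraction of failed requests
    if budget <= 0:
        raise ValueError("slo_target must be below 1.0")
    return error_ratio / budget


def should_page(long_window_ratio, short_window_ratio, slo_target, threshold):
    """Multi-window check: page only if both windows burn faster than threshold.

    The short window confirms that the problem is still happening, which
    cuts down on alerts for incidents that have already recovered.
    """
    return (burn_rate(long_window_ratio, slo_target) >= threshold
            and burn_rate(short_window_ratio, slo_target) >= threshold)


# Hypothetical numbers: 99.9% availability SLO, 2% of requests failing.
print("burn rate:", burn_rate(error_ratio=0.02, slo_target=0.999))
print("page:", should_page(long_window_ratio=0.02, short_window_ratio=0.03,
                           slo_target=0.999, threshold=14.4))
```

In practice the two error ratios would come from queries over a long and a short time window, and the threshold would be chosen from how much of the error budget you are willing to spend before someone is paged.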