Observability for EKS: Prometheus and Grafana Guide

Monitoring the health of applications running on Amazon Elastic Kubernetes Service (EKS) is vital for detecting and resolving issues quickly. A widely adopted open-source choice for this is the combination of Prometheus and Grafana.

What is Prometheus?

Prometheus is a monitoring system that collects and stores time-series data.

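Pull Model: Prometheus periodically sends HTTP requests to its targets (pods, nodes) and reads the metrics they expose, typically at a /metrics endpoint. This is called scraping.
Alertmanager: Receives the alerts that Prometheus fires when an alerting rule matches (e.g., CPU > 80%), then groups, deduplicates, and routes them to channels such as Slack or email.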
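
The scrape-and-alert flow described above can be sketched with two small files. This is a minimal illustration, not a production setup: the target addresses (localhost:9100 for a Node Exporter, localhost:9093 for Alertmanager), the file names, the alert name, and the 10-minute hold time are all assumptions chosen for the example.

```shell
# Minimal Prometheus configuration: scrape one target and forward alerts.
cat > prometheus.yml <<'EOF'
global:
  scrape_interval: 30s        # how often Prometheus pulls metrics from targets
  evaluation_interval: 30s    # how often alerting rules are evaluated

scrape_configs:
  - job_name: node
    static_configs:
      - targets: ["localhost:9100"]    # assumed Node Exporter address

rule_files:
  - cpu-alerts.yml

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["localhost:9093"]  # assumed Alertmanager address
EOF

# Alerting rule: fire when average CPU usage on a node stays above 80%.
# node_cpu_seconds_total is a Node Exporter metric (assumed to be scraped).
cat > cpu-alerts.yml <<'EOF'
groups:
  - name: node-cpu
    rules:
      - alert: NodeHighCpuUsage
        expr: '100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))) > 80'
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "CPU usage above 80% on {{ $labels.instance }}"
EOF

# If promtool is installed, both files can be validated before use:
# promtool check config prometheus.yml
# promtool check rules cpu-alerts.yml
```

Prometheus evaluates the rule and, once it has held for 10 minutes, sends the alert to Alertmanager. Where the notification ends up (Slack, email) is configured separately in Alertmanager.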

What is Grafana?

Grafana is a visualization tool that turns data from Prometheus (and other data sources) into charts and dashboards.

Installation on EKS: Kube-Prometheus-Stack

Installing and wiring these components together individually is tedious. The community-maintained kube-prometheus-stack Helm chart bundles the Prometheus Operator, Prometheus, Grafana, Alertmanager, Node Exporter, and kube-state-metrics in a single package.

Installation Steps

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install monitoring prometheus-community/kube-prometheus-stack -n monitoring --create-namespace

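This installation gives you:

Prometheus Operator: Manages Prometheus and Alertmanager instances through Kubernetes custom resources.
ServiceMonitor: A custom resource that declares which Services Prometheus should scrape. The operator turns it into scrape configuration, so you do not edit the Prometheus config by hand.
Ready Dashboards: Dozens of preconfigured Grafana dashboards showing cluster, node, and pod performance.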
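
As an illustration of how a ServiceMonitor selects a workload, the sketch below writes a manifest for a hypothetical application. The names my-app, http-metrics, and the default namespace are placeholders, not part of the chart. The release: monitoring label assumes the chart's default behavior of selecting only ServiceMonitors labeled with the Helm release name used above; check your own values if you changed that.

```shell
# Write a ServiceMonitor manifest for a hypothetical application.
cat > my-app-servicemonitor.yaml <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app                 # hypothetical name
  namespace: monitoring
  labels:
    release: monitoring        # assumed to match the chart's default selector
spec:
  namespaceSelector:
    matchNames:
      - default                # namespace where the application's Service lives
  selector:
    matchLabels:
      app: my-app              # label on the Service (not on the pod)
  endpoints:
    - port: http-metrics       # name of the Service port that serves metrics
      path: /metrics
      interval: 30s
EOF

# To apply it (requires cluster access):
# kubectl apply -f my-app-servicemonitor.yaml
```

Note that endpoints.port refers to the name of a port on the Service, so the Service must define a named port for this to match.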

Persistence

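By default, Prometheus data is kept in a volume tied to the pod's lifetime. If the pod is deleted or rescheduled, the data is lost. In a production environment, you should configure an Amazon EBS based Persistent Volume (PV) for Prometheus; on recent EKS versions this requires the Amazon EBS CSI driver.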
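
A minimal sketch of such a configuration for the kube-prometheus-stack chart follows. It assumes a StorageClass named gp3 backed by EBS already exists in the cluster; the file name, the 50Gi size, and the 15-day retention are illustrative values to adjust for your workload.

```shell
# Write a Helm values file that gives Prometheus a persistent EBS-backed volume.
cat > prometheus-storage-values.yaml <<'EOF'
prometheus:
  prometheusSpec:
    retention: 15d                   # illustrative retention period
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: gp3      # assumed EBS-backed StorageClass
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi          # illustrative size
EOF

# To apply it to the release installed earlier (requires cluster access):
# helm upgrade monitoring prometheus-community/kube-prometheus-stack \
#   -n monitoring -f prometheus-storage-values.yaml
```

Because EBS volumes are bound to a single Availability Zone, the Prometheus pod can only be rescheduled onto nodes in the same zone as its volume.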

Alternatively, you can use Amazon Managed Service for Prometheus (AMP) for long-term data storage. Prometheus in the cluster forwards its samples to an AMP workspace via remote write, while AWS handles storage, scaling, and availability of the backend.
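
One way to forward metrics to AMP from the kube-prometheus-stack chart is sketched below. Every identifier here is a placeholder: the account ID 111122223333, the role name amp-ingest-role, the workspace ID ws-EXAMPLE, and the us-east-1 region must be replaced with your own. The sketch also assumes an IAM role for service accounts is already set up with permission to write to the workspace.

```shell
# Write a Helm values file that adds an AMP remote-write target.
cat > amp-remote-write-values.yaml <<'EOF'
prometheus:
  serviceAccount:
    annotations:
      # Placeholder IAM role that is allowed to write to the AMP workspace
      eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/amp-ingest-role
  prometheusSpec:
    remoteWrite:
      # Placeholder workspace URL; copy the real one from your AMP workspace
      - url: https://aps-workspaces.us-east-1.amazonaws.com/workspaces/ws-EXAMPLE/api/v1/remote_write
        sigv4:
          region: us-east-1    # requests are signed with AWS SigV4
EOF

# To apply it to the release installed earlier (requires cluster access):
# helm upgrade monitoring prometheus-community/kube-prometheus-stack \
#   -n monitoring -f amp-remote-write-values.yaml
```

With this setup the in-cluster Prometheus still scrapes targets and evaluates rules, so it can run with short local retention while AMP holds the long-term data.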

Good visibility into your EKS clusters lets you detect and fix problems before customers notice them.