The Cluster Monitoring Operator manages and updates the Prometheus-based monitoring stack deployed on top of OpenShift.
It contains the following components:
- Prometheus Operator
- Prometheus
- Alertmanager cluster for cluster and application level alerting
- kube-state-metrics
- node_exporter
Users can leverage the deployed Prometheus Operator to easily deploy new Prometheus setups for their own application monitoring.
The Prometheus instance (prometheus-k8s) is responsible for monitoring and alerting on cluster and OpenShift components. It should not be extended to monitor user applications.
Alertmanager is a cluster-global component for handling alerts generated by all Prometheus instances deployed in that cluster.
Metrics are collected from the following components:
- kube-state-metrics
- node_exporter
- Kubelets
- API server
- Prometheus (just prometheus-k8s for now)
- Alertmanager
Important: By default, the Prometheus Operator managed by the Cluster Monitoring Operator only looks for ServiceMonitor resources in namespaces that carry an openshift.io/cluster-monitoring label (with any value).
The Cluster Monitoring Operator has many builtin ServiceMonitor resources which enable discovering the metrics endpoints of a variety of well-known components.
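The "any value" rule means only the presence of the label key matters. A minimal sketch of that check follows; hasClusterMonitoringLabel and the sample namespaces are hypothetical names used for illustration, not part of the operator's code, and the real selection is done by the Prometheus Operator through a namespace label selector.

```go
package main

import "fmt"

// clusterMonitoringLabel is the label key named in the text above.
const clusterMonitoringLabel = "openshift.io/cluster-monitoring"

// hasClusterMonitoringLabel reports whether the label key is present.
// The value is ignored on purpose: any value, including an empty one, counts.
// Hypothetical helper for illustration only.
func hasClusterMonitoringLabel(labels map[string]string) bool {
	_, ok := labels[clusterMonitoringLabel]
	return ok
}

func main() {
	// Sample namespaces (assumed names and labels).
	namespaces := []struct {
		name   string
		labels map[string]string
	}{
		{"openshift-monitoring", map[string]string{clusterMonitoringLabel: "true"}},
		{"labelled-empty-value", map[string]string{clusterMonitoringLabel: ""}},
		{"my-app", map[string]string{"team": "payments"}},
	}

	for _, ns := range namespaces {
		fmt.Printf("%s: ServiceMonitors discovered = %t\n",
			ns.name, hasClusterMonitoringLabel(ns.labels))
	}
}
```

A namespace without the label, such as my-app in this sketch, is skipped, which is why ServiceMonitor resources created there are not picked up by default.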
To register a new builtin component, make the following changes:
- Add a new ServiceMonitor manifest file to assets/prometheus-k8s following the existing prometheus-k8s-service-monitor-$COMPONENT.yaml naming convention.
- Add a constant in pkg/manifests/manifests.go which points to the new manifest file.
- Add a new Factory method in pkg/manifests/manifests.go which loads the manifest using the new constant.
- Add a step to PrometheusTask in pkg/tasks/prometheus.go which creates the ServiceMonitor using the new Factory method.
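The four steps above can be sketched roughly as follows. This is a self-contained illustration under stated assumptions, not the operator's actual code: the component name "example", the constant exampleServiceMonitorPath, the method ExampleServiceMonitor, and the simplified ServiceMonitor, Factory, and PrometheusTask types are all hypothetical stand-ins. The real Factory and PrometheusTask live in pkg/manifests/manifests.go and pkg/tasks/prometheus.go and work with real manifest files and a Kubernetes client.

```go
package main

import (
	"errors"
	"fmt"
)

// ServiceMonitor is a simplified stand-in for the real ServiceMonitor type.
type ServiceMonitor struct {
	Name string
}

// Step 1: the manifest file name follows the
// prometheus-k8s-service-monitor-$COMPONENT.yaml convention.
func serviceMonitorManifestPath(component string) string {
	return "assets/prometheus-k8s/prometheus-k8s-service-monitor-" + component + ".yaml"
}

// Step 2: a constant pointing at the new manifest file.
// "example" is a hypothetical component name.
const exampleServiceMonitorPath = "assets/prometheus-k8s/prometheus-k8s-service-monitor-example.yaml"

// Factory is a stand-in that looks manifests up in an in-memory map
// instead of reading and decoding files.
type Factory struct {
	assets map[string]string
}

// Step 3: a Factory method that loads the manifest using the new constant.
func (f *Factory) ExampleServiceMonitor() (*ServiceMonitor, error) {
	if _, ok := f.assets[exampleServiceMonitorPath]; !ok {
		return nil, errors.New("manifest not found: " + exampleServiceMonitorPath)
	}
	return &ServiceMonitor{Name: "example"}, nil
}

// PrometheusTask is a stand-in; it records names instead of calling a cluster.
type PrometheusTask struct {
	factory *Factory
	created []string
}

// Step 4: a step in the task that creates the ServiceMonitor
// using the new Factory method.
func (t *PrometheusTask) Run() error {
	sm, err := t.factory.ExampleServiceMonitor()
	if err != nil {
		return fmt.Errorf("loading example ServiceMonitor failed: %w", err)
	}
	t.created = append(t.created, sm.Name)
	return nil
}

func main() {
	f := &Factory{assets: map[string]string{
		exampleServiceMonitorPath: "kind: ServiceMonitor",
	}}
	task := &PrometheusTask{factory: f}
	if err := task.Run(); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("created ServiceMonitors:", task.created)
}
```

Keeping the path in a single constant means the Factory method and any tests refer to the manifest in one place, so a renamed file only needs one change.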
To add a new builtin alerting rule:
- Add a new Prometheus rules file to rules/k8s.
Run make generate after you modify the files, and make sure to add the modified files to the commit.
- Monitor etcd
- Adapt Tectonic inherited alerts with OpenShift operational knowledge
Run e2e-tests with make e2e-test.
Clean up after e2e-tests with make e2e-clean.