Add alerting rules for operator (#526)
Amper committed Dec 20, 2023
1 parent b830cda commit a5d7fb0
Showing 3 changed files with 53 additions and 2 deletions.
48 changes: 48 additions & 0 deletions config/alerting/rules.yaml
@@ -0,0 +1,48 @@
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMRule
metadata:
  name: vm-operator-rules
spec:
  groups:
    - name: victoria-metrics-operator
      rules:
        - alert: LogErrors
          expr: sum(rate(operator_log_messages_total{level="error", job=~".*((victoria.*)|vm)-?operator"}[5m])) > 0
          for: 15m
          labels:
            severity: high
            show_at: dashboard
          annotations:
            description: "Operator has too many errors at logs: {{ $value}}, check operator logs"
            dashboard: "{{ $externalURL }}/d/1H179hunk/victoriametrics-operator?ds={{ $labels.dc }}&orgId=1&viewPanel=16"
            summary: "Too many errors at logs of operator: {{ $value}}"
        - alert: ReconcileErrors
          expr: sum(rate(controller_runtime_reconcile_errors_total{job=~".*((victoria.*)|vm)-?operator"}[5m])) > 0
          for: 10m
          labels:
            severity: high
            show_at: dashboard
          annotations:
            description: "Operator cannot parse response from k8s api server, possible bug: {{ $value }}, check operator logs"
            dashboard: "{{ $externalURL }}/d/1H179hunk/victoriametrics-operator?ds={{ $labels.dc }}&orgId=1&viewPanel=10"
            summary: "Too many errors at reconcile loop of operator: {{ $value}}"
        - alert: HighQueueDepth
          expr: (sum(workqueue_depth{job=~".*((victoria.*)|vm)-?operator"}) by (name)) > 10
          for: 15m
          labels:
            severity: high
            show_at: dashboard
          annotations:
            description: "Operator cannot handle reconciliation load for controller: `{{- $labels.name }}`, current depth: {{ $value }}"
            dashboard: "{{ $externalURL }}/d/1H179hunk/victoriametrics-operator?ds={{ $labels.dc }}&orgId=1&viewPanel=20"
            summary: "Too many `{{- $labels.name }}` in queue: {{ $value }}"
        - alert: BadObjects
          expr: (sum(operator_controller_bad_objects_count{job=~".*((victoria.*)|vm)-?operator"}) by (controller)) > 0
          for: 15m
          labels:
            severity: high
            show_at: dashboard
          annotations:
            description: "Operator got incorrect resources in controller {{ $labels.controller }}, check operator logs"
            dashboard: "{{ $externalURL }}/d/1H179hunk/victoriametrics-operator?ds={{ $labels.dc }}&orgId=1"
            summary: "Incorrect `{{ $labels.controller }}` resources in the cluster"
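
All four rules above select the operator's series with the same `job=~".*((victoria.*)|vm)-?operator"` matcher. The sketch below is one way to check which scrape job names that selector would pick up. It assumes Prometheus-style semantics, where a regex matcher must match the whole label value, and approximates that with Python's `re.fullmatch`. The helper `job_matches` and the sample job names are hypothetical and are not part of the commit.

```python
import re

# Regex copied verbatim from the `job=~"..."` matcher used by the rules.
JOB_SELECTOR = r".*((victoria.*)|vm)-?operator"


def job_matches(job: str) -> bool:
    """Return True if `job` would be selected by the rules' job matcher.

    Assumption: label regex matchers are fully anchored, so the whole
    label value has to match; re.fullmatch mimics that behavior here.
    """
    return re.fullmatch(JOB_SELECTOR, job) is not None


# Hypothetical job names, for illustration only.
SAMPLE_JOBS = [
    "vm-operator",
    "vmoperator",
    "victoria-metrics-operator",
    "monitoring/vm-operator",
    "vmagent",
    "vm-operator-metrics",
]

if __name__ == "__main__":
    for job in SAMPLE_JOBS:
        status = "selected" if job_matches(job) else "ignored"
        print(f"{job}: {status}")
```

If the operator is scraped under a job name that this check reports as ignored, none of the alerts would ever fire for it, so the selector or the scrape job name would need adjusting.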
1 change: 1 addition & 0 deletions docs/CHANGELOG.md
@@ -17,6 +17,7 @@ aliases:
## Next release

- [vmalertmanager](./api.html#vmalertmanagerconfig): fix `VMAlertmanagerConfig` discovery according to [the docs](https://docs.victoriametrics.com/operator/resources/vmalertmanager.html#using-vmalertmanagerconfig).
+- [vmoperator](./README.md): add alerting rules for operator itself. See [this issue](https://github.com/VictoriaMetrics/operator/issues/526) for details.

<a name="v0.39.4"></a>
## [v0.39.4](https://github.com/VictoriaMetrics/operator/releases/tag/v0.39.4) - 13 Dec 2023
6 changes: 4 additions & 2 deletions docs/monitoring.md
@@ -24,15 +24,17 @@ Official Grafana dashboard available for [vmoperator](https://grafana.com/grafan

Graphs on the dashboards contain useful hints - hover the `i` icon in the top left corner of each graph to read it.

-<!-- TODO: alerts for operator -->
+## Alerting rules
+
+Alerting rules for VictoriaMetrics operator are available [here](https://github.com/VictoriaMetrics/operator/blob/master/config/alerting/rules.yaml).

## Configuration

### Helm-chart victoria-metrics-k8s-stack

In [victoria-metrics-k8s-stack](https://github.com/VictoriaMetrics/helm-charts/blob/master/charts/victoria-metrics-k8s-stack/README.md) helm-chart operator self-scrapes metrics by default.

-This helm-chart also includes [official grafana dashboard for operator](#dashboard).
+This helm-chart also includes [official grafana dashboard for operator](#dashboard) and [official alerting rules for operator](#alerting-rules).

### Helm-chart victoria-metrics-operator

