AlertManager not respecting repeat_interval timer setting on duplicate alert #3846

Open
geeteq opened this issue May 24, 2024 · 1 comment

geeteq commented May 24, 2024

What did you do?
I have the following configuration for duplicate alerts, and AlertManager is not respecting the repeat_interval.

These are my routes:

route:
  group_by: ['alertname','site','host_name']
  group_wait: 10m
  group_interval: 15m
  repeat_interval: 76h
  receiver: 'slack-only'

  routes:
    - receiver: 'my-nice-receiver'
      group_wait: 10m
      group_interval: 15m
      repeat_interval: 600h # 25 days no repeat this alert <-- this one is not working
      matchers:
        - open_ticket="true"

What did you expect to see?
Within a short time I see two notifications for an alert with the same alert ID hash:

First fires: 2:28 PM Alert ID: 03fb6064079b070a [AM1.3] (1)

Fires again 30 minutes later: 2:58 PM Alert ID: 03fb6064079b070a [AM1.3] (1)

I would expect AlertManager to respect repeat_interval: 600h for this alert, which is flapping but keeps the same alert ID hash. It does not.

What did you see instead? Under which circumstances?

Environment

  • System information:

    Linux 5.14.0-284.30.1.el9_2.x86_64 x86_64

  • Alertmanager version:

alertmanager, version 0.27.0 (branch: HEAD, revision: 0aa3c2a)
build user: root@22cd11f671e9
build date: 20240228-11:51:20
go version: go1.21.7
platform: linux/amd64
tags: netgo

  • Prometheus version:
    prometheus, version 2.51.1 (branch: HEAD, revision: 855b5ac4b80956874eb1790a04c92327f2f99e38)
    build user: root@d3785d7783f2
    build date: 20240328-09:27:30
    go version: go1.22.1
    platform: linux/amd64
    tags: netgo,builtinassets,stringlabels

  • Alertmanager configuration file:

global:
# scrape_timeout is set to the global default (10s).
  resolve_timeout: 10m
  slack_api_url: https://hooks.slack.com/services/aaaa/xxxx/zzzzz
    #http_config:
    #  proxy_url: 'http://proxy.ip:80/'

templates: ['alert.tmpl']

route:
  group_by: ['alertname','site','host_name'] 
  group_wait: 10m
  group_interval: 15m
  repeat_interval: 76h     
  receiver: 'slack-only'

  routes:
  - receiver: 'slack-and-jira-ticket'
    group_wait: 10m
    group_interval: 15m
    repeat_interval: 600h   
    matchers:
      - jira_ticket="true"


receivers:
  - name: 'slack-only'
    slack_configs:
    - channel: '#mycomm-warning-alert'
      color: '{{ template "slack.color" . }}'
#      title: '{{ template "slack.title" . }}'
#      text: '{{ template "slack.text" . }}'
      send_resolved: true
      api_url: https://slack.com/api/chat.postMessage
      http_config:
        proxy_url: 'http://proxy.ip:80/'
        authorization:
          credentials: foobar
      icon_emoji: ':prometheus:'
      title: "{{ range .Alerts }}{{ .Annotations.summary }}\n{{ end }}"
      text: "{{ range .Alerts }}{{ .Annotations.description }}\n{{ end }}\n[AM1.3] ({{ .Alerts.Firing | len }})\n"
      actions:
        - type: button
          text: 'Silence  :no_bell:'
            #url: '{{ template "__alert_silence_link" . }}'
          url: 'http://1.2.3.4:9093/#/alerts'
 
  - name: 'slack-and-jira-ticket'
    slack_configs:
    - channel: '#baremetal-sentry'
      color: '{{ template "slack.color" . }}'
#      title: '{{ template "slack.title" . }}'
#      text: '{{ template "slack.text" . }}'
      api_url: https://slack.com/api/chat.postMessage
      http_config:
        proxy_url: 'http://proxy.ip:80/'
        authorization:
          credentials: foobar
      icon_emoji: ':prometheus:'
      title: "{{ range .Alerts }}{{ .Annotations.summary }}\n{{ end }}"
      text: "{{ range .Alerts }}{{ .Annotations.description }}\nAlert ID: {{ .Fingerprint }}{{ end }}  ({{ .Alerts.Firing | len }})\n"
      actions:
        - type: button
          text: 'Silence  :no_bell:'
#          url: '{{ template "__alert_silence_link" . }}'
          url: 'http://1.2.3.4:9093/#/silences/new?filter={host_name%3D%22{{ (index .Alerts 0).Labels.host_name }}%22}'

    webhook_configs:
      - url: 'https://mycoolwehook/api/v1/webhooks/alarms'
      
      # JIRA
        send_resolved: false
        http_config:
          tls_config:
            insecure_skip_verify: true




inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
@grobinson-grafana
Contributor

Repeat interval suppresses notifications unless the alert state has changed. You mentioned that the alert fired, resolved, and then fired again 30 minutes later. Repeat interval does not apply here because the alert resolved at some point during that 30-minute period.
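
To illustrate the point above, here is a minimal Python sketch of the idea that repeat_interval only holds back a notification while the set of firing alerts is unchanged. It is a simplified model, not Alertmanager's actual code: the names `LastState`, `needs_update`, and `simulate` are hypothetical, the date and the moment the alert resolved (20 minutes after the first fire) are assumptions, and `send_resolved` is ignored.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import FrozenSet, List, Optional


@dataclass
class LastState:
    """Hypothetical record of the last processed state of an alert group."""
    firing: FrozenSet[str]   # fingerprints that were firing
    timestamp: datetime      # when that state was recorded


def needs_update(last: Optional[LastState], firing_now: FrozenSet[str],
                 now: datetime, repeat_interval: timedelta) -> bool:
    # Nothing recorded yet: act only if something is firing.
    if last is None:
        return bool(firing_now)
    # Something is firing that was not firing last time: state changed.
    if not firing_now <= last.firing:
        return True
    # Nothing fires now: state changed only if something fired before.
    if not firing_now:
        return bool(last.firing)
    # Same firing alerts as before: hold back until repeat_interval elapses.
    return now - last.timestamp > repeat_interval


def simulate() -> List[bool]:
    alert = "03fb6064079b070a"             # alert ID from the report
    repeat = timedelta(hours=600)
    t0 = datetime(2024, 1, 1, 14, 28)      # assumed date, 2:28 PM
    steps = [
        (t0, frozenset({alert})),                          # first fire
        (t0 + timedelta(minutes=15), frozenset({alert})),  # still firing
        (t0 + timedelta(minutes=20), frozenset()),         # resolved (assumed time)
        (t0 + timedelta(minutes=30), frozenset({alert})),  # fires again, 2:58 PM
    ]
    last: Optional[LastState] = None
    decisions = []
    for now, firing in steps:
        update = needs_update(last, firing, now, repeat)
        decisions.append(update)
        if update:
            # The state change is recorded; whether a resolved message is
            # actually delivered depends on send_resolved (not modelled).
            last = LastState(firing, now)
    return decisions


if __name__ == "__main__":
    print(simulate())
```

In this model the second step is suppressed by repeat_interval, but the fourth is not, because the recorded state in between no longer contains the firing alert. The 600h timer never comes into play for the re-fire.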
