Active silences reminder to its creator #1639

Closed
TinLe opened this issue Nov 27, 2018 · 6 comments

@TinLe

TinLe commented Nov 27, 2018

Enhancement Request

What did you do?
Silences were created for a large block of alerts, e.g. silence all alerts for team Foo in datacenter Bar. This was meant to be temporary while DC Bar was being brought up. Unfortunately, the silence period was too long, and the DC was put into production sooner than expected.

Alerts were suppressed due to the silences.

What did you expect to see?
A reminder is sent to the creator (or a designated team/owner) of the silence once a week, so that the silence is not forgotten and is reviewed at least weekly.

What did you see instead? Under which circumstances?

No reminder was sent, and the silence was forgotten until important alerts were suppressed and missed.

Environment
RHEL7.4/CentOS 7.4

  • System information:

    Linux systemA 3.10.0-693.11.6.el7.x86_64 #1 SMP Thu Jan 4 01:06:37 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

  • Alertmanager version:

    Branch: HEAD
    BuildDate: 20180622-11:58:41
    BuildUser: root@bec9939eb862
    GoVersion: go1.10.3
    Revision: 462c969
    Version: 0.15.0

  • Prometheus version:

Version 2.4.2
Revision c305ffaa092e94e9d2dbbddf8226c4813b1190a0
Branch HEAD
BuildUser root@dcde2b74c858
BuildDate 20180921-07:22:29
GoVersion go1.10.3

  • Alertmanager configuration file:

  • Prometheus configuration file:

  • Logs:

@TinLe
Author

TinLe commented Nov 27, 2018

It would be good to offer a reminder option at the time a silence is created. Even better would be a reminder after a configurable interval (e.g. after/every 24 hours, after/every N days, etc.).

@simonpasquier
Member

I'm not sure that your use case is compelling enough to be implemented. In general, I'd recommend choosing the shortest possible silence period. Worst case, the alerts will fire again and you can decide consciously whether or not the silence needs to be extended. Another option is to filter alerts at the Prometheus level and turn them back on when you're rolling the new datacenter to production.
Also, silences are completely decoupled from the notifiers, so this would require significant changes to the configuration/code to achieve the goal.

@brian-brazil
Contributor

I think this sort of check is best done via an external script.

@TinLe
Author

TinLe commented Dec 2, 2018

This is similar to / related to #730.

@simonpasquier it's hard when there are multiple teams and a large, complicated project. I am not saying that excuses it, but mistakes happen and things get dropped. We will implement your suggestions, but knowing human nature, it's going to happen again. This was one of the solutions we came up with in our post-mortem analysis. It won't completely solve the problem, but it would help us catch a forgotten silence.

We would like to automate things as much as possible and not rely on humans remembering.

@stuartnelson3
Contributor

This change would require settling on the "right way" to notify the creator. If it's email, then we would have to validate that the creator is an email address, and force all users of alertmanager to have a mail server configured.

A simple solution might be to have amtool running as a cron check, and set the output to json (-o json). You can then send notifications to the creator based on the startsAt and endsAt times.
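
As a rough illustration of that cron idea, here is a minimal Python sketch. It assumes the JSON emitted by `amtool silence query -o json` is a list of objects with `id`, `createdBy`, `startsAt` and `endsAt` fields holding RFC 3339 timestamps; check this against the output of your own version. The function names, the 7-day threshold and the sample data are all made up for the example, and how the reminder is actually delivered (mail, chat, ticket) is left out.

```python
import json
import re
from datetime import datetime, timedelta, timezone

# Matches RFC 3339 timestamps; fractional seconds are captured loosely and dropped.
_RFC3339 = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.\d+)?(Z|[+-]\d{2}:\d{2})$"
)


def parse_rfc3339(ts):
    """Parse an RFC 3339 timestamp, ignoring any fractional seconds."""
    m = _RFC3339.match(ts)
    if m is None:
        raise ValueError("unrecognised timestamp: %r" % ts)
    base, offset = m.groups()
    if offset == "Z":
        offset = "+00:00"
    return datetime.fromisoformat(base + offset)


def stale_silences(silences, now, max_age=timedelta(days=7)):
    """Group currently active silences older than max_age by creator."""
    by_creator = {}
    for s in silences:
        starts = parse_rfc3339(s["startsAt"])
        ends = parse_rfc3339(s["endsAt"])
        if not (starts <= now < ends):
            continue  # pending or already expired: nothing to remind about
        if now - starts >= max_age:
            by_creator.setdefault(s.get("createdBy", "unknown"), []).append(s)
    return by_creator


# Hypothetical sample standing in for the amtool JSON output (assumed shape).
SAMPLE = json.loads("""
[
  {"id": "a1", "createdBy": "alice",
   "startsAt": "2018-11-27T10:00:00.123456789Z", "endsAt": "2019-01-31T00:00:00Z"},
  {"id": "b2", "createdBy": "bob",
   "startsAt": "2018-12-08T00:00:00Z", "endsAt": "2018-12-20T00:00:00Z"},
  {"id": "c3", "createdBy": "alice",
   "startsAt": "2018-11-01T00:00:00Z", "endsAt": "2018-11-15T00:00:00Z"}
]
""")

if __name__ == "__main__":
    now = datetime(2018, 12, 10, tzinfo=timezone.utc)
    for creator, items in stale_silences(SAMPLE, now).items():
        for s in items:
            print("remind %s: silence %s active since %s"
                  % (creator, s["id"], s["startsAt"]))
```

In a real cron job, `SAMPLE` would be replaced by the parsed amtool output and `now` by `datetime.now(timezone.utc)`. Since the creator field is free text, mapping it to a notification address is something each site would have to decide for itself.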

"Another option is to filter alerts at the Prometheus level and turn them back on when you're rolling the new datacenter to production."

We've done this at SoundCloud when bringing up new DCs via configuration management.

@tewing-riffyn

I wish this was implemented or there was some sort of trigger I could call when a silence is created. I want to get a message in a slack channel whenever a silence is created.
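
Without a built-in trigger, one workaround is to poll the silence list and report any IDs not seen before. The Python sketch below is only that, a sketch: it assumes each silence object carries `id`, `createdBy`, `comment`, `endsAt` and a `matchers` list with `name`, `value` and `isRegex` fields, and it only builds a JSON body with a `text` field of the kind a Slack incoming webhook accepts. Fetching the silences, persisting the set of seen IDs and posting the payload are left out, and all function names are hypothetical.

```python
import json


def new_silences(silences, seen_ids):
    """Return silences whose id is not in seen_ids, preserving input order."""
    return [s for s in silences if s["id"] not in seen_ids]


def format_matcher(matcher):
    """Render one matcher as name="value" or name=~"value" for regex matchers."""
    op = "=~" if matcher.get("isRegex") else "="
    return '{}{}"{}"'.format(matcher["name"], op, matcher["value"])


def slack_payload(silence):
    """Build a JSON body with a single text field describing the silence."""
    matchers = ", ".join(format_matcher(m) for m in silence.get("matchers", []))
    text = "New silence {} by {} until {}: {} ({})".format(
        silence["id"],
        silence.get("createdBy", "unknown"),
        silence.get("endsAt", "?"),
        matchers,
        silence.get("comment", ""),
    )
    return json.dumps({"text": text})
```

Each polling run would call `new_silences` with the IDs recorded on the previous run, send one payload per result, and then store the updated ID set.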
