Sending metadata with metrics #118
Comments
I usually do that by defining the threshold as another metric.
A standardized way would help to make it easier to automatically process this with a monitoring system.
I put _low_warning, _low_alert, etc. I agree that having a default way to
define those makes sense.
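A minimal sketch of the threshold-as-metric pattern described above, in the Prometheus text exposition format. The metric name is hypothetical; the `_low_warning`/`_low_alert` suffixes are the ad-hoc convention mentioned here, not a standard:

```
# Current value and its thresholds, exposed side by side (illustrative only).
queue_free_slots 1800
queue_free_slots_low_warning 1000
queue_free_slots_low_alert 500
```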
On Wed, 14 Nov 2018, 21:47, Alois Reitbauer wrote:
> A standardized way would help to make it easier to automatically process this with a monitoring system.
Yes, a good set of naming conventions for alerting thresholds would be useful.
The threshold can be associated with a label value, not with the whole metric property. For example, in our alerting system you can define a threshold for os.cpu{}, but you can also subscope it to a group or a host. When you use multiple hardware profiles, you can't define the same threshold for all machines, because 80% doesn't mean the same thing on a 4c/8t host as on a 32c/64t one. Example:
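The example might look something like the following pseudo-notation, scoping a threshold from the whole metric down to a single host (the syntax and host name are illustrative, not the actual alerting-system config):

```
# Default threshold for every host.
os.cpu{} > 80
# Override for a larger hardware profile, where 80% is not yet critical.
os.cpu{host="node-32c64t-01"} > 95
```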
This is also why we don't pervert the data model for alerting: alerting is an abstraction above the store. Mixing the two mostly fits the server-monitoring use case only, while time series can also be used for business KPIs or anything else that can move from one host to another. A good example is managing canary tests and adjusting dynamic alerting thresholds according to the ratio.
@StevenLeRoux Yes, it's already a common pattern in Prometheus to have different thresholds per label. Typically users create these via recording rules in the monitoring system, because they are related to topology (prod, canary, team, zone); thus the target has no idea what the values should be. This is done to allow for simplified alerting rules. But there are other use cases, so having this data come from the monitored target may also be a good idea. For example, say you have devices with different temperature operating profiles. You might have some metric for the per-device temperature:
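A sketch of what such an exposition could look like, in the Prometheus text format (metric and label names are illustrative):

```
# HELP temperature_celsius Current temperature of each device.
# TYPE temperature_celsius gauge
temperature_celsius{device="sda"} 41
temperature_celsius{device="sdb"} 52
```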
Now, each device may be either an SSD or an HDD, and so have a different operating requirement:
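The per-device limit could then be exposed as a second metric carrying the same `device` label (again, names are illustrative):

```
# HELP temperature_limit_celsius Maximum safe operating temperature per device.
# TYPE temperature_limit_celsius gauge
temperature_limit_celsius{device="sda"} 60   # SSD
temperature_limit_celsius{device="sdb"} 55   # HDD
```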
This makes it easy to write the alert because the label lines up.
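Assuming hypothetical metrics `temperature_celsius` (the per-device reading) and `temperature_limit_celsius` (its per-device limit), a generic PromQL alert expression might be:

```
temperature_celsius > on (device) temperature_limit_celsius
```

Because both series carry the same `device` label, one expression covers every device regardless of its type.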
Are there any plans to include metadata - like what is a good or bad value - as part of the data? This would be very helpful for analytics tools interpreting the data streams.