
Sending metadata with metrics #118

Closed
AloisReitbauer opened this issue Nov 13, 2018 · 7 comments

@AloisReitbauer

Are there any plans to include metadata, like what is a good or bad value, as part of the data? This would be very helpful for analytics tools when interpreting the data streams.

@SuperQ (Member) commented Nov 13, 2018

I usually do that by defining the threshold as another metric.
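For concreteness, a minimal sketch of that pattern using the Python prometheus_client library; the metric names and the threshold value here are made up for illustration and are not from this thread:

from prometheus_client import Gauge, start_http_server

# The measured value and its "what is a bad value" metadata are two separate series.
queue_depth = Gauge('app_queue_depth', 'Current number of items in the work queue')
queue_depth_max = Gauge('app_queue_depth_warning_threshold',
                        'Queue depth above which the application is considered unhealthy')

queue_depth_max.set(1000)  # the threshold, exposed as its own metric
queue_depth.set(42)        # the live value, updated by the application at runtime

# Both series are served from the same /metrics endpoint; a real application
# would keep running here instead of exiting.
start_http_server(8000)

An alerting rule can then simply compare the two series, as in the disk-temperature example later in this thread.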

@AloisReitbauer (Author)

A standardized way would make it easier for a monitoring system to process this automatically.

@RichiH (Member) commented Nov 14, 2018 via email

@SuperQ (Member) commented Nov 15, 2018

Yes, a good set of naming conventions for alerting thresholds would be useful.

@StevenLeRoux

A threshold can be associated with a label value, not just with the metric as a whole.

For example, in our alerting system you can define a threshold for os.cpu{}, but you can also scope it down to a group or a host. When you use multiple hardware profiles, you can't define the same threshold for all machines, because 80% doesn't mean the same thing on a 4c/8t host as on a 32c/64t one.

Example:

os.cpu{} > 90%
os.cpu{profile=low} > 60%
os.cpu{host=1234} > 95%
temperature{} > 60°C
temperature{profile=GPU} > 70°C

This is also why we don't pervert the data model for alerting: alerting is an abstraction above the store. Mixing the two mostly fits the server-monitoring use case only, whereas time series can also carry business KPIs or anything else that can move from one host to another. A good example is managing canary tests and adjusting dynamic alerting thresholds according to the canary ratio.

@SuperQ (Member) commented Nov 15, 2018

@StevenLeRoux Yes, it's already a common pattern in Prometheus to have different thresholds per label.

Typically, users create these via recording rules in the monitoring system, because they are related to topology (prod, canary, team, zone). The target therefore has no idea what the values should be.

This is done to allow for simplified alerting rules.

But there are other use cases, so having this data come from the monitored target may also be a good idea. For example, say you have devices with different temperature operating profiles.

You might have a metric for the per-device temperature:

node_disk_temperature_celsius{device="/dev/sda"} 55.2
node_disk_temperature_celsius{device="/dev/sdb"} 57.2

Now, each device may be either an SSD or an HDD, with different operating requirements:

node_disk_temperature_critical_celsius{device="/dev/sda"} 50.0
node_disk_temperature_critical_celsius{device="/dev/sdb"} 60.0

This makes it easy to write the alert, because the labels line up:

- alert: DeviceTempTooHigh
  expr: node_disk_temperature_celsius > node_disk_temperature_critical_celsius
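A rough sketch of what the target side of this example could look like, assuming the Python prometheus_client library; read_disk_temperature() is a made-up placeholder, and the threshold values just mirror the numbers above:

import time
from prometheus_client import Gauge, start_http_server

temp = Gauge('node_disk_temperature_celsius', 'Current disk temperature in Celsius', ['device'])
temp_crit = Gauge('node_disk_temperature_critical_celsius', 'Critical disk temperature in Celsius', ['device'])

def read_disk_temperature(device):
    # Placeholder: a real exporter would read this from SMART data or a sensor.
    return 55.2

# Per-device thresholds, e.g. an SSD versus an HDD operating limit.
temp_crit.labels(device='/dev/sda').set(50.0)
temp_crit.labels(device='/dev/sdb').set(60.0)

start_http_server(9101)
while True:
    for device in ('/dev/sda', '/dev/sdb'):
        temp.labels(device=device).set(read_disk_temperature(device))
    time.sleep(15)

Because both series come from the same target with the same device label values, the DeviceTempTooHigh expression above matches them one-to-one.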

@RichiH (Member) commented Dec 1, 2020

RichiH closed this as completed Dec 1, 2020