kubernetes_state.node.count does not get the node labels from K8s #12570

Open
alexbowers opened this issue Jul 22, 2022 · 8 comments

Comments

@alexbowers

alexbowers commented Jul 22, 2022

Describe the results you received:
[Screenshot: CleanShot 2022-07-22 at 12 17 29]

Describe the results you expected:
I would expect kubernetes_state.node.count to have the labels that are passed from the node, so that I can get the number of nodes that are within each node-group for monitoring.

Additional information you deem important (e.g. issue happens only occasionally):
As you can see kubernetes_state.node.age (and others) have the node-group name and other information that I want to use.
[Screenshot: CleanShot 2022-07-22 at 12 17 41]

@clamoriniere
Contributor

Hi @alexbowers

The fact that kubernetes_state.node.count is not tagged with node labels is the current expected behaviour: this metric is an aggregation (a count) of nodes, so it cannot carry the labels of any specific node.
We currently aggregate the nodes by: "kubelet_version", "container_runtime_version", "kernel_version", "os_image"
You can see the current implementation here

kubernetes_state.node.age is not an aggregation, because we report the age of each node individually.
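
To illustrate why an aggregated count loses per-node labels, here is a minimal Python sketch. It is not the check's actual implementation: the node dictionaries, their values, and the `count_nodes` helper are hypothetical. Only the four grouping keys come from the comment above.

```python
from collections import Counter

# Grouping keys listed in the comment above.
AGGREGATION_KEYS = (
    "kubelet_version",
    "container_runtime_version",
    "kernel_version",
    "os_image",
)


def count_nodes(nodes, keys=AGGREGATION_KEYS):
    """Count nodes per unique combination of the given keys.

    Any attribute not listed in `keys` is dropped, which is why a
    per-node label cannot appear on the resulting count.
    """
    return Counter(tuple(node.get(k, "") for k in keys) for node in nodes)


# Hypothetical nodes: identical versions, different node groups.
base = {
    "kubelet_version": "v1.22",
    "container_runtime_version": "containerd://1.6",
    "kernel_version": "5.4",
    "os_image": "linux",
}
nodes = [
    {**base, "k8s-nodegroup": "staging"},
    {**base, "k8s-nodegroup": "production"},
]

# With the default keys, both nodes collapse into a single series.
counts = count_nodes(nodes)

# Adding a label to the grouping keys (what the issue asks for)
# would split the count per node group instead.
by_group = count_nodes(nodes, keys=("k8s-nodegroup",))
```

With the default keys the two nodes are indistinguishable, so the node-group label has nowhere to live on the aggregated series.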

Please let us know if we can help in another way.

Thanks and regards
Cedric

@alexbowers
Author

Could you consider adding a way to specify which node labels are carried onto the aggregate metrics, so that, for example, we can aggregate by environment?

As it stands, the aggregation isn't useful to us at all, because it combines our staging, QA, and production environments together and pollutes the actual data that we'd be looking for.

If there were a way for us to say "include only the env label in the aggregation", that would solve this problem for us.

@13013SwagR

Hey,
I had a similar issue and solved it by using the following query to get the node count per node group:
"sum:kubernetes_state.node.by_condition{kube_cluster_name:cluster-name,condition:ready,status:true} by {k8s-nodegroup}"
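
If the same count is needed for several clusters or grouping tags, the query string above can be generated rather than copied. This is a minimal sketch that only builds the string; the `node_count_query` helper is hypothetical, and whether a given tag is actually present on `kubernetes_state.node.by_condition` depends on your environment.

```python
def node_count_query(cluster_name, group_by_tag):
    """Build the workaround query: sum of ready nodes, grouped by one tag."""
    scope = ",".join([
        f"kube_cluster_name:{cluster_name}",
        "condition:ready",
        "status:true",
    ])
    # Doubled braces are literal braces in an f-string.
    return f"sum:kubernetes_state.node.by_condition{{{scope}}} by {{{group_by_tag}}}"


# Reproduces the query from the comment above.
query = node_count_query("cluster-name", "k8s-nodegroup")
print(query)
```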

@drmaciej

Great workaround, thanks @13013SwagR.

I just popped in to mention that this issue affected us as well. We have a number of monitors in which we aggregate kubernetes_state.node.count by aws_autoscaling_groupname, and the disappearance of this label was a fairly unwelcome surprise :(

@clamoriniere
Contributor

Hi @drmaciej

We now provide a set of "service checks" to represent the different "standard" Node conditions:

  • kubernetes_state.node.ready
  • kubernetes_state.node.out_of_disk
  • kubernetes_state.node.disk_pressure
  • kubernetes_state.node.network_unavailable
  • kubernetes_state.node.memory_pressure

See: https://docs.datadoghq.com/integrations/kubernetes_state_core/?tab=helm#service-checks

Because these service checks report the status of each node and are attached to the corresponding host, all the host tags can be used to "group by" in the monitor.
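
As a rough illustration of grouping a service check monitor by a host tag, the sketch below builds a monitor query string. It assumes the service check monitor query form `"check".over(scope).by(groups).last(n).count_by_status()` from Datadog's monitor API documentation, so please verify it against the current docs before relying on it. The `service_check_query` helper is hypothetical, and `k8s-nodegroup` is assumed to be available as a host tag in your environment.

```python
def service_check_query(check_name, group_by, scope="*", last=2):
    """Build a service check monitor query grouped by the given tags.

    Assumed format (verify against Datadog's monitor API docs):
    "check".over("scope").by("tag", ...).last(n).count_by_status()
    """
    groups = ",".join(f'"{tag}"' for tag in group_by)
    return (
        f'"{check_name}".over("{scope}")'
        f".by({groups}).last({last}).count_by_status()"
    )


# Node readiness, grouped by a host tag assumed to exist on the nodes.
query = service_check_query("kubernetes_state.node.ready", ["k8s-nodegroup"])
print(query)
```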

[Screenshot: 2023-01-16 at 19 22 28]

@drmaciej

Thanks @clamoriniere, that makes sense.

I actually do not see kubernetes_state.node.out_of_disk or kubernetes_state.node.network_unavailable in my environments (I do see the other 3). Are those expected to show up only when there is no disk space or the network is not available?

@Suhaidee

Suhaidee commented Jan 14, 2025

@drmaciej I am seeing the same thing and have been wondering about it as well.
@clamoriniere Could you please have a look and confirm?

@drmaciej

For what it's worth, I still do not have those two checks on my end, and the status subcommand shows no issues in the cluster agents. This was never a big priority for me, so I never followed up with Support.
