Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[proxysql] Generate metric for all pairs of hostgroup/status (backend) #19438

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

frivoire
Copy link
Contributor

What does this PR do?

Ensure that the count metric is always generated, even when 0 backend exists in that status & hostgroup at the moment.

Motivation

Today, when a hostgroup has no online backend server for example, the metric with tag status=ONLINE+hostgroup=42 is not generated at all (no value for sometime, until a backend is online).
But having no online backend is an important piece of information to monitor a proxysql.
Example: a dashboard showing the nb of online backend servers per hostgroup is not giving a good observability if not showing the 0 value, because it has a different meaning than no value :(

NB: the list of hostgroups is dynamic (because it depends on config, thus different for each deployment), and the list of status is hardcoded (because fixed in proxysql, cf doc)

Tests

I've only tested the SQL query alone on a real proxysql:

Before:

ProxySQLadm> SELECT hostgroup, status, COUNT(*)
FROM stats_mysql_connection_pool GROUP BY hostgroup, status;
+-----------+--------------+----------+
| hostgroup | status       | COUNT(*) |
+-----------+--------------+----------+
| 10        | ONLINE       | 1        |
| 10        | SHUNNED      | 2        |
| 20        | ONLINE       | 3        |
| 30        | ONLINE       | 2        |
| 9999      | OFFLINE_HARD | 3        |
+-----------+--------------+----------+

Now:

ProxySQLadm> SELECT refpairs.hostgroup, refpairs.status, count(stats.srv_host) as count
             FROM (
                 SELECT DISTINCT(hostgroup), refstatuses.status
                 FROM stats_mysql_connection_pool
                 JOIN (
                     select 'ONLINE' as status union all select 'SHUNNED' as status union all
                     select 'OFFLINE_SOFT' as status union all select 'OFFLINE_HARD' as status
                 ) refstatuses
             ) refpairs
             LEFT JOIN stats_mysql_connection_pool stats ON stats.hostgroup=refpairs.hostgroup and stats.status=refpairs.status
             GROUP BY refpairs.hostgroup, refpairs.status;
+-----------+--------------+-------+
| hostgroup | status       | count |
+-----------+--------------+-------+
| 10        | OFFLINE_HARD | 0     |
| 10        | OFFLINE_SOFT | 0     |
| 10        | ONLINE       | 1     |
| 10        | SHUNNED      | 2     |
| 20        | OFFLINE_HARD | 0     |
| 20        | OFFLINE_SOFT | 0     |
| 20        | ONLINE       | 3     |
| 20        | SHUNNED      | 0     |
| 30        | OFFLINE_HARD | 0     |
| 30        | OFFLINE_SOFT | 0     |
| 30        | ONLINE       | 2     |
| 30        | SHUNNED      | 0     |
| 9999      | OFFLINE_HARD | 3     |
| 9999      | OFFLINE_SOFT | 0     |
| 9999      | ONLINE       | 0     |
| 9999      | SHUNNED      | 0     |
+-----------+--------------+-------+

=> same values than before (when not zero) ✅
=> and all other pairs of hostgroup+status are now also generated as 0 🆕

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Add the qa/skip-qa label if the PR doesn't need to be tested during QA.
  • If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

@frivoire
Copy link
Contributor Author

frivoire commented Jan 20, 2025

NB: I've also notified the Datadog support about this issue => request #1997802

@HadhemiDD
Copy link
Contributor

1 - can you add the changelog entry for this change:
ddev release changelog new

2 - can you fix the linter:
ddev test proxysql --fmt

Copy link
Contributor

@HadhemiDD HadhemiDD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change requested.

Copy link

codecov bot commented Jan 24, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.74%. Comparing base (6782bb7) to head (ac604ed).
Report is 28 commits behind head on master.

Additional details and impacted files
Flag Coverage Δ
activemq ?
cassandra ?
hive ?
hivemq ?
hudi ?
ignite ?
jboss_wildfly ?
kafka ?
presto ?
solr ?

Flags with carried forward coverage won't be shown. Click here to find out more.

@frivoire
Copy link
Contributor Author

frivoire commented Jan 29, 2025

Hello,

1 - can you add the changelog entry for this change: ddev release changelog new
2 - can you fix the linter: ddev test proxysql --fmt

I'm not a Datadog employee, I know nothing about your internal process here 😬
I tried (installing ddev: brew install ddev/ddev/ddev), and running it from inside this git repo's directory locally, but:

flo@macbook:~/git/_external/datadog-integrations-core$ ddev release changelog new
Error: unknown command "release" for "ddev"
Run 'ddev --help' for usage.

$ ddev --version
ddev version v1.24.2

So, can you handle this for us ?
(we are just customers trying to fix a bug in Datadog 😆)
Or at least, provide a doc explaining how to do what you're asking me 🙏

@HadhemiDD
Copy link
Contributor

The datadog agent is designed to only collect metrics when present, so a missing value do not translate into 0 but rather into nothing (no value, no tags) until it gets generated.
So this is not a bug but more of of a feature request.

We can make the change on your behalf(might take some time though), however, after having a second look at it and checking for other similar requests (none found) , we believe it would be better to hide this behaviour behind a config option for this integration that is disabled by default but can be enabled for customers who want it. The reason behind is to avoid generating the extra metrics for other customers and break the current behaviour.

As for ddev, it is not restricted to datadog internal process, for future references you can check these docs:
ddev
dev guide

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants