[ResponseOps][Task Manager] Task Manager is unhealthy, the assumedRequiredThroughputPerMinutePerKibana (NaN) >= ... #204467

Closed
pmuellr opened this issue Dec 16, 2024 · 2 comments · Fixed by #207116
Labels: bug, Feature:Task Manager, Team:ResponseOps

pmuellr commented Dec 16, 2024

Noticed a new pattern in our log messages of the form `Task Manager is unhealthy, the (some stat) (value) ...`, where the value was `NaN`!

An example full message is

Task Manager is unhealthy, the assumedRequiredThroughputPerMinutePerKibana (NaN) >= capacityPerMinutePerKibana (1200)

Easily found in overview via the filter `log.logger: "plugins.taskManager" AND message: "NaN"`.
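
A side note on why `NaN` can end up in an "unhealthy" message even though `NaN >= 1200` is itself `false`: in JavaScript every ordered comparison involving `NaN` evaluates to `false`, so a check written as "healthy if required < capacity" falls through to the unhealthy branch. The sketch below only illustrates that behavior; `classify` and its parameter names are hypothetical and are not taken from the Task Manager source.

```typescript
// Illustrative only: hypothetical names, not the Task Manager source.
type Health = 'OK' | 'unhealthy';

function classify(requiredPerMinute: number, capacityPerMinute: number): Health {
  // Every ordered comparison involving NaN evaluates to false,
  // so a NaN input can never take the OK branch.
  if (requiredPerMinute < capacityPerMinute) {
    return 'OK';
  }
  return 'unhealthy';
}

console.log(classify(300, 1200)); // OK
console.log(classify(NaN, 1200)); // unhealthy
console.log(NaN >= 1200);         // false
```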

pmuellr added the bug, Feature:Task Manager, and Team:ResponseOps labels on Dec 16, 2024
elasticmachine commented

Pinging @elastic/response-ops (Team:ResponseOps)

cedricdv commented Jan 2, 2025

Seeing the same behavior after setting `xpack.task_manager.capacity` in `kibana.yml`. It was previously unset; I have now set it to `capacity: 50` and I see the following error:

Task Manager is unhealthy, the assumedRequiredThroughputPerMinutePerKibana (NaN) >= capacityPerMinutePerKibana (6000)
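
As a rough sanity check on the capacity figures in this thread, here is a small sketch. It assumes (this is an inference, not something stated in the thread) that `capacityPerMinutePerKibana` is the configured capacity multiplied by the number of polling cycles per minute; the helper name is hypothetical.

```typescript
// Assumption: capacity per minute = configured capacity * polling cycles per minute.
// The per-minute figures (6000, 1200) and capacity 50 are the ones reported in this thread;
// the formula itself is an inference.
function pollingCyclesPerMinute(capacityPerMinute: number, capacity: number): number {
  return capacityPerMinute / capacity;
}

// 6000 / 50 = 120 cycles per minute, i.e. one cycle every 500 ms.
const cycles = pollingCyclesPerMinute(6000, 50);

// Under the same factor, the 1200 in the original report would correspond to a capacity of 10.
const impliedCapacity = 1200 / cycles;

console.log(cycles, impliedCapacity); // 120 10
```

If that relationship holds, the capacity setting only changes the right-hand side of the comparison in the message; the `NaN` on the left-hand side comes from elsewhere.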

doakalexi self-assigned this on Jan 13, 2025
doakalexi added a commit that referenced this issue Feb 12, 2025
…returns NaN (#207116)

Resolves #204467

## Summary

`assumedRequiredThroughputPerMinutePerKibana` is `NaN` when
`capacityStats.runtime.value.load.p90` is `undefined`. This PR adds a
check that detects an undefined `load.p90`, throws an error, and
skips the capacity estimation.
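
A minimal sketch of the failure mode and the guard, under stated assumptions: the `RuntimeStats` shape, the function names, and the load-times-capacity formula are hypothetical simplifications, not the code in `capacity_estimation.ts`. Only `load.p90` and the log message text come from this PR description.

```typescript
// Hypothetical, simplified stats shape; only `load.p90` is taken from the PR description.
interface RuntimeStats {
  load: { p90?: number };
}

// Unchecked version: `undefined / 100` is NaN, and NaN propagates through the arithmetic.
// The formula is an illustrative assumption.
function uncheckedEstimate(stats: RuntimeStats, capacityPerMinute: number): number {
  const averageLoadPercentage = stats.load.p90 as number;
  return (averageLoadPercentage / 100) * capacityPerMinute;
}

// Guarded version: fail loudly instead of returning NaN.
function guardedEstimate(stats: RuntimeStats, capacityPerMinute: number): number {
  const averageLoadPercentage = stats.load.p90;
  if (averageLoadPercentage === undefined || Number.isNaN(averageLoadPercentage)) {
    throw new Error(
      `Task manager had an issue calculating capacity estimation. averageLoadPercentage: ${averageLoadPercentage}`
    );
  }
  return (averageLoadPercentage / 100) * capacityPerMinute;
}

// Caller: catch the error, log it, and skip the estimation for this cycle.
function estimateOrSkip(stats: RuntimeStats, capacityPerMinute: number): number | undefined {
  try {
    return guardedEstimate(stats, capacityPerMinute);
  } catch (e) {
    console.warn((e as Error).message);
    return undefined;
  }
}
```

With `{ load: {} }` as input, `uncheckedEstimate` returns `NaN`, while `estimateOrSkip` logs the message and returns `undefined` so that no health status is derived from a meaningless number.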


### Checklist

- [ ] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios

### To verify
I was not able to reproduce this locally without changing the code, so
here is how I tested it. I am definitely open to suggestions for a
better way to test this.

1. Update the code to set `capacityStats.runtime.value.load.p90:
undefined`. I set it
[here](https://github.com/elastic/kibana/blob/286c9e2ddb9f338b0981cc5145bb4179ef7657cb/x-pack/platform/plugins/shared/task_manager/server/monitoring/capacity_estimation.ts#L55),
but there are other places upstream where you could set it to
`undefined`.
2. Start Kibana
3. Verify that you see the following log message:
```
 Task manager had an issue calculating capacity estimation. averageLoadPercentage: undefined
```
kibanamachine pushed a commit to kibanamachine/kibana that referenced this issue Feb 12, 2025
…returns NaN (elastic#207116)

Resolves elastic#204467

(cherry picked from commit 8bff766)