[CI] HealthNodeUpgradeIT testHealthNode {upgradedNodes=1} failing #118157

elasticsearchmachine · 2024-12-06T14:18:30Z

Build Scans:

Reproduction Line:

./gradlew ":qa:rolling-upgrade:v8.6.2#bwcTest" -Dtests.class="org.elasticsearch.upgrades.HealthNodeUpgradeIT" -Dtests.method="testHealthNode {upgradedNodes=1}" -Dtests.seed=C8A826B8E990C4F9 -Dtests.bwc=true -Dtests.locale=hu-Latn-HU -Dtests.timezone=EST -Druntime.java=23

Applicable branches:
8.x

Reproduces locally?:
N/A

Failure History:
See dashboard

Failure Message:

org.elasticsearch.client.ResponseException: method [GET], host [http://[::1]:44001], URI [_internal/_health], status line [HTTP/1.1 405 Method Not Allowed]
{"error":"Incorrect HTTP method for uri [_internal/_health] and method [GET], allowed: [POST]","status":405}

Issue Reasons:

[8.x] 12 consecutive failures in step 8.6.2_bwc
[8.x] 13 consecutive failures in step 8.5.3_bwc
[8.x] 25 failures in test testHealthNode {upgradedNodes=1} (2.6% fail rate in 961 executions)
[8.x] 12 failures in step 8.6.2_bwc (100.0% fail rate in 12 executions)
[8.x] 13 failures in step 8.5.3_bwc (100.0% fail rate in 13 executions)
[8.x] 12 failures in pipeline elasticsearch-periodic (100.0% fail rate in 12 executions)

Note:
This issue was created using new test triage automation. Please report issues or feedback to es-delivery.


elasticsearchmachine · 2024-12-06T14:18:54Z

Pinging @elastic/es-storage-engine (Team:StorageEngine)

elasticsearchmachine · 2024-12-06T14:31:03Z

Pinging @elastic/es-data-management (Team:Data Management)

elasticsearchmachine · 2024-12-09T22:19:00Z

This has been muted on branch 8.x

Mute Reasons:

[8.x] 13 consecutive failures in step 8.5.3_bwc
[8.x] 11 consecutive failures in step 8.6.2_bwc
[8.x] 24 failures in test testHealthNode {upgradedNodes=1} (2.6% fail rate in 934 executions)
[8.x] 13 failures in step 8.5.3_bwc (100.0% fail rate in 13 executions)
[8.x] 11 failures in step 8.6.2_bwc (100.0% fail rate in 11 executions)
[8.x] 12 failures in pipeline elasticsearch-periodic (100.0% fail rate in 12 executions)

Build Scans:


PeteGillinElastic · 2025-01-07T15:31:16Z

This (and #118158) complains that the GET /_internal/_health endpoint doesn't exist when running the BWC tests for 8.5.3 and 8.6.2. This is because we renamed that endpoint in 8.7.0. (It was very briefly at GET /_health but settled on GET /_health_report.)
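A minimal sketch of the version split described above. It assumes only the two endpoints named here matter (the brief `GET /_health` interim is ignored) and that the rename boundary is exactly 8.7.0. `parse_version` and `health_endpoint_for` are hypothetical helpers, not part of the Elasticsearch test framework:

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Parse '8.6.2' or '8.18.0-SNAPSHOT' into a comparable (major, minor, patch) tuple."""
    core = version.split("-", 1)[0]  # drop qualifiers such as -SNAPSHOT
    major, minor, patch = (int(part) for part in core.split("."))
    return major, minor, patch


# Assumption: the endpoint rename landed exactly in 8.7.0.
RENAME_VERSION = (8, 7, 0)


def health_endpoint_for(version: str) -> str:
    """Return the health API path a node of the given version is assumed to serve."""
    if parse_version(version) < RENAME_VERSION:
        return "_internal/_health"
    return "_health_report"


for v in ("8.5.3", "8.6.2", "8.18.0-SNAPSHOT"):
    print(v, "->", health_endpoint_for(v))
```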

PeteGillinElastic · 2025-01-07T16:36:40Z

For the record, this entire test was removed from main, but obviously still runs in these BWC cases.

PeteGillinElastic · 2025-01-07T16:42:05Z

This looks related to #106933 although that is a different exception. At any rate, my understanding is that no change made to the test code in main will affect the BWC test (which uses the test code from the historic tag).

My feeling is that we can just suppress these tests permanently (or for as long as we run the 8.x BWC tests). The _internal endpoint was never meant to be backwards compatible so this seems fine.

PeteGillinElastic · 2025-01-07T18:00:15Z

For the record again: These failures are all on 8.x. We don't do the 8.5.3 or 8.6.2 BWC tests on main.

PeteGillinElastic · 2025-01-07T18:46:06Z

Wait, I don't think this test is doing what I had assumed it was doing. When I run the test from 8.x, it seems to be taking the test code from the head of that branch. (The test class doesn't even exist at 8.6.2 or 8.5.3.) But if I get the test to do GET / then it reports that the node is version 8.18.0-SNAPSHOT.

PeteGillinElastic · 2025-01-07T19:51:10Z

Ah, okay. When I make the test do GET /_cluster/state?filter_path=nodes_features,nodes.*.version, I get this:

{nodes_features=[{features=[], node_id=-wPGWBI-TeOcB9mwggk1sw}, {features=[], node_id=05OafmF5QyCHuTqfkrChpw}, {features=[], node_id=44gnsY7HTsK4sOaHDGEMwg}], nodes={44gnsY7HTsK4sOaHDGEMwg={version=8.18.0}, -wPGWBI-TeOcB9mwggk1sw={version=8.6.2}, 05OafmF5QyCHuTqfkrChpw={version=8.6.2}}}
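To make the mixed-version state explicit, the `nodes` part of that response can be reduced to a count per version. This sketch assumes the response has already been parsed into a Python dict shaped like the output above; the node IDs and versions are copied from it, and `versions_in_cluster` is a hypothetical helper:

```python
from collections import Counter

# Node IDs and versions copied from the filter_path output above.
nodes = {
    "44gnsY7HTsK4sOaHDGEMwg": {"version": "8.18.0"},
    "-wPGWBI-TeOcB9mwggk1sw": {"version": "8.6.2"},
    "05OafmF5QyCHuTqfkrChpw": {"version": "8.6.2"},
}


def versions_in_cluster(nodes: dict) -> Counter:
    """Count how many nodes run each Elasticsearch version."""
    return Counter(info["version"] for info in nodes.values())


counts = versions_in_cluster(nodes)
is_mixed = len(counts) > 1  # more than one distinct version means a mixed cluster
print(dict(counts), "mixed" if is_mixed else "uniform")
```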

So this test is running a mixture of 8.6.2 and 8.18.0. I expect that this would not report having the health.supports_health_report_api feature (because that was enabled from 8.7.0 and the cluster feature is only considered present if it is present on all nodes) and so the test will try to GET _internal/_health. That would succeed if the request got served by an 8.6.2 node but not if it got served by an 8.18.0 node. So I think whether the test passes or fails depends which node that request goes to. I'm not even sure whether that's deterministic.
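The reasoning above can be modelled in a few lines. This is a simplified sketch, not the real cluster-features implementation. It assumes a feature counts as present only when every node reports it, that the 8.6.2 nodes serve only `_internal/_health`, and that the 8.18.0 node serves only `_health_report`. The node names and the helpers `cluster_has_feature`, `pick_endpoint` and `request_succeeds` are hypothetical:

```python
# Simplified model of the mixed 8.6.2 / 8.18.0 cluster described above.
FEATURE = "health.supports_health_report_api"

# Assumption: features each node reports (8.6.2 nodes predate the feature).
node_features = {
    "old-1": set(),
    "old-2": set(),
    "new-1": {FEATURE},
}
# Assumption: the health path each node actually serves.
node_endpoint = {
    "old-1": "_internal/_health",
    "old-2": "_internal/_health",
    "new-1": "_health_report",
}


def cluster_has_feature(feature: str) -> bool:
    """A cluster feature counts as present only if every node has it."""
    return all(feature in feats for feats in node_features.values())


def pick_endpoint() -> str:
    """The path the test would request, based on the cluster-wide feature check."""
    return "_health_report" if cluster_has_feature(FEATURE) else "_internal/_health"


def request_succeeds(node: str) -> bool:
    """Whether the chosen request works when it is served by the given node."""
    return node_endpoint[node] == pick_endpoint()


for node in node_features:
    print(node, request_succeeds(node))
```

Under these assumptions the feature check always selects `_internal/_health`, so the outcome depends only on which node serves the request: it succeeds on `old-1` and `old-2` and fails on `new-1`.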

PeteGillinElastic · 2025-01-07T19:53:51Z

I'm coming back to the view that we should suppress the test for the 8.5.3 and 8.6.2 upgrades. I don't think we can reliably do that test when some of the nodes in the cluster have _internal/_health and some of them have _health_report.

This excludes the `HealthNodeUpgradeIT` test for the rolling upgrade tests which use a cluster with a mix of either 8.5.3 or 8.6.2 nodes, which serve the health endpoint at `_internal/_health`, and 8.last nodes, which serve it at `_health_report`. There is no sensible and reliable way to test the endpoint in such clusters. Closes elastic#118157 Closes elastic#118158

* Skip HealthNodeUpgradeIT for some rolling upgrades

This skips part of the `HealthNodeUpgradeIT` test for the rolling upgrade tests which use a cluster with a mix of 8.5.x and 8.6.x nodes, which serve the health endpoint at `_internal/_health`, and 8.last nodes, which serve it at `_health_report`. There is no sensible and reliable way to test the endpoint in such clusters.

Closes #118157
Closes #118158
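The skip condition described in the commit message can be expressed as a small predicate. This is a hypothetical sketch, not the code from c124f1b. It assumes the test knows the old-cluster version and how many nodes have been upgraded, and that the problem only arises while the cluster is mixed (some but not all nodes upgraded) and the old version predates the 8.7.0 rename. `should_skip_health_check` is an illustrative name:

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Parse '8.6.2' (optionally with a -SNAPSHOT suffix) into a comparable tuple."""
    major, minor, patch = (int(p) for p in version.split("-", 1)[0].split("."))
    return major, minor, patch


def should_skip_health_check(old_version: str, upgraded_nodes: int, total_nodes: int) -> bool:
    """Skip when pre-8.7.0 old nodes and upgraded nodes serve different health paths."""
    old_uses_internal_path = parse_version(old_version) < (8, 7, 0)
    cluster_is_mixed = 0 < upgraded_nodes < total_nodes
    return old_uses_internal_path and cluster_is_mixed


print(should_skip_health_check("8.6.2", upgraded_nodes=1, total_nodes=3))  # True
print(should_skip_health_check("8.6.2", upgraded_nodes=0, total_nodes=3))  # False
print(should_skip_health_check("8.7.0", upgraded_nodes=1, total_nodes=3))  # False
```

Gating on both conditions keeps the check running for fully old and fully upgraded clusters, where every node serves the same path.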

PeteGillinElastic · 2025-01-08T15:07:39Z

Fixed by c124f1b.

elasticsearchmachine added :StorageEngine/Mapping, >test-failure, Team:StorageEngine and needs:risk labels Dec 6, 2024

kkrik-es assigned kkrik-es and unassigned kkrik-es Dec 6, 2024

kkrik-es added Team:Data Management and :Data Management/Health and removed Team:StorageEngine and :StorageEngine/Mapping labels Dec 6, 2024

elasticsearchmachine added a commit that referenced this issue Dec 9, 2024

Mute org.elasticsearch.upgrades.HealthNodeUpgradeIT testHealthNode {upgradedNodes=1} #118157

b10f28e

mattc58 assigned PeteGillinElastic Dec 12, 2024

dakrone added low-risk and removed needs:risk labels Dec 17, 2024

PeteGillinElastic mentioned this issue Jan 7, 2025

Skip HealthNodeUpgradeIT for some rolling upgrades #119698

Merged

PeteGillinElastic closed this as completed Jan 8, 2025
