
Added additional entries for troubleshooting unhealthy cluster #119914

Open
wants to merge 8 commits into base: 8.17

Conversation

thekofimensah

Reordered "Re-enable shard allocation" because not as common as other causes

Added additional causes of yellow statuses

Changed watermark command to include high and low watermark so users can make their cluster operate once again.
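For reference, a minimal sketch of the kind of settings updates the description points at, assuming the troubleshooting page uses the `_cluster/settings` API; the watermark percentages are illustrative placeholders, not the values proposed in this PR:

# Raise the low and high disk watermarks together so allocation can resume
# (illustrative values; pick thresholds that fit your disks)
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": "90%",
    "cluster.routing.allocation.disk.watermark.high": "95%"
  }
}

# Re-enable shard allocation by resetting the setting to its default ("all")
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": null
  }
}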

Reordered "Re-enable shard allocation" because not as common as other causes

Added additional causes of yellow statuses

Changed watermark commadn to include high and low watermark so users can make their cluster operate once again.
Contributor

Documentation preview:

@elasticsearchmachine added the v8.17.2, external-contributor, and needs:triage labels on Jan 10, 2025
@thekofimensah
Author

cc @georgewallace (can't personally add reviewers at the moment)

@georgewallace self-requested a review on January 10, 2025 03:44
@georgewallace added the >docs, v8.18.0, v9.0.0, and Team:Docs labels on Jan 10, 2025
@georgewallace self-assigned this on Jan 10, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-docs (Team:Docs)

@elasticsearchmachine removed the needs:triage label on Jan 10, 2025
@leemthompo added the auto-backport label on Jan 10, 2025
Contributor

@leemthompo left a comment


Drive-by copyedit with suggestions for concision and some formatting fixes.

Comment on lines +91 to +95
* If you manually restart a node, then it will temporarily cause an unhealthy cluster until the node has recovered.

* If you have a node that is overloaded or has stopped operating for any reason, then it will temporarily cause an unhealthy cluster. Nodes may disconnect because of prolonged garbage collection (GC) pauses, which can result from "out of memory" errors or high memory usage due to intensive search operations. See <<fix-cluster-status-jvm,Reduce JVM memory pressure>> for more JVM related issues.

* If nodes cannot reliably communicate due to networking issues, they may lose contact with one another. This can cause shards to become out of sync. You can often identify this issue by checking the logs for repeated messages about nodes leaving and rejoining the cluster.
Contributor


Suggested change
* If you manually restart a node, then it will temporarily cause an unhealthy cluster until the node has recovered.
* If you have a node that is overloaded or has stopped operating for any reason, then it will temporarily cause an unhealthy cluster. Nodes may disconnect because of prolonged garbage collection (GC) pauses, which can result from "out of memory" errors or high memory usage due to intensive search operations. See <<fix-cluster-status-jvm,Reduce JVM memory pressure>> for more JVM related issues.
* If nodes cannot reliably communicate due to networking issues, they may lose contact with one another. This can cause shards to become out of sync. You can often identify this issue by checking the logs for repeated messages about nodes leaving and rejoining the cluster.
* A manual node restart will cause a temporary unhealthy cluster state until the node recovers.
* Node overload or failure causes a temporary unhealthy cluster state. Prolonged garbage collection (GC) pauses, caused by out-of-memory errors or high memory usage during intensive searches, can trigger this state. See <<fix-cluster-status-jvm,Reduce JVM memory pressure>> for more JVM-related issues.
* Network issues can prevent reliable node communication, causing shards to become out of sync. Check the logs for repeated messages about nodes leaving and rejoining the cluster.

copyedit for concision
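As a side note for readers following the thread, a minimal sketch of how the symptoms described in the suggestion can be confirmed, using standard Elasticsearch endpoints (not part of this PR's changes):

# Show the cluster status (green/yellow/red) and the count of unassigned shards
GET _cluster/health

# Without a request body, explains an arbitrary unassigned shard
GET _cluster/allocation/explain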

Author


I like these edits but "Node overload or failure causes a temporary unhealthy cluster state" isn't clear to me.

How do you feel about

"When a node becomes overloaded or fails, it can temporarily disrupt the cluster’s health, leading to an unhealthy state."

@leemthompo

@shainaraskas
Contributor

run docs-build

Labels
auto-backport (Automatically create backport pull requests when merged), >docs (General docs changes), external-contributor (Pull request authored by a developer outside the Elasticsearch team), Team:Docs (Meta label for docs team), v8.17.2, v8.18.0, v9.0.0
5 participants