
Added additional entries for troubleshooting unhealthy cluster #119914

Open
wants to merge 8 commits into base: 8.17

Conversation

thekofimensah

Reordered "Re-enable shard allocation" because not as common as other causes

Added additional causes of yellow statuses

Changed watermark command to include high and low watermark so users can make their cluster operate once again.
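For reference, a minimal sketch of the kind of settings updates the description points at, assuming the troubleshooting page uses the `_cluster/settings` API; the watermark percentages are illustrative placeholders, not the values proposed in this PR:

# Raise the low and high disk watermarks together so allocation can resume
# (illustrative values; pick thresholds that fit your disks)
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": "90%",
    "cluster.routing.allocation.disk.watermark.high": "95%"
  }
}

# Re-enable shard allocation by resetting the setting to its default ("all")
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": null
  }
}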

Reordered "Re-enable shard allocation" because not as common as other causes

Added additional causes of yellow statuses

Changed watermark commadn to include high and low watermark so users can make their cluster operate once again.
Contributor

Documentation preview:

@elasticsearchmachine added the v8.17.2, external-contributor, and needs:triage labels on Jan 10, 2025
@thekofimensah
Author

cc @georgewallace (can't personally add reviewers at the moment)

@georgewallace self-requested a review on January 10, 2025 03:44
@georgewallace added the >docs, v8.18.0, v9.0.0, and Team:Docs labels on Jan 10, 2025
@georgewallace self-assigned this on Jan 10, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-docs (Team:Docs)

@elasticsearchmachine removed the needs:triage label on Jan 10, 2025
@leemthompo added the auto-backport label on Jan 10, 2025
Contributor

@leemthompo left a comment


Drive-by copyedit with suggestions for concision and some formatting fixes.

Comment on lines +91 to +95
* If you manually restart a node, then it will temporarily cause an unhealthy cluster until the node has recovered.

* If you have a node that is overloaded or has stopped operating for any reason, then it will temporarily cause an unhealthy cluster. Nodes may disconnect because of prolonged garbage collection (GC) pauses, which can result from "out of memory" errors or high memory usage due to intensive search operations. See <<fix-cluster-status-jvm,Reduce JVM memory pressure>> for more JVM related issues.

* If nodes cannot reliably communicate due to networking issues, they may lose contact with one another. This can cause shards to become out of sync. You can often identify this issue by checking the logs for repeated messages about nodes leaving and rejoining the cluster.
Contributor


Suggested change
* If you manually restart a node, then it will temporarily cause an unhealthy cluster until the node has recovered.
* If you have a node that is overloaded or has stopped operating for any reason, then it will temporarily cause an unhealthy cluster. Nodes may disconnect because of prolonged garbage collection (GC) pauses, which can result from "out of memory" errors or high memory usage due to intensive search operations. See <<fix-cluster-status-jvm,Reduce JVM memory pressure>> for more JVM related issues.
* If nodes cannot reliably communicate due to networking issues, they may lose contact with one another. This can cause shards to become out of sync. You can often identify this issue by checking the logs for repeated messages about nodes leaving and rejoining the cluster.
* A manual node restart will cause a temporary unhealthy cluster state until the node recovers.
* Node overload or failure causes a temporary unhealthy cluster state. Prolonged garbage collection (GC) pauses, caused by out-of-memory errors or high memory usage during intensive searches, can trigger this state. See <<fix-cluster-status-jvm,Reduce JVM memory pressure>> for more JVM-related issues.
* Network issues can prevent reliable node communication, causing shards to become out of sync. Check the logs for repeated messages about nodes leaving and rejoining the cluster.

copyedit for concision
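As a side note for readers following the thread, a minimal sketch of how the symptoms described in the suggestion can be confirmed, using standard Elasticsearch endpoints (not part of this PR's changes):

# Show the cluster status (green/yellow/red) and the count of unassigned shards
GET _cluster/health

# Without a request body, explains an arbitrary unassigned shard
GET _cluster/allocation/explain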

Author


I like these edits but "Node overload or failure causes a temporary unhealthy cluster state" isn't clear to me.

How do you feel about

"When a node becomes overloaded or fails, it can temporarily disrupt the cluster’s health, leading to an unhealthy state."

@leemthompo

@shainaraskas
Contributor

run docs-build

Labels
auto-backport (Automatically create backport pull requests when merged), >docs (General docs changes), external-contributor (Pull request authored by a developer outside the Elasticsearch team), Team:Docs (Meta label for docs team), v8.17.2, v8.18.0, v9.0.0
5 participants