Add documentation around desired balance #119902

Open: wants to merge 3 commits into main

@DiannaHohensee (Contributor) commented Jan 9, 2025

More documentation again. I'm trying to figure out how everything plugs together so I can hook in my metric collection.

Relates ES-10341

@DiannaHohensee added >non-issue :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) Team:Distributed Coordination Meta label for Distributed Coordination team labels Jan 9, 2025
@DiannaHohensee self-assigned this Jan 9, 2025
@elasticsearchmachine (Collaborator)

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

@nicktindall (Contributor) left a comment

LGTM with some minor comments (some are probably just preference, so it's up to you how to address them).

* @param lastConvergedIndex Identifies what input data the balancer computation round used to produce this {@link DesiredBalance}. See
* {@link DesiredBalanceInput#index()} for details. Each reroute request gets assigned a monotonically increasing
* sequence number, and the balancer, which runs async to reroute, uses the latest request's data to compute the
* desired balance.
Contributor

Nit: I think this should be "strictly increasing"; doesn't "monotonically increasing" allow values to repeat? Perhaps "sequence number" alone is enough, as (I think) it implies the same?

Contributor Author

Sounds good, applied. Simpler.
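The "strictly increasing" point can be made concrete with a minimal Java sketch. It shows how a reroute path might stamp each request with a sequence number, and how an asynchronous balancer might compute only on the newest input. It assumes only what the javadoc above says: the balancer uses the latest request's data and reports which index it converged on. LatestInputTracker, BalancerInput, submit and computeRound are hypothetical names, not Elasticsearch code.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical balancer input: a sequence number plus a placeholder for the data the balancer needs. */
record BalancerInput(long index, String clusterStateSnapshot) {}

/** Hypothetical sketch of "each reroute request gets a sequence number; the balancer uses the latest one". */
class LatestInputTracker {
    // incrementAndGet never returns the same value twice, so the indexes are strictly increasing.
    private final AtomicLong indexGenerator = new AtomicLong(-1);
    private final AtomicReference<BalancerInput> latest = new AtomicReference<>();

    /** Called for each reroute request: stamp the input with the next index and publish it. */
    BalancerInput submit(String snapshot) {
        BalancerInput input = new BalancerInput(indexGenerator.incrementAndGet(), snapshot);
        // Keep whichever input has the highest index, even if two submissions race.
        latest.accumulateAndGet(input, (cur, next) -> cur == null || next.index() > cur.index() ? next : cur);
        return input;
    }

    /** One asynchronous balancer round: compute on the newest input and report the index it used. */
    long computeRound() {
        BalancerInput input = latest.get();
        if (input == null) {
            return -1; // nothing submitted yet, so there is nothing to converge on
        }
        // The desired-balance computation on input.clusterStateSnapshot() would happen here.
        return input.index(); // plays the role of lastConvergedIndex
    }
}
```

Requests submitted while a round is running are not lost. The next round picks up whichever input is newest, and intermediate inputs are simply skipped.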

* produces a new ClusterState with the changes made by {@link DesiredBalanceReconciler#reconcile}. The {@link RerouteStrategy} provided
* to the callback calls into {@link #desiredBalanceReconciler} for the changes. The {@link #masterServiceTaskQueue} will apply the
* cluster state update.
*/
Contributor

This comment seems overly specific to me. Given it's an interface, I'd rather know what it does than how it does it.

I think it's the "to run ...." bit that I find jarring. If it's a good abstraction, only the what should matter, not the how. We can use our IDEs to find the implementation(s). It would also be less likely to go stale if we were less specific.

Contributor Author

I agree that a good abstraction would explain what and not how. The problem is that this area of the code is like spaghetti and difficult to follow. Right now the Allocator has a callback to the AllocationService, which has a callback to the Allocator, which produces a result for the AllocationService to feed back into the Allocator's MasterServiceTaskQueue. In my mind, the first step to improving the code is to document what's happening; later we can hopefully refactor it.

* Reconciliation ({@link DesiredBalanceReconciler#reconcile(DesiredBalance, RoutingAllocation)}) takes the {@link DesiredBalance}
* output of {@link DesiredBalanceComputer#compute} and identifies how shards need to be added, moved or removed to go from the current
* cluster shard allocation to the new desired allocation.
*/
private final DesiredBalanceReconciler desiredBalanceReconciler;
Contributor

Should this doc be on DesiredBalanceReconciler#reconcile?

Contributor Author

I've updated the PR with a new comment on DesiredBalanceReconciler#reconcile.

I'd like to explain here how desiredBalanceReconciler differs from reconciler. They have practically the same name right now, so I think clearly explaining how each one is used, and what it does in this file's context, makes the code more understandable.
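The sketch below roughly illustrates what "identifies how shards need to be added, moved or removed" means. It diffs a current shard-to-node map against a desired one. It is hypothetical and heavily simplified: one copy per shard, plain string ids, and no allocation deciders or throttling. ReconcileSketch, Plan and diff are invented names, not DesiredBalanceReconciler's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical, simplified reconciliation: compare where shards are with where they should be. */
class ReconcileSketch {
    /** The three kinds of change the javadoc mentions: add (allocate), move, remove. */
    record Plan(List<String> toAllocate, List<String> toMove, List<String> toRemove) {}

    static Plan diff(Map<String, String> current, Map<String, String> desired) {
        List<String> allocate = new ArrayList<>();
        List<String> move = new ArrayList<>();
        List<String> remove = new ArrayList<>();
        // Iterate in sorted order so the plan is deterministic.
        for (Map.Entry<String, String> e : new TreeMap<>(desired).entrySet()) {
            String shard = e.getKey();
            String currentNode = current.get(shard);
            if (currentNode == null) {
                allocate.add(shard + " -> " + e.getValue()); // desired but not yet assigned
            } else if (!currentNode.equals(e.getValue())) {
                move.add(shard + ": " + currentNode + " -> " + e.getValue()); // assigned to the wrong node
            }
        }
        for (String shard : new TreeMap<>(current).keySet()) {
            if (!desired.containsKey(shard)) {
                remove.add(shard + " from " + current.get(shard)); // assigned but no longer desired
            }
        }
        return new Plan(allocate, move, remove);
    }
}
```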

* Accepts listeners with an index value (see {@link #indexGenerator}) and runs them whenever a DesiredBalance computation completes with
* an equal or greater index value.
*/
private final PendingListenersQueue pendingListenersQueue;
Contributor

I think the javadoc on the pending listeners queue itself is enough? Otherwise we're duplicating it a bit (i.e. more to maintain).

Contributor Author

Oops, you're right. I rewrote it to just say that it tracks listeners and runs them after a computation completes.
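Here is a minimal Java sketch of the pattern the pending-listeners javadoc describes. Listeners are registered with an index and run once a computation completes with an equal or greater index. PendingListenersSketch, add, complete and size are hypothetical names. The real PendingListenersQueue may differ in ordering, threading and failure handling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch of an index-keyed listener queue, not the Elasticsearch class. */
class PendingListenersSketch {
    private record Pending(long index, Runnable listener) {}

    // Ordered by index so the listeners that become ready first are always at the head.
    private final PriorityQueue<Pending> pending =
        new PriorityQueue<>((a, b) -> Long.compare(a.index(), b.index()));

    /** Register a listener that waits for a computation whose index is >= {@code index}. */
    synchronized void add(long index, Runnable listener) {
        pending.add(new Pending(index, listener));
    }

    /** Called when a computation converges on {@code convergedIndex}; runs every listener it satisfies. */
    void complete(long convergedIndex) {
        List<Runnable> ready = new ArrayList<>();
        synchronized (this) {
            while (!pending.isEmpty() && pending.peek().index() <= convergedIndex) {
                ready.add(pending.poll().listener());
            }
        }
        // Run the listeners outside the lock so a slow one doesn't block threads registering new listeners.
        ready.forEach(Runnable::run);
    }

    synchronized int size() {
        return pending.size();
    }
}
```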

@DiannaHohensee (Contributor Author) left a comment

Updated

Labels
:Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) >non-issue Team:Distributed Coordination Meta label for Distributed Coordination team v9.0.0
3 participants