Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transfer total slots calculation to smesher service #6624

Open
Tracked by #338
poszu opened this issue Jan 16, 2025 · 5 comments
Open
Tracked by #338

Transfer total slots calculation to smesher service #6624

poszu opened this issue Jan 16, 2025 · 5 comments
Assignees

Comments

@poszu
Copy link
Contributor

poszu commented Jan 16, 2025

No description provided.

@poszu poszu self-assigned this Jan 16, 2025
@poszu
Copy link
Contributor Author

poszu commented Jan 16, 2025

Moving this calculation to the smeshing service might not give the expected benefit. Currently, slots are calculated on the node-service like this:

ballot, err := ballots.FirstInEpoch(pb.db, s.atx, s.epoch)
if err != nil && !errors.Is(err, sql.ErrNotFound) {
return fmt.Errorf("get refballot %w", err)
}
if errors.Is(err, sql.ErrNotFound) {
s.beacon = pb.shared.beacon
s.eligibilities.slots = proposals.MustGetNumEligibleSlots(
s.atxWeight,
minweight.Select(lid.GetEpoch(), pb.cfg.minActiveSetWeight),
pb.shared.active.weight,
pb.cfg.layerSize,
pb.cfg.layersPerEpoch,
)
} else {
if ballot.EpochData == nil {
return fmt.Errorf("atx %d created invalid first ballot", s.atx)
}
s.ref = ballot.ID()
s.beacon = ballot.EpochData.Beacon
s.eligibilities.slots = ballot.EpochData.EligibilityCount
}

AFAIR, the calculation might return a different result depending on which node is asked or when it is asked (what view on atxSetWeight it has) . Because the smesher must be consistent about its slots throughout the epoch, it mustn't change its mind after publihing the first ballot.

Thus, to remove the API to calculate the slots I would need to add another one to obtain the first ballot. At this point, the entire effort loses the point because it's trading 1 API call for another...

@noamnelke
Copy link
Member

I don't fully understand the dilemma here. The flow should go like this:

Right before the start of the epoch, the smesher should determine their own eligibility for the epoch that's about to start. There are two steps to this calculation:

  1. Determine the number of eligibilities (requires the smesher knowing their own ATX weight and the total weight of all ATXs).
  2. Determine which layer each eligibility is in.

Doing 1 requires access to state for the total weight, so can't be done be the smesher without node involvement. However, other than finding the total weight - the rest of the calculation is trivial. It's just dividing my weight by the total, multiplying by the desired number of ballots and rounding. That's it.

Doing 2 requires the private key, so cannot be done on the node.

I prefer to make most node endpoints cache-able and the return values being universally useful, so for that reason I prefer the node just saying the total weight and letting the smesher calculate the number of eligibilities. The only potential issue with this is that I think we publish the active set in the first ballot (do we?) and that has to be consistent with the total weight used for the eligibility calculation.

If that's not the case: don't change anything.

If it's the case: instead of only asking for the total weight, the smesher should ask for active set AND total weight. It can then use the total weight to calculate the number of eligibilities and keep the active set so it can be included in the first ballot.

Am I missing something?

@poszu
Copy link
Contributor Author

poszu commented Jan 21, 2025

I'm sorry I wasn't clear. The smesher's eligibility count is included in the first proposal in the EpochData:

// EpochData contains information that cannot be changed mid-epoch.
type EpochData struct {
ActiveSetHash Hash32
// the beacon value the smesher recorded for this epoch
Beacon Beacon
// total number of ballots the smesher is eligible in this epoch.
EligibilityCount uint32
}

Once the first proposal is published, this count mustn't be changed (same for the beacon and active set hash). Normally, we look up the smesher's first ballot in the given epoch and pick the eligibility count from it, if it exists. I believe we should do the same in the node-split.

Imagine a situation when the user installs and runs the smeshing-service client on a new computer (the state is not preserved) after publishing its first ballot. Suppose it calculates the eligibility count from the active set weight and ATX weight (which also needs to be fetched from the node-service). In that case, it might get a different result because the active set might be different now (another node was asked or new ATXs were published).

I think that to make eligibility count calculation robust and do it on the smashing-service side, the flow would need to be something like this:

  1. Ask the node-service for the first ballot. If found - take the eligibility count and beacon from the first ballot.
  2. Else, ask the node-service for active set weight, the miner's own ATX weight and VRF nonce from its last ATX targeting this epoch (if not cached locally), and beacon.
  3. Calculate the slots using weight and active set weight.
  4. Calculate the proofs using the slots, beacon and VRF nonce (requires private key)

My claim is, that we cannot get rid of the call in the point in 1 and thus there is no real benefit of moving the calculation of the slots to the smeshing service side. In the current implementation, the logic of checking the first ballot is hidden inside the call from point 2.

@noamnelke
Copy link
Member

Under normal circumstances, the smeshing service will NOT need to ask for the previous ballot. This is only relevant when bringing up a new smeshing service for an existing identity (rare).

I would put this logic on the smeshing service and only have it ask for a potential missing ballot if needed.

@pigmej
Copy link
Member

pigmej commented Feb 3, 2025

bringing up a new smeshing service for an existing identity (rare).

It could not be that rare in cloud setups where someone forgets to keep local DB, etc. And sadly, there is a really high penalty for that mistake.

Can we leave it as post beta for some case optimization?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants