Transfer total slots calculation to smesher service #6624

poszu · 2025-01-16T11:01:27Z

No description provided.

poszu · 2025-01-16T12:39:12Z

Moving this calculation to the smeshing service might not give the expected benefit. Currently, slots are calculated on the node-service like this:

go-spacemesh/miner/proposal_builder.go

Lines 555 to 575 in 837e7fa

    
           ballot, err := ballots.FirstInEpoch(pb.db, s.atx, s.epoch) 
        
           if err != nil && !errors.Is(err, sql.ErrNotFound) { 
        
           	return fmt.Errorf("get refballot %w", err) 
        
           } 
        
           if errors.Is(err, sql.ErrNotFound) { 
        
           	s.beacon = pb.shared.beacon 
        
           	s.eligibilities.slots = proposals.MustGetNumEligibleSlots( 
        
           		s.atxWeight, 
        
           		minweight.Select(lid.GetEpoch(), pb.cfg.minActiveSetWeight), 
        
           		pb.shared.active.weight, 
        
           		pb.cfg.layerSize, 
        
           		pb.cfg.layersPerEpoch, 
        
           	) 
        
           } else { 
        
           	if ballot.EpochData == nil { 
        
           		return fmt.Errorf("atx %d created invalid first ballot", s.atx) 
        
           	} 
        
           	s.ref = ballot.ID() 
        
           	s.beacon = ballot.EpochData.Beacon 
        
           	s.eligibilities.slots = ballot.EpochData.EligibilityCount 
        
           }

AFAIR, the calculation might return a different result depending on which node is asked or when it is asked (what view on atxSetWeight it has) . Because the smesher must be consistent about its slots throughout the epoch, it mustn't change its mind after publihing the first ballot.

Thus, to remove the API to calculate the slots I would need to add another one to obtain the first ballot. At this point, the entire effort loses the point because it's trading 1 API call for another...

noamnelke · 2025-01-20T09:56:03Z

I don't fully understand the dilemma here. The flow should go like this:

Right before the start of the epoch, the smesher should determine their own eligibility for the epoch that's about to start. There are two steps to this calculation:

Determine the number of eligibilities (requires the smesher knowing their own ATX weight and the total weight of all ATXs).
Determine which layer each eligibility is in.

Doing 1 requires access to state for the total weight, so can't be done be the smesher without node involvement. However, other than finding the total weight - the rest of the calculation is trivial. It's just dividing my weight by the total, multiplying by the desired number of ballots and rounding. That's it.

Doing 2 requires the private key, so cannot be done on the node.

I prefer to make most node endpoints cache-able and the return values being universally useful, so for that reason I prefer the node just saying the total weight and letting the smesher calculate the number of eligibilities. The only potential issue with this is that I think we publish the active set in the first ballot (do we?) and that has to be consistent with the total weight used for the eligibility calculation.

If that's not the case: don't change anything.

If it's the case: instead of only asking for the total weight, the smesher should ask for active set AND total weight. It can then use the total weight to calculate the number of eligibilities and keep the active set so it can be included in the first ballot.

Am I missing something?

poszu · 2025-01-21T08:29:50Z

I'm sorry I wasn't clear. The smesher's eligibility count is included in the first proposal in the EpochData:

go-spacemesh/common/types/ballot.go

Lines 242 to 249 in f31f02b

    
           // EpochData contains information that cannot be changed mid-epoch. 
        
           type EpochData struct { 
        
           	ActiveSetHash Hash32 
        
           	// the beacon value the smesher recorded for this epoch 
        
           	Beacon Beacon 
        
           	// total number of ballots the smesher is eligible in this epoch. 
        
           	EligibilityCount uint32 
        
           }

Once the first proposal is published, this count mustn't be changed (same for the beacon and active set hash). Normally, we look up the smesher's first ballot in the given epoch and pick the eligibility count from it, if it exists. I believe we should do the same in the node-split.

Imagine a situation when the user installs and runs the smeshing-service client on a new computer (the state is not preserved) after publishing its first ballot. Suppose it calculates the eligibility count from the active set weight and ATX weight (which also needs to be fetched from the node-service). In that case, it might get a different result because the active set might be different now (another node was asked or new ATXs were published).

I think that to make eligibility count calculation robust and do it on the smashing-service side, the flow would need to be something like this:

Ask the node-service for the first ballot. If found - take the eligibility count and beacon from the first ballot.
Else, ask the node-service for active set weight, the miner's own ATX weight and VRF nonce from its last ATX targeting this epoch (if not cached locally), and beacon.
Calculate the slots using weight and active set weight.
Calculate the proofs using the slots, beacon and VRF nonce (requires private key)

My claim is, that we cannot get rid of the call in the point in 1 and thus there is no real benefit of moving the calculation of the slots to the smeshing service side. In the current implementation, the logic of checking the first ballot is hidden inside the call from point 2.

noamnelke · 2025-01-21T15:03:12Z

Under normal circumstances, the smeshing service will NOT need to ask for the previous ballot. This is only relevant when bringing up a new smeshing service for an existing identity (rare).

I would put this logic on the smeshing service and only have it ask for a potential missing ballot if needed.

pigmej · 2025-02-03T08:43:52Z

bringing up a new smeshing service for an existing identity (rare).

It could not be that rare in cloud setups where someone forgets to keep local DB, etc. And sadly, there is a really high penalty for that mistake.

Can we leave it as post beta for some case optimization?

poszu self-assigned this Jan 16, 2025

poszu mentioned this issue Jan 16, 2025

Node Split beta spacemeshos/pm#338

Open

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Transfer total slots calculation to smesher service #6624

Transfer total slots calculation to smesher service #6624

poszu commented Jan 16, 2025

poszu commented Jan 16, 2025

noamnelke commented Jan 20, 2025

poszu commented Jan 21, 2025

noamnelke commented Jan 21, 2025

pigmej commented Feb 3, 2025

Transfer total slots calculation to smesher service #6624

Transfer total slots calculation to smesher service #6624

Comments

poszu commented Jan 16, 2025

poszu commented Jan 16, 2025

noamnelke commented Jan 20, 2025

poszu commented Jan 21, 2025

noamnelke commented Jan 21, 2025

pigmej commented Feb 3, 2025