
[WIP] feat: Prune stale staking data hard fork + onwards pt. 2 #4505

Draft
wants to merge 28 commits into base: dev

Conversation

adsorptionenthalpy
Contributor

@adsorptionenthalpy commented Sep 18, 2023

This pull request is a follow-up to #4068.

Stale staking data is defined as a delegation to a validator which has
no rewards, no amount, and no undelegations. This definition, however,
excludes the (inactive) validator's self-delegation, since it is expected
to be at index 0 and a validator cannot currently be deleted. The
process works as follows:

  1. At the last block of every epoch, including the hard fork epoch,
     obtain the list of validators. For each validator, load the
     ValidatorWrapper and iterate through its Delegations to find stale
     delegations. Drop such delegations, and build a map keyed by
     delegator address whose values are the validator address(es) the
     delegations were pruned from. (A minimal sketch of this step
     follows the list.)
  2. Using the map obtained in (1), clear the delegations by delegator in
     offchain data. The map is cached in the ProcessorResult for ease of
     access (it is shared between state_processor and node_newblock).
  3. Since the objective of this change is to save space following the
     hard fork, small undelegations are no longer allowed: an
     undelegation must leave the remaining delegated amount at either
     zero or at least 100 ONE. (See the second sketch after the list.)
  4. Lastly, since the removal results in different lengths between the
     delegations of validator snapshots and validator wrappers, the
     AddReward algorithm is now keyed by DelegatorAddress rather than
     the loop index; see core/state/statedb.go. (The third sketch after
     the list illustrates the lookup.)
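
A minimal sketch of step (1), using simplified stand-in types: Delegation, ValidatorWrapper, isStale and pruneStaleStakingData are illustrative names and shapes here, not the exact structures or signatures touched by this PR.

package main

import (
	"fmt"
	"math/big"
)

// Simplified stand-ins for the staking types; the real ValidatorWrapper
// and Delegation live in staking/types.
type Delegation struct {
	DelegatorAddress string
	Amount           *big.Int
	Reward           *big.Int
	Undelegations    []*big.Int
}

type ValidatorWrapper struct {
	ValidatorAddress string
	Delegations      []Delegation
}

// isStale reports whether a delegation has no amount, no reward and no
// pending undelegations.
func isStale(d Delegation) bool {
	return d.Amount.Sign() == 0 && d.Reward.Sign() == 0 && len(d.Undelegations) == 0
}

// pruneStaleStakingData drops stale delegations from each wrapper,
// always keeping index 0 (the validator's self-delegation), and returns
// a map of delegator address -> validators it was pruned from.
func pruneStaleStakingData(wrappers []*ValidatorWrapper) map[string][]string {
	prunedFrom := make(map[string][]string)
	for _, w := range wrappers {
		kept := make([]Delegation, 0, len(w.Delegations))
		kept = append(kept, w.Delegations[0]) // self-delegation is never removed
		for _, d := range w.Delegations[1:] {
			if isStale(d) {
				prunedFrom[d.DelegatorAddress] = append(prunedFrom[d.DelegatorAddress], w.ValidatorAddress)
				continue
			}
			kept = append(kept, d)
		}
		w.Delegations = kept
	}
	return prunedFrom
}

func main() {
	w := &ValidatorWrapper{
		ValidatorAddress: "one1validator",
		Delegations: []Delegation{
			{DelegatorAddress: "one1validator", Amount: big.NewInt(10000), Reward: big.NewInt(0)},
			{DelegatorAddress: "one1alice", Amount: big.NewInt(0), Reward: big.NewInt(0)}, // stale
			{DelegatorAddress: "one1bob", Amount: big.NewInt(0), Reward: big.NewInt(5)},   // kept: has a reward
		},
	}
	prunedFrom := pruneStaleStakingData([]*ValidatorWrapper{w})
	fmt.Println(len(w.Delegations), prunedFrom) // 2 map[one1alice:[one1validator]]
}

The returned map is what step (2) consumes: carried in the ProcessorResult, it tells the node which per-delegator offchain entries to clear.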
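
A sketch of the minimum-remainder rule from step (3); the 100 ONE floor comes from the description above, while the function name, the wei conversion and the error handling are assumptions for illustration.

package main

import (
	"errors"
	"fmt"
	"math/big"
)

var (
	// oneAsWei is 1 ONE expressed in wei (10^18); minimumRemainder is the
	// 100 ONE floor described above.
	oneAsWei          = new(big.Int).Exp(big.NewInt(10), big.NewInt(18), nil)
	minimumRemainder  = new(big.Int).Mul(big.NewInt(100), oneAsWei)
	errSmallRemainder = errors.New("undelegation must leave zero or at least 100 ONE delegated")
)

// checkUndelegation rejects an undelegation that would leave a non-zero
// remainder below the 100 ONE floor.
func checkUndelegation(delegated, undelegate *big.Int) error {
	if undelegate.Cmp(delegated) > 0 {
		return errors.New("undelegation exceeds delegated amount")
	}
	remainder := new(big.Int).Sub(delegated, undelegate)
	if remainder.Sign() != 0 && remainder.Cmp(minimumRemainder) < 0 {
		return errSmallRemainder
	}
	return nil
}

func main() {
	delegated := new(big.Int).Mul(big.NewInt(150), oneAsWei)
	fmt.Println(checkUndelegation(delegated, new(big.Int).Mul(big.NewInt(30), oneAsWei)))  // ok: 120 ONE remain
	fmt.Println(checkUndelegation(delegated, new(big.Int).Mul(big.NewInt(100), oneAsWei))) // rejected: 50 ONE remain
	fmt.Println(checkUndelegation(delegated, delegated))                                   // ok: full undelegation
}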
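
A sketch of the address-keyed reward credit from step (4); the actual change lives in core/state/statedb.go, so the types and signature below are simplified placeholders.

package main

import (
	"fmt"
	"math/big"
)

type Delegation struct {
	DelegatorAddress string
	Reward           *big.Int
}

type ValidatorWrapper struct {
	Delegations []Delegation
}

// addReward credits a reward to the delegation that matches the
// delegator's address. After pruning, indices in the current wrapper can
// differ from those in the epoch snapshot, so the snapshot's loop index
// can no longer be reused as a lookup key.
func addReward(w *ValidatorWrapper, delegator string, reward *big.Int) bool {
	for i := range w.Delegations {
		if w.Delegations[i].DelegatorAddress == delegator {
			w.Delegations[i].Reward.Add(w.Delegations[i].Reward, reward)
			return true
		}
	}
	return false // the delegation was pruned; the caller decides how to handle this
}

func main() {
	w := &ValidatorWrapper{Delegations: []Delegation{
		{DelegatorAddress: "one1validator", Reward: big.NewInt(0)},
		{DelegatorAddress: "one1bob", Reward: big.NewInt(0)},
	}}
	fmt.Println(addReward(w, "one1bob", big.NewInt(7)), w.Delegations[1].Reward) // true 7
}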

The impact of this change on block processing speed is small. I ran a
node with this change at epoch 881 on mainnet using the rclone database
and saw that pruneStaleStakingData took 18 ms to complete (9 ms when
each prune is not logged individually).

The PR has been integration-tested locally (the undelegation behaviour),
and unit tests for pruning stale data, the minimum remaining
undelegation, and AddReward have been added. goimports has been run.

Issue

Described above.

Test

Unit Test Coverage

for file in $(git diff --name-status main..clear-stale-staking-data | sed s/\^..//); do go test -cover ./$(dirname $file)/; done | sort -f | uniq -i

Before:

?   	github.com/harmony-one/harmony/consensus/engine	[no test files]
?   	github.com/harmony-one/harmony/hmy	[no test files]
?   	github.com/harmony-one/harmony/internal/chain	[no test files]
?   	github.com/harmony-one/harmony/internal/params	[no test files]
?   	github.com/harmony-one/harmony/test/chain/reward	[no test files]
ok  	github.com/harmony-one/harmony/core	(cached)	coverage: 31.2% of statements
ok  	github.com/harmony-one/harmony/core/state	(cached)	coverage: 71.9% of statements
ok  	github.com/harmony-one/harmony/core/vm	(cached)	coverage: 42.4% of statements
ok  	github.com/harmony-one/harmony/hmy/downloader	(cached)	coverage: 75.9% of statements
ok  	github.com/harmony-one/harmony/node/worker	(cached)	coverage: 27.1% of statements
ok  	github.com/harmony-one/harmony/staking	(cached)	coverage: 89.1% of statements
ok  	github.com/harmony-one/harmony/staking/types	(cached)	coverage: 62.9% of statements

After:

?   	github.com/harmony-one/harmony/consensus/engine	[no test files]
?   	github.com/harmony-one/harmony/hmy	[no test files]
?   	github.com/harmony-one/harmony/internal/params	[no test files]
?   	github.com/harmony-one/harmony/test/chain/reward	[no test files]
ok  	github.com/harmony-one/harmony/core	(cached)	coverage: 31.1% of statements
ok  	github.com/harmony-one/harmony/core/state	(cached)	coverage: 75.6% of statements
ok  	github.com/harmony-one/harmony/core/vm	(cached)	coverage: 42.4% of statements
ok  	github.com/harmony-one/harmony/hmy/downloader	(cached)	coverage: 76.0% of statements
ok  	github.com/harmony-one/harmony/internal/chain	(cached)	coverage: 2.8% of statements
ok  	github.com/harmony-one/harmony/node/worker	(cached)	coverage: 27.0% of statements
ok  	github.com/harmony-one/harmony/staking	(cached)	coverage: 89.1% of statements
ok  	github.com/harmony-one/harmony/staking/types	(cached)	coverage: 63.2% of statements

Per-line test coverage for files modified by this PR is available here

Test/Run Logs

{'blockNum': 23232511,
 'caller': '/home/user/go/src/github.com/harmony-one/harmony/internal/chain/engine.go:364',
 'elapsed time': 18,
 'epoch': 881,
 'level': 'info',
 'message': 'pruneStaleStakingData',
 'time': '2022-02-23T08:43:54.259281751Z'}

Full logs show that stale data was removed from 472 of the 676 validators.

Operational Checklist

  1. Does this PR introduce backward-incompatible changes to the on-disk data structure and/or the over-the-wire protocol? (If no, skip to question 8.)
    No.

  2. Describe the migration plan. For each flag epoch, describe what changes take place at the flag epoch, the anticipated interactions between upgraded/non-upgraded nodes, and any special operational considerations for the migration.

  3. Describe how the plan was tested.

  4. How much minimum baking period after the last flag epoch should we allow on Pangaea before promotion onto mainnet?

  5. What are the planned flag epoch numbers and their ETAs on Pangaea?

  6. What are the planned flag epoch numbers and their ETAs on mainnet?

    Note that this must be enough to cover baking period on Pangaea.

  7. What should node operators know about this planned change?

  8. Does this PR introduce backward-incompatible changes NOT related to on-disk data structure and/or over-the-wire protocol? (If no, continue to question 11.)
    No.

  9. Does the existing node.sh continue to work with this change?

  10. What should node operators know about this change?

  11. Does this PR introduce significant changes to the operational requirements of the node software, such as >20% increase in CPU, memory, and/or disk usage?
    No; the bulk pruning at the hard fork epoch adds only about 18 ms, and the continuous pruning at the end of each subsequent epoch is reasonably expected to take no more time than that.

@adsorptionenthalpy self-assigned this Sep 18, 2023
@adsorptionenthalpy changed the title feat: Prune stale staking data hard fork + onwards pt. 2 [WIP] feat: Prune stale staking data hard fork + onwards pt. 2 Sep 18, 2023
@adsorptionenthalpy added the WIP Work in progress don't merge yet! label Sep 18, 2023
@ONECasey requested review from ONECasey and Frozen September 19, 2023 21:29
@ONECasey requested a review from sophoah September 21, 2023 15:38