Fix rooted accounts cleanup, simplify locking #12194

carllin · 2020-09-11T23:10:23Z

Problem

There were race conditions between accounts cleanup and accounts store.
Cleanup logic was spread across too many places, which made it complicated and hard to understand.

Summary of Changes

No more cleanup of old rooted storage entries in store(), store() now only purges storage entries from the same slot
Cleaning up rooted storage entries only occurs in clean() and do_shrink_slots(), via a single handle_reclaims() function
handle_reclaims() no longer blocks the store() pipeline, so it will always force a clean, removing the need for the dead slots counter
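
To illustrate the split described above, here is a minimal sketch, not the actual AccountsDB code. The `AccountInfo` alias, the free functions, and the simplified purge rule (drop every entry older than the newest rooted slot in the list) are assumptions made for illustration only.

```rust
use std::collections::HashSet;

type Slot = u64;
// Hypothetical stand-in for the real account info stored in the index.
type AccountInfo = u32;
type SlotList = Vec<(Slot, AccountInfo)>;

/// Store path (sketch): replace any entry for the same slot and
/// report only those same-slot entries as reclaims.
fn update(list: &mut SlotList, slot: Slot, info: AccountInfo, reclaims: &mut SlotList) {
    reclaims.extend(list.iter().filter(|(s, _)| *s == slot).cloned());
    list.retain(|(s, _)| *s != slot);
    list.push((slot, info));
}

/// Clean path (sketch): find the newest rooted slot present in the list
/// and reclaim every entry older than it. This is a simplified rule
/// assumed for illustration.
fn purge_older_root_entries(list: &mut SlotList, roots: &HashSet<Slot>, reclaims: &mut SlotList) {
    let max_root = list
        .iter()
        .map(|(s, _)| *s)
        .filter(|s| roots.contains(s))
        .max();
    if let Some(max_root) = max_root {
        reclaims.extend(list.iter().filter(|(s, _)| *s < max_root).cloned());
        list.retain(|(s, _)| *s >= max_root);
    }
}
```

In this sketch the store path never touches entries from other slots, so a list can hold stale rooted entries until the clean path runs.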

Fixes #

codecov · 2020-09-12T00:17:41Z

Codecov Report

Merging #12194 into master will decrease coverage by 0.0%.
The diff coverage is 72.4%.

@@            Coverage Diff            @@
##           master   #12194     +/-   ##
=========================================
- Coverage    82.0%    81.9%   -0.1%     
=========================================
  Files         354      354             
  Lines       82643    82754    +111     
=========================================
+ Hits        67788    67857     +69     
- Misses      14855    14897     +42

ryoqun · 2020-09-14T03:42:55Z

runtime/src/accounts_db.rs

+                if dead_slots.len() == 1 {
+                    assert!(dead_slots.contains(&expected_slot));
+                }


nits (I think this reads better):

if let Some(dead_slot) = dead_slots.first() { assert_eq!(dead_slot, expected_slot); }

unfortunately I don't think HashSet has a first() method 😢

.iter().next() tho?

or .values().first()? Anyway, this isn't important at all. :)
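
For reference, a small sketch of the `.iter().next()` variant discussed in this thread. The helper name `check_single_dead_slot` is hypothetical. `HashSet` iteration order is unspecified, so the sketch keeps the `len() == 1` guard: without it, the check would run against an arbitrary element whenever the set holds more than one slot.

```rust
use std::collections::HashSet;

type Slot = u64;

/// Hypothetical helper: if exactly one slot is dead, it must be the expected one.
fn check_single_dead_slot(dead_slots: &HashSet<Slot>, expected_slot: Slot) {
    if dead_slots.len() == 1 {
        // HashSet has no first(); iter().next() yields some element,
        // which is the only element when len() == 1.
        assert_eq!(dead_slots.iter().next(), Some(&expected_slot));
    }
}
```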

ryoqun · 2020-09-14T04:19:10Z

runtime/src/accounts_db.rs

@@ -476,7 +474,6 @@ impl Default for AccountsDB {
            min_num_stores: num_threads,
            bank_hashes: RwLock::new(bank_hashes),
            frozen_accounts: HashMap::new(),
-            dead_slots: RwLock::new(HashSet::new()),


ryoqun · 2020-09-14T05:50:33Z

runtime/src/accounts_index.rs

            reclaims.extend(list.iter().filter(|(f, _)| *f == slot).cloned());
            list.retain(|(f, _)| *f != slot);

            lock.0.fetch_add(1, Ordering::Relaxed);
            list.push((slot, account_info));
-            // now, do lazy clean
-            self.purge_older_root_entries(&mut list, reclaims);


I think this is generally a good direction, but let's surface the consequences for quick consensus.

Is it ok to accept (1) slightly bigger index size, (2) more visible spikes at each clean_accounts(), and (3) amortized longer lookup time for every account index read for frequently-updated accounts (because of larger list)?

If so, why were older_roots purged here to begin with? Just because there was no independent older_root cleaning mechanism in the early days?

Also, the index size will be more easily influenced by malicious tx patterns.

I think these concerns are non-issues for now and only hypothetical worries.

As a good by-product, with this relaxed attitude to the index size, we pave the way for a rather straightforward implementation of #11161. Or even snapshot commitments (now that the capitalization check (#11927) is enabled, there is some need for this as well).

@ryoqun, yeah I think 1, 2, and 3 are acceptable tradeoffs.

"If so, why were older_roots purged here to begin with? Just because there was no independent older_root cleaning mechanism in the early days?"

Yeah, I think it was just a byproduct of the initial implementation 😸

ryoqun

LGTM with nits!

Welcome to AccountsDB land with such a great improvement. :)

Thanks a lot for spotting and fixing the race condition...

Pull request has been modified.

ryoqun · 2020-09-15T00:22:09Z

@carllin Did you notice any performance difference? Like any noticeable store throughput increase with slightly more memory usage? :)

ryoqun

LGTM code-wise. Curious about the perf impact as well.

ryoqun · 2020-09-15T00:27:34Z

@carllin Did you notice any performance difference? Like any noticeable store throughput increase with slightly more memory usage? :)

Also, does this somewhat alleviate the get_program_accounts bottleneck? Not much, because accounts_index is still there?

ryoqun · 2020-09-15T00:30:56Z

Also, I'd like to ensure that this PR doesn't create bad (=unloadable) snapshots after running for a while against testnet/mainnet-beta.

carllin · 2020-09-15T00:51:29Z

@carllin Did you notice any performance difference? Like any noticeable store throughput increase with slightly more memory usage? :)

@ryoqun When I was running against a small 5 node cluster, it didn't seem to affect things too much

Also, does this somewhat alleviate the get_program_accounts bottleneck? Not much, because accounts_index is still there?

Yeah I suspect that bottleneck won't improve until I also fix the accounts index

Also, I'd like to ensure that this PR doesn't create bad (=unloadable) snapshots after running for a while against testnet/mainnet-beta.

Yeah should I upgrade one of our testnet/mainnet nodes with this fix first?

ryoqun · 2020-09-15T01:04:27Z

@carllin

Yeah should I upgrade one of our testnet/mainnet nodes with this fix first?

Or you can just run a validator ad hoc on a GCE instance, connecting to testnet/mainnet.

In both cases, you'll need to ssh in and run while (cp $(ls -tr snapshot*tar.zst | tail -n 1) ./path/to/test/dir && solana-ledger-tool verify --ledger ./path/to/test/dir); do sleep 60; done or similar.

I think a few hours is enough.

And cross your fingers. :)

sakridge

looks good to me as well

carllin · 2020-09-15T02:47:40Z

@ryoqun I've set up a validator on MB with these changes and the snapshot validation running at 34.83.141.238. Fingers very much crossed 🤞

Pull request has been modified.

…date()

Co-authored-by: Carl Lin <[email protected]>

ryoqun · 2020-10-16T03:04:43Z

FYI: @carllin this is one of the testnet validators that updated to v1.4. Nothing odd; it just looks like set_root got a lot faster and unref got slower.

carllin · 2020-10-16T03:15:15Z

@ryoqun oh interesting! The unref is probably slower because there are a lot more dead pubkeys to clean in clean_accounts here because we aggregate the clean?

for (slot, pubkey) in slot_pubkeys {
    if let Some(ref mut purged_account_slots) = purged_account_slots {
        purged_account_slots.entry(pubkey).or_default().insert(slot);
    }
    index.unref_from_storage(&pubkey);
}
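
As a rough illustration of the grouping step in the loop above, here is a self-contained sketch. The `Pubkey` alias and the function name `group_purged_slots` are hypothetical stand-ins, and the `unref_from_storage` call is left out because it depends on the real index.

```rust
use std::collections::{HashMap, HashSet};

type Slot = u64;
// Hypothetical stand-in for the real Pubkey type.
type Pubkey = [u8; 32];

/// Group (slot, pubkey) pairs by pubkey, collecting the set of slots
/// each pubkey was purged from. Duplicate pairs collapse in the set.
fn group_purged_slots(slot_pubkeys: &[(Slot, Pubkey)]) -> HashMap<Pubkey, HashSet<Slot>> {
    let mut purged: HashMap<Pubkey, HashSet<Slot>> = HashMap::new();
    for (slot, pubkey) in slot_pubkeys {
        // or_default() creates an empty HashSet on first sight of a pubkey.
        purged.entry(*pubkey).or_default().insert(*slot);
    }
    purged
}
```

The loop does one map lookup per (slot, pubkey) pair, so its cost grows with the number of pairs handled in a single clean pass. That would fit the guess above that aggregating the clean makes this step slower.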

ryoqun · 2020-10-16T03:23:51Z

yeah, might be true. :)

carllin requested review from ryoqun and sakridge September 11, 2020 23:10

carllin force-pushed the FixAccountsDb branch 2 times, most recently from 6c89460 to 91c208a Compare September 11, 2020 23:12

sakridge reviewed Sep 13, 2020

carllin force-pushed the FixAccountsDb branch from 2d67721 to c0a4f51 Compare September 13, 2020 20:13