Fix rooted accounts cleanup, simplify locking #12194
Branch updated from 6c89460 to 91c208a.
Codecov Report
@@            Coverage Diff            @@
##           master   #12194    +/-   ##
=========================================
- Coverage    82.0%    81.9%    -0.1%
=========================================
  Files         354      354
  Lines       82643    82754     +111
=========================================
+ Hits        67788    67857      +69
- Misses      14855    14897      +42
Branch updated from 2d67721 to c0a4f51.
runtime/src/accounts_db.rs (outdated)
if dead_slots.len() == 1 {
    assert!(dead_slots.contains(&expected_slot));
}
Nit (I think this reads better):
if let Some(dead_slot) = dead_slots.first() {
    assert_eq!(*dead_slot, expected_slot);
}
Unfortunately, I don't think HashSet has a first() method 😢
.iter().next() tho?
or .values().first()?
Anyway, this isn't important at all. :)
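For reference, here is a minimal sketch of the check using the .iter().next() suggestion, since std's HashSet has neither first() nor values(). The helper name check_single_dead_slot and the Slot = u64 alias are assumptions for illustration; the real test in accounts_db.rs is structured differently.

```rust
use std::collections::HashSet;

// Assumed here: slots are plain u64 numbers.
type Slot = u64;

/// Hypothetical helper: if exactly one slot was marked dead, assert that it is
/// `expected_slot`. `HashSet` has no `first()`, so `iter().next()` fetches its
/// single element.
fn check_single_dead_slot(dead_slots: &HashSet<Slot>, expected_slot: Slot) {
    if dead_slots.len() == 1 {
        assert_eq!(dead_slots.iter().next(), Some(&expected_slot));
    }
}

fn main() {
    let mut dead_slots = HashSet::new();
    dead_slots.insert(42);
    check_single_dead_slot(&dead_slots, 42);

    // With more than one dead slot the check is skipped, as in the original test.
    dead_slots.insert(7);
    check_single_dead_slot(&dead_slots, 42);
}
```

Keeping the len() == 1 guard matters: with several elements, iter().next() returns an arbitrary one, because HashSet iteration order is unspecified.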
@@ -476,7 +474,6 @@ impl Default for AccountsDB {
             min_num_stores: num_threads,
             bank_hashes: RwLock::new(bank_hashes),
             frozen_accounts: HashMap::new(),
-            dead_slots: RwLock::new(HashSet::new()),
:byebye:
runtime/src/accounts_index.rs (outdated)
reclaims.extend(list.iter().filter(|(f, _)| *f == slot).cloned());
list.retain(|(f, _)| *f != slot);

lock.0.fetch_add(1, Ordering::Relaxed);
list.push((slot, account_info));
// now, do lazy clean
self.purge_older_root_entries(&mut list, reclaims);
I think this is generally a good direction. But let's surface the consequences for quick consensus.
Is it OK to accept (1) a slightly bigger index size, (2) more visible spikes at each clean_accounts(), and (3) amortized longer lookup times on every account index read for frequently updated accounts (because of the larger list)?
If so, why were older_roots purged here to begin with? Just because there was no independent older_root cleaning mechanism back in the day?
Also, the index size can be more easily controlled by malicious tx patterns.
I think these concerns are non-issues for now and only hypothetical worries.
As a nice by-product, with this relaxed attitude toward index size, we pave the way for a rather straightforward implementation of #11161, or even snapshot commitments (now that the capitalization check (#11927) is enabled, there is some need for this as well).
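To make tradeoffs (1) and (3) concrete, here is a minimal sketch assuming a simplified index in which each pubkey maps to a Vec<(Slot, AccountInfo)> "slot list". SimpleIndex, upsert, clean and the type aliases are hypothetical stand-ins, not the real AccountsIndex API. upsert mirrors the quoted diff minus the purge_older_root_entries call, so rooted versions accumulate until a separate clean pass removes all but the newest one.

```rust
use std::collections::{HashMap, HashSet};

type Slot = u64;
type Pubkey = [u8; 32]; // stand-in for a real 32-byte public key type
type AccountInfo = u64; // hypothetical: identifies one stored account version

/// Hypothetical simplified index: pubkey -> slot list of (slot, version) pairs.
#[derive(Default)]
struct SimpleIndex {
    index: HashMap<Pubkey, Vec<(Slot, AccountInfo)>>,
    roots: HashSet<Slot>,
}

impl SimpleIndex {
    /// Record a new version of `pubkey` written in `slot`. An existing entry for
    /// the same slot is moved into `reclaims`. Older rooted entries are NOT purged
    /// here, so the slot list grows until `clean` runs.
    fn upsert(&mut self, pubkey: Pubkey, slot: Slot, info: AccountInfo, reclaims: &mut Vec<(Slot, AccountInfo)>) {
        let list = self.index.entry(pubkey).or_default();
        reclaims.extend(list.iter().filter(|(s, _)| *s == slot).cloned());
        list.retain(|(s, _)| *s != slot);
        list.push((slot, info));
    }

    /// Separate clean pass: keep the newest rooted entry and any unrooted
    /// entries, moving older rooted entries into `reclaims`.
    fn clean(&mut self, reclaims: &mut Vec<(Slot, AccountInfo)>) {
        let roots = &self.roots;
        for list in self.index.values_mut() {
            let newest_root = list.iter().map(|(s, _)| *s).filter(|s| roots.contains(s)).max();
            if let Some(newest_root) = newest_root {
                let is_stale = |s: Slot| roots.contains(&s) && s < newest_root;
                reclaims.extend(list.iter().filter(|(s, _)| is_stale(*s)).cloned());
                list.retain(|(s, _)| !is_stale(*s));
            }
        }
    }
}

fn main() {
    let mut idx = SimpleIndex::default();
    let key = [1u8; 32];
    let mut reclaims = Vec::new();
    for slot in 1..=3 {
        idx.upsert(key, slot, slot * 10, &mut reclaims);
        idx.roots.insert(slot);
    }
    // Without lazy purging, all three rooted versions stay in the slot list.
    assert_eq!(idx.index[&key].len(), 3);

    // The clean pass reclaims the two older rooted versions in one batch.
    idx.clean(&mut reclaims);
    let expected: Vec<(Slot, AccountInfo)> = vec![(3, 30)];
    assert_eq!(idx.index[&key], expected);
    assert_eq!(reclaims.len(), 2);
}
```

The longer slot list between cleans is what drives (1) and (3), and doing the purge in one batch is what makes the spikes in (2) more visible.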
@ryoqun, yeah, I think 1, 2, and 3 are acceptable tradeoffs.
"If so, why older_roots has been purged here to begin with? Just because of lack of independent older_root cleaning mechanism at the ancient times?"
Yeah, I think it was just a byproduct of the initial implementation 😸
LGTM with nits!
Welcome to AccountsDB land with such a great improvement. :)
Thanks a lot for spotting and fixing the race condition...
Branch updated from 3d1c228 to b2d171c.
@carllin Did you notice any performance difference? Like any noticeable store throughput increase with slightly more memory usage? :)
LGTM code-wise. Curious about the perf impact as well.
Also, does this somewhat alleviate the
Also, I'd like to make sure this PR doesn't create bad (=unloadable) snapshots after running for a while against testnet/mainnet-beta.
@ryoqun When I was running against a small 5-node cluster, it didn't seem to affect things much.
Yeah, I suspect that bottleneck won't improve until I also fix the accounts index.
Yeah, should I upgrade one of our testnet/mainnet nodes with this fix first?
Or you can just run a validator ad hoc on a GCE instance, connecting to testnet/mainnet. In both cases, you'll need to … I think a few hours is enough. And cross your fingers. :)
looks good to me as well
@ryoqun I've set up a validator on MB with these changes and the snapshot validation running at
Branch updated from b2d171c to 3539f8d.
Branch updated from 39fa46a to 0d48dca.
Co-authored-by: Carl Lin <[email protected]>
FYI @carllin: this is one of the testnet validators that updated to v1.4. Nothing odd; it just looks like set_root got a lot faster and unref got slower.
@ryoqun Oh, interesting! The unref is probably slower because there are a lot more dead pubkeys to clean in
Yeah, might be true. :)
Problem
Summary of Changes
handle_reclaims() function
handle_reclaims no longer blocks the store() pipeline, so it will always force clean, removing the need for the dead slots counter.
Fixes #
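A rough sketch of what "always force clean" could look like, assuming a simplified storage that only tracks the number of live account versions per slot. Storage, live_counts and this handle_reclaims signature are hypothetical; the real function in accounts_db.rs differs, and the sketch ignores locking and the store() pipeline entirely. The point it illustrates is that slots reaching zero live versions are purged within the same call instead of being parked in a shared dead_slots: RwLock<HashSet<Slot>> for a later pass.

```rust
use std::collections::{HashMap, HashSet};

type Slot = u64;
type AccountInfo = u64; // hypothetical: identifies one stored account version

/// Hypothetical storage: number of live account versions per slot.
struct Storage {
    live_counts: HashMap<Slot, usize>,
}

impl Storage {
    /// Decrement the live count for each reclaimed version. Slots whose count
    /// reaches zero are dead and are purged right away; the returned set is
    /// local to this call rather than a shared field.
    fn handle_reclaims(&mut self, reclaims: &[(Slot, AccountInfo)]) -> HashSet<Slot> {
        let mut dead_slots = HashSet::new();
        for (slot, _info) in reclaims {
            if let Some(count) = self.live_counts.get_mut(slot) {
                *count = count.saturating_sub(1);
                if *count == 0 {
                    dead_slots.insert(*slot);
                }
            }
        }
        // Force clean: drop dead slots immediately.
        for slot in &dead_slots {
            self.live_counts.remove(slot);
        }
        dead_slots
    }
}

fn main() {
    // Slot 1 holds two live versions, slot 2 holds one.
    let mut storage = Storage { live_counts: HashMap::from([(1, 2), (2, 1)]) };
    // Reclaiming one version from each slot leaves slot 2 with no live versions.
    let dead = storage.handle_reclaims(&[(1, 100), (2, 200)]);
    assert_eq!(dead, HashSet::from([2]));
    assert_eq!(storage.live_counts.get(&1), Some(&1));
    assert!(!storage.live_counts.contains_key(&2));
}
```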