refactor: Change recursive_mutex to mutex in DatabaseRotatingImp #5276
base: develop
Conversation
Codecov Report
Attention: Patch coverage is

Additional details and impacted files

```
@@           Coverage Diff            @@
##           develop    #5276   +/-   ##
=========================================
  Coverage     78.1%    78.2%
=========================================
  Files          790      790
  Lines        67623    67643     +20
  Branches      8163     8166      +3
=========================================
+ Hits         52846    52864     +18
- Misses       14777    14779      +2
```
* Use a second mutex to protect the backends from modification
* Remove a bunch of warning comments
```cpp
// backendMutex_ is only needed when the *Backend_ members are modified.
// Reads are protected by the general mutex_.
std::mutex backendMutex_;
```
As this sounds like a typical single-writer, one-or-more-readers scenario, is it possible to use a single shared_mutex here instead of these two mutexes?
It's possible, but there are risks. The biggest one is that I'd have to take a shared_lock at the start of rotateWithLock, and upgrade it to a unique_lock after the callback. If there is somehow ever a second caller to that function, or even a different caller that upgrades the lock, there is a potential deadlock.
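A self-contained sketch of why the upgrade has to go through a full release, and where the deadlock would come from (hypothetical names; std::shared_mutex has no atomic upgrade operation):

```cpp
#include <mutex>
#include <shared_mutex>

std::shared_mutex m;  // hypothetical stand-in for mutex_

void rotateLike()
{
    std::shared_lock readLock(m);   // shared ownership around the callback
    // ... run the callback ...
    readLock.unlock();              // must fully release before upgrading

    std::unique_lock writeLock(m);  // re-acquire exclusively; another
                                    // writer may have slipped in between
    // If two threads instead called m.lock() while each still held
    // shared ownership, each would wait forever for the other's
    // unlock_shared(): the deadlock described above.
}
```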
@bthomee @vvysokikh1 Ok, it took waaaaaaay longer than it should have because I kept trying clever things that didn't work or turned out to be unsupported, but I rewrote the locking and changed to a shared mutex, and I think I've got a pretty foolproof solution here. And a unit test to exercise it.
But don't take my word for it. The point of code reviews is to spot the stuff I didn't consider.
I think your solution does not completely solve the issue. It's still technically possible to deadlock: calling rotateWithLock from inside the callback will deadlock on your new mutex.
If it's good enough for now, please add comments to rotateWithLock() warning users not to call rotateWithLock() directly or indirectly from the callback.
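A self-contained sketch of the general re-entrancy hazard with a non-recursive mutex (hypothetical, simplified; not the PR's actual code, which guards rotation with a flag as shown in the snippets below):

```cpp
#include <iostream>
#include <mutex>

std::mutex rotationMutex;  // hypothetical stand-in for the rotation lock

template <typename F>
void rotateWithLock(F&& callback)
{
    std::lock_guard lock(rotationMutex);  // non-recursive mutex
    callback();
}

int main()
{
    rotateWithLock([] {
        // Re-entrant call: this thread already owns rotationMutex, so
        // locking it again is undefined behavior (in practice, deadlock).
        rotateWithLock([] { std::cout << "never reached\n"; });
    });
}
```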
* upstream/develop: Updates Conan dependencies (5256)
force-pushed from 913df26 to 9f564bc
- Rewrite the locking in DatabaseRotatingImp::rotateWithLock to use a shared_lock, and write a unit test to show (as much as possible) that it won't deadlock.
force-pushed from 13fb47c to d912b50
* upstream/develop:
  fix: Do not allow creating Permissioned Domains if credentials are not enabled (5275)
  fix: issues in `simulate` RPC (5265)
```cpp
std::unique_lock writeLock(mutex_);
if (!rotating)
{
    // Once this flag is set, we're committed to doing the work and
    // returning true.
    rotating = true;
}
else
{
    // This should only be reachable through unit tests.
    XRPL_ASSERT(
        unitTest_,
        "ripple::NodeStore::DatabaseRotatingImp::rotateWithLock "
        "unit testing");
    return false;
}
```
Why do we need to lock the mutex here? I would assume we can make rotating an atomic bool and use compare_exchange to switch this flag safely.
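A sketch of that suggestion, assuming rotating becomes a std::atomic<bool> (function name hypothetical):

```cpp
#include <atomic>

std::atomic<bool> rotating{false};  // hypothetical atomic form of the flag

bool tryStartRotation()
{
    bool expected = false;
    // Atomically sets rotating to true only if it is currently false;
    // returns false without blocking if a rotation is already in flight.
    return rotating.compare_exchange_strong(expected, true);
}
```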
```cpp
auto const writableBackend = [&] {
    std::shared_lock readLock(mutex_);
    XRPL_ASSERT(
        rotating,
        "ripple::NodeStore::DatabaseRotatingImp::rotateWithLock rotating "
        "flag set");

    return writableBackend_;
}();

auto newBackend = f(writableBackend->getName());
```
I don't think this lambda and read lock are actually required with the current implementation. We only take the write lock before (which might be switched to an atomic) and after. Assuming the previous synchronization block switches the rotating flag, no other 'write' thread should be able to proceed and capture writeLock while we are here.
```cpp
clearCaches(validatedSeq);

return std::move(newBackend);
```
Since you have changed the return type of rotateWithLock(), in the future this callback could be executed but false might be returned. In that case newBackend has already been moved from, and you then try to clean it up with setDeletePath(). A non-issue right now, but maybe discard the move unless you have strong perf concerns?
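A minimal sketch of the moved-from hazard being described, with hypothetical names:

```cpp
#include <memory>
#include <utility>

struct Backend
{
    void setDeletePath() {}  // stub for illustration
};

void example()
{
    auto newBackend = std::make_shared<Backend>();
    auto installed = std::move(newBackend);  // newBackend is now null
    // newBackend->setDeletePath();  // would dereference a null shared_ptr
    if (newBackend)
        newBackend->setDeletePath();  // never runs after the move
}
```

Copying the shared_ptr instead of moving it keeps newBackend valid for a later cleanup path, at the cost of one reference-count bump.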
```cpp
// This should only be reachable through unit tests.
XRPL_ASSERT(
    unitTest_,
    "ripple::NodeStore::DatabaseRotatingImp::rotateWithLock "
    "unit testing");
return false;
```
I think this comment doesn't hold. This branch can be reached not only from unit tests, but also by an accidental concurrent call to rotateWithLock, or an indirect call to rotateWithLock from inside the callback.
// "Shared mutexes do not support direct transition from shared to unique | ||
// ownership mode: the shared lock has to be relinquished with | ||
// unlock_shared() before exclusive ownership may be obtained with lock()." | ||
mutable std::shared_timed_mutex mutex_; |
What is the reason for choosing a timed mutex here? I believe shared_mutex would be enough.
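For reference, the only functional difference is timed acquisition: std::shared_timed_mutex (C++14) adds try_lock_for / try_lock_shared_for on top of what std::shared_mutex (C++17) provides. A small sketch using only standard library facilities:

```cpp
#include <chrono>
#include <shared_mutex>

std::shared_timed_mutex stm;

bool tryTimed()
{
    // Only shared_timed_mutex offers timed acquisition like this;
    // if no timed waits are needed, plain shared_mutex suffices.
    if (stm.try_lock_for(std::chrono::milliseconds(10)))
    {
        stm.unlock();
        return true;
    }
    return false;
}
```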
High Level Overview of Change
Follow-up to #4989, which stated "Ideally, the code should be rewritten so it doesn't hold the mutex during the callback and the mutex should be changed back to a regular mutex."
This rewrites the code so that the lock is not held during the callback. Instead it locks twice: once before the callback and once after. This is safe due to the structure of the code, but it is also checked after the second lock. This allows mutex_ to be changed back to a regular mutex. (A minimal sketch of this lock-twice shape appears after the Test Plan below.)

Context of Change
From #4989:
Type of Change
Test Plan
Testing can be the same as that for #4989, plus ensure that there are no regressions.
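As mentioned in the overview, the change holds the lock before and after the callback but never during it. A minimal sketch of that shape, with hypothetical simplified names (the real method does more, such as rotating the archive backend and handling the rotating flag):

```cpp
#include <functional>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical, simplified skeleton of the lock-twice pattern.
struct Backend
{
    std::string name;
    std::string const& getName() const { return name; }
};

class RotatingStore
{
    std::mutex mutex_;  // a regular mutex is enough: never held across f
    std::shared_ptr<Backend> writableBackend_ =
        std::make_shared<Backend>(Backend{"backend-0"});

public:
    void rotateWithLock(
        std::function<std::shared_ptr<Backend>(std::string const&)> const& f)
    {
        std::string name;
        {
            std::lock_guard lock(mutex_);  // first lock: snapshot state
            name = writableBackend_->getName();
        }

        auto newBackend = f(name);  // callback runs with no lock held

        {
            std::lock_guard lock(mutex_);  // second lock: install result
            writableBackend_ = std::move(newBackend);
        }
    }
};
```

The key property is that f runs unlocked, so it can take arbitrarily long without blocking other threads that only need brief access to the backends.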