
[no-release-notes] go/store/nbs: During a GC process, take dependencies on chunks that are read through the ChunkStore. #8760

Merged
reltuk merged 8 commits into main on Jan 28, 2025

Conversation

@reltuk (Contributor) commented on Jan 16, 2025

Previously, GC was constructed to walk the transitive closure of chunks reachable from the moment it started, and to take additional dependencies on any chunks that were written to the ChunkStore during the GC process. This change makes it so that we can additionally take dependencies on any chunks that are read from the ChunkStore during the GC process.
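As a rough illustration of the traversal described above, here is a minimal sketch of a mark phase in Go. hashType, chunkReader, and walkRefs are hypothetical stand-ins, not the real go/store/nbs types; the actual markAndSweeper works over hash.Hash and the nbs chunk format.

```go
package nbs

import "context"

// Hypothetical stand-ins for the real types (hash.Hash, chunks.Chunk).
type hashType [20]byte

type chunkReader interface {
	get(ctx context.Context, h hashType) ([]byte, error)
}

// walkRefs decodes a chunk's child addresses. Assumed here; the real
// decoder depends on the chunk serialization format.
func walkRefs(data []byte) ([]hashType, error) { panic("illustrative only") }

// mark walks the transitive closure of chunks reachable from roots as of
// the moment the GC began. Chunks written (and, with this PR, read)
// during the GC are fed into the same keep set via SaveHashes.
func mark(ctx context.Context, src chunkReader, roots []hashType) (map[hashType]struct{}, error) {
	keep := make(map[hashType]struct{})
	stack := append([]hashType(nil), roots...)
	for len(stack) > 0 {
		h := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if _, ok := keep[h]; ok {
			continue
		}
		keep[h] = struct{}{}
		// The GC reads through an internal path so its own reads do
		// not register new dependencies (see below).
		data, err := src.get(ctx, h)
		if err != nil {
			return nil, err
		}
		children, err := walkRefs(data)
		if err != nil {
			return nil, err
		}
		stack = append(stack, children...)
	}
	return keep, nil
}
```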

For the nbs layer to signal a new dependency during a GC, it makes use of a keeperFunc, which is passed into the BeginGC call. The contract is that, if keeperFunc is non-nil and returns false, the chunk address given to it will make its way back to a call to SaveHashes on some MarkAndSweeper, whose contents will end up in the final store once the GC completes. keeperFunc is, however, allowed to return true, which signals that the in-progress GC has passed a certain point in time and can no longer make that guarantee. As a result, any time keeperFunc returns true, the block store implementation needs to block until the GC process is over, and then it should resume the operation from the beginning.
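A minimal sketch of that contract, continuing the illustrative package above (keeperFunc, gcDone, and blockStore are assumed names; the real NomsBlockStore keeps equivalent state under its mutex and blocks via waitForGC):

```go
// (Continues the sketch above; additionally imports "sync".)

// keeperFunc reports false while the GC still guarantees the address
// will reach SaveHashes, and true once that guarantee has lapsed.
type keeperFunc func(hashType) bool

type blockStore struct {
	mu     sync.Mutex
	keeper keeperFunc    // non-nil while a GC is in progress
	gcDone chan struct{} // closed when the current GC completes
}

// gcDependency tries to record a dependency on h with the running GC.
// ok == false means the caller must wait on done and then retry its
// whole operation from the beginning.
func (bs *blockStore) gcDependency(h hashType) (ok bool, done <-chan struct{}) {
	bs.mu.Lock()
	defer bs.mu.Unlock()
	if bs.keeper == nil || !bs.keeper(h) {
		return true, nil
	}
	return false, bs.gcDone
}
```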

That machinery already exists for the Put() and Commit() use cases. This PR adds machinery to do the same on the various read paths that explicitly touch chunks, such as Has, HasMany, Get, GetMany, and GetManyCompressed. To add dependency tracking on the read path, it is important that the GC process itself not cause the chunks it reads to form additional dependencies. This PR makes a slight change to markAndSweeper so that it can fetch chunks from the source store without recording unnecessary duplicate dependencies and potentially becoming mired in deadlocks.
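Putting the pieces together, a read path such as Get could take its dependency roughly as follows, again reusing the stand-ins above; getFromTables is a hypothetical placeholder for the real table-file lookup:

```go
// getFromTables stands in for the real table-file lookup.
func (bs *blockStore) getFromTables(ctx context.Context, h hashType) ([]byte, error) {
	panic("illustrative only")
}

// Get reads a chunk and, when a GC is running, records the read as a
// dependency so the chunk survives into the collected store. If the GC
// can no longer accept new dependencies, Get blocks until the GC ends
// and retries from the beginning.
func (bs *blockStore) Get(ctx context.Context, h hashType) ([]byte, error) {
	for {
		data, err := bs.getFromTables(ctx, h)
		if err != nil || data == nil {
			return data, err
		}
		ok, done := bs.gcDependency(h)
		if ok {
			return data, nil
		}
		select {
		case <-done:
			// GC finished; rerun the read from the beginning.
		case <-ctx.Done():
			return nil, ctx.Err()
		}
	}
}
```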

Commit titles (truncated):

…equests so that a GC does not end while they are in progress.
…chunks which were in the memtable but get filtered out because they are already in the store.
…h for chunks that are written but which are already present in the store.
…g, make it so that reading chunks as part of the GC process does not take further dependencies on them and never blocks on waitForGC.
@coffeegoddd (Contributor) commented:

@reltuk DOLT

comparing_percentages: 100.000000 to 100.000000

version   result   total
e423a5c   ok       5937457

version   total_tests
e423a5c   5937457

correctness_percentage: 100.0

@coffeegoddd (Contributor) commented:

@reltuk DOLT

comparing_percentages: 100.000000 to 100.000000

version   result   total
6ad9b37   ok       5937457

version   total_tests
6ad9b37   5937457

correctness_percentage: 100.0

@reltuk requested a review from max-hoffman on January 23, 2025 at 21:39
@coffeegoddd (Contributor) commented:

@reltuk DOLT

comparing_percentages: 100.000000 to 100.000000

version   result   total
5c04d5f   ok       5937457

version   total_tests
5c04d5f   5937457

correctness_percentage: 100.0

@max-hoffman (Contributor) left a comment:

LGTM, just the one question about locking store access on block

defer gcs.newGen.mu.RUnlock()
return gcs.hasMany(toHasRecords(hashes))
func (gcs *GenerationalNBS) HasMany(ctx context.Context, hashes hash.HashSet) (hash.HashSet, error) {
absent, err := gcs.newGen.HasMany(ctx, hashes)
Is the diff just from collapsing gcs.getMany, or is there more to this?

Comment on lines +234 to +235
err := nbs.waitForGC(ctx)
nbs.mu.Unlock()
I'm sure this is intentional, but is it better to block here than to let other sessions try to do partial work before blocking on the same wait? E.g., marked-chunk cache hits under this miss.

@reltuk merged commit 78e9a8a into main on Jan 28, 2025
18 checks passed