
[no-release-notes] go/store/nbs: During a GC process, take dependencies on chunks that are read through the ChunkStore. #8760

Merged
reltuk merged 8 commits into main on Jan 28, 2025

Conversation

@reltuk (Contributor) commented on Jan 16, 2025

Previously, GC was constructed to walk the transitive closure of chunks reachable from the moment it started, and to take additional dependencies on any chunks that were written to the ChunkStore during the GC process. This change makes it so that we can additionally take dependencies on any chunks that are read from the ChunkStore during the GC process.
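As a rough illustration of the traversal described above, here is a minimal sketch of a mark phase in Go. hashType, chunkReader, and walkRefs are hypothetical stand-ins, not the real go/store/nbs types; the actual markAndSweeper works over hash.Hash and the nbs chunk format.

```go
package nbs

import "context"

// Hypothetical stand-ins for the real types (hash.Hash, chunks.Chunk).
type hashType [20]byte

type chunkReader interface {
	get(ctx context.Context, h hashType) ([]byte, error)
}

// walkRefs decodes a chunk's child addresses. Assumed here; the real
// decoder depends on the chunk serialization format.
func walkRefs(data []byte) ([]hashType, error) { panic("illustrative only") }

// mark walks the transitive closure of chunks reachable from roots as of
// the moment the GC began. Chunks written (and, with this PR, read)
// during the GC are fed into the same keep set via SaveHashes.
func mark(ctx context.Context, src chunkReader, roots []hashType) (map[hashType]struct{}, error) {
	keep := make(map[hashType]struct{})
	stack := append([]hashType(nil), roots...)
	for len(stack) > 0 {
		h := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if _, ok := keep[h]; ok {
			continue
		}
		keep[h] = struct{}{}
		// The GC reads through an internal path so its own reads do
		// not register new dependencies (see below).
		data, err := src.get(ctx, h)
		if err != nil {
			return nil, err
		}
		children, err := walkRefs(data)
		if err != nil {
			return nil, err
		}
		stack = append(stack, children...)
	}
	return keep, nil
}
```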

For the nbs layer to signal a new dependency during a GC, it makes use of a keeperFunc, which is passed into the BeginGC call. The contract is that, if keeperFunc is non-nil and returns false, the chunk address given to it will make its way back to a call to SaveHashes on some MarkAndSweeper, whose contents will end up in the final store once the GC completes. keeperFunc is, however, allowed to return true, which signals that the in-progress GC has passed a certain point in time and can no longer make that guarantee. As a result, any time keeperFunc returns true, the block store implementation needs to block until the GC process is over, and then it should resume the operation from the beginning.
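A minimal sketch of that contract, continuing the illustrative package above (keeperFunc, gcDone, and blockStore are assumed names; the real NomsBlockStore keeps equivalent state under its mutex and blocks via waitForGC):

```go
// (Continues the sketch above; additionally imports "sync".)

// keeperFunc reports false while the GC still guarantees the address
// will reach SaveHashes, and true once that guarantee has lapsed.
type keeperFunc func(hashType) bool

type blockStore struct {
	mu     sync.Mutex
	keeper keeperFunc    // non-nil while a GC is in progress
	gcDone chan struct{} // closed when the current GC completes
}

// gcDependency tries to record a dependency on h with the running GC.
// ok == false means the caller must wait on done and then retry its
// whole operation from the beginning.
func (bs *blockStore) gcDependency(h hashType) (ok bool, done <-chan struct{}) {
	bs.mu.Lock()
	defer bs.mu.Unlock()
	if bs.keeper == nil || !bs.keeper(h) {
		return true, nil
	}
	return false, bs.gcDone
}
```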

That machinery already exists for the Put() and Commit() use cases. This PR adds machinery to do the same on the various read paths that explicitly touch chunks, such as Has, HasMany, Get, GetMany, and GetManyCompressed. To add dependency tracking on the read path, it is important that the GC process itself not cause the chunks it reads to form additional dependencies. This PR makes a slight change to markAndSweeper so that it can fetch chunks from the source store without recording unnecessary duplicate dependencies and potentially becoming mired in deadlocks.
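Putting the pieces together, a read path such as Get could take its dependency roughly as follows, again reusing the stand-ins above; getFromTables is a hypothetical placeholder for the real table-file lookup:

```go
// getFromTables stands in for the real table-file lookup.
func (bs *blockStore) getFromTables(ctx context.Context, h hashType) ([]byte, error) {
	panic("illustrative only")
}

// Get reads a chunk and, when a GC is running, records the read as a
// dependency so the chunk survives into the collected store. If the GC
// can no longer accept new dependencies, Get blocks until the GC ends
// and retries from the beginning.
func (bs *blockStore) Get(ctx context.Context, h hashType) ([]byte, error) {
	for {
		data, err := bs.getFromTables(ctx, h)
		if err != nil || data == nil {
			return data, err
		}
		ok, done := bs.gcDependency(h)
		if ok {
			return data, nil
		}
		select {
		case <-done:
			// GC finished; rerun the read from the beginning.
		case <-ctx.Done():
			return nil, ctx.Err()
		}
	}
}
```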

Commit titles (truncated):

…equests so that a GC does not end while they are in progress.
…chunks which were in the memtable but get filtered out because they are already in the store.
…h for chunks that are written but which are already present in the store.
…g, make it so that reading chunks as part of the GC process does not take further dependencies on them and never blocks on waitForGC.
@coffeegoddd (Contributor) commented:

@reltuk DOLT

comparing_percentages: 100.000000 to 100.000000

version   result   total
e423a5c   ok       5937457

version   total_tests
e423a5c   5937457

correctness_percentage: 100.0

@coffeegoddd (Contributor) commented:

@reltuk DOLT

comparing_percentages: 100.000000 to 100.000000

version   result   total
6ad9b37   ok       5937457

version   total_tests
6ad9b37   5937457

correctness_percentage: 100.0

@reltuk requested a review from max-hoffman on January 23, 2025 at 21:39
@coffeegoddd (Contributor) commented:

@reltuk DOLT

comparing_percentages: 100.000000 to 100.000000

version   result   total
5c04d5f   ok       5937457

version   total_tests
5c04d5f   5937457

correctness_percentage: 100.0

@max-hoffman (Contributor) left a comment:

LGTM, just the one question about locking store access on block

defer gcs.newGen.mu.RUnlock()
return gcs.hasMany(toHasRecords(hashes))
func (gcs *GenerationalNBS) HasMany(ctx context.Context, hashes hash.HashSet) (hash.HashSet, error) {
absent, err := gcs.newGen.HasMany(ctx, hashes)
Is the diff just from collapsing gcs.getMany, or is there more to this?

Comment on lines +234 to +235
err := nbs.waitForGC(ctx)
nbs.mu.Unlock()
I'm sure this is intentional, but is it better to block here than to let other sessions try to do partial work before blocking on the same wait? E.g., marked-chunk cache hits under this miss.

@reltuk merged commit 78e9a8a into main on Jan 28, 2025
18 checks passed