Destination S3V2: Skip full metadata search when sync mode is append #52094
...main/kotlin/io/airbyte/cdk/load/state/object_storage/ObjectStorageDestinationStateManager.kt
@@ -183,6 +219,15 @@ class ObjectStorageFallbackPersister(
        // Add a suffix matching an OPTIONAL -[0-9]+ ordinal
        val matcher =
            pathFactory.getPathMatcher(stream, suffixPattern = OPTIONAL_ORDINAL_SUFFIX_PATTERN)

        if (!(stream.importType == Overwrite || stream.minimumGenerationId > 0)) {
Can we add a comment explaining why we can do this optimization in each of these cases?
We should definitely include something about these calls being particularly pathological for the append case due to the number of objects we page through increasing over time.
Also we should consider making this boolean logic an extension function to encapsulate the goofy booleaness.
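One possible shape for that extension function, as a minimal sketch. The types here (ImportType, Append, StreamSketch) are hypothetical stand-ins that model only the fields visible in the diff, and the names mayTruncate and canSkipMetadataSearch are made up for illustration. The reading of the condition (Overwrite or minimumGenerationId > 0 means the sync may truncate) is inferred from the diff, not confirmed by the PR.

```kotlin
// Hypothetical stand-ins for the CDK types; only the fields used in the diff are modeled.
sealed interface ImportType
object Append : ImportType
object Overwrite : ImportType

data class StreamSketch(
    val importType: ImportType,
    val generationId: Long,
    val minimumGenerationId: Long,
)

// Assumption inferred from the diff: a stream may truncate when it is an
// Overwrite sync or when it declares a minimum generation greater than zero.
fun StreamSketch.mayTruncate(): Boolean =
    importType == Overwrite || minimumGenerationId > 0

// The full metadata search is only skippable when nothing can be truncated,
// i.e. the negation of the condition above.
fun StreamSketch.canSkipMetadataSearch(): Boolean = !mayTruncate()
```

With something like this, the call site would read as a single named check, and each branch of the boolean could be unit tested on its own.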
...main/kotlin/io/airbyte/cdk/load/state/object_storage/ObjectStorageDestinationStateManager.kt
@@ -126,18 +133,47 @@ class ObjectStorageDestinationState(
    }

    /** Used to guarantee the uniqueness of a key */
Add a comment describing how we ensure uniqueness: we look up the ordinal postfix by searching S3 and increment it here.
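For reference, the behavior such a comment would describe can be sketched roughly as below. This is not the CDK's implementation: UniqueKeySketch and its seed parameter are hypothetical, and the seed stands in for whatever ordinals the storage search found. The sketch is also not thread-safe, so a real version would need a lock or a concurrent map.

```kotlin
// Hypothetical sketch of ordinal-based uniqueness; not the CDK's class.
// `seed` stands in for the highest ordinal already found per key in storage.
class UniqueKeySketch(seed: Map<String, Long> = emptyMap()) {
    // Last ordinal handed out (or found) for each key. Not thread-safe.
    private val lastOrdinal: MutableMap<String, Long> = seed.toMutableMap()

    fun ensureUnique(key: String): String {
        val previous = lastOrdinal[key]
        if (previous == null) {
            // First time we see this key: use it as-is and remember it.
            lastOrdinal[key] = 0L
            return key
        }
        // Key already taken: bump the ordinal and append it as a suffix.
        val next = previous + 1
        lastOrdinal[key] = next
        return "$key-$next"
    }
}
```

Seeded with "cat" at 0 and "turtle" at 100, this returns cat-1, turtle-101, turtle-102, and then a bare spider for a key it has not seen.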
runTest {
    every { stream.importType } returns Overwrite
    every { stream.generationId } returns 1L
    every { stream.minimumGenerationId } returns 1L
We should probably test the two conditions (minimumGenerationId > 0 and importType == Overwrite) separately — OR hide them behind a boolean method of some sort which we unit test separately.
@@ -32,7 +32,8 @@ class ObjectStorageDestinationStateTest {
     )

     companion object {
-        val stream1 = MockDestinationCatalogFactory.stream1
+        private val streamSource = MockDestinationCatalogFactory.stream1
+        val stream1 = streamSource.copy(minimumGenerationId = streamSource.generationId)
Why was this necessary?
assertEquals("cat-1", state.ensureUnique("cat"))
assertEquals("turtle-101", state.ensureUnique("turtle"))
assertEquals("turtle-102", state.ensureUnique("turtle"))
assertEquals("spider", state.ensureUnique("spider"))
We should add some test cases to validate keys that contain numbers—e.g. the cat-1-1 scenario.
As an aside, I really like parameterized tests for testing over a large corpus of inputs. Could be useful here.
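To make the cat-1-1 scenario concrete, here is a small table-driven sketch. The regex is an assumption (a lazy base followed by an optional -digits ordinal) and may differ from the real OPTIONAL_ORDINAL_SUFFIX_PATTERN; ParsedKey, parseKey, and parseCases are hypothetical names. The same table could feed a JUnit parameterized test.

```kotlin
// Assumed pattern: lazy base, then an optional "-<digits>" ordinal at the end.
// This may not match the CDK's actual suffix pattern.
val ORDINAL_SUFFIX_SKETCH = Regex("^(.*?)(?:-(\\d+))?$")

data class ParsedKey(val base: String, val ordinal: Long?)

// Split a key into its base and trailing ordinal, if any.
fun parseKey(key: String): ParsedKey {
    val match = ORDINAL_SUFFIX_SKETCH.matchEntire(key) ?: return ParsedKey(key, null)
    return ParsedKey(match.groupValues[1], match.groups[2]?.value?.toLongOrNull())
}

// Inputs paired with the parse we expect under the assumed pattern.
val parseCases = listOf(
    "cat" to ParsedKey("cat", null),
    "cat-1" to ParsedKey("cat", 1L),
    // Only the last "-<digits>" group is treated as the ordinal.
    "cat-1-1" to ParsedKey("cat-1", 1L),
    // A key that merely ends in digits is indistinguishable from a suffixed one.
    "report-2024" to ParsedKey("report", 2024L),
)

fun checkParseCases() {
    for ((input, expected) in parseCases) {
        check(parseKey(input) == expected) { "unexpected parse for $input" }
    }
}
```

The last case is the interesting one: under this pattern a key that naturally ends in -digits is read as base plus ordinal, which is the ambiguity the extra tests should pin down.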
🚢 once confident
What
High level: skip metadata search when we're not truncating
Definition of "not truncating":
How
The big complication is that the initial search serves more than one purpose, including building the key -> count map we can use to make -1, -2 suffixes to avoid overwrite.
So if we skip the search we still have to provide uniqueness guards:
If append-only:
If truncate:
Also
I dropped some unused constants and one unused method & promoted the unique key count to a mutable map (it should be even if behind a lock, because it's a suspend function)