-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Bug] About vacuum interface. #140
Comments
Hi @triplesheep The docs for Vacuum can be found here When vacuum is called with time parameter it will override the default retention period. If the time in not provided it will take the default retention period. So for your example, it would remove files older than 14 days. |
Hi @rahulsmahadev . I know vacuum will override the default retention period and in my case I supposed to remove files older than 14 days. However, after reading the code I found that it may delete remove files older than 7 days. For current snapshot, we use modify time of AddFileAction to filter and use delete time of RemoveFileAction to filter. Because when snapshot constructing stateReconstruction, it will reply the commit log and form the Acition collection. During that constructing, it will filter out those RemoveFileAction older than TOMBSTONE_RETENTION not than the params we set in vacuum interface and this is my question. Thanks for reply :) |
@triplesheep You are correct, this is a legitimate bug. |
@brkyvz Hi :) I have try to fix it and post a pr. |
* Expression framework prototype (delta-io#37) * WIP; have a basic skeleton for expressions; have a column resolver; todo comparator * WIP; figuring out comparators * finished first pass at a basic skeleton, using just Ints and Booleans Literal types * add leaf expression; make expression and predicate both ABCs; add children() and bound() methods * add nullSafeBoundEval method to BinaryExpression * add verifyInputDataTypes function to Expression * big refactor; add DataType to Column constructor; no more need for 'bound' checks; use nullSafeEval; refactor Comparator usage; short circuit null checks in eval * rename createColumn to column * Update ExpressionSuite.scala * add IsNotNull predicate; test; null check to Column::eval * make Expression interface; add back Predicate interface with default dataType field; make more member fields final * add Not expression; add nullSafeEval to UnaryExpression; add test for Not expr * added interfaces * add RowRecordBuilder; remove duplicate ClosableIterator * add newline to LogStore.java * update interface for OptimisticTransaction with javadoc * update DeltaLog; remove RowRecordBuilder; remove RowRecord build interface * update Operation; add writerId to CommitInfo.java * minor comment change * update javadoc for CommitResult and OptTxn * fix typo * add 2 new public methods to OptTxn interface * add asParquet method to StructType * Update Operation.java * rename writerId to engineInfo * respond to PR comments; fix Operation enum str; remove StructType asParquet; fix LogStore version
About vacuum(time) -> deleteBeforeTimestamp, it will select files that will be deleted as following (let's ignore directory this case).
I suppose it works like:
All files - files: (files.modifyTime > deleteBeforeTimestamp)
- files: (currentSnapshot.addFiles)
- files: (all_removeFiles.deleteTime > deleteBeforeTimestamp)
However it seems works like:
All files - files: (files.modifyTime > deleteBeforeTimestamp)
- files: (currentSnapshot.addFiles)
- files: (currentSnapshot.removeFile.deleteTime > deleteBeforeTimestamp)
The difference between all_removeFiles.deleteTime and currentSnapshot.removeFile.deleteTime is that currentSnapshot.removeFile will filter out those deleteTime < minFileRetentionTimestamp(computed by TOMBSTONE_RETENTION). So it may delete some RemoveFile with removeFiles.deleteTime > deleteBeforeTimestamp but removeFiles.deleteTime < minFileRetentionTimestamp when set vacuum time param > TOMBSTONE_RETENTION.
For example:
Vacuum time param: 14 days
TOMBSTONE_RETENTION: 7 days
Vacuum will delete files with RemoveFile.deleteTime earlier than 7 days rather than earlier 14 days.
I want to confirm if it works lining with expectations
The text was updated successfully, but these errors were encountered: