Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] About vacuum interface. #140

Open
triplesheep opened this issue Aug 23, 2019 · 4 comments
Open

[Bug] About vacuum interface. #140

triplesheep opened this issue Aug 23, 2019 · 4 comments
Labels
acknowledged This issue has been read and acknowledged by Delta admins bug Something isn't working

Comments

@triplesheep
Copy link

About vacuum(time) -> deleteBeforeTimestamp, it will select files that will be deleted as following (let's ignore directory this case).
I suppose it works like:
All files - files: (files.modifyTime > deleteBeforeTimestamp)
- files: (currentSnapshot.addFiles)
- files: (all_removeFiles.deleteTime > deleteBeforeTimestamp)
However it seems works like:
All files - files: (files.modifyTime > deleteBeforeTimestamp)
- files: (currentSnapshot.addFiles)
- files: (currentSnapshot.removeFile.deleteTime > deleteBeforeTimestamp)
The difference between all_removeFiles.deleteTime and currentSnapshot.removeFile.deleteTime is that currentSnapshot.removeFile will filter out those deleteTime < minFileRetentionTimestamp(computed by TOMBSTONE_RETENTION). So it may delete some RemoveFile with removeFiles.deleteTime > deleteBeforeTimestamp but removeFiles.deleteTime < minFileRetentionTimestamp when set vacuum time param > TOMBSTONE_RETENTION.

For example:
Vacuum time param: 14 days
TOMBSTONE_RETENTION: 7 days
Vacuum will delete files with RemoveFile.deleteTime earlier than 7 days rather than earlier 14 days.

I want to confirm if it works lining with expectations

@rahulsmahadev
Copy link
Collaborator

rahulsmahadev commented Aug 23, 2019

Hi @triplesheep

The docs for Vacuum can be found here
https://docs.delta.io/latest/delta-utility.html

When vacuum is called with time parameter it will override the default retention period. If the time in not provided it will take the default retention period.

So for your example, it would remove files older than 14 days.

@triplesheep
Copy link
Author

Hi @rahulsmahadev . I know vacuum will override the default retention period and in my case I supposed to remove files older than 14 days. However, after reading the code I found that it may delete remove files older than 7 days. For current snapshot, we use modify time of AddFileAction to filter and use delete time of RemoveFileAction to filter. Because when snapshot constructing stateReconstruction, it will reply the commit log and form the Acition collection. During that constructing, it will filter out those RemoveFileAction older than TOMBSTONE_RETENTION not than the params we set in vacuum interface and this is my question. Thanks for reply :)

@brkyvz
Copy link
Collaborator

brkyvz commented Aug 25, 2019

@triplesheep You are correct, this is a legitimate bug.

@brkyvz brkyvz added the bug Something isn't working label Aug 25, 2019
@triplesheep triplesheep changed the title [Questions] About vacuum interface. [Bug] About vacuum interface. Aug 25, 2019
@triplesheep
Copy link
Author

@brkyvz Hi :) I have try to fix it and post a pr.

@allisonport-db allisonport-db added the acknowledged This issue has been read and acknowledged by Delta admins label Oct 7, 2021
tdas pushed a commit to tdas/delta that referenced this issue May 31, 2023
* Expression framework prototype (delta-io#37)

* WIP; have a basic skeleton for expressions; have a column resolver; todo comparator

* WIP; figuring out comparators

* finished first pass at a basic skeleton, using just Ints and Booleans Literal types

* add leaf expression; make expression and predicate both ABCs; add children() and bound() methods

* add nullSafeBoundEval method to BinaryExpression

* add verifyInputDataTypes function to Expression

* big refactor; add DataType to Column constructor; no more need for 'bound' checks; use nullSafeEval; refactor Comparator usage; short circuit null checks in eval

* rename createColumn to column

* Update ExpressionSuite.scala

* add IsNotNull predicate; test; null check to Column::eval

* make Expression interface; add back Predicate interface with default dataType field; make more member fields final

* add Not expression; add nullSafeEval to UnaryExpression; add test for Not expr

* added interfaces

* add RowRecordBuilder; remove duplicate ClosableIterator

* add newline to LogStore.java

* update interface for OptimisticTransaction with javadoc

* update DeltaLog; remove RowRecordBuilder; remove RowRecord build interface

* update Operation; add writerId to CommitInfo.java

* minor comment change

* update javadoc for CommitResult and OptTxn

* fix typo

* add 2 new public methods to OptTxn interface

* add asParquet method to StructType

* Update Operation.java

* rename writerId to engineInfo

* respond to PR comments; fix Operation enum str; remove StructType asParquet; fix LogStore version
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
acknowledged This issue has been read and acknowledged by Delta admins bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants