Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No need to consider ConcurrentAppendException/ConcurrentDeleteReadException when check commit conflict with Compaction's commit Actions #326

Open
windpiger opened this issue Feb 20, 2020 · 8 comments
Labels
acknowledged This issue has been read and acknowledged by Delta admins assessment Assessing issue or PR

Comments

@windpiger
Copy link
Contributor

windpiger commented Feb 20, 2020

I have a Scenario related to Conflict check, that is when the current txn check conflict with a already committed Compaction's actions(AddFile&RemoveFile), should we need to consider ConcurrentAppendException/ConcurrentDeleteReadException?
I think there is no need to do these two Exceptions conflict check, but we should keep the last ConcurrentDeleteDeleteException in this Scenario.
Because the actions of Compation is dataChange=false, it just rerange the data, not modify it.

  1. ConcurrentAppendException
    the data of the new AddFile in Compaction's actions should already be read by the current txn.
  2. ConcurrentDeleteReadException
    the data of the RemoveFile in Compaction's actions does not be deleted, they just reranger
    to other files. So it does not affect the AddFiles read by the current txn, even they deleted by Compaction.

Can any one help to review the logic? @marmbrus @zsxwing Thanks!
If it is correct, we can reduce some conflict.

@tdas
Copy link
Contributor

tdas commented Feb 25, 2020

Yes, you are right. The current logic in OptimisticTransaction.checkAndRetry only handles the case where the current transaction is a Compaction, and makes it not conflict with other many other concurrent transactions. It does not handle the case in the best way where a concurrent transaction is a Compaction and current non-Compaction transaction should have minimal conflicts with the Compaction. And I also think, at a high-level, your suggestion is correct. It would be awesome if you can make PR that modifies the logic in OptimisticTransaction.checkAndRetry to account for this. The main challenge is to add rigorous tests.

@tdas
Copy link
Contributor

tdas commented Feb 25, 2020

@mukulmurthy and I had caught this corner some time ago and discussed the exact same approach.

@windpiger
Copy link
Contributor Author

Yes, you are right. The current logic in OptimisticTransaction.checkAndRetry only handles the case where the current transaction is a Compaction, and makes it not conflict with other many other concurrent transactions. It does not handle the case in the best way where a concurrent transaction is a Compaction and current non-Compaction transaction should have minimal conflicts with the Compaction. And I also think, at a high-level, your suggestion is correct. It would be awesome if you can make PR that modifies the logic in OptimisticTransaction.checkAndRetry to account for this. The main challenge is to add rigorous tests.

Thanks for your reply! I'd like to make pr for this.

@tdas
Copy link
Contributor

tdas commented Feb 25, 2020

Yay! Happy to review it whenever you make one.

@pranavanand
Copy link
Contributor

@windpiger any updates on this?

@windpiger
Copy link
Contributor Author

@windpiger any updates on this?
Recently I am working on other things, I will working on this later. If you have some update for this issue, we can discuss here, thanks!

@zmeir
Copy link

zmeir commented Dec 31, 2020

I ran into the same issue myself and started looking into OptimisticTransaction.checkAndRetry.
I might be missing something, but it looks like I just need to ignore files where dataChange is false when creating predicatesMatchingAddedFiles (to avoid throwing ConcurrentAppendException) and deleteReadOverlap (to avoid throwing ConcurrentDeleteReadException).

Specifically, this is what I changed:

For ConcurrentAppendException:

// before
val conflictingFile = DeltaLog.filterFileList(
  metadata.partitionSchema,
  addedFilesToCheckForConflicts.toDF(), p :: Nil).as[AddFile].take(1)

// after
val conflictingFile = DeltaLog.filterFileList(
  metadata.partitionSchema,
  addedFilesToCheckForConflicts.filter(_.dataChange).toDF(), p :: Nil).as[AddFile].take(1)

For ConcurrentDeleteReadException:

// before
val deleteReadOverlap = removedFiles.find(r => readFilePaths.contains(r.path))

// after
val deleteReadOverlap = removedFiles.find(r => r.dataChange & readFilePaths.contains(r.path))

@zmeir
Copy link

zmeir commented Feb 25, 2021

Just an update: I ran extensive tests with the above changes and this solution seems to perform as required. I'll attempt to submit a PR when I have the time, but if anyone else want to test this solution themselves- it's only 2 lines of code :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
acknowledged This issue has been read and acknowledged by Delta admins assessment Assessing issue or PR
Projects
None yet
Development

No branches or pull requests

5 participants