Vacuum parallel delete is affected by AQE #810
Comments
Thanks for reporting this, do you want to work on the potential fix here?
Yeah, I am inclined to simply turn off AQE and maintain the previous behavior. We can do a simple check: if parallel delete is enabled, then disable AQE before starting the job.
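Below is a minimal sketch of the check described in this comment, assuming a hypothetical helper (withAqeDisabled) around the session's runtime config rather than Delta's actual internals: it turns spark.sql.adaptive.enabled off for the duration of the parallel-delete job and restores the caller's previous setting afterwards.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper (not Delta's actual code): temporarily disable AQE while
// `body` runs, then restore whatever the caller had configured.
def withAqeDisabled[T](spark: SparkSession)(body: => T): T = {
  val key = "spark.sql.adaptive.enabled"
  val previous = spark.conf.getOption(key)
  spark.conf.set(key, "false")
  try {
    body
  } finally {
    previous match {
      case Some(value) => spark.conf.set(key, value)
      case None        => spark.conf.unset(key)
    }
  }
}

// Illustrative use in a vacuum-like code path (names are made up):
// if (parallelDeleteEnabled) {
//   withAqeDisabled(spark) {
//     filesToDelete
//       .repartition(spark.conf.get("spark.sql.shuffle.partitions").toInt)
//       .foreachPartition { paths => /* delete each file */ }
//   }
// }
```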
Yeah, I can work on a fix, just wanted to get a consensus on the solution. I'll play around with disabling AQE for it.
We are planning to make a Delta 1.1 release soon since Spark 3.2 is about to be released, so it would be awesome if we are able to release with this fix.
I'll try to figure it out quickly, I want to upgrade ASAP too. Glad to hear you are planning to support 3.2 quickly.
Got something thrown together, so let me know your thoughts. Verified locally.
Currently the vacuum.parallelDelete.enabled flag uses the default shuffle partitions as the parallelism control for deleting, since the set of files to delete is the result of a join. However, when adaptive query execution (AQE) is enabled, it affects the partitioning of that join: because the join is purely on file metadata, the resulting shuffle is small and AQE will coalesce it to a very small number of partitions regardless of your default partition setting. This wasn't a huge deal while AQE was disabled by default, but in Spark 3.2 it will be enabled by default, so this will affect more people.
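As a self-contained illustration of the coalescing behavior described above (not Delta's code; the DataFrames and sizes are made up), the snippet below runs a metadata-only anti-join and prints how many partitions the result ends up with. With AQE on, the tiny shuffle is coalesced to a handful of partitions even though spark.sql.shuffle.partitions is set to 200.

```scala
import org.apache.spark.sql.SparkSession

object AqeCoalesceDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("aqe-coalesce-demo")
      .master("local[4]")
      .config("spark.sql.shuffle.partitions", "200")
      .config("spark.sql.adaptive.enabled", "true") // flip to "false" to compare
      .getOrCreate()
    import spark.implicits._

    // Stand-ins for vacuum's inputs: all files found on disk vs. files still
    // referenced by the table. The anti-join output is the set of files to delete.
    val allFiles   = (1 to 1000).map(i => f"part-$i%05d.parquet").toDF("path")
    val validFiles = (1 to 900).map(i => f"part-$i%05d.parquet").toDF("path")
    val toDelete   = allFiles.join(validFiles, Seq("path"), "left_anti")

    // The number of partitions here is what a parallel delete over the join
    // result would use as its parallelism.
    println(s"delete parallelism: ${toDelete.rdd.getNumPartitions} partitions")

    spark.stop()
  }
}
```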
Possible solutions: