
Vacuum parallel delete is affected by AQE #810

Closed · Kimahriman opened this issue Oct 14, 2021 · 6 comments

Labels: acknowledged (This issue has been read and acknowledged by Delta admins), waiting for merge
Milestone: 1.1.0

Comments

@Kimahriman (Contributor)

Currently, the `vacuum.parallelDelete.enabled` flag relies on the default number of shuffle partitions to control delete parallelism, since the set of files to delete is the result of a join. However, when adaptive query execution (AQE) is enabled, it changes the partitioning of that join: because the join is purely over file metadata, the resulting shuffle is small, and AQE coalesces it down to a very small number of partitions regardless of your default partition setting.

This hasn't been a huge deal so far because AQE is disabled by default, but Spark 3.2 enables it by default, so this will affect more people.
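For illustration, a minimal spark-shell sketch of the coalescing behavior (the join below is just a stand-in for the vacuum's file-metadata join, not Delta's actual query):

```scala
// Assumes a spark-shell session where `spark` is the active SparkSession.
spark.conf.set("spark.sql.shuffle.partitions", "200")
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")

// A tiny join standing in for the vacuum's file-metadata join.
val valid = spark.range(1000).toDF("path")
val all   = spark.range(2000).toDF("path")
val toDelete = all.join(valid, Seq("path"), "left_anti")

// With AQE on, the small post-shuffle stage is coalesced to far fewer than
// 200 partitions, so a subsequent per-partition delete loses its parallelism.
println(toDelete.rdd.getNumPartitions)
```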

Possible solutions:

  • Log a warning if AQE is enabled, noting that it will reduce the parallelism of the delete
  • Temporarily turn off AQE while the vacuum is running
  • Change the existing setting, or add a new one, to specify exactly how many partitions to use, and repartition to that count before deleting (sketched after this list)
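As a rough sketch of the third option (`parallelDelete`, `deleteFile`, and `numDeletePartitions` are illustrative names, not Delta's actual identifiers):

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical stand-in for the real filesystem delete.
def deleteFile(path: String): Boolean = true

// Repartition the paths to an explicit count before deleting, so the
// parallelism is pinned regardless of AQE's post-shuffle coalescing.
def parallelDelete(spark: SparkSession, diff: Dataset[String], numDeletePartitions: Int): Long = {
  import spark.implicits._
  diff
    .repartition(numDeletePartitions)
    .mapPartitions(paths => Iterator.single(paths.count(deleteFile).toLong))
    .reduce(_ + _)
}
```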
@rahulsmahadev added the acknowledged label on Oct 14, 2021
@rahulsmahadev (Collaborator)

Thanks for reporting this. Do you want to work on the potential fix here?

@tdas (Contributor) commented Oct 14, 2021

Yeah, I am inclined to simply turn off AQE and maintain the previous behavior. We can do a simple check: if parallel delete is enabled, disable AQE before starting the job.
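A minimal sketch of what that check could look like (the `withAqeDisabled` helper is hypothetical; it just saves, overrides, and restores the session conf):

```scala
import org.apache.spark.sql.SparkSession

// Run `body` with AQE forced off, restoring the caller's original
// setting afterwards.
def withAqeDisabled[T](spark: SparkSession)(body: => T): T = {
  val key = "spark.sql.adaptive.enabled"
  val previous = spark.conf.getOption(key)
  spark.conf.set(key, "false")
  try body
  finally previous match {
    case Some(value) => spark.conf.set(key, value)
    case None        => spark.conf.unset(key)
  }
}
```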

@tdas added this to the 1.1.0 milestone on Oct 14, 2021
@Kimahriman (Contributor, Author)

Yeah, I can work on a fix; I just wanted to get consensus on the solution first. I'll play around with disabling AQE for it.

@tdas (Contributor) commented Oct 14, 2021

We are planning to make a Delta 1.1 release soon, since Spark 3.2 is about to be released, so it would be awesome if we are able to include this fix in that release.

@Kimahriman (Contributor, Author)

I'll try to figure it out quickly; I want to upgrade ASAP too. Glad to hear you are planning to support 3.2 quickly.

@Kimahriman (Contributor, Author)

Got something thrown together, so let me know your thoughts. Verified locally.
