[SPARK-32859][SQL] Introduce physical rule to decide bucketing dynamically

### What changes were proposed in this pull request?

This PR adds support for deciding bucketed table scan dynamically, based on the actual query plan. Currently bucketing is enabled by default (`spark.sql.sources.bucketing.enabled`=true), so for every bucketed table in the query plan we use a bucketed table scan (all input files for a bucket are read by the same task). The drawback is that when the bucketed scan provides no benefit (no join/group-by/etc. in the query), it needlessly restricts the number of tasks to the number of buckets and can hurt parallelism.

The feature adds a physical plan rule right after `EnsureRequirements`. The rule traverses the plan nodes: for every operator with an "interesting partition" (i.e., one that requires `ClusteredDistribution` or `HashClusteredDistribution`), it checks whether the sub-plan under that operator contains an `Exchange` and a bucketed table scan, allowing only certain operators in the sub-plan (i.e. `Scan/Filter/Project/Sort/PartialAgg/etc.`; see details in `DisableUnnecessaryBucketedScan.disableBucketWithInterestingPartition`). If so, it disables the bucketed table scan in that sub-plan. In addition, a bucketed table scan is disabled if there is no operator with an interesting partition along the sub-plan from root to the scan. The algorithm works because if there is a shuffle between the bucketed table scan and the operator with an interesting partition, the scan's output partitioning is destroyed by the shuffle in the middle, so the bucketed scan is certainly not needed.

The idea of "interesting partition" is inspired by "interesting order" in "Access Path Selection in a Relational Database Management System" (http://www.inf.ed.ac.uk/teaching/courses/adbs/AccessPath.pdf), after discussion with cloud-fan.

### Why are the changes needed?

To avoid unnecessary bucketed scans in queries. This is also a prerequisite for apache#29625 (deciding bucketed sorted scan dynamically will be added later in that PR).

### Does this PR introduce _any_ user-facing change?

A new config, `spark.sql.sources.bucketing.autoBucketedScan.enabled`, is introduced and set to false by default (the rule is disabled by default because it can regress cached bucketed table queries; see discussion in apache#29804 (comment)). Users can opt in or out by enabling or disabling the config. As we found in production, some users rely on the assumption that the number of tasks equals the number of buckets when reading a bucketed table, to precisely control the number of tasks. This is a bad assumption, but it does happen on our side, so the config lets them opt out of the feature.

### How was this patch tested?

Added unit tests in `DisableUnnecessaryBucketedScanSuite.scala`.

Closes apache#29804 from c21/bucket-rule.

Authored-by: Cheng Su <[email protected]>
Signed-off-by: Takeshi Yamamuro <[email protected]>
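For illustration, a minimal sketch of opting in to the new rule. The two config names come from this PR; the session setup, table name, and schema are hypothetical:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("bucketing-demo").master("local[*]").getOrCreate()

// Both configs must be true for the rule to run (the new one defaults to false).
spark.conf.set("spark.sql.sources.bucketing.enabled", "true")
spark.conf.set("spark.sql.sources.bucketing.autoBucketedScan.enabled", "true")

// A hypothetical table bucketed on column `j`.
spark.range(100).selectExpr("id AS i", "id AS j")
  .write.bucketBy(8, "j").saveAsTable("t1")

// No join/aggregate requires clustering here, so the rule can disable the
// bucketed scan and parallelism is not capped at the number of buckets (8).
spark.table("t1").filter("i > 10").select("j").explain()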
Showing 9 changed files with 454 additions and 18 deletions.
161 changes: 161 additions & 0 deletions
.../main/scala/org/apache/spark/sql/execution/bucketing/DisableUnnecessaryBucketedScan.scala
@@ -0,0 +1,161 @@
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.sql.execution.bucketing

import org.apache.spark.sql.catalyst.plans.physical.{ClusteredDistribution, HashClusteredDistribution}
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.execution.{FileSourceScanExec, FilterExec, ProjectExec, SortExec, SparkPlan}
import org.apache.spark.sql.execution.aggregate.BaseAggregateExec
import org.apache.spark.sql.execution.exchange.Exchange
import org.apache.spark.sql.internal.SQLConf

/**
 * Disable unnecessary bucketed table scan based on actual physical query plan.
 * NOTE: this rule is designed to be applied right after [[EnsureRequirements]],
 * where all [[ShuffleExchangeExec]] and [[SortExec]] have been added to plan properly.
 *
 * When BUCKETING_ENABLED and AUTO_BUCKETED_SCAN_ENABLED are set to true, go through
 * query plan to check where bucketed table scan is unnecessary, and disable bucketed table
 * scan if:
 *
 * 1. The sub-plan from root to bucketed table scan, does not contain
 *    [[hasInterestingPartition]] operator.
 *
 * 2. The sub-plan from the nearest downstream [[hasInterestingPartition]] operator
 *    to the bucketed table scan, contains only [[isAllowedUnaryExecNode]] operators
 *    and at least one [[Exchange]].
 *
 * Examples:
 * 1. no [[hasInterestingPartition]] operator:
 *                Project
 *                   |
 *                 Filter
 *                   |
 *             Scan(t1: i, j)
 *  (bucketed on column j, DISABLE bucketed scan)
 *
 * 2. join:
 *         SortMergeJoin(t1.i = t2.j)
 *             /            \
 *         Sort(i)        Sort(j)
 *           /                \
 *       Shuffle(i)      Scan(t2: i, j)
 *         /        (bucketed on column j, enable bucketed scan)
 *   Scan(t1: i, j)
 * (bucketed on column j, DISABLE bucketed scan)
 *
 * 3. aggregate:
 *       HashAggregate(i, ..., Final)
 *                  |
 *              Shuffle(i)
 *                  |
 *       HashAggregate(i, ..., Partial)
 *                  |
 *                Filter
 *                  |
 *            Scan(t1: i, j)
 *  (bucketed on column j, DISABLE bucketed scan)
 *
 * The idea of [[hasInterestingPartition]] is inspired from "interesting order" in
 * the paper "Access Path Selection in a Relational Database Management System"
 * (https://dl.acm.org/doi/10.1145/582095.582099).
 */
case class DisableUnnecessaryBucketedScan(conf: SQLConf) extends Rule[SparkPlan] {

  /**
   * Disable bucketed table scan with pre-order traversal of plan.
   *
   * @param withInterestingPartition The traversed plan has operator with interesting partition.
   * @param withExchange The traversed plan has [[Exchange]] operator.
   * @param withAllowedNode The traversed plan has only [[isAllowedUnaryExecNode]] operators.
   */
  private def disableBucketWithInterestingPartition(
      plan: SparkPlan,
      withInterestingPartition: Boolean,
      withExchange: Boolean,
      withAllowedNode: Boolean): SparkPlan = {
    plan match {
      case p if hasInterestingPartition(p) =>
        // Operator with interesting partition, propagates `withInterestingPartition` as true
        // to its children, and resets `withExchange` and `withAllowedNode`.
        p.mapChildren(disableBucketWithInterestingPartition(_, true, false, true))
      case exchange: Exchange =>
        // Exchange operator propagates `withExchange` as true to its child.
        exchange.mapChildren(disableBucketWithInterestingPartition(
          _, withInterestingPartition, true, withAllowedNode))
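      // Bucketed table scan reached. Per the scaladoc conditions above, disable
      // it when no interesting-partition operator was seen on the way down, or
      // when an Exchange (with only allowed unary nodes) sits in between, since
      // that shuffle destroys the scan's output partitioning anyway.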
      case scan: FileSourceScanExec =>
        if (isBucketedScanWithoutFilter(scan)) {
          if (!withInterestingPartition || (withExchange && withAllowedNode)) {
            scan.copy(disableBucketedScan = true)
          } else {
            scan
          }
        } else {
          scan
        }
      case o =>
        o.mapChildren(disableBucketWithInterestingPartition(
          _,
          withInterestingPartition,
          withExchange,
          withAllowedNode && isAllowedUnaryExecNode(o)))
    }
  }

  private def hasInterestingPartition(plan: SparkPlan): Boolean = {
    plan.requiredChildDistribution.exists {
      case _: ClusteredDistribution | _: HashClusteredDistribution => true
      case _ => false
    }
  }

  /**
   * Check if the operator is an allowed single-child operator.
   * We may revisit this method later, as we probably can remove this
   * restriction and allow arbitrary operators between bucketed table scan
   * and operator with interesting partition.
   */
  private def isAllowedUnaryExecNode(plan: SparkPlan): Boolean = {
    plan match {
      case _: SortExec | _: ProjectExec | _: FilterExec => true
      case partialAgg: BaseAggregateExec =>
        partialAgg.requiredChildDistributionExpressions.isEmpty
      case _ => false
    }
  }

  private def isBucketedScanWithoutFilter(scan: FileSourceScanExec): Boolean = {
    // Do not disable bucketed table scan if it has filter pruning,
    // because bucketed table scan is still useful here to save CPU/IO cost,
    // reading only the selected bucket files.
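    // For illustration (hypothetical query): with a table bucketed on `j`, a
    // predicate like `WHERE j = 5` populates `optionalBucketSet`, so only the
    // matching bucket files are read and the bucketed scan is kept.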
    scan.bucketedScan && scan.optionalBucketSet.isEmpty
  }

  def apply(plan: SparkPlan): SparkPlan = {
    lazy val hasBucketedScanWithoutFilter = plan.find {
      case scan: FileSourceScanExec => isBucketedScanWithoutFilter(scan)
      case _ => false
    }.isDefined

    if (!conf.bucketingEnabled || !conf.autoBucketedScanEnabled || !hasBucketedScanWithoutFilter) {
      plan
    } else {
      disableBucketWithInterestingPartition(plan, false, false, true)
    }
  }
}
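For completeness, a hedged sketch of observing the rule's effect, assuming a build that includes this patch and the session and table from the sketch above (with adaptive query execution enabled the executed plan may be wrapped, so treat this as illustrative):

import org.apache.spark.sql.execution.FileSourceScanExec

// Collect the leaf scans of the executed plan and check whether each one
// still performs a bucketed scan after planning.
val executedPlan = spark.table("t1").filter("i > 10").queryExecution.executedPlan
val stillBucketed = executedPlan.collect {
  case scan: FileSourceScanExec => scan.bucketedScan
}
println(stillBucketed) // with no interesting partition above the scan, expect Seq(false)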