ARROW-6803: [Rust] [DataFusion] Performance optimization for single partition aggregate queries

This PR optimizes the case where there is a single partition being aggregated. In this use case there is no need for a secondary aggregation step.

I ran benchmarks to confirm that this addresses the performance regression.

Closes apache#5590 from andygrove/ARROW-6803 and squashes the following commits:

213fc10 <Andy Grove> Avoid secondary aggregate for single partitions

Authored-by: Andy Grove <[email protected]>
Signed-off-by: Andy Grove <[email protected]>
andygrove committed Oct 8, 2019
1 parent 583fb7e commit 02d1e97
Showing 1 changed file with 6 additions and 0 deletions.
6 changes: 6 additions & 0 deletions rust/datafusion/src/execution/physical_plan/hash_aggregate.rs
@@ -91,6 +91,12 @@ impl ExecutionPlan for HashAggregateExec {
             })
             .collect();
 
+        if partitions.len() == 1 {
+            // if there is only a single partition then it isn't necessary to perform any
+            // additional logic
+            return Ok(partitions);
+        }
+
         // create partition to combine and aggregate the results
         let final_group: Vec<Arc<dyn PhysicalExpr>> = (0..self.group_expr.len())
             .map(|i| Arc::new(Column::new(i)) as Arc<dyn PhysicalExpr>)
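
The early return in this change can be illustrated outside DataFusion with a small standalone sketch. It is not the DataFusion implementation: `aggregate_partition`, `merge_partials`, and `aggregate` are hypothetical names, and a plain `HashMap` sum stands in for the real `HashAggregateExec` partitions. It only shows why a lone partial aggregate is already the final answer, so the merge step can be skipped.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the per-partition aggregate:
/// sums the values for each group key within one partition.
fn aggregate_partition(rows: &[(String, i64)]) -> HashMap<String, i64> {
    let mut sums = HashMap::new();
    for (key, value) in rows {
        *sums.entry(key.clone()).or_insert(0) += *value;
    }
    sums
}

/// Hypothetical stand-in for the secondary aggregation step:
/// combines the partial sums produced by several partitions.
fn merge_partials(partials: Vec<HashMap<String, i64>>) -> HashMap<String, i64> {
    let mut merged = HashMap::new();
    for partial in partials {
        for (key, sum) in partial {
            *merged.entry(key).or_insert(0) += sum;
        }
    }
    merged
}

/// Aggregates every partition, then merges only when there is
/// more than one partial result to combine.
fn aggregate(partitions: &[Vec<(String, i64)>]) -> HashMap<String, i64> {
    let mut partials: Vec<HashMap<String, i64>> = partitions
        .iter()
        .map(|partition| aggregate_partition(partition))
        .collect();

    // With a single partition the partial result already covers all rows,
    // so it is returned directly and the merge pass is skipped.
    if partials.len() == 1 {
        return partials.remove(0);
    }

    merge_partials(partials)
}
```

Calling `aggregate` with one partition returns that partition's sums unchanged, while two or more partitions still go through `merge_partials`, which mirrors the control flow of the `partitions.len() == 1` check in the diff.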

0 comments on commit 02d1e97

Please sign in to comment.