Skip to content

Commit

Permalink
SPARK-2553. CoGroupedRDD unnecessarily allocates a Tuple2 per depende…
Browse files Browse the repository at this point in the history
…ncy...

... per key

My humble opinion is that avoiding allocations in this performance-critical section is worth the extra code.

Author: Sandy Ryza <[email protected]>

Closes apache#1461 from sryza/sandy-spark-2553 and squashes the following commits:

7eaf7f2 [Sandy Ryza] SPARK-2553. CoGroupedRDD unnecessarily allocates a Tuple2 per dependency per key
  • Loading branch information
sryza authored and mateiz committed Jul 18, 2014
1 parent 29809a6 commit e52b871
Showing 1 changed file with 5 additions and 1 deletion.
6 changes: 5 additions & 1 deletion core/src/main/scala/org/apache/spark/rdd/CoGroupedRDD.scala
Original file line number Diff line number Diff line change
Expand Up @@ -180,7 +180,11 @@ class CoGroupedRDD[K](@transient var rdds: Seq[RDD[_ <: Product2[K, _]]], part:
}
val mergeCombiners: (CoGroupCombiner, CoGroupCombiner) => CoGroupCombiner =
(combiner1, combiner2) => {
combiner1.zip(combiner2).map { case (v1, v2) => v1 ++ v2 }
var depNum = 0
while (depNum < numRdds) {
combiner1(depNum) ++= combiner2(depNum)
depNum += 1
}
}
new ExternalAppendOnlyMap[K, CoGroupValue, CoGroupCombiner](
createCombiner, mergeValue, mergeCombiners)
Expand Down

0 comments on commit e52b871

Please sign in to comment.