[SPARK-14966] SizeEstimator should ignore classes in the scala.reflect package

In local profiling, I noticed SizeEstimator spending a lot of time estimating the size of objects that contain TypeTag or ClassTag fields. The problem with these tags is that they reference global Scala reflection objects, which, in turn, reference many singletons, such as TestHive. This throws off the accuracy of the size estimate and wastes time traversing a huge object graph.

As a result, I think that SizeEstimator should ignore any classes in the `scala.reflect` package.

Author: Josh Rosen <[email protected]>

Closes apache#12741 from JoshRosen/ignore-scala-reflect-in-size-estimator.
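
The effect of the prefix check described above can be sketched in isolation. This is a minimal illustration, not Spark's code: `ReflectSkipSketch`, `Tagged`, `isIgnored`, and `fieldsToVisit` are hypothetical names, and the sketch filters references before following them, whereas the actual change checks the class when the object is visited. Only the `cls.getName.startsWith("scala.reflect")` test is taken from the commit.

```scala
import java.lang.reflect.Modifier
import scala.reflect.ClassTag

object ReflectSkipSketch {
  // Hypothetical holder class: the ClassTag field is the kind of reference
  // that leads a graph walker into global reflection objects.
  final class Tagged[T](val value: Array[T], val tag: ClassTag[T])

  // Same test as the branch added in this commit: ignore any object whose
  // runtime class lives under the scala.reflect package.
  def isIgnored(cls: Class[_]): Boolean =
    cls.getName.startsWith("scala.reflect")

  // Names of the instance fields a walker would follow from `obj`:
  // non-static fields whose current value is non-null and not ignored.
  def fieldsToVisit(obj: AnyRef): Seq[String] =
    obj.getClass.getDeclaredFields.toSeq
      .filterNot(f => Modifier.isStatic(f.getModifiers))
      .filter { f =>
        f.setAccessible(true)
        val v = f.get(obj)
        v != null && !isIgnored(v.getClass)
      }
      .map(_.getName)
}
```

For a `Tagged` instance built with `ClassTag.Int`, `fieldsToVisit` keeps the `value` array and drops the `tag` reference, so the walk never reaches the reflection singletons behind the tag.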
JoshRosen authored and rxin committed Apr 28, 2016
1 parent f5ebb18 commit 8c49ceb
Showing 1 changed file with 3 additions and 0 deletions.
core/src/main/scala/org/apache/spark/util/SizeEstimator.scala
@@ -207,6 +207,9 @@ object SizeEstimator extends Logging {
     val cls = obj.getClass
     if (cls.isArray) {
       visitArray(obj, cls, state)
+    } else if (cls.getName.startsWith("scala.reflect")) {
+      // Many objects in the scala.reflect package reference global reflection objects which, in
+      // turn, reference many other large global objects. Do nothing in this case.
     } else if (obj.isInstanceOf[ClassLoader] || obj.isInstanceOf[Class[_]]) {
       // Hadoop JobConfs created in the interpreter have a ClassLoader, which greatly confuses
       // the size estimator since it references the whole REPL. Do nothing in this case. In