Commit 5892bbf

amanomer authored and srowen committed
[SPARK-30124][MLLIB] unnecessary persist in PythonMLLibAPI.scala
### What changes were proposed in this pull request?
Removed an unnecessary persist.

### Why are the changes needed?
The persist in `PythonMLLibAPI.scala` is unnecessary because `run()` of `gmmAlg` caches the data itself later on.
https://github.com/apache/spark/blob/710ddab39e20f49e917311c3e27d142b5a2bcc71/mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixture.scala#L167-L171

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Manually

Closes apache#26758 from amanomer/improperPersist.

Authored-by: Aman Omer <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
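The reasoning above can be illustrated with a minimal sketch. It is not the actual `GaussianMixture` code: `PersistOwnershipSketch`, its `run` helper, and the squared-sum workload are hypothetical stand-ins, and the example assumes Spark is on the classpath and runs in local mode. It only shows the general pattern in which the algorithm caches and releases its own working RDD, so the caller can pass the input through without persisting it.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel
import org.apache.spark.{SparkConf, SparkContext}

object PersistOwnershipSketch {

  // Hypothetical stand-in for an algorithm's run(): it derives its own
  // working RDD, caches it, iterates over it, and releases it at the end.
  def run(data: RDD[Double], iterations: Int): Double = {
    val working = data.map(x => x * x).cache()
    try {
      var total = 0.0
      for (_ <- 0 until iterations) {
        total += working.sum() // each pass reuses the cached partitions
      }
      total
    } finally {
      working.unpersist() // the code that cached the RDD also releases it
    }
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("persist-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val data = sc.parallelize(Seq(1.0, 2.0, 3.0))
      // The caller passes the RDD straight through, with no persist() here.
      val result = run(data, iterations = 3)
      println(s"result = $result")
      // The input RDD was never marked for caching by either side.
      assert(data.getStorageLevel == StorageLevel.NONE)
    } finally {
      sc.stop()
    }
  }
}
```

Persisting the input in the caller as well would keep a second copy of essentially the same data in storage for the duration of the run, which is the overhead this commit removes.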
1 parent 35bab33 commit 5892bbf

2 files changed, +2 -5 lines changed

mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala

+1 -5
@@ -407,11 +407,7 @@ private[python] class PythonMLLibAPI extends Serializable {

     if (seed != null) gmmAlg.setSeed(seed)

-    try {
-      new GaussianMixtureModelWrapper(gmmAlg.run(data.rdd.persist(StorageLevel.MEMORY_AND_DISK)))
-    } finally {
-      data.rdd.unpersist()
-    }
+    new GaussianMixtureModelWrapper(gmmAlg.run(data.rdd))
   }

   /**

mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixture.scala

+1 -0
@@ -234,6 +234,7 @@ class GaussianMixture private (
       iter += 1
       compute.destroy()
     }
+    breezeData.unpersist()

     new GaussianMixtureModel(weights, gaussians)
   }
