guizhixiao/spark-dev

Apache Spark Development

Using Python and Scala

Compatible with Spark 2.0

Samples showing usage of the Scala and PySpark APIs, written in Scala and Python respectively.

The repository also contains a base class for testing PySpark code with SparkSession and PyUnit (Python's built-in unittest framework).

This base class is a slightly modified version of the ReusedPySparkTestCase class from PySpark's own test module (pyspark.tests in Spark 2.x).
To write a test, subclass CustomPySparkTestCase instead of unittest.TestCase. CustomPySparkTestCase encapsulates a SparkSession, available as self.spark, which serves as the entry point instead of a bare SparkContext; the SparkContext is still reachable through self.spark.sparkContext.
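The repository's CustomPySparkTestCase is not reproduced here. As a rough illustration of the idea, the following is a minimal sketch of a unittest base class that creates one SparkSession per test class and stops it afterwards. The class name SparkSessionTestCase, the local[4] master and the use of the class name as the app name are assumptions for illustration, not the repository's actual code.

```python
import unittest


class SparkSessionTestCase(unittest.TestCase):
    """Hypothetical base class: one SparkSession shared by all tests in a class."""

    @classmethod
    def setUpClass(cls):
        # Imported here so that merely importing this module does not require PySpark.
        from pyspark.sql import SparkSession

        # Local master with 4 threads (an assumption; adjust as needed).
        cls.spark = (SparkSession.builder
                     .master('local[4]')
                     .appName(cls.__name__)
                     .getOrCreate())

    @classmethod
    def tearDownClass(cls):
        # Stopping the session also stops its underlying SparkContext.
        cls.spark.stop()
```

Creating the session in setUpClass rather than setUp means the relatively expensive Spark startup happens once per test class instead of once per test method.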

Example:

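import unittest
from pysparktest import CustomPySparkTestCase
# word_cnt is the function under test; import it from the module that defines it.

class SampleTest(CustomPySparkTestCase):
    def test_word_cnt(self):
        # Build a small RDD from the SparkContext owned by the shared SparkSession
        rdd = self.spark.sparkContext.parallelize(['Hi there', 'Hi'])
        self.assertEqual(word_cnt(rdd).collectAsMap(), {'Hi': 2, 'there': 1})

if __name__ == '__main__':
    unittest.main()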
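The example assumes a word_cnt function that is not shown. One plausible shape for it, given purely as a hypothetical sketch (the repository's own implementation may differ), is the classic RDD word count: split each line on whitespace, pair every word with 1, and sum the pairs per word with reduceByKey.

```python
def word_cnt(rdd):
    """Hypothetical word count over an RDD of text lines.

    Returns an RDD of (word, count) pairs.
    """
    return (rdd.flatMap(lambda line: line.split())  # one record per whitespace-separated word
               .map(lambda word: (word, 1))          # pair each word with a count of 1
               .reduceByKey(lambda a, b: a + b))     # sum the counts for each word
```

Placed where SampleTest can import it, this definition returns {'Hi': 2, 'there': 1} from collectAsMap() for the input used in the example, which is what the test asserts.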
