Apache-Spark-Projects-with-Python

Learn and master the art of framing data analysis problems as Spark problems through over 15 hands-on examples, and then scale them up to run on cloud computing services.

This repo contains over 15 real examples of increasing complexity that I built, ran, and studied myself.

Learn the concepts of Spark's Resilient Distributed Datasets (RDDs)
Develop and run Spark jobs quickly using Python
Translate complex analysis problems into iterative or multi-stage Spark scripts
Scale up to larger data sets using Amazon's Elastic MapReduce service
Understand how Hadoop YARN distributes Spark across computing clusters
Learn about other Spark technologies, like Spark SQL, Spark Streaming, and GraphX
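
To make the first two goals above concrete, here is a minimal sketch of a Spark job written in Python. It is not taken from the repo's folders; the function names (`tokenize`, `count_words`, `main`), the app name, and the sample lines are all made up for illustration, and it assumes `pyspark` is installed and run in local mode.

```python
from operator import add


def tokenize(line):
    """Lowercase a line and split it on whitespace."""
    return line.lower().split()


def count_words(sc, lines):
    """Count word occurrences using RDD transformations.

    sc is a pyspark.SparkContext; lines is a list of strings.
    """
    rdd = sc.parallelize(lines)          # distribute the input as an RDD
    return (rdd.flatMap(tokenize)        # one record per word
               .map(lambda w: (w, 1))    # pair each word with a count of 1
               .reduceByKey(add)         # sum the counts for each word
               .collectAsMap())          # return the totals to the driver as a dict


def main():
    # Imported here so the helpers above can be read without Spark installed.
    from pyspark import SparkConf, SparkContext

    # Hypothetical app name; "local[*]" uses all cores on this machine.
    conf = SparkConf().setMaster("local[*]").setAppName("WordCountSketch")
    sc = SparkContext(conf=conf)
    sample = ["Spark makes RDDs", "RDDs are resilient"]  # made-up sample data
    print(count_words(sc, sample))
    sc.stop()
```

To try it, call `main()` at the end of the script and run it with `spark-submit` or a Python interpreter that has `pyspark` available. The transformations (`flatMap`, `map`, `reduceByKey`) are lazy; nothing is computed until the `collectAsMap` action runs.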

01-Spark_Bsics_and_Simple_Example
02-Advanced_Examples_of_Spark_Programs
03-Running_Spark_on_a_Cluster/ml-1m
04-SparkSQL_DataFrames_and_DataSets
05-Other_Spark_Technologies_and_Libraries
.gitignore
LICENSE
README.md

xrenaissance/Apache-Spark-Projects-with-Python

Folders and files

Latest commit

History

Repository files navigation

Apache-Spark-Projects-with-Python

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages