Learn and master the art of framing data analysis problems as Spark problems through over 15 hands-on examples, and then scale them up to run on cloud computing services.

xrenaissance/Apache-Spark-Projects-with-Python

Apache-Spark-Projects-with-Python

In this repo, I built over 15 real examples of increasing complexity, all of which I ran and studied myself.

  • Learn the concepts of Spark's Resilient Distributed Datasets (RDDs)
  • Develop and run Spark jobs quickly using Python
  • Translate complex analysis problems into iterative or multi-stage Spark scripts
  • Scale up to larger data sets using Amazon's Elastic MapReduce service
  • Understand how Hadoop YARN distributes Spark across computing clusters
  • Learn about other Spark technologies, like Spark SQL, Spark Streaming, and GraphX
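
To give a feel for what "framing a problem as a Spark problem" looks like, here is a minimal word-count sketch using the RDD API from Python. It is not taken from this repo's scripts. The function names (`tokenize`, `word_counts`, `main`), the app name, and the sample lines are all hypothetical and chosen only for illustration, and it assumes `pyspark` is installed and run in local mode.

```python
import re


def tokenize(line):
    """Split a line into lowercase words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", line.lower())


def word_counts(sc, lines):
    """Count words with an RDD pipeline: flatMap -> map -> reduceByKey.

    `sc` is an existing SparkContext; `lines` is a list of strings.
    """
    rdd = sc.parallelize(lines)                      # distribute the input as an RDD
    pairs = rdd.flatMap(tokenize).map(lambda w: (w, 1))
    counts = pairs.reduceByKey(lambda a, b: a + b)   # sum the 1s per word
    return dict(counts.collect())                    # bring results back to the driver


def main():
    # Imported here so the helpers above stay importable without Spark installed.
    from pyspark import SparkConf, SparkContext

    # Hypothetical local configuration; a cluster run would set the master elsewhere.
    conf = SparkConf().setMaster("local[*]").setAppName("WordCountSketch")
    sc = SparkContext(conf=conf)
    try:
        sample = ["Spark makes big data simple", "big data, big clusters"]
        for word, n in sorted(word_counts(sc, sample).items()):
            print(word, n)
    finally:
        sc.stop()
```

To try it locally, call `main()` at the bottom of the script and run it with `spark-submit` or plain `python`. The same `flatMap`/`map`/`reduceByKey` shape carries over to larger inputs by reading from a file with `sc.textFile(...)` instead of `sc.parallelize(...)`.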
