Data Mining with Spark

This repo documents the data mining algorithms I implemented for my "Data Mining with Spark" class.
All the algorithms are written with distributed processing in mind.
They are mainly written in PySpark, Spark's Python API.
Most algorithms here manipulate Spark RDDs rather than Spark DataFrames.
- A few exceptions use GraphFrames and Spark SQL.