- This repo contains the data mining algorithms I implemented for my "Data Mining with Spark" class.
- All the algorithms are written with distributed processing in mind.
- Written mainly in PySpark, Spark's Python API.
- Most algorithms here operate on Spark RDDs rather than Spark DataFrames.
- A few exceptions use GraphFrames and Spark SQL.