Data Mining with Spark

  • This repo collects the data mining algorithms I implemented for my "Data Mining with Spark" class.
  • All the algorithms are written with distributed processing in mind.
  • Mainly written in PySpark, Spark's Python API
  • Most algorithms here manipulate Spark RDDs rather than Spark DataFrames
    • With some exceptions using GraphFrames and Spark SQL
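
To illustrate what "written with distributed processing in mind" can look like in the RDD style, here is a minimal sketch of frequent-pair counting. It is not taken from this repo: the `baskets` data, the function names, and the `support` threshold are all hypothetical. The map and reduce steps are written as plain functions and run locally here so the example works without a Spark installation; the comment near the end shows roughly how the same functions would plug into an RDD pipeline.

```python
from itertools import combinations

# Hypothetical toy data: each basket is a list of item ids.
baskets = [
    ["a", "b", "c"],
    ["a", "b"],
    ["b", "c"],
    ["a", "c", "b"],
]

def emit_pairs(basket):
    """Map step: emit ((item1, item2), 1) for every unordered pair in a basket."""
    items = sorted(set(basket))
    return [(pair, 1) for pair in combinations(items, 2)]

def add(x, y):
    """Reduce step: associative and commutative, so partial counts
    from different partitions can be merged in any order."""
    return x + y

def local_reduce_by_key(pairs, func):
    """Plain-Python stand-in for RDD.reduceByKey, for illustration only."""
    acc = {}
    for key, value in pairs:
        acc[key] = func(acc[key], value) if key in acc else value
    return acc

def frequent_pairs(baskets, support):
    """Count item pairs and keep those seen in at least `support` baskets."""
    mapped = [kv for basket in baskets for kv in emit_pairs(basket)]  # like flatMap
    counts = local_reduce_by_key(mapped, add)                          # like reduceByKey
    return {pair: n for pair, n in counts.items() if n >= support}

# With PySpark, the same two functions would plug into an RDD pipeline, roughly:
#   sc.parallelize(baskets).flatMap(emit_pairs).reduceByKey(add) \
#     .filter(lambda kv: kv[1] >= support).collect()
# where `sc` is an existing SparkContext (assumed, not created here).

result = frequent_pairs(baskets, support=3)
print(result)
```

Keeping the per-record logic in small pure functions like `emit_pairs` and `add` is one way to make the same code testable locally and reusable on a cluster.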