SparkMLPractice

Learning and evaluating Spark MLlib

Spark is a new and fast-evolving big data computing framework. Many people think it is the future of big data. The foundation of Spark is its in-memory RDD compute model: it can cache intermediate results in memory, whereas Hadoop MapReduce needs to flush them to slow disk. This makes Spark especially well suited to iterative computing such as machine learning.
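
To illustrate why caching matters for iterative work, here is a minimal sketch that makes repeated passes over one cached RDD. The object name `CacheSketch`, the helper `estimateMean`, the step size, and the `local[2]` master are assumptions made for illustration; they are not taken from this repository.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CacheSketch {
  // Estimates the mean of `values` by gradient descent on squared error.
  // Every iteration scans the same RDD, so it is cached in memory once.
  def estimateMean(sc: SparkContext, values: Seq[Double], iterations: Int): Double = {
    val data = sc.parallelize(values).cache() // kept in memory after the first action
    val n = data.count()
    var estimate = 0.0
    for (_ <- 1 to iterations) {
      val current = estimate // copy into a val so the closure captures a plain value
      val gradient = data.map(x => current - x).reduce(_ + _) / n
      estimate -= 0.5 * gradient // hypothetical step size
    }
    data.unpersist()
    estimate
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("CacheSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try println(estimateMean(sc, Seq(1.0, 2.0, 3.0, 4.0), 20))
    finally sc.stop()
  }
}
```

Without the `cache()` call, each iteration would recompute the RDD from its source, which is the cost that the in-memory model is meant to avoid.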

In this practice we will evaluate Spark 1.3 MLlib and ML. We also cover DataFrame, which is another highlighted feature in version 1.3.

What's the difference between MLlib and ML?

Basically speaking, MLlib focuses on the low-level machine learning API. It packages many ML algorithms, all of which have been verified to work in a distributed environment. As of V1.3, MLlib supports:
- classification and regression (linear SVM, linear regression, logistic regression, Naive Bayes, decision trees),
- collaborative filtering,
- clustering,
- dimensionality reduction,
- feature extraction.
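
As a small illustration of the low-level MLlib API, the sketch below clusters a handful of points with `KMeans.train`. The object name `KMeansSketch`, the helper `clusterIds`, and the iteration count are hypothetical choices for this example, not code from this repository.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

object KMeansSketch {
  // Trains a k-means model on `points` and returns the cluster id of each point,
  // in the same order as the input.
  def clusterIds(sc: SparkContext, points: Seq[Array[Double]], k: Int): Seq[Int] = {
    val data = sc.parallelize(points.map(p => Vectors.dense(p))).cache()
    val model = KMeans.train(data, k, 20) // 20 = maxIterations (assumed value)
    points.map(p => model.predict(Vectors.dense(p)))
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KMeansSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val points = Seq(Array(0.0, 0.0), Array(0.1, 0.1), Array(9.0, 9.0), Array(9.1, 9.1))
      println(clusterIds(sc, points, 2).mkString(", "))
    } finally sc.stop()
  }
}
```

Note that the caller works directly with RDDs and vectors here; there is no notion of a preprocessing step or a reusable workflow at this level.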
On the other hand, ML focuses on how to build an efficient machine learning system. A productive machine learning system includes not only the core algorithm but also a series of steps such as data cleaning, preprocessing, model training, model evaluation, and parameter tuning. ML provides abstract classes that represent the steps needed in machine learning:

- Transformer - abstraction for data cleaning
- Estimator - abstraction for a learning algorithm
- Pipeline - combines transformers and estimators into a single processing line
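
The three abstractions can be sketched with the Spark 1.3 `spark.ml` package as follows. This is a minimal example under stated assumptions: the case classes `LabeledDoc` and `Doc`, the object `PipelineSketch`, and the parameter values are made up for illustration and do not come from this repository.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.sql.SQLContext

// Hypothetical record types; they must be top-level so Spark can infer a schema.
case class LabeledDoc(id: Long, text: String, label: Double)
case class Doc(id: Long, text: String)

object PipelineSketch {
  // Two transformers (Tokenizer, HashingTF) followed by one estimator (LogisticRegression).
  def buildPipeline(): Pipeline = {
    val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
    val hashingTF = new HashingTF()
      .setNumFeatures(1000)
      .setInputCol(tokenizer.getOutputCol)
      .setOutputCol("features")
    val lr = new LogisticRegression().setMaxIter(10).setRegParam(0.01)
    new Pipeline().setStages(Array(tokenizer, hashingTF, lr))
  }

  // Fitting the pipeline runs each stage in order and returns a reusable model.
  def train(sqlContext: SQLContext, docs: Seq[LabeledDoc]): PipelineModel = {
    val training = sqlContext.createDataFrame(sqlContext.sparkContext.parallelize(docs))
    buildPipeline().fit(training)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("PipelineSketch").setMaster("local[2]"))
    val sqlContext = new SQLContext(sc)
    try {
      val model = train(sqlContext, Seq(
        LabeledDoc(0L, "a b c d e spark", 1.0),
        LabeledDoc(1L, "b d", 0.0),
        LabeledDoc(2L, "spark f g h", 1.0),
        LabeledDoc(3L, "hadoop mapreduce", 0.0)))
      val test = sqlContext.createDataFrame(sc.parallelize(Seq(Doc(4L, "spark i j k"))))
      model.transform(test).select("id", "text", "prediction").collect().foreach(println)
    } finally sc.stop()
  }
}
```

The fitted `PipelineModel` is itself a transformer, so the same tokenizing and hashing steps are applied to new data before prediction.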

This practice covers:

- DataFrame
- Basic MLlib practice (including ALS, SVD, K-means clustering) and DataFrame
- ML
- Deep Learning (deeplearning4j on Spark)
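
For the DataFrame topic, a minimal sketch of the Spark 1.3 DataFrame API might look like the following. The `Person` case class, the object `DataFrameSketch`, and the helper `adultNames` are hypothetical names used only for this example.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical record type; top-level so Spark can infer the schema by reflection.
case class Person(name: String, age: Int)

object DataFrameSketch {
  // Builds a DataFrame from case-class records and returns the names of people aged 18 or over.
  def adultNames(sqlContext: SQLContext, people: Seq[Person]): Seq[String] = {
    val df = sqlContext.createDataFrame(sqlContext.sparkContext.parallelize(people))
    df.filter(df("age") >= 18) // column expression instead of a Scala closure
      .select("name")
      .collect()
      .map(_.getString(0))
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("DataFrameSketch").setMaster("local[2]"))
    val sqlContext = new SQLContext(sc)
    try {
      val people = Seq(Person("Ann", 34), Person("Bob", 12), Person("Cid", 18))
      adultNames(sqlContext, people).foreach(println)
    } finally sc.stop()
  }
}
```

Because the filter is expressed on named columns rather than as an opaque function, Spark can inspect and optimize the query before running it.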

chenqxi/SparkMLPractice
