Pyspark ML

Pyspark

Using pipenv to create a virtual environment that contains the requirements

Install pipenv. If you don't have pipenv, you need to install it first; see the pipenv installation instructions for details.
Fork this repo.

Clone this repo to your system:

git clone https://github.com/piyushtada/Pyspark-ML.git

Then change directory to Pyspark-ML:
```
cd Pyspark-ML
```
Then install the dependencies with pipenv:
```
pipenv install
```
This will install all the dependencies you need to run the project.
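To confirm that the install step worked, a quick check from inside the pipenv environment can help. The sketch below is only an illustration: the module names `pyspark` and `notebook` and the tool names `java` and `jupyter` are assumptions about what the Pipfile and Spark need, not a list taken from this repo, and `check_environment` is a hypothetical helper.

```python
import importlib.util
import shutil


def check_environment(modules=("pyspark", "notebook"), tools=("java", "jupyter")):
    """Return a dict mapping each checked name to True if it was found.

    The default names are assumptions; adjust them to match the Pipfile.
    """
    report = {}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located
        report[f"module:{name}"] = importlib.util.find_spec(name) is not None
    for name in tools:
        # shutil.which returns None when the executable is not on PATH
        report[f"tool:{name}"] = shutil.which(name) is not None
    return report


for name, found in check_environment().items():
    print(f"{name:<16} {'ok' if found else 'MISSING'}")
```

Run it with `pipenv run python` so that the check sees the same packages the notebooks will use. Spark runs on the JVM, which is why a missing `java` is worth flagging.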
Run Jupyter Notebook:
```
jupyter notebook
```
This opens Jupyter Notebook, where you can use Spark.
Check that everything is working by running test.ipynb.
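If you want a check that does not depend on the notebook, a minimal smoke test along these lines should work. It is a sketch, not the contents of test.ipynb: the function name `spark_smoke_test`, the app name `smoke-test`, and the `local[1]` master are illustrative choices.

```python
def spark_smoke_test(n=5):
    """Start a local Spark session, count n generated rows, then stop the session."""
    # Imported inside the function so the definition loads even without pyspark.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[1]")      # single local worker thread, no cluster needed
        .appName("smoke-test")   # hypothetical name, any string works
        .getOrCreate()
    )
    try:
        # spark.range(n) builds a one-column DataFrame with n rows
        return spark.range(n).count()
    finally:
        spark.stop()
```

Calling `spark_smoke_test()` in a notebook cell should return `5` when PySpark and Java are set up correctly. Note that `getOrCreate()` reuses an active session if one exists, so the final `spark.stop()` would also stop a session you already had open in that notebook.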
When you want to open the session again, go to the Pyspark-ML folder and run the following commands:
```
pipenv shell  
jupyter notebook
```

List of tasks in the project

Repository contents:

- .ipynb_checkpoints
- data
- img
- 01_EDA_and_data_prepration.ipynb
- 02_clustering_7dec.ipynb
- 03_Pyspark Randomforest Gradient-Boosted Tree.ipynb
- 04_Balanced Data- Pyspark Randomforest Gradient-Boosted Tree.ipynb
- MACHINE LEARNING WITH PYSPARK _Piyush_Tada.md
- Pipfile
- Pipfile.lock
- Project-ml-models.ipynb
- Pyspark Linear Regression.ipynb
- README.md
- Spark Cheat-Sheets (DZone).pdf
- test.ipynb