piyushtada/Pyspark-ML

 
 


Pyspark ML

Pyspark

Using pipenv to create a virtual environment that contains the requirements

  • Install pipenv. If you don't have pipenv, you need to install it first; see the pipenv installation instructions for details.

  • Fork this repo

  • Clone this repo to your system

    git clone https://github.com/piyushtada/Pyspark-ML.git
  • Then change directory to Pyspark-ML

    cd Pyspark-ML
  • Then run the command pipenv install

    pipenv install

    This will install all the dependencies you need to run the project.

  • Run Jupyter Notebook

    jupyter notebook

    This will open Jupyter Notebook, where you can use Spark.

  • Check that everything is working by running test.ipynb

  • When you want to open the session again, run the following commands from inside the Pyspark-ML folder

    pipenv shell  
    jupyter notebook
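
The contents of test.ipynb are not shown in this README, so the following is only a minimal sketch of what a check of the setup above could look like in a notebook cell. The function names `pyspark_installed` and `smoke_test` and the app name are hypothetical. It assumes that `pipenv install` has provided the `pyspark` package and that a Java runtime is available for Spark.

```python
import importlib.util


def pyspark_installed() -> bool:
    """Return True if the pyspark package can be found in this environment."""
    return importlib.util.find_spec("pyspark") is not None


def smoke_test() -> str:
    """Start a local Spark session, run a tiny job, and report the result."""
    if not pyspark_installed():
        return "pyspark not found: run `pipenv install`, then `pipenv shell`"

    # Imported lazily so the check above works even without pyspark.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[1]")            # single local worker thread
        .appName("pyspark-ml-smoke")   # hypothetical app name
        .getOrCreate()
    )
    try:
        # Build a five-row DataFrame and count it to force a real Spark job.
        rows = spark.range(5).count()
        return f"Spark {spark.version} is working, counted {rows} rows"
    finally:
        spark.stop()
```

Calling `print(smoke_test())` in a notebook cell should either confirm that Spark runs or say that the environment is not active yet.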

List of tasks in the project

  • Do exploratory data analysis
  • Update the columns with categorical data
  • Visualise the results
  • Make data ready for models
  • Save the file
  • Run one sample model to check that everything works up to this point
  • Make a list of models to apply
  • Apply models
  • Do hyperparameter tuning for the model
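
The README names neither the dataset nor the models, so the following is only an illustrative sketch of how the categorical-column, data-preparation, model, and tuning tasks above might fit together with `pyspark.ml`. Logistic regression is an assumed example model, and the function names, the `_idx` suffix, and the grid values are all hypothetical.

```python
def feature_columns(categorical_cols, numeric_cols):
    """List the input columns for the feature vector.

    Each categorical column is replaced by its indexed counterpart,
    named with a hypothetical "_idx" suffix.
    """
    return [f"{c}_idx" for c in categorical_cols] + list(numeric_cols)


def build_cross_validator(categorical_cols, numeric_cols, label_col, num_folds=3):
    """Build an untrained CrossValidator; needs an active SparkSession."""
    # Imported lazily so feature_columns can be used without Spark.
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.evaluation import BinaryClassificationEvaluator
    from pyspark.ml.feature import StringIndexer, VectorAssembler
    from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

    # Turn each categorical string column into a numeric index column.
    indexers = [
        StringIndexer(inputCol=c, outputCol=f"{c}_idx", handleInvalid="keep")
        for c in categorical_cols
    ]
    # Combine all model inputs into the single vector column Spark ML expects.
    assembler = VectorAssembler(
        inputCols=feature_columns(categorical_cols, numeric_cols),
        outputCol="features",
    )
    # Assumed example model; the project may use different ones.
    lr = LogisticRegression(featuresCol="features", labelCol=label_col)
    pipeline = Pipeline(stages=indexers + [assembler, lr])

    # Small illustrative grid over the regularisation strength.
    grid = ParamGridBuilder().addGrid(lr.regParam, [0.01, 0.1]).build()
    evaluator = BinaryClassificationEvaluator(labelCol=label_col)
    return CrossValidator(
        estimator=pipeline,
        estimatorParamMaps=grid,
        evaluator=evaluator,
        numFolds=num_folds,
    )
```

With a training DataFrame `train_df` (hypothetical), `build_cross_validator(["city"], ["age"], "label").fit(train_df)` would fit one pipeline per grid point and fold and keep the best one. The column names here are placeholders for whatever the project's data contains.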
