Libraries required:
1. scikit-learn
2. Matplotlib
3. NumPy
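
A quick way to confirm that the three libraries are installed before running anything is sketched below. It is only an illustration: the helper name missing_modules is made up for this example, and sklearn, matplotlib and numpy are the usual import names for these packages.

```python
import importlib.util

# Import names for the required libraries (scikit-learn is imported as "sklearn").
REQUIRED = ["sklearn", "matplotlib", "numpy"]


def missing_modules(names):
    """Return the names from `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing libraries:", ", ".join(missing))
    else:
        print("All required libraries are installed.")
```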
Instructions to run:
1. Run preproc.py first. It generates 'preprocessed.txt' and 'imdbScores.txt', which the other two scripts use.
2. modelSelCV.py and preproc.py each contain commented-out sections. Uncomment them to view plots of the covariance matrix and the categorical features, respectively.
3. Model selection exhaustively evaluates every model across all of its hyperparameters, so it takes a long time to run (approx. 30 minutes, depending on the system).
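
To see why step 3 is slow, it helps to count the fits an exhaustive search performs: one per hyperparameter combination, per model, per cross-validation fold. The sketch below does that count with made-up grids. The model names, hyperparameter values and fold count are assumptions for illustration and are not taken from modelSelCV.py.

```python
from itertools import product

# Hypothetical grids, for illustration only (not the ones in modelSelCV.py).
EXAMPLE_GRIDS = {
    "ridge": {"alpha": [0.01, 0.1, 1.0, 10.0]},
    "svr": {
        "C": [0.1, 1.0, 10.0],
        "gamma": [0.001, 0.01, 0.1],
        "kernel": ["rbf", "linear"],
    },
}


def combinations(grid):
    """List every hyperparameter combination in one grid as a dict."""
    names = sorted(grid)
    return [dict(zip(names, values)) for values in product(*(grid[n] for n in names))]


def count_fits(grids, folds):
    """Total model fits: every combination of every model, once per fold."""
    return sum(len(combinations(grid)) for grid in grids.values()) * folds


if __name__ == "__main__":
    print(count_fits(EXAMPLE_GRIDS, folds=5))
```

With these example grids and 5 folds the count is 110 fits. Adding one more value to any hyperparameter multiplies the number of combinations for that model, which is why the run time grows quickly.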
Code referred:
1. For one-hot encoding of categorical features:
http://fastml.com/converting-categorical-data-into-numbers-with-pandas-and-scikit-learn/
2. For splitting the data into (train, test) and loading the database:
https://www.kaggle.com/nsnilay/d/deepmatrix/imdb-5000-movie-dataset/predicting-imdb-ratings-using-numerical-attributes/notebook
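
The two techniques referenced above can be illustrated without any libraries. The sketch below is a plain-Python stand-in, not the code from the linked pages and not the code in preproc.py. The function names, the sample values and the 80/20 split ratio are assumptions.

```python
import random


def one_hot(values):
    """Encode category labels as 0/1 rows, one column per distinct label."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # mark the column that matches this label
        rows.append(row)
    return categories, rows


def split_train_test(rows, targets, test_fraction=0.2, seed=0):
    """Shuffle indices with a fixed seed and hold out a fraction as test data."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(rows) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    x_train = [rows[i] for i in train_idx]
    x_test = [rows[i] for i in test_idx]
    y_train = [targets[i] for i in train_idx]
    y_test = [targets[i] for i in test_idx]
    return x_train, x_test, y_train, y_test


if __name__ == "__main__":
    # Made-up sample values for one categorical feature.
    categories, encoded = one_hot(["Color", "Black and White", "Color"])
    print(categories)
    print(encoded)
```

Fixing the seed keeps the split reproducible between runs, which matters when several scripts are compared on the same held-out data.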