- README.MD (this)
- submit
  - submission.csv (with RMSE of 1.025)
- data
  - data_train.csv (original data from https://www.crowdai.org/challenges/epfl-ml-recommender-system/dataset_files)
  - sample_submission.csv (original data from https://www.crowdai.org/challenges/epfl-ml-recommender-system/dataset_files)
  - item_feats_SGD.npy (latent features of items after training using SGD, generated by run.py)
  - user_feats_SGD.npy (latent features of users after training using SGD, generated by run.py)
  - result_of_gs_reg_MF.dat (grid search result of running gs_reg_MF.py on fidis clusters)
  - result_of_gs_biased_MF.dat (grid search result of running gs_biased_MF.py on fidis clusters)
- src
  - data_process.py (module for data preprocessing and results submission)
  - SGD_helpers.py (module for SGD, mostly matching the lab)
  - MF_helpers.py (module for the biased matrix factorization and the user-item ratings matrix)
  - run.py (main script for submitting the final results)
  - gs_reg_MF.py (script for the grid search of the regularized matrix factorization)
  - gs_biased_MF.py (script for the grid search of the biased matrix factorization)
  - implement_surprise.py (script for cross-validation using the Surprise library)
- To run run.py:
  - On Mac/Windows:
    - Open a terminal, go to the unzipped folder, then into ./src/;
    - From ./src/, run: python run.py;
    - submission.csv is generated in ../submit/;
    - item_feats_SGD.npy and user_feats_SGD.npy are stored in ../data/.
  - Requirements:
    - Python 3.6+
    - NumPy
    - SciPy
    - pandas
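Once run.py has written the two .npy files, the latent features can be reused without retraining. The sketch below is only an illustration under stated assumptions: it assumes the arrays are laid out as (num_features, num_users) and (num_features, num_items), that a rating is predicted as the dot product of the matching columns, and that valid ratings lie between 1 and 5. The layout actually produced by run.py may differ, and `predict_rating` is a hypothetical helper name, not a function from this repository.

```python
import numpy as np

def predict_rating(user_feats, item_feats, user, item, low=1.0, high=5.0):
    """Predict one rating as the dot product of two latent vectors.

    Assumes user_feats has shape (num_features, num_users) and
    item_feats has shape (num_features, num_items); indices are 0-based.
    """
    raw = float(item_feats[:, item] @ user_feats[:, user])
    # Clip to the rating range (assumed here to be 1 to 5).
    return float(np.clip(raw, low, high))

# With the real files this would be (paths relative to ./src/):
#   user_feats = np.load("../data/user_feats_SGD.npy")
#   item_feats = np.load("../data/item_feats_SGD.npy")
# Tiny stand-in arrays so the example runs on its own:
user_feats = np.array([[1.0, 2.0], [0.5, 1.0]])            # 2 features, 2 users
item_feats = np.array([[1.0, 2.0, 3.0], [2.0, 0.0, 4.0]])  # 2 features, 3 items

print(predict_rating(user_feats, item_feats, user=0, item=0))  # 2.0
print(predict_rating(user_feats, item_feats, user=1, item=2))  # 5.0 (clipped from 10.0)
```

Clipping is a design choice of this sketch: an unconstrained dot product can leave the rating range, which would inflate the RMSE for no benefit.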
- Notebook:
  - Recommender_MF.ipynb: records how we analyzed the user-item ratings matrix and how we implemented the regularized MF and the biased MF. The notebook is organized as follows:
    - Load the data and split the ratings matrix into training and testing sets
    - Statistical analysis
    - Presentation of the MF methods used
    - Grid search for the best parameters (this part has been reorganized into gs_reg_MF.py and gs_biased_MF.py)
    - Compute the predictions
    - Create the csv file for the submission.
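The first notebook step (load the data, then split the ratings) could look roughly like the sketch below. It assumes the csv has the columns `Id` and `Prediction`, with ids of the form `r44_c1` encoding a 1-based row and column index; whether rows stand for users or items is not stated here and is left open. `load_ratings` and `split_ratings` are hypothetical names, not the functions in data_process.py.

```python
import io
import numpy as np
import pandas as pd
import scipy.sparse as sp

def load_ratings(csv_source):
    """Read 'Id,Prediction' rows (Id like 'r44_c1') into a sparse CSR matrix."""
    df = pd.read_csv(csv_source)
    idx = df["Id"].str.extract(r"r(\d+)_c(\d+)").astype(int)
    rows = idx[0].to_numpy() - 1  # ids are assumed to be 1-based
    cols = idx[1].to_numpy() - 1
    vals = df["Prediction"].to_numpy(dtype=float)
    shape = (rows.max() + 1, cols.max() + 1)
    return sp.coo_matrix((vals, (rows, cols)), shape=shape).tocsr()

def split_ratings(ratings, test_fraction=0.1, seed=0):
    """Randomly assign each observed rating to the train or the test matrix."""
    coo = ratings.tocoo()
    rng = np.random.default_rng(seed)
    is_test = rng.random(coo.nnz) < test_fraction

    def subset(mask):
        return sp.coo_matrix(
            (coo.data[mask], (coo.row[mask], coo.col[mask])), shape=coo.shape
        ).tocsr()

    return subset(~is_test), subset(is_test)

# In-memory stand-in for data_train.csv so the example runs on its own.
sample = io.StringIO("Id,Prediction\nr1_c1,4\nr2_c1,3\nr3_c2,5\nr1_c2,1\n")
ratings = load_ratings(sample)
train, test = split_ratings(ratings, test_fraction=0.25)
print(ratings.shape, train.nnz, test.nnz)
```

Both output matrices keep the full shape, so the same indices address the same row and column in the training and testing sets.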
- Python modules:
  - data_process.py: This module transforms the data of a csv file into a sparse ratings matrix, splits the data, and provides functions to convert the final predictions into the correct submission format.
  - SGD_helpers.py: This module initializes the parameters for matrix factorization, computes the RMSE, and runs the SGD.
  - MF_helpers.py: This module computes the user and item biases and the global average.
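To make the roles of the helper modules more concrete, here is a minimal sketch of regularized matrix factorization trained with SGD, together with an RMSE function and simple mean-based biases. It is not the code in SGD_helpers.py or MF_helpers.py: all function names, the (item, user, rating) triple format, the hyperparameters `gamma` and `lam`, and the way biases are estimated (average deviation from the global mean) are assumptions made for illustration.

```python
import numpy as np

def init_mf(num_users, num_items, num_features, seed=0):
    """Small random latent features, one column per user or item."""
    rng = np.random.default_rng(seed)
    user_feats = rng.normal(scale=0.1, size=(num_features, num_users))
    item_feats = rng.normal(scale=0.1, size=(num_features, num_items))
    return user_feats, item_feats

def compute_rmse(entries, user_feats, item_feats):
    """RMSE over the observed (item, user, rating) triples."""
    sq_errors = [(r - item_feats[:, i] @ user_feats[:, u]) ** 2 for i, u, r in entries]
    return float(np.sqrt(np.mean(sq_errors)))

def sgd_epoch(entries, user_feats, item_feats, rng, gamma=0.05, lam=0.01):
    """One pass over the observed ratings in random order (updates in place)."""
    for k in rng.permutation(len(entries)):
        i, u, r = entries[k]
        w, z = item_feats[:, i].copy(), user_feats[:, u].copy()
        err = r - w @ z
        # Gradient step on 0.5 * err**2 + 0.5 * lam * (|w|^2 + |z|^2).
        item_feats[:, i] += gamma * (err * z - lam * w)
        user_feats[:, u] += gamma * (err * w - lam * z)

def baseline_biases(entries, num_users, num_items):
    """Global mean plus per-user and per-item average deviation from it."""
    items = np.array([i for i, _, _ in entries])
    users = np.array([u for _, u, _ in entries])
    ratings = np.array([r for _, _, r in entries], dtype=float)
    mu = float(ratings.mean())
    dev = ratings - mu
    user_cnt = np.maximum(np.bincount(users, minlength=num_users), 1)
    item_cnt = np.maximum(np.bincount(items, minlength=num_items), 1)
    user_bias = np.bincount(users, weights=dev, minlength=num_users) / user_cnt
    item_bias = np.bincount(items, weights=dev, minlength=num_items) / item_cnt
    return mu, user_bias, item_bias

# Toy data: (item, user, rating) triples for 3 items and 3 users.
entries = [(0, 0, 5.0), (0, 1, 4.0), (1, 0, 1.0), (1, 2, 2.0), (2, 1, 3.0)]
user_feats, item_feats = init_mf(num_users=3, num_items=3, num_features=2)
rmse_before = compute_rmse(entries, user_feats, item_feats)

rng = np.random.default_rng(1)
for _ in range(200):
    sgd_epoch(entries, user_feats, item_feats, rng)
rmse_after = compute_rmse(entries, user_feats, item_feats)

mu, user_bias, item_bias = baseline_biases(entries, num_users=3, num_items=3)
print(rmse_before, rmse_after, mu)
```

On this toy data the training RMSE should fall well below its initial value; on real data the regularization strength and the number of features would be chosen by a grid search, as gs_reg_MF.py and gs_biased_MF.py do.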