Group:
- Lorenzo Drudi - 367980
- Mikolaj Boronski - 376967
- Olena Zavertiaieva - 364500
The
run.py
file contains the code for the best model we obtained.The data should be inside the
./data
folder (if it does not exist create it).
Check and install the requirements inside requirements.txt
:
pip install -r requirements.txt
Linear regression using gradient descent
Linear regression using stochastic gradient descent
Least squares regression using normal equation
Ridge regression using normal equation
Logistic regression using gradient descent
Regularized logistic regression using gradient descent
All the algorithms are inside the implementations.py
file.
run.py
: file to run to obtain our best model.param_grid_search.py
: run to test more parameters on the same model.implementations.py
: implementations of all the ML algorithms.src/features
: functions for data cleaning and feature processing (NaN replacing, downsampling, columns removal, outliers removal).src/model
: functions for training and getting the predictions (k-fold cross-validation).src/evaluation
: evaluation metrics (rsme, accuracy, f1).src/utils
: utility functions (batch iteration, submission creation, weights generation).
Data Cleaning and Feature Processing
:- Drop calculated features.
- Drop rows with more than NaN% of values.
- Remove outliers.
- Replace NaN with mean, most frequent, or random uniform distribution between min and max.
- Downsampling (labels balancing).
- Standardization (z-score).
- Polynomial expansion.
Training process
:- K-fold cross-validation.
Evaluation metrics
:- F1 score.
- Accuracy.
- RSME.