This project builds a model to detect fraudsters in a Revolut dataset.
The dataset was downloaded from Kaggle, and it contains three different CSV files.
transactions.csv holds information about each transaction (user_id, timestamp, etc.), users.csv holds information about each user (country, age, creation date, etc.), and fraudsters.csv contains only the user_id of each fraudster.
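As a rough illustration of how the three files relate, the sketch below joins transactions to users on user_id and derives a binary label from the fraudster list. It uses tiny in-memory frames instead of the real CSV files, and every column name other than user_id (as well as the `build_dataset` helper and the `is_fraudster` label) is an assumption made for the example, not taken from the dataset.

```python
import pandas as pd

# Tiny in-memory stand-ins for the three CSV files. With the real data these
# would be loaded with pd.read_csv("transactions.csv") and so on.
# Column names other than user_id are illustrative assumptions.
transactions = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "u3"],
    "amount": [10.0, 25.0, 5.0, 300.0],
})
users = pd.DataFrame({
    "user_id": ["u1", "u2", "u3"],
    "country": ["GB", "PT", "GB"],
})
fraudsters = pd.DataFrame({"user_id": ["u3"]})


def build_dataset(transactions, users, fraudsters):
    """Join transactions with user attributes and add a binary fraud label."""
    # Left join keeps every transaction even if a user record is missing.
    merged = transactions.merge(users, on="user_id", how="left")
    # A user is labelled 1 if their user_id appears in the fraudster list.
    merged["is_fraudster"] = merged["user_id"].isin(fraudsters["user_id"]).astype(int)
    return merged


dataset = build_dataset(transactions, users, fraudsters)
```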
The project comprises the following phases:
- Merging and cleaning the CSV files;
- Checking whether the data is balanced. It was not, so I undersampled the majority class;
- Encoding using Target Encoding and One Hot Encoding;
- Feature selection using Pearson Correlation;
- Trying different Regression models. In the end, I chose the Random Forest Regressor Model;
- Model Evaluation.
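The preprocessing phases above can be sketched with pandas alone. This is a minimal illustration under stated assumptions, not the project's actual code: the helper names (`undersample`, `target_encode`, `select_by_pearson`), the toy data, the `channel` column, and the 0.5 correlation threshold are all hypothetical. Model training and evaluation are left out.

```python
import pandas as pd


def undersample(df, label, seed=0):
    """Randomly drop rows so every class has as many rows as the smallest class."""
    n_min = df[label].value_counts().min()
    parts = [group.sample(n=n_min, random_state=seed) for _, group in df.groupby(label)]
    return pd.concat(parts).reset_index(drop=True)


def target_encode(df, column, label):
    """Replace each category with the mean of the label within that category."""
    means = df.groupby(column)[label].mean()
    return df[column].map(means)


def select_by_pearson(df, label, threshold):
    """Return numeric features whose |Pearson correlation| with the label >= threshold."""
    corr = df.select_dtypes("number").corr(method="pearson")[label].drop(label)
    # Constant columns yield NaN correlations, which fail the comparison and are dropped.
    return corr[corr.abs() >= threshold].index.tolist()


# Toy data: all column names and values are made up for the example.
data = pd.DataFrame({
    "country": ["GB", "GB", "PT", "PT", "GB", "PT", "GB", "GB"],
    "channel": ["app", "web", "app", "app", "web", "app", "web", "app"],
    "amount": [10, 12, 11, 9, 13, 10, 500, 450],
    "is_fraudster": [0, 0, 0, 0, 0, 0, 1, 1],
})

# 1. Balance the classes by undersampling the majority class.
balanced = undersample(data, "is_fraudster")

# 2. Target-encode one categorical column and one-hot encode another.
balanced["country_te"] = target_encode(balanced, "country", "is_fraudster")
encoded = pd.get_dummies(balanced.drop(columns="country"), columns=["channel"], dtype=int)

# 3. Keep the features that correlate with the label.
selected = select_by_pearson(encoded, "is_fraudster", threshold=0.5)
```

One caveat with this sketch: computing target-encoding means over all rows lets label information leak into the feature, so in practice the means would normally be computed on the training split only.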
The full description of the project can be followed on this Medium post: