In this tutorial, we investigate the medals won by each country at each past Olympic Games, using a dataset from Kaggle. The dataset records every medal awarded at each Games, including the medaling athlete, their gender, their country, and their sport.
We use Featuretools to build a model that predicts whether a country will win more than 10 medals at the next Olympics. Some predictive accuracy is possible without machine learning, but feature engineering is needed to improve on that baseline.
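To make the prediction target concrete, here is a minimal sketch of how labels could be framed. It assumes the per-medal rows have already been tallied into per-country medal counts for each Games; the function name `make_labels` and the input layout are hypothetical and are not taken from the repository's notebooks. "More than 10" is treated as strictly greater, so exactly 10 medals is a negative example.

```python
from typing import Dict, List, Tuple

# Hypothetical input layout (an assumption, not the repo's data structure):
# for each country, a list of (games_year, medal_count) pairs.
MedalHistory = Dict[str, List[Tuple[int, int]]]


def make_labels(history: MedalHistory, threshold: int = 10) -> List[Tuple[str, int, bool]]:
    """For each (country, games) pair that has a following Games in the list,
    return True if the country won more than `threshold` medals at that next Games."""
    labels = []
    for country, games in history.items():
        ordered = sorted(games)  # order by year so "next" means chronologically next
        for (year, _), (_, next_count) in zip(ordered, ordered[1:]):
            labels.append((country, year, next_count > threshold))
    return labels


if __name__ == "__main__":
    toy = {  # made-up counts, for illustration only
        "A": [(2000, 8), (2004, 12), (2008, 15)],
        "B": [(2000, 3), (2004, 4)],
    }
    for row in make_labels(toy):
        print(row)
```

Because the dataset lists medals rather than participations, a Games where a country won nothing would be missing from this layout; a real pipeline would need to decide whether to fill those in as zero.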
- We make predictions at various points throughout history. Using only the average number of medals a country has won as the predictor gives an average AUC score of 0.79.
- We use automated feature engineering to generate hundreds of features, which improves the average AUC score to 0.95.
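The average-medals baseline from the first bullet can be sketched without any machine learning library: score each country by the mean of its earlier medal counts, then measure how often that score ranks a positive example above a negative one. The sketch below uses the rank-based (Mann-Whitney) form of ROC AUC, counting ties as half. The function names and toy numbers are hypothetical, and this is an illustration of the idea rather than the notebooks' code.

```python
from typing import List, Sequence


def roc_auc(labels: Sequence[bool], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a randomly chosen positive example
    scores higher than a randomly chosen negative one (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_medal_baseline(past_counts: Sequence[Sequence[int]]) -> List[float]:
    """Score each country by the mean of its medal counts at earlier Games
    (0.0 if it has no history)."""
    return [sum(c) / len(c) if c else 0.0 for c in past_counts]


if __name__ == "__main__":
    past = [[12, 15, 9], [2, 1, 3], [8, 11], [5, 4]]  # made-up histories
    won_more_than_10_next = [True, False, False, True]  # made-up labels
    scores = mean_medal_baseline(past)  # [12.0, 2.0, 9.5, 4.5]
    print(roc_auc(won_more_than_10_next, scores))  # → 0.75
```

The pairwise loop is quadratic, which is fine for a sketch; for real data a library routine such as scikit-learn's `roc_auc_score` would be the usual choice.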
1. Clone the repo:
    git clone https://github.com/Featuretools/predict-olympic-medals.git
2. Install the requirements:
    pip install -r requirements.txt
   You will also need to install graphviz for this demo. Install it by following the instructions in the Featuretools documentation.
3. Download the data directly from Kaggle. After downloading, copy the three CSV files into the directory
    data/olympic_games_data/
   in the root of this repository.
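Before launching the notebooks, a quick check like the following could confirm that the files landed in the expected place. It assumes only the directory path given above and that three CSV files are expected; since the file names are not listed here, the sketch counts `*.csv` files rather than checking specific names. The helper name `check_data_dir` is hypothetical.

```python
from pathlib import Path


def check_data_dir(repo_root: str = ".", expected_csvs: int = 3) -> bool:
    """Return True if <repo_root>/data/olympic_games_data/ exists and holds
    at least `expected_csvs` CSV files (file names are not checked)."""
    data_dir = Path(repo_root) / "data" / "olympic_games_data"
    if not data_dir.is_dir():
        print(f"Missing directory: {data_dir}")
        return False
    csvs = sorted(p.name for p in data_dir.glob("*.csv"))
    if len(csvs) < expected_csvs:
        print(f"Expected {expected_csvs} CSV files in {data_dir}, found {len(csvs)}: {csvs}")
        return False
    print(f"Found {len(csvs)} CSV files in {data_dir}: {csvs}")
    return True


if __name__ == "__main__":
    # Run from the repository root.
    check_data_dir()
```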
4. Run the notebooks:
    jupyter notebook
Featuretools is an open source project created by Feature Labs. To see the other open source projects we're working on, visit Feature Labs Open Source. If building impactful data science pipelines is important to you or your business, please get in touch.
Any questions can be directed to [email protected]