Title

Captain Cook: the fabulous recipe explorer

Abstract

Would you use a tool that proposes a recipe that you would really like, given the list of ingredients in your fridge?
This project uses the Cooking recipes dataset to suggest recipes that match a list of ingredients given by the user, ranked by their ratings.
Our aim is to provide a service that helps people waste less food, improve their health and discover different types of recipes.

Research questions

Which culture has the most famous recipe?
What is the distribution of ingredients for each culture?
How can we replace an ingredient in a famous recipe with a similar one, or replace a famous recipe with another recipe?
Can we find an equivalent recipe with fewer calories?

Dataset

Dataset: Cooking recipes
How to get the data:
First, we keep only the pages that contain recipes and ignore miscellaneous pages.
We then parse the HTML files with the BeautifulSoup library and use regular expressions to locate the classes or tags related to <ingredients>, <ratings> and <calories>.
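The extraction step described above can be sketched roughly as follows. This is only an illustration: it uses the standard library's html.parser as a dependency-free stand-in for BeautifulSoup, and the class name "ingredient", the calorie wording and the sample page are all assumptions, not the actual markup of the dataset.

```python
import re
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}

# Assumed wording: a number directly followed by "cal", "kcal" or "calories".
CALORIES_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(?:k?cal|calories)", re.IGNORECASE)


class IngredientParser(HTMLParser):
    """Collects the text of every element whose class mentions 'ingredient'."""

    def __init__(self):
        super().__init__()
        self.depth = 0         # > 0 while inside a matching element
        self.buffer = []       # text fragments of the current element
        self.ingredients = []  # cleaned text of each matching element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        css_class = dict(attrs).get("class") or ""
        if self.depth or "ingredient" in css_class.lower():
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            # Collapse runs of whitespace before storing the ingredient.
            text = " ".join("".join(self.buffer).split())
            if text:
                self.ingredients.append(text)
            self.buffer = []

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


def extract_recipe(html):
    """Return the ingredients and the calorie count (or None) found in one page."""
    parser = IngredientParser()
    parser.feed(html)
    match = CALORIES_RE.search(html)
    return {
        "ingredients": parser.ingredients,
        "calories": float(match.group(1)) if match else None,
    }


# Invented sample page, for illustration only.
SAMPLE = """
<html><body>
<h1>Pancakes</h1>
<li class="ingredient">2 eggs</li>
<li class="ingredient">200 g <b>flour</b></li>
<p class="nutrition">250 calories per serving</p>
</body></html>
"""

print(extract_recipe(SAMPLE))
```

Pages for which the returned ingredient list is empty could be treated as miscellaneous pages and skipped.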
Processing: Since the dataset is quite big (~2.5 GB), the first part will be done in PySpark. Once the recipes, which make up only a small part of the HTML files, have been extracted, we can easily use Pandas DataFrames for the rest of the project.
Enriching: A second dataset, which corresponds to the user's ingredients, is used as the criterion for finding the matching recipes.
We can propose different levels of ingredient similarity, depending on how many ingredients the user has or wants to use.
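One possible reading of "levels of ingredient similarity" is a threshold on the share of a recipe's ingredients that the user already has. The sketch below assumes that interpretation; the function name match_recipes, the min_coverage parameter and the toy recipes are hypothetical.

```python
def match_recipes(recipes, user_ingredients, min_coverage=0.5):
    """Return (title, coverage) pairs sorted by coverage, best first.

    coverage is the share of a recipe's ingredients that the user already has.
    """
    pantry = {item.strip().lower() for item in user_ingredients}
    matches = []
    for title, ingredients in recipes.items():
        needed = {item.strip().lower() for item in ingredients}
        if not needed:
            continue  # skip recipes without any known ingredient
        coverage = len(needed & pantry) / len(needed)
        if coverage >= min_coverage:
            matches.append((title, coverage))
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


# Invented toy data, for illustration only.
RECIPES = {
    "pancakes": ["flour", "egg", "milk", "butter"],
    "omelette": ["egg", "butter"],
    "tomato salad": ["tomato", "olive oil", "basil"],
}
FRIDGE = ["Egg", "butter", "milk"]

strict = match_recipes(RECIPES, FRIDGE, min_coverage=1.0)  # only fully covered recipes
loose = match_recipes(RECIPES, FRIDGE, min_coverage=0.5)   # tolerate missing ingredients
print(strict)
print(loose)
```

A strict level only returns recipes the user can cook right away, while a looser level also suggests recipes that need a few extra ingredients.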

A list of internal milestones up until project milestone 2

Loading the HTML files with BeautifulSoup into PySpark
Cleaning Phase with PySpark: Keeping only titles, ingredients list, calories and ratings.
Saving the cleaned DataFrame as a Pandas DataFrame
Classification of the recipes by food ingredient, recipe type/culture or health benefits
A tag (e.g. <chocolate>) is assigned to each ingredient, so one recipe has multiple tags (like in HW3). Similar ingredients are assigned the same tag
Get which culture has the most famous recipes
Get the distribution of most used ingredients
Finding equivalent recipes with fewer calories
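The tagging and "equivalent recipe" milestones above could fit together as in the following sketch. It assumes that two recipes count as equivalent when their tag sets overlap strongly (Jaccard similarity); the tag mapping, the threshold and the sample recipes are invented for illustration.

```python
# Hypothetical mapping from raw ingredient names to shared tags.
TAGS = {
    "dark chocolate": "chocolate",
    "milk chocolate": "chocolate",
    "cocoa powder": "chocolate",
    "butter": "fat",
    "margarine": "fat",
}


def tag_recipe(ingredients):
    """Map each ingredient to its tag; unknown ingredients act as their own tag."""
    return {TAGS.get(name.lower(), name.lower()) for name in ingredients}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def lighter_alternatives(target, recipes, min_similarity=0.5):
    """Return (title, calories, similarity) for similar recipes with fewer calories."""
    target_tags = tag_recipe(recipes[target]["ingredients"])
    target_calories = recipes[target]["calories"]
    results = []
    for title, recipe in recipes.items():
        if title == target or recipe["calories"] >= target_calories:
            continue
        similarity = jaccard(target_tags, tag_recipe(recipe["ingredients"]))
        if similarity >= min_similarity:
            results.append((title, recipe["calories"], similarity))
    # Most similar first, then fewest calories.
    return sorted(results, key=lambda r: (-r[2], r[1]))


# Invented toy data, for illustration only.
RECIPES = {
    "brownie": {"ingredients": ["dark chocolate", "butter", "sugar", "egg"], "calories": 450},
    "light brownie": {"ingredients": ["cocoa powder", "margarine", "sugar", "egg"], "calories": 300},
    "fruit salad": {"ingredients": ["apple", "banana", "sugar"], "calories": 150},
}

print(lighter_alternatives("brownie", RECIPES))
```

Because similar ingredients share a tag, the two brownies compare as identical even though their raw ingredient names differ.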

List of internal milestones achieved for milestone 3

Loading HTML was done with Perl and Bash scripts
Data was cleaned with Bash and then with Pandas, since it was small enough. We did not retrieve the ratings, but we do have the cooking time!
We classified the recipes by nutritional value, cooking time or region. We still need to classify them by ingredient
Ingredients are identified and cleaned
The distribution of ingredients is done
Create our own JSON map to plot more specific information about the recipes by region
Make the map more interactive and correct the colormap issue
Use statistical properties of the English language or Levenshtein distance
Finish the ingredients list cleaning and do classification
Create a user friendly recipe finder
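As an illustration of the Levenshtein idea mentioned in this list, the sketch below assumes it is meant to merge misspelled or variant ingredient names into a known vocabulary. The helper closest_ingredient, the max_distance threshold and the vocabulary are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def closest_ingredient(name, vocabulary, max_distance=2):
    """Return the closest known ingredient, or None if nothing is close enough.

    vocabulary must be a non-empty list of known ingredient names.
    """
    best = min(vocabulary, key=lambda known: levenshtein(name, known))
    return best if levenshtein(name, best) <= max_distance else None


# Invented vocabulary, for illustration only.
VOCABULARY = ["tomato", "potato", "basil"]

print(closest_ingredient("tomatos", VOCABULARY))
print(closest_ingredient("quinoa", VOCABULARY))
```

A small max_distance keeps unrelated ingredients apart, at the cost of missing heavier misspellings.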

Data Story

Captain Cook

Contributions

Camilla: Problem formulation, data analysis, data visualization, tabulating final results, website/data story writing
Matthieu: Data analysis, data visualization, running tests, tabulating final results
Tim: Problem formulation, data crawling/cleaning, preliminary data analysis, data analysis, data visualization

