This repository contains the exercises for the EPFL master course EE-558 A Network Tour of Data Science (moodle), taught in autumn 2016. Look at the 2017 and 2018 editions for a course more focused on graphs and networks (instead of deep learning). There are two types of exercises.
The Data Scientist toolkit, a set of tools, mostly in Python, to help during the Data Science process.
- Introduction.
- Data acquisition & exploration: demo, exercise, solution.
- Data exploitation: demo, exercise, solution.
- High Performance Computing: exercise, solution.
- Data visualization: exercise, solution.
Machine Learning (ML) & Graph Signal Processing (GSP) algorithms. These exercises are designed to familiarize you with the algorithms presented in class.
- Graph Science: exercise, solution.
- Clustering: exercise, solution, assignment, solution.
- Classification: exercise, solution.
- TensorFlow: exercise, solution.
- Neural Networks: assignment, solution.
- Recurrent Neural Networks: assignment, solution.
- Graph Fourier Transform: exercise, solution.
- Transductive Learning using Graphs: assignment, solution.
Part of the course is evaluated by a project, proposed and carried out by groups of one to three students. Below is their work.
- [proposal, analysis, slides] Breast Cancer Classification, Robin Demesmaeker
- [proposal, analysis, slides] How Fake News Go Viral?, Victor Kristof, William Trouleau
- [proposal, analysis, slides] Twitter User Gender Classification, Gaétan Ramet, Benjamin Schloesing, Yuan Yao
- [proposal, analysis, slides] Youtube Fame Predictor, Benoît Steinmann, Cyrille Rolland
- [proposal, analysis, slides] Emotion Recognition from Faces, Patryk Oleniuk, Carmen Galotta
- [proposal, analysis, slides] Airbnb New User Bookings, Pecoraro Cyril, Jaume Guillaume, Grisard Malo
- [proposal, analysis, slides] Predicting an Election from Tweets, Ercolani Chiara, Vorobiev Mikhail, Juillard Michaël
- [proposal, analysis, slides] Fisheries Monitoring, Damian Pascual Ortiz, Pablo Mainar Jovaní
- [proposal, analysis, slides] El Nino, Cotting Matthieu, Bonelli Alberto, Allani Mohamed
- [proposal, analysis, slides] Alzheimer Disease Detection, Christian Abbet, Maxime Bonhenblust, Nicolas Masserey
- [proposal, analysis, slides] Estimating Hyper-Parameters for Compressed Sensing, Dimitris Perdios
- [proposal, analysis, slides] Global Warming, Effrosyni Simou
- [proposal, analysis, slides] Daily News for Stock Market Prediction, Jeroen Le Maire
- [proposal, analysis, slides] Open Source Software Support, Pavlos Nikolopoulos, Matthaios Olma, Stefanos Skalistis
- [proposal, analysis, slides] Epileptic Seizures Prediction, Sophie du Bois
- [proposal, analysis, slides] Product Recommendation, Berke Aral Sönmez, Alper Köse
- [proposal, analysis, slides] Sentiment Analysis, Meryem Wehbe, Samuel Beuret, Valentine Santarelli
- [proposal, analysis, slides] Bike Sharing Demand, Vincent Hardy
The easiest way to play with the code is to run it inside a Docker container, a lightweight virtualization method.
- Install Docker on your Windows, Mac or Linux machine.
- Run the image, which is automatically updated from this git repository.
    docker pull mdeff/ntds_2016  # to update it
    docker run --rm -i -p 8871:8888 -v ~/:/data/mount mdeff/ntds_2016
- Access the container's Jupyter notebook at http://localhost:8871. There you'll find two folders:
  - repo contains a copy of this git repository. Nothing you modify in this folder is persistent. If you want to keep your modifications, use File, Download as, Notebook in the Jupyter interface.
  - mount contains a view of your home directory, from which you can persistently modify any of your files.
Windows and Mac users may need to redirect the port in VirtualBox.
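To confirm that the container is serving Jupyter before opening a browser, a small probe can help. The following is a minimal sketch, not part of the repository: the function name jupyter_is_up is hypothetical, and the default URL assumes the 8871 port mapping used in the docker run command. It only checks that some HTTP server answers at that address; it does not verify that the server is Jupyter.

```python
import urllib.error
import urllib.request


def jupyter_is_up(url="http://localhost:8871", timeout=2.0):
    """Return True if an HTTP server answers at `url`, False otherwise.

    Hypothetical helper: the default URL assumes the 8871:8888 port mapping.
    """
    # Bypass any system proxy, since the target is a local address.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, only with a non-2xx status (e.g. 403).
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or name resolution failure.
        return False


if __name__ == "__main__":
    print("Jupyter reachable:", jupyter_is_up())
```

If the probe reports False on Windows or Mac, the VirtualBox port redirection mentioned above is a likely place to look.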
If you want to use it for your projects and need additional software or Python packages, you'll need to install them into the container.
- Create your named container.
    docker run -i -p 8871:8888 -v ~/:/data/mount --name myproject mdeff/ntds_2016
- Once you stop it, you'll be able to start it again with docker start myproject.
- In another terminal, install packages while the container is running.
    docker exec -i myproject /bin/bash
    pip install mypackage
    apt-get install myotherpackage
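Python packages can also be installed from a notebook cell instead of a second terminal. The sketch below is an alternative to the commands above, not the course's own method: it invokes pip through the interpreter that is currently running, so the package is installed for that same Python. The helper names pip_install_command and pip_install are hypothetical, and mypackage is the placeholder name used above. System packages installed with apt-get still require a terminal.

```python
import subprocess
import sys


def pip_install_command(package):
    """Build the command that installs `package` with the running interpreter's pip."""
    # `python -m pip` ties pip to this interpreter rather than whichever
    # `pip` executable happens to come first on the PATH.
    return [sys.executable, "-m", "pip", "install", package]


def pip_install(package):
    """Run the install; raises subprocess.CalledProcessError if pip fails."""
    subprocess.check_call(pip_install_command(package))


# Usage (commented out so that nothing is installed by accident):
# pip_install("mypackage")
```

Packages installed this way live in the container's filesystem, so they are kept only when the container was created with a name and without --rm.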
Warning: the manual installation below may be problematic for Windows users, as TensorFlow does not support Windows yet.
- Install Python.
  - Windows: we recommend installing Anaconda. Please install version 3.5. Most of the packages we'll use during the exercises are included in the distribution. Another option is the Windows Subsystem for Linux, available on Windows 10, which allows you to install packages as if you were on Ubuntu.
  - Mac: we recommend that you use the Homebrew package manager and install Python with brew install python3. You can also use Anaconda.
  - Linux: please use your package manager to install the latest Python 3.x.
- Clone the course repository. You may need to first install git.
    git clone https://github.com/mdeff/ntds_2016.git
    cd ntds_2016
- Optionally, create a virtual environment.
    pyvenv /path/to/new/virtual/env
    . /path/to/new/virtual/env/bin/activate
  A virtual environment allows you to install a different set of packages for each of your Python projects. Each project thus stays cleanly separated from the others. It is good practice but by no means necessary. You can read more about virtual environments on this blog post. Anaconda users, see here.
- Install the packages we'll use from PyPI, the Python Package Index.
    pip install -r requirements.txt  # or make install
  - If it fails, it is probably because you need to install some native packages with your package manager. Please read the error messages and remember, Google is your friend! You may look at the Dockerfile to get an idea of which setup is necessary on a Debian / Ubuntu system.
  - Depending on your installation, pip may refer to Python 2 (you can verify with pip -V). In that case, use pip3 instead of pip.
  - Anaconda users can also install packages with conda install packname. See here for your options.
- Verify that you have a working installation by running a simple test. Again, you may need to call python3.
    python check_install.py  # or make test
  - If you are on Windows with Anaconda and get the warning "WARNING (theano.configdefaults): g++ not detected!", you may want to install mingw-w64 with conda install mingw libpython. Otherwise your Deep Learning models will run extremely slowly. This may however not work for Python 3.5, see this GitHub issue for a workaround.
- Open the Jupyter web interface and play with the notebooks!
    jupyter notebook
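When a manual setup misbehaves, a quick preflight can narrow down which of the steps above went wrong. The sketch below is an illustration and does not reproduce check_install.py, whose contents are not shown here. It reports whether the interpreter is Python 3, whether a venv-style virtual environment is active, and which modules cannot be found. The function names and the EXAMPLE_MODULES list are assumptions made for this example; the authoritative package list is requirements.txt.

```python
import importlib.util
import sys

# Hypothetical list for illustration only; see requirements.txt for the real one.
EXAMPLE_MODULES = ["numpy", "scipy", "matplotlib"]


def python_is_v3():
    """Return True if the running interpreter is Python 3 or later."""
    return sys.version_info[0] >= 3


def in_virtualenv():
    """Return True if a venv-style environment is active.

    Inside such an environment sys.prefix points at the environment while
    sys.base_prefix points at the base installation. Conda environments are
    not detected by this check.
    """
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python 3:", python_is_v3())
    print("Virtual environment active:", in_virtualenv())
    print("Missing modules:", missing_modules(EXAMPLE_MODULES) or "none")
```

Run it with the same interpreter you use for the notebooks (python or python3), since each interpreter has its own set of installed packages.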
All code and examples are released under the terms of the MIT License.