This repository contains Python code for a selection of tables, figures and LAB sections from the book 'An Introduction to Statistical Learning with Applications in R' by James, Witten, Hastie, Tibshirani (2013).
This great book gives a thorough introduction to the field of Statistical/Machine Learning. The book is available for download (see link below), but I think this is one of those books that is definitely worth buying. The book contains sections with applications in R, based on public datasets that are either available for download or part of the R package ISLR. Since Python is my language of choice for data analysis, I decided to try to rework some of the figures and calculations in IPython Notebook using:
- pandas
- numpy
- scikit-learn
- matplotlib
- seaborn
- statsmodels
- patsy
I thought that creating these notebooks would be a good way to learn more about Machine Learning in Python. I recreated some of the figures and tables from the chapters and worked through some of the LAB sections. I realize that at certain points it may look like I tried too hard to make the output identical to the tables and R plots in the book, but I did this to explore some details of the libraries mentioned above (mostly matplotlib). Note that this repository is not a tutorial and that you should probably have a copy of the book to follow along.
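To give a rough idea of what such a rework looks like, here is a minimal sketch that is not taken from any of the notebooks. It assumes synthetic data and hypothetical column names `x` and `y`, and mirrors an R call such as `lm(y ~ x, data=df)` with the statsmodels formula interface (which relies on patsy for the formula parsing):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data: an assumption for illustration only,
# not one of the datasets used in the book.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 2.0 + 3.0 * x + rng.normal(scale=1.0, size=200)
df = pd.DataFrame({"x": x, "y": y})

# Rough Python counterpart of R's lm(y ~ x, data=df)
model = smf.ols("y ~ x", data=df).fit()

# Estimated intercept and slope, and the R-squared of the fit
print(model.params)
print(model.rsquared)
```

Calling `model.summary()` prints a coefficient table comparable to the output of `summary()` on an `lm` fit in R.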
Work in progress! Suggestions for improvement and help with unsolved issues are welcome!
Chapter 3 - Linear Regression
Chapter 4 - Classification
Chapter 5 - Resampling Methods
Chapter 6 - Linear Model Selection and Regularization
Chapter 7 - Moving Beyond Linearity
Chapter 8 - Tree-Based Methods
Chapter 9 - Support Vector Machines
Chapter 10 - Unsupervised Learning
For an advanced treatment of these topics, see Hastie et al. (2009).
##### References:

James, G., Witten, D., Hastie, T., Tibshirani, R. (2013). An Introduction to Statistical Learning with Applications in R, Springer Science+Business Media, New York. http://www-bcf.usc.edu/~gareth/ISL/index.html

Hastie, T., Tibshirani, R., Friedman, J. (2009). The Elements of Statistical Learning, Second Edition, Springer Science+Business Media, New York. http://statweb.stanford.edu/~tibs/ElemStatLearn/