This repository holds the course materials for the Fall 2017 edition of Statistics 154: Modern Statistical Prediction and Machine Learning at UC Berkeley.
- Instructor: Gaston Sanchez, gaston.stat [at] gmail.com
- Class Time: MWF 1-2pm in 3108 Etcheverry
- Session Dates: 08/23/17 - 12/08/17
- Code #: 20978
- Units: 4
- Office Hours: MW 2:10-3:00pm in 309 Evans (or by appointment)
- Final: TBA
- GSI: Johnny Hong. Office hours: Tu 9-11am and Th 1-3pm in 428 Evans.
Lab | Day & Time | Room | GSI |
---|---|---|---|
101 | M 9am-11am | 330 Evans | Johnny Hong |
102 | M 11am-1pm | 330 Evans | Johnny Hong |
This is an introductory-level course in supervised learning, with a focus on regression and classification methods. The syllabus includes linear regression, model assessment, model selection, regularization methods (PCR, PLSR, ridge, and lasso); logistic regression and discriminant analysis; cross-validation and the bootstrap; tree-based methods, random forests, and boosting; and support vector machines. Some unsupervised learning methods are also discussed: principal components and clustering (k-means and hierarchical).
In this course, we will explore the predictive modeling lifecycle, including question formulation, data preprocessing, exploratory data analysis and visualization, model building, model assessment/validation, model selection, and decision-making.
We will focus on quantitative critical thinking and the key principles needed to carry out this cycle: 1) foundational principles for building predictive models; 2) intuitive explanations of many commonly used predictive modeling techniques for both classification and regression problems; 3) principles and steps for validating a predictive model; and 4) writing and using computer code to perform the necessary foundational work to build and validate predictive models.
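To make the cycle concrete, here is a minimal sketch in R of the data-spending, model-building, and assessment steps. It uses the built-in `mtcars` data purely for illustration; the variables and the 70/30 split are arbitrary choices, not part of the course materials.

```r
set.seed(123)

# data spending: hold out roughly 30% of the rows as a test set
test_idx <- sample(nrow(mtcars), size = round(0.3 * nrow(mtcars)))
train <- mtcars[-test_idx, ]
test  <- mtcars[test_idx, ]

# model building: a simple linear regression of mpg on weight and horsepower
fit <- lm(mpg ~ wt + hp, data = train)

# model assessment: root mean squared error on the held-out test set
preds <- predict(fit, newdata = test)
sqrt(mean((test$mpg - preds)^2))
```

In practice you would compare several candidate models on the same held-out data (or via resampling) before selecting one; that model-selection step is one of the topics listed below.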
The course focuses on predictive models, and it covers the following topics (not necessarily in the order listed):
- Process of predictive model building
- Data Preprocessing
- Regression Models
    - Linear models
    - Non-linear models (time permitting)
    - Tree-based methods
- Classification Models
    - Linear models
    - Non-linear models
    - Tree-based methods
    - Support Vector Machines (time permitting)
- Unsupervised methods like PCA and Clustering
- Data spending: splitting and resampling methods
- Model Evaluation
- Model Selection
- Multivariate calculus or the equivalent, esp. partial derivatives; e.g. Math 53
- Linear algebra or the equivalent (matrices, vector spaces); e.g. Math 54
- Statistical inference or the equivalent; e.g. Stat 135
- Scripting experience in R (required); e.g. Stat 133
This course builds heavily on matrix algebra. In particular, you should be comfortable with notions such as vector spaces, inner products, norms, matrix products, transposes, ranks, determinants, inverses, and matrix decompositions.
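As a rough self-check, the following sketch exercises several of those notions in base R (the particular matrix is arbitrary, chosen only for illustration):

```r
A <- matrix(c(2, 1, 1, 3), nrow = 2)   # a 2x2 symmetric matrix
b <- c(1, 2)

t(A)          # transpose
A %*% A       # matrix product
det(A)        # determinant
solve(A)      # inverse
solve(A, b)   # solve the linear system A x = b
qr(A)$rank    # rank, via the QR decomposition
svd(A)        # singular value decomposition
```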
You should also have some scripting experience---preferably in R---at the level of writing functions, conditionals (if-then-else structures), for loops, and while loops, as well as sampling, reading in data sets, and exporting results.
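For calibration, here is a short sketch at roughly that level; the coin-flip function and the file names are hypothetical, not course material:

```r
# a function with a conditional and a for loop: count heads in n coin flips
count_heads <- function(n_flips) {
  flips <- sample(c("H", "T"), size = n_flips, replace = TRUE)
  total <- 0
  for (flip in flips) {
    if (flip == "H") {
      total <- total + 1
    }
  }
  total
}

count_heads(100)

# reading in a data set and exporting results (hypothetical file names)
# dat <- read.csv("my-data.csv")
# write.csv(dat, file = "results.csv", row.names = FALSE)
```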
Last but not least, it is helpful to know the basics of Rmd files and to have some familiarity with LaTeX, especially experience writing math symbols and equations.
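For instance, you may need to typeset an expression such as the least squares estimator, a standard formula shown here only to illustrate the expected level of LaTeX:

```latex
$$
\hat{\beta} = (X^\top X)^{-1} X^\top y
$$
```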
The primary text is An Introduction to Statistical Learning (ISL) by James, Witten, Hastie, and Tibshirani. Springer, 2013. It is freely available online in pdf format (courtesy of the authors) at http://www-bcf.usc.edu/~gareth/ISL/.
As companion material, especially for the labs, R code and projects, we will also be using Applied Predictive Modeling by Max Kuhn and Kjell Johnson. Springer, 2013.
Other good (optional) references for the course are:
- The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman. Springer, 2009 (2nd Ed). This book is more mathematically and conceptually advanced than ISL. It is freely available online in pdf format (courtesy of the authors) at https://statweb.stanford.edu/~tibs/ElemStatLearn/. This text will not be used directly for this course and is simply a reference for more theoretical details.
- Data Mining and Statistics for Decision Making by Stephane Tuffery. Wiley, 2011. This book should be available in electronic format via the UCB Library Catalog. If the course slides are not self-explanatory enough, you can supplement them with this little-known yet excellent resource.
- Statistical Learning from a Regression Perspective by Richard Berk. Springer, 2008. You can find this book in electronic format via the UCB Library Catalog. This text will not be used directly for this course and is simply a reference for more theoretical details.
We expect that by the end of the course you will:
- Have a basic yet solid understanding of the predictive modeling process/lifecycle.
- Be able to read a well-described algorithm and write R code to implement it.
- Know the pros and cons of each predictive technique covered in the course.
- Be able to describe to non-professionals what a predictive technique is doing.
- We will be using a combination of materials such as slides, tutorials, reading assignments, and chalk-and-talk.
- The main computational tool will be the computing and programming environment R.
- The main workbench will be the IDE RStudio. You will also use a terminal emulator to work with the command line.
- Please read the course logistics and policies for more details about the structure of the course, DOs and DON'Ts, etc.
Unless otherwise noted, this work, by Gaston Sanchez, is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Author: Gaston Sanchez