index.Rmd

--- 
title: "Introduction to Data Science"
subtitle: "Data Analysis and Prediction Algorithms with R"
author: "Rafael A. Irizarry"
date: "`r Sys.Date()`"
output: pdf_document
urlcolor: blue
geometry: margin=1in
biblio-style: apalike
description: This book introduces concepts and skills that can help you tackle real-world
  data analysis challenges. It covers concepts from probability, statistical inference,
  linear regression and machine learning and helps you develop skills such as R programming,
  data wrangling with dplyr, data visualization with ggplot2, file organization with
  UNIX/Linux shell, version control with GitHub, and reproducible document preparation
  with R markdown.
documentclass: book
classoption: openany
link-citations: yes
bibliography:
- book.bib
- packages.bib
site: bookdown::bookdown_site
always_allow_html: yes  
---


```{r include=FALSE}
# automatically create a bib database for R packages
knitr::write_bib(c(
  .packages(), 'bookdown', 'knitr', 'rmarkdown'), 'packages.bib')
```

# Preface {-}

This book started out as the class notes used in the
 [HarvardX Data Science Series](https://www.edx.org/course/data-science-r-basics-harvardx-ph125-1x). 
 
The link for the online version of the book is [https://rafalab.github.io/dsbook/](https://rafalab.github.io/dsbook/)

The R markdown code used to generate the book are available on  [GitHub](https://github.com/rafalab/dsbook). Note that the individual files are not self contained since we run the code included in [this file](https://raw.githubusercontent.com/rafalab/dsbook/master/_common.R) before each one while creating the book. In particular, most chapters require these packages to be loaded:

```{r, eval=FALSE}
library(tidyverse)
library(dslabs)
```

The graphical theme used for plots throughout the book can be recreated using the `ds_theme_set()` function from __dslabs__ package.

This work is licensed under the [Creative Commons Attribution 3.0 Unported (CC-BY)](https://tldrlegal.com/license/creative-commons-attribution-(cc)) United States License.

We make annonucements related to the book on Twitter. For updates follow [\@rafalab](https://twitter.com/rafalab)

# Acknowledgements {-}

This book is dedicated to all the people involved in building and maintaining R and the R packages we use in this book. A special thanks to the developers and maintainers of R base, the tidyverse and the caret package.

A special thanks to my tidyverse guru David Robinson and Amy Gill for dozens of comments, edits, and suggestions. Also, many thanks to Stephanie Hicks who twice served as a co-instructor in my data science class and Yihui Xie who patiently put up with my many questions about bookdown. Thanks also to Karl Broman, from whom I borrowed ideas for the Data Visualization and Productivity Tools parts, and to Hector Corrada-Bravo, for advice on how to best teach Machine Learning. Thanks to Peter Aldhous from whom I borrowed ideas for the principles of data visualization section and Jenny Bryan for writing _Happy Git and GitHub for the useR_, which influenced our Git chapters. Thanks to 
Alyssa Frazee for helping create the homework problem that became the Recommendation Systems chapter. Also, many thanks to Jeff Leek, Roger Peng and Brian Caffo whose class inspired the way this book is divided and to Garrett Grolemund and Hadley Wickham for making the bookdown code for their R for Data Science book open. Finally, thanks to Alex Nones for proofreading the manuscript during its various stages.

This book was conceived during the teaching of several applied statistics courses, starting over fifteen years ago. The teaching assistants working with me throughout the years made important indirect contributions to this book. The latest iteration of this course is a HarvardX series coordinated by Heather Sternshein. We thank her for her contributions. We are also grateful to all the students whose questions and comments helped us improve the book. The courses were partially funded by NIH grant R25GM114818. We are very grateful to the National Institute of Health for its support.
A special thanks goes to all those that edited the book via GitHub pull requests or made suggestions by creating an _issue_:  Huang Qiang (`nickyfoto`), Marc-André Désautels (`desautm`), Michail Schwab (`michaschwab`), Alvaro Larreategui (`alvarolarreategui`), Jake VanCampen (`jakevc`), Guillermo Lengemann (`omerta`),  Enrico Spinielli (`espinielli`), Aaron Simumba (`asimumba`) Maldewar (`braunschweig`), Grzegorz Wierzchowski (`gwierzchowski`),  Richard Careaga (`technocrat`) and Matthew Ploenzke (`mPloenzke`).