This project was created for Project 2 (Term 2) of Udacity's Data Analyst Nanodegree (DAND). In this project, which is written in R Markdown (.Rmd) format, I conducted exploratory data analysis (EDA) on a dataset containing information on nearly 5,000 white wines, including their chemical properties and quality ratings. Through this EDA, I explored the variables, structure, patterns, oddities, and underlying relationships of the dataset to find out which properties of the wines are correlated with their quality ratings.
You can view and run the Rmd file explore_summarize_data_project.Rmd in
RStudio. One way to install RStudio is through Anaconda. After downloading and
running the Anaconda installer, open Anaconda Navigator and click the "Install"
button below RStudio. Once RStudio has been installed, click "Launch" to start
using it.
Alternatively, you can download and install the R programming language from CRAN, and then download and install RStudio from RStudio's website. If you choose this option, make sure you install R before installing RStudio.
If you only want to view the analysis and not the code written for it,
you can look at the HTML file, which is knitted
from the Rmd file. Alternatively, you can download the HTML file
and open it in your web browser.
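
If you would rather regenerate the HTML output yourself than use RStudio's Knit button, the following is a minimal sketch of knitting the Rmd file from the R console. It assumes the `rmarkdown` package is installed and that the working directory contains the Rmd file; neither is stated in this README.

```r
# Minimal sketch: knit the analysis to HTML from the R console.
# Assumption: the rmarkdown package is installed
# (install.packages("rmarkdown")) and the working directory
# contains the Rmd file named in this README.
rmd_file <- "explore_summarize_data_project.Rmd"

if (file.exists(rmd_file)) {
  # render() writes the HTML next to the Rmd file and returns its path.
  html_path <- rmarkdown::render(rmd_file, output_format = "html_document")
  message("Knitted output written to: ", html_path)
} else {
  message("Rmd file not found in ", getwd())
}
```

Any packages the analysis itself loads must also be installed before knitting.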
The dataset used in this project contains information on 4,898 Portuguese
“Vinho Verde” white wines, with 11 variables describing their chemical
properties and 1 variable recording their quality ratings. You can read more about
the dataset in wineQualityInfo.txt.
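
As a quick sanity check, the dataset's shape can be compared against the description above. This is only a sketch: the CSV path below is a hypothetical placeholder, since this README does not name the data file, and the delimiter may need adjusting for your copy.

```r
# Minimal sketch: load the white wine data and inspect its shape.
# Assumption: `wine_csv` is a hypothetical placeholder path; replace it
# with the actual location of the CSV file in your copy of the project.
wine_csv <- "path/to/white_wines.csv"

if (file.exists(wine_csv)) {
  # If the file is semicolon-delimited, pass sep = ";" to read.csv().
  wines <- read.csv(wine_csv)

  # The README describes 4,898 wines, 11 chemical variables and
  # 1 quality variable; some exports also include an index column.
  print(dim(wines))
  str(wines)
} else {
  message("Data file not found: ", wine_csv)
}
```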
Dataset source:
P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553. ISSN: 0167-9236. Available at:
Renaming legend titles and labels:
Renaming facet labels:
Factors that affect the smell or taste of wines: