Course project for Getting and Cleaning Data
The CodeBook describes in detail the data used in the run_analysis.R script.
This script takes the resulting data sets from a fitness study of the accelerometers in the Samsung Galaxy S smartphone: https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip
Using these data sets, the script does the following:
- Downloads the data into a data directory
- Cleans up the activity labels: makes them lowercase and removes the underscores ("_")
- Sets the headers for X_test.txt and X_train.txt to the feature labels
- Stacks subject_test.txt on top of subject_train.txt
- Stacks y_test.txt on top of y_train.txt (these are the IDs for the activity performed in each observation)
- Replaces the activity IDs with their respective names
- Stacks X_test.txt on top of X_train.txt
- Keeps only the mean and standard deviation columns for each measurement (meanFreq columns are not included)
- Column binds subjects, activities (y files), and the measurements (X files) into master_data
- From master_data, creates a second, independent data frame called means_data, with the average of each measurement for each activity and each subject.
- Saves the tidy means data as tidy_means_data.txt
- Deletes the downloaded data
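
The label cleanup, stacking, column selection, and column binding steps above can be sketched on toy data. This is a minimal illustration in base R, not the actual contents of run_analysis.R: the small data frames and their values are made up, and the variable names other than master_data are assumptions.

```r
# Toy stand-ins for the real files; all values are made up for illustration.
activity_labels <- data.frame(id = 1:2,
                              name = c("WALKING_UPSTAIRS", "LAYING"),
                              stringsAsFactors = FALSE)
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
              "fBodyAcc-meanFreq()-X", "tBodyAcc-max()-X")

# Clean the activity labels: lowercase, underscores removed.
activity_labels$name <- gsub("_", "", tolower(activity_labels$name))

# Hypothetical X tables for the test and train sets, headed by the feature labels.
x_test  <- data.frame(matrix(c(0.1, 0.2, 0.3, 0.4), nrow = 1))
x_train <- data.frame(matrix(c(0.5, 0.6, 0.7, 0.8,
                               0.9, 1.0, 1.1, 1.2), nrow = 2, byrow = TRUE))
names(x_test)  <- features
names(x_train) <- features

# Stack test on top of train for subjects, activity IDs, and measurements.
subject     <- rbind(data.frame(subject = 2L), data.frame(subject = c(1L, 1L)))
activity_id <- c(1L, 2L, 1L)
x_all       <- rbind(x_test, x_train)

# Replace the activity IDs with their cleaned names.
activity <- activity_labels$name[match(activity_id, activity_labels$id)]

# Keep mean() and std() columns; the escaped "(" leaves out meanFreq().
keep <- grepl("mean\\(|std\\(", names(x_all))

# Column bind subjects, activities, and the selected measurements.
master_data <- cbind(subject, activity = activity, x_all[, keep, drop = FALSE])
```

Matching on "mean(" rather than "mean" is one simple way to leave out the meanFreq columns; the real script may select them differently.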
Note: This script uses dplyr.
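
As a rough sketch of how the averaging step could look with dplyr, the example below groups a toy master_data by subject and activity and averages every remaining column. It assumes dplyr 1.0.0 or later (for across()). The column names mean_x and std_x, the sample values, the row.names = FALSE argument, and writing to a temporary directory are illustrative choices, not details taken from run_analysis.R.

```r
library(dplyr)

# Toy master_data; the measurement columns and values are made up.
master_data <- data.frame(
  subject  = c(1, 1, 1, 2),
  activity = c("walking", "walking", "laying", "walking"),
  mean_x   = c(0.2, 0.4, 0.1, 0.8),
  std_x    = c(1.0, 3.0, 0.5, 2.0)
)

# Average each measurement for every subject/activity pair.
means_data <- master_data %>%
  group_by(subject, activity) %>%
  summarise(across(everything(), mean), .groups = "drop")

# Save the tidy result (written to a temp directory here to keep the sketch tidy).
out_file <- file.path(tempdir(), "tidy_means_data.txt")
write.table(means_data, out_file, row.names = FALSE)
```

The result has one row per subject and activity combination, so the four toy observations collapse to three rows.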