Cleaning Data Project

Course project for Getting and Cleaning Data

CodeBook.md describes in detail the data used by the run_analysis.R script.

This script processes the data sets from a study of human activity recognition based on the accelerometers in the Samsung Galaxy S smartphone: https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip
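Downloading the archive at that URL into a data directory, and deleting it again at the end, can be sketched in base R as follows. This is a minimal sketch, not the code in run_analysis.R: the helper names fetch_har_data and remove_har_data and the file name har.zip are hypothetical.

```r
# Hypothetical helpers sketching the download and clean-up steps;
# names and paths are illustrative, not taken from run_analysis.R.
har_url <- "https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip"

fetch_har_data <- function(url = har_url, data_dir = "data") {
  # Create the data directory if it does not exist yet
  dir.create(data_dir, showWarnings = FALSE, recursive = TRUE)
  zip_path <- file.path(data_dir, "har.zip")
  # mode = "wb" keeps the binary zip intact on Windows
  download.file(url, destfile = zip_path, mode = "wb")
  # Extract into the same directory; the archive holds a "UCI HAR Dataset" folder
  unzip(zip_path, exdir = data_dir)
  invisible(file.path(data_dir, "UCI HAR Dataset"))
}

remove_har_data <- function(data_dir = "data") {
  # Delete the downloaded archive and everything extracted from it
  unlink(data_dir, recursive = TRUE)
}
```

Calling fetch_har_data() needs network access; remove_har_data() corresponds to the final clean-up step.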

Using these data sets, the script does the following:

Downloads the data into a data directory
Cleans up the activity labels: makes them lowercase and removes the underscores ("_")
Sets the column names of X_test.txt and X_train.txt to the feature labels
Stacks subject_test.txt on top of subject_train.txt
Stacks y_test.txt on top of y_train.txt (these are the IDs for the activity performed in each observation)
Replaces the activity IDs with their respective names
Stacks X_test.txt on top of X_train.txt
Keeps only the mean and standard deviation of each measurement (the meanFreq() features are excluded)
Binds the subject, activity (y files), and measurement (X files) columns side by side into master_data
From master_data, creates a second, independent data frame called means_data, with the average of each measurement for each activity and each subject.
Saves the tidy means data as tidy_means_data.txt
Deletes the downloaded data
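The label cleanup, stacking, column selection, and binding steps above can be illustrated on a tiny in-memory example in base R. This is a sketch under assumptions, not the script's actual code: all values are made up, and only one feature family stands in for the full features.txt. The regular expression keeps names containing -mean() or -std(), which is one way to leave out the meanFreq() features.

```r
# Toy stand-ins for activity_labels.txt, features.txt, subject_*.txt,
# y_*.txt and X_*.txt (made-up values, not real data)
activity_labels <- data.frame(id = 1:2, label = c("WALKING_UPSTAIRS", "SITTING"),
                              stringsAsFactors = FALSE)
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
              "tBodyAcc-meanFreq()-X", "tBodyAcc-max()-X")
subject_test  <- data.frame(subject = 2L)
subject_train <- data.frame(subject = c(1L, 1L))
y_test  <- 1L
y_train <- c(2L, 1L)
X_test  <- as.data.frame(matrix(c(0.1, 0.2, 0.3, 0.4), nrow = 1))
X_train <- as.data.frame(matrix(c(0.5, 0.6, 0.7, 0.8,
                                  0.9, 1.0, 1.1, 1.2), nrow = 2, byrow = TRUE))

# Lowercase the activity labels and drop the underscores
activity_labels$label <- tolower(gsub("_", "", activity_labels$label, fixed = TRUE))

# Use the feature labels as column names (names<- does not mangle them)
names(X_test)  <- features
names(X_train) <- features

# Stack test on top of train, then map activity IDs to their names
subjects <- rbind(subject_test, subject_train)
activity <- activity_labels$label[match(c(y_test, y_train), activity_labels$id)]
X <- rbind(X_test, X_train)

# Keep only -mean() and -std() columns; "-meanFreq()" does not match
keep <- grepl("-(mean|std)\\(\\)", names(X))
X <- X[, keep, drop = FALSE]

# Bind subjects, activities and measurements side by side
master_data <- cbind(subjects, activity = activity, X)
```

When reading the real files with read.table, passing check.names = FALSE (or assigning names afterwards, as here) keeps labels such as tBodyAcc-mean()-X from being rewritten by make.names().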

Note: This script uses dplyr.
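With dplyr, the means_data step can be written as a grouped summary. A minimal sketch, assuming dplyr 1.0 or later (for across() and the .groups argument) and a made-up master_data; the output goes to tempdir() only to keep the example self-contained, and the real script may differ.

```r
library(dplyr)

# Made-up master_data with one measurement column (illustrative only)
master_data <- data.frame(
  subject  = c(1L, 1L, 2L),
  activity = c("sitting", "sitting", "walking"),
  `tBodyAcc-mean()-X` = c(0.25, 0.75, 0.9),
  check.names = FALSE
)

# Average every measurement column for each subject/activity pair;
# across() in summarise() skips the grouping columns
means_data <- master_data %>%
  group_by(subject, activity) %>%
  summarise(across(everything(), mean), .groups = "drop")

# Save the tidy means without row names
out_file <- file.path(tempdir(), "tidy_means_data.txt")
write.table(means_data, out_file, row.names = FALSE)
```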
