Codebook

This document describes the steps used to clean the data, as outlined in readme.md

RAW DATA

All the files contain data in space-separated-values format

None of the files contain column names or row names

features.txt: (dimensions: 561, 2): Contains the IDs and names of the variables
activity_labels.txt: (dimensions: 6, 2): Contains the IDs and names of the activities

train/X_train.txt: (dimensions: 7352, 561): Contains the values of all the variables for each observation in the training set
train/y_train.txt: (dimensions: 7352, 1): Contains the activity ID of each observation/row in the observations file (train/X_train.txt)
train/subject_train.txt: (dimensions: 7352, 1): Contains the participant ID of each observation/row in the observations file (train/X_train.txt)

test/X_test.txt: (dimensions: 2947, 561): Contains the values of all the variables for each observation in the testing set
test/y_test.txt: (dimensions: 2947, 1): Contains the activity ID of each observation/row in the observations file (test/X_test.txt)
test/subject_test.txt: (dimensions: 2947, 1): Contains the participant ID of each observation/row in the observations file (test/X_test.txt)
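
As a minimal sketch of how files in this format can be read, the snippet below writes a tiny space-separated file with no header and loads it with read.table(). The temporary file and its three rows are made up for illustration; the real files (for example train/X_train.txt inside the UCI HAR Dataset folder) would presumably be read the same way.

```r
# Toy stand-in for a raw file: space-separated values, no header row
# (contents are illustrative, not copied from the dataset)
tmp <- tempfile(fileext = ".txt")
writeLines(c("1 WALKING", "2 WALKING_UPSTAIRS", "3 WALKING_DOWNSTAIRS"), tmp)

# header = FALSE is the read.table() default; it is spelled out here
# because none of the raw files carry column names
labels <- read.table(tmp, header = FALSE, stringsAsFactors = FALSE)

dim(labels)    # number of rows, then number of columns
names(labels)  # R assigns the default column names V1 and V2
```

Calling dim() on each loaded file is a quick way to check it against the dimensions listed above.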

Check whether the UCI HAR Dataset folder exists in the current directory. If not, download and unzip it
Load the files train/X_train.txt, train/y_train.txt, train/subject_train.txt, test/X_test.txt, test/y_test.txt, test/subject_test.txt and activity_labels.txt into respective variables
Bind the columns of subject_test, y_test and x_test to form a testing dataframe
Bind the columns of subject_train, y_train and x_train to form a training dataframe
Bind the rows of the two data frames obtained in the two previous steps to obtain big_DF
Set the column names of big_DF to "subject", then "activity", then the variable names read from features.txt
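
The binding steps above can be sketched with toy data as follows. The small data frames stand in for the loaded files (two features instead of 561), and the intermediate names train_DF and test_DF are assumptions made for this illustration; only big_DF is named in this codebook.

```r
# Toy stand-ins for the loaded files; values are made up
subject_train <- data.frame(V1 = c(1, 1, 3))
y_train       <- data.frame(V1 = c(5, 4, 5))
x_train       <- data.frame(V1 = c(0.28, 0.27, 0.29), V2 = c(-0.99, -0.98, -0.97))
subject_test  <- data.frame(V1 = c(2, 2))
y_test        <- data.frame(V1 = c(1, 6))
x_test        <- data.frame(V1 = c(0.25, 0.26), V2 = c(-0.95, -0.96))
features      <- data.frame(V1 = 1:2,
                            V2 = c("tBodyAcc-mean()-X", "tBodyAcc-std()-X"),
                            stringsAsFactors = FALSE)

# Column-bind participant ID, activity ID and measurements for each set
test_DF  <- cbind(subject_test, y_test, x_test)
train_DF <- cbind(subject_train, y_train, x_train)

# Row-bind the two sets into one data frame
big_DF <- rbind(train_DF, test_DF)

# Name the columns: two ID columns, then the feature names
names(big_DF) <- c("subject", "activity", features$V2)
big_DF
```

The columns are named only after binding, so the ID columns must be bound in the same order (subject, then activity) in both sets for the names to line up.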

Use grep() to find the column names that match "subject", "activity" or contain "mean()" or "std()"
Reassign big_DF to a dataframe that contains only the columns found in the previous step.
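
A minimal sketch of the column selection, assuming a single regular expression is passed to grep(). The pattern and the sample column names are illustrative; the exact pattern used by the script is not given in this codebook.

```r
# Sample column names in the style of features.txt
col_names <- c("subject", "activity",
               "tBodyAcc-mean()-X", "tBodyAcc-std()-X",
               "tBodyAcc-mad()-X", "fBodyAcc-meanFreq()-X")

# Parentheses are regex metacharacters, so they are escaped to match
# the literal text "mean()" and "std()"
keep <- grep("^subject$|^activity$|mean\\(\\)|std\\(\\)", col_names)

col_names[keep]  # the two ID columns plus the mean() and std() features
```

Because the pattern requires the literal "mean()", a name such as "fBodyAcc-meanFreq()-X" is not selected. The resulting indices can then subset the columns, for example big_DF[, keep].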

Match the keys from the activities dataframe to the activity values in big_DF and replace them with the corresponding names from the activities DF.
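
One way to do this replacement is with match(), sketched below on toy data. The column names V1 and V2 are the defaults read.table() would assign to a headerless file; whether the script uses match(), merge() or a factor is an assumption here.

```r
# Activity lookup table: ID in V1, name in V2
activities <- data.frame(V1 = 1:6,
                         V2 = c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS",
                                "SITTING", "STANDING", "LAYING"),
                         stringsAsFactors = FALSE)

# Toy version of big_DF with numeric activity IDs
big_DF <- data.frame(subject = c(1, 1, 2), activity = c(5, 4, 1))

# match() gives, for each activity ID, the row of `activities` holding that key;
# indexing V2 with those rows yields the corresponding names
big_DF$activity <- activities$V2[match(big_DF$activity, activities$V1)]
big_DF$activity
```

Unlike merge(), this approach keeps the rows of big_DF in their original order.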

Use the melt() function to reshape big_DF into a 4-column dataframe that contains the columns "activity", "subject", "variable" and "value". (The id.vars used are "activity" and "subject")
Use the dcast() function to cast the molten dataframe with mean as the aggregate function, so that the rows sharing a common subject and activity collapse into a single row holding the mean of each variable
Assign the obtained data.frame to a variable called clean_DF in the calling environment
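
The reshaping steps can be sketched as follows, assuming melt() and dcast() come from the reshape2 package (the package is not named in this codebook). The toy big_DF and its values are made up for illustration.

```r
library(reshape2)  # assumed source of melt() and dcast()

# Toy big_DF: subject 1 has two WALKING observations that should collapse into one row
big_DF <- data.frame(
  subject  = c(1, 1, 1, 2),
  activity = c("WALKING", "WALKING", "SITTING", "WALKING"),
  `tBodyAcc-mean()-X` = c(0.2, 0.4, 0.1, 0.3),
  `tBodyAcc-std()-X`  = c(-0.9, -0.7, -0.5, -0.8),
  check.names = FALSE  # keep the feature names exactly as written
)

# Long format: one row per (activity, subject, variable) measurement
molten <- melt(big_DF, id.vars = c("activity", "subject"))

# Wide format again: one row per activity/subject pair,
# each variable replaced by the mean of its values in that group
clean_DF <- dcast(molten, activity + subject ~ variable, fun.aggregate = mean)
clean_DF
```

When these steps run inside a function, assign("clean_DF", clean_DF, envir = parent.frame()) is one way to place the result in the calling environment.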