This is the repository for the Getting and Cleaning Data Course Project on Coursera.
The following files are available:
- README.md - this file
- run_analysis.R - downloads the data and does all processing (includes comments)
- tidydata.csv - the output of run_analysis.R
- CodeBook.md - a description of tidydata.csv's format
Here are the data for the project:
The script does the following:
- Download the data file if it doesn't exist in the current working directory
- Unzip the data file if the directory doesn't already exist
- Read the data from X_train.txt, y_train.txt and subject_train.txt
- Read the data from X_test.txt, y_test.txt and subject_test.txt
- Read the column names from features.txt, convert them to lower case and remove all non-letter characters
- Read all activity names from activity_labels.txt
- Substitute all activity indices with their names
- Add y_train and subject_train as columns to the train dataset
- Add y_test and subject_test as columns to the test dataset
- Combine the train and test datasets
- Keep only the columns whose names contain "mean" or "std"
- Create a tidy dataset with the average of all those variables per activity and subject
- Write that dataset to a file "tidydata.csv"
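
The steps above can be sketched in base R roughly as follows. This is a minimal illustration on small made-up data, not the code in run_analysis.R: the helper names `fetch_data` and `build_set`, the toy feature names and all numeric values are assumptions chosen for the example. `fetch_data` is only defined, not called, and the output is written to a temporary directory so the sketch does not overwrite the real tidydata.csv.

```r
# Hypothetical helper: download and unzip only when needed.
# url, zip_file and data_dir are placeholders supplied by the caller.
fetch_data <- function(url, zip_file, data_dir) {
  if (!file.exists(zip_file)) download.file(url, zip_file, mode = "wb")
  if (!dir.exists(data_dir)) unzip(zip_file)
}

# Toy stand-ins for features.txt and the activity label file (illustrative only).
features   <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-max()-X")
activities <- data.frame(index = 1:2, name = c("WALKING", "SITTING"))

# Lower-case the feature names and strip every non-letter character.
clean_names <- gsub("[^a-z]", "", tolower(features))

# Hypothetical helper: attach activity names and subject ids to a measurement matrix.
build_set <- function(x, y, subject) {
  colnames(x) <- clean_names
  data.frame(subject  = subject,
             activity = activities$name[match(y, activities$index)],
             x)
}

# Toy "train" and "test" sets (made-up numbers, one matrix column per feature).
train <- build_set(matrix(c(1, 3, 5, 7,  2, 4, 6, 8,  0, 0, 0, 0), ncol = 3),
                   y = c(1, 1, 2, 2), subject = c(1, 1, 1, 1))
test  <- build_set(matrix(c(10, 20,  1, 1,  0, 0), ncol = 3),
                   y = c(2, 2), subject = c(2, 2))

# Combine both sets, then keep the id columns plus every "mean" or "std" column.
combined <- rbind(train, test)
keep     <- grepl("mean|std", names(combined))
selected <- combined[, c("subject", "activity", names(combined)[keep])]

# Average every remaining variable per subject and activity.
tidy <- aggregate(. ~ subject + activity, data = selected, FUN = mean)

# Write the result; a temporary directory is used here on purpose.
write.csv(tidy, file.path(tempdir(), "tidydata.csv"), row.names = FALSE)
```

The result has one row per subject and activity pair. `aggregate` with a formula is only one way to compute the averages; packages such as dplyr or reshape2 would work equally well.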