.datasets
Output
RankingData
.gitignore
AFINN-111.txt
Amazon_Reviews_Modeling.json
Amazon_Reviews_Modeling_Assist.ipynb
Amazon_Watches_Reviews_EDA.ipynb
Amazon_Watches_Reviews_EDA_revA - Zeppelin.pdf
Amazon_Watches_Reviews_EDA_revA.json
Amazon_Watches_Reviews_EDA_revB.json
Amazon_Watches_Reviews_EDA_revC.json
Amazon_Watches_Reviews_Sentiment_revA.json
Amazon_Watches_Reviews_Sentiment_revE.json
Good_Reads_Reviews.ipynb
Movie_Lens_Reviews.ipynb
README.md
Video_Games_Reviews.ipynb
data_set_downloader.py
requirements.txt


CS 498 CCA: Project Team 40

Improving Amazon Star Ratings through Text Analysis on Review Content

Download Data Sets

1. Install Spark on your local machine, e.g. `brew install apache-spark` on macOS
2. Install Python 3, e.g. `brew install python` on macOS
3. Install the Python dependencies: `pip install -r requirements.txt`
4. Run the Data Set Downloader: `python data_set_downloader.py`

Usage and Options

Usage: data_set_downloader.py [OPTIONS]

  Utility to download Amazon Product Reviews data set.

Options:
  -a, --aws-access-key-id TEXT    AWS Access Key ID
  -s, --aws-secret-access-key TEXT
                                  AWS Secret Access Key
  -r, --aws-data-set-region TEXT  AWS Data Set Region
  -b, --aws-data-set-bucket TEXT  AWS Data Set S3 Bucket
  -f, --file-names TEXT           Data Set File Filter
  -d, --file-destination PATH     Data Set File Destination
  --debug                         Print additional information
  --help                          Show this message and exit.

Default Values

| Option | Value |
| --- | --- |
| `aws-data-set-region` | `us-east-1` |
| `aws-data-set-bucket` | `amazon-reviews-pds` |
| `file-destination` | `./datasets` |
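
For reference, the options and defaults above can be mirrored with Python's standard `argparse` module. This is a minimal sketch, not the code in `data_set_downloader.py`: the script's real parser is not shown in this README, and the function name `build_parser` is hypothetical. It assumes `--file-names` may be given more than once.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="data_set_downloader.py",
        description="Utility to download Amazon Product Reviews data set.",
    )
    parser.add_argument("-a", "--aws-access-key-id", help="AWS Access Key ID")
    parser.add_argument("-s", "--aws-secret-access-key", help="AWS Secret Access Key")
    # Defaults below are the ones listed in the Default Values table.
    parser.add_argument("-r", "--aws-data-set-region", default="us-east-1",
                        help="AWS Data Set Region")
    parser.add_argument("-b", "--aws-data-set-bucket", default="amazon-reviews-pds",
                        help="AWS Data Set S3 Bucket")
    # Assumption: the filter is repeatable, so each use appends to a list.
    parser.add_argument("-f", "--file-names", action="append", default=None,
                        help="Data Set File Filter")
    parser.add_argument("-d", "--file-destination", type=Path,
                        default=Path("./datasets"),
                        help="Data Set File Destination")
    parser.add_argument("--debug", action="store_true",
                        help="Print additional information")
    return parser


# Parse an explicit argument list instead of sys.argv to keep the demo deterministic.
args = build_parser().parse_args(["-d", ".target", "--debug"])
print(args.aws_data_set_region, args.aws_data_set_bucket, args.file_destination)
```

Options that are not passed fall back to the documented defaults, so only the destination and the debug flag differ from a plain invocation here.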

Example Usage

Run with file name filter, custom file destination and debug info activated:

python data_set_downloader.py --file-names tsv/amazon_reviews_us_Watches_v1_00.tsv.gz --file-names tsv/amazon_reviews_us_Home_Entertainment_v1_00.tsv.gz -d .target --debug
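
One way to picture what the file name filter in this command does is the sketch below. It is an assumption about the behavior, not the script's actual logic: `select_files` and `local_path` are hypothetical helpers, an empty filter is assumed to mean "download everything", and each object is assumed to be stored under its base name inside the destination directory. The third key in the listing is a made-up placeholder.

```python
from pathlib import Path, PurePosixPath


def select_files(available_keys, file_names):
    """Return the keys to download; an empty filter keeps everything (assumed)."""
    if not file_names:
        return list(available_keys)
    wanted = set(file_names)
    return [key for key in available_keys if key in wanted]


def local_path(key, destination):
    """Assumed layout: save each object under its base name in the destination."""
    return Path(destination) / PurePosixPath(key).name


# Hypothetical bucket listing; the last entry is a placeholder, not a real key.
available = [
    "tsv/amazon_reviews_us_Watches_v1_00.tsv.gz",
    "tsv/amazon_reviews_us_Home_Entertainment_v1_00.tsv.gz",
    "tsv/some_other_category.tsv.gz",
]
filters = [
    "tsv/amazon_reviews_us_Watches_v1_00.tsv.gz",
    "tsv/amazon_reviews_us_Home_Entertainment_v1_00.tsv.gz",
]

for key in select_files(available, filters):
    print(key, "->", local_path(key, ".target"))
```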

Jupyter Notebooks

1. Install Spark on your local machine, e.g. `brew install apache-spark` on macOS
2. Install Python 3, e.g. `brew install python` on macOS
3. Install the Python dependencies: `pip install -r requirements.txt`
4. Start Jupyter Notebook: `jupyter notebook`

Statistics

A preliminary example of the process we intend to implement as a distributed computation is shown in Amazon_Watches_Reviews_EDA.ipynb. This Jupyter notebook demonstrates how the Amazon reviews can be divided into several tiers, and how a training data set and a model can be used to predict the tier a product should belong to from certain aggregate features.
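
The tiering idea can be illustrated with a small, single-machine sketch in plain Python. Everything in it is an assumption made for illustration and is not taken from the notebook: the toy reviews, the rating cutoffs, the two aggregate features (review count and mean review length in words), and the nearest-centroid model standing in for whatever model the notebook trains.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy reviews: (product_id, star_rating, review_text).
REVIEWS = [
    ("A", 5, "great watch love it"),
    ("A", 4, "solid and accurate"),
    ("B", 1, "broke"),
    ("B", 2, "strap fell off"),
    ("C", 3, "okay for the price"),
    ("C", 3, "fine"),
]


def tier(mean_rating, cutoffs=(2.5, 4.0)):
    """Map a mean rating to a tier index; the cutoffs are arbitrary placeholders."""
    return sum(mean_rating >= c for c in cutoffs)


def aggregate(reviews):
    """Return per-product features (review count, mean words) and tier labels."""
    stars, words = defaultdict(list), defaultdict(list)
    for product_id, rating, text in reviews:
        stars[product_id].append(rating)
        words[product_id].append(len(text.split()))
    features = {pid: (len(w), mean(w)) for pid, w in words.items()}
    labels = {pid: tier(mean(s)) for pid, s in stars.items()}
    return features, labels


def fit_centroids(features, labels):
    """Average the feature vectors of all products that share a tier."""
    grouped = defaultdict(list)
    for pid, vec in features.items():
        grouped[labels[pid]].append(vec)
    return {t: tuple(mean(col) for col in zip(*vecs)) for t, vecs in grouped.items()}


def predict(centroids, vec):
    """Predict the tier whose centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda t: sum((a - b) ** 2 for a, b in zip(centroids[t], vec)))


features, labels = aggregate(REVIEWS)
centroids = fit_centroids(features, labels)
print(labels)
print(predict(centroids, (2, 3.4)))
```

In a distributed setting the aggregation step would map naturally onto a group-by over product IDs, which is why it is kept separate from the model here.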

FraBle/uiuc_cs_498