Amazon_Fine_Food_Review

Sentiment analysis of the Amazon Fine Food Reviews dataset from Kaggle

Data Source: https://www.kaggle.com/snap/amazon-fine-food-reviews

The Amazon Fine Food Reviews dataset consists of reviews of fine foods from Amazon.

Number of reviews: 568,454
Number of users: 256,059
Number of products: 74,258
Timespan: Oct 1999 - Oct 2012
Number of Attributes/Columns in data: 10

Attribute Information:

Id - row identifier
ProductId - unique identifier for the product
UserId - unique identifier for the user
ProfileName - profile name of the user
HelpfulnessNumerator - number of users who found the review helpful
HelpfulnessDenominator - number of users who indicated whether they found the review helpful or not
Score - rating between 1 and 5
Time - timestamp for the review
Summary - brief summary of the review
Text - text of the review

Objective:

Given a review, determine whether it is positive (rating of 4 or 5) or negative (rating of 1 or 2).

[Q] How to determine if a review is positive or negative?

[Ans] We can use the Score (rating). A rating of 4 or 5 is considered a positive review, and a rating of 1 or 2 is considered a negative review. A rating of 3 is neutral and is ignored. This is an approximate, proxy way of determining the polarity (positivity/negativity) of a review.
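A minimal sketch of this labeling rule in plain Python, assuming each review is available as a (Score, Text) pair. The helper name `score_to_label` and the sample reviews are hypothetical, chosen only for illustration:

```python
def score_to_label(score):
    """Map a 1-5 star Score to a polarity label (hypothetical helper).

    Returns "positive" for 4 or 5, "negative" for 1 or 2, and None for the
    neutral score of 3, which is dropped from the data.
    """
    if score >= 4:
        return "positive"
    if score <= 2:
        return "negative"
    return None


# Hypothetical (Score, Text) pairs standing in for rows of the dataset.
reviews = [
    (5, "Love these cookies"),
    (3, "It is okay"),
    (1, "Arrived stale"),
]

# Keep only polar reviews, pairing each text with its label.
labeled = [
    (text, score_to_label(score))
    for score, text in reviews
    if score_to_label(score) is not None
]
print(labeled)  # [('Love these cookies', 'positive'), ('Arrived stale', 'negative')]
```

With pandas, the same filter can be written as `df = df[df["Score"] != 3]` followed by `df["Score"].map(score_to_label)`.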

Steps to Solve

Clean the given text data
Convert the text data into numeric vectors (Bag of Words, TF-IDF, word2vec, etc.)
Apply a classification algorithm such as logistic regression
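The three steps can be sketched end to end with scikit-learn. This is an illustration under stated assumptions, not the project's own code: HTML tags are stripped with a regular expression rather than BeautifulSoup, scikit-learn's built-in English stop-word list stands in for NLTK's, TF-IDF is used as the vectorizer, and the six toy reviews and the `clean_text` helper are invented for the example:

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

TAG_RE = re.compile(r"<[^>]+>")        # HTML tags such as <br />
NON_LETTER_RE = re.compile(r"[^a-z]+")  # anything that is not a lowercase letter


def clean_text(text):
    """Step 1 (hypothetical helper): lowercase, drop HTML tags, keep only letters."""
    text = TAG_RE.sub(" ", text.lower())
    return NON_LETTER_RE.sub(" ", text).strip()


# Toy reviews standing in for the real Text column (invented for illustration).
texts = [
    "Great taste, my kids love it!",
    "Delicious and fresh.<br />Will buy again",
    "Excellent coffee, great value",
    "Awful, stale and tasteless",
    "Terrible packaging, arrived broken<br />",
    "Horrible flavor, would not buy",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = positive (score 4-5), 0 = negative (score 1-2)

# Step 2: TF-IDF vectors (cleaning runs as the preprocessor, stop words removed).
# Step 3: logistic regression on those vectors.
model = make_pipeline(
    TfidfVectorizer(preprocessor=clean_text, stop_words="english"),
    LogisticRegression(),
)
model.fit(texts, labels)

print(model.predict(["great and delicious", "stale and horrible"]))
```

On this toy data the pipeline should label the first query positive and the second negative. On the real dataset one would hold out a test split (for example with `sklearn.model_selection.train_test_split`) before judging accuracy.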

Screenshots of the app

What is bag of words

The bag-of-words model is a simplifying representation used in natural language processing and information retrieval (IR). In this model, a text (such as a sentence or a document) is represented as the bag (multiset) of its words, disregarding grammar and even word order but keeping multiplicity. The bag-of-words model has also been used for computer vision.

The bag-of-words model is commonly used in methods of document classification where the (frequency of) occurrence of each word is used as a feature for training a classifier. (Wikipedia)
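A minimal pure-Python sketch of the idea: each document becomes a vector of word counts over a shared vocabulary, so word order is lost but multiplicity is kept. The whitespace tokenization and the helper names `build_vocabulary` and `bag_of_words` are simplifying assumptions; in practice scikit-learn's `CountVectorizer` performs the same transformation with regex-based tokenization:

```python
from collections import Counter


def build_vocabulary(docs):
    """Sorted list of every distinct lowercase token in the corpus."""
    return sorted({word for doc in docs for word in doc.lower().split()})


def bag_of_words(doc, vocabulary):
    """Count vector: entry i is how often vocabulary[i] occurs in doc."""
    counts = Counter(doc.lower().split())
    return [counts[word] for word in vocabulary]


docs = ["good food good price", "bad food"]
vocab = build_vocabulary(docs)                    # ['bad', 'food', 'good', 'price']
vectors = [bag_of_words(d, vocab) for d in docs]  # [[0, 1, 2, 1], [1, 1, 0, 0]]
print(vocab, vectors)
```

Because only counts are kept, "price good food good" maps to exactly the same vector as "good food good price".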

What is TFIDF

In information retrieval, tf–idf, TF*IDF, or TFIDF, short for term frequency–inverse document frequency, is a numerical statistic that is intended to reflect how important a word is to a document in a collection or corpus.[1] It is often used as a weighting factor in searches of information retrieval, text mining, and user modeling. The tf–idf value increases proportionally to the number of times a word appears in the document and is offset by the number of documents in the corpus that contain the word, which helps to adjust for the fact that some words appear more frequently in general. tf–idf is one of the most popular term-weighting schemes today. A survey conducted in 2015 showed that 83% of text-based recommender systems in digital libraries use tf–idf.
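In its common textbook form (many tf and idf variants exist), the weight of a term t in a document d from a corpus D of N documents is the term's frequency in d multiplied by the log of the inverse fraction of documents that contain t:

```latex
\mathrm{tfidf}(t, d, D) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t, D),
\qquad
\mathrm{idf}(t, D) = \log \frac{N}{\lvert \{ d \in D : t \in d \} \rvert}
```

For reference, scikit-learn's `TfidfVectorizer` uses a smoothed variant by default, idf(t) = ln((1 + N) / (1 + df(t))) + 1, and then L2-normalizes each document vector.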

Variations of the tf–idf weighting scheme are often used by search engines as a central tool in scoring and ranking a document's relevance given a user query. tf–idf can be successfully used for stop-words filtering in various subject fields, including text summarization and classification. (Wikipedia)
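A pure-Python sketch of the unsmoothed weighting (tf as the raw count of the term in the document, idf as log(N / df)). The function name `tf_idf` and the three sample documents are hypothetical. It also illustrates the stop-word point: a term that occurs in every document gets idf = log(1) = 0, so its weight vanishes:

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf  = raw count of the term in the document
    idf = log(N / df), where N is the number of documents and
          df is the number of documents containing the term
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: count each term at most once per document.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append(
            {term: count * math.log(n_docs / doc_freq[term]) for term, count in counts.items()}
        )
    return weights


docs = ["the tea is great", "the tea is stale", "the coffee is great"]
w = tf_idf(docs)
for doc, weights in zip(docs, w):
    print(doc, weights)
```

Here "the" and "is" receive weight 0 in every document, while "stale", which occurs in only one document, receives the largest weight, log 3 ≈ 1.10.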

ML Technologies Used

Pandas
Sklearn
BeautifulSoup
NLTK
Matplotlib
Seaborn

Web Technologies

Flask
HTML
AWS (For EC2 deployment)

How to run

To run this app locally, install the packages listed above and then run:

python app.py

Open a browser, go to http://127.0.0.1:5000/ and enter the review text.

Vikas-KM/amazon_fine_food_review