vg4838/Sentiment-analysis-on-hotel-reviews
hotels_dataset_analysis.R

- Use RStudio to clean the Hotels 1 dataset.
- Select the pertinent features.
- Remove rows that have at least one empty value.
- Extract a random sample of 10000 rows into a CSV file.

vader.py

This file defines three functions, preprocess_data(), sentiment_score(), and main(), and imports several libraries.

Libraries: nltk, vaderSentiment.vaderSentiment, numpy, pandas, scipy.stats

preprocess_data()
- Reads data from a file whose name has to be typed in manually.
- Extracts the review text and the review title.
- Performs the following operations: tokenization, lowercasing, stop-word removal, and stemming.
- Calls sentiment_score().
- Prints the most frequent sentiment within the corpus.

sentiment_score()
- Calculates the polarity and intensity of every sentence in each review.
- Prints the overall polarity of the reviews.

main()
- Calls preprocess_data().

Execution process
- Make sure the libraries are installed using pip or pip3.
- Manually type the name of the data file.
- Depending on the feature names in the data file, you will have to change the attributes you extract. For Hotels datasets 1 and 2, we use "review.text" and "review.title". For the Amazon dataset, we use "Reviews".
- Run the code once for each selected feature.

------------------------------------------------------------------------------------------------------------------

naive bayes.py

This file defines seven functions, read_file(), pre_process(), remove_noise(), create_data(), get_reviews_for_model(), check_for_custom_reviews(), and main(), and imports several libraries.

Libraries: nltk, numpy, pandas

- read_file(): Reads the file using pandas and returns the reviews and ratings as arrays.
- preprocess_data(): Tokenizes and cleans the data and removes stop words.
- remove_noise(): Removes punctuation, lemmatizes the tokens, and performs part-of-speech (POS) tagging.
- create_data(): Labels each review as positive, negative, or neutral based on its rating and splits the data into training and test sets.
- get_reviews_for_model(): Creates a dictionary of token:True pairs for each review, the input format required by the naive Bayes classifier.
- check_for_custom_reviews(): Tests the model on custom reviews.
- main(): Entry point of the program.

Execution process
- When prompted, enter the file path of the dataset you want to test. The code handles the column names of the different datasets.
- To classify your own hotel reviews, update the custom reviews array in main() with your own text.
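The labeling and feature-building steps described for naive bayes.py can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the helper names label_from_rating(), to_featureset(), and build_dataset(), the rating cut-offs (4 and above is positive, 2 and below is negative), the 80/20 split, and the toy reviews are all assumptions made for the example.

```python
import random


def label_from_rating(rating):
    """Map a numeric rating to a sentiment label (cut-offs are assumed)."""
    if rating >= 4:
        return "positive"
    if rating <= 2:
        return "negative"
    return "neutral"


def to_featureset(tokens):
    """Build the token:True dictionary used as a bag-of-words featureset."""
    return {token: True for token in tokens}


def build_dataset(tokenized_reviews, ratings, train_fraction=0.8, seed=0):
    """Pair each featureset with its label, shuffle, and split train/test."""
    data = [
        (to_featureset(tokens), label_from_rating(rating))
        for tokens, rating in zip(tokenized_reviews, ratings)
    ]
    random.Random(seed).shuffle(data)  # fixed seed keeps the split repeatable
    cut = int(len(data) * train_fraction)
    return data[:cut], data[cut:]


# Toy, already-tokenized reviews (made up for illustration only).
reviews = [
    ["clean", "room", "friendly", "staff"],
    ["dirty", "bathroom", "rude", "staff"],
    ["average", "stay"],
    ["great", "location"],
    ["noisy", "room"],
]
ratings = [5, 1, 3, 4, 2]

train_set, test_set = build_dataset(reviews, ratings)
print(len(train_set), "training examples,", len(test_set), "test example")
```

A list of (featureset, label) pairs in this shape is what nltk.NaiveBayesClassifier.train() accepts, so train_set could be passed to it directly once nltk is installed.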
------------------------------------------------------------------------------------------------------------------

preprocess.py

- Reads data from a file whose name has to be typed in manually.
- Extracts the review text and the review title.
- Performs tokenization, stop-word removal, stemming, and retokenization.
- Returns the corpus as a list of detokenized words.

k nearest neighbors.py

This file defines three functions, get_sentiment(), knn(), and main(), and imports several libraries.

Libraries: nltk, vaderSentiment.vaderSentiment, numpy, pandas

get_sentiment(prediction)
- Classifies neighbors based on their score.

main()
- Calls run_k_nearest_neighbors().

run_k_nearest_neighbors()
- Reads data from a file whose name has to be typed in manually.
- Finds the k nearest neighbors of the variable to be predicted.
- Calculates the accuracy on the dataset and the polarity of the reviews.

Execution process
- Make sure the libraries are installed using pip or pip3.
- Manually type the name of the data file.
- Depending on the feature names in the data file, you will have to change the attributes you extract. For Hotels datasets 1 and 2, we use "review.text" and "review.title". For the Amazon dataset, we use "Reviews".
- Run the code once for each selected feature. For the Amazon dataset, run preprocess.py first and then k nearest neighbors.py.
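As a rough illustration of the k-nearest-neighbors step, the sketch below classifies a review from its k closest training examples and measures accuracy on a held-out set. It is not the repository's implementation: the two-number feature vectors, the toy data, the knn_predict() and accuracy() helpers, and the ±0.05 score cut-offs in get_sentiment() are assumptions chosen for the example. Those cut-offs are the ones conventionally used with VADER compound scores, but the script's actual values are not stated.

```python
from collections import Counter


def get_sentiment(score, threshold=0.05):
    """Map a score in [-1, 1] to a label (the threshold is an assumption)."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    by_distance = sorted(
        train,
        # Squared Euclidean distance gives the same ordering as Euclidean.
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Fraction of test points whose predicted label matches the true label."""
    correct = sum(knn_predict(train, features, k) == label for features, label in test)
    return correct / len(test)


# Hypothetical features: (positive score, negative score) for each review.
train = [
    ((0.90, 0.00), "positive"),
    ((0.70, 0.10), "positive"),
    ((0.80, 0.05), "positive"),
    ((0.10, 0.80), "negative"),
    ((0.00, 0.90), "negative"),
    ((0.05, 0.70), "negative"),
]
test = [((0.75, 0.10), "positive"), ((0.10, 0.75), "negative")]

print("accuracy:", accuracy(train, test, k=3))
print("label for score 0.6:", get_sentiment(0.6))
```

An odd k avoids tied votes when there are only two classes; with three labels, ties are still possible and are broken here by whichever label was counted first.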