BasselSharaf/Movies-Sentiment-Analysis

Project Steps

A) Data Collection

We use Beautiful Soup to write a script that downloads all movie scripts from IMSDb.

B) Data Preprocessing

In this part, we clean the text files before extracting a list of words from them. The preprocessing pipeline includes at least three steps, e.g. removing extra spaces, removing stopwords, and removing punctuation.
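The three example steps can be sketched as follows. This is a minimal illustration and not necessarily the repository's own code: the function name `preprocess` and the tiny `STOPWORDS` set are assumptions made for the example, and a real run would use a fuller stopword list.

```python
import string

# Tiny illustrative stopword list (an assumption); use a fuller list in practice.
STOPWORDS = {"a", "an", "and", "the", "is", "in", "of", "to", "it"}


def preprocess(text, stopwords=STOPWORDS):
    """Return the filtered list of words for one movie script."""
    # 1) Lowercase the text and delete every ASCII punctuation character.
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    # 2) split() with no argument also collapses runs of spaces and newlines.
    words = text.split()
    # 3) Drop stopwords.
    return [w for w in words if w not in stopwords]


print(preprocess("The   city is DARK, and the night never ends!"))
# → ['city', 'dark', 'night', 'never', 'ends']
```

Deleting punctuation outright merges contractions such as "don't" into "dont"; whether that is acceptable depends on how the VAD dictionary spells its entries.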

C) Feature Extraction

In this part, we make sure that each movie script has been converted into a vector of filtered words.

D) VAD Vectorization

Each movie script's list of words can be converted into valence, arousal, and dominance values either manually with the map() function or with the emotion() function, labMT's built-in method. Since some of the words in the scripts have no corresponding value in the VAD dictionary, we replace them with 0s. This gives us three very large vectors consisting of 0s and the values looked up in the VAD dictionary. We then strip all the 0s from the vectors and average the remaining values using windows of size 500, so every 500 nonzero values are replaced with a single value: their average.
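The manual route can be sketched in plain Python as below. The function names, the toy `vad` dictionary, and its scores are made up for illustration; the sketch also assumes non-overlapping windows and averages a shorter final window over the values it actually contains, neither of which is stated above.

```python
def vad_vectors(words, vad):
    """Map each word to its valence, arousal, and dominance; unknown words become 0."""
    missing = (0.0, 0.0, 0.0)
    triples = [vad.get(w, missing) for w in words]
    valence = [t[0] for t in triples]
    arousal = [t[1] for t in triples]
    dominance = [t[2] for t in triples]
    return valence, arousal, dominance


def windowed_average(values, window=500):
    """Strip the 0s, then replace each run of `window` values with its mean."""
    nonzero = [v for v in values if v != 0]
    chunks = (nonzero[i:i + window] for i in range(0, len(nonzero), window))
    # Assumption: a shorter final chunk is averaged over its own length.
    return [sum(chunk) / len(chunk) for chunk in chunks]


# Hypothetical VAD scores, for illustration only.
vad = {
    "love": (0.75, 0.5, 0.5),
    "fear": (0.25, 0.75, 0.25),
    "calm": (0.5, 0.25, 0.5),
    "joy": (1.0, 0.75, 0.75),
}
words = ["love", "xyz", "fear", "calm", "joy", "qq"]

valence, arousal, dominance = vad_vectors(words, vad)
print(windowed_average(valence, window=2))  # → [0.5, 0.75]
```

Because the 0s are removed before windowing, each window covers 500 words that were found in the dictionary, not 500 consecutive words of the script.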

E) Output

Now that we have three vectors for every movie script, we plot all three onto the same figure (using any three different colors) and save that figure to a jpg file with the same name as the movie script. For instance, the corresponding figure for the movie “17 Again” would be “17 Again.jpg”.
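A minimal sketch of this step with matplotlib is shown below. The helper names `figure_path` and `plot_arcs`, the colors, and the axis labels are assumptions for the example; saving to jpg through matplotlib also assumes Pillow is installed.

```python
from pathlib import Path


def figure_path(title, out_dir="."):
    """Build the output path, e.g. '17 Again' -> '<out_dir>/17 Again.jpg'."""
    return Path(out_dir) / f"{title}.jpg"


def plot_arcs(title, valence, arousal, dominance, out_dir="."):
    """Plot the three averaged vectors on one figure and save it as a jpg."""
    # Imported here so figure_path() stays usable without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(valence, color="tab:blue", label="Valence")
    ax.plot(arousal, color="tab:red", label="Arousal")
    ax.plot(dominance, color="tab:green", label="Dominance")
    ax.set_title(title)
    ax.set_xlabel("Window index")
    ax.set_ylabel("Average score")
    ax.legend()

    out = figure_path(title, out_dir)
    fig.savefig(out)  # format is inferred from the .jpg extension
    plt.close(fig)    # free memory when looping over many scripts
    return out
```

Calling `plot_arcs("17 Again", valence, arousal, dominance)` with the three averaged vectors would write `17 Again.jpg` to the current directory.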

Samples

Dark City

Halloween The Curse of Michael Myers

Final Destination

Boogie Nights

About

Built emotional arcs for movie scripts using sentiment analysis.
