Project Steps

A) Data Collection

We use BeautifulSoup to write a script that downloads all movie scripts from IMSDb.

B) Data Preprocessing

In this part, we clean the text files before extracting a list of words from them. The preprocessing pipeline includes at least 3 steps, e.g. removing extra spaces, removing stopwords, and removing punctuation.
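A minimal sketch of such a three-step pipeline, using only the standard library. The function name `preprocess`, the lowercasing, and the tiny stopword set are assumptions made for illustration; a real run would likely use a fuller stopword list.

```python
import re
import string

# Tiny illustrative stopword set (an assumption, not the project's list).
STOPWORDS = {"a", "an", "and", "the", "is", "in", "of", "to", "it"}


def preprocess(text):
    """Return the filtered list of lowercase words found in `text`."""
    # Step 1: collapse runs of spaces, tabs and newlines into single spaces.
    text = re.sub(r"\s+", " ", text).strip().lower()
    # Step 2: remove ASCII punctuation characters.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Step 3: drop stopwords.
    return [word for word in text.split() if word not in STOPWORDS]


print(preprocess("  The detective   walks in, smiling.\n\nIt is DARK!  "))
# ['detective', 'walks', 'smiling', 'dark']
```

Note that `string.punctuation` covers ASCII punctuation only, so typographic quotes in a script would need to be handled separately.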

C) Feature Extraction

In this part, we make sure that each movie script has been converted into a vector of filtered words.

D) VAD Vectorization

Each movie script’s list of words can be converted into valence, arousal and dominance (VAD) values either manually with the map() function or with the emotion() function, labMT’s built-in method. Some of the words in the scripts have no corresponding value in the VAD dictionary, so we replace them with 0s. This gives us 3 very large vectors per script, made up of 0s and the values looked up in the VAD dictionary. We then strip all the 0s from the vectors and average them using windows of size 500, so every 500 (nonzero) values are replaced with a single value that represents their average.
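The manual route could look roughly like the sketch below. It assumes the lexicon file is tab-separated with one `word, valence, arousal, dominance` entry per line, and it averages a trailing window shorter than 500 over its own length; neither detail is stated above. The names `load_lexicon`, `to_vad` and `window_average` are hypothetical, and the toy lexicon values are made up.

```python
def load_lexicon(path):
    """Read a tab-separated VAD lexicon into {word: (valence, arousal, dominance)}."""
    lexicon = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            parts = line.rstrip("\n").split("\t")
            if len(parts) != 4:
                continue  # skip malformed lines
            try:
                lexicon[parts[0]] = tuple(float(p) for p in parts[1:])
            except ValueError:
                continue  # skip a header row or other non-numeric line
    return lexicon


def to_vad(words, lexicon):
    """Map words to three lists (valence, arousal, dominance); unknown words become 0."""
    scores = [lexicon.get(word, (0.0, 0.0, 0.0)) for word in words]
    return tuple([score[i] for score in scores] for i in range(3))


def window_average(values, size=500):
    """Drop the zeros, then replace each run of `size` values with its mean."""
    nonzero = [v for v in values if v != 0]
    windows = [nonzero[i:i + size] for i in range(0, len(nonzero), size)]
    return [sum(window) / len(window) for window in windows]


# Toy lexicon with made-up values, and a window of 2 instead of 500.
toy_lexicon = {
    "dark": (0.25, 0.5, 0.25),
    "city": (0.75, 0.5, 0.5),
    "happy": (1.0, 0.75, 0.75),
}
valence, arousal, dominance = to_vad(["dark", "xyzzy", "city", "happy"], toy_lexicon)
print(valence)                          # [0.25, 0.0, 0.75, 1.0]
print(window_average(valence, size=2))  # [0.5, 1.0]
```

One consequence of using 0 as the placeholder is that a word whose real score is exactly 0 is dropped together with the unknown words.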

E) Output

We now have 3 vectors for every movie script. We plot all 3 vectors on the same figure (using any 3 different colors) and save that figure to a jpg file with the same name as the movie script. For instance, the corresponding figure for the movie “17 Again” would be “17 Again.jpg”.
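A sketch of the plotting step with matplotlib. The helper names `figure_path` and `plot_vad`, the output directory, the colors and the axis labels are all assumptions chosen for illustration. Saving as JPEG relies on Pillow, which matplotlib uses for that format.

```python
from pathlib import Path


def figure_path(title, out_dir="pics"):
    """Build the output path: the movie title plus a .jpg extension."""
    return Path(out_dir) / f"{title}.jpg"


def plot_vad(title, valence, arousal, dominance, out_dir="pics"):
    """Plot the three averaged vectors on one figure and save it as a jpg."""
    # Imported here so figure_path stays usable where matplotlib is not installed.
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    out_path = figure_path(title, out_dir)
    out_path.parent.mkdir(parents=True, exist_ok=True)

    fig, ax = plt.subplots(figsize=(10, 4))
    ax.plot(valence, color="tab:blue", label="Valence")
    ax.plot(arousal, color="tab:red", label="Arousal")
    ax.plot(dominance, color="tab:green", label="Dominance")
    ax.set_title(title)
    ax.set_xlabel("Window index")
    ax.set_ylabel("Average score")
    ax.legend()
    fig.savefig(out_path)
    plt.close(fig)  # free the figure when looping over many scripts
    return out_path
```

Calling `plot_vad("17 Again", valence, arousal, dominance)` with the three averaged vectors would then write `pics/17 Again.jpg`.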

Samples

Dark City

Halloween The Curse of Michael Myers

Final Destination

Boogie Nights
