
Scraping and processing of the TV series Suits transcripts to extract the most common phrases


tsitsimis/suits-overused-phrases


Suits Overused Phrases

Motivation

Anyone who has watched enough of the TV show Suits knows that certain phrases are repeated over and over across many episodes. Not only that, but these phrases are used by many different characters, as if they all share the same way of talking.

To quantify this observation, this notebook downloads, parses and analyses the subtitles of all 134 episodes (9 seasons) of Suits. It uses n-grams to help find common phrases, and regular expressions to match those phrases and similar variants in the subtitle corpus.
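As a rough illustration of the n-gram step, the sketch below counts the most frequent word trigrams across a list of subtitle lines. It uses only the standard library, so `tokenize`, `ngrams` and `most_common_ngrams` are hypothetical helpers, not the notebook's own code; in nltk, `nltk.util.ngrams` and `nltk.FreqDist` cover the same ground. The example lines are made up.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase text and split it into word tokens, keeping apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return consecutive n-token tuples (same idea as nltk.util.ngrams)."""
    return list(zip(*(tokens[i:] for i in range(n))))


def most_common_ngrams(lines, n=3, top=5):
    """Count n-grams line by line so no n-gram spans two subtitle lines."""
    counts = Counter()
    for line in lines:
        counts.update(ngrams(tokenize(line), n))
    return counts.most_common(top)


# Made-up example lines, not actual Suits subtitles.
corpus = [
    "I'm not asking, I'm telling you.",
    "You know what? I'm not asking.",
    "I'm not asking you again.",
]

print(most_common_ngrams(corpus, n=3, top=1))
# [(("i'm", 'not', 'asking'), 3)]
```

Counting per line is a design choice that keeps n-grams from bridging unrelated subtitle lines; the most frequent n-grams are then candidates to generalize into regular expressions.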

Reddit post

Tools

  • requests and BeautifulSoup to fetch and parse the episode transcripts from an online source
  • Python's re module for regular expressions to match similar phrases
  • nltk to detect the most common n-grams
  • matplotlib and PowerPoint for the final visualization
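The regular-expression step could look roughly like the sketch below. The phrase, the `PHRASE` pattern, the `count_phrase` helper and the episode snippets are all hypothetical illustrations, not the notebook's actual code or data. The idea is that one pattern with optional groups and alternations catches several spellings of the same phrase, so they are counted together per episode.

```python
import re

# Hypothetical example pattern: optional "the hell" plus "are"/"'re" lets one
# regex catch several spellings of the same phrase ("’" covers curly quotes).
PHRASE = re.compile(
    r"\bwhat(?: the hell)?(?: are|['’]re) you doing here\b",
    re.IGNORECASE,
)


def count_phrase(episodes, pattern=PHRASE):
    """Map each episode id to the number of matches of `pattern` in its text."""
    return {ep: len(pattern.findall(text)) for ep, text in episodes.items()}


# Made-up subtitle snippets keyed by hypothetical episode ids.
episodes = {
    "s01e01": "What are you doing here? I work here.",
    "s01e02": "What the hell are you doing here, Mike?",
    "s01e03": "What're you doing here? What are you doing here!",
    "s01e04": "What are you doing tonight?",
}

counts = count_phrase(episodes)
print(counts)
# {'s01e01': 1, 's01e02': 1, 's01e03': 2, 's01e04': 0}
print("total:", sum(counts.values()))
# total: 4
```

Summing over all episodes gives an overall count, while the per-episode numbers show how widely a phrase is spread across the series.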
