moviemap

What is moviemap?

Visualizing and exploring data points is a great (and fun) way to gain insight into data.
What if each movie could be turned into a single representation?
If the representations are good enough, similar movies will cluster together, which makes recommending a movie almost trivial.

Scrape the movie list from the movie review website Watcha
- Get 5200+ movies rated by the critic Lee Dongjyn.
- Features processed: ["title", "plot", "date", "genere", "director", "country", "user_rating", "critic_rating"]
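
As a rough illustration of the processed features, one scraped movie could be held in a record like the one below. The field names come from the feature list above; the types, the list-valued `genere` field, and the example values are assumptions made for this sketch, not the scraper's actual schema.

```python
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class MovieRecord:
    """One scraped movie; field names follow the feature list above."""
    title: str
    plot: str
    date: str             # assumed: release date stored as a string
    genere: List[str]     # assumed: a movie may carry several genres
    director: str
    country: str
    user_rating: float    # assumed: numeric rating
    critic_rating: float  # assumed: numeric rating, later used as the training target


# Placeholder values only; this is not real scraped data.
example = MovieRecord(
    title="Example Title",
    plot="A placeholder plot summary.",
    date="2000-01-01",
    genere=["Drama", "Thriller"],
    director="Example Director",
    country="KR",
    user_rating=4.1,
    critic_rating=4.5,
)

row = asdict(example)  # plain dict, convenient for writing to JSON or CSV
```

Keeping `critic_rating` next to the input features makes it easy to split each row into model inputs and the prediction target later on.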
Make a representation vector for each movie
- Using an open-source Korean NLP model (KoBERT), encode each plot into a single sentence vector
- The features describing each movie are used alongside the plot vector: [director, date, genere, country, user_rating]
- Train a movie rating prediction model (forward pass: bottom to top)
  - [Loss] (output, critic rating)
  - [MLP]
  - [plot vector] + [genere embs concat] + [director emb] + [country emb] + [date emb] + [user rating]
  - [KoBERT (partially frozen)]
  - [tokenized plot text]
- After training, take each movie's representation vector from the hidden layer of the MLP
- Reduce dimensionality using t-SNE (hidden_dim -> 2 dim)
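
The bottom-to-top diagram above can be sketched as a single forward pass. This is a minimal NumPy stand-in, not the project's training code: the weights are random and untrained, a random vector replaces the KoBERT plot vector, and all sizes, the fixed number of genre slots, the ReLU activation, and the squared-error loss are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; the project's real dimensions are not stated here.
PLOT_DIM, EMB_DIM, HIDDEN_DIM = 768, 16, 64
N_GENRES, N_DIRECTORS, N_COUNTRIES, N_DATES = 20, 100, 30, 50
MAX_GENRES = 3  # assumed: fixed number of genre slots so the input width is constant

# Randomly initialised (untrained) embedding tables.
genre_emb = rng.normal(size=(N_GENRES, EMB_DIM))
director_emb = rng.normal(size=(N_DIRECTORS, EMB_DIM))
country_emb = rng.normal(size=(N_COUNTRIES, EMB_DIM))
date_emb = rng.normal(size=(N_DATES, EMB_DIM))

# Plot vector + genre slots + director, country, date embeddings + user rating.
IN_DIM = PLOT_DIM + MAX_GENRES * EMB_DIM + 3 * EMB_DIM + 1
W1 = rng.normal(scale=0.01, size=(IN_DIM, HIDDEN_DIM))
b1 = np.zeros(HIDDEN_DIM)
W2 = rng.normal(scale=0.01, size=(HIDDEN_DIM, 1))
b2 = np.zeros(1)


def forward(plot_vec, genre_ids, director_id, country_id, date_id, user_rating):
    """Return (predicted critic rating, hidden representation) for one movie."""
    x = np.concatenate([
        plot_vec,                          # stand-in for the KoBERT sentence vector
        genre_emb[genre_ids].reshape(-1),  # genre embeddings, concatenated
        director_emb[director_id],
        country_emb[country_id],
        date_emb[date_id],
        np.array([user_rating]),
    ])
    hidden = np.maximum(0.0, x @ W1 + b1)  # ReLU hidden layer, used as the representation
    rating = (hidden @ W2 + b2).item()
    return rating, hidden


plot_vec = rng.normal(size=PLOT_DIM)
rating, representation = forward(plot_vec, np.array([0, 4, 7]), 12, 3, 25, 4.1)
loss = (rating - 4.5) ** 2  # squared error against a made-up critic rating
```

After training, stacking the `representation` vector of every movie gives the (n_movies, hidden_dim) matrix that the t-SNE step reduces to two dimensions.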
Make an interactive plot on the web
- Uses Nomic AI's deepscatter library (https://github.com/nomic-ai/deepscatter) for efficiency and speed.
