DICPZhou/article_analysis_word2vec


Article trend Analysis by word2vec

Summary

This Jupyter Notebook demonstrates an example application of NLP to energy-industry articles in PDF format.

Part1: Preprocessing

The preprocessing notebook covers conversion from PDF to text, tokenization, and deletion of duplicate files.
About 600 articles were collected and converted into text files.
https://github.com/Jun-Tam/article_analysis_word2vec/blob/master/NLP_Articles_Preprocess.ipynb
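
The tokenization and duplicate-deletion steps could look roughly like the sketch below. It is not taken from the notebook: the function names (`tokenize`, `content_key`, `find_duplicates`, `load_texts`) are hypothetical, and it assumes the PDFs have already been converted to `.txt` files and that duplicates are files whose text is identical after normalizing case and whitespace.

```python
import hashlib
import re
from pathlib import Path

# Assumption: a token is a run of letters, optionally with one apostrophe part.
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return TOKEN_RE.findall(text.lower())


def content_key(text):
    """Hash the lowercased, whitespace-normalized text."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_duplicates(texts):
    """Given {file name: text}, return names whose content repeats an earlier file."""
    seen = {}
    duplicates = []
    for name in sorted(texts):
        key = content_key(texts[name])
        if key in seen:
            duplicates.append(name)
        else:
            seen[key] = name
    return duplicates


def load_texts(folder):
    """Read every .txt file in a folder (hypothetical layout) into a dict."""
    return {
        path.name: path.read_text(encoding="utf-8", errors="ignore")
        for path in Path(folder).glob("*.txt")
    }
```

With this sketch, `find_duplicates(load_texts("articles_txt"))` would list the files to delete, where `articles_txt` is a placeholder folder name.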

Part2: Trend Analysis

The analysis notebook covers bag-of-words (BoW), TF-IDF, word2vec, and word clouds.
https://github.com/Jun-Tam/article_analysis_word2vec/blob/master/NLP_Articles_Word2Vec.ipynb
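
As a rough illustration of the BoW and TF-IDF steps, the sketch below counts tokens per article and weights each term by its relative frequency times the log of the inverse document frequency. The function names are hypothetical and the weighting formula is one common variant, not necessarily the one used in the notebook.

```python
import math
from collections import Counter


def bag_of_words(tokens):
    """Return term counts for one tokenized article."""
    return Counter(tokens)


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document.

    Assumed weighting: (count / document length) * log(N / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document

    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights
```

A term that appears in every article gets weight zero under this variant, so the highest-weighted terms are the ones that distinguish an article from the rest of the collection.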

Word counts across all the articles are shown below.

[Figure: word counts across all articles]

A word cloud is a useful tool for visualizing what people were interested in each year.

[Figure: word clouds by year]
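
A per-year word cloud needs per-year word frequencies as input. The sketch below shows one way to build them; the function name, the `(year, tokens)` input shape, and the tiny stopword set are assumptions for illustration. The resulting dictionaries could then be passed to a word cloud library, for example the `generate_from_frequencies` method of the `wordcloud` package, assuming that package is the one in use.

```python
from collections import Counter, defaultdict

# Assumption: a tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for"}


def yearly_frequencies(articles, stopwords=STOPWORDS):
    """articles: iterable of (year, tokens) pairs. Returns {year: Counter}."""
    by_year = defaultdict(Counter)
    for year, tokens in articles:
        by_year[year].update(t for t in tokens if t not in stopwords)
    return dict(by_year)
```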

Using word2vec, "ness" vectors are defined, and each article is converted into ness vectors.
The time-series plots below show recent article trends for the individual ness vectors alongside the oil price history.

[Figure: time series of article trends per ness vector, with oil price history]
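
One possible reading of the ness-vector step is sketched below. It assumes that a ness vector is the mean embedding of a few seed words, that an article is represented by the mean embedding of its tokens, and that the article's score for each "ness" is the cosine similarity between the two. The notebook may define these differently. The toy 2-D embeddings, the seed words, and the names `upstream-ness` and `renewable-ness` are all made up for illustration; real embeddings would come from a trained word2vec model.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def ness_scores(tokens, embeddings, ness_seeds):
    """Score one tokenized article against each ness vector."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return {name: 0.0 for name in ness_seeds}
    article_vec = mean_vector(known)
    scores = {}
    for name, seeds in ness_seeds.items():
        seed_vecs = [embeddings[s] for s in seeds if s in embeddings]
        scores[name] = cosine(article_vec, mean_vector(seed_vecs)) if seed_vecs else 0.0
    return scores


# Hypothetical toy data, not from the notebook.
EMBEDDINGS = {
    "shale": [1.0, 0.0],
    "drilling": [0.9, 0.1],
    "solar": [0.0, 1.0],
    "wind": [0.1, 0.9],
}
NESS_SEEDS = {
    "upstream-ness": ["shale", "drilling"],
    "renewable-ness": ["solar", "wind"],
}
```

Averaging these scores over the articles published in each period would give a time series per ness vector that could be plotted next to the oil price.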

Reference

Hobson Lane, Cole Howard, Hannes Max Hapke. Natural Language Processing in Action: Understanding, Analyzing, and Generating Text with Python. Manning.

About

NLP Article Trend Analysis (Oil & Gas)
