GitHub - eunseojo/frequency_tools: frequency tools for basic descriptives

collocations.py
common_ngrams.py
common_ngrams_reader.py
find_ngrams.py
graph_collocations.py
graph_difference.py
lemmatized_pickles.py
make_ngrams.py
make_yearly_files.py
ngram_graphs.py
readme.txt
regression_plots.py
sliding_window.py
tfidf_application.py
wc.txt
yearly_dated_corr.py

The files here build frequency tools for the FRUS corpus from its tokenized text. Last edited: June 1, 2018.

N-grams
To make ngrams, build the frequency files in the following order:
1) make_yearly_files.py : Concatenates all tokenized text for a given year and saves the result in the 'years' directory.
2) make_ngrams.py : Creates pickled ngram files for [1,2,3,4]-grams in the 'pickles' directory, using the nltk.ngrams function.
** To make lemmatized ngrams, run:
1) make_yearly_files.py
2) make_pickles_lemmat.py (no file with this name is in the repository listing; lemmatized_pickles.py is the likely match)
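
The unlemmatized pipeline (steps 1 and 2 of the first list) can be sketched roughly as follows. This is a minimal illustration, not the code in make_yearly_files.py or make_ngrams.py: the (year, tokens) input format, the function names, and the pickle naming scheme are all assumptions made for the example. It uses nltk's ngrams function when nltk is installed and otherwise falls back to an equivalent pure-Python helper.

```python
import pickle
from collections import Counter, defaultdict
from pathlib import Path

try:
    from nltk.util import ngrams  # the function the readme says make_ngrams.py uses
except ImportError:
    def ngrams(tokens, n):
        """Fallback producing the same output: successive n-token tuples."""
        return zip(*(tokens[i:] for i in range(n)))


def concat_by_year(docs):
    """Step 1: join the tokens of all documents that share a year.

    `docs` is an iterable of (year, token_list) pairs (hypothetical input format).
    """
    yearly = defaultdict(list)
    for year, tokens in docs:
        yearly[year].extend(tokens)
    return dict(yearly)


def count_ngrams(tokens, orders=(1, 2, 3, 4)):
    """Step 2: count the n-grams of each order in one year's tokens."""
    return {n: Counter(ngrams(tokens, n)) for n in orders}


def write_pickles(yearly, out_dir="pickles"):
    """Write one pickle per year and order, e.g. pickles/1950_2.pkl (hypothetical naming)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for year, tokens in yearly.items():
        for n, counts in count_ngrams(tokens).items():
            with open(out / f"{year}_{n}.pkl", "wb") as fh:
                pickle.dump(counts, fh)


# Toy input standing in for the tokenized text.
docs = [
    (1950, ["the", "united", "states"]),
    (1950, ["the", "united", "nations"]),
    (1951, ["the", "soviet", "union"]),
]
yearly = concat_by_year(docs)
counts_1950 = count_ngrams(yearly[1950])
```

Note that concatenating documents before counting also produces ngrams that span document boundaries, such as ("states", "the") in the toy data.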

To look up ngrams and graph their frequency:
1) find_ngrams.py : Searches for the frequency of a given phrase (up to 4 tokens) and charts its relative frequency over the years.
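
The lookup can be sketched as below. This is an illustration under stated assumptions, not the logic of find_ngrams.py: the in-memory layout of the counts and the function name are hypothetical, and "relative frequency" is taken here to mean the phrase count divided by the total number of ngrams of the same length in that year, which the readme does not specify.

```python
from collections import Counter


def relative_frequency(phrase, yearly_counts):
    """Return {year: count / total} for `phrase` across all years.

    `yearly_counts` maps year -> {n: Counter of n-gram tuples}
    (hypothetical layout; the repository stores these counts as pickles).
    """
    gram = tuple(phrase.split())
    n = len(gram)
    if not 1 <= n <= 4:
        raise ValueError("phrase must be 1 to 4 tokens long")
    series = {}
    for year in sorted(yearly_counts):
        counts = yearly_counts[year][n]
        total = sum(counts.values())
        # Assumed denominator: all n-grams of the same order in that year.
        series[year] = counts[gram] / total if total else 0.0
    return series


# Toy bigram counts for two years.
yearly_counts = {
    1950: {2: Counter({("united", "states"): 3, ("united", "nations"): 1})},
    1951: {2: Counter({("united", "states"): 1, ("soviet", "union"): 3})},
}
series = relative_frequency("united states", yearly_counts)
print(series)  # {1950: 0.75, 1951: 0.25}
```

The resulting year-to-frequency mapping can then be passed to a plotting library to chart the trend.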


Regress Ngram Data
For a simple regression of the frequency trend, run:
1) regression_plots.py : Plots a seaborn linear regression to figure_*.png and writes basic stats to stats_*
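
A trend fit of this kind can be sketched as follows. This is a hedged illustration, not the contents of regression_plots.py: the function names and the particular statistics returned are assumptions, and the fit is computed by hand with ordinary least squares so that the example runs without extra packages. The optional plotting helper uses seaborn's regplot and imports seaborn only when called.

```python
def linear_trend(years, freqs):
    """Ordinary least squares fit of relative frequency on year.

    Returns a dict with slope, intercept, and r_squared (hypothetical choice of stats).
    """
    n = len(years)
    if n < 2 or n != len(freqs):
        raise ValueError("need at least two (year, freq) pairs of equal length")
    mean_x = sum(years) / n
    mean_y = sum(freqs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, freqs))
    syy = sum((y - mean_y) ** 2 for y in freqs)
    if sxx == 0:
        raise ValueError("all years are identical; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # r_squared is undefined when the frequencies do not vary at all.
    r_squared = (sxy * sxy) / (sxx * syy) if syy else float("nan")
    return {"slope": slope, "intercept": intercept, "r_squared": r_squared}


def save_regression_plot(years, freqs, path):
    """Optional: draw a seaborn regression plot and save it to `path`."""
    import numpy as np
    import seaborn as sns  # imported lazily so the fit above works without it

    ax = sns.regplot(x=np.asarray(years, dtype=float),
                     y=np.asarray(freqs, dtype=float))
    ax.figure.savefig(path)


years = [1950, 1951, 1952, 1953]
freqs = [0.1, 0.2, 0.3, 0.4]
stats = linear_trend(years, freqs)
```

A positive slope indicates that the phrase becomes relatively more frequent over time; r_squared indicates how much of the year-to-year variation a straight line accounts for.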