topic-modeler

Framework to apply LDA and Biterm topic modelling to an unlabeled corpus.

The LDA code uses the implementation offered by Gensim, and the Biterm topic model code uses an existing publicly available implementation.
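For readers new to the Biterm topic model: instead of per-document word counts, it models unordered word pairs ("biterms") that co-occur within the same short text. The sketch below only illustrates how such pairs can be extracted from tokenized documents. The function name `extract_biterms` and the sample documents are hypothetical and are not taken from this repository or from the Biterm implementation it uses.

```python
from itertools import combinations


def extract_biterms(tokens):
    """Return all unordered word pairs (biterms) from one tokenized document.

    Hypothetical helper for illustration only; not part of this repository.
    """
    # Sort each pair so that ("a", "b") and ("b", "a") map to the same biterm.
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


# Made-up sample documents, already tokenized.
docs = [["topic", "model", "corpus"], ["short", "text"]]

# Collect the biterms of every document into one flat list.
biterms = [b for doc in docs for b in extract_biterms(doc)]
print(biterms)
```

A document with n tokens yields n(n-1)/2 biterms, which is why this representation is usually applied to short texts.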

Install the dependencies:

pip install -r requirements.txt

The folder is organized as follows:

/models/: Model code, separated into Biterm and LDA; includes methods to retrieve top vocabulary words and coherence scores
/preprocessing/: Handles text preprocessing
/util/: Extra utility methods

Scripts in main directory:

run_model.py: Sample code to train LDA/Biterm/Guided LDA models
get_coherence.py: Retrieves coherence metrics for LDA and Biterm models. The topic coherence models come from the implementation offered by Gensim.
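To give a rough idea of what a coherence metric measures, here is a minimal pure-Python sketch of a UMass-style coherence score for a single topic. It is a simplified stand-in written for illustration, not the code in get_coherence.py and not Gensim's implementation, so its values will not necessarily match Gensim's output. The function name `umass_coherence`, the `eps` smoothing parameter, and the sample documents are all assumptions.

```python
import math
from itertools import combinations


def umass_coherence(top_words, docs, eps=1.0):
    """UMass-style coherence of one topic's top words over tokenized documents.

    Hypothetical helper for illustration only. `top_words` is assumed to be
    ordered from most to least probable within the topic.
    """
    doc_sets = [set(doc) for doc in docs]

    def doc_freq(*words):
        # Number of documents that contain every one of the given words.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    scores = []
    for i, j in combinations(range(len(top_words)), 2):
        higher, lower = top_words[i], top_words[j]  # i < j, so `higher` ranks first
        denom = doc_freq(higher)
        if denom == 0:
            continue  # skip words that never occur in the corpus
        # Smoothed log ratio of co-document frequency to document frequency.
        scores.append(math.log((doc_freq(higher, lower) + eps) / denom))

    # Average over word pairs; NaN when no pair could be scored.
    return sum(scores) / len(scores) if scores else float("nan")


# Made-up sample corpus, already tokenized.
docs = [
    ["lda", "topic", "model"],
    ["topic", "model", "corpus"],
    ["biterm", "short", "text"],
]
print(umass_coherence(["topic", "model"], docs))
```

Scores closer to zero (or higher) indicate that a topic's top words tend to appear in the same documents; strongly negative scores indicate words that rarely co-occur.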
