Framework to apply LDA and Biterm topic modelling to an unlabeled corpus.
The LDA code uses the implementation offered by Gensim here, and the Biterm topic model code uses the implementation available here.
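For readers new to the Biterm topic model (BTM): unlike LDA, which models a topic mixture per document, BTM models biterms. These are unordered pairs of words that co-occur in the same short text, pooled over the whole corpus, which helps when individual documents are too short to give reliable word co-occurrence statistics. The plain-Python sketch below shows only the biterm-extraction step. The function names and the optional `window` parameter are illustrative assumptions and are not taken from the Biterm implementation this project uses.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens, window=None):
    """Return the biterms (unordered word pairs) of one tokenized short text.

    With window=None, every pair of positions in the text forms a biterm.
    With an integer window (a hypothetical option), only pairs at most
    `window` positions apart are kept.
    """
    biterms = []
    for i, j in combinations(range(len(tokens)), 2):
        if window is not None and j - i > window:
            continue
        # Sort the pair so ("b", "a") and ("a", "b") count as the same biterm.
        biterms.append(tuple(sorted((tokens[i], tokens[j]))))
    return biterms


def corpus_biterm_counts(docs, window=None):
    """Count biterm occurrences across all documents in the corpus."""
    counts = Counter()
    for tokens in docs:
        counts.update(extract_biterms(tokens, window))
    return counts


if __name__ == "__main__":
    docs = [["apple", "banana", "cherry"], ["banana", "apple"]]
    print(extract_biterms(docs[0]))
    print(corpus_biterm_counts(docs).most_common(1))
```

BTM is then trained on this corpus-wide pool of biterms rather than on per-document word counts. This is why the model is usually applied to short texts such as tweets or titles.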
The folder is organized as follows:
requirements.txt
: Python packages required by this project. Install them with:
pip install -r requirements.txt
/models/
: Split into Biterm and LDA code; includes methods to retrieve the top vocabulary words and coherence scores.
/preprocessing/
: Handles text preprocessing.
/util/
: Extra utility methods.
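As a rough illustration of what the preprocessing and top-word retrieval steps involve, here is a dependency-free sketch. The stop-word list, the tokenization regex, and the names `preprocess` and `top_words` are hypothetical; the repository's actual methods in /preprocessing/ and /models/ may differ. For a trained Gensim `LdaModel`, for example, `show_topic` already returns a topic's top words with their weights.

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "is", "in", "for", "on"}


def preprocess(text, min_len=3):
    """Lowercase, keep alphabetic tokens, drop stop words and short tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS and len(t) >= min_len]


def top_words(topic_word, vocab, n=5):
    """Return the n highest-weight words of each topic.

    topic_word: one row per topic, each row a list of weights aligned with vocab
    (the shape of a topic-word distribution from LDA or BTM).
    """
    result = []
    for row in topic_word:
        ranked = sorted(range(len(vocab)), key=lambda i: row[i], reverse=True)
        result.append([vocab[i] for i in ranked[:n]])
    return result


if __name__ == "__main__":
    print(preprocess("The cat sat on the mat, and the dog barked!"))
    print(top_words([[0.1, 0.5, 0.4], [0.7, 0.2, 0.1]], ["cat", "dog", "fish"], n=2))
```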
Scripts in the main directory:
run_model.py
: Sample code to train LDA, Biterm, and Guided LDA models.
get_coherence.py
: Retrieves coherence metrics for the LDA and Biterm models. The topic coherence models come from the implementation offered by Gensim here.
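To make "coherence metric" concrete, the sketch below computes UMass coherence (Mimno et al., 2011) for a single topic in plain Python. It sums log((D(w_m, w_l) + 1) / D(w_l)) over every pair of top words where w_l ranks above w_m. Here D counts the reference documents that contain all the given words, and a higher score means a more coherent topic. The function name and the choice to raise on unseen words are assumptions for illustration, not the behaviour of get_coherence.py.

```python
import math


def umass_coherence(top_words, docs):
    """UMass coherence of one topic.

    top_words: the topic's top words, ordered by decreasing weight.
    docs: tokenized documents used as the reference corpus.
    """
    doc_sets = [set(d) for d in docs]

    def doc_freq(*words):
        # Number of documents that contain every word in `words`.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            d_l = doc_freq(top_words[l])
            if d_l == 0:
                raise ValueError(f"{top_words[l]!r} does not occur in the reference corpus")
            # +1 smoothing keeps the log finite when the pair never co-occurs.
            score += math.log((doc_freq(top_words[m], top_words[l]) + 1) / d_l)
    return score


if __name__ == "__main__":
    corpus = [["cat", "dog"], ["cat", "fish"], ["dog", "bird"]]
    print(umass_coherence(["cat", "dog"], corpus))
```

Gensim's `CoherenceModel` with `coherence="u_mass"` computes a closely related score but averages over word pairs and uses a different smoothing constant, so its values are not expected to match this sketch exactly.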