smacawi/topic-modeler

topic-modeler

Framework to apply LDA and Biterm topic modelling to an unlabeled corpus.

The LDA code uses the implementation provided by Gensim, and the Biterm topic model code uses an existing external implementation.
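For readers new to the Biterm topic model: it works on "biterms", the unordered word pairs that co-occur within a short text, rather than on per-document word counts. The sketch below only illustrates how biterms can be extracted from one tokenized document. The function name `extract_biterms` is hypothetical and is not taken from this repository or from the Biterm implementation it uses.

```python
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair (biterm) in one short document.

    Hypothetical helper for illustration only; not part of this repository.
    """
    # Sort each pair so ("topic", "model") and ("model", "topic")
    # are treated as the same biterm.
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


doc = ["topic", "model", "corpus"]
print(extract_biterms(doc))
```

A document with n tokens yields n * (n - 1) / 2 biterms, which is why the approach is usually applied to short texts.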

The folder is organized as follows:

  • requirements.txt: Python packages needed for this project. Install them with:
    pip install -r requirements.txt
  • /models/: Model code, separated into Biterm and LDA; includes methods to retrieve top vocabulary words and coherence scores
  • /preprocessing/: Handles text preprocessing
  • /util/: Extra utility methods
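The README does not describe what the preprocessing step does, so the following is only a minimal sketch of a typical cleaning pass before topic modelling, assuming lowercasing, letter-only tokenization, and stop-word removal. The `preprocess` function, the `STOP_WORDS` set, and the `min_len` parameter are all hypothetical; the code in /preprocessing/ may work differently.

```python
import re

# Hypothetical stop-word list for illustration; the project's own list may differ.
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "is", "in"}


def preprocess(text, stop_words=STOP_WORDS, min_len=2):
    """Lowercase the text, keep alphabetic tokens, drop stop words and short tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in stop_words and len(t) >= min_len]


print(preprocess("The model is applied to an unlabeled corpus."))
```

The resulting token lists are the usual input for both LDA (via a bag-of-words dictionary) and Biterm models.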

Scripts in main directory:

  • run_model.py: Sample code to train LDA/Biterm/Guided LDA models
  • get_coherence.py: Retrieves coherence metrics for LDA and Biterm models. Topic coherence is computed with the implementation provided by Gensim.
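To give a sense of what a coherence score measures, here is a small pure-Python sketch of the UMass coherence of a single topic, following the document co-occurrence formulation of Mimno et al. It is an illustration under stated assumptions, not the code in get_coherence.py, and it is not guaranteed to match Gensim's output (Gensim, for instance, aggregates and smooths differently). The function name `umass_coherence` is hypothetical.

```python
import math
from itertools import combinations


def umass_coherence(top_words, documents):
    """UMass coherence of one topic.

    top_words: topic words ordered from most to least probable.
    documents: list of token lists.
    Assumes every top word occurs in at least one document.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain all of the given words.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    # combinations() yields (higher-ranked, lower-ranked) pairs in list order.
    for w_hi, w_lo in combinations(top_words, 2):
        score += math.log((doc_freq(w_hi, w_lo) + 1) / doc_freq(w_hi))
    return score


docs = [["topic", "model"], ["topic", "corpus"], ["model", "corpus", "topic"]]
print(umass_coherence(["corpus", "topic"], docs))
```

Higher (less negative) values indicate that a topic's top words tend to appear in the same documents.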
