Skip to content

jerielizabeth/text2topics

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

text2topics

Collection of functions used in my dissertation, A Gospel of Health and Salvation.

Available sections of the module are:

  • clean -- code for cleaning messy OCR
  • models -- code creating topic modeling pipeline
  • phrases -- collection of most common noun phrases in corpus
  • preprocess -- prepare text for modeling with Mallet
  • reports -- code for taking the data about the corpus and isolating particular elements
  • utilities -- helper functions for executing the above tasks

Examples

To generate error rate statistics:

from text2topics import reports

reports.process_directory(directory, spelling_dictionary)

To create a spelling dictionary from text files:

from text2topics import utilities

utilities.create_spelling_dictionary(directory, wordlists)

wordlists is a list of file(s) containing the verified words and directory is the directory where those wordlist files reside. This function converts all words to lowercase and returns only the list of unique entries.

Installation

To install, navigate to the root directory of module (text2topics/) and run

pip install .

To update, run

pip install --upgrade .

About

Library for dissertation

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages