text2topics

A collection of functions used in my dissertation, A Gospel of Health and Salvation.

The package is organized into the following submodules:

clean -- code for cleaning messy OCR
models -- code for creating the topic modeling pipeline
phrases -- collection of the most common noun phrases in the corpus
preprocess -- code for preparing text for modeling with MALLET
reports -- code for isolating particular elements from the data about the corpus
utilities -- helper functions for executing the above tasks

Examples

To generate error rate statistics:

from text2topics import reports

reports.process_directory(directory, spelling_dictionary)
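As a rough illustration of what an error rate statistic could look like, the sketch below counts the share of tokens in a text that do not appear in the spelling dictionary. This is an assumption about the general idea, not the implementation behind reports.process_directory; the function name error_rate and the simple letters-only tokenization are hypothetical.

```python
import re


def error_rate(text, spelling_dictionary):
    """Hypothetical helper, not part of text2topics.

    Returns the fraction of tokens in `text` that are missing from
    `spelling_dictionary` (assumed to hold lowercase words).
    """
    # Assumption: a token is any run of ASCII letters, compared in lowercase.
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    known = set(spelling_dictionary)
    errors = [token for token in tokens if token not in known]
    return len(errors) / len(tokens)


if __name__ == "__main__":
    verified = ["the", "gospel", "of", "health"]
    print(error_rate("The gospel of helth", verified))
```

A per-document rate like this can then be collected across every file in a directory to compare OCR quality between documents.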

To create a spelling dictionary from text files:

from text2topics import utilities

utilities.create_spelling_dictionary(directory, wordlists)

Here, wordlists is a list of the files containing the verified words, and directory is the directory where those wordlist files reside. The function converts all words to lowercase and returns a list of the unique entries.
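The behavior described above (read each wordlist, lowercase every word, keep unique entries) can be sketched as follows. This is a minimal stand-in under stated assumptions, not the code in utilities: the name build_spelling_dictionary is hypothetical, the wordlist files are assumed to hold one word per line in UTF-8, and the sorted output order is a choice made here for readability.

```python
import os


def build_spelling_dictionary(directory, wordlists):
    """Hypothetical stand-in, not the text2topics implementation.

    Reads each wordlist file from `directory`, lowercases every word,
    and returns the unique entries as a sorted list.
    """
    words = set()
    for filename in wordlists:
        path = os.path.join(directory, filename)
        # Assumption: one word per line, UTF-8 encoded.
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                word = line.strip().lower()
                if word:  # skip blank lines
                    words.add(word)
    return sorted(words)
```

Collecting the words in a set removes duplicates that differ only in case across files, such as "Health" in one list and "HEALTH" in another.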

Installation

To install, navigate to the root directory of the module (text2topics/) and run:

pip install .

To update, run:

pip install --upgrade .

Repository: jerielizabeth/text2topics
