Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
requirements		requirements
spellchecker		spellchecker
tests		tests
.gitignore		.gitignore
.travis.yml		.travis.yml
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
README.md		README.md
setup.py		setup.py

Repository files navigation

pyspellchecker

Pure Python Spell Checking based on Peter Norvig's blog post on setting up a simple spell checking algorithm.

It uses a Levenshtein Distance algorithm to find permutations within an edit distance of 2 from the original word. It then compares all permutations (insertions, deletions, replacements, and transpositions) to known words in a word frequency list. Those words that are found more often in the frequency list are more likely the correct results.

Installation

The easiest method to install is using pip:

pip install pyspellchecker

To install from source:

git clone https://github.com/barrust/pyspellchecker.git
cd pyspellchecker
python setup.py install

As always, I highly recommend using the Pipenv package to help manage dependencies!

Quickstart

After installation, using pyspellchecker should be fairly straight forward:

from spellchecker import SpellChecker


spell = SpellChecker()

# find those words that may be misspelled
misspelled = spell.unknown(['something', 'is', 'hapenning', 'here'])

for word in misspelled:
    # Get the one `most likely` answer
    print(spell.correction(word))

    # Get a list of `likely` options
    print(spell.candidates(word))

If the Word Frequency list is not to your liking, you can add additional text to generate a more appropriate list for your use case.

from spellChecker import SpellChecker

spell = SpellChecker()  # loads default word frequency list
spell.word_frequency.load_text_file('./my_free_text_doc.txt')

# if I just want to make sure some words are not flagged as misspelled
spell.word_frequency.load_words(['microsoft', 'apple', 'google'])
spell.known(['microsoft', 'google'])  # will return both now!

More work in storing and loading word frequency lists is planned; stay tuned.

Additional Methods

On-line documentation is in the future; until then you can find SpellChecker here:

correction(word): Returns the most probable result for the misspelled word

candidates(word): Returns a set of possible candidates for the misspelled word

known([words]): Returns those words that are in the word frequency list

unknown([words]): Returns those words that are not in the frequency list

word_probability(word): The frequency of the given word out of all words in the frequency list

The following are less likely to be needed by the user but are available:

edit_distance_1(word): Returns a set of all strings at a Levenshtein Distance of one

edit_distance_2(word): Returns a set of all strings at a Levenshtein Distance of two

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

pyspellchecker

Installation

Quickstart

Additional Methods

The following are less likely to be needed by the user but are available:

About

Releases

Packages

Languages

License

mrjamesriley/pyspellchecker

Folders and files

Latest commit

History

Repository files navigation

pyspellchecker

Installation

Quickstart

Additional Methods

The following are less likely to be needed by the user but are available:

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages