better_profanity

A Python library to clean swear words (and their leetspeak) in strings

Inspired by the profanity package by Ben Friedland, this library is much faster than the original because it uses string comparison instead of regex.

It supports modified spellings (such as p0rn, h4ndjob and handj0b).

Requirements

This package uses Python static typing and therefore only works with Python 3.5+.

Installation

$ pip install better_profanity

Unicode characters

Only Unicode characters from categories Ll, Lu, Mc and Mn are added. More on Unicode categories can be found here.

However, this library does not support all languages yet; Chinese, for example, is not supported.
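
As a rough illustration of what those categories mean, the sketch below uses the standard-library unicodedata module to check whether a character falls in Ll, Lu, Mc or Mn. The helper name in_supported_category is hypothetical and is not part of better_profanity; the library's own character handling may differ.

```python
import unicodedata

# The four Unicode general categories named above.
SUPPORTED_CATEGORIES = {"Ll", "Lu", "Mc", "Mn"}


def in_supported_category(ch):
    """Hypothetical helper: True if ch belongs to one of the four categories."""
    return unicodedata.category(ch) in SUPPORTED_CATEGORIES


print(in_supported_category("я"))       # True  (Ll, lowercase letter)
print(in_supported_category("\u0301"))  # True  (Mn, combining acute accent)
print(in_supported_category("中"))      # False (Lo, other letter)
```

CJK ideographs are in category Lo, which is consistent with the note that Chinese is not supported.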

Usage

By default, on the first .censor() call, the function .load_censor_words() generates all possible leetspeak variants of the words in profanity_wordlist.txt. These variants are then compared against the input text. The full character mapping used by the library can be found in profanity.py.

For example, the word handjob would be expanded into:

'h@ndjob', 'handj0b', 'handj@b', 'h*ndj*b', 'h*ndjob', 'h@ndj0b', 'h@ndj*b', 'h4ndj*b',
'h@ndj@b', 'handjob', 'h4ndj0b', 'h4ndjob', 'h4ndj@b', 'h*ndj0b', 'handj*b', 'h*ndj@b'

This set of words will be stored in memory (~5MB+).
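
The expansion above can be sketched with itertools.product. This is only a minimal sketch, not the library's implementation: the mapping ASSUMED_SUBSTITUTIONS is a partial guess inferred from the handjob example (the real mapping is in profanity.py), and generate_variants is a hypothetical name.

```python
from itertools import product

# Assumption: partial mapping inferred from the "handjob" example only.
# The library's real mapping lives in profanity.py and may differ.
ASSUMED_SUBSTITUTIONS = {
    "a": ("a", "@", "*", "4"),
    "o": ("o", "*", "0", "@"),
}


def generate_variants(word):
    """Return every spelling produced by substituting each mapped character."""
    # Characters without a mapping have a single choice: themselves.
    choices = [ASSUMED_SUBSTITUTIONS.get(ch, (ch,)) for ch in word]
    return {"".join(combo) for combo in product(*choices)}


variants = generate_variants("handjob")
print(len(variants))          # 16
print("h4ndj0b" in variants)  # True
```

Because every combination is stored, the number of variants grows multiplicatively with the number of substitutable characters in a word, which fits the memory figure quoted above.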

1. Censor swear words from a text

By default, profanity replaces each swear word with 4 asterisks (****).

from better_profanity import profanity

if __name__ == "__main__":
    text = "You p1ec3 of sHit."

    censored_text = profanity.censor(text)
    print(censored_text)
    # You **** of ****.

2. Censor doesn't care about word dividers

The function .censor() also hides words that are separated not only by spaces but also by other dividers, such as _, , and .. The characters @, $, *, " and ' are exceptions and are not treated as dividers.

from better_profanity import profanity

if __name__ == "__main__":
    text = "...sh1t...hello_cat_fuck,,,,123"

    censored_text = profanity.censor(text)
    print(censored_text)
    # ...****...hello_cat_****,,,,123
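
One way to picture the divider behaviour is a character-by-character scan that collects word characters and treats everything else as a divider. The sketch below is an assumption about the general idea, not the library's code: censor_sketch, NON_DIVIDERS and the mild placeholder wordlist are all hypothetical.

```python
# Assumption: these characters stay part of a word instead of splitting it,
# mirroring the exceptions listed above.
NON_DIVIDERS = set("@$*\"'")


def censor_sketch(text, banned, censor_char="*"):
    """Replace every banned word with 4 censor characters, keeping dividers."""
    out = []
    word = []

    def flush():
        # Emit the collected word, censored if it is in the banned set.
        if word:
            token = "".join(word)
            out.append(censor_char * 4 if token.lower() in banned else token)
            word.clear()

    for ch in text:
        if ch.isalnum() or ch in NON_DIVIDERS:
            word.append(ch)
        else:
            flush()
            out.append(ch)  # dividers are copied through unchanged
    flush()
    return "".join(out)


print(censor_sketch("...darn...hello_cat_heck,,,,123", {"darn", "heck"}))
# ...****...hello_cat_****,,,,123
```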

3. Censor swear words with custom character

Four instances of the character passed as the second parameter to .censor() are used to replace each swear word.

from better_profanity import profanity

if __name__ == "__main__":
    text = "You p1ec3 of sHit."

    censored_text = profanity.censor(text, '-')
    print(censored_text)
    # You ---- of ----.

4. Check if the string contains any swear words

The function .contains_profanity() returns True if any word in the given string exists in the wordlist.

from better_profanity import profanity

if __name__ == "__main__":
    dirty_text = "That l3sbi4n did a very good H4ndjob."

    print(profanity.contains_profanity(dirty_text))
    # True

5. Censor swear words with a custom wordlist

The function .load_censor_words() takes a List of strings to use as censored words. The provided list replaces the default wordlist.

from better_profanity import profanity

if __name__ == "__main__":
    custom_badwords = ['happy', 'jolly', 'merry']
    profanity.load_censor_words(custom_badwords)

    # The default wordlist has been replaced, so this text is left unchanged.
    print(profanity.censor("Fuck you!"))
    # Fuck you!

    print(profanity.censor("Have a merry day! :)"))
    # Have a **** day! :)

6. Censor Unicode characters

No extra steps needed!

from better_profanity import profanity

if __name__ == "__main__":
    bad_text = "Эффекти́вного противоя́дия от я́да фу́гу не существу́ет до сих пор"
    profanity.load_censor_words(["противоя́дия"])

    censored_text = profanity.censor(bad_text)
    print(censored_text)
    # Эффекти́вного **** от я́да фу́гу не существу́ет до сих пор

Testing

$ python tests.py

Versions

v0.3.2 - Fix a typo in documentation.
v0.3.1 - Remove unused dependencies.
v0.3.0 - Add support for Unicode characters (Categories: Ll, Lu, Mc and Mn) #2.
v0.2.0 - Bug fix + faster censoring
v0.1.0 - Initial release

Contributing

Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us.

License

This project is licensed under the MIT License. See the LICENSE.md file for details.

Special thanks to

Andrew Grinevich - Add support for Unicode characters.

Acknowledgments

Ben Friedland - For the inspiring package profanity.
