A Python library to clean swear words (and their leetspeak) in strings
Inspired by Ben Friedland's profanity package, this library is much faster than the original because it uses string comparison instead of regular expressions.
It supports modified spellings (such as p0rn, h4ndjob and handj0b).
Because it uses Python type hints, this package only works with Python 3.5+.
$ pip install better_profanity
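Only Unicode characters from the categories Ll (lowercase letter), Lu (uppercase letter), Mc (spacing combining mark) and Mn (nonspacing mark) are supported. The Unicode standard's documentation of general categories describes them in more detail.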
However, this library does not yet support all languages; Chinese, for example, is not supported.
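To see which category a character falls into, the standard library's unicodedata module can be queried directly. The following is a minimal sketch, not the library's own code; the names ALLOWED_CATEGORIES and is_allowed are hypothetical and exist only for illustration.

```python
import unicodedata

# The four general categories named in the text above.
ALLOWED_CATEGORIES = {"Ll", "Lu", "Mc", "Mn"}


def is_allowed(char):
    """Return True if the character's Unicode general category is in the set."""
    return unicodedata.category(char) in ALLOWED_CATEGORIES


if __name__ == "__main__":
    # Print each sample character with its category and whether it qualifies.
    for char in ["a", "Я", "\u0301", "中", "1"]:
        print(repr(char), unicodedata.category(char), is_allowed(char))
```

Under this check, Latin and Cyrillic letters and the combining acute accent (U+0301, category Mn) qualify, while CJK ideographs (category Lo) and digits (category Nd) do not.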
By default, the first call to .censor() triggers .load_censor_words(), which generates all possible leetspeak variants of the words in profanity_wordlist.txt. Input text is then compared against these variants. The full character mapping used by the library can be found in profanity.py.
For example, the word handjob would be expanded into:
'h@ndjob', 'handj0b', 'handj@b', 'h*ndj*b', 'h*ndjob', 'h@ndj0b', 'h@ndj*b', 'h4ndj*b',
'h@ndj@b', 'handjob', 'h4ndj0b', 'h4ndjob', 'h4ndj@b', 'h*ndj0b', 'handj*b', 'h*ndj@b'
The resulting set of words is kept in memory (roughly 5 MB or more).
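The expansion above can be reproduced with a Cartesian product over per-character substitutions. This is only a sketch: the SUBSTITUTIONS table below is an assumption inferred from the handjob example alone (the real mapping lives in profanity.py), and leet_variants is a hypothetical helper name.

```python
from itertools import product

# Assumed substitutions, inferred only from the 'handjob' example above.
# The library's actual mapping is defined in profanity.py and may differ.
SUBSTITUTIONS = {
    "a": ("a", "@", "*", "4"),
    "o": ("o", "0", "@", "*"),
}


def leet_variants(word):
    """Return every spelling made by substituting each character independently."""
    # Characters without an entry in the table only map to themselves.
    options = [SUBSTITUTIONS.get(char, (char,)) for char in word]
    return {"".join(combo) for combo in product(*options)}


if __name__ == "__main__":
    variants = leet_variants("handjob")
    print(len(variants))  # 16 (4 choices for 'a' times 4 choices for 'o')
```

Because every substitutable character multiplies the number of variants, long words with many such characters account for most of the memory used by the loaded set.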
By default, profanity replaces each swear word with 4 asterisks (****).
from better_profanity import profanity

if __name__ == "__main__":
    text = "You p1ec3 of sHit."
    censored_text = profanity.censor(text)
    print(censored_text)
    # You **** of ****.
The .censor() function also hides words separated not only by spaces but also by other dividers, such as _, , and ., with the exception of @, $, *, " and '.
from better_profanity import profanity

if __name__ == "__main__":
    text = "...sh1t...hello_cat_fuck,,,,123"
    censored_text = profanity.censor(text)
    print(censored_text)
    # ...****...hello_cat_****,,,,123
The second parameter of .censor() sets the replacement character; 4 instances of it replace each swear word.
from better_profanity import profanity

if __name__ == "__main__":
    text = "You p1ec3 of sHit."
    censored_text = profanity.censor(text, '-')
    print(censored_text)
    # You ---- of ----.
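The divider handling and the replacement character described above could be sketched as a single pass over the text. This is not the library's implementation: censor_sketch and NON_DIVIDERS are hypothetical names, the sketch assumes that every non-alphanumeric character outside the exception list acts as a divider, and it matches words case-insensitively against a plain set without generating leetspeak variants.

```python
# Characters that do NOT split words, taken from the exception list above.
# Assumption: any other non-alphanumeric character is treated as a divider.
NON_DIVIDERS = set("@$*\"'")


def censor_sketch(text, banned, censor_char="*"):
    """Replace each banned word with four copies of censor_char, keeping dividers."""
    result = []
    word = ""
    for char in text:
        if char.isalnum() or char in NON_DIVIDERS:
            word += char  # still inside a word
            continue
        # A divider ends the current word: flush the word, then keep the divider.
        result.append(censor_char * 4 if word.lower() in banned else word)
        result.append(char)
        word = ""
    # Flush the last word in case the text did not end with a divider.
    result.append(censor_char * 4 if word.lower() in banned else word)
    return "".join(result)


if __name__ == "__main__":
    banned_words = {"merry", "jolly"}
    print(censor_sketch("...merry...hello_cat_jolly,,,,123", banned_words))
    print(censor_sketch("Have a Merry day! :)", banned_words, "-"))
```

Scanning character by character keeps every divider in its original position, so the surrounding punctuation is left unchanged around each replaced word.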
The .contains_profanity() function returns True if any word in the given string exists in the wordlist.
from better_profanity import profanity

if __name__ == "__main__":
    dirty_text = "That l3sbi4n did a very good H4ndjob."
    print(profanity.contains_profanity(dirty_text))
    # True
The .load_censor_words() function takes a List of strings to use as censored words.
The provided list replaces the default wordlist.
from better_profanity import profanity

if __name__ == "__main__":
    custom_badwords = ['happy', 'jolly', 'merry']
    profanity.load_censor_words(custom_badwords)

    # The default wordlist has been replaced, so this is no longer flagged.
    print(profanity.contains_profanity("Fuck you!"))
    # False

    print(profanity.contains_profanity("Have a merry day! :)"))
    # True
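When the custom words live in a text file, they first need to be read into a list. The helper below is a small sketch using only the standard library; read_wordlist is a hypothetical name, and the one-word-per-line file layout is an assumption.

```python
from pathlib import Path


def read_wordlist(path):
    """Return the non-empty lines of a UTF-8 text file, stripped of whitespace."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    # Skip blank lines so that an empty string never ends up in the wordlist.
    return [line.strip() for line in lines if line.strip()]
```

The returned list can then be passed to profanity.load_censor_words() in the same way as custom_badwords above.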
Censoring Unicode text needs no extra steps:
from better_profanity import profanity

if __name__ == "__main__":
    bad_text = "Эффекти́вного противоя́дия от я́да фу́гу не существу́ет до сих пор"
    profanity.load_censor_words(["противоя́дия"])
    # Pass the variable defined above to censor().
    censored_text = profanity.censor(bad_text)
    print(censored_text)
    # Эффекти́вного **** от я́да фу́гу не существу́ет до сих пор
$ python tests.py
- v0.3.2 - Fix a typo in documentation.
- v0.3.1 - Remove unused dependencies.
- v0.3.0 - Add support for Unicode characters (Categories: Ll, Lu, Mc and Mn) #2.
- v0.2.0 - Bug fix + faster censoring
- v0.1.0 - Initial release
Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us.
This project is licensed under the MIT License; see the LICENSE.md file for details.
- Andrew Grinevich - Add support for Unicode characters.
- Ben Friedland - For the inspiring package profanity.