A Python package for bi-directional transliteration of Cyrillic script to Latin script and vice versa.
By default, it transliterates Serbian. A language flag can be set to transliterate to and from Bulgarian, Montenegrin, Macedonian, Mongolian, Russian, Serbian, Tajik, or Ukrainian.
Transliteration is the conversion of a text from one script to another. For instance, a Latin alphabet transliteration of the Serbian phrase "Мој ховеркрафт је пун јегуља" is "Moj hoverkraft je pun jegulja".
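To make the idea concrete, here is a minimal sketch of letter-by-letter transliteration in plain Python. It does not use CyrTranslit: the `CYR_TO_LAT` table and the `to_latin_sketch` function are hypothetical names made up for this illustration, and the table covers only the letters needed for the phrase above.

```python
# Illustrative subset of a Serbian Cyrillic-to-Latin table.
# This is an assumption for the example, not CyrTranslit's own mapping.
CYR_TO_LAT = {
    "М": "M", "а": "a", "в": "v", "г": "g", "е": "e", "ј": "j",
    "к": "k", "љ": "lj", "н": "n", "о": "o", "п": "p", "р": "r",
    "т": "t", "у": "u", "ф": "f", "х": "h",
}


def to_latin_sketch(text):
    """Replace each mapped Cyrillic letter; leave everything else unchanged."""
    # str.maketrans accepts single-character keys and string values of any length,
    # so one Cyrillic letter can become two Latin letters ("љ" becomes "lj").
    return text.translate(str.maketrans(CYR_TO_LAT))


print(to_latin_sketch("Мој ховеркрафт је пун јегуља"))  # Moj hoverkraft je pun jegulja
```

Note that a single Cyrillic letter can map to two Latin letters, which is what makes the reverse direction less straightforward.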
CyrTranslit is hosted on the Python Package Index (PyPI), so it can be installed using pip:
python -m pip install cyrtranslit # latest version
python -m pip install cyrtranslit==1.0 # specific version
python -m pip install "cyrtranslit>=1.0" # minimum version (quoted so the shell does not treat > as a redirect)
CyrTranslit currently supports bi-directional transliteration of Bulgarian, Montenegrin, Macedonian, Mongolian, Russian, Serbian, Tajik, and Ukrainian:
>>> import cyrtranslit
>>> cyrtranslit.supported()
['bg', 'me', 'mk', 'mn', 'ru', 'sr', 'tj', 'ua']
>>> import cyrtranslit
>>> cyrtranslit.to_latin("Съединението прави силата!", "bg")
"Săedinenieto pravi silata!"
>>> cyrtranslit.to_cyrillic("Săedinenieto pravi silata!", "bg")
"Съединението прави силата!"
>>> import cyrtranslit
>>> cyrtranslit.to_latin("Република", "me")
"Republika"
>>> cyrtranslit.to_cyrillic("Republika", "me")
"Република"
>>> import cyrtranslit
>>> cyrtranslit.to_latin("Моето летачко возило е полно со јагули", "mk")
"Moeto letačko vozilo e polno so jaguli"
>>> cyrtranslit.to_cyrillic("Moeto letačko vozilo e polno so jaguli", "mk")
"Моето летачко возило е полно со јагули"
>>> import cyrtranslit
>>> cyrtranslit.to_latin("Амрагаа Сүнжидмаагаа гэсээр ирлээ дээ хө-хө-хө", "mn")
"Amragaa Sünjidmaagaa geseer irlee dee khö-khö-khö"
>>> cyrtranslit.to_cyrillic("Amragaa Sünjidmaagaa geseer irlee dee khö-khö-khö", "mn")
"Амрагаа Сүнжидмаагаа гэсээр ирлээ дээ хө-хө-хө"
>>> import cyrtranslit
>>> cyrtranslit.to_latin("Моё судно на воздушной подушке полно угрей", "ru")
"Moyo sudno na vozdushnoj podushke polno ugrej"
>>> cyrtranslit.to_cyrillic("Moyo sudno na vozdushnoj podushke polno ugrej", "ru")
"Моё судно на воздушной подушке полно угрей"
>>> import cyrtranslit
>>> cyrtranslit.to_latin("Мој ховеркрафт је пун јегуља")
"Moj hoverkraft je pun jegulja"
>>> cyrtranslit.to_cyrillic("Moj hoverkraft je pun jegulja")
"Мој ховеркрафт је пун јегуља"
>>> import cyrtranslit
>>> cyrtranslit.to_latin("Ман мактуб навишта истодам", "tj")
"Man maktub navišta istodam"
>>> cyrtranslit.to_cyrillic("Man maktub navišta istodam", "tj")
"Ман мактуб навишта истодам"
>>> import cyrtranslit
>>> cyrtranslit.to_latin("Під лежачий камінь вода не тече", "ua")
"Pid ležačyj kamin' voda ne teče"
>>> cyrtranslit.to_cyrillic("Pid ležačyj kamin' voda ne teče", "ua")
"Під лежачий камінь вода не тече"
You can include support for other Cyrillic script alphabets. Follow these steps in order to do so:
- Create a new transliteration dictionary in the mapping.py file and reference it in the TRANSLIT_DICT dictionary.
- Watch out for cases where two consecutive Latin alphabet letters are meant to transliterate into a single Cyrillic script letter. These cases need to be explicitly checked for inside the to_cyrillic() function in __init__.py.
- Add test cases inside of tests.py.
- Update the documentation in the README.md and in the doc directory.
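The first two steps can be sketched as follows. This is a standalone illustration under stated assumptions, not the code in mapping.py or __init__.py: the language code "xx", the dictionary names `XX_CYR_TO_LAT` and `XX_LAT_TO_CYR`, and the function `to_cyrillic_sketch` are all hypothetical, and the real to_cyrillic() function may handle two-letter cases differently.

```python
# Hypothetical mapping for an imaginary language code "xx".
# These names are illustrative and are not taken from CyrTranslit's mapping.py.
XX_CYR_TO_LAT = {
    "а": "a", "г": "g", "е": "e", "ј": "j", "л": "l",
    "љ": "lj", "н": "n", "њ": "nj", "у": "u",
}

# Reverse mapping, derived from the forward one.
XX_LAT_TO_CYR = {lat: cyr for cyr, lat in XX_CYR_TO_LAT.items()}


def to_cyrillic_sketch(text, lat_to_cyr=XX_LAT_TO_CYR):
    """Transliterate Latin to Cyrillic, trying two-letter sequences first."""
    result = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in lat_to_cyr:
            # Two consecutive Latin letters that stand for one Cyrillic letter.
            result.append(lat_to_cyr[pair])
            i += 2
        else:
            # Single letter; unmapped characters pass through unchanged.
            result.append(lat_to_cyr.get(text[i], text[i]))
            i += 1
    return "".join(result)


print(to_cyrillic_sketch("jegulja"))  # јегуља
```

Checking the two-letter sequence before the single letter is what keeps "lj" from being split into "л" and "ј". Test cases for a new language should include words that exercise every such pair.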
A big thank you to everyone who contributed:
- @Syndamia / Bulgarian 🇧🇬
- @ratijas / Russian 🇷🇺
- @diejani / Tajik 🇹🇯
- @AnonymousVoice1 / Ukrainian 🇺🇦
- @Serbipunk / Mongolian 🇲🇳
A citation would be much appreciated if you use CyrTranslit in a research publication:
BibTeX entry:
@software{georges_labreche_2021_4643047,
  author    = {Georges Labrèche},
  title     = {CyrTranslit},
  month     = mar,
  year      = 2021,
  note      = {{A Python package for bi-directional
                transliteration of Cyrillic script to Latin script
                and vice versa. Supports Bulgarian, Montenegrin,
                Macedonian, Mongolian, Russian, Serbian, Tajik, and
                Ukrainian.}},
  publisher = {Zenodo},
  version   = {v1.0},
  doi       = {10.5281/zenodo.4643047},
  url       = {https://doi.org/10.5281/zenodo.4643047}
}