A toolbox for phoneme-related text processing. It handles Chinese, English, and mixed Chinese-English text. Chinese characters are transcribed using the Tsinghua University phoneme set, and English text is handled as letters and words.
- Text to Pinyin
- Pinyin to Phoneme
- Text to Phoneme
- Text to ID list
Number pronunciation: read a number digit by digit or by its value.
Text conversion: full-width/half-width conversion and Simplified/Traditional Chinese conversion.
After a successful run of the text-to-phoneme examples, the command line should display the following:
# SequenceUtils.text2pinyin("文本转为拼音。")
wen2 ben3 zhuan3 wei4 pin1 yin1。
# SequenceUtils.pinyin2phoneme(SequenceUtils.text2pinyin("拼音转为音素。"))
[p, in, 1, -, ii, in, 1, -, zh, uan, 3, -, uu, ui, 4, -, ii, in, 1, -, s, u, 4, -, ., -, ~, _]
# SequenceUtils.text2phoneme("文本转为音素。")
[uu, un, 2, -, b, en, 2, -, zh, uan, 3, -, uu, ui, 4, -, ii, in, 1, -, s, u, 4, -, ., -, ~, _]
# SequenceUtils.text2sequence("文本转为ID列表。")
[25, 63, 72, 2, 4, 37, 72, 2, 29, 59, 73, 2, 25, 62, 74, 2, 2, 15, 45, 74, 2, 4, 44, 73, 2, 130, 2, 1, 0]
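The last step above, turning a phoneme list into an ID list, can be pictured as a lookup in a symbol table. The following is only a minimal sketch under that assumption: the class `PhonemeIdSketch`, its method `toIds`, and the six-entry table are hypothetical and are not the toolbox's actual symbol set, ordering, or implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: NOT the toolbox's real symbol table or code.
class PhonemeIdSketch {
    private final Map<String, Integer> symbolToId = new HashMap<>();

    PhonemeIdSketch(List<String> symbols) {
        // Assumption: a symbol's ID is simply its position in the table.
        for (int i = 0; i < symbols.size(); i++) {
            symbolToId.put(symbols.get(i), i);
        }
    }

    // Map each phoneme symbol to its integer ID, failing loudly on unknown symbols.
    List<Integer> toIds(List<String> phonemes) {
        List<Integer> ids = new ArrayList<>();
        for (String phoneme : phonemes) {
            Integer id = symbolToId.get(phoneme);
            if (id == null) {
                throw new IllegalArgumentException("Unknown symbol: " + phoneme);
            }
            ids.add(id);
        }
        return ids;
    }

    public static void main(String[] args) {
        // Illustrative table only; the real toolbox defines its own symbols.
        PhonemeIdSketch sketch =
                new PhonemeIdSketch(Arrays.asList("_", "~", "-", "b", "en", "2"));
        System.out.println(sketch.toIds(Arrays.asList("b", "en", "2", "-", "~", "_")));
        // prints [3, 4, 5, 2, 1, 0]
    }
}
```

A position-based table keeps the mapping deterministic, which matters when the IDs are fed to a model trained against the same table.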
After a successful run of the number pronunciation examples, the command line should display the following:
# NumberUtils.sayDigit("1234567890123456")
# NumberUtils.sayNumber("123456")
# NumberUtils.sayDecimal("3.14")
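To illustrate what "reading by digit" means, here is a minimal sketch that replaces each ASCII digit with a Chinese numeral. The class `DigitReadingSketch` and the method `readDigits` are hypothetical names. This is not the implementation of `NumberUtils.sayDigit`, whose actual output is not shown here and may differ (for example in how it renders particular digits).

```java
// Hypothetical sketch of digit-by-digit reading; not the toolbox's NumberUtils.
class DigitReadingSketch {
    // Assumption: plain numerals 零 to 九, indexed by digit value.
    private static final String[] DIGITS =
            {"零", "一", "二", "三", "四", "五", "六", "七", "八", "九"};

    // Replace every ASCII digit with its numeral; reject anything else.
    static String readDigits(String number) {
        StringBuilder sb = new StringBuilder();
        for (char c : number.toCharArray()) {
            if (c < '0' || c > '9') {
                throw new IllegalArgumentException("Not a digit: " + c);
            }
            sb.append(DIGITS[c - '0']);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(readDigits("2024")); // prints 二零二四
    }
}
```

Reading by value would additionally need place-value units (十, 百, 千, 万, ...) and rules for zeros, which this sketch deliberately leaves out.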
After a successful run of the text conversion examples, the command line should display the following:
# Half-width to Full-width ConvertUtils.ban2quan("aA1 ,:$。、")
ａＡ１　，：＄。、
# Full-width to Half-width ConvertUtils.quan2ban("ａＡ１　，：＄。、")
aA1 ,:$。、
# Simplified to Traditional ConvertUtils.jian2fan("中国语言")
# Traditional to Simplified ConvertUtils.fan2jian("中國語言")
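Full-width and half-width conversion is commonly done with a fixed Unicode offset: printable ASCII (U+0021 to U+007E) maps to the full-width forms (U+FF01 to U+FF5E) by adding 0xFEE0, and the space maps to the ideographic space U+3000. The sketch below assumes that approach. The class `WidthConversionSketch` and its methods are hypothetical and are not the code behind `ConvertUtils.ban2quan` or `ConvertUtils.quan2ban`.

```java
// Hypothetical sketch of offset-based width conversion; not the toolbox's ConvertUtils.
class WidthConversionSketch {
    private static final int OFFSET = 0xFEE0;            // full-width minus half-width code point
    private static final char IDEOGRAPHIC_SPACE = '\u3000';

    // Half-width to full-width: shift printable ASCII, map space separately.
    static String toFullWidth(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            if (c == ' ') {
                sb.append(IDEOGRAPHIC_SPACE);
            } else if (c >= 0x21 && c <= 0x7E) {
                sb.append((char) (c + OFFSET));
            } else {
                sb.append(c);                            // e.g. CJK punctuation stays as-is
            }
        }
        return sb.toString();
    }

    // Full-width to half-width: the inverse mapping.
    static String toHalfWidth(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            if (c == IDEOGRAPHIC_SPACE) {
                sb.append(' ');
            } else if (c >= 0xFF01 && c <= 0xFF5E) {
                sb.append((char) (c - OFFSET));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String full = toFullWidth("aA1 ,:$");
        System.out.println(full);
        System.out.println(toHalfWidth(full));           // prints aA1 ,:$
    }
}
```

Characters outside both ranges, such as `。` and `、`, pass through unchanged in either direction. Simplified/Traditional conversion cannot be done with an offset and typically relies on a character mapping table instead.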