Open Chinese Convert (OpenCC, 開放中文轉換) is an open-source project for converting between Traditional Chinese, Simplified Chinese and Japanese Kanji (Shinjitai). It supports character-level and phrase-level conversion, character variant conversion, and regional idioms among Mainland China, Taiwan and Hong Kong. It is not a translation tool between varieties such as Mandarin and Cantonese.
Discussion (Telegram): https://t.me/open_chinese_convert
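- Strictly distinguishes "one simplified character to many traditional characters" (一簡對多繁) from "one simplified character to many variants" (一簡對多異).
- Fully compatible with variant characters, which can be substituted dynamically.
- One-to-many simplified-to-traditional entries are strictly reviewed, following the principle "split wherever possible rather than merge" (能分則不合).
- Supports conversion of variant characters and regional idioms among Mainland China, Taiwan and Hong Kong, e.g. 「裏」 and 「裡」, 「鼠標」 and 「滑鼠」.
- Dictionaries are fully separated from the library code, so they can be freely modified, imported and extended.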
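To make the difference between character-level and phrase-level conversion concrete, here is a minimal Python sketch. It is not OpenCC's implementation: the two dictionaries are tiny, hand-made samples invented for illustration, and the greedy longest-match loop is an assumed simplification. It shows why a one-to-many character such as 发 (發 or 髮) needs phrase entries to be converted correctly.

```python
# Toy dictionaries, invented for illustration; NOT OpenCC's real data.
PHRASES = {"头发": "頭髮", "发展": "發展"}   # phrase-level entries
CHARS = {"头": "頭", "发": "發"}             # character-level fallback


def convert(text, phrases=PHRASES, chars=CHARS):
    """Greedy longest-match conversion: try phrases first, then single characters."""
    max_len = max((len(p) for p in phrases), default=1)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate phrase starting at position i.
        for length in range(min(max_len, len(text) - i), 1, -1):
            chunk = text[i:i + length]
            if chunk in phrases:
                out.append(phrases[chunk])
                i += length
                break
        else:
            # No phrase matched: fall back to the character table,
            # leaving unknown characters unchanged.
            out.append(chars.get(text[i], text[i]))
            i += 1
    return "".join(out)


print(convert("头发和发展"))
```

With only the character table, 发 would always become 發. The phrase entry for 头发 is what selects 髮 in that context.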
See Download.
Warning: This is NOT an API. You will be banned if you make calls programmatically.
npm: npm install opencc
// JavaScript (Node.js)
const OpenCC = require('opencc');
const converter = new OpenCC('s2t.json');
converter.convertPromise("汉字").then(converted => {
  console.log(converted);  // 漢字
});
// TypeScript
import { OpenCC } from 'opencc';
async function main() {
  const converter: OpenCC = new OpenCC('s2t.json');
  const result: string = await converter.convertPromise('汉字');
  console.log(result);  // 漢字
}
main();
See demo.js and ts-demo.ts.
PyPI: pip install opencc
(Windows, Linux, Mac)
import opencc
converter = opencc.OpenCC('s2t.json')
converter.convert('汉字') # 漢字
// C++
#include "opencc.h"
int main() {
  // SimpleConverter lives in the opencc namespace.
  const opencc::SimpleConverter converter("s2t.json");
  converter.Convert("汉字");  // 漢字
  return 0;
}
/* C */
#include <string.h>  /* strlen */
#include "opencc.h"
int main() {
  opencc_t opencc = opencc_open("s2t.json");
  const char* input = "汉字";
  char* converted = opencc_convert_utf8(opencc, input, strlen(input));  /* 漢字 */
  opencc_convert_utf8_free(converted);
  opencc_close(opencc);
  return 0;
}
Documentation: https://byvoid.github.io/OpenCC/
opencc --help
opencc_dict --help
opencc_phrase_extract --help
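The command-line tool can also be driven from a script. The Python sketch below assumes that `opencc` is on the PATH and accepts the `-c` (config), `-i` (input file) and `-o` (output file) options. Check `opencc --help` on your installation before relying on them. The helper names and file names are hypothetical.

```python
import shutil
import subprocess


def build_opencc_command(config, input_path, output_path):
    """Return the argv list for one opencc run (options assumed from `opencc --help`)."""
    return ["opencc", "-c", config, "-i", input_path, "-o", output_path]


def convert_file(config, input_path, output_path):
    """Run opencc on a file, failing early if the executable is missing."""
    if shutil.which("opencc") is None:
        raise FileNotFoundError("opencc executable not found on PATH")
    subprocess.run(build_opencc_command(config, input_path, output_path), check=True)


# Hypothetical file names, shown only to illustrate the resulting command line.
print(" ".join(build_opencc_command("s2t.json", "input.txt", "output.txt")))
```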
- Swift (iOS): SwiftyOpenCC
- Java: opencc4j
- Android: android-opencc
- PHP: opencc4php
- WebAssembly: wasm-opencc
- s2t.json: Simplified Chinese to Traditional Chinese
- t2s.json: Traditional Chinese to Simplified Chinese
- s2tw.json: Simplified Chinese to Traditional Chinese (Taiwan Standard)
- tw2s.json: Traditional Chinese (Taiwan Standard) to Simplified Chinese
- s2hk.json: Simplified Chinese to Traditional Chinese (Hong Kong variant)
- hk2s.json: Traditional Chinese (Hong Kong variant) to Simplified Chinese
- s2twp.json: Simplified Chinese to Traditional Chinese (Taiwan Standard) with Taiwanese idiom
- tw2sp.json: Traditional Chinese (Taiwan Standard) to Simplified Chinese with Mainland Chinese idiom
- t2tw.json: Traditional Chinese (OpenCC Standard) to Taiwan Standard
- hk2t.json: Traditional Chinese (Hong Kong variant) to Traditional Chinese (OpenCC Standard)
- t2hk.json: Traditional Chinese (OpenCC Standard) to Hong Kong variant
- t2jp.json: Traditional Chinese Characters (Kyūjitai) to New Japanese Kanji (Shinjitai)
- jp2t.json: New Japanese Kanji (Shinjitai) to Traditional Chinese Characters (Kyūjitai)
- tw2t.json: Traditional Chinese (Taiwan Standard) to Traditional Chinese (OpenCC Standard)
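The configuration names above appear to follow a pattern: a source code, the digit `2`, a target code, and an optional trailing `p` for regional idioms. The Python sketch below decodes a name on that assumption. The pattern is inferred from the list, not taken from OpenCC's code, and `describe` is a hypothetical helper.

```python
# Codes inferred from the configuration list; an assumption, not an official table.
CODES = {
    "s": "Simplified Chinese",
    "t": "Traditional Chinese (OpenCC Standard)",
    "tw": "Traditional Chinese (Taiwan Standard)",
    "hk": "Traditional Chinese (Hong Kong variant)",
    "jp": "New Japanese Kanji (Shinjitai)",
}


def describe(config_name):
    """Turn a name such as 's2twp.json' into a readable description."""
    stem = config_name[:-5] if config_name.endswith(".json") else config_name
    source, sep, target = stem.partition("2")
    with_idiom = False
    # A trailing 'p' on an otherwise known target marks idiom conversion.
    if target not in CODES and target.endswith("p"):
        target = target[:-1]
        with_idiom = True
    if not sep or source not in CODES or target not in CODES:
        raise ValueError(f"unrecognized configuration name: {config_name}")
    text = f"{CODES[source]} to {CODES[target]}"
    return text + " with regional idioms" if with_idiom else text


print(describe("s2twp.json"))
print(describe("t2jp.json"))
```

Note that `jp` is checked as a whole code before the trailing `p` rule is applied, so `t2jp.json` is not misread as an idiom variant.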
g++ 4.6+ or clang 3.2+ is required.
make (Linux, macOS)
build.cmd (Windows)
make test (Linux, macOS)
test.cmd (Windows)
make benchmark
Example results (from Travis CI):
1: ------------------------------------------------------------------
1: Benchmark                          Time           CPU Iterations
1: ------------------------------------------------------------------
1: BM_Initialization/hk2s       1587215 ns    1587485 ns        435
1: BM_Initialization/hk2t        126112 ns     125976 ns       5384
1: BM_Initialization/jp2t        245646 ns     245414 ns       2847
1: BM_Initialization/s2hk      26017749 ns   26017390 ns         27
1: BM_Initialization/s2t       26298084 ns   26296375 ns         27
1: BM_Initialization/s2tw      26483120 ns   26482164 ns         27
1: BM_Initialization/s2twp     26455564 ns   26454666 ns         26
1: BM_Initialization/t2hk         44759 ns      44636 ns      15733
1: BM_Initialization/t2jp        143401 ns     143227 ns       4876
1: BM_Initialization/t2s        1374298 ns    1373979 ns        510
1: BM_Initialization/tw2s       1443389 ns    1443701 ns        464
1: BM_Initialization/tw2sp      1699645 ns    1699823 ns        399
1: BM_Initialization/tw2t         76294 ns      76083 ns       9229
1: BM_Convert                       581 ms        581 ms          1
1/1 Test #1: BenchmarkTest .................... Passed 14.49 sec
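The results mix nanoseconds and milliseconds, which makes them hard to compare at a glance. The following Python sketch parses lines in the format shown above and normalizes the wall-clock time to milliseconds. The regular expression and the `parse_benchmarks` helper are assumptions written for this output format only.

```python
import re

# Three lines copied from the example results above.
SAMPLE = """\
1: BM_Initialization/s2t 26298084 ns 26296375 ns 27
1: BM_Initialization/t2hk 44759 ns 44636 ns 15733
1: BM_Convert 581 ms 581 ms 1
"""

UNIT_TO_MS = {"ns": 1e-6, "us": 1e-3, "ms": 1.0, "s": 1e3}
LINE = re.compile(
    r"^\d+:\s+(\S+)\s+([\d.]+)\s+(ns|us|ms|s)\s+([\d.]+)\s+(ns|us|ms|s)\s+(\d+)\s*$"
)


def parse_benchmarks(text):
    """Map each benchmark name to its wall-clock time in milliseconds."""
    results = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match:  # header and separator lines do not match and are skipped
            name, wall, unit = match.group(1), float(match.group(2)), match.group(3)
            results[name] = wall * UNIT_TO_MS[unit]
    return results


for name, ms in parse_benchmarks(SAMPLE).items():
    print(f"{name}: {ms:.3f} ms")
```

On these numbers, initializing s2t takes about 26.3 ms, while t2hk takes about 0.045 ms.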
Apache License 2.0
- darts-clone BSD License
- marisa-trie BSD License
- tclap MIT License
- rapidjson MIT License
- Google Test BSD License
All these libraries are statically linked.
- Introduction: https://github.com/BYVoid/OpenCC/wiki/%E7%B7%A3%E7%94%B1
- Table of one-to-many simplified-to-traditional character meanings in modern Chinese (現代漢語常用簡繁一對多字義辨析表): http://ytenx.org/byohlyuk/KienxPyan
- BYVoid
- 佛振
- Peng Huang
- LI Daobing
- Kefu Chai
- Kan-Ru Chen
- Ma Xiaojun
- Jiang Jiang
- Ruey-Cheng Chen
- Paul Meng
- Lawrence Lau
- 瑾昀
- 內木一郎
- Marguerite Su
- Brian White
- Qijiang Fan
- LEOYoon-Tsaw
- Steven Yao
- Pellaeon Lin
- stony
- steelywing
- 吕旭东
- Weng Xuetian
- Ma Tao
- Heinz Wiesinger
- J.W
- Amo Wu
- Mark Tsai
- Zhe Wang
- sgqy
- Qichuan (Sean) ZHANG
- Flandre Scarlet
- 宋辰文
- iwater
- Xpol Wan
- Weihang Lo
- Cychih
- kyleskimo
- Ryuan Choi
- Tony Able
- Xiao Liang
Please update this list if you have contributed to OpenCC.