Commit ce72444: Update README.md
Authored by Morizeyao on Dec 7, 2019 (1 parent: 81d6b43)
1 changed file: README.md (4 additions, 0 deletions)

@@ -5,6 +5,10 @@
- Chinese GPT2 training code, using either BERT's tokenizer or a sentencepiece BPE model (thanks to [kangzhonghua](https://github.com/kangzhonghua) for the contribution; BPE mode requires a slight modification of train.py). It is built on the excellent [Transformers](https://github.com/huggingface/transformers) repository from the HuggingFace team. It can write poems, news, and novels, or train general language models. Supports char-level, word-level (segmented), and BPE tokenization (the latter two require slight changes to train.py), as well as large training corpora. A minimal tokenizer sketch follows below.
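As a quick illustration of the char-level mode, the sketch below loads the public `bert-base-chinese` vocabulary through HuggingFace's `BertTokenizer` and converts a line of Chinese text into token ids. This is a minimal, hypothetical example for orientation only; it assumes the `transformers` package is installed and is not taken from this repository's train.py.

```python
# Minimal sketch: char-level tokenization of Chinese text with the BERT tokenizer
# from HuggingFace transformers. Assumes the `transformers` package and the public
# `bert-base-chinese` vocabulary; this is an illustration, not the repository's
# own training pipeline.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")

text = "秦川雄帝宅,函谷壮皇居。"  # a line of classical Chinese poetry
tokens = tokenizer.tokenize(text)              # each Chinese character becomes its own token
ids = tokenizer.convert_tokens_to_ids(tokens)  # vocabulary ids that a GPT2 model would train on

print(tokens)
print(ids)
```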

## NEWS 12.7.2019

- A new project, [Decoders-Chinese-TF2.0](https://github.com/Morizeyao/Decoders-Chinese-TF2.0), also supports Chinese GPT2 training. It is simpler to use and less prone to issues. It is still in the testing stage; feedback is welcome.

## NEWS 11.9

- [GPT2-ML](https://github.com/imcaspar/gpt2-ml) (not directly affiliated with this project) has been released and includes a 1.5B-parameter Chinese GPT2 model. If you are interested, you can convert it to the PyTorch format supported by this project for further training or generation tests.