Awesome-Chinese-LLM

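Large language models (LLMs) such as ChatGPT have shown the potential to become artificial general intelligence (AGI) and have drawn wide attention from ~~the NLP community~~ all sectors of society. However, most existing projects that catalog LLMs focus on English corpora and English LLMs, which hinders the development of high-quality Chinese conversational LLMs.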

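To support the development of Chinese LLMs, this project collects available Chinese LLMs and open-source Chinese datasets. Contributions that extend the list are welcome.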

Awesome-Chinese-LLM
- Chinese LLMs
- Open-source data
Coming soon
- Chinese LLM evaluation

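Chinese LLMs

https://github.com/ydli-ai/Chinese-ChatLLaMA Chinese ChatLLaMA

https://github.com/LianjiaTech/BELLE Based on BLOOM and LLaMA, optimized for Chinese

https://github.com/THUDM/GLM Chinese/English pretrained model open-sourced by Tsinghua THUDM

https://github.com/THUDM/ChatGLM-6B Bilingual dialogue model open-sourced by Tsinghua THUDM

https://github.com/PhoebusSi/Alpaca-CoT Builds on Alpaca, tuned with chain-of-thought (CoT) data

https://github.com/ymcui/Chinese-LLaMA-Alpaca/tree/v1.0 Chinese LLaMA & Alpaca large language models, with local deployment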

Open-source data

https://github.com/XueFuzhao/InstructionWild Colossal-AI's self-instruct dataset, in Chinese and English

https://github.com/LianjiaTech/BELLE/blob/main/zh_seed_tasks.json lianjia.tech's self-instruct dataset

https://huggingface.co/datasets/BelleGroup/generated_train_1M_CN Chinese dataset (1M) generated by lianjia.tech following Stanford Alpaca

https://huggingface.co/datasets/BelleGroup/generated_train_0.5M_CN Chinese dataset (0.5M) generated by lianjia.tech following Stanford Alpaca

https://zhuanlan.zhihu.com/p/163616279/ Several Chinese pretraining corpora, summarized by HelloNLP

https://github.com/dbiir/UER-py/wiki/%E9%A2%84%E8%AE%AD%E7%BB%83%E6%95%B0%E6%8D%AE CLUECorpusSmall, News Commentary v13

https://github.com/ydli-ai/Chinese-ChatLLaMA#%E4%B8%AD%E6%96%87%E6%8C%87%E4%BB%A4%E6%95%B0%E6%8D%AE%E9%9B%86 Several Chinese instruction datasets

Chinese LLM evaluation

To be added.
