Aims to track the development of open-source and open large language models worldwide. Contributions are welcome via:
- Leads
- Materials
- PRs
- Issues
No. | Name | Parameters | Training data | Notes |
---|---|---|---|---|
1 | LLaMA-2 | 7B,13B,34B,70B | 2T | Commercial use permitted |
2 | Falcon | 7B,40B,180B | 3.5T | RefinedWeb dataset |
3 | baichuan-2 | 7B,13B | 2.6T | Open; commercial use requires a license; see also baichuan-1 |
4 | InternLM | 7B,20B | 2.3T | Open; commercial use requires a license |
5 | BLOOM | 3B,7.1B,176B | 366B | Commercial use permitted, the most permissive terms; detailed introduction available |
6 | GALACTICA | 6.7B,30B,120B | 106B | Open scientific text and data |
7 | LLaMA | 7B,13B,30B,65B | 1.4T | Meta; code open-sourced, weights "leaked"; commercial use not permitted; detailed introduction available |
8 | MOSS-moon | 16B | 700B | 6.67×10²² training FLOPs (see the estimate below the table) |
9 | ChatGLM2 | 6B | 1.4T | |
10 | StableLM | 3B,7B | 800B | |
11 | RedPajama-INCITE | 3B,7B | 1T | |
12 | GPT-NeoX | 20B | 3.15M | The Pile dataset (~800GB) |
13 | OpenLLaMA | 3B,7B,13B | 1T | |
14 | MPT | 7B,30B | 1T | |
15 | Pythia | 2.8B,6.9B,12B | 300B | |
16 | XGen | 7B | 1.5T | |
17 | OPT | 6.7B,13B,30B,66B,175B | 180B | |
18 | Qwen | 7B,14B | 2.4T,3.0T | |
19 | XVERSE | 13B,65B | 1.4T,2.6T | |
20 | Aquila2 | 7B,34B | 2T | |
21 | Prithvi | 100M | | IBM + NASA, geospatial (imagery) |
22 | Skywork | 13B | 3.2T | Kunlun Wanwei (昆仑万维) Tiangong (天工) |
23 | Deepseek Coder | 1.3B,6.7B,33B | 2T | A series of code language models trained on 87% code and 13% natural language (English and Chinese), each pre-trained on 2T tokens |
24 | Aquila | 7B | | BAAI Wudao Aquila (悟道·天鹰) |
25 | Yi | 6B,34B | 3T | |
26 | Mistral | 7B | | Europe (Mistral AI) |
27 | Yuan-2 | 2B,51B,102B | | |
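
The training-compute figure in the MOSS-moon row can be sanity-checked with the common rule of thumb that dense-transformer training costs roughly 6 FLOPs per parameter per training token. The sketch below only reuses the 16B-parameter and 700B-token figures from the table; the 6·N·D approximation is a rough estimate, not the exact number reported for MOSS:

```python
def estimate_training_flops(n_params: float, n_tokens: float) -> float:
    """Rule-of-thumb training compute for a dense transformer:
    roughly 6 FLOPs per parameter per token (forward + backward)."""
    return 6 * n_params * n_tokens

# MOSS-moon row above: 16B parameters, 700B training tokens
print(f"{estimate_training_flops(16e9, 700e9):.2e}")  # ~6.72e+22, close to the 6.67e22 listed
```
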
Fine-tuned and derivative models:
- WizardLM, WizardMath, WizardCoder
- Alpaca
- Vicuna
- Guanaco
- CodeLLaMA
  - 7B, 13B, 34B; based on LLaMA-2, with roughly 650B additional code tokens used for continued pre-training and fine-tuning (see the sketch after this list)
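
As a rough illustration of what this kind of continued pre-training on code looks like in practice, below is a minimal sketch using Hugging Face transformers and datasets. The base checkpoint, dataset, and hyperparameters are illustrative placeholders, not the actual CodeLLaMA recipe:

```python
# Minimal sketch of continued pre-training a base LLM on code tokens.
# Checkpoint name, dataset, and hyperparameters are illustrative only.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "meta-llama/Llama-2-7b-hf"  # assumed starting checkpoint (gated repo; any causal LM works)
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# Any corpus of source code with a text column; the column name "content"
# matches The Stack-style datasets but is an assumption here.
code_corpus = load_dataset("bigcode/the-stack-smol", split="train")

def tokenize(batch):
    return tokenizer(batch["content"], truncation=True, max_length=2048)

tokenized = code_corpus.map(tokenize, batched=True,
                            remove_columns=code_corpus.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="llama2-code-continued",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        learning_rate=1e-5,  # low LR: continue from, rather than overwrite, the base weights
        num_train_epochs=1,
        bf16=True,
    ),
    train_dataset=tokenized,
    # mlm=False -> standard next-token (causal) language-modeling objective
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```
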