support megatron (modelscope#1365)

hanlv15 · Jul 24, 2024 · c306194
1 parent 00a6706

Showing 26 changed files with 1,107 additions and 43 deletions.
diff --git a/README.md b/README.md
@@ -55,6 +55,7 @@ You can contact us and communicate with us by adding our group:
 <img src="asset/discord_qr.jpg" width="200" height="200">  |  <img src="asset/wechat.png" width="200" height="200">
 
 ## 🎉 News
+- 🔥2024.07.24: Support using Megatron for CPT and SFT on the Qwen2 series. You can refer to the [Megatron training documentation](docs/source_en/LLM/Megatron-training.md).
 - 🔥2024.07.20: Support llama3.1 series models.
 - 2024.07.20: Support mistral-nemo series models. Use `--model_type mistral-nemo-base-2407` and `--model_type mistral-nemo-instruct-2407` to begin.
 - 2024.07.19: Support [Q-Galore](https://arxiv.org/abs/2407.08296), this algorithm can reduce the training memory cost by 60% (qwen-7b-chat, full, 80G -> 35G), use `swift sft --model_type xxx --use_galore true --galore_quantization true` to begin!

diff --git a/README_CN.md b/README_CN.md
@@ -56,6 +56,7 @@ SWIFT has rich and comprehensive documentation; please see our documentation website:
 
 
 ## 🎉 News
+- 🔥2024.07.24: Support using megatron for CPT and SFT on the qwen2 series. See the [megatron training documentation](docs/source/LLM/Megatron训练文档.md).
 - 🔥2024.07.24: Support llama3.1 series models.
 - 2024.07.20: Support mistral-nemo series models. Use `--model_type mistral-nemo-base-2407` and `--model_type mistral-nemo-instruct-2407` to start training and inference.
 - 🔥2024.07.19: Support the [Q-Galore](https://arxiv.org/abs/2407.08296) algorithm, which can reduce GPU memory usage by about 60% (qwen-7b-chat, full, 80G -> 35G). Use the command line `swift sft --model_type xxx --use_galore true --galore_quantization true` to start training!

diff --git a/docs/source/LLM/Megatron训练文档.md b/docs/source/LLM/Megatron训练文档.md
@@ -0,0 +1,166 @@
+# Megatron Training Documentation (Beta)
+
+## Table of Contents
+- [Environment Preparation](#environment-preparation)
+- [SFT Example](#sft-example)
+- [Multi-Node Pre-Training Example](#multi-node-pre-training-example)
+- [Mapping between MegatronArguments and SftArguments](#mapping-between-megatronarguments-and-sftarguments)
+
+
+## Environment Preparation
+
+```shell
+# Set the global pip mirror (speeds up downloads)
+pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/
+# Install ms-swift
+git clone https://github.com/modelscope/swift.git
+cd swift
+pip install -e '.[llm]'
+
+# Install the megatron-related dependencies (you do not need to install megatron-lm or the other dependency libraries)
+# transformer_engine
+pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable
+# apex
+git clone https://github.com/NVIDIA/apex
+cd apex
+pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./
+```
+
+The other two dependencies are [Megatron-LM](https://github.com/NVIDIA/Megatron-LM) and [Pai-Megatron-Patch](https://github.com/alibaba/Pai-Megatron-Patch). swift clones and installs them with git, so you do not need to install them yourself.
+
+
+## SFT Example
+This section walks through a quick-to-run example of training with megatron, so that you can get familiar with the full megatron training workflow. For the corresponding example that fine-tunes with the HF Trainer, see [Self-Cognition Fine-Tuning Best Practices](自我认知微调最佳实践.md).
+
+1. Convert the HF-format weights to megatron format:
+```shell
+# Default output path: --megatron_output_dir {model_type}-tp{tp}-pp{pp}
+CUDA_VISIBLE_DEVICES=0 swift export --model_type qwen2-7b-instruct \
+    --to_megatron true --tp 2 --dtype bf16
+```
+
+2. Fine-tune using the megatron-format weights with the following script:
+```shell
+# Experimental Environment: 4 * A100
+# GPU Memory Requirement: 4 * 55GB
+# TP=2, DP=2
+CUDA_VISIBLE_DEVICES=0,1,2,3 NPROC_PER_NODE=4 swift sft \
+    --resume_from_checkpoint qwen2-7b-instruct-tp2-pp1 \
+    --dataset swift-mix:sharegpt#500 swift-mix:codefuse#250 swift-mix:metamathqa#250 self-cognition#500 \
+    --max_length 2048 \
+    --learning_rate 2e-6 \
+    --output_dir output \
+    --model_name 小黄 'Xiao Huang' \
+    --model_author 魔搭 ModelScope \
+    --train_backend megatron
+```
+
+3. Convert the megatron-format weights back to HF format:
+```shell
+# Model without fine-tuning
+CUDA_VISIBLE_DEVICES=0 swift export \
+    --ckpt_dir qwen2-7b-instruct-tp2-pp1 --to_hf true
+
+# Fine-tuned model
+CUDA_VISIBLE_DEVICES=0 swift export \
+    --ckpt_dir output/qwen2-7b-instruct-tp2-pp1/vx-xxx --to_hf true
+```
+
+4. Run inference on the resulting weights, using vLLM for acceleration:
+```shell
+# Model without fine-tuning
+CUDA_VISIBLE_DEVICES=0 swift infer --model_type qwen2-7b-instruct \
+    --model_id_or_path qwen2-7b-instruct-tp2-pp1/qwen2-7b-instruct-hf
+
+# Fine-tuned model
+CUDA_VISIBLE_DEVICES=0 swift infer \
+    --ckpt_dir output/qwen2-7b-instruct-tp2-pp1/vx-xxx/qwen2-7b-instruct-hf
+```
+
+The fine-tuned model produces the following output:
+```python
+"""
+<<< 你是谁
+我是小黄，由魔搭开发的人工智能聊天机器人。我的目标是通过文本交流提供帮助、信息和娱乐。如果您有任何问题或需要帮助，请随时向我提问。
+--------------------------------------------------
+<<< who are you
+I am Xiao Huang, an artificial intelligence chatbot developed by ModelScope. My purpose is to provide assistance, information, and entertainment through text communication. If you have any questions or need help, please feel free to ask me at any time.
+--------------------------------------------------
+<<< 晚上睡不着觉怎么办
+晚上睡不着觉可能是因为多种原因，例如压力、焦虑、不规律的作息时间、咖啡因摄入过多、睡眠环境不佳等。以下是一些可能有助于改善睡眠质量的建议：
+
+1. 建立规律的作息时间：每天尽量在同一时间上床睡觉和起床，即使在周末也是如此。这有助于调整您的生物钟并改善睡眠质量。
+2. 创造舒适的睡眠环境：确保您的卧室安静、黑暗、凉爽，并且床铺舒适。使用遮光窗帘、耳塞或白噪音机等设备可以帮助创造一个更舒适的睡眠环境。
+3. 避免咖啡因和酒精：避免在睡前几小时内摄入咖啡因和酒精，因为它们可能会影响您的睡眠质量。
+4. 放松身心：尝试进行深呼吸、冥想、瑜伽或其他放松技巧，以帮助您放松身心并准备入睡。
+5. 避免使用电子设备：在睡前避免使用电子设备，因为屏幕发出的蓝光可能会影响您的睡眠质量。
+6. 避免午睡：如果您在白天打盹，可能会影响您晚上的睡眠质量。尽量避免在晚上睡觉前几小时内打盹。
+7. 限制晚上摄入的液体：在睡前几小时内避免摄入过多的液体，以减少夜间起床上厕所的次数。
+8. 保持积极的心态：避免在睡前担心或焦虑，因为这可能会影响您的睡眠质量。尝试进行积极的思考，例如思考您期待的第二天的事情。
+9. 尝试放松技巧：尝试进行深呼吸、冥想、瑜伽或其他放松技巧，以帮助您放松身心并准备入睡。
+10. 如果您尝试了上述建议但仍然无法入睡，请考虑咨询医生或睡眠专家以获取更多建议。
+"""
+```
+
+We evaluate the trained HF-format model:
+```shell
+pip install llmuses==0.4.0
+# Original model
+CUDA_VISIBLE_DEVICES=0 swift eval --model_type qwen2-7b-instruct \
+    --eval_dataset ceval mmlu gsm8k arc --eval_backend Native
+
+# Model without fine-tuning
+CUDA_VISIBLE_DEVICES=0 swift eval --model_type qwen2-7b-instruct \
+    --model_id_or_path qwen2-7b-instruct-tp2-pp1/qwen2-7b-instruct-hf \
+    --eval_dataset ceval mmlu gsm8k arc --eval_backend Native
+
+# Fine-tuned model
+CUDA_VISIBLE_DEVICES=0 swift eval \
+    --ckpt_dir output/qwen2-7b-instruct-tp2-pp1/vx-xxx/qwen2-7b-instruct-hf \
+    --eval_dataset ceval mmlu gsm8k arc --eval_backend Native
+```
+
+Evaluation results:
+| Model | ceval | mmlu | gsm8k | arc |
+| ---- | ---- | ---- | ---- | ---- |
+| Original | 0.6642 | 0.6909 | 0.787 | 0.8507 |
+| Without fine-tuning | 0.6642 | 0.6909 | 0.787 | 0.8507 |
+| Fine-tuned | 0.7392 | 0.6878 | 0.8241 | 0.8481 |
+
+
+## Multi-Node Pre-Training Example
+Coming soon...
+
+
+## Mapping between MegatronArguments and SftArguments
+| MegatronArguments | SftArguments |
+| ---- | ---- |
+| optimizer | optim |
+| lr_decay_style | lr_scheduler_type |
+| weight_decay | weight_decay |
+| clip_grad | max_grad_norm |
+| adam_beta1 | adam_beta1 |
+| adam_beta2 | adam_beta2 |
+| adam_eps | adam_epsilon |
+| lr | learning_rate |
+| min_lr | min_lr |
+| fp16<br>apply_query_key_layer_scaling | fp16 |
+| bf16 | bf16 |
+| tensor_model_parallel_size | tp |
+| pipeline_model_parallel_size | pp |
+| seed | seed |
+| load | resume_from_checkpoint |
+| save | output_dir |
+| tensorboard_dir | logging_dir |
+| log_interval | logging_steps |
+| eval_interval | eval_steps |
+| save_interval | save_steps |
+| micro_batch_size | batch_size |
+| global_batch_size | batch_size * gradient_accumulation_steps * world_size |
+| sequence_parallel | sequence_parallel |
+| num_workers | dataloader_num_workers |
+| use_flash_attn | use_flash_attn |
+| train_iters | int(math.ceil(len(train_dataset) * num_train_epochs / global_batch_size)) |
+| eval_iters | int(math.ceil(len(val_dataset) / global_batch_size)) |
+| lr_warmup_iters | warmup_steps if warmup_steps > 0 else math.ceil(train_iters * warmup_ratio) |
+| no_save_optim<br>no_save_rng | save_only_model |
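Note on the mapping table above: the derived rows (`global_batch_size`, `train_iters`, `eval_iters`, `lr_warmup_iters`) can be checked by hand with a small sketch like the one below. It only restates the formulas written in the table; the function name `megatron_schedule`, its parameter names, and the sample numbers are hypothetical and are not taken from the swift code base.

```python
import math


def megatron_schedule(train_size, val_size, num_train_epochs, batch_size,
                      gradient_accumulation_steps, world_size,
                      warmup_steps=0, warmup_ratio=0.0):
    """Restate the derived rows of the mapping table (illustrative helper)."""
    # global_batch_size = batch_size * gradient_accumulation_steps * world_size
    global_batch_size = batch_size * gradient_accumulation_steps * world_size
    # train_iters = ceil(len(train_dataset) * num_train_epochs / global_batch_size)
    train_iters = int(math.ceil(train_size * num_train_epochs / global_batch_size))
    # eval_iters = ceil(len(val_dataset) / global_batch_size)
    eval_iters = int(math.ceil(val_size / global_batch_size))
    # An explicit warmup_steps wins; otherwise fall back to warmup_ratio.
    if warmup_steps > 0:
        lr_warmup_iters = warmup_steps
    else:
        lr_warmup_iters = math.ceil(train_iters * warmup_ratio)
    return {
        "micro_batch_size": batch_size,
        "global_batch_size": global_batch_size,
        "train_iters": train_iters,
        "eval_iters": eval_iters,
        "lr_warmup_iters": lr_warmup_iters,
    }


if __name__ == "__main__":
    # Assumed sample values: 1500 training rows, 15 validation rows, 4 processes.
    schedule = megatron_schedule(
        train_size=1500, val_size=15, num_train_epochs=1,
        batch_size=1, gradient_accumulation_steps=16, world_size=4,
        warmup_ratio=0.05,
    )
    for name, value in schedule.items():
        print(f"{name} = {value}")
```

With these assumed inputs the global batch size is 64, so 1500 samples over one epoch round up to 24 training iterations. The sketch uses `world_size` exactly as the table writes it; whether the real implementation divides by the tensor-parallel and pipeline-parallel sizes is not stated in this diff.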
diff --git a/docs/source/LLM/Qwen1.5全流程最佳实践.md b/docs/source/LLM/Qwen1.5全流程最佳实践.md
@@ -190,7 +190,6 @@ sft_args = SftArguments(
     model_type=ModelType.qwen1half_7b_chat,
     dataset=[f'{DatasetName.alpaca_zh}#500', f'{DatasetName.alpaca_en}#500',
              f'{DatasetName.self_cognition}#500'],
-    logging_steps=5,
     max_length=2048,
     learning_rate=1e-4,
     output_dir='output',
@@ -212,7 +211,6 @@ CUDA_VISIBLE_DEVICES=0,1 \
 swift sft \
     --model_type qwen1half-7b-chat \
     --dataset alpaca-zh#500 alpaca-en#500 self-cognition#500 \
-    --logging_steps 5 \
     --max_length 2048 \
     --learning_rate 1e-4 \
     --output_dir output \
@@ -230,7 +228,6 @@ NPROC_PER_NODE=4 \
 swift sft \
     --model_type qwen1half-7b-chat \
     --dataset alpaca-zh#500 alpaca-en#500 self-cognition#500 \
-    --logging_steps 5 \
     --max_length 2048 \
     --learning_rate 1e-4 \
     --output_dir output \
@@ -479,7 +476,6 @@ NPROC_PER_NODE=4 \
 swift sft \
     --model_type qwen1half-72b-chat \
     --dataset alpaca-zh#500 alpaca-en#500 self-cognition#500 \
-    --logging_steps 5 \
     --max_length 4096 \
     --learning_rate 1e-4 \
     --output_dir output \

diff --git a/docs/source/LLM/VLLM推理加速与部署.md b/docs/source/LLM/VLLM推理加速与部署.md
@@ -611,7 +611,6 @@ NPROC_PER_NODE=4 \
 swift sft \
     --model_type llama2-7b-chat \
     --dataset self-cognition#500 sharegpt-gpt4:default#1000 \
-    --logging_steps 5 \
     --max_length 4096 \
     --learning_rate 1e-4 \
     --output_dir output \

diff --git a/docs/source/LLM/index.md b/docs/source/LLM/index.md
@@ -15,6 +15,7 @@
 11. [ORPO Best Practices](ORPO算法最佳实践.md)
 12. [SimPO Best Practices](SimPO算法最佳实践.md)
 13. [Human Preference Alignment Training Documentation](人类偏好对齐训练文档.md)
+14. [Megatron Training Documentation](Megatron训练文档.md)
 
 ### ⭐️Best Practices Series
 

diff --git a/docs/source/LLM/自我认知微调最佳实践.md b/docs/source/LLM/自我认知微调最佳实践.md
@@ -112,7 +112,6 @@ sft_args = SftArguments(
     model_type=ModelType.qwen2_7b_instruct,
     dataset=[f'{DatasetName.alpaca_zh}#500', f'{DatasetName.alpaca_en}#500',
              f'{DatasetName.self_cognition}#500'],
-    logging_steps=5,
     max_length=2048,
     learning_rate=1e-4,
     output_dir='output',
@@ -170,7 +169,6 @@ CUDA_VISIBLE_DEVICES=0 \
 swift sft \
     --model_type qwen2-7b-instruct \
     --dataset alpaca-zh#500 alpaca-en#500 self-cognition#500 \
-    --logging_steps 5 \
     --max_length 2048 \
     --learning_rate 1e-4 \
     --output_dir output \
@@ -189,7 +187,6 @@ NPROC_PER_NODE=4 \
 swift sft \
     --model_type qwen2-7b-instruct \
     --dataset alpaca-zh#500 alpaca-en#500 self-cognition#500 \
-    --logging_steps 5 \
     --max_length 2048 \
     --learning_rate 1e-4 \
     --output_dir output \

diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -28,6 +28,7 @@ Swift DOCUMENTATION
    LLM/LLM量化文档.md
    LLM/VLLM推理加速与部署.md
    LLM/LmDeploy推理加速与部署.md
+   LLM/Megatron训练文档.md
    LLM/LLM实验文档.md
    LLM/命令行参数.md
    LLM/支持的模型和数据集.md