fix doc (modelscope#1875)
tastelikefeet authored Aug 31, 2024
1 parent 923c7d8 commit 469d44c
Showing 2 changed files with 2 additions and 2 deletions.
2 changes: 1 addition & 1 deletion docs/source/LLM/命令行参数.md
@@ -79,7 +79,7 @@
- `--use_dora`: Default is `False`; whether to use `DoRA`.
- `--use_rslora`: Default is `False`; whether to use `RS-LoRA`.
- `--neftune_noise_alpha`: The noise coefficient added by `NEFTune`, which can improve model performance during instruction fine-tuning; default is `None`. Typical values are 5, 10, or 15. See the [related paper](https://arxiv.org/abs/2310.05914).
-- `--neftune_backend`: The backend for `NEFTune`; the `transformers` library is used by default. Incompatibilities may occur when training VL models, in which case specifying `swift` is recommended.
+- `--neftune_backend`: The backend for `NEFTune`; supports both `transformers` and `swift`, default is `transformers`.
- `--gradient_checkpointing`: Whether to enable gradient checkpointing; default is `True`. This parameter can be used to save GPU memory at the cost of slightly slower training; its effect is most significant when max_length and batch_size are large.
- `--deepspeed`: The path to a deepspeed configuration file, or the configuration passed directly in JSON format; default is `None`, i.e. deepspeed is not enabled. Deepspeed can save GPU memory. We provide default [ZeRO-2 configuration file](https://github.com/modelscope/swift/blob/main/swift/llm/ds_config/zero2.json), [ZeRO-3 configuration file](https://github.com/modelscope/swift/blob/main/swift/llm/ds_config/zero3.json), [ZeRO-2 Offload configuration file](https://github.com/modelscope/swift/blob/main/swift/llm/ds_config/zero2_offload.json) and [ZeRO-3 Offload configuration file](https://github.com/modelscope/swift/blob/main/swift/llm/ds_config/zero3_offload.json); you only need to specify 'default-zero2', 'default-zero3', 'zero2-offload', or 'zero3-offload'.
- `--batch_size`: The batch_size during training; default is `1`. Increasing batch_size raises GPU utilization but does not necessarily speed up training, because shorter sentences in a batch are padded to the length of the longest sentence in that batch, introducing wasted computation. (A combined invocation sketch follows this list.)
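
The flags above compose into a single `swift sft` call. Below is a minimal sketch combining the NEFTune and checkpointing parameters documented in this list; the model and dataset names (`qwen-7b-chat`, `alpaca-zh`) are illustrative placeholders, not taken from this doc.

```bash
# Minimal sketch: instruction fine-tuning with NEFTune noise enabled.
# Model/dataset identifiers are assumed placeholders; substitute your own.
swift sft \
    --model_type qwen-7b-chat \
    --dataset alpaca-zh \
    --use_rslora true \
    --neftune_noise_alpha 5 \
    --neftune_backend transformers \
    --gradient_checkpointing true \
    --batch_size 1
```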
2 changes: 1 addition & 1 deletion docs/source_en/LLM/Command-line-parameters.md
@@ -80,7 +80,7 @@
- `--use_dora`: Default is `False`; whether to use `DoRA`.
- `--use_rslora`: Default is `False`; whether to use `RS-LoRA`.
- `--neftune_noise_alpha`: The noise coefficient added by `NEFTune`, which can improve performance during instruction fine-tuning; default is `None`. Typical values are 5, 10, or 15. See the [related paper](https://arxiv.org/abs/2310.05914).
-- `--neftune_backend`: The backend for `NEFTune`; the `transformers` library is used by default. Incompatibility may occur when training VL models, in which case it is recommended to specify `swift`.
+- `--neftune_backend`: The backend for `NEFTune`; supported values are `transformers` and `swift`, default is `transformers`.
- `--gradient_checkpointing`: Whether to enable gradient checkpointing; default is `True`. This can be used to save memory, although it slightly reduces training speed; the effect is most significant when max_length and batch_size are large.
- `--deepspeed`: Used to specify the path to a deepspeed configuration file or to pass JSON-formatted configuration directly; default is `None`, which means deepspeed is not enabled. Deepspeed can save GPU memory. We provide default [ZeRO-2 configuration file](https://github.com/modelscope/swift/blob/main/swift/llm/ds_config/zero2.json), [ZeRO-3 configuration file](https://github.com/modelscope/swift/blob/main/swift/llm/ds_config/zero3.json), [ZeRO-2 Offload configuration file](https://github.com/modelscope/swift/blob/main/swift/llm/ds_config/zero2_offload.json), and [ZeRO-3 Offload configuration file](https://github.com/modelscope/swift/blob/main/swift/llm/ds_config/zero3_offload.json). You only need to specify 'default-zero2', 'default-zero3', 'zero2-offload', or 'zero3-offload'.
- `--batch_size`: Batch size during training; default is `1`. Increasing batch_size can improve GPU utilization but will not necessarily improve training speed, because shorter sentences within a batch are padded to the length of the longest sentence in the batch, introducing wasted computation. (A deepspeed invocation sketch follows this list.)
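
Complementing the sketch above, the following is a minimal example of enabling one of the bundled ZeRO configs by its alias rather than by file path; again, the model and dataset names are assumed placeholders.

```bash
# Minimal sketch: enable the bundled ZeRO-2 config via its alias.
# Model/dataset identifiers are assumed placeholders; substitute your own.
swift sft \
    --model_type qwen-7b-chat \
    --dataset alpaca-en \
    --gradient_checkpointing true \
    --batch_size 1 \
    --deepspeed default-zero2
```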
