update docs (specific model arguments) (#2822)
Jintao-Huang authored Dec 31, 2024
1 parent 054ae1a commit d87d8ed
Showing 4 changed files with 130 additions and 10 deletions.
64 changes: 62 additions & 2 deletions docs/source/Instruction/命令行参数.md
@@ -1,14 +1,14 @@
# 命令行参数

命令行参数的介绍会分为基本参数,原子参数和集成参数。命令行最终使用的参数列表为集成参数。集成参数继承自基本参数和一些原子参数。
命令行参数的介绍会分为基本参数、原子参数、集成参数和特定模型参数。命令行最终使用的参数列表为集成参数。集成参数继承自基本参数和一些原子参数。特定模型参数是针对具体模型的参数,可以通过`--model_kwargs`或者环境变量进行设置。

## 基本参数

- 🔥tuner_backend: 可选为'peft', 'unsloth', 默认为'peft'
- 🔥train_type: 默认为'lora'. 可选为: 'lora', 'full', 'longlora', 'adalora', 'llamapro', 'adapter', 'vera', 'boft', 'fourierft', 'reft'
- 🔥adapters: 用于指定adapter的id/path的list,默认为`[]`.
- seed: 默认为42
- model_kwargs: 特定模型可传入的额外参数. 该参数列表会在训练推理时打印日志进行提示
- model_kwargs: 特定模型可传入的额外参数. 该参数列表会在训练推理时打印日志进行提示,例如`--model_kwargs '{"fps_max_frames": 12}'`
- load_args: 当指定`--resume_from_checkpoint`, `--model`, `--adapters`会读取保存文件中的`args.json`,将默认为None的`基本参数`(除去数据参数和生成参数)进行赋值(可通过手动传入进行覆盖)。默认为True
- load_data_args: 如果将该参数设置为True, 则会额外读取数据参数. 默认为False
- use_hf: 默认为False. 控制模型下载、数据集下载、模型push的hub
@@ -392,3 +392,63 @@ App参数继承于[部署参数](#部署参数), [Web-UI参数](#Web-UI参数)
- hub_model_id: 推送的model_id,默认为None
- hub_private_repo: 是否是private repo,默认为False
- commit_message: 提交信息,默认为'update files'


## 特定模型参数
特定模型参数可以通过`--model_kwargs`或者环境变量进行设置,例如: `--model_kwargs '{"fps_max_frames": 12}'`或者`FPS_MAX_FRAMES=12`

### qwen2_vl, qvq
参数含义可以查看[这里](https://github.com/QwenLM/Qwen2-VL/blob/main/qwen-vl-utils/src/qwen_vl_utils/vision_process.py#L24)

- IMAGE_FACTOR: 默认为28
- MIN_PIXELS: 默认为`4 * 28 * 28`
- MAX_PIXELS: 默认为`16384 * 28 * 28`,参考[这里](https://github.com/modelscope/ms-swift/blob/main/examples/train/multimodal/ocr.sh#L3)
- MAX_RATIO: 默认为200
- VIDEO_MIN_PIXELS: 默认为`128 * 28 * 28`
- VIDEO_MAX_PIXELS: 默认为`768 * 28 * 28`,参考[这里](https://github.com/modelscope/ms-swift/blob/main/examples/train/multimodal/video.sh#L7)
- VIDEO_TOTAL_PIXELS: 默认为`24576 * 28 * 28`
- FRAME_FACTOR: 默认为2
- FPS: 默认为2.0
- FPS_MIN_FRAMES: 默认为4
- FPS_MAX_FRAMES: 默认为768,参考[这里](https://github.com/modelscope/ms-swift/blob/main/examples/train/multimodal/video.sh#L8)

### internvl, internvl_phi3
参数含义可以查看[这里](https://modelscope.cn/models/OpenGVLab/Mini-InternVL-Chat-2B-V1-5)
- MAX_NUM: 默认为12
- INPUT_SIZE: 默认为448

### internvl2, internvl2_phi3, internvl2_5
- MAX_NUM: 默认为12
- INPUT_SIZE: 默认为448
- VIDEO_MAX_NUM: 默认为1,即视频的MAX_NUM
- VIDEO_SEGMENTS: 默认为8


### minicpmv2_6
- MAX_SLICE_NUMS: 默认为9,参考[这里](https://modelscope.cn/models/OpenBMB/MiniCPM-V-2_6/file/view/master?fileName=config.json&status=1)
- VIDEO_MAX_SLICE_NUMS: 默认为1,视频的MAX_SLICE_NUMS,参考[这里](https://modelscope.cn/models/OpenBMB/MiniCPM-V-2_6)
- MAX_NUM_FRAMES: 默认为64,参考[这里](https://modelscope.cn/models/OpenBMB/MiniCPM-V-2_6)

### ovis1_6
- MAX_PARTITION: 参考[这里](https://github.com/AIDC-AI/Ovis/blob/d248e34d755a95d24315c40e2489750a869c5dbc/ovis/model/modeling_ovis.py#L312)

### mplug_owl3, mplug_owl3_241101
- MAX_NUM_FRAMES: 默认为16,参考[这里](https://modelscope.cn/models/iic/mPLUG-Owl3-7B-240728)

### xcomposer2_4khd
- HD_NUM: 默认为55,参考[这里](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-xcomposer2-4khd-7b)

### xcomposer2_5
- HD_NUM: 图片数量为1时,默认值为24;大于1时,默认为6。参考[这里](https://modelscope.cn/models/AI-ModelScope/internlm-xcomposer2d5-7b/file/view/master?fileName=modeling_internlm_xcomposer2.py&status=1#L254)

### video_cogvlm2
- NUM_FRAMES: 默认为24,参考[这里](https://github.com/THUDM/CogVLM2/blob/main/video_demo/inference.py#L22)

### phi3_vision
- NUM_CROPS: 默认为4,参考[这里](https://modelscope.cn/models/LLM-Research/Phi-3.5-vision-instruct)

### llama3_1_omni
- N_MELS: 默认为128,参考[这里](https://github.com/ictnlp/LLaMA-Omni/blob/544d0ff3de8817fdcbc5192941a11cf4a72cbf2b/omni_speech/infer/infer.py#L57)

### video_llava
- NUM_FRAMES: 默认为16
63 changes: 61 additions & 2 deletions docs/source_en/Instruction/Command-line-parameters.md
@@ -1,14 +1,14 @@
# Command Line Parameters

The introduction to command line parameters will cover base arguments, atomic arguments, and integration arguments. The final list of arguments used in the command line is the integration arguments. The integration arguments inherit from the base arguments and some atomic arguments.
The introduction to command line parameters covers base arguments, atomic arguments, integrated arguments, and specific model arguments. The final list of arguments used on the command line is the integrated arguments. Integrated arguments inherit from the base arguments and some atomic arguments. Specific model arguments are designed for specific models and can be set using `--model_kwargs` or environment variables.

## Base Arguments

- 🔥tuner_backend: Optional values are 'peft' and 'unsloth', default is 'peft'
- 🔥train_type: Default is 'lora'. Optional values: 'lora', 'full', 'longlora', 'adalora', 'llamapro', 'adapter', 'vera', 'boft', 'fourierft', 'reft'
- 🔥adapters: A list used to specify the ID/path of the adapter, default is `[]`.
- seed: Default is 42
- model_kwargs: Extra parameters specific to the model. This parameter list will be logged during training for reference.
- model_kwargs: Extra parameters specific to the model. This parameter list will be logged during training for reference, for example, `--model_kwargs '{"fps_max_frames": 12}'`.
- load_args: When `--resume_from_checkpoint`, `--model`, or `--adapters` is specified, it will read the `args.json` file from the saved checkpoint and assign values to the `BaseArguments` that are defaulted to None (excluding DataArguments and GenerationArguments). These can be overridden by manually passing in values. The default is `True`.
- load_data_args: If this parameter is set to True, it will additionally read the data parameters. The default is `False`.
- use_hf: Default is False. Controls model and dataset downloading, and model pushing to the hub.
@@ -392,3 +392,62 @@ Export Arguments include the [basic arguments](#base-arguments) and [merge argum
- hub_model_id: Model ID for pushing, default is None.
- hub_private_repo: Whether it is a private repo, default is False.
- commit_message: Commit message, default is 'update files'.

## Specific Model Arguments

Specific model arguments can be set using `--model_kwargs` or environment variables, for example: `--model_kwargs '{"fps_max_frames": 12}'` or `FPS_MAX_FRAMES=12`.
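How `--model_kwargs` and environment variables interact is not spelled out above; the sketch below illustrates one plausible resolution order (explicit `--model_kwargs` first, then the uppercase environment variable, then the built-in default). The helper name `resolve_model_arg` is invented for illustration only; ms-swift's own helper is `get_env_args`, whose exact behavior may differ.

```python
import json
import os


def resolve_model_arg(name, type_fn, default, model_kwargs=None):
    # Hypothetical resolution order for a specific model argument:
    # 1. an explicit --model_kwargs entry (lowercase key),
    # 2. the uppercase environment variable,
    # 3. the built-in default.
    if model_kwargs and name in model_kwargs:
        return type_fn(model_kwargs[name])
    env_value = os.environ.get(name.upper())
    if env_value is not None:
        return type_fn(env_value)
    return default


# `--model_kwargs '{"fps_max_frames": 12}'` arrives as a JSON dict:
kwargs = json.loads('{"fps_max_frames": 12}')
print(resolve_model_arg('fps_max_frames', int, 768, kwargs))  # 12

# With no override, the documented default applies:
print(resolve_model_arg('fps_max_frames', int, 768))  # 768
```

Setting `FPS_MAX_FRAMES=12` in the environment would have the same effect as the `--model_kwargs` form, per the example above.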

### qwen2_vl, qvq
For the meaning of the arguments, please refer to [here](https://github.com/QwenLM/Qwen2-VL/blob/main/qwen-vl-utils/src/qwen_vl_utils/vision_process.py#L24)

- IMAGE_FACTOR: Default is 28
- MIN_PIXELS: Default is `4 * 28 * 28`
- MAX_PIXELS: Default is `16384 * 28 * 28`, refer to [here](https://github.com/modelscope/ms-swift/blob/main/examples/train/multimodal/ocr.sh#L3)
- MAX_RATIO: Default is 200
- VIDEO_MIN_PIXELS: Default is `128 * 28 * 28`
- VIDEO_MAX_PIXELS: Default is `768 * 28 * 28`, refer to [here](https://github.com/modelscope/ms-swift/blob/main/examples/train/multimodal/video.sh#L7)
- VIDEO_TOTAL_PIXELS: Default is `24576 * 28 * 28`
- FRAME_FACTOR: Default is 2
- FPS: Default is 2.0
- FPS_MIN_FRAMES: Default is 4
- FPS_MAX_FRAMES: Default is 768, refer to [here](https://github.com/modelscope/ms-swift/blob/main/examples/train/multimodal/video.sh#L8)
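These pixel bounds interact through the resizing step in the linked `vision_process.py`: both sides are snapped to multiples of IMAGE_FACTOR and the total pixel count is pulled into `[MIN_PIXELS, MAX_PIXELS]`, with overly elongated images rejected via MAX_RATIO. A simplified sketch of that logic (the upstream implementation may differ in rounding details):

```python
import math


def smart_resize(height, width, factor=28,
                 min_pixels=4 * 28 * 28, max_pixels=16384 * 28 * 28,
                 max_ratio=200):
    # Reject images whose aspect ratio exceeds MAX_RATIO.
    if max(height, width) / min(height, width) > max_ratio:
        raise ValueError('aspect ratio exceeds MAX_RATIO')
    # Round each side to the nearest multiple of the factor.
    h_bar = max(factor, round(height / factor) * factor)
    w_bar = max(factor, round(width / factor) * factor)
    # Scale down if over the pixel budget, up if under it.
    if h_bar * w_bar > max_pixels:
        beta = math.sqrt((height * width) / max_pixels)
        h_bar = math.floor(height / beta / factor) * factor
        w_bar = math.floor(width / beta / factor) * factor
    elif h_bar * w_bar < min_pixels:
        beta = math.sqrt(min_pixels / (height * width))
        h_bar = math.ceil(height * beta / factor) * factor
        w_bar = math.ceil(width * beta / factor) * factor
    return h_bar, w_bar


h, w = smart_resize(1080, 1920)
# Both sides are multiples of 28 and the pixel budget is respected.
```

Lowering MAX_PIXELS (as in the linked `ocr.sh`) therefore trades resolution for fewer visual tokens.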

### internvl, internvl_phi3
For the meaning of the arguments, please refer to [here](https://modelscope.cn/models/OpenGVLab/Mini-InternVL-Chat-2B-V1-5)
- MAX_NUM: Default is 12
- INPUT_SIZE: Default is 448

### internvl2, internvl2_phi3, internvl2_5
- MAX_NUM: Default is 12
- INPUT_SIZE: Default is 448
- VIDEO_MAX_NUM: Default is 1, which is the MAX_NUM for videos
- VIDEO_SEGMENTS: Default is 8

### minicpmv2_6
- MAX_SLICE_NUMS: Default is 9, refer to [here](https://modelscope.cn/models/OpenBMB/MiniCPM-V-2_6/file/view/master?fileName=config.json&status=1)
- VIDEO_MAX_SLICE_NUMS: Default is 1, which is the MAX_SLICE_NUMS for videos, refer to [here](https://modelscope.cn/models/OpenBMB/MiniCPM-V-2_6)
- MAX_NUM_FRAMES: Default is 64, refer to [here](https://modelscope.cn/models/OpenBMB/MiniCPM-V-2_6)

### ovis1_6
- MAX_PARTITION: Refer to [here](https://github.com/AIDC-AI/Ovis/blob/d248e34d755a95d24315c40e2489750a869c5dbc/ovis/model/modeling_ovis.py#L312)

### mplug_owl3, mplug_owl3_241101
- MAX_NUM_FRAMES: Default is 16, refer to [here](https://modelscope.cn/models/iic/mPLUG-Owl3-7B-240728)

### xcomposer2_4khd
- HD_NUM: Default is 55, refer to [here](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm-xcomposer2-4khd-7b)

### xcomposer2_5
- HD_NUM: Default is 24 when the number of images is 1. Greater than 1, the default is 6. Refer to [here](https://modelscope.cn/models/AI-ModelScope/internlm-xcomposer2d5-7b/file/view/master?fileName=modeling_internlm_xcomposer2.py&status=1#L254)

### video_cogvlm2
- NUM_FRAMES: Default is 24, refer to [here](https://github.com/THUDM/CogVLM2/blob/main/video_demo/inference.py#L22)

### phi3_vision
- NUM_CROPS: Default is 4, refer to [here](https://modelscope.cn/models/LLM-Research/Phi-3.5-vision-instruct)

### llama3_1_omni
- N_MELS: Default is 128, refer to [here](https://github.com/ictnlp/LLaMA-Omni/blob/544d0ff3de8817fdcbc5192941a11cf4a72cbf2b/omni_speech/infer/infer.py#L57)

### video_llava
- NUM_FRAMES: Default is 16
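Frame-count arguments such as NUM_FRAMES and MAX_NUM_FRAMES typically bound uniform sampling over the decoded video. A hypothetical sketch of that pattern (not taken from any of the linked implementations):

```python
def sample_frame_indices(total_frames, num_frames=16):
    # Pick `num_frames` evenly spaced frame indices; if the clip has
    # fewer frames than the budget, keep every frame.
    if total_frames <= num_frames:
        return list(range(total_frames))
    step = total_frames / num_frames
    return [int(i * step) for i in range(num_frames)]


print(sample_frame_indices(160))  # 16 indices: 0, 10, 20, ..., 150
```

Raising the frame budget increases temporal coverage at the cost of more visual tokens per video.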
5 changes: 4 additions & 1 deletion swift/llm/template/template/internvl.py
@@ -136,7 +136,10 @@ def _encode(self, inputs: StdTemplateInputs) -> Dict[str, Any]:
         if images:
             has_video = bool(inputs.videos)
             input_size = get_env_args('input_size', int, 448)
-            max_num = get_env_args('max_num', int, 1 if has_video else 12)
+            max_num = get_env_args('max_num', int, 12)
+            video_max_num = get_env_args('video_max_num', int, 1)
+            if has_video:
+                max_num = video_max_num
             pixel_values = [transform_image(image, input_size, max_num) for image in images]
             num_patches = [pv.shape[0] for pv in pixel_values]
             pixel_values = torch.cat(pixel_values).to(self.config.torch_dtype)
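The effect of this change: images keep a MAX_NUM default of 12, while videos now read a separate VIDEO_MAX_NUM (default 1) instead of the hard-coded `1 if has_video else 12`. A standalone sketch of the resulting precedence, with the env-var reading simplified relative to the repo's `get_env_args` (which also accepts `--model_kwargs`):

```python
import os


def effective_max_num(has_video: bool) -> int:
    # Images: MAX_NUM env var, falling back to 12.
    # Videos: VIDEO_MAX_NUM env var, falling back to 1.
    max_num = int(os.environ.get('MAX_NUM', 12))
    video_max_num = int(os.environ.get('VIDEO_MAX_NUM', 1))
    return video_max_num if has_video else max_num


print(effective_max_num(False))  # 12 unless MAX_NUM is set
print(effective_max_num(True))   # 1 unless VIDEO_MAX_NUM is set
```

This lets video and image inputs be tuned independently, since videos multiply the patch budget by the number of sampled frames.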
8 changes: 3 additions & 5 deletions swift/llm/template/template/minicpm.py
@@ -174,13 +174,11 @@ def _encode(self, inputs: StdTemplateInputs) -> Dict[str, Any]:
         use_video = bool(inputs.videos)
         is_plain_text = not images and not use_video
         use_image_id = True
-        max_slice_nums = None
-
+        max_slice_nums = get_env_args('max_slice_nums', int, None)
+        video_max_slice_nums = get_env_args('video_max_slice_nums', int, 1)  # or 2
         if use_video:
+            max_slice_nums = video_max_slice_nums
             use_image_id = False
-            max_slice_nums = 1  # or 2
-
-        max_slice_nums = get_env_args('max_slice_nums', int, max_slice_nums)
         input_ids = encoded['input_ids']
         labels = encoded['labels']
         idx_list = findall(input_ids, -100)
