Request Failed with 422 Error: Input Should Be a Valid String for Image Paths #4135

wwwadx · 2025-05-08T12:25:59Z

Describe the bug
GRPO training of Qwen2.5-VL-3B with vllm_server_host fails with an error:

vLLM launch command:

CUDA_VISIBLE_DEVICES=7 \
swift rollout \
    --model /save/models/Qwen2.5-VL-3B-Instruct \
    --tensor_parallel_size 1 \
    --max_model_len 8192 \
    --max_new_tokens 2048

Training launch script:

MAX_PIXELS=602112 \
CUDA_VISIBLE_DEVICES=0,1 \
NPROC_PER_NODE=2 \
NCCL_TIMEOUT=360000 \
swift rlhf \
    --rlhf_type grpo \
    --model Qwen/Qwen2.5-VL-3B-Instruct \
    --external_plugins /save/ms-swift/examples/train/grpo/plugin/plugin.py \
    --reward_funcs external_math_format \
    --use_vllm true \
    --vllm_server_host 127.0.0.1 \
    --vllm_server_port 8001 \
    --train_type lora \
    --torch_dtype bfloat16 \
    --dataset /save/cxr_report/reasoning_inital_sft/qwen_training_data_for_grpo_standard_messages.jsonl \
    --max_completion_length 1536 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --learning_rate 1e-7 \
    --do_eval False \
    --eval_steps 100000000000 \
    --save_steps 100 \
    --save_total_limit 100 \
    --logging_steps 1 \
    --output_dir output \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 4 \
    --dataset_num_proc 4 \
    --num_generations 8 \
    --temperature 1.0 \
    --top_p 0.9 \
    --top_k 50 \
    --async_generate true \
    --offload_optimizer true \
    --gc_collect_after_offload true \
    --sleep_level 1 \
    --vllm_enforce_eager true \
    --vllm_enable_prefix_caching true \
    --vllm_limit_mm_per_prompt '{"image": 1}' \
    --system '/save/ms-swift/examples/train/grpo/prompt_cxr.txt' \
    --deepspeed zero2 \
    --log_completions true \
    --num_iterations 1 \
    --num_infer_workers 2 \
    --report_to wandb \
    --beta 0.0 \
    --move_model_batches 20

Your hardware and system info

ms_swift                                 3.5.0.dev0    /save/ms-swift
vllm                                     0.8.4
trl                                      0.17.0
transformers                             4.51.3
torch                                    2.6.0
nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Sep__8_19:17:24_PDT_2023
Cuda compilation tools, release 12.3, V12.3.52
Build cuda_12.3.r12.3/compiler.33281558_0

Additional context
Data format:

{
  "id": "s58971208",
  "messages": [
    {
      "role": "user",
      "content": "<image>"
    },
    {
      "role": "assistant",
      "content": "<findings>The ET tube terminates approximately 2.9 cm from the carina.  The NG tube courses below the diaphragm with the tip out of the field of view of the film.  There has been interval worsening of the right linear opacification likely secondary to atelectasis. No pneumothorax or definite pleural effusion is seen. The hilar and mediastinal contours are normal. There is mild cardiomegaly, stable compared to the preior exam.</findings>\n\n<impression>Slight interval worsening of right lower lung atelectasis.</impression>"
    }
  ],
  "images": "/save/datasets/LLaVA-Pretrain/cxr_224/1a1fe7e3-cbac5d93-b339aeda-86bb86b5-4f31e82e.jpg",
  "solution": "<findings>The ET tube terminates approximately 2.9 cm from the carina.  The NG tube courses below the diaphragm with the tip out of the field of view of the film.  There has been interval worsening of the right linear opacification likely secondary to atelectasis. No pneumothorax or definite pleural effusion is seen. The hilar and mediastinal contours are normal. There is mild cardiomegaly, stable compared to the preior exam.</findings>\n\n<impression>Slight interval worsening of right lower lung atelectasis.</impression>"
}
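
A quick sanity check on records of this shape can help rule out dataset problems before looking at the trainer. The sketch below is only an illustration: `check_record` is a hypothetical helper, not part of ms-swift, and it assumes each JSONL line is an object like the sample above, with `images` given either as a single path string or as a list. The message text and image path in the demo are shortened placeholders.

```python
import json


def check_record(line: str) -> dict:
    """Parse one JSONL line and summarize its image fields (hypothetical helper)."""
    record = json.loads(line)
    images = record.get("images", [])
    # The sample stores a single path as a string; treat it as a one-item list.
    if isinstance(images, str):
        images = [images]
    # Count <image> placeholders across all text messages.
    placeholders = sum(
        m["content"].count("<image>")
        for m in record.get("messages", [])
        if isinstance(m.get("content"), str)
    )
    return {
        "id": record.get("id"),
        "num_images": len(images),
        "num_placeholders": placeholders,
        "all_paths_are_str": all(isinstance(p, str) for p in images),
    }


# Demo record with shortened, placeholder values.
sample = json.dumps({
    "id": "s58971208",
    "messages": [
        {"role": "user", "content": "<image>"},
        {"role": "assistant", "content": "<findings>shortened</findings>"},
    ],
    "images": "/path/to/example.jpg",
    "solution": "<findings>shortened</findings>",
})
summary = check_record(sample)
```

If `num_images` and `num_placeholders` agree and every path is a string, the record itself is probably not the source of the 422 error.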


wwwadx · 2025-05-09T05:30:21Z

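The problem is caused by the data sent to vLLM for inference.
In the GRPO trainer:

In external vLLM mode, the data sent to vLLM has no bytes entry, but vLLM needs to receive the image in bytes format. Modify this function: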

    def _process_infer_requests_images(self, infer_requests: List[InferRequest]):
        """Replace each image dict with a base64 string before sending to vLLM."""
        import base64
        import os

        if not any('images' in request for request in infer_requests):
            return

        for request in infer_requests:
            if 'images' not in request:
                continue

            for i, img in enumerate(request['images']):
                # Entries that are already strings are left untouched
                if not isinstance(img, dict):
                    continue
                # If bytes is available, use it
                if img.get('bytes'):
                    request['images'][i] = base64.b64encode(img['bytes']).decode('utf-8')
                # If bytes is null but path is available, load from path
                elif img.get('path') and os.path.exists(img['path']):
                    with open(img['path'], 'rb') as image_file:
                        image_bytes = image_file.read()
                        request['images'][i] = base64.b64encode(image_bytes).decode('utf-8')

This fixes the missing bytes in the data sent to vLLM.
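
The same conversion can be tried outside the trainer. The following is a minimal standalone sketch, not the trainer code: `images_to_base64` is a hypothetical name, and it assumes each image entry is either a plain string or a dict with optional `bytes` and `path` keys, as in the function above. A throwaway temp file stands in for a real image.

```python
import base64
import os
import tempfile
from typing import Any, List


def images_to_base64(images: List[Any]) -> List[Any]:
    """Return a copy of `images` with dict entries replaced by base64 strings."""
    out = []
    for img in images:
        if isinstance(img, dict):
            # Prefer in-memory bytes when present.
            if img.get("bytes"):
                out.append(base64.b64encode(img["bytes"]).decode("utf-8"))
                continue
            # Otherwise fall back to reading the file at `path`.
            path = img.get("path")
            if path and os.path.exists(path):
                with open(path, "rb") as f:
                    out.append(base64.b64encode(f.read()).decode("utf-8"))
                continue
        # Strings and entries that cannot be resolved pass through unchanged.
        out.append(img)
    return out


# Demo: a temp file with fake contents plays the role of an image on disk.
with tempfile.NamedTemporaryFile(suffix=".jpg", delete=False) as tmp:
    tmp.write(b"fake-image-bytes")
demo = images_to_base64([
    {"bytes": b"abc", "path": None},
    {"bytes": None, "path": tmp.name},
    "already-a-string",
])
os.remove(tmp.name)
```

Every dict entry comes back as a plain string, which is the type the 422 error says the server expects.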
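Also, when _engine_infer sends the infer_requests, they do not include the default system prompt. If you need to pass a system prompt, add it directly to the data.
With the changes above, training runs successfully, and the results look fine.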
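
One way to put the system prompt into the data is to prepend a system message to each record before training. This is a sketch under assumptions: `with_system_prompt` is a hypothetical helper, the prompt text and image path are placeholders, and it assumes a leading message with role `system` is how the prompt should be carried in records of the format shown earlier.

```python
from typing import Dict, List


def with_system_prompt(record: Dict, system_prompt: str) -> Dict:
    """Return a copy of `record` whose messages start with a system message."""
    messages: List[Dict] = list(record.get("messages", []))
    # Only add the system message if the record does not already start with one.
    if not messages or messages[0].get("role") != "system":
        messages.insert(0, {"role": "system", "content": system_prompt})
    return {**record, "messages": messages}


# Demo with placeholder values.
record = {
    "messages": [{"role": "user", "content": "<image>"}],
    "images": "/path/to/image.jpg",
}
patched = with_system_prompt(record, "Placeholder system prompt.")
patched_twice = with_system_prompt(patched, "Placeholder system prompt.")
```

The helper copies the message list instead of mutating it, so the original record stays unchanged and running it twice does not add a second system message.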

hjh0119 mentioned this issue May 8, 2025

refactor grpo internal mode #4097

Draft

8 tasks

hjh0119 self-assigned this May 8, 2025
