
Request Failed with 422 Error: Input Should Be a Valid String for Image Paths #4135


Open
wwwadx opened this issue May 8, 2025 · 1 comment


wwwadx commented May 8, 2025

Describe the bug
GRPO training of qwen-2.5-VL-3B with vllm_server_host fails with the following error:

(two screenshots of the error)

vLLM launch command:

CUDA_VISIBLE_DEVICES=7 \
swift rollout \
    --model /save/models/Qwen2.5-VL-3B-Instruct \
    --tensor_parallel_size 1 \
    --max_model_len 8192 \
    --max_new_tokens 2048

Training launch script:

MAX_PIXELS=602112 \
CUDA_VISIBLE_DEVICES=0,1 \
NPROC_PER_NODE=2 \
NCCL_TIMEOUT=360000 \
swift rlhf \
    --rlhf_type grpo \
    --model Qwen/Qwen2.5-VL-3B-Instruct \
    --external_plugins /save/ms-swift/examples/train/grpo/plugin/plugin.py \
    --reward_funcs external_math_format \
    --use_vllm true \
    --vllm_server_host 127.0.0.1 \
    --vllm_server_port 8001 \
    --train_type lora \
    --torch_dtype bfloat16 \
    --dataset /save/cxr_report/reasoning_inital_sft/qwen_training_data_for_grpo_standard_messages.jsonl \
    --max_completion_length 1536 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --learning_rate 1e-7 \
    --do_eval False \
    --eval_steps 100000000000 \
    --save_steps 100 \
    --save_total_limit 100 \
    --logging_steps 1 \
    --output_dir output \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 4 \
    --dataset_num_proc 4 \
    --num_generations 8 \
    --temperature 1.0 \
    --top_p 0.9 \
    --top_k 50 \
    --async_generate true \
    --offload_optimizer true \
    --gc_collect_after_offload true \
    --sleep_level 1 \
    --vllm_enforce_eager true \
    --vllm_enable_prefix_caching true \
    --vllm_limit_mm_per_prompt '{"image": 1}' \
    --system '/save/ms-swift/examples/train/grpo/prompt_cxr.txt' \
    --deepspeed zero2 \
    --log_completions true \
    --num_iterations 1 \
    --num_infer_workers 2 \
    --report_to wandb \
    --beta 0.0 \
    --move_model_batches 20

Your hardware and system info
Write your system info like CUDA version/system/GPU/torch version here

ms_swift                                 3.5.0.dev0    /save/ms-swift
vllm                                     0.8.4
trl                                      0.17.0
transformers                             4.51.3
torch                                    2.6.0
nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Sep__8_19:17:24_PDT_2023
Cuda compilation tools, release 12.3, V12.3.52
Build cuda_12.3.r12.3/compiler.33281558_0

Additional context
Add any other context about the problem here
Data format:

{
  "id": "s58971208",
  "messages": [
    {
      "role": "user",
      "content": "<image>"
    },
    {
      "role": "assistant",
      "content": "<findings>The ET tube terminates approximately 2.9 cm from the carina.  The NG tube courses below the diaphragm with the tip out of the field of view of the film.  There has been interval worsening of the right linear opacification likely secondary to atelectasis. No pneumothorax or definite pleural effusion is seen. The hilar and mediastinal contours are normal. There is mild cardiomegaly, stable compared to the preior exam.</findings>\n\n<impression>Slight interval worsening of right lower lung atelectasis.</impression>"
    }
  ],
  "images": "/save/datasets/LLaVA-Pretrain/cxr_224/1a1fe7e3-cbac5d93-b339aeda-86bb86b5-4f31e82e.jpg",
  "solution": "<findings>The ET tube terminates approximately 2.9 cm from the carina.  The NG tube courses below the diaphragm with the tip out of the field of view of the film.  There has been interval worsening of the right linear opacification likely secondary to atelectasis. No pneumothorax or definite pleural effusion is seen. The hilar and mediastinal contours are normal. There is mild cardiomegaly, stable compared to the preior exam.</findings>\n\n<impression>Slight interval worsening of right lower lung atelectasis.</impression>"
}
@hjh0119 hjh0119 self-assigned this May 8, 2025

wwwadx commented May 9, 2025

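The problem is caused by the data sent to vLLM for inference.
In the GRPO trainer:

(screenshot of the code in question)

In external vLLM mode, this code causes the data sent to vLLM to lack the bytes field, while vLLM needs to receive the image in bytes form. Modify this function: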
    def _process_infer_requests_images(self, infer_requests: List[InferRequest]):
        """Base64-encode every image so it is sent to the external vLLM server as a string."""
        import base64
        import os

        for request in infer_requests:
            if 'images' not in request:
                continue

            for i, img in enumerate(request['images']):
                if not isinstance(img, dict):
                    # Not a {'bytes': ..., 'path': ...} entry; leave it unchanged
                    continue
                if img.get('bytes'):
                    # Raw bytes are available: encode them directly
                    request['images'][i] = base64.b64encode(img['bytes']).decode('utf-8')
                elif img.get('path') and os.path.exists(img['path']):
                    # bytes is empty but the path exists: read the file and encode its contents
                    with open(img['path'], 'rb') as image_file:
                        request['images'][i] = base64.b64encode(image_file.read()).decode('utf-8')

This fixes the problem of the data sent to vLLM missing bytes.
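The conversion can be checked in isolation. The following is a minimal standalone sketch, assuming each image entry is a dict with optional `bytes` and `path` keys as in the function above. `encode_image_entry` is a hypothetical helper name, not part of ms-swift, and the sample bytes are fake data rather than a real JPEG.

```python
import base64
import os
from typing import Any


def encode_image_entry(img: Any) -> Any:
    """Return a base64 string for a dict image entry, or the entry unchanged."""
    if not isinstance(img, dict):
        return img
    if img.get('bytes'):
        # Raw bytes are available: encode them directly
        return base64.b64encode(img['bytes']).decode('utf-8')
    path = img.get('path')
    if path and os.path.exists(path):
        # Fall back to reading the file from disk
        with open(path, 'rb') as f:
            return base64.b64encode(f.read()).decode('utf-8')
    return img  # nothing to encode; left unchanged


# Hypothetical request with one in-memory image (fake bytes for illustration)
request = {'images': [{'bytes': b'\xff\xd8fake-jpeg', 'path': None}]}
request['images'] = [encode_image_entry(img) for img in request['images']]
```

After this step every encodable entry is a plain string, and decoding it with `base64.b64decode` gives back the original bytes.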
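Also, when _engine_infer sends the infer_requests, they do not include the default system prompt; if a system prompt needs to be passed, it has to be added directly to the data.
With the changes above, training runs successfully and the results look fine.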
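One way to add the system prompt to the data is to prepend a system message to each record's `messages` list before training. This is a minimal sketch, assuming the JSONL layout shown under "Data format" and assuming that the training pipeline accepts a leading message with `"role": "system"`. `add_system_prompt` and `convert_file` are hypothetical helper names, and the prompt text is a placeholder.

```python
import json


def add_system_prompt(record: dict, system_prompt: str) -> dict:
    """Return a copy of record whose messages start with a system message."""
    messages = list(record.get('messages', []))
    # Only insert when the record does not already start with a system message
    if not messages or messages[0].get('role') != 'system':
        messages.insert(0, {'role': 'system', 'content': system_prompt})
    return {**record, 'messages': messages}


def convert_file(src: str, dst: str, system_prompt: str) -> None:
    """Rewrite a JSONL file, adding the system prompt to every record."""
    with open(src, encoding='utf-8') as fin, open(dst, 'w', encoding='utf-8') as fout:
        for line in fin:
            if line.strip():
                record = add_system_prompt(json.loads(line), system_prompt)
                fout.write(json.dumps(record, ensure_ascii=False) + '\n')


# Hypothetical record and placeholder prompt, for illustration only
example = {'messages': [{'role': 'user', 'content': '<image>'}], 'images': 'x.jpg'}
patched = add_system_prompt(example, 'PLACEHOLDER SYSTEM PROMPT')
```

`convert_file` would be run once on the dataset file, and the resulting file passed to `--dataset` instead of the original.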
