
[Bug] Does the VLM chat template only keep the last text segment of a role's message? #2911

Open

OftenDream opened this issue Dec 17, 2024 · 4 comments

@OftenDream

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

I deployed qwen-vl with lmdeploy. The input follows OpenAI's vision format, but a single role's message contains two text segments, e.g.:
"messages"= [
{
"content": [
{
"type": "text",
"text": "你好"
},
{
"image_url": {
"url": "{{url}}"
},
"type": "image_url"
},
{
"type": "text",
"text": "描述一下这个图片"
}
],
"role": "user"
}]
But the prompt assembled by the backend looks like this:

(screenshot: the assembled prompt)

The first text segment, "你好", has been swallowed. I looked through the source and found that when the vision message is constructed, the last text segment overwrites the earlier ones:

(screenshot: the relevant source code)

Is this behavior expected, and can it be fixed?
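For readers without the screenshots: the reported behavior reduces to a collation loop that assigns the text instead of accumulating it. A minimal self-contained illustration of the symptom (this is illustrative only, not lmdeploy's actual code):

```python
# If the loop *assigns* rather than accumulates, only the last
# text segment of the message survives collation.
content = [
    {'type': 'text', 'text': '你好'},
    {'type': 'image_url', 'image_url': {'url': '{{url}}'}},
    {'type': 'text', 'text': '描述一下这个图片'},
]

prompt = ''
for item in content:
    if item['type'] == 'text':
        prompt = item['text']  # overwrite: the earlier '你好' is lost

print(prompt)  # → 描述一下这个图片
```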

Reproduction

from lmdeploy import pipeline, ChatTemplateConfig

model_path = {{qwen_vl_dir}}
pipe = pipeline(model_path=model_path,
                chat_template_config=ChatTemplateConfig(model_name='qwen'),
                log_level='INFO')

prompts = [
    {
        'role': 'user',
        'content': [
            {'type': 'text', 'text': '你好'},
            {'type': 'image_url', 'image_url': {'url': '{{url}}'}},
            {'type': 'text', 'text': '描述一下这个图片'}
        ]
    }
]
response = pipe(prompts)
print(response)

Environment

lmdeploy==0.6.0
torch==2.4.0
cuda==11.8

Error traceback

No response

@lvhan028
Collaborator

Yes. We don't plan to handle it, because it's unclear how the prompt should be concatenated in that case. We haven't found any convention for this among open-source models, so we are reluctant to define the prompt-concatenation behavior for such input on our own.

@irexyc
Collaborator

irexyc commented Dec 18, 2024

You can construct it like this: use <IMAGE_TOKEN> (lmdeploy's special token that marks an image position) to represent the image, and merge the text into a single segment:

prompts = [
    {
        'role': 'user',
        'content': [
            {'type': 'text', 'text': '你好<IMAGE_TOKEN>描述一下这个图片'},
            {'type': 'image_url', 'image_url': {'url': '{{url}}'}},
        ]
    }
]
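For callers that already produce OpenAI-style multi-segment messages, this merge can be done generically before sending. A minimal sketch, assuming the message shape and the `<IMAGE_TOKEN>` string from the example above (the helper name is mine, not part of lmdeploy's API):

```python
IMAGE_TOKEN = '<IMAGE_TOKEN>'  # lmdeploy's image placeholder, per the comment above

def merge_text_segments(message):
    """Collapse all text segments of one message into a single text item,
    inserting IMAGE_TOKEN wherever an image appeared between them."""
    parts, images = [], []
    for item in message['content']:
        if item['type'] == 'text':
            parts.append(item['text'])
        elif item['type'] == 'image_url':
            parts.append(IMAGE_TOKEN)
            images.append(item)
    merged = [{'type': 'text', 'text': ''.join(parts)}] + images
    return {'role': message['role'], 'content': merged}

msg = {
    'role': 'user',
    'content': [
        {'type': 'text', 'text': '你好'},
        {'type': 'image_url', 'image_url': {'url': '{{url}}'}},
        {'type': 'text', 'text': '描述一下这个图片'},
    ],
}
print(merge_text_segments(msg)['content'][0]['text'])
# → 你好<IMAGE_TOKEN>描述一下这个图片
```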

@OftenDream
Author

Got it, thanks!


This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 5 days if the stale label is not removed or if there is no further response.

@github-actions github-actions bot added the Stale label Dec 26, 2024