
[Bug] glm-4v image understanding produces garbled output when a JSON chat template is passed #2909

Open · 3 tasks
Sunxiaohu0406 opened this issue Dec 17, 2024 · 3 comments

@Sunxiaohu0406
Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

Started the server with a chat template:
lmdeploy serve api_server --eager-mode /home/nfs/appnfs/sxh/pre_models/ZhipuAI/glm-4v-9b --backend pytorch --model-name glm-4v-9b --device ascend --tp 2 --chat-template /opt/lmdeploy/chat_template/glm-4v.json --server-name 0.0.0.0 --server-port 50055
The generated output is empty or garbled.

The request and the erroneous result are as follows:
curl "http://192.168.1.49:50055/v1/chat/completions" -H "Content-Type: application/json" -d '{
"model": "glm-4v-9b",
"messages": [
{
"role": "user",
"content": [
{ "type": "text", "text": "描述这张图片"},
{ "type": "image_url", "image_url": { "url": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAACgAAAAfCAIAAAAa4xgvAAAACXBIWXMAABJ0AAASdAHeZh94AAAAEXRFWHRTb2Z0d2FyZQBTbmlwYXN0ZV0Xzt0AAAGDSURBVEiJ7ZZNboJAFMefxnMIsvQAphtkgQt7AxIJLMzEnoGkLtqEM2gIi6oLjsACmkC78hBWepHpYuwEBuTDjNSk/sJiPvnPe3nvzXSG4wdoGQwA0GtZj9JtTzjLnwlzdTWuXkK5u7oZTXzLwFrsrVykG2dlmO+XyVjZ++HeD721W1O4zGJv5UqCWDgVxJFlv9jWcjJWyMhoqtJD2NaycH1dYe1pnu6imaHKirbIDDq7N2e3ye+l5wAApBuDPmtAj86hmUnakiCS9uhRhatxEna2G2e7AQBv7YYfUaERfLm9dPLWrmW/HpJj+X6km0g3aTeIo/fP6EJh8i9tMa9UhaLgonFeV5jmxrlA5UuXqn59H0dTtY6VXDhZzGR3C/CMakkQ65fMLlN1L4PUatt6ZuoaQZWz4YYBcJPbadAXD0nC6AEoSDeDOErXSMLeD2k7f6CTcNpL6UWM95j9JGuDuDhx80dJ0xnKnJ+35HYqU8XXEa4GA/zHN9ft3U6XcH9Xl/ED3BKQRuI2xpEAAAAASUVORK5CYII="}}
]
}
],
"max_tokens": 128,
"stream": false
}'

{"id":"1","object":"chat.completion","created":1734435379,"model":"glm-4v-9b","choices":[{"index":0,"message":{"role":"assistant","content":"(preview) is a pattern","tool_calls":null},"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":1647,"total_tokens":1653,"completion_tokens":6}}

Sometimes the output also contains abnormal sequences like “,,,,,,)))))))),,,,,”.
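To rule out a corrupted image payload as the cause, the base64 data URL in the request body can be sanity-checked before suspecting the template. A minimal sketch (`extract_image_bytes` is a hypothetical helper for illustration, not part of lmdeploy):

```python
import base64

# Every PNG file begins with this fixed 8-byte signature
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def extract_image_bytes(data_url: str) -> bytes:
    """Decode the payload of a 'data:image/png;base64,...' URL."""
    header, payload = data_url.split(",", 1)
    if not (header.startswith("data:image/") and header.endswith(";base64")):
        raise ValueError(f"unexpected data-URL header: {header!r}")
    # validate=True rejects payloads containing non-base64 characters
    return base64.b64decode(payload, validate=True)
```

If `extract_image_bytes(url)[:8] == PNG_SIGNATURE` holds, the image bytes are intact on the client side and the garbling is more likely a prompt/template problem.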

Starting the server without the template produces normal output, but when calling it through k8s I am required to pass a template.

My JSON chat template is as follows:
{
"model_name": "glm-4v-9b",
"system": "<|vision_start|>system\n",
"meta_instruction": "你是一个名为 GLM-4 的人工智能助手。你是基于智谱AI训练的语言模型 GLM-4 模型开发的,你的任务是针对用户的问题和要求提供适当的答复和支持。",
"eosys": "<|vision_end|>\n",
"user": "<|vision_start|>user\n",
"eoh": "<|vision_end|>\n",
"assistant": "<|vision_start|>assistant\n",
"eoa": "<|vision_end|>",
"separator": "\n",
"capability": "chat",
"stop_words": ["<|vision_end|>"]
}
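For context, chat-template fields of this style are essentially concatenated around each message. A hypothetical single-turn sketch (`build_prompt` is illustrative, not lmdeploy's actual implementation) shows the prompt this template would produce: every role marker becomes `<|vision_start|>`/`<|vision_end|>`, and if the tokenizer does not define those as special tokens they are encoded as plain text, which can derail generation.

```python
def build_prompt(tmpl: dict, user_text: str) -> str:
    """Illustrative sketch of how chat-template fields are concatenated
    for one system + user turn (not lmdeploy's actual implementation)."""
    return (
        tmpl["system"] + tmpl["meta_instruction"] + tmpl["eosys"]
        + tmpl["user"] + user_text + tmpl["eoh"]
        + tmpl["assistant"]
    )

template = {
    "system": "<|vision_start|>system\n",
    "meta_instruction": "...",  # the system prompt from the JSON above, elided
    "eosys": "<|vision_end|>\n",
    "user": "<|vision_start|>user\n",
    "eoh": "<|vision_end|>\n",
    "assistant": "<|vision_start|>assistant\n",
}
prompt = build_prompt(template, "describe this image")
```

The resulting prompt wraps every turn in `<|vision_start|>`/`<|vision_end|>` rather than glm-4v's own role markers, consistent with the empty/garbled output above.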

Reproduction

lmdeploy serve api_server \
  --eager-mode /home/nfs/appnfs/xxx/pre_models/ZhipuAI/glm-4v-9b \
  --backend pytorch \
  --model-name glm-4v-9b \
  --device ascend \
  --tp 2 \
  --chat-template /opt/lmdeploy/chat_template/glm-4v.json \
  --server-name 0.0.0.0 \
  --server-port 50055

Environment

[W compiler_depend.ts:615] Warning: expandable_segments currently defaults to false. You can enable this feature by `export PYTORCH_NPU_ALLOC_CONF = expandable_segments:True`. (function operator())
sys.platform: linux
Python: 3.10.5 (main, Sep 24 2024, 03:43:49) [GCC 9.4.0]
CUDA available: False
MUSA available: False
numpy_random_seed: 2147483648
GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
PyTorch: 2.3.1
PyTorch compiling details: PyTorch built with:
  - GCC 10.2
  - C++ Version: 201703
  - Intel(R) MKL-DNN v3.3.6 (Git Hash 86e6af5974177e513fd3fee58425e1063e7f1361)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: NO AVX
  - Build settings: BLAS_INFO=open, BUILD_TYPE=Release, CXX_COMPILER=/opt/rh/devtoolset-10/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=open, TORCH_VERSION=2.3.1, USE_CUDA=OFF, USE_CUDNN=OFF, USE_CUSPARSELT=OFF, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF, 

TorchVision: 0.18.1
LMDeploy: 0.6.3+178ec7b
transformers: 4.46.3
gradio: Not Found
fastapi: 0.115.5
pydantic: 2.10.0
triton: Not Found

Error traceback

No response

@RunningLeon
Collaborator

@Sunxiaohu0406 hi, does this chat template work for normal conversation with the HF model? The special tokens you are using do not exist in the original model — see _history_to_prompt.

@Sunxiaohu0406
Author

> @Sunxiaohu0406 hi, does this chat template work for normal conversation with the HF model? The special tokens you are using do not exist in the original model — see _history_to_prompt.

The LLM serving deployment chats normally; only image understanding is broken.

@RunningLeon
Collaborator

> The LLM serving deployment chats normally; only image understanding is broken.

Run it with transformers first and see what you get, so there is a baseline to compare against.
