[Bug] glm-4v image understanding generates garbled output when a JSON chat template is passed #2909

Open
3 tasks
Sunxiaohu0406 opened this issue Dec 17, 2024 · 3 comments

@Sunxiaohu0406

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

Launching the server with a chat template:
lmdeploy serve api_server --eager-mode /home/nfs/appnfs/sxh/pre_models/ZhipuAI/glm-4v-9b --backend pytorch --model-name glm-4v-9b --device ascend --tp 2 --chat-template /opt/lmdeploy/chat_template/glm-4v.json --server-name 0.0.0.0 --server-port 50055
The generated output is empty or garbled.

The request and the erroneous result are as follows (the prompt 描述这张图片 means "Describe this image"):
curl "http://192.168.1.49:50055/v1/chat/completions" -H "Content-Type: application/json" -d '{
"model": "glm-4v-9b",
"messages": [
{
"role": "user",
"content": [
{ "type": "text", "text": "描述这张图片"},
{ "type": "image_url", "image_url": { "url": ""}}
]
}
],
"max_tokens": 128,
"stream": false
}'

{"id":"1","object":"chat.completion","created":1734435379,"model":"glm-4v-9b","choices":[{"index":0,"message":{"role":"assistant","content":"(preview) is a pattern","tool_calls":null},"logprobs":null,"finish_reason":"stop"}],"usage":{"prompt_tokens":1647,"total_tokens":1653,"completion_tokens":6}}

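Sometimes the content is instead abnormal output such as ",,,,,,)))))))),,,,,".

Starting the server without the template produces normal output, but when I call it through k8s, passing a template is required.

My JSON template is as follows: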
{
"model_name": "glm-4v-9b",
"system": "<|vision_start|>system\n",
"meta_instruction": "你是一个名为 GLM-4 的人工智能助手。你是基于智谱AI训练的语言模型 GLM-4 模型开发的,你的任务是针对用户的问题和要求提供适当的答复和支持。",
"eosys": "<|vision_end|>\n",
"user": "<|vision_start|>user\n",
"eoh": "<|vision_end|>\n",
"assistant": "<|vision_start|>assistant\n",
"eoa": "<|vision_end|>",
"separator": "\n",
"capability": "chat",
"stop_words": ["<|vision_end|>"]
}
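To make it easier to see what the model actually receives, here is a minimal sketch of how a template like this one might be expanded into a prompt string for a single user turn. It is a simplified approximation, not LMDeploy's actual implementation: `render_prompt` is a hypothetical helper, image placeholders are ignored, and the field semantics (begin/end markers per role) are assumed from the field names.

```python
import json

# Template fields copied verbatim from the JSON in this issue.
TEMPLATE = json.loads(r'''
{
  "system": "<|vision_start|>system\n",
  "meta_instruction": "你是一个名为 GLM-4 的人工智能助手。你是基于智谱AI训练的语言模型 GLM-4 模型开发的,你的任务是针对用户的问题和要求提供适当的答复和支持。",
  "eosys": "<|vision_end|>\n",
  "user": "<|vision_start|>user\n",
  "eoh": "<|vision_end|>\n",
  "assistant": "<|vision_start|>assistant\n",
  "eoa": "<|vision_end|>",
  "separator": "\n"
}
''')


def render_prompt(template, messages):
    """Hypothetical helper: wrap each message in its role's begin/end markers."""
    begin = {
        "system": template["system"],
        "user": template["user"],
        "assistant": template["assistant"],
    }
    end = {
        "system": template["eosys"],
        "user": template["eoh"],
        "assistant": template["eoa"] + template["separator"],
    }
    parts = []
    # Assumption: the default system prompt is prepended when none is supplied.
    if messages and messages[0]["role"] != "system":
        parts.append(begin["system"] + template["meta_instruction"] + end["system"])
    for message in messages:
        role = message["role"]
        parts.append(begin[role] + message["content"] + end[role])
    # Leave the assistant turn open so the model continues from here.
    parts.append(begin["assistant"])
    return "".join(parts)


if __name__ == "__main__":
    prompt = render_prompt(TEMPLATE, [{"role": "user", "content": "描述这张图片"}])
    print(repr(prompt))
```

Under these assumptions every turn is delimited by `<|vision_start|>` and `<|vision_end|>`, so whether the model's tokenizer knows those two strings as special tokens directly affects how the prompt is tokenized.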

Reproduction

lmdeploy serve api_server \
--eager-mode /home/nfs/appnfs/xxx/pre_models/ZhipuAI/glm-4v-9b \
--backend pytorch \
--model-name glm-4v-9b \
--device ascend \
--tp 2 \
--chat-template /opt/lmdeploy/chat_template/glm-4v.json \
--server-name 0.0.0.0 \
--server-port 50055

Environment

[W compiler_depend.ts:615] Warning: expandable_segments currently defaults to false. You can enable this feature by `export PYTORCH_NPU_ALLOC_CONF = expandable_segments:True`. (function operator())
sys.platform: linux
Python: 3.10.5 (main, Sep 24 2024, 03:43:49) [GCC 9.4.0]
CUDA available: False
MUSA available: False
numpy_random_seed: 2147483648
GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
PyTorch: 2.3.1
PyTorch compiling details: PyTorch built with:
  - GCC 10.2
  - C++ Version: 201703
  - Intel(R) MKL-DNN v3.3.6 (Git Hash 86e6af5974177e513fd3fee58425e1063e7f1361)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: NO AVX
  - Build settings: BLAS_INFO=open, BUILD_TYPE=Release, CXX_COMPILER=/opt/rh/devtoolset-10/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=open, TORCH_VERSION=2.3.1, USE_CUDA=OFF, USE_CUDNN=OFF, USE_CUSPARSELT=OFF, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF, 

TorchVision: 0.18.1
LMDeploy: 0.6.3+178ec7b
transformers: 4.46.3
gradio: Not Found
fastapi: 0.115.5
pydantic: 2.10.0
triton: Not Found

Error traceback

No response

@RunningLeon
Collaborator

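@Sunxiaohu0406 Hi, does this chat template work for normal conversation with the HF model? The special tokens you are using do not appear to exist in the original model; see _history_to_prompt.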
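One way to check this concretely is to extract every `<|...|>`-style marker from the template and look each one up in the tokenizer's vocabulary. The sketch below is only illustrative: `template_special_tokens` and `missing_tokens` are hypothetical helpers, and the vocabulary in the demo is a toy stand-in, not the real glm-4v-9b vocabulary.

```python
import re

# Matches markers of the form <|name|>, the style used in the template above.
SPECIAL_TOKEN_RE = re.compile(r"<\|[^|<>]+\|>")


def template_special_tokens(template):
    """Collect all <|...|> markers found in string or list-of-string fields."""
    tokens = set()
    for value in template.values():
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, str):
                tokens.update(SPECIAL_TOKEN_RE.findall(item))
    return tokens


def missing_tokens(template, vocab):
    """Return the template markers that are absent from the given vocabulary."""
    return sorted(t for t in template_special_tokens(template) if t not in vocab)


if __name__ == "__main__":
    template = {
        "user": "<|vision_start|>user\n",
        "eoh": "<|vision_end|>\n",
        "stop_words": ["<|vision_end|>"],
    }
    # Toy stand-in vocabulary (an assumption for the demo, not the real one).
    toy_vocab = {"<|system|>": 0, "<|user|>": 1, "<|assistant|>": 2}
    print(missing_tokens(template, toy_vocab))
```

For a real check, `vocab` could come from the model's own tokenizer, for example via `get_vocab()` on a tokenizer loaded with transformers, assuming the model's custom tokenizer exposes that method with string keys. Any marker reported as missing would be split into ordinary sub-tokens rather than treated as a single special token.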

@Sunxiaohu0406
Author

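> @Sunxiaohu0406 Hi, does this chat template work for normal conversation with the HF model? The special tokens you are using do not appear to exist in the original model; see _history_to_prompt.

Text-only chat through the served LLM deployment works normally; only image understanding has problems.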

@RunningLeon
Collaborator

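> > @Sunxiaohu0406 Hi, does this chat template work for normal conversation with the HF model? The special tokens you are using do not appear to exist in the original model; see _history_to_prompt.
>
> Text-only chat through the served LLM deployment works normally; only image understanding has problems.

Please run it with transformers first and see what the result is, so that we have a baseline for comparison.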
