
MTL platform with ARC 770 cannot allocate memory block with size larger than 4GB when running vLLM Qwen2-VL-2B #12136

Open
weijiejx opened this issue Sep 27, 2024 · 5 comments

weijiejx commented Sep 27, 2024

When I run a vLLM model such as Qwen2-VL-2B with an ARC 770 on the MTL platform, it reports the error below:
RuntimeError: Current platform can NOT allocate memory block with size larger than 4GB! Tried to allocate 6.10 GiB (GPU 0; 15.11 GiB total capacity; 4.84 GiB already allocated; 5.41 GiB reserved in total by PyTorch)
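
For context, the load path that hits this allocation looks roughly like the following offline-inference sketch (not my exact script; the model ID is the Hugging Face one, and `device="xpu"` assumes vLLM's Intel XPU backend):

```python
from vllm import LLM, SamplingParams

# Sketch: load Qwen2-VL-2B on the Intel GPU (XPU) backend.
# The >4GB allocation error above is raised while the weights /
# KV cache are being allocated on the Arc 770.
llm = LLM(
    model="Qwen/Qwen2-VL-2B-Instruct",  # Hugging Face model ID (assumed)
    device="xpu",                        # Intel GPU backend
    gpu_memory_utilization=0.90,
)

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Describe the image."], params)
print(outputs[0].outputs[0].text)
```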

hzjane (Contributor) commented Sep 29, 2024

vLLM 0.5.4 does not support the Qwen2-VL model yet. We will support it in the upcoming 0.6.1 version.

weijiejx (Author) commented

Thank you! But I need to double-confirm: I am using IPEX to run Qwen2-VL-2B, not OpenVINO, and vLLM 0.5.4 does not support that, right?

hzjane (Contributor) commented Sep 29, 2024

Yes, even the official version of vLLM 0.5.4 does not support it; support arrives in 0.6.1.

weijiejx (Author) commented

Thanks again.
One more question: are there any models available that I can use with vLLM 0.5.4? Could you suggest one or two for me to try?
Thanks.

hzjane (Contributor) commented Sep 29, 2024

It is recommended to run Llama, Qwen, and ChatGLM models, for example: Llama-2-7b-chat-hf, Qwen1.5-7B-Chat, and chatglm3-6b. See the sketch below.
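
A minimal offline run with one of these looks roughly like this (a sketch; model IDs are the Hugging Face ones, and `device="xpu"` assumes vLLM 0.5.4's Intel XPU backend):

```python
from vllm import LLM, SamplingParams

# Sketch: any of the suggested models should work; Llama-2-7b-chat-hf shown.
llm = LLM(
    model="meta-llama/Llama-2-7b-chat-hf",  # or Qwen/Qwen1.5-7B-Chat, THUDM/chatglm3-6b
    device="xpu",             # run on the Intel GPU
    trust_remote_code=True,   # needed for chatglm3-6b
)

params = SamplingParams(temperature=0.8, max_tokens=128)
print(llm.generate(["What can vLLM do?"], params)[0].outputs[0].text)
```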
