
Model still loads into system memory and runs inference on the CPU after specifying a GPU #50

Open
yutong12 opened this issue Nov 9, 2023 · 2 comments

Comments

@yutong12

yutong12 commented Nov 9, 2023

Environment: Tesla T4, 16 GB
Problem description: We are using the CodeShell-7B-chat-int4 version. Running the official example takes far too long: excluding download time, loading the model and producing the first example output on the GPU took 5 minutes 41 seconds. How can inference be sped up?
When running the bundled demos cli_demo.py and web_demo.py with only the model path changed, we found the model was not loaded onto the GPU by default but onto the CPU, even though --device defaults to "cuda:0".
Expected result: faster inference with normal output.
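One way to diagnose this is to make the CPU fallback explicit instead of silent. A minimal sketch, assuming the Hugging Face `transformers` loading path the demos use (`pick_device` is a hypothetical helper, not code from this repo):

```python
def pick_device(requested: str, cuda_available: bool) -> str:
    """Return the requested device, falling back to CPU only when CUDA is
    genuinely unavailable -- and loudly, so a silent CPU load is caught."""
    if requested.startswith("cuda") and not cuda_available:
        print(f"WARNING: {requested} requested but CUDA is unavailable; using cpu")
        return "cpu"
    return requested

# Inside the demo this could be used roughly as follows (sketch only):
#   import torch
#   from transformers import AutoModelForCausalLM
#   device = pick_device("cuda:0", torch.cuda.is_available())
#   model = AutoModelForCausalLM.from_pretrained(
#       model_path, trust_remote_code=True
#   ).to(device)
#   print(next(model.parameters()).device)  # verify it reports cuda:0
```

If the warning fires, the installed PyTorch build likely lacks CUDA support (e.g. a CPU-only wheel), which would explain both the CPU placement and the slow first response.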

@yutong12
Author

yutong12 commented Nov 9, 2023

Follow-up: after the long load, it still consumed 30 GB of system RAM and 6 GB of VRAM. Is there some way to balance this?
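The ~30 GB figure is roughly consistent with materialising a full fp32 copy of 7 B parameters in host RAM during loading. A back-of-the-envelope check (my arithmetic, not from the thread), plus a commented sketch of a possible mitigation:

```python
def fp32_copy_gb(n_params: float) -> float:
    """Host RAM needed to hold one fp32 copy of the weights (4 bytes/param)."""
    return n_params * 4 / 1e9

print(fp32_copy_gb(7e9))  # 28.0 GB -- close to the ~30 GB observed above

# With a recent transformers and `accelerate` installed, passing
# `low_cpu_mem_usage=True` streams weights shard-by-shard instead of
# buffering a full copy in RAM. Whether the int4 checkpoint's custom
# loading code supports this is an assumption to verify:
#   model = AutoModelForCausalLM.from_pretrained(
#       model_path, trust_remote_code=True,
#       low_cpu_mem_usage=True, device_map={"": 0},
#   )
```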

@shuaizai88

You probably need to adjust some parameters; in my case I watched my memory get exhausted...

2 participants