Environment: Tesla T4, 16 GB

Problem description: We are using the CodeShell-7B-chat-int4 version. Running the official example takes far too long: excluding download time, loading on the GPU and producing the first example's output took 5 minutes 41 seconds. How can inference be sped up?

When running the bundled demos cli_demo.py and web_demo.py with only the model path changed, the model was not loaded onto the GPU by default but onto the CPU, even though --device defaults to "cuda:0".

Expected result: faster inference with normal output.
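One way to rule out the demo's flag handling is to resolve the device explicitly and move the model with `.to(device)` instead of relying on `--device`. A minimal sketch assuming the standard PyTorch API; `pick_device` is a hypothetical helper, not part of the demo scripts:

```python
import torch

def pick_device(requested: str = "cuda:0") -> torch.device:
    """Return the requested device, falling back to CPU when CUDA is absent.

    A silent CPU fallback inside the demo would explain a "cuda:0" default
    that still ends up running on CPU, so check availability explicitly.
    """
    if requested.startswith("cuda") and not torch.cuda.is_available():
        return torch.device("cpu")
    return torch.device(requested)

device = pick_device("cuda:0")
print(device)
# Then move the model explicitly, e.g. (path is your local checkout):
# model = AutoModelForCausalLM.from_pretrained(path, trust_remote_code=True).to(device)
```

Printing `model.device` (or `next(model.parameters()).device`) after loading confirms where the weights actually landed.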
Follow-up: even after the long load, it still consumed 30 GB of RAM and 6 GB of VRAM. Is there some way to balance the two?
You probably need to adjust the parameters; in any case, my machine ran out of memory.