How long can the input be? TurboMind supports Qwen-7B, dynamic NTK-RoPE scaling and dynamic logN scaling
After converting qwen-7b into the weight format required by TurboMind with deploy.py, a config file is generated at workspace/triton_models/weights/config.ini.
Change the following items in that config file:
max_position_embeddings = 2048
use_dynamic_ntk = 1
use_logn_attn = 1
This enables length extrapolation and supports conversations up to 8K tokens.
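For convenience, here is a minimal sketch of scripting that edit with Python's configparser. The section name "llama" is an assumption for illustration; check the generated config.ini and adjust the section name if the keys live elsewhere.

```python
# Sketch: patch workspace/triton_models/weights/config.ini after running deploy.py
# to enable dynamic NTK-RoPE scaling and dynamic logN attention scaling.
import configparser

cfg_path = "workspace/triton_models/weights/config.ini"

parser = configparser.ConfigParser()
parser.read(cfg_path)

section = "llama"  # assumed section name; verify against your generated config.ini
parser[section]["max_position_embeddings"] = "2048"
parser[section]["use_dynamic_ntk"] = "1"
parser[section]["use_logn_attn"] = "1"

with open(cfg_path, "w") as f:
    parser.write(f)
```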
@lvhan028 Hello, thanks for the guidance. I enabled NTK on llama2-70B and it works fine at 8K, but at longer lengths, e.g. 16K, the output becomes garbled. How should I handle this? Should I use q_scaling?
Can qwen-7b support more than 8K, e.g. 32K?
Motivation
How long can the input be? TurboMind supports Qwen-7B, dynamic NTK-RoPE scaling and dynamic logN scaling.
Related resources
How long can the input be? TurboMind supports Qwen-7B, dynamic NTK-RoPE scaling and dynamic logN scaling.
Additional context
How long can the input be? TurboMind supports Qwen-7B, dynamic NTK-RoPE scaling and dynamic logN scaling.