
How long can inputs be? TurboMind supports Qwen-7B, dynamic NTK-RoPE scaling and dynamic logN scaling #430

yuanjie-ai opened this issue Sep 19, 2023 · 3 comments

@yuanjie-ai

Motivation

How long can inputs be? TurboMind supports Qwen-7B, dynamic NTK-RoPE scaling and dynamic logN scaling.

@lvhan028 (Collaborator)

After converting qwen-7b into the weight format that TurboMind requires with deploy.py, a config file is generated at workspace/triton_models/weights/config.ini.

Change the following entries in that config file to:

max_position_embeddings = 2048
use_dynamic_ntk = 1
use_logn_attn = 1

This enables the extrapolation capability. Conversations up to 8K in length are supported.
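A minimal sketch of applying these edits programmatically with Python's standard-library configparser. The path and key names come from the reply above; reading the section name from the file rather than hardcoding it is an assumption about the config.ini layout, so check your generated file and adjust if needed:

```python
import configparser

# Path from the reply above: the config generated by deploy.py.
cfg_path = "workspace/triton_models/weights/config.ini"

parser = configparser.ConfigParser()
parser.read(cfg_path)

# Assumption: the generated config.ini has a single section; we take
# the first one instead of hardcoding its name.
section = parser.sections()[0]
parser[section]["max_position_embeddings"] = "2048"
parser[section]["use_dynamic_ntk"] = "1"
parser[section]["use_logn_attn"] = "1"

with open(cfg_path, "w") as f:
    parser.write(f)
```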

@sjzhou4 commented Sep 25, 2023

@lvhan028 Hello, thanks for the guidance. I used NTK on llama2-70B and found that 8K length works, but anything longer, e.g. 16K, produces garbled output. How should this be handled? Should I use q_scaling?
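For context on why quality can degrade past some length: under dynamic NTK scaling, the RoPE base grows with sequence length, and beyond a point the frequency stretching alone no longer compensates for positions the model never saw in training. A sketch of the base computation following Qwen's published schedule (the alpha schedule is Qwen-specific; TurboMind's internal computation may differ):

```python
import math

def dynamic_ntk_base(seq_len: int, max_pos: int = 2048,
                     base: float = 10000.0, head_dim: int = 128) -> float:
    """RoPE base under dynamic NTK scaling, per Qwen's schedule.

    Once seq_len exceeds the training context (max_pos), alpha grows in
    powers of two, stretching the rotary frequencies. Illustrative only.
    """
    context = math.log2(seq_len / max_pos) + 1 if seq_len > max_pos else 1.0
    alpha = max(2 ** math.ceil(context) - 1, 1)
    return base * alpha ** (head_dim / (head_dim - 2))

# e.g. at 16K with a 2K training context, alpha jumps to 15 and the
# base grows accordingly -- far outside the trained frequency range.
print(dynamic_ntk_base(16384))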

@zhongjiyongshi (quoting @lvhan028's reply above)

Can qwen-7b support more than 8K? For example, 32K?
