Commit

bug report
Eddie-Wang1120 committed Sep 17, 2023
1 parent 57dcb7f commit ec1c2bd
Showing 11 changed files with 92 additions and 25 deletions.
44 changes: 36 additions & 8 deletions README.md
@@ -1,18 +1,34 @@
Whisper Model For Tensorrt-LLM
==============
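## Model implementation approach
-The model used is Whisper's large model, which supports multiple languages and so keeps full functionality. Whisper is split into an encoder and a decoder, implemented in the model folder of tensorrt-llm as WhisperEncoder
-and WhisperDecoder respectively. The complete encoding and decoding pipelines are wrapped in run.py as WhisperEncoding and WhisperDecoding, with the implementations in encoding.py and decoding.py,
-to keep the code as clean and readable as possible.
+The model used is Whisper's large model, which supports multiple languages and so keeps full functionality. Whisper is split into an encoder and a decoder, implemented in the model folder of tensorrt-llm as WhisperEncoder and WhisperDecoder respectively. The complete encoding and decoding pipelines are wrapped in run.py as WhisperEncoding and WhisperDecoding, with the implementations in encoding.py and decoding.py, to keep the code as clean and readable as possible.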

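## Changes to Tensorrt-LLM
-The original Tensorrt-LLM does not support the Conv1D and CrossAttention operators, which makes it unfriendly to speech recognition models. On top of the original Tensorrt-LLM, I implemented a Conv1D operator based on the existing Conv2D operator, and a CrossAttention operator based on
-the existing Attention operator (which only supported MultiHeadAttention); CrossAttention is still implemented inside Attention so that the Attention abstraction stays correct. This keeps as much of the original code as possible while adding
-the new functionality, so that previously supported models still work, and completes a correct iteration of Tensorrt-LLM.
+The original Tensorrt-LLM does not support the Conv1D and CrossAttention operators, which makes it unfriendly to speech recognition models. On top of the original Tensorrt-LLM, I implemented a Conv1D operator based on the existing Conv2D operator, and a CrossAttention operator based on the existing Attention operator (which only supported MultiHeadAttention); CrossAttention is still implemented inside Attention so that the Attention abstraction stays correct. This keeps as much of the original code as possible while adding the new functionality, so that previously supported models still work, and completes a correct iteration of Tensorrt-LLM.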
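
The Conv1D-on-Conv2D idea described above can be illustrated outside Tensorrt-LLM. The following is a minimal NumPy sketch, not the code from this commit: `conv2d`, `conv1d_via_conv2d` and `conv1d_reference` are hypothetical helper names, and padding and stride are left out for brevity. It treats a 1-D signal as a 2-D image of height 1 and checks the result against a direct 1-D convolution.

```python
import numpy as np


def conv2d(x, w):
    """Naive stride-1, no-padding 2-D convolution (cross-correlation).

    x: (in_channels, height, width)
    w: (out_channels, in_channels, kernel_h, kernel_w)
    """
    c_out, _, kh, kw = w.shape
    _, height, width = x.shape
    out = np.zeros((c_out, height - kh + 1, width - kw + 1), dtype=x.dtype)
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            patch = x[:, i:i + kh, j:j + kw]  # (in_channels, kh, kw)
            # Contract the last three axes of w with the whole patch.
            out[:, i, j] = np.tensordot(w, patch, axes=3)
    return out


def conv1d_via_conv2d(x, w):
    """Conv1D expressed through conv2d by adding a unit height axis.

    x: (in_channels, length), w: (out_channels, in_channels, kernel)
    """
    x_2d = x[:, None, :]      # (in_channels, 1, length)
    w_2d = w[:, :, None, :]   # (out_channels, in_channels, 1, kernel)
    return conv2d(x_2d, w_2d)[:, 0, :]  # drop the unit height axis again


def conv1d_reference(x, w):
    """Direct 1-D convolution used only to validate the reshaping trick."""
    c_out, _, k = w.shape
    out = np.zeros((c_out, x.shape[1] - k + 1), dtype=x.dtype)
    for j in range(out.shape[1]):
        out[:, j] = np.tensordot(w, x[:, j:j + k], axes=2)
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))    # 4 input channels, 16 time steps
w = rng.standard_normal((8, 4, 3))  # 8 output channels, kernel size 3
conv1d_matches = np.allclose(conv1d_via_conv2d(x, w), conv1d_reference(x, w))
```

The same reshaping presumably carries over to a Tensorrt-LLM layer by adding and removing a unit dimension around the existing Conv2D call, though the exact calls used in this repository are not shown in this diff.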

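## Possible bugs
-* Weight initialization issue in the linear operator
-* transB issue in the linear operator
+* Weight initialization issue in the linear operator
+When weights are initialized in weight.py, np.ascontiguousarray(weight.numpy()) is used to store the weight in contiguous memory. This operation should leave weights that are already stored contiguously unchanged and should only help weights that are not, so it should have no negative effect. However, when it is applied to tensorrt_llm_whisper.blocks[i].attn.dense.weight.value and tensorrt_llm_whisper.blocks[i].attn.dense.weight.value, the generated result is wrong, which suggests a problem in how linear handles memory internally.
+Result before the change:
+![image](./imgs/before.png)
+Result after the change:
+![image](./imgs/bug1_after.png)
+How to reproduce the bug:
+In examples/whisper/weight.py, change line 67 as shown in the image below, then rebuild the encoder.
+![image](./imgs/bug1_how.png)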
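
The expectation stated above, that `np.ascontiguousarray` never changes values, can be checked in isolation. This is a minimal NumPy sketch that assumes a small stand-in array rather than the real Whisper weights; `describe` is a hypothetical helper, not part of weight.py.

```python
import numpy as np


def describe(arr):
    """Report what np.ascontiguousarray does to a given array."""
    converted = np.ascontiguousarray(arr)
    return {
        "was_contiguous": bool(arr.flags["C_CONTIGUOUS"]),
        "is_contiguous": bool(converted.flags["C_CONTIGUOUS"]),
        # True when no copy was made and both views use the same buffer.
        "same_buffer": bool(np.shares_memory(arr, converted)),
        "same_values": bool(np.array_equal(arr, converted)),
    }


# Stand-in for a weight matrix; the real weights come from the checkpoint.
weight = np.arange(12, dtype=np.float32).reshape(3, 4)

contiguous_case = describe(weight)    # already C-contiguous
transposed_case = describe(weight.T)  # a transposed view is not C-contiguous
```

If both cases report `same_values` as true for the real weights as well, the difference in engine output would have to come from somewhere after this call, which is consistent with the suspicion raised above.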

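+* transB issue in the linear operator
+When weights are initialized in weight.py, the original RowLinear operator fixes transb to True, so the original weight is transposed before being placed into the model, but the result is wrong. Even more surprisingly, placing the original weight into the model without the transpose gives the same wrong result; that is, the result is equally wrong with or without the transpose, which suggests a problem in how linear handles memory internally.
+Result before the change:
+![image](./imgs/before.png)
+Result after the change:
+![image](./imgs/bug2_after.png)
+How to reproduce the bug:
+In examples/whisper/weight.py, change line 67 as shown in the image below (whether transpose(0,1) is present does not affect the result).
+![image](./imgs/bug2_how_1.png)
+In tensorrt-llm/layers/linear.py, comment out the current RowLinear class and uncomment the original code that is commented out at line 142, then rebuild the encoder.
+![image](./imgs/bug2_how_2.png)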
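
For reference, transposing a non-symmetric weight should change the output of a `transb=True` linear layer. The sketch below is plain NumPy, not Tensorrt-LLM: `linear_transb` is a hypothetical stand-in for the matmul in RowLinear, and the square 5x5 weight is an assumption chosen so that both layouts are shape-valid, as they would be for a square attention output projection.

```python
import numpy as np


def linear_transb(x, weight, bias=None):
    """Compute y = x @ weight.T (+ bias), with weight stored as (out, in)."""
    y = x @ weight.T
    return y if bias is None else y + bias


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 5)).astype(np.float32)
w = rng.standard_normal((5, 5)).astype(np.float32)  # square, non-symmetric

out_plain = linear_transb(x, w)
# Same layer, but fed the transposed weight (made contiguous as in weight.py).
out_transposed = linear_transb(x, np.ascontiguousarray(w.T))

# With a non-symmetric weight the two results are expected to differ.
outputs_differ = not np.allclose(out_plain, out_transposed)
```

Getting identical outputs for both layouts, as reported above, is therefore not what the math predicts, which supports treating it as a bug rather than expected behavior.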


## Performance comparison
* English-only audio, total length 18.15 seconds
@@ -42,3 +58,15 @@ To run:
python3 run.py --input_file test.m4a
```
Two sample audio files recorded by the author are provided: test.m4a and test_chinese.m4a

+To int8:
+```
+python3 torch_whisper_convert.py -i large-v2.py -o quantize
+python3 build.py --use_gpt_attention_plugin --use_gemm_plugin --use_layernorm_plugin --int8_kv_cache
+python3 run.py
+```

+To summarize:
+```
+python3 summarize.py
+```
Binary file added imgs/before.png
Binary file added imgs/bug1_after.png
Binary file added imgs/bug1_how.png
Binary file added imgs/bug2_after.png
Binary file added imgs/bug2_how_1.png
Binary file added imgs/bug2_how_2.png
4 changes: 2 additions & 2 deletions tensorrt_llm_july-release-v1/examples/whisper/build.py
@@ -358,8 +358,8 @@ def run_build(args=None):

     model = torch.load(args.model_dir)
     build_encoder(model, args)
-    build_decoder(model, args)
-    build_crossattn_kv_linear(model, args)
+    # build_decoder(model, args)
+    # build_crossattn_kv_linear(model, args)

 if __name__ == '__main__':
     run_build()
2 changes: 1 addition & 1 deletion tensorrt_llm_july-release-v1/examples/whisper/summarize.py
@@ -176,7 +176,7 @@ def main(args):
                         default='fp16')
     parser.add_argument('--log_level', type=str, default='info')
     parser.add_argument('--engine_dir', type=str, default='whisper_outputs')
-    parser.add_argument('--dataset_dir', type=str, default='./LibriSpeech/dev-clean-2/84/121550')
+    parser.add_argument('--dataset_dir', type=str, default='./LibriSpeech/test')
     parser.add_argument('--checkpoint_file', type=str, default='./large-v2.pt')
     parser.add_argument('--check_accuracy', action='store_true')
     parser.add_argument('--tensorrt_llm_rouge1_threshold',
28 changes: 14 additions & 14 deletions tensorrt_llm_july-release-v1/examples/whisper/weight.py
@@ -30,7 +30,7 @@ def sinusoids(length, channels, max_timescale=10000):
     return torch.cat([torch.sin(scaled_time), torch.cos(scaled_time)], dim=1)

 def trans_weight(weight):
-    return np.ascontiguousarray(weight.numpy())
+    return np.ascontiguousarray(weight)

 def load_encoder_weight(tensorrt_llm_whisper: WhisperEncoder,
                         model_metadata : dict,
@@ -71,12 +71,12 @@ def load_encoder_weight(tensorrt_llm_whisper: WhisperEncoder,
         tensorrt_llm_whisper.blocks[i].mlp_ln.weight.value = model_params['encoder.blocks.'+str(i)+'.mlp_ln.weight'].numpy()
         tensorrt_llm_whisper.blocks[i].mlp_ln.bias.value = model_params['encoder.blocks.'+str(i)+'.mlp_ln.bias'].numpy()

-        tensorrt_llm_whisper.blocks[i].mlp1.weight.value = trans_weight(model_params['encoder.blocks.'+str(i)+'.mlp.0.weight'])
-        tensorrt_llm_whisper.blocks[i].mlp1.bias.value = trans_weight(model_params['encoder.blocks.'+str(i)+'.mlp.0.bias'])
+        tensorrt_llm_whisper.blocks[i].mlp1.weight.value = trans_weight(model_params['encoder.blocks.'+str(i)+'.mlp.0.weight'].numpy())
+        tensorrt_llm_whisper.blocks[i].mlp1.bias.value = trans_weight(model_params['encoder.blocks.'+str(i)+'.mlp.0.bias'].numpy())
         # tensorrt_llm_whisper.blocks[0].mlp1.matmul_trans_weight = False

-        tensorrt_llm_whisper.blocks[i].mlp2.weight.value = trans_weight(model_params['encoder.blocks.'+str(i)+'.mlp.2.weight'])
-        tensorrt_llm_whisper.blocks[i].mlp2.bias.value = trans_weight(model_params['encoder.blocks.'+str(i)+'.mlp.2.bias'])
+        tensorrt_llm_whisper.blocks[i].mlp2.weight.value = trans_weight(model_params['encoder.blocks.'+str(i)+'.mlp.2.weight'].numpy())
+        tensorrt_llm_whisper.blocks[i].mlp2.bias.value = trans_weight(model_params['encoder.blocks.'+str(i)+'.mlp.2.bias'].numpy())

     tensorrt_llm_whisper.ln_post.weight.value = model_params['encoder.ln_post.weight'].numpy()
     tensorrt_llm_whisper.ln_post.bias.value = model_params['encoder.ln_post.bias'].numpy()
@@ -141,8 +141,8 @@ def load_decoder_weight(
         tensorrt_llm_whisper.blocks[i].cross_attn_ln.weight.value = model_params['decoder.blocks.'+str(i)+'.cross_attn_ln.weight'].numpy()
         tensorrt_llm_whisper.blocks[i].cross_attn_ln.bias.value = model_params['decoder.blocks.'+str(i)+'.cross_attn_ln.bias'].numpy()

-        tensorrt_llm_whisper.blocks[i].cross_attn.q_linear.weight.value = trans_weight(model_params['decoder.blocks.'+str(i)+'.cross_attn.query.weight'])
-        tensorrt_llm_whisper.blocks[i].cross_attn.q_linear.bias.value = trans_weight(model_params['decoder.blocks.'+str(i)+'.cross_attn.query.bias'])
+        tensorrt_llm_whisper.blocks[i].cross_attn.q_linear.weight.value = trans_weight(model_params['decoder.blocks.'+str(i)+'.cross_attn.query.weight'].numpy())
+        tensorrt_llm_whisper.blocks[i].cross_attn.q_linear.bias.value = trans_weight(model_params['decoder.blocks.'+str(i)+'.cross_attn.query.bias'].numpy())

         tensorrt_llm_whisper.blocks[i].cross_attn.dense.weight.value = model_params['decoder.blocks.'+str(i)+'.cross_attn.out.weight'].numpy()
         tensorrt_llm_whisper.blocks[i].cross_attn.dense.bias.value = model_params['decoder.blocks.'+str(i)+'.cross_attn.out.bias'].numpy()
@@ -151,12 +151,12 @@ def load_decoder_weight(
         tensorrt_llm_whisper.blocks[i].mlp_ln.weight.value = model_params['decoder.blocks.'+str(i)+'.mlp_ln.weight'].numpy()
         tensorrt_llm_whisper.blocks[i].mlp_ln.bias.value = model_params['decoder.blocks.'+str(i)+'.mlp_ln.bias'].numpy()

-        tensorrt_llm_whisper.blocks[i].mlp1.weight.value = trans_weight(model_params['decoder.blocks.'+str(i)+'.mlp.0.weight'])
-        tensorrt_llm_whisper.blocks[i].mlp1.bias.value = trans_weight(model_params['decoder.blocks.'+str(i)+'.mlp.0.bias'])
+        tensorrt_llm_whisper.blocks[i].mlp1.weight.value = trans_weight(model_params['decoder.blocks.'+str(i)+'.mlp.0.weight'].numpy())
+        tensorrt_llm_whisper.blocks[i].mlp1.bias.value = trans_weight(model_params['decoder.blocks.'+str(i)+'.mlp.0.bias'].numpy())
         # tensorrt_llm_whisper.blocks[0].mlp1.matmul_trans_weight = False

-        tensorrt_llm_whisper.blocks[i].mlp2.weight.value = trans_weight(model_params['decoder.blocks.'+str(i)+'.mlp.2.weight'])
-        tensorrt_llm_whisper.blocks[i].mlp2.bias.value = trans_weight(model_params['decoder.blocks.'+str(i)+'.mlp.2.bias'])
+        tensorrt_llm_whisper.blocks[i].mlp2.weight.value = trans_weight(model_params['decoder.blocks.'+str(i)+'.mlp.2.weight'].numpy())
+        tensorrt_llm_whisper.blocks[i].mlp2.bias.value = trans_weight(model_params['decoder.blocks.'+str(i)+'.mlp.2.bias'].numpy())

     tensorrt_llm_whisper.ln.weight.value = model_params['decoder.ln.weight'].numpy()
     tensorrt_llm_whisper.ln.bias.value = model_params['decoder.ln.bias'].numpy()
@@ -169,10 +169,10 @@ def load_crossattn_linear_weight(tensorrt_llm_whisper: CrossAttn_KV,
     tensorrt_llm.logger.info('Loading CrossAttn weights from PT...')

     for i in range(n_layer):
-        tensorrt_llm_whisper.blocks[i].k_linear.weight.value = trans_weight(model_params['decoder.blocks.'+str(i)+'.cross_attn.key.weight'])
+        tensorrt_llm_whisper.blocks[i].k_linear.weight.value = trans_weight(model_params['decoder.blocks.'+str(i)+'.cross_attn.key.weight'].numpy())

-        tensorrt_llm_whisper.blocks[i].v_linear.weight.value = trans_weight(model_params['decoder.blocks.'+str(i)+'.cross_attn.value.weight'])
-        tensorrt_llm_whisper.blocks[i].v_linear.weight.bias = trans_weight(model_params['decoder.blocks.'+str(i)+'.cross_attn.value.bias'])
+        tensorrt_llm_whisper.blocks[i].v_linear.weight.value = trans_weight(model_params['decoder.blocks.'+str(i)+'.cross_attn.value.weight'].numpy())
+        tensorrt_llm_whisper.blocks[i].v_linear.weight.bias = trans_weight(model_params['decoder.blocks.'+str(i)+'.cross_attn.value.bias'].numpy())

 # model = torch.load("large-v2.pt")

39 changes: 39 additions & 0 deletions tensorrt_llm_july-release-v1/tensorrt_llm/layers/linear.py
@@ -138,3 +138,42 @@ def forward(self, x):
             x = x + self.bias.value

         return x

+# class RowLinear(Module):

+#     def __init__(self,
+#                  in_features,
+#                  out_features,
+#                  bias=True,
+#                  dtype=None,
+#                  tp_group=None,
+#                  tp_size=1):
+#         super().__init__()
+#         self.in_features = in_features // tp_size
+#         self.out_features = out_features
+#         self.dtype = dtype

+#         self.weight = Parameter(shape=(self.out_features, self.in_features),
+#                                 dtype=dtype)

+#         if bias:
+#             self.bias = Parameter(shape=(self.out_features, ), dtype=dtype)
+#         else:
+#             self.register_parameter('bias', None)

+#         self.tp_group = tp_group
+#         self.tp_size = tp_size

+#     def forward(self, x):
+#         if default_net().plugin_config.gemm_plugin:
+#             x = _gemm_plugin(x, self.weight.value, transb=True)
+#         else:
+#             x = matmul(x, self.weight.value, transb=True)

+#         if self.tp_size > 1 and self.tp_group is not None:
+#             x = allreduce(x, self.tp_group)

+#         if self.bias is not None:
+#             x = x + self.bias.value

+#         return x
