
On inference efficiency #35

Open

Youarerare opened this issue Mar 23, 2021 · 1 comment

Comments

@Youarerare

# inference

I don't think the slow inference is caused by `nozero`.
Looking at the implementation, each transformer encoder layer effectively drops the overly easy samples from the current batch, so the batch size keeps shrinking. But as long as even a single sample has to travel all the way to the last layer, the total time is still higher than vanilla BERT.
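For reference, here is a minimal sketch of the shrinking-batch loop I mean. This is not this repo's actual code; `layers`, `classifiers` (one exit head per layer), and the entropy threshold are hypothetical stand-ins:

```python
import torch

def early_exit_forward(x, layers, classifiers, threshold=0.3):
    """x: (batch, seq, hidden); one exit classifier per encoder layer."""
    results = [None] * x.size(0)
    idx = torch.arange(x.size(0))               # original positions of live samples
    for li, (layer, clf) in enumerate(zip(layers, classifiers)):
        x = layer(x)
        logits = clf(x[:, 0])                   # classify on the [CLS] token
        probs = torch.softmax(logits, dim=-1)
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum(-1)
        done = entropy < threshold              # "easy" samples exit here
        if li == len(layers) - 1:
            done[:] = True                      # last layer: everyone must exit
        for j in done.nonzero(as_tuple=True)[0].tolist():
            results[idx[j].item()] = logits[j]
        x, idx = x[~done], idx[~done]           # the batch shrinks every layer
        if x.size(0) == 0:                      # everyone exited early
            break
    return results
```

So the per-layer cost only drops when samples exit, and one hard sample keeps the whole loop running to layer 12.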

Is there a way to schedule the samples that still need computation more flexibly? For example, keep a pool: every sample that makes it past layer 10 goes into the pool and is scheduled together, so that each layer always runs with a fixed batch size. If the GPU is fully utilized this way, inference should be much faster. A rough sketch of what I mean is below.
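This is only an illustration of the scheduling idea, reusing the hypothetical `layers`/`classifiers` from above: each layer has its own pool, and a layer only runs once its pool can fill a fixed-size batch, so the GPU always sees full batches.

```python
from collections import deque
import torch

def pooled_inference(stream, layers, classifiers, batch_size=32, threshold=0.3):
    """stream yields (sample_id, hidden) pairs entering layer 0."""
    pools = [deque() for _ in layers]           # samples waiting at each layer
    results = {}

    def run_layer(li, flush=False):
        pool = pools[li]
        while len(pool) >= batch_size or (flush and pool):
            batch = [pool.popleft() for _ in range(min(batch_size, len(pool)))]
            ids = [sid for sid, _ in batch]
            x = torch.stack([h for _, h in batch])
            x = layers[li](x)
            logits = classifiers[li](x[:, 0])
            probs = torch.softmax(logits, dim=-1)
            entropy = -(probs * probs.clamp_min(1e-12).log()).sum(-1)
            for j, sid in enumerate(ids):
                if entropy[j] < threshold or li == len(layers) - 1:
                    results[sid] = logits[j]            # confident: exit here
                else:
                    pools[li + 1].append((sid, x[j]))   # wait at the next layer

    for sid, h in stream:
        pools[0].append((sid, h))
        for li in range(len(layers)):
            run_layer(li)                       # run any pool that filled a batch
    for li in range(len(layers)):
        run_layer(li, flush=True)               # drain the leftovers at the end
    return results
```

The trade-off is latency for throughput: individual samples may wait in a pool, but every kernel launch works on a full batch.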

@Youarerare
Author

Could we also consider splitting the model by layer, i.e., cutting the 12 BERT layers into 12 separate models? Samples that the earlier models cannot decide go into a pool, and the later models are invoked on them collectively, instead of the whole batch marching all the way down just because one sample in it is undecided. Something like the sketch below.
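A hypothetical sketch of the splitting itself (names are illustrative, not this repo's API): wrap each encoder layer plus its exit classifier as an independent stage, which the pooled scheduler above can then drive one pool at a time.

```python
import torch.nn as nn

class BertStage(nn.Module):
    """One encoder layer plus its exit classifier, usable as a standalone model."""
    def __init__(self, encoder_layer, classifier):
        super().__init__()
        self.encoder_layer = encoder_layer
        self.classifier = classifier

    def forward(self, hidden):
        hidden = self.encoder_layer(hidden)
        logits = self.classifier(hidden[:, 0])
        return hidden, logits

def split_into_stages(bert_layers, classifiers):
    # Each stage can be driven independently by the pooled scheduler above:
    # stage k consumes pool k and pushes undecided samples into pool k+1.
    return [BertStage(l, c) for l, c in zip(bert_layers, classifiers)]
```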
