We read every piece of feedback, and take your input very seriously.
To see all available qualifiers, see our documentation.
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FastBERT/run_fastbert.py
Line 132 in 5f9e98b
有没有办法能够更灵活的调度需要计算的样本,比如建立一个pool,进入到第10层之后的都放到一个池子里,一起调度,让每一层计算的batchsize固定,这样充分利用显卡资源的话推理起来应该会快很多。
The text was updated successfully, but these errors were encountered:
是否还可以考虑模型分层,bert12层直接分成12个模型,前面层数的模型的判断不了,就放进pool里面,集体调用后面的。而不是一个batch里面出现一个就一直往下走
Sorry, something went wrong.
No branches or pull requests
FastBERT/run_fastbert.py
Line 132 in 5f9e98b
我觉得推理性能慢不是因为nozero。
看代码实现,实际上相当于每过一层transformer encoder,就在当前这个batch剔除掉过于简单的样本 ,也就是batchsize变得更小,然而只要有一个样本到达最后一层,耗时都会比原来bert要多。
有没有办法能够更灵活的调度需要计算的样本,比如建立一个pool,进入到第10层之后的都放到一个池子里,一起调度,让每一层计算的batchsize固定,这样充分利用显卡资源的话推理起来应该会快很多。
The text was updated successfully, but these errors were encountered: