Pull requests: NVIDIA/TensorRT-LLM
docs: clarify the slurm case
documentation
Improvements or additions to documentation
triaged
Issue has been triaged by maintainers
#2316
opened Oct 10, 2024 by
stas00
Add missing headers for mpiUtils.h to compile with gcc13
#2315
opened Oct 10, 2024 by
mfuntowicz
Fixed minor typo in advanced docs
documentation
Improvements or additions to documentation
#2290
opened Oct 5, 2024 by
SachinVarghese
doc: add the missing BF16
documentation
Improvements or additions to documentation
#2285
opened Oct 3, 2024 by
stas00
README.md: Add 3rd Party Inference Speed Dashboard
documentation
Improvements or additions to documentation
#2244
opened Sep 22, 2024 by
matichon-vultureprime
Modify small-batched weight only quantization
quantization
Issue about lower bit quantization, including int8, int4, fp8
triaged
Issue has been triaged by maintainers
#2213
opened Sep 10, 2024 by
dasistwo
Fix extra-index-url for torch
installation
Merged
Windows
#2188
opened Sep 3, 2024 by
pamelap-nvidia
[examples/bert/build.py]: Load weights for BertModel and RobertaModel if --model_dir is provided
triaged
Issue has been triaged by maintainers
#2187
opened Sep 3, 2024 by
tkhanipov
Add workaround instruction for a known issue of v0.11 on Windows
Merged
#2146
opened Aug 23, 2024 by
pamelap-nvidia
fix wrong buffer for oneShotAllReduceKernel under PUSH_MODE
#2099
opened Aug 8, 2024 by
YconquestY
Fix the workspace size calculation for quantization plugins
Merged
#2097
opened Aug 7, 2024 by
ZhangGe6
decoder MMHA kernel support INT8 SCALE_Q_INSTEAD_OF_K and SCALE_P_INS…
#2085
opened Aug 5, 2024 by
lishicheng1996
Include use_fused_mlp when constructing BuildConfig from dict
Merged
#2081
opened Aug 2, 2024 by
ethnzhng
fix wrong arg in Engine Building Command in docs/source/performance/perf-overview.md
documentation
Improvements or additions to documentation
#2057
opened Jul 30, 2024 by
RuibaiXu
Fix default min length
triaged
Issue has been triaged by maintainers
#1935
opened Jul 11, 2024 by
akhoroshev
Bump transformers from 4.36.2 to 4.38.0 in /examples/multimodal
bug
Something isn't working
dependencies
Pull requests that update a dependency file
triaged
Issue has been triaged by maintainers
waiting for feedback
#1689
opened May 28, 2024 by
dependabot (bot)
add cached generation buffer
triaged
Issue has been triaged by maintainers
waiting for feedback
#1685
opened May 28, 2024 by
michael200892458
Fix CUDA OOM when creating Mixtral checkpoint
triaged
Issue has been triaged by maintainers
waiting for feedback
#1629
opened May 19, 2024 by
VivekBits2210