Pull requests: vllm-project/vllm

Pull requests list

[Misc] Support HF Hub remote loading for LoRA adapters (labels: documentation, needs-rebase, unstale)
#4939 opened May 21, 2024 by Isotr0py
3 tasks done
[ CI ] Awq Marlin Integration Tests (labels: ci/build, needs-rebase, ready)
#6627 opened Jul 22, 2024 by robertgshaw2-neuralmagic
[Kernel] Unify the kernel used in flash attention backend (labels: needs-rebase, ready)
#6052 opened Jul 2, 2024 by LiuXiaoxuanPKU
[Multi-step] Remove redundant CPU to GPU transfer for non-last rank PP/TP (labels: ready)
#7715 opened Aug 21, 2024 by SolitaryThinker
Prefix caching and deallocation mechanism (labels: documentation, frontend, needs-rebase, unstale)
#2511 opened Jan 19, 2024 by jadielam
[Misc] Allow for unsigned zero NAN representation in ScalarType (labels: ready)
#7661 opened Aug 19, 2024 by LucasWilkinson
[Misc]Minor Changes about Worker (labels: ready)
#11555 opened Dec 27, 2024 by noemotiovon
[V1] 7/N API Server: Update LM-Eval To Use Streaming (labels: ci/build, ready)
#11590 opened Dec 28, 2024 by robertgshaw2-neuralmagic
[Misc][Quark] Upstream Quark format to VLLM (labels: ready)
#10765 opened Nov 29, 2024 by kewang-xlnx
[Doc] Proofreading documentation (labels: documentation)
#6998 opened Jul 31, 2024 by sgolebiewski-intel
[Kernel] Support Microsoft Runtime Kernel Lib for our Low Precision Computation - BitBLAS (labels: documentation, needs-rebase)
#6036 opened Jul 1, 2024 by LeiWang1999
3 tasks done
[v1][stats][1/n] Add RequestStatsUpdate and RequestStats types (labels: ready)
#10907 opened Dec 4, 2024 by rickyyx
[Bugfix] Check prompt length < max_model_len for all models in AsyncLLMEngine (labels: ready)
#10881 opened Dec 4, 2024 by aurickq
[Misc] Add multipstep chunked-prefill support for FlashInfer (labels: ready)
#10467 opened Nov 20, 2024 by elfiegg
[Kernel][ROCm][AMD] fp8 moe configs for MI300X. Mixtral-8x(7B,22B) TP=1,2,4,8 (labels: ready, rocm)
#9820 opened Oct 29, 2024 by divakar-amd
[Bugfix] limit lora init id greater than 0 (labels: ready)
#9093 opened Oct 5, 2024 by Ssunbell
[Bugfix] fix error due to an uninitialized tokenizer when using skip_tokenizer_init with num_scheduler_steps (labels: ready)
#9276 opened Oct 11, 2024 by junstar92
[Hardware][TPU] workaround fix for MoE on TPU (labels: ready)
#11764 opened Jan 6, 2025 by avshalomman
[V1][Core] Autotune encoder cache budget
#11895 opened Jan 9, 2025 by ywang96