Pull requests: vllm-project/vllm
Pull requests list
[Bugfix][Core] Output sampling: heuristic to choose between candidates
needs-rebase
unstale
#6589
opened Jul 19, 2024 by
NihalPotdar
[ Do Not Merge ] pyzmq based openai server prototypes (w/ protobuf)
#6880
opened Jul 29, 2024 by
robertgshaw2-neuralmagic
Draft
[ DO NOT MERGE] pyzmq based openai server prototypes (w/ python pickle)
#6874
opened Jul 28, 2024 by
robertgshaw2-neuralmagic
Draft
Add required libcuda.so
ci/build
needs-rebase
ready
ONLY add when PR is ready to merge/full CI is needed
#6864
opened Jul 27, 2024 by
sdake
[Draft] [Speculative decoding] Use SPMD worker to reduce control plane communication
#6664
opened Jul 23, 2024 by
cadedaniel
Draft
[ CI ] Awq Marlin Integration Tests
ci/build
needs-rebase
ready
#6627
opened Jul 22, 2024 by
robertgshaw2-neuralmagic
[Kernel] Unify the kernel used in flash attention backend
needs-rebase
ready
#6052
opened Jul 2, 2024 by
LiuXiaoxuanPKU
[Not for review] Spmd tp rebase
ready
#6483
opened Jul 16, 2024 by
ruisearch42
Draft
[ Misc ] Support Act Order in Compressed Tensors
needs-rebase
ready
#6358
opened Jul 12, 2024 by
robertgshaw2-neuralmagic
[BigFix] Fix the lm_head in gpt_bigcode in lora mode
#6357
opened Jul 12, 2024 by
maxdebayser
[Model] Implement DualChunkAttention for Qwen2 Models
needs-rebase
#6139
opened Jul 4, 2024 by
hzhwcmhf