vllm-project / vllm Public

Pull requests: vllm-project/vllm

425 Open 4,739 Closed

Pull requests list

[Bugfix][Core] Output sampling: heuristic to choose between candidates (labels: needs-rebase, unstale)

#6589 opened Jul 19, 2024 by NihalPotdar

[ Do Not Merge ] pyzmq based openai server prototypes (w/ protobuf)

#6880 opened Jul 29, 2024 by robertgshaw2-neuralmagic • Draft

[ DO NOT MERGE] pyzmq based openai server prototypes (w/ python pickle)

#6874 opened Jul 28, 2024 by robertgshaw2-neuralmagic • Draft

Add required libcuda.so (labels: ci/build, needs-rebase, ready)

#6864 opened Jul 27, 2024 by sdake

[Model] Teleflm Support

#6822 opened Jul 26, 2024 by horizon94

[CI/Build] upgrade Dockerfile to ubuntu 22.04

#6820 opened Jul 26, 2024 by samos123 • Draft

Prefetch all (labels: needs-rebase)

#6817 opened Jul 26, 2024 by gnpinkert

[Core] Get KV from Block, add KV to Block

#6808 opened Jul 26, 2024 by KrishnaM251 • Draft

Update logits processor with tensor caching

#6715 opened Jul 24, 2024 by lynkz-matt-psaltis • Draft

[Draft] [Speculative decoding] Use SPMD worker to reduce control plane communication

#6664 opened Jul 23, 2024 by cadedaniel • Draft

[DOC] Correct warning about performance

#6654 opened Jul 22, 2024 by casper-hansen

[ CI ] Awq Marlin Integration Tests (labels: ci/build, needs-rebase, ready)

#6627 opened Jul 22, 2024 by robertgshaw2-neuralmagic

[WIP] Fp8 marlin grouped

#6608 opened Jul 20, 2024 by mgoin • Draft

[Kernel] Unify the kernel used in flash attention backend (labels: needs-rebase, ready)

#6052 opened Jul 2, 2024 by LiuXiaoxuanPKU

[Not for review] Pp adag proto

#6526 opened Jul 17, 2024 by ruisearch42 • Draft

[Model] Add Support for GPTQ Fused MOE

#6502 opened Jul 17, 2024 by izhuhaoran

[Not for review] Spmd tp rebase (labels: ready)

#6483 opened Jul 16, 2024 by ruisearch42 • Draft

[Not for review] PP ADAG

#6448 opened Jul 15, 2024 by ruisearch42 • Draft

[Draft] proposal for ipex quant support

#6440 opened Jul 15, 2024 by jikunshang • Draft

torch.compile based model optimizer

#6377 opened Jul 12, 2024 by bnellnm • Draft

[ Misc ] Support Act Order in Compressed Tensors (labels: needs-rebase, ready)

#6358 opened Jul 12, 2024 by robertgshaw2-neuralmagic

[BigFix] Fix the lm_head in gpt_bigcode in lora mode

#6357 opened Jul 12, 2024 by maxdebayser

[Model] Implement DualChunkAttention for Qwen2 Models (labels: needs-rebase)

#6139 opened Jul 4, 2024 by hzhwcmhf

[WIP] Emulated fp8 inference

#6111 opened Jul 3, 2024 by mgoin • Draft

[Not for review] Accelerated dag p2p 2

#6075 opened Jul 2, 2024 by ruisearch42 • Draft
