Pull requests: vllm-project/vllm
Pull requests list
[Bugfix] handle alignment of arguments in convert_sparse_cross_attention_mask_to_dense
#12347
opened Jan 23, 2025 by
tjohnson31415
[Build] Only build 9.0a for scaled_mm and sparse kernels
ci/build
ready
ONLY add when PR is ready to merge/full CI is needed
#12339
opened Jan 23, 2025 by
LucasWilkinson
[Frontend] Generate valid tool call IDs when using tokenizer-mode=mistral
frontend
#12332
opened Jan 22, 2025 by
rafvasq
[Core] Optimizing cross-attention QKVParallelLinear computation
#12325
opened Jan 22, 2025 by
NickLucche
2 tasks
[Hardware][Gaudi][Bugfix] Fix error for guided decoding
ci/build
#12317
opened Jan 22, 2025 by
zhouyu5
[do-not-merge][perf-benchmark] cleanup unused docker images/containers
ci/build
perf-benchmarks
#12306
opened Jan 22, 2025 by
khluu
[Feature][Spec Decode] Simplify the use of Eagle Spec Decode
#12304
opened Jan 22, 2025 by
ShangmingCai
[Hardware][Gaudi][Feature] Enable Dynamic MoE for Mixtral
#12303
opened Jan 22, 2025 by
zhenwei-intel
[Core] Make disaggregated prefill compatible with pipeline parallelism
#12301
opened Jan 22, 2025 by
YuhanLiu11
[Kernel] Pipe attn_logits_soft_cap through paged attention TPU kernels
ci/build
#12294
opened Jan 22, 2025 by
fenghuizhang
[Core] Optimize topp/topk calculation in sampler
#12156
opened Jan 17, 2025 by
afierka-intel
[Core] Prefill Only Tokens Without KV Cache in Batch Requests (Disagg Prefill)
#12285
opened Jan 21, 2025 by
Shaoting-Feng
[CI/Build] Add label automation for structured-output / speculative-decoding
ci/build
#12280
opened Jan 21, 2025 by
russellb
NVIDIA Blackwell codegen
ci/build
documentation
Improvements or additions to documentation
#12271
opened Jan 21, 2025 by
johnnynunez
[Model] Enable Inference Support for the New Baichuan-M1 Model
documentation
new model
Requests to new models
#12251
opened Jan 21, 2025 by
rainkert
[Misc] Move find_loaded_library to platform_aware_utils.py
#12231
opened Jan 20, 2025 by
houseroad
[V1][Spec Decode] Ngram Spec Decode
#12193
opened Jan 19, 2025 by
LiuXiaoxuanPKU
4 of 5 tasks
[Bugfix] fix race condition that leads to wrong order of token returned
#12192
opened Jan 19, 2025 by
joennlae
[Kernel] add triton fused moe kernel for gptq/awq
moe
quantization
ready
#12185
opened Jan 18, 2025 by
jinzhen-lin
[Quantization/Parameter] WIP: Another Implementation of the Quantization Parameter Subclass Substitution
#12158
opened Jan 17, 2025 by
cennn