Pull requests: vllm-project/vllm

- #12286 [Core] tokens in queue metric (opened Jan 21, 2025 by annapendleton)
- #12341 FLOP counting for vLLM inference (opened Jan 23, 2025 by dianastea; draft)
- #12339 [Build] Only build 9.0a for scaled_mm and sparse kernels (opened Jan 23, 2025 by LucasWilkinson; labels: ci/build, ready)
- #12325 [Core] Optimizing cross-attention QKVParallelLinear computation (opened Jan 22, 2025 by NickLucche; 2 tasks)
- #12156 [Core] Optimize topp/topk calculation in sampler (opened Jan 17, 2025 by afierka-intel)
- #12271 NVIDIA Blackwell codegen (opened Jan 21, 2025 by johnnynunez; labels: ci/build, documentation)
- #12251 [Model] Enable Inference Support for the New Baichuan-M1 Model (opened Jan 21, 2025 by rainkert; labels: documentation, new model)
- #12193 [V1][Spec Decode] Ngram Spec Decode (opened Jan 19, 2025 by LiuXiaoxuanPKU; 4 of 5 tasks)
- #12186 [Misc] Add Gemma2 GGUF support (opened Jan 18, 2025 by Isotr0py)
- #12185 [Kernel] add triton fused moe kernel for gptq/awq moe quantization (opened Jan 18, 2025 by jinzhen-lin; label: ready)