Issues: vllm-project/vllm
[Feature]: reasoning_content in API for reasoning models like DeepSeek R1 (feature request) · #12468 opened Jan 27, 2025 by gaocegege
Release v0.7.1 (release) · #12465 opened Jan 27, 2025 by simon-mo
[Feature]: Support Qwen/Qwen2.5-14B-Instruct-1M (feature request) · #12452 opened Jan 26, 2025 by casper-hansen
[Bug]: the most recent xla nightly is breaking vllm on TPU (bug) · #12451 opened Jan 26, 2025 by hosseinsarshar
[New Model]: IDEA-Research/ChatRex-7B (new model) · #12444 opened Jan 26, 2025 by Fr0do
[Bug]: nrt_tensor_allocate status=4 message="Allocation Failure" on AWS Neuron (bug) · #12443 opened Jan 26, 2025 by StefanDimitrov95
[Usage]: Shape mismatch when batching requests with the OpenAI chat completion APIs and Qwen2-VL (usage) · #12442 opened Jan 26, 2025 by javasy
[Bug]: Could not run '_C::rms_norm' with arguments from the 'CUDA' backend. (bug) · #12441 opened Jan 26, 2025 by 851780266
[Bug]: python -m vllm.entrypoints.openai.api_server --served-model-name TableGPT2-7B --port 12233 --trust-remote-code --gpu-memory-utilization 0.9 --model ./TableGPT2-7B/ --dtype=half (bug) · #12440 opened Jan 26, 2025 by 851780266
[Feature]: Deepseek R1 GGUF 4bit(Q4KM) support (feature request) · #12436 opened Jan 26, 2025 by wuyaoxuehun
[Bug]: macOS with vllm-cpu v0.6.6-post2 serving Qwen2.5-1.5b-Instruct results in endless exclamation marks (bug) · #12427 opened Jan 25, 2025 by liric24
[Installation]: no module named "resources" (installation) · #12425 opened Jan 25, 2025 by Omni-NexusAI
[Misc]: How to apply a chat template? (misc) · #12423 opened Jan 24, 2025 by MohamedAliRashad
[Usage]: Is it possible to use meta-llama/Llama-3.2-1B-Instruct-SpinQuant_INT4_EO8 with vLLM? (usage) · #12411 opened Jan 24, 2025 by mrakgr
[Bug]: Performance regression when using PyTorch regional compilation (bug) · #12410 opened Jan 24, 2025 by anko-intel
[Bug]: Slower inference time on fewer input tokens (bug) · #12406 opened Jan 24, 2025 by vishalkumardas
[Bug]: InternVL2-26B-AWQ service startup failure (bug) · #12404 opened Jan 24, 2025 by CallmeZhangChenchen
[Bug]: AsyncEngineDeadError during inference with two vLLM engines on a single GPU (bug) · #12401 opened Jan 24, 2025 by semensorokin
[Usage]: Overwhelmed trying to find information on how to serve Llama-3 70b to multiple users with 128k context (usage) · #12400 opened Jan 24, 2025 by Arche151
[Feature]: Consider integrating SVDquant (W4A4 quantization) from the Nunchaku project (feature request) · #12399 opened Jan 24, 2025 by dengyingxu
[Performance]: Unexpected performance of vLLM Cascade Attention (performance) · #12395 opened Jan 24, 2025 by lauthu
[Usage]: use vllm to serve gguf model with cpu only (usage) · #12391 opened Jan 24, 2025 by pamdla
[Bug]: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf8 in position 0: invalid start byte (bug) · #12390 opened Jan 24, 2025 by jaydyi
[Performance]: Details about the performance of vLLM on reasoning models (performance) · #12387 opened Jan 24, 2025 by shaoyuyoung
Tip: issues updated in the last three days can be found with the search filter updated:>2025-01-24.
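The same listing can be reproduced outside the web UI. Below is a minimal sketch (not part of the original page) that queries GitHub's REST API for open vllm-project/vllm issues updated since 2025-01-24, roughly mirroring the updated:>2025-01-24 filter; it assumes the third-party requests package, and the GITHUB_TOKEN environment variable is optional and only used to raise the rate limit.

```python
# Sketch: list open vllm-project/vllm issues updated since 2025-01-24
# via the GitHub REST API. Assumes `pip install requests`.
import os
import requests

url = "https://api.github.com/repos/vllm-project/vllm/issues"
headers = {"Accept": "application/vnd.github+json"}
token = os.environ.get("GITHUB_TOKEN")  # optional; unauthenticated requests are rate-limited
if token:
    headers["Authorization"] = f"Bearer {token}"

params = {
    "since": "2025-01-24T00:00:00Z",  # only items updated after this time
    "state": "open",
    "per_page": 100,
}
resp = requests.get(url, headers=headers, params=params, timeout=30)
resp.raise_for_status()

for item in resp.json():
    if "pull_request" in item:  # the issues endpoint also returns PRs; skip them
        continue
    labels = ", ".join(label["name"] for label in item["labels"])
    print(f'#{item["number"]} {item["title"]} ({labels}) by {item["user"]["login"]}')
```

Note that this fetches only the first page of results; for a complete listing you would follow the pagination links in the response headers.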