Issues: vllm-project/vllm
Memory leak while using tensor_parallel_size>1
bug
Something isn't working
unstale
#694
opened Aug 8, 2023 by
haiasd
How to use vllm to compute ppl score for input text?
usage
How to use vllm
#1019
opened Sep 12, 2023 by
yinochaos
Generate nothing from VLLM output
bug
Something isn't working
#1185
opened Sep 26, 2023 by
FocusLiwen
Could not build wheels for vllm, which is required to install pyproject.toml-based projects
installation
Installation problems
stale
#1391
opened Oct 17, 2023 by
ABooth01
API causes slowdown in batch request handling
bug
Something isn't working
unstale
#1707
opened Nov 17, 2023 by
jpeig
Is there a way to terminate vllm.LLM and release the GPU memory
#1908
opened Dec 4, 2023 by
sfc-gh-zhwang
Recent vLLMs ask for too much memory: ValueError: No available memory for the cache blocks. Try increasing gpu_memory_utilization when initializing the engine.
bug
Something isn't working
unstale
#2248
opened Dec 24, 2023 by
pseudotensor
[Feature Request] Support input embedding in LLM.generate()
feature request
#416
opened Jul 10, 2023 by
KimmiShi
anyone can Qwen-14B-Chat-AWQ work with VLLM/TP ?
ray
anything related with ray
unstale
#2419
opened Jan 11, 2024 by
s-natsubori
Assertion `!(srcMmaLayout && dstMmaLayout) && "Unexpected mma -> mma layout conversion"' failed.
#2729
opened Feb 2, 2024 by
gty111
Please add lora support for higher ranks and alpha values
unstale
#2847
opened Feb 13, 2024 by
parikshitsaikia1619
Failed to find C compiler. Please specify via CC environment variable
#2997
opened Feb 22, 2024 by
gangooteli
[Bug]: triton.runtime.errors.OutOfResources: out of resource: shared memory, Required: 66560, Hardware limit: 65536. Reducing block sizes or num_stages may help.
bug
Something isn't working
#12498
opened Jan 28, 2025 by
pseudotensor
1 task done