vllm-project / vllm Public

Issues: vllm-project/vllm

[Roadmap] vLLM Roadmap Q1 2025

#11862 opened Jan 8, 2025 by simon-mo

Open 3

vLLM's V1 Engine Architecture

#8779 opened Sep 24, 2024 by simon-mo

Open 11

1,164 Open 4,945 Closed

Issues list

ValueError: The model's max seq len (4096) is larger than the maximum number of tokens that can be stored in KV cache (3664). Try increasing gpu_memory_utilization or decreasing max_model_len when initializing the engine.

#2418 opened Jan 11, 2024 by handsomelys

Memory leak while using tensor_parallel_size>1
Labels: bug, unstale

#694 opened Aug 8, 2023 by haiasd

Does vllm support do_sample?
Labels: feature request, unstale

#699 opened Aug 8, 2023 by leiwen83

How to use vllm to compute ppl score for input text?
Labels: usage

#1019 opened Sep 12, 2023 by yinochaos

Generate nothing from VLLM output
Labels: bug

#1185 opened Sep 26, 2023 by FocusLiwen

Could not build wheels for vllm, which is required to install pyproject.toml-based projects
Labels: installation, stale

#1391 opened Oct 17, 2023 by ABooth01

[new feature] flash decoding ++
Labels: feature request, unstale

#1568 opened Nov 5, 2023 by John-Ge

API causes slowdown in batch request handling
Labels: bug, unstale

#1707 opened Nov 17, 2023 by jpeig

Lookahead decoding
Labels: unstale

#1742 opened Nov 21, 2023 by TheodoreGalanos

Feature request: prompt lookup decoding
Labels: feature request

#1802 opened Nov 27, 2023 by kevinhu

Is there a way to terminate vllm.LLM and release the GPU memory

#1908 opened Dec 4, 2023 by sfc-gh-zhwang

Recent vLLMs ask for too much memory: ValueError: No available memory for the cache blocks. Try increasing gpu_memory_utilization when initializing the engine.
Labels: bug, unstale

#2248 opened Dec 24, 2023 by pseudotensor

[Feature Request] Support input embedding in LLM.generate()
Labels: feature request

#416 opened Jul 10, 2023 by KimmiShi

Compute perplexity/logits for the prompt

#2364 opened Jan 7, 2024 by dsmilkov

anyone can Qwen-14B-Chat-AWQ work with VLLM/TP ?
Labels: ray, unstale

#2419 opened Jan 11, 2024 by s-natsubori

Assertion `!(srcMmaLayout && dstMmaLayout) && "Unexpected mma -> mma layout conversion"' failed.

#2729 opened Feb 2, 2024 by gty111

Better defaults to match Hugging Face

#2733 opened Feb 3, 2024 by titu1994

Please add lora support for higher ranks and alpha values
Labels: unstale

#2847 opened Feb 13, 2024 by parikshitsaikia1619

[feat] vLLM generation deterministic option/flag
Labels: unstale

#2910 opened Feb 18, 2024 by PeterSH6

vllm keeps hanging when using djl-deepspeed
Labels: unstale

#2912 opened Feb 18, 2024 by ali-firstparty

Inference based on vllm qwen14B is inconsistent with the original qwen results, and the accuracy will drop significantly

#2950 opened Feb 21, 2024 by chenshukai1015

Failed to find C compiler. Please specify via CC environment variable

#2997 opened Feb 22, 2024 by gangooteli

Loading models from an S3 location instead of local path

#3090 opened Feb 28, 2024 by simon-mo

Conda Forge Package
Labels: keep-open

#3126 opened Feb 29, 2024 by iamthebot

[Bug]: triton.runtime.errors.OutOfResources: out of resource: shared memory, Required: 66560, Hardware limit: 65536. Reducing block sizes or num_stages may help.
Labels: bug

#12498 opened Jan 28, 2025 by pseudotensor
