Issues: vllm-project/vllm

[RFC]: Multi-modality Support on vLLM
feature request, RFC · #4194 opened Apr 19, 2024 by ywang96 · 45 of 78 tasks

[Misc]: Throughput/Latency for guided_json with ~100% GPU cache utilization
misc, structured-output · #3567 opened Mar 22, 2024 by jens-create

[Installation]: pip install vllm (0.6.3) forces a reinstallation of the CPU version of torch, replacing the CUDA torch on Windows
installation (Installation problems) · #9701 opened Oct 25, 2024 by xiezhipeng-git

Recent vLLMs ask for too much memory: ValueError: No available memory for the cache blocks. Try increasing gpu_memory_utilization when initializing the engine.
bug (Something isn't working), unstale · #2248 opened Dec 24, 2023 by pseudotensor

[Performance]: decoding speed on long context
performance (Performance-related issues) · #11286 opened Dec 18, 2024 by 155394551lzk · 1 task done

[Bug]: vllm.engine.async_llm_engine.AsyncEngineDeadError: Background loop has errored already.
bug (Something isn't working) · #5060 opened May 26, 2024 by heungson

Is there a way to terminate vllm.LLM and release the GPU memory?
#1908 opened Dec 4, 2023 by sfc-gh-zhwang

API causes slowdown in batch request handling
bug (Something isn't working), unstale · #1707 opened Nov 17, 2023 by jpeig

vLLM generates empty output
bug (Something isn't working) · #1185 opened Sep 26, 2023 by FocusLiwen

[Bug]: v0.6.4.post1 crashed: Error in model execution: CUDA error: an illegal memory access was encountered
bug (Something isn't working) · #10389 opened Nov 16, 2024 by wciq1208 · 1 task done

[Usage]: Does serving the model the manual way differ from the predefined (OpenAI) way? A quick question, please guide
usage (How to use vllm) · #11569 opened Dec 27, 2024 by AayushSameerShah

[Bug]: vLLM 0.5.3.post1 [rank0]: RuntimeError: NCCL error: unhandled cuda error (run with NCCL_DEBUG=INFO for details)
bug (Something isn't working) · #6732 opened Jul 24, 2024 by jueming0312

[Model] DeepSeek-V3 Enhancements
new model (Requests for new models), performance (Performance-related issues) · #11539 opened Dec 27, 2024 by simon-mo · 2 of 10 tasks

[Usage]: how to use EAGLE on vLLM?
usage (How to use vllm) · #11126 opened Dec 12, 2024 by xiongqisong · 1 task done

[Bug]: Does vLLM support function call mode?
bug (Something isn't working) · #6631 opened Jul 22, 2024 by FanZhang91

[Bug]: Qwen1.5-14B-Chat deployed with vllm==0.3.3 on a Tesla V100-PCIE-32GB outputs only exclamation marks, no results
bug (Something isn't working) · #3998 opened Apr 11, 2024 by li995495592

Could not build wheels for vllm, which is required to install pyproject.toml-based projects
installation (Installation problems), stale · #1391 opened Oct 17, 2023 by ABooth01

[Bug]: No available block found in 60 seconds in shm
bug (Something isn't working) · #6614 opened Jul 21, 2024 by wjj19950828

[New Model]: Qwen/QwQ-32B-Preview
new model (Requests for new models) · #10737 opened Nov 28, 2024 by SionicAI-Engineering · 1 task done

[Performance]: Phi-3.5 vision model consumes high CPU RAM and the process gets killed
performance (Performance-related issues), stale · #9190 opened Oct 9, 2024 by kuladeephx · 1 task done