Issues: vllm-project/vllm
[Feature]: reasoning_content in API for reasoning models like DeepSeek R1 (feature request) · #12468 opened Jan 27, 2025 by gaocegege
Release v0.7.1 (release) · #12465 opened Jan 27, 2025 by simon-mo
[Feature]: Support Qwen/Qwen2.5-14B-Instruct-1M (feature request) · #12452 opened Jan 26, 2025 by casper-hansen
[Bug]: the most recent xla nightly is breaking vllm on TPU (bug) · #12451 opened Jan 26, 2025 by hosseinsarshar
[New Model]: IDEA-Research/ChatRex-7B (new model) · #12444 opened Jan 26, 2025 by Fr0do
[Bug]: nrt_tensor_allocate status=4 message="Allocation Failure" on AWS Neuron (bug) · #12443 opened Jan 26, 2025 by StefanDimitrov95
[Usage]: Shape mismatch when batching requests with the OpenAI chat completion APIs and Qwen2-VL (usage) · #12442 opened Jan 26, 2025 by javasy
[Bug]: Could not run '_C::rms_norm' with arguments from the 'CUDA' backend. (bug) · #12441 opened Jan 26, 2025 by 851780266
[Bug]: python -m vllm.entrypoints.openai.api_server --served-model-name TableGPT2-7B --port 12233 --trust-remote-code --gpu-memory-utilization 0.9 --model ./TableGPT2-7B/ --dtype=half (bug) · #12440 opened Jan 26, 2025 by 851780266
[Feature]: Deepseek R1 GGUF 4bit(Q4KM) support (feature request) · #12436 opened Jan 26, 2025 by wuyaoxuehun
[Bug]: macOS with vllm-cpu v0.6.6-post2 serving Qwen2.5-1.5b-Instruct results in endless exclamation marks (bug) · #12427 opened Jan 25, 2025 by liric24
[Installation]: no module named "resources" (installation) · #12425 opened Jan 25, 2025 by Omni-NexusAI
[Misc]: How to apply a chat template? (misc) · #12423 opened Jan 24, 2025 by MohamedAliRashad
[Usage]: Is it possible to use meta-llama/Llama-3.2-1B-Instruct-SpinQuant_INT4_EO8 with vLLM? (usage) · #12411 opened Jan 24, 2025 by mrakgr
[Bug]: Performance regression when using PyTorch regional compilation (bug) · #12410 opened Jan 24, 2025 by anko-intel
[Bug]: Slower inference time on fewer input tokens (bug) · #12406 opened Jan 24, 2025 by vishalkumardas
[Bug]: InternVL2-26B-AWQ service startup failure (bug) · #12404 opened Jan 24, 2025 by CallmeZhangChenchen
[Bug]: AsyncEngineDeadError during inference with two vLLM engines on a single GPU (bug) · #12401 opened Jan 24, 2025 by semensorokin
[Usage]: Overwhelmed trying to find information on how to serve Llama-3 70b to multiple users with 128k context (usage) · #12400 opened Jan 24, 2025 by Arche151
[Feature]: Consider integrating SVDquant (W4A4 quantization) from the Nunchaku project (feature request) · #12399 opened Jan 24, 2025 by dengyingxu
[Performance]: Unexpected performance of vLLM Cascade Attention (performance) · #12395 opened Jan 24, 2025 by lauthu
[Usage]: use vllm to serve gguf model with cpu only (usage) · #12391 opened Jan 24, 2025 by pamdla
[Bug]: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf8 in position 0: invalid start byte (bug) · #12390 opened Jan 24, 2025 by jaydyi
[Performance]: Details about the performance of vLLM on reasoning models (performance) · #12387 opened Jan 24, 2025 by shaoyuyoung
Tip: issues updated in the last three days can be found with the search filter updated:>2025-01-24.
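The same listing can be reproduced outside the web UI. Below is a minimal sketch (not part of the original page) that queries GitHub's REST API for open vllm-project/vllm issues updated since 2025-01-24, roughly mirroring the updated:>2025-01-24 filter; it assumes the third-party requests package, and the GITHUB_TOKEN environment variable is optional and only used to raise the rate limit.

```python
# Sketch: list open vllm-project/vllm issues updated since 2025-01-24
# via the GitHub REST API. Assumes `pip install requests`.
import os
import requests

url = "https://api.github.com/repos/vllm-project/vllm/issues"
headers = {"Accept": "application/vnd.github+json"}
token = os.environ.get("GITHUB_TOKEN")  # optional; unauthenticated requests are rate-limited
if token:
    headers["Authorization"] = f"Bearer {token}"

params = {
    "since": "2025-01-24T00:00:00Z",  # only items updated after this time
    "state": "open",
    "per_page": 100,
}
resp = requests.get(url, headers=headers, params=params, timeout=30)
resp.raise_for_status()

for item in resp.json():
    if "pull_request" in item:  # the issues endpoint also returns PRs; skip them
        continue
    labels = ", ".join(label["name"] for label in item["labels"])
    print(f'#{item["number"]} {item["title"]} ({labels}) by {item["user"]["login"]}')
```

Note that this fetches only the first page of results; for a complete listing you would follow the pagination links in the response headers.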