
Support generation from input embedding #1265

Open
pfldy2850 wants to merge 33 commits into base: main
Conversation

Contributor

@pfldy2850 pfldy2850 commented Oct 5, 2023

This PR implements generating text from embedding input (commonly known as inputs_embeds) instead of a token prompt.
It is related to #369 and #416.
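For context, here is a minimal client-side sketch of how embedding input could be sent to the API server touched by this PR. It is an illustration under assumptions, not code from the PR: the model name (facebook/opt-125m), the endpoint (http://localhost:8000/generate), and the extra "prompt" field are placeholders based on vLLM's default api_server and the server changes reviewed further below.

```python
import requests
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Build input embeddings with the same base model the server is running
# (facebook/opt-125m is only an example).
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

token_ids = tokenizer("Hello, my name is", return_tensors="pt").input_ids
with torch.no_grad():
    # Shape: [seq_len, hidden_size]
    prompt_embeds = model.get_input_embeddings()(token_ids)[0]

# The patched api_server reads an optional "prompt_embeds" field from the JSON
# body; an empty "prompt" is included because the current code still pops it.
payload = {
    "prompt": "",
    "prompt_embeds": prompt_embeds.tolist(),
    "max_tokens": 32,
}
response = requests.post("http://localhost:8000/generate", json=payload)
print(response.json())
```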

More to do

  • Enhance the test code for generate.
  • Determine whether the feature degrades core performance.
  • Add more details to the comments.
  • Apply it to async_llm_engine and api_server.

@pfldy2850 pfldy2850 changed the title [WIP] Support generate from input embedding [WIP] Support generation from input embedding Oct 12, 2023
@pfldy2850
Contributor Author

We conducted several tests and confirmed that the performance degradation is not significant.

Specifically, we ran the latency benchmark five times each on the main branch and on the feature branch using the command below.

python benchmarks/benchmark_latency.py --input-len=2048 --num-iters=5

## main
Avg latency: 0.36247589644044637 seconds
Avg latency: 0.35677395705133674 seconds
Avg latency: 0.3622682703658938 seconds
Avg latency: 0.36043337155133487 seconds
Avg latency: 0.3593990854918957 seconds

## feature
Avg latency: 0.3586543008685112 seconds
Avg latency: 0.3557318979874253 seconds
Avg latency: 0.36645207908004523 seconds
Avg latency: 0.3598199490457773 seconds
Avg latency: 0.36111502479761837 seconds

@pfldy2850 pfldy2850 changed the title [WIP] Support generation from input embedding Support generation from input embedding Oct 12, 2023

@bobchen1980 bobchen1980 left a comment


fixed input embedding function related to #369 and #416

@pfldy2850
Contributor Author

@WoosukKwon @zhuohan123

Hello authors, I have tested this PR and aligned it with the latest prepare_inputs function.
Could you please review this PR?

@WoosukKwon WoosukKwon mentioned this pull request Nov 2, 2023
@js8544
Contributor

js8544 commented Jan 3, 2024

We've been using this branch in production and it works like a charm. Thanks so much for your contribution. Can't wait for it to be merged!

@fedshyvana

Thanks for this! Any plans to merge this into main anytime soon?

@pfldy2850
Contributor Author

Hello @zhuohan123 ,

I just saw that you created an issue for the vLLM Q1 2024 roadmap.

If you have any plans to consider this feature or to merge this PR,
I would like to resume work on updating it.

@matankley

This PR would be super valuable for us. @pfldy2850, do you plan to rebase it onto the current master branch? It looks a bit outdated.

    - stream: whether to stream the results or not.
    - other fields: the sampling parameters (See `SamplingParams` for details).
    """
    request_dict = await request.json()
    prompt = request_dict.pop("prompt")
    prompt_embeds = request_dict.pop("prompt_embeds", None)
    if prompt_embeds is not None:
        prompt_embeds = torch.tensor(prompt_embeds).to("cuda")

@bks5881 bks5881 Mar 1, 2024


This loads the embeddings in float32, which eats all the GPU memory.
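A possible fix, sketched under the assumption that the handler can look up the dtype the engine was started with (the `dtype` variable below is a placeholder, not code from this PR):

```python
# Cast the incoming embeddings to the model's dtype (e.g. float16) instead of
# letting torch.tensor default to float32, which doubles the memory footprint.
dtype = torch.float16  # placeholder: use the dtype the engine was configured with
prompt_embeds = torch.tensor(prompt_embeds, dtype=dtype).to("cuda")
```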

@@ -29,16 +30,27 @@ async def generate(request: Request) -> Response:

    The request should be a JSON object with the following fields:
    - prompt: the prompt to use for the generation.
    - prompt_embeds: the prompt embedding to use for the generation
      instead of the prompt.
    - stream: whether to stream the results or not.
    - other fields: the sampling parameters (See `SamplingParams` for details).
    """
    request_dict = await request.json()
    prompt = request_dict.pop("prompt")


This throws an error when only prompt_embeds are passed.
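One way to avoid the error, a sketch rather than code from the PR, is to make `prompt` optional and validate that at least one of the two inputs is present (FastAPI's `Response` is already in scope in this handler):

```python
# Accept requests that carry only "prompt_embeds" by making "prompt" optional.
prompt = request_dict.pop("prompt", None)
prompt_embeds = request_dict.pop("prompt_embeds", None)
if prompt is None and prompt_embeds is None:
    return Response(content="Either prompt or prompt_embeds must be provided.",
                    status_code=400)
```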


@bks5881 bks5881 left a comment


Thanks a lot for doing this. I tested it, ran into some issues, and fixed them locally.
Also, for some reason torch.cuda.is_available() returned False during serialization, so I had to set CUDA_VISIBLE_DEVICES in Ray's __init__.py.
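For anyone who hits the same problem, a minimal workaround sketch is to pin the variable in the environment before Ray initializes, instead of patching Ray's `__init__.py`; the device index `0` below is an assumption:

```python
import os

# Make sure processes spawned by Ray still see the GPU when tensors are
# serialized; this must run before ray.init().
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0")

import ray

ray.init()
```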

@tweeter0830

@zhuohan123 Do you have plans for this? It would be really helpful to me for this MR to get merged. I can help push it through if you need.

@zhuohan123
Member

> @zhuohan123 Do you have plans for this? It would be really helpful to me for this MR to get merged. I can help push it through if you need.

We are doing this in this PR for llava support: #3042. Please take a look and let us know any suggestions!


This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!

@github-actions github-actions bot added the stale label Oct 30, 2024

mergify bot commented Oct 30, 2024

This pull request has merge conflicts that must be resolved before it can be merged. @pfldy2850 please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

9 participants