[V1] [2/n] Logging and Metrics - OutputProcessor Abstraction #11973
base: main
Conversation
Signed-off-by: [email protected] <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. 🚀
```
@@ -189,85 +178,3 @@ def _get_next_output_text(self, finished: bool, delta: bool) -> str:
            self._last_output_text_offset = length
            return self.output_text[last_offset:length]
        return ""


class Detokenizer:
```
This is now called OutputProcessor, since we will do more than "just" detokenization in the step() function.
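For context, a minimal sketch of that shape (the IterationStats stub, field names, and the req_state.detokenize helper are assumptions for illustration, not the PR's actual code):

```python
from typing import Any, Dict, List


class IterationStats:
    """Stub standing in for the PR's per-step stats class."""

    def update_from_output(self, output: Any, req_state: Any) -> None:
        pass  # the real class aggregates per-step token counts, etc.


class OutputProcessor:
    """Previously Detokenizer: the per-step loop now detokenizes AND
    updates logging/metrics state, hence the more general name."""

    def __init__(self, tokenizer: Any) -> None:
        self.tokenizer = tokenizer
        self.request_states: Dict[str, Any] = {}  # request_id -> state

    def process_outputs(self, engine_core_outputs: List[Any]) -> IterationStats:
        iteration_stats = IterationStats()
        for output in engine_core_outputs:
            req_state = self.request_states[output.request_id]
            req_state.detokenize(output)  # the old "just detokenization" part
            iteration_stats.update_from_output(output, req_state)  # the new part
        return iteration_stats
```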
```
@@ -233,57 +225,46 @@ async def generate(
            await self.abort(request_id)
            raise

    def _process_request_outputs(self, request_outputs: List[RequestOutput]):
```
This logic is moved into the OutputProcessor.process_outputs() loop.
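A simplified sketch of the resulting fused handler, with the old two-pass shape noted in the docstring (method names on the collaborating objects are assumptions):

```python
from typing import Any


async def output_handler(engine_core: Any, output_processor: Any) -> None:
    """Sketch of the fused loop.

    Previously each batch took two passes:
        request_outputs = detokenizer.step(outputs)
        self._process_request_outputs(request_outputs)  # second loop
    Both passes are now folded into one call.
    """
    while True:
        outputs = await engine_core.get_output_async()
        # Detokenization, per-request queue routing, and stats updates
        # all happen inside one loop over the batch:
        output_processor.process_outputs(outputs)
```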
```
            queue=queue,
        )
```
NOTE: this was previously called Detokenizer
```
@@ -59,9 +59,6 @@ def __init__(
            lora_config=vllm_config.lora_config)
        self.tokenizer.ping()

        # Request streams (map of request_id -> queue).
```
NOTE: these queues are held in OutputProcessor
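A sketch of what holding the streams in the OutputProcessor might look like (the class, field, and method names here are assumptions):

```python
import asyncio
from typing import Dict


class OutputProcessorStreams:
    """Illustrative only: per-request output streams live alongside the
    per-request state in the OutputProcessor, not in the async engine."""

    def __init__(self) -> None:
        # request_id -> queue of RequestOutput objects for that request
        self.request_streams: Dict[str, asyncio.Queue] = {}

    def add_request(self, request_id: str) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self.request_streams[request_id] = queue
        return queue

    def abort_request(self, request_id: str) -> None:
        # Dropping the queue ends the stream for that request.
        self.request_streams.pop(request_id, None)
```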
Nice structure and comments, this LGTM. It would be nice to have a test that IterationStats gets updated within the OutputProcessor.
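Something like the following could cover that (a hedged sketch; the helpers and the process_outputs return value are assumptions about the test API):

```python
def test_iteration_stats_updated_in_process_outputs():
    # make_output_processor / make_engine_core_outputs are hypothetical
    # test helpers; the real fixtures will differ.
    processor = make_output_processor()
    outputs = make_engine_core_outputs(num_requests=4)

    iteration_stats = processor.process_outputs(outputs)

    # Every request in the batch produced output this step, so the
    # aggregate counters should reflect the whole batch.
    assert iteration_stats.num_generation_tokens >= 4
```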
```
If you need to touch every element of the batch, implement a
method called XXXClass.update_from_output() to be called
within the loop below. For examples, see:
    * IterationStats.update_from_output()
```
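As a concrete illustration of the pattern, a future per-request stats class could hook in the same way (RequestStats is mentioned later in this thread, but its shape here is purely an assumption):

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class RequestStats:
    """Hypothetical per-request metrics following the same hook pattern."""

    num_output_tokens: int = 0
    finish_reason: Optional[str] = None

    def update_from_output(self, engine_core_output: Any) -> None:
        # Called once per engine core output from the process_outputs loop.
        self.num_output_tokens += len(engine_core_output.new_token_ids)
        if engine_core_output.finished:
            self.finish_reason = engine_core_output.finish_reason
```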
This is a great abstraction IMO.
nit: I wonder if we also want to make it more explicit by having something like an OutputHandler protocol that takes in the engine core output + maybe a current request state?
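Sketched with typing.Protocol, the suggestion might look like this (the name OutputHandler and the exact signature are the open question here, so this is purely illustrative):

```python
from typing import Any, Protocol


class OutputHandler(Protocol):
    """Illustrative protocol for per-output hooks invoked inside
    OutputProcessor.process_outputs(); the signature is a guess."""

    def update_from_output(
        self,
        engine_core_output: Any,  # one element of the batch
        request_state: Any,       # current state of the owning request
    ) -> None:
        ...
```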
I will add a comment suggesting that we do this once we add RequestStats. I want to keep flexibility while we are in the development stage.
SUMMARY:
* Rename Detokenizer >> OutputProcessor
* Add abstraction: XXXClass.update_from_output, to be called in the OutputProcessor.process_outputs loop.
* Fold self._process_request_outputs into this loop (previously this was a separate loop in output_handler)
* Add IterationStats.update_from_output() to this loop

NOTES:
* Running the Detokenizer in a separate process hurts performance ([V1] [7/N] API Server: Multiprocessing Detokenizer [ DO NOT MERGE ] #11636). So feel confident that adding all of this to a single loop is the right approach.

TODO: