[Feature] reasoning_content in API for reasoning models like DeepSeek R1 #12468

Closed
1 task done
gaocegege opened this issue Jan 27, 2025 · 13 comments · Fixed by #12473

Comments

@gaocegege
Contributor

🚀 The feature, motivation and pitch

To better support reasoning models like DeepSeek-R1, it would be great to add a reasoning_content field to the API response so that users can see the steps of the reasoning process.

Ref sgl-project/sglang#3043

Alternatives

No response

Additional context

No response

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
@gaocegege
Contributor Author

gaocegege commented Jan 27, 2025

For DeepSeek-series models, this means we could move the content inside the <think></think> tags into reasoning_content.

Current output format:

```
content='<think>\nAlright, I just received a query asking, "Wher....\n</think>\n\nThe 2020 World Series was played in **Texas** at the residence of the Los Angeles Dodgers team, the **Rays**. The series lasted from July 19 to July 31 and was won by the **Los Angeles Dodgers**.'
```

@simon-mo
Collaborator

So this does break OpenAI compatibility, but I think it is the right time to break it. Do you have a suggestion on how to automatically figure out the <think> token so it's a bit more general for future models?

@gaocegege
Contributor Author

The OpenAI Python library still works fine, even though we are bending compatibility a bit.

I think we should keep it general, since the token may differ across models. I'm looking into the OpenAI server wrappers to put together a basic design proposal here.

```python
from openai import OpenAI

client = OpenAI(api_key="", base_url="https://api.deepseek.com")

messages = [{"role": "user", "content": "9.11 and 9.8, which is greater?"}]
response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=messages
)

reasoning_content = response.choices[0].message.reasoning_content  # proposed new field
content = response.choices[0].message.content
```
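For streaming requests, the same field would presumably appear on each chunk's delta. A rough sketch of what that usage could look like, assuming a local vLLM endpoint and the proposed (not yet finalized) reasoning_content field:

```python
from openai import OpenAI

# Assumed local vLLM endpoint; adjust to your deployment.
client = OpenAI(api_key="dummy", base_url="http://localhost:8000/v1")

stream = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    messages=[{"role": "user", "content": "9.11 and 9.8, which is greater?"}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta
    # reasoning_content is the proposed field carrying the <think> text.
    if getattr(delta, "reasoning_content", None):
        print(delta.reasoning_content, end="", flush=True)
    elif getattr(delta, "content", None):
        print(delta.content, end="", flush=True)
```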

@arunpatala

This would also be great to have when we are trying to get structured outputs from the content without having to parse out the reasoning output.

Seconded, +1

@gaocegege
Contributor Author

@arunpatala Hi, could you please explain more about the use case?

@arunpatala

It's just to make sure that when we provide a JSON schema for the output to follow, specified through the OpenAI API like this:


```python
from pydantic import BaseModel
from openai import OpenAI


class Info(BaseModel):
    name: str
    age: int


client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="dummy")
completion = client.beta.chat.completions.parse(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "My name is Cameron, I'm 28. What's my name and age?"},
    ],
    response_format=Info,
)
```


Just want to make sure the thinking tokens are separated from the main content and the main content follows the schema. Maybe this is already how it is being implemented.
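In other words, the desired behaviour would be roughly the following (a hypothetical continuation of the snippet above, assuming the proposed reasoning_content field exists):

```python
message = completion.choices[0].message

# Hypothetical: free-form thinking text, kept separate from the structured answer.
# message.reasoning_content -> "Okay, the user's name is Cameron and they are 28..."

# The schema-constrained answer, parsed into the Pydantic model by the SDK.
info = message.parsed  # Info(name='Cameron', age=28)
```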

@gaocegege
Contributor Author

gaocegege commented Jan 27, 2025

Hey @simon-mo,

I was thinking it might be cool to create a new abstraction called Reasoning Parser, kind of like what we have in abstract_tool_parser.py.

We could have a specific implementation like DeepSeekR1ReasoningParser that parses the <think> and </think> tokens to generate reasoning_content for delta_message in streaming requests and message in sync requests.
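For illustration only, a minimal sketch of what the non-streaming part of such a parser could look like (class and method names here are hypothetical, not the final vLLM interface):

```python
# Hypothetical sketch of a reasoning parser for DeepSeek-R1-style output.
class DeepSeekR1ReasoningParser:
    THINK_START = "<think>"
    THINK_END = "</think>"

    def extract_reasoning(self, model_output: str) -> tuple[str | None, str]:
        """Split model output into (reasoning_content, content)."""
        start = model_output.find(self.THINK_START)
        end = model_output.find(self.THINK_END)
        if start == -1 or end == -1:
            # No complete <think> block: treat everything as normal content.
            return None, model_output
        reasoning = model_output[start + len(self.THINK_START):end].strip()
        content = model_output[end + len(self.THINK_END):].strip()
        return reasoning, content
```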

And we need to add two CLI arguments to vllm serve:

```bash
vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B \
    --enable-reasoning \
    --reasoning-parser deepseek_r1
```
One key insight is that the <think> token usually shows up as the first token in the response. But I'm not sure if that’s consistent across different models in the future, so I’d rather not rely on it for optimization.

What do you think?

@gaocegege
Contributor Author

gaocegege commented Jan 27, 2025

@arunpatala

I don’t think it will work because the structured output engine, like xgrammar, sets the logits for the reasoning tokens to −∞. As a result, the output from the LLMEngine doesn't include the reasoning content.

> Tokens that would violate the required structure are identified as invalid. Their logits are set to −∞, effectively assigning them zero probability after the softmax operation and preserving the relative probabilities of other valid tokens. This ensures that only valid tokens are sampled.
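To illustrate the masking described above, here is a small standalone sketch (not xgrammar's actual code) of why tokens whose logits are set to −∞ can never be sampled:

```python
import numpy as np

logits = np.array([2.0, 1.0, 0.5, -0.3])        # scores for four candidate tokens
invalid = np.array([False, True, False, True])  # tokens that would violate the schema

masked = np.where(invalid, -np.inf, logits)     # grammar engine masks invalid tokens
probs = np.exp(masked - masked.max())
probs /= probs.sum()                            # softmax: masked tokens get probability 0
print(probs)                                    # ~[0.82, 0.0, 0.18, 0.0]
```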

Do you have some suggestions about it?

@gaocegege
Contributor Author

gaocegege commented Jan 28, 2025

Here are the proposed changes: #12473

@lucasalvarezlacasa

This is not available yet, is it?
I'm getting the following error when trying to use it:

```
api_server.py: error: unrecognized arguments: --enable-reasoning --reasoning-parser deepseek_r1
```

This is the full command I'm launching:

```bash
docker run --runtime nvidia --gpus all \
    -v /path/to/weights:/root/.cache/huggingface \
    -p 8000:8000 \
    --ipc=host \
    vllm/vllm-openai:v0.7.0 \
    --model deepseek-ai/DeepSeek-R1-Distill-Llama-8B \
    --trust-remote-code \
    --enable-chunked-prefill \
    --uvicorn-log-level error \
    --gpu-memory-utilization 0.95 \
    --dtype bfloat16 \
    --enable-reasoning --reasoning-parser deepseek_r1 \
    --max-model-len 8192
```

@gaocegege
Contributor Author

Hi, it does not work with v0.7.0. Perhaps you could try using vllm/vllm-openai:latest instead.

@lucasalvarezlacasa

I tried using latest before "v0.7.0" and it didn't work either. I think this is still not released.

@DarkLight1337
Member

DarkLight1337 commented Jan 31, 2025

Yeah, it's not released yet. You need to use latest code, not latest release. (i.e. you need to use the docker image after 0.7.0)
