Streaming does not work as expected #52
This is not happening on my end. Can you please give me some reproduction info?
Here is an example using the `openai` client:

```python
import time

from openai import OpenAI

# Toggle to use Tabby instead of OpenAI
use_tabby = False

if use_tabby:
    client = OpenAI(
        base_url="http://your-tabby-server",
        api_key="dummy"
    )
    model = "LoneStriker--Nous-Capybara-34B-4.65bpw-h6-exl2"
else:
    client = OpenAI(
        api_key="your-api-key"
    )
    model = "gpt-3.5-turbo"

prompt = "Write a python webserver which returns the number pi at the url `/pi`"

# Time the request itself...
start = time.time()
stream = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": prompt}],
    stream=True,
)
end = time.time()
print("Completions:", end - start)

# ...then time how long it takes to consume the stream.
start = time.time()
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
end = time.time()
print("Stream:", end - start)
```

When running this, you can see that with OpenAI the characters are printed one by one, whereas with Tabby there is a big delay, after which the whole response appears at once.
I was able to reproduce the issue using your code. To fix this on my end, I removed the …
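For context (this is illustrative, not tabbyAPI's actual code), the usual cause of this symptom is a generator on the server side that accumulates the whole completion before yielding, instead of forwarding each chunk as it is produced. A self-contained sketch of the two patterns:

```python
import asyncio
import time

async def fake_tokens():
    # Stand-in for a model producing tokens one at a time.
    for tok in ["Hello", " ", "world", "!"]:
        await asyncio.sleep(0.5)
        yield tok

# Buggy pattern: accumulate everything, then emit once at the end.
async def buffered(source):
    text = ""
    async for tok in source:
        text += tok
    yield text

# Correct pattern: forward each token as soon as it is produced.
async def streaming(source):
    async for tok in source:
        yield tok

async def main():
    for name, gen in [("buffered", buffered(fake_tokens())),
                      ("streaming", streaming(fake_tokens()))]:
        start = time.time()
        async for chunk in gen:
            print(f"{name}: {chunk!r} at {time.time() - start:.1f}s")

asyncio.run(main())
```

Running it shows the buffered variant emitting a single chunk after ~2 s, while the streaming variant emits a chunk every ~0.5 s.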
Closing this issue due to inactivity.
Oh right, almost. The … Also, for a real test, I had to add …
(Adding to this, I would expect `max_tokens` to …)
Defaulting … As for your problem with setting `max_tokens` in the request: there is a new feature coming soon to override generation parameter defaults from tabby's side. You can try it by pulling the …
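For reference, the per-request limit goes through the standard `max_tokens` parameter of the OpenAI client; whether the server honors or overrides it is what the upcoming feature mentioned above concerns. A minimal sketch, reusing the placeholder Tabby endpoint from the script above:

```python
from openai import OpenAI

# Placeholder URL; point this at your actual tabbyAPI server.
client = OpenAI(base_url="http://your-tabby-server", api_key="dummy")

stream = client.chat.completions.create(
    model="LoneStriker--Nous-Capybara-34B-4.65bpw-h6-exl2",
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=512,  # per-request cap on generated tokens
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
```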
When using `"stream": true`, the results are returned in the expected format, but (as opposed to OpenAI) they seem to arrive in one big batch after the complete response has been generated. Is this because of a limitation in tabbyAPI or in ExllamaV2?