new static_cache option seems to have broken the API with exl2 models #6702

Open
1 task done
jsboige opened this issue Jan 26, 2025 · 1 comment
Labels
bug Something isn't working

Comments

@jsboige
Contributor

jsboige commented Jan 26, 2025

Describe the bug

The regular UI still works fine, but API calls fail with the traceback shown in the Logs section below, ending in a KeyError: 'static_cache' raised from modules/text_generation.py, line 338, in generate_reply_HF.
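For context, the failing line reads the new option straight out of the request state. Below is a minimal sketch of what seems to happen (and of a defensive workaround), assuming the API builds its state dict from the request body without injecting the new key; this is a hypothetical illustration, not the project's actual code or fix:

    # Hypothetical sketch of the failure; `state` stands in for the generation-state
    # dict built from an API request sent by a client that does not know about the
    # new static_cache option.
    state = {'max_new_tokens': 512, 'temperature': 0.7}  # no 'static_cache' key

    try:
        if state['static_cache']:        # current check, as in the traceback
            pass
    except KeyError as e:
        print(f"KeyError: {e}")          # KeyError: 'static_cache'

    # A defensive lookup would keep the old behaviour as the default:
    use_static_cache = state.get('static_cache', False)
    print(use_static_cache)              # False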

Is there an existing issue for this?

  • I have searched the existing issues

Reproduction

Since this happens with all of my exl2 models, hosting any exl2 model and calling the chat completions API should be enough to reproduce the error, as sketched below.
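A minimal client-side sketch of such a call, assuming the OpenAI-compatible API extension is enabled on its default address (http://127.0.0.1:5000) and an exl2 model is already loaded; the port and payload values are assumptions, not part of the report:

    # Hypothetical reproduction sketch; address and payload values are assumptions.
    import requests

    resp = requests.post(
        "http://127.0.0.1:5000/v1/chat/completions",
        json={
            "messages": [{"role": "user", "content": "Hello"}],
            "max_tokens": 32,
        },
        timeout=60,
    )
    print(resp.status_code)  # 500 on an affected build
    print(resp.text)         # server log shows KeyError: 'static_cache'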

Screenshot

No response

Logs

2025-01-26 22:29:18   File "/venv/lib/python3.10/site-packages/starlette/routing.py", line 73, in app
2025-01-26 22:29:18     response = await f(request)
2025-01-26 22:29:18   File "/venv/lib/python3.10/site-packages/fastapi/routing.py", line 301, in app
2025-01-26 22:29:18     raw_response = await run_endpoint_function(
2025-01-26 22:29:18   File "/venv/lib/python3.10/site-packages/fastapi/routing.py", line 212, in run_endpoint_function
2025-01-26 22:29:18     return await dependant.call(**values)
2025-01-26 22:29:18   File "/app/extensions/openai/script.py", line 139, in openai_chat_completions
2025-01-26 22:29:18     response = OAIcompletions.chat_completions(to_dict(request_data), is_legacy=is_legacy)
2025-01-26 22:29:18   File "/app/extensions/openai/completions.py", line 544, in chat_completions
2025-01-26 22:29:18     return deque(generator, maxlen=1).pop()
2025-01-26 22:29:18   File "/app/extensions/openai/completions.py", line 333, in chat_completions_common
2025-01-26 22:29:18     for a in generator:
2025-01-26 22:29:18   File "/app/modules/chat.py", line 410, in generate_chat_reply
2025-01-26 22:29:18     for history in chatbot_wrapper(text, state, regenerate=regenerate, _continue=_continue, loading_message=loading_message, for_ui=for_ui):
2025-01-26 22:29:18   File "/app/modules/chat.py", line 352, in chatbot_wrapper
2025-01-26 22:29:18     for j, reply in enumerate(generate_reply(prompt, state, stopping_strings=stopping_strings, is_chat=True, for_ui=for_ui)):
2025-01-26 22:29:18   File "/app/modules/text_generation.py", line 42, in generate_reply
2025-01-26 22:29:18     for result in _generate_reply(*args, **kwargs):
2025-01-26 22:29:18   File "/app/modules/text_generation.py", line 97, in _generate_reply
2025-01-26 22:29:18     for reply in generate_func(question, original_question, seed, state, stopping_strings, is_chat=is_chat):
2025-01-26 22:29:18   File "/app/modules/text_generation.py", line 338, in generate_reply_HF
2025-01-26 22:29:18     if state['static_cache']:
2025-01-26 22:29:18 KeyError: 'static_cache'

System Info

Local RTX 4090, running in Docker Desktop on Windows 11 Pro.
jsboige added the bug label on Jan 26, 2025
@Sylphar

Sylphar commented Jan 28, 2025

Can confirm I'm having the same problem; no exllamav2 model can communicate with SillyTavern.
