new static_cache option seems to have broken the API with exl2 models #6702

Open
1 task done
jsboige opened this issue Jan 26, 2025 · 1 comment
Labels
bug Something isn't working

Comments

@jsboige
Contributor

jsboige commented Jan 26, 2025

Describe the bug

The regular UI still works fine, but API calls fail with the traceback shown in the Logs section below, ending in a KeyError: 'static_cache' raised from modules/text_generation.py, line 338, in generate_reply_HF.
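For context, the failing line reads the new option straight out of the request state. Below is a minimal sketch of what seems to happen (and of a defensive workaround), assuming the API builds its state dict from the request body without injecting the new key; this is a hypothetical illustration, not the project's actual code or fix:

    # Hypothetical sketch of the failure; `state` stands in for the generation-state
    # dict built from an API request sent by a client that does not know about the
    # new static_cache option.
    state = {'max_new_tokens': 512, 'temperature': 0.7}  # no 'static_cache' key

    try:
        if state['static_cache']:        # current check, as in the traceback
            pass
    except KeyError as e:
        print(f"KeyError: {e}")          # KeyError: 'static_cache'

    # A defensive lookup would keep the old behaviour as the default:
    use_static_cache = state.get('static_cache', False)
    print(use_static_cache)              # False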

Is there an existing issue for this?

  • I have searched the existing issues

Reproduction

Since this happens with all of my exl2 models, hosting any exl2 model and calling the chat completions API should be enough to reproduce the error, as sketched below.
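A minimal client-side sketch of such a call, assuming the OpenAI-compatible API extension is enabled on its default address (http://127.0.0.1:5000) and an exl2 model is already loaded; the port and payload values are assumptions, not part of the report:

    # Hypothetical reproduction sketch; address and payload values are assumptions.
    import requests

    resp = requests.post(
        "http://127.0.0.1:5000/v1/chat/completions",
        json={
            "messages": [{"role": "user", "content": "Hello"}],
            "max_tokens": 32,
        },
        timeout=60,
    )
    print(resp.status_code)  # 500 on an affected build
    print(resp.text)         # server log shows KeyError: 'static_cache'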

Screenshot

No response

Logs

2025-01-26 22:29:18   File "/venv/lib/python3.10/site-packages/starlette/routing.py", line 73, in app
2025-01-26 22:29:18     response = await f(request)
2025-01-26 22:29:18   File "/venv/lib/python3.10/site-packages/fastapi/routing.py", line 301, in app
2025-01-26 22:29:18     raw_response = await run_endpoint_function(
2025-01-26 22:29:18   File "/venv/lib/python3.10/site-packages/fastapi/routing.py", line 212, in run_endpoint_function
2025-01-26 22:29:18     return await dependant.call(**values)
2025-01-26 22:29:18   File "/app/extensions/openai/script.py", line 139, in openai_chat_completions
2025-01-26 22:29:18     response = OAIcompletions.chat_completions(to_dict(request_data), is_legacy=is_legacy)
2025-01-26 22:29:18   File "/app/extensions/openai/completions.py", line 544, in chat_completions
2025-01-26 22:29:18     return deque(generator, maxlen=1).pop()
2025-01-26 22:29:18   File "/app/extensions/openai/completions.py", line 333, in chat_completions_common
2025-01-26 22:29:18     for a in generator:
2025-01-26 22:29:18   File "/app/modules/chat.py", line 410, in generate_chat_reply
2025-01-26 22:29:18     for history in chatbot_wrapper(text, state, regenerate=regenerate, _continue=_continue, loading_message=loading_message, for_ui=for_ui):
2025-01-26 22:29:18   File "/app/modules/chat.py", line 352, in chatbot_wrapper
2025-01-26 22:29:18     for j, reply in enumerate(generate_reply(prompt, state, stopping_strings=stopping_strings, is_chat=True, for_ui=for_ui)):
2025-01-26 22:29:18   File "/app/modules/text_generation.py", line 42, in generate_reply
2025-01-26 22:29:18     for result in _generate_reply(*args, **kwargs):
2025-01-26 22:29:18   File "/app/modules/text_generation.py", line 97, in _generate_reply
2025-01-26 22:29:18     for reply in generate_func(question, original_question, seed, state, stopping_strings, is_chat=is_chat):
2025-01-26 22:29:18   File "/app/modules/text_generation.py", line 338, in generate_reply_HF
2025-01-26 22:29:18     if state['static_cache']:
2025-01-26 22:29:18 KeyError: 'static_cache'

System Info

Local RTX 4090, running in Docker Desktop on Windows 11 Pro.
jsboige added the bug label on Jan 26, 2025
@Sylphar

Sylphar commented Jan 28, 2025

Can confirm I'm having the same problem; no exllamav2 model can communicate with SillyTavern.
