Describe the bug
The regular UI still works fine, but API calls fail with the following exception trace:
2025-01-26 22:29:18   File "/venv/lib/python3.10/site-packages/starlette/routing.py", line 73, in app
2025-01-26 22:29:18     response = await f(request)
2025-01-26 22:29:18   File "/venv/lib/python3.10/site-packages/fastapi/routing.py", line 301, in app
2025-01-26 22:29:18     raw_response = await run_endpoint_function(
2025-01-26 22:29:18   File "/venv/lib/python3.10/site-packages/fastapi/routing.py", line 212, in run_endpoint_function
2025-01-26 22:29:18     return await dependant.call(**values)
2025-01-26 22:29:18   File "/app/extensions/openai/script.py", line 139, in openai_chat_completions
2025-01-26 22:29:18     response = OAIcompletions.chat_completions(to_dict(request_data), is_legacy=is_legacy)
2025-01-26 22:29:18   File "/app/extensions/openai/completions.py", line 544, in chat_completions
2025-01-26 22:29:18     return deque(generator, maxlen=1).pop()
2025-01-26 22:29:18   File "/app/extensions/openai/completions.py", line 333, in chat_completions_common
2025-01-26 22:29:18     for a in generator:
2025-01-26 22:29:18   File "/app/modules/chat.py", line 410, in generate_chat_reply
2025-01-26 22:29:18     for history in chatbot_wrapper(text, state, regenerate=regenerate, _continue=_continue, loading_message=loading_message, for_ui=for_ui):
2025-01-26 22:29:18   File "/app/modules/chat.py", line 352, in chatbot_wrapper
2025-01-26 22:29:18     for j, reply in enumerate(generate_reply(prompt, state, stopping_strings=stopping_strings, is_chat=True, for_ui=for_ui)):
2025-01-26 22:29:18   File "/app/modules/text_generation.py", line 42, in generate_reply
2025-01-26 22:29:18     for result in _generate_reply(*args, **kwargs):
2025-01-26 22:29:18   File "/app/modules/text_generation.py", line 97, in _generate_reply
2025-01-26 22:29:18     for reply in generate_func(question, original_question, seed, state, stopping_strings, is_chat=is_chat):
2025-01-26 22:29:18   File "/app/modules/text_generation.py", line 338, in generate_reply_HF
2025-01-26 22:29:18     if state['static_cache']:
2025-01-26 22:29:18 KeyError: 'static_cache'
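If it helps triage: my guess (not verified against the repo internals) is that the API path builds its generation `state` dict without the newer `static_cache` key, while the UI path populates it from the interface settings, so only API requests crash. A minimal sketch of that failure mode, with illustrative function and key names, plus the obvious defensive fix:

```python
# Sketch of the suspected failure mode. Names are illustrative, not the
# project's actual code: the point is that an API-built state dict may
# simply lack the 'static_cache' key.

def generate_reply_hf_buggy(state):
    # Mirrors the failing line in modules/text_generation.py:
    # direct indexing raises KeyError when the key is absent.
    if state['static_cache']:
        return "using static cache"
    return "using dynamic cache"


def generate_reply_hf_fixed(state):
    # Defensive lookup: treat a missing key as False instead of crashing.
    if state.get('static_cache', False):
        return "using static cache"
    return "using dynamic cache"


# State as the API path might build it: no 'static_cache' key at all.
api_state = {'max_new_tokens': 512}

try:
    generate_reply_hf_buggy(api_state)
except KeyError as exc:
    print(f"KeyError: {exc}")  # reproduces the reported error

print(generate_reply_hf_fixed(api_state))  # falls back to dynamic cache
```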
Is there an existing issue for this?
I have searched the existing issues
Reproduction
Since this happens with all my exl2 models, hosting any exl2 model and calling the chat completions API should suffice to reproduce the error.
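For reference, a request along these lines triggers it for me. This is only a sketch: the port and `/v1/chat/completions` route are the usual defaults for the OpenAI-compatible API, and the payload fields are illustrative; adjust to your setup. The send itself is left commented out.

```python
import json
import urllib.request

# Minimal chat-completions payload; field values are illustrative.
payload = {
    "messages": [{"role": "user", "content": "Hello"}],
    "mode": "chat",
}
body = json.dumps(payload).encode("utf-8")

req = urllib.request.Request(
    "http://127.0.0.1:5000/v1/chat/completions",  # default API port; adjust if needed
    data=body,
    headers={"Content-Type": "application/json"},
)

# Sending this against a server hosting an exl2 model produces the
# KeyError above. Uncomment to actually send:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())

print(req.get_method(), req.full_url)
```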
Screenshot
No response
Logs
See the exception trace above.
System Info
Local 4090, running in docker-desktop on Windows 11 Pro.