Releases: gpustack/llama-box
v0.0.107
- Fix crash when truncating a long KV cache;
- Fix crash when chatting with an image that has an alpha channel;
- Fix VRAM occupation when zero offloading with `--mmproj`;
- Compatible with some GGUF files that declare a wrong `kv_count`, e.g. CompendiumLabs/bge-large-zh-v1.5-gguf/FP16.
v0.0.106
- Remove the VRAM occupation when zero offloading: `-ngl 0`;
- Fix rerank model loading errors: gpustack/gte-multilingual-reranker-base-GGUF, gpustack/jina-reranker-v2-base-multilingual-GGUF;
- Support tool calling in the ChatGLM4 series;
- Introduce the DDIM (`ddim_trailing`) sampling method;
- Support offloading the image model across multiple devices.
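The new DDIM sampler is chosen per request. A minimal sketch of an image-generation payload follows, assuming an OpenAI-style images endpoint and a `sample_method` request field; those names are assumptions for illustration and are not confirmed by these notes (only the `ddim_trailing` identifier comes from the release itself):

```python
import json

# Hedged sketch: selecting the DDIM (`ddim_trailing`) sample method in an
# image-generation request. The `sample_method` field name and the model
# name are assumptions modeled on OpenAI-style APIs, not confirmed here.
payload = {
    "model": "sd3.5-medium",           # hypothetical model name
    "prompt": "a watercolor lighthouse at dusk",
    "sample_method": "ddim_trailing",  # sampler introduced in v0.0.106
    "n": 1,
    "size": "512x512",
}
body = json.dumps(payload)
print(body)
```

The payload is plain JSON, so it can be sent with any HTTP client once the server's actual endpoint and field names are confirmed.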
v0.0.105
- Fix RPC server calling;
- Fix an SD3.x image generation issue;
- Support `assistant`-role chat with the `image` message type;
- Support using URLs (`http://`, `https://`) in the `image` message type.
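A minimal sketch of a chat request exercising the two v0.0.105 additions above: an `image` content part that carries a remote URL, and an `assistant` message that also contains image content. The surrounding message structure follows the OpenAI-style chat API; the exact content-part shape llama-box expects is an assumption here:

```python
import json

# Hedged sketch: chat messages using the `image` content type with remote
# URLs (v0.0.105). Model name and content-part layout are assumptions.
payload = {
    "model": "qwen2-vl",  # hypothetical multimodal model name
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this picture."},
                {"type": "image", "image": {"url": "https://example.com/cat.png"}},
            ],
        },
        # v0.0.105 also allows the assistant role to carry image content:
        {
            "role": "assistant",
            "content": [
                {"type": "image", "image": {"url": "https://example.com/reply.png"}},
            ],
        },
    ],
}
print(json.dumps(payload))
```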
v0.0.104
v0.0.103
- Allow distributed deployment of Q*K(_M) models;
- Support DeepSeek v3;
- (Breaking change) NOT compatible with the previous RPC server.
v0.0.102
- Fix embedding crash;
- Support LoRA per request;
- Support multi-level verbosity logging.
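Per-request LoRA means the adapter selection rides along in the request body rather than being fixed at server start. A sketch of such a payload, assuming the id/scale list shape used by the llama.cpp server's `lora` field; whether llama-box uses exactly this shape is an assumption:

```python
import json

# Hedged sketch: attaching a LoRA adapter per request (v0.0.102). The
# `lora` field with id/scale entries follows the llama.cpp server
# convention; the model name is hypothetical.
payload = {
    "model": "llama-3-8b",  # hypothetical model name
    "prompt": "Summarize the release notes.",
    "lora": [{"id": 0, "scale": 0.8}],  # adapter 0 at 80% strength
}
print(json.dumps(payload))
```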
v0.0.101
refactor: log verbosity Signed-off-by: thxCode <[email protected]>
v0.0.100
docs: readme Signed-off-by: thxCode <[email protected]>
v0.0.99
refactor: meta Signed-off-by: thxCode <[email protected]>
v0.0.98
ci: fix build - windows amd64 - remove darwin ccache Signed-off-by: thxCode <[email protected]>