Releases: gpustack/llama-box

v0.0.107

16 Jan 11:08
  1. Fix a crash when truncating a long KV cache;
  2. Fix a crash when chatting with an image that has an alpha channel;
  3. Fix VRAM occupation when zero offloading with --mmproj;
  4. Add compatibility with GGUF files that declare a wrong kv_count, e.g. CompendiumLabs/bge-large-zh-v1.5-gguf/FP16.
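
For context on the kv_count item: in the GGUF format (v2 and later), the file starts with a fixed header holding a magic string, a version, a tensor count, and a metadata key-value count. A file whose declared count does not match the pairs actually present can trip a strict loader. The sketch below only reads that header, assuming a little-endian v2/v3 file; `read_gguf_header` is a hypothetical helper name, and it does not show how llama-box itself handles the mismatch.

```python
import struct

GGUF_MAGIC = b"GGUF"
HEADER_SIZE = 24  # 4-byte magic + uint32 version + two uint64 counts


def read_gguf_header(data: bytes) -> dict:
    """Parse the fixed GGUF header (assumes little-endian, GGUF v2/v3 layout)."""
    if len(data) < HEADER_SIZE:
        raise ValueError("buffer too short for a GGUF header")
    if data[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file (bad magic)")
    # version: uint32, tensor_count: uint64, metadata kv_count: uint64
    version, tensor_count, kv_count = struct.unpack_from("<IQQ", data, 4)
    return {"version": version, "tensor_count": tensor_count, "kv_count": kv_count}


# Usage with a synthetic header (not a real model file).
sample = struct.pack("<4sIQQ", GGUF_MAGIC, 3, 2, 5)
header = read_gguf_header(sample)
print(header)
```

To inspect a real file, pass its first 24 bytes, e.g. `open(path, "rb").read(24)`.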

v0.0.106

15 Jan 00:40
  1. Remove VRAM occupation when zero offloading (-ngl 0);
  2. Fix rerank model loading error for gpustack/gte-multilingual-reranker-base-GGUF and gpustack/jina-reranker-v2-base-multilingual-GGUF;
  3. Support tool calling in ChatGLM4 series;
  4. Introduce the DDIM (ddim_trailing) sampling method;
  5. Support offloading image models to multiple devices.
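
As background on the ddim_trailing item: the name suggests DDIM sampling with "trailing" timestep spacing, where the selected timesteps are spaced evenly and end-aligned so that sampling starts at the last training timestep. The sketch below illustrates that spacing only, under that assumption; `trailing_timesteps` is a hypothetical helper and is not taken from llama-box.

```python
def trailing_timesteps(num_train_timesteps: int, num_inference_steps: int) -> list[int]:
    """Return descending timesteps using 'trailing' spacing (illustrative only)."""
    if num_inference_steps <= 0 or num_inference_steps > num_train_timesteps:
        raise ValueError("num_inference_steps must be in 1..num_train_timesteps")
    step = num_train_timesteps / num_inference_steps
    # Start from the final training timestep and walk down in equal strides.
    return [round(num_train_timesteps - i * step) - 1 for i in range(num_inference_steps)]


steps = trailing_timesteps(1000, 4)
print(steps)  # [999, 749, 499, 249]
```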

v0.0.105

10 Jan 17:41
  1. Fix RPC server calling;
  2. Fix SD3.x image generation problem;
  3. Support the image message type in assistant role chat;
  4. Support using URLs (http://, https://) in the image message type.
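
To illustrate the image message items above, here is a minimal sketch of a chat request body that references an image by URL. It assumes an OpenAI-style chat completions schema (`image_url` content parts); the field names, the model name, and the `image_chat_message` helper are assumptions for illustration and are not confirmed by these release notes.

```python
import json


def image_chat_message(role: str, text: str, image_url: str) -> dict:
    """Build one chat message with a text part and an image URL part (assumed schema)."""
    if not image_url.startswith(("http://", "https://")):
        raise ValueError("expected an http:// or https:// URL")
    return {
        "role": role,
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }


payload = {
    "model": "example-model",  # placeholder name
    "messages": [
        image_chat_message("user", "Describe this image.", "https://example.com/cat.png")
    ],
}
print(json.dumps(payload, indent=2))
```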

v0.0.104

07 Jan 16:34
  1. Fix 100% AMD GPU utilization (incomplete fix), see #23.
  2. Support HYGON GPU.
  3. Reduce VRAM occupation when no GPU offloading.

v0.0.103

05 Jan 12:06
  1. Allow distributed deployment of Q*K(_M) models;
  2. Support DeepSeek v3;
  3. (BC) NOT compatible with the previous RPC server.

v0.0.102

05 Jan 03:14
  1. Fix embedding crash;
  2. Support LoRA per request;
  3. Support multi-level verbosity logging.

v0.0.101

03 Jan 09:43
refactor: log verbosity

Signed-off-by: thxCode <[email protected]>

v0.0.100

31 Dec 14:21
docs: readme

Signed-off-by: thxCode <[email protected]>

v0.0.99

27 Dec 17:50
refactor: meta

Signed-off-by: thxCode <[email protected]>

v0.0.98

26 Dec 17:12
ci: fix build

- windows amd64
- remove darwin ccache

Signed-off-by: thxCode <[email protected]>