Releases: gpustack/llama-box
v0.0.107
- Fix crash when truncating a long KV cache;
- Fix crash when chatting with an image that has an alpha channel;
- Fix VRAM occupation when zero offloading with `--mmproj`;
- Compatible with some GGUF files that declare a wrong `kv_count`, e.g. CompendiumLabs/bge-large-zh-v1.5-gguf/FP16.
v0.0.106
- Remove the VRAM occupation when zero offloading: `-ngl 0`;
- Fix rerank model loading errors: gpustack/gte-multilingual-reranker-base-GGUF, gpustack/jina-reranker-v2-base-multilingual-GGUF;
- Support tool calling in the ChatGLM4 series;
- Introduce the DDIM (`ddim_trailing`) sampling method;
- Support offloading the image model across multiple devices.
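The new DDIM sampler is chosen per request. A minimal sketch of an image-generation payload follows, assuming an OpenAI-style images endpoint and a `sample_method` request field; those names are assumptions for illustration and are not confirmed by these notes (only the `ddim_trailing` identifier comes from the release itself):

```python
import json

# Hedged sketch: selecting the DDIM (`ddim_trailing`) sample method in an
# image-generation request. The `sample_method` field name and the model
# name are assumptions modeled on OpenAI-style APIs, not confirmed here.
payload = {
    "model": "sd3.5-medium",           # hypothetical model name
    "prompt": "a watercolor lighthouse at dusk",
    "sample_method": "ddim_trailing",  # sampler introduced in v0.0.106
    "n": 1,
    "size": "512x512",
}
body = json.dumps(payload)
print(body)
```

The payload is plain JSON, so it can be sent with any HTTP client once the server's actual endpoint and field names are confirmed.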
v0.0.105
- Fix RPC server calling;
- Fix an SD3.x image generation issue;
- Support `assistant`-role chat with the `image` message type;
- Support using URLs (`http://`, `https://`) in the `image` message type.
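A minimal sketch of a chat request exercising the two v0.0.105 additions above: an `image` content part that carries a remote URL, and an `assistant` message that also contains image content. The surrounding message structure follows the OpenAI-style chat API; the exact content-part shape llama-box expects is an assumption here:

```python
import json

# Hedged sketch: chat messages using the `image` content type with remote
# URLs (v0.0.105). Model name and content-part layout are assumptions.
payload = {
    "model": "qwen2-vl",  # hypothetical multimodal model name
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this picture."},
                {"type": "image", "image": {"url": "https://example.com/cat.png"}},
            ],
        },
        # v0.0.105 also allows the assistant role to carry image content:
        {
            "role": "assistant",
            "content": [
                {"type": "image", "image": {"url": "https://example.com/reply.png"}},
            ],
        },
    ],
}
print(json.dumps(payload))
```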
v0.0.104
v0.0.103
- Allow distributed deployment of Q*K(_M) models;
- Support DeepSeek v3;
- (Breaking change) NOT compatible with the previous RPC server.
v0.0.102
- Fix embedding crash;
- Support LoRA per request;
- Support multi-level verbosity logging.
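Per-request LoRA means the adapter selection rides along in the request body rather than being fixed at server start. A sketch of such a payload, assuming the id/scale list shape used by the llama.cpp server's `lora` field; whether llama-box uses exactly this shape is an assumption:

```python
import json

# Hedged sketch: attaching a LoRA adapter per request (v0.0.102). The
# `lora` field with id/scale entries follows the llama.cpp server
# convention; the model name is hypothetical.
payload = {
    "model": "llama-3-8b",  # hypothetical model name
    "prompt": "Summarize the release notes.",
    "lora": [{"id": 0, "scale": 0.8}],  # adapter 0 at 80% strength
}
print(json.dumps(payload))
```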
v0.0.101
refactor: log verbosity Signed-off-by: thxCode <[email protected]>
v0.0.100
docs: readme Signed-off-by: thxCode <[email protected]>
v0.0.99
refactor: meta Signed-off-by: thxCode <[email protected]>
v0.0.98
ci: fix build - windows amd64 - remove darwin ccache Signed-off-by: thxCode <[email protected]>