A Tree that automatically stores and loads LLM KV Cache


collinzrj/llm_kv_tree

LLM KV Tree

LLM KV Tree helps you speed up tree search in LLM inference by sharing the KV cache across branches. It can be used to accelerate Beam Search and Monte Carlo Tree Search.

LLM KV Tree is a Python class that automatically stores the KV cache of an LLM in a tree structure, so you don't need to handle storing and loading the KV cache yourself. Just call `llm_tree_accelerate_next_logit`: it automatically finds the longest cached prefix of the input tokens and stores the KV cache of the current tokens.
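The idea behind this can be sketched as a trie keyed by token IDs, where each node may hold a cached KV entry. This is a hypothetical, self-contained illustration (the class and method names `KVTrie`, `longest_cached_prefix`, and `store` are stand-ins, not the library's actual internals):

```python
class KVTrieNode:
    def __init__(self):
        self.children = {}  # token id -> KVTrieNode
        self.kv = None      # cached KV state for the prefix ending at this node


class KVTrie:
    """Toy prefix tree for KV cache entries, keyed by token sequences."""

    def __init__(self):
        self.root = KVTrieNode()

    def longest_cached_prefix(self, tokens):
        """Return (n, kv) where tokens[:n] is the longest prefix with a cached KV."""
        node, best_n, best_kv = self.root, 0, None
        for i, tok in enumerate(tokens):
            node = node.children.get(tok)
            if node is None:
                break
            if node.kv is not None:
                best_n, best_kv = i + 1, node.kv
        return best_n, best_kv

    def store(self, tokens, kv):
        """Cache the KV state for the full token sequence."""
        node = self.root
        for tok in tokens:
            node = node.children.setdefault(tok, KVTrieNode())
        node.kv = kv


trie = KVTrie()
trie.store([1, 2, 3], "kv_for_123")
# The first 3 tokens are already cached; only token 4 needs a forward pass.
print(trie.longest_cached_prefix([1, 2, 3, 4]))  # -> (3, 'kv_for_123')
```

In the real library the stored values would be the model's actual KV tensors rather than strings, but the lookup logic, walking the tree and remembering the deepest node that holds a cache entry, is the essence of "finding the maximum prefix cached".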

You can also call `llm_tree_accelerate_next_logit_batch` for batch processing; it automatically finds the maximum length n such that tokens[:n] is cached for every sequence in the batch.

LLM KV Tree is especially useful for beam-search-style decoding and searching with an LLM, where candidate sequences share long prefixes and the KV cache can be reused heavily, all without you having to manage its storage and loading yourself.
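To see why the savings are large in beam search, note that all beams extend the same prompt, so with a shared prefix cached, each step only processes the newly appended tokens. A rough back-of-the-envelope illustration (the function and numbers are hypothetical):

```python
def tokens_to_process(beams, cached_prefix_len):
    """Tokens the model must actually run per beam, given that the first
    `cached_prefix_len` tokens of each beam already have KV cache entries."""
    return [max(0, len(beam) - cached_prefix_len) for beam in beams]


prefix = [101, 7, 42, 9]                              # shared prompt
beams = [prefix + [5], prefix + [6], prefix + [8]]    # three one-token extensions

# Without any cache, every beam re-processes all 5 tokens...
print(tokens_to_process(beams, 0))           # -> [5, 5, 5]
# ...with the shared prefix cached, each beam processes only its new token.
print(tokens_to_process(beams, len(prefix)))  # -> [1, 1, 1]
```

The longer the shared prompt and the wider the beam, the larger this gap grows, which is where a prefix-sharing cache tree pays off.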

TODO

  • Implement Beam Search Example
  • Implement Monte Carlo Tree Search example
