A Tree that automatically stores and loads LLM KV Cache


collinzrj/llm_kv_tree

LLM KV Tree

LLM KV Tree helps you speed up tree search in LLM inference by sharing the KV cache across branches. It can be used to accelerate Beam Search and Monte Carlo Tree Search.

LLM KV Tree is a Python class that automatically stores the KV cache of an LLM in a tree structure, so you don't need to handle storing and loading the KV cache yourself. Just call `llm_tree_accelerate_next_logit`: it automatically finds the longest cached prefix of the input tokens and stores the KV cache of the current tokens.
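The idea behind this can be sketched as a trie keyed by token IDs, where each node may hold a cached KV entry. This is a hypothetical, self-contained illustration (the class and method names `KVTrie`, `longest_cached_prefix`, and `store` are stand-ins, not the library's actual internals):

```python
class KVTrieNode:
    def __init__(self):
        self.children = {}  # token id -> KVTrieNode
        self.kv = None      # cached KV state for the prefix ending at this node


class KVTrie:
    """Toy prefix tree for KV cache entries, keyed by token sequences."""

    def __init__(self):
        self.root = KVTrieNode()

    def longest_cached_prefix(self, tokens):
        """Return (n, kv) where tokens[:n] is the longest prefix with a cached KV."""
        node, best_n, best_kv = self.root, 0, None
        for i, tok in enumerate(tokens):
            node = node.children.get(tok)
            if node is None:
                break
            if node.kv is not None:
                best_n, best_kv = i + 1, node.kv
        return best_n, best_kv

    def store(self, tokens, kv):
        """Cache the KV state for the full token sequence."""
        node = self.root
        for tok in tokens:
            node = node.children.setdefault(tok, KVTrieNode())
        node.kv = kv


trie = KVTrie()
trie.store([1, 2, 3], "kv_for_123")
# The first 3 tokens are already cached; only token 4 needs a forward pass.
print(trie.longest_cached_prefix([1, 2, 3, 4]))  # -> (3, 'kv_for_123')
```

In the real library the stored values would be the model's actual KV tensors rather than strings, but the lookup logic, walking the tree and remembering the deepest node that holds a cache entry, is the essence of "finding the maximum prefix cached".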

You can also call `llm_tree_accelerate_next_logit_batch` for batch processing; it automatically finds the maximum length n such that tokens[:n] is cached for every sequence in the batch.

LLM KV Tree is especially useful for beam-search-style decoding and searching with an LLM, where candidate sequences share long prefixes and the KV cache can be reused heavily, all without you having to manage its storage and loading yourself.
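To see why the savings are large in beam search, note that all beams extend the same prompt, so with a shared prefix cached, each step only processes the newly appended tokens. A rough back-of-the-envelope illustration (the function and numbers are hypothetical):

```python
def tokens_to_process(beams, cached_prefix_len):
    """Tokens the model must actually run per beam, given that the first
    `cached_prefix_len` tokens of each beam already have KV cache entries."""
    return [max(0, len(beam) - cached_prefix_len) for beam in beams]


prefix = [101, 7, 42, 9]                              # shared prompt
beams = [prefix + [5], prefix + [6], prefix + [8]]    # three one-token extensions

# Without any cache, every beam re-processes all 5 tokens...
print(tokens_to_process(beams, 0))           # -> [5, 5, 5]
# ...with the shared prefix cached, each beam processes only its new token.
print(tokens_to_process(beams, len(prefix)))  # -> [1, 1, 1]
```

The longer the shared prompt and the wider the beam, the larger this gap grows, which is where a prefix-sharing cache tree pays off.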

TODO

  • Implement Beam Search Example
  • Implement Monte Carlo Tree Search example
