This software joins your llama.cpp server to the KoboldAI Horde as a Scribe worker, performing distributed text generation.
It is a fork of KoboldAI-Horde-Bridge.
See this reddit post: with this trick, older Pascal GPUs (GTX 10x0, P40, K80) are almost twice as fast, particularly at long contexts.
Compile llama.cpp with `make LLAMA_CUBLAS=1 LLAMA_CUDA_FORCE_MMQ=1` to get a Pascal-optimized server binary.
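A minimal build sketch, assuming the older Makefile-based build of llama.cpp (newer releases use CMake instead):

```
# Fetch llama.cpp and build the server binary with the Pascal-friendly flags.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make LLAMA_CUBLAS=1 LLAMA_CUDA_FORCE_MMQ=1
```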
- Launch the llama.cpp server, for example:

  ```
  server -m /path/to/model.gguf -ngl 100 -c 2048
  ```
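  To confirm the server is responding before attaching the bridge, you can query its completion endpoint (a quick sanity check; 127.0.0.1:8080 is assumed as the default host and port):

  ```
  curl http://127.0.0.1:8080/completion -H "Content-Type: application/json" \
      -d '{"prompt": "Hello", "n_predict": 8}'
  ```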
- Obtain a Horde API key
- Copy `clientData_template.py` to `clientData.py` and customize the configuration (a filled-in sketch follows this list):
  - `kai_url`: llama.cpp server endpoint (the default is fine if it runs on the same machine)
  - `kai_name`: Horde worker name
  - `api_key`: Horde API key
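  A sketch of what the filled-in `clientData.py` might look like; the variable names come from the list above, and the values are illustrative only:

  ```
  # clientData.py - bridge configuration (illustrative values)
  kai_url = "http://127.0.0.1:8080"  # llama.cpp server endpoint (assumed default port)
  kai_name = "MyLlamaWorker"         # worker name shown on the Horde
  api_key = "0000000000"             # your Horde API key
  ```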
- Run `bridge.py`

Note that for quick testing, you can provide these settings via the CLI instead:

```
bridge.py -k <kai_url> -a <api_key> -n <kai_name>
```
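For example, using the illustrative values from the configuration sketch above (the key is a placeholder):

```
bridge.py -k http://127.0.0.1:8080 -a 0000000000 -n MyLlamaWorker
```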