Self-hosted AI coding assistant. An open-source, on-premises alternative to GitHub Copilot.
Warning: Tabby is still in the alpha phase.
- Self-contained, with no need for a DBMS or cloud service
- Web UI for visualizing and configuring models and MLOps.
- OpenAPI interface, easy to integrate with existing infrastructure (e.g., Cloud IDE).
- Consumer-level GPU support (FP16 weight loading with various optimizations).
The easiest way to get started is with the docker image:
# Create the data dir and grant ownership to uid 1000 (Tabby runs as uid 1000 inside the container)
mkdir -p data/hf_cache && chown -R 1000 data
docker run \
-it --rm \
-v ./data:/data \
-v ./data/hf_cache:/home/app/.cache/huggingface \
-p 5000:5000 \
-e MODEL_NAME=TabbyML/J-350M \
tabbyml/tabby
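Once the container is up (the first launch downloads model weights into the mounted Hugging Face cache, which can take a while), a quick smoke test is to hit the completions endpoint described below and check for an HTTP 200; the prompt here is arbitrary:

curl -s -o /dev/null -w '%{http_code}\n' \
  -X POST http://localhost:5000/v1/completions \
  -H 'Content-Type: application/json' \
  --data '{"prompt": "def hello"}'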
To use the GPU backend (Triton) for faster inference:
docker run \
--gpus all \
-it --rm \
-v ./data:/data \
-v ./data/hf_cache:/home/app/.cache/huggingface \
-p 5000:5000 \
-e MODEL_NAME=TabbyML/J-350M \
-e MODEL_BACKEND=triton \
tabbyml/tabby
Note: To use GPUs, you need to install the NVIDIA Container Toolkit. We also recommend using NVIDIA drivers with CUDA version 11.8 or higher.
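If the GPU container fails to start, a common first check is whether Docker can see the GPU at all. The sketch below runs nvidia-smi inside a CUDA base image; the image tag is only an example, and any recent CUDA image works:

# Should print the usual nvidia-smi table if the NVIDIA Container Toolkit is set up correctly
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi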
You can then query the server using the /v1/completions endpoint:
curl -X POST http://localhost:5000/v1/completions -H 'Content-Type: application/json' --data '{
"prompt": "def binarySearch(arr, left, right, x):\n mid = (left +"
}'
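For easier reading, you can pretty-print the JSON response. This sketch assumes python3 is on your PATH and makes no assumption about the response schema beyond it being JSON:

curl -s -X POST http://localhost:5000/v1/completions \
  -H 'Content-Type: application/json' \
  --data '{"prompt": "def binarySearch(arr, left, right, x):\n    mid = (left +"}' \
  | python3 -m json.tool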
We also provide an interactive playground in the admin panel at localhost:5000/_admin
See deployment/skypilot/README.md
Tabby opens a FastAPI server at localhost:5000, which embeds the OpenAPI documentation of the HTTP API.
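FastAPI serves its interactive API reference at the /docs path by default; assuming Tabby keeps that default, you can open http://localhost:5000/docs in a browser to explore the endpoints, or check it from the shell:

# Should print 200 if the docs page is being served
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:5000/docs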
Go to the development directory.
make dev
or
make dev-triton # Turn on the Triton backend (for developers with a CUDA environment)