Self-hosted AI coding assistant. An open-source / on-prem alternative to GitHub Copilot.
Warning: Tabby is still in the alpha phase.
- Self-contained, with no need for a DBMS or cloud service
- Web UI for visualizing and configuring models and MLOps.
- OpenAPI interface, easy to integrate with existing infrastructure (e.g., Cloud IDE).
- Consumer-level GPU support (FP16 weight loading with various optimizations).
NOTE: Tabby requires an NVIDIA GPU.
Before running Tabby, make sure the NVIDIA Container Toolkit is installed. We suggest using NVIDIA drivers compatible with CUDA 11.8 or higher.
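To verify that Docker can see the GPU, you can run a quick check (the CUDA image tag below is only an example; any CUDA-enabled image works):

docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

If nvidia-smi prints your GPU, the container runtime is set up correctly.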
# Create the data directory and make uid 1000 its owner (Tabby runs as uid 1000 inside the container)
mkdir -p data/hf_cache && chown -R 1000 data
docker run \
--gpus all \
-it --rm \
-v "./data:/data" \
-v "./data/hf_cache:/home/app/.cache/huggingface" \
-p 5000:5000 \
-e MODEL_NAME=TabbyML/J-350M \
-e MODEL_BACKEND=triton \
--name=tabby \
tabbyml/tabby
You can then query the server using the /v1/completions endpoint:
curl -X POST http://localhost:5000/v1/completions -H 'Content-Type: application/json' --data '{
"prompt": "def binarySearch(arr, left, right, x):\n mid = (left +"
}'
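On the first start, the container downloads model weights into data/hf_cache, so the endpoint may take a while to respond. Below is a minimal sketch for waiting until the server is up and pretty-printing a completion; polling FastAPI's default /openapi.json path as a readiness probe and the use of jq are both assumptions, not part of Tabby itself:

# Poll until the HTTP API answers; the first start may be slow
# while model weights are downloaded into data/hf_cache.
until curl -sf -o /dev/null http://localhost:5000/openapi.json; do
  sleep 5
done

# Pretty-print the completion response (jq is an assumed dependency).
curl -s -X POST http://localhost:5000/v1/completions \
  -H 'Content-Type: application/json' \
  --data '{"prompt": "def fib(n):\n    "}' | jq .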
We also provide an interactive playground in the admin panel at localhost:5000/_admin.
See deployment/skypilot/README.md
Tabby opens a FastAPI server at localhost:5000, which embeds the OpenAPI documentation of the HTTP API.
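Since the server is a FastAPI app, the machine-readable schema should be available at the framework's default path (assuming Tabby does not override it):

# Interactive docs live at http://localhost:5000/docs by FastAPI convention.
# List the documented endpoints from the OpenAPI schema (uses jq).
curl -s http://localhost:5000/openapi.json | jq '.paths | keys'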
Go to the development directory.
make dev
or
make dev-triton # Turn on the Triton backend (for CUDA env developers)