An attempt at deploying a pre-trained model to Kubernetes for inference. Note that this requires downloading model weights, which range from ~500MB to ~100GB.
Local run:
```shell
# Set up virtualenv and install dependencies
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Load environment variables and run
source .env
python3 app.py
```
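For orientation, a minimal sketch of what an `app.py` like this might contain: load a Hugging Face pipeline for the configured model and expose a generate function. The `HF_MODEL` variable, function names, and lazy-loading scheme are assumptions for illustration, not taken from the actual `app.py`.

```python
# Hypothetical sketch of app.py internals (names are assumptions).
import os

MODEL_ID = os.environ.get("HF_MODEL", "openai-community/gpt2")
_pipe = None

def get_pipeline():
    # Lazy-load so the (possibly multi-GB) weight download happens
    # on first use rather than at import time.
    global _pipe
    if _pipe is None:
        from transformers import pipeline
        _pipe = pipeline("text-generation", model=MODEL_ID)
    return _pipe

def generate(prompt, max_new_tokens=50):
    # Returns the prompt plus generated continuation as plain text.
    return get_pipeline()(prompt, max_new_tokens=max_new_tokens)[0]["generated_text"]
```

Lazy loading keeps startup fast and lets the same image serve different models by changing one environment variable.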
Kubernetes run:
```shell
# Build a multi-arch image; --push publishes it to the registry,
# so a separate `docker push` is not needed
docker buildx build --platform linux/amd64,linux/arm64 -t thomasvn/python-inference . --push

# Substitute environment variables into the manifest and apply it
source .env
envsubst < k8s.yaml | kubectl apply -f -
```
Models:
- `openai-community/gpt2`: ~525MB, 137M params.
- `mistralai/Mixtral-8x7B-Instruct-v0.1`: ~87GB, 46.7B params.
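As a rough sanity check, the listed sizes line up with the parameter counts if one assumes the GPT-2 checkpoint ships fp32 weights (4 bytes/param) and Mixtral ships bf16 (2 bytes/param) — an assumption about the published checkpoints, not a stated fact:

```python
# Estimate on-disk weight size from parameter count and precision.
def weight_bytes(n_params, bytes_per_param):
    return n_params * bytes_per_param

gpt2_mib = weight_bytes(137e6, 4) / 2**20      # fp32: 4 bytes/param
mixtral_gib = weight_bytes(46.7e9, 2) / 2**30  # bf16: 2 bytes/param
print(f"gpt2: {gpt2_mib:.0f} MiB, Mixtral: {mixtral_gib:.0f} GiB")
# prints: gpt2: 523 MiB, Mixtral: 87 GiB
```

This kind of back-of-envelope math is useful for sizing node disks and memory before pulling a model.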