You'll need a Kubernetes cluster first. There are many ways to get one; if you're working locally, we recommend k3d.
Note: the local example includes a basic ingress - components/ingress. The ingress exposes the API server and is required to use Indexify. If you're doing a different setup, you'll want to make an ingress definition that is specific to your environment.
k3d runs a lightweight version of Kubernetes entirely within Docker on your local system. To create a cluster, run:
k3d cluster create -p "8081:80@loadbalancer" indexify
When using this setup, Indexify will be exposed via k3d's ingress at http://localhost:8081, the host port mapped to the load balancer in the command above. You'll want to configure IndexifyClient(service_url="http://localhost:8081").
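As a minimal sketch of how the port mapping relates to the client URL: the number before the colon in the -p flag is the port on your machine, and the number after it is the load balancer port inside the cluster. The helper name service_url_from_port_mapping is hypothetical, and it only handles the simple HOSTPORT:CONTAINERPORT@NODEFILTER shape used in this guide.

```python
def service_url_from_port_mapping(mapping: str, host: str = "localhost") -> str:
    """Return the URL reachable from the host for a k3d port mapping.

    Hypothetical helper: only handles the simple
    "HOSTPORT:CONTAINERPORT@NODEFILTER" shape used in this guide.
    """
    # Drop the node filter (for example "@loadbalancer").
    ports, _, _node_filter = mapping.partition("@")
    # The host port is the part before the colon.
    host_port, sep, _container_port = ports.partition(":")
    if not sep or not host_port.isdigit():
        raise ValueError(f"unsupported port mapping: {mapping!r}")
    return f"http://{host}:{host_port}"


service_url = service_url_from_port_mapping("8081:80@loadbalancer")
print(service_url)  # http://localhost:8081
```

The resulting string is what you would pass as service_url when constructing IndexifyClient.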
To run locally, you can apply the local setup and then go through the getting started guide. To install, run:
kubectl apply -k kustomize/local
There are optional components that you can use as part of your Indexify installation. To make this possible, the optional pieces have been split out into separate components. The postgres and minio examples are not meant to be run in production. Make sure to create your own in a way that reflects your environment.
- base - this includes the API server and the coordinator.
- components/ingress - a basic ingress definition used to get access to the API server as part of the local install.
- components/postgres - a simple, ephemeral example of using postgres for all database operations including the vector store.
- components/minio - an ephemeral example of using S3 for blob storage.
- components/extractor - extractors are published as common containers. This component is used by all the extractors, such as minilm-l6, to provide extraction.
[!NOTE] The API server comes with an ingress resource by default that exposes the API at /. Make sure to change this if you'd like it at a different location.
To customize the installation so that it works in your environment, take a look at kustomize/local/kustomization.yaml. Each resource and component entry is optional and can be swapped out to use your own solution.
For example, if you would like to use S3 instead of minio but otherwise leave the example intact, you would write a kustomization.yaml file that looks like:
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: indexify
resources:
  - ../base
  - ../components/ingress
  - ../components/chunker
  - ../components/minilm-l6
components:
  - ../components/postgres
patches:
  # base/api.yaml
  - patch: |-
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: api
        labels:
          app.kubernetes.io/component: api
      spec:
        template:
          spec:
            containers:
              - name: indexify
                env:
                  # Ideally, this config is coming from IAM in your cluster.
                  - name: AWS_ACCESS_KEY_ID
                    value: XXXX
                  - name: AWS_SECRET_ACCESS_KEY
                    value: XXXX
  # components/extractor/extractor.yaml
  - target:
      kind: Deployment
      labelSelector: app.kubernetes.io/component=extractor
    patch: |-
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: extractor
      spec:
        template:
          spec:
            containers:
              - name: extractor
                env:
                  # Ideally, this config is coming from IAM in your cluster.
                  - name: AWS_ACCESS_KEY_ID
                    valueFrom:
                      secretKeyRef:
                        name: blob-store
                        key: AWS_ACCESS_KEY_ID
                  - name: AWS_SECRET_ACCESS_KEY
                    valueFrom:
                      secretKeyRef:
                        name: blob-store
                        key: AWS_SECRET_ACCESS_KEY
  # base/config.yaml
  - patch: |-
      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: indexify
      data:
        s3.yml: |-
          blob_storage:
            backend: s3
            s3:
              bucket: XXX-my-bucket
              region: us-east-1
labels:
  - includeSelectors: true
    pairs:
      app.kubernetes.io/part-of: indexify
These three patches will configure the installation for your environment.
- base/api.yaml - Modifies the indexify container in deploy/api to include the environment variables required for S3.
- components/extractor/extractor.yaml - Uses the labelSelector app.kubernetes.io/component=extractor to modify all extractors and add S3's env.
- base/config.yaml - Adds a key to the configmap used by the API server and coordinator. The content of these keys is concatenated into a single config.yaml file on startup as part of an initContainer.
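To illustrate the concatenation step, here is a minimal sketch of how several configmap keys could be joined into one config.yaml. This is not the initContainer's actual script: the ordering (sorted by key name), the newline separator, and the base.yml key with its example_setting content are all assumptions made for illustration.

```python
def merge_config_fragments(data: dict[str, str]) -> str:
    """Join configmap values into one document.

    Assumption: fragments are joined in sorted key order with a newline
    between them. The real initContainer may order them differently.
    """
    fragments = [data[key].rstrip("\n") for key in sorted(data)]
    return "\n".join(fragments) + "\n"


configmap_data = {
    # Key added by the base/config.yaml patch in this guide.
    "s3.yml": "blob_storage:\n  backend: s3\n",
    # Hypothetical key standing in for whatever base already ships.
    "base.yml": "example_setting: true\n",
}

print(merge_config_fragments(configmap_data), end="")
```

Because the fragments are plain text joined together, each key should contribute distinct top-level settings; two keys defining the same top-level setting would produce a duplicate in the merged file.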
For each extractor you'd like to add, you'll want to create a new kustomization.yaml. These will be included in your parent installation the same way that the local example includes the chunker and minilm-l6 extractors.
To add the PDF extractor, you'll want to create pdf/kustomization.yaml.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
components:
  - ../extractor
images:
  - name: tensorlake/extractor:latest
    # Path to the extractor's docker image
    newName: tensorlake/pdf-extractor
    newTag: latest
patches:
  - target:
      group: apps
      version: v1
      kind: Deployment
      name: extractor
    patch: |-
      - op: replace
        path: /metadata/name
        value: pdf
      - op: add
        path: /spec/selector/matchLabels/app.kubernetes.io~1name
        value: pdf
  - target:
      version: v1
      kind: Service
      name: extractor
    patch: |-
      - op: replace
        path: /metadata/name
        value: pdf
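The ~1 in the matchLabels path is JSON Pointer escaping (RFC 6901): the label key app.kubernetes.io/name contains a slash, which would otherwise be read as a path separator. A minimal sketch of the escaping rule, using hypothetical helper names, may help when writing patches for other label keys:

```python
def escape_pointer_token(token: str) -> str:
    """Escape one JSON Pointer reference token per RFC 6901.

    "~" must be replaced first so that the "~" introduced by "~1"
    is not escaped a second time.
    """
    return token.replace("~", "~0").replace("/", "~1")


def json_pointer(*tokens: str) -> str:
    """Build a JSON Pointer path from unescaped tokens."""
    return "".join("/" + escape_pointer_token(token) for token in tokens)


path = json_pointer("spec", "selector", "matchLabels", "app.kubernetes.io/name")
print(path)  # /spec/selector/matchLabels/app.kubernetes.io~1name
```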
This new extractor can then be included in your own installation, for example in staging/kustomization.yaml:
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: indexify
resources:
  - ../base
  - ../components/ingress
  - ../components/chunker
  - ../components/minilm-l6
  - ../components/pdf
To run locally, you can install the chart using some pre-configured values and then go through the getting started guide. To install, run:
helm install local helm -f helm/local.yaml -n indexify --create-namespace
Like the kustomize installation, there are some optional pieces that are managed via values.yaml.
- Blob Store - We're using minio for local development via the official chart. local.yaml configures it to run without persistence. To use S3, set minio.enabled=false and make sure IAM has added the correct credentials for accessing S3. To use other blob stores that support S3's API, look into setting blobStore.endpoint and blobStore.credentialSecret.
- Database - We're using postgresql for local development via the bitnami chart. local.yaml configures it to run as a single primary. Note that the bitnami postgres image does not come with PgVector, so we are repackaging it with the correct files. To use your own database, look at setting dbURL, indexConfig and metadataStorage to the correct values.
- Extractors - You can add all the extractors you'd like via extractors. The local.yaml example includes a couple. Add the image that contains the extractor you'd like and it will be installed as part of the chart.
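As a rough sketch of overriding one of these values at install time, the snippet below builds the install command from this guide with extra --set flags appended. Helm gives --set precedence over values files, so minio.enabled=false here would override whatever local.yaml sets. The helper name helm_install_command is hypothetical, and the snippet only prints the command rather than running it.

```python
import shlex


def helm_install_command(overrides: dict[str, str]) -> list[str]:
    """Build the local install command with --set overrides appended.

    Hypothetical helper: it only assembles the argument list.
    """
    command = [
        "helm", "install", "local", "helm",
        "-f", "helm/local.yaml",
        "-n", "indexify", "--create-namespace",
    ]
    for key, value in overrides.items():
        command += ["--set", f"{key}={value}"]
    return command


# Disable the bundled minio so the chart relies on S3 (credentials via IAM).
command = helm_install_command({"minio.enabled": "false"})
print(shlex.join(command))
```

For more than a couple of overrides, a separate values file passed with an additional -f flag is usually easier to review than a long list of --set flags.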