operations: adding a k8s docs and overview (tensorlakeai#641)
grampelberg authored Jun 5, 2024
1 parent 11aad6a commit 5833256
Showing 7 changed files with 184 additions and 57 deletions.
76 changes: 76 additions & 0 deletions docs/docs/operations/kubernetes.md
@@ -0,0 +1,76 @@
# Kubernetes

If you'd like to try Indexify with your own cluster, check out the
[instructions][operations/k8s]. They walk you through an ephemeral setup
using a local cluster. To get Indexify into production, you'll want to modify
the YAML so that it works with your environment. In particular, pay close
attention to the dependencies.

[operations/k8s]:
https://github.com/tensorlakeai/indexify/tree/main/operations/k8s
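
As a starting point for that modification, here is a minimal sketch of an
overlay that sits next to `kustomize/local`. It assumes the directories under
`kustomize/components` are Kustomize components, as their location suggests.
The overlay itself and the `indexify` namespace are hypothetical and not part
of the repository.

```yaml
# kustomization.yaml for a hypothetical overlay next to kustomize/local
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

# Hypothetical namespace; use whatever fits your cluster.
namespace: indexify

resources:
  # API server and coordinator.
  - ../base

components:
  # Ephemeral examples. In production you would likely replace these
  # with patches that point at your own database and blob store.
  - ../components/postgres
  - ../components/minio
```

Assuming those paths resolve, you would apply such an overlay the same way as
the local one, with `kubectl apply -k` and the overlay directory.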

## Components

- [API Server][api.yaml] - This is where all your requests go. There's an
ingress which exposes `/` by default.
- [Coordinator][coordinator.yaml] - The task scheduler that hands work out
to the extractors.
- [Extractors][extractor.yaml] - Extractors can take multiple forms. This
example is generic and works for all the extractors distributed by
the project.

[api.yaml]:
https://github.com/tensorlakeai/indexify/blob/main/operations/k8s/kustomize/base/api.yaml
[coordinator.yaml]:
https://github.com/tensorlakeai/indexify/blob/main/operations/k8s/kustomize/base/coordinator.yaml
[extractor.yaml]:
https://github.com/tensorlakeai/indexify/blob/main/operations/k8s/kustomize/components/extractor/extractor.yaml
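
If `/` is not where you want the API exposed, one option is to patch the
ingress from your own overlay. The fragment below is a sketch only. It assumes
the ingress defines a single rule with a single path entry, and `/indexify` is
a hypothetical prefix. Check [api.yaml][api.yaml] for the actual shape before
relying on it.

```yaml
# Fragment of a kustomization.yaml in your own overlay.
# Assumes the Ingress has one rule with one path entry.
# /indexify is a hypothetical prefix.
patches:
  - target:
      kind: Ingress
    patch: |-
      - op: replace
        path: /spec/rules/0/http/paths/0/path
        value: /indexify
```

Targeting by `kind` alone applies the patch to every Ingress in the overlay, so
narrow the target with a `name` if you add other ingresses.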

## Dependencies

### Blob Store

We recommend using an S3-like service for the blob store. Our [ephemeral
example][kustomize/local] uses MinIO for this. See the [environment variable
patch][minio/api.yaml] for how this gets configured.

[kustomize/local]:
https://github.com/tensorlakeai/indexify/blob/main/operations/k8s/kustomize/local/kustomization.yaml
[minio/api.yaml]:
https://github.com/tensorlakeai/indexify/blob/main/operations/k8s/kustomize/components/minio/api.yaml

#### GCP

- You'll want to create an [HMAC key][gcp-hmac] to use as `AWS_ACCESS_KEY_ID` and
`AWS_SECRET_ACCESS_KEY`.
- Set `AWS_ENDPOINT_URL` to `https://storage.googleapis.com/`.

[gcp-hmac]: https://cloud.google.com/storage/docs/authentication/hmackeys
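
The two GCP steps can be sketched as a patch modeled on the MinIO environment
variable patch. It assumes the HMAC key has been stored in a Secret named
`gcs-hmac`. That name is hypothetical, and you would need to create the Secret
yourself.

```yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  template:
    spec:
      containers:
        - name: indexify
          env:
            - name: AWS_ENDPOINT_URL
              value: https://storage.googleapis.com/
            - name: AWS_ACCESS_KEY_ID
              valueFrom:
                secretKeyRef:
                  name: gcs-hmac # hypothetical Secret holding the HMAC key
                  key: AWS_ACCESS_KEY_ID
            - name: AWS_SECRET_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  name: gcs-hmac # hypothetical Secret holding the HMAC key
                  key: AWS_SECRET_ACCESS_KEY
```

The extractors read from the blob store as well, so they would likely need the
same three variables, as the MinIO component does for its extractor patch.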

#### Other Clouds

Not all clouds expose an S3 interface. For those that don't, check out the
[s3proxy][s3proxy] project. That said, we'd love help implementing your native
blob storage of choice! Please open an [issue][issue] so that we can
discuss how that would look for the project.

[s3proxy]: https://github.com/gaul/s3proxy
[issue]: https://github.com/tensorlakeai/indexify/issues

### Vector Store

We support multiple backends for vectors, including `LanceDb`, `Qdrant` and
`PgVector`. The ephemeral example uses Postgres and `PgVector` for this. The
[database][vector-store.yaml] itself is pretty simple. Pay extra attention to
the [patch][postgres/config.yaml], which configures the API server and
coordinator to use that backend.

[vector-store.yaml]:
https://github.com/tensorlakeai/indexify/blob/main/operations/k8s/kustomize/components/postgres/vector-store.yaml
[postgres/config.yaml]:
https://github.com/tensorlakeai/indexify/blob/main/operations/k8s/kustomize/components/postgres/config.yaml

### Structured Store

Take a look at the vector store component in kustomize. It implements the
structured store as well.
86 changes: 41 additions & 45 deletions docs/mkdocs.yml
@@ -1,4 +1,4 @@
site_name: ""
site_name: ''
site_url: https://docs.getindexify.ai/

repo_url: https://github.com/tensorlakeai/indexify
@@ -25,56 +25,55 @@ markdown_extensions:
- pymdownx.snippets
- meta


nav:
- Home:
- Indexify: index.md
- Getting Started - Basic: 'getting_started.md'
- Getting Started - Intermediate: 'getting_started_intermediate.md'
- Key Concepts: 'concepts.md'
- Architecture: 'architecture.md'
- Comparisons: 'comparisons.md'
- Indexify: index.md
- Getting Started - Basic: 'getting_started.md'
- Getting Started - Intermediate: 'getting_started_intermediate.md'
- Key Concepts: 'concepts.md'
- Architecture: 'architecture.md'
- Comparisons: 'comparisons.md'
- CLI and UI:
- User Interface: 'ui.md'
- Extractor CLI: 'extractor_cli.md'
- Server CLI: 'server_cli.md'
- User Interface: 'ui.md'
- Extractor CLI: 'extractor_cli.md'
- Server CLI: 'server_cli.md'
- Extractors:
- Introduction: 'apis/extractors.md'
- Develop Extractors: 'apis/develop_extractors.md'
- Available Extractors:
- Text: 'apis/extractors/text.md'
- PDF: 'apis/extractors/pdf.md'
- Image: 'apis/extractors/image.md'
- Audio: 'apis/extractors/audio.md'
- Video: 'apis/extractors/video.md'
- Embedding: 'apis/extractors/embedding.md'
- Web: 'apis/extractors/web.md'
- Introduction: 'apis/extractors.md'
- Develop Extractors: 'apis/develop_extractors.md'
- Available Extractors:
- Text: 'apis/extractors/text.md'
- PDF: 'apis/extractors/pdf.md'
- Image: 'apis/extractors/image.md'
- Audio: 'apis/extractors/audio.md'
- Video: 'apis/extractors/video.md'
- Embedding: 'apis/extractors/embedding.md'
- Web: 'apis/extractors/web.md'

- APIs:
- Install: 'apis/install_clients.md'
- Content Ingestion: 'apis/content_ingestion.md'
- Extraction Graphs: 'apis/extraction_graphs.md'
- Retrieval: 'apis/retrieval.md'
- Install: 'apis/install_clients.md'
- Content Ingestion: 'apis/content_ingestion.md'
- Extraction Graphs: 'apis/extraction_graphs.md'
- Retrieval: 'apis/retrieval.md'
- Use Cases:
- Basic RAG: 'usecases/rag.md'
- Audio Extraction: 'usecases/audio_extraction.md'
- PDF Extraction: 'usecases/pdf_extraction.md'
- Image Retrieval: 'usecases/image_retrieval.md'
- Video Understanding: 'usecases/video_rag.md'
- Basic RAG: 'usecases/rag.md'
- Audio Extraction: 'usecases/audio_extraction.md'
- PDF Extraction: 'usecases/pdf_extraction.md'
- Image Retrieval: 'usecases/image_retrieval.md'
- Video Understanding: 'usecases/video_rag.md'
- LLM Frameworks:
- Langchain:
- Python: 'integrations/langchain/python_langchain.md'
- TypeScript: 'integrations/langchain/typescript_langchain.md'
- DSPy:
- Python: 'integrations/dspy/python_dspy.md'
- Langchain:
- Python: 'integrations/langchain/python_langchain.md'
- TypeScript: 'integrations/langchain/typescript_langchain.md'
- DSPy:
- Python: 'integrations/dspy/python_dspy.md'
- Deployment and Operation:
- Configuration: 'configuration.md'
- Deployment: 'deployment.md'
- Metrics: 'metrics.md'
- Develop Indexify: 'develop.md'
- Extractors on GPU: 'gpu-deployment.md'
- Kubernetes: 'operations/kubernetes.md'
- Configuration: 'configuration.md'
- Metrics: 'metrics.md'
- Develop Indexify: 'develop.md'
- Extractors on GPU: 'gpu-deployment.md'
- Examples:
- All Examples: 'examples/landing.md'
- All Examples: 'examples/landing.md'

plugins:
- mkdocs-jupyter
@@ -100,9 +99,6 @@ theme:
- content.code.copy
- content.code.annotate




palette:
- scheme: default
toggle:
@@ -128,4 +124,4 @@ extra:
name: Email

extra_css:
- stylesheets/extras.css
- stylesheets/extras.css
23 changes: 12 additions & 11 deletions operations/k8s/README.md
@@ -4,24 +4,25 @@

The resources have been split into separate components:

- [base](base) - this includes the API server and the coordinator.
- [components/postgres](components/postgres) - a simple, ephemeral example of
using postgres for all database operations including the vector store.
- [components/minio](components/minio) - an ephemeral example of using S3 for
blog storage.
- [components/extractors](components/extractors) - extractors are published as
common containers, this component is used by all the extractors, such as
[minilm-l6](components/minilm-l6) to provide extraction.
- [base](kustomize/base) - this includes the API server and the coordinator.
- [components/postgres](kustomize/components/postgres) - a simple, ephemeral
example of using postgres for all database operations including the vector
store.
- [components/minio](kustomize/components/minio) - an ephemeral example of using
S3 for blob storage.
- [components/extractors](kustomize/components/extractors) - extractors are
published as common containers, this component is used by all the extractors,
such as [minilm-l6](kustomize/components/minilm-l6) to provide extraction.

> [!NOTE] The API server comes with an ingress resource by default that exposes
> the api at `/`. Make sure to change this if you'd like it at a different
> location.
To run locally, you can apply the [local](local) setup and then go through the
getting started guide.
To run locally, you can apply the [local](kustomize/local) setup and then go
through the getting started guide.

```bash
kubectl apply -k local
kubectl apply -k kustomize/local
```

## Cluster Standup
3 changes: 2 additions & 1 deletion operations/k8s/kustomize/base/api.yaml
@@ -51,7 +51,8 @@ spec:
image: alpine:3
command: ['/bin/sh', '-c']
args:
- find /fragments -type f | xargs -I{} sh -c "cat {}; echo ''" > /config/config.yaml
- |-
  find /fragments -type f | xargs -I{} sh -c "cat {}; echo ''" > /config/config.yaml
volumeMounts:
- mountPath: /config
25 changes: 25 additions & 0 deletions operations/k8s/kustomize/components/minio/api.yaml
@@ -0,0 +1,25 @@
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  labels:
    app.kubernetes.io/component: api
spec:
  template:
    spec:
      containers:
        - name: indexify
          env:
            - name: AWS_ENDPOINT_URL
              value: http://blob-store:9000
            - name: AWS_ACCESS_KEY_ID
              valueFrom:
                secretKeyRef:
                  name: blob-store
                  key: AWS_ACCESS_KEY_ID
            - name: AWS_SECRET_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  name: blob-store
                  key: AWS_SECRET_ACCESS_KEY
23 changes: 23 additions & 0 deletions operations/k8s/kustomize/components/minio/extractor.yaml
@@ -0,0 +1,23 @@
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: extractor
spec:
  template:
    spec:
      containers:
        - name: extractor
          env:
            - name: AWS_ENDPOINT_URL
              value: http://blob-store:9000
            - name: AWS_ACCESS_KEY_ID
              valueFrom:
                secretKeyRef:
                  name: blob-store
                  key: AWS_ACCESS_KEY_ID
            - name: AWS_SECRET_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  name: blob-store
                  key: AWS_SECRET_ACCESS_KEY
5 changes: 5 additions & 0 deletions operations/k8s/kustomize/components/minio/kustomization.yaml
@@ -10,3 +10,8 @@ resources:

patches:
  - path: config.yaml
  - path: api.yaml
  - target:
      kind: Deployment
      labelSelector: app.kubernetes.io/component=extractor
    path: extractor.yaml
