docs: add hnsw compute add-ons table
egor-romanov committed Sep 11, 2023
1 parent b5ec8cb commit dda76b7
Showing 2 changed files with 130 additions and 88 deletions.
apps/docs/pages/guides/ai/choosing-compute-addon.mdx: 116 additions & 74 deletions
You have two options for scaling your vector workload:

## Dimensionality

The number of dimensions in your embeddings is the most important factor in choosing the right Compute Add-on. In general, the lower the dimensionality, the better the performance. We've provided guidance for some of the more common embedding dimensions below. For each benchmark, we used [Vecs](https://github.com/supabase/vecs) to create a collection, upload the embeddings to a single table, and create both `IVFFlat` and `HNSW` indexes using the `inner-product` distance measure for the embedding column. We then ran a series of queries to measure the performance of different compute add-ons:
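
Concretely, the setup behind these numbers looks roughly like the following pgvector SQL (a sketch: the `documents` table name and the parameter values are illustrative, and Vecs generates the actual DDL for you):

```sql
create extension if not exists vector;

-- A table of 1536-dimensional embeddings (dimension varies per benchmark).
create table documents (
  id text primary key,
  embedding vector(1536)
);

-- IVFFlat index for inner-product distance; `lists` scales with row count.
create index on documents
  using ivfflat (embedding vector_ip_ops)
  with (lists = 1000);

-- HNSW index for inner-product distance (pgvector 0.5.0+).
create index on documents
  using hnsw (embedding vector_ip_ops)
  with (m = 24, ef_construction = 56);

-- Benchmark queries: top-10 nearest neighbors by inner product.
-- (`<#>` returns the negated inner product, so ascending order is correct.)
select id
from documents
order by embedding <#> (select embedding from documents limit 1)
limit 10;
```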

## HNSW

### 1536 Dimensions

This benchmark uses the [dbpedia-entities-openai-1M](https://huggingface.co/datasets/KShivendu/dbpedia-entities-openai-1M) dataset, which contains 1,000,000 embeddings of text, and 224,482 embeddings from [Wikipedia articles](https://huggingface.co/datasets/Supabase/wikipedia-en-embeddings) for compute add-ons `large` and below. Each embedding is 1536 dimensions, created with the [OpenAI Embeddings API](https://platform.openai.com/docs/guides/embeddings).

<Tabs
scrollable
size="small"
type="underlined"
defaultActiveId="openai1536"
>
<TabPanel id="dbpedia1536" label="OpenAI-1536, probes = 10">
<TabPanel id="openai1536" label="OpenAI-1536">

| Plan | Vectors | m | ef_construction | ef_search | QPS | Latency Mean | Latency p95 | RAM Usage | RAM |
| ------ | --------- | --- | --------------- | --------- | ---- | ------------ | ----------- | ------------------ | ------ |
| Free   | 15,000    | 16  | 40              | 40        | 480  | 0.011 sec    | 0.016 sec   | 1 GB + 200 MB Swap | 1 GB   |
| Small  | 50,000    | 32  | 64              | 100       | 175  | 0.031 sec    | 0.051 sec   | 2 GB + 200 MB Swap | 2 GB   |
| Medium | 100,000 | 32 | 64 | 100 | 240 | 0.083 sec | 0.126 sec | 4 GB | 4 GB |
| Large | 224,482 | 32 | 64 | 100 | 280 | 0.017 sec | 0.028 sec | 8 GB | 8 GB |
| XL | 500,000 | 24 | 56 | 100 | 360 | 0.055 sec | 0.135 sec | 13 GB | 16 GB |
| 2XL | 1,000,000 | 24 | 56 | 250 | 560 | 0.036 sec | 0.058 sec | 32 GB | 32 GB |
| 4XL | 1,000,000 | 24 | 56 | 250 | 950 | 0.021 sec | 0.033 sec | 39 GB | 64 GB |
| 8XL | 1,000,000 | 24 | 56 | 250 | 1650 | 0.016 sec | 0.023 sec | 40 GB | 128 GB |
| 12XL | 1,000,000 | 24 | 56 | 250 | 1900 | 0.015 sec | 0.021 sec | 38 GB | 192 GB |
| 16XL | 1,000,000 | 24 | 56 | 250 | 2200 | 0.015 sec | 0.020 sec | 40 GB | 256 GB |

Accuracy was 0.99 across these benchmarks.

QPS can also be improved by increasing `m` and `ef_construction`, which lets you use a smaller `ef_search` value. For example, increasing `m` to 32 and `ef_construction` to 80 on a 4XL increases QPS to 1280, as sketched below.
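
As a sketch of those two knobs in pgvector SQL (the `documents` table and the exact values are illustrative):

```sql
-- Query-time: a higher ef_search raises accuracy but lowers QPS.
set hnsw.ef_search = 250;

-- Build-time: larger m and ef_construction let a smaller ef_search reach
-- the same accuracy, at the cost of a slower build and more memory.
create index on documents
  using hnsw (embedding vector_ip_ops)
  with (m = 32, ef_construction = 80);
```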

</TabPanel>
</Tabs>

<Admonition type="note">

It is possible to upload more vectors to a single table if memory allows it (for example, a 4XL plan and higher for OpenAI embeddings), but doing so will affect query performance: QPS will be lower and latency will be higher. Scaling should be almost linear, but we recommend benchmarking your workload to find the optimal number of vectors per table and per database instance.

</Admonition>

## IVFFlat

### 512 Dimensions

This benchmark uses the [GloVe Reddit comments](https://nlp.stanford.edu/projects/glove/) dataset. Each embedding is 512 dimensions.

<Tabs
scrollable
size="small"
type="underlined"
defaultActiveId="glove512"
>
<TabPanel id="glove512" label="GloVe-512, probes = 10">

| Plan | Vectors | Lists | QPS | Latency Mean | Latency p95 | RAM Usage | RAM |
| ------ | --------- | ----- | ---- | ------------ | ----------- | ------------------ | ------ |
| Free   | 100,000   | 100   | 250  | 0.395 sec    | 0.432 sec   | 1 GB + 300 MB Swap | 1 GB   |
| Small  | 250,000   | 250   | 440  | 0.223 sec    | 0.250 sec   | 2 GB + 200 MB Swap | 2 GB   |
</TabPanel>
<TabPanel id="glove512_60" label="GloVe-512, probes = 60">

| Plan | Vectors | Lists | QPS | Latency Mean | Latency p95 | RAM Usage | RAM |
| ------ | --------- | ----- | --- | ------------ | ----------- | --------- | ------ |
| Free | 100,000 | 100 | - | - | - | - | 1 GB |
| Small | 250,000 | 250 | - | - | - | - | 2 GB |
</TabPanel>
</Tabs>

### 960 Dimensions

This benchmark uses the [gist-960-angular](http://corpus-texmex.irisa.fr/) dataset, which contains 1,000,000 embeddings of images. Each embedding is 960 dimensions.

<Tabs
scrollable
size="small"
type="underlined"
defaultActiveId="gist960"
>
<TabPanel id="gist960" label="gist-960, probes = 10">

| Plan | Vectors | Lists | QPS | Latency Mean | Latency p95 | RAM Usage | RAM |
| ------ | --------- | ----- | ---- | ------------ | ----------- | ------------------ | ------ |
| Free   | 30,000    | 30    | 75   | 0.065 sec    | 0.088 sec   | 1 GB + 100 MB Swap | 1 GB   |
| Small | 100,000 | 100 | 78 | 0.064 sec | 0.092 sec | 1.8 GB | 2 GB |
| Medium | 250,000 | 250 | 58 | 0.085 sec | 0.129 sec | 3.2 GB | 4 GB |
| Large | 500,000 | 500 | 55 | 0.088 sec | 0.140 sec | 5 GB | 8 GB |
| XL | 1,000,000 | 1000 | 110 | 0.046 sec | 0.070 sec | 14 GB | 16 GB |
| 2XL | 1,000,000 | 1000 | 235 | 0.083 sec | 0.136 sec | 10 GB | 32 GB |
| 4XL | 1,000,000 | 1000 | 420 | 0.071 sec | 0.106 sec | 11 GB | 64 GB |
| 8XL | 1,000,000 | 1000 | 815 | 0.072 sec | 0.106 sec | 13 GB | 128 GB |
| 12XL | 1,000,000 | 1000 | 1150 | 0.052 sec | 0.078 sec | 15.5 GB | 192 GB |
| 16XL | 1,000,000 | 1000 | 1345 | 0.072 sec | 0.106 sec | 17.5 GB | 256 GB |

</TabPanel>
</Tabs>

### 1536 Dimensions

This benchmark uses the [dbpedia-entities-openai-1M](https://huggingface.co/datasets/KShivendu/dbpedia-entities-openai-1M) dataset, which contains 1,000,000 embeddings of text. Each embedding is 1536 dimensions created with the [OpenAI Embeddings API](https://platform.openai.com/docs/guides/embeddings).

<Tabs
scrollable
size="small"
type="underlined"
defaultActiveId="dbpedia1536"
>
<TabPanel id="dbpedia1536" label="OpenAI-1536, probes = 10">

| Plan | Vectors | Lists | QPS | Latency Mean | Latency p95 | RAM Usage | RAM |
| ------ | --------- | ----- | ---- | ------------ | ----------- | ------------------ | ------ |
| Free   | 20,000    | 40    | 135  | 0.372 sec    | 0.412 sec   | 1 GB + 200 MB Swap | 1 GB   |
| Small | 50,000 | 100 | 140 | 0.357 sec | 0.398 sec | 1.8 GB | 2 GB |
| Medium | 100,000 | 200 | 130 | 0.383 sec | 0.446 sec | 3.7 GB | 4 GB |
| Large | 250,000 | 500 | 130 | 0.378 sec | 0.434 sec | 7 GB | 8 GB |
| XL | 500,000 | 1000 | 235 | 0.213 sec | 0.271 sec | 13.5 GB | 16 GB |
| 2XL | 1,000,000 | 2000 | 380 | 0.133 sec | 0.236 sec | 30 GB | 32 GB |
| 4XL | 1,000,000 | 2000 | 720 | 0.068 sec | 0.120 sec | 35 GB | 64 GB |
| 8XL | 1,000,000 | 2000 | 1250 | 0.039 sec | 0.066 sec | 38 GB | 128 GB |
| 12XL | 1,000,000 | 2000 | 1600 | 0.030 sec | 0.052 sec | 41 GB | 192 GB |
| 16XL | 1,000,000 | 2000 | 1790 | 0.029 sec | 0.051 sec | 45 GB | 256 GB |

For 1,000,000 vectors, 10 probes results in an accuracy of 0.91; for 500,000 vectors and below, 10 probes results in an accuracy in the range of 0.95 - 0.99. To increase accuracy, increase the number of probes, as sketched below.
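
In pgvector SQL this is a single session setting (a sketch; 40 is the value used in the next tab):

```sql
-- More probes = more IVFFlat lists scanned per query:
-- higher accuracy, lower QPS.
set ivfflat.probes = 40;
```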

</TabPanel>
<TabPanel id="dbpedia1536_40" label="OpenAI-1536, probes = 40">

| Plan | Vectors | Lists | QPS | Latency Mean | Latency p95 | RAM Usage | RAM |
| ------ | --------- | ----- | --- | ------------ | ----------- | --------- | ------ |
| Free | 20,000 | 40 | - | - | - | - | 1 GB |
| Small | 50,000 | 100 | - | - | - | - | 2 GB |
| Medium | 100,000 | 200 | - | - | - | - | 4 GB |
| Large | 250,000 | 500 | - | - | - | - | 8 GB |
| XL | 500,000 | 1000 | - | - | - | - | 16 GB |
| 2XL | 1,000,000 | 2000 | 140 | 0.358 sec | 0.575 sec | 30 GB | 32 GB |
| 4XL | 1,000,000 | 2000 | 270 | 0.186 sec | 0.304 sec | 35 GB | 64 GB |
| 8XL | 1,000,000 | 2000 | 470 | 0.104 sec | 0.166 sec | 38 GB | 128 GB |
| 12XL | 1,000,000 | 2000 | 600 | 0.085 sec | 0.132 sec | 41 GB | 192 GB |
| 16XL | 1,000,000 | 2000 | 670 | 0.081 sec | 0.129 sec | 45 GB | 256 GB |

For 1,000,000 vectors, 40 probes results in an accuracy of 0.98. Note that exact values may vary depending on the dataset and queries; we recommend running benchmarks with your own data to get precise results. Use this table as a reference.

</TabPanel>
</Tabs>

<div>
<img
alt="multi database"
className="dark:hidden"
src="/docs/img/ai/going-prod/size-to-rps--light.png"
/>
<img
alt="multi database"
className="hidden dark:block"
src="/docs/img/ai/going-prod/size-to-rps--dark.png"
/>
</div>

<Admonition type="note">

It is possible to upload more vectors to a single table if memory allows it (for example, a 4XL plan and higher for OpenAI embeddings), but doing so will affect query performance: QPS will be lower and latency will be higher. Scaling should be almost linear, but we recommend benchmarking your workload to find the optimal number of vectors per table and per database instance.

</Admonition>
