docs: update going to prod with hnsw
egor-romanov committed Sep 12, 2023
1 parent dda76b7 commit 758d674
Showing 7 changed files with 53 additions and 10 deletions.
63 changes: 53 additions & 10 deletions apps/docs/pages/guides/ai/going-to-prod.mdx
@@ -24,9 +24,51 @@ There are a couple of cases where you might not need indexes:

You don't have to create indexes in these cases and can use sequential scans instead. This type of workload is not RAM-bound and does not require additional resources, but it results in higher latency and lower throughput. Extra CPU cores may improve queries per second, but they will not reduce latency.

On the other hand, if you need to scale your application, you will need to [create indexes](https://supabase.com/docs/guides/ai/vector-indexes). This results in lower latency and higher throughput, but requires additional RAM so that Postgres can keep the index in its cache. Indexes also lower accuracy, since you are replacing exact (KNN) search with approximate (ANN) search.
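A sequential scan returns exact (KNN) results because it compares the query with every stored vector. The minimal pure-Python sketch below shows that brute-force search using L2 distance; the function names are hypothetical and only illustrate why the cost grows linearly with the number of rows while accuracy stays at 100%.

```python
import math
from typing import List, Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance, the metric behind pgvector's `<->` operator."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_knn(vectors: List[Sequence[float]], query: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k nearest vectors by scanning every row.

    Like a sequential scan, every vector is compared with the query, so the
    work grows with the number of rows, but the result is exact.
    """
    scored = [(l2_distance(v, query), i) for i, v in enumerate(vectors)]
    scored.sort()  # ties are broken by row index
    return [i for _, i in scored[:k]]


if __name__ == "__main__":
    data = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
    print(exact_knn(data, [0.9, 0.1], k=2))  # [1, 0]
```

An index replaces this full scan with a search over a small part of the data, which is where both the speedup and the loss of accuracy come from.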

## HNSW vs IVFFlat indexes

`pgvector` supports two types of indexes: HNSW and IVFFlat. We recommend using [HNSW](https://supabase.com/docs/guides/ai/vector-indexes/hnsw-indexes) because of its [performance](https://supabase.com/blog/increase-performance-pgvector-hnsw#hnsw-performance-1536-dimensions) and [robustness against changing data](https://supabase.com/docs/guides/ai/vector-indexes).

<div>
<img
alt="dbpedia embeddings comparing ivfflat and hnsw queries-per-second using the 4XL compute addon (light)"
className="dark:hidden"
src="/docs/img/ai/going-prod/dbpedia-ivfflat-vs-hnsw-4xl--light.png"
/>
<img
alt="dbpedia embeddings comparing ivfflat and hnsw queries-per-second using the 4XL compute addon (dark)"
className="hidden dark:block"
src="/docs/img/ai/going-prod/dbpedia-ivfflat-vs-hnsw-4xl--dark.png"
/>
</div>

## HNSW, understanding `ef_construction`, `ef_search`, and `m`

Index build parameters:

- `m` is the number of bi-directional links created for every new element during construction. A higher `m` suits datasets with high dimensionality and/or high accuracy requirements. Reasonable values for `m` are between 2 and 100; a range of 12 to 48 is a good starting point for most use cases (16 is the default value).

- `ef_construction` is the size of the dynamic list for the nearest neighbors (used during the construction algorithm). A higher `ef_construction` results in better index quality and higher accuracy, but it also increases the time required to build the index. `ef_construction` has to be at least 2 \* `m` (64 is the default value). Beyond some point, increasing `ef_construction` no longer improves the quality of the index. You can measure accuracy with `ef_search` = `ef_construction`: if accuracy is lower than 0.9, there is room for improvement.
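The constraints above can be checked before issuing a `CREATE INDEX` statement. This is a minimal sketch: the helper name `hnsw_index_sql` and the table and column names `items` and `embedding` are hypothetical, while the `using hnsw (... opclass) with (m = ..., ef_construction = ...)` syntax and the defaults follow pgvector 0.5.0.

```python
def hnsw_index_sql(
    table: str,
    column: str,
    opclass: str = "vector_cosine_ops",
    m: int = 16,
    ef_construction: int = 64,
) -> str:
    """Build a pgvector HNSW CREATE INDEX statement after checking parameters.

    Defaults match pgvector's defaults (m = 16, ef_construction = 64).
    Identifiers are interpolated directly, so only pass trusted, hard-coded names.
    """
    if not 2 <= m <= 100:
        raise ValueError("m should be between 2 and 100")
    if ef_construction < 2 * m:
        raise ValueError("ef_construction must be at least 2 * m")
    return (
        f"create index on {table} using hnsw ({column} {opclass}) "
        f"with (m = {m}, ef_construction = {ef_construction});"
    )


if __name__ == "__main__":
    print(hnsw_index_sql("items", "embedding", m=32, ef_construction=80))
```

Rejecting invalid combinations early saves a failed (and potentially long) index build on a large table.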

Search parameters:

- `ef_search` is the size of the dynamic list for the nearest neighbors (used during the search). Increasing `ef_search` will result in better accuracy, but it will also increase the time required to execute a query (40 is the default value).
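Accuracy is commonly measured as recall: the fraction of the true nearest neighbors (from an exact, sequential-scan query) that the index query also returns. In pgvector, `ef_search` is set per session, for example `SET hnsw.ef_search = 100;`. The minimal sketch below assumes you have already collected both result lists as row IDs for each test query; the helper names are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple


def recall_at_k(exact: Sequence[int], approx: Sequence[int], k: int) -> float:
    """Fraction of the exact top-k IDs that also appear in the approximate top-k."""
    truth = set(exact[:k])
    if not truth:
        raise ValueError("exact results must not be empty")
    return len(truth & set(approx[:k])) / len(truth)


def mean_recall(pairs: Iterable[Tuple[Sequence[int], Sequence[int]]], k: int) -> float:
    """Average recall@k over (exact_ids, approx_ids) pairs, one pair per query."""
    scores: List[float] = [recall_at_k(e, a, k) for e, a in pairs]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print(recall_at_k([1, 2, 3, 4], [1, 2, 5, 4], k=4))  # 0.75
```

Running this with `ef_search` equal to `ef_construction` gives the number to compare against the 0.9 threshold mentioned above.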

<div>
<img
alt="dbpedia embeddings comparing hnsw queries-per-second using different build parameters (light)"
className="dark:hidden"
src="/docs/img/ai/going-prod/dbpedia-hnsw-build-parameters--light.png"
/>
<img
alt="dbpedia embeddings comparing hnsw queries-per-second using different build parameters (dark)"
className="hidden dark:block"
src="/docs/img/ai/going-prod/dbpedia-hnsw-build-parameters--dark.png"
/>
</div>

## IVFFlat, understanding `probes` and `lists`

IVFFlat indexes, used for approximate vector similarity search in pgvector, divide a dataset into partitions. The number of these partitions is defined by the `lists` constant. The `probes` parameter controls how many lists are searched during a query.
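The toy inverted-file index below illustrates how `lists` and `probes` interact. It is a sketch, not pgvector's implementation: as a simplification, centroids are a random sample of the data rather than the k-means centroids a real IVFFlat index trains, and all names are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence

Vector = Sequence[float]


def l2(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class ToyIVF:
    """Toy IVF index: `lists` partitions, searched `probes` at a time."""

    def __init__(self, vectors: List[Vector], lists: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.vectors = vectors
        # Simplification: sample centroids instead of running k-means.
        self.centroids = rng.sample(vectors, lists)
        # Each list (partition) holds the IDs of vectors closest to its centroid.
        self.partitions: Dict[int, List[int]] = {c: [] for c in range(lists)}
        for i, v in enumerate(vectors):
            nearest = min(range(lists), key=lambda c: l2(v, self.centroids[c]))
            self.partitions[nearest].append(i)

    def search(self, query: Vector, k: int, probes: int) -> List[int]:
        # Visit only the `probes` lists whose centroids are closest to the query.
        ranked = sorted(range(len(self.centroids)), key=lambda c: l2(query, self.centroids[c]))
        candidates = [i for c in ranked[:probes] for i in self.partitions[c]]
        # Exact comparison inside the visited lists; ties broken by ID.
        candidates.sort(key=lambda i: (l2(query, self.vectors[i]), i))
        return candidates[:k]


if __name__ == "__main__":
    rng = random.Random(42)
    data = [[rng.random(), rng.random()] for _ in range(200)]
    index = ToyIVF(data, lists=10)
    print(index.search([0.5, 0.5], k=5, probes=1))
    print(index.search([0.5, 0.5], k=5, probes=10))  # same as exact search
```

With `probes` equal to `lists`, every partition is visited and the result matches exact search; lowering `probes` scans fewer rows and trades accuracy for speed. In pgvector the equivalent knobs are `with (lists = ...)` at build time and `SET ivfflat.probes = ...` per session; the pgvector README suggests `rows / 1000` lists (up to 1M rows) and `sqrt(lists)` probes as starting points.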

@@ -58,29 +100,30 @@ First, a few generic tips which you can pick and choose from:
1. The Supabase managed platform will automatically optimize Postgres configs for you based on your compute addon. But if you self-host, consider **adjusting your Postgres config** based on RAM & CPU cores. See [example optimizations](https://gist.github.com/egor-romanov/323e2847851bbd758081511785573c08) for more details.
2. Prefer `inner-product` to `L2` or `Cosine` distances if your vectors are normalized (like `text-embedding-ada-002`). If embeddings are not normalized, `Cosine` distance should give the best results with an index.
3. **Pre-warm your database.** Implement the warm-up technique before transitioning to production or running benchmarks.
   - Use [pg_prewarm](https://www.postgresql.org/docs/current/pgprewarm.html) to load the index into RAM, for example `select pg_prewarm('vecs.docs_vec_idx');`. This helps avoid cold-cache issues.
   - Execute 10,000 to 50,000 "warm-up" queries before each benchmark or production rollout. This helps Postgres use its cache and buffers more efficiently.
4. **Establish your workload.** Fine-tune the `m` and `ef_construction` (HNSW) or `lists` (IVFFlat) constants of the pgvector index to accelerate your queries, at the expense of slower build times. For instance, in benchmarks with 1,000,000 OpenAI embeddings, we set `m` and `ef_construction` to 32 and 80, which resulted in 35% higher QPS than values of 24 and 56, respectively.
5. **Benchmark your own specific workloads.** Doing this during cache warm-up helps gauge the best value for the index build parameters, balancing accuracy with queries per second (QPS).
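Tip 2 can be checked on a sample of your embeddings before you build the index. For unit-length vectors, inner product and cosine similarity are equal, so the cheaper inner-product operator produces the same ranking. A minimal sketch; `pick_operator_class` is a hypothetical helper, while `vector_ip_ops` with `<#>` (negative inner product) and `vector_cosine_ops` with `<=>` (cosine distance) are pgvector's operator classes and operators.

```python
import math
from typing import Sequence


def is_normalized(vector: Sequence[float], tol: float = 1e-6) -> bool:
    """True if the vector has unit L2 norm, within a floating-point tolerance."""
    return abs(math.sqrt(sum(x * x for x in vector)) - 1.0) <= tol


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return inner_product(a, b) / (norm_a * norm_b)


def pick_operator_class(sample: Sequence[Sequence[float]]) -> str:
    """Suggest a pgvector operator class based on whether the sample is normalized."""
    if not sample:
        raise ValueError("sample must contain at least one vector")
    if all(is_normalized(v) for v in sample):
        return "vector_ip_ops"  # query with <#>
    return "vector_cosine_ops"  # query with <=>
```

Checking a sample is a heuristic: if your embedding model does not guarantee normalized output, prefer the cosine operator class.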

## Going into production

1. Decide if you are going to use indexes or not. You can skip the rest of this guide if you do not use indexes.
2. Over-provision RAM during preparation. You can scale down in step 6, but it's better to start with a larger size so you can accurately measure your RAM requirements. (We'd recommend at least 8XL if you're using Supabase.)
3. Upload your data to the database. If you use the [`vecs`](/docs/guides/ai/python/api) library, it will automatically generate an index with default parameters.
4. Run a benchmark using randomly generated queries and observe the results. Again, you can use the `vecs` library with the `ann-benchmarks` tool. Start with the default index build parameters; you can adjust them later to get the best results.
5. Monitor RAM usage and note it down. You will likely want a compute add-on with the same amount of RAM that is in use at this point (counting both actual RAM usage and RAM used for cache and buffers).
6. Scale down to the compute add-on that has the same amount of RAM as you recorded in step 5.
7. Repeat the benchmark from step 4 to load the data into RAM. You should see QPS increase on subsequent runs; stop when it no longer increases.
8. Run a benchmark using real queries and observe the results. You can use the `vecs` library with the `ann-benchmarks` tool for this as well. Tweak `ef_search` for HNSW or `probes` for IVFFlat until both accuracy and QPS meet your requirements.
9. If you want higher QPS, you can increase `m` and `ef_construction` for HNSW, or `lists` for IVFFlat (consider switching from IVFFlat to HNSW). You have to rebuild the index with the higher values and repeat steps 6-7 to find the combination of `m`, `ef_construction`, and `ef_search` that gives the best QPS and accuracy. Higher `m` and `ef_construction` values make the index build slower, but can give better QPS and accuracy. Higher `ef_search` values make select queries slower, but can give better accuracy.
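The tuning loop in steps 8 and 9 comes down to measuring recall and QPS for several `ef_search` values (set per session with `SET hnsw.ef_search = ...`) and picking the fastest setting that still meets your targets. A minimal sketch of that selection step, assuming you already collected the measurements with a tool such as `ann-benchmarks`; the function name and data layout are hypothetical, and the numbers in the example are made up purely for illustration.

```python
from typing import Dict, Optional, Tuple

# Hypothetical benchmark output: ef_search -> (mean recall, queries per second).
Measurements = Dict[int, Tuple[float, float]]


def pick_ef_search(results: Measurements, min_recall: float, min_qps: float) -> Optional[int]:
    """Return the ef_search value with the highest QPS that meets both targets.

    Returns None when no measured value qualifies, which suggests rebuilding
    with higher m / ef_construction or moving to a larger compute add-on.
    """
    eligible = [
        (qps, ef)
        for ef, (recall, qps) in results.items()
        if recall >= min_recall and qps >= min_qps
    ]
    if not eligible:
        return None
    return max(eligible)[1]


if __name__ == "__main__":
    # Made-up placeholder numbers, not real benchmark results.
    results = {40: (0.91, 1500.0), 80: (0.95, 1100.0), 160: (0.98, 700.0)}
    print(pick_ef_search(results, min_recall=0.95, min_qps=1000.0))  # 80
```

Because `ef_search` only affects queries, this sweep needs no index rebuild; changing `m` or `ef_construction` does.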

## Useful links

Don't forget to check out the general [Production Checklist](/docs/guides/platform/going-into-prod) to ensure your project is secure, performant, and will remain available for your users.

You can look at our [Choosing Compute Add-on](/docs/guides/ai/choosing-compute-addon) guide to get a basic understanding of how much compute you might need for your workload.

Or take a look at our [pgvector 0.5.0 performance](https://supabase.com/blog/increase-performance-pgvector-hnsw) and [pgvector 0.4.0 performance](https://supabase.com/blog/pgvector-performance) blog posts to see what pgvector is capable of and how the above technique can be used to achieve the best results.

<div>
<img
Binary file modified apps/docs/public/img/ai/going-prod/size-to-rps--dark.png
Binary file modified apps/docs/public/img/ai/going-prod/size-to-rps--light.png