Commit
change rps - qps
egor-romanov authored Sep 6, 2023
1 parent eba940b commit 83b55da
Showing 1 changed file with 12 additions and 12 deletions.
apps/www/_blog/2023-07-13-pgvector-performance.mdx
@@ -30,7 +30,7 @@ Our goals in this article are:

1. To show the strengths and limitations of the _current version_ of pgvector.
2. Highlight some improvements that are coming to pgvector.
- 3. Prove to you that it's completely viable for production workloads and give you some tips on using it at scale. We'll show you how to run 1 million Open AI embeddings at ~1800 requests per second with 91% accuracy, or 670 requests per second with 98% accuracy.
+ 3. Prove to you that it's completely viable for production workloads and give you some tips on using it at scale. We'll show you how to run 1 million Open AI embeddings at ~1800 queries per second with 91% accuracy, or 670 queries per second with 98% accuracy.

## Benchmark Methodology

@@ -119,7 +119,7 @@ The resulting figures were significantly different after these changes.
With the changes above and probes set to 10, pgvector was faster and more accurate:

- accuracy@10 of 0.91
- - RPS (requests per second) of 380
+ - QPS (queries per second) of 380

<div>
<img
@@ -139,7 +139,7 @@ With the changes above and probes set to 10, pgvector was faster and more accura
If we increase the probes from 10 to 40, pgvector was not just substantially faster but also boasted almost the same accuracy as Qdrant:

- accuracy@10 of 0.98
- - RPS of 140
+ - QPS of 140

<div>
<img
@@ -156,7 +156,7 @@ If we increase the probes from 10 to 40, pgvector was not just substantially fas

### Scaling the database

- Another key takeaway is that the performance scales predictably with the size of the database. For instance, a 4XL instance achieves accuracy@10 of 0.98 and RPS of 270 with probes set to 40. Moreover, an 8XL compute add-on analogously obtains accuracy@10 of 0.98 and an RPS of 470, surpassing the results of Qdrant.
+ Another key takeaway is that the performance scales predictably with the size of the database. For instance, a 4XL instance achieves accuracy@10 of 0.98 and QPS of 270 with probes set to 40. Moreover, an 8XL compute add-on analogously obtains accuracy@10 of 0.98 and a QPS of 470, surpassing the results of Qdrant.

<div className="bg-gray-300 rounded-lg px-6 py-2 italic">

@@ -177,13 +177,13 @@ The Qdrant benchmark uses “default” configuration and is not indicative o
/>
</div>

- Although more compute is required to match Qdrant's accuracy and RPS levels concurrently, this is still a satisfying outcome. It means that it's not a _necessity_ to use another vector database. You can put everything in Postgres to lower your operational complexity.
+ Although more compute is required to match Qdrant's accuracy and QPS levels concurrently, this is still a satisfying outcome. It means that it's not a _necessity_ to use another vector database. You can put everything in Postgres to lower your operational complexity.

### Final results: pgvector performance

Putting it all together, we find that we can predictably scale our database to match the performance we need.

- With a 64-core, 256 GB server we achieve ~1800 RPS and 0.91 accuracy. This is for pgvector 0.4.0, and we've heard that the latest version (0.4.4) already has significant improvements. We'll release those benchmarks as soon as we have them.
+ With a 64-core, 256 GB server we achieve ~1800 QPS and 0.91 accuracy. This is for pgvector 0.4.0, and we've heard that the latest version (0.4.4) already has significant improvements. We'll release those benchmarks as soon as we have them.

<div>
<img
@@ -208,7 +208,7 @@ Another way to improve performance without throwing more compute would be to inc

We ran a test to measure the impact of list size: we uploaded 90,000 vectors from the Wikipedia dataset and then queried 10,000 vectors from the same dataset. The documentation recommends using a `lists` constant of `number of vectors / 1000`. In this case, it would be 90.
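To make the sizing rule concrete, the arithmetic and the index DDL can be sketched as follows. This is a hedged sketch: the `documents` table and `embedding` column are placeholder names, while the `CREATE INDEX ... USING ivfflat ... WITH (lists = N)` form is pgvector's documented IVFFlat syntax.

```python
def recommended_lists(n_vectors: int, divisor: int = 1000) -> int:
    """`lists` per the pgvector docs' rule of thumb: number of vectors / 1000."""
    return max(n_vectors // divisor, 1)

def ivfflat_index_sql(table: str, column: str, lists: int,
                      opclass: str = "vector_ip_ops") -> str:
    """Build the IVFFlat index DDL for a given `lists` value (names are placeholders)."""
    return (f"CREATE INDEX ON {table} USING ivfflat ({column} {opclass}) "
            f"WITH (lists = {lists});")

print(recommended_lists(90_000))           # 90 lists for the 90k Wikipedia test
print(recommended_lists(1_000_000, 500))   # 2000 lists, the /500 ratio used for OpenAI embeddings
print(ivfflat_index_sql("documents", "embedding", 270))
```

Changing `lists` requires rebuilding the index, so a helper like this belongs in an experiment script rather than in live tuning.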

- But as our experiment shows, we can improve RPS if we increase `lists` (i.e. with more lists in the index we need to get less index data to get the same accuracy). So for 95% accuracy, we can take any of:
+ But as our experiment shows, we can improve QPS if we increase `lists` (i.e. with more lists in the index we need to read less index data to reach the same accuracy). So for 95% accuracy, we can take any of:

- 3% of index data = 270 lists
- 6% of index data = 90 lists
@@ -287,20 +287,20 @@ First, a few generic tips which you can pick and choose from:
2. Prefer `inner-product` to `L2` or `Cosine` distances if your vectors are normalized (like `text-embedding-ada-002`). If embeddings are not normalized, `Cosine` distance should give the best results with an index.
3. **Pre-warm your database.** Implement the warm-up technique we described earlier before transitioning to production.
4. **Establish your workload.** Increasing the lists constant for the pgvector index can accelerate your queries (at the expense of a slower build). For instance, for benchmarks with OpenAI embeddings, we employed a `lists` constant of 2000 (`number of vectors / 500`) as opposed to the suggested 1000 (`number of vectors / 1000`).
- 5. **Benchmark your own specific workloads.** Doing this during cache warm-up helps gauge the best value for the `probes` constant, balancing accuracy with RPS.
+ 5. **Benchmark your own specific workloads.** Doing this during cache warm-up helps gauge the best value for the `probes` constant, balancing accuracy with QPS.
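Tip 2 maps directly onto pgvector's query operators and index opclasses: `<#>` (negative inner product) pairs with `vector_ip_ops`, `<->` (L2) with `vector_l2_ops`, and `<=>` (cosine distance) with `vector_cosine_ops`. A minimal sketch, with placeholder table and column names:

```python
# pgvector query operator and index opclass for each distance metric
DISTANCE_OPS = {
    "inner-product": ("<#>", "vector_ip_ops"),    # best for normalized embeddings
    "l2":            ("<->", "vector_l2_ops"),
    "cosine":        ("<=>", "vector_cosine_ops"),
}

def knn_query_sql(table: str, column: str, metric: str, limit: int = 10) -> str:
    """Nearest-neighbor query; %s is bound to the query vector by the driver."""
    op, _opclass = DISTANCE_OPS[metric]
    return f"SELECT id FROM {table} ORDER BY {column} {op} %s LIMIT {limit};"

print(knn_query_sql("documents", "embedding", "inner-product"))
```

The index must be built with the opclass that matches the query operator; otherwise the planner cannot use the index for that ordering.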

### Going into production

Before running your pgvector workload in production, here are a few steps you can take to maximize performance.

1. Over-provision RAM during preparation. You can scale down in step `5`, but it's better to start with a larger size to get the best results for RAM requirements. (We'd recommend at least 8XL if you're using Supabase.)
2. Upload your data to the database. If you use [`vecs`](https://supabase.com/docs/guides/ai/python/api) library, it will automatically generate an index with default parameters.
- 3. Run a benchmark using randomly generated queries and see the results. Again, you can use `vecs` library with the `ann-benchmarks` tool. Do it with probes set to 10 (default) and then with probes set to 100 or more, so RPS will be lower than 10.
+ 3. Run a benchmark using randomly generated queries and see the results. Again, you can use the `vecs` library with the `ann-benchmarks` tool. Do it with probes set to 10 (default) and then with probes set to 100 or more, so QPS will be lower than 10.
4. Take a look at the RAM usage and note it down. You will likely want a compute add-on in the future with the same amount of RAM as is used at this point (both actual RAM usage and RAM used for cache and buffers).
5. Scale down your compute add-on to the one that would have the same amount of RAM as used at the moment.
- 6. Repeat step 3. to load the data into RAM. You should see that RPS is increased on subsequent runs, and stop when it no longer increases. Then repeat the benchmark with probes set to a higher value as well if you didn't do it before on that compute add-on size.
- 7. Run a benchmark using real queries and see the results. You can use `vecs` library for that as well with `ann-benchmarks` tool. Do it with probes set to 10 (default) and then gradually increase/decrease probes value until you see that both accuracy and RPS match your requirements.
- 8. If you want higher RPS and you don't expect to have frequent inserts and reindexing, you can increase `lists` constantly. You have to rebuild the index with higher lists value and repeat steps 6-7 to find the best combination of `lists` and `probes` constants to achieve the best RPS and accuracy values. Higher `lists` mean that index will build slower, but you can achieve better RPS and accuracy. Higher probes mean that select queries will be slower, but you can achieve better accuracy.
+ 6. Repeat step 3 to load the data into RAM. You should see QPS increase on subsequent runs; stop when it no longer increases. Then repeat the benchmark with probes set to a higher value as well if you didn't do it before on that compute add-on size.
+ 7. Run a benchmark using real queries and see the results. You can use the `vecs` library for that as well with the `ann-benchmarks` tool. Do it with probes set to 10 (default) and then gradually increase/decrease the probes value until both accuracy and QPS match your requirements.
+ 8. If you want higher QPS and you don't expect frequent inserts and reindexing, you can increase the `lists` constant. You have to rebuild the index with a higher `lists` value and repeat steps 6-7 to find the combination of the `lists` and `probes` constants that achieves the best QPS and accuracy values. A higher `lists` value means the index builds more slowly, but you can achieve better QPS and accuracy. Higher probes mean that select queries will be slower, but you can achieve better accuracy.
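The tuning loop in steps 6-7 can be sketched as plain statement generation. The `documents`/`embedding` names are placeholders again; `SET ivfflat.probes = N` is pgvector's documented session setting and `<#>` its inner-product operator.

```python
def probes_sweep(start: int = 10, stop: int = 100, step: int = 10) -> list[int]:
    """Probe values to benchmark, from the default (10) upward."""
    return list(range(start, stop + 1, step))

def sweep_statements(table: str, column: str, probes_values: list[int],
                     limit: int = 10) -> list[str]:
    """One SET + SELECT pair per probes value; run each pair, record accuracy and QPS."""
    stmts = []
    for p in probes_values:
        stmts.append(f"SET ivfflat.probes = {p};")
        stmts.append(f"SELECT id FROM {table} ORDER BY {column} <#> %s LIMIT {limit};")
    return stmts

for s in sweep_statements("documents", "embedding", probes_sweep(10, 40, 10)):
    print(s)
```

Stop the sweep once extra probes cost more QPS than they gain in accuracy; step 8 then repeats it after rebuilding the index with a larger `lists` value.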

## The pgvector roadmap

