sqs: Optimize for concurrency #226

Merged · merged 2 commits on Dec 4, 2020

Conversation

@antoineco (Contributor) commented on Dec 1, 2020

Reopens #225, which was accidentally merged (I reverted it).

Closes #222

@antoineco (Contributor, Author) commented on Dec 1, 2020

Performance on 2 cores with 2 receiver|processor|deleter per thread. No CPU throttling was observed with the current CPU limit of 1.

[screenshot: throughput results]

@antoineco changed the title from "Sqs optimize" to "sqs: Optimize for concurrency" on Dec 1, 2020
@antoineco (Contributor, Author) commented

Here is the result of another load test with dynamic concurrency settings (+1 receiver|processor|deleter every 10s), virtually unlimited buffers (size > total messages in the SQS queue), and TriggerMesh's default CPU limit (500m):

[screenshot: throughput results]

Things become relatively unstable beyond 6 receiver|processor|deleter, which is 3 per thread.

The default limit of 500m is causing a fair amount of CPU throttling:

[screenshot: CPU throttling metrics]

@antoineco (Contributor, Author) commented

The same load test as above, but without a CPU limit. This time the performance remains quite stable up to 12 receiver|processor|deleter (6 per thread), but the figures did not double compared to the previous experiment because CloudEvents are still sent sequentially after messages are received from the queue.

[screenshot: throughput results]

As observed before, the CPU usage remains below 400m:

[screenshot: CPU usage metrics]
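
For illustration, here is a minimal, self-contained Go sketch of the sequential send mentioned above: within each processor, the CloudEvents of a received batch are emitted one after the other, so the sink's round-trip time bounds per-processor throughput. All names (`message`, `sendCloudEvent`, `processBatch`) are hypothetical stand-ins, not the adapter's actual code.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

type message struct{ id string }

// sendCloudEvent stands in for the blocking HTTP delivery of one CloudEvent
// to the event sink (hypothetical helper, not the adapter's real API).
func sendCloudEvent(ctx context.Context, m message) error {
	time.Sleep(10 * time.Millisecond) // simulated network round trip
	return nil
}

// processBatch emits the events of one received batch one after the other,
// so per-processor throughput is bounded by the send latency.
func processBatch(ctx context.Context, batch []message) {
	for _, m := range batch {
		if err := sendCloudEvent(ctx, m); err != nil {
			continue // the message stays in the queue and is retried later
		}
		// In the adapter, the message would then be handed off for deletion.
	}
}

func main() {
	batch := []message{{"1"}, {"2"}, {"3"}}
	start := time.Now()
	processBatch(context.Background(), batch)
	fmt.Println("batch processed in", time.Since(start))
}
```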

@antoineco requested a review from @sebgoa on December 2, 2020
@antoineco (Contributor, Author) commented on Dec 2, 2020

@sebgoa the decision is now about defining the defaults per Pod.

Do we want to

  1. Assume maximum performance per Pod is always a goal and raise the default request+limit to achieve that with 6 message processors per thread? (-> 450 msg/s)

    Trade-off: we won't be able to schedule as many replicas per node, and it might be overkill for more moderate traffic (500 msg/s feels like a lot to me).

  2. Take a more moderate stance and stick to this PR's values with 2 message processors per thread? (-> 150 msg/s)

    Trade-off: scaling to higher rates requires horizontal auto-scaling, which we have to enable in a separate PR (this statement can apply to all cases, actually).

  3. Meet halfway, e.g. raise the default to 3 message processors per thread? (-> 250 msg/s)

    Trade-off: we won't be able to schedule as many replicas per node as in option 2, but still a few more than in option 1.

I would personally vote for option 3, then follow up with auto-scaling (#230), and maybe also #227 to spawn/terminate goroutines dynamically based on the number of messages being received.
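
For the sake of discussion, here is a minimal sketch of how a per-thread default of 3 (option 3) could translate into a per-Pod processor count. The `PROCESSORS_PER_THREAD` variable name and the default value are assumptions for illustration, not the adapter's actual configuration.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"strconv"
)

// processorsPerPod derives the total number of message processors from a
// per-thread factor (hypothetical default of 3, matching option 3 above),
// optionally overridden by the PROCESSORS_PER_THREAD environment variable.
func processorsPerPod() int {
	perThread := 3
	if v, err := strconv.Atoi(os.Getenv("PROCESSORS_PER_THREAD")); err == nil && v > 0 {
		perThread = v
	}
	return perThread * runtime.GOMAXPROCS(0)
}

func main() {
	fmt.Println("message processors for this pod:", processorsPerPod())
}
```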

@antoineco marked this pull request as ready for review on December 2, 2020
Rewrite of the source adapter to spawn concurrent message processors
instead of executing a single loop sequentially.
The number of receivers, senders and deleters is based on the number of
available CPU cores.
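
To make the description above concrete, here is a hedged Go sketch of the fan-out pattern it describes: per-core receiver, sender and deleter goroutines connected by channels. The helper functions, channel sizes and the one-goroutine-per-core factor are assumptions, not the adapter's actual implementation.

```go
package main

import (
	"context"
	"runtime"
	"sync"
	"time"
)

type message struct{ id, body string }

// Hypothetical stand-ins for the SQS receive, CloudEvents send and SQS delete calls.
func receiveBatch(ctx context.Context) []message {
	time.Sleep(50 * time.Millisecond) // placeholder for a long-polling receive
	return nil
}
func sendCloudEvent(ctx context.Context, m message) error { return nil }
func deleteMessage(ctx context.Context, m message) error  { return nil }

func run(ctx context.Context) {
	// One receiver, one sender and one deleter per available CPU thread
	// (the exact factor used by the adapter is an assumption here).
	n := runtime.GOMAXPROCS(0)

	toSend := make(chan message, n)
	toDelete := make(chan message, n)

	var recvWG, sendWG, delWG sync.WaitGroup

	for i := 0; i < n; i++ {
		recvWG.Add(1)
		go func() { // receiver: polls the SQS queue for message batches
			defer recvWG.Done()
			for ctx.Err() == nil {
				for _, m := range receiveBatch(ctx) {
					toSend <- m
				}
			}
		}()

		sendWG.Add(1)
		go func() { // sender: turns each message into a CloudEvent
			defer sendWG.Done()
			for m := range toSend {
				if err := sendCloudEvent(ctx, m); err == nil {
					toDelete <- m
				}
			}
		}()

		delWG.Add(1)
		go func() { // deleter: removes successfully sent messages from the queue
			defer delWG.Done()
			for m := range toDelete {
				_ = deleteMessage(ctx, m)
			}
		}()
	}

	// Drain the pipeline in stages once the receivers stop.
	recvWG.Wait()
	close(toSend)
	sendWG.Wait()
	close(toDelete)
	delWG.Wait()
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	run(ctx)
}
```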