Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow blocking the queue and retrying messages from the failed message's offset #103

Closed
wants to merge 4 commits into from

Conversation

martinthenth
Copy link

@martinthenth martinthenth commented Jun 10, 2022

Hi there 👋🏻 ! I'd like to implement the option to block the queue and retry failed messages from the failed message's offset. It's the same need as in #30.

Blocking the queue and retrying failed has the advantage that bugs can be detected early and can be fixed on the consumer (and possibly the producer's) side, without losing the order of messages. It has the disadvantage that non-failing messages will be processed many times over. And it can only work with a single BroadwayKafka.Producer per partition. But that's a penalty that I'm willing to take to keep a consistent messages order.

The flow:

  1. BroadwayKafka consumes messages as usual.
  2. One (or more) messages fail to be processed.
  3. The failed messages are marked as failed.
  4. The BroadwayKafka producer retries messages from the earliest failed offset.

It will only work if the Producer commits offsets after the queue in its state has been drained.

This PR implements this flow, and on my local environment I can confirm that the failed message is continuously retried. I added a configuration option called retry_on_failure with the default set to false.

Unfortunately, I'm struggling with the test a bit. It looks like the current implementation of MessageServer can't retry messages and only pushes (but maybe I'm mistaken). Ideally, the messages in the queue are retried until the whole queue is empty, and after 2 confirmed retries we stop the process and the test passes.

Is it possible to add retry logic to ProducerTest without rewriting MessageServer? Feedback is very welcome 🙏🏻

Thanks for writing such great libraries! 🙇🏻

@josevalim
Copy link
Member

Hi @martinthenth! 👋

Thank you for the PR but I am not sure if it should be Broadway's job to do the retry. You can achieve similar by wrapping part of your code in a try/catch. And the best part is that, if you wrap, you can have full control over what to retry, when, which back-off to use and so on.

Thoughts?

@martinthenth
Copy link
Author

Hi!

You're right. I experimented with handle_message/3 a bit, adding message retrying and checking the consumer group offsets. And the offset is not committed until after the message is processed. Which makes sense with the drained? check in the code. It also works with multiple processors 😄

Thanks for helping! 🙇🏻 I'll close this PR as it's not necessary 🙂

@martinthenth martinthenth deleted the dev branch June 11, 2022 21:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants