Remove automatic retries in case of errors #132
Conversation
Codecov Report
@@ Coverage Diff @@
## master #132 +/- ##
==========================================
+ Coverage 99.08% 99.15% +0.07%
==========================================
Files 30 30
Lines 1639 1662 +23
==========================================
+ Hits 1624 1648 +24
+ Misses 15 14 -1
Continue to review full report at Codecov.
Currently this gem tries to retry automatically against the next (or same) endpoint in case of an error. This has two issues: stack overflow (the retry is implemented via recursion) and not respecting timeouts. The stack overflow is solvable, but the timeout issue is not, given the current architecture. The cleanest solution is to just rotate the endpoints and move the responsibility for retry to the calling application. Fixes davissp14#130. Fixes davissp14#131.
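To make the new division of responsibility concrete, here is a minimal sketch of what caller-side retry could look like once the gem only rotates endpoints and surfaces errors. The helper name, the rescued exception class (GRPC::Unavailable) and the attempt count are illustrative assumptions, not part of this PR.

require 'etcdv3'

# Illustrative only: with automatic retries removed, the calling application
# decides how many attempts to make and how to log failures between them.
def put_with_retries(client, key, value, attempts: 3)
  tries = 0
  begin
    client.put(key, value)
  rescue GRPC::Unavailable => e
    tries += 1
    raise if tries >= attempts
    warn "etcd put failed (#{e.message}), retrying (#{tries}/#{attempts})"
    retry
  end
end

client = Etcdv3.new(endpoints: "http://localhost:2379")
put_with_retries(client, 'foo', 'bar')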
5c3943d to 685f0ac (Compare)
Looks like the Travis build passed (https://travis-ci.org/davissp14/etcdv3-ruby/), but it's not updated here for some reason?
Could we possibly get some feedback on this PR? :)
I'll do my best to take a look this evening. Sorry for the delay!
What's currently in place should align pretty closely with what etcd defines as the command-timeout. The dial-timeout, which I think you are referring to, hasn't been implemented yet; the generic naming of the existing timeout option doesn't help distinguish the two. Aside from the obvious stack-overflow issue, would the implementation of a dial-based timeout address your major concerns?
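For reference, a rough sketch of how the per-command timeout mentioned above is configured today; the command_timeout: option and the 2-second value are my reading of the gem's current initializer, so treat them as assumptions. There is no corresponding dial-timeout option yet.

require 'etcdv3'

# Sketch: command_timeout bounds individual RPCs (roughly etcd's
# command-timeout); it does not bound the time spent dialing a connection.
client = Etcdv3.new(endpoints: "http://localhost:2379", command_timeout: 2)
client.get('foo')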
The problem is that the behaviour we want is for a call to etcd to finish within X seconds. Sure, if X is for example 2, I could give it …, but I'm not sure how to make that work for e.g. the watch API. I mean, I want the watch to take at most 2 seconds and spend as much of that as possible actually watching. I could play around with something like …. How would you recommend implementing this using the two timeouts you propose?

EDIT: Also, based on the gRPC documentation, it seems that …
If you're looking for a strict holistic timeout that includes both the time it takes to dial the connection and the execution time for the command, I would probably just wrap the command in a Timeout block within your code. There are simply too many factors that could affect the time it takes to establish a new connection, and none of these factors are related to …. I hope that makes sense!
I'm looking into the Timeout approach you suggested, but I'm seeing odd results: it takes 11 seconds compared to the expected ~1 second, and there is the …

Compared with using …, that looks much more like what I expected. Could you point me in the right direction as to what I'm doing wrong?
Your results will be off because you're timing the execution of the entire script, rather than just the timeout block.

#!/usr/bin/env ruby
require 'etcdv3'
require 'timeout'
require 'benchmark'
cl = Etcdv3.new(endpoints: "http://localhost:2379")
Benchmark.bm do |x|
# 1 second timeout
x.report do
begin
Timeout::timeout(1) do
cl.watch('foo')
end
rescue Timeout::Error
end
end
# 5 second timeout
x.report do
begin
Timeout::timeout(5) do
cl.watch('foo2')
end
rescue Timeout::Error
end
end
# 10 second timeout
x.report do
begin
Timeout::timeout(10) do
cl.watch('foo3')
end
rescue Timeout::Error
end
end
end

Results: …
Hello again @davissp14,
I believe this is the source of confusion. In our perception, a method which takes a timeout should not run (much) longer than that timeout. (It can usually overrun it a little bit, but that extra overhead should be constant and cannot really be avoided in practice, because you have to return from the function, perhaps do some cleanup, etc.) Providing strict guarantees to the caller is a great feature of code, because then other code which needs to provide strict guarantees can be built on top of it. On the other hand, you cannot call code which isn't dependable from something that ought to be dependable.
Yes, you can achieve something that looks similar using Timeout::timeout. But since the lower layer (gRPC in this case) is ultimately the part which implements the timeout-case behaviour, it has enough information to do proper clean-up of resources after failed connection attempts and unsuccessful commands. On the other hand, with Timeout::timeout the library gets interrupted from the outside and never has a chance to clean up properly. (There are great articles which describe the issues in detail.)

But I have to say that I understand your point too, because the reconnect logic comes in handy a lot of the time when you don't care all that much about having proper time-outs, and having to retry yourself can be a bit of an overkill. Hence I'd propose a compromise: keep the current behaviour and add a flag which disables the automatic retries. If the flag is set, errors would simply propagate to the caller, which can then decide whether and how to retry.

Would you find that acceptable? In any case, thank you very much.
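To make the proposed compromise concrete, here is a hypothetical sketch; the flag name allow_reconnect and the rescued exception class are illustrative assumptions, not a final API.

require 'etcdv3'

# Hypothetical sketch of the opt-out: when automatic retries are disabled,
# timeouts stay strict and errors surface to the caller, which can log,
# rotate endpoints itself, or give up within its own deadline.
client = Etcdv3.new(
  endpoints: "http://localhost:2379",
  allow_reconnect: false   # assumed name for whatever the final flag is called
)

begin
  client.get('foo')
rescue GRPC::BadStatus => e
  warn "etcd call failed: #{e.message}"
end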
@davissp14 Hi Shaun, would you be willing to discuss this further, please? The proposed behaviour in my last comment is IMHO a nice compromise, and we're blocked on this. As usual, thanks for the time and energy you put into maintaining this gem!
@dcepelik Sorry for the delay, life has been a little crazy as of late. The compromise you proposed sounds reasonable to me.
Will implement tomorrow, thank you for greenlighting :)
This reverts commit 685f0ac.
Sometimes auto-retrying calls is not a good idea. There might be user requirements on the precision of timeouts (they don't really work if multiple endpoints are specified) or on how failures are handled (and logged). This commit adds new flags that allow changing the default behaviour so that calls are not auto-retried.
Is it acceptable like this? If yes, could we also ask for a release again? If further changes are required, let me know. @dcepelik, please also have a look.
Overall, this looks pretty good. A few things need to be addressed, though.
I think that should be all? :)
A few more small comments.
LGTM
Thanks for working through this! Nice work! Will get a release up today.
Just a reminder, in case you would be so kind as to do the release for us :) Thank you
Sorry for the delay, just pushed it up. Thanks again!
Currently this gem tries to do automatic retries to the next (or same)
endpoint in case of an error. This has two issues: stack overflow (the
retry is implemented via recursion) and not respecting timeouts. The
stack overflow is solvable; the timeout issue, however, is not, given
the current architecture.
The cleanest solution is to just rotate the endpoints and move the
responsibility for retry to the calling application.
Fixes #130
Fixes #131