OutlierDetection might require longer time for an ejected host to prove itself #37602
Comments
I think it's by design. If a host is ejected frequently, the eject time for this host becomes relatively longer. But the eject time will never be longer than the max_eject time.
There is no doubt that the eject time for the host should be longer. The question is how long is good enough. Let's do some math. Define variables: maxEjectTimeBackoff = upper_bound(max_eject_time / base_eject_time)
realMaxEjectTimeBackoff is the maximum value of host_monitors_[host]->ejectTimeBackoff(), which is equal to maxEjectTimeBackoff + 1. So there will always be one extra increment that doesn't contribute to the max ejection time. BTW, I wonder why outlier detection doesn't use exponential backoff, which is more natural for continuously failing nodes.
I think it's just that there has been no strong requirement for it, so there is no such implementation.
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.
Title: OutlierDetection might require longer time for an ejected host to prove itself
Description:
envoy/source/common/upstream/outlier_detection_impl.cc
Lines 371 to 375 in f5663c1
As
base_eject_time * monitor->ejectTimeBackoff()
is expected to be smaller than max_eject_time, it follows that maxEjectTimeBackoff should be no greater than upper_bound(max_eject_time / base_eject_time).
Assume max_eject_time = 7 and base_eject_time = 2, so maxEjectTimeBackoff = 4.
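To spell out the arithmetic: writing k for ejectTimeBackoff (a shorthand used only here) and reading upper_bound as the ceiling function, the bound is the smallest k at which base_eject_time * k reaches max_eject_time. This is a worked check of the example values, not something taken from the code:

```latex
\[
k_{\max}
  = \left\lceil \frac{\texttt{max\_eject\_time}}{\texttt{base\_eject\_time}} \right\rceil
  = \left\lceil \frac{7}{2} \right\rceil
  = 4,
\qquad
2 \cdot 3 = 6 \;<\; 7 \;\le\; 8 = 2 \cdot 4 .
\]
```

So any backoff value above 4 can no longer change an ejection time that is capped at max_eject_time.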
However, according to
envoy/source/common/upstream/outlier_detection_impl.cc
Lines 516 to 520 in f5663c1
the maxEjectTimeBackoff could be 5. Since at maxEjectTimeBackoff = 4 the value of
base_eject_time * monitor->ejectTimeBackoff()
is already larger than max_eject_time, I am not sure whether it is by design to continue increasing the ejectTimeBackoff to 5.
A larger maxEjectTimeBackoff requires the host to spend a longer time decreasing the ejectTimeBackoff back to zero.
Change the condition to:
can make the maxEjectTimeBackoff equal to the one calculated from the max_eject_time cap.
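For anyone who wants to check the numbers, here is a minimal standalone sketch that simulates a host being ejected over and over. Both increment conditions in it are assumptions, chosen only because they reproduce the values quoted above (5 and 4); they are not copied from outlier_detection_impl.cc. The names `Condition`, `mayIncrement`, and `maxBackoff` are hypothetical helpers, not Envoy APIs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the two increment conditions discussed in this issue.
// Neither is copied from outlier_detection_impl.cc.
enum class Condition { kAssumedCurrent, kAssumedProposed };

// Returns true if the backoff multiplier may grow on one more ejection.
bool mayIncrement(std::uint32_t backoff, std::uint32_t base, std::uint32_t max,
                  Condition c) {
  switch (c) {
  case Condition::kAssumedCurrent:
    // Assumption: comparing against max + base lets the multiplier overshoot by one.
    return backoff * base < max + base;
  case Condition::kAssumedProposed:
    // Assumption: comparing against max stops at ceil(max / base).
    return backoff * base < max;
  }
  return false;
}

// Simulates a host that keeps getting ejected and returns the largest
// backoff multiplier it can reach under the given condition.
std::uint32_t maxBackoff(std::uint32_t base, std::uint32_t max, Condition c) {
  assert(base > 0);  // a zero base would never terminate the loop below
  std::uint32_t backoff = 0;
  while (mayIncrement(backoff, base, max, c)) {
    ++backoff;
  }
  return backoff;
}
```

With base_eject_time = 2 and max_eject_time = 7, `maxBackoff(2, 7, Condition::kAssumedCurrent)` returns 5 and `maxBackoff(2, 7, Condition::kAssumedProposed)` returns 4. If the multiplier drops by one per healthy interval (also an assumption here), the extra step means one more interval before the host is back at zero.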