ZOOKEEPER-4021: Solve Poll timeout failure caused by POLLNVAL #1680
base: branch-3.7
Hi @qwedsazzcc,
Thank you for your investigation and contribution. I understand what you are reporting, and imagine that your patch effectively makes the 100% CPU usage situation disappear—but unless I am missing something, it does not address the root cause.
Unfortunately, I have been unable to reproduce the specific sequence you are observing.
According to POSIX, POLLNVAL means "file descriptor not open," and while I could imagine some code closing the FD which is being polled, it is difficult to imagine how it would happen more than once and/or lead to a busy loop.
In your experience, is the problem deterministic? Is it "easy" to reproduce? Also: would you have log entries or some other kind of trace which might help figure out the exact conditions?
As for the error condition(s), which should indeed be handled, how about the following:
--- a/zookeeper-client/zookeeper-client-c/src/mt_adaptor.c
+++ b/zookeeper-client/zookeeper-client-c/src/mt_adaptor.c
@@ -388,7 +388,7 @@ void *do_io(void *v)
timeout=tv.tv_sec * 1000 + (tv.tv_usec/1000);
poll(fds,maxfd,timeout);
- if (fd != -1) {
+ if (fd != -1 && (fds[1].revents&POLLNVAL) == 0) {
interest=(fds[1].revents&POLLIN)?ZOOKEEPER_READ:0;
interest|=((fds[1].revents&POLLOUT)||(fds[1].revents&POLLHUP))?ZOOKEEPER_WRITE:0;
}
Not resetting interest should lead to check_events accessing the socket and noticing the closed state. Or does such a patch still cause 100% CPU usage?
Cheers, -D
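For illustration, the guard proposed in the diff above can be sketched as a small standalone helper. This is only a sketch, not code from the ZooKeeper tree: the name revents_to_interest is hypothetical, and SKETCH_READ and SKETCH_WRITE are stand-ins for ZOOKEEPER_READ and ZOOKEEPER_WRITE so that the snippet compiles without zookeeper.h.

```c
#include <poll.h>

/* Hypothetical stand-ins for ZOOKEEPER_READ / ZOOKEEPER_WRITE,
 * defined here only so the sketch compiles on its own. */
enum { SKETCH_READ = 1 << 0, SKETCH_WRITE = 1 << 1 };

/* Translate poll() revents into an interest mask.
 * Returns 1 and overwrites *interest when revents is usable.
 * Returns 0 and leaves *interest untouched when POLLNVAL is set,
 * mirroring the "do not reset interest" idea from the diff. */
int revents_to_interest(short revents, int *interest)
{
    if (revents & POLLNVAL)
        return 0;

    *interest = (revents & POLLIN) ? SKETCH_READ : 0;
    *interest |= (revents & (POLLOUT | POLLHUP)) ? SKETCH_WRITE : 0;
    return 1;
}
```

With this shape, a caller keeps whatever interest it had before the poll whenever the descriptor is reported as invalid, and the later processing step is the one that touches the socket and sees the closed state.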
The error log in the console is
The high CPU usage is caused by getaddrinfo being called too frequently. To reproduce: I run a ZooKeeper server locally and add a DNS rule to the hosts file (127.0.0.1 zookeeper), and my process connects to ZooKeeper at zookeeper:2181. I then remove the rule from the hosts file and immediately stop the ZooKeeper server. This may reproduce the problem. I used this
It didn't solve my problem. Thanks for your reply.
The ZooKeeper C client reaches the server through a DNS name. When the ZooKeeper server and the DNS server go down at the same time, poll in the C client sets revents to POLLNVAL. In that state the timeout parameter of poll has no effect: poll returns immediately instead of waiting, which causes very frequent requests and high CPU usage.
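The effect described here can be observed outside ZooKeeper with a minimal sketch. It assumes a POSIX system; the function name poll_closed_fd is hypothetical. The sketch polls a descriptor that has already been closed and reports both the return value of poll and the resulting revents. A return value of 0 would mean the timeout expired; a positive value means poll returned because of a reported event, here POLLNVAL, without waiting for the timeout.

```c
#include <poll.h>
#include <unistd.h>

/* Poll a descriptor that has already been closed.
 * Returns poll()'s return value (or -1 if the pipe could not be created)
 * and stores the reported revents in *revents_out. */
int poll_closed_fd(int timeout_ms, short *revents_out)
{
    int pipefd[2];
    if (pipe(pipefd) != 0)
        return -1;

    /* Close both ends so pipefd[0] now names a descriptor that is not open. */
    close(pipefd[0]);
    close(pipefd[1]);

    struct pollfd pfd = { .fd = pipefd[0], .events = POLLIN, .revents = 0 };
    int rc = poll(&pfd, 1, timeout_ms);

    *revents_out = pfd.revents;
    return rc;
}
```

A loop that calls poll like this and then goes straight back to polling never sleeps, whatever timeout it passes, which matches the busy loop reported above.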