-
Notifications
You must be signed in to change notification settings - Fork 7.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ZOOKEEPER-3286: xid wrap-around causes connection loss/segfault when hitting predefined XIDs #831
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
awesome.
Can we add a minimal test case ?
Regarding an automated test case: In the C client the XID counter is stored in a static variable within the scope of the get_xid function in st_adapter.c/mt_adapter.c, which makes writing a test case a little harder than for the Java case, because you can't set the initial value for the XID counter from outside that function. In order to be able to write a test case similar to what we have for Java would require moving the static XID counter to the library's global scope and making it accessible/settable from outside the library, which we probably don't want for regular use cases of the library. However, if that's acceptable I can modify the patch and cook up a unit test... |
The code looks good to me. As for the unit test, making the counter global strikes me as overkill too but I'll let someone who represents the project make the actual decision. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please confirm the starting value of xid
in the multi-threaded case.
|
||
// The XID returned should not be negative to avoid collisions | ||
// with reserved XIDs, such as AUTH_XID or SET_WATCHES_XID. | ||
return fetch_and_add(&xid,1) & ~(1<<31); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will return 0
when the overflow happens at the first time, so when xid == INT_MIN
, right?
Which is not consistent with the single threaded case where we restart xid from 1 (similarly to the java client).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are correct. -- The first revision of the patch had behavior (and code) that was symmetric between ST/MT. -- I looked at the code (and ran some tests) and there is nothing special about XID being 0 in the C client.
I think at this point we have two options:
-
reset XID to 0 instead of 1 in the ST version of the code. This would give you symmetric behavior (even tho the code no longer looks symmetric).
-
If there is some opposition against ever having 0 returned, I can modify the MT code such that if you were about to return 0, you do another fetch and add. Something like:
result = 0;
while (result == 0) {
result = fetch_and_add(&xid,1) & ~(1<<31);
}
return result;
Please LMK what your preference is. IMHO none of the two solutions proposed has a correctness issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there's anything wrong with using 0
as xid
, but I would stick to consistent behaviour between clients anyway.
So, I vote for option 2., but let's see what other people think.
I just saw ZOOKEEPER-3253, which seems to fix the issue for the Java client.
This fixes the issue for the C client, by avoiding returning negative XIDs.