Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZOOKEEPER-3286: xid wrap-around causes connection loss/segfault when hitting predefined XIDs #831

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

ccma14
Copy link

@ccma14 ccma14 commented Feb 22, 2019

I just saw ZOOKEEPER-3253, which seems to fix the issue for the Java client.
This fixes the issue for the C client, by avoiding returning negative XIDs.

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome.
Can we add a minimal test case ?

@ccma14
Copy link
Author

ccma14 commented Feb 22, 2019

Regarding an automated test case:

In the C client the XID counter is stored in a static variable within the scope of the get_xid function in st_adapter.c/mt_adapter.c, which makes writing a test case a little harder than for the Java case, because you can't set the initial value for the XID counter from outside that function.

In order to be able to write a test case similar to what we have for Java would require moving the static XID counter to the library's global scope and making it accessible/settable from outside the library, which we probably don't want for regular use cases of the library.

However, if that's acceptable I can modify the patch and cook up a unit test...

@enixon
Copy link

enixon commented Mar 6, 2019

The code looks good to me.

As for the unit test, making the counter global strikes me as overkill too but I'll let someone who represents the project make the actual decision.

Copy link
Contributor

@anmolnar anmolnar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please confirm the starting value of xid in the multi-threaded case.


// The XID returned should not be negative to avoid collisions
// with reserved XIDs, such as AUTH_XID or SET_WATCHES_XID.
return fetch_and_add(&xid,1) & ~(1<<31);
Copy link
Contributor

@anmolnar anmolnar Mar 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will return 0 when the overflow happens at the first time, so when xid == INT_MIN, right?
Which is not consistent with the single threaded case where we restart xid from 1 (similarly to the java client).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct. -- The first revision of the patch had behavior (and code) that was symmetric between ST/MT. -- I looked at the code (and ran some tests) and there is nothing special about XID being 0 in the C client.

I think at this point we have two options:

  1. reset XID to 0 instead of 1 in the ST version of the code. This would give you symmetric behavior (even tho the code no longer looks symmetric).

  2. If there is some opposition against ever having 0 returned, I can modify the MT code such that if you were about to return 0, you do another fetch and add. Something like:

    result = 0;
    while (result == 0) {
    result = fetch_and_add(&xid,1) & ~(1<<31);
    }
    return result;

Please LMK what your preference is. IMHO none of the two solutions proposed has a correctness issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's anything wrong with using 0 as xid, but I would stick to consistent behaviour between clients anyway.
So, I vote for option 2., but let's see what other people think.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants