Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZOOKEEPER-1364: Add orthogonal fault injection mechanism/framework #123

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

ZOOKEEPER-1364: Add orthogonal fault injection mechanism/framework #123

wants to merge 1 commit into from

Conversation

eribeiro
Copy link
Contributor

Add Byteman (http://byteman.jboss.org/) as a dependency to ZooKeeper project to enable the dynamic injection of failures.

@eribeiro eribeiro changed the title Add orthogonal fault injection mechanism/framework ZOOKEEPER-1364: Add orthogonal fault injection mechanism/framework Dec 13, 2016
@eribeiro
Copy link
Contributor Author

@yufeldman I really like your ZOOKEEPER-2549 patch, but I think it would be an awesome opportunity of introducing a much needed failure injection framework into ZooKeeper. So, I wrote this patch with your changes as example to see what is your opinion and if we could backport those changes to your PR or at least discuss the introduction of Byteman. IMHO, it would save a lot of changes due by ZK-2549, but I am really open to hear your opinion and concerns. Cheers! 😃

@yufeldman
Copy link

@eribeiro I am not sure that critical bug is the place to introduce injection framework to Zookeeper. Can't it be done as a separate JIRA?

@eribeiro
Copy link
Contributor Author

@yufeldman Sure, totally agree about this being a separate issue! 👍

My only concern is that IF we decide to introduce this injection framework into ZK after merging ZOOKEEPER-2549 then some of the mocking and reflection changes introduced by ZOOKEEPER-2549 for the purposes of testing become redundant, so basically we would have to undo parts of it (but not the bug fix, of course). See this PR for a concrete idea of what I am trying to say.

But you raised a very important point: ZOOKEEPER-2549 is critical, and delaying its merging is gonna be bad for the project release, IMHO. So, I am all for you keep going your work (congrats, btw) and run this as a future issue. Or whatever the committers find more suitable. /cc @fpj @rgs1 @hanm @phunt

@hanm
Copy link
Contributor

hanm commented Dec 15, 2016

Good initiative, @eribeiro! This will be a valuable addition to the project. Please keep driving this forward.

Regarding dependency between ZOOKEEPER-2549 and ZOOKEEPER-1364, the mock stuff ZOOKEEPER-2549 introduced is localized so it shouldn't take a lot effort to replace that with what to be done in ZOOKEEPER-1364. Since ZOOKEEPER-2549 is in relative good shape to be merged in my opinion here is to get ZOOKEEPER-2549 in first then start iterating on ZOOKEEPER-1364.

@anmolnar
Copy link
Contributor

@eribeiro This is still a very important initiative to improve ZooKeeper testing. Are u still interested in working on this?

@maoling
Copy link
Member

maoling commented Dec 27, 2019

ping @eribeiro @yufeldman
Are you free to refresh this patch? If you don't reply me within ten days, I will pick it up. Of course, the new patch's commit message will also be signed by your name.

@yufeldman
Copy link

yufeldman commented Dec 27, 2019 via email

@asf-ci
Copy link

asf-ci commented Dec 27, 2019

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build-maven/1764/

@eribeiro
Copy link
Contributor Author

@maoling @yufeldman Sure, no problem you picking up this issue, maoling. :) This is a pretty nice opportunity to see this merged to ZK codebase. Please, count on me to any help/discussion about it, btw. Cheers!

@maoling
Copy link
Member

maoling commented Jan 14, 2020

@eribeiro @yufeldman I take a close look at this issue today, I guess this issue had been fixed

  • org.apache.zookeeper.server.ServerCnxn#serializeRecord it now throws an IOException
  • To recheck this issue, I inject a fault into that code:
    RULE trace zk_nio_exception_injects_faults
    CLASS org.apache.zookeeper.server.ServerCnxn
    METHOD serializeRecord
    IF true
    DO
      traceln("*** zk nio exception injects faults***");
      throw new IOException()

don't have a hang phenomenon, the client got the following:

Closing socket connection. Attempting reconnect except it is a SessionExpiredException.
org.apache.zookeeper.ClientCnxn$SessionTimeoutException: Client session timed out, have not heard from server in 20002ms for session id 0x1000222974b0001
  at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1230)
KeeperErrorCode = ConnectionLoss for /01-14-7
WATCHER::

@anmolnar
Copy link
Contributor

@maoling Are u working on this then?
Can I close this PR?

asfgit pushed a commit that referenced this pull request May 29, 2021
…ZooKeeper

- **Byteman** is a powerful tool to inject faults during runtime, especially for distributed system
- In the future, we can also introduce it into our unit test infra, just as hadoop, cassandra did. [PR-123](#123) is a good starting.
- more details in the [ZOOKEEPER-3601](https://issues.apache.org/jira/browse/ZOOKEEPER-3601)

Author: maoling <[email protected]>

Reviewers: Andor Molnar <[email protected]>

Closes #1135 from maoling/ZOOKEEPER-3601
anuragmadnawat1 pushed a commit to anuragmadnawat1/zookeeper that referenced this pull request Nov 2, 2022
…ZooKeeper

- **Byteman** is a powerful tool to inject faults during runtime, especially for distributed system
- In the future, we can also introduce it into our unit test infra, just as hadoop, cassandra did. [PR-123](apache#123) is a good starting.
- more details in the [ZOOKEEPER-3601](https://issues.apache.org/jira/browse/ZOOKEEPER-3601)

Author: maoling <[email protected]>

Reviewers: Andor Molnar <[email protected]>

Closes apache#1135 from maoling/ZOOKEEPER-3601
anuragmadnawat1 added a commit to anuragmadnawat1/zookeeper that referenced this pull request Nov 2, 2022
…ZooKeeper (#150)

- **Byteman** is a powerful tool to inject faults during runtime, especially for distributed system
- In the future, we can also introduce it into our unit test infra, just as hadoop, cassandra did. [PR-123](apache#123) is a good starting.
- more details in the [ZOOKEEPER-3601](https://issues.apache.org/jira/browse/ZOOKEEPER-3601)

Author: maoling <[email protected]>

Reviewers: Andor Molnar <[email protected]>

Closes apache#1135 from maoling/ZOOKEEPER-3601

Co-authored-by: maoling <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants