Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run tests only if connection with server is made. #1

Draft
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

suvenrj
Copy link
Owner

@suvenrj suvenrj commented Mar 12, 2023

Flaky Test

com.alibaba.wasp.jdbc.TestJdbcStatement

Why is it flaky?

The following call TEST_UTIL.startMiniCluster(3) on line 57 passes the time-out time on occasions when the Apache ZooKeeper server fails to start. (This might happen due to reasons not under our control)
Note: I ran this test several times with NonDex and it doesn't always give this error.

Non Dex log before making the changes:

Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 200.986 sec <<< FAILURE!

com.alibaba.wasp.jdbc.TestJdbcStatement Time elapsed: -1,573,282.773 sec <<< ERROR!

java.io.IOException: Waiting for startup of standalone server

at org.apache.hadoop.hbase.zookeeper.MiniZooKeeperCluster.startup(MiniZooKeeperCluster.java:180)

at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniZKCluster(HBaseTestingUtility.java:518)

at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniZKCluster(HBaseTestingUtility.java:507)

at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:623)

at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:575)

at com.alibaba.wasp.WaspTestingUtility.startMiniCluster(WaspTestingUtility.java:172)

at com.alibaba.wasp.WaspTestingUtility.startMiniCluster(WaspTestingUtility.java:144)

at com.alibaba.wasp.jdbc.TestJdbcStatement.beforeClass(TestJdbcStatement.java:57)

com.alibaba.wasp.jdbc.TestJdbcStatement Time elapsed: -1,573,282.773 sec <<< ERROR!

java.lang.NullPointerException: null

at com.alibaba.wasp.jdbc.TestJdbcStatement.afterClass(TestJdbcStatement.java:64)

What change have I made?

As the server didn't start, the connection with it couldn't be established and the object 'conn' (59) continued to have a 'null' value. This is the reason why we got a null pointer exception when the program reached 'conn.close()' (64) in the 'afterclass' function. I have added the if-else checks to ensure that the tests run only when the connection with server could be established.

@suvenrj suvenrj marked this pull request as draft March 12, 2023 11:00
Copy link

@winglam winglam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please improve your PR to (1) make the least amount of changes as possible (see the "Files changed" tab) -- any changes that were not necessary to remove the flakiness should NOT appear in this tab) and (2) update the description of your PR to more closely match the following. LALAYANG/hbase#1

@suvenrj
Copy link
Owner Author

suvenrj commented Mar 18, 2023

Please improve your PR to (1) make the least amount of changes as possible (see the "Files changed" tab) -- any changes that were not necessary to remove the flakiness should NOT appear in this tab) and (2) update the description of your PR to more closely match the following. LALAYANG/hbase#1

Made the changes as you said.

Copy link

@winglam winglam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How many times did you run this test in NonDex to observe it to fail without your changes?
How many times did you run this test in NonDex to observe it to not fail with your changes?

Does this test fail when you run it without NonDex for the same number of times?

}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove unnecessary changes. You may have added an empty line at the end of this file when there wasn't one.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know why, but the GitHub editor is weirdly adding a line automatically at the end of the file. Even if I remove the line, it appears again. It appears in the 'Files changed' page but doesn't appear when I just view the commit.

}

@Test
public void testStatement() throws SQLException, IOException,
InterruptedException {
if (conn!=(Connection)null){
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of this if statement, you may use the following instead.
https://junit.org/junit4/javadoc/4.12/org/junit/Assume.html#assumeNotNull(java.lang.Object...)

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made the change.

@suvenrj
Copy link
Owner Author

suvenrj commented Mar 19, 2023

How many times did you run this test in NonDex to observe it to fail without your changes?

It gives an error once in about 100 times.

How many times did you run this test in NonDex to observe it to not fail with your changes?

The change I have made doesn't fix the cause of the error i.e Hadoop ZookeeperCluster failing to start. So, it would still give an error once in about 100 times.

Does this test fail when you run it without NonDex for the same number of times?

No.

@@ -29,6 +29,7 @@
import org.junit.AfterClass;
import org.junit.BeforeClass;
import org.junit.Test;
import org.junit.Assume;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please keep imports sorted.

@@ -61,6 +62,7 @@ public static void beforeClass() throws Exception {

@AfterClass
public static void afterClass() throws Exception {
Assume.assumeNotNull(conn);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should you skip the test result if you cannot close the connection?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I have made a mistake here. I should have maybe preserved the if-else check here so that the conn.close() line isn't executed if conn is null. 'Assume' isn't probably the right thing to use here.

@winglam
Copy link

winglam commented Mar 20, 2023

You need a bigger sample size than 100 if it fails once out of 100.

Can you please run the test and report back your results from
(1) 1000 times without your changes and with NonDex?
(2) 1000 times with your changes and with NonDex?

(1) should show the test failing 10 or so times, while (2) should show the test failing 0 times (instead it should just be skipped/ignored).

@suvenrj
Copy link
Owner Author

suvenrj commented Mar 20, 2023

You need a bigger sample size than 100 if it fails once out of 100.

Can you please run the test and report back your results from (1) 1000 times without your changes and with NonDex? (2) 1000 times with your changes and with NonDex?

(1) should show the test failing 10 or so times, while (2) should show the test failing 0 times (instead it should just be skipped/ignored).

Okay, will do that.

@suvenrj
Copy link
Owner Author

suvenrj commented Mar 25, 2023

(1) 1000 times without your changes and with NonDex?

I ran the test 1000 times and it failed once. I used the following bash script for it:

for d in */ ; do cd $d if [[ -s failures ]] then echo "$d" fi cd - done

(2) 1000 times with your changes and with NonDex?

Even with my changes, NonDex categorises the failure as an Error (not skipped/ignored). I think this is because the error props up in the @Beforeclass function. So, even though the test is skipped, that piece of code is executed, hence NonDex reports it.

@winglam
Copy link

winglam commented Mar 27, 2023

The bash script that you wrote does not seem to be running NonDex.

When you run the test with your changes and NonDex, does the Surefire XML reports say that the test failed or that the test was skipped?

@suvenrj
Copy link
Owner Author

suvenrj commented Mar 28, 2023

The bash script that you wrote does not seem to be running NonDex.

The bash script is meant for counting number of failures among the 1000 runs. For running NonDeX, I used:
mvn -pl . edu.illinois:nondex-maven-plugin:2.1.1:nondex -Dtest=com.alibaba.wasp.jdbc.TestJdbcStatement -DnondexRuns=1000

When you run the test with your changes and NonDex, does the Surefire XML reports say that the test failed or that the test was skipped?

Neither. It categorises it as an 'Error'.

@winglam
Copy link

winglam commented Mar 28, 2023

Thanks for the clarifications. Please let me know when you have found a fix that makes the test pass when it is run with NonDex.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants