Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Undo update_suite ignoring precision difference for UNKNOWN #286

Merged
merged 3 commits into from
Jul 20, 2021

Conversation

sim642
Copy link
Member

@sim642 sim642 commented Jul 14, 2021

In #196 among many other changes we merged this innocent-looking change: 0cb2c9c. This PR reverts that.

During #278 I looked over all the UNKNOWN asserts to see if making assert refine the state would cause problems. I saw cases where problems should have appeared but didn't due to the reverted change, which completely silenced all the cases where we still have an UNKNOWN annotation but suddenly output success instead. I only realized this when I used regtest (which awfully does similar expected comparisons within Goblint...) and saw different output.

Because we haven't completely gone through the entire regression test suite to recategorize each UNKNOWN to either UNKNOWN! or TODO, then we still have cases where UNKNOWN is used with the meaning that it must be unknown. Therefore the script should report if we're suddenly unsoundly reporting success or fail. See #110.

@vogler
Copy link
Collaborator

vogler commented Jul 14, 2021

I only realized this when I used regtest (which awfully does similar expected comparisons within Goblint...) and saw different output.

I agree with 'awful' concerning duplication, but IMO it'd be better for goblint to handle these things instead of some ruby script - ideally we'd get rid of it in favor of an OCaml executable or some minimal shell script (parallel) with goblint doing most of the work.

@michael-schwarz
Copy link
Member

michael-schwarz commented Jul 14, 2021

@jerhard and I were tossing this back and forth over lunch and something we discussed at some point was using something along the lines of __goblint_check(a == 42, __GOBLINT_UNKNOWN) or __goblint_check(x ==17, __GOBLINT_SUCCESS) in the regression tests instead of having an assert(...) //UNKNOWN.

Then, the expected result is immediately obvious to Goblint itself, and Goblint can either output things for all asserts or only those where something failed without having to somehow do a regex on the comments.

Also, once could use the syntactic search to identify those places where __goblint_check(...) is called that are unreachable and then also warn there that the assert is unreachable. (These are the cases where warnings are missing in ./regtest.sh vs the ruby script).

The question that is still open is how to integrate warnings different from asserts into this setting.

Opinions?

@vogler
Copy link
Collaborator

vogler commented Jul 14, 2021

Also, once could use the syntactic search to identify those places where __goblint_check(...) is called that are unreachable and then also warn there that the assert is unreachable. (These are the cases where warnings are missing in ./regtest.sh vs the ruby script).

Good point to keep in mind.
This (together with the rest?) could be done as post-processing in finalize of the assert-analysis.

using something along the lines of __goblint_check(a == 42, __GOBLINT_UNKNOWN) or __goblint_check(x ==17, __GOBLINT_SUCCESS) in ehe regression tests instead of having an assert(...) //UNKNOWN.

Pro:

  • type-safe (constant instead of comment)
  • available in CIL instead of having to read the comment from source

Con:

  • can't compile and execute it anymore without including another file?
  • assert is easily understood and portable (assert.h is available on every system)

@sim642
Copy link
Member Author

sim642 commented Jul 14, 2021

Tbh, I don't really like the idea of this regression testing check (dbg.regression for regtest) being inside Goblint itself at all. It's ideologically wrong to have the testing logic right in the middle of Goblint (which would be both the tester and the testee simultaneously) instead of being independent. And having to write __goblint_check(x == 17, __GOBLINT_SUCCESS) is so much more verbose than a standard assert(x == 17).

But that's all beside the point for this issue of update_suite not reporting possible unsoundness cases. If we want to discuss completely redoing the regression testing architecture, it's best to have another issue for that. And that won't solve the fact that we still have ambiguous uses of UNKNOWN.

@vogler
Copy link
Collaborator

vogler commented Jul 14, 2021

And that won't solve the fact that we still have ambiguous uses of UNKNOWN.

#288 (comment)
If we did something like this, we'd at least see those diffs.

$ grep -E 'UNKNOWN$' -r tests/regression | wc -l
313

All these need to be replaced with UNKNOWN! xor TODO?

@sim642
Copy link
Member Author

sim642 commented Jul 14, 2021

All these need to be replaced with UNKNOWN! xor TODO?

This is the idea of #110, yes. Although we also have some cases like this:

g = 100;
// This is only unknown because exp.earlyglobs is on
assert(g == 100); //UNKNOWN!

It's not UNKNOWN! because in the concrete semantics, it is known. It's not TODO because it shouldn't be fixed, because it's the whole point of earlyglobs. So probably one needs a third UNKNOWN!-like category for things which should remain unknown but only by our own choice of imprecision, not by concrete semantics.

@sim642
Copy link
Member Author

sim642 commented Jul 20, 2021

This revealed an issue with a congruences test now: #260 (comment).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants