regression2sv-benchmarks improvements #383

sim642 · 2021-10-11T07:46:40Z

This issue is to document possible improvements and fixes to the regression2sv-benchmarks script. Not intended to be done for SV-COMP 2022 any more.

Support multi-property benchmarks.

Technically this is already possible to an extent (data race and assert properties might end up in the same benchmark), but the validity of these combinations is questionable. For example, benchmarks containing data races wouldn't be valid for unreach-call at all.
Also, our testing language doesn't allow specifying expected output for assert and race on the same line, which makes it difficult to properly label such multi-property benchmarks.
Support successful asserts with irrelevant comments (i.e. not indicative of any other result).

We have some asserts, where the comment is just a human-readable explanation, but this currently makes it unacceptable for the script. This was to ensure that we don't accidentally submit untrue assertions.
Extra checks have to be put in place to consider comments that aren't failing or unknown or ignored to be handled as succeeding.
Support asserts with NOWARN.

Sometimes we do this to indicate unreachable code (i.e. Goblint shouldn't say anything about the assertion success/unknown/fail). These can be converted to reach_error() with expected unreach-call verdict being true (given compatibility with other asserts).
Support always true asserts.

Sometimes we use always true asserts (e.g. assert(1)) to indicate that the location should be reachable. As an SV-COMP benchmark, it would always be correct since it never fails, but such benchmark doesn't actually require the verifier to know it's reachable.
Instead these should be converted to reach_error() with expected unreach-call verdict being false (given compatibility with other asserts).
Check reachability of all normal asserts.

Generalizing the previous point about always true asserts, in our testing language we actually require all asserts to be (may-)reachable. Therefore it would be possible to generate a special reachability variant for each assert individually, by converting it to reach_error() with expected unreach-call verdict being false.
This requires that all of our asserts are actually reachable and aren't simply bogus checks on unreachable code (which should be explicitly NOWARN).
Change from 32bit to 64bit architecture?

Currently all benchmarks are generated to have 32bit architecture, but Goblint doesn't actually care and analyzes everything with the architecture where it was compiled on (make sizes of primitive types configurable with current machine as default #54). Since we all probably use 64bit machines and some singlethreaded benchmarks do checks specific to 64bit integer ranges, 64bit would probably be more appropriate.
Generate all singlethreaded benchmarks.

Currently the script is only restricted to multithreaded benchmarks. This is purely for logistical reasons:
- We have much more singlethreaded tests and they'd have to be carefully considered for sv-benchmarks submission.
- In sv-benchmarks the directory c/goblint-regression is part of ConcurrencySafety, so only multithreaded benchmarks should go in there. To submit singlethreaded benchmarks we would have to split the directories in sv-benchmarks first.

The text was updated successfully, but these errors were encountered:

sim642 added feature sv-comp SV-COMP (analyses, results), witnesses testing labels Oct 11, 2021

sim642 mentioned this issue Dec 16, 2021

Integrate regression2sv-benchmarks.py goblint/GobExec#9

Open

sim642 added student-job benchmarking labels Sep 28, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

regression2sv-benchmarks improvements #383

regression2sv-benchmarks improvements #383

sim642 commented Oct 11, 2021

regression2sv-benchmarks improvements #383

regression2sv-benchmarks improvements #383

Comments

sim642 commented Oct 11, 2021