
Add failure summary option to resultsComparer #1807

Open
wants to merge 2 commits into base: main
Conversation

Contributor

@L2 L2 commented May 13, 2021

  • Prints the aggregate counts and benchmark ID(s) for failures when it detects:
    Base Valid + Diff Null,
    Base Null + Diff Valid,
    Base Null + Diff Null
  • Fixes a bug where, when only failures are found, we still print:
    "No differences found between..."

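The detection described above can be sketched as a simple classification of each (base, diff) result pair. This is a minimal illustration, not the PR's actual C# code; the function name and the representation of a "null" result are assumptions:

```python
def classify_failure(base_result, diff_result):
    """Return a failure category for a (base, diff) result pair, or None if
    both sides are valid. A result is treated as 'null' here when the
    benchmark produced no usable measurements (represented as None)."""
    base_null = base_result is None
    diff_null = diff_result is None
    if base_null and diff_null:
        return "Base Null, Diff Null"
    if base_null:
        return "Base Null, Diff Valid"
    if diff_null:
        return "Base Valid, Diff Null"
    return None  # both valid: proceed with the normal statistical comparison

# Example: base has measurements, diff produced nothing.
print(classify_failure([1.0, 1.1], None))  # prints "Base Valid, Diff Null"
```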
@L2
Contributor Author

L2 commented May 13, 2021

Example output when we compare base and diff for a single benchmark where the diff results are null but the base results are valid:

Before:

No differences found between the benchmark results with threshold...

After:

Found at least 1 failure between the benchmark results.

With the failure summary option enabled, we also get the relevant counts and ids:

After:

Found at least 1 failure between the benchmark results.
Failure Summary:
1x Base Valid, Diff Null Failure(s).
0x Base Null, Diff Valid Failure(s).
0x Base Null, Diff Null Failure(s).
Benchmark ID(s) for: Base Valid, Diff Null Failure(s)
-------
System.Collections.ContainsTrue<Int32>.SortedSet(Size: 512)
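A summary in this shape could be produced by grouping failures by category and printing counts before IDs. This is a hypothetical sketch of the idea, not the PR's implementation; the helper name and input format are assumptions:

```python
from collections import defaultdict

def print_failure_summary(failures):
    """failures: list of (benchmark_id, category) tuples."""
    categories = ["Base Valid, Diff Null",
                  "Base Null, Diff Valid",
                  "Base Null, Diff Null"]
    by_category = defaultdict(list)
    for benchmark_id, category in failures:
        by_category[category].append(benchmark_id)
    # Aggregate counts first, one line per category.
    print("Failure Summary:")
    for category in categories:
        print(f"{len(by_category[category])}x {category} Failure(s).")
    # Then the benchmark IDs for each non-empty category.
    for category in categories:
        if by_category[category]:
            print(f"Benchmark ID(s) for: {category} Failure(s)")
            print("-------")
            for benchmark_id in by_category[category]:
                print(benchmark_id)

print_failure_summary(
    [("System.Collections.ContainsTrue<Int32>.SortedSet(Size: 512)",
      "Base Valid, Diff Null")])
```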

@L2
Contributor Author

L2 commented Jun 8, 2021

Any suggested improvements I can make to this feature? Thanks.

@danmoseley danmoseley requested a review from adamsitnik June 9, 2021 17:44
@adamsitnik
Member

@L2 Could you please say something more about the problem you are trying to solve? Are you running into a situation where some benchmark runs have failed and you don't get an apples-to-apples comparison?

BenchmarkDotNet exposes a possibility to stop on the first error:

--stopOnFirstError true

This is what our CI uses, which is most probably why we never get into such a situation.

@L2
Contributor Author

L2 commented Jun 10, 2021

Thanks @adamsitnik, sorry, I should have added a bit more context earlier on:

  • resultsComparer is an excellent tool, and I think it should be independent of the results it receives and how they were generated. It would be great if it had some mechanism to detect failures.
  • Thanks, I did not know about --stopOnFirstError true. Say I receive a pair of results from someone and want to compare them: without looking at the logs, the JSON data itself, or the BDN options that were used to generate the results, it's hard to know if any benchmarks failed. resultsComparer could have an option that warns the user instead of silently ignoring the invalid results, which could otherwise be interpreted as the results being statistically equivalent (no change).
  • I sometimes use the "*" BDN filter to run everything and would like the run to continue even if there are a few failures. Feeding the results into resultsComparer with the failure summary option would be an easy way for the user to learn about the failures at comparison time.
  • Feeding in a pair of results where A is null but B is valid still produces what I think is an incorrect statement: "No differences found between the benchmark results with threshold...". To me this means that the results are statistically equivalent (no change); however, that is not the case.
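The fix for that misleading message amounts to giving failures precedence when choosing the final summary line. A minimal sketch of that guard, with hypothetical names and message wording taken from the example output above:

```python
def summary_message(difference_count, failure_count, threshold):
    """Pick the final summary line. Sketch of the reported fix: failures
    must take precedence over the 'no differences' message."""
    if failure_count > 0:
        return f"Found at least {failure_count} failure between the benchmark results."
    if difference_count == 0:
        return f"No differences found between the benchmark results with threshold {threshold}."
    return f"Found {difference_count} difference(s) between the benchmark results."

# Before the fix, a run with 0 comparable differences and 1 failure fell
# through to "No differences found...", implying statistical equivalence.
print(summary_message(0, 1, "2%"))
# prints "Found at least 1 failure between the benchmark results."
```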

@danmoseley
Member

@adamsitnik It looks like your questions were answered; do you think we should take this PR?
