
Add failure summary option to resultsComparer #1807

Open
wants to merge 2 commits into base: main
Conversation

Contributor

@L2 L2 commented May 13, 2021

  • Prints the aggregate counts and benchmark ID(s) for failures when it detects:
    Base Valid + Diff Null,
    Base Null + Diff Valid,
    Base Null + Diff Null
  • Fixes a bug where, when only failures are found, we still print:
    "No differences found between..."

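The detection described above can be sketched as a simple classification of each (base, diff) result pair. This is a minimal illustration, not the PR's actual C# code; the function name and the representation of a "null" result are assumptions:

```python
def classify_failure(base_result, diff_result):
    """Return a failure category for a (base, diff) result pair, or None if
    both sides are valid. A result is treated as 'null' here when the
    benchmark produced no usable measurements (represented as None)."""
    base_null = base_result is None
    diff_null = diff_result is None
    if base_null and diff_null:
        return "Base Null, Diff Null"
    if base_null:
        return "Base Null, Diff Valid"
    if diff_null:
        return "Base Valid, Diff Null"
    return None  # both valid: proceed with the normal statistical comparison

# Example: base has measurements, diff produced nothing.
print(classify_failure([1.0, 1.1], None))  # prints "Base Valid, Diff Null"
```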
@L2
Contributor Author

L2 commented May 13, 2021

Example output when we compare base and diff for a single benchmark where the diff results are null but the base results are valid:

Before:

No differences found between the benchmark results with threshold...

After:

Found at least 1 failure between the benchmark results.

With the failure summary option enabled, we also get the relevant counts and ids:

After:

Found at least 1 failure between the benchmark results.
Failure Summary:
1x Base Valid, Diff Null Failure(s).
0x Base Null, Diff Valid Failure(s).
0x Base Null, Diff Null Failure(s).
Benchmark ID(s) for: Base Valid, Diff Null Failure(s)
-------
System.Collections.ContainsTrue<Int32>.SortedSet(Size: 512)
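A summary in this shape could be produced by grouping failures by category and printing counts before IDs. This is a hypothetical sketch of the idea, not the PR's implementation; the helper name and input format are assumptions:

```python
from collections import defaultdict

def print_failure_summary(failures):
    """failures: list of (benchmark_id, category) tuples."""
    categories = ["Base Valid, Diff Null",
                  "Base Null, Diff Valid",
                  "Base Null, Diff Null"]
    by_category = defaultdict(list)
    for benchmark_id, category in failures:
        by_category[category].append(benchmark_id)
    # Aggregate counts first, one line per category.
    print("Failure Summary:")
    for category in categories:
        print(f"{len(by_category[category])}x {category} Failure(s).")
    # Then the benchmark IDs for each non-empty category.
    for category in categories:
        if by_category[category]:
            print(f"Benchmark ID(s) for: {category} Failure(s)")
            print("-------")
            for benchmark_id in by_category[category]:
                print(benchmark_id)

print_failure_summary(
    [("System.Collections.ContainsTrue<Int32>.SortedSet(Size: 512)",
      "Base Valid, Diff Null")])
```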

@L2
Contributor Author

L2 commented Jun 8, 2021

Any suggested improvements I can make to this feature? Thanks.

@danmoseley danmoseley requested a review from adamsitnik June 9, 2021 17:44
@adamsitnik
Member

@L2 Could you please say something more about the problem you are trying to solve? Are you running into a situation where some benchmark runs have failed and you don't get an apples-to-apples comparison?

BenchmarkDotNet exposes a possibility to stop on the first error:

--stopOnFirstError true

This is what our CI uses, which is most probably why we never get into such a situation.

@L2
Contributor Author

L2 commented Jun 10, 2021

Thanks @adamsitnik, sorry, I should have added a bit more context earlier on:

  • resultsComparer is an excellent tool, and I think it should be independent of the results it receives and how they were generated. It would be great if it had some mechanism to detect failures.
  • Thanks, I did not know about --stopOnFirstError true. Say I receive a pair of results from someone and want to compare them: without looking at the logs, the JSON data itself, or the BDN options that were used to generate the results, it's hard to know if any benchmarks failed. resultsComparer could have an option that warns the user instead of silently ignoring the invalid results, which could otherwise be interpreted as the results being statistically equivalent (no change).
  • I sometimes use the "*" BDN filter to run everything and would like the run to continue even if there are a few failures. Feeding the results into resultsComparer with the failure summary option would be an easy way for the user to learn about the failures at comparison time.
  • Feeding in a pair of results where A is null but B is valid still produces what I think is an incorrect statement: "No differences found between the benchmark results with threshold...". To me this means that the results are statistically equivalent (no change); however, that is not the case.
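The fix for that misleading message amounts to giving failures precedence when choosing the final summary line. A minimal sketch of that guard, with hypothetical names and message wording taken from the example output above:

```python
def summary_message(difference_count, failure_count, threshold):
    """Pick the final summary line. Sketch of the reported fix: failures
    must take precedence over the 'no differences' message."""
    if failure_count > 0:
        return f"Found at least {failure_count} failure between the benchmark results."
    if difference_count == 0:
        return f"No differences found between the benchmark results with threshold {threshold}."
    return f"Found {difference_count} difference(s) between the benchmark results."

# Before the fix, a run with 0 comparable differences and 1 failure fell
# through to "No differences found...", implying statistical equivalence.
print(summary_message(0, 1, "2%"))
# prints "Found at least 1 failure between the benchmark results."
```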

@danmoseley
Member

@adamsitnik It looks like your questions were answered; do you think we should take this PR?
