
Add assertion benchmarks #165

Merged

Conversation

@cloud8421 (Contributor) commented Jan 7, 2025

Closes #164

Add the first set of benchmarks for assertions.

  • Setup benchee
  • Wire with testing application
  • Write first example benchmark
  • Benchmark: standard LV string-comparison assertions
  • Benchmark: standard LV element-based assertions
  • Benchmark: PhoenixTest based assertions
  • Benchmark: PhoenixTest based assertions using within

CI will be addressed separately in another PR.

The environment is also set to test so that the testing webapp is available. A formatter change is included, as the file lives in a new bench folder.

There are 3 examples (plain assert, tag, id+tag). Each example is run with a separate session to avoid any implicit optimization deriving from reuse of the same session. Note that this cannot be avoided for optimizations that kick in when calling assert_has/2-3 on the same session multiple times.
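For illustration, the benchmark body is shaped roughly like this; a sketch only, where the route, selectors, and page text are assumed placeholders rather than the exact values in bench/assertions.exs:

```elixir
# Sketch of the benchmark's shape. The route ("/live/index"), selectors,
# and page text are illustrative assumptions.
import Phoenix.ConnTest, only: [build_conn: 0]

Benchee.run(
  %{
    "PhoenixTest.assert_has/2" => fn session ->
      PhoenixTest.assert_has(session, "#title")
    end,
    "PhoenixTest.assert_has/3, tag selector" => fn session ->
      PhoenixTest.assert_has(session, "h1", text: "Main page")
    end
  },
  # A fresh session per invocation, so no implicit optimization can derive
  # from reusing the same session across assert_has/2-3 calls.
  before_each: fn _input -> PhoenixTest.visit(build_conn(), "/live/index") end
)
```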
@cloud8421 (Contributor, Author) commented Jan 8, 2025

EDIT: this was resolved.

@germsvel not sure if you came across this issue.

Using the standard LV test infra outside of test modules requires some ceremony, due to the need to set up the @endpoint attribute (see https://github.com/germsvel/phoenix_test/pull/165/files#diff-3e1e203eb1762597a06f32a45c52f756ae5c8e42b20bfd4f2d955436e9a92393R11).
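For reference, the ceremony looks roughly like this (a sketch; PhoenixTest.Endpoint and the route are assumed names, the real ones live in the testing application):

```elixir
defmodule PhoenixTestBenchmark do
  import Phoenix.ConnTest
  import Phoenix.LiveViewTest

  # Outside a generated ConnCase, the LiveView test helpers read the
  # endpoint from this module attribute, so it must be set by hand.
  # PhoenixTest.Endpoint is an assumed name for the test app's endpoint.
  @endpoint PhoenixTest.Endpoint

  def lv_setup_fn(_context) do
    conn = build_conn()
    # "/live/index" is an illustrative LiveView route.
    {:ok, view, html} = live(conn, "/live/index")
    {view, html}
  end
end
```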

When running this benchmark with MIX_ENV=test mix run bench/assertions.exs, it errors out while executing lv_setup_fn/1 at https://github.com/germsvel/phoenix_test/pull/165/files#diff-3e1e203eb1762597a06f32a45c52f756ae5c8e42b20bfd4f2d955436e9a92393R59 with the following error:

10:21:26.581 [error] Task #PID<0.339.0> started from #PID<0.94.0> terminating
** (MatchError) no match of right hand side value: {:error, :nosession}
    bench/assertions.exs:61: PhoenixTestBenchmark.lv_setup_fn/1
    (benchee 1.3.1) lib/benchee/benchmark/runner.ex:100: Benchee.Benchmark.Runner.measure_scenario/2
    (elixir 1.18.1) lib/task/supervised.ex:101: Task.Supervised.invoke_mfa/2
    (elixir 1.18.1) lib/task/supervised.ex:36: Task.Supervised.reply/4
Function: #Function<2.121299024/0 in Benchee.Utility.Parallel.map/2>
    Args: []
** (EXIT from #PID<0.94.0>) an exception was raised:
    ** (MatchError) no match of right hand side value: {:error, :nosession}
        bench/assertions.exs:61: PhoenixTestBenchmark.lv_setup_fn/1
        (benchee 1.3.1) lib/benchee/benchmark/runner.ex:100: Benchee.Benchmark.Runner.measure_scenario/2
        (elixir 1.18.1) lib/task/supervised.ex:101: Task.Supervised.invoke_mfa/2
        (elixir 1.18.1) lib/task/supervised.ex:36: Task.Supervised.reply/4

The offending line is https://github.com/germsvel/phoenix_test/pull/165/files#diff-3e1e203eb1762597a06f32a45c52f756ae5c8e42b20bfd4f2d955436e9a92393R61:

{:ok, view, html} = live(conn, "/page/index")

I suspect I'm missing some setup code, but looking at a standard LV application I can't see any. The error seems to stem from https://github.com/phoenixframework/phoenix_live_view/blob/38be0a11eb8f509ac6995c5557d782327371820c/lib/phoenix_live_view/test/live_view_test.ex#L343, so unless you have any suggestions on how to fix this, I'll read the source from there and try to see what's going wrong.

@cloud8421 (Contributor, Author)
Never mind - it's just that I was using a static page path; I misread the router. Nothing to see here...
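In other words, live/2 only mounts LiveView routes, while a static page goes through the plain controller flow. A hedged sketch (assuming the imports and @endpoint setup from the snippet above; both paths are illustrative):

```elixir
conn = build_conn()

# A LiveView route mounts and yields {:ok, view, html}; pointing live/2
# at a static route is what produced the {:error, :nosession} above.
{:ok, _view, _html} = live(conn, "/live/index")

# A static (dead) route goes through the regular controller test flow:
conn = get(conn, "/page/index")
_html = html_response(conn, 200)
```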

@cloud8421 (Contributor, Author)
We're getting somewhere.

For the sake of moving forward, I wrapped the entire benchmark in an ExUnit test. This is needed because LiveView helpers do not work unless they run inside an actual test:

** (ArgumentError) LiveView helpers can only be invoked from the test process.

I'll think of a more elegant approach, but for now it will do.
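Concretely, the wrapper is shaped like this (a sketch with a placeholder scenario; ExUnit.start/0 with its default autorun means the test fires once mix run finishes loading the script, which is why the output below starts with the ExUnit seed line):

```elixir
ExUnit.start()

defmodule PhoenixTestBenchmark do
  use ExUnit.Case, async: true

  # Running Benchee inside a test body means live/2 and friends are
  # invoked from the ExUnit test process, which is what they require.
  test "assertion benchmarks" do
    Benchee.run(%{
      "placeholder scenario" => fn -> Enum.sum(1..100) end
    })
  end
end
```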

Here's the output of a benchmark run on my machine (Mac mini M4):

Running ExUnit with seed: 302289, max_cases: 20

Operating System: macOS
CPU Information: Apple M4
Number of Available Cores: 10
Available memory: 16 GB
Elixir 1.18.1
Erlang 27.0
JIT enabled: true

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 35 s

Benchmarking LiveView string matching ...
Benchmarking PhoenixTest.assert_has/2 ...
Benchmarking PhoenixTest.assert_has/3, id+tag selector ...
Benchmarking PhoenixTest.assert_has/3, tag selector ...
Benchmarking PhoenixTest.assert_has/3, using within id, tag selector ...
Calculating statistics...
Formatting results...

Name                                                              ips        average  deviation         median         99th %
LiveView string matching                                    1286.30 K        0.78 μs  ±1692.27%        0.75 μs        0.88 μs
PhoenixTest.assert_has/3, id+tag selector                      2.63 K      380.87 μs    ±11.92%      368.21 μs      510.93 μs
PhoenixTest.assert_has/3, tag selector                         2.60 K      385.34 μs    ±11.63%      374.25 μs      531.37 μs
PhoenixTest.assert_has/2                                       2.57 K      388.58 μs    ±18.67%      358.75 μs      583.52 μs
PhoenixTest.assert_has/3, using within id, tag selector        2.53 K      394.64 μs    ±12.53%      384.83 μs      551.06 μs

Comparison:
LiveView string matching                                    1286.30 K
PhoenixTest.assert_has/3, id+tag selector                      2.63 K - 489.91x slower +380.09 μs
PhoenixTest.assert_has/3, tag selector                         2.60 K - 495.67x slower +384.57 μs
PhoenixTest.assert_has/2                                       2.57 K - 499.83x slower +387.81 μs
PhoenixTest.assert_has/3, using within id, tag selector        2.53 K - 507.62x slower +393.86 μs
.
Finished in 36.9 seconds (0.01s on load, 36.9s async, 0.00s sync)
1 test, 0 failures

I'll now finish writing the missing benchmarks.

@cloud8421 marked this pull request as ready for review January 8, 2025 17:57
@cloud8421 changed the title from "(WIP) Add assertion benchmarks" to "Add assertion benchmarks" Jan 8, 2025
@cloud8421 (Contributor, Author)
We've got a first stab; here are some more results:

Running ExUnit with seed: 415907, max_cases: 20

Operating System: macOS
CPU Information: Apple M4
Number of Available Cores: 10
Available memory: 16 GB
Elixir 1.18.1
Erlang 27.0
JIT enabled: true

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 7 s

Benchmarking LiveView element assertion ...
Calculating statistics...
Formatting results...

Name                                 ips        average  deviation         median         99th %
LiveView element assertion       33.47 K       29.88 μs    ±13.24%       29.25 μs       39.13 μs
.
Finished in 7.1 seconds (0.01s on load, 7.1s async, 0.00s sync)
1 test, 0 failures
❯ MIX_ENV=test mix run bench/assertions.exs
Running ExUnit with seed: 431627, max_cases: 20

Operating System: macOS
CPU Information: Apple M4
Number of Available Cores: 10
Available memory: 16 GB
Elixir 1.18.1
Erlang 27.0
JIT enabled: true

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 49 s

Benchmarking LiveView id+tag selector ...
Benchmarking LiveView string matching ...
Benchmarking LiveView tag selector ...
Benchmarking PhoenixTest.assert_has/2 ...
Benchmarking PhoenixTest.assert_has/3, id+tag selector ...
Benchmarking PhoenixTest.assert_has/3, tag selector ...
Benchmarking PhoenixTest.assert_has/3, using within id, tag selector ...
Calculating statistics...
Formatting results...

Name                                                              ips        average  deviation         median         99th %
LiveView string matching                                    1303.86 K        0.77 μs  ±1797.03%        0.75 μs        0.83 μs
LiveView tag selector                                         34.09 K       29.33 μs     ±8.21%       29.88 μs       34.13 μs
LiveView id+tag selector                                      32.43 K       30.83 μs     ±7.03%       30.08 μs       36.33 μs
PhoenixTest.assert_has/3, id+tag selector                      2.62 K      381.02 μs    ±11.08%      367.13 μs      492.73 μs
PhoenixTest.assert_has/2                                       2.62 K      381.27 μs    ±11.58%      363.63 μs      499.48 μs
PhoenixTest.assert_has/3, tag selector                         2.62 K      381.82 μs    ±10.79%      372.92 μs      493.11 μs
PhoenixTest.assert_has/3, using within id, tag selector        2.58 K      387.90 μs    ±10.07%      379.04 μs      497.79 μs

Comparison:
LiveView string matching                                    1303.86 K
LiveView tag selector                                         34.09 K - 38.25x slower +28.57 μs
LiveView id+tag selector                                      32.43 K - 40.20x slower +30.06 μs
PhoenixTest.assert_has/3, id+tag selector                      2.62 K - 496.79x slower +380.25 μs
PhoenixTest.assert_has/2                                       2.62 K - 497.12x slower +380.50 μs
PhoenixTest.assert_has/3, tag selector                         2.62 K - 497.84x slower +381.06 μs
PhoenixTest.assert_has/3, using within id, tag selector        2.58 K - 505.77x slower +387.13 μs
.
Finished in 50.6 seconds (0.01s on load, 50.6s async, 0.00s sync)
1 test, 0 failures

@cloud8421 (Contributor, Author)
@germsvel just wanted to ask: do you need more info to move this forward (no rush)? I know I left one point open (running tests on CI) because I wanted to focus on "is this useful at all" first. Thanks!

@germsvel (Owner) commented Jan 14, 2025

@cloud8421 this is incredibly helpful! Thank you so much! 🙏

I don't think you should do any more work, since I wouldn't imagine this is something we'd need to do in every CI run or anything like that (unless you have an idea for that? In which case, I'm all ears).

But having this branch (and those initial benchmarks) will allow us to have an awesome starting point.

Update: I honestly didn't envision this running in CI, but now that you mention it, is that something you think we could do? It could be very interesting to know how benchmark results change as the code changes, and to have this auto-generated. Would love your thoughts there.

@cloud8421 (Contributor, Author)
@germsvel thank you!

I think a weekly run against main would be enough, and I would just target the latest Elixir/latest OTP. I don't know if it's possible to "natively" set up a notification threshold (i.e. we're on average 10% slower than last week), but since Benchee supports storing and loading results for comparison, one can store the previous run as an artifact and compare.
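Something along these lines, using Benchee's save/load options (paths and the tag are illustrative): the weekly job restores the previous run from an artifact, compares against it, and saves the current run for next week.

```elixir
Benchee.run(
  %{"placeholder scenario" => fn -> Enum.sum(1..100) end},
  # Persist this run so the next weekly job can store it as an artifact...
  save: [path: "bench/results/latest.benchee", tag: "weekly"],
  # ...and compare against the previous run restored from the artifact
  # (this assumes the file exists; a first run would omit the option).
  load: "bench/results/previous.benchee"
)
```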

@cloud8421 (Contributor, Author)
Elaborating further: the project moves at a speed where I think a weekly run is enough to surface issues, and if that happens one can easily bisect until the reason is clear.

As a maintainer I think you have an option to manually unblock these tests for PRs, so that they don't run automatically (because I think they would burn capacity fairly quickly).

@germsvel (Owner) commented Jan 15, 2025

> the project moves at a speed where I think a weekly run is enough to surface issues, and if that happens one can easily bisect until the reason is clear.

Yeah, I think weekly would be fine. And I certainly don't want speed regressions to be a blocker -- sometimes we may have to make things slightly slower. But I would like to keep an eye on it to make sure we're not getting super slow.

> As a maintainer I think you have an option to manually unblock these tests for PRs, so that they don't run automatically (because I think they would burn capacity fairly quickly).

Yep, happy to enable that.

I guess my question is, do we include the CI work here? Or is that something for the future? We can merge this as-is, and leave CI for future work. I ran this locally, and it's really nice to have.

@cloud8421 (Contributor, Author)
I'd say if you're happy, let's merge this, and I'll tackle the CI bit separately. Thanks!

@germsvel merged commit 549cd49 into germsvel:main Jan 16, 2025
4 checks passed
@germsvel (Owner)
Thanks so much @cloud8421!

Successfully merging this pull request may close these issues:

  • Question on performance of assertions (#164)