
Add assertion benchmarks #165

Merged

Conversation

@cloud8421 (Contributor) commented Jan 7, 2025

Closes #164

Add the first set of benchmarks for assertions.

  • Setup benchee
  • Wire with testing application
  • Write first example benchmark
  • Benchmark: standard LV string-comparison assertions
  • Benchmark: standard LV element-based assertions
  • Benchmark: PhoenixTest based assertions
  • Benchmark: PhoenixTest based assertions using within

CI will be addressed separately in another PR.

The environment is also set to test so that the testing webapp is available. A formatter change is included, as the file lives in a new bench folder.

There are 3 examples (plain assert, tag, id+tag). Each example is run with a separate session to avoid any implicit optimization deriving from reuse of the same session. Note that this cannot be avoided for optimizations that kick in when calling assert_has/2-3 on the same session multiple times.
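For illustration, the benchmark body is shaped roughly like this; a sketch only, where the route, selectors, and page text are assumed placeholders rather than the exact values in bench/assertions.exs:

```elixir
# Sketch of the benchmark's shape. The route ("/live/index"), selectors,
# and page text are illustrative assumptions.
import Phoenix.ConnTest, only: [build_conn: 0]

Benchee.run(
  %{
    "PhoenixTest.assert_has/2" => fn session ->
      PhoenixTest.assert_has(session, "#title")
    end,
    "PhoenixTest.assert_has/3, tag selector" => fn session ->
      PhoenixTest.assert_has(session, "h1", text: "Main page")
    end
  },
  # A fresh session per invocation, so no implicit optimization can derive
  # from reusing the same session across assert_has/2-3 calls.
  before_each: fn _input -> PhoenixTest.visit(build_conn(), "/live/index") end
)
```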
@cloud8421 (Contributor, Author) commented Jan 8, 2025

EDIT: this was resolved.

@germsvel not sure if you came across this issue.

Using the standard LV test infra outside of test modules requires some ceremony, due to the need to set up the @endpoint attribute (see https://github.com/germsvel/phoenix_test/pull/165/files#diff-3e1e203eb1762597a06f32a45c52f756ae5c8e42b20bfd4f2d955436e9a92393R11).
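For reference, the ceremony looks roughly like this (a sketch; PhoenixTest.Endpoint and the route are assumed names, the real ones live in the testing application):

```elixir
defmodule PhoenixTestBenchmark do
  import Phoenix.ConnTest
  import Phoenix.LiveViewTest

  # Outside a generated ConnCase, the LiveView test helpers read the
  # endpoint from this module attribute, so it must be set by hand.
  # PhoenixTest.Endpoint is an assumed name for the test app's endpoint.
  @endpoint PhoenixTest.Endpoint

  def lv_setup_fn(_context) do
    conn = build_conn()
    # "/live/index" is an illustrative LiveView route.
    {:ok, view, html} = live(conn, "/live/index")
    {view, html}
  end
end
```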

When running this benchmark with MIX_ENV=test mix run bench/assertions.exs, it errors out while executing lv_setup_fn/1 at https://github.com/germsvel/phoenix_test/pull/165/files#diff-3e1e203eb1762597a06f32a45c52f756ae5c8e42b20bfd4f2d955436e9a92393R59 with the following error:

10:21:26.581 [error] Task #PID<0.339.0> started from #PID<0.94.0> terminating
** (MatchError) no match of right hand side value: {:error, :nosession}
    bench/assertions.exs:61: PhoenixTestBenchmark.lv_setup_fn/1
    (benchee 1.3.1) lib/benchee/benchmark/runner.ex:100: Benchee.Benchmark.Runner.measure_scenario/2
    (elixir 1.18.1) lib/task/supervised.ex:101: Task.Supervised.invoke_mfa/2
    (elixir 1.18.1) lib/task/supervised.ex:36: Task.Supervised.reply/4
Function: #Function<2.121299024/0 in Benchee.Utility.Parallel.map/2>
    Args: []
** (EXIT from #PID<0.94.0>) an exception was raised:
    ** (MatchError) no match of right hand side value: {:error, :nosession}
        bench/assertions.exs:61: PhoenixTestBenchmark.lv_setup_fn/1
        (benchee 1.3.1) lib/benchee/benchmark/runner.ex:100: Benchee.Benchmark.Runner.measure_scenario/2
        (elixir 1.18.1) lib/task/supervised.ex:101: Task.Supervised.invoke_mfa/2
        (elixir 1.18.1) lib/task/supervised.ex:36: Task.Supervised.reply/4

The offending line is https://github.com/germsvel/phoenix_test/pull/165/files#diff-3e1e203eb1762597a06f32a45c52f756ae5c8e42b20bfd4f2d955436e9a92393R61:

{:ok, view, html} = live(conn, "/page/index")

I suspect I'm missing some setup code, but looking at a standard LV application I can't see any. The error seems to stem from https://github.com/phoenixframework/phoenix_live_view/blob/38be0a11eb8f509ac6995c5557d782327371820c/lib/phoenix_live_view/test/live_view_test.ex#L343, so unless you have any suggestions on how to fix this, I'll read the source from there and try to see what's going wrong.

@cloud8421 (Contributor, Author)
Never mind - it's just that I was using a static page path; I misread the router. Nothing to see here...
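In other words, live/2 only mounts LiveView routes, while a static page goes through the plain controller flow. A hedged sketch (assuming the imports and @endpoint setup from the snippet above; both paths are illustrative):

```elixir
conn = build_conn()

# A LiveView route mounts and yields {:ok, view, html}; pointing live/2
# at a static route is what produced the {:error, :nosession} above.
{:ok, _view, _html} = live(conn, "/live/index")

# A static (dead) route goes through the regular controller test flow:
conn = get(conn, "/page/index")
_html = html_response(conn, 200)
```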

@cloud8421 (Contributor, Author)
We're getting somewhere.

For the sake of moving forward, I wrapped the entire benchmark in an ExUnit test. This is needed because LiveView helpers do not work unless they run inside an actual test:

** (ArgumentError) LiveView helpers can only be invoked from the test process.

I'll think of a more elegant approach, but for now it will do.
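Concretely, the wrapper is shaped like this (a sketch with a placeholder scenario; ExUnit.start/0 with its default autorun means the test fires once mix run finishes loading the script, which is why the output below starts with the ExUnit seed line):

```elixir
ExUnit.start()

defmodule PhoenixTestBenchmark do
  use ExUnit.Case, async: true

  # Running Benchee inside a test body means live/2 and friends are
  # invoked from the ExUnit test process, which is what they require.
  test "assertion benchmarks" do
    Benchee.run(%{
      "placeholder scenario" => fn -> Enum.sum(1..100) end
    })
  end
end
```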

Here's the output of a benchmark run on my machine (Mac mini M4):

Running ExUnit with seed: 302289, max_cases: 20

Operating System: macOS
CPU Information: Apple M4
Number of Available Cores: 10
Available memory: 16 GB
Elixir 1.18.1
Erlang 27.0
JIT enabled: true

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 35 s

Benchmarking LiveView string matching ...
Benchmarking PhoenixTest.assert_has/2 ...
Benchmarking PhoenixTest.assert_has/3, id+tag selector ...
Benchmarking PhoenixTest.assert_has/3, tag selector ...
Benchmarking PhoenixTest.assert_has/3, using within id, tag selector ...
Calculating statistics...
Formatting results...

Name                                                              ips        average  deviation         median         99th %
LiveView string matching                                    1286.30 K        0.78 μs  ±1692.27%        0.75 μs        0.88 μs
PhoenixTest.assert_has/3, id+tag selector                      2.63 K      380.87 μs    ±11.92%      368.21 μs      510.93 μs
PhoenixTest.assert_has/3, tag selector                         2.60 K      385.34 μs    ±11.63%      374.25 μs      531.37 μs
PhoenixTest.assert_has/2                                       2.57 K      388.58 μs    ±18.67%      358.75 μs      583.52 μs
PhoenixTest.assert_has/3, using within id, tag selector        2.53 K      394.64 μs    ±12.53%      384.83 μs      551.06 μs

Comparison:
LiveView string matching                                    1286.30 K
PhoenixTest.assert_has/3, id+tag selector                      2.63 K - 489.91x slower +380.09 μs
PhoenixTest.assert_has/3, tag selector                         2.60 K - 495.67x slower +384.57 μs
PhoenixTest.assert_has/2                                       2.57 K - 499.83x slower +387.81 μs
PhoenixTest.assert_has/3, using within id, tag selector        2.53 K - 507.62x slower +393.86 μs
.
Finished in 36.9 seconds (0.01s on load, 36.9s async, 0.00s sync)
1 test, 0 failures

I'll now finish writing the missing benchmarks.

@cloud8421 marked this pull request as ready for review January 8, 2025 17:57
@cloud8421 changed the title from "(WIP) Add assertion benchmarks" to "Add assertion benchmarks" Jan 8, 2025
@cloud8421 (Contributor, Author)
We've got a first stab; here are some more results:

Running ExUnit with seed: 415907, max_cases: 20

Operating System: macOS
CPU Information: Apple M4
Number of Available Cores: 10
Available memory: 16 GB
Elixir 1.18.1
Erlang 27.0
JIT enabled: true

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 7 s

Benchmarking LiveView element assertion ...
Calculating statistics...
Formatting results...

Name                                 ips        average  deviation         median         99th %
LiveView element assertion       33.47 K       29.88 μs    ±13.24%       29.25 μs       39.13 μs
.
Finished in 7.1 seconds (0.01s on load, 7.1s async, 0.00s sync)
1 test, 0 failures
❯ MIX_ENV=test mix run bench/assertions.exs
Running ExUnit with seed: 431627, max_cases: 20

Operating System: macOS
CPU Information: Apple M4
Number of Available Cores: 10
Available memory: 16 GB
Elixir 1.18.1
Erlang 27.0
JIT enabled: true

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 49 s

Benchmarking LiveView id+tag selector ...
Benchmarking LiveView string matching ...
Benchmarking LiveView tag selector ...
Benchmarking PhoenixTest.assert_has/2 ...
Benchmarking PhoenixTest.assert_has/3, id+tag selector ...
Benchmarking PhoenixTest.assert_has/3, tag selector ...
Benchmarking PhoenixTest.assert_has/3, using within id, tag selector ...
Calculating statistics...
Formatting results...

Name                                                              ips        average  deviation         median         99th %
LiveView string matching                                    1303.86 K        0.77 μs  ±1797.03%        0.75 μs        0.83 μs
LiveView tag selector                                         34.09 K       29.33 μs     ±8.21%       29.88 μs       34.13 μs
LiveView id+tag selector                                      32.43 K       30.83 μs     ±7.03%       30.08 μs       36.33 μs
PhoenixTest.assert_has/3, id+tag selector                      2.62 K      381.02 μs    ±11.08%      367.13 μs      492.73 μs
PhoenixTest.assert_has/2                                       2.62 K      381.27 μs    ±11.58%      363.63 μs      499.48 μs
PhoenixTest.assert_has/3, tag selector                         2.62 K      381.82 μs    ±10.79%      372.92 μs      493.11 μs
PhoenixTest.assert_has/3, using within id, tag selector        2.58 K      387.90 μs    ±10.07%      379.04 μs      497.79 μs

Comparison:
LiveView string matching                                    1303.86 K
LiveView tag selector                                         34.09 K - 38.25x slower +28.57 μs
LiveView id+tag selector                                      32.43 K - 40.20x slower +30.06 μs
PhoenixTest.assert_has/3, id+tag selector                      2.62 K - 496.79x slower +380.25 μs
PhoenixTest.assert_has/2                                       2.62 K - 497.12x slower +380.50 μs
PhoenixTest.assert_has/3, tag selector                         2.62 K - 497.84x slower +381.06 μs
PhoenixTest.assert_has/3, using within id, tag selector        2.58 K - 505.77x slower +387.13 μs
.
Finished in 50.6 seconds (0.01s on load, 50.6s async, 0.00s sync)
1 test, 0 failures

@cloud8421 (Contributor, Author)
@germsvel just wanted to ask: do you need more info to move this forward (no rush)? I know I left one point open (running tests on CI) because I wanted to focus on "is this useful at all" first. Thanks!

@germsvel (Owner) commented Jan 14, 2025

@cloud8421 this is incredibly helpful! Thank you so much! 🙏

I don't think you should do any more work, since I wouldn't imagine this is something we'd need to do in every CI run or anything like that (unless you have an idea for that? In which case, I'm all ears).

But having this branch (and those initial benchmarks) will allow us to have an awesome starting point.

Update: I honestly didn't envision this running in CI, but now that you mention it, is that something you think we could do? It could be very interesting to know how benchmark results change as the code changes, and to have this auto-generated. Would love your thoughts there.

@cloud8421 (Contributor, Author)
@germsvel thank you!

I think a weekly run against main would be enough, and I would just target the latest Elixir/latest OTP. I don't know if it's possible to "natively" set up a notification threshold (i.e. we're on average 10% slower than last week), but since Benchee supports storing and loading results for comparison, one can store the previous run as an artifact and compare.
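Something along these lines, using Benchee's save/load options (paths and the tag are illustrative): the weekly job restores the previous run from an artifact, compares against it, and saves the current run for next week.

```elixir
Benchee.run(
  %{"placeholder scenario" => fn -> Enum.sum(1..100) end},
  # Persist this run so the next weekly job can store it as an artifact...
  save: [path: "bench/results/latest.benchee", tag: "weekly"],
  # ...and compare against the previous run restored from the artifact
  # (this assumes the file exists; a first run would omit the option).
  load: "bench/results/previous.benchee"
)
```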

@cloud8421 (Contributor, Author)
Elaborating further: the project moves at a speed where I think a weekly run is enough to surface issues, and if that happens one can easily bisect until the reason is clear.

As a maintainer I think you have an option to manually unblock these tests for PRs, so that they don't run automatically (because I think they would burn capacity fairly quickly).

@germsvel (Owner) commented Jan 15, 2025

> the project moves at a speed where I think a weekly run is enough to surface issues, and if that happens one can easily bisect until the reason is clear.

Yeah, I think weekly would be fine. And I certainly don't want speed regressions to be a blocker -- sometimes we may have to make things slightly slower. But I would like to keep an eye on it to make sure we're not getting super slow.

> As a maintainer I think you have an option to manually unblock these tests for PRs, so that they don't run automatically (because I think they would burn capacity fairly quickly).

Yep, happy to enable that.

I guess my question is, do we include the CI work here? Or is that something for the future? We can merge this as-is, and leave CI for future work. I ran this locally, and it's really nice to have.

@cloud8421 (Contributor, Author)
I'd say if you're happy, let's merge this, and I'll tackle the CI bit separately. Thanks!

@germsvel merged commit 549cd49 into germsvel:main Jan 16, 2025
4 checks passed
@germsvel (Owner)
Thanks so much @cloud8421!

Successfully merging this pull request may close these issues:

  • Question on performance of assertions (#164)