implement dog_response; improve doc of gaussian_blur #261

Merged
merged 7 commits into from
Mar 18, 2025

Conversation

Force1ess
Contributor

Resolves #245

@Force1ess
Contributor Author

This is related to a bugfix: kornia/kornia#3137

Member

@edgarriba edgarriba left a comment

The PR looks quite good. Can we benchmark this against image-rs, or against another implementation if we find one, to get a sense of how it behaves?

@Force1ess
Contributor Author

I checked their repo, but I found only gaussian_blur, not an implementation of difference_of_gaussians.
How should we benchmark against image-rs? Should we focus on benchmarking gaussian_blur?
Or should I implement dog_response in several parallel ways and pick the best one?

@edgarriba
Member

Could you try implementing DoG with image-rs (two blurs + diff)? And yes, let's try a few and choose the best one.

@Force1ess Force1ess changed the title implement dog_response; improve doc of gaussian_blur implement dog_response; improve doc of gaussian_blur Mar 13, 2025
@Force1ess
Contributor Author

Force1ess commented Mar 13, 2025

Benchmark Result

dog_response_row_parallel: use parallel::par_iter_rows_val_two
dog_response_serial: compute serially
dog_response_rayon: use rayon::par_iter_mut
extern_dog_response_serial: image::imageops::blur and compute serially

I think dog_response_serial is good enough, since dog_response is simply an element-wise subtraction of two Gaussian-blurred images.
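
For readers who want to see the "two blurs plus a subtraction" structure spelled out, here is a minimal std-only sketch. It is not the code merged in this PR: the function names, the flat row-major `&[f32]` layout, the `3 * sigma` kernel radius, the replicate-border handling, and the sign convention (larger sigma minus smaller sigma) are all assumptions made for illustration.

```rust
/// Build a normalized 1-D Gaussian kernel of length `2 * radius + 1`.
fn gaussian_kernel(sigma: f32, radius: usize) -> Vec<f32> {
    let mut k: Vec<f32> = (0..=2 * radius)
        .map(|i| {
            let x = i as f32 - radius as f32;
            (-(x * x) / (2.0 * sigma * sigma)).exp()
        })
        .collect();
    let sum: f32 = k.iter().sum();
    k.iter_mut().for_each(|v| *v /= sum);
    k
}

/// One 1-D convolution pass with clamped (replicate) borders.
/// `horizontal` selects whether the kernel slides along x or along y.
fn convolve_1d(src: &[f32], w: usize, h: usize, k: &[f32], horizontal: bool) -> Vec<f32> {
    let r = (k.len() / 2) as isize;
    let mut dst = vec![0.0f32; w * h];
    for y in 0..h {
        for x in 0..w {
            let mut acc = 0.0f32;
            for (i, kv) in k.iter().enumerate() {
                let off = i as isize - r;
                let (sx, sy) = if horizontal {
                    ((x as isize + off).clamp(0, w as isize - 1) as usize, y)
                } else {
                    (x, (y as isize + off).clamp(0, h as isize - 1) as usize)
                };
                acc += kv * src[sy * w + sx];
            }
            dst[y * w + x] = acc;
        }
    }
    dst
}

/// Separable Gaussian blur: horizontal pass, then vertical pass.
/// The `3 * sigma` radius is an assumption of this sketch.
fn gaussian_blur(src: &[f32], w: usize, h: usize, sigma: f32) -> Vec<f32> {
    let radius = (3.0 * sigma).ceil() as usize;
    let k = gaussian_kernel(sigma, radius);
    let tmp = convolve_1d(src, w, h, &k, true);
    convolve_1d(&tmp, w, h, &k, false)
}

/// Serial DoG: blur twice, then subtract element by element.
/// Sign convention (assumed): blur(sigma2) - blur(sigma1).
fn dog_response_sketch(src: &[f32], w: usize, h: usize, sigma1: f32, sigma2: f32) -> Vec<f32> {
    let a = gaussian_blur(src, w, h, sigma1);
    let b = gaussian_blur(src, w, h, sigma2);
    a.iter().zip(&b).map(|(x, y)| y - x).collect()
}
```

In this sketch the final subtraction is a single linear pass over the buffer, while each blur touches every pixel once per kernel tap, so most of the work sits in the two blurs rather than in the difference.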

Dog Response/dog_response_row_parallel/32x32
time: [52.236 µs 53.426 µs 55.127 µs]
thrpt: [18.575 Melem/s 19.167 Melem/s 19.603 Melem/s]
change:
time: [-15.583% -12.184% -7.9591%] (p = 0.00 < 0.05)
thrpt: [+8.6473% +13.875% +18.460%]
Performance has improved.
Found 4 outliers among 30 measurements (13.33%)
4 (13.33%) high severe
Dog Response/dog_response_serial/32x32
time: [18.577 µs 18.617 µs 18.652 µs]
thrpt: [54.901 Melem/s 55.003 Melem/s 55.121 Melem/s]
change:
time: [-1.6123% -0.6873% -0.0245%] (p = 0.09 > 0.05)
thrpt: [+0.0245% +0.6920% +1.6387%]
No change in performance detected.
Found 1 outliers among 30 measurements (3.33%)
1 (3.33%) high mild
Dog Response/dog_response_rayon/32x32
time: [60.706 µs 60.997 µs 61.371 µs]
thrpt: [16.685 Melem/s 16.788 Melem/s 16.868 Melem/s]
change:
time: [-19.414% -18.230% -16.902%] (p = 0.00 < 0.05)
thrpt: [+20.339% +22.295% +24.091%]
Performance has improved.
Dog Response/extern_dog_response/32x32
time: [28.923 µs 29.134 µs 29.393 µs]
thrpt: [34.839 Melem/s 35.147 Melem/s 35.405 Melem/s]
change:
time: [-0.6379% +0.2214% +0.9998%] (p = 0.62 > 0.05)
thrpt: [-0.9899% -0.2209% +0.6420%]
No change in performance detected.
Found 2 outliers among 30 measurements (6.67%)
1 (3.33%) high mild
1 (3.33%) high severe
Dog Response/dog_response_row_parallel/512x512
time: [4.4438 ms 4.4544 ms 4.4688 ms]
thrpt: [58.662 Melem/s 58.850 Melem/s 58.992 Melem/s]
change:
time: [-12.305% -8.1625% -4.4853%] (p = 0.00 < 0.05)
thrpt: [+4.6959% +8.8880% +14.031%]
Performance has improved.
Found 2 outliers among 30 measurements (6.67%)
2 (6.67%) high severe
Dog Response/dog_response_serial/512x512
time: [4.3211 ms 4.3287 ms 4.3372 ms]
thrpt: [60.440 Melem/s 60.559 Melem/s 60.666 Melem/s]
change:
time: [-22.728% -15.953% -9.0685%] (p = 0.00 < 0.05)
thrpt: [+9.9729% +18.981% +29.413%]
Performance has improved.
Found 1 outliers among 30 measurements (3.33%)
1 (3.33%) high severe
Dog Response/dog_response_rayon/512x512
time: [4.4721 ms 4.4827 ms 4.4944 ms]
thrpt: [58.327 Melem/s 58.479 Melem/s 58.618 Melem/s]
change:
time: [-1.3529% -0.2746% +0.5836%] (p = 0.63 > 0.05)
thrpt: [-0.5802% +0.2754% +1.3715%]
No change in performance detected.
Found 5 outliers among 30 measurements (16.67%)
3 (10.00%) high mild
2 (6.67%) high severe
Dog Response/extern_dog_response/512x512
time: [7.0199 ms 7.0853 ms 7.1460 ms]
thrpt: [36.684 Melem/s 36.998 Melem/s 37.343 Melem/s]
change:
time: [-2.4167% -1.0859% +0.1821%] (p = 0.12 > 0.05)
thrpt: [-0.1817% +1.0978% +2.4766%]
No change in performance detected.
Found 2 outliers among 30 measurements (6.67%)
2 (6.67%) high mild
Benchmarking Dog Response/dog_response_row_parallel/8192x8192: Warming up for 3.0000 s
Warning: Unable to complete 30 samples in 5.0s. You may wish to increase target time to 36.2s, or reduce sample count to 10.
Dog Response/dog_response_row_parallel/8192x8192
time: [1.1962 s 1.2004 s 1.2053 s]
thrpt: [55.679 Melem/s 55.906 Melem/s 56.104 Melem/s]
change:
time: [-4.0215% -2.9624% -2.0077%] (p = 0.00 < 0.05)
thrpt: [+2.0488% +3.0528% +4.1900%]
Performance has improved.
Found 5 outliers among 30 measurements (16.67%)
2 (6.67%) high mild
3 (10.00%) high severe
Benchmarking Dog Response/dog_response_serial/8192x8192: Warming up for 3.0000 s
Warning: Unable to complete 30 samples in 5.0s. You may wish to increase target time to 36.6s, or reduce sample count to 10.
Dog Response/dog_response_serial/8192x8192
time: [1.2129 s 1.2143 s 1.2161 s]
thrpt: [55.185 Melem/s 55.264 Melem/s 55.328 Melem/s]
change:
time: [+1.5804% +1.8012% +2.0060%] (p = 0.00 < 0.05)
thrpt: [-1.9665% -1.7694% -1.5558%]
Performance has regressed.
Found 2 outliers among 30 measurements (6.67%)
1 (3.33%) high mild
1 (3.33%) high severe
Benchmarking Dog Response/dog_response_rayon/8192x8192: Warming up for 3.0000 s
Warning: Unable to complete 30 samples in 5.0s. You may wish to increase target time to 36.2s, or reduce sample count to 10.
Dog Response/dog_response_rayon/8192x8192
time: [1.1930 s 1.1941 s 1.1952 s]
thrpt: [56.148 Melem/s 56.200 Melem/s 56.250 Melem/s]
change:
time: [+0.2691% +0.5910% +0.8902%] (p = 0.00 < 0.05)
thrpt: [-0.8823% -0.5876% -0.2684%]
Change within noise threshold.
Found 1 outliers among 30 measurements (3.33%)
1 (3.33%) high mild
Benchmarking Dog Response/extern_dog_response/8192x8192: Warming up for 3.0000 s
Warning: Unable to complete 30 samples in 5.0s. You may wish to increase target time to 153.3s, or reduce sample count to 10.
Dog Response/extern_dog_response/8192x8192
time: [5.0597 s 5.0643 s 5.0691 s]
thrpt: [13.239 Melem/s 13.251 Melem/s 13.263 Melem/s]
change:
time: [-0.9447% -0.4477% -0.0653%] (p = 0.04 < 0.05)
thrpt: [+0.0653% +0.4497% +0.9537%]
Change within noise threshold.
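
As a reading aid for the report above: the `thrpt` figures are consistent with throughput being counted in pixels, that is `width * height` elements divided by the measured time. The helper below is a hypothetical illustration of that arithmetic, not part of the benchmark code in this PR.

```rust
/// Throughput in mega-elements per second for a `width x height` image
/// processed in `seconds`. Assumes one element per pixel.
fn melem_per_sec(width: u64, height: u64, seconds: f64) -> f64 {
    (width * height) as f64 / seconds / 1e6
}
```

For example, `melem_per_sec(32, 32, 18.617e-6)` gives about 55.0, matching the 55.003 Melem/s reported for dog_response_serial at 32x32, and `melem_per_sec(512, 512, 4.3287e-3)` gives about 60.56, matching the 512x512 serial run. By the same timings, the image-rs based variant at 8192x8192 (5.0643 s) is roughly 4.2 times slower than the serial one (1.2143 s).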

@Force1ess
Contributor Author

Hi @edgarriba, do you have any further suggestions or comments?

Member

@edgarriba edgarriba left a comment

Interesting to see that serial is the one with the better trade-off, right?

@Force1ess
Contributor Author

Interesting to see that serial is the one with the better trade-off, right?

Yes, this is why we need benchmarks :)

Member

@edgarriba edgarriba left a comment

In another PR I also proposed providing the three variants (serial/parallel_row/parallel_full) with proper docs and hints about performance. What do you think?

@Force1ess
Contributor Author

Here are the newest benchmark results:
Dog Response/dog_response_row_parallel/32x32
time: [64.043 µs 64.574 µs 65.041 µs]
thrpt: [15.744 Melem/s 15.858 Melem/s 15.989 Melem/s]
change:
time: [-2.3239% -0.2280% +1.8266%] (p = 0.83 > 0.05)
thrpt: [-1.7938% +0.2285% +2.3792%]
No change in performance detected.
Found 2 outliers among 30 measurements (6.67%)
1 (3.33%) low severe
1 (3.33%) low mild
Dog Response/dog_response_serial/32x32
time: [18.504 µs 18.534 µs 18.559 µs]
thrpt: [55.176 Melem/s 55.249 Melem/s 55.338 Melem/s]
change:
time: [-1.2647% -0.4501% +0.1370%] (p = 0.26 > 0.05)
thrpt: [-0.1368% +0.4522% +1.2809%]
No change in performance detected.
Found 8 outliers among 30 measurements (26.67%)
2 (6.67%) low severe
3 (10.00%) low mild
1 (3.33%) high mild
2 (6.67%) high severe
Dog Response/dog_response_rayon/32x32
time: [72.157 µs 74.545 µs 76.233 µs]
thrpt: [13.432 Melem/s 13.737 Melem/s 14.191 Melem/s]
change:
time: [-7.1346% -3.4486% -0.1878%] (p = 0.06 > 0.05)
thrpt: [+0.1881% +3.5718% +7.6828%]
No change in performance detected.
Found 5 outliers among 30 measurements (16.67%)
4 (13.33%) low mild
1 (3.33%) high mild
Dog Response/extern_dog_response/32x32
time: [28.681 µs 28.730 µs 28.796 µs]
thrpt: [35.561 Melem/s 35.642 Melem/s 35.704 Melem/s]
change:
time: [-0.7131% -0.1333% +0.4013%] (p = 0.65 > 0.05)
thrpt: [-0.3996% +0.1335% +0.7182%]
No change in performance detected.
Found 4 outliers among 30 measurements (13.33%)
3 (10.00%) high mild
1 (3.33%) high severe
Dog Response/dog_response_row_parallel/512x512
time: [4.4476 ms 4.4517 ms 4.4558 ms]
thrpt: [58.832 Melem/s 58.887 Melem/s 58.941 Melem/s]
change:
time: [-0.4038% +0.1866% +0.8274%] (p = 0.57 > 0.05)
thrpt: [-0.8207% -0.1863% +0.4054%]
No change in performance detected.
Found 2 outliers among 30 measurements (6.67%)
2 (6.67%) high severe
Dog Response/dog_response_serial/512x512
time: [4.3066 ms 4.3148 ms 4.3243 ms]
thrpt: [60.621 Melem/s 60.755 Melem/s 60.871 Melem/s]
change:
time: [-15.293% -9.7871% -4.7917%] (p = 0.00 < 0.05)
thrpt: [+5.0329% +10.849% +18.054%]
Performance has improved.
Found 1 outliers among 30 measurements (3.33%)
1 (3.33%) high severe
Dog Response/dog_response_rayon/512x512
time: [4.4673 ms 4.4725 ms 4.4795 ms]
thrpt: [58.521 Melem/s 58.613 Melem/s 58.680 Melem/s]
change:
time: [-0.2705% +0.8436% +2.1025%] (p = 0.18 > 0.05)
thrpt: [-2.0593% -0.8366% +0.2713%]
No change in performance detected.
Found 5 outliers among 30 measurements (16.67%)
1 (3.33%) high mild
4 (13.33%) high severe
Dog Response/extern_dog_response/512x512
time: [6.9513 ms 6.9822 ms 7.0164 ms]
thrpt: [37.361 Melem/s 37.544 Melem/s 37.711 Melem/s]
change:
time: [-1.0965% -0.0085% +1.3584%] (p = 0.99 > 0.05)
thrpt: [-1.3402% +0.0085% +1.1086%]
No change in performance detected.
Found 2 outliers among 30 measurements (6.67%)
1 (3.33%) high mild
1 (3.33%) high severe
Benchmarking Dog Response/dog_response_row_parallel/8192x8192: Warming up for 3.0000 s
Warning: Unable to complete 30 samples in 5.0s. You may wish to increase target time to 35.4s, or reduce sample count to 10.
Dog Response/dog_response_row_parallel/8192x8192
time: [1.1755 s 1.1762 s 1.1769 s]
thrpt: [57.023 Melem/s 57.057 Melem/s 57.090 Melem/s]
change:
time: [-5.4084% -4.2453% -3.3331%] (p = 0.00 < 0.05)
thrpt: [+3.4480% +4.4335% +5.7176%]
Performance has improved.
Benchmarking Dog Response/dog_response_serial/8192x8192: Warming up for 3.0000 s
Warning: Unable to complete 30 samples in 5.0s. You may wish to increase target time to 35.4s, or reduce sample count to 10.
Dog Response/dog_response_serial/8192x8192
time: [1.1798 s 1.1806 s 1.1815 s]
thrpt: [56.798 Melem/s 56.843 Melem/s 56.881 Melem/s]
change:
time: [-5.5600% -4.9425% -4.4093%] (p = 0.00 < 0.05)
thrpt: [+4.6127% +5.1995% +5.8873%]
Performance has improved.
Found 2 outliers among 30 measurements (6.67%)
2 (6.67%) high mild
Benchmarking Dog Response/dog_response_rayon/8192x8192: Warming up for 3.0000 s
Warning: Unable to complete 30 samples in 5.0s. You may wish to increase target time to 35.3s, or reduce sample count to 10.
Dog Response/dog_response_rayon/8192x8192
time: [1.1750 s 1.1769 s 1.1794 s]
thrpt: [56.900 Melem/s 57.023 Melem/s 57.116 Melem/s]
change:
time: [-3.1963% -2.8571% -2.5146%] (p = 0.00 < 0.05)
thrpt: [+2.5795% +2.9411% +3.3019%]
Performance has improved.
Found 2 outliers among 30 measurements (6.67%)
2 (6.67%) high severe
Benchmarking Dog Response/extern_dog_response/8192x8192: Warming up for 3.0000 s
Warning: Unable to complete 30 samples in 5.0s. You may wish to increase target time to 150.5s, or reduce sample count to 10.
Dog Response/extern_dog_response/8192x8192
time: [5.0155 s 5.0184 s 5.0207 s]
thrpt: [13.366 Melem/s 13.373 Melem/s 13.380 Melem/s]
change:
time: [-3.3467% -3.0507% -2.7625%] (p = 0.00 < 0.05)
thrpt: [+2.8410% +3.1467% +3.4626%]
Performance has improved.
Found 2 outliers among 30 measurements (6.67%)
1 (3.33%) low severe
1 (3.33%) low mild

@Force1ess
Contributor Author

In another PR I also proposed providing the three variants (serial/parallel_row/parallel_full) with proper docs and hints about performance. What do you think?

I think the idea of providing all three variants (serial, parallel_row, and parallel_full) with proper docs and performance hints is definitely interesting.

However, looking at the benchmark results, the serial approach either outperforms the parallel implementations or comes within a few percent of them at every size tested (32x32, 512x512, and 8192x8192).
Given that, I'm not sure it's worth the added complexity at this point.
What do you think about sticking with serial for now?
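
To make the "added complexity" concrete, here is a rough std-only sketch of what a row-parallel version of just the subtraction step could look like next to the serial one. It uses `std::thread::scope` as a stand-in; it is not the `parallel::par_iter_rows_val_two` helper or the rayon variant benchmarked above, and the function names and the `n_threads` parameter are hypothetical.

```rust
/// Serial element-wise subtraction: dst[i] = a[i] - b[i].
/// Slices are assumed to have equal length; `zip` stops at the shortest.
fn subtract_serial(a: &[f32], b: &[f32], dst: &mut [f32]) {
    for ((d, x), y) in dst.iter_mut().zip(a).zip(b) {
        *d = x - y;
    }
}

/// Row-parallel variant: split the buffers into bands of whole rows and
/// hand each band to a scoped thread that runs the serial kernel on it.
fn subtract_row_parallel(a: &[f32], b: &[f32], dst: &mut [f32], width: usize, n_threads: usize) {
    let w = width.max(1);
    let n = n_threads.max(1);
    let rows = (dst.len() + w - 1) / w;
    let rows_per_band = ((rows + n - 1) / n).max(1);
    let band = rows_per_band * w; // elements per band, always >= 1
    std::thread::scope(|s| {
        for ((d, x), y) in dst.chunks_mut(band).zip(a.chunks(band)).zip(b.chunks(band)) {
            s.spawn(move || subtract_serial(x, y, d));
        }
    });
}
```

Both functions produce the same output; the parallel one additionally has to choose a band size and pay for spawning and joining threads. Whether that overhead explains the 32x32 numbers above was not measured in this thread, so treat it as a plausible reading rather than a conclusion.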

@edgarriba
Member

Alright, then go ahead with the serial implementation.

@Force1ess
Contributor Author

Does it seem like a good time to merge now?

@edgarriba edgarriba merged commit e614c67 into kornia:main Mar 18, 2025
12 checks passed
Development

Successfully merging this pull request may close these issues.

implement dog_response
2 participants