NIOPerformanceTester: Add benchmarks for datagram channel #2198

Open: wants to merge 5 commits into base: main

simonjbeaumont
Contributor

Motivation:

#2084 recently added preliminary support for connected datagram sockets, which opens up performance optimisations for some use cases. Before we start optimising these flows, there should be some benchmarks in place.

As described in #2187, connected-mode UDP introduces two additional dimensions:

  • Whether to connect the socket.
  • Whether to continue to use AddressedEnvelope (when connected).

That issue also describes the kinds of flows we're interested in benchmarking.
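
As a rough sketch of how the two dimensions combine (this is not code from the PR), the flows can be enumerated as a small matrix. The type and function names are hypothetical, and the validity rule rests on an assumption: an unconnected socket has no default peer, so each write must carry its destination. The three combinations that survive line up with the `write_*` benchmark names reported later in this thread.

```swift
/// Hypothetical model of the two dimensions; not a type from swift-nio.
struct DatagramWriteFlow: Equatable {
    var connected: Bool   // was the socket connected before writing?
    var addressed: Bool   // is each payload wrapped in an AddressedEnvelope?

    /// Assumption: without a connected peer, every write needs an address.
    var isValid: Bool { connected || addressed }

    /// Name in the style of the benchmark names used in the report.
    var name: String {
        let connectedPart = connected ? "connected" : "unconnected"
        let addressedPart = addressed ? "addressed" : "unaddressed"
        return "write_" + connectedPart + "_" + addressedPart
    }
}

/// Enumerate every combination and keep only the ones that make sense.
func validFlows() -> [DatagramWriteFlow] {
    var flows: [DatagramWriteFlow] = []
    for connected in [false, true] {
        for addressed in [true, false] {
            let flow = DatagramWriteFlow(connected: connected, addressed: addressed)
            if flow.isValid { flows.append(flow) }
        }
    }
    return flows
}

for flow in validFlows() {
    print(flow.name)
}
```

Modelling the matrix explicitly makes it easier to see which cell each benchmark covers and which cell is intentionally absent.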

Modifications:

  • Add benchmarks to NIOPerformanceTester for various datagram channel flows.

Result:

@simonjbeaumont
Contributor Author

@swift-nio-bot test perf please

@swift-server-bot

performance report

build id: 145

timestamp: Fri Jun 17 18:29:19 UTC 2022

results

name min max mean std
write_http_headers 0.041662725 0.041954117 0.0417399599 0.00011345406113925086
http_headers_canonical_form 0.085428163 0.08689467 0.0860069001 0.000418608187151565
http_headers_canonical_form_trimming_whitespace 0.016327569 0.016840279 0.016400108599999998 0.00015620830807553803
http_headers_canonical_form_trimming_whitespace_from_short_string 0.01485141 0.015482014 0.014981323 0.00018809702040938593
http_headers_canonical_form_trimming_whitespace_from_long_string 0.023703219 0.024245214 0.023769357999999997 0.0001673811408213805
bytebuffer_write_12MB_short_string_literals 0.171829906 0.179072549 0.17415634289999998 0.002313312308401836
bytebuffer_write_12MB_short_calculated_strings 0.076275591 0.077013078 0.0765971128 0.0002671916913906138
bytebuffer_write_12MB_medium_string_literals 0.991952937 0.997821391 0.9950064941999999 0.0019974303613217626
bytebuffer_write_12MB_medium_calculated_strings 0.089416799 0.090976292 0.090223944 0.000530345736136971
bytebuffer_write_12MB_large_calculated_strings 0.143007008 0.149249929 0.1441605813 0.0018567326639665862
bytebuffer_lots_of_rw 0.040199883 0.04083413 0.040418492900000005 0.00019752172964143658
bytebuffer_write_http_response_ascii_only_as_string 0.034776952 0.035405193 0.035038620300000003 0.0002273467003274711
bytebuffer_write_http_response_ascii_only_as_staticstring 0.02667423 0.027356333 0.026904085799999998 0.0001875688331731996
bytebuffer_write_http_response_some_nonascii_as_string 0.033814384 0.034402253 0.0339735787 0.00021536782879532152
bytebuffer_write_http_response_some_nonascii_as_staticstring 0.025510816 0.026161096 0.0257119154 0.00019938907724134568
no-net_http1_1k_reqs_1_conn 0.01085703 0.011351992 0.0109769514 0.00014108999716823766
http1_1k_reqs_1_conn 0.060217899 0.06234237 0.0613679059 0.0006652131084516292
http1_1k_reqs_100_conns 0.089372321 0.090349762 0.08959744430000001 0.0002859719582406822
future_whenallsucceed_100k_immediately_succeeded_off_loop 0.070086548 0.071362458 0.070609967 0.0003784833826632585
future_whenallsucceed_100k_immediately_succeeded_on_loop 0.070175207 0.078099572 0.0719849875 0.0025632567743309282
future_whenallsucceed_10k_deferred_off_loop 0.029955298 0.030688631 0.0301279381 0.0002238622081129621
future_whenallsucceed_10k_deferred_on_loop 0.012757189 0.013343613 0.0128745472 0.00017032740312951562
future_whenallcomplete_100k_immediately_succeeded_off_loop 0.0347632 0.035306709 0.034918460899999997 0.00019230192644233447
future_whenallcomplete_100k_immediately_succeeded_on_loop 0.034652291 0.035418259 0.0349014604 0.0002491877178956545
future_whenallcomplete_10k_deferred_off_loop 0.02301545 0.023577227 0.0231371461 0.00016886669002398147
future_whenallcomplete_100k_deferred_on_loop 0.071692844 0.074537799 0.0724980266 0.0008404933087374874
future_reduce_10k_futures 0.015298047 0.015823573 0.0154090788 0.00015099190882648876
future_reduce_into_10k_futures 0.01334563 0.013995041 0.013500395100000001 0.00021623154007192793
channel_pipeline_1m_events 0.097174929 0.097292591 0.0972392551 4.490477505529383e-05
websocket_encode_50b_space_at_front_100k_frames_cow 0.044737258 0.045191195 0.0448347607 0.00018379124404428225
websocket_encode_50b_space_at_front_1m_frames_cow_masking 0.605975686 0.614835793 0.6086842936 0.003361086806587107
websocket_encode_1kb_space_at_front_1m_frames_cow 0.471223077 0.471874009 0.4714554197 0.0002547513720883489
websocket_encode_50b_no_space_at_front_100k_frames_cow 0.0447588 0.045239332 0.0448657257 0.0001922333654037611
websocket_encode_1kb_no_space_at_front_100k_frames_cow 0.047024214 0.047487858 0.0471760697 0.0002097948935842129
websocket_encode_50b_space_at_front_100k_frames 0.065566216 0.066013743 0.06572203169999999 0.0001960908099291691
websocket_encode_50b_space_at_front_10k_frames_masking 0.008334063 0.008833188 0.0084121422 0.00014886420863532735
websocket_encode_1kb_space_at_front_10k_frames 0.012748835 0.012835852 0.0127703509 2.667295678085455e-05
websocket_encode_50b_no_space_at_front_100k_frames 0.064778805 0.065260351 0.0649391168 0.0002123368906603097
websocket_encode_1kb_no_space_at_front_10k_frames 0.012199704 0.012642312 0.012259744 0.00013541699506749899
websocket_decode_125b_10k_frames 0.011557208 0.011725306 0.011643592999999999 5.579105769047794e-05
websocket_decode_125b_with_a_masking_key_10k_frames 0.011886457 0.012551941 0.011995295100000001 0.00019918137190368332
websocket_decode_64kb_10k_frames 0.011865997 0.012305131 0.011972299799999999 0.0001275894228686516
websocket_decode_64kb_with_a_masking_key_10k_frames 0.012216301 0.012325203 0.0122574825 3.8014819795121245e-05
websocket_decode_64kb_+1_10k_frames 0.011873218 0.012379003 0.0119486826 0.00015304052431968596
websocket_decode_64kb_+1_with_a_masking_key_10k_frames 0.012258658 0.012739337 0.0123750879 0.0001432503810407576
circular_buffer_into_byte_buffer_1kb 0.033003023 0.033443884 0.033059138700000004 0.0001357714591829883
circular_buffer_into_byte_buffer_1mb 0.064649225 0.065173264 0.0648183481 0.0002241563442696228
byte_buffer_view_iterator_1mb 0.017564919 0.018072208 0.0176481632 0.00015418835839380919
byte_buffer_view_contains_12mb 0.052876061 0.053382761 0.0530510339 0.0002250170668759007
byte_to_message_decoder_decode_many_small 0.040078499 0.040637553 0.0402146081 0.00020313974447680163
generate_10k_random_request_keys 0.08931608 0.089666493 0.0894665862 0.00013491582260044627
bytebuffer_rw_10_uint32s 0.038834545 0.05205646 0.0417507431 0.003688137038442894
bytebuffer_multi_rw_10_uint32s 0.057375531 0.067059859 0.059650549100000005 0.0030920550751587514
lock_1_thread_10M_ops 0.15941446 0.166886246 0.1608610881 0.0023665043693615285
lock_2_threads_10M_ops 0.886533031 0.952784335 0.9107198442 0.019006729394206272
lock_4_threads_10M_ops 0.940749099 0.979265196 0.9652099404 0.011636315685027038
lock_8_threads_10M_ops 0.835241447 0.908570668 0.8920025543 0.021813553831235136
schedule_100k_tasks 0.073955651 0.113300493 0.0818714579 0.01247563908324537
schedule_and_run_100k_tasks 0.404691309 0.416430189 0.4085794434 0.0036160657093091348
execute_100k_tasks 0.21332552 0.213667242 0.2134483496 0.00013655915291933306
bytebufferview_copy_to_array_100k_times_1kb 0.010827164 0.010905942 0.010853081300000001 2.2478568519320367e-05
circularbuffer_copy_to_array_10k_times_1kb 0.021153188 0.021597117 0.0212081326 0.0001368975463567147
deadline_now_1M_times 0.027116441 0.027432526 0.027170087 9.424844404138531e-05
datagram_channel_bootstrap_create 0.012875235 0.013402205 0.0129658401 0.00015514791938981622
datagram_channel_bind 0.091323155 0.093014746 0.09207692619999999 0.0006473000119240261
datagram_channel_connect 0.095020046 0.096223019 0.0958040854 0.00036551044288355865
datagram_channel_write_unconnected_addressed 0.061169251 0.107407391 0.0772126829 0.0128148027770061
datagram_channel_write_connected_addressed 0.068652302 0.09341905 0.0754894025 0.007017704983422118
datagram_channel_write_connected_unaddressed 0.067763803 0.092570627 0.0747627371 0.0069965531470506164
datagram_channel_write_unconnected_addressed_metadata 0.059443588 0.09628679 0.072932644 0.009792955583426214
datagram_channel_write_connected_addressed_metadata 0.059332421 0.091242056 0.072987651 0.008692150428514926
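
The columns above summarise repeated runs of each benchmark. The bot's exact formulas are not shown in this thread, so the following is only a sketch of how such a summary could be computed. It assumes times in seconds and a population standard deviation (dividing by n); the real report may use the sample form instead. `Summary` and `summarise` are hypothetical names, and the sample values are made up for illustration.

```swift
/// Summary of a set of timing samples, in seconds.
struct Summary {
    var min: Double
    var max: Double
    var mean: Double
    var std: Double
}

/// Assumption: population standard deviation (divide by n, not n - 1).
/// Returns nil for an empty sample set.
func summarise(_ samples: [Double]) -> Summary? {
    guard let lowest = samples.min(), let highest = samples.max() else {
        return nil
    }
    let n = Double(samples.count)
    let mean = samples.reduce(0, +) / n
    // Mean of squared deviations from the mean.
    let variance = samples.reduce(0) { $0 + ($1 - mean) * ($1 - mean) } / n
    return Summary(min: lowest, max: highest, mean: mean, std: variance.squareRoot())
}

// Made-up samples, chosen so the arithmetic is easy to check by hand.
if let summary = summarise([2, 4, 4, 4, 5, 5, 7, 9]) {
    print(summary.min, summary.max, summary.mean, summary.std)
}
```

A large std relative to the mean, as in the `datagram_channel_write_*` rows, suggests the runs vary enough that small differences between them should be read with caution.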

comparison

name current previous winner diff
write_http_headers 0.041662725 0.041754654 current 0%
http_headers_canonical_form 0.085428163 0.085106232 previous 0%
http_headers_canonical_form_trimming_whitespace 0.016327569 0.016522931 current -1%
http_headers_canonical_form_trimming_whitespace_from_short_string 0.01485141 0.014978842 current 0%
http_headers_canonical_form_trimming_whitespace_from_long_string 0.023703219 0.023846697 current 0%
bytebuffer_write_12MB_short_string_literals 0.171829906 0.165883397 previous 3%
bytebuffer_write_12MB_short_calculated_strings 0.076275591 0.074170159 previous 2%
bytebuffer_write_12MB_medium_string_literals 0.991952937 1.05653436 current -6%
bytebuffer_write_12MB_medium_calculated_strings 0.089416799 0.092970701 current -3%
bytebuffer_write_12MB_large_calculated_strings 0.143007008 0.145352455 current -1%
bytebuffer_lots_of_rw 0.040199883 0.040862078 current -1%
bytebuffer_write_http_response_ascii_only_as_string 0.034776952 0.033303627 previous 4%
bytebuffer_write_http_response_ascii_only_as_staticstring 0.02667423 0.025783209 previous 3%
bytebuffer_write_http_response_some_nonascii_as_string 0.033814384 0.033415085 previous 1%
bytebuffer_write_http_response_some_nonascii_as_staticstring 0.025510816 0.025714829 current 0%
no-net_http1_1k_reqs_1_conn 0.01085703 0.010687336 previous 1%
http1_1k_reqs_1_conn 0.060217899 0.061152645 current -1%
http1_1k_reqs_100_conns 0.089372321 0.089673268 current 0%
future_whenallsucceed_100k_immediately_succeeded_off_loop 0.070086548 0.071476632 current -1%
future_whenallsucceed_100k_immediately_succeeded_on_loop 0.070175207 0.071561295 current -1%
future_whenallsucceed_10k_deferred_off_loop 0.029955298 0.03026754 current -1%
future_whenallsucceed_10k_deferred_on_loop 0.012757189 0.012883374 current 0%
future_whenallcomplete_100k_immediately_succeeded_off_loop 0.0347632 0.035803356 current -2%
future_whenallcomplete_100k_immediately_succeeded_on_loop 0.034652291 0.035592812 current -2%
future_whenallcomplete_10k_deferred_off_loop 0.02301545 0.02306938 current 0%
future_whenallcomplete_100k_deferred_on_loop 0.071692844 0.072248607 current 0%
future_reduce_10k_futures 0.015298047 0.015524425 current -1%
future_reduce_into_10k_futures 0.01334563 0.013711641 current -2%
channel_pipeline_1m_events 0.097174929 0.097228554 current 0%
websocket_encode_50b_space_at_front_100k_frames_cow 0.044737258 0.045291944 current -1%
websocket_encode_50b_space_at_front_1m_frames_cow_masking 0.605975686 0.607334035 current 0%
websocket_encode_1kb_space_at_front_1m_frames_cow 0.471223077 0.476651184 current -1%
websocket_encode_50b_no_space_at_front_100k_frames_cow 0.0447588 0.045347631 current -1%
websocket_encode_1kb_no_space_at_front_100k_frames_cow 0.047024214 0.047565333 current -1%
websocket_encode_50b_space_at_front_100k_frames 0.065566216 0.065666789 current 0%
websocket_encode_50b_space_at_front_10k_frames_masking 0.008334063 0.008006966 previous 4%
websocket_encode_1kb_space_at_front_10k_frames 0.012748835 0.012827992 current 0%
websocket_encode_50b_no_space_at_front_100k_frames 0.064778805 0.06498286 current 0%
websocket_encode_1kb_no_space_at_front_10k_frames 0.012199704 0.012234885 current 0%
websocket_decode_125b_10k_frames 0.011557208 0.011942959 current -3%
websocket_decode_125b_with_a_masking_key_10k_frames 0.011886457 0.012278416 current -3%
websocket_decode_64kb_10k_frames 0.011865997 0.012314648 current -3%
websocket_decode_64kb_with_a_masking_key_10k_frames 0.012216301 0.012636669 current -3%
websocket_decode_64kb_+1_10k_frames 0.011873218 0.012244892 current -3%
websocket_decode_64kb_+1_with_a_masking_key_10k_frames 0.012258658 0.012632232 current -2%
circular_buffer_into_byte_buffer_1kb 0.033003023 0.032998042 previous 0%
circular_buffer_into_byte_buffer_1mb 0.064649225 0.064677653 current 0%
byte_buffer_view_iterator_1mb 0.017564919 0.017567211 current 0%
byte_buffer_view_contains_12mb 0.052876061 0.053074737 current 0%
byte_to_message_decoder_decode_many_small 0.040078499 0.040067633 previous 0%
generate_10k_random_request_keys 0.08931608 0.089737416 current 0%
bytebuffer_rw_10_uint32s 0.038834545 0.038877813 current 0%
bytebuffer_multi_rw_10_uint32s 0.057375531 0.057155477 previous 0%
lock_1_thread_10M_ops 0.15941446 0.159308348 previous 0%
lock_2_threads_10M_ops 0.886533031 0.792885972 previous 11%
lock_4_threads_10M_ops 0.940749099 0.939501409 previous 0%
lock_8_threads_10M_ops 0.835241447 0.917753059 current -8%
schedule_100k_tasks 0.073955651 0.074337226 current 0%
schedule_and_run_100k_tasks 0.404691309 0.440949714 current -8%
execute_100k_tasks 0.21332552 0.215309541 current 0%
bytebufferview_copy_to_array_100k_times_1kb 0.010827164 0.011545535 current -6%
circularbuffer_copy_to_array_10k_times_1kb 0.021153188 0.021215125 current 0%
deadline_now_1M_times 0.027116441 0.02699915 previous 0%
datagram_channel_bootstrap_create 0.012875235 n/a n/a n/a
datagram_channel_bind 0.091323155 n/a n/a n/a
datagram_channel_connect 0.095020046 n/a n/a n/a
datagram_channel_write_unconnected_addressed 0.061169251 n/a n/a n/a
datagram_channel_write_connected_addressed 0.068652302 n/a n/a n/a
datagram_channel_write_connected_unaddressed 0.067763803 n/a n/a n/a
datagram_channel_write_unconnected_addressed_metadata 0.059443588 n/a n/a n/a
datagram_channel_write_connected_addressed_metadata 0.059332421 n/a n/a n/a
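
The current and previous columns match the min column of each run, and the diff values are consistent with a change relative to previous, truncated toward zero. That reading is inferred from the rows, not from the bot's source, so treat the following as a sketch under those assumptions. `Winner`, `Comparison` and `compare` are hypothetical names, and the tie-breaking rule is a guess.

```swift
enum Winner: String {
    case current, previous
}

struct Comparison {
    var winner: Winner
    var diffPercent: Int
}

/// Assumptions: the lower time wins (ties go to current), and the diff is
/// (current - previous) / previous as a percentage, truncated toward zero.
func compare(current: Double, previous: Double) -> Comparison {
    precondition(previous > 0, "previous must be a positive duration")
    let winner: Winner = current <= previous ? .current : .previous
    let diff = (current - previous) / previous * 100
    return Comparison(winner: winner, diffPercent: Int(diff))
}

// Values taken from the lock_2_threads_10M_ops and lock_8_threads_10M_ops rows.
let lock2 = compare(current: 0.886533031, previous: 0.792885972)
print(lock2.winner.rawValue, "\(lock2.diffPercent)%")
let lock8 = compare(current: 0.835241447, previous: 0.917753059)
print(lock8.winner.rawValue, "\(lock8.diffPercent)%")
```

Under these assumptions the two calls reproduce the 11% and -8% shown for those rows. Truncation also explains why many small regressions appear as 0%.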

significant differences found

@simonjbeaumont marked this pull request as ready for review on July 1, 2022 08:50
@simonjbeaumont
Contributor Author

@Lukasa I think this is ready for a review.

@Lukasa added the semver/none label (No version bump required.) on Jul 4, 2022
Contributor

@Lukasa left a comment

Thanks for this! I have some notes in the diff.

}
}

final class DatagramBootstrapCreateBenchmark: DatagramClientBenchmark, Benchmark {
Contributor

Does this need to inherit from DatagramClientBenchmark? It doesn't seem to use much of the base class.

Contributor Author

You're right that, throughout this PR, some of these benchmark classes inherit very little from the base. This one inherits almost nothing.

I went this way because I planned to port this code over to the allocation test framework, where it would be nice for everything to share some common scaffolding.

It also makes it clear that the setup is the same for every benchmark, and that each one benchmarks a different part of the flow in the critical loop. This is motivated by the discussion in issue #2187. It would be nice to make it very clear that in all of these tests we establish a control, then change and measure just one thing.

WDYT?
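
To make the trade-off concrete, here is a minimal sketch of the "shared scaffolding, one varied step" idea. It does not reproduce the PR's classes: the `Benchmark` protocol below is a local stand-in with an assumed shape, and `ScaffoldedBenchmark`, `BindOnlyBenchmark`, `WriteOnlyBenchmark` and `measure` are hypothetical names. Recording steps in a log stands in for real channel operations.

```swift
/// Local stand-in for the tester's benchmark protocol (assumed shape).
protocol Benchmark: AnyObject {
    func setUp() throws
    func tearDown()
    func run() throws -> Int
}

/// Hypothetical base class: identical setup for every benchmark (the control).
class ScaffoldedBenchmark {
    let iterations: Int
    private(set) var log: [String] = []

    init(iterations: Int) {
        self.iterations = iterations
    }

    func setUp() throws { log.append("setUp") }
    func tearDown() { log.append("tearDown") }
    func record(_ step: String) { log.append(step) }
}

/// Each subclass changes only what happens inside the measured loop.
final class BindOnlyBenchmark: ScaffoldedBenchmark, Benchmark {
    func run() throws -> Int {
        for _ in 0..<iterations { record("bind") }
        return iterations
    }
}

final class WriteOnlyBenchmark: ScaffoldedBenchmark, Benchmark {
    func run() throws -> Int {
        for _ in 0..<iterations { record("write") }
        return iterations
    }
}

/// Runs setup, the measured loop, then teardown, in that order.
func measure(_ benchmark: Benchmark) throws -> Int {
    try benchmark.setUp()
    defer { benchmark.tearDown() }
    return try benchmark.run()
}
```

The cost the reviewer points at is visible here too: a subclass that needs none of the shared state still pays for the common setup, which is harmless only if that setup stays outside the measured loop.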


func run() throws -> Int {
    for _ in 1...self.iterations {
        try! self.clientBootstrap.bind(to: self.localhostPickPort).flatMap { $0.close() }.wait()
Contributor

Same here, I don't think this inheritance serves us for this benchmark.

}
}

final class DatagramChannelConnectBenchmark: DatagramClientBenchmark, Benchmark {
Contributor

Nor does it really serve us much here, I think.


func run() throws -> Int {
    for _ in 1...self.iterations {
        try! self.clientChannel.writeAndFlush(self.payload).wait()
Contributor

I think this needs some signal to delay the finishing of the test until the server has read everything. Otherwise this test will be very variable based on how fast the server is reading.

Contributor Author

Ah, that's an interesting one. I noticed that the current allocation benchmark we have waits for the server to read everything. However, I found that when I increased the iterations in here to get the runtime we want for the perf benchmarks, I would see some non-uniformity and some lost packets, even on localhost. IIUC there's nothing to guarantee that all the packets would arrive, even on localhost.

Before opening this PR, I had something in here to assert-at-least-one-echo-response-was-received but I thought better of it.

This test currently measures the client sending out the datagrams, at which point we've measured all of NIO's involvement in getting the packets out the door.

You're right that we're missing some test of the read path. We could consider adding a test where the client continually sends payloads and we fulfil a promise only when the server has seen a given number of responses?

WDYT?

Contributor

I think that latter idea is the way to go.
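
One way to picture the agreed approach is a small counting gate that fires once the receiver has seen enough datagrams. This is only a sketch of the counting logic, with no networking: `ReadCountGate` and `simulate` are hypothetical names. In NIO the call to `didRead()` would presumably sit in a `ChannelInboundHandler`'s `channelRead`, and the callback would complete an `EventLoopPromise`. Setting the threshold below the number sent is an assumption made here to tolerate the lost packets mentioned above; a real test would likely also want a timeout.

```swift
/// Hypothetical helper: fires `onReached` once `threshold` datagrams are seen.
final class ReadCountGate {
    private let threshold: Int
    private var seen = 0
    private var onReached: (() -> Void)?

    init(threshold: Int, onReached: @escaping () -> Void) {
        precondition(threshold > 0, "threshold must be positive")
        self.threshold = threshold
        self.onReached = onReached
    }

    /// Call once per datagram read. Not thread-safe: assumed to be called
    /// from a single event loop.
    func didRead() {
        seen += 1
        if seen >= threshold, let callback = onReached {
            onReached = nil   // fire at most once
            callback()
        }
    }
}

/// Simulates sending `sent` datagrams where every `dropEvery`-th one is lost,
/// and reports whether the gate fired.
func simulate(sent: Int, dropEvery: Int, threshold: Int) -> Bool {
    var finished = false
    let gate = ReadCountGate(threshold: threshold) { finished = true }
    for index in 0..<sent where index % dropEvery != 0 {
        gate.didRead()
    }
    return finished
}

// 5% loss against a 90% threshold: the gate fires.
print(simulate(sent: 1_000, dropEvery: 20, threshold: 900))
```

Waiting for exactly as many datagrams as were sent would hang whenever a single packet is dropped, which is why the threshold is a separate parameter in this sketch.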

Labels
semver/none No version bump required.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add performance tests for connected-mode UDP flows
3 participants