NIOPerformanceTester: Add benchmarks for datagram channel #2198

simonjbeaumont · 2022-06-17T14:42:40Z

Motivation:

#2084 recently added preliminary support for connected datagram sockets, which presents opportunities for performance optimisations in some use cases. Before we start optimising these flows, there should be some benchmarks in place.

As described in #2187, connected-mode UDP introduces two additional dimensions:

- Whether to connect the socket.
- Whether to continue to use AddressedEnvelope (when connected).

That issue also explains the kinds of flows we're interested in benchmarking.
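To make the two dimensions concrete, here is a minimal, dependency-free sketch that enumerates the write flows they produce. `WriteFlow` and its members are hypothetical names used only for illustration; they are not part of NIO or of this PR. The sketch assumes that an unconnected socket always needs a destination address per write, which rules out the unconnected/unaddressed combination.

```swift
/// Hypothetical model of the two dimensions: whether the socket is connected,
/// and whether each write still carries an explicit destination address.
struct WriteFlow: Equatable {
    let connected: Bool
    let addressed: Bool

    /// An unconnected socket has no default peer, so every write must be addressed.
    var isValid: Bool { connected || addressed }

    /// A label in the style of the benchmark names used by the perf tester.
    var label: String {
        let connectedPart = connected ? "connected" : "unconnected"
        let addressedPart = addressed ? "addressed" : "unaddressed"
        return connectedPart + "_" + addressedPart
    }
}

/// Enumerate every combination of the two dimensions, keeping only the valid ones.
func validWriteFlows() -> [WriteFlow] {
    var flows: [WriteFlow] = []
    for connected in [false, true] {
        for addressed in [true, false] {
            let flow = WriteFlow(connected: connected, addressed: addressed)
            if flow.isValid {
                flows.append(flow)
            }
        }
    }
    return flows
}

let flowLabels = validWriteFlows().map { $0.label }
print(flowLabels)
// ["unconnected_addressed", "connected_addressed", "connected_unaddressed"]
```

Three of the four combinations survive, which lines up with the three plain `datagram_channel_write_*` benchmarks in the report.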

Modifications:

Add benchmarks to NIOPerformanceTester for various datagram channel flows.

Result:

Benchmarks exist to help assess future optimisation proposals to these flows.
Resolves #2187 (Add performance tests for connected-mode UDP flows).

simonjbeaumont · 2022-06-17T18:25:36Z

@swift-nio-bot test perf please

swift-server-bot · 2022-06-17T18:29:23Z

performance report

build id: 145

timestamp: Fri Jun 17 18:29:19 UTC 2022

results

name	min	max	mean	std
write_http_headers	0.041662725	0.041954117	0.0417399599	0.00011345406113925086
http_headers_canonical_form	0.085428163	0.08689467	0.0860069001	0.000418608187151565
http_headers_canonical_form_trimming_whitespace	0.016327569	0.016840279	0.016400108599999998	0.00015620830807553803
http_headers_canonical_form_trimming_whitespace_from_short_string	0.01485141	0.015482014	0.014981323	0.00018809702040938593
http_headers_canonical_form_trimming_whitespace_from_long_string	0.023703219	0.024245214	0.023769357999999997	0.0001673811408213805
bytebuffer_write_12MB_short_string_literals	0.171829906	0.179072549	0.17415634289999998	0.002313312308401836
bytebuffer_write_12MB_short_calculated_strings	0.076275591	0.077013078	0.0765971128	0.0002671916913906138
bytebuffer_write_12MB_medium_string_literals	0.991952937	0.997821391	0.9950064941999999	0.0019974303613217626
bytebuffer_write_12MB_medium_calculated_strings	0.089416799	0.090976292	0.090223944	0.000530345736136971
bytebuffer_write_12MB_large_calculated_strings	0.143007008	0.149249929	0.1441605813	0.0018567326639665862
bytebuffer_lots_of_rw	0.040199883	0.04083413	0.040418492900000005	0.00019752172964143658
bytebuffer_write_http_response_ascii_only_as_string	0.034776952	0.035405193	0.035038620300000003	0.0002273467003274711
bytebuffer_write_http_response_ascii_only_as_staticstring	0.02667423	0.027356333	0.026904085799999998	0.0001875688331731996
bytebuffer_write_http_response_some_nonascii_as_string	0.033814384	0.034402253	0.0339735787	0.00021536782879532152
bytebuffer_write_http_response_some_nonascii_as_staticstring	0.025510816	0.026161096	0.0257119154	0.00019938907724134568
no-net_http1_1k_reqs_1_conn	0.01085703	0.011351992	0.0109769514	0.00014108999716823766
http1_1k_reqs_1_conn	0.060217899	0.06234237	0.0613679059	0.0006652131084516292
http1_1k_reqs_100_conns	0.089372321	0.090349762	0.08959744430000001	0.0002859719582406822
future_whenallsucceed_100k_immediately_succeeded_off_loop	0.070086548	0.071362458	0.070609967	0.0003784833826632585
future_whenallsucceed_100k_immediately_succeeded_on_loop	0.070175207	0.078099572	0.0719849875	0.0025632567743309282
future_whenallsucceed_10k_deferred_off_loop	0.029955298	0.030688631	0.0301279381	0.0002238622081129621
future_whenallsucceed_10k_deferred_on_loop	0.012757189	0.013343613	0.0128745472	0.00017032740312951562
future_whenallcomplete_100k_immediately_succeeded_off_loop	0.0347632	0.035306709	0.034918460899999997	0.00019230192644233447
future_whenallcomplete_100k_immediately_succeeded_on_loop	0.034652291	0.035418259	0.0349014604	0.0002491877178956545
future_whenallcomplete_10k_deferred_off_loop	0.02301545	0.023577227	0.0231371461	0.00016886669002398147
future_whenallcomplete_100k_deferred_on_loop	0.071692844	0.074537799	0.0724980266	0.0008404933087374874
future_reduce_10k_futures	0.015298047	0.015823573	0.0154090788	0.00015099190882648876
future_reduce_into_10k_futures	0.01334563	0.013995041	0.013500395100000001	0.00021623154007192793
channel_pipeline_1m_events	0.097174929	0.097292591	0.0972392551	4.490477505529383e-05
websocket_encode_50b_space_at_front_100k_frames_cow	0.044737258	0.045191195	0.0448347607	0.00018379124404428225
websocket_encode_50b_space_at_front_1m_frames_cow_masking	0.605975686	0.614835793	0.6086842936	0.003361086806587107
websocket_encode_1kb_space_at_front_1m_frames_cow	0.471223077	0.471874009	0.4714554197	0.0002547513720883489
websocket_encode_50b_no_space_at_front_100k_frames_cow	0.0447588	0.045239332	0.0448657257	0.0001922333654037611
websocket_encode_1kb_no_space_at_front_100k_frames_cow	0.047024214	0.047487858	0.0471760697	0.0002097948935842129
websocket_encode_50b_space_at_front_100k_frames	0.065566216	0.066013743	0.06572203169999999	0.0001960908099291691
websocket_encode_50b_space_at_front_10k_frames_masking	0.008334063	0.008833188	0.0084121422	0.00014886420863532735
websocket_encode_1kb_space_at_front_10k_frames	0.012748835	0.012835852	0.0127703509	2.667295678085455e-05
websocket_encode_50b_no_space_at_front_100k_frames	0.064778805	0.065260351	0.0649391168	0.0002123368906603097
websocket_encode_1kb_no_space_at_front_10k_frames	0.012199704	0.012642312	0.012259744	0.00013541699506749899
websocket_decode_125b_10k_frames	0.011557208	0.011725306	0.011643592999999999	5.579105769047794e-05
websocket_decode_125b_with_a_masking_key_10k_frames	0.011886457	0.012551941	0.011995295100000001	0.00019918137190368332
websocket_decode_64kb_10k_frames	0.011865997	0.012305131	0.011972299799999999	0.0001275894228686516
websocket_decode_64kb_with_a_masking_key_10k_frames	0.012216301	0.012325203	0.0122574825	3.8014819795121245e-05
websocket_decode_64kb_+1_10k_frames	0.011873218	0.012379003	0.0119486826	0.00015304052431968596
websocket_decode_64kb_+1_with_a_masking_key_10k_frames	0.012258658	0.012739337	0.0123750879	0.0001432503810407576
circular_buffer_into_byte_buffer_1kb	0.033003023	0.033443884	0.033059138700000004	0.0001357714591829883
circular_buffer_into_byte_buffer_1mb	0.064649225	0.065173264	0.0648183481	0.0002241563442696228
byte_buffer_view_iterator_1mb	0.017564919	0.018072208	0.0176481632	0.00015418835839380919
byte_buffer_view_contains_12mb	0.052876061	0.053382761	0.0530510339	0.0002250170668759007
byte_to_message_decoder_decode_many_small	0.040078499	0.040637553	0.0402146081	0.00020313974447680163
generate_10k_random_request_keys	0.08931608	0.089666493	0.0894665862	0.00013491582260044627
bytebuffer_rw_10_uint32s	0.038834545	0.05205646	0.0417507431	0.003688137038442894
bytebuffer_multi_rw_10_uint32s	0.057375531	0.067059859	0.059650549100000005	0.0030920550751587514
lock_1_thread_10M_ops	0.15941446	0.166886246	0.1608610881	0.0023665043693615285
lock_2_threads_10M_ops	0.886533031	0.952784335	0.9107198442	0.019006729394206272
lock_4_threads_10M_ops	0.940749099	0.979265196	0.9652099404	0.011636315685027038
lock_8_threads_10M_ops	0.835241447	0.908570668	0.8920025543	0.021813553831235136
schedule_100k_tasks	0.073955651	0.113300493	0.0818714579	0.01247563908324537
schedule_and_run_100k_tasks	0.404691309	0.416430189	0.4085794434	0.0036160657093091348
execute_100k_tasks	0.21332552	0.213667242	0.2134483496	0.00013655915291933306
bytebufferview_copy_to_array_100k_times_1kb	0.010827164	0.010905942	0.010853081300000001	2.2478568519320367e-05
circularbuffer_copy_to_array_10k_times_1kb	0.021153188	0.021597117	0.0212081326	0.0001368975463567147
deadline_now_1M_times	0.027116441	0.027432526	0.027170087	9.424844404138531e-05
datagram_channel_bootstrap_create	0.012875235	0.013402205	0.0129658401	0.00015514791938981622
datagram_channel_bind	0.091323155	0.093014746	0.09207692619999999	0.0006473000119240261
datagram_channel_connect	0.095020046	0.096223019	0.0958040854	0.00036551044288355865
datagram_channel_write_unconnected_addressed	0.061169251	0.107407391	0.0772126829	0.0128148027770061
datagram_channel_write_connected_addressed	0.068652302	0.09341905	0.0754894025	0.007017704983422118
datagram_channel_write_connected_unaddressed	0.067763803	0.092570627	0.0747627371	0.0069965531470506164
datagram_channel_write_unconnected_addressed_metadata	0.059443588	0.09628679	0.072932644	0.009792955583426214
datagram_channel_write_connected_addressed_metadata	0.059332421	0.091242056	0.072987651	0.008692150428514926

comparison

name	current	previous	winner	diff
write_http_headers	0.041662725	0.041754654	current	0%
http_headers_canonical_form	0.085428163	0.085106232	previous	0%
http_headers_canonical_form_trimming_whitespace	0.016327569	0.016522931	current	-1%
http_headers_canonical_form_trimming_whitespace_from_short_string	0.01485141	0.014978842	current	0%
http_headers_canonical_form_trimming_whitespace_from_long_string	0.023703219	0.023846697	current	0%
bytebuffer_write_12MB_short_string_literals	0.171829906	0.165883397	previous	3%
bytebuffer_write_12MB_short_calculated_strings	0.076275591	0.074170159	previous	2%
bytebuffer_write_12MB_medium_string_literals	0.991952937	1.05653436	current	-6%
bytebuffer_write_12MB_medium_calculated_strings	0.089416799	0.092970701	current	-3%
bytebuffer_write_12MB_large_calculated_strings	0.143007008	0.145352455	current	-1%
bytebuffer_lots_of_rw	0.040199883	0.040862078	current	-1%
bytebuffer_write_http_response_ascii_only_as_string	0.034776952	0.033303627	previous	4%
bytebuffer_write_http_response_ascii_only_as_staticstring	0.02667423	0.025783209	previous	3%
bytebuffer_write_http_response_some_nonascii_as_string	0.033814384	0.033415085	previous	1%
bytebuffer_write_http_response_some_nonascii_as_staticstring	0.025510816	0.025714829	current	0%
no-net_http1_1k_reqs_1_conn	0.01085703	0.010687336	previous	1%
http1_1k_reqs_1_conn	0.060217899	0.061152645	current	-1%
http1_1k_reqs_100_conns	0.089372321	0.089673268	current	0%
future_whenallsucceed_100k_immediately_succeeded_off_loop	0.070086548	0.071476632	current	-1%
future_whenallsucceed_100k_immediately_succeeded_on_loop	0.070175207	0.071561295	current	-1%
future_whenallsucceed_10k_deferred_off_loop	0.029955298	0.03026754	current	-1%
future_whenallsucceed_10k_deferred_on_loop	0.012757189	0.012883374	current	0%
future_whenallcomplete_100k_immediately_succeeded_off_loop	0.0347632	0.035803356	current	-2%
future_whenallcomplete_100k_immediately_succeeded_on_loop	0.034652291	0.035592812	current	-2%
future_whenallcomplete_10k_deferred_off_loop	0.02301545	0.02306938	current	0%
future_whenallcomplete_100k_deferred_on_loop	0.071692844	0.072248607	current	0%
future_reduce_10k_futures	0.015298047	0.015524425	current	-1%
future_reduce_into_10k_futures	0.01334563	0.013711641	current	-2%
channel_pipeline_1m_events	0.097174929	0.097228554	current	0%
websocket_encode_50b_space_at_front_100k_frames_cow	0.044737258	0.045291944	current	-1%
websocket_encode_50b_space_at_front_1m_frames_cow_masking	0.605975686	0.607334035	current	0%
websocket_encode_1kb_space_at_front_1m_frames_cow	0.471223077	0.476651184	current	-1%
websocket_encode_50b_no_space_at_front_100k_frames_cow	0.0447588	0.045347631	current	-1%
websocket_encode_1kb_no_space_at_front_100k_frames_cow	0.047024214	0.047565333	current	-1%
websocket_encode_50b_space_at_front_100k_frames	0.065566216	0.065666789	current	0%
websocket_encode_50b_space_at_front_10k_frames_masking	0.008334063	0.008006966	previous	4%
websocket_encode_1kb_space_at_front_10k_frames	0.012748835	0.012827992	current	0%
websocket_encode_50b_no_space_at_front_100k_frames	0.064778805	0.06498286	current	0%
websocket_encode_1kb_no_space_at_front_10k_frames	0.012199704	0.012234885	current	0%
websocket_decode_125b_10k_frames	0.011557208	0.011942959	current	-3%
websocket_decode_125b_with_a_masking_key_10k_frames	0.011886457	0.012278416	current	-3%
websocket_decode_64kb_10k_frames	0.011865997	0.012314648	current	-3%
websocket_decode_64kb_with_a_masking_key_10k_frames	0.012216301	0.012636669	current	-3%
websocket_decode_64kb_+1_10k_frames	0.011873218	0.012244892	current	-3%
websocket_decode_64kb_+1_with_a_masking_key_10k_frames	0.012258658	0.012632232	current	-2%
circular_buffer_into_byte_buffer_1kb	0.033003023	0.032998042	previous	0%
circular_buffer_into_byte_buffer_1mb	0.064649225	0.064677653	current	0%
byte_buffer_view_iterator_1mb	0.017564919	0.017567211	current	0%
byte_buffer_view_contains_12mb	0.052876061	0.053074737	current	0%
byte_to_message_decoder_decode_many_small	0.040078499	0.040067633	previous	0%
generate_10k_random_request_keys	0.08931608	0.089737416	current	0%
bytebuffer_rw_10_uint32s	0.038834545	0.038877813	current	0%
bytebuffer_multi_rw_10_uint32s	0.057375531	0.057155477	previous	0%
lock_1_thread_10M_ops	0.15941446	0.159308348	previous	0%
lock_2_threads_10M_ops	0.886533031	0.792885972	previous	11%
lock_4_threads_10M_ops	0.940749099	0.939501409	previous	0%
lock_8_threads_10M_ops	0.835241447	0.917753059	current	-8%
schedule_100k_tasks	0.073955651	0.074337226	current	0%
schedule_and_run_100k_tasks	0.404691309	0.440949714	current	-8%
execute_100k_tasks	0.21332552	0.215309541	current	0%
bytebufferview_copy_to_array_100k_times_1kb	0.010827164	0.011545535	current	-6%
circularbuffer_copy_to_array_10k_times_1kb	0.021153188	0.021215125	current	0%
deadline_now_1M_times	0.027116441	0.02699915	previous	0%
datagram_channel_bootstrap_create	0.012875235	n/a	n/a	n/a%
datagram_channel_bind	0.091323155	n/a	n/a	n/a%
datagram_channel_connect	0.095020046	n/a	n/a	n/a%
datagram_channel_write_unconnected_addressed	0.061169251	n/a	n/a	n/a%
datagram_channel_write_connected_addressed	0.068652302	n/a	n/a	n/a%
datagram_channel_write_connected_unaddressed	0.067763803	n/a	n/a	n/a%
datagram_channel_write_unconnected_addressed_metadata	0.059443588	n/a	n/a	n/a%
datagram_channel_write_connected_addressed_metadata	0.059332421	n/a	n/a	n/a%

significant differences found
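For readers interpreting the comparison table: the `current` column matches the `min` column of the results table, and the `diff` values are consistent with a percentage change relative to `previous`, truncated toward zero. The sketch below reproduces that arithmetic. It is inferred from the numbers in the report, not taken from the bot's source, and the names `Comparison` and `compare` are hypothetical.

```swift
/// Outcome of comparing one benchmark between two runs (lower time is better).
struct Comparison: Equatable {
    enum Winner: String {
        case current
        case previous
    }

    let winner: Winner
    let diffPercent: Int
}

/// Compare two timings.
/// Assumption: the percentage is relative to `previous` and truncated toward zero,
/// which is what the rows in the report are consistent with.
/// Assumption: a tie is credited to `current`; the report contains no tie to check against.
func compare(current: Double, previous: Double) -> Comparison {
    let change = (current - previous) / previous * 100
    return Comparison(
        winner: current <= previous ? .current : .previous,
        diffPercent: Int(change) // Int(_:) drops the fractional part
    )
}

// Values copied from the lock_2_threads_10M_ops and lock_8_threads_10M_ops rows.
let lock2 = compare(current: 0.886533031, previous: 0.792885972)
let lock8 = compare(current: 0.835241447, previous: 0.917753059)
print(lock2.winner.rawValue, "\(lock2.diffPercent)%") // previous 11%
print(lock8.winner.rawValue, "\(lock8.diffPercent)%") // current -8%
```

Because the comparison uses a single minimum sample per benchmark, noisy benchmarks such as the lock tests can swing by several percent between runs without any code change.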

Signed-off-by: Si Beaumont <[email protected]>

simonjbeaumont · 2022-07-01T08:54:55Z

@Lukasa I think this is ready for a review.

Lukasa

Thanks for this! I have some notes in the diff.

Lukasa · 2022-07-04T14:52:16Z

Sources/NIOPerformanceTester/DatagramChannelBenchmark.swift

+    }
+}
+
+final class DatagramBootstrapCreateBenchmark: DatagramClientBenchmark, Benchmark {


Does this need to inherit? It doesn't seem to use this well.

simonjbeaumont · 2022-07-14

You're right that, throughout this PR, some of these benchmark classes inherit very little from the base. This one inherits almost nothing.

I went this way because I planned to port this code over to the allocation test framework, where it would be nice for everything to share some common scaffolding.

It also makes it clear that everything is set up the same way for every benchmark, and that each one just benchmarks a different part of the flow in the critical loop. This is motivated by the discussion in issue #2187. It would be nice to make it very clear that in all of these tests we establish a control, then change and measure just one thing.

WDYT?
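The "shared control, one measured step" idea can be sketched without any NIO types. Everything here is a stand-in: the `Benchmark` protocol shape is an assumption for this sketch, and `ScaffoldedBenchmark`, `BindOnlyBenchmark`, and `drive` are hypothetical names, not the classes in this PR.

```swift
/// Stand-in for the harness protocol; its exact shape is assumed for this sketch.
protocol Benchmark: AnyObject {
    func setUp() throws
    func tearDown()
    func run() throws -> Int
}

/// Hypothetical shared scaffolding: every benchmark gets the same control setup.
class ScaffoldedBenchmark {
    let iterations: Int
    private(set) var log: [String] = []

    init(iterations: Int) {
        self.iterations = iterations
    }

    func setUp() throws { self.log.append("setUp") }
    func tearDown() { self.log.append("tearDown") }

    /// Records one unit of measured work; a real benchmark would do I/O here.
    func record(_ step: String) { self.log.append(step) }
}

/// Only the body of `run()` differs between benchmarks.
final class BindOnlyBenchmark: ScaffoldedBenchmark, Benchmark {
    func run() throws -> Int {
        for _ in 0..<self.iterations {
            self.record("bind")
        }
        return self.iterations
    }
}

/// Minimal driver: identical setup and teardown around the measured loop.
func drive(_ benchmark: Benchmark) throws -> Int {
    try benchmark.setUp()
    defer { benchmark.tearDown() }
    return try benchmark.run()
}

let bindBenchmark = BindOnlyBenchmark(iterations: 3)
let bindCount = try drive(bindBenchmark)
print(bindCount, bindBenchmark.log)
```

The trade-off raised in the review still applies: a subclass that uses almost none of the base pays for setup it does not need, so the shared base is mainly worth it when the setup really is the control.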

Lukasa · 2022-07-04T14:58:14Z

Sources/NIOPerformanceTester/DatagramChannelBenchmark.swift

+
+    func run() throws -> Int {
+        for _ in 1...self.iterations {
+            try! self.clientBootstrap.bind(to: self.localhostPickPort).flatMap { $0.close() }.wait()


Same here, I don't think this inheritance serves us for this benchmark.

Lukasa · 2022-07-04T14:58:43Z

Sources/NIOPerformanceTester/DatagramChannelBenchmark.swift

+    }
+}
+
+final class DatagramChannelConnectBenchmark: DatagramClientBenchmark, Benchmark {


Nor does it really serve us much here, I think.

Lukasa · 2022-07-04T14:59:18Z

Sources/NIOPerformanceTester/DatagramChannelBenchmark.swift

+
+    func run() throws -> Int {
+        for _ in 1...self.iterations {
+            try! self.clientChannel.writeAndFlush(self.payload).wait()


I think this needs some signal to delay the finishing of the test until the server has read everything. Otherwise this test will be very variable based on how fast the server is reading.

simonjbeaumont · 2022-07-14

Ah, that's an interesting one. I noticed that the current allocation benchmark we have waits for the server to read everything. However, when I increased the iterations in here to get the runtime we want for the perf benchmarks, I saw some non-uniformity and some lost packets, even on localhost. IIUC there's nothing to guarantee that all the packets will arrive, even on localhost.

Before opening this PR, I had something in here to assert-at-least-one-echo-response-was-received but I thought better of it.

This test currently measures the client sending out the datagrams, at which point we've measured all of NIO's involvement in getting the packets out the door.

You're right that we're missing some test of the read path. We could consider adding a test where the client continually sends payloads and we fulfil a promise only when the server has seen a given number of responses?

WDYT?

Lukasa · 2022-07-14

I think that latter idea is the way to go.
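A dependency-free sketch of that idea follows. `ReadCounter` is a hypothetical stand-in for a server-side inbound handler, and its completion closure stands in for succeeding a promise; neither is code from this PR. It is not thread-safe, on the assumption that, as with a NIO channel handler, all calls arrive on a single event loop.

```swift
/// Hypothetical stand-in for a server-side handler that counts datagrams
/// and signals completion once a target number has been read.
final class ReadCounter {
    private let expected: Int
    private var seen = 0
    private var onComplete: ((Int) -> Void)?

    init(expected: Int, onComplete: @escaping (Int) -> Void) {
        precondition(expected > 0, "expected must be positive")
        self.expected = expected
        self.onComplete = onComplete
    }

    /// Call once per datagram read by the server.
    func datagramRead() {
        self.seen += 1
        if self.seen == self.expected, let complete = self.onComplete {
            self.onComplete = nil // fire at most once
            complete(self.seen)
        }
    }
}

var completions: [Int] = []
let counter = ReadCounter(expected: 3) { completions.append($0) }

counter.datagramRead()
counter.datagramRead()
let firedEarly = !completions.isEmpty // false: only two of three reads so far

counter.datagramRead() // third read completes the signal
counter.datagramRead() // later reads do not fire it again
print(firedEarly, completions)
```

Because datagrams can be dropped even on localhost, the client in such a benchmark would keep sending until the signal fires, rather than sending exactly the expected number and waiting.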

NIOPerformanceTester: Add benchmarks for UDP flows

ee9ea6e

Signed-off-by: Si Beaumont <[email protected]>

simonjbeaumont force-pushed the sb/udp-perf-tests branch from 8e1bf6a to ee9ea6e Compare June 20, 2022 08:13

simonjbeaumont added 4 commits June 20, 2022 09:13

Merge branch 'main' into sb/udp-perf-tests

c504ba1

fixup: Wait for each write to finish

587d3be

Signed-off-by: Si Beaumont <[email protected]>

Merge branch 'main' into sb/udp-perf-tests

dd06c90

Merge branch 'main' into sb/udp-perf-tests

5c19d6c

simonjbeaumont marked this pull request as ready for review July 1, 2022 08:50

Lukasa added the semver/none label (No version bump required.) Jul 4, 2022

Lukasa requested changes Jul 4, 2022
