Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add gun:ping/2,3 for user-initiated ping for HTTP/2 #343

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

zuiderkwast
Copy link
Contributor

This feature allows a user to explicitly send a ping frame and receive a ping ack. The use case is to test the latency and health of existing long-lived API connections using some admin tool.

A request, such as OPTIONS or HEAD, can also be used for this purpose, but it may be handled differently by middleware.

This PR implements it only for HTTP/2, which is the use case we have. For other protocols, an error tuple is sent to the reply-to process.

This use case is mentioned in RFC 7540, section 6.7:

The PING frame (type=0x6) is a mechanism for measuring a minimal
round-trip time from the sender, as well as determining whether an
idle connection is still functional.

Section 8.1.4:

The PING frame allows a client to safely test whether a connection is
still active without sending a request.

bjosv pushed a commit to Nordix/gun that referenced this pull request Dec 19, 2024
@essen essen added this to the 2.2.0 milestone Jan 15, 2025
src/gun.erl Outdated
@@ -782,6 +800,8 @@ await(ServerPid, StreamRef, Timeout, MRef) ->
{up, Protocol};
{gun_notify, ServerPid, Type, Info} ->
{notify, Type, Info};
{gun_ping_ack, ServerPid, StreamRef} ->
ping_ack;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no need for a new message, we can use gun_notify.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't work the same when using gun:await to wait for a specific reply such as this one. The PingRef is matched like a StreamRef for a request. In the test case, I do this to wait :

        PingRef = gun:ping(Pid),
        ping_ack = gun:await(Pid, PingRef, 1000),

If we instead use gun_notify, then gun:await returns doesn't wait specifically for this ping. It may return the reply of a previous ping or some other notify message and the caller would have to check that the Info in {notify, Type = ping_ack, Info} matches the PingRef we want. It's less convenient, or WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had not seen that PingRef. The problem with doing it that way is that it circumvents a stream-specific concept (StreamRef) and hijacks it for a connection-specific thing (pings). So I do not like that approach.

I also wonder if gun:await would be useful for this outside of tests. I would say it's OK to use gun_notify. Worst case, the message is documented, so a selective receive lets you receive exactly the message you are looking for if necessary. In the case of pings I think that would be {gun_notify, ServerPid, ping_ack, PingRef}.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, selective receive is probably fine.

Are you OK with hijacking the stream ref mechanism for the rest of this implementation?

I was thinking like this: A ping is similar to a request, so the stream ref is a nice abstraction for a user. In HTTP/2 it's a connection level message but it could have been a request. It's a protocol detail.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In HTTP/3 it is a QUIC frame, so it sits even below the protocol. I don't think it should be described or used like a StreamRef. It's a PingRef. No problem having two references used for different purposes. There's no hijacking that I could see outside of the message so if gun_notify is used I'm not sure I see a problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, makes sense. I hijacked the stream error format too for the {gun_error, ServerPid, StreamRef, not_supported_by_protocol} error, so gun:await for the PingRef would catch it. I'll remove this special case and treat it as a connection error so it causes a disconnect. Less special cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used dereference_stream_ref on the PingRef too. I guess it works for tunneling.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say for tunneling you would have a StreamRef refering to the tunneled connection, but the PingRef would be kept separate, since they're not the same thing.

I don't know whether using ping on a protocol that doesn't support it should cause a disconnect (it can if that's all we have today), but it certainly should cause a connection error forwarded to the user.

src/gun.erl Outdated
commands(Commands, State);
false ->
ReplyTo ! {gun_error, self(), StreamRef, not_supported_for_protocol}
end;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better not have optional callbacks for protocols. Rather make them return an error, and check the return value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a has_keepalive() -> boolean() just to check if a protocol supports keepalive. Shall we follow this style and add has_ping() -> boolean()?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think they are different, the only difference will be whether they have been implemented, and that can be signified via the return value.

We need has_keepalive because we must start a timer. We don't need that for user pings, we can just call the function directly and propagate the error if any.

I think we should make a difference between protocols that will implement it in the future (http3 and ws are likely candidates, would be not_implemented) and protocols that won't (http, raw, socks... would be unsupported_by_protocol or something less verbose).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They can just return {error, not_implemented} and {error, unsupported_by_protocol} and we don't even need to check the return value here. Then they'll be handled like any connection error and it leads to a disconnect. Does this seem OK?

Alternatively, I can wrap it in another tuple like {ok, Commands} | {error, UnsupportedReason}. I don't see the need for this now but I can do it if you prefer it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error reasons seem OK to me.

Payload = case erlang:monotonic_time(microsecond) band 16#ffffffffffffffff of
0 -> 1;
Payload0 -> Payload0
end,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just have a counter that starts at 1 (or 1001, or other...) and increment every time, it'll save us trouble.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I wanted avoid wasting memory to store a counter for a feature that almost nobody will use.

What trouble do you anticipate?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Future changes may use more than 0 for keepalive and we don't want conflicts to happen. The counter can be stored in user_pings when user pings are used, and user_pings can be set to false when they are not used.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

erlang:unique_integer([monotonic, positive]) is probably good enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that's fine too. Which do you prefer: unique integer or counter?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unique integer function is a kind of counter. It will work better than keeping a counter ourselves.

1> erlang:unique_integer([monotonic, positive]).
1
2> erlang:unique_integer([monotonic, positive]).
2
3> erlang:unique_integer([monotonic, positive]).
3
4> erlang:unique_integer([monotonic, positive]).
4

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Unique integer does some inter-process sync so it may be slower but that's not a concern here. This is not a hot code patch.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as monotonic_time in that regard, but better control over the values it returns.

Make it send payloads starting at 10000 (10000 + erlang:unique_integer([monotonic, positive])) so there's plenty of space left if we need other values in the future. Not sure we need to band 16#ffffffffffffffff, I don't know how long that would take for unique_integer to reach that value, but if we do the wrapped value should be > 10000.

Signed-off-by: Viktor Söderqvist <[email protected]>
Signed-off-by: Viktor Söderqvist <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants