-
Notifications
You must be signed in to change notification settings - Fork 249
http: refacto headerCallback and get proxy CONNECT request details #964
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
src/http/Client.zig
Outdated
// request is a proxy CONNECT one. | ||
// We know curl uses a CONNECT when it establishes a TLS | ||
// conn. | ||
if (status == 200 and std.mem.startsWith(u8, std.mem.span(url), "https")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
During response header callback, we don't have a reliable way to know the current request is a proxy CONNECT
one.
CURLINFO_HTTP_CONNECTCODE
would be the solution, but it is set only after the connect request performed.
That's why I rely on https
prefix instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should check for "http" also?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we don't force curl to always tunneling, it doesn't tunnel for http
requests. So we don't have a pre CONNECT
request here.
@@ -756,7 +756,14 @@ pub const Page = struct { | |||
self.documentIsComplete(); | |||
} | |||
}, | |||
else => unreachable, | |||
.pre => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix a crash in main
.
Try get https://httpbin.io/status/407
src/http/Http.zig
Outdated
@@ -94,6 +94,7 @@ pub const Connection = struct { | |||
opts: Connection.Opts, | |||
|
|||
const Opts = struct { | |||
use_proxy: bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this per-connection?
If we stored the proxy state in Client
, I think use that in makeTransfer
to immediately set transfer._use_proxy
.
src/http/Client.zig
Outdated
// request is a proxy CONNECT one. | ||
// We know curl uses a CONNECT when it establishes a TLS | ||
// conn. | ||
if (status == 200 and std.mem.startsWith(u8, std.mem.span(url), "https")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should check for "http" also?
src/http/Client.zig
Outdated
// mark the end of parsing headers | ||
if (buf_len == 2) transfer._resp_header_status = .end; | ||
|
||
switch (transfer._resp_header_status) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand, the challenge is that the easy
we're using may already be connected. So we may or may not get a connect response.
But now that we've removed the callback on a per-header basis, can't we delay doing any of this until the header is complete?
If you take the original code, and keep everything up until the header is complete, i.e.if (buf_len == 2) {
Can you not then do:
if (buf_len != 2) {
// inversed this from the original to remove 1 layer of nesting.
// we haven't reached the end of the header.
return buf_len;
}
if (transfer._use_proxy) {
// We're connecting to a proxy. If this is a new connection to the proxy, then we expect
// curl to issue a CONNECT request, and then this would be the header for that request.
// However, if curl is re-using an easy handle, one that previously connected to the proxy
// than this would be the actual http request for the resource. We need to figure which this is.
// there might be a better way to detect this, like CURLINFO_HTTP_CONNECTCODE != 0
// or maybe CURLINFO_USED_PROXY returns something useful
// or maybe CURLINFO_EFFECTIVE_METHOD would be 'CONNECT' ??
if (getResponseHeader(easy, "connecting", 0)) |ct| {
if (std.ascii.eqlIgnoreCase(ct, "true") == false) {
// dunno, some error?
}
// Ok, this is a response to our connect proxy.
// we can handle authentication requests, or just let it continue
} else {
// We were already connected to the proxy. This is the response
// for the actual resource, treat this as-if `_use_proxy == false`
}
}
// This isn't a CONNECT request
// same code as before
if (getResponseHeader(easy, "content-type", 0)) |ct| {
// ....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand, the challenge is that the easy we're using may already be connected. So we may or may not get a connect response.
It's not about being already connected, but about having or not a pre-CONNECT
request.
- no proxy case: 1 request:
GET /xx
sent to the target => we want look at the headers - proxy and http final target: 1 request:
GET /xx
sent to the proxy => we want look at the headers - proxy and https target: 2 requests:
CONNECT
sent to the proxy (ignore headers except in failure) thenGET /xx
sent to the target => we want look at the headers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But now that we've removed the callback on a per-header basis, can't we delay doing any of this until the header is complete?
I will try doing this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// there might be a better way to detect this, like CURLINFO_HTTP_CONNECTCODE != 0
// or maybe CURLINFO_USED_PROXY returns something useful
// or maybe CURLINFO_EFFECTIVE_METHOD would be 'CONNECT' ??
I tried all these methods:
CURLINFO_HTTP_CONNECTCODE
is filled only once the request is processed, so it is not available hereCURLINFO_EFFECTIVE_METHOD
contains the method for the final target, isGET
instead of theCONNECT
(likeCURLINFO_EFFECTIVE_URL
contains the final target url instead of the proxy one) I would like to have aCURLINFO_METHOD
but it doesn't exists...CURLINFO_USED_PROXY
is not helpful as fa as I can see
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did change the code in f753253
And call it only after the headers are parsed, either from data callback or end of the request.
We need to add tests for the different proxy/non-proxy scenarios |
This PR is a first step to implement the CDP auth challenge interception.
We don't ignore proxy
CONNECT
request anymore b/c we want the details of the response: status code and headers.