Skip to content

http: refacto headerCallback and get proxy CONNECT request details #964

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

krichprollsch
Copy link
Member

This PR is a first step to implement the CDP auth challenge interception.
We don't ignore proxy CONNECT request anymore b/c we want the details of the response: status code and headers.

@krichprollsch krichprollsch self-assigned this Aug 21, 2025
// request is a proxy CONNECT one.
// We know curl uses a CONNECT when it establishes a TLS
// conn.
if (status == 200 and std.mem.startsWith(u8, std.mem.span(url), "https")) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During response header callback, we don't have a reliable way to know the current request is a proxy CONNECT one.
CURLINFO_HTTP_CONNECTCODE would be the solution, but it is set only after the connect request performed.
That's why I rely on https prefix instead.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should check for "http" also?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't force curl to always tunneling, it doesn't tunnel for http requests. So we don't have a pre CONNECT request here.

@@ -756,7 +756,14 @@ pub const Page = struct {
self.documentIsComplete();
}
},
else => unreachable,
.pre => {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix a crash in main.
Try get https://httpbin.io/status/407

@@ -94,6 +94,7 @@ pub const Connection = struct {
opts: Connection.Opts,

const Opts = struct {
use_proxy: bool,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this per-connection?

If we stored the proxy state in Client, I think use that in makeTransfer to immediately set transfer._use_proxy.

// request is a proxy CONNECT one.
// We know curl uses a CONNECT when it establishes a TLS
// conn.
if (status == 200 and std.mem.startsWith(u8, std.mem.span(url), "https")) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should check for "http" also?

// mark the end of parsing headers
if (buf_len == 2) transfer._resp_header_status = .end;

switch (transfer._resp_header_status) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand, the challenge is that the easy we're using may already be connected. So we may or may not get a connect response.

But now that we've removed the callback on a per-header basis, can't we delay doing any of this until the header is complete?

If you take the original code, and keep everything up until the header is complete, i.e.if (buf_len == 2) {

Can you not then do:

if (buf_len != 2) {
   // inversed this from the original to remove 1 layer of nesting.
   // we haven't reached the end of the header.
   return buf_len;
}

if (transfer._use_proxy) {
  // We're connecting to a proxy. If this is a new connection to the proxy, then we expect
  // curl to issue a CONNECT request, and then this would be the header for that request.
  // However, if curl is re-using an easy handle, one that previously connected to the proxy
  // than this would be the actual http request for the resource. We need to figure which this is.

  // there might be a better way to detect this, like CURLINFO_HTTP_CONNECTCODE != 0
  // or maybe CURLINFO_USED_PROXY returns something useful
  // or maybe CURLINFO_EFFECTIVE_METHOD would be 'CONNECT' ??     
  if (getResponseHeader(easy, "connecting", 0)) |ct| {
        if (std.ascii.eqlIgnoreCase(ct, "true") == false) {
            // dunno, some error?
        }
       
       // Ok, this is a response to our connect proxy.
       // we can handle authentication requests, or just let it continue
  } else {
       // We were already connected to the proxy. This is the response
       // for the actual resource, treat this as-if `_use_proxy == false`
  }
}

// This isn't a CONNECT request
// same code as before
if (getResponseHeader(easy, "content-type", 0)) |ct| {
//   ....

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand, the challenge is that the easy we're using may already be connected. So we may or may not get a connect response.

It's not about being already connected, but about having or not a pre-CONNECT request.

  • no proxy case: 1 request: GET /xx sent to the target => we want look at the headers
  • proxy and http final target: 1 request: GET /xx sent to the proxy => we want look at the headers
  • proxy and https target: 2 requests: CONNECT sent to the proxy (ignore headers except in failure) then GET /xx sent to the target => we want look at the headers

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But now that we've removed the callback on a per-header basis, can't we delay doing any of this until the header is complete?

I will try doing this way.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// there might be a better way to detect this, like CURLINFO_HTTP_CONNECTCODE != 0
// or maybe CURLINFO_USED_PROXY returns something useful
// or maybe CURLINFO_EFFECTIVE_METHOD would be 'CONNECT' ??

I tried all these methods:

  • CURLINFO_HTTP_CONNECTCODE is filled only once the request is processed, so it is not available here
  • CURLINFO_EFFECTIVE_METHOD contains the method for the final target, is GET instead of the CONNECT (like CURLINFO_EFFECTIVE_URL contains the final target url instead of the proxy one) I would like to have a CURLINFO_METHOD but it doesn't exists...
  • CURLINFO_USED_PROXY is not helpful as fa as I can see

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did change the code in f753253

@krichprollsch
Copy link
Member Author

We need to add tests for the different proxy/non-proxy scenarios

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants