-
-
Notifications
You must be signed in to change notification settings - Fork 614
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ca: unsplit issuance flow #8014
base: main
Are you sure you want to change the base?
Conversation
@jsha, this PR appears to contain configuration and/or SQL schema changes. Please ensure that a corresponding deployment ticket has been filed with the new values. |
@jsha, this PR adds one or more new feature flags: UnsplitIssuance. As such, this PR must be accompanied by a review of the Let's Encrypt CP/CPS to ensure that our behavior both before and after this flag is flipped is compliant with that document. Please conduct such a review, then add your findings to the PR description in a paragraph beginning with "CPS Compliance Review:". |
Note from alert triage: after this lands and the flag is flipped, we should have SRE alert on total latency of the new SctProvider.GetSCTs method, rather than alerting on the latency of individual logs as they do today. (We could have done this ages ago with a manual timing metric wrapping the whole log-racing section, but I didn't think about it until now.) edit: I see now that we actually do have this manual timing metric already, so ignore me. |
ca/ca.go
Outdated
DER: resp.DER, | ||
SCTs: scts.SctDER, | ||
RegistrationID: issueReq.RegistrationID, | ||
OrderID: issueReq.RegistrationID, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be issueReq.OrderID
?
mocks/ca.go
Outdated
DER: resp.DER, | ||
SCTs: nil, | ||
RegistrationID: req.RegistrationID, | ||
OrderID: req.RegistrationID, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above: this should probably be req.OrderID
.
There's a CI flake here:
This is the CA reporting that it got a connection refused from an RA during GetSCTs (note: the CA returns that error to the RA, so this is logged by the RA). However, I'm a bit confused by this because startservers.py waits until all services are up. It's true there's a now dependency cycle between the CA and the RA that isn't expressed in startservers.py. But in practical terms that shouldn't generate this error. The |
Also interesting that the RA reports coming up more than a second before the connection refused error:
So maybe the problem is not "service not yet listening," but something else. Maybe "service crashed silently?" 🤔 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what's up with the integration test. The circular dependency does make me a bit nervous...
FYI: I pushed updates addressing feedback but still consider merging blocked on figuring out what's up with the integration test. Fortunately I can occasionally reproduce locally, so I'm off to the races. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM % flaky integration test
Here's what I think is going on:
If I set Still need to think about the best way to solve the problem though. I'd prefer for all our production RPC calls to consistently use NoWaitForReady (and in fact was hoping to deprecate that feature flag). But also this might turn up as a similar problem with health checks in prod, depending on how exactly we use them? One possibility is to run the SCT Provider service from a different set of RAs. Or run a totally different binary. That would nicely solve the topology issue in both test and prod. |
Add a new RPC to the CA:
IssueCertificate
covers issuance of both the precertificate and the final certificate. In between, it calls out to the RA's new methodGetSCTs
.The RA calls the new
CA.IssueCertificate
if theUnsplitIssuance
feature flag is true.The RA had a metric that counted certificates by profile name and hash. Since the RA doesn't receive a profile hash in the new flow, simply record the total number of issuances.
Fixes #7983