Tags: avar/git
Tags
hooks: fix a TOCTOU in "did we run a hook?" heuristic Fix a Time-of-check to time-of-use (TOCTOU) race in code added in 680ee55 (commit: skip discarding the index if there is no pre-commit hook, 2017-08-14). We can fix the race passing around information about whether or not we ran the hook in question, instead of running hook_exists() after the fact to check if the hook in question exists. This problem has been noted on-list when 680ee55 was discussed[1], but had not been fixed. In addition to fixing this for the pre-commit hook as suggested there I'm also fixing this for the pre-merge-commit hook. See 6098817 (git-merge: honor pre-merge-commit hook, 2019-08-07) for the introduction of its previous behavior. Let's also change this for the push-to-checkout hook. Now instead of checking if the hook exists and either doing a push to checkout or a push to deploy we'll always attempt a push to checkout. If the hook doesn't exist we'll fall back on push to deploy. The same behavior as before, without the TOCTOU race. See 0855331 (receive-pack: support push-to-checkout hook, 2014-12-01) for the introduction of the previous behavior. This leaves uses of hook_exists() in two places that matter. The "reference-transaction" check in refs.c, see 6754159 (refs: implement reference transaction hook, 2020-06-19), and the prepare-commit-msg hook, see 66618a5 (sequencer: run 'prepare-commit-msg' hook, 2018-01-24). In both of those cases we're saving ourselves CPU time by not preparing data for the hook that we'll then do nothing with if we don't have the hook. So using this "invoked_hook" pattern doesn't make sense in those cases. More importantly, in those cases the worst we'll do is miss that we "should" run the hook because a new hook appeared, whereas in the pre-commit and pre-merge-commit cases we'll skip an important discard_cache() on the bases of our faulty guess. I do think none of these races really matter in practice. It would be some one-off issue as a hook was added or removed. I did think it was stupid that we didn't pass a "did this run?" flag instead of doing this guessing at a distance though, so now we're not guessing anymore. 1. https://lore.kernel.org/git/[email protected]/ Signed-off-by: Ævar Arnfjörð Bjarmason <[email protected]>
bundle-uri docs: add design notes Add a design doc for the bundle-uri protocol extension to go along with the packfile-uri extension added in cd8402e (Documentation: add Packfile URIs design doc, 2020-06-10). Signed-off-by: Ævar Arnfjörð Bjarmason <[email protected]>
hooks: fix a TOCTOU in "did we run a hook?" heuristic Fix a Time-of-check to time-of-use (TOCTOU) race in code added in 680ee55 (commit: skip discarding the index if there is no pre-commit hook, 2017-08-14). We can fix the race passing around information about whether or not we ran the hook in question, instead of running hook_exists() after the fact to check if the hook in question exists. This problem has been noted on-list when 680ee55 was discussed[1], but had not been fixed. In addition to fixing this for the pre-commit hook as suggested there I'm also fixing this for the pre-merge-commit hook. See 6098817 (git-merge: honor pre-merge-commit hook, 2019-08-07) for the introduction of its previous behavior. Let's also change this for the push-to-checkout hook. Now instead of checking if the hook exists and either doing a push to checkout or a push to deploy we'll always attempt a push to checkout. If the hook doesn't exist we'll fall back on push to deploy. The same behavior as before, without the TOCTOU race. See 0855331 (receive-pack: support push-to-checkout hook, 2014-12-01) for the introduction of the previous behavior. This leaves uses of hook_exists() in two places that matter. The "reference-transaction" check in refs.c, see 6754159 (refs: implement reference transaction hook, 2020-06-19), and the prepare-commit-msg hook, see 66618a5 (sequencer: run 'prepare-commit-msg' hook, 2018-01-24). In both of those cases we're saving ourselves CPU time by not preparing data for the hook that we'll then do nothing with if we don't have the hook. So using this "invoked_hook" pattern doesn't make sense in those cases. More importantly, in those cases the worst we'll do is miss that we "should" run the hook because a new hook appeared, whereas in the pre-commit and pre-merge-commit cases we'll skip an important discard_cache() on the bases of our faulty guess. I do think none of these races really matter in practice. It would be some one-off issue as a hook was added or removed. I did think it was stupid that we didn't pass a "did this run?" flag instead of doing this guessing at a distance though, so now we're not guessing anymore. 1. https://lore.kernel.org/git/[email protected]/ Signed-off-by: Ævar Arnfjörð Bjarmason <[email protected]>
Revert "marks" This reverts commit eb48b1f.
serve: add command to advertise bundle URIs When the uploadpack.bundleURI config is set to a URI (or URIs, if set >1 times), advertise a "bundle-uri" command, then when the client requests "bundle-uri" emit those URIs back at them. The client CAN then request those URIs out of bounds, and after they've done (after either disconnecting & coming back, or leaving us hanging), proceed with the rest of request flow. I.e. issuing a "ls-refs" followed by a "fetch". The client MAY then send us "have" lines with the tips they've unpacked from their newly acquired bundle(s). This commit doesn't implement any of that required client behavior, only the trivial server behavior of spewing a list of URLs at the client on request. There is already a uploadpack.blobPackfileUri setting for the server, so why is this needed? The Documentation/technical/bundle-uri.txt added in a preceding commit discusses that in more detail, but in summary: 1. There is no "real" support for in git.git. The uploadpack.blobPackfileUri setting allows carving out a list of blobs (actually any OIDs), but as alluded to in bfc2a36 (Doc: clarify contents of packfile sent as URI, 2021-01-20) the only "real" implementation is JGit based. 2. The uploadpack.blobPackfileUri is a MUST where this is a "CAN". I.e. once a client says they support packfile-uri of given list of protocols the server will send them a PACK response assuming they've downloaded the URI they client was sent, if the client doesn't do that they don't have a valid repository. Pointing at a bundle and having the client send us "have" lines (or not, maybe they couldn't fetch it, or decided they didn't want to) is more flexible, and can gracefully recover e.g. if the CDN isn't reachable (maybe you do support "https", but the CDN provider is down, or blocked your whole country). 3. Because of the disconnect in #2 "dumb" servers can seed pre-clients, e.g. we might point to a repo.bundle whose exact state we aren't sure of (a cronjob updates it, sometimes). The client will discover its contents, and give us the "have" lines, the "packfile-uri" method effectively requires the server to have those exact "have" lines (or rather, it will produce a similar PACK using give-or-take the same exclusions). 4. This provides an easy way to the long sought after "resumable clones". I.e. since we can assume that it's in the server's interest to keep their bundle(s) as up-to-date as possible, most or all of the history we need to fetch will be in the bundle. If we fail midway through the "clone" we can offload the problem of resuming to wget/curl/rsync/whatever, instead of (as has been suggested, but not implemented for the "normal" dialog) "repairing" a partial PACK response or something. There was a suggestion of implementing a similar feature long ago[1] by Jeff King. The main difference between it and this approach is that we've since gained protocol v2, so we can add this as an optional path in the dialog between client and server. The 2011 implementation hooked into the transport mechanism to try to clone from a bundle directly. See also [2] and [3] for some later mentions of that approach. See also [4] for the series that implemented uploadpack.blobPackfileUri, and [5] for a series on top that did the .gitmodules check in that context. See [6] for the "ls-refs unborn" feature which modified code in similar areas of the request flow. 1. https://lore.kernel.org/git/[email protected]/ 2. https://lore.kernel.org/git/[email protected]/ 3. https://lore.kernel.org/git/[email protected]/ 4. https://lore.kernel.org/git/[email protected]/ Merged as 34e849b (Merge branch 'jt/cdn-offload', 2020-06-25) 5. https://lore.kernel.org/git/[email protected]/ Merged as 6ee353d (Merge branch 'jt/transfer-fsck-across-packs', 2021-03-01) 6. 69571df (Merge branch 'jt/clone-unborn-head', 2021-02-17) Signed-off-by: Ævar Arnfjörð Bjarmason <[email protected]>
bundle-uri client: add boolean transfer.bundleURI setting The yet-to-be introduced client support for bundle-uri will always fall back on a full clone, but we'd still like to be able to ignore a server's bundle-uri advertisement entirely. This is useful for testing, and if a server is pointing to bad bundles, they take a while to time out etc. Since we might see the config in any order we need to clear out any accumulated bundle_uri list when we see transfer.bundleURI=false setting, and not add any more things to the list. Signed-off-by: Ævar Arnfjörð Bjarmason <[email protected]>
bundle-uri docs: add design notes Add a design doc for the bundle-uri protocol extension to go along with the packfile-uri extension added in cd8402e (Documentation: add Packfile URIs design doc, 2020-06-10). Signed-off-by: Ævar Arnfjörð Bjarmason <[email protected]>
this offset optimization is never going to work consider content from/to 123xxxx 12xxxxx and now if the user's needle is "12x", I'll have skipped up to: 3xxxx xxxxx So even in the fixed-string case it won't work.
don't have a test for this, but harmless?
Revert "assorted changes" This reverts commit 5142fcc.
PreviousNext