rust-installer/install-template.sh: improve efficiency, step 1. #145809

Open
wants to merge 1 commit into
base: master

Conversation

he32
Contributor

@he32 he32 commented Aug 24, 2025

This round replaces repetitive pattern matching in the inner loop of this script using grep (which causes a fork() for each test) with built-in pattern matching in the Bourne shell using the case / esac construct.

This is in reference to
#80684
and is a separated-out request from
rust-lang/rust-installer#111

which apparently never got any review.

The planned "step 2" change builds on top of this one and replaces the inner loop's needless uses of sed (which again cause a fork() for each instance) with the suffix removal constructs from the Bourne shell. Since that change touches many of the same lines as this one, that pull request cannot be submitted before this one is accepted.

Hopefully this first step is less controversial than the latter change.
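
The grep-to-case replacement described in this PR can be sketched roughly as follows. This is a minimal illustration and not the actual diff: the helper names `is_bin_grep` and `is_bin_case` and the sample paths are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not taken from install-template.sh.

# External match: every call sets up a pipeline and runs grep,
# which costs at least one fork() per tested path.
is_bin_grep() {
    printf '%s\n' "$1" | grep -q '^bin/'
}

# Built-in match: case/esac is evaluated by the shell itself,
# so no child process is needed.
is_bin_case() {
    case "$1" in
        bin/*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Classify a few sample paths using the built-in variant.
for f in bin/cargo lib/libstd.so share/doc/README.md; do
    if is_bin_case "$f"; then
        echo "$f: under bin/"
    else
        echo "$f: elsewhere"
    fi
done
```

Both helpers should agree on every path; the difference is only in how many processes the shell has to start per manifest entry.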

@rustbot
Collaborator

rustbot commented Aug 24, 2025

r? @Mark-Simulacrum

rustbot has assigned @Mark-Simulacrum.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added the labels S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties.) and T-bootstrap (Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap)) Aug 24, 2025
@Kobzol
Member

Kobzol commented Aug 24, 2025

Hi, thanks for the PR! Did you try to run any benchmarks (i.e. install a tarball component before/after the change)?

Comment on lines +616 to +619
    bin/*)
        run cp "$_src_dir/$_component/$_file" "$_file_install_path"
        run chmod 755 "$_file_install_path"
        ;;
Member

This seems like a useful change of behavior, but you did not mention it in your PR summary.

Contributor Author

@he32 he32 Aug 24, 2025

I am a little uncertain what "this" refers to. There was no intention to change the behavior of the script. Note that the test in the previous "if" had an "or", so both tests need to be covered going forward, and that is what I think the new code does. The case construct can only cover the test for the path name, and cannot test whether the file is executable.

Member

Ah right, I misread. Still, I don't understand why these couldn't be combined as in the original code, perhaps by first testing for "bin"-ness, saving the result in a variable, and then doing the if-or?
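
One way to read this suggestion is sketched below. It is an assumption about the intended shape, not code from the PR: the function name `install_mode`, the `_is_bin` flag, and the 644 mode for the non-executable branch are all hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of "test bin-ness first, then a single if-or".

# Print the mode to install with.
#   $1: path relative to the component (e.g. bin/cargo)
#   $2: path of the source file on disk
install_mode() {
    # Step 1: path test via case/esac, result saved in a flag (no fork).
    _is_bin=no
    case "$1" in
        bin/*) _is_bin=yes ;;
    esac

    # Step 2: one combined condition, mirroring the original "or":
    # executable if it lives under bin/ or already has an execute bit.
    if [ "$_is_bin" = yes ] || [ -x "$2" ]; then
        echo 755
    else
        echo 644   # assumed mode for everything else
    fi
}

# Usage: a path under bin/ is treated as executable regardless of the file.
install_mode bin/rustc /nonexistent
```

This keeps a single copy of the cp/chmod branches, at the cost of one extra variable per manifest entry.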

@he32
Contributor Author

he32 commented Aug 24, 2025

> Hi, thanks for the PR! Did you try to run any benchmarks (i.e. install a tarball component before/after the change)?

Admittedly no. However, this set of modifications came about because this script was absurdly slow when doing the builds / installs during my testing of the various Rust releases up through the ages, on and for the various NetBSD targets.

To run a benchmark I would have to figure out how to rig up test scaffolding for this script, since just doing this as part of a full build would take far too long.

To me it is sort of obvious that doing a massive number of avoidable fork()s inside the inner loop of any shell script is going to needlessly slow it down, especially when in-shell constructs can achieve the same results. The diff is made by making 1:1 changes which do not change the behavior of the script.

The question can of course come from a couple of different perspectives:

  1. A general desire to quantify the size of the improvement
  2. Whether this change is worth the time it takes to review and get it integrated.
  3. A question of whether this makes any difference at all.

I am hoping the question doesn't come from the third perspective. The second perspective also should not be a valid objection; as the referenced issue describes, it is well known that this script in its current form is just slow.

@Mark-Simulacrum
Member

@bors try

Let's produce some artifacts -- I think the install scripts aren't exercised anywhere(?) in our normal release process, so it'll need some manual testing.

rust-bors bot added a commit that referenced this pull request Aug 24, 2025
rust-installer/install-template.sh: improve efficiency, step 1.

@Mark-Simulacrum
Member

It's a larger change, but I'm also wondering why this was written as a shell script. AFAICT, this is always bundled into a tarball we produce that contains some kind of binary artifacts. Maybe this could be a Rust program? That would (a) improve our ability to review changes and (b) likely help with performance, at least insofar as we can avoid any sub-shells that aren't needed.

@he32
Contributor Author

he32 commented Aug 24, 2025

OK, I've done some testing.

First off, I installed Rust 1.88.0 into an empty prefix via

ktrace -i ./install-orig.sh --components=cargo,rust-std-x86_64-unknown-netbsd,rustc --prefix=/var/tmp/t1

to collect a system call trace, and then counted the fork() invocations with

kdump | awk '/CALL.*fork/ { n++ } END { print n }'

With the original install.sh script, we get 3910 fork() calls. With the "use case / esac for pattern matching" version we get 2227 fork() calls.

Secondly, installing into an already-populated prefix with the original installer gives the following output from "time" in csh:

4.236u 7.596s 0:12.84 92.0%     2546+3121k 0+655io 388pf+0w
4.277u 8.142s 0:13.78 90.0%     2425+2947k 0+649io 379pf+0w
4.212u 7.757s 0:13.37 89.4%     2516+3170k 0+651io 281pf+0w

With the "case / esac for pattern matching" version suggested here, I got

2.823u 5.461s 0:09.31 88.9%     2276+1693k 0+690io 336pf+0w
2.861u 5.468s 0:10.09 82.4%     2265+3026k 0+673io 421pf+0w
2.826u 5.924s 0:09.11 95.9%     2156+2427k 0+673io 5pf+0w

So ... a nearly 30% reduction in wall-clock time on this particular host (which is pretty beefy; I predict that the effect will be even more pronounced on slower hosts), and a reduction in the number of fork() invocations of some 43%. And that's before eliminating cut / sed for string suffix / prefix removal, so there is "more to come".
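
The percentages above can be re-derived from the quoted numbers with a few lines of shell. This is only a worked-arithmetic sketch; the helper names `mean` and `reduction` are hypothetical.

```shell
#!/bin/sh
# Re-derive the quoted figures from the three timing samples per variant.
LC_ALL=C; export LC_ALL   # keep awk's decimal point stable across locales

# Mean of the arguments, two decimals.
mean() {
    printf '%s\n' "$@" | awk '{ s += $1 } END { printf "%.2f\n", s / NR }'
}

# Percentage reduction going from $1 to $2, one decimal.
reduction() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a - b) / a * 100 }'
}

orig=$(mean 12.84 13.78 13.37)   # wall-clock seconds, original script
new=$(mean 9.31 10.09 9.11)      # wall-clock seconds, case/esac version

echo "wall clock: ${orig}s -> ${new}s, $(reduction "$orig" "$new")% less"
echo "fork():     3910 -> 2227, $(reduction 3910 2227)% less"
```

The means come out at 13.33 s and 9.50 s, a reduction of about 28.7%, and the fork() count drops by about 43.0%, which matches the "nearly 30%" and "some 43%" stated above.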

And ... the reason this is particularly noticeable with the "doc" component is most probably that it has a rather larger set of files. The components installed here have relatively few:

$ wc -l cargo/manifest.in rust-std-x86_64-unknown-netbsd/manifest.in rustc/manifest.in
      43 cargo/manifest.in
      28 rust-std-x86_64-unknown-netbsd/manifest.in
      55 rustc/manifest.in
     126 total
$ 

I don't have a build with the doc component at hand at the moment.

@he32
Contributor Author

he32 commented Aug 24, 2025

> It's a larger change, but I'm also wondering why this was written as a shell script. AFAICT, this is always bundled into a tarball we produce that contains some kind of binary artifacts. Maybe this could be a Rust program? That would (a) improve our ability to review changes and (b) likely help with performance, at least insofar as we can avoid any sub-shells that aren't needed.

Yes, that would be a larger change. Why it is the way it is (a shell script), I cannot comment on at the moment. And turning this into a Rust program would exceed my current Rust abilities, so it would not come from this corner.

However, let me suggest that we first measure the performance improvements we can get with "known fixes" to the existing script to get this from "unbearably slow" to "manageable also on slow hosts" before embarking on that larger rewrite. Expediency has to count for something...

And... it also looks like this avenue has been attempted before: #80684 contains pointers both to similar suggestions which were not taken and to some which were (the --bulk-dirs for docs). At the end of this it might be worth reviewing those other old suggestions to see which ones are still applicable.

@rust-bors

rust-bors bot commented Aug 24, 2025

☀️ Try build successful (CI)
Build commit: 76ab1d0 (76ab1d0bfb8d5b7380aa094bc6be04c5085dae9f, parent: 41a79f1862aa6b81bac674598e275e80e9f09eb9)

@he32
Contributor Author

he32 commented Aug 24, 2025

And ... to preview the suggested next pull request: replacing cut and sed in the inner loop of the script with parameter expansions that do "remove largest suffix pattern" and "remove shortest prefix pattern" reduces the number of fork() invocations further, to 1153 (down from the original 3910, and from the 2227 reached by the fix in this pull request). Repeating the same test as above gives the following times:

1.401u 3.399s 0:04.64 103.2%    843+1481k 0+655io 20pf+0w
1.450u 3.346s 0:04.58 104.5%    843+971k 0+653io 0pf+0w
1.347u 3.442s 0:04.63 103.2%    845+1218k 0+655io 0pf+0w

So, an average wall-clock time of 4.62 s, which is around 35% of the original timing (a reduction of 65%), and the number of fork() invocations is down to around 30% of the original value.
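
The two parameter expansions named above can be sketched as follows. This is an illustration only: it assumes manifest entries of the form `command:path`, and the sample value and variable names are hypothetical rather than taken from the planned pull request.

```shell
#!/bin/sh
# Assumed entry format "command:path"; the sample value is hypothetical.
_directive="file:bin/cargo"

# With external tools, each field costs a pipeline plus a cut process:
#   _command=$(echo "$_directive" | cut -f1 -d:)
#   _file=$(echo "$_directive" | cut -f2 -d:)

# With POSIX parameter expansion, the shell does the work itself:
_command=${_directive%%:*}   # remove largest suffix matching ":*"
_file=${_directive#*:}       # remove shortest prefix matching "*:"

echo "$_command"   # file
echo "$_file"      # bin/cargo
```

One difference worth keeping in mind: if a path ever contained a colon, `cut -f2 -d:` would return only the second field, while `${_directive#*:}` keeps everything after the first colon.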


5 participants