Remove fewer Storage calls in CopyProp and GVN #142531

ohadravid · 2025-06-15T07:51:38Z

Modify the CopyProp and GVN MIR optimization passes to remove fewer Storage{Live,Dead} calls, allowing for better optimizations by LLVM - see #141649.

Details

The idea is to use a new MaybeUninitializedLocals analysis and remove only the storage calls of locals that are maybe-uninit when accessed in a new location.
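The sketch below shows a forward "maybe uninitialized locals" dataflow over a toy CFG. It illustrates the idea under stated assumptions and is not the PR's `MaybeUninitializedLocals` implementation. `Stmt`, `Block`, `apply`, `maybe_uninit_on_entry` and `maybe_uninit_at` are hypothetical names, and the sketch does not use rustc's `rustc_mir_dataflow` API.

```rust
use std::collections::VecDeque;

/// Toy MIR-like statement over locals numbered `0..n` (hypothetical, not rustc's types).
#[derive(Clone, Copy, Debug)]
enum Stmt {
    StorageLive(usize),
    StorageDead(usize),
    Assign(usize),
    Use(usize),
}

/// A basic block: its statements plus the indices of its successor blocks.
struct Block {
    stmts: Vec<Stmt>,
    succs: Vec<usize>,
}

/// Transfer function: how one statement changes the "maybe uninitialized" set.
fn apply(state: &mut [bool], stmt: Stmt) {
    match stmt {
        // Freshly allocated or ended storage holds no valid value.
        Stmt::StorageLive(l) | Stmt::StorageDead(l) => state[l] = true,
        // A full assignment initializes the local.
        Stmt::Assign(l) => state[l] = false,
        Stmt::Use(_) => {}
    }
}

/// Forward analysis: for each block, the locals that are maybe-uninit on entry.
/// The join is set union ("uninit on at least one path"). Blocks never reached
/// from the entry block (index 0) are never processed and keep the empty set.
fn maybe_uninit_on_entry(blocks: &[Block], n_locals: usize) -> Vec<Vec<bool>> {
    let mut entry = vec![vec![false; n_locals]; blocks.len()];
    let mut reached = vec![false; blocks.len()];
    // Toy simplification: every local (even an argument) starts uninitialized.
    entry[0] = vec![true; n_locals];
    reached[0] = true;
    let mut work = VecDeque::from([0usize]);
    while let Some(b) = work.pop_front() {
        let mut state = entry[b].clone();
        for &s in &blocks[b].stmts {
            apply(&mut state, s);
        }
        for &succ in &blocks[b].succs {
            // Process a successor the first time it is reached, or when its entry set grows.
            let mut changed = !reached[succ];
            reached[succ] = true;
            for l in 0..n_locals {
                if state[l] && !entry[succ][l] {
                    entry[succ][l] = true;
                    changed = true;
                }
            }
            if changed {
                work.push_back(succ);
            }
        }
    }
    entry
}

/// Query: is `local` maybe-uninit just before statement `idx` of block `b`?
fn maybe_uninit_at(blocks: &[Block], entry: &[Vec<bool>], b: usize, idx: usize, local: usize) -> bool {
    let mut state = entry[b].clone();
    for &s in &blocks[b].stmts[..idx] {
        apply(&mut state, s);
    }
    state[local]
}
```

In this model, a pass would keep a head's `StorageLive`/`StorageDead` only if `maybe_uninit_at` is false at every location where a local replaced by that head is accessed.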

rustbot · 2025-06-15T07:51:44Z

Some changes occurred to MIR optimizations

cc @rust-lang/wg-mir-opt

matthiaskrgr · 2025-06-15T09:37:15Z

@bors try @rust-timer queue

…try> Remove fewer Storage calls in `copy_prop`

Modify the `copy_prop` MIR optimization pass to remove fewer `Storage{Live,Dead}` calls, allowing for better optimizations by LLVM - see #141649.

### Details

This is my attempt to fix the mentioned issue (this is the first part; I also implemented a similar solution for GVN in [this branch](https://github.com/rust-lang/rust/compare/master...ohadravid:rust:better-storage-calls-gvn-v2?expand=1)).

The idea is to use the `MaybeStorageDead` analysis and remove only the storage calls of `head`s that are maybe-storage-dead when the associated `local` is accessed (or, conversely, keep the storage of `head`s that are for-sure alive in _every_ relevant access).

When combined with the GVN change, the final example in the issue (#141649 (comment)) is optimized as expected by LLVM. I also measured the effect on a few functions in `rav1d` (where I originally saw the issue) and observed reduced stack usage in several of them.

This is my first attempt at working with MIR optimizations, so it's possible this isn't the right approach, but all tests pass, and the resulting diffs appear correct.

r? tmiasko since he commented on the issue and pointed to these passes.

bors · 2025-06-15T09:38:28Z

⌛ Trying commit d24d035 with merge ef7d206...

bors · 2025-06-15T12:05:29Z

☀️ Try build successful - checks-actions
Build commit: ef7d206 (ef7d20666974f0dac45b03e051f2e283f9d9f090)

rust-timer · 2025-06-15T13:31:55Z

Finished benchmarking commit (ef7d206): comparison URL.

Overall result: ❌ regressions - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @rustbot label: +perf-regression-triaged. If not, please fix the regressions and do another perf run. If its results are neutral or positive, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

	mean	range	count
Regressions ❌ (primary)	0.3%	[0.2%, 0.4%]	8
Regressions ❌ (secondary)	0.3%	[0.2%, 0.4%]	7
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	0.3%	[0.2%, 0.4%]	8

Max RSS (memory usage)

Results (primary 0.7%, secondary 3.4%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	3.5%	[1.8%, 5.0%]	5
Regressions ❌ (secondary)	3.4%	[3.4%, 3.4%]	1
Improvements ✅ (primary)	-3.9%	[-6.5%, -2.0%]	3
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	0.7%	[-6.5%, 5.0%]	8

Cycles

Results (primary -0.6%, secondary -0.1%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	3.8%	[3.8%, 3.8%]	1
Improvements ✅ (primary)	-0.6%	[-0.6%, -0.6%]	1
Improvements ✅ (secondary)	-4.1%	[-4.1%, -4.1%]	1
All ❌✅ (primary)	-0.6%	[-0.6%, -0.6%]	1

Binary size

Results (primary 0.0%, secondary 0.0%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	0.2%	[0.0%, 0.8%]	10
Regressions ❌ (secondary)	0.1%	[0.0%, 0.1%]	5
Improvements ✅ (primary)	-0.2%	[-0.8%, -0.0%]	8
Improvements ✅ (secondary)	-0.2%	[-0.2%, -0.2%]	1
All ❌✅ (primary)	0.0%	[-0.8%, 0.8%]	18

Bootstrap: 757.399s -> 756.065s (-0.18%)
Artifact size: 372.20 MiB -> 372.12 MiB (-0.02%)

ohadravid · 2025-06-15T14:55:18Z

@matthiaskrgr - I updated the impl to stop re-checking a head once it is found to be maybe-dead, which should make it a bit cheaper.

matthiaskrgr · 2025-06-15T15:06:54Z

@bors try @rust-timer queue

bors · 2025-06-15T15:08:08Z

⌛ Trying commit 905e968 with merge c0a2949...


cjgillot · 2025-06-15T15:45:26Z

Should this check happen in Replacer::visit_local, and move the replacement of storage statements to a dedicated cleanup visitor?

bors · 2025-06-15T17:41:36Z

☀️ Try build successful - checks-actions
Build commit: c0a2949 (c0a294957df10fc3880e1677c72c0cf122485509)

ohadravid · 2025-06-15T18:12:43Z

> Should this check happen in Replacer::visit_local

I'm not sure how to make this work: using ResultsCursor requires a &body, but it's not possible to have that while running a MutVisitor since it requires a &mut body.

Is there a different way to do this?
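One common way around this borrow conflict, in the spirit of the dedicated cleanup visitor suggested above, is to split the work into two phases. A read-only phase holds `&Body` and records its decisions, and a mutating phase runs after that borrow has ended. The sketch below uses hypothetical toy types (`Body`, `Stmt`, `collect_storage_to_remove`, `remove_storage`). A simple linear scan for "used while not storage-live" stands in for querying a `ResultsCursor`.

```rust
use std::collections::HashSet;

/// Toy straight-line body (hypothetical types, not rustc's `Body`).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Stmt {
    StorageLive(usize),
    StorageDead(usize),
    Use(usize),
    Nop,
}

struct Body {
    stmts: Vec<Stmt>,
}

/// Phase 1: read-only. Stands in for querying an analysis cursor over `&Body`;
/// here a linear scan finds locals used while their storage is not live.
fn collect_storage_to_remove(body: &Body) -> HashSet<usize> {
    let mut live = HashSet::new();
    let mut to_remove = HashSet::new();
    for stmt in &body.stmts {
        match *stmt {
            Stmt::StorageLive(l) => {
                live.insert(l);
            }
            Stmt::StorageDead(l) => {
                live.remove(&l);
            }
            Stmt::Use(l) if !live.contains(&l) => {
                to_remove.insert(l);
            }
            _ => {}
        }
    }
    to_remove
}

/// Phase 2: mutating. Runs after the shared borrow from phase 1 has ended,
/// the way a separate cleanup `MutVisitor` would.
fn remove_storage(body: &mut Body, to_remove: &HashSet<usize>) {
    for stmt in &mut body.stmts {
        if let Stmt::StorageLive(l) | Stmt::StorageDead(l) = *stmt {
            if to_remove.contains(&l) {
                *stmt = Stmt::Nop;
            }
        }
    }
}
```

The only data crossing the phase boundary is the owned set, so the borrow checker never sees `&Body` and `&mut Body` alive at the same time.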

rust-timer · 2025-06-15T20:15:45Z

Finished benchmarking commit (c0a2949): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @rustbot label: +perf-regression-triaged. If not, please fix the regressions and do another perf run. If its results are neutral or positive, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

	mean	range	count
Regressions ❌ (primary)	0.3%	[0.2%, 0.4%]	9
Regressions ❌ (secondary)	0.3%	[0.2%, 0.4%]	7
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-0.2%	[-0.2%, -0.2%]	1
All ❌✅ (primary)	0.3%	[0.2%, 0.4%]	9

Max RSS (memory usage)

Results (primary -0.1%, secondary -1.3%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	4.2%	[3.4%, 5.8%]	4
Regressions ❌ (secondary)	3.1%	[3.1%, 3.1%]	1
Improvements ✅ (primary)	-4.4%	[-6.6%, -1.8%]	4
Improvements ✅ (secondary)	-5.8%	[-5.8%, -5.8%]	1
All ❌✅ (primary)	-0.1%	[-6.6%, 5.8%]	8

Cycles

Results (secondary -1.0%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	2.3%	[2.3%, 2.3%]	1
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-2.6%	[-2.6%, -2.5%]	2
All ❌✅ (primary)	-	-	0

Binary size

Results (primary -0.0%, secondary 0.0%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

	mean	range	count
Regressions ❌ (primary)	0.2%	[0.0%, 0.8%]	10
Regressions ❌ (secondary)	0.1%	[0.0%, 0.1%]	5
Improvements ✅ (primary)	-0.2%	[-0.8%, -0.0%]	8
Improvements ✅ (secondary)	-0.2%	[-0.2%, -0.2%]	1
All ❌✅ (primary)	-0.0%	[-0.8%, 0.8%]	18

Bootstrap: 756.494s -> 757.685s (0.16%)
Artifact size: 372.15 MiB -> 372.11 MiB (-0.01%)


Remove fewer Storage calls in GVN

Followup to #142531 (Remove fewer Storage calls in `copy_prop`)

Modify the GVN MIR optimization pass to remove fewer Storage{Live,Dead} calls, allowing for better optimizations by LLVM - see #141649.

After replacing locals with values, use the `MaybeStorageDead` analysis to check that the replaced locals are storage-live.

**A slight problem**: In #142531, `@tmiasko` noted in #142531 (comment) that `MaybeStorageDead` isn't enough, since there can be a `Live(_1); Dead(_1); Live(_1);` block, which forces the optimization to check that each value is initialised (and not only storage-live).

This is easy enough in `copy_prop` (because we are checking _before_ the replacement), but in GVN it is actually hard to tell, after the fact, whether the local must be initialized at each statement (and modifying `VnState` seems even harder).

I opted for something else, which might be wrong (implemented in the last two commits): if we consider `Dead->Live` to be the same as `Deinit`, then such a local shouldn't be considered SSA, so I updated `SsaVisitor` to mark such cases as non-SSA.

r? tmiasko
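The `Live(_1); Dead(_1); Live(_1);` problem can be seen on a straight-line toy. At the final use, a storage-liveness query reports `_1` as live, but an initialization query reports it as uninitialized. The sketch also includes a hypothetical `storage_reopened` check that mirrors the idea of treating `Dead->Live` like `Deinit` and therefore as non-SSA. All names are illustrative and are not rustc's `SsaVisitor` or `MaybeStorageDead`.

```rust
/// Toy straight-line statements (hypothetical, not rustc's MIR types).
#[derive(Clone, Copy)]
enum Stmt {
    StorageLive(usize),
    StorageDead(usize),
    Assign(usize),
    Use(usize),
}

/// Returns `(storage_live, initialized)` for `local` just before `stmts[idx]`.
fn state_before(stmts: &[Stmt], idx: usize, local: usize) -> (bool, bool) {
    let (mut live, mut init) = (false, false);
    for &s in &stmts[..idx] {
        match s {
            // Fresh storage holds no valid value.
            Stmt::StorageLive(l) if l == local => {
                live = true;
                init = false;
            }
            Stmt::StorageDead(l) if l == local => {
                live = false;
                init = false;
            }
            Stmt::Assign(l) if l == local => init = true,
            _ => {}
        }
    }
    (live, init)
}

/// Hypothetical SSA-style guard: does `local` get `StorageLive` again after a
/// `StorageDead`? Treating that like a `Deinit` means the local is not SSA.
fn storage_reopened(stmts: &[Stmt], local: usize) -> bool {
    let mut seen_dead = false;
    for &s in stmts {
        match s {
            Stmt::StorageDead(l) if l == local => seen_dead = true,
            Stmt::StorageLive(l) if l == local && seen_dead => return true,
            _ => {}
        }
    }
    false
}
```

In the reopened sequence, the storage check alone would wrongly allow the replacement, while the initialization check (or excluding such locals from SSA) rejects it.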


cjgillot · 2025-06-22T13:38:51Z

@ohadravid do you mind merging this PR and #142819? Both should use the same code to decide whether to keep or remove storage statements. And I fear that having 2 PRs means that @tmiasko and I won't see each other's ideas and will give you diverging advice.


ohadravid · 2025-06-22T16:34:40Z

@cjgillot, @tmiasko - merged both PRs here.

Current impls are based on the new MaybeUninitializedLocals analysis in both passes, with all the new test cases passing.

Does GVN require an additional check against borrowed locals, as mentioned in #142531 (comment)?

Both only do the more complex analysis when tcx.sess.emit_lifetime_markers(), so they shouldn't negatively affect check/debug builds, but the last perf run did show some changes to them as well.
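The gating described above might look roughly like the following sketch. `emit_lifetime_markers` stands in for `tcx.sess.emit_lifetime_markers()` and `maybe_uninit_at_some_use` for the dataflow query. `heads_to_strip` is a hypothetical helper, and treating "strip every merged head" as the cheap fallback is an assumption about the pre-existing behavior.

```rust
/// Hypothetical decision helper: which heads lose their storage statements.
fn heads_to_strip(
    heads: &[usize],
    emit_lifetime_markers: bool,
    maybe_uninit_at_some_use: impl Fn(usize) -> bool,
) -> Vec<usize> {
    if !emit_lifetime_markers {
        // No lifetime markers will be emitted, so precise storage statements buy
        // nothing: skip the analysis and strip every merged head (assumed old behavior).
        return heads.to_vec();
    }
    // Otherwise only strip heads that are maybe-uninit at some rewritten use.
    heads
        .iter()
        .copied()
        .filter(|&h| maybe_uninit_at_some_use(h))
        .collect()
}
```

The analysis closure is never called on the cheap path, which is why check and debug builds should not pay for it.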

And thank you both for reviewing these and explaining everything! 🙏

tmiasko · 2025-06-23T07:04:12Z

compiler/rustc_mir_transform/src/copy_prop.rs

+        let mut head_storage_to_check = DenseBitSet::new_empty(fully_moved.domain_size());
        let mut storage_to_remove = DenseBitSet::new_empty(fully_moved.domain_size());


The information stored in head_storage_to_check is redundant, since one can always examine storage_to_remove instead. Can you remove head_storage_to_check?
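The sketch below shows the suggestion, assuming each access can be checked independently. The result set itself records which heads are already decided, so no separate `head_storage_to_check` set is needed. A head is also never re-queried once it is found to be maybe-uninit, which gives the same early stop mentioned earlier in the thread. `compute_storage_to_remove`, the `(head, location)` pairs and the `maybe_uninit` callback are hypothetical, and a `Vec<bool>` stands in for `DenseBitSet`.

```rust
/// `accesses` lists (head, location) pairs where a local replaced by `head` is
/// accessed; `maybe_uninit(head, loc)` stands in for the dataflow cursor query.
fn compute_storage_to_remove(
    n_locals: usize,
    accesses: &[(usize, usize)],
    mut maybe_uninit: impl FnMut(usize, usize) -> bool,
) -> Vec<bool> {
    let mut to_remove = vec![false; n_locals];
    for &(head, loc) in accesses {
        // Once a head is marked, further checks cannot change the answer,
        // so the result set doubles as the "already decided" set.
        if to_remove[head] {
            continue;
        }
        if maybe_uninit(head, loc) {
            to_remove[head] = true;
        }
    }
    to_remove
}
```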

tmiasko · 2025-06-23T08:52:44Z

compiler/rustc_mir_transform/src/copy_prop.rs

+                storage_to_remove,
+            };
+
+            storage_checker.visit_body(body);


Visit only reachable blocks with traversal::reachable. By default the dataflow engine prohibits obtaining results from unreachable blocks (there is a debug assertion).

Can you also add a test that code from unreachable blocks doesn't block the optimization?

    #![feature(custom_mir, core_intrinsics)]

    extern crate core;
    use core::intrinsics::mir::*;

    #[custom_mir(dialect = "runtime", phase = "post-cleanup")]
    pub fn f(_1: &mut usize) {
        mir! {
            let _2: usize;
            let _3: usize;
            {
                StorageLive(_2);
                _2 = 42;
                _3 = _2;
                (*_1) = _3;
                StorageDead(_2);
                Return()
            }
            bb1 = {
                // Ensure that _2 is considered uninitialized by `MaybeUninitializedLocals`.
                StorageLive(_2);
                // Use of _3 (in an unreachable block) when definition of _2 is unavailable.
                (*_1) = _3;
                StorageDead(_2);
                Return()
            }
        }
    }
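Here "reachable" means reachable from the entry block along CFG edges. A toy version of that traversal over a successor list is sketched below. `reachable_blocks` is hypothetical and is not `traversal::reachable` itself. A checker that iterates only over these blocks would never query the analysis for a block like `bb1` in the test above, because no edge leads to it.

```rust
/// Blocks reachable from the entry block (index 0), in DFS preorder.
/// `succs[b]` lists the successors of block `b` (toy CFG, not rustc's `Body`).
fn reachable_blocks(succs: &[Vec<usize>]) -> Vec<usize> {
    if succs.is_empty() {
        return Vec::new();
    }
    let mut seen = vec![false; succs.len()];
    let mut stack = vec![0];
    let mut order = Vec::new();
    while let Some(b) = stack.pop() {
        if seen[b] {
            continue;
        }
        seen[b] = true;
        order.push(b);
        // Push in reverse so the first successor is visited first.
        for &s in succs[b].iter().rev() {
            if !seen[s] {
                stack.push(s);
            }
        }
    }
    order
}
```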

tmiasko · 2025-06-23T08:53:59Z

tests/mir-opt/copy-prop/copy_prop_borrowed_storage_not_removed.rs

@@ -0,0 +1,30 @@
+// skip-filecheck


Add FileCheck annotations for the test (and to all new tests in general).

tmiasko · 2025-06-23T08:54:21Z

tests/mir-opt/copy-prop/copy_prop_storage_twice.rs

@@ -0,0 +1,61 @@
+// skip-filecheck


Add FileCheck annotations for the test.

tmiasko · 2025-06-24T07:36:39Z

compiler/rustc_mir_dataflow/src/impls/initialized.rs

+///
+/// This is a simpler analysis than `MaybeUninitializedPlaces`, because it does not track
+/// individual fields.
+pub struct MaybeUninitializedLocals;


The results of the analysis are only meaningful for locals in SSA form. Can you move the implementation to the same module as SsaLocals?

tmiasko · 2025-06-24T08:11:03Z

compiler/rustc_mir_transform/src/copy_prop.rs

+    fn visit_local(&mut self, local: Local, context: PlaceContext, loc: Location) {
+        // We don't need to check storage statements and statements for which the local doesn't need to be initialized.


For local == head we would be preserving the existing behavior and we don't need to check anything. Return early in that situation.

Could you also add test for this? We should optimize and keep storage statements in:

    #![feature(custom_mir, core_intrinsics)]

    extern crate core;
    use core::intrinsics::mir::*;

    #[custom_mir(dialect = "runtime", phase = "post-cleanup")]
    pub fn f(_1: &mut usize) {
        mir! {
            let _2: usize;
            let _3: usize;
            {
                StorageLive(_2);
                _2 = 0;
                _3 = _2;
                (*_1) = _3;
                StorageDead(_2);
                (*_1) = _2;
                Return()
            }
        }
    }
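The sketch below shows the requested early returns. It uses a simplified `Ctx` in place of rustc's `PlaceContext` and a `needs_check` helper, both hypothetical. Only reads of a local that will be replaced by a *different* head need to consult the maybe-uninit analysis. In the test above, `(*_1) = _2` after `StorageDead(_2)` reads the head itself, so it is skipped and the storage statements of `_2` are kept.

```rust
/// Simplified, hypothetical stand-in for the kind of context `visit_local` receives.
#[derive(Clone, Copy, PartialEq)]
enum Ctx {
    StorageMarker,
    Read,
    Overwrite,
}

/// Should an access of `local` (whose copy class has head `head`) be checked
/// against the maybe-uninit analysis?
fn needs_check(local: usize, head: usize, ctx: Ctx) -> bool {
    // Storage statements are handled separately, and a full overwrite does not
    // require the local to be initialized beforehand.
    if ctx != Ctx::Read {
        return false;
    }
    // Uses of the head itself are not rewritten, so existing behavior is preserved.
    if local == head {
        return false;
    }
    true
}
```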

tmiasko · 2025-06-24T10:35:27Z

tests/mir-opt/copy-prop/issue_141649.rs

+// skip-filecheck
+// EMIT_MIR_FOR_EACH_PANIC_STRATEGY
+//@ test-mir-pass: CopyProp
+
+// EMIT_MIR issue_141649.main.CopyProp.diff
+fn main() {


Add FileCheck annotations along with an explanation of the test. I would also suggest writing MIR directly, so that the correspondence between source and FileCheck assertions is clear.

tmiasko · 2025-06-25T06:56:32Z

I am not familiar with GVN, so I will leave review of that part to @cjgillot .

bors · 2025-06-25T18:23:13Z

☔ The latest upstream changes (presumably #142870) made this pull request unmergeable. Please resolve the merge conflicts.

rustbot assigned tmiasko Jun 15, 2025

rustbot added S-waiting-on-review T-compiler labels Jun 15, 2025