Optimize layout of function arguments in the Rust ABI - take 2 #97559
r? @cjgillot (rust-highfive has picked a reviewer for you, use r? to override)
} else if unlikely!(self.has_all_float(&arg.layout)) {
    // We don't want to aggregate floats as an aggregates of Integer
    // because this will hurt the generated assembly (#93490)
    //
    // As an optimization we want to pass homogeneous aggregate of floats
    // greater than pointer size as indirect
    if size > Pointer.size(self) {
        arg.make_indirect();
    }
} else {
Suggested change:
} else if size > Pointer.size(self) && unlikely!(self.has_all_float(&arg.layout)) {
    // We don't want to aggregate floats as an aggregates of Integer
    // because this will hurt the generated assembly (#93490)
    //
    // As an optimization we want to pass homogeneous aggregate of floats
    // greater than pointer size as indirect
    arg.make_indirect();
} else {
This is not the same logic as before. Before your suggestion, if `self.has_all_float(&arg.layout)` was true but `size > Pointer.size(self)` was not, nothing would happen; with this suggestion that case takes the else branch, which does exactly the opposite of what we want.
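To make the difference concrete, here is a minimal sketch of the two control flows. The `Pass` enum and the functions `nested` and `merged` are hypothetical names used only for illustration, the earlier branches of the real if-chain are not modelled, and the assumption that the final `else` branch casts the aggregate to an integer is taken from the discussion rather than from the quoted diff.

```rust
/// Hypothetical outcome of the argument-passing decision (illustration only).
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Pass {
    /// Left untouched, so the backend is free to pick float registers.
    Unchanged,
    /// Passed behind a pointer.
    Indirect,
    /// Assumed behaviour of the final `else` branch: cast to an integer.
    CastToInteger,
}

/// Models the nested form: the size check lives inside the all-float branch.
pub fn nested(all_float: bool, size: u64, ptr_size: u64) -> Pass {
    if all_float {
        if size > ptr_size {
            Pass::Indirect
        } else {
            Pass::Unchanged
        }
    } else {
        Pass::CastToInteger
    }
}

/// Models the suggested form: both checks merged into one condition.
pub fn merged(all_float: bool, size: u64, ptr_size: u64) -> Pass {
    if size > ptr_size && all_float {
        Pass::Indirect
    } else {
        Pass::CastToInteger
    }
}
```

Under these assumptions, an all-float aggregate that fits in a pointer (for example `nested(true, 8, 8)` versus `merged(true, 8, 8)`) is left unchanged by the nested form but falls through to the integer cast in the merged form. The two forms agree in every other case.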
Is there any particular reason that the current behaviour is undesirable for small aggregates with float?
Also, even if it's desirable that small aggregates of floats get special treatment, it should probably be a separate PR that's backed by a separate perf run.
Well, on x86_64 this would mean that `f64` -> `i64`, `f32` -> `i32` and `[f32; 2]` -> `i64`, which is not ideal and is the reason for this PR, i.e. to let the codegen use floating-point registers for small aggregates of floats, or to put them behind an indirection.
`f64` and `f32` are not `Abi::Aggregate` and will not hit this path. For `[f32; 2]`, the current status quo is to convert it to an integer.
Essentially this PR is doing two things:
1. Pass not-all-float aggregates smaller than 2 registers directly
2. Stop casting all-float aggregates smaller than 1 register to integer
I am suggesting we just do 1 in this PR.
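For reference, passing `[f32; 2]` as a single 64-bit integer amounts to something like the packing below. This is only a sketch: `pack` and `unpack` are hypothetical names, and the lane order (element 0 in the low 32 bits) assumes a little-endian target such as x86_64.

```rust
/// Reinterpret two `f32` lanes as one 64-bit integer,
/// with element 0 in the low 32 bits (little-endian layout assumed).
pub fn pack(v: [f32; 2]) -> u64 {
    (v[0].to_bits() as u64) | ((v[1].to_bits() as u64) << 32)
}

/// Recover the two `f32` lanes from the packed integer.
pub fn unpack(bits: u64) -> [f32; 2] {
    [
        f32::from_bits(bits as u32),
        f32::from_bits((bits >> 32) as u32),
    ]
}
```

A callee that receives the packed form has to split the integer and move the halves into float registers before doing any arithmetic, which is plausibly the extra work behind the codegen complaints in this thread.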
I tried your suggestion locally and only one test failed. I will push it after the perf run so we can have a baseline to know if it's even necessary or not.
Well the perf impact of introducing this "fast path" is negligible: #97559 (comment)
https://perf.rust-lang.org/compare.html?start=f220c170f6f0f60621e43d2cb761f7711ae57c43&end=5f9cf23a74c3b0e90cc66c155ed37b74a7906093&stat=instructions%3Au
@bors try @rust-timer queue
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf
⌛ Trying commit 428cb36d1f205f6b99b2ae33849b1957b496d7eb with merge f220c170f6f0f60621e43d2cb761f7711ae57c43...
☀️ Try build successful - checks-actions
Queued f220c170f6f0f60621e43d2cb761f7711ae57c43 with parent 4a8d2e3, future comparison URL.
Finished benchmarking commit (f220c170f6f0f60621e43d2cb761f7711ae57c43): comparison url.
If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never
1d103bc to 4a6387a
@bors try @rust-timer queue
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf
⌛ Trying commit 4a6387ad206cdbfda544eb8955d521547bbf8ea6 with merge 5f9cf23a74c3b0e90cc66c155ed37b74a7906093...
☀️ Try build successful - checks-actions
Queued 5f9cf23a74c3b0e90cc66c155ed37b74a7906093 with parent dcbd5f5, future comparison URL.
Finished benchmarking commit (5f9cf23a74c3b0e90cc66c155ed37b74a7906093): comparison url.
Maybe ready for another round of review after the latest perf run? Please switch back to the author if more work is needed, thanks! @rustbot ready
This is not an improvement in compile times, and I do not think the improvement in codegen is so clearly significant, from the examples given, as to warrant merging in spite of that. It would be sufficient if we had a sample microbenchmark this wins on in terms of runtime before/after.
@rustbot author
This reverts commit 2e0cf271285089316db55b995312712638126245.
This reverts commit e136c3a9348200c261b9b3c1c50a2f6f6a68b4bd.
a0ede51 to 683e13f
Me too, and thanks to @bjorn3 (#97559 (comment)) we now know that it wasn't related to floats at all, but to a change that I made to the
@bors try @rust-timer queue
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf
⌛ Trying commit 683e13f with merge f683963771547e2f8634ff14af3fe3926c12ebd5...
☀️ Try build successful - checks-actions
Queued f683963771547e2f8634ff14af3fe3926c12ebd5 with parent 4045ce6, future comparison URL.
Finished benchmarking commit (f683963771547e2f8634ff14af3fe3926c12ebd5): comparison url.
@nbdd0121 Right, what SysV x64 ABI does is more or less the only way to reliably do this. The whole thing with LLVM aggregates and us calling this mode "casting" obscures the fact that if you want to choose which registers something gets passed in, there's no way around looking at its contents and making a decision, nothing else can make that decision for you (though we can provide helpers for common patterns across ABIs, like the "homogeneous aggregate" one). It would be much clearer if we were decomposing a Rust argument into LLVM scalar/vector arguments, and optionally a stack passing argument (but since Clang doesn't already do this for its ABI handling, we risk ABI incompatibilities, so we'd need a lot more testing - see also #65111).
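As a rough illustration of what "looking at its contents and making a decision" can mean for the homogeneous-aggregate pattern, here is a toy model. The `Scalar` enum and `homogeneous_float` function are hypothetical and far simpler than rustc's real layout types; they only show the shape of such a helper.

```rust
/// Hypothetical, simplified scalar kinds (rustc's real layout data is richer).
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Scalar {
    Int,
    F32,
    F64,
}

/// Returns the common float kind if every field is the same float type,
/// and `None` for empty, mixed, or integer-containing aggregates.
pub fn homogeneous_float(fields: &[Scalar]) -> Option<Scalar> {
    let (&first, rest) = fields.split_first()?;
    if first == Scalar::Int {
        return None;
    }
    if rest.iter().all(|&f| f == first) {
        Some(first)
    } else {
        None
    }
}
```

In this model, `[F32, F32]` is classified as a homogeneous `F32` aggregate, while `[F32, F64]` or anything containing `Int` is not, so a caller would have to fall back to some other passing strategy for those.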
…ts, r=Amanieu
Test that target feature mix up with homogeneous floats is sound
This pull-request adds a test in `src/test/abi/` that tests that target feature mix up with homogeneous floats is sound. This is basically a ripoff of [src/test/ui/simd/target-feature-mixup.rs](https://github.com/rust-lang/rust/blob/47d1cdb0bcac8e417071ce1929d261efe2399ae2/src/test/ui/simd/target-feature-mixup.rs) but for floats and without `#[repr(simd)]`.
*Extracted from rust-lang#97559 since I don't yet know what to do with that PR.*
I don't currently have the time to continue this particular PR, especially if I want to follow eddyb's comment. Closing it.
This PR is another attempt at solving the problem of floats being aggregated as integers. Its purpose is to supersede #94570, which was a big bulldozer and has caused some problems like #97540.
This PR is meant to address issues like #85265 (comment), where floats were being aggregated as integers, causing poor codegen. It does this by passing homogeneous aggregates of floats as an array if <= ptr-size, and by pointer to the array if > ptr-size as an optimization.
It's also a reopening of #93564 with, hopefully, no regressions at all.
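To see which float aggregates fall on each side of the pointer-size threshold, a small check like the following can help. It is a sketch only: `larger_than_pointer` is a hypothetical helper, and it uses `usize` as a stand-in for the target's pointer size.

```rust
use std::mem::size_of;

/// True if `T` is strictly larger than a pointer on the current target.
/// `usize` is used here as a stand-in for the pointer size.
pub fn larger_than_pointer<T>() -> bool {
    size_of::<T>() > size_of::<usize>()
}
```

For instance, `larger_than_pointer::<[f32; 4]>()` is true on both 32-bit and 64-bit targets, so that array would be passed by pointer under the rule above, while `[f32; 2]` is exactly pointer-sized on a 64-bit target and would stay an array there.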
Rust code + asm before/after
Rust code:
ASM Before:
ASM After:
cc @nbdd0121 @workingjubilee
Revert #94570
Fixes #97540