dont-shuffle-bswaps codegen test broken on non-x86_64/arm64 archs #142068

Open
Fabian-Gruenbichler opened this issue Jun 5, 2025 · 15 comments
Labels
A-codegen (Area: Code generation), C-bug (Category: This is a bug.)

Comments

@Fabian-Gruenbichler
Contributor

Fabian-Gruenbichler commented Jun 5, 2025

Summary

the test case introduced by #136761 is broken on at least loong64, ppc64el, mips64el and ppc64.

Command used

./x test tests/codegen/dont-shuffle-bswaps.rs --target powerpc64le-unknown-linux-gnu

Expected behaviour

the test should pass, or be ignored

Actual behaviour

the test fails because the FileCheck annotations don't match (because of missing vectorisation?)

Operating system

Debian experimental

HEAD

425e142686242c7e73f5e32c79071ae266f0f355

Additional context

Test output

------FileCheck stderr------------------------------
/build/reproducible-path/rustc-1.86.0+dfsg1/tests/codegen/dont-shuffle-bswaps.rs:17:10: error: OPT3: expected string not found in input
// OPT3: load <8 x i16>
         ^
/build/reproducible-path/rustc-1.86.0+dfsg1/build/powerpc64-unknown-linux-gnu/test/codegen/dont-shuffle-bswaps.OPT3/dont-shuffle-bswaps.ll:7:22: note: scanning from here
define void @convert(ptr dead_on_unwind noalias nocapture noundef writable writeonly sret([16 x i8]) align 1 dereferenceable(16) %_0, ptr noalias nocapture noundef readonly align 2 dereferenceable(16) %value) unnamed_addr #0 {
                     ^
/build/reproducible-path/rustc-1.86.0+dfsg1/build/powerpc64-unknown-linux-gnu/test/codegen/dont-shuffle-bswaps.OPT3/dont-shuffle-bswaps.ll:9:8: note: possible intended match here
 %_4 = load i16, ptr %value, align 2, !noundef !2
       ^

Input file: /build/reproducible-path/rustc-1.86.0+dfsg1/build/powerpc64-unknown-linux-gnu/test/codegen/dont-shuffle-bswaps.OPT3/dont-shuffle-bswaps.ll
Check file: /build/reproducible-path/rustc-1.86.0+dfsg1/tests/codegen/dont-shuffle-bswaps.rs

-dump-input=help explains the following input dump.

Input was:
<<<<<<
            1: ; ModuleID = 'dont_shuffle_bswaps.5c93dbc6d133cf57-cgu.0' 
            2: source_filename = "dont_shuffle_bswaps.5c93dbc6d133cf57-cgu.0" 
            3: target datalayout = "E-m:e-Fi64-i64:64-n32:64-S128-v256:256:256-v512:512:512" 
            4: target triple = "powerpc64-unknown-linux-gnu" 
            5:  
            6: ; Function Attrs: mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: readwrite) uwtable 
            7: define void @convert(ptr dead_on_unwind noalias nocapture noundef writable writeonly sret([16 x i8]) align 1 dereferenceable(16) %_0, ptr noalias nocapture noundef readonly align 2 dereferenceable(16) %value) unnamed_addr #0 { 
check:17'0                          X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
            8: start: 
check:17'0     ~~~~~~~
            9:  %_4 = load i16, ptr %value, align 2, !noundef !2 
check:17'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
check:17'1            ?                                           possible intended match
           10:  %0 = tail call i16 @llvm.bswap.i16(i16 %_4) 
check:17'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           11:  %1 = getelementptr inbounds i8, ptr %value, i64 2 
check:17'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           12:  %_6 = load i16, ptr %1, align 2, !noundef !2 
check:17'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           13:  %2 = tail call i16 @llvm.bswap.i16(i16 %_6) 
check:17'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           14:  %3 = getelementptr inbounds i8, ptr %value, i64 4 
check:17'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           15:  %_8 = load i16, ptr %3, align 2, !noundef !2 
check:17'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           16:  %4 = tail call i16 @llvm.bswap.i16(i16 %_8) 
check:17'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           17:  %5 = getelementptr inbounds i8, ptr %value, i64 6 
check:17'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           18:  %_10 = load i16, ptr %5, align 2, !noundef !2 
check:17'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           19:  %6 = tail call i16 @llvm.bswap.i16(i16 %_10) 
check:17'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           20:  %7 = getelementptr inbounds i8, ptr %value, i64 8 
check:17'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           21:  %_12 = load i16, ptr %7, align 2, !noundef !2 
check:17'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           22:  %8 = tail call i16 @llvm.bswap.i16(i16 %_12) 
check:17'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           23:  %9 = getelementptr inbounds i8, ptr %value, i64 10 
check:17'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           24:  %_14 = load i16, ptr %9, align 2, !noundef !2 
check:17'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           25:  %10 = tail call i16 @llvm.bswap.i16(i16 %_14) 
check:17'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           26:  %11 = getelementptr inbounds i8, ptr %value, i64 12 
check:17'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           27:  %_16 = load i16, ptr %11, align 2, !noundef !2 
check:17'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           28:  %12 = tail call i16 @llvm.bswap.i16(i16 %_16) 
check:17'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           29:  %13 = getelementptr inbounds i8, ptr %value, i64 14 
check:17'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           30:  %_18 = load i16, ptr %13, align 2, !noundef !2 
check:17'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           31:  %14 = tail call i16 @llvm.bswap.i16(i16 %_18) 
check:17'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           32:  store i16 %0, ptr %_0, align 1 
check:17'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           33:  %addr16.sroa.2.0._0.sroa_idx = getelementptr inbounds i8, ptr %_0, i64 2 
check:17'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           34:  store i16 %2, ptr %addr16.sroa.2.0._0.sroa_idx, align 1 
check:17'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           35:  %addr16.sroa.3.0._0.sroa_idx = getelementptr inbounds i8, ptr %_0, i64 4 
check:17'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           36:  store i16 %4, ptr %addr16.sroa.3.0._0.sroa_idx, align 1 
check:17'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           37:  %addr16.sroa.4.0._0.sroa_idx = getelementptr inbounds i8, ptr %_0, i64 6 
check:17'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           38:  store i16 %6, ptr %addr16.sroa.4.0._0.sroa_idx, align 1 
check:17'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           39:  %addr16.sroa.5.0._0.sroa_idx = getelementptr inbounds i8, ptr %_0, i64 8 
check:17'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           40:  store i16 %8, ptr %addr16.sroa.5.0._0.sroa_idx, align 1 
check:17'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           41:  %addr16.sroa.6.0._0.sroa_idx = getelementptr inbounds i8, ptr %_0, i64 10 
check:17'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           42:  store i16 %10, ptr %addr16.sroa.6.0._0.sroa_idx, align 1 
check:17'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           43:  %addr16.sroa.7.0._0.sroa_idx = getelementptr inbounds i8, ptr %_0, i64 12 
check:17'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           44:  store i16 %12, ptr %addr16.sroa.7.0._0.sroa_idx, align 1 
check:17'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           45:  %addr16.sroa.8.0._0.sroa_idx = getelementptr inbounds i8, ptr %_0, i64 14 
check:17'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           46:  store i16 %14, ptr %addr16.sroa.8.0._0.sroa_idx, align 1 
check:17'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           47:  ret void 
check:17'0     ~~~~~~~~~~
           48: } 
check:17'0     ~~
           49:  
check:17'0     ~
           50: ; Function Attrs: mustprogress nocallback nofree nosync nounwind speculatable willreturn memory(none) 
check:17'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           51: declare i16 @llvm.bswap.i16(i16) #1 
check:17'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           52:  
check:17'0     ~
           53: attributes #0 = { mustprogress nofree norecurse nosync nounwind willreturn memory(argmem: readwrite) uwtable "probe-stack"="inline-asm" "target-cpu"="ppc64" } 
check:17'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           54: attributes #1 = { mustprogress nocallback nofree nosync nounwind speculatable willreturn memory(none) } 
check:17'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           55:  
check:17'0     ~
           56: !llvm.module.flags = !{!0} 
check:17'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~
           57: !llvm.ident = !{!1} 
check:17'0     ~~~~~~~~~~~~~~~~~~~~
           58:  
check:17'0     ~
           59: !0 = !{i32 8, !"PIC Level", i32 2} 
check:17'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           60: !1 = !{!"rustc version 1.86.0 (05f9846f8 2025-03-31) (built from a source tarball)"} 
check:17'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
           61: !2 = !{} 
check:17'0     ~~~~~~~~~
>>>>>>

------------------------------------------

error in revision `OPT3`: verification with 'FileCheck' failed
status: exit status: 1
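
For readers who do not want to trace the IR: the dump shows eight independent `load i16` / `@llvm.bswap.i16` / `store i16` triples where the `OPT3` check expects a single `load <8 x i16>`. The following is a minimal sketch of a function with that shape, reconstructed from the IR signature of `@convert` only. It is not the source of `dont-shuffle-bswaps.rs`, and the loop-based body is an assumption.

```rust
/// Sketch: byte-swap each of the eight 16-bit lanes and write the results
/// out as 16 bytes, mirroring the per-lane load/bswap/store in the IR dump.
/// The body is hypothetical; only the name and the types come from the IR.
pub fn convert(value: [u16; 8]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for (dst, lane) in out.chunks_exact_mut(2).zip(value.iter()) {
        // `swap_bytes` corresponds to `@llvm.bswap.i16`; `to_ne_bytes`
        // corresponds to a plain native-endian `store i16`.
        dst.copy_from_slice(&lane.swap_bytes().to_ne_bytes());
    }
    out
}
```

Whether the optimizer merges these lanes into one vector operation depends on the target's cost model, which is what the failing check is sensitive to.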

@Fabian-Gruenbichler Fabian-Gruenbichler added T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) C-bug Category: This is a bug. labels Jun 5, 2025
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Jun 5, 2025
@onur-ozkan onur-ozkan added A-codegen Area: Code generation and removed T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Jun 5, 2025
@onur-ozkan
Member

Command used
Debian packaging build

Please update this part in the issue.

@Fabian-Gruenbichler
Contributor Author

done (also added HEAD, after verifying that it reproduces easily with plain upstream sources). note that you need a working cross setup, or you need to run it natively on one of the affected targets.

@workingjubilee
Member

This test was not introduced by that PR.

@workingjubilee
Member

@Fabian-Gruenbichler Do both revisions fail, or only the OPT3 one?

@workingjubilee
Member

...damn, that codegen sucks ass?

@nikic
Contributor

nikic commented Jun 5, 2025

That test seems to assume that every 64-bit target has a cheap <8 x i16> bswap, which is obviously not true. E.g. this is what mips64el looks like: https://llvm.godbolt.org/z/9K57M5z8o

This test should be limited to known-good arches.

@workingjubilee
Member

Yes, but Power ISA has a vector byte-reverse. Is our powerpc64le target too low an architectural level for that?

@workingjubilee
Member

workingjubilee commented Jun 5, 2025

ppc64le implies Power 8: https://github.com/llvm/llvm-project/blob/eb6577d54f53715e8917cf8a91eb68c8b47d489f/llvm/lib/Target/PowerPC/PPC.td#L713

Which should imply VSX is available and thus xxbrh is usable? Is that seriously worse than its current assembly?

@workingjubilee
Member

workingjubilee commented Jun 5, 2025

...well, I'll confess that's not the assembly I expected on Power 8 for llvm.bswap on a vector.

test:                                   # @test
.Lfunc_gep0:
        addis 2, 12, .TOC.-.Lfunc_gep0@ha
        addi 2, 2, .TOC.-.Lfunc_gep0@l
        lxvd2x 0, 0, 3
        addis 4, 2, .LCPI0_0@toc@ha
        addi 4, 4, .LCPI0_0@toc@l
        xxswapd 34, 0
        lxvd2x 0, 0, 4
        xxswapd 35, 0
        vperm 2, 2, 2, 3
        xxswapd 0, 34
        stxvd2x 0, 0, 3
        blr

@jieyouxu is there some form of //@ only-x86_64-or-aarch64 directive?
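
In the Power 8 assembly above, a second `lxvd2x` loads a constant from `.LCPI0_0`, which is then passed to `vperm` as a permute mask. In other words, the vector bswap is lowered to a generic byte shuffle rather than a dedicated byte-reverse instruction. The sketch below models that idea in plain Rust. The contents of `.LCPI0_0` are not shown in the thread, so the mask here is an assumption, and `permute_bytes` is a simplified, hypothetical model that ignores `vperm`'s second source operand and the `xxswapd` fix-ups.

```rust
/// Simplified model of a 16-byte permute: output byte `i` is the input
/// byte selected by `mask[i]` (only the low four bits are used here).
pub fn permute_bytes(src: [u8; 16], mask: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for (dst, &idx) in out.iter_mut().zip(mask.iter()) {
        *dst = src[usize::from(idx & 0x0f)];
    }
    out
}

/// Hypothetical mask that exchanges the two bytes of every 16-bit lane.
pub const SWAP_U16_LANES: [u8; 16] =
    [1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14];

/// Reference implementation: byte-swap each 16-bit lane individually.
pub fn bswap_lanes(src: [u8; 16]) -> [u8; 16] {
    let mut out = src;
    for lane in out.chunks_exact_mut(2) {
        lane.swap(0, 1);
    }
    out
}
```

With this mask, `permute_bytes(src, SWAP_U16_LANES)` and `bswap_lanes(src)` agree for every input, which is why a shuffle is a valid, if more expensive, lowering of a per-lane bswap.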

@jieyouxu
Member

jieyouxu commented Jun 5, 2025

is there some form of //@ only-x86_64-or-aarch64 directive?

Unfortunately not (yet). I believe what you're looking for is #140575.

@nikic
Contributor

nikic commented Jun 5, 2025

@workingjubilee xxbrh is gated behind a +power9-vector check. Whether that's correct or not I have no idea.

@workingjubilee
Member

oh, huh.

@programmerjake is it?

@workingjubilee
Member

Hm, after some searching, I think it is. Tragic! Not obvious from the ISA manual, irritatingly.

@programmerjake
Member

Hm, after some searching, I think it is. Tragic! Not obvious from the ISA manual, irritatingly.

if you look in the PowerISA v3.1C pdf on page 1373 (1409), you can see that it was first added in v3.0, which corresponds to POWER9.

@Fabian-Gruenbichler
Contributor Author

This test was not introduced by that PR.

sorry, yes, I misread the diffstat (only additions) as a new file while looking up the reference

@Fabian-Gruenbichler Do both revisions fail, or only the OPT3 one?

only the OPT3 one AFAICT

This test should be limited to known-good arches.

it was originally limited to only-x86_64, then was "simplified" to only-64bit in 4c17270332c2908a9e77d0c5a5cdfc27edd45654
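
For illustration, restoring the original restriction might look roughly like the header below. This is a sketch only: `only-x86_64`, the `OPT3` revision name and the `load <8 x i16>` check come from this thread, while the `OPT2` revision name and both `compile-flags` lines are assumptions about the rest of the file. It also leaves out arm64, since, as noted above, there is no combined `only-x86_64-or-aarch64` directive yet.

```rust
// Hypothetical test header; not the actual contents of dont-shuffle-bswaps.rs.
//@ only-x86_64
//@ revisions: OPT2 OPT3
//@[OPT2] compile-flags: -C opt-level=2
//@[OPT3] compile-flags: -C opt-level=3

// The check that fails on powerpc64 in the log above:
// OPT3: load <8 x i16>
```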
