mprotect failure on Fedora 41 during memory hook installation when clang asan is used. #13361

@etiennemlb

Background information

What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)

5.0.5

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

dnf install.

Please describe the system on which you are running

  • Operating system/version: fedora 41
  • Computer hardware: old laptop with an i5, 16 GiB of DRAM | so, x86-64
  • Network type: nothing fancy, laptop tcp stack with wifi.

Details of the problem

When built with clang (19.1) and the address sanitizer, running a hello world example gives:

$ ./hello 
[g15549:101021:0:101021] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x8007b3f0)
==== backtrace (tid: 101021) ====
 0  /lib64/libucs.so.0(ucs_debug_print_backtrace+0x2c) [0x7f6e2efb3f4c]
 1  /lib64/libucs.so.0(ucs_handle_error+0x2e4) [0x7f6e2efb5a74]
 2  /lib64/libucs.so.0(+0x1876d) [0x7f6e2efb776d]
 3  /lib64/libucs.so.0(+0x1893d) [0x7f6e2efb793d]
 4  [0x8007b3f0]
=================================
Illegal instruction

Using gcc or clang without asan works fine.

Digging deeper, we have the following call stack:

#0  0x000000008007b3f0 in ?? ()
#1  0x00007ffff49a2e73 in ModifyMemoryProtection (addr=<optimized out>, length=<optimized out>, prot=5) at mca/patcher/base/patcher_base_patch.c:143
#2  apply_patch (patch_data=0x50d000003620 "I\273\360\023\232\364\377\177", address=<optimized out>, data_size=13) at mca/patcher/base/patcher_base_patch.c:174
#3  mca_base_patcher_patch_apply_binary (patch=patch@entry=0x50d0000035e0) at mca/patcher/base/patcher_base_patch.c:185
#4  0x00007ffff49a2f6e in mca_patcher_overwrite_apply_patch (patch=0x50d0000035e0) at mca/patcher/overwrite/patcher_overwrite_module.c:58
#5  mca_patcher_overwrite_patch_address (sys_addr=4300722, hook_addr=<optimized out>) at mca/patcher/overwrite/patcher_overwrite_module.c:307
#6  0x00007ffff49a1867 in patcher_open () at mca/memory/patcher/memory_patcher_component.c:607
#7  0x00007ffff492aad3 in open_components (framework=0x7ffff49fdc00 <opal_memory_base_framework>) at mca/base/mca_base_components_open.c:349
#8  mca_base_framework_components_open (framework=0x7ffff49fdc00 <opal_memory_base_framework>, flags=<optimized out>) at mca/base/mca_base_components_open.c:296
#9  0x00007ffff49a132d in opal_memory_base_open (flags=MCA_BASE_OPEN_DEFAULT) at mca/memory/base/memory_base_open.c:110
#10 0x00007ffff492bb5d in mca_base_framework_open (framework=0x7ffff49fdc00 <opal_memory_base_framework>, flags=MCA_BASE_OPEN_DEFAULT) at mca/base/mca_base_framework.c:194
#11 mca_base_framework_open (framework=0x7ffff49fdc00 <opal_memory_base_framework>, flags=<optimized out>) at mca/base/mca_base_framework.c:161
#12 0x00007ffff4966c9f in opal_common_ucx_mca_register () at mca/common/ucx/common_ucx.c:170
#13 opal_common_ucx_mca_register () at mca/common/ucx/common_ucx.c:155
#14 0x00007ffff55f52ed in mca_pml_ucx_component_open () at mca/pml/ucx/pml_ucx_component.c:106
#15 0x00007ffff492aad3 in open_components (framework=0x7ffff56fb780 <ompi_pml_base_framework>) at mca/base/mca_base_components_open.c:349
#16 mca_base_framework_components_open (framework=0x7ffff56fb780 <ompi_pml_base_framework>, flags=flags@entry=MCA_BASE_OPEN_DEFAULT) at mca/base/mca_base_components_open.c:296
#17 0x00007ffff55ed7e0 in mca_pml_base_open (flags=MCA_BASE_OPEN_DEFAULT) at mca/pml/base/pml_base_frame.c:230
#18 0x00007ffff492bb5d in mca_base_framework_open (framework=0x7ffff56fb780 <ompi_pml_base_framework>, flags=MCA_BASE_OPEN_DEFAULT) at mca/base/mca_base_framework.c:194
#19 mca_base_framework_open (framework=framework@entry=0x7ffff56fb780 <ompi_pml_base_framework>, flags=flags@entry=MCA_BASE_OPEN_DEFAULT) at mca/base/mca_base_framework.c:161
#20 0x00007ffff544a34b in ompi_mpi_instance_init_common (argc=<optimized out>, argc@entry=0, argv=<optimized out>, argv@entry=0x0) at instance/instance.c:407
#21 0x00007ffff544b8cc in ompi_mpi_instance_init (ts_level=<optimized out>, info=0x7ffff5723ee0 <ompi_mpi_info_null>, errhandler=0x7ffff57217a0 <ompi_mpi_errors_are_fatal>, instance=0x7ffff5721788 <ompi_mpi_instance_default>, argc=0, argv=0x0) at instance/instance.c:824
#22 0x00007ffff543d7c0 in ompi_mpi_init (argc=0, argv=0x0, requested=0, provided=0x7ffff16000e0, reinit_ok=false) at runtime/ompi_mpi_init.c:359
#23 0x00007ffff547c22a in PMPI_Init_thread (argc=0x0, argv=0x0, required=0, provided=0x7ffff16000e0) at mpi/c/init_thread.c:78
#24 0x00000000004eee97 in main (argc=<optimized out>, argv=<optimized out>) at hello.cc:375

At frame 6, patcher_open registers callback hooks for several memory operations. The first patch_symbol call, the one for mmap, fails.
In apply_patch (frame 2), the first call to ModifyMemoryProtection succeeds, setting rwx on the mmap code. It then memcpys the hook over the original code and resets the protection to r-x (prot=5 in frame 1), which is where it fails:

ModifyMemoryProtection(address, data_size, PROT_EXEC | PROT_READ);

The asm it is executing when failing is:

   0x7ffff49a2e68 <mca_base_patcher_patch_apply_binary+152>        sub    %rbx,%rsi
   0x7ffff49a2e6b <mca_base_patcher_patch_apply_binary+155>        mov    %rbx,%rdi
   0x7ffff49a2e6e <mca_base_patcher_patch_apply_binary+158>        call   0x7ffff4913580 <mprotect@plt>
  >0x7ffff49a2e73 <mca_base_patcher_patch_apply_binary+163>        test   %eax,%eax
   0x7ffff49a2e75 <mca_base_patcher_patch_apply_binary+165>        jne    0x7ffff49169c0 <mca_base_patcher_patch_apply_binary-574480>
   0x7ffff49a2e7b <mca_base_patcher_patch_apply_binary+171>        lea    -0x152(%rip),%rax        # 0x7ffff49a2d30 <mca_base_patcher_patch_unapply_binary>
   0x7ffff49a2e82 <mca_base_patcher_patch_apply_binary+178>        mov    %rax,0x88(%r15)
   0x7ffff49a2e89 <mca_base_patcher_patch_apply_binary+185>        add    $0x8,%rsp

The call to mprotect is routed to address 0x000000008007b3f0, which appears to contain only zero bytes.
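
To narrow down what lives at an address like this, it can help to check which mapping contains it and what permissions that mapping has, since the signal message reports "invalid permissions for mapped object". The following is a rough, Linux-only sketch that scans /proc/self/maps; `lookup_mapping` is a hypothetical helper name and is not part of Open MPI or UCX.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: find the /proc/self/maps entry that contains `addr`.
 * On success, prints the matching line, copies its permission string
 * (for example "r-xp") into `perms`, and returns 0. Returns -1 if the
 * address is not mapped or the maps file cannot be read. */
static int lookup_mapping(uintptr_t addr, char perms[5])
{
    assert(perms != NULL);

    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL) {
        return -1;
    }

    char line[512];
    int found = -1;
    while (fgets(line, sizeof line, maps) != NULL) {
        uint64_t start, end;
        char p[5];

        /* Each entry starts with "start-end perms ..." in hexadecimal. */
        if (sscanf(line, "%" SCNx64 "-%" SCNx64 " %4s", &start, &end, p) != 3) {
            continue;
        }
        if (addr >= start && addr < end) {
            memcpy(perms, p, sizeof p);
            printf("%s", line);
            found = 0;
            break;
        }
    }
    fclose(maps);
    return found;
}
```

Calling `lookup_mapping(0x8007b3f0, perms)` from inside the failing process, just before the second protection change, would show whether the target is mapped and whether it is executable. From a debugger, gdb's `info proc mappings` gives the same information without changing the program.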
