fix: CI issue with test-drive and gnu 15 #1007

Open: wants to merge 5 commits into base: master

Conversation

jalvesz
Contributor

@jalvesz jalvesz commented Jun 27, 2025

Hacking away at the CI for gnu 15: #957 (comment)

@jalvesz
Contributor Author

jalvesz commented Jun 27, 2025

@Romendakil @perazz I tried to address the gcc 15 issue here. It builds and runs, but now several tests are failing. Any advice?
cc @loiseaujc

@jalvesz
Contributor Author

jalvesz commented Jun 27, 2025

I'm tempted to say that this version of gnu 15.1.0 is buggy. I had it installed and reproduced these issues; after downgrading locally to gnu 14.3.0, all tests pass. If someone knows how to bring these issues to the attention of the gcc developers, that would be great.

For stdlib CI, maybe we should consider downgrading to gcc 14.3.0.

@Romendakil

> I'm tempted to say that this version of gnu 15.1.0 is buggy. I had it installed and reproduced these issues; after downgrading locally to gnu 14.3.0, all tests pass. If someone knows how to bring these issues to the attention of the gcc developers, that would be great.
>
> For stdlib CI, maybe we should consider downgrading to gcc 14.3.0.

As I said, this problem is fixed. Of course, this single version does have this problem. Actually, I think the official 15.1 release doesn't.

@jalvesz
Contributor Author

jalvesz commented Jun 27, 2025

> As I said, this problem is fixed

Is the fix available under an official release?

Bypassing the first issue simply exposed new failures at runtime in the tests:

96% tests passed, 14 tests failed out of 372

Total Test time (real) =  26.26 sec

The following tests did not run:
	126 - check4 (Skipped)

The following tests FAILED:
	 25 - linalg_cholesky (Exit code 0xc0000374)
	 26 - linalg_determinant (Exit code 0xc0000374)
	 27 - linalg_eigenvalues (Failed)
	 29 - linalg_inverse (Exit code 0xc0000374)
	 30 - linalg_pseudoinverse (Exit code 0xc0000374)
	 31 - linalg_norm (Exit code 0xc0000374)
	 32 - linalg_mnorm (Exit code 0xc0000374)
	 33 - linalg_solve (Exit code 0xc0000374)
	 34 - linalg_lstsq (Exit code 0xc0000374)
	 35 - linalg_qr (Exit code 0xc0000374)
	 36 - linalg_schur (Exit code 0xc0000374)
	 37 - linalg_svd (Exit code 0xc0000374)
	 78 - insert_at (Exit code 0xc0000374)
	 79 - append_prepend (Exit code 0xc0000374)
Errors while running CTest
Error: Process completed with exit code 8.

The only reasonable solution I see at the moment is to downgrade the CI to a stable version available through regular channels (14.3.0, for instance).

@perazz
Member

perazz commented Jun 28, 2025

@jalvesz some debugging shows that another (!) gcc-15 bug is being hit when self-growing an array using the array initializer:

        tests = [tests,new_unittest("$eye_det_rsp",test_rsp_eye_determinant)]
        tests = [tests,new_unittest("$eye_det_multiple_rsp",test_rsp_eye_multiple)]
        tests = [tests,new_unittest("$eye_det_rdp",test_rdp_eye_determinant)]
        ...
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
  * frame #0: 0x000000018f741388 libsystem_kernel.dylib`__pthread_kill + 8
    frame #1: 0x000000018f77a88c libsystem_pthread.dylib`pthread_kill + 296
    frame #2: 0x000000018f683c60 libsystem_c.dylib`abort + 124
    frame #3: 0x000000018f588174 libsystem_malloc.dylib`malloc_vreport + 892
    frame #4: 0x000000018f58bc90 libsystem_malloc.dylib`malloc_report + 64
    frame #5: 0x000000018f59021c libsystem_malloc.dylib`___BUG_IN_CLIENT_OF_LIBMALLOC_POINTER_BEING_FREED_WAS_NOT_ALLOCATED + 32
    frame #6: 0x00000001000010a8 test_linalg_determinant`__test_linalg_determinant_MOD_test_matrix_determinant at test_linalg_determinant.f90:25:83
    frame #7: 0x000000010015bc08 test_linalg_determinant`__testdrive_MOD_run_testsuite at testdrive.F90:328:27
    frame #8: 0x00000001000072f4 test_linalg_determinant`MAIN__ at test_linalg_determinant.f90:533:68
    frame #9: 0x000000010015f4c0 test_linalg_determinant`main at test_linalg_determinant.f90:518:40
    frame #10: 0x000000018f3dab98 dyld`start + 6076

as also addressed in fpm here https://github.com/krystophny/fpm/blame/928fae3160039d42c0ed07736bb8512d30028113/src/metapackage/fpm_meta_mpi.f90#L770

(something like a = [a, b] will crash)
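
For reference, the `a = [a, b]` pattern can be reduced to a small standalone program. This is only a sketch: `item_t` is a hypothetical stand-in for a test descriptor with an allocatable component, and it is an assumption that this reduced form triggers the same failure on an affected build. On a compiler without the regression it should simply report three items.

```fortran
! Sketch of the self-growing array pattern discussed above.
! item_t is hypothetical; it only mimics a derived type that carries
! an allocatable component, as a test descriptor typically does.
program grow_array_sketch
    implicit none

    type :: item_t
        character(len=:), allocatable :: name
    end type item_t

    type(item_t), allocatable :: items(:)

    allocate(items(0))

    ! Each assignment reallocates the left-hand side from an array
    ! constructor that references the old value of the same array.
    items = [items, item_t("first")]
    items = [items, item_t("second")]
    items = [items, item_t("third")]

    print '(a,i0)', "number of items: ", size(items)
    print '(a)', items(3)%name
end program grow_array_sketch
```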

@jalvesz
Contributor Author

jalvesz commented Jun 28, 2025

Thanks @perazz for finding this out! This looks like a serious regression on gfortran's end.

All of the tests are written using this syntax, and stdlib is not the only project affected.

Should we change all of the tests to avoid recursive allocation? That doesn't look reasonable.

I think this problem should be brought to the gfortran developers' attention, and in the meantime we should downgrade the gcc version in stdlib's CI.
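
If the tests did have to avoid `a = [a, b]`, one possible shape for a workaround is an explicit append helper built on `move_alloc`. This is a sketch under assumptions, not what this PR or fpm actually does: `item_t` and `append_item` are hypothetical names, and whether this form sidesteps the compiler bug on an affected build has not been verified here.

```fortran
! Sketch of an append helper that grows an array without the
! "a = [a, b]" self-referencing constructor. All names are hypothetical.
module append_sketch
    implicit none
    private
    public :: item_t, append_item

    type :: item_t
        character(len=:), allocatable :: name
    end type item_t

contains

    subroutine append_item(items, new_item)
        type(item_t), allocatable, intent(inout) :: items(:)
        type(item_t), intent(in) :: new_item
        type(item_t), allocatable :: tmp(:)
        integer :: n

        ! Treat an unallocated array as empty.
        n = 0
        if (allocated(items)) n = size(items)

        ! Build the larger array in a temporary, then hand it over.
        allocate(tmp(n + 1))
        if (n > 0) tmp(1:n) = items
        tmp(n + 1) = new_item

        ! move_alloc deallocates the old items and transfers tmp to it.
        call move_alloc(tmp, items)
    end subroutine append_item

end module append_sketch

program append_demo
    use append_sketch, only: item_t, append_item
    implicit none
    type(item_t), allocatable :: items(:)

    call append_item(items, item_t("first"))
    call append_item(items, item_t("second"))
    print '(a,i0)', "number of items: ", size(items)
end program append_demo
```

The cost is one extra call per test registration instead of a one-line array constructor, which is why rewriting every test file this way is a larger change than pinning the compiler version.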

@Romendakil

> @jalvesz some debugging shows that another (!) gcc-15 bug is being hit when self-growing an array using the array initializer:
>
>         tests = [tests,new_unittest("$eye_det_rsp",test_rsp_eye_determinant)]
>         tests = [tests,new_unittest("$eye_det_multiple_rsp",test_rsp_eye_multiple)]
>         tests = [tests,new_unittest("$eye_det_rdp",test_rdp_eye_determinant)]
>         ...
> (lldb) bt
> * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
>   * frame #0: 0x000000018f741388 libsystem_kernel.dylib`__pthread_kill + 8
>     frame #1: 0x000000018f77a88c libsystem_pthread.dylib`pthread_kill + 296
>     frame #2: 0x000000018f683c60 libsystem_c.dylib`abort + 124
>     frame #3: 0x000000018f588174 libsystem_malloc.dylib`malloc_vreport + 892
>     frame #4: 0x000000018f58bc90 libsystem_malloc.dylib`malloc_report + 64
>     frame #5: 0x000000018f59021c libsystem_malloc.dylib`___BUG_IN_CLIENT_OF_LIBMALLOC_POINTER_BEING_FREED_WAS_NOT_ALLOCATED + 32
>     frame #6: 0x00000001000010a8 test_linalg_determinant`__test_linalg_determinant_MOD_test_matrix_determinant at test_linalg_determinant.f90:25:83
>     frame #7: 0x000000010015bc08 test_linalg_determinant`__testdrive_MOD_run_testsuite at testdrive.F90:328:27
>     frame #8: 0x00000001000072f4 test_linalg_determinant`MAIN__ at test_linalg_determinant.f90:533:68
>     frame #9: 0x000000010015f4c0 test_linalg_determinant`main at test_linalg_determinant.f90:518:40
>     frame #10: 0x000000018f3dab98 dyld`start + 6076
>
> as also addressed in fpm here https://github.com/krystophny/fpm/blame/928fae3160039d42c0ed07736bb8512d30028113/src/metapackage/fpm_meta_mpi.f90#L770
>
> (something like a = [a, b] will crash)

This also seems to be known to the team, and a bugfix is being tested right now:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120711
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120656

@perazz
Member

perazz commented Jun 28, 2025

@Romendakil great to see the team is working on it. However, people will stumble upon these issues on systems that ship gcc 15.1.0, and we don't want stdlib users to have to deal with compiler issues. So @jalvesz, I would advise putting workarounds in place across the library, and I endorse this PR.

@Romendakil

> @Romendakil great to see the team is working on it. However, people will stumble upon these issues on systems that ship gcc 15.1.0, and we don't want stdlib users to have to deal with compiler issues. So @jalvesz, I would advise putting workarounds in place across the library, and I endorse this PR.

Yes, but note that this is not in the official 15.1 release from gcc. So it is an inconsistent version that you get from somewhere.

@perazz
Member

perazz commented Jun 28, 2025

@Romendakil yep, from the CI it would look like MSYS2 ships that version. We are not testing against gcc-15 on other systems yet, though.
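
Since the question is which exact build a given system or CI runner ships, a small check using the standard `iso_fortran_env` inquiry functions may help when comparing environments. This is only a sketch; the program name is hypothetical and the string returned is compiler-dependent, so no particular output is assumed.

```fortran
! Sketch: print the compiler identification string and the options in use,
! so that CI logs record exactly which build ran the tests.
program report_compiler
    use, intrinsic :: iso_fortran_env, only: compiler_version, compiler_options
    implicit none

    print '(2a)', "compiler: ", compiler_version()
    print '(2a)', "options:  ", compiler_options()
end program report_compiler
```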

@perazz perazz mentioned this pull request Jun 28, 2025
3 participants