add MPI_ERR_ERRHANDLER error class #13342

Merged
merged 1 commit into open-mpi:main from address_issue_12082 on Jul 25, 2025

Conversation

hppritcha
Member

Related to #12082.

Also includes some fixes; for example, the MPI_Errors.3 man page wasn't getting installed.

devreal previously approved these changes Jul 23, 2025
edgargabriel previously approved these changes Jul 23, 2025
Member

@edgargabriel left a comment


LGTM

@hppritcha
Member Author

We may need to tweak the mpi4py test; it appears it doesn't handle being stuck between MPI 3.1 and MPI 4.1 compliance.

hppritcha dismissed stale reviews from edgargabriel and devreal via 29b5b70 on July 23, 2025 at 21:16
@hppritcha
Member Author

I think we'll switch to using an exclude file for the mpi4py pytest run. The problem that caused us to add -x test_doc has been fixed, so we can safely replace it with test_exceptions for now, but the --xfile approach may be better. I'll do that as a separate PR.

related to open-mpi#12082

also some fixes.  Like the MPI_Errors.3 wasn't getting installed.

mpi4py - disable testing MPI_ERR_ERRHANDLER.  We can re-enable this
once we actually support MPI 4.1.

Signed-off-by: Howard Pritchard <[email protected]>
hppritcha force-pushed the address_issue_12082 branch from 29b5b70 to a5aa4fd on July 23, 2025 at 21:46
hppritcha requested a review from edgargabriel on July 23, 2025 at 21:46
@hppritcha
Member Author

@edgargabriel could you recheck this?

hppritcha merged commit a4a7e25 into open-mpi:main on Jul 25, 2025
15 checks passed
@dalcinl
Contributor

dalcinl commented Jul 27, 2025

@hppritcha You should have notified me about this mpi4py failure!
I'll push a fix to mpi4py@master, then you can get rid of the exclude.

PS: This PR slightly broke the rules. MPI_VERSION/SUBVERSION and MPI_Get_version are at 3.1, but then the MPI runtime returns an error class that is not defined in MPI 3.1. I'm not complaining or asking for a change, just adding a disclaimer about why the mpi4py test suite started to fail.
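
For context, a minimal C sketch of the mismatch described above; it assumes an mpi.h that already defines MPI_ERR_ERRHANDLER (i.e., Open MPI main after this PR), and the printed text is illustrative only:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int version, subversion, len;
    char msg[MPI_MAX_ERROR_STRING];

    MPI_Init(&argc, &argv);

    /* Still reports 3.1 ... */
    MPI_Get_version(&version, &subversion);
    printf("MPI standard version: %d.%d\n", version, subversion);

    /* ... yet the header also defines MPI_ERR_ERRHANDLER, an error class
     * introduced in MPI 4.1, and the runtime can now hand it back. */
    MPI_Error_string(MPI_ERR_ERRHANDLER, msg, &len);
    printf("MPI_ERR_ERRHANDLER = %d (%s)\n", MPI_ERR_ERRHANDLER, msg);

    MPI_Finalize();
    return 0;
}
```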

@rhc54
Contributor

rhc54 commented Jul 27, 2025

Something to ponder: it is unlikely that OMPI will jump from one MPI level to another in a single step. More commonly, they will introduce support for the next level one step per PR, which means they will be in some in-between state with respect to the MPI standard. OMPI probably won't update MPI_VERSION until everything has been completed and they are fully compliant with the next version of the standard.

So if you are basing tests on the MPI version, you may continually hit these problems. Not sure of a solution, but thought it might be worth pondering how to handle these "hybrid" situations. Not sure how OMPI handles it in releases, really, since a release may be in some in-between state as well?

@hppritcha @jsquyres Any thoughts on how you've done that in the past that might help steer @dalcinl ?

@hppritcha
Member Author

@dalcinl I did not want to bother you with this because we are going to keep "breaking the rules" as we stage in 4.1 functionality without updating our VERSION file. This is probably going to go on for the next several months.

@bosilca
Member

bosilca commented Jul 28, 2025

MPI_Get_version returns a tuple from a limited set of values defined in Section 9.1 of the standard. While MPI_Get_library_version provides additional information about the library, interpreting this version requires users to track each subversion and understand the features it enables—a complex and error-prone process.

If we aim to give users detailed insights into new MPI features before reaching a major MPI milestone, we could introduce an MPIX_FEATURE_XXX_CHANGELOG_YYY define. Here, XXX would represent the official MPI version returned by MPI_Get_version, and YYY the item number in the corresponding changelog of the MPI standard. To manage the number of these definitions, we could remove all MPIX_FEATURE_XXX_CHANGELOG_YYY* entries once the released version is compliant with MPI XXX+1.
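
A rough sketch of how that proposal might look from an application's point of view; the macro name and changelog item number are hypothetical, and MPI_ERR_ARG is only assumed to be what implementations returned before the dedicated class existed:

```c
#include <mpi.h>

/* Hypothetical: mpi.h would advertise individual MPI 4.1 changelog items
 * that are already implemented, even while MPI_VERSION stays at 3.1.
 * The macro name and item number below are invented for illustration. */
#ifdef MPIX_FEATURE_41_CHANGELOG_12   /* e.g. "MPI_ERR_ERRHANDLER added" */
#define HAVE_MPI_ERR_ERRHANDLER 1
#else
#define HAVE_MPI_ERR_ERRHANDLER 0
#endif

/* Classify an error class as "invalid error handler" on either side of
 * the transition. */
int errclass_is_bad_errhandler(int errclass)
{
#if HAVE_MPI_ERR_ERRHANDLER
    return errclass == MPI_ERR_ERRHANDLER;
#else
    return errclass == MPI_ERR_ARG;
#endif
}
```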

@rhc54
Contributor

rhc54 commented Jul 28, 2025

Intriguing idea! What we did for PMIx was to modify the version string by adding another element to it. In our case, the standard only has two fields (e.g., 3.1), so we added a third to represent a step above that version. So when we release with something beyond (for example) 3.1, we mark it as 3.1.1. If a later release adds some more features but still doesn't fully meet the next standard version, then we increment it to 3.1.2.

You still have to read the release notes to see what features from the next version were included, but at least it indicates that we added something.

Any thoughts on how to deal with the automated testing problem? You can't just generate an error because a test returned an unexpected value that in fact is defined in a standard version above the one you expected. Like I said, moving from one standard version to the next is not a binary step, so you know that elements of the next version are slowly going to make their way into the repo and into releases over some period of time.

@dalcinl
Contributor

dalcinl commented Jul 28, 2025

Guys, let's not overreact and over-engineer the feature-addition process. I'm totally fine with dealing with regressions that are for the advancement of the MPI standard and the Open MPI project. This particular regression is a bit special: it is an API addition (a new error class constant) that triggered a change in runtime behavior (that error class being returned from a previously existing API).

My only gentle request is that you guys keep me in the loop; if you have an mpi4py-related problem, just "@"-mention me in any PR and I'll help sort things out from mpi4py's side. I would rather update mpi4py and its test suite ASAP than have you maintain test exclusion lists that may get outdated and eventually hide actual bugs.

@hppritcha
Member Author

We can't control ourselves.

@rhc54
Contributor

rhc54 commented Jul 28, 2025

I think the concern wasn't so much about how to deal with this specific issue, but rather:

(a) How do we communicate to users the actual state of MPI compliance of an OMPI release? It would be an anomaly for OMPI to release in a "purely" compliant state; it is almost always either "nearly" compliant with some MPI standard version, or "fully" compliant with a version but with a few additions from the next one. How does a user write an application that can deal with this situation? What can/should we provide to help them?

(b) How does someone like yourself handle this "hybrid" situation when writing automated CI or a tester? It's nice if one can test for non-compliant behavior or return values, but how do you differentiate an actual error from something that in fact complies with an MPI standard version beyond the base one supported by the OMPI version under test? Using it for CI means we have to avoid false error reports, but trying to add all that logic to the tester to track change-by-change movement seems unrealistic. Likewise, simply deleting problem tests creates, as you note, potential issues down the road.

I can see ways to deal with (a), and George's suggestion was in that realm - resolving (b) seems a tad tougher as you don't want to do something that can lead to not finding errors down the road, or is something people forget to update as things evolve. Minimal maintenance on all sides would be desirable.
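
For (b), one low-maintenance option, sketched here in C rather than in terms of mpi4py's actual test suite, is to have the check accept either the historical class (assumed here to be MPI_ERR_ARG) or the new MPI 4.1 class, so it passes on both sides of the transition; the obvious cost is that the test can no longer tell the two behaviors apart:

```c
#include <mpi.h>

/* Tolerant check: treat either the pre-4.1 class (assumed to be
 * MPI_ERR_ARG) or the MPI 4.1 class MPI_ERR_ERRHANDLER as the expected
 * outcome when an invalid error handler is supplied.  Compiling this
 * requires an mpi.h that already defines MPI_ERR_ERRHANDLER. */
int is_expected_errhandler_error(int errcode)
{
    int errclass;
    MPI_Error_class(errcode, &errclass);
    return errclass == MPI_ERR_ARG || errclass == MPI_ERR_ERRHANDLER;
}
```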

@dalcinl
Contributor

dalcinl commented Jul 30, 2025

@rhc54 Intel MPI has a way to handle this issue. Look for I_MPI_COMPATIBILITY here:
https://www.intel.com/content/www/us/en/docs/mpi-library/developer-reference-linux/2021-16/other-environment-variables.html
Maybe Open MPI could do something similar via an MCA parameter. However, I would say that such features would be more useful for users looking for guarantees against regressions when switching OMPI releases. In my case, I'm still interested in testing against the bleeding edge to catch problems early, and then occasional regressions may be simply unavoidable.
