add MPI_ERR_ERRHANDLER error class #13342

Merged
merged 1 commit into open-mpi:main from address_issue_12082 on Jul 25, 2025

Conversation

hppritcha
Member

Related to #12082.

Also includes some fixes; for example, the MPI_Errors.3 man page wasn't getting installed.

devreal previously approved these changes Jul 23, 2025
edgargabriel previously approved these changes Jul 23, 2025
Member

@edgargabriel left a comment


LGTM

@hppritcha
Member Author

We may need to tweak the mpi4py test; it appears it doesn't handle being stuck between MPI 3.1 and MPI 4.1 compliance.

hppritcha dismissed stale reviews from edgargabriel and devreal via 29b5b70 on July 23, 2025 at 21:16
@hppritcha
Member Author

I think we'll switch to using an exclude file for the mpi4py pytest run. The problem that caused us to add -x test_doc has been fixed, so we can safely replace it with test_exceptions for now, but the --xfile approach may be better. I'll do that as a separate PR.

related to open-mpi#12082

also some fixes.  Like the MPI_Errors.3 wasn't getting installed.

mpi4py - disable testing MPI_ERR_ERRHANDLER.  We can re-enable this
once we actually support MPI 4.1.

Signed-off-by: Howard Pritchard <[email protected]>
hppritcha force-pushed the address_issue_12082 branch from 29b5b70 to a5aa4fd on July 23, 2025 at 21:46
hppritcha requested a review from edgargabriel on July 23, 2025 at 21:46
@hppritcha
Member Author

@edgargabriel could you recheck this?

hppritcha merged commit a4a7e25 into open-mpi:main on Jul 25, 2025
15 checks passed
@dalcinl
Contributor

dalcinl commented Jul 27, 2025

@hppritcha You should have notified me about this mpi4py failure!
I'll push a fix to mpi4py@master, then you can get rid of the exclude.

PS: This PR slightly broke the rules. MPI_VERSION/SUBVERSION and MPI_Get_version are at 3.1, but then the MPI runtime returns an error class that is not defined in MPI 3.1. I'm not complaining or asking for a change, just adding a disclaimer about why the mpi4py test suite started to fail.
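
For context, a minimal C sketch of the mismatch described above; it assumes an mpi.h that already defines MPI_ERR_ERRHANDLER (i.e., Open MPI main after this PR), and the printed text is illustrative only:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int version, subversion, len;
    char msg[MPI_MAX_ERROR_STRING];

    MPI_Init(&argc, &argv);

    /* Still reports 3.1 ... */
    MPI_Get_version(&version, &subversion);
    printf("MPI standard version: %d.%d\n", version, subversion);

    /* ... yet the header also defines MPI_ERR_ERRHANDLER, an error class
     * introduced in MPI 4.1, and the runtime can now hand it back. */
    MPI_Error_string(MPI_ERR_ERRHANDLER, msg, &len);
    printf("MPI_ERR_ERRHANDLER = %d (%s)\n", MPI_ERR_ERRHANDLER, msg);

    MPI_Finalize();
    return 0;
}
```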

@rhc54
Contributor

rhc54 commented Jul 27, 2025

Something to ponder: it is unlikely that OMPI will jump from one MPI level to another in a single step. More commonly, they will introduce support for the next level one step per PR, which means they will be in some in-between state with respect to the MPI standard. OMPI probably won't update MPI_VERSION until everything has been completed and they are fully compliant with the next version of the standard.

So if you are basing tests on the MPI version, you may continually hit these problems. Not sure of a solution, but thought it might be worth pondering how to handle these "hybrid" situations. Not sure how OMPI handles it in releases, really, since a release may be in some in-between state as well?

@hppritcha @jsquyres Any thoughts on how you've done that in the past that might help steer @dalcinl ?

@hppritcha
Member Author

@dalcinl I did not want to bother you with this because we are going to keep "breaking the rules" as we stage in 4.1 functionality without updating our VERSION file. This is probably going to go on for the next several months.

@bosilca
Member

bosilca commented Jul 28, 2025

MPI_Get_version returns a tuple from a limited set of values defined in Section 9.1 of the standard. While MPI_Get_library_version provides additional information about the library, interpreting this version requires users to track each subversion and understand the features it enables—a complex and error-prone process.

If we aim to give users detailed insights into new MPI features before reaching a major MPI milestone, we could introduce an MPIX_FEATURE_XXX_CHANGELOG_YYY define. Here, XXX would represent the official MPI version returned by MPI_Get_version, and YYY the item number in the corresponding changelog of the MPI standard. To manage the number of these definitions, we could remove all MPIX_FEATURE_XXX_CHANGELOG_YYY* entries once the released version is compliant with MPI XXX+1.
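
A rough sketch of how that proposal might look from an application's point of view; the macro name and changelog item number are hypothetical, and MPI_ERR_ARG is only assumed to be what implementations returned before the dedicated class existed:

```c
#include <mpi.h>

/* Hypothetical: mpi.h would advertise individual MPI 4.1 changelog items
 * that are already implemented, even while MPI_VERSION stays at 3.1.
 * The macro name and item number below are invented for illustration. */
#ifdef MPIX_FEATURE_41_CHANGELOG_12   /* e.g. "MPI_ERR_ERRHANDLER added" */
#define HAVE_MPI_ERR_ERRHANDLER 1
#else
#define HAVE_MPI_ERR_ERRHANDLER 0
#endif

/* Classify an error class as "invalid error handler" on either side of
 * the transition. */
int errclass_is_bad_errhandler(int errclass)
{
#if HAVE_MPI_ERR_ERRHANDLER
    return errclass == MPI_ERR_ERRHANDLER;
#else
    return errclass == MPI_ERR_ARG;
#endif
}
```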

@rhc54
Contributor

rhc54 commented Jul 28, 2025

Intriguing idea! What we did for PMIx was to modify the version string by adding another element to it. In our case, the standard only has two fields (e.g., 3.1), so we added a third to represent a step above that version. So when we release with something beyond (for example) 3.1, we mark it as 3.1.1. If a later release adds some more features but still doesn't fully meet the next standard version, then we increment it to 3.1.2.

You still have to read the release notes to see what features from the next version were included, but at least it indicates that we added something.

Any thoughts on how to deal with the automated testing problem? You can't just generate an error because a test returned an unexpected value that in fact is defined in a standard version above the one you expected. Like I said, moving from one standard version to the next is not a binary step, so you know that elements of the next version are slowly going to make their way into the repo and into releases over some period of time.

@dalcinl
Contributor

dalcinl commented Jul 28, 2025

Guys, let's not overreact and over-engineer the feature-addition process. I'm totally fine with dealing with regressions that are for the advancement of the MPI standard and the Open MPI project. This particular regression is a bit special: it is an API addition (a new error class constant) that triggered a change in runtime behavior (that error class being returned from a previously existing API).

My only gentle request is that you guys keep me in the loop; if you have an mpi4py-related problem, just "@"-mention me in any PR and I'll help sort things out from mpi4py's side. I would rather update mpi4py and its test suite ASAP than have you maintain test exclusion lists that may get outdated and eventually hide actual bugs.

@hppritcha
Member Author

We can't control ourselves.

@rhc54
Contributor

rhc54 commented Jul 28, 2025

I think the concern wasn't so much about how to deal with this specific issue, but rather:

(a) How do we communicate to users the actual state of MPI compliance of an OMPI release? It would be an anomaly for OMPI to release in a "purely" compliant state; it is almost always either "nearly" compliant with some MPI standard version, or "fully" compliant with a version but with a few additions from the next one. How does a user write an application that can deal with this situation? What can/should we provide to help them?

(b) How does someone like yourself handle this "hybrid" situation when writing automated CI or a tester? It's nice if one can test for non-compliant behavior or return values, but how do you differentiate an actual error from something that in fact complies with an MPI standard version beyond the base one supported by the OMPI version under test? Using it for CI means we have to avoid false error reports, but trying to add all that logic to the tester to track change-by-change movement seems unrealistic. Likewise, simply deleting problem tests creates, as you note, potential issues down the road.

I can see ways to deal with (a), and George's suggestion was in that realm - resolving (b) seems a tad tougher as you don't want to do something that can lead to not finding errors down the road, or is something people forget to update as things evolve. Minimal maintenance on all sides would be desirable.
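
For (b), one low-maintenance option, sketched here in C rather than in terms of mpi4py's actual test suite, is to have the check accept either the historical class (assumed here to be MPI_ERR_ARG) or the new MPI 4.1 class, so it passes on both sides of the transition; the obvious cost is that the test can no longer tell the two behaviors apart:

```c
#include <mpi.h>

/* Tolerant check: treat either the pre-4.1 class (assumed to be
 * MPI_ERR_ARG) or the MPI 4.1 class MPI_ERR_ERRHANDLER as the expected
 * outcome when an invalid error handler is supplied.  Compiling this
 * requires an mpi.h that already defines MPI_ERR_ERRHANDLER. */
int is_expected_errhandler_error(int errcode)
{
    int errclass;
    MPI_Error_class(errcode, &errclass);
    return errclass == MPI_ERR_ARG || errclass == MPI_ERR_ERRHANDLER;
}
```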

@dalcinl
Contributor

dalcinl commented Jul 30, 2025

@rhc54 Intel MPI has a way to handle this issue. Look for I_MPI_COMPATIBILITY here:
https://www.intel.com/content/www/us/en/docs/mpi-library/developer-reference-linux/2021-16/other-environment-variables.html
Maybe Open MPI could do something similar via an MCA parameter. However, I would say that such features would be more useful for users looking for guarantees against regressions when switching OMPI releases. In my case, I'm still interested in testing against the bleeding edge to catch problems early, and then occasional regressions may be simply unavoidable.
