-
Notifications
You must be signed in to change notification settings - Fork 56
feat(RHOAIENG-29330):Deny RayCluster creation with Ray Version mismatches #881
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: ray-jobs-feature
Are you sure you want to change the base?
feat(RHOAIENG-29330):Deny RayCluster creation with Ray Version mismatches #881
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## ray-jobs-feature #881 +/- ##
====================================================
+ Coverage 93.65% 93.76% +0.10%
====================================================
Files 20 21 +1
Lines 1717 1747 +30
====================================================
+ Hits 1608 1638 +30
Misses 109 109 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
49e7d69
to
3965a7f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only small changes but this is great work @LilyLinh well done :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say we can get rid of this file. It's good practice to test as much as possible, but if we hardcode Ray version like on line 26 assert RAY_VERSION == "2.47.1"
this will fail the moment we bump the ray version so I think we don't need to include this test file. But keep up the good practice of testing everything!
@@ -0,0 +1,188 @@ | |||
# Copyright 2024 IBM, Red Hat |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Copyright 2024 IBM, Red Hat | |
# Copyright 2022-2025 IBM, Red Hat |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: (nit means nit pick, and is just a suggestion so you don't need to change it etc) - i don't know how lisences work or if we need the years to be up to date so not sure if this matters at all, just pointing it out!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. It should be update to "2022-2025" as stated here (https://www.apache.org/legal/src-headers.html#3part) and the codeflare project launched on 2022? It is said that we can do pre-commit hook on yml file to auto update the current year https://github.com/nanoufo/copyright-hook
@@ -289,3 +291,15 @@ def check_type(value, expected_type): | |||
return isinstance(value, expected_type) | |||
|
|||
return check_type(value, expected_type) | |||
|
|||
def _validate_ray_version_compatibility(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will need to be moved to the ray jobs work instead of ray cluster. You can just copy and paste this as the function itself looks good just in the wrong place! Btw you can use this https://github.com/kryanbeane/codeflare-sdk/blob/5b6c937be4f2b177ed609ee040b8a67ece093523/demo-notebooks/guided-demos/5_rayjob_lifecycled_cluster.ipynb to for easier testing once you add it to the Ray job work just ensure you rebuild the .whl
file and reinstall
@@ -247,3 +249,114 @@ def test_ray_usage_stats_enabled(mocker): | |||
def test_cleanup(): | |||
os.remove(f"{aw_dir}test-all-params.yaml") | |||
os.remove(f"{aw_dir}aw-all-params.yaml") | |||
|
|||
|
|||
def test_ray_version_validation_compatible_image(mocker): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again this needs to be moved. the tests themselves look good, just in the wrong place and might need slight refactoring to make it more rayjob focused. (like some of the mocker.patch lines will need changing etc) MSG if you want a hand!
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
dd45d04
to
dc47dc9
Compare
Issue link
Jira
What changes have been made
This PR implements Ray version validation to prevent compatibility issues between the CodeFlare SDK and user-specified runtime images.
Solution: Added automatic Ray version detection and validation during cluster configuration:
extract_ray_version_from_image()
parses Ray versions from various image name formats (e.g.,ray:2.47.1
,quay.io/modh/ray:2.47.1-py311-cu121
)validate_ray_version_compatibility()
compares runtime image Ray version against SDK version (2.47.1)ClusterConfiguration.__post_init__()
Verification steps
Build the SDK, test its functionality in a notebook.
image="quay.io/modh/ray:2.47.1-py311-cu121"
, the cluster is created and up successfully; when using a different Ray version ( e.g.,image="quay.io/modh/ray:2.46.1-py311-cu121"
), it should show an error message.Checks