feat(RHOAIENG-26480): Run RayJobs against existing RayClusters #867
Signed-off-by: Pat O'Connor <[email protected]>
43f18a9 to 7abeca4
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## ray-jobs-feature #867 +/- ##
====================================================
+ Coverage 92.47% 92.64% +0.16%
====================================================
Files 24 26 +2
Lines 1396 1427 +31
====================================================
+ Hits 1291 1322 +31
Misses 105 105
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: kryanbeane
006c55b into project-codeflare:ray-jobs-feature
These changes look good! I've reviewed the code and left some comments inline.
A note on the verification: the red output from the job submission could be read as an error, although everything seems to have run fine. The job completed successfully even though I submitted it before the cluster worker pods were ready.
# Initialize the KubeRay job API client
self._api = RayjobApi()

logger.info(f"Initialized RayJob: {self.name} in namespace: {self.namespace}")
This log probably needs an update or to be moved. The RayJob doesn't get initialized at this point; it gets submitted/applied.
Yeah, I just have that for initial clarity, since we've initialised the object but not submitted it. I can probably remove the logger calls for now, until we decide on an output format?
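One way to settle the wording question in this thread is to log object creation and submission separately. The following is only a minimal sketch under stated assumptions: RayJobSketch and the apply_fn callable are hypothetical stand-ins invented for illustration, not the SDK's RayJob class or the KubeRay client API.

```python
import logging

logger = logging.getLogger("rayjob_sketch")


class RayJobSketch:
    """Hypothetical stand-in for the SDK's RayJob class (illustration only)."""

    def __init__(self, name: str, namespace: str):
        self.name = name
        self.namespace = namespace
        # Only the Python object exists at this point; nothing has been sent
        # to the cluster, so the message says "created" and stays at DEBUG.
        logger.debug("Created RayJob object: %s in namespace: %s", name, namespace)

    def submit(self, apply_fn) -> bool:
        """Submit the job.

        apply_fn is a hypothetical callable taking (name, namespace) and
        returning a truthy value on success; it stands in for the real client.
        """
        ok = bool(apply_fn(self.name, self.namespace))
        if ok:
            # The user-facing message is emitted only once the job is applied.
            logger.info("Submitted RayJob: %s in namespace: %s", self.name, self.namespace)
        else:
            logger.error("Failed to submit RayJob: %s in namespace: %s", self.name, self.namespace)
        return ok
```

With this split, the INFO-level message matches what actually happened on the cluster, and the creation message can be dropped later without touching the submission path.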
@@ -38,6 +39,7 @@ spec:
rayStartParams:
  block: 'true'
  dashboard-host: 0.0.0.0
  dashboard-port: '8265'
I think this one is being removed elsewhere?
It's added in the new RayJob logic, so it has to be here for the unit test assertions, I think!
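To illustrate why the expected YAML has to carry the new key, here is a rough sketch of the kind of equality assertion involved. The helper build_ray_start_params is hypothetical and exists only for this example; the keys and values are the ones shown in the diff above, and the real SDK code and tests may be structured differently.

```python
def build_ray_start_params(dashboard_port: int = 8265) -> dict:
    """Hypothetical helper that mirrors the rayStartParams shown in the diff."""
    return {
        "block": "true",
        "dashboard-host": "0.0.0.0",
        # Stored as a string, matching the quoted value in the YAML fixture.
        "dashboard-port": str(dashboard_port),
    }


# Expected values as they appear in the YAML fixture from the diff.
expected = {
    "block": "true",
    "dashboard-host": "0.0.0.0",
    "dashboard-port": "8265",
}

# A full-dict comparison fails as soon as either side is missing a key,
# so the fixture and the generated params have to change together.
assert build_ray_start_params() == expected
```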
Issue link
Jira
What changes have been made
Support added to the CodeFlare SDK to leverage the upstream KubeRay RayJob Python Client
Verification steps
whl file for the CodeFlare SDK based on this branch.
Cell 1
NOTE: Ensure that you restart the notebook kernel after installing the whl file.
Cell 2
Cell 3
Checks