From 138d7e05342a51300564ac03f355feb610cd94b5 Mon Sep 17 00:00:00 2001 From: Pat O'Connor Date: Fri, 1 Aug 2025 14:56:25 +0100 Subject: [PATCH] task(RHOAIENG-26481): Existing cluster RayJob demo notebook Signed-off-by: Pat O'Connor --- .../4_rayjob_existing_cluster.ipynb | 212 ++++++++++++++++++ .../4_rayjob_existing_cluster.ipynb | 212 ++++++++++++++++++ 2 files changed, 424 insertions(+) create mode 100644 demo-notebooks/guided-demos/4_rayjob_existing_cluster.ipynb create mode 100644 demo-notebooks/guided-demos/preview_nbs/4_rayjob_existing_cluster.ipynb diff --git a/demo-notebooks/guided-demos/4_rayjob_existing_cluster.ipynb b/demo-notebooks/guided-demos/4_rayjob_existing_cluster.ipynb new file mode 100644 index 00000000..7f057873 --- /dev/null +++ b/demo-notebooks/guided-demos/4_rayjob_existing_cluster.ipynb @@ -0,0 +1,212 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "9259e514", + "metadata": {}, + "source": [ + "# Submitting RayJobs against an existing RayCluster\n", + "\n", + "In this notebook, we will go through the basics of using the SDK to:\n", + " * Spin up a Ray cluster with our desired resources\n", + " * Verify the status of this cluster\n", + " * Submit a RayJob against that cluster\n", + " * Verify the status of this job" + ] + }, + { + "cell_type": "markdown", + "id": "18136ea7", + "metadata": {}, + "source": [ + "## Creating the RayCluster" + ] + }, + { + "cell_type": "markdown", + "id": "a1c2545d", + "metadata": {}, + "source": [ + "First, we'll need to import the relevant CodeFlare SDK packages. You can do this by executing the below cell." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "51e18292", + "metadata": {}, + "outputs": [], + "source": [ + "from codeflare_sdk import Cluster, ClusterConfiguration, RayJob, TokenAuthentication" + ] + }, + { + "cell_type": "markdown", + "id": "649c5911", + "metadata": {}, + "source": [ + "Execute the below cell to authenticate the notebook via OpenShift." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "dc364888", + "metadata": {}, + "outputs": [], + "source": [ + "auth = TokenAuthentication(\n", + " token = \"XXXXX\",\n", + " server = \"XXXXX\",\n", + " skip_tls=False\n", + ")\n", + "auth.login()" + ] + }, + { + "cell_type": "markdown", + "id": "5581eca9", + "metadata": {}, + "source": [ + "Next, we'll need to initialize our RayCluster and apply it. You can do this by executing the below cell." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3094c60a", + "metadata": {}, + "outputs": [], + "source": [ + "cluster = Cluster(ClusterConfiguration(\n", + " name='rayjob-cluster',\n", + " head_extended_resource_requests={'nvidia.com/gpu':0},\n", + " worker_extended_resource_requests={'nvidia.com/gpu':0},\n", + " num_workers=2,\n", + " worker_cpu_requests=1,\n", + " worker_cpu_limits=1,\n", + " worker_memory_requests=4,\n", + " worker_memory_limits=4,\n", + "\n", + "))\n", + "\n", + "cluster.apply()" + ] + }, + { + "cell_type": "markdown", + "id": "f3612de2", + "metadata": {}, + "source": [ + "We can check the status of our cluster by executing the below cell. If it's not up immediately, run the cell a few more times until you see that it's in a 'running' state." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "96d92f93", + "metadata": {}, + "outputs": [], + "source": [ + "cluster.status()" + ] + }, + { + "cell_type": "markdown", + "id": "a0e2a650", + "metadata": {}, + "source": [ + "## Creating and Submitting the RayJob" + ] + }, + { + "cell_type": "markdown", + "id": "4cf03419", + "metadata": {}, + "source": [ + "Now we can create the RayJob that we want to submit against the running cluster. The process is quite similar to how we initialize and apply the cluster. 
\n", + "In this context, we need to use the `cluster_name` parameter to point the job at our existing cluster.\n", + "\n", + "For the sake of demonstration, the job we'll submit via the `entrypoint` is a single Python command. In practice, this would typically point to a Python training script.\n", + "\n", + "We'll then call the `submit()` function to run the job against our cluster.\n", + "\n", + "You can run the below cell to achieve this." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "94edca70", + "metadata": {}, + "outputs": [], + "source": [ + "rayjob = RayJob(\n", + " job_name=\"sdk-test-job\",\n", + " cluster_name=\"rayjob-cluster\",\n", + " namespace=\"rhods-notebooks\",\n", + " entrypoint=\"python -c 'import time; time.sleep(20)'\",\n", + ")\n", + "\n", + "rayjob.submit()" + ] + }, + { + "cell_type": "markdown", + "id": "30a8899a", + "metadata": {}, + "source": [ + "We can observe the status of the RayJob in the same way as the RayCluster by invoking the `status()` function via the below cell." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3283b09c", + "metadata": {}, + "outputs": [], + "source": [ + "rayjob.status()" + ] + }, + { + "cell_type": "markdown", + "id": "9f3c9c9f", + "metadata": {}, + "source": [ + "This function will output different tables based on the RayJob's current status. You can re-run the cell multiple times to observe the changes as you need to. Once you've observed that the job has been completed, you can shut down the cluster we created earlier by executing the below cell." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5b11e379", + "metadata": {}, + "outputs": [], + "source": [ + "cluster.down()" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.11" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/demo-notebooks/guided-demos/preview_nbs/4_rayjob_existing_cluster.ipynb b/demo-notebooks/guided-demos/preview_nbs/4_rayjob_existing_cluster.ipynb new file mode 100644 index 00000000..7f057873 --- /dev/null +++ b/demo-notebooks/guided-demos/preview_nbs/4_rayjob_existing_cluster.ipynb @@ -0,0 +1,212 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "9259e514", + "metadata": {}, + "source": [ + "# Submitting RayJobs against an existing RayCluster\n", + "\n", + "In this notebook, we will go through the basics of using the SDK to:\n", + " * Spin up a Ray cluster with our desired resources\n", + " * Verify the status of this cluster\n", + " * Submit a RayJob against that cluster\n", + " * Verify the status of this job" + ] + }, + { + "cell_type": "markdown", + "id": "18136ea7", + "metadata": {}, + "source": [ + "## Creating the RayCluster" + ] + }, + { + "cell_type": "markdown", + "id": "a1c2545d", + "metadata": {}, + "source": [ + "First, we'll need to import the relevant CodeFlare SDK packages. You can do this by executing the below cell." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "51e18292", + "metadata": {}, + "outputs": [], + "source": [ + "from codeflare_sdk import Cluster, ClusterConfiguration, RayJob, TokenAuthentication" + ] + }, + { + "cell_type": "markdown", + "id": "649c5911", + "metadata": {}, + "source": [ + "Execute the below cell to authenticate the notebook via OpenShift." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "dc364888", + "metadata": {}, + "outputs": [], + "source": [ + "auth = TokenAuthentication(\n", + " token = \"XXXXX\",\n", + " server = \"XXXXX\",\n", + " skip_tls=False\n", + ")\n", + "auth.login()" + ] + }, + { + "cell_type": "markdown", + "id": "5581eca9", + "metadata": {}, + "source": [ + "Next, we'll need to initialize our RayCluster and apply it. You can do this by executing the below cell." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3094c60a", + "metadata": {}, + "outputs": [], + "source": [ + "cluster = Cluster(ClusterConfiguration(\n", + " name='rayjob-cluster',\n", + " head_extended_resource_requests={'nvidia.com/gpu':0},\n", + " worker_extended_resource_requests={'nvidia.com/gpu':0},\n", + " num_workers=2,\n", + " worker_cpu_requests=1,\n", + " worker_cpu_limits=1,\n", + " worker_memory_requests=4,\n", + " worker_memory_limits=4,\n", + "\n", + "))\n", + "\n", + "cluster.apply()" + ] + }, + { + "cell_type": "markdown", + "id": "f3612de2", + "metadata": {}, + "source": [ + "We can check the status of our cluster by executing the below cell. If it's not up immediately, run the cell a few more times until you see that it's in a 'running' state." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "96d92f93", + "metadata": {}, + "outputs": [], + "source": [ + "cluster.status()" + ] + }, + { + "cell_type": "markdown", + "id": "a0e2a650", + "metadata": {}, + "source": [ + "## Creating and Submitting the RayJob" + ] + }, + { + "cell_type": "markdown", + "id": "4cf03419", + "metadata": {}, + "source": [ + "Now we can create the RayJob that we want to submit against the running cluster. The process is quite similar to how we initialize and apply the cluster. \n", + "In this context, we need to use the `cluster_name` parameter to point the job at our existing cluster.\n", + "\n", + "For the sake of demonstration, the job we'll submit via the `entrypoint` is a single Python command. In practice, this would typically point to a Python training script.\n", + "\n", + "We'll then call the `submit()` function to run the job against our cluster.\n", + "\n", + "You can run the below cell to achieve this." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "94edca70", + "metadata": {}, + "outputs": [], + "source": [ + "rayjob = RayJob(\n", + " job_name=\"sdk-test-job\",\n", + " cluster_name=\"rayjob-cluster\",\n", + " namespace=\"rhods-notebooks\",\n", + " entrypoint=\"python -c 'import time; time.sleep(20)'\",\n", + ")\n", + "\n", + "rayjob.submit()" + ] + }, + { + "cell_type": "markdown", + "id": "30a8899a", + "metadata": {}, + "source": [ + "We can observe the status of the RayJob in the same way as the RayCluster by invoking the `status()` function via the below cell." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3283b09c", + "metadata": {}, + "outputs": [], + "source": [ + "rayjob.status()" + ] + }, + { + "cell_type": "markdown", + "id": "9f3c9c9f", + "metadata": {}, + "source": [ + "This function will output different tables based on the RayJob's current status. You can re-run the cell multiple times to observe the changes as you need to. 
Once you've observed that the job has been completed, you can shut down the cluster we created earlier by executing the below cell." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5b11e379", + "metadata": {}, + "outputs": [], + "source": [ + "cluster.down()" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.11" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}
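
Note on the notebooks above: both ask the reader to re-run a status cell by hand until the cluster is running or the job has finished. The following is only a minimal sketch of how that waiting could be automated. `wait_for_state`, `get_state`, and the state strings are hypothetical names introduced here for illustration and are not part of this patch or of the CodeFlare SDK; a real notebook would need to wrap whatever its status call actually returns.

```python
import time
from typing import Callable


def wait_for_state(
    get_state: Callable[[], str],
    target: str,
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
) -> bool:
    """Poll get_state() until it returns target or the timeout elapses.

    get_state is a hypothetical callable supplied by the caller; it stands
    in for whatever status check the notebook would otherwise re-run by hand.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if get_state() == target:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Stand-in for a real status check: reports "pending" twice, then "running".
_fake_states = iter(["pending", "pending", "running"])


def fake_cluster_state() -> str:
    return next(_fake_states)


reached = wait_for_state(
    fake_cluster_state, "running", timeout_s=5.0, interval_s=0.01
)
print(reached)
```

The helper takes a callable rather than an SDK object so that it makes no assumption about the SDK's return types, and the explicit timeout keeps a stuck cluster or job from blocking the notebook indefinitely.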