Red Hat OpenShift AI (RHOAI) builds on the capabilities of Red Hat OpenShift to provide a single, consistent, enterprise-ready hybrid AI and MLOps platform. It provides tools across the full lifecycle of AI/ML experiments and models including training, serving, monitoring, and managing AI/ML models and AI-enabled applications. This is my personal repository to test and play with some of its most important features.
RHOAI is a product under continuous improvement, so this repo will become outdated at some point. I recommend that you refer to the official documentation for the latest features, or try the official training courses.
Red Hat OpenShift AI (RHOAI) is a platform for data scientists, AI practitioners, developers, machine learning engineers, and operations teams to prototype, build, deploy, and monitor AI models. This is a broad audience with different training needs. For that reason, there are several courses that will help you understand RHOAI from all angles:
-
AI262 - Introduction to Red Hat OpenShift AI: About configuring Data Science Projects and Jupyter Notebooks.
-
AI263 - Red Hat OpenShift AI Administration: About installing RHOAI, configuring users and permissions and creating Custom Notebook Images.
-
AI264 - Creating Machine Learning Models with Red Hat OpenShift AI: About training models and enhancing the model training.
-
AI265 - Deploying Machine Learning Models with Red Hat OpenShift AI: About serving models on RHOAI.
-
AI266 - Automating AI/ML Workflows with Red Hat OpenShift AI: About creating Data Science Pipelines, and Elyra and Kubeflow Pipelines.
-
AI267 - Developing and Deploying AI/ML Applications on Red Hat OpenShift AI: All the previous courses combined.
The following diagram depicts the general architecture of a RHOAI deployment, including the most important components:
-
codeflare: CodeFlare is an IBM software stack for developing and scaling machine-learning and Python workloads. It depends on the Ray component.
-
dashboard: Provides the RHOAI dashboard.
-
datasciencepipelines: This enables you to build portable machine learning workflows. Requires the OpenShift Pipelines Operator to be present before enabling the data science pipelines.
-
kserve: RHOAI uses KServe to serve large language models that can scale based on demand. Requires the OpenShift Serverless and OpenShift Service Mesh operators to be installed before enabling the component. Cannot be enabled at the same time as ModelMeshServing.
-
kueue: Kueue component configuration. It is not yet in Technology Preview.
-
modelmeshserving: KServe also offers a component for general-purpose model serving, called ModelMesh Serving. Enable this component to serve small and medium-sized models. Cannot be enabled at the same time as KServe.
-
ray: Component to run the data science code in a distributed manner.
-
workbenches: Workbenches are containerized and isolated working environments for data scientists to examine data and work with data models. Data scientists can create workbenches from an existing notebook container image to access its resources and properties. Workbenches are associated with container storage to prevent data loss when the workbench container is restarted or deleted.
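The dependency and exclusivity rules stated in the list above can be checked mechanically. The following is a minimal sketch, not part of this repository: `validate_components` is a hypothetical helper, and it assumes each component state is given as the `managementState` value (`Managed` or `Removed`) of a DataScienceCluster resource.

```shell
# validate_components KSERVE MODELMESH CODEFLARE RAY   (hypothetical helper)
# Each argument is a managementState value: "Managed" or "Removed".
# Returns 1 if the combination breaks one of the rules listed above.
validate_components() {
  kserve=$1
  modelmesh=$2
  codeflare=$3
  ray=$4
  rc=0
  if [ "$kserve" = "Managed" ] && [ "$modelmesh" = "Managed" ]; then
    echo "kserve and modelmeshserving cannot both be Managed" >&2
    rc=1
  fi
  if [ "$codeflare" = "Managed" ] && [ "$ray" != "Managed" ]; then
    echo "codeflare needs the ray component" >&2
    rc=1
  fi
  return "$rc"
}

# Usage sketch (assumes a DataScienceCluster named "default-dsc"; adjust to yours):
#   state() { oc get datasciencecluster default-dsc -o jsonpath="{.spec.components.$1.managementState}"; }
#   validate_components "$(state kserve)" "$(state modelmeshserving)" "$(state codeflare)" "$(state ray)"
```

The function only encodes the constraints as this README describes them; check the official documentation for the rules that apply to your RHOAI version.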
Installing RHOAI is not as simple as installing and configuring other operators on OpenShift. This product provides integration with hardware like NVIDIA and Intel GPUs, automation of ML workflows and AI training, and deployment of LLMs. For that reason, I’ve created an auto-install.sh script that will do everything for you:
-
If the installation is IPI on AWS, it will create MachineSets for nodes with NVIDIA GPUs (currently, g5.4xlarge).
-
Install all the operators that RHOAI depends on:
-
Service Mesh and Serverless, to enable KServe and the Single-Model Serving platform.
-
Node Feature Discovery and NVIDIA GPU Operator, to discover and configure nodes with GPUs.
-
Authorino, to enable token authorization for models deployed with RHOAI.
-
Install and configure OpenShift Data Foundation (ODF) in Multicloud Object Gateway (MCG) mode. This is a lightweight alternative that allows us to use the AWS S3 object storage the same way that we will then use Object storage on Baremetal using ODF.
-
Install the actual RHOAI operator and configure the installation with some defaults, enabling NVIDIA acceleration and Single-Model Serving.
-
Deploy a new Data Science Project called RHOAI Playground, enabling pipelines and deploying a basic Notebook for testing.
Some of the components deployed in this repo are bound to a specific version of OpenShift. If you want to deploy RHOAI on an older version (for example 4.16, which is an Extended Update Support release), you have to make the following modifications:
-
Change the image for the Node Feature Discovery container to the one for 4.16:
-
In ./rhoai-dependencies/operator-nfd/nodefeaturediscovery-nfd-instance.yaml, the .spec.operand.image field should have the value registry.redhat.io/openshift4/ose-node-feature-discovery-rhel9:v4.16.
-
Change the channel of ODF:
-
In ./ocp-odf/odf-operator/sub-odf-operator.yaml, the value of the .spec.channel field should be stable-4.16.
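Both modifications can be scripted. The following is a minimal sketch, assuming the two files contain a `channel: stable-4.x` line and an `ose-node-feature-discovery-rhel9:v4.x` image tag respectively (the exact file contents are an assumption); `retarget_version` is a hypothetical helper name, not something shipped in this repo.

```shell
# retarget_version FILE VERSION   (hypothetical helper)
# Rewrites "channel: stable-4.x" and "...-rhel9:v4.x" occurrences in FILE to VERSION.
retarget_version() {
  file=$1
  version=$2
  tmp=$(mktemp) || return 1
  sed -E \
    -e "s/(channel:[[:space:]]*stable-)[0-9]+\.[0-9]+/\1${version}/" \
    -e "s/(ose-node-feature-discovery-rhel9:v)[0-9]+\.[0-9]+/\1${version}/" \
    "$file" > "$tmp" &&
    cat "$tmp" > "$file"   # overwrite in place, keeping the file's permissions
  rc=$?
  rm -f "$tmp"
  return "$rc"
}

# Usage sketch:
#   retarget_version ./ocp-odf/odf-operator/sub-odf-operator.yaml 4.16
#   retarget_version ./rhoai-dependencies/operator-nfd/nodefeaturediscovery-nfd-instance.yaml 4.16
```

Review the resulting diff with `git diff` before committing, since the patterns only match the layouts assumed above.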
💡 Tip: The script contains many tasks divided into clear blocks with comments. Use the environment variables or comment out the blocks that you are not interested in.
To automate it all, the script relies on OpenShift GitOps (ArgoCD), so you will need to have it installed before executing the following script. Check out my automated installation in the alvarolop/ocp-gitops-playground GitHub repository.
Now, log in to the cluster and just execute the script:
./auto-install.sh
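The two prerequisites mentioned above (an active cluster login and OpenShift GitOps) can be verified before launching the script. This is a minimal sketch: `preflight` is a hypothetical helper, and it assumes that the presence of the `applications.argoproj.io` CRD is a good enough signal that OpenShift GitOps is installed.

```shell
# preflight   (hypothetical helper, not part of auto-install.sh)
# Fails early when the session is not logged in or Argo CD is missing.
preflight() {
  if ! oc whoami >/dev/null 2>&1; then
    echo "Not logged in to a cluster: run 'oc login' first" >&2
    return 1
  fi
  if ! oc get crd applications.argoproj.io >/dev/null 2>&1; then
    echo "Argo CD Application CRD not found: install OpenShift GitOps first" >&2
    return 1
  fi
}

# Usage sketch:
#   preflight && ./auto-install.sh
```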
Most of the activities related to RHOAI require GPU acceleration. For that purpose, we add NVIDIA GPU nodes during the installation process. This section collects some information that might be useful for you.
In this automation, we are currently using the AWS g5.2xlarge instance which, according to the documentation:
Amazon EC2 G5 instances are designed to accelerate graphics-intensive applications and machine learning inference. They can also be used to train simple to moderately complex machine learning models.
The output of the following command will only be visible once you have applied the ArgoCD Application and the Node Feature Discovery operator has scanned the OpenShift nodes:
oc describe node | grep -E 'Roles|pci'
Roles: control-plane,master
Roles: worker
feature.node.kubernetes.io/pci-1d0f.present=true
Roles: gpu-worker,worker
feature.node.kubernetes.io/pci-10de.present=true
feature.node.kubernetes.io/pci-1d0f.present=true
Roles: control-plane,master
Roles: control-plane,master
pci-10de is the PCI vendor ID that is assigned to NVIDIA.
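To pick out only the nodes that expose an NVIDIA device, the filtered output can be post-processed. The following is a minimal sketch: `nvidia_nodes` is a hypothetical helper that reads the `Roles`/`pci` lines shown above from standard input and prints the roles of every node carrying the `pci-10de.present=true` label.

```shell
# nvidia_nodes   (hypothetical helper)
# Reads "Roles:" and "pci-...present=true" lines from stdin and prints the
# roles of each node that has the NVIDIA PCI vendor label (10de).
nvidia_nodes() {
  awk '
    $1 == "Roles:" { role = $2 }                          # remember the current node
    index($0, "pci-10de.present=true") { print role }     # NVIDIA device found on it
  '
}

# Usage sketch:
#   oc describe node | grep -E 'Roles|pci' | nvidia_nodes
```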
The NVIDIA GPU Operator automates the management of all NVIDIA software components needed to provision GPUs. These components include the NVIDIA drivers (to enable CUDA), the Kubernetes device plugin for GPUs, the NVIDIA Container Runtime, automatic node labelling, DCGM-based monitoring, and others.
After configuring the Node Feature Discovery Operator and the NVIDIA GPU Operator using GitOps, you need to confirm that the NVIDIA operator is correctly retrieving the GPU information. You can use the following command to confirm that OpenShift is correctly configured:
oc exec -it -n nvidia-gpu-operator $(oc get pod -l openshift.driver-toolkit=true -o jsonpath="{.items[0].metadata.name}" -n nvidia-gpu-operator) -- nvidia-smi
The output should look like this:
Sat Oct 26 08:47:06 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07 Driver Version: 550.90.07 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA A10G On | 00000000:00:1E.0 Off | 0 |
| 0% 25C P8 22W / 300W | 1MiB / 23028MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
If, due to a race condition, RHOAI does not detect the GPU worker, you might need to force it to recalculate. You can do so easily with the following command:
oc delete cm migration-gpu-status -n redhat-ods-applications; sleep 3; oc delete pods -l app=rhods-dashboard -n redhat-ods-applications
Wait a few seconds until the dashboard pods start again, and you will see in the RHOAI web console that the NVIDIA GPU Accelerator Profile is now listed.
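Instead of waiting an arbitrary number of seconds, the same steps can block until the dashboard is back. This is a minimal sketch: `refresh_gpu_detection` is a hypothetical helper, and it assumes the dashboard pods belong to a Deployment named `rhods-dashboard` (inferred from the pod label above, so verify it in your cluster).

```shell
# refresh_gpu_detection   (hypothetical helper)
# Repeats the command above, then waits for the dashboard rollout to finish.
refresh_gpu_detection() {
  ns=redhat-ods-applications
  oc delete configmap migration-gpu-status -n "$ns" --ignore-not-found || return 1
  sleep 3
  oc delete pods -l app=rhods-dashboard -n "$ns" || return 1
  # Assumption: the Deployment is called rhods-dashboard.
  oc rollout status deployment/rhods-dashboard -n "$ns" --timeout=300s
}

# Usage sketch:
#   refresh_gpu_detection
```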
❗ If you want to achieve this properly, please don’t miss reading this repo.
Partitioning allows for flexibility in resource management, enabling multiple applications to share a single GPU or dividing a large GPU into smaller, dedicated units for different tasks. For the sake of simplicity, and to make the most of the limited resources, I have enabled the time-slicing configuration. You can check the configuration in rhoai-dependencies/operator-nvidia-gpu.
How to check that the configuration is applied?
oc get node --selector=nvidia.com/gpu.product="NVIDIA-A10G-SHARED" -o json | jq '.items[0].metadata.labels' | grep nvidia
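To read a single value out of that label listing, a small filter is enough. The following is a minimal sketch: `gpu_label` is a hypothetical helper that expects the `"key": "value",` lines printed by the command above on standard input. With time-slicing enabled, GPU Feature Discovery normally publishes a `nvidia.com/gpu.replicas` label next to the `-SHARED` product name, but the labels present on your nodes may differ.

```shell
# gpu_label NAME   (hypothetical helper)
# Prints the value of the nvidia.com/NAME label from jq-formatted label lines.
gpu_label() {
  sed -n "s|^[[:space:]]*\"nvidia\.com/$1\": \"\([^\"]*\)\".*|\1|p"
}

# Usage sketch:
#   oc get node --selector=nvidia.com/gpu.product="NVIDIA-A10G-SHARED" -o json \
#     | jq '.items[0].metadata.labels' | gpu_label gpu.replicas
```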
Also, you can check these two blog entries with an analysis from the RH Performance team about this topic:
The DataSciencePipelinesApplication requires an S3-compatible storage solution to store the artifacts generated in the pipeline. You can use any S3-compatible storage solution for data science pipelines, including AWS S3, OpenShift Data Foundation, or MinIO. The automation currently uses ODF with NooBaa to interact with the AWS S3 interface, so you won’t need to do anything. Nevertheless, if you decide to disable ODF, you will need to create the buckets on AWS S3 manually using the following process:
-
Define the configuration variables for AWS in a file named aws-env-vars. You can use the same structure as in aws-env-vars.example.
-
Execute the following command to interact with the AWS API:
./prerequisites/s3-bucket/create-aws-s3-bucket.sh
-
Or execute the following command if you interact with MinIO:
./prerequisites/s3-bucket/create-minio-s3-bucket.sh
ℹ️ This is already included in the automation, so you don’t have to do anything with this section.
By default, the Single-Model Serving platform in OpenShift AI uses a self-signed certificate generated at installation for the endpoints that are created when deploying a model server. This can be counter-intuitive, because certificates already configured on your OpenShift cluster are used by default for other types of endpoints, such as Routes.
See the following blog entry to understand what is done in the automation.
-
Documentation: Installation guide.
-
Documentation: Configuration guide.
As the Model Registry is still in Technology Preview, this repo still documents how to manually sync models using an OCP Job and then serve them with OpenShift AI. You can use the following Applications, which point to a Helm chart that automates it:
oc apply -f application-serve-mistral-7b.yaml
sleep 4
oc create secret generic hf-creds --from-env-file=hf-creds -n mistral-7b
oc apply -f application-serve-granite-1b-a400m.yaml
sleep 4
oc create secret generic hf-creds --from-env-file=hf-creds -n granite-1b-a400m
oc apply -f application-serve-nomic-embed-text-v1.yaml
sleep 4
oc create secret generic hf-creds --from-env-file=hf-creds -n nomic-embed-text-v1
# Retrieve certificates
openssl s_client -showcerts -connect mistral-7b.mistral-7b.svc.cluster.local:443 </dev/null
# Check models endpoint
curl --cacert /etc/pki/ca-trust/source/anchors/service-ca.crt https://mistral-7b.mistral-7b.svc.cluster.local:443/v1/models
# Check Completion (It might be /v1/chat/completions)
curl -s -X 'POST' https://mistral-7b.mistral-7b.svc.cluster.local/v1/completions -H 'Accept: application/json' -H 'Content-Type: application/json' -d '{"model": "mistral-7b","prompt": "San Francisco is a"}'
# Embeddings
curl -s -X 'POST' \
"https://nomic-embed-text-v1.nomic-embed-text-v1.svc.cluster.local/v1/embeddings" \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"model": "nomic-embed-text-v1",
"input": ["En un lugar de la Mancha..."]
}'
# API Endpoints:
# * Ollama => https://nomic-embed-text-v1.nomic-embed-text-v1.svc.cluster.local/api/embed
# * OpenAI => https://nomic-embed-text-v1.nomic-embed-text-v1.svc.cluster.local/embeddings
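The fixed `sleep 4` between applying each Application and creating its secret assumes that Argo CD creates the namespace within four seconds. A polling helper avoids that guess. The following is a minimal sketch: `retry` is a hypothetical helper that reruns any command until it succeeds or the attempts run out.

```shell
# retry ATTEMPTS DELAY COMMAND...   (hypothetical helper)
# Runs COMMAND until it succeeds, at most ATTEMPTS times, sleeping DELAY
# seconds between tries. Returns 1 if every attempt failed.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while ! "$@"; do
    if [ "$i" -ge "$attempts" ]; then
      return 1
    fi
    i=$((i + 1))
    sleep "$delay"
  done
}

# Usage sketch:
#   oc apply -f application-serve-mistral-7b.yaml
#   retry 30 2 oc get namespace mistral-7b
#   oc create secret generic hf-creds --from-env-file=hf-creds -n mistral-7b
```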
To ensure that machine-learning models are transparent, fair, and reliable, data scientists can use TrustyAI in OpenShift AI to monitor their models for bias and data drift.
TRUSTY_ROUTE=$(oc get route/trustyai-service --template="https://{{.spec.host}}")
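Once `TRUSTY_ROUTE` is set, the service can be queried with a bearer token. This is a minimal sketch: `trusty_get` is a hypothetical helper, it assumes the route accepts the token of the current `oc` session, and it assumes the TrustyAI service exposes an `/info` endpoint describing the models it observes. If the route uses a certificate your machine does not trust, you will need to pass the CA to `curl` yourself.

```shell
# trusty_get PATH   (hypothetical helper)
# Sends an authenticated GET request to the TrustyAI route for the given path.
trusty_get() {
  curl -s -H "Authorization: Bearer $(oc whoami -t)" "${TRUSTY_ROUTE}$1"
}

# Usage sketch (the /info path is an assumption, check the TrustyAI docs):
#   trusty_get /info
```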
💡 This section is already fully automated in the GitOps deployment run by auto-install.sh, but if you need to deploy it manually, you can follow the steps in this section.
This section will guide you on how we are deploying ODF to provide internal S3 storage on our cluster.
Make sure you have at least three worker nodes!
-
Install the ODF operator.
oc apply -k ocp-odf/odf-operator
-
Install the ODF cluster:
oc apply -f ocp-odf/storagecluster-ocs-storagecluster.yaml
-
Install RadosGW to provide S3 storage based on Ceph on OCP clusters deployed on Cloud Providers:
oc apply -k ocp-odf/radosgw
This workshop guide is a good read to understand the RadosGW configuration.
ℹ️ If you want to test your ODF deployment with a fun example rather than a real use case, >> Click Here <<
Let’s now test our configuration and create a bucket to store a model in ODF.
-
Create a bucket:
oc apply -k ocp-odf/rhoai-models
-
Create a secret with the credentials:
oc create secret generic hf-creds --from-env-file=hf-creds -n rhoai-models
You just need to retrieve the credentials for the bucket and point to the bucket route URL:
export AWS_ACCESS_KEY_ID=$(oc get secret models -n rhoai-models -o jsonpath='{.data.AWS_ACCESS_KEY_ID}' | base64 --decode)
export AWS_SECRET_ACCESS_KEY=$(oc get secret models -n rhoai-models -o jsonpath='{.data.AWS_SECRET_ACCESS_KEY}' | base64 --decode)
export BUCKET_HOST=$(oc get route s3-rgw -n openshift-storage --template='{{ .spec.host }}')
export BUCKET_PORT=$(oc get configmap models -n rhoai-models -o jsonpath='{.data.BUCKET_PORT}')
export BUCKET_NAME="models"
export MODEL_NAME="ibm-granite/granite-3.0-1b-a400m-instruct"
And then execute normal aws-cli commands against the bucket:
aws s3 ls s3://${BUCKET_NAME}/$MODEL_NAME/ --endpoint-url http://$BUCKET_HOST:$BUCKET_PORT
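Because every call needs the same `--endpoint-url`, a thin wrapper keeps the commands short. The following is a minimal sketch: `s3odf` is a hypothetical helper name, and it relies on the `BUCKET_HOST` and `BUCKET_PORT` variables exported above.

```shell
# s3odf ARGS...   (hypothetical helper)
# Runs "aws s3 ARGS..." against the ODF endpoint defined by BUCKET_HOST/BUCKET_PORT.
s3odf() {
  aws s3 "$@" --endpoint-url "http://${BUCKET_HOST}:${BUCKET_PORT}"
}

# Usage sketch:
#   s3odf ls "s3://${BUCKET_NAME}/${MODEL_NAME}/"
```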
Red Hat OpenShift Lightspeed is a generative AI-powered virtual assistant for OpenShift Container Platform. Lightspeed functionality uses a natural-language interface in the OpenShift web console.
oc apply -f application-ocp-lightspeed.yaml
or you can deploy it manually with the following command:
oc apply -k components/ocp-lightspeed
This demo is fully oriented towards the default, production-ready capabilities provided by OpenShift. However, if your current deployment already uses MinIO and you cannot change it, you can optionally deploy a MinIO application in a separate namespace using the following ArgoCD application. This application is included in the auto-install.sh automation:
oc apply -f application-minio.yaml
The user and password are minio / minio123.
Open WebUI is an extensible, feature-rich, and user-friendly self-hosted AI platform designed to operate entirely offline. It supports various LLM runners like Ollama and OpenAI-compatible APIs, with a built-in inference engine for RAG, making it a powerful AI deployment solution.
cat application-open-webui.yaml | \
CLUSTER_DOMAIN=$(oc get dns.config/cluster -o jsonpath='{.spec.baseDomain}') \
LLM_INFERENCE_SERVICE_URL="https://mistral-7b.mistral-7b.svc.cluster.local/v1" \
envsubst | oc apply -f -
or you can deploy it manually with the following command:
helm template components/open-webui --namespace="open-webui" \
--set llmInferenceService.url="https://mistral-7b.mistral-7b.svc.cluster.local/v1" \
--set clusterDomain=$(oc get dns.config/cluster -o jsonpath='{.spec.baseDomain}') \
--set rag.enabled="true" | oc apply -f -
Milvus is a vector database built for scalable similarity search. It is "Open-source, highly scalable, and blazing fast". Milvus offers robust data modeling capabilities, enabling you to organize your unstructured or multi-modal data into structured collections.
Attu is an efficient open-source management tool for Milvus. It features an intuitive graphical user interface (GUI), allowing you to easily interact with your databases.
cat application-milvus.yaml | \
CLUSTER_DOMAIN=$(oc get dns.config/cluster -o jsonpath='{.spec.baseDomain}') \
envsubst | oc apply -f -
or you can deploy it manually with the following command:
helm template components/milvus --namespace="milvus" \
--set clusterDomain=$(oc get dns.config/cluster -o jsonpath='{.spec.baseDomain}') | oc apply -f -
The user and password for the Attu GUI are root / Milvus.
-
https://redhatquickcourses.github.io/rhods-admin/rhods-admin/1.33
-
https://redhatquickcourses.github.io/rhods-intro/rhods-intro/1.33
-
https://redhatquickcourses.github.io/rhods-model/rhods-model/1.33
-
https://rh-aiservices-bu.github.io/insurance-claim-processing/modules/02-03-creating-workbench.html
-
https://developers.redhat.com/products/red-hat-openshift-ai/getting-started