From 538064484e69b0311229959d774fd3c05e4930b6 Mon Sep 17 00:00:00 2001
From: Jyoti Verma
Date: Wed, 16 Oct 2024 17:50:57 +0000
Subject: [PATCH 1/2] added known issues

---
 docs/sharding/README.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/docs/sharding/README.md b/docs/sharding/README.md
index 134d338b..239dd767 100644
--- a/docs/sharding/README.md
+++ b/docs/sharding/README.md
@@ -193,5 +193,7 @@ To debug the Oracle Globally Distributed Database Topology provisioned using the
 * For both ENTERPRISE and FREE Images, if the Oracle Global Service Manager (GSM) POD is stopped using `crictl stopp` at the worker node level, it leaves GSM in failed state. The `gdsctl` commands fail with error **GSM-45034: Connection to GDS catalog is not established**. This is because with change, the network namespace is lost when checked from the GSM Pod.
 * For both ENTERPRISE and FREE Images, restart of the node running CATALOG using `/sbin/reboot -f` results in **GSM-45076: GSM IS NOT RUNNING**. After you encounter this issue, wait until the `gdsctl` commands start working as the database connection start working. When the stack comes up again after the node restart, you can encounter an unexpected restart of the GSM Pod.
+* For both ENTERPRISE and FREE Images, if the CATALOG Database Pod is stopped from the worker node using the command `crictl stopp`, then it can leave the CATALOG in an error state. This error state results in GSM reporting the error message **GSM-45034: Connection to GDS catalog is not established.**
 * For both ENTERPRISE and FREE Images, either restart of node running the SHARD Pod using `/sbin/reboot -f` or stopping the Shard Database Pod from the worker node using `crictl stopp` command can leave the shard in an error state.
-* For both ENTERPRISE and FREE Images, after force restarts of the node running GSM Pod, the GSM pod restarts multiple times, and then becomes stable. The GSM pod restarts itself because when the worker node comes up, the GSM pod is recreated, but does not obtain DB connection to the Catalog. The Liveness Probe fails which restarts the Pod. Be aware of this issue, and permit the GSM pod to become stable.
\ No newline at end of file
+* For both ENTERPRISE and FREE Images, after force restarts of the node running GSM Pod, the GSM pod restarts multiple times, and then becomes stable. The GSM pod restarts itself because when the worker node comes up, the GSM pod is recreated, but does not obtain DB connection to the Catalog. The Liveness Probe fails which restarts the Pod. Be aware of this issue, and permit the GSM pod to become stable.
+* **DDL Propagation from Catalog to Shards:** DDL Propagation from the Catalog Database to the Shard Databases can take several minutes to complete. To see faster propagation of DDLs such as the tablespace set from the Catalog Database to the Shard Databases, Oracle recommends that you set smaller chunk values by using the `CATALOG_CHUNKS` attribute in the .yaml file while creating the Sharded Database Topology.
\ No newline at end of file

From af8e19342d4d71649baa389687d1454e0cd1531f Mon Sep 17 00:00:00 2001
From: Jyoti Verma
Date: Fri, 18 Oct 2024 20:01:02 +0000
Subject: [PATCH 2/2] uds doc change

---
 .../udsharding_scale_in_delete_an_existing_shard.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/sharding/provisioning/user-defined-sharding/udsharding_scale_in_delete_an_existing_shard.md b/docs/sharding/provisioning/user-defined-sharding/udsharding_scale_in_delete_an_existing_shard.md
index adb8af30..e01e606f 100644
--- a/docs/sharding/provisioning/user-defined-sharding/udsharding_scale_in_delete_an_existing_shard.md
+++ b/docs/sharding/provisioning/user-defined-sharding/udsharding_scale_in_delete_an_existing_shard.md
@@ -26,7 +26,7 @@ Use the file: [udsharding_shard_prov_delshard.yaml](./udsharding_shard_prov_dels
 1. Move out the chunks from the shard to be deleted to another shard. For example, in the current case, before deleting the `shard4`, if you want to move the chunks from `shard4` to `shard2`, then you can run the below `kubectl` command where `/u01/app/oracle/product/23ai/gsmhome_1` is the GSM HOME:
 ```sh
- kubectl exec -it pod/gsm1-0 -n shns -- /u01/app/oracle/product/23ai/gsmhome_1/bin/gdsctl "move chunk -chunk all -source shard4_shard4pdb -target shard4_shard4pdb"
+ kubectl exec -it pod/gsm1-0 -n shns -- /u01/app/oracle/product/23ai/gsmhome_1/bin/gdsctl "move chunk -chunk all -source shard4_shard4pdb -target shard2_shard2pdb"
 ```
 2. Confirm the shard to be deleted (`shard4` in this case) is not having any chunk using below command:
 ```sh
@@ -48,7 +48,7 @@ Use the file: [udsharding_shard_prov_delshard.yaml](./udsharding_shard_prov_dels
 - After you apply `udsharding_shard_prov_delshard.yaml`, the change may not be visible immediately and it may take some time for the delete operation to complete.
 - If the shard, that you are trying to delete, is still having chunks, then the you will see message like below in the logs of the Oracle Database Operator Pod.
 ```sh
- INFO controllers.database.ShardingDatabase manual intervention required
+ DEBUG events Shard Deletion failed for [shard4]. Retry shard deletion after manually moving the chunks. Requeuing
 ```
 In this case, you will need to first move out the chunks from the shard to be deleted using Step 2 above and then apply the file in Step 3 to delete that shard.