improve: status cache for next reconciliation - only the lock version #2800

Merged
merged 16 commits into from
May 16, 2025
102 changes: 17 additions & 85 deletions docs/content/en/docs/documentation/reconciler.md
@@ -175,23 +175,23 @@ From v5, by default, the finalizer is added using Server Side Apply. See also `U
It is typical to want to update the status subresource with the information that is available during the reconciliation.
This is sometimes referred to as the last observed state. When the primary resource is updated, though, the framework
does not cache the resource directly, relying instead on the propagation of the update to the underlying informer's
cache. It can, therefore, happen that, if other events trigger other reconciliations before the informer cache gets
cache. It can, therefore, happen that, if other events trigger other reconciliations, before the informer cache gets
updated, your reconciler does not see the latest version of the primary resource. While this might not typically be a
problem in most cases, as caches eventually become consistent, depending on your reconciliation logic, you might still
require the latest status version possible, for example if the status subresource is used as a communication mechanism,
see [Representing Allocated Values](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#representing-allocated-values)
require the latest status version possible, for example, if the status subresource is used to store allocated values.
See [Representing Allocated Values](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#representing-allocated-values)
from the Kubernetes docs for more details.

The framework provides utilities to help with these use cases with
[`PrimaryUpdateAndCacheUtils`](https://github.com/operator-framework/java-operator-sdk/blob/main/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/reconciler/PrimaryUpdateAndCacheUtils.java).
These utility methods come in two flavors:
The framework provides the
[`PrimaryUpdateAndCacheUtils`](https://github.com/operator-framework/java-operator-sdk/blob/main/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/reconciler/PrimaryUpdateAndCacheUtils.java) utility class
to help with these use cases.

#### Using internal cache

In almost all cases for this purpose, you can use internal caches:
The methods of this class use internal caches in combination with update methods that leverage
optimistic locking. If the update method fails on optimistic locking, it will retry
using a fresh resource from the server as the base for modification.

```java
@Override
@Override
public UpdateControl<StatusPatchCacheCustomResource> reconcile(
StatusPatchCacheCustomResource resource, Context<StatusPatchCacheCustomResource> context) {

@@ -201,85 +201,17 @@ public UpdateControl<StatusPatchCacheCustomResource> reconcile(
var freshCopy = createFreshCopy(primary);
freshCopy.getStatus().setValue(statusWithState());

var updatedResource = PrimaryUpdateAndCacheUtils.ssaPatchAndCacheStatus(resource, freshCopy, context);

return UpdateControl.noUpdate();
}
```
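
The retry-on-conflict behavior that optimistic locking implies can be pictured with the following minimal, self-contained sketch. It is not the framework's implementation: `OptimisticRetrySketch`, `Resource`, `FakeServer`, and `updateWithRetry` are hypothetical names invented for illustration, and the "server" is an in-memory stand-in that rejects any update based on a stale version.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

/** Illustration only: none of these types are part of Java Operator SDK. */
final class OptimisticRetrySketch {

  /** Hypothetical stand-in for a resource: a version plus one status value. */
  static final class Resource {
    final long resourceVersion;
    final int statusValue;

    Resource(long resourceVersion, int statusValue) {
      this.resourceVersion = resourceVersion;
      this.statusValue = statusValue;
    }
  }

  /** Hypothetical stand-in for the API server, starting at version 1 with status 0. */
  static final class FakeServer {
    private final AtomicReference<Resource> stored =
        new AtomicReference<>(new Resource(1, 0));

    Resource get() {
      return stored.get();
    }

    /** Returns the stored result, or null if the update was based on a stale version. */
    Resource update(Resource desired) {
      Resource current = stored.get();
      if (current.resourceVersion != desired.resourceVersion) {
        return null; // conflict: the caller worked on an outdated copy
      }
      Resource next = new Resource(current.resourceVersion + 1, desired.statusValue);
      return stored.compareAndSet(current, next) ? next : null;
    }
  }

  /** Applies the modification; on conflict, re-reads the resource and applies it again. */
  static Resource updateWithRetry(
      FakeServer server, Resource base, UnaryOperator<Resource> modification, int maxAttempts) {
    Resource current = base;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      Resource updated = server.update(modification.apply(current));
      if (updated != null) {
        return updated;
      }
      current = server.get(); // fresh copy becomes the new base for the modification
    }
    throw new IllegalStateException("Update still conflicting after " + maxAttempts + " attempts");
  }
}
```

The point of the sketch is that the modification is expressed as a function of the base resource, so it can be re-applied to a fresh copy whenever the first attempt was built on an outdated version.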

In the background, `PrimaryUpdateAndCacheUtils.ssaPatchAndCacheStatus` puts the result of the update into an internal
cache and makes sure that the next reconciliation contains the most recent version of the resource. Note that this
is not necessarily the version of the resource you got as the response from the update: it can be newer, since other parties
can make additional updates in the meantime. Unless explicitly modified, however, it will contain the up-to-date status.

See related integration test [here](https://github.com/operator-framework/java-operator-sdk/blob/main/operator-framework/src/test/java/io/javaoperatorsdk/operator/baseapi/statuscache/internal).

This approach works with the default configuration of the framework and should be sufficient in most cases.
Without going into further detail, it does not work reliably if `ConfigurationService.parseResourceVersionsForEventFilteringAndCaching`
is set to `false` (more precisely, there are some edge cases in which it fails). For that case, the framework provides the following solution:

#### Fallback approach: using `PrimaryResourceCache` cache

As an alternative, for very rare cases when `ConfigurationService.parseResourceVersionsForEventFilteringAndCaching`
needs to be set to `false` you can use an explicit caching approach:

```java

// We on purpose don't use the provided predicate to show what a custom one could look like.
private final PrimaryResourceCache<StatusPatchPrimaryCacheCustomResource> cache =
new PrimaryResourceCache<>(
(statusPatchCacheCustomResourcePair, statusPatchCacheCustomResource) ->
statusPatchCacheCustomResource.getStatus().getValue()
>= statusPatchCacheCustomResourcePair.afterUpdate().getStatus().getValue());

@Override
public UpdateControl<StatusPatchPrimaryCacheCustomResource> reconcile(
StatusPatchPrimaryCacheCustomResource primary,
Context<StatusPatchPrimaryCacheCustomResource> context) {

// cache will compare the current and the cached resource and return the more recent. (And evict the old)
primary = cache.getFreshResource(primary);

// omitted logic

var freshCopy = createFreshCopy(primary);
var updatedResource = PrimaryUpdateAndCacheUtils.ssaPatchStatusAndCacheResource(resource, freshCopy, context);

freshCopy.getStatus().setValue(statusWithState());

var updated =
PrimaryUpdateAndCacheUtils.ssaPatchAndCacheStatus(primary, freshCopy, context, cache);

return UpdateControl.noUpdate();
}

@Override
public DeleteControl cleanup(
StatusPatchPrimaryCacheCustomResource resource,
Context<StatusPatchPrimaryCacheCustomResource> context)
throws Exception {
// cleanup the cache on resource deletion
cache.cleanup(resource);
return DeleteControl.defaultDelete();
}

```

[`PrimaryResourceCache`](https://github.com/operator-framework/java-operator-sdk/blob/main/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/reconciler/support/PrimaryResourceCache.java)
is designed for this purpose. As shown in the example above, it is up to you to provide a predicate to determine if the
resource is more recent than the one available. In other words, when to evict the resource from the cache. Typically, as
shown in
the [integration test](https://github.com/operator-framework/java-operator-sdk/blob/main/operator-framework/src/test/java/io/javaoperatorsdk/operator/baseapi/statuscache/primarycache)
you can have a counter in status to check on that.

Since all of this happens explicitly, you cannot use this approach for managed dependent resources and workflows and
will need to use the unmanaged approach instead. This is due to the fact that managed dependent resources always get
their associated primary resource from the underlying informer event source cache.

#### Additional remarks
After the update, `PrimaryUpdateAndCacheUtils.ssaPatchStatusAndCacheResource` puts the result of the update into an internal
cache, and the framework makes sure that the next reconciliation contains the most recent version of the resource.
Note that this is not necessarily the version returned in the response to the update: it can be a newer version, since other parties
can make additional updates in the meantime. However, unless it has been explicitly modified, that
resource will contain the up-to-date status.
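
One way to picture "the most recent version" is a small cache that holds the result of the last update and compares it with whatever the informer delivers. The sketch below is a hedged illustration only, not the framework's internal cache: `LatestResourceSketch` and its members are hypothetical names, and it assumes resource versions can be parsed and compared as numbers (Kubernetes itself documents `resourceVersion` as an opaque string).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Illustration only: a hypothetical cache, not the framework's internal one. */
final class LatestResourceSketch {

  /** Hypothetical stand-in for a primary resource. */
  static final class Resource {
    final String name;
    final String resourceVersion;
    final String status;

    Resource(String name, String resourceVersion, String status) {
      this.name = name;
      this.resourceVersion = resourceVersion;
      this.status = status;
    }
  }

  private final Map<String, Resource> recentlyUpdated = new ConcurrentHashMap<>();

  /** Remembers the resource returned by our own update call. */
  void cacheUpdateResult(Resource updated) {
    recentlyUpdated.put(updated.name, updated);
  }

  /** Returns whichever copy is newer: the informer's or the one we cached after updating. */
  Resource resolve(Resource fromInformer) {
    Resource cached = recentlyUpdated.get(fromInformer.name);
    if (cached == null) {
      return fromInformer;
    }
    // ASSUMPTION: versions are numeric and ordered; this does not hold for every API server.
    long informerVersion = Long.parseLong(fromInformer.resourceVersion);
    long cachedVersion = Long.parseLong(cached.resourceVersion);
    if (informerVersion >= cachedVersion) {
      recentlyUpdated.remove(fromInformer.name); // informer caught up: evict our copy
      return fromInformer;
    }
    return cached; // informer is still behind our own update
  }
}
```

If the informer copy is at least as new as the cached one, it may already include changes made by other parties, which is why the resolved resource can be newer than the response of the update.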

As shown in the integration tests, there is no optimistic locking used when updating the
[resource](https://github.com/operator-framework/java-operator-sdk/blob/main/operator-framework/src/test/java/io/javaoperatorsdk/operator/baseapi/statuscache/internal/StatusPatchCacheReconciler.java#L41)
(in other words `metadata.resourceVersion` is set to `null`). This is desired since you don't want the patch to fail on
update.

In addition, you can configure the [Fabric8 client retry](https://github.com/fabric8io/kubernetes-client?tab=readme-ov-file#configuring-the-client).
See related integration test [here](https://github.com/operator-framework/java-operator-sdk/blob/main/operator-framework/src/test/java/io/javaoperatorsdk/operator/baseapi/statuscache).