Many updates based on 2021-02-09 SIG-mc meeting
lauralorenz committed Feb 23, 2021
1 parent 45fb1fe commit d422594
Showing 1 changed file with 52 additions and 38 deletions: keps/sig-multicluster/2149-clusterid/README.md
@@ -353,19 +353,6 @@ Contains a unique identifier for the containing cluster.

**Reusing cluster names**: Since an `id.k8s.io ClusterClaim` places no restriction on whether a claim value may be reused, a cluster that unregisters from a ClusterSet is permitted under this standard to rejoin later with the same `id.k8s.io ClusterClaim` it had before. Similarly, a *different* cluster may join a ClusterSet with an `id.k8s.io ClusterClaim` previously used by another cluster, as long as the two are not members of the same ClusterSet at the same time. Finally, two or more clusters may hold the same `id.k8s.io ClusterClaim` concurrently (though they **should** not; see "Uniqueness" above), *as long as* they are not members of the same ClusterSet.

```
<<[UNRESOLVED]>>
We could probably use some example scenarios describing how some dependent tool should handle IDs changing or disappearing during various stages of the cluster/membership lifecycle - @jeremyot
<<[/UNRESOLVED]>>
```

```
<<[UNRESOLVED]>>
How should uniqueness restrictions be handled for clusters that were originally isolated and only acquired/verified an ID when joining a ClusterSet?
<<[/UNRESOLVED]>>
```


#### Claim: `clusterset.k8s.io`

Contains an identifier that relates the containing cluster to the ClusterSet in which it belongs.
@@ -433,50 +420,77 @@ This proposal suggests a CRD composed of objects all of the same `Kind` `Cluster
Storing arbitrary facts about a cluster can be implemented in other ways. For example, the Cluster API subproject stopgapped its need for cluster name metadata by leveraging the existing `Node` `Kind` and storing metadata there via annotations, such as `cluster.x-k8s.io/cluster-name` ([ref](https://github.com/kubernetes-sigs/cluster-api/pull/4048)). While practical for their case, this KEP avoids adding cluster-level info as annotations on child resources so as not to be dependent on a child resource's existence, to avoid issues maintaining parity across multiple resources of the same `Kind` for identical metadata, and to maintain RBAC separation between the cluster-level metadata and the child resources. Even within the realm of implementing this as a CRD, the API design could focus on distinguishing each fact by utilizing different `spec.Type`s (as `Service` objects do, e.g. `spec.type=ClusterIP` or `spec.type=ExternalName`), or, even more strictly, each as a different `Kind`. The former provides no specific advantages since multiple differently named claims for the same fact are unnecessary, and is less expressive to query (it is easier to query by name directly, like `kubectl get clusterclaims id.k8s.io`). The latter would result in the proliferation of cluster-wide singleton `Kind` resources and be burdensome for users creating their own custom claims.


### Implementing the `ClusterClaim` CRD and its admission controllers

#### `id.k8s.io ClusterClaim`

The actual implementation that selects and stores the identifier of a given cluster could occur locally within the cluster. The claim does not necessarily ever need to be deleted, particularly if the identifier selection mechanism chooses an identifier compliant with this specification's broadest restrictions -- namely, being immutable for a cluster's lifetime and unique beyond just the scope of the cluster's membership. A recommended option that meets these broad restrictions is the UID of the cluster's `kube-system` namespace.
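
As a non-normative illustration of that recommendation, the following minimal Go sketch derives a candidate cluster ID from the `kube-system` namespace UID using client-go; the function name and the use of in-cluster configuration are assumptions made for this example, not part of the specification.

```go
// Non-normative sketch: derive a candidate cluster ID from the kube-system
// namespace UID, which is immutable for the cluster's lifetime and unique
// across clusters. Assumes client-go and in-cluster configuration.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func clusterIDFromKubeSystem(ctx context.Context) (string, error) {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		return "", err
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		return "", err
	}
	ns, err := client.CoreV1().Namespaces().Get(ctx, "kube-system", metav1.GetOptions{})
	if err != nil {
		return "", err
	}
	return string(ns.UID), nil
}

func main() {
	id, err := clusterIDFromKubeSystem(context.Background())
	if err != nil {
		panic(err)
	}
	fmt.Println(id) // e.g. 721ab723-13bc-11e5-aec2-42010af0021e
}
```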

That being said, for less stringent identifiers, for example a user-specified and human-readable value, a given `id.k8s.io ClusterClaim` may need to change if an identical identifier is in use by another member of the ClusterSet it wants to join. It is likely this would need to happen outside the cluster-local boundary; for example, whatever manages memberships would likely need to deny the incoming cluster, and potentially assign (or prompt the cluster to assign itself) a new ID.

Since this KEP does not formally mandate that the cluster ID *must* be immutable for the lifetime of the cluster, only for the lifetime of its membership in a ClusterSet, dependent tooling explicitly *cannot* assume that the `id.k8s.io ClusterClaim` for a given cluster will stay constant of its own accord. For example, log aggregation keyed on this claim should only be trusted to refer to the same cluster for as long as that cluster holds a single, continuous ClusterSet membership; similarly, controllers whose logic depends on distinguishing clusters by cluster ID can only trust this claim to disambiguate the same cluster for as long as the cluster holds one ClusterSet membership.

Despite this flexibility in the KEP, cluster IDs may still be useful before ClusterSet membership needs to be established, particularly if the implementation chooses the broadest restrictions regarding immutability and uniqueness. Therefore, a controller that initializes the claim early in the cluster's lifecycle, possibly as part of cluster creation, may be a useful place to implement it, though within the bounds of this KEP that is not strictly necessary.
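
The fragment below is a minimal, non-normative sketch of such an initialization step, creating the `id.k8s.io ClusterClaim` only if it is absent; the `clusterclaims` resource plural, the cluster scope of the resource, and the function names are assumptions made for illustration.

```go
// Non-normative sketch: a cluster-local bootstrap step that creates the
// id.k8s.io ClusterClaim if it does not already exist. The group/version,
// resource plural, and cluster scope are assumptions for illustration only.
package clusterclaim

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
)

var clusterClaimGVR = schema.GroupVersionResource{
	Group:    "multicluster.k8s.io",
	Version:  "v1",
	Resource: "clusterclaims", // assumed plural
}

// ensureIDClaim creates the id.k8s.io claim with the given value if it is
// absent. It never mutates an existing claim, preserving immutability for
// the lifetime of any ClusterSet membership.
func ensureIDClaim(ctx context.Context, client dynamic.Interface, id string) error {
	_, err := client.Resource(clusterClaimGVR).Get(ctx, "id.k8s.io", metav1.GetOptions{})
	if err == nil {
		return nil // already initialized
	}
	if !apierrors.IsNotFound(err) {
		return err
	}
	claim := &unstructured.Unstructured{Object: map[string]interface{}{
		"apiVersion": "multicluster.k8s.io/v1",
		"kind":       "ClusterClaim",
		"metadata":   map[string]interface{}{"name": "id.k8s.io"},
		"spec":       map[string]interface{}{"value": id},
	}}
	_, err = client.Resource(clusterClaimGVR).Create(ctx, claim, metav1.CreateOptions{})
	return err
}
```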

The most common discussion point within the SIG regarding whether an implementation should favor a UUID or a human-readable clusterID string concerns DNS. Since DNS names are intended to be a human-readable form of addressing, clunky DNS names composed from long UUIDs seem like an anti-pattern, or at least unfinished. While some extensions to this spec have been discussed as ways to leverage the best parts of both (e.g. using labels on the `id.k8s.io ClusterClaim` to store aliases for DNS), an actual API specification to allow for this is outside the scope of this KEP at this time (see the Non-Goals section).

```
# An example object of `id.k8s.io ClusterClaim`:
apiVersion: multicluster.k8s.io/v1
kind: ClusterClaim
metadata:
  name: id.k8s.io
spec:
  value: 721ab723-13bc-11e5-aec2-42010af0021e
```

#### `clusterset.k8s.io ClusterClaim`

A cluster in a ClusterSet is expected to be authoritatively associated with that ClusterSet by an external process and storage mechanism with a purview above the cluster-local boundary, whether that is a cluster registry or just a human running kubectl. (The details of any specific mechanism are out of scope for the MCS API and this KEP -- see the Non-Goals section.) Mirroring this information in the cluster-local `ClusterClaim` CRD will necessarily need to be managed above the level of the cluster itself, since the properties of `clusterset.k8s.io` extend beyond the boundaries of a single cluster, and will likely be done by something that has access to the cluster registry. It is expected that the mcs-controller ([as described in the MCS API KEP](https://github.com/kubernetes/enhancements/tree/master/keps/sig-multicluster/1645-multi-cluster-services-api#proposal)) will act as an admission controller to verify individual objects of this claim.

Because there are obligations of the `id.k8s.io ClusterClaim` that are not meaningfully verifiable until a cluster tries to join a ClusterSet and set its `clusterset.k8s.io ClusterClaim`, the admission controller responsible for setting a `clusterset.k8s.io ClusterClaim` will need the ability to reject such an attempt when it is invalid, and alert `[UNRESOLVED]` or possibly effect changes to that cluster's `id.k8s.io ClusterClaim` to make it valid `[/UNRESOLVED]`. Two symptomatic cases of this would be:

1. When a cluster with a given `id.k8s.io ClusterClaim` tries to join a ClusterSet, but a cluster with that same `id.k8s.io ClusterClaim` appears to already be in the set.
2. When a cluster that does not have an `id.k8s.io ClusterClaim` tries to join a ClusterSet.

In situations like these, the admission controller will need to refuse to add the invalid cluster to the ClusterSet by declining to set its `clusterset.k8s.io ClusterClaim`, and surface an error actionable enough to make the claim valid.
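
A minimal, non-normative sketch of those checks follows; the types, function names, and error text are illustrative assumptions, not an API mandated by this KEP.

```go
// Non-normative sketch of the validation a membership manager (for example,
// the mcs-controller acting as an admission controller) might perform before
// setting a joining cluster's clusterset.k8s.io ClusterClaim.
package clusterset

import "errors"

// Errors surfaced to operators should be actionable enough to fix the claim.
var (
	ErrMissingID   = errors.New("cluster has no id.k8s.io ClusterClaim; set one before joining the ClusterSet")
	ErrDuplicateID = errors.New("another member of the ClusterSet already uses this id.k8s.io ClusterClaim value")
)

// validateJoin checks the two symptomatic failure cases described above.
// joiningID is the candidate cluster's id.k8s.io claim value ("" if unset);
// memberIDs are the id.k8s.io claim values of the current ClusterSet members.
func validateJoin(joiningID string, memberIDs []string) error {
	if joiningID == "" {
		return ErrMissingID
	}
	for _, id := range memberIDs {
		if id == joiningID {
			return ErrDuplicateID
		}
	}
	return nil
}
```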

```
# An example object of `clusterset.k8s.io ClusterClaim`:
apiVersion: multicluster.k8s.io/v1
kind: ClusterClaim
metadata:
  name: clusterset.k8s.io
spec:
  value: environ-1
```

### CRD upgrade path

#### To k/k or not to k/k?

_That is the question._

`[UNRESOLVED] Must resolve before alpha`

> While this document has thus far referred to the `ClusterClaim` resource as being implemented as a CRD, another implementation point of debate has been whether this belongs in the core Kubernetes API, particularly the `id.k8s.io ClusterClaim`. A dependable cluster ID or cluster name has previously been discussed in other forums (such as [this SIG-Architecture thread](https://groups.google.com/g/kubernetes-sig-architecture/c/mVGobfD4TpY/m/nkdbkX1iBwAJ) from 2018, or, as mentioned above, the [Cluster API subproject](https://github.com/kubernetes-sigs/cluster-api/issues/4044), which implemented [their own solution](https://github.com/kubernetes-sigs/cluster-api/pull/4048)). It is the opinion of SIG-Multicluster that the function of the proposed `ClusterClaim` CRD is of broad utility and becomes more useful the more ubiquitous it is, not only in multicluster setups.
> This has led to the discussion of whether or not we should pursue adding this resource type not as a CRD associated with SIG-Multicluster, but as a core Kubernetes API implemented in `kubernetes/kubernetes`. A short pro/con list is enclosed at the end of this section.
> One effect of that decision is related to the upgrade path. Implementing this resource only in k/k will restrict the types of clusters that can use cluster ID to only those on the target version (or above) of Kubernetes, unless a separate backporting CRD is made available to them. At that point, with two install options, other issues arise. How do backported clusters deal with migrating their CRD data to the core k/k objects during upgrade -- will the code around the formal k/k implementation be sensitive to the backport CRD and migrate itself? Will users have to handle upgrades in a bespoke manner?
| | CRD | k/k |
|-----------------------|----------------------------------------------------------------------------------|---------------------------------------------------|
| Built-in / ubiquitous | Unlikely (?) | Likely (?) |
| Deployment | Must be installed by the cluster lifecycle management, or as a manual setup step | In every cluster over target milestone |
| Schema validation | Can use OpenAPI v3 validation | Can use the built-in Kubernetes schema validation |
| Blockers | Making a sigs-repo | Official API review |

`[/UNRESOLVED]`


### Test Plan

<!--