Fleet Management in Confluent for Kubernetes Blueprints

In Confluent for Kubernetes (CFK) Blueprints, fleet management is the process of centrally managing cluster functions, such as security, configuration, and monitoring, for a set of Confluent Platform clusters. This topic describes how fleet management works in CFK Blueprints.

When your ConfluentPlatformBlueprint or ClusterClass resource changes, CFK Blueprints automatically detects the changes. Whether the cluster resources are then rolled automatically depends on whether the existing clusters are in a running state:

If the existing cluster is not in a running state, the changes are applied automatically and no user intervention is required.
If the existing cluster is in a running state, you must manually apply the force-roll annotation to the cluster.
Changes in the Confluent component classes, such as KafkaClusterClass or ConnectClusterClass, do not require user intervention because CFK Blueprints internally signals all the clusters to reconcile. However, the ConfluentPlatformBlueprint CR has other configurations that may require a manual roll when changed.
Each cluster has a condition that indicates whether the cluster has drifted from the current Blueprint or class. For example:
```
lastTransitionTime: "2023-03-16T15:19:04Z"
lastUpdateTime: "2023-03-16T21:17:26Z"
message: no blueprint or class generation mismatch
reason: ReconcileSkipped
status: "True"
type: cpc.platform.confluent.io/skip-reconcile
```
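
One way to read this condition from a running cluster is with a `kubectl` JSONPath filter. The following is a minimal sketch, not an official CFK Blueprints command: it assumes the condition is exposed under `.status.conditions` of the cluster CR, and the function name `show_drift_condition` is hypothetical.

```shell
# Print the skip-reconcile condition of one cluster CR.
# ASSUMPTION: conditions are listed under .status.conditions of the CR.
# Usage: show_drift_condition <CR type> <CR name> <namespace>
show_drift_condition() {
  kubectl get "$1" "$2" -n "$3" \
    -o jsonpath='{.status.conditions[?(@.type=="cpc.platform.confluent.io/skip-reconcile")]}'
}
```

For example, `show_drift_condition kafkacluster kafka-dev confluent` would query a cluster named `kafka-dev` in the `confluent` namespace (both names are placeholders).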

The following sections describe how resource changes affect existing clusters managed by CFK Blueprints.

CredentialStoreConfig resource

If the spec.kubernetes.autoRoll field is set to false in the CredentialStoreConfig CR, user intervention is required. You need to apply the force-roll annotation to the cluster deployment CR that uses the CredentialStoreConfig CR.

CertificateStoreConfig resource

If a user-provided certificate authority (CA) is used, user intervention is required. You need to apply the force-roll annotation to the cluster deployment CR after updating the underlying source in the CertificateStoreConfig CR.
For a CA renewal, you need to pass the previous CA together with the new CA, and then apply the force-roll annotation to the cluster deployment CR.
If the auto-generated CA feature is used, no user intervention is required. All certificates generated from the CA are updated automatically. The default CA generated by CFK Blueprints is valid for 5 years. The auto-generated CA is recommended for testing purposes only.
A user-provided certificate requires user intervention when it expires. You need to apply the force-roll annotation to the cluster deployment CR after updating the certificate.
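
Because a user-provided certificate needs manual action when it expires, it can help to check the expiry date ahead of time. The following is a minimal sketch, assuming the certificate is available locally as a PEM file and that `openssl` is installed. The function name `cert_expires_within` is hypothetical and not part of CFK Blueprints.

```shell
# Return 0 if the certificate expires within the given number of days,
# 1 if it stays valid beyond that window, 2 if the file cannot be read.
# Usage: cert_expires_within <days> <path to PEM certificate>
cert_expires_within() {
  if [ ! -r "$2" ]; then
    echo "cannot read certificate: $2" >&2
    return 2
  fi
  seconds=$(( $1 * 86400 ))
  # "openssl x509 -checkend N" exits 0 when the certificate is still
  # valid N seconds from now, and non-zero otherwise.
  if openssl x509 -checkend "$seconds" -noout -in "$2" >/dev/null; then
    return 1
  fi
  return 0
}
```

A call such as `cert_expires_within 30 ./tls.crt` (the path is a placeholder) succeeds when the certificate has fewer than 30 days left, which is a reasonable point to update the certificate and apply the force-roll annotation.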

Force roll

To force-roll a Confluent Platform cluster, apply the annotation to the cluster as below:

kubectl annotate <CR type> <CR name> -n <namespace> \
  cpc.platform.confluent.io/force-roll="true" \
  --overwrite

<CR type> is one of kafkacluster, zookeepercluster, controlcentercluster, ksqldcluster, schemaregistrycluster, connectcluster, or kafkarestproxycluster.
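
When a change affects several clusters at once, the same annotation can be applied in a loop. The following is a minimal sketch built only on the command above. The function name `force_roll` and the `<CR type>/<CR name>` argument convention are assumptions made for illustration.

```shell
# Apply the force-roll annotation to each "<CR type>/<CR name>" pair.
# Usage: force_roll <namespace> <CR type>/<CR name> [<CR type>/<CR name> ...]
force_roll() {
  namespace="$1"
  shift
  for target in "$@"; do
    cr_type="${target%%/*}"   # text before the first slash
    cr_name="${target#*/}"    # text after the first slash
    kubectl annotate "$cr_type" "$cr_name" -n "$namespace" \
      cpc.platform.confluent.io/force-roll="true" \
      --overwrite || return 1   # stop at the first failure
  done
}
```

For example, `force_roll confluent kafkacluster/kafka-dev connectcluster/connect-dev` would roll two clusters in the `confluent` namespace (all three names are placeholders). Stopping at the first failure keeps a partial rollout easy to spot.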

Manual reconcile

You can add the force-reconcile annotation to force the control loop to generate the desired state. This feature force-updates current internal resources managed by CFK Blueprints to align with the desired state. This annotation works for all the CFK Blueprints CRs.

kubectl annotate <CFK Blueprint CR type> <CFK Blueprint CR name> \
  cpc.platform.confluent.io/force-reconcile=true --overwrite

Block reconcile

You can add the block-reconcile annotation to block changes on the cluster. This is especially useful when debugging. Any changes to the cluster resource are ignored. The proper condition is added to the status of the cluster resource.

kubectl annotate <CFK Blueprint CR type> <CFK Blueprint CR name> \
  cpc.platform.confluent.io/block-reconcile=true --overwrite
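
During a debugging session it is common to set the block and later lift it again. The following is a minimal sketch of that pair. It assumes that removing the annotation re-enables reconciliation, which this topic does not state explicitly, and it adds a namespace argument for convenience. The function names are hypothetical.

```shell
# Block reconciliation for one CR.
# Usage: block_reconcile <CR type> <CR name> <namespace>
block_reconcile() {
  kubectl annotate "$1" "$2" -n "$3" \
    cpc.platform.confluent.io/block-reconcile=true --overwrite
}

# Remove the annotation again; a trailing "-" after the key tells
# kubectl annotate to delete that annotation.
# ASSUMPTION: deleting the annotation lifts the block.
# Usage: unblock_reconcile <CR type> <CR name> <namespace>
unblock_reconcile() {
  kubectl annotate "$1" "$2" -n "$3" \
    cpc.platform.confluent.io/block-reconcile-
}
```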

Known gaps

Because there is no dependency management across Confluent Platform components, some clusters can go into a crash loop for a transition period while the Confluent Platform components that they depend on are updating.
If automation doesn’t work for some reason, you can apply the force-roll annotation to trigger an update.
```
kubectl annotate <CR type> <CR name> -n <namespace> \
  cpc.platform.confluent.io/force-roll="true" \
  --overwrite
```
Certificate renewal for an auto-generated CA is not supported.
Provide your own CA, or let CFK Blueprints manage the CA validity period, which defaults to 5 years.