kubectl confluent kraft recover-region

Rebuild a surviving region’s KRaft quorum from a chosen seed controller.

To install the Confluent plugin, see Confluent plugin.

Synopsis

Step 2 of quorum-loss recovery.

Given the seed pod with the longest metadata log (which you choose from the ‘kraft log-length’ output), the command rebuilds the surviving region’s quorum:

  1. Run force-standalone on the seed (irreversible: rewrites the voter set to the seed alone, at a higher epoch).

  2. Release the seed from maintenance and wait for its pod to become Ready (it then leads the new single-voter quorum).

  3. For each non-seed controller: remove its stale __cluster_metadata-0 (a copy was already snapshotted in step 0), release it (it boots empty and fetches from the leader as an Observer), then run add-controller to promote it back to Voter.

  4. Clear maintenance mode.
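
The ordering above can be sketched as a small dry-run script. It only prints the plan and never contacts a cluster. The function name print_recovery_plan and the three-pod list are illustrative assumptions, not part of the plugin or its implementation.

```shell
#!/bin/sh
# Illustrative sketch: prints the documented order of operations.
# print_recovery_plan is a hypothetical helper, not a plugin command.
print_recovery_plan() {
  plan_seed="$1"
  shift
  echo "1. force-standalone on ${plan_seed} (irreversible)"
  echo "2. release ${plan_seed} from maintenance and wait for Ready"
  for plan_pod in "$@"; do
    # The seed keeps its metadata log; only non-seed controllers are reset.
    if [ "$plan_pod" = "$plan_seed" ]; then
      continue
    fi
    echo "3. ${plan_pod}: remove stale __cluster_metadata-0, release, add-controller"
  done
  echo "4. clear maintenance mode"
}

# First argument is the seed; the rest are all controller pods in the region.
print_recovery_plan kraftcontroller-1 kraftcontroller-0 kraftcontroller-1 kraftcontroller-2
```

Step 3 repeats once per non-seed controller, so a three-controller region yields two such lines.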

The seed selection is the one irreversible choice and is NOT made by the plugin. Both the cluster (--context) and namespace (-n/--namespace) are required and must be explicit, so that a forgotten kubectl context switch cannot silently fire this recovery at a healthy cluster.
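
One way to apply the same explicit-target discipline in your own runbooks is a thin wrapper that refuses to assemble the command unless the context, namespace, and seed pod are all given. This is a minimal sketch: recover_region_cmd is a hypothetical helper, and it prints the command for review rather than running it.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of the plugin): builds the command line
# only when context, namespace, and seed pod are all passed explicitly.
recover_region_cmd() {
  rr_ctx="$1"
  rr_ns="$2"
  rr_seed="$3"
  if [ -z "$rr_ctx" ] || [ -z "$rr_ns" ] || [ -z "$rr_seed" ]; then
    echo "usage: recover_region_cmd <context> <namespace> <seed-pod>" >&2
    return 1
  fi
  # Print instead of executing, so the operator can confirm the target first.
  echo "kubectl confluent kraft recover-region --context ${rr_ctx} -n ${rr_ns} --seed-pod ${rr_seed}"
}

recover_region_cmd gke_proj_region_rb2 rb2 kraftcontroller-1
```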

Example:

kubectl confluent kraft recover-region --context gke_proj_region_rb2 -n rb2 --seed-pod kraftcontroller-1

kubectl confluent kraft recover-region [flags]

Options

    --backup-dir string         in-pod dir to snapshot each controller's pre-recovery metadata into (one copy, taken before any change); default: <metadata-log-dir>/backup. For durable rollback, mount a separate volume on the kraft pods and point this at it
    --force-standalone-done     acknowledge that you have manually completed the force-standalone step on the seed after a previous run left it interrupted or failed; the plugin records it as done and resumes from the next step. Only meaningful when a prior run left force-standalone pending — ignored otherwise (force-standalone is never skipped on a fresh run)
-h, --help                      help for recover-region
    --metadata-log-dir string   metadata log dir (parent of __cluster_metadata-0); default: auto-detected from the seed pod's kafka.properties (log.dirs)
    --seed-pod string           controller pod with the longest log to seed the new quorum (required)
    --skip-backup               skip the pre-recovery in-pod metadata snapshot entirely — use when the data volume lacks space for a copy. NO in-pod rollback copy is taken, so only use this if you have a volume-level (PV) snapshot. On a resume this also skips a snapshot that failed on a previous run instead of re-attempting it
    --timeout duration          overall timeout for the recovery (default 15m0s)
    --yes                       proceed without the interactive confirmation prompt
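
The options above combine in a few common ways. The sketch below assembles three variants as strings and prints them for review. The context, namespace, and seed pod are taken from the example earlier on this page; the backup path /mnt/kraft-backup and the 30m timeout are assumed values, not defaults.

```shell
#!/bin/sh
# Dry run: prints candidate command lines, executes nothing.
base="kubectl confluent kraft recover-region --context gke_proj_region_rb2 -n rb2 --seed-pod kraftcontroller-1"

# Snapshot to a separately mounted volume (hypothetical path) and allow more time.
durable="$base --backup-dir /mnt/kraft-backup --timeout 30m"

# No in-pod snapshot: only when a volume-level (PV) snapshot already exists.
no_backup="$base --skip-backup"

# Resume after force-standalone was completed by hand following an interrupted run.
resume="$base --force-standalone-done"

printf '%s\n' "$durable" "$no_backup" "$resume"
```

Per the flag description, --force-standalone-done is ignored unless a prior run left force-standalone pending, so it belongs only in the resume case.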

Options inherited from parent commands

    --as string                      Username to impersonate for the operation. User could be a regular user or a service account in a namespace.
    --as-group stringArray           Group to impersonate for the operation, this flag can be repeated to specify multiple groups.
    --as-uid string                  UID to impersonate for the operation.
    --cache-dir string               Default cache directory (default "$HOME/.kube/cache")
    --certificate-authority string   Path to a cert file for the certificate authority
    --client-certificate string      Path to a client certificate file for TLS
    --client-key string              Path to a client key file for TLS
    --cluster string                 The name of the kubeconfig cluster to use
    --context string                 The name of the kubeconfig context to use
    --disable-compression            If true, opt-out of response compression for all requests to the server
    --insecure-skip-tls-verify       If true, the server's certificate will not be checked for validity. This will make your HTTPS connections insecure
    --kubeconfig string              Path to the kubeconfig file to use for CLI requests.
-n, --namespace string               If present, the namespace scope for this CLI request
    --request-timeout string         The length of time to wait before giving up on a single server request. Non-zero values should contain a corresponding time unit (e.g. 1s, 2m, 3h). A value of zero means don't timeout requests. (default "0")
-s, --server string                  The address and port of the Kubernetes API server
    --tls-server-name string         Server name to use for server certificate validation. If it is not provided, the hostname used to contact the server is used
    --token string                   Bearer token for authentication to the API server
    --user string                    The name of the kubeconfig user to use

SEE ALSO