Disaster Recovery using VolumeSnapshots with Confluent Manager for Apache Flink¶
An important part of any system is the ability to recover from a disaster. Typically, this is done by backing up and restoring data. This topic provides an overview of how to back up and restore Flink applications with Confluent Manager for Apache Flink® (CMF), based on your cloud provider.
To back up and restore CMF metadata, you use the Kubernetes Volume Snapshots feature. Google Cloud, AWS, and Azure all support Volume Snapshots.
A Volume Snapshot is a Kubernetes resource that captures the state of a persistent volume claim (PVC) at a specific point in time. For more about Volume Snapshots, see Kubernetes Volume Snapshots. To learn more about persistent volumes, see Kubernetes Persistent Volumes.
Creating a backup of CMF metadata and restoring from a backup varies by cloud provider. Following are prerequisites and details on how to configure backup and restore for each cloud provider.
Prerequisites¶
Note the following common terminology:
- Custom Resource Definition (CRD) - A component that extends the Kubernetes API.
- Container Storage Interface (CSI) - An industry standard that enables storage systems to integrate with container orchestration platforms like Kubernetes.
- Persistent volume claim (PVC) - A request for storage by a Kubernetes pod.
- Volume snapshot - A Kubernetes resource that captures the state of a persistent volume claim (PVC) at a specific point in time.
Following are prerequisites based on your cloud provider:
AWS¶
Complete the following steps to enable Volume Snapshots on Amazon Elastic Kubernetes Service (EKS):
Enable the EKS add-on for the Amazon Elastic Block Store (EBS) Container Storage Interface (CSI) driver.
Configure an IAM role for the driver's controller. Attach the `AmazonEBSCSIDriverPolicy` managed policy to grant the necessary permissions, such as `ec2:CreateSnapshot`.

Use the following commands to install the CSI snapshot controller components and custom resource definitions (CRDs):
```shell
# CRDs
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/master/client/config/crd/snapshot.storage.k8s.io_volumesnapshotclasses.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/master/client/config/crd/snapshot.storage.k8s.io_volumesnapshotcontents.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/master/client/config/crd/snapshot.storage.k8s.io_volumesnapshots.yaml

# Controllers
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/master/deploy/kubernetes/snapshot-controller/rbac-snapshot-controller.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/master/deploy/kubernetes/snapshot-controller/setup-snapshot-controller.yaml
```
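To confirm the installation, you can check that the CRDs are registered and the controller is running. The `snapshot-controller` deployment name and `kube-system` namespace below match the upstream manifests referenced above:

```shell
# Verify the snapshot CRDs are registered
kubectl get crd volumesnapshots.snapshot.storage.k8s.io \
  volumesnapshotclasses.snapshot.storage.k8s.io \
  volumesnapshotcontents.snapshot.storage.k8s.io

# Verify the snapshot controller deployment is available
kubectl get deployment snapshot-controller -n kube-system
```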
Create the storage class:
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-sc
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
```
Before installing CMF, ensure that the default storage class is set to `ebs-sc`:

```shell
# Disable gp2 as the default storage class
kubectl patch storageclass gp2 -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'

# Enable ebs-sc as the default storage class
kubectl patch storageclass ebs-sc -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
```
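To verify the change, list the storage classes; the `(default)` marker should now appear next to `ebs-sc`:

```shell
kubectl get storageclass
```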
Google Cloud¶
- The Google Cloud Persistent Disk CSI driver is a managed add-on and is enabled by default for Autopilot clusters. You may need to enable it for Standard clusters. For more information, see Using the Compute Engine persistent disk CSI Driver.
- You do not need to install CRDs for volume snapshots. They are pre-installed.
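A quick way to confirm this on your cluster is to check for the pre-installed CRDs and the registered driver:

```shell
# Confirm the snapshot CRDs are pre-installed
kubectl get crd volumesnapshots.snapshot.storage.k8s.io

# Confirm the Compute Engine Persistent Disk CSI driver is registered
kubectl get csidrivers pd.csi.storage.gke.io
```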
OpenShift on Azure¶
- OpenShift provides a snapshot controller operator (`csi-snapshot-controller-operator`), so you do not need to install the CRDs.
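You can confirm that the operator has installed the CRDs and that the snapshot controller is running; the namespace below is where recent OpenShift versions run the controller and may vary by version:

```shell
# Confirm the snapshot CRDs managed by the operator are present
oc get crd volumesnapshots.snapshot.storage.k8s.io

# Confirm the snapshot controller pods are running (namespace may vary by version)
oc get pods -n openshift-cluster-storage-operator
```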
Create the snapshots¶
Use the following steps to create a snapshot.
Install the `VolumeSnapshotClass`. Set `driver` to the CSI driver for your cloud provider: `ebs.csi.aws.com` for AWS, `pd.csi.storage.gke.io` for Google Cloud, or `disk.csi.azure.com` for Azure.

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: cmf-snapshot
driver: <ebs.csi.aws.com/pd.csi.storage.gke.io/disk.csi.azure.com>
deletionPolicy: Delete
```
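To apply and verify the class, a minimal sketch (the manifest file name below is a placeholder for wherever you saved the YAML above):

```shell
# Apply the VolumeSnapshotClass manifest (file name is illustrative)
kubectl apply -f volumesnapshotclass.yaml

# Confirm the class is registered
kubectl get volumesnapshotclass cmf-snapshot
```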
Create the snapshot:

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: first-snapshot
spec:
  volumeSnapshotClassName: cmf-snapshot
  source:
    # The specified PVC should point to the CMF PVC.
    persistentVolumeClaimName: confluent-manager-for-apache-flink-pvc
```
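Snapshot creation is asynchronous. Before relying on a snapshot for recovery, you can confirm that it is ready; a minimal check:

```shell
# The snapshot is safe to restore from once this prints "true"
kubectl get volumesnapshot first-snapshot -o jsonpath='{.status.readyToUse}'
```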
Restore from a volume snapshot¶
Follow these steps to restore CMF metadata from a volume snapshot.
Note
For Azure OpenShift, you can also restore from the OpenShift console by navigating to Storage → VolumeSnapshots. Locate the snapshot, click the ellipsis (…), and select the Restore as New PVC option. After the PVC is created on the OpenShift cluster, you can continue to the next step.
Create a `VolumeSnapshot` resource that references the `VolumeSnapshotClass` and the `VolumeSnapshotContent`. The snapshot is created in the `confluent` namespace to match the `volumeSnapshotRef` on the `VolumeSnapshotContent` in the next step:

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: restored-snapshot
  namespace: confluent
spec:
  volumeSnapshotClassName: cmf-snapshot
  source:
    volumeSnapshotContentName: restored-vsc
```
Create the `VolumeSnapshotContent` resource that references the snapshot handle:

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotContent
metadata:
  name: restored-vsc
spec:
  deletionPolicy: Retain
  driver: <ebs.csi.aws.com/pd.csi.storage.gke.io/disk.csi.azure.com>
  source:
    snapshotHandle: <snapshot-handle>
  volumeSnapshotClassName: cmf-snapshot
  volumeSnapshotRef:
    name: restored-snapshot
    namespace: confluent
```
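Once both resources exist, the `VolumeSnapshot` should bind to the `VolumeSnapshotContent` and report ready. A quick check:

```shell
# READYTOUSE should become true once the snapshot binds to its content
kubectl get volumesnapshot restored-snapshot -n confluent
kubectl get volumesnapshotcontent restored-vsc
```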
Note

The `snapshotHandle` value can be obtained from your cloud provider's UI, such as the AWS Console.
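Alternatively, if the original `VolumeSnapshotContent` object still exists on the source cluster, you can read the handle from it directly instead of using the console; a sketch:

```shell
# List provider-side snapshot IDs recorded on existing VolumeSnapshotContent objects
kubectl get volumesnapshotcontent -o custom-columns=NAME:.metadata.name,HANDLE:.status.snapshotHandle
```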
Create a restored PVC by using the restored volume snapshot:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-restore
  namespace: confluent
spec:
  storageClassName: ebs-sc
  dataSource:
    name: restored-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
```
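If your storage class uses `volumeBindingMode: WaitForFirstConsumer` (as the `ebs-sc` example above does), the PVC may remain `Pending` until a pod mounts it. You can watch its status with:

```shell
kubectl get pvc pvc-restore -n confluent
```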
Restore CMF.
To restore CMF, create a YAML file (for example, `restore-cmf.yaml`) and override the following Helm values to mount the restored PVC:
```yaml
# restore-cmf.yaml
...
mountedVolumes:
  volumes:
    - name: cmf-metadata
      persistentVolumeClaim:
        claimName: pvc-restore
  volumeMounts:
    - name: cmf-metadata
      mountPath: "/app/local"
...
```
Install CMF through Helm with `persistence.create` set to `false`:

```shell
helm upgrade --install cmf confluentinc/confluent-manager-for-apache-flink -n confluent -f restore-cmf.yaml --set persistence.create=false
```
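To confirm the restore, check that the CMF pods are running and that the restored PVC is in use; the `Used By` field in the PVC description shows the consuming pod:

```shell
# Confirm the CMF pods are running
kubectl get pods -n confluent

# Confirm the restored PVC is mounted (see the "Used By" field)
kubectl describe pvc pvc-restore -n confluent
```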