Troubleshoot Key Policy Issues for Self-Managed Encryption Keys

This guide helps you identify, diagnose, and resolve key policy issues that can affect your Confluent Cloud clusters using self-managed encryption keys (BYOK). Key policy misconfigurations can cause cluster creation failures, service disruptions, or access issues.

Common Key Policy Issues

The following table describes common key policy issues and their symptoms:

Issue | Symptoms | Typical causes
Cluster creation fails | Error message during cluster creation: “encryption key is not valid or not authorized” | Missing or incorrect Confluent permissions in key policy
Cluster stops responding | Cluster becomes unavailable, producers/consumers fail | Key policy was modified to remove Confluent access
Key rotation fails | Automatic key rotation doesn’t work, error messages in cloud provider logs | Insufficient permissions for key management operations
Access denied errors | “AccessDenied” or “UnauthorizedOperation” errors in cluster logs | Expired credentials, incorrect ARNs, or revoked permissions

Diagnose Key Policy Issues

Follow these steps to diagnose key policy issues:

Step 1: Check cluster status

  1. Log in to the Confluent Cloud Console at https://confluent.cloud/.
  2. Navigate to your cluster and check the cluster status and health indicators.
  3. Look for any error messages or warnings related to encryption or key access.

Step 2: Review cloud provider logs

Check your cloud provider’s logs for key-related errors. On AWS, use AWS CloudTrail to review KMS API events from the last hour:

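# Timestamps are generated in UTC. The -v-1H flag is BSD/macOS date syntax;
# with GNU date (Linux), use: date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=EventSource,AttributeValue=kms.amazonaws.com \
  --start-time "$(date -u -v-1H +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)"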

Look for AccessDenied, InvalidKeyUsage, or KMSAccessDenied errors. For details, see Logging AWS KMS API calls with AWS CloudTrail.
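
The lookup output can be long, so filtering it for the error codes above may help. The following is a minimal sketch, assuming the output of the command above was saved as JSON. The function name find_kms_errors is hypothetical, and the substring match on errorCode is an assumption made so that variants such as AccessDeniedException are also caught.

```python
import json

# Error codes named in this guide; extend the set as needed.
ERROR_CODES = {"AccessDenied", "InvalidKeyUsage", "KMSAccessDenied"}


def find_kms_errors(lookup_output, error_codes=ERROR_CODES):
    """Return (eventTime, eventName, errorCode) tuples for matching events.

    lookup_output is the parsed JSON printed by `aws cloudtrail lookup-events`.
    """
    hits = []
    for event in lookup_output.get("Events", []):
        # CloudTrailEvent holds the full event record as a JSON string.
        detail = json.loads(event.get("CloudTrailEvent", "{}"))
        code = detail.get("errorCode", "")
        # Substring match (an assumption) so suffixed variants also match.
        if any(wanted in code for wanted in error_codes):
            hits.append((detail.get("eventTime"), detail.get("eventName"), code))
    return hits
```

To use it, redirect the CLI output to a file, load the file with json.load, and pass the result to find_kms_errors. Events without an errorCode field are treated as successful calls and skipped.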

Step 3: Validate key policy configuration

Use the validation checklist in Best Practices for Using Self-Managed Encryption Keys to verify your key policy configuration is correct for your cloud provider.

Resolve Key Policy Issues

Fix missing or incorrect permissions

If cluster creation fails, or a running cluster stops working, because of missing permissions, review and update your KMS key policy so that it includes the required Confluent permissions. You can view the current policy using the AWS CLI:

aws kms get-key-policy \
  --key-id <your-key-id> \
  --policy-name default

Ensure both required permission statements are present in the Statement array of the key policy:

{
  "Sid": "Allow Confluent account(s) to use the key",
  "Effect": "Allow",
  "Principal": {
    "AWS": [
      "arn:aws:iam::<confluent-account-id>:role/<confluent-role>"
    ]
  },
  "Action": [
    "kms:Encrypt",
    "kms:Decrypt",
    "kms:ReEncrypt*",
    "kms:GenerateDataKey*",
    "kms:DescribeKey"
  ],
  "Resource": "*"
},
{
  "Sid": "Allow Confluent account(s) to attach persistent resources",
  "Effect": "Allow",
  "Principal": {
    "AWS": [
      "arn:aws:iam::<confluent-account-id>:role/<confluent-role>"
    ]
  },
  "Action": [
    "kms:CreateGrant",
    "kms:ListGrants",
    "kms:RevokeGrant"
  ],
  "Resource": "*"
}
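
Comparing the two statements against a live policy by eye is error-prone. The following is a minimal sketch of an automated check, assuming the policy document has been saved locally and parsed as JSON. The function name missing_actions and the example ARN in the usage note are hypothetical. The check compares action strings literally: it does not expand wildcards such as kms:* and does not evaluate Condition blocks or Deny statements.

```python
# Actions from the two statements shown above.
USE_ACTIONS = {
    "kms:Encrypt", "kms:Decrypt", "kms:ReEncrypt*",
    "kms:GenerateDataKey*", "kms:DescribeKey",
}
GRANT_ACTIONS = {"kms:CreateGrant", "kms:ListGrants", "kms:RevokeGrant"}


def _as_list(value):
    """Policy JSON allows a single string where a list is expected."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def missing_actions(policy, principal_arn):
    """Return the required actions that no Allow statement grants to principal_arn."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    allowed = set()
    for stmt in statements:
        principal = stmt.get("Principal", {})
        # A bare "*" principal is not a dict; it is ignored by this sketch.
        arns = _as_list(principal.get("AWS")) if isinstance(principal, dict) else []
        if stmt.get("Effect") == "Allow" and principal_arn in arns:
            allowed.update(_as_list(stmt.get("Action")))
    return sorted((USE_ACTIONS | GRANT_ACTIONS) - allowed)
```

For example, missing_actions(policy, "arn:aws:iam::111122223333:role/example-role") returns an empty list when both statements are present for that role, and the list of absent actions otherwise.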

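If statements are missing or ARNs are incorrect, update your key policy using the AWS CLI. The put-key-policy command replaces the entire policy, so updated-policy.json must contain the complete policy document, including any existing statements you want to keep: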

aws kms put-key-policy \
  --key-id <your-key-id> \
  --policy-name default \
  --policy file://updated-policy.json

For detailed steps on modifying key policies, see Changing a key policy in the AWS documentation.
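
Because the whole policy is replaced on update, one way to build the new policy file is to merge the required statements into the current policy rather than write it from scratch. The following is a minimal sketch under that assumption. The function name merge_statements is hypothetical, and matching statements by Sid assumes the Sid values in your policy are the ones shown earlier in this section.

```python
import copy


def merge_statements(current_policy, required_statements):
    """Return a copy of current_policy with the required statements added.

    An existing statement with the same Sid as a required one is replaced;
    every other statement (for example, key administrator access) is kept.
    """
    merged = copy.deepcopy(current_policy)
    statements = merged.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    required_sids = {stmt["Sid"] for stmt in required_statements}
    kept = [stmt for stmt in statements if stmt.get("Sid") not in required_sids]
    merged["Statement"] = kept + copy.deepcopy(required_statements)
    merged.setdefault("Version", "2012-10-17")
    return merged
```

Write the result to a file with json.dump and review the difference against the current policy before applying it. The input policy is not modified, so it can still serve as a rollback copy.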

Recover from key policy lockout

Warning: Key policy lockouts can cause immediate cluster unavailability. Act quickly to restore access and minimize service disruption.

If you accidentally removed Confluent’s access and your cluster is no longer working:

  1. Immediately restore the previous working key policy from your backup.
  2. If you don’t have a backup, recreate the required permissions using the examples in the previous section.
  3. Monitor cluster health after restoring permissions. It may take several minutes for the cluster to recover.
  4. Contact Confluent Support if the cluster doesn’t recover within 30 minutes of restoring permissions.
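
Step 1 depends on a backup being available. The following is a minimal sketch for saving one before each policy change, assuming the policy text comes from aws kms get-key-policy run with --query Policy --output text. The function name save_policy_backup and the file naming scheme are hypothetical.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def save_policy_backup(policy_text, key_id, directory="."):
    """Write policy_text to <directory>/<key_id>-<UTC timestamp>.json and return the path."""
    json.loads(policy_text)  # Fail early if the text is not valid JSON.
    # Key ARNs contain ':' and '/', which are unsafe in file names.
    safe_id = key_id.replace(":", "_").replace("/", "_")
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    path = Path(directory) / f"{safe_id}-{stamp}.json"
    path.write_text(policy_text)
    return path
```

A file saved this way can be passed back to put-key-policy with --policy file://<path> to restore the earlier policy.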

Fix expired or rotated credentials

If Confluent’s access credentials have expired or been rotated:

  1. Check if the IAM role referenced in your key policy still exists.
  2. Verify the role’s trust policy allows Confluent to assume it.
  3. If the role was deleted, contact Confluent Support for the current role information and update your key policy.
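
One sign that a referenced role no longer exists is visible in the key policy itself: when a role named in a resource-based policy is deleted, AWS shows its unique ID (prefixed with AROA for roles, AIDA for users) in place of the ARN. The following is a minimal sketch that flags such entries in a saved policy. The function name orphaned_principals is hypothetical, and a match is only a hint to investigate, not proof that the role was deleted.

```python
import re

# IAM unique IDs: AROA prefix for roles, AIDA prefix for users.
_UNIQUE_ID = re.compile(r"^(AROA|AIDA)[A-Z0-9]+$")


def _as_list(value):
    """Policy JSON allows a single string where a list is expected."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def orphaned_principals(policy):
    """Return (Sid, principal) pairs where the AWS principal is a unique ID, not an ARN."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    found = []
    for stmt in statements:
        principal = stmt.get("Principal", {})
        if not isinstance(principal, dict):
            continue  # A bare "*" principal has no AWS entries to inspect.
        for entry in _as_list(principal.get("AWS")):
            if _UNIQUE_ID.match(entry):
                found.append((stmt.get("Sid"), entry))
    return found
```

If the function returns any entries for the Confluent statements, replace them with the current role ARN obtained from Confluent Support, as described in step 3.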

Prevent Future Key Policy Issues

To prevent key policy issues in the future:

  • Implement change management processes for key policy modifications.
  • Use infrastructure as code (Terraform, CloudFormation, ARM templates) to manage key policies consistently.
  • Set up monitoring and alerting for key access failures.
  • Regularly review and audit key policies and permissions.
  • Test disaster recovery procedures in non-production environments.
  • Keep documentation updated with current Confluent account and role information.

Get Additional Help

If you continue to experience key policy issues:

  1. Collect diagnostic information:
    • Cluster ID and environment details
    • Error messages from the Confluent Cloud Console
    • Cloud provider error logs
    • Current key policy configuration
  2. Contact Confluent Support with the diagnostic information.
  3. For urgent production issues, use the emergency contact procedures provided in your support agreement.
