Troubleshoot Key Policy Issues for Self-Managed Encryption Keys
This guide helps you identify, diagnose, and resolve key policy issues that can affect your Confluent Cloud clusters using self-managed encryption keys (BYOK). Key policy misconfigurations can cause cluster creation failures, service disruptions, or access issues.
Common Key Policy Issues
The following table describes common key policy issues and their symptoms:
Issue | Symptoms | Typical Causes |
---|---|---|
Cluster creation fails | Error message during cluster creation: “encryption key is not valid or not authorized” | Missing or incorrect Confluent permissions in key policy |
Cluster stops responding | Cluster becomes unavailable, producers/consumers fail | Key policy was modified to remove Confluent access |
Key rotation fails | Automatic key rotation doesn’t work, error messages in cloud provider logs | Insufficient permissions for key management operations |
Access denied errors | “AccessDenied” or “UnauthorizedOperation” errors in cluster logs | Expired credentials, incorrect ARNs, or revoked permissions |
Diagnose Key Policy Issues
Follow these steps to diagnose key policy issues:
Step 1: Check cluster status
- Log in to the Confluent Cloud Console at https://confluent.cloud/.
- Navigate to your cluster and check the cluster status and health indicators.
- Look for any error messages or warnings related to encryption or key access.
Step 2: Review cloud provider logs
Check your cloud provider’s logs for key-related errors:
Use AWS CloudTrail to review KMS API events from the last hour. The two commands differ only in how the start time is computed:
# For macOS/BSD (BSD date):
aws cloudtrail lookup-events \
--lookup-attributes AttributeKey=EventSource,AttributeValue=kms.amazonaws.com \
--start-time "$(date -v-1H +%Y-%m-%dT%H:%M:%S)" \
--end-time "$(date +%Y-%m-%dT%H:%M:%S)"
# For Linux (GNU date):
aws cloudtrail lookup-events \
--lookup-attributes AttributeKey=EventSource,AttributeValue=kms.amazonaws.com \
--start-time "$(date -d '1 hour ago' +%Y-%m-%dT%H:%M:%S)" \
--end-time "$(date +%Y-%m-%dT%H:%M:%S)"
Look for AccessDenied, InvalidKeyUsage, or KMSAccessDenied errors.
For details, see Logging AWS KMS API calls with AWS CloudTrail.
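To summarize the lookup output instead of scanning it by eye, the error codes listed above can be tallied. The following is a minimal sketch and not part of the AWS or Confluent tooling: `count_kms_denials` is a hypothetical helper name, and it does plain text matching on whatever is piped into it, so treat the counts as a hint about where to look.

```shell
# Hypothetical helper: tally the error codes named above in text read from stdin.
count_kms_denials() {
  # -o prints each match on its own line; sort and uniq -c then count them,
  # and awk reshapes "count code" into "code=count".
  grep -oE 'KMSAccessDenied|AccessDenied|InvalidKeyUsage' |
    sort | uniq -c | awk '{ print $2 "=" $1 }'
}
```

One way to use it is to append `--query 'Events[].CloudTrailEvent' --output text | count_kms_denials` to either `lookup-events` command above.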
Use Azure Activity Log to review Key Vault access attempts:
# Linux (GNU date). On macOS/BSD, use "$(date -u -v-1H +%Y-%m-%dT%H:%M:%SZ)" as the --start-time value.
az monitor activity-log list \
--resource-group <your-resource-group> \
--start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
--query "[?contains(resourceId, 'Microsoft.KeyVault')]"
Look for Forbidden, Unauthorized, or KeyVaultAccessDenied errors.
For detailed monitoring guidance, see Monitoring and alerting for Azure Key Vault.
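The log queries in this step need a "one hour ago" timestamp, and GNU and BSD `date` spell that differently. A small wrapper can hide the difference. This is a sketch under the assumption that one of the two `date` variants is installed; `one_hour_ago_utc` is a hypothetical name.

```shell
# Hypothetical helper: print the UTC time one hour ago in ISO 8601 format.
one_hour_ago_utc() {
  # GNU date accepts -d with a relative time. If that fails (for example on
  # macOS), fall back to the BSD -v adjustment flag.
  date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ 2>/dev/null ||
    date -u -v-1H +%Y-%m-%dT%H:%M:%SZ
}
```

With the helper defined, `--start-time "$(one_hour_ago_utc)"` should work unchanged on both Linux and macOS.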
Use Google Cloud Logging to review KMS access attempts:
gcloud logging read \
"logName=projects/<project>/logs/cloudaudit.googleapis.com%2Factivity AND \
protoPayload.serviceName=cloudkms.googleapis.com AND \
protoPayload.resourceName=projects/<project>/locations/<location>/keyRings/<keyring>/cryptoKeys/<key>" \
--freshness=1h
Look for PERMISSION_DENIED, INVALID_ARGUMENT, or FAILED_PRECONDITION errors.
For detailed logging guidance, see Viewing Cloud KMS logs.
Step 3: Validate key policy configuration
Use the validation checklist in Best Practices for Using Self-Managed Encryption Keys to verify your key policy configuration is correct for your cloud provider.
Resolve Key Policy Issues
Fix missing or incorrect permissions
If cluster creation fails, or a running cluster stops working, because of missing permissions:
Review and update your KMS key policy to include the required Confluent permissions. You can view the current policy using the AWS CLI:
aws kms get-key-policy \
--key-id <your-key-id> \
--policy-name default
Ensure both required permission statements are present:
{
  "Sid": "Allow Confluent account(s) to use the key",
  "Effect": "Allow",
  "Principal": {
    "AWS": [
      "arn:aws:iam::<confluent-account-id>:role/<confluent-role>"
    ]
  },
  "Action": [
    "kms:Encrypt",
    "kms:Decrypt",
    "kms:ReEncrypt*",
    "kms:GenerateDataKey*",
    "kms:DescribeKey"
  ],
  "Resource": "*"
},
{
  "Sid": "Allow Confluent account(s) to attach persistent resources",
  "Effect": "Allow",
  "Principal": {
    "AWS": [
      "arn:aws:iam::<confluent-account-id>:role/<confluent-role>"
    ]
  },
  "Action": [
    "kms:CreateGrant",
    "kms:ListGrants",
    "kms:RevokeGrant"
  ],
  "Resource": "*"
}
If statements are missing or ARNs are incorrect, update your key policy using the AWS CLI:
aws kms put-key-policy \
--key-id <your-key-id> \
--policy-name default \
--policy file://updated-policy.json
For detailed steps on modifying key policies, see Changing a key policy in the AWS documentation.
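Before uploading an edited policy, it can help to confirm that the file still names every action from the two statements above. The following is a rough sketch: `missing_kms_actions` is a hypothetical helper, and it only checks that each action string appears somewhere in the file. It does not verify the `Principal`, `Effect`, or which statement an action belongs to.

```shell
# Hypothetical helper: print each required KMS action that does not appear
# in the given policy file. The action list is copied from the statements above.
missing_kms_actions() {
  policy_file="$1"
  for action in \
    "kms:Encrypt" "kms:Decrypt" "kms:ReEncrypt*" "kms:GenerateDataKey*" \
    "kms:DescribeKey" "kms:CreateGrant" "kms:ListGrants" "kms:RevokeGrant"
  do
    # -F treats the action as a literal string, so "*" is not a pattern.
    if ! grep -qF "\"$action\"" "$policy_file"; then
      echo "$action"
    fi
  done
}
```

Running `missing_kms_actions updated-policy.json` prints nothing when all eight actions are present.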
Verify the Confluent service principal has the required role assignments using the Azure CLI:
az role assignment list \
--assignee <confluent-service-principal-id> \
--scope <key-vault-resource-id>
The service principal should have these roles:
- Key Vault Crypto Service Encryption User
- Key Vault Reader
If a role is missing, assign it. Run the command once for each missing role, changing the --role value as needed:
az role assignment create \
--assignee <confluent-service-principal-id> \
--role "Key Vault Crypto Service Encryption User" \
--scope <key-vault-resource-id>
For detailed steps on managing Key Vault access, see Provide access to Key Vault keys, certificates, and secrets with Azure role-based access control.
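The role check above can be scripted by comparing the assigned role names against the two required roles. This is a minimal sketch: `missing_key_vault_roles` is a hypothetical helper that reads one role name per line from stdin and prints whichever required role is absent.

```shell
# Hypothetical helper: read assigned role names (one per line) from stdin and
# print each required Key Vault role that is not among them.
missing_key_vault_roles() {
  assigned="$(cat)"
  for role in "Key Vault Crypto Service Encryption User" "Key Vault Reader"; do
    # -x requires a whole-line match; -F treats the role name literally.
    printf '%s\n' "$assigned" | grep -qxF "$role" || echo "$role"
  done
}
```

A possible way to feed it is to add `--query "[].roleDefinitionName" --output tsv` to the `az role assignment list` command and pipe the result into `missing_key_vault_roles`.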
Verify the Confluent Google Group has the required permissions using the Google Cloud CLI:
gcloud kms keys get-iam-policy <key-name> \
--location=<location> \
--keyring=<keyring>
If the Confluent Google Group ID is missing, add it with the custom role:
gcloud kms keys add-iam-policy-binding <key-name> \
--location=<location> \
--keyring=<keyring> \
--member="group:<confluent-google-group-id>" \
--role="projects/<project>/roles/<custom-role-name>"
The custom role should include:
- cloudkms.cryptoKeyVersions.useToDecrypt
- cloudkms.cryptoKeyVersions.useToEncrypt
- cloudkms.cryptoKeys.get
For detailed steps on managing KMS permissions, see Using IAM with Cloud KMS.
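If the custom role itself no longer exists, it can be recreated with the three permissions listed above. The following is a sketch only: `create_confluent_kms_role` is a hypothetical wrapper, the role title is a placeholder, and the project and role ID are whatever your original setup used.

```shell
# Hypothetical wrapper: create a project-level custom role that carries the
# three Cloud KMS permissions listed above.
create_confluent_kms_role() {
  project="$1"   # Google Cloud project that owns the key
  role_id="$2"   # custom role ID, for example the <custom-role-name> used above
  gcloud iam roles create "$role_id" \
    --project="$project" \
    --title="Confluent BYOK key access" \
    --permissions="cloudkms.cryptoKeyVersions.useToDecrypt,cloudkms.cryptoKeyVersions.useToEncrypt,cloudkms.cryptoKeys.get"
}
```

After the role exists, the `add-iam-policy-binding` command above can bind it to the Confluent Google Group.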
Recover from key policy lockout
If you accidentally removed Confluent’s access and your cluster is no longer working:
Warning
Key policy lockouts can cause immediate cluster unavailability. Act quickly to restore access and minimize service disruption.
- Immediately restore the previous working key policy from your backup.
- If you don’t have a backup, recreate the required permissions using the examples in the previous section.
- Monitor cluster health after restoring permissions. It may take several minutes for the cluster to recover.
- Contact Confluent Support if the cluster doesn’t recover within 30 minutes of restoring permissions.
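Restoring from a backup presumes that one was taken before the policy was changed. For AWS, a backup and restore pair might look like the following sketch. The function names and the default file name are hypothetical; the `get-key-policy` and `put-key-policy` calls are the same ones shown in the previous section.

```shell
# Hypothetical helper: save the current key policy document to a local file.
backup_key_policy() {
  key_id="$1"
  out_file="${2:-key-policy-backup.json}"
  # --query Policy --output text writes only the policy JSON document.
  aws kms get-key-policy \
    --key-id "$key_id" \
    --policy-name default \
    --query Policy \
    --output text > "$out_file" || return 1
  # Fail if the backup file is empty, so a bad backup is noticed immediately.
  [ -s "$out_file" ]
}

# Hypothetical helper: put a previously saved policy back on the key.
restore_key_policy() {
  key_id="$1"
  policy_file="$2"
  aws kms put-key-policy \
    --key-id "$key_id" \
    --policy-name default \
    --policy "file://$policy_file"
}
```

Running `backup_key_policy <your-key-id>` before every policy change keeps a known-good copy available for `restore_key_policy`.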
Fix expired or rotated credentials
If Confluent’s access credentials have expired or been rotated, follow the steps for your cloud provider.
For AWS:
- Check whether the IAM role referenced in your key policy still exists.
- Verify that the role’s trust policy allows Confluent to assume it.
- If the role was deleted, contact Confluent Support for the current role information and update your key policy.
For Azure:
- Verify that the Confluent service principal still exists in your Azure AD.
- Check whether the service principal’s credentials have expired.
- If they have expired, contact Confluent Support for updated service principal information.
For Google Cloud:
- Verify that the Google Group ID is still valid and active.
- Check whether the group membership or permissions have changed.
- If the group ID has changed, contact Confluent Support for the updated information.
Prevent Future Key Policy Issues
To prevent key policy issues in the future:
- Implement change management processes for key policy modifications.
- Use infrastructure as code (Terraform, CloudFormation, ARM templates) to manage key policies consistently.
- Set up monitoring and alerting for key access failures.
- Regularly review and audit key policies and permissions.
- Test disaster recovery procedures in non-production environments.
- Keep documentation updated with current Confluent account and role information.
Get Additional Help
If you continue to experience key policy issues:
- Collect diagnostic information:
  - Cluster ID and environment details
  - Error messages from the Confluent Cloud Console
  - Cloud provider error logs
  - Current key policy configuration
- Contact Confluent Support with the diagnostic information.
- For urgent production issues, use the emergency contact procedures provided in your support agreement.
Related documentation: