Troubleshoot SSO for Control Center on Confluent Platform¶
If you are having trouble with SSO using OIDC, review the following common issues encountered during configuration and possible solutions.
If you are still having trouble after reviewing the issues and solutions, contact Confluent Support.
Common issues¶
Click the following links to jump to sections that might be relevant to you:
- Misconfiguration of identity provider endpoints in Confluent Platform
- Misconfiguration of client credentials in Confluent Platform
- Misconfiguration of redirect callback URLs (to Confluent Platform) in the identity provider
- Misconfiguration of claims
- Session management problems
- Connectivity issues between the identity provider and Confluent Platform
- Authorization failures
Misconfiguration of identity provider endpoints in Confluent Platform¶
The following are common misconfigurations of the identity provider endpoints configured in the Confluent Platform cluster: Authorize, Token, Issuer, and JWKS.
Authorize endpoint misconfiguration¶
Problem¶
Confluent Platform is unable to send an authorization request to the identity provider because the authorize URI is misconfigured.
Solution¶
Fix the authorize endpoint configuration:
- Ansible Playbooks for Confluent Platform: sso_authorize_uri
- Confluent for Kubernetes: authorizeBaseEndpointUri
Additional details¶
- Use OpenID Connect metadata discovery URIs to verify the correct endpoints for your identity provider. To get details on the endpoints for your identity provider, check their documentation. See 1 - Establish trust between the IdP and Confluent Platform for more information about endpoints for Okta, Microsoft Entra ID (Azure Active Directory), and Keycloak.
- The error is visible on the web browser during authentication and also is output to the MDS log files for your Confluent Platform cluster. By default, the log location is /tmp/kafka-logs, but you can modify the location using the log.dirs value in your broker configuration files or using the Java system property while starting the Confluent Platform cluster with -Dkafka.logs.dir.
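One way to apply the discovery check described above is to load the identity provider's OpenID Connect discovery document and compare it with the values configured in Confluent Platform. The following Python sketch assumes the provider publishes standard metadata at `<issuer>/.well-known/openid-configuration`. The helper names and the keys of the `configured` dictionary are hypothetical and are not part of any Confluent tooling.

```python
import json
import urllib.request

# Standard OIDC discovery metadata fields for the four endpoints discussed here.
# The keys on the left are hypothetical labels chosen for this sketch.
DISCOVERY_FIELDS = {
    "authorize": "authorization_endpoint",
    "token": "token_endpoint",
    "issuer": "issuer",
    "jwks": "jwks_uri",
}


def fetch_discovery(issuer_url, timeout=10):
    """Download the discovery document published under the issuer URL."""
    url = issuer_url.rstrip("/") + "/.well-known/openid-configuration"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def find_mismatches(discovery, configured):
    """Return {label: (configured_value, advertised_value)} for every difference."""
    mismatches = {}
    for label, field in DISCOVERY_FIELDS.items():
        advertised = discovery.get(field)
        if configured.get(label) != advertised:
            mismatches[label] = (configured.get(label), advertised)
    return mismatches
```

For example, `find_mismatches(fetch_discovery(issuer_url), configured)` returns an empty dictionary when all four configured values agree with what the provider advertises. Some providers publish their metadata at a different path, so check the provider documentation if the download fails.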
Token endpoint misconfiguration¶
Problem¶
Confluent Platform is unable to request a token from the identity provider because the token endpoint is misconfigured.
Here’s an example of the error message:
/* Example error when token endpoint is misconfigured */
{
"status_code": 500,
"message": "java.lang.RuntimeException: Got bad request status from IdP: {\"error\":\"invalid_request\",\"error_description\":\"AADSTS900023: Specified tenant identifier '0893715bxxx-959b-4906-a185-2789e1ead045' is neither a valid DNS name, nor a valid external domain.\\r\\nTrace ID: 907463ce-d595-4dcc-a89b-4098d5d61600\\r\\nCorrelation ID: f8c6cf9c-938d-411a-9ccc-de6d69114e03\\r\\nTimestamp: 2023-06-20 09:31:58Z\",\"error_codes\":[900023],\"timestamp\":\"2023-06-20 09:31:58Z\",\"trace_id\":\"907463ce-d595-4dcc-a89b-4098d5d61600\",\"correlation_id\":\"f8c6cf9c-938d-411a-9ccc-de6d69114e03\",\"error_uri\":\"https://login.microsoftonline.com/error?code=900023\"}"
}
Solution¶
Fix the token endpoint configuration:
- Ansible Playbooks for Confluent Platform: sso_token_uri
- Confluent for Kubernetes: tokenBaseEndpointUri
Additional details¶
- Use OpenID Connect metadata discovery URIs to verify the correct endpoints for your identity provider. To get details on the endpoints for your identity provider, check their documentation. See 1 - Establish trust between the IdP and Confluent Platform for more information about endpoints for Okta, Microsoft Entra ID (Azure Active Directory), and Keycloak.
- The error is visible on the web browser during authentication and also is output to the MDS log files for your Confluent Platform cluster. By default, the log location is /tmp/kafka-logs, but you can modify the location using the log.dirs value in your broker configuration files or using the Java system property while starting the Confluent Platform cluster with -Dkafka.logs.dir.
Issuer endpoint misconfiguration¶
Problem¶
Confluent Platform is unable to verify the token authenticity because the token issuer endpoint is incorrectly configured.
Here’s an example of the error message:
/* Example error when issuer is misconfigured */
{
"status_code": 403,
"message": "org.jose4j.jwt.consumer.InvalidJwtException: JWT (claims->{\"aud\":\"429a995a-de64-469a-b11d-69ca4344fdc2\",\"iss\":\"https://login.microsoftonline.com/0893715b-959b-4906-a185-2789e1ead045/v2.0\",\"iat\":1687245493,\"nbf\":1687245493,\"exp\":1687249393,\"groups\":[\"99b49608-15fc-48f8-a7c7-d1d4d7ff03de\",\"e8c31aa4-be6f-4d92-b888-7d595dc3f42e\"],\"rh\":\"0.ARsAW3GTCJuVBkmhhSeJ4erQRUozdOYN3ihLqe2mKChc58QbAHs.\",\"sub\":\"uW5lpf2zSoJ9K6O6hruUnx4LulcNUGoKR_viwsw010w\",\"tid\":\"0893715b-959b-4906-a185-2789e1ead045\",\"uti\":\"TZeBoX7yCE2LiIFqZyYVAA\",\"ver\":\"2.0\",\"wids\":[\"b79fbf4d-3ef9-4689-8143-76b194e85509\"]}) rejected due to invalid claims or other invalid content. Additional details: [[12] Issuer (iss) claim value (https://login.microsoftonline.com/0893715b-959b-4906-a185-2789e1ead045/v2.0) doesn't match expected value of https://login.microsoftonline.com/]",
"type": "CLIENT_ERROR"
}
Solution¶
Fix the issuer configuration:
- Ansible Playbooks for Confluent Platform: sso_issuer_url
- Confluent for Kubernetes: issuer
Additional details¶
- Use OpenID Connect metadata discovery URIs to verify the correct endpoints for your identity provider. To get details on the endpoints for your identity provider, check their documentation. See 1 - Establish trust between the IdP and Confluent Platform for more information about endpoints for Okta, Microsoft Entra ID (Azure Active Directory), and Keycloak.
- The error is visible on the web browser during authentication and also is output to the MDS log files for your Confluent Platform cluster. By default, the log location is /tmp/kafka-logs, but you can modify the location using the log.dirs value in your broker configuration files or using the Java system property while starting the Confluent Platform cluster with -Dkafka.logs.dir.
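The example error above shows that the `iss` claim is compared with the configured issuer as an exact string, so a missing tenant segment or version suffix is enough to cause a rejection. As a minimal sketch for inspecting a captured token, the following Python helpers (hypothetical names) decode the JWT payload and perform the same kind of exact comparison. The decoding does not verify the signature, so use it only for troubleshooting.

```python
import base64
import json


def decode_jwt_claims(token):
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))


def issuer_matches(token, configured_issuer):
    """True only when the token's iss claim equals the configured issuer exactly."""
    return decode_jwt_claims(token).get("iss") == configured_issuer
```

If `issuer_matches` returns `False`, print `decode_jwt_claims(token)["iss"]` and copy that exact value into the issuer setting.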
JWKS endpoint misconfiguration¶
Problem¶
Confluent Platform is unable to verify the authenticity of the token because the JWKS URI is misconfigured, or because the keys used to verify the token are expired or have not been updated in the identity provider.
Here’s an example of the error message:
/* Example error when JWKS uri does not contain keys required to verify JWT */
{
"status_code": 403,
"message": "org.jose4j.jwt.consumer.InvalidJwtException: JWT processing failed. Additional details: [[17] Unable to process JOSE object (cause: org.jose4j.lang.UnresolvableKeyException: Unable to find a suitable verification key for JWS w/ header ........",
"type": "CLIENT_ERROR"
}
Solution¶
Fix the JWKS configuration:
- Ansible Playbooks for Confluent Platform: sso_jwks_uri
- Confluent for Kubernetes: jwksEndpointUri
Also, check that the signing keys are up to date and not expired.
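A quick way to see whether the configured JWKS URI serves the key that signed a token is to compare the `kid` value in the token header with the `kid` values in the JWKS document. The Python sketch below assumes the token header carries a `kid` and that the JWKS document has already been downloaded and parsed into a dictionary. The function names are hypothetical, and a matching `kid` does not prove that the signature is valid.

```python
import base64
import json


def jwt_key_id(token):
    """Read the kid value from the JWT header segment (no signature check)."""
    header = token.split(".")[0]
    header += "=" * (-len(header) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(header)).get("kid")


def jwks_has_key(jwks, token):
    """True when the JWKS document lists a key whose kid matches the token's."""
    kid = jwt_key_id(token)
    return kid is not None and any(
        key.get("kid") == kid for key in jwks.get("keys", [])
    )
```

If `jwks_has_key` returns `False`, the JWKS URI probably points at the wrong key set or the identity provider has rotated its signing keys.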
Misconfiguration of client credentials in Confluent Platform¶
The following are common misconfigurations of the client credentials configured in the Confluent Platform cluster: Client ID and Client Secret.
Client ID misconfiguration¶
Problem¶
You see a 400 Bad Request error during authentication. The identity provider is unable to recognize the client application for Confluent Platform because the client ID is misconfigured in the request.
Here’s an example of the error message:
In the browser window, you see a 400 Bad Request error and an error message like this:
"Your request resulted in an error."
Solution¶
Fix the client ID value. For example, in Ansible Playbooks for Confluent Platform, fix the value of sso_client_id in the inventory file.
Additional details¶
The error is visible on the web browser during authentication and also is output to the MDS log files for your Confluent Platform cluster. By default, the log location is /tmp/kafka-logs, but you can modify the location using the log.dirs value in your broker configuration files or using the Java system property while starting the Confluent Platform cluster with -Dkafka.logs.dir.
Client secret misconfiguration¶
Problem¶
The identity provider is unable to authenticate the client because the configured client secret is incorrect.
Here’s an example of an error message when the client secret is wrong for Microsoft Entra ID (Azure Active Directory):
java.lang.RuntimeException: Failed to retrieve tokens from IDP with status: 401. Error: {"error":"invalid_client","error_description":"AADSTS7000215: Invalid client secret provided. Ensure the secret being sent in the request is the client secret value, not the client secret ID, for a secret added to app '429a995a-de64-469a-b11d-69ca4344fdc2'.
Trace ID: bfb6ad53-766f-440d-ab71-7d1cbe6a1100
Correlation ID: f7ba524c-038f-4009-aa80-099730a8f79c
Timestamp: 2023-06-20 08:16:23Z","error_codes":[7000215],"timestamp":"2023-06-20 08:16:23Z","trace_id":"bfb6ad53-766f-440d-ab71-7d1cbe6a1100","correlation_id":"f7ba524c-038f-4009-aa80-099730a8f79c","error_uri":"https://login.microsoftonline.com/error?code=7000215"}
Solution¶
Fix the client secret value. For example, in Ansible Playbooks for Confluent Platform, it’s the value of sso_client_password in the inventory file.
Additional details¶
The error is visible on the web browser during authentication and also is output to the MDS log files for your Confluent Platform cluster. By default, the log location is /tmp/kafka-logs, but you can modify the location using the log.dirs value in your broker configuration files or using the Java system property while starting the Confluent Platform cluster with -Dkafka.logs.dir.
Misconfiguration of redirect callback URLs (to Confluent Platform) in the identity provider¶
The following are common misconfigurations when the identity provider is unable to communicate with Confluent Platform or Confluent Control Center because the redirect URI is incorrect.
Redirect URI misconfiguration¶
Problem¶
The redirect URI in the client application is misconfigured.
Here’s an example of the error message:
The browser window displays a 400 Bad Request and an error message like this:
“Error: The redirect_uri parameter must be a Login Redirect URI in the client app settings.”
Solution¶
Make sure that the redirect_uri you’re using in your authentication request is exactly the same as the URI you’ve set up in your client application settings at the identity provider.
Update the redirect URI in the client application with:
https://<c3-host-name>:<c3-port-number>/api/metadata/security/1.0/oidc/authorization-code/callback
Replace the placeholders with the Confluent Control Center hostname and port number.
For guidance on how to update the redirect URI, the following articles might be helpful:
- Microsoft Entra ID (Azure Active Directory): See How to change redirect_uri for Azure AD [Stack Overflow]
- Okta: See 400 Bad Request: The ‘redirect_uri’ Parameter Must Be a Login Redirect URI In the Client App Settings.
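Because the redirect URI must match exactly, it can help to build the expected value from the Confluent Control Center hostname and port and compare it, character for character, with the URIs registered in the identity provider. The following Python sketch uses the callback path shown above. The function names and the `registered_uris` list are hypothetical.

```python
# Callback path taken from the redirect URI template in this section.
CALLBACK_PATH = "/api/metadata/security/1.0/oidc/authorization-code/callback"


def expected_redirect_uri(c3_host, c3_port):
    """Build the redirect URI for a given Control Center host and port."""
    return f"https://{c3_host}:{c3_port}{CALLBACK_PATH}"


def redirect_uri_registered(c3_host, c3_port, registered_uris):
    """Exact, case-sensitive comparison against the URIs registered at the IdP."""
    return expected_redirect_uri(c3_host, c3_port) in registered_uris
```

A trailing slash, a different scheme, or a different port in the registered value makes `redirect_uri_registered` return `False`, which mirrors the exact-match requirement described in the solution.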
Misconfiguration of claims¶
The following are common issues with either the subject (sub) claim name or the groups (groups) claim name in configurations.
Subject (sub) claim misconfiguration¶
Problem¶
Confluent Platform is unable to use the token provisioned by the identity provider because:
- The sub claim is missing in the identity token.
- The value of the sub claim is empty or null.
- The value of the sub claim is not interpretable. For example, the identity token contains an array of strings for the sub claim value instead of a simple string. The sub claim value should be a string.
Here’s an example of an error message when the sub claim is missing in the identity token:
{
"status_code": 403,
"message": "io.confluent.tokenapi.exceptions.InvalidTokenException: myclaim(sub claim) not present",
"type": "CLIENT_ERROR"
}
Solution¶
- The configured sub claim name in the Ansible Playbooks for Confluent Platform or Confluent for Kubernetes inventory file should be an interpretable unique key that identifies the user.
- Check the identity provider for the correct claim name that uniquely identifies the user, and configure sso_sub_claim in Ansible Playbooks for Confluent Platform or subClaimName in Confluent for Kubernetes accordingly.
Groups (groups) claim misconfiguration¶
Problem¶
The groups claim name is configured incorrectly.
Here’s an example of the error message:
{
"error_type": "TypeMismatch",
"message": "groups is not a List. Actual type: String"
}
Solution¶
- Verify in the identity provider the correct claim name for getting all the groups a user is a member of.
  - Ansible Playbooks for Confluent Platform: Change the value of sso_groups_claim.
  - Confluent for Kubernetes: Change the value of groupsClaimName.
- The groups claim value in the identity token should be a list of groups the user is a member of.
- The claim name in the Confluent Platform configurations in Ansible Playbooks for Confluent Platform and Confluent for Kubernetes is groups by default.
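The claim requirements described in this section can be checked against a decoded identity token before changing any configuration. The following Python sketch assumes the claims are already available as a dictionary. The function name, parameters, and problem messages are hypothetical and are not the messages Confluent Platform produces.

```python
def check_claims(claims, sub_claim="sub", groups_claim="groups"):
    """Return a list of problems found in the sub and groups claims."""
    problems = []

    # The sub claim must be present and must be a non-empty string.
    if sub_claim not in claims:
        problems.append(f"{sub_claim} claim is missing")
    else:
        sub = claims[sub_claim]
        if not isinstance(sub, str) or not sub:
            problems.append(f"{sub_claim} claim must be a non-empty string")

    # The groups claim, when present, must be a list of group names.
    if groups_claim in claims and not isinstance(claims[groups_claim], list):
        actual = type(claims[groups_claim]).__name__
        problems.append(f"{groups_claim} claim must be a list, got {actual}")

    return problems
```

Pass the claim names you configured (for example, the values of sso_sub_claim and sso_groups_claim) as `sub_claim` and `groups_claim` so that the check looks at the same claims Confluent Platform is told to read.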
Session management problems¶
The following are common causes of session management issues:
- Refresh tokens are not enabled in the identity provider for the Confluent Platform client application.
- The refresh token expiration is configured too low on the identity provider.
- The ID token expiration or session renewal interval values are too low.
Unexpected session behaviors because refresh token is not enabled in the identity provider¶
ID token expiration (included in the identity token provisioned by the identity provider) | Renewal interval (confluent.oidc.session.token.expiry.ms) | Absolute session expiration (confluent.oidc.session.max.timeout.ms) | Session gets renewed after 80% of the session token expiry limit has passed | Session cannot be extended after X minutes and the user needs to log in again |
---|---|---|---|---|
1440 minutes | 15 minutes | 360 minutes | 12 minutes, 24 minutes, … But new additional role assignments or user state changes won’t be reflected. | X = 360 minutes |
240 minutes | 15 minutes | 360 minutes | 12 minutes, 24 minutes, … But new additional role assignments or user state changes won’t be reflected. | X = 240 minutes |
If you find that your modifications are causing issues, revert to the default values, which should work for most cases.
- Session renewal interval (confluent.oidc.session.token.expiry.ms)
  - Default value: 900000 milliseconds (15 minutes)
- Absolute session expiration (confluent.oidc.session.max.timeout.ms)
  - Default value: 21600000 milliseconds (360 minutes or 6 hours)
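The two rows of the table above are consistent with a simple rule: the session is renewed once 80% of the renewal interval has passed, and the user must log in again after the smaller of the ID token expiration and the absolute session expiration. The following Python sketch reproduces that arithmetic. It is an interpretation inferred from the table for the case where refresh tokens are not enabled, not a description of the actual implementation, and the function name is hypothetical.

```python
def session_timeline(id_token_expiry_ms,
                     renewal_interval_ms=900_000,
                     max_timeout_ms=21_600_000):
    """Return (first_renewal_ms, relogin_after_ms) for the given settings.

    The defaults mirror confluent.oidc.session.token.expiry.ms and
    confluent.oidc.session.max.timeout.ms as listed above.
    """
    # Renewal is attempted after 80% of the renewal interval has passed.
    first_renewal_ms = renewal_interval_ms * 80 // 100
    # Assumption from the table: the earlier of the two limits forces a new login.
    relogin_after_ms = min(id_token_expiry_ms, max_timeout_ms)
    return first_renewal_ms, relogin_after_ms
```

With the default settings, a 1440-minute ID token gives a first renewal at 720000 milliseconds (12 minutes) and a forced login at 21600000 milliseconds (360 minutes), while a 240-minute ID token forces the login at 14400000 milliseconds (240 minutes).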
Additional details¶
- When refresh tokens are not enabled in the identity provider for the client application, the setup might even fail with 403, 404, or 500 HTTP request errors in some identity providers.
- Even if the setup is working, the role bindings would not be updated during session renewal.
- When the refresh token is unavailable, the absolute session expiration limit primarily affects the time to enforce the re-login of the user.
Refresh token is invalid or expired¶
Problem¶
The refresh token is enabled, but the expiration is configured too low.
Here’s an example of an error message:
java.lang.RuntimeException: Got bad request status from IdP:
{
"error": "invalid_grant",
"error_description": "The refresh token is invalid or expired."
}
Solution¶
Verify the refresh token expiration setting in the identity provider and increase it if it is too low.
Connectivity issues between the identity provider and Confluent Platform¶
The following are common issues with connectivity between the identity provider and Confluent Platform:
- The identity provider is down or unreachable.
- The identity provider is reachable, but the client application is deleted or deactivated in the identity provider.
- There is some other network connectivity issue between Confluent Platform and the identity provider because of restrictive firewall rules or other reasons.
Identity provider is unreachable or down¶
Problem¶
The identity provider is unreachable from your Confluent Platform cluster.
Here are a couple of examples of the error messages:
{
"status_code": 500,
"message": "java.lang.RuntimeException: Failed to retrieve tokens from IDP with status:404"
}
{
"status_code": 500,
"message": "javax.ws.rs.ProcessingException: java.net.ConnectException: Connection refused (Connection refused)"
}
Solution¶
- Check if there is a healthy network connection between the Confluent Platform cluster and the identity provider.
- Check for blocking firewall rules or other network restrictions between the Confluent Platform cluster and the identity provider.
Additional details¶
The error is visible on the web browser during authentication and also is output to the MDS log files for your Confluent Platform cluster. By default, the log location is /tmp/kafka-logs, but you can modify the location using the log.dirs value in your broker configuration files or using the Java system property while starting the Confluent Platform cluster with -Dkafka.logs.dir.
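A basic TCP connection test, run from a host in the Confluent Platform cluster, can help separate network problems from configuration problems. The following Python sketch only checks that a TCP connection to the identity provider can be opened. It does not validate TLS certificates or the HTTP response, and the function name and default port are assumptions.

```python
import socket


def idp_reachable(host, port=443, timeout=5.0):
    """Return True when a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

A `False` result for the identity provider hostname points to DNS, routing, or firewall issues. A `True` result suggests looking at the endpoint paths and the client application instead.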
Client application unreachable¶
Problem¶
You see an invalid_request error from the identity provider because the client application is unreachable.
Here’s an example of an error message:
{
"status_code": 500,
"message": "invalid_request"
}
Solution¶
Check in the identity provider that the client application is available and activated.
Authorization failures¶
The following are common authorization failures that occur when the authentication of the user is successful, but Confluent Control Center does not allow the user to access resources.
These errors can occur on clusters where SSO for Confluent Control Center has been enabled for a long time, not just during setup. Here are some of the common reasons for these errors:
- The user has not been assigned role bindings yet.
- The role bindings for the user were deleted or changed.
- The user state or group membership has changed.
- Confluent Platform is unable to recognize the user groups that the identity provider has sent.
Sample authorization problem¶
Problem¶
User authentication is successful, but the user is not able to see any clusters.
In this case, the Confluent Platform Home page displays an error message.
Solution¶
Check if the user role bindings have been deleted.