Troubleshoot SSO for Control Center on Confluent Platform

If you are having trouble with SSO using OIDC, review the following common issues encountered during configuration and their possible solutions.

If you are still having trouble after reviewing the issues and solutions, contact Confluent Support.

Common issues


Misconfiguration of identity provider endpoints in Confluent Platform

The following are common misconfigurations of the identity provider endpoints in the Confluent Platform cluster. They affect four endpoint settings: Authorize, Token, Issuer, and JWKS.

Authorize endpoint misconfiguration

Problem

Confluent Platform is unable to send an authorize request to the identity provider because the authorize URI is misconfigured.

Solution

The authorize configuration needs to be fixed.

  • Ansible Playbooks for Confluent Platform: sso_authorize_uri
  • Confluent for Kubernetes: authorizeBaseEndpointUri
Additional details
  • Use OpenID Connect metadata discovery URIs to verify the correct endpoints for your identity provider. To get details on the endpoints for your identity provider, check its documentation. See Get the identity provider endpoints for more information about endpoints for Okta, Microsoft Entra ID (Azure Active Directory), and Keycloak.
  • The error is visible on the web browser during authentication and also is output to the MDS log files for your Confluent Platform cluster. By default, the log location is /tmp/kafka-logs, but you can modify the location using the log.dirs value in your broker configuration files or using the Java system property while starting the Confluent Platform cluster with -Dkafka.logs.dir.
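
As a hedged illustration of the discovery check described above, the following Python sketch compares configured endpoint values against the identity provider's discovery document. The /.well-known/openid-configuration path and the four field names come from the OpenID Connect Discovery specification. The function names and the dictionary keys (authorize, token, issuer, jwks) are illustrative assumptions, not Confluent Platform settings.

```python
import json
import urllib.request

# Standard OpenID Connect discovery fields, keyed by illustrative setting names.
DISCOVERY_FIELDS = {
    "authorize": "authorization_endpoint",
    "token": "token_endpoint",
    "issuer": "issuer",
    "jwks": "jwks_uri",
}


def fetch_discovery(issuer_url, timeout=10):
    """Download the provider's discovery document (requires network access)."""
    url = issuer_url.rstrip("/") + "/.well-known/openid-configuration"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def find_mismatches(configured, discovery):
    """Return {setting: (configured, advertised)} for every value that differs."""
    mismatches = {}
    for setting, field in DISCOVERY_FIELDS.items():
        advertised = discovery.get(field)
        if configured.get(setting) != advertised:
            mismatches[setting] = (configured.get(setting), advertised)
    return mismatches
```

A possible usage is find_mismatches(my_configured_values, fetch_discovery(my_issuer_url)), where both arguments are placeholders for your own values. An empty result means the configured values match what the provider advertises. Some providers publish templated values in the discovery document, so treat a mismatch as a prompt to look closer rather than as proof of an error.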

Token endpoint misconfiguration

Problem

Confluent Platform is unable to request a token from the identity provider because the token endpoint is misconfigured.

Here’s an example of the error message:

/* Example error when token endpoint is misconfigured */
{
  "status_code": 500,
  "message": "java.lang.RuntimeException: Got bad request status from IdP: {\"error\":\"invalid_request\",\"error_description\":\"AADSTS900023: Specified tenant identifier '0893715bxxx-959b-4906-a185-2789e1ead045' is neither a valid DNS name, nor a valid external domain.\\r\\nTrace ID: 907463ce-d595-4dcc-a89b-4098d5d61600\\r\\nCorrelation ID: f8c6cf9c-938d-411a-9ccc-de6d69114e03\\r\\nTimestamp: 2023-06-20 09:31:58Z\",\"error_codes\":[900023],\"timestamp\":\"2023-06-20 09:31:58Z\",\"trace_id\":\"907463ce-d595-4dcc-a89b-4098d5d61600\",\"correlation_id\":\"f8c6cf9c-938d-411a-9ccc-de6d69114e03\",\"error_uri\":\"https://login.microsoftonline.com/error?code=900023\"}"
}
Solution

The token configuration needs to be fixed.

  • Ansible Playbooks for Confluent Platform: sso_token_uri
  • Confluent for Kubernetes: tokenBaseEndpointUri
Additional details
  • Use OpenID Connect metadata discovery URIs to verify the correct endpoints for your identity provider. To get details on the endpoints for your identity provider, check its documentation. See Get the identity provider endpoints for more information about endpoints for Okta, Microsoft Entra ID (Azure Active Directory), and Keycloak.
  • The error is visible on the web browser during authentication and also is output to the MDS log files for your Confluent Platform cluster. By default, the log location is /tmp/kafka-logs, but you can modify the location using the log.dirs value in your broker configuration files or using the Java system property while starting the Confluent Platform cluster with -Dkafka.logs.dir.

Issuer endpoint misconfiguration

Problem

Confluent Platform is unable to verify the token authenticity because the token issuer endpoint is incorrectly configured.

Here’s an example of the error message:

/* Example error when issuer is misconfigured */
{
  "status_code": 403,
  "message": "org.jose4j.jwt.consumer.InvalidJwtException: JWT (claims->{\"aud\":\"429a995a-de64-469a-b11d-69ca4344fdc2\",\"iss\":\"https://login.microsoftonline.com/0893715b-959b-4906-a185-2789e1ead045/v2.0\",\"iat\":1687245493,\"nbf\":1687245493,\"exp\":1687249393,\"groups\":[\"99b49608-15fc-48f8-a7c7-d1d4d7ff03de\",\"e8c31aa4-be6f-4d92-b888-7d595dc3f42e\"],\"rh\":\"0.ARsAW3GTCJuVBkmhhSeJ4erQRUozdOYN3ihLqe2mKChc58QbAHs.\",\"sub\":\"uW5lpf2zSoJ9K6O6hruUnx4LulcNUGoKR_viwsw010w\",\"tid\":\"0893715b-959b-4906-a185-2789e1ead045\",\"uti\":\"TZeBoX7yCE2LiIFqZyYVAA\",\"ver\":\"2.0\",\"wids\":[\"b79fbf4d-3ef9-4689-8143-76b194e85509\"]}) rejected due to invalid claims or other invalid content. Additional details: [[12] Issuer (iss) claim value (https://login.microsoftonline.com/0893715b-959b-4906-a185-2789e1ead045/v2.0) doesn't match expected value of https://login.microsoftonline.com/]",
  "type": "CLIENT_ERROR"
}
Solution

The issuer configuration needs to be fixed.

  • Ansible Playbooks for Confluent Platform: sso_issuer_url
  • Confluent for Kubernetes: issuer

Additional details
  • Use OpenID Connect metadata discovery URIs to verify the correct endpoints for your identity provider. To get details on the endpoints for your identity provider, check its documentation. See Get the identity provider endpoints for more information about endpoints for Okta, Microsoft Entra ID (Azure Active Directory), and Keycloak.
  • The error is visible on the web browser during authentication and also is output to the MDS log files for your Confluent Platform cluster. By default, the log location is /tmp/kafka-logs, but you can modify the location using the log.dirs value in your broker configuration files or using the Java system property while starting the Confluent Platform cluster with -Dkafka.logs.dir.
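
The example error above shows that the iss claim in the token is compared with the configured issuer value, and a partial URL such as https://login.microsoftonline.com/ does not match. The following Python sketch, offered only as a diagnostic aid, decodes the token payload without verifying the signature and compares the two strings. The function names are hypothetical and not part of Confluent Platform.

```python
import base64
import json


def decode_claims(jwt_token):
    """Decode the JWT payload WITHOUT verifying the signature (diagnostics only)."""
    payload_b64 = jwt_token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def issuer_matches(jwt_token, configured_issuer):
    """True only when the iss claim equals the configured issuer string."""
    return decode_claims(jwt_token).get("iss") == configured_issuer
```

If issuer_matches returns False, copy the iss value from the decoded claims into the issuer setting, after confirming that it matches what your identity provider documents.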

JWKS endpoint misconfiguration

Problem

Confluent Platform is unable to verify the authenticity of the token because the JWKS URI is misconfigured, or because the keys used to verify the token are expired or have not been updated in the identity provider.

Here’s an example of the error message:

/* Example error when JWKS uri does not contain keys required to verify JWT */
{
  "status_code": 403,
  "message": "org.jose4j.jwt.consumer.InvalidJwtException: JWT processing failed. Additional details: [[17] Unable to process JOSE object (cause: org.jose4j.lang.UnresolvableKeyException: Unable to find a suitable verification key for JWS w/ header ........",
  "type": "CLIENT_ERROR"
}
Solution

The JWKS configuration needs to be fixed.

  • Ansible Playbooks for Confluent Platform: sso_jwks_uri
  • Confluent for Kubernetes: jwksEndpointUri

Also, check that the signing keys are up to date and not expired.
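
One way to narrow down the "Unable to find a suitable verification key" error is to compare the key ID (kid) in the token header with the keys that the JWKS URI publishes. The following Python sketch assumes you have already downloaded the JWKS document and parsed it into a dictionary. The function names are hypothetical, and the check covers only the kid lookup, not key expiry or signature validation.

```python
import base64
import json


def jwt_header(jwt_token):
    """Decode the JWT header segment (no signature verification)."""
    header_b64 = jwt_token.split(".")[0]
    header_b64 += "=" * (-len(header_b64) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(header_b64))


def signing_key_published(jwt_token, jwks):
    """True when the token's kid appears among the keys in the JWKS document."""
    kid = jwt_header(jwt_token).get("kid")
    if kid is None:
        return False
    return any(key.get("kid") == kid for key in jwks.get("keys", []))
```

A False result suggests that the JWKS URI points at the wrong key set or that the identity provider has rotated its signing keys.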

Misconfiguration of client credentials in Confluent Platform

The following are common misconfigurations of the client credentials in the Confluent Platform cluster. They affect two client settings: Client ID and Client Secret.

Client ID misconfiguration

Problem

You see a 400 Bad Request error during authentication. The identity provider is unable to recognize the client application for Confluent Platform because the client ID is misconfigured in the request.

In the browser window, you see a 400 Bad Request error and an error message like this:

"Your request resulted in an error."
Solution

Fix the client ID value. For example, in Ansible Playbooks for Confluent Platform, fix the value of sso_client_id in the inventory file.

Additional details

The error is visible on the web browser during authentication and also is output to the MDS log files for your Confluent Platform cluster. By default, the log location is /tmp/kafka-logs, but you can modify the location using the log.dirs value in your broker configuration files or using the Java system property while starting the Confluent Platform cluster with -Dkafka.logs.dir.

Client secret misconfiguration

Problem

The identity provider is unable to authenticate the client application because the configured client secret is incorrect.

Here’s an example of an error message when the client secret is wrong for Microsoft Entra ID (Azure Active Directory):

java.lang.RuntimeException: Failed to retrieve tokens from IDP with status: 401. Error: {"error":"invalid_client","error_description":"AADSTS7000215: Invalid client secret provided. Ensure the secret being sent in the request is the client secret value, not the client secret ID, for a secret added to app '429a995a-de64-469a-b11d-69ca4344fdc2'.
Trace ID: bfb6ad53-766f-440d-ab71-7d1cbe6a1100
Correlation ID: f7ba524c-038f-4009-aa80-099730a8f79c
Timestamp: 2023-06-20 08:16:23Z","error_codes":[7000215],"timestamp":"2023-06-20 08:16:23Z","trace_id":"bfb6ad53-766f-440d-ab71-7d1cbe6a1100","correlation_id":"f7ba524c-038f-4009-aa80-099730a8f79c","error_uri":"https://login.microsoftonline.com/error?code=7000215"}
Solution

Fix the client secret value. For example, in Ansible Playbooks for Confluent Platform, it’s the value of sso_client_password in the inventory file.

Additional details

The error is visible on the web browser during authentication and also is output to the MDS log files for your Confluent Platform cluster. By default, the log location is /tmp/kafka-logs, but you can modify the location using the log.dirs value in your broker configuration files or using the Java system property while starting the Confluent Platform cluster with -Dkafka.logs.dir.

Misconfiguration of redirect callback URLs (to Confluent Platform) in the identity provider

The following misconfiguration prevents the identity provider from redirecting back to Confluent Platform or Confluent Control Center because the redirect URI is incorrect.

Redirect URI misconfiguration

Problem

The redirect URI in the client application is misconfigured.

The browser window displays a 400 Bad Request error and an error message like this:

“Error: The redirect_uri parameter must be a Login Redirect URI in the client app settings.”

Solution

Make sure that the redirect_uri you’re using in your authentication request is exactly the same as the URI you’ve set up in your client application settings at the identity provider.

Update the redirect URI in the client application with:

https://<c3-host-name>:<c3-port-number>/api/metadata/security/1.0/oidc/authorization-code/callback

Replace the placeholders with the Confluent Control Center hostname and port number.
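
Because the redirect URI must match exactly, it can help to generate the expected value and compare it with what is registered in the identity provider. The following Python sketch builds the callback URL from the pattern above. The function names, the sample hostname, and the sample port are illustrative assumptions.

```python
CALLBACK_PATH = "/api/metadata/security/1.0/oidc/authorization-code/callback"


def callback_url(c3_host, c3_port):
    """Build the Control Center callback URL from the pattern shown above."""
    return f"https://{c3_host}:{c3_port}{CALLBACK_PATH}"


def redirect_uri_registered(c3_host, c3_port, registered_uris):
    """Exact string comparison: a trailing slash or different port is a mismatch."""
    return callback_url(c3_host, c3_port) in registered_uris


# Hypothetical host and port, used only for illustration.
print(callback_url("c3.example.com", 9021))
```

Pass the list of redirect URIs copied from your identity provider's client application settings as registered_uris.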


Misconfiguration of claims

The following are common issues with either the subject (sub) claim name or the groups (groups) claim name in configurations.

Subject (sub) claim misconfiguration

Problem

Confluent Platform is unable to use the token provisioned by the identity provider for one of the following reasons:

  • The sub claim is missing in the identity token.
  • The value of the sub claim is empty or null.
  • The value of the sub claim is not interpretable. For example, the identity token contains an array of strings for the sub claim value instead of a simple string. The sub claim value should be a string.

Here’s an example of an error message when the sub claim is missing in the identity token:

{
  "status_code": 403,
  "message": "io.confluent.tokenapi.exceptions.InvalidTokenException: myclaim(sub claim) not present",
  "type": "CLIENT_ERROR"
}
Solution
  • The sub claim name configured in the Ansible Playbooks for Confluent Platform inventory file or in the Confluent for Kubernetes configuration should refer to an interpretable, unique key that identifies the user.
  • Check the identity provider to find the correct claim name that uniquely identifies the user. Then configure sso_sub_claim in Ansible Playbooks for Confluent Platform or subClaimName in Confluent for Kubernetes accordingly.
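
The three failure conditions listed in the problem description can be checked against a decoded identity token. The following Python sketch assumes the token claims are already available as a dictionary. The function name and the returned labels are hypothetical.

```python
def check_sub_claim(claims, sub_claim_name="sub"):
    """Return a short problem label, or None when the claim looks usable."""
    if sub_claim_name not in claims:
        return "missing"
    value = claims[sub_claim_name]
    if value is None or value == "":
        return "empty or null"
    if not isinstance(value, str):
        return "not a string"  # for example, an array of strings
    return None
```

Pass the configured claim name as sub_claim_name when it differs from sub.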

Groups (groups) claim misconfiguration

Problem

The groups claim name is configured incorrectly.

Here’s an example of the error message:

{
  "error_type": "TypeMismatch",
  "message": "groups is not a List. Actual type: String"
}
Solution
  • Verify in the identity provider which claim name returns all the groups a user is a member of.
    • Ansible Playbooks for Confluent Platform: Change the value of sso_groups_claim.
    • Confluent for Kubernetes: Change the value of groupsClaimName.
  • The groups claim value in the identity token should be a list of groups the user is a member of.
  • The claim name in Confluent Platform configurations in Ansible Playbooks for Confluent Platform and Confluent for Kubernetes is configured as groups by default.
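
The type mismatch in the example error can be reproduced against a decoded identity token. The following Python sketch assumes the claims are available as a dictionary and checks that the groups claim is a list. The function name and the returned labels are hypothetical.

```python
def check_groups_claim(claims, groups_claim_name="groups"):
    """Return a short problem label, or None when the claim is a list."""
    if groups_claim_name not in claims:
        return "missing"
    value = claims[groups_claim_name]
    if not isinstance(value, list):
        # A single string here corresponds to the type mismatch shown above.
        return "not a list (actual type: " + type(value).__name__ + ")"
    return None
```

A "missing" result usually points to a wrong claim name, and a "not a list" result points to a claim that the identity provider emits as a single value.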

Session management problems

The following are common causes of session management issues:

  • The refresh tokens are not enabled in the identity provider for the Confluent Platform client application.
  • The refresh token expiration is configured too low on the identity provider.
  • The ID token expiration or session renewal interval values are too low.

Unexpected session behaviors because refresh token is not enabled in the identity provider

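Example 1:

  • ID token expiration (included in the identity token provisioned by the identity provider): 1440 minutes
  • Renewal interval (confluent.oidc.session.token.expiry.ms): 15 minutes
  • Absolute session expiration (confluent.oidc.session.max.timeout.ms): 360 minutes
  • Session gets renewed after 80% of the session token expiry limit has passed: 12 minutes, 24 minutes, … But new additional role assignments or user state changes won’t be reflected.
  • Session cannot be extended after X minutes and the user needs to log in again: X = 360 minutes

Example 2:

  • ID token expiration (included in the identity token provisioned by the identity provider): 240 minutes
  • Renewal interval (confluent.oidc.session.token.expiry.ms): 15 minutes
  • Absolute session expiration (confluent.oidc.session.max.timeout.ms): 360 minutes
  • Session gets renewed after 80% of the session token expiry limit has passed: 12 minutes, 24 minutes, … But new additional role assignments or user state changes won’t be reflected.
  • Session cannot be extended after X minutes and the user needs to log in again: X = 240 minutes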

If you find that your modifications are causing issues, revert to the default values, which should work for most cases.

  • Session renewal interval (confluent.oidc.session.token.expiry.ms)
    • Default value: 900000 milliseconds (15 minutes)
  • Absolute session expiration (confluent.oidc.session.max.timeout.ms)
    • Default value: 21600000 milliseconds (360 minutes or 6 hours)
Additional details
  • When refresh tokens are not enabled in the identity provider for the client application, the setup might even fail with 403, 404, or 500 HTTP request errors in some identity providers.
  • Even if the setup is working, the role bindings would not be updated during session renewal.
  • When the refresh token is unavailable, the absolute session expiration limit primarily affects the time to enforce the re-login of the user.
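
The arithmetic behind the example values in this section can be sketched in Python. The 80% renewal point comes from the text above. The rule that the user must log in again at the smaller of the ID token expiration and the absolute session expiration is an inference from the two examples, not a documented formula, and the function names are hypothetical.

```python
MS_PER_MINUTE = 60_000


def renewal_points_minutes(renewal_interval_ms, count=2):
    """First `count` renewal times, each at 80% of the renewal interval."""
    step = renewal_interval_ms * 80 / 100 / MS_PER_MINUTE
    return [step * n for n in range(1, count + 1)]


def relogin_after_minutes(id_token_expiry_min, max_timeout_ms):
    """Without a refresh token, assume the session ends at the earlier limit."""
    return min(id_token_expiry_min, max_timeout_ms / MS_PER_MINUTE)


# Default renewal interval and absolute session expiration from this section.
print(renewal_points_minutes(900_000))
print(relogin_after_minutes(1440, 21_600_000))
print(relogin_after_minutes(240, 21_600_000))
```

With the default values, the sketch reproduces the renewal times of 12 and 24 minutes and the re-login limits of 360 and 240 minutes from the examples.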

Refresh token is invalid or expired

Problem

The refresh token is enabled, but the expiration is configured too low.

Here’s an example of an error message:

java.lang.RuntimeException: Got bad request status from IdP:
{
  "error": "invalid_grant",
  "error_description": "The refresh token is invalid or expired."
}
Solution

Verify the refresh token expiration configured on the identity provider and increase it if it is too low.

Connectivity issues between the identity provider and Confluent Platform

The following are common issues with connectivity between the identity provider and Confluent Platform:

  • The identity provider is down or unreachable.
  • The identity provider is reachable, but the client application is deleted or deactivated in the identity provider.
  • Another network connectivity issue exists between Confluent Platform and the identity provider because of restrictive firewall rules or other reasons.

Identity provider is unreachable or down

Problem

The identity provider is unreachable from your Confluent Platform cluster.

Here are a couple of examples of the error messages:

{
  "status_code": 500,
  "message": "java.lang.RuntimeException: Failed to retrieve tokens from IDP with status:404"
}
{
  "status_code": 500,
  "message": "javax.ws.rs.ProcessingException: java.net.ConnectException: Connection refused (Connection refused)"
}
Solution
  • Check if there is a healthy network connection between the Confluent Platform cluster and the identity provider.
  • Check for blocking firewall rules or other network restrictions between the Confluent Platform cluster and the identity provider.
Additional details

The error is visible on the web browser during authentication and also is output to the MDS log files for your Confluent Platform cluster. By default, the log location is /tmp/kafka-logs, but you can modify the location using the log.dirs value in your broker configuration files or using the Java system property while starting the Confluent Platform cluster with -Dkafka.logs.dir.
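
A basic reachability check from a Confluent Platform host can help separate network problems from configuration problems. The following Python sketch attempts a plain TCP connection to the host and port of an identity provider URL. It is only a first-pass check under the assumption that the endpoint is reached directly: it does not validate TLS, proxies, or the HTTP response, and the function name is hypothetical.

```python
import socket
from urllib.parse import urlsplit


def can_reach(url, timeout=5.0):
    """True when a TCP connection to the URL's host and port succeeds."""
    parts = urlsplit(url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

Run the check with your configured token endpoint URL from the host where MDS runs. A False result points to a firewall rule, DNS problem, or provider outage rather than an endpoint setting.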

Client application unreachable

Problem

You see an invalid_request error from the identity provider because the client application is unreachable.

Here’s an example of an error message:

{
  "status_code": 500,
  "message": "invalid_request"
}
Solution

Check in the identity provider if the application is available and activated.

Authorization failures

The following are common authorization failures that occur when user authentication is successful, but Confluent Control Center does not allow the user to access resources.

These errors can occur on clusters where SSO for Confluent Control Center has been enabled for a long time, not just during setup. Here are some of the common reasons for these errors:

  • The user has not been assigned role bindings yet.
  • The role bindings for the user were deleted or changed.
  • The user state has changed or group membership has changed.
  • Confluent Platform is unable to recognize the user groups that the identity provider has sent.

Sample authorization problem

Problem

User authentication is successful, but the user is not able to see any clusters.

The Confluent Platform Home page displays the following error message:

No clusters found.
You need to configure Control Center so it knows how to connect to your Kafka cluster(s).
Solution

Check whether role bindings exist for the user. They might never have been assigned, or they might have been deleted or changed.