.. _controlcenter_troubleshooting:

Troubleshoot |c3-short| for |cp|
********************************

Common issues
=============

.. _viewprocessingstatus:

View processing status in |c3-short|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

View a high-level summary of running and processing status within the |c3-short| application. Check the status at any time.

#. In the upper-right corner of |c3-short|, click the menu icon to open the **Administration** menu.

#. Click **About Control Center**.

   .. figure:: /images/c3-about-control-center.png
      :scale: 80%
      :alt: About Control Center

   **Processing Status** shows the status of |c3-short| (Running or Not Running). Consumption data and Broker data message processing speeds are shown in real time for the last 30 minutes.

   .. figure:: ../../images/c3-process-status.png
      :width: 600px
      :alt: Control Center Processing Status

Installing and Setup
^^^^^^^^^^^^^^^^^^^^

If you encounter issues during installation and setup, you can try these solutions.

^^^^^^^^^^^^^^^^^^^^^^^^^^
Bad security configuration
^^^^^^^^^^^^^^^^^^^^^^^^^^

* Check the security configuration for all brokers, metrics reporter, client interceptors, and |c3-short| (see `debugging check configuration <#check-configurations>`_). For example, is it SASL_SSL, SASL_PLAINTEXT, SSL?
* Possible errors include:

  .. code:: bash

     ERROR SASL authentication failed using login context 'Client'. (org.apache.zookeeper.client.ZooKeeperSaslClient)

  .. code:: bash

     Caused by: org.apache.kafka.common.KafkaException: java.lang.IllegalArgumentException: No serviceName defined in either JAAS or Kafka configuration

  .. code:: bash

     org.apache.kafka.common.errors.IllegalSaslStateException: Unexpected handshake request with client mechanism GSSAPI, enabled mechanisms are [GSSAPI]

* Verify that the correct Java Authentication and Authorization Service (JAAS) configuration was detected.
* If ACLs are enabled, check them.
* To verify that you can communicate with the cluster, try to produce and consume using ``console-*`` with the same security settings.

^^^^^^^^^^^^^^^^^^^^^^^^^^
InvalidStateStoreException
^^^^^^^^^^^^^^^^^^^^^^^^^^

* This error usually indicates that data is corrupted in the configured ``confluent.controlcenter.data.dir``. For example, this can be caused by an unclean shutdown. To fix, give |c3-short| a new ID by changing ``confluent.controlcenter.id`` and restart.
* Make sure that |c3-short| has permission to access the configured ``confluent.controlcenter.data.dir``.

^^^^^^^^^^^^^^^^^^
Not enough brokers
^^^^^^^^^^^^^^^^^^

Check the logs for the related error ``not enough brokers``. Verify that the `topic replication factors <#check-configurations>`_ are set correctly and that enough brokers are available.

^^^^^^^^^^^^^^^^^^^^^^^
Local store permissions
^^^^^^^^^^^^^^^^^^^^^^^

Check the local permissions on the |c3-short| state directory. This directory is defined by ``confluent.controlcenter.data.dir`` in the ``control-center.properties`` file. The user ID that was used to start |c3-short| must be able to access that directory.

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Multiple instances of |c3-short| have the same ID
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You must use a unique ID for each |c3-short| instance, including instances in Docker. Duplicate IDs are not supported and will cause problems.

^^^^^^^^^^^^^^^
License expired
^^^^^^^^^^^^^^^

If you see a message similar to this:

.. codewithvars:: bash

   [2017-08-21 14:12:33,812] WARN checking license failure. contact `support@confluent.io `_ for a license key: Unable to process JOSE object (cause: org.jose4j.lang.JoseException: Invalid JOSE….

You should verify that the user has a valid license, as specified in ``confluent.license=``. This can be either the key or a path to a license file. For more information, see the :ref:`Control Center configuration documentation `.
To manage a license in the |c3-short| web interface, see :ref:`controlcenter_licenses`.

.. _c3_schema_registry_not_set_up:

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
A schema for message values has not been set for this topic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you encounter this error message, verify that the |sr| ``listeners=http://0.0.0.0:8081`` configuration matches the |c3-short| ``confluent.controlcenter.schema.registry.url=http://localhost:8081`` configuration. For more information, see :ref:`control_center_logging_settings`.

.. _c3_connect_ccloud_max_bytes:

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|c3-short| cannot connect to |ccloud|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When attempting to connect a |ccloud| cluster to |c3-short| (following the :cloud:`Connecting Control Center to Confluent Cloud|cp-component/c3-cloud-config.html` procedure), you see a message similar to the following:

.. codewithvars:: bash

   [2019-07-31 20:40:28,023] ERROR [main] attempt=failed to create topic=TopicInfo{name=_confluent-metrics, partitions=12, replication=3} (io.confluent.controlcenter.KafkaHelper)
   org.apache.kafka.common.errors.PolicyViolationException: Config property 'max.message.bytes' with value '10485760' exceeded max limit of 8388608.

The ``max.message.bytes`` error occurs because |ccloud| enforces limits on some default settings. To resolve the error, add the following configuration to the |c3-short| properties file and restart |c3-short|:

.. code:: bash

   ...
   confluent.metrics.topic.max.message.bytes=8388608
   ...

This mismatch in default values between |c3-short| and |ccloud| is a known issue being tracked in MMA-3564.

.. _c3_safari_websock_auth:

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Cannot browse topic messages using Safari and authentication
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Due to a `WebKit bug in Safari `_, the following unknown error displays when using authentication and attempting to :ref:`browse topic messages `:

.. figure:: ../../images/safari_browser_ldap_error.png
   :scale: 50%

The Safari browser fails to send authenticated requests through the WebSocket protocol. The recommended workaround in the interim is to use the Chrome or Firefox browsers rather than Safari.

.. ESCALATION-1460^

System health
^^^^^^^^^^^^^

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Web interface that is blank or stuck loading
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you experience a web interface that is blank or stuck loading, you can select the cluster in the drop-down and use the information below to troubleshoot.

* Are there `errors or warnings in the logs <#check-logs>`_? For more information on how to find logs, see the :ref:`documentation `.
* `What are you monitoring <#size-of-clusters>`_? Are you `under-provisioned <#system-check>`_?
* Is there `a lag in Control Center <#consumer-offset-lag>`_? Check especially the ``MetricsAggregateStore`` partitions.
* Use browser debugging tools to check REST calls to find out if the requests have been made successfully and with a valid response, specifically these requests:

  .. figure:: ../../images/c3-troubleshoot.png
     :width: 600px

  .. tip:: You can view these calls by using common web browser tools (e.g., `Chrome Developer Tools `_).

* The ``/2.0/metrics//maxtime`` endpoint should return the latest timestamp that |c3-short| has for metrics data.
* If no data is returned from the backend, verify that you're getting data on the input topic and review the logs for issues.
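
The check on the ``maxtime`` response can also be done from a terminal. The following is a minimal sketch, not an official tool: the function name ``maxtime_body_status`` and the ``MAXTIME_URL`` variable are hypothetical, and the sketch only distinguishes an empty body (or ``{ }``) from a non-empty one without assuming the shape of a non-empty response. If authentication is enabled, the ``curl`` call in the usage comment would also need credentials.

.. code:: bash

   # Hypothetical helper: report whether a response body copied from the
   # maxtime request contains any data. Only "empty vs. non-empty" is checked.
   maxtime_body_status() {
     local body
     # Remove all whitespace so that "{ }" and "{}" compare equal.
     body="$(printf '%s' "$1" | tr -d '[:space:]')"
     if [ -z "$body" ] || [ "$body" = "{}" ]; then
       echo "no-data"
     else
       echo "has-data"
     fi
   }

   # Usage: MAXTIME_URL is a placeholder for the full request URL shown in
   # the browser's Network tab.
   # maxtime_body_status "$(curl -s "$MAXTIME_URL")"

A result of ``no-data`` points to the input topic and the logs as the next things to check.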
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The |c3-short| is getting ready to launch
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If this message appears in the web interface after starting or restarting |c3-short|, click **Try again**.

.. figure:: ../../images/c3-launch-msg.png
   :width: 600px

If |c3-short| doesn't launch, try the suggestions below to troubleshoot:

* Usually this means that |ak-tm| doesn't have any metrics data, but this message could also indicate a 500 Internal Server Error has occurred. If you get a 500 error, check the |c3-short| logs for errors.
* Use browser debugging tools to check the response. An empty response (``{ }``) from the ``/2.0/metrics//maxtime`` endpoint means that |ak| hasn't received any metrics data.
* Check your |c3-short| log output for WARN messages: ``broker= is not instrumented with ConfluentMetricsReporter``. If you see this warning, be sure to follow the :ref:`instructions ` for configuring |ak| Server with ``metric.reporters=io.confluent.metrics.reporter.ConfluentMetricsReporter``. Restart the |ak| brokers to pick up the configuration change.
* Verify that the metrics reporter is set up correctly. Dump the ``_confluent-metrics`` input topic to see if there are any messages produced.

.. _c3_no_clusters_found:

^^^^^^^^^^^^^^^^^
No clusters found
^^^^^^^^^^^^^^^^^

If this message appears in the web interface after starting |c3-short|, check your :ref:`configuration ` in the appropriate ``control-center.properties`` file.

.. figure:: ../../images/c3-no-clusters-found-msg.png
   :width: 600px

.. _rocksdb_tmp_dir:

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|c3-short| cannot start due to temporary directory permissions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Issue: You get an error about ``/tmp`` mounted with ``noexec``:

.. code:: bash

   java.lang.UnsatisfiedLinkError: /tmp/librocksdbjni3375578050467151433.so: /tmp/librocksdbjni3375578050467151433.so: failed to map segment from shared object: Operation not permitted when /tmp is mounted with noexec

Resolution: If |c3-short| cannot use the ``/tmp`` directory because it is mounted with ``noexec``, pass in a directory path for ``rocksdbtmp`` that you have write access to and start |c3-short|.

.. sourcecode:: bash

   CONTROL_CENTER_OPTS="-Djava.io.tmpdir=/my/dir/for/rocksdbtmp" control-center-start /path/to/control-center.properties

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Nothing is produced on the Metrics (``_confluent-metrics``) topic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

* Verify that the :ref:`metrics reporter is set up correctly ` with security configured.
* Check the |ak| broker logs and look for timeouts or other errors (e.g., `RecordTooLargeException <#RecordTooLargeException>`_).

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|c3-short| is lagging behind |ak|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If |c3-short| is not reporting the latest data and the charts are falling behind, you can use this information to troubleshoot.

* This can happen if |c3-short| is underpowered or churning through a large backlog.
* Check the `offset lag <#consumer-offset-lag>`_. If lag is large and increasing over time, |c3-short| may not be able to handle the monitoring load. Try these additional checks for `cluster <#size-of-clusters>`_ and `system <#system-check>`_.
* With |cp| 3.3.x and later, you can set a short amount of time for the skip backlog monitoring settings: ``confluent.monitoring.interceptor.topic.skip.backlog.minutes`` and ``confluent.metrics.topic.skip.backlog.minutes``. For example, you can set this to ``0`` if you want to process from the latest offsets. |c3-short| will ignore everything on the input topics older than a specified amount of time.
  This is useful when you need |c3-short| to catch up faster. For more information, see the :ref:`Control Center configuration documentation `.

.. _record-too-large-exception:

^^^^^^^^^^^^^^^^^^^^^^^
RecordTooLargeException
^^^^^^^^^^^^^^^^^^^^^^^

If you receive this error in the broker logs, you can use this information to troubleshoot.

* Set ``confluent.metrics.reporter.max.request.size=10485760`` in the broker ``server.properties`` file. This is the default in 3.3.x and later.
* Change the topic configuration for ``_confluent-metrics`` to accept large messages. This is the default in 3.3.x and later. For more information, see the :ref:`Metrics Reporter message size documentation `.

  .. sourcecode:: bash

     bin/kafka-configs --bootstrap-server --alter --add-config max.message.bytes=10485760 --entity-type topics --entity-name _confluent-metrics

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Parts of the broker or topic table have blank values
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This is a known issue that should be transient until |c3-short| is caught up. It can be caused by:

* Different streams topologies that are processing at different rates during restore.
* |c3-short| is lagging or having trouble keeping up due to lack of resources.

RBAC
^^^^

.. _max-requests-exceeded:

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Max requests queued per destination 1024 exceeded
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This is a rare issue that occurs due to the interaction of the custom MDS client used with |c3-short| when RBAC is enabled. If you receive this error, try the following:

- Restart |c3-short| if possible.
- Increase the ``confluent.controlcenter.mds.client.max.requests.queued.per.destination`` configuration value and lower the ``confluent.controlcenter.mds.client.idle.timeout`` value. For more, see :ref:`c3_RBAC_settings`.
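
Before changing either value, it can help to see what the properties file currently sets. The following is a minimal sketch, assuming the settings live in a plain ``key=value`` properties file; the function name ``show_mds_client_settings`` and the path in the usage comment are hypothetical. A setting that is absent from the file produces no output.

.. code:: bash

   # Hypothetical helper: print the two MDS client settings named above,
   # if they are set in the given properties file.
   show_mds_client_settings() {
     grep -E '^confluent\.controlcenter\.mds\.client\.(max\.requests\.queued\.per\.destination|idle\.timeout)=' "$1"
   }

   # Usage (the path is an assumption):
   # show_mds_client_settings /path/to/control-center.properties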
Streams Monitoring
^^^^^^^^^^^^^^^^^^

^^^^^^^^^^^^
Blank charts
^^^^^^^^^^^^

If you are experiencing blank charts, you can use this information to troubleshoot.

* Verify that the :ref:`Confluent Monitoring Interceptors ` are properly configured on the clients, including any required security configuration settings.
* For the time range selected, check if there is new data arriving to the `_confluent-monitoring topic <#review-input-topics>`_.
* It is normal for |c3-short| to not show unconsumed messages because Confluent doesn't know the expected consumption, so verify if there are consumers reading from the topics.

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Unexpected herringbone pattern
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you are experiencing an unexpected herringbone pattern, you can use this information to troubleshoot.

* Verify whether the clients are properly shut down.
* Look for these errors in client logs:

  * ``Failed to shutdown metrics reporting thread...``
  * ``Failed to publish all cached metrics on termination for...``
  * ``ERROR Terminating publishing and collecting monitoring metrics for``
  * ``Failed to close monitoring interceptor for…``

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Missing consumers or consumer groups
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If you are missing consumers or consumer groups, you can use this information to troubleshoot.

* Look for errors or warnings in the missing client’s log.
* Verify whether the input topic is receiving interceptor data for the missing client.

Connect
^^^^^^^

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The |c3-short| is getting ready to launch
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If this message appears in the web interface, use the information below to troubleshoot.

* Is the Connect cluster that is defined in ``confluent.controlcenter.connect.cluster`` available?
* Can you reach the Connect endpoints directly by running a cURL command (e.g., ``curl www.example.com``)?
* Check the Connect logs for any errors.
  |c3-short| is a proxy to Connect.

.. _c3-logging-troubleshoot:

Debugging
=========

Check logs
^^^^^^^^^^

These are the |c3-short| log types. For more information about logging, see :ref:`control_center_logging_settings`.

* ``c3.log`` - |c3-short|, HTTP activity, anything not related to streams, REST API calls
* ``c3-streams.log`` - Streams
* ``c3-kafka.log`` - Client, |zk|, and |ak|

Here are things to look for in the logs:

* ``ERROR``
* ``shutdown``
* ``Exceptions`` - verify that the brokers can be reached
* ``WARN``
* ``Healthcheck`` errors and warnings

If nothing is obvious, turn :ref:`DEBUG logging on ` and restart |c3-short|.

.. _enable-debug-and-trace-logging:

Enable debug and trace logging
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

#. Open the ``CONFLUENT_HOME/etc/confluent-control-center/log4j.properties`` file. This file is referenced by the ``CONTROL_CENTER_LOG4J_OPTS`` environment variable.

#. Set and export the ``CONTROL_CENTER_LOG4J_OPTS`` environment variable similar to this example:

   .. code:: bash

      export CONTROL_CENTER_LOG4J_OPTS='-Dlog4j.configuration=file:/apps/kafka/config/confluent-control-center/log4j.properties'

#. Set your debugging options:

   - To enable debug logging, change the log level to ``DEBUG`` at the root level:

     .. code:: bash

        log4j.rootLogger=DEBUG, stdout

   - To enable trace logging, change the root logger to ``TRACE`` at the root level:

     .. code:: bash

        log4j.rootLogger=TRACE, stdout

   - To enable additional streams logging, particularly at the request of Confluent Support, follow this example:

     .. code:: bash

        log4j.rootLogger=DEBUG, stdout
        log4j.appender.stdout=org.apache.log4j.ConsoleAppender
        log4j.appender.stdout.layout=org.apache.log4j.EnhancedPatternLayout
        log4j.appender.stdout.layout.ConversionPattern=[%d] %p [%t] %m (%c)%n
        log4j.appender.streams=org.apache.log4j.ConsoleAppender
        log4j.appender.streams.layout=org.apache.log4j.EnhancedPatternLayout
        log4j.appender.streams.layout.ConversionPattern=[%d] %p [%t] %m (%c)%n
        log4j.appender.streams.filter.1=io.confluent.Log4jRateFilter
        # Allows everything that is greater than or equal to specified level
        log4j.appender.streams.filter.1.level=TRACE
        # Allows rate/second logs at less than specified level
        #log4j.appender.streams.filter.1.rate=25
        log4j.logger.org.apache.kafka.streams=INFO, streams
        log4j.additivity.org.apache.kafka.streams=false
        log4j.logger.io.confluent.controlcenter.streams=INFO, streams
        log4j.additivity.io.confluent.controlcenter.streams=false
        log4j.logger.kafka=ERROR, stdout
        log4j.logger.org.apache.kafka=ERROR, stdout
        log4j.logger.org.apache.kafka.clients.consumer=INFO, stdout
        log4j.logger.org.apache.zookeeper=ERROR, stdout
        log4j.logger.org.I0Itec.zkclient=ERROR, stdout

#. :ref:`Restart ` |c3-short|. For more information, see :ref:`c3_properties_files`.

   .. codewithvars:: bash

      ./bin/control-center-stop
      ./bin/control-center-start ../etc/confluent-control-center/control-center.properties

#. When you are done debugging and tracing, reset the log levels back to their defaults and restart |c3-short|:

   .. code:: bash

      log4j.rootLogger=INFO, stdout
      log4j.appender.stdout=org.apache.log4j.ConsoleAppender
      log4j.appender.stdout.layout=org.apache.log4j.EnhancedPatternLayout
      log4j.appender.stdout.layout.ConversionPattern=[%d] %p [%t] %m (%c)%n
      log4j.appender.streams=org.apache.log4j.ConsoleAppender
      log4j.appender.streams.layout=org.apache.log4j.EnhancedPatternLayout
      log4j.appender.streams.layout.ConversionPattern=[%d] %p [%t] %m (%c)%n
      log4j.appender.streams.filter.1=io.confluent.Log4jRateFilter
      # will allow everything that is >=level
      log4j.appender.streams.filter.1.level=WARN
      # will only allow rate/second logs at

` on the broker, clients, and |c3-short|.

* Verify that the prefixes are correct.
* Are the metrics reporter and interceptors installed and configured correctly?
* Verify the topic configurations for all |c3-short| topics: replication factor, timestamp type, min isr, and retention.

  .. codewithvars:: bash

     ./bin/kafka-topics --bootstrap-server --describe

Review input topics
^^^^^^^^^^^^^^^^^^^

* ``_confluent-monitoring`` and ``_confluent-metrics`` are the entry points for |c3-short| data
* Verify that the input topics are created, where host and port (````), and topic (````) are specified:

  .. codewithvars:: bash

     bin/kafka-topics.sh --bootstrap-server --topic

* Verify that data is being produced in the input topics. The security settings must be properly configured in the consumer for this to work. This is accomplished by specifying the properties file that was used to start |c3-short| (e.g., ``control-center.properties``) in the following command, and setting ```` to the topic you wish to read.

  .. codewithvars:: bash

     bin/control-center-console-consumer config/control-center.properties --topic
     {"clientType":"PRODUCER","clientId":"rock-client-producer-4","group...
     {"clientType":"CONSUMER","clientId":"rock-client-consumer-2","group...

Size of clusters
^^^^^^^^^^^^^^^^

For examples on how to size your environment, review the :ref:`Control Center example deployments `.
System check
^^^^^^^^^^^^

Check the system-level metrics where |c3-short| is running, including CPU, memory, disk, and JVM settings. Are the settings within the :ref:`recommended values `?

Frontend request and response
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Using your browser's web developer tools, view **Network** settings to verify that requests and responses are showing the correct data.

If you are working with Confluent Support to debug browser issues, they may ask you to capture a HAR file for them to review. HAR files store all requests/responses between the browser and the server. For example, to generate the HAR file with Google Chrome:

#. Open Google Chrome and go to the page where the issue is occurring.
#. From the Chrome menu bar, select **View** > **Developer** > **Developer Tools**.
#. In the Developer Tools panel, select the **Network** tab.
#. Look for the round Record button in the upper-left corner of the tab and confirm that it is red (activated). If it is grey, click it once to start recording.
#. Check the **Preserve log** box to preserve capture across multiple pages.
#. Reproduce the issue in the browser by interacting with the page.
#. Right-click anywhere on the grid of network requests, select **Save as HAR with content**, and save the file. You can upload the HAR file to your Confluent Support ticket (or review the contents of the file if you are doing your own troubleshooting).

   .. figure:: ../../images/c3-save-as-HAR-with-content.png
      :width: 600px

.. tip:: You can also right-click any row in the developer tools panel and select **Copy >** to copy Network log content as a HAR or cURL file.

REST API
^^^^^^^^

Backend REST API calls are logged in ``c3.log``.

Consumer offset lag
^^^^^^^^^^^^^^^^^^^

Verify that the offset lag for each |c3-short| topic is not increasing over time. Review the ``MetricsAggregateStore`` and ``aggregate-rekey`` topics as they are often the bottleneck.
You will need to run this command multiple times to observe the trend, where |c3-short| version (````) and ID (``control-center-id``) are specified.

.. codewithvars:: bash

   ./bin/kafka-consumer-groups --bootstrap-server --describe --group _confluent-controlcenter--

Enable GC logging
^^^^^^^^^^^^^^^^^

To enable GC logs, restart |c3-short| with the following, where directory (````) is specified:

.. codewithvars:: bash

   CONTROL_CENTER_JVM_PERFORMANCE_OPTS="-server -verbose:gc -Xloggc:/gc.log -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:+CMSScavengeBeforeRemark -XX:+DisableExplicitGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCCause -Djava.awt.headless=true"

Thread dump
^^^^^^^^^^^

Run this command for a thread dump:

.. codewithvars:: bash

   jstack -l $(jcmd | grep -i 'controlcenter\.ControlCenter' | awk '{print $1}') > jstack.out

Data directory
^^^^^^^^^^^^^^

The |c3-short| local state is stored in ``confluent.controlcenter.data.dir``. You can use this command to determine the size of your data directory (````).

.. codewithvars:: bash

   du -h
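
The data directory path is not spelled out in the command above. As a rough sketch, the path can be read from the properties file that was used to start |c3-short| and then passed to ``du``. The helper name ``c3_data_dir`` and the file path in the usage comment are hypothetical, and the sketch assumes that ``confluent.controlcenter.data.dir`` appears as a plain ``key=value`` line.

.. code:: bash

   # Hypothetical helper: print the value of confluent.controlcenter.data.dir
   # from a properties file. If the key appears more than once, the last
   # occurrence is printed.
   c3_data_dir() {
     sed -n 's/^confluent\.controlcenter\.data\.dir=//p' "$1" | tail -n 1
   }

   # Usage (the path is an assumption): summarize the directory size.
   # du -sh "$(c3_data_dir /path/to/control-center.properties)"

If the helper prints nothing, the property is not set in that file and the directory has to be located another way.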