Generate Diagnostics with the Diagnostics Bundle Tool for Confluent Platform¶
The Confluent Platform Diagnostics Bundle Tool collects diagnostic information about your Confluent Platform installation and compresses it into a .tar.gz file, which you can upload to Confluent Support for further analysis.
This tool currently collects diagnostics on Kafka brokers and Kafka Connect.
You can perform the following tasks with this tool:
- Collect diagnostics to submit to Confluent
- Collect logs from a specific time frame
- Collect diagnostics with an input file
- Evaluate the components and modify the diagnostics that are generated
After you collect diagnostics, you can upload the file to Confluent Support.
Prerequisites¶
To run this tool you need the following:
- Confluent Platform version 6.1 or later
- Java 8 or later (also required by Confluent Platform)
- Java Virtual Machine Process Status (jps) tool installed, which is required to Discover and modify components
- Permission to write to the current directory
- Read access to files being collected (for complete collection)
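Before you run the tool, you can optionally verify these prerequisites from a shell. The following commands are a minimal sketch of such a check:
java -version
jps -l
touch .diag-write-test && rm .diag-write-test && echo "write permission OK"
The first two commands confirm that Java and the jps tool are available on the PATH; the last confirms write access to the current directory.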
Installation¶
To install the tool:
Download the Diagnostics Bundle Tool JAR file from the Confluent download page.
For example, you can use the wget or curl command to download it. You should download the latest version of the tool from the download page. The following examples download version 1.0.3; check for the latest version and change the version string if needed.
Example using wget:
wget https://packages.confluent.io/tools/diagnostics-bundle/diagnostics-bundle-1.0.3.jar
Example using curl:
curl -O https://packages.confluent.io/tools/diagnostics-bundle/diagnostics-bundle-1.0.3.jar
Copy the JAR file to a directory on each Confluent Platform node where you want to run it and collect diagnostics.
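For example, you might use scp to copy the JAR to a node; the user name, hostname, and target directory below are placeholders:
scp diagnostics-bundle-1.0.3.jar confluent@broker-1.example.com:/home/confluent/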
For release details, see the Release notes section.
Collect diagnostics¶
To collect diagnostics for Confluent Platform to share with Confluent, use the collect command on each node where Confluent Platform is running. Using collect with no options results in the tool performing basic sanitization of the output data, meaning password strings and MAC addresses are redacted, and log files from the seven days before the current time are collected. You can also specify specific logs, or use the Log Redaction Tool or the discover and plan commands to help sanitize the output.
Important
You should review all information in the output file before you upload the file to Confluent Support.
Following is an example of the collect command:
java -jar ./diagnostics-bundle-<version>.jar collect
The tool runs for a couple of minutes and, when complete, generates a zipped output bundle named like the following: diagnostics-output-<hostname>-<YYYY>-<MM>-<DD>-<HH>-<MM>-<ss>.tar.gz.
This bundle contains:
- Confluent components logs
- Confluent component configuration files
- Confluent component process information
- Confluent component metrics if JMX is enabled
- Host information
Your final line of output should resemble:
Diagnostics output has been zipped and written to: /home/confluent/diagnostics-output-PF2T6DCF-2023-08-22-23-38-55.tar.gz
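To review the bundle contents before you upload, you can extract it with tar, using the file name reported in the output. For example:
tar -xzf diagnostics-output-PF2T6DCF-2023-08-22-23-38-55.tar.gz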
This bundle contains a metadata directory and a directory for each Confluent component. The metadata directory contains input files, if they were specified, and other metadata for the tool. Each component directory contains a subdirectory and files for each diagnostic that was collected. Following is an example of the file structure of the tool output:
└── diagnostics-output-ip-10-0-206-212-2023-08-24-12-38-22
├── _meta
│   ├── component-plan.yaml
│   ├── diagnostics.log
│   └── discovered-components.yaml
├── host
│ └── shell
│ └── mpstat-P-ALL-1-10.yaml
└── kafka
├── logs
│ └── var-log-kafka
│ ├── controller.log
│ └── data-balancer.log
├── metrics
│ └── metrics.txt
├── properties
│ ├── log4j.properties
│ └── server.properties
└── shell
└── du-var-lib-kafka-data.yaml
Important
Do not upload event data or message content to support tickets. See Upload the files.
Collect logs from a specific time frame¶
The collect
command has the following options to specify the log files that are collected:
Option | Default | Details |
---|---|---|
--all-logs | False | Collects all logs related to all Confluent components. This option overrides the other options. |
--logs-start=<startTimestamp> | 7 days ago | Collects log files modified after the specified timestamp. The timestamp must be specified in ISO-8601 format. Example: 2023-09-01T00:00:00Z |
--logs-end=<endTimestamp> | The current time | Collects log files created before the specified timestamp. The timestamp must be specified in ISO-8601 format. Example: 2023-09-01T00:00:00Z |
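For example, the following command collects log files for the first week of September 2023 (the timestamps are illustrative):
java -jar ./diagnostics-bundle-<version>.jar collect --logs-start=2023-09-01T00:00:00Z --logs-end=2023-09-08T00:00:00Z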
Collect diagnostics with an input file¶
You can control what diagnostics are generated by specifying an input file when you run the collect command.
You can use one of the following switches with the collect command to specify the diagnostics that are generated:
- --from-discover: Provides a custom discover file to specify the components that are evaluated for diagnostics. For more information about the discover file, see Discover and modify components.
- --from-plan: Provides a custom plan file to specify what is included in the diagnostics. For more information about the plan file, see Understand and modify the plan file.
- --from-config: Enables you to specify a config file for a component that is not currently running. To use this switch, provide a configuration file for a Confluent Platform component. For more information about the config file, see Use a config file.
Following is an example of the --from-plan switch:
java -jar "diagnostics-bundle-<version>.jar" collect --from-plan planfile.yaml
Evaluate and modify the diagnostics output¶
You can evaluate and modify the diagnostics output by the Diagnostics Bundle Tool by creating or modifying the list of components and the diagnostics collected for those components.
Understand and modify the plan file¶
You can optionally use the plan command to see the files that are examined and the diagnostics that are generated by the tool.
The output from this command also shows you what is excluded, such as passwords and other sensitive data.
Once you have generated this data, you can save it, modify it, and specify the file as input when you generate diagnostics.
The following example shows you how to run this command on the node that is running Confluent Platform:
java -jar "diagnostics-bundle-<version>.jar" plan
Following is example output (YAML) from the tool:
components:
- type: kafka
processId: 21189
diagnostics:
- type: shell
timeoutInSeconds: 35
commands:
- jinfo 21189
- top -n 10 -b -p 21189
excludedKeywords:
- password
- secret
- credential
- auth
- token
- key
# So that MAC address is not collected.
- ^([0-9A-Fa-f]{2}[:-]){5}([0-9A-Fa-f]{2})$
- type: logs
logDirectories:
# The log files in these directories will be collected.
- /var/log/kafka
options:
startTimestamp: 2023-08-17T05:35:04Z
endTimestamp: 2023-08-24T05:35:04Z
- type: metrics
jmxPort: 9999
jmxHost: localhost
pollingIntervalInSeconds: 6
pollingIterations: 10
include:
# JVM
- java.lang:type=OperatingSystem
- java.lang:type=Memory
# JMX
- kafka.cluster:type=Partition,*
- kafka.controller:type=ControllerStats,*
- kafka.server:type=BrokerTopicMetrics,*
- kafka.server:type=DelayedOperationPurgatory,*
excludeMetricNames:
- kafka.controller:name=ListPartitionReassignmentRateAndTimeMs,type=ControllerStats,*
excludeMetricAttributes:
- metricName: kafka.server:name=LogAppendLatencyMs,type=BrokerTopicMetrics
attributes:
- 50thPercentile
- 75thPercentile
- 95thPercentile
- type: properties
files:
- type: componentConfigurationFile
path: /etc/kafka/server.properties
- type: log4jConfigurationFile
path: /etc/kafka/log4j.properties
# These properties will not be collected.
exclude:
- confluent.ssl.key.password
- confluent.ssl.keystore.location
- confluent.ssl.keystore.password
- delegation.token.secret.key
- password.encoder.secret
- sasl.jaas.config
The file provides a list of components that diagnostics will be generated for. Each component has:
- A type. Currently kafka, connect, and host are supported.
- The process ID for the component (optional).
- A list of diagnostics that will be collected.
You can modify this file to meet the disclosure requirements of your organization. The following table describes each diagnostics type and its elements in more detail; the element names correspond to the fields shown in the example plan file above.
To skip collecting diagnostics of a particular diagnostics type, omit that section from the file you pass to the collect command. For more information, see Collect diagnostics with an input file.
Diagnostics type | Element detail |
---|---|
shell | commands: the shell commands to run. timeoutInSeconds: the timeout applied to the commands. excludedKeywords: keywords and patterns whose matching values are redacted from the output. |
logs | logDirectories: the directories whose log files are collected. options: the startTimestamp and endTimestamp that bound which log files are collected. |
metrics | jmxPort and jmxHost: the JMX endpoint to poll. pollingIntervalInSeconds and pollingIterations: how often and how many times metrics are polled. include: the JMX object names to collect. excludeMetricNames and excludeMetricAttributes: metrics and metric attributes to omit. |
properties | files: the configuration files to collect, each with a type and path. exclude: the property keys that are not collected. |
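A typical workflow is to save the plan output to a file, edit out anything you do not want collected, and pass the file back to the collect command. The following sketch assumes the plan YAML can be redirected to a file as shown; the file name is a placeholder:
java -jar "diagnostics-bundle-<version>.jar" plan > planfile.yaml
java -jar "diagnostics-bundle-<version>.jar" collect --from-plan planfile.yaml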
Discover and modify components¶
You can optionally use the discover command with the diagnostics tool to see what components will be evaluated. This command discovers components that are currently running and would be evaluated for diagnostic purposes. You can save this output, modify it, and specify the file as input when you generate the diagnostics for Confluent.
The following example shows you how to run this command on the node that is running Confluent Platform:
java -jar "diagnostics-bundle-<version>.jar" discover
Following is example output (YAML) from this command:
# Diagnostic Bundle found the following supported components on this node
components:
- name: kafka
processId: 21977
log4jConfigurationFile: /etc/kafka/log4j.properties
jmxPort: 9999
jmxHost: localhost
componentConfigurationFile: /etc/kafka/server.properties
log4jDirectories:
- /var/log/kafka
dataDirectory: /var/lib/kafka/data
The following table describes each element in more detail:
Element | Element detail |
---|---|
name | Type of the component for which diagnostics are collected. Currently supported: kafka, connect. Required. |
processId | Optional process ID if the component is running. |
log4jConfigurationFile | Optional path to the log4j configuration file. |
jmxPort | Optional JMX port for the component. |
jmxHost | Optional JMX host of the component. If not specified, defaults to localhost. |
componentConfigurationFile | Optional path to the component configuration file. |
log4jDirectories | Optional list of directories that contain the log4j log files to be collected. |
dataDirectory | Optional directory containing Kafka log data. |
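As with the plan file, you can save the discover output to a file, adjust it, and pass it back to the collect command. The following sketch assumes the discover YAML can be redirected as shown; the file name is a placeholder:
java -jar "diagnostics-bundle-<version>.jar" discover > discoverfile.yaml
java -jar "diagnostics-bundle-<version>.jar" collect --from-discover discoverfile.yaml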
Use a config file¶
You can specify a configuration file for the collect command that identifies components to be evaluated. The YAML file should contain the name of the component and list the paths to the configuration and log files for the component. The file must have the following format:
components:
- name: kafka
componentConfigurationFile: /etc/kafka/server.properties
log4jConfigurationFile: /etc/kafka/log4j.properties
log4jLogDirectories:
- /etc/kafka
Element | Element detail |
---|---|
name | Type of the component for which diagnostics are collected. Currently supported: kafka, connect. Required. |
componentConfigurationFile | Optional path to the component configuration file. |
log4jConfigurationFile | Optional path to the log4j configuration file. |
log4jLogDirectories | Optional list of directories that contain the log4j log files to be collected. |
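For example, after saving the YAML above as a file, pass it to the collect command with the --from-config switch; the file name is a placeholder:
java -jar "diagnostics-bundle-<version>.jar" collect --from-config kafka-config.yaml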
Upload the files¶
Once you have generated the diagnostics bundle, it is important that you upload it in a secure manner. In addition, you should never upload sensitive information to Confluent. If you need to sanitize your files, see Evaluate and modify the diagnostics output.
To upload files, use Secure File Transfer, which enables file encryption and tracking of users that access the files. For more information, see Required Access to Confluent Network Sites.
Errors and troubleshooting¶
Info logs for the tool are output to the console by default. To view debug logs in the console, use the --verbose/-v switch.
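For example:
java -jar ./diagnostics-bundle-<version>.jar collect --verbose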
If an exception occurs, only the first line of the exception is output to the console. If you are using the collect command and an exception occurs, you can view the full stack trace in the application log in the output directory. This file is named diagnostics-output-<hostname>-<YYYY>-<MM>-<DD>-<HH>-<MM>-<ss>/_meta/diagnostics.log.
If the tool encounters failures while collecting diagnostics, it collects what it can. For example, if the tool cannot collect log files due to a permissions issue, or cannot run a command due to missing dependencies, the tool still runs and collects the remaining diagnostics.
If the tool cannot discover components, plan the diagnostics collection, or collect any diagnostics, this is considered a fatal error and the tool does not run. For example, this occurs when a YAML file provided to the collect command is invalid or nonexistent. You can use the information provided in the console or check the tool logs to determine what error occurred.
Troubleshoot common issues¶
Use the following information to troubleshoot common errors that might occur.
Error message | Root cause/fix |
---|---|
Error: failed to create a directory during collection | This is a fatal error that occurs when the tool does not have permission to write to the current directory. Make sure the user running the tool has appropriate write permissions. |
Error: The file provided via --from-discover/-d contains unsupported component(s) or Error: The file provided via --from-config/-c contains unsupported component(s) | This is a fatal error that occurs when the component type in a discover or config file is not supported. Currently kafka and connect are supported component types. |
Error: failed to load the file provided via --from-plan/-p | This is a fatal error that occurs when the plan file cannot be located or is not valid YAML. This can also occur when the file is formatted correctly but a field does not meet the validation requirements; for example, a required field is omitted. Make sure you provided the correct path to the file, that it is valid YAML, and that it meets the requirements specified in the plan section. |
No diagnostic output folder is generated. | Note that an output folder is not generated for plan or discover. If you are running the collect command, this indicates a fatal error occurred before the diagnostic output was generated. Use the -v option to enable verbose logging and further diagnose the issue. |
No components are discovered although supported components are running. | This is a non-fatal error that occurs when the user running the diagnostic tool does not have permissions to access the relevant JVMs. This is required because the tool runs jps. To mitigate this issue, ensure that a user with the correct permissions runs the tool, such as the user who started the Confluent components. |
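For example, if the Kafka broker runs as a dedicated kafka service account, running the tool as that user lets jps see the broker JVM; the user name here is a placeholder:
sudo -u kafka java -jar ./diagnostics-bundle-<version>.jar collect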
Release notes¶
[Dec 8, 2023] Version 1.0.1¶
- Upgraded FasterXML/jackson to version 2.15.3 due to possible CVE-2023-35116 vulnerability.