Troubleshooting¶
The following sections provide information about troubleshooting your Confluent Platform deployment.
Logs¶
Logs are sent directly to STDOUT for each pod. Use the command below to view the logs for a pod:
kubectl logs <pod-name> -n <namespace>
Metrics¶
- JMX metrics are available on port 7203 of each pod.
- Jolokia (a REST interface for JMX metrics) is available on port 7777 of each pod.
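As a quick sketch, you can spot-check these metrics from your local machine with kubectl port-forward and curl. The pod name kafka-0 and namespace operator are example values, and the Jolokia read path shown queries a standard JVM MBean rather than a Confluent-specific one:

```shell
# Forward the pod's Jolokia port to your local machine (runs in the background)
kubectl port-forward kafka-0 7777:7777 -n operator &

# Query JVM heap usage through Jolokia's REST interface for JMX
curl http://localhost:7777/jolokia/read/java.lang:type=Memory/HeapMemoryUsage
```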
Debugging¶
Two types of problems can occur while using Operator:
- A problem exists at the infrastructure level. That is, something has gone wrong at the Kubernetes layer.
- A problem exists at the application level. This means that the infrastructure is fine but something has gone wrong with Confluent Platform itself, usually in how something is configured.
You should look for Kubernetes issues first.
Check for potential Kubernetes errors by entering the following command:
kubectl get events -n <namespace>
Then, to check for a specific resource issue, enter the following command (the example uses the resource type pods):
kubectl describe pods <podname> -n <namespace>
If everything looks okay after running the commands above, check the individual pod logs using the following command:
kubectl logs <pod name> -n <namespace>
Confluent Platform containers are configured so application logs go straight to STDOUT. The logs can be read directly with this command. If there is anything wrong at the application level, like an invalid configuration, this will be evident in the logs.
Note
If a pod has been replaced because it crashed and you want to check the previous pod's logs, add --previous to the end of the command above:
kubectl logs <pod name> -n <namespace> --previous
Problems caused by the datacenter infrastructure, such as virtual machine (VM) firewall rules, DNS configuration, and so on, should be resolved by your infrastructure system administrator.
Testing the deployment¶
See the following sections for information about testing cluster communication.
Internal validation¶
On your local machine, enter the following command to display cluster namespace information (using the example namespace operator). This information contains the bootstrap endpoint you need to complete internal validation.
kubectl get kafka -n operator -oyaml
The bootstrap endpoint is shown on the bootstrap.servers line:

... omitted
internalClient: |-
  bootstrap.servers=kafka:9071
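If you want to pull just the endpoint out of that output programmatically, a minimal sketch is shown below. Here a stub file named kafka.yaml stands in for the saved output of the kubectl command above:

```shell
# Stub standing in for `kubectl get kafka -n operator -oyaml > kafka.yaml`
cat <<'EOF' > kafka.yaml
internalClient: |-
  bootstrap.servers=kafka:9071
EOF

# Extract the bootstrap endpoint from the saved YAML
grep -o 'bootstrap.servers=[^ ]*' kafka.yaml
# prints: bootstrap.servers=kafka:9071
```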
On your local machine, use kubectl exec to start a bash session on one of the pods in the cluster. The example uses the default pod name kafka-0 on a Kafka cluster using the default name kafka.

kubectl -n operator exec -it kafka-0 bash
On the pod, create and populate a file named kafka.properties. There is no text editor installed in the containers, so you use the cat command as shown below to create this file. Use CTRL+D to save the file.

Note

The example shows default SASL/PLAIN security parameters. A production environment requires additional security. See Configuring security for additional information.

cat << EOF > kafka.properties
bootstrap.servers=kafka:9071
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="test" password="test123";
sasl.mechanism=PLAIN
security.protocol=SASL_PLAINTEXT
EOF
On the pod, query the bootstrap server using the following command:
kafka-broker-api-versions --command-config kafka.properties --bootstrap-server kafka:9071
You should see output for each of the three Kafka brokers that resembles the following:
kafka-1.kafka.operator.svc.cluster.local:9071 (id: 1 rack: 0) -> (
  Produce(0): 0 to 7 [usable: 7],
  Fetch(1): 0 to 10 [usable: 10],
  ListOffsets(2): 0 to 4 [usable: 4],
  Metadata(3): 0 to 7 [usable: 7],
  LeaderAndIsr(4): 0 to 1 [usable: 1],
  StopReplica(5): 0 [usable: 0],
  UpdateMetadata(6): 0 to 4 [usable: 4],
  ControlledShutdown(7): 0 to 1 [usable: 1],
  OffsetCommit(8): 0 to 6 [usable: 6],
  OffsetFetch(9): 0 to 5 [usable: 5],
  FindCoordinator(10): 0 to 2 [usable: 2],
  JoinGroup(11): 0 to 3 [usable: 3],
  Heartbeat(12): 0 to 2 [usable: 2],
  ... omitted
This output validates internal communication within your cluster.
External validation¶
Complete the following steps to validate external communication.
- Prerequisites:
- Access to download the Confluent Platform.
- Outside access to the Kafka brokers is only available through an external load balancer. You can’t complete these steps if you did not enable an external load balancer and add DNS entries when configuring the provider YAML file.
- To access the cluster nodes from your local machine, you must add the DNS entries to your
/etc/hosts
file.
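For example, assuming three brokers and the illustrative load balancer IP addresses shown, the /etc/hosts entries might look like the following. The b0/b1/b2 broker host names and the IP addresses are placeholders; use the DNS names and addresses you configured for your load balancers:

```
192.0.2.10  kafka.<providerdomain>
192.0.2.11  b0.<providerdomain>
192.0.2.12  b1.<providerdomain>
192.0.2.13  b2.<providerdomain>
```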
Note
The examples use default component names.
You use the Confluent CLI running on your local machine to complete external validation. The Confluent CLI is included with the Confluent Platform. On your local machine, download and start the Confluent Platform.
On your local machine, use the kubectl get kafka -n operator -oyaml command to get the bootstrap servers endpoint for external clients. In the example below, the bootstrap servers endpoint is kafka.<providerdomain>:9092.

... omitted
externalClient: |-
  bootstrap.servers=kafka.<providerdomain>:9092
  sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="test" password="test123";
  sasl.mechanism=PLAIN
  security.protocol=SASL_PLAINTEXT
On the local machine where you have Confluent Platform running, create and populate a file named kafka.properties based on the example used in the previous step.

bootstrap.servers=kafka.<providerdomain>:9092
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="test" password="test123";
sasl.mechanism=PLAIN
security.protocol=SASL_PLAINTEXT
Note
The example shows default SASL/PLAIN security parameters. A production environment requires additional security. See Configuring security for additional information.
Using the Confluent CLI on your local machine, create a topic using the bootstrap endpoint kafka.<providerdomain>:9092. The example below creates a topic with 1 partition and 3 replicas.

kafka-topics --bootstrap-server kafka.<providerdomain>:9092 \
  --command-config kafka.properties \
  --create --replication-factor 3 \
  --partitions 1 --topic example
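To confirm the topic was created with the expected partition and replica counts, you can describe it with the same bootstrap endpoint and client properties (a sketch using the standard kafka-topics --describe option):

```shell
kafka-topics --bootstrap-server kafka.<providerdomain>:9092 \
  --command-config kafka.properties \
  --describe --topic example
```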
Using the Confluent CLI on your local machine, produce to the new topic using the bootstrap endpoint kafka.<providerdomain>:9092. Note that the bootstrap server load balancer is the only Kafka broker endpoint required because it provides gateway access to the load balancers for all Kafka brokers.

seq 10000 | kafka-console-producer \
  --topic example --broker-list kafka.<providerdomain>:9092 \
  --producer.config kafka.properties
In a new terminal on your local machine, use the Confluent CLI to consume from the new topic.
kafka-console-consumer --from-beginning \
  --topic example --bootstrap-server kafka.<providerdomain>:9092 \
  --consumer.config kafka.properties
Successful completion of these steps validates external communication with your cluster.
Common CLI commands¶
The following sections provide common commands that you may find useful when managing the cluster.
Helm commands¶
Use the following commands to display component installation notes and other component release information.
Show the component release information.
helm list --namespace <namespace-name>
or
helm list --kube-context <kubernetes-cluster-name> --namespace <namespace-name>
Show the current status and release notes.
helm status <component-release-name>
Show the template used to deploy the component.
helm get <component-release-name>
Uninstall a component release from the cluster.
helm uninstall <component-release-name> --namespace <namespace-name>
kubectl commands¶
Use the following commands to get information about your cluster.
Get Kubernetes cluster name.
kubectl config current-context
Set the context. Use this when using multiple namespaces in your environment. For troubleshooting, you may need to set a context even when you have only one namespace.
kubectl config set-context <kubernetes-cluster-name> --namespace=<namespace-name>
Get cluster information.
kubectl get kafka -n <namespace-name> -oyaml
Get cluster nodes.
kubectl get nodes
Get node details.
kubectl describe node <node>
Check for Kubernetes issues.
kubectl get events -n <namespace>
Tip
The following two commands are useful for getting the internal and external IP addresses for Confluent Platform components.
Get services for a namespace (for example, operator) or all namespaces.
kubectl get services -n operator
kubectl get services --all-namespaces
Get all pods in all namespaces.
kubectl get pods --all-namespaces
Get Kafka broker details (for example, operator).
kubectl get kafka -n operator -oyaml
Get pods with details within a namespace (for example, operator).
kubectl describe pods -n operator
Get pod details.
kubectl describe pods <podname> -n <namespace>
Get pod logs.
kubectl logs <pod name> -n <namespace>
Access a pod container.
kubectl -n <namespace> exec -it <podname> bash
Access a pod container when there is more than one container.
kubectl -n <namespace> exec -it <pod name> --container <container> bash
Run a command.
kubectl exec <pod name> <command>
Run a command if there is more than one container.
kubectl exec <pod name> --container <container> <command>
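For example, to print the host name inside a broker container (the pod name kafka-0 and namespace operator are example values; on newer kubectl versions, separate the command from kubectl's own flags with --):

```shell
kubectl -n operator exec kafka-0 -- hostname
```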