Important

You are viewing documentation for an older version of Confluent Platform.

Troubleshooting

The following sections provide information about troubleshooting your Confluent Platform deployment.

Logs

Logs are sent directly to STDOUT for each pod. Use the following command to view the logs for a pod:

kubectl logs <pod-name> -n <namespace>
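To capture logs for every pod in a namespace at once (for example, before opening a support case), you could wrap the command above in a small shell function. This is a minimal sketch: collect_pod_logs and the ./pod-logs output directory are hypothetical names, and it assumes each pod runs a single container (for multi-container pods, add --container).

```shell
#!/bin/sh
# Hypothetical helper: write the current logs (and, if a container has
# restarted, the previous container's logs) of every pod in a namespace
# to one file per pod.
collect_pod_logs() {
  ns="$1"
  outdir="${2:-./pod-logs}"
  mkdir -p "$outdir"
  # "-o name" prints entries such as "pod/kafka-0"; strip the "pod/" prefix.
  for pod in $(kubectl get pods -n "$ns" -o name | sed 's|^pod/||'); do
    kubectl logs "$pod" -n "$ns" > "$outdir/$pod.log" 2>&1
    # --previous fails when there is no earlier container instance;
    # in that case, discard the empty file.
    kubectl logs "$pod" -n "$ns" --previous > "$outdir/$pod.previous.log" 2>/dev/null \
      || rm -f "$outdir/$pod.previous.log"
  done
}
```

For example, collect_pod_logs operator ./operator-logs writes kafka-0.log, kafka-1.log, and so on.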

Metrics

  • JMX metrics are available on port 7203 of each pod.
  • Jolokia (a REST interface for JMX metrics) is available on port 7777 of each pod.
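Because Jolokia exposes JMX over HTTP, you can read a metric from your local machine by port-forwarding port 7777 and sending an HTTP request. The sketch below assumes the agent serves Jolokia's standard /jolokia/read/<mbean>/<attribute> path and that curl is installed locally; jolokia_read_url and read_jolokia are hypothetical names, and kafka.server:type=KafkaServer,name=BrokerState is only an example Kafka MBean.

```shell
#!/bin/sh
# Sketch: read a JMX attribute through Jolokia on a Confluent Platform pod.
# ASSUMPTION: the Jolokia agent uses the standard /jolokia/read/ URL path.

# Build a Jolokia read URL: local port, MBean name, optional attribute.
jolokia_read_url() {
  port="$1"; mbean="$2"; attr="${3:-}"
  if [ -n "$attr" ]; then
    printf 'http://localhost:%s/jolokia/read/%s/%s\n' "$port" "$mbean" "$attr"
  else
    printf 'http://localhost:%s/jolokia/read/%s\n' "$port" "$mbean"
  fi
}

# Port-forward 7777 from the pod, issue one read request, close the tunnel.
read_jolokia() {
  pod="$1"; ns="$2"; mbean="$3"; attr="${4:-}"
  kubectl port-forward "pod/$pod" 7777:7777 -n "$ns" >/dev/null 2>&1 &
  pf_pid=$!
  sleep 2   # crude wait for the tunnel to come up
  curl -s "$(jolokia_read_url 7777 "$mbean" "$attr")"
  kill "$pf_pid"
}
```

For example: read_jolokia kafka-0 operator 'kafka.server:type=KafkaServer,name=BrokerState' Value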

Debugging

Problems that occur while using Operator fall into two types:

  • A problem exists at the infrastructure level. That is, something has gone wrong at the Kubernetes layer.
  • A problem exists at the application level. This means that the infrastructure is fine but something has gone wrong with Confluent Platform itself, usually in how something is configured.

You should look for Kubernetes issues first.

Check for potential Kubernetes errors by entering the following command:

kubectl get events -n <namespace>

Then, to check for an issue with a specific resource, enter the following command (this example uses the pods resource type):

kubectl describe pods <podname> -n <namespace>

If everything looks okay after running the commands above, check the individual pod logs using the following command:

kubectl logs <pod name> -n <namespace>

Confluent Platform containers are configured so application logs go straight to STDOUT. The logs can be read directly with this command. If there is anything wrong at the application level, like an invalid configuration, this will be evident in the logs.

Note

If a pod's container crashed and was restarted, add --previous to the end of the command above to view the logs of the previous container instance.

Problems caused by the datacenter infrastructure, such as virtual machine (VM) firewall rules or DNS configuration, should be resolved by your infrastructure system administrator.
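The triage order described in this section (Kubernetes events, then the resource description, then current and previous logs) can be sketched as a single shell function. triage_pod is a hypothetical name, and sorting events by creation timestamp is an addition for readability rather than part of the steps above.

```shell
#!/bin/sh
# Sketch: run the Kubernetes-level checks first, then the application logs.
triage_pod() {
  pod="$1"; ns="$2"
  echo "== Events in namespace $ns =="
  kubectl get events -n "$ns" --sort-by=.metadata.creationTimestamp
  echo "== Description of pod $pod =="
  kubectl describe pods "$pod" -n "$ns"
  echo "== Current logs =="
  kubectl logs "$pod" -n "$ns"
  echo "== Previous logs (present only after a container restart) =="
  kubectl logs "$pod" -n "$ns" --previous 2>/dev/null || echo "(no previous container)"
}
```

For example: triage_pod kafka-0 operator | less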

Testing the deployment

See the following sections for information about testing cluster communication.

Internal validation

  1. On your local machine, enter the following command to display the Kafka cluster information (this example uses the namespace operator). The output contains the bootstrap endpoint you need to complete internal validation.

    kubectl get kafka -n operator -oyaml
    

    The bootstrap endpoint is shown on the bootstrap.servers line.

    ... omitted
    
       internalClient: |-
          bootstrap.servers=kafka:9071
    
  2. On your local machine, use kubectl exec to start a bash session on one of the pods in the cluster. The example uses the default pod name kafka-0 on a Kafka cluster using the default name kafka.

    kubectl -n operator exec -it kafka-0 -- bash
    
  3. On the pod, create and populate a file named kafka.properties. No text editor is installed in the containers, so use the cat command with a here-document, as shown below, to create the file. The file is written when you enter the final EOF line.

    Note

    The example shows default SASL/PLAIN security parameters. A production environment requires additional security. See Configuring security for additional information.

    cat << EOF > kafka.properties
    bootstrap.servers=kafka:9071
    sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="test" password="test123";
    sasl.mechanism=PLAIN
    security.protocol=SASL_PLAINTEXT
    EOF
    
  4. On the pod, query the bootstrap server using the following command:

    kafka-broker-api-versions --command-config kafka.properties --bootstrap-server kafka:9071
    

    You should see output for each of the three Kafka brokers that resembles the following:

    kafka-1.kafka.operator.svc.cluster.local:9071 (id: 1 rack: 0) -> (
       Produce(0): 0 to 7 [usable: 7],
       Fetch(1): 0 to 10 [usable: 10],
       ListOffsets(2): 0 to 4 [usable: 4],
       Metadata(3): 0 to 7 [usable: 7],
       LeaderAndIsr(4): 0 to 1 [usable: 1],
       StopReplica(5): 0 [usable: 0],
       UpdateMetadata(6): 0 to 4 [usable: 4],
       ControlledShutdown(7): 0 to 1 [usable: 1],
       OffsetCommit(8): 0 to 6 [usable: 6],
       OffsetFetch(9): 0 to 5 [usable: 5],
       FindCoordinator(10): 0 to 2 [usable: 2],
       JoinGroup(11): 0 to 3 [usable: 3],
       Heartbeat(12): 0 to 2 [usable: 2],
    
    ... omitted
    

    This output validates internal communication within your cluster.
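The internal validation steps can also be driven from your local machine without an interactive shell. The sketch below is an alternative under stated assumptions, not a replacement for the steps above: internal_bootstrap, sasl_plain_props, and validate_internal are hypothetical names, it uses the default test/test123 credentials from the example, and it assumes sh is available in the container and /tmp is writable.

```shell
#!/bin/sh
# Hypothetical, non-interactive variant of the internal validation steps.
# ASSUMPTIONS: default test/test123 SASL/PLAIN credentials, sh available in
# the container, and a writable /tmp inside the container.

# Read `kubectl get kafka -oyaml` output on stdin and print the
# bootstrap.servers value that follows the internalClient key.
internal_bootstrap() {
  awk '/internalClient:/ { found = 1 }
       found && /bootstrap\.servers=/ { sub(/.*bootstrap\.servers=/, ""); print; exit }'
}

# Print a SASL/PLAIN client configuration: bootstrap, username, password.
sasl_plain_props() {
  printf 'bootstrap.servers=%s\n' "$1"
  printf 'sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="%s" password="%s";\n' "$2" "$3"
  printf 'sasl.mechanism=PLAIN\n'
  printf 'security.protocol=SASL_PLAINTEXT\n'
}

# Fetch the endpoint locally, stream the properties file into the pod,
# then query the brokers from inside the pod.
validate_internal() {
  ns="${1:-operator}"
  pod="${2:-kafka-0}"
  bootstrap=$(kubectl get kafka -n "$ns" -oyaml | internal_bootstrap)
  sasl_plain_props "$bootstrap" test test123 |
    kubectl -n "$ns" exec -i "$pod" -- sh -c 'cat > /tmp/kafka.properties'
  kubectl -n "$ns" exec "$pod" -- kafka-broker-api-versions \
    --command-config /tmp/kafka.properties --bootstrap-server "$bootstrap"
}
```

For example: validate_internal operator kafka-0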

External validation

Complete the following steps to validate external communication.

Prerequisites:
  • Access to download the Confluent Platform.
  • Outside access to the Kafka brokers is only available through an external load balancer. You can’t complete these steps unless you enabled an external load balancer in the provider YAML file and added the corresponding DNS entries.
  • To access the cluster nodes from your local machine, you must add the DNS entries to your /etc/hosts file.

Note

The examples use default component names.

  1. External validation uses the Confluent CLI running on your local machine. The Confluent CLI is included with Confluent Platform. On your local machine, download and start Confluent Platform.

  2. On your local machine, use the kubectl get kafka -n operator -oyaml command to get the bootstrap servers endpoint for external clients. In the example below, the bootstrap servers endpoint is kafka.<providerdomain>:9092.

    ... omitted
    
    externalClient: |-
       bootstrap.servers=kafka.<providerdomain>:9092
       sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="test" password="test123";
       sasl.mechanism=PLAIN
       security.protocol=SASL_PLAINTEXT
    
  3. On your local machine, create and populate a file named kafka.properties based on the externalClient output from the previous step.

    bootstrap.servers=kafka.<providerdomain>:9092
    sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="test" password="test123";
    sasl.mechanism=PLAIN
    security.protocol=SASL_PLAINTEXT
    

    Note

    The example shows default SASL/PLAIN security parameters. A production environment requires additional security. See Configuring security for additional information.

  4. Using the Confluent CLI on your local machine, create a topic using the bootstrap endpoint kafka.<providerdomain>:9092. The example below creates a topic with 1 partition and 3 replicas.

    kafka-topics --bootstrap-server kafka.<providerdomain>:9092 \
    --command-config kafka.properties \
    --create --replication-factor 3 \
    --partitions 1 --topic example
    
  5. Using the Confluent CLI on your local machine, produce to the new topic using the bootstrap endpoint kafka.<providerdomain>:9092. The bootstrap load balancer is the only Kafka broker endpoint you need to specify, because it provides gateway access to the load balancers for all Kafka brokers.

    seq 10000 | kafka-console-producer \
    --topic example --broker-list kafka.<providerdomain>:9092 \
    --producer.config kafka.properties
    
  6. In a new terminal on your local machine, use the Confluent CLI to consume from the new topic.

    kafka-console-consumer --from-beginning \
    --topic example --bootstrap-server kafka.<providerdomain>:9092 \
    --consumer.config kafka.properties
    

Successful completion of these steps validates external communication with your cluster.
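The external validation steps can be sketched as a script that builds kafka.properties directly from the externalClient section and then runs the same create, produce, and consume commands. This is a hypothetical helper (external_props and external_round_trip are illustrative names); it assumes the externalClient block is a literal block of key=value lines, as in the example output, and that the Kafka command-line tools are on your PATH. The --max-messages and --timeout-ms options are added so the consumer exits instead of waiting indefinitely.

```shell
#!/bin/sh
# Sketch: external validation driven from `kubectl get kafka -oyaml`.

# Read YAML on stdin; print the key=value lines that follow "externalClient:",
# with leading indentation removed, stopping at the first other line.
external_props() {
  awk '/externalClient:/ { inblock = 1; next }
       inblock && /^[ \t]*[A-Za-z.]+=/ { sub(/^[ \t]+/, ""); print; next }
       inblock { exit }'
}

# Create the "example" topic, produce 10000 records, and count what is consumed.
external_round_trip() {
  ns="${1:-operator}"
  kubectl get kafka -n "$ns" -oyaml | external_props > kafka.properties
  bootstrap=$(sed -n 's/^bootstrap\.servers=//p' kafka.properties)
  kafka-topics --bootstrap-server "$bootstrap" --command-config kafka.properties \
    --create --replication-factor 3 --partitions 1 --topic example
  seq 10000 | kafka-console-producer --topic example --broker-list "$bootstrap" \
    --producer.config kafka.properties
  kafka-console-consumer --from-beginning --topic example \
    --bootstrap-server "$bootstrap" --consumer.config kafka.properties \
    --max-messages 10000 --timeout-ms 30000 | wc -l
}
```

If every record round-trips, external_round_trip operator prints 10000 as its last line.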

Common CLI commands

The following commands may be useful when managing the cluster.

Helm commands

Use the following commands to display component installation notes and other component release information.

Show the component release information.

helm list --namespace <namespace-name>

or

helm list --kube-context <kubernetes-cluster-name> --namespace <namespace-name>

Show the current status and release notes.

helm status <component-release-name> --namespace <namespace-name>

Show the manifest (rendered templates) used to deploy the component.

helm get manifest <component-release-name> --namespace <namespace-name>

Uninstall a component release from the cluster.

helm uninstall <component-release-name> --namespace <namespace-name>
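To review the status and release notes of every release in a namespace at once, you could loop over the release names. This is a minimal sketch assuming Helm 3, where helm list --short prints only release names; release_status_all is a hypothetical name.

```shell
#!/bin/sh
# Sketch: print "helm status" output for every release in one namespace.
release_status_all() {
  ns="$1"
  for rel in $(helm list --namespace "$ns" --short); do
    echo "== $rel =="
    helm status "$rel" --namespace "$ns"
  done
}
```

For example: release_status_all operator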

kubectl commands

Use the following commands to get information about your cluster.

Get the name of the current context (which usually identifies the Kubernetes cluster).

kubectl config current-context

Set the default namespace for a context so that you can omit -n from subsequent commands. This is useful when your environment has multiple namespaces; for troubleshooting, you may want to set it even when you have only one namespace.

kubectl config set-context <kubernetes-cluster-name> --namespace=<namespace-name>
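kubectl can also change the namespace of whichever context is currently active (using --current instead of a context name), and you can then confirm which namespace commands will use. A sketch; use_namespace and current_namespace are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: set and confirm the default namespace of the active context.
use_namespace() {
  # --current targets the active context, so no context name is needed.
  kubectl config set-context --current --namespace="$1"
}

current_namespace() {
  # --minify limits the view to the active context. An empty result means
  # no namespace is set, in which case kubectl falls back to "default".
  ns=$(kubectl config view --minify -o jsonpath='{..namespace}')
  printf '%s\n' "${ns:-default}"
}
```

For example: use_namespace operator && current_namespace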

Get cluster information.

kubectl get kafka -n <namespace-name> -oyaml

Get cluster nodes.

kubectl get nodes

Get node details.

kubectl describe node <node>

Check for Kubernetes issues.

kubectl get events -n <namespace>

Tip

The following two commands are useful for getting the internal and external IP addresses for Confluent Platform components.

Get services for a namespace (for example, operator) or all namespaces.

kubectl get services -n operator
kubectl get services --all-namespaces

Get all pods in all namespaces.

kubectl get pods --all-namespaces
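When the pod list is long, filtering it down to pods that are not healthy can speed up triage. This sketch assumes kubectl's default column layout with --no-headers (NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE); unhealthy_pods is a hypothetical name, and treating Completed pods as healthy is a judgment call.

```shell
#!/bin/sh
# Sketch: list pods whose STATUS is not Running, or that are Running but
# have containers that are not ready (READY shows e.g. 0/1).
unhealthy_pods() {
  kubectl get pods --all-namespaces --no-headers |
    awk '{
      split($3, ready, "/")          # READY column, e.g. "1/1"
      if ($4 == "Completed") next    # finished jobs are not failures
      if ($4 != "Running" || ready[1] != ready[2]) print $1, $2, $3, $4
    }'
}
```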

Get Kafka cluster details (for example, in the operator namespace).

kubectl get kafka -n operator -oyaml

Get pods with details within a namespace (for example, operator).

kubectl describe pods -n operator

Get pod details.

kubectl describe pods <podname> -n <namespace>

Get pod logs.

kubectl logs <pod name> -n <namespace>

Open a shell in a pod container.

kubectl -n <namespace> exec -it <podname> -- bash

Open a shell in a specific container when the pod has more than one container.

kubectl -n <namespace> exec -it <podname> --container <container> -- bash

Run a command in a pod.

kubectl exec <podname> -n <namespace> -- <command>

Run a command in a specific container when the pod has more than one container.

kubectl exec <podname> -n <namespace> --container <container> -- <command>
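If you don't know which container names to pass to --container, you can read them from the pod spec with a JSONPath query. The sketch below is hypothetical (pod_containers and exec_in_each_container are illustrative names) and lists only regular containers, not init containers.

```shell
#!/bin/sh
# Sketch: list a pod's container names, then run a command in each one.
pod_containers() {
  kubectl get pod "$1" -n "$2" -o jsonpath='{.spec.containers[*].name}'
}

exec_in_each_container() {
  pod="$1"; ns="$2"; shift 2
  for c in $(pod_containers "$pod" "$ns"); do
    echo "== $pod/$c =="
    kubectl exec "$pod" -n "$ns" --container "$c" -- "$@"
  done
}
```

For example: exec_in_each_container kafka-0 operator hostname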