Troubleshoot Confluent Operator

The following sections provide information about troubleshooting your Confluent Platform deployment.

Logs

Logs are sent directly to STDOUT for each pod. Use the following command to view the logs for a pod:

kubectl logs <pod-name> -n <namespace>
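On a busy pod the full log can be long, so it may help to filter it for problem lines first. The following is a minimal sketch, assuming the component writes log4j-style level names (WARN, ERROR, FATAL) as separate words; log_problems is a hypothetical helper name used only for illustration.

```shell
# Keep only log lines that carry a WARN, ERROR, or FATAL level.
# Assumption: the level appears as a separate word, as in log4j-style output.
log_problems() {
  grep -E -w 'WARN|ERROR|FATAL'
}

# Usage (same placeholders as the command above):
#   kubectl logs <pod-name> -n <namespace> | log_problems
```

If the filter prints nothing, either there are no such lines or the log format differs from the assumption, so fall back to reading the unfiltered output.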

Metrics

  • JMX metrics are available on port 7203 of each pod.
  • Jolokia (a REST interface for JMX metrics) is available on port 7777 of each pod.
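To read a single metric over Jolokia from your local machine, one option is to forward the port and request an MBean attribute. The sketch below is illustrative only: it assumes the agent is served over plain HTTP at Jolokia's default /jolokia context path without authentication, which may not match your deployment. The helper name jolokia_read_url is hypothetical.

```shell
# Build a Jolokia "read" URL for one MBean attribute on port 7777.
# Assumption: the agent uses Jolokia's default /jolokia context path.
jolokia_read_url() {
  # $1 = host, $2 = MBean name, $3 = attribute name
  printf 'http://%s:7777/jolokia/read/%s/%s\n' "$1" "$2" "$3"
}

# Usage, after forwarding the port in another terminal with
#   kubectl port-forward <pod-name> 7777:7777 -n <namespace>
# request the JVM heap usage:
#   curl -s "$(jolokia_read_url localhost java.lang:type=Memory HeapMemoryUsage)"
```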

Debug

Two types of problems can occur while using Operator:

  • A problem exists at the infrastructure level. That is, something has gone wrong at the Kubernetes layer.
  • A problem exists at the application level. This means that the infrastructure is fine but something has gone wrong with Confluent Platform itself, usually in how something is configured.

You should look for Kubernetes issues first.

Check for potential Kubernetes errors by entering the following command:

kubectl get events -n <namespace>
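In a namespace with many events, it can be easier to look only at warnings in chronological order. A minimal sketch, assuming the standard kubectl --field-selector and --sort-by options; warning_events is a hypothetical wrapper name.

```shell
# List only Warning events in a namespace, sorted by creation time (oldest first).
warning_events() {
  # $1 = namespace
  kubectl get events -n "$1" \
    --field-selector type=Warning \
    --sort-by=.metadata.creationTimestamp
}

# Usage:
#   warning_events <namespace>
```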

Then, to check for an issue with a specific resource, enter the following command (using the resource type pods as an example):

kubectl describe pods <pod-name> -n <namespace>

If everything looks okay after running the commands above, check the individual pod logs using the following command:

kubectl logs <pod-name> -n <namespace>

Confluent Platform containers are configured so application logs go straight to STDOUT. The logs can be read directly with this command. If there is anything wrong at the application level, like an invalid configuration, this will be evident in the logs.

Note

If a pod has been replaced because it crashed and you want to check the previous pod’s logs, add --previous to the end of the command above.
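One way to decide whether --previous is worth adding is to look at the container restart counts of the pod. The sketch below assumes the restart counts are read with a kubectl jsonpath query on .status.containerStatuses; previous_flag is a hypothetical helper that only turns those counts into the flag.

```shell
# Print "--previous" when any restart count read from stdin is greater than zero.
# Prints nothing when no container has restarted.
previous_flag() {
  awk '{ for (i = 1; i <= NF; i++) if ($i > 0) found = 1 }
       END { if (found) print "--previous" }'
}

# Usage (same placeholders as the commands above):
#   restarts=$(kubectl get pod <pod-name> -n <namespace> \
#     -o jsonpath='{.status.containerStatuses[*].restartCount}')
#   kubectl logs <pod-name> -n <namespace> $(printf '%s\n' "$restarts" | previous_flag)
```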

Problems caused by the datacenter infrastructure, such as virtual machine (VM) firewall rules or DNS configuration, should be resolved by your infrastructure system administrator.

Test the deployment

Test and validate your deployment as described in the following sections.

Internal validation

Complete the following steps to validate internal communication.

  1. On your local machine, enter the following command to display cluster namespace information (using the example namespace operator). This information contains the bootstrap endpoint you need to complete internal validation.

    kubectl get kafka -n operator -oyaml
    

    The bootstrap endpoint is shown on the bootstrap.servers line.

    ... omitted
    
       internalClient: |-
          bootstrap.servers=kafka:9071
    
  2. On your local machine, use kubectl exec to start a bash session on one of the pods in the cluster. The example uses the default pod name kafka-0 on a Kafka cluster using the default name kafka.

    kubectl -n operator exec -it kafka-0 -- bash
    
  3. On the pod, create and populate a file named kafka.properties. There is no text editor installed in the containers, so you use the cat command as shown below to create this file. The closing EOF line ends the input and writes the file.

    Note

    The example shows default SASL/PLAIN security parameters. A production environment requires additional security. See Configure Security with Confluent Operator for additional information.

    cat << EOF > kafka.properties
    bootstrap.servers=kafka:9071
    sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="test" password="test123";
    sasl.mechanism=PLAIN
    security.protocol=SASL_PLAINTEXT
    EOF
    
  4. On the pod, query the bootstrap server using the following command:

    kafka-broker-api-versions --command-config kafka.properties --bootstrap-server kafka:9071
    

    You should see output for each of the three Kafka brokers that resembles the following:

    kafka-1.kafka.operator.svc.cluster.local:9071 (id: 1 rack: 0) -> (
       Produce(0): 0 to 7 [usable: 7],
       Fetch(1): 0 to 10 [usable: 10],
       ListOffsets(2): 0 to 4 [usable: 4],
       Metadata(3): 0 to 7 [usable: 7],
       LeaderAndIsr(4): 0 to 1 [usable: 1],
       StopReplica(5): 0 [usable: 0],
       UpdateMetadata(6): 0 to 4 [usable: 4],
       ControlledShutdown(7): 0 to 1 [usable: 1],
       OffsetCommit(8): 0 to 6 [usable: 6],
       OffsetFetch(9): 0 to 5 [usable: 5],
       FindCoordinator(10): 0 to 2 [usable: 2],
       JoinGroup(11): 0 to 3 [usable: 3],
       Heartbeat(12): 0 to 2 [usable: 2],
    
    ... omitted
    

    This output validates internal communication within your cluster.
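The manual checks in steps 1 and 4 can also be scripted. The following is a minimal sketch, assuming the YAML and broker output look like the excerpts shown above; client_bootstrap and count_brokers are hypothetical helper names, and the second usage line additionally assumes that kafka.properties was created in the container's default working directory.

```shell
# Print the bootstrap.servers value that follows the given client section
# (internalClient or externalClient) in YAML read from stdin.
client_bootstrap() {
  awk -v section="$1:" '
    $1 == section { found = 1; next }
    found && /bootstrap\.servers=/ {
      sub(/^.*bootstrap\.servers=/, "")
      print
      exit
    }
  '
}

# Count the brokers that answered: each broker contributes one header line
# of the form "<host>:<port> (id: N rack: ...) -> (".
count_brokers() {
  grep -c '(id: [0-9]* rack: '
}

# Usage on your local machine:
#   kubectl get kafka -n operator -oyaml | client_bootstrap internalClient
#   kubectl -n operator exec kafka-0 -- kafka-broker-api-versions \
#     --command-config kafka.properties --bootstrap-server kafka:9071 | count_brokers
```

With the default three-broker cluster used in this example, a count of 3 corresponds to the output described in step 4.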

External validation

Take the following steps to validate external communication after you have enabled external access to Kafka and added DNS entries as described in External access to Kafka.

Note

The examples use default Confluent Platform component names and the default Kafka bootstrap prefix, kafka.

  1. On your local machine, download Confluent Platform. You only need to download it and set the PATH and the required environment variables to use the Confluent CLI. You do not need to start Confluent Platform on your local machine.

    You use the Confluent CLI running on your local machine to complete external validation. The Confluent CLI is included with the Confluent Platform.

  2. On your local machine, run the following command to get the bootstrap server endpoint for external clients.

    kubectl get kafka -n operator -oyaml
    

    In the example output below, the bootstrap server endpoint is kafka.mydomain:9092.

    ... omitted
    
    externalClient: |-
       bootstrap.servers=kafka.mydomain:9092
       sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="test" password="test123";
       sasl.mechanism=PLAIN
       security.protocol=SASL_PLAINTEXT
    
  3. On your local machine, where you downloaded Confluent Platform, create and populate a file named kafka.properties with the following content. Assign the external endpoint you retrieved in the previous step to bootstrap.servers.

    bootstrap.servers=<kafka bootstrap endpoint>
    sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="test" password="test123";
    sasl.mechanism=PLAIN
    security.protocol=SASL_PLAINTEXT
    

    Note

    The example shows default SASL/PLAIN security parameters. A production environment requires additional security. See Configure Security with Confluent Operator for additional information.

  4. On your local machine, create a topic using the bootstrap endpoint <kafka bootstrap endpoint>. The example below creates a topic with 1 partition and 3 replicas.

    kafka-topics --bootstrap-server <kafka bootstrap endpoint> \
    --command-config kafka.properties \
    --create --replication-factor 3 \
    --partitions 1 --topic example
    
  5. On your local machine, produce to the new topic using the bootstrap endpoint <kafka bootstrap endpoint>. Note that the bootstrap server endpoint is the only Kafka broker endpoint required because it provides gateway access to all Kafka brokers.

    seq 10000 | kafka-console-producer \
    --topic example --broker-list <kafka bootstrap endpoint> \
    --producer.config kafka.properties
    
  6. In a new terminal on your local machine, from the directory where you saved kafka.properties, run the following command to consume from the new topic.

    kafka-console-consumer --from-beginning \
    --topic example --bootstrap-server <kafka bootstrap endpoint> \
    --consumer.config kafka.properties
    

Successful completion of these steps validates external communication with your cluster.
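To check the consumed data instead of reading it by eye, you can verify that the consumer returns the same sequence the producer sent. This is a minimal sketch, assuming the topic has the single partition created above (so order is preserved), that the producer command was run exactly once, and that the standard --max-messages consumer option is used to stop after the expected count. The helper name is_sequence is hypothetical.

```shell
# Succeed only if stdin is exactly the integers 1..N, one per line, in order.
is_sequence() {
  # $1 = expected number of lines (N)
  awk -v n="$1" '
    $0 != NR { bad = 1 }
    END { exit (bad || NR != n) ? 1 : 0 }
  '
}

# Usage on your local machine:
#   kafka-console-consumer --from-beginning \
#     --topic example --bootstrap-server <kafka bootstrap endpoint> \
#     --consumer.config kafka.properties --max-messages 10000 \
#     | is_sequence 10000 && echo "external validation passed"
```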