Monitor Confluent Platform with Confluent for Kubernetes

Monitor your Confluent for Kubernetes (CFK) environment using the following tools and resources:

  • Confluent Health+ with Telemetry
  • JMX metrics monitoring integrations
  • Confluent Control Center

Confluent Health+

Confluent Health+ provides ongoing, real-time analysis of performance and configuration data for your Confluent Platform deployment. From this analysis, Health+ sends out notifications to alert users to potential environmental issues before they become critical problems.

For more information, see Confluent Health+.

Telemetry Reporter

The Confluent Telemetry Reporter is a plugin that runs inside each Confluent Platform service to push metadata about the service to Confluent. Telemetry Reporter enables product features based on the metadata, like Health+. Data is sent over HTTP using an encrypted connection.

Telemetry is disabled by default in CFK. You can enable and configure it globally at the CFK level.

When you globally enable Telemetry, you can still disable it for specific Confluent Platform components.

Each Confluent Platform component CR reports whether Telemetry is enabled or disabled as a status condition in status.conditions.

For more information and supported settings for Telemetry Reporter, see Confluent Telemetry Reporter.

For a list of the metrics that are collected for Health+, see Telemetry Reporter Metrics.

Globally configure Telemetry

To globally enable Telemetry Reporter for all Confluent Platform components:

  1. Set the following in the CFK values file:

    telemetry:
      enabled: true
    
  2. Apply the change with the following command:

    helm upgrade --install confluent-operator \
      confluentinc/confluent-for-kubernetes \
      --values <path-to-values-file> \
      --namespace <namespace>
    

To globally configure the Telemetry Reporter settings for all Confluent Platform components:

  1. Set the following in the CFK values file:

    telemetry:
      secretRef:                 --- [1]
      directoryPathInContainer:  --- [2]
    
    • [1] [2] CFK supports the secretRef and directoryPathInContainer methods to load Telemetry configuration through Helm.

    • [1] secretRef takes precedence over directoryPathInContainer if both are configured.

      The secretRef must contain the following:

      telemetry.txt: |-
       api.key=<cloud_key>
       api.secret=<cloud_secret>
       proxy.url=<proxy_url>
       proxy.username=<proxy_username>
       proxy.password=<proxy_password>
      

      If the referenced Secret cannot be read, or its data is not in the expected format, CFK will fail to start.

    • [2] Provide the mount path or directory path where telemetry.txt is present. The telemetry.txt file must contain the following:

      api.key=<cloud_key>
      api.secret=<cloud_secret>
      proxy.url=<proxy_url>
      proxy.username=<proxy_username>
      proxy.password=<proxy_password>
      

      If telemetry.txt is not in the expected format, CFK will fail to start.

  2. To apply changes in Telemetry settings, in the referenced Secret, or in the telemetry.txt file, manually restart CFK and Confluent Platform:

    • Restart CFK:

      kubectl rollout restart deployment/confluent-operator
      
    • Restart a Confluent Platform component:

      kubectl rollout restart sts/<name>
      

      See Restart Confluent Platform Cluster for how to look up the <name> of a component.
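As an illustration, the Secret referenced by secretRef in step 1 could be created from a manifest like the following sketch. The Secret name telemetry-credentials and the namespace are placeholders; the placeholder values in angle brackets must be replaced with your actual credentials:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: telemetry-credentials   # illustrative name; match it in telemetry.secretRef
  namespace: confluent
type: Opaque
stringData:
  telemetry.txt: |-
    api.key=<cloud_key>
    api.secret=<cloud_secret>
    proxy.url=<proxy_url>
    proxy.username=<proxy_username>
    proxy.password=<proxy_password>
```

The CFK values file would then set telemetry.secretRef to telemetry-credentials.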

Disable Telemetry for a Confluent Platform component

To disable Telemetry for a specific Confluent Platform component, set the following in the component CR and apply the change with the kubectl apply command:

telemetry:
  global: false
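For example, to keep Telemetry globally enabled but opt a single Kafka cluster out of it, the component CR could look like the following sketch (the cluster name and namespace are illustrative, and this assumes the telemetry block sits under spec as other component settings do):

```yaml
apiVersion: platform.confluent.io/v1beta1
kind: Kafka
metadata:
  name: kafka
  namespace: confluent
spec:
  telemetry:
    global: false   # opt this component out of globally enabled Telemetry
```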

JMX Metrics

CFK deploys all Confluent components with JMX metrics enabled by default. These JMX metrics are made available on all pods at the following endpoints:

  • JMX metrics are available on port 7203 of each pod.

  • Jolokia (a REST interface for JMX metrics) is available on port 7777 of each pod.

  • JMX Prometheus exporter is available on port 7778.

    Authentication and encryption are not supported for the Prometheus exporter.

Configure security on JMX metrics endpoints

By default, the JMX and Prometheus metrics endpoints have no authentication, encryption, or external access. You can configure authentication, TLS, and external access for these endpoints at the component CR level:

spec:
  metrics:
    authentication:
      type:                    --- [1]
    prometheus:                --- [2]
      rules:                   --- [3]
        - attrNameSnakeCase:
          cache:
          help:
          labels:
          name:
          pattern:
          type:
          value:
          valueFactor:
      blackList:               --- [4]
      whiteList:               --- [5]
    tls:
      enabled:                 --- [6]
  • [1] Set to mtls for mTLS authentication.

    If you set this to mtls, you must set tls.enabled: true ([6]).

  • [2] Specify Prometheus configurations to override the default settings.

    See Prometheus for more information about the rules, blackList, and whiteList properties.

  • [3] A list of rules to apply.

    For example:

    spec:
      metrics:
        prometheus:
          rules:
            - pattern: 'org.apache.kafka.metrics<type=(\w+), name=(\w+)><>Value: (\d+)'
              name: "kafka_$1_$2"
              value: "$3"
              valueFactor: "0.000001"
              labels:
                type: "$1"
                name: "$2"
              help: "Kafka metric $1 $2"
              cache: false
              type: "GAUGE"
              attrNameSnakeCase: false
    
  • [4] A pattern to identify what not to query.

    For example:

    spec:
      metrics:
        prometheus:
          blackList: "org.apache.kafka.metrics:*"
    
  • [5] A pattern to identify what to query.

    For example:

    spec:
      metrics:
        prometheus:
          whiteList: "org.apache.kafka.metrics:type=ColumnFamily,*"
    
  • [6] If set to true, metrics are configured with global or component TLS as described in Configure Network Encryption with Confluent for Kubernetes.
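Putting [1] and [6] together, a component CR that secures its metrics endpoints with mTLS might look like this minimal sketch:

```yaml
spec:
  metrics:
    authentication:
      type: mtls        # [1] require client certificates on the metrics endpoints
    tls:
      enabled: true     # [6] required when authentication.type is mtls
```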

Configure Prometheus and Grafana

You can configure Prometheus to capture and aggregate JMX metrics from Confluent components. Then you configure Grafana to visualize those metrics in a dashboard.

For an example configuration scenario, see Monitoring with Prometheus and Grafana.
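As a sketch, a Prometheus scrape job targeting the JMX Prometheus exporter on port 7778 could use Kubernetes pod discovery like the following. The job name, namespace, and relabeling rule are assumptions, not CFK-provided settings:

```yaml
scrape_configs:
  - job_name: confluent-jmx          # illustrative job name
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names: [confluent]         # assumed namespace of the Confluent pods
    relabel_configs:
      # Keep only the JMX Prometheus exporter port on each discovered pod
      - source_labels: [__meta_kubernetes_pod_container_port_number]
        regex: "7778"
        action: keep
```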

Confluent Control Center

Confluent Control Center is a web-based tool for managing and monitoring Confluent Platform. Control Center provides a user interface that enables developers and operators to:

  • Get a quick overview of cluster health
  • Observe and control messages, topics, and Schema Registry
  • Develop and run ksqlDB queries

For the metrics available for monitoring, see Metrics available in Control Center.

Configure Control Center to monitor Kafka clusters

The Confluent Metrics Reporter collects various metrics from an Apache Kafka® cluster. Control Center then uses those metrics to provide a detailed monitoring view of the Kafka cluster.

By default, the Confluent Metrics Reporter is enabled and configured to send metrics for the Kafka cluster to a set of topics on the same Kafka cluster.

To send metrics to a different cluster, or to configure specific authentication settings, configure the Kafka custom resource (CR):

metricReporter:
  enabled:                   --- [1]
  authentication:
    type:                    --- [2]
    jaasConfigPassThrough:
      secretRef:             --- [3]
  tls:
    enabled:                 --- [4]
  • [1] Set to true or false to enable or disable metrics reporting.
  • [2] Set to the authentication type to use for Kafka. See Configure authentication to access Kafka for details.
  • [3] Set to the Kubernetes Secret name used to authenticate to Kafka.
  • [4] Set to true if the Kafka cluster has TLS network encryption enabled.
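For example, a Kafka CR fragment that enables the reporter with SASL/PLAIN credentials over TLS might look like the following sketch. The Secret name metrics-credentials is a placeholder, and the authentication type must match what the target Kafka cluster expects:

```yaml
spec:
  metricReporter:
    enabled: true                       # [1] turn on metrics reporting
    authentication:
      type: plain                       # [2] authentication type for the target Kafka
      jaasConfigPassThrough:
        secretRef: metrics-credentials  # [3] placeholder Secret with the credentials
    tls:
      enabled: true                     # [4] target Kafka uses TLS encryption
```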

Once the Confluent Metrics Reporter is set up for a Kafka cluster, configure Control Center to monitor the cluster.

By default, Control Center is set up to monitor the Kafka cluster it is using to store its own state.

If there is another Kafka cluster to monitor, configure it in the Control Center CR as follows:

spec:
  monitoringKafkaClusters:
  - name:                    --- [1]
    bootstrapEndpoint:       --- [2]
  • [1] Set to Kafka cluster name.
  • [2] Set to the Kafka bootstrap endpoint.
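A filled-in sketch of the above, with an illustrative cluster name and bootstrap endpoint:

```yaml
spec:
  monitoringKafkaClusters:
  - name: kafka-dev                                          # [1] illustrative cluster name
    bootstrapEndpoint: kafka-dev.confluent.svc.cluster.local:9071  # [2] assumed endpoint
```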

Configure Control Center to monitor ksqlDB, Connect, and Schema Registry clusters

You can configure Control Center to provide a detailed monitoring or management view of ksqlDB, Connect, and Schema Registry clusters.

The following is an example of the dependencies section in a Control Center CR. The example connects two Schema Registry clusters, two ksqlDB clusters, and two Connect clusters to Control Center:

spec:
  dependencies:
    schemaRegistry:
      url: https://schemaregistry.confluent.svc.cluster.local:8081
      tls:
        enabled: true
      authentication:
        type: mtls
      clusters:
      - name: schemaregistry-dev
        url: https://schemaregistry-dev.confluent.svc.cluster.local:8081
        tls:
          enabled: true
        authentication:
          type: mtls
    ksqldb:
    - name: ksql-dev
      url: https://ksqldb.confluent.svc.cluster.local:8088
      tls:
        enabled: true
      authentication:
        type: mtls
    - name: ksql-dev1
      url: https://ksqldb-dev.confluent.svc.cluster.local:8088
      tls:
        enabled: true
      authentication:
        type: mtls
    connect:
    - name: connect-dev
      url: https://connect.confluent.svc.cluster.local:8083
      tls:
        enabled: true
      authentication:
        type: mtls
    - name: connect-dev2
      url: https://connect-dev.confluent.svc.cluster.local:8083
      tls:
        enabled: true
      authentication:
        type: mtls

For an example scenario to configure Confluent Control Center to monitor multiple ksqlDB, Connect, and Schema Registry clusters, see Connect Control Center to Multiple Connect, ksqlDB, and Schema Registry Clusters.