Configure Observer container for Confluent Platform

Starting with Confluent for Kubernetes 3.2.0, the Observer container is available as a lightweight health monitoring sidecar for Kafka and KRaft controllers running on Kubernetes. It provides enhanced health checking capabilities with support for secure authentication and certificate management.

Note

The Observer container follows the same version scheme as the confluent-init-container.

Overview of Observer container

The Observer container is a sidecar that runs alongside your Kafka brokers and KRaft controllers to provide Kubernetes-native health checks. It monitors component health through JMX metrics and provides standard Kubernetes probe endpoints.

Key capabilities:

  • Self-contained readiness: Observer determines pod readiness independently without operator dependency.

  • Component-specific health monitoring: Monitors Kafka and KRaft controllers via Jolokia JMX.

  • Separate startup and readiness validation: Distinguishes between initial startup and production readiness checks.

  • Secure authentication: Supports mutual TLS (mTLS) for probe communication and JMX metric collection.

  • Certificate hot-reload: Enables seamless certificate rotation without pod restarts.

  • Standard Kubernetes health endpoints: Provides /startupz, /readyz, /livez, and /healthz endpoints.

  • Configurable retry logic: Built-in retry with exponential backoff for Jolokia client.

  • Lightweight: Minimal resource footprint (approximately 10m CPU, 32Mi memory).

Compare Observer container benefits

The Observer container provides several advantages over the traditional readiness and liveness probe implementation.

Self-contained health checks

The Observer container’s primary design goal is self-contained pod readiness determination. Pods can be deployed and become ready without depending on the Confluent Operator. This enables:

  • Standalone deployments without the operator.

  • Running cluster health checks that are unaffected by operator failures.

  • Simpler debugging: there is no distributed state to reason about.

Enhanced health validation

The Observer container provides more sophisticated health checks:

For Kafka brokers:

  • Validates that the number of under-replicated partitions (URP) is zero before marking the pod as ready.

  • Ensures data integrity during rolling updates.

For KRaft controllers:

  • Queries all peer controllers to determine the cluster’s maximum Log End Offset (LEO).

  • Validates that the local controller’s LEO lag is within the configured threshold (default: 1000).

  • Ensures metadata synchronization before marking the controller as ready.
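
The LEO comparison above can be spot-checked by hand against each controller’s Jolokia endpoint. A hedged sketch, assuming the Jolokia agent listens on port 7777 and that the KRaft log end offset is exposed under the `kafka.server:type=raft-metrics` MBean (verify the exact MBean name with `/jolokia/list` on your Kafka version):

```shell
# Read the local controller's log end offset via Jolokia (MBean name is an
# assumption; confirm against your deployment).
curl -sk https://localhost:7777/jolokia/read/kafka.server:type=raft-metrics/log-end-offset

# Repeat against each peer and compare: the readiness check passes when
# max(peer LEO) - local LEO <= maxLeoLag (default 1000).
```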

Separate startup and readiness validation

The Observer container distinguishes between startup and readiness:

  • Startup probe (/startupz): Checks only for basic process initialization. This probe is lenient and allows controllers time to catch up on metadata during initial startup.

  • Readiness probe (/readyz): Performs full validation including URP=0 for Kafka or LEO lag ≤ threshold for KRaft. The pod begins receiving traffic only when this probe passes.

This separation prevents premature traffic routing while allowing sufficient time for component initialization.
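
In Kubernetes terms, these endpoints are wired into standard pod probes. A simplified sketch of what the injected probe definitions look like in plaintext mode (illustrative only; the operator generates the actual spec, and with mTLS the probes use the observer-client binary instead of httpGet):

```yaml
startupProbe:
  httpGet:
    path: /startupz
    port: 7080        # default plaintext port
  periodSeconds: 5
  failureThreshold: 30
readinessProbe:
  httpGet:
    path: /readyz
    port: 7080
  periodSeconds: 10
  failureThreshold: 3
```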

Simplified rolling restarts

When using the Observer container, rolling restarts for KRaft controllers and Kafka brokers can be performed using standard Kubernetes StatefulSet restart commands, the same way as other Confluent Platform components. For details, see Restart Confluent Platform Using Confluent for Kubernetes.

Configure authentication

The Observer container supports the following authentication configurations:

mTLS (Mutual TLS)

The recommended authentication method for production deployments. The Observer container can use mTLS for:

  • Probe communication: Kubernetes probes authenticate to the Observer server using client certificates.

  • JMX metric collection: The Observer authenticates to Jolokia endpoints using client certificates.

When mTLS is enabled:

  • The Observer server listens on HTTPS. The default port is 7443.

  • Kubernetes probes use the observer-client binary with TLS certificates.

  • Jolokia client connections use mutual TLS authentication.
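
To verify the mTLS setup by hand, you can call a probe endpoint directly with mounted client certificates. A hedged example using the certificate paths documented under Manage certificates (the paths and filenames on your pod may differ):

```shell
# Call the readiness endpoint over mTLS (default HTTPS port 7443).
# Certificate paths are illustrative; adjust to your mounts.
kubectl exec <pod-name> -c observer -n <namespace> -- \
  curl --cacert /mnt/sslcerts/observer/cacerts.pem \
       --cert /mnt/sslcerts/observer/fullchain.pem \
       --key /mnt/sslcerts/observer/privkey.pem \
       https://localhost:7443/readyz
```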

Plaintext (HTTP)

For development or testing environments where encryption is not required:

  • The Observer server listens on HTTP. The default port is 7080.

  • Kubernetes probes connect without TLS.

  • Jolokia client connections use HTTP without authentication.
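
In plaintext mode, the endpoints can be exercised with a plain curl call, for example:

```shell
# Probe the readiness endpoint over HTTP (default plaintext port 7080)
kubectl exec <pod-name> -c observer -n <namespace> -- \
  curl -s http://localhost:7080/readyz
```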

Warning

Plaintext mode is not recommended for production deployments.

Certificate management

The Observer container supports:

  • ConfigMap/Secret-based certificates: Certificates mounted as Kubernetes ConfigMaps or Secrets.

  • Vault integration: HashiCorp Vault for dynamic secret injection (DPIC mode).

  • Certificate hot-reload: Automatic detection and reload of updated certificates without pod restarts.

Configure Observer container

The Observer container is configured through the Confluent Platform component custom resource (CR). All configuration is specified under spec.services.observer.

Basic configuration

To enable the Observer container with default settings:

apiVersion: platform.confluent.io/v1beta1
kind: Kafka  # or KRaftController
metadata:
  name: kafka
spec:
  services:
    observer:
      # Observer container image
      image: "confluentinc/confluent-operator-observer:3.2.0"

      # Logging level (debug, info, warn, error)
      logLevel: "info"

Configure readiness thresholds

You can customize the health check thresholds based on your requirements:

For KRaft controllers:

apiVersion: platform.confluent.io/v1beta1
kind: KRaftController
metadata:
  name: kraftcontroller
spec:
  services:
    observer:
      image: "confluentinc/confluent-operator-observer:3.2.0"
      readiness:
        # Maximum Log End Offset lag before marking controller not ready
        # Default: 1000
        maxLeoLag: 500

For Kafka brokers:

apiVersion: platform.confluent.io/v1beta1
kind: Kafka
metadata:
  name: kafka
spec:
  services:
    observer:
      image: "confluentinc/confluent-operator-observer:3.2.0"
      readiness:
        # Maximum under-replicated partitions before marking broker not ready
        # Default: 0
        maxURP: 0

Configure probe timing

Customize probe timing to match your environment:

apiVersion: platform.confluent.io/v1beta1
kind: Kafka
metadata:
  name: kafka
spec:
  services:
    observer:
      image: "confluentinc/confluent-operator-observer:3.2.0"
      containerOverrides:
        probe:
          startup:
            initialDelaySeconds: 15
            periodSeconds: 5
            timeoutSeconds: 10
            failureThreshold: 30
          readiness:
            initialDelaySeconds: 20
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          liveness:
            initialDelaySeconds: 20
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3

Configure Jolokia client

Adjust the JMX client timeout and retry settings:

apiVersion: platform.confluent.io/v1beta1
kind: Kafka
metadata:
  name: kafka
spec:
  services:
    observer:
      image: "confluentinc/confluent-operator-observer:3.2.0"
      clients:
        jolokia:
          # Request timeout for JMX calls
          timeout: "15s"

          # Maximum retry attempts for failed requests
          maxRetries: 5

          # Initial backoff between retries
          retryBackoff: "200ms"

Configure resource limits

Adjust resource allocation for the Observer container:

apiVersion: platform.confluent.io/v1beta1
kind: Kafka
metadata:
  name: kafka
spec:
  services:
    observer:
      image: "confluentinc/confluent-operator-observer:3.2.0"
      containerOverrides:
        resources:
          requests:
            cpu: "10m"
            memory: "32Mi"
          limits:
            cpu: "100m"
            memory: "128Mi"

Multi-cluster KRaft deployments

For KRaft controllers deployed across multiple Kubernetes clusters, you must specify peer endpoints manually, because the Observer cannot auto-discover endpoints outside the local cluster:

apiVersion: platform.confluent.io/v1beta1
kind: KRaftController
metadata:
  name: kraftcontroller
spec:
  services:
    observer:
      image: "confluentinc/confluent-operator-observer:3.2.0"
      cluster:
        # List of all controller Jolokia endpoints (including this one)
        peerEndpoints:
          - "https://kraftcontroller-0.cluster-a.example.com:7777/jolokia"
          - "https://kraftcontroller-1.cluster-b.example.com:7777/jolokia"
          - "https://kraftcontroller-2.cluster-c.example.com:7777/jolokia"

Configuration reference

The following list summarizes all available configuration options and their defaults:

  • spec.services.observer.image: Observer container image. Required.

  • spec.services.observer.logLevel: Logging level (debug, info, warn, error). Default: info.

  • spec.services.observer.readiness.maxLeoLag: Maximum LEO lag for KRaft controllers. Default: 1000.

  • spec.services.observer.readiness.maxURP: Maximum under-replicated partitions for Kafka. Default: 0.

  • spec.services.observer.clients.jolokia.timeout: Jolokia request timeout. Default: 10s.

  • spec.services.observer.clients.jolokia.maxRetries: Maximum retry attempts. Default: 3.

  • spec.services.observer.clients.jolokia.retryBackoff: Initial retry backoff. Default: 100ms.

  • spec.services.observer.cluster.peerEndpoints: Custom peer Jolokia endpoints for multi-cluster deployments. Default: auto-discovered.

  • spec.services.observer.containerOverrides.probe.*: Probe timing configuration. Defaults: see Monitor health endpoints.

  • spec.services.observer.containerOverrides.resources: Resource limits and requests. Default: 10m CPU, 32Mi memory.

  • spec.services.observer.containerOverrides.env: Custom environment variables. Default: none.

Monitor health endpoints

The Observer container exposes the following health check endpoints:

Startup probe (/startupz)

Purpose: Verifies that the component process has initialized.

Validation: Checks only for basic Jolokia connectivity (JMX endpoint response).

Use case: Allows Kubernetes to determine when the component has started and is ready for more stringent health checks. This probe is lenient to allow controllers time to catch up on metadata during initial startup.

Default configuration:

  • initialDelaySeconds: 10

  • periodSeconds: 5

  • failureThreshold: 30 (allows up to 150 seconds for startup)

Readiness probe (/readyz)

Purpose: Determines if the component is ready to receive traffic.

Validation:

  • For Kafka brokers: Verifies that URP is at or below the configured threshold (default: 0).

  • For KRaft controllers: Verifies that LEO lag is within the configured threshold (default: 1000).

Use case: Kubernetes uses this probe to decide when to add the pod to the Service endpoints and route traffic to it.

Default configuration:

  • initialDelaySeconds: 20

  • periodSeconds: 10

  • failureThreshold: 3

Liveness probe (/livez)

Purpose: Determines if the component process is alive and responsive.

Validation: Performs basic port connectivity checks to verify the component process is running.

Use case: Kubernetes uses this probe to determine if the pod should be restarted.

Default configuration:

  • initialDelaySeconds: 20

  • periodSeconds: 10

  • failureThreshold: 3

Health diagnostics (/healthz)

Purpose: Provides comprehensive health diagnostics for troubleshooting.

Response: Returns detailed JSON with probe results, metrics, and historical health check data.

Use case: Used by administrators for debugging health issues. Not used by Kubernetes probes.

Example response:

{
  "status": "healthy",
  "timestamp": "2024-01-15T10:30:45Z",
  "component": {
    "type": "kraftcontroller",
    "pod_name": "kraftcontroller-1"
  },
  "probes": {
    "startup": {"status": "pass", "message": "Jolokia responding"},
    "readiness": {"status": "pass", "message": "LEO lag: 16"},
    "liveness": {"status": "pass", "message": "Controller port healthy"}
  },
  "metrics": {
    "kraft_log_end_offset": {"value": 15234, "status": "healthy"},
    "connectivity": {"value": true, "status": "healthy"}
  }
}
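
When scripting against /healthz, the JSON above can be filtered with jq. A small sketch, assuming jq is installed on your workstation (the pipe runs jq locally, not in the pod):

```shell
# Extract the overall status and the readiness probe message
kubectl exec <pod-name> -c observer -n <namespace> -- \
  curl -s http://localhost:7080/healthz | jq -r '.status, .probes.readiness.message'
```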

Manage certificates

The Observer container supports automatic certificate rotation without pod restarts.

Certificate hot-reload

When TLS certificates are updated (via ConfigMap or Secret updates), the Observer container automatically:

  1. Detects the certificate file changes.

  2. Reloads the TLS configuration.

  3. Applies new certificates to all new connections.

  4. Maintains zero downtime during the rotation.
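
In practice, rotation is triggered simply by updating the Secret in place; the Observer picks up the new files once the kubelet syncs the mounted volume. A hedged example, assuming the certificates live in a Secret named tls-observer (your Secret name will differ):

```shell
# Replace the certificate files in the existing Secret; the Observer
# reloads them automatically once the mounted volume is refreshed.
kubectl create secret generic tls-observer -n <namespace> \
  --from-file=fullchain.pem --from-file=privkey.pem --from-file=cacerts.pem \
  --dry-run=client -o yaml | kubectl apply -f -
```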

Certificate paths

The Observer container expects certificates in the following locations:

For Observer server TLS (probe connections):

  • Mount path: /mnt/sslcerts/observer/

  • Required files:

    • fullchain.pem: Server certificate chain

    • privkey.pem: Server private key

    • cacerts.pem: CA certificate bundle

For Jolokia client mTLS (JMX connections):

  • Mount path: /mnt/sslcerts/jolokia/ or /vault/secrets/

  • Required files:

    • fullchain.pem: Client certificate chain

    • privkey.pem: Client private key

    • cacerts.pem: CA certificate bundle

Vault integration (DPIC mode)

When using HashiCorp Vault for secret management, certificates are injected into the /vault/secrets/ directory. The Observer container automatically detects and uses these certificates.

Troubleshoot Observer issues

Check Observer logs

To view Observer container logs:

kubectl logs <pod-name> -c observer -n <namespace>

Enable debug logging by updating the component CR:

apiVersion: platform.confluent.io/v1beta1
kind: Kafka
metadata:
  name: kafka
spec:
  services:
    observer:
      logLevel: "debug"

Verify probe health

To manually test a health endpoint:

# From inside the pod
kubectl exec <pod-name> -c observer -n <namespace> -- \
  /usr/local/bin/observer-client --endpoint /readyz

Common issues

Probe failures during startup

If startup probes fail:

  1. Verify Jolokia endpoint is accessible:

    kubectl exec <pod-name> -c observer -n <namespace> -- \
      curl -k https://localhost:7777/jolokia/version
    
  2. Check Observer logs for connection errors.

  3. Verify TLS certificates are correctly mounted.

  4. Consider increasing failureThreshold or initialDelaySeconds in the probe configuration.

Readiness probe not passing

If readiness probes fail but startup succeeds:

  1. Check the /healthz endpoint for detailed diagnostics:

    kubectl exec <pod-name> -c observer -n <namespace> -- \
      /usr/local/bin/observer-client --endpoint /healthz
    
  2. For Kafka, verify that no under-replicated partitions exist.

  3. For KRaft, check the LEO lag across controllers.

  4. Consider adjusting readiness thresholds (maxURP or maxLeoLag).
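
For the Kafka case, under-replicated partitions can be listed directly with the standard kafka-topics tool, for example:

```shell
# Lists only partitions that are currently under-replicated;
# empty output means URP = 0.
kafka-topics --bootstrap-server localhost:9092 \
  --describe --under-replicated-partitions
```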

Certificate errors

If you see TLS/certificate errors:

  1. Verify certificates are mounted correctly:

    kubectl exec <pod-name> -c observer -n <namespace> -- \
      ls -la /mnt/sslcerts/observer/
    
  2. Verify certificate validity:

    kubectl exec <pod-name> -c observer -n <namespace> -- \
      openssl x509 -in /mnt/sslcerts/observer/fullchain.pem -text -noout
    
  3. Check that CA certificate matches the signing authority.

Emergency bypass

In emergency situations, you can temporarily bypass health checks using annotations:

apiVersion: platform.confluent.io/v1beta1
kind: Kafka
metadata:
  name: kafka
  annotations:
    # Bypass readiness checks (pod always ready)
    platform.confluent.io/disable-observer-readiness-checks: "true"
    # Bypass liveness checks (pod never restarted)
    platform.confluent.io/disable-observer-liveness-checks: "true"
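
The same annotations can be applied and removed imperatively with kubectl, which avoids editing the CR manifest during an incident:

```shell
# Apply the readiness bypass
kubectl annotate kafka kafka -n <namespace> \
  platform.confluent.io/disable-observer-readiness-checks="true" --overwrite

# Remove it again (the trailing "-" deletes the annotation)
kubectl annotate kafka kafka -n <namespace> \
  platform.confluent.io/disable-observer-readiness-checks-
```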

Warning

Bypass annotations should only be used for emergency troubleshooting. They disable safety checks and may result in traffic being routed to unhealthy pods.