Observability Overview and Setup

Using Confluent Cloud spares you most of the operational burden of monitoring an on-prem Kafka cluster, but you still need to monitor your client applications and, to some degree, your Confluent Cloud cluster. Your success in Confluent Cloud largely depends on how well your applications are performing. Observability into the performance and status of your client applications gives you insight into how to fine-tune your producers and consumers, when to scale your Confluent Cloud cluster, what might be going wrong, and how to resolve problems.

This module covers how to set up a time-series database populated with data from the Confluent Cloud Metrics API and client metrics from a locally running Java consumer and producer, along with how to set up a data visualization tool. After the initial setup, you will work through a series of failure scenarios and learn how to be alerted when the errors occur.

Note

This example uses Prometheus as the time-series database and Grafana for visualization, but the same principles can be applied to any other technologies.
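Prometheus learns what to scrape from its `scrape_configs`. The sketch below shows roughly what this example's configuration looks like; the job names, hostnames, and ports here are illustrative, and the actual values are defined in the repository's Prometheus configuration file:

```yaml
# Illustrative only: real job names, hosts, and ports come from the
# Prometheus configuration file shipped in the examples repository.
scrape_configs:
  - job_name: ccloud-exporter        # Confluent Cloud Metrics API data
    static_configs:
      - targets: ['ccloud-exporter:2112']
  - job_name: kafka-producer         # JMX exporter on the Java producer
    static_configs:
      - targets: ['producer:1234']
  - job_name: kafka-consumer         # JMX exporter on the Java consumer
    static_configs:
      - targets: ['consumer:1234']
  - job_name: node-exporter          # host-level metrics
    static_configs:
      - targets: ['node-exporter:9100']
```

Each `job_name` becomes a label on the scraped metrics, which is how the Grafana dashboards in this example distinguish producer, consumer, cluster, and host data.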

Prerequisites

Cost to Run Tutorial

Caution

Any Confluent Cloud example uses real Confluent Cloud resources that may be billable. An example may create a new Confluent Cloud environment, Kafka cluster, topics, ACLs, and service accounts, as well as resources that have hourly charges like connectors and ksqlDB applications. To avoid unexpected charges, carefully evaluate the cost of resources before you start. After you are done running a Confluent Cloud example, destroy all Confluent Cloud resources to avoid accruing hourly charges for services and verify that they have been deleted.

Confluent Cloud Promo Code

To receive an additional $50 of free usage in Confluent Cloud, enter promo code C50INTEG in the Confluent Cloud Console Billing & payment section (details). This promo code should cover up to one day of running this Confluent Cloud example; beyond that, you may be billed for the services that have an hourly charge until you destroy the Confluent Cloud resources created by this example.

Confluent Cloud Cluster and Observability Container Setup

The following instructions will:

  • use ccloud-stack to create a Confluent Cloud cluster, a service account with the proper ACLs, and a client configuration file
  • create a Cloud resource API key for the ccloud-exporter
  • build a Kafka client Docker image with the Maven project's dependencies cached
  • stand up numerous Docker containers (1 consumer with a JMX exporter, 1 producer with a JMX exporter, Prometheus, Grafana, a ccloud-exporter, and a node-exporter) with docker-compose
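The client configuration file that ccloud-stack generates follows the standard Kafka client properties format for Confluent Cloud. A sketch with values elided (the real file contains your cluster's endpoint and the service account's API key and secret):

```
# Sketch of the generated client configuration; real values are
# filled in by ccloud-stack.
bootstrap.servers=<BROKER_ENDPOINT>
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username='<API_KEY>' password='<API_SECRET>';
```

Treat this file like a credential: the API secret in it grants the service account's full access to the cluster.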
  1. Log in to the Confluent Cloud CLI:

    ccloud login --save
    

    The --save flag will save your Confluent Cloud login credentials to the ~/.netrc file.

  2. Clone the confluentinc/examples GitHub repository.

    git clone https://github.com/confluentinc/examples.git
    
  3. Navigate to the examples/ccloud-observability/ directory and switch to the Confluent Platform release branch:

    cd examples/ccloud-observability/
    git checkout 6.2.0-post
    
  4. Set up a Confluent Cloud cluster, secrets, and the observability components by running the start.sh script:

    ./start.sh
    
  5. It will take up to 3 minutes for data to become visible in Grafana. Open Grafana and use the username admin and password password to log in. Now you are ready to proceed to the Producer, Consumer, or General scenarios to see what different failure scenarios look like.

Validate Setup

  1. Validate the producer and consumer Kafka clients are running. From the Confluent Cloud Console, view the Data flow in your newly created environment and Kafka cluster.

    Data Flow

  2. Navigate to the Prometheus Targets page.

    Prometheus Targets Unknown

    This page shows whether Prometheus is scraping the targets you have created. After about 2 minutes, it should look like the image below if everything is working. You may need to refresh the page.

    Prometheus Targets Up
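Instead of refreshing the page, you can also check target health from the command line via Prometheus's HTTP API (`GET /api/v1/targets`). The sketch below parses a canned sample payload with python3 so it runs anywhere; against the live setup you would pipe in `curl -s http://localhost:9090/api/v1/targets` instead (port 9090 is the Prometheus default; the job names in the sample are illustrative):

```shell
# Sample payload standing in for a live response; in the running setup:
#   curl -s http://localhost:9090/api/v1/targets
sample='{"data":{"activeTargets":[{"labels":{"job":"ccloud-exporter"},"health":"up"},{"labels":{"job":"consumer"},"health":"up"}]}}'

# Print each scrape job and its health ("up" means scraping succeeds)
echo "$sample" | python3 -c '
import json, sys
for t in json.load(sys.stdin)["data"]["activeTargets"]:
    print(t["labels"]["job"], t["health"])
'
```

Any target reporting a health other than `up` is one Prometheus cannot reach, which usually points at a container that failed to start or a wrong scrape address.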
