Configuring Control Center

Control Center is a component of Confluent Platform and is installed as part of the Confluent Platform bundle.

System Requirements

For the complete Control Center system requirements, see the Confluent Platform system requirements.

Important

To use Control Center, you must have access the host that runs the application. You can configure the network port that Control Center uses to serve data. Because Control Center is a web application, you can use a proxy to control and secure access to it.

Modes

Starting in Confluent Platform version 7.0.0, Control Center enables users to choose between Normal mode, which is consistent with earlier versions of Confluent Control Center and includes management and monitoring services, or Reduced infrastructure mode, meaning monitoring services are disabled, and the resource burden to operate Control Center is lowered. You configure the mode with the mode property. If the Control Center mode is not explicitly set, Confluent Control Center defaults to Normal mode.

Data retention

Control Center stores cluster metadata and user data (alerts triggers/actions) in the _confluent-command topic. This topic is not changed during an upgrade. To reset the topic, change the confluent.controlcenter.command.topic config to something else (for example, _confluent-command-2) and restart Control Center. This will re-index the cluster metadata and remove all triggers/actions.

Control Center has the following retention defaults:

  • Monitoring topic (_confluent-monitoring): three days’ worth of data
  • Metrics topic (_confluent-metrics): three days’ worth of data
  • Command topic (_confluent-command): one day’s worth of data
  • Each internal topic: seven days’ worth of data

This means that you can take Control Center down for maintenance for as long as 24 hours without data loss.

You can change these values by setting the following configuration parameters:

  • confluent.monitoring.interceptor.topic.retention.ms
  • confluent.metrics.topic.retention.ms
  • confluent.controlcenter.internal.topics.retention.ms

Although configurable, reducing the retention of the command topic (confluent.controlcenter.command.topic.retention.ms) has negligible impact on Control Center’s footprint.

Control Center also has other internal topics that it uses for aggregations. Data on these topics is kept with different retention periods based on the data type:

  • Streams Monitoring data is held at two retention levels: 96 hours for granular data; and 700 days for historical data. For example, if you have the same number of clients reading and writing granular data from the same number of topics, the amount of space that is required is about twice the amount needed for running at 96 hours.
  • Metrics data has a retention period of seven days. With a constant number of topic partitions in a cluster, the amount of data that is used for metrics data should grow linearly and max out after seven days of accumulation.

By default, Control Center stores three copies on all topic partitions for availability and fault tolerance.

The full set of configuration options are documented in Control Center Configuration Reference.

See also

For an example that shows this in action, see the Confluent Platform demo. Refer to the demo’s docker-compose.yml file for a configuration reference.

Partitions and replication

Define the number of partitions (<num-partitions>) and replication (<num-replication>) settings for Control Center by adding these lines to the appropriate properties file (<path-to-file>/etc/confluent-control-center/control-center.properties).

confluent.controlcenter.internal.topics.partitions=<num-partitions>
confluent.controlcenter.internal.topics.replication=<num-replication>
confluent.controlcenter.command.topic.replication=<num-replication>
confluent.monitoring.interceptor.topic.partitions=<num-partitions>
confluent.monitoring.interceptor.topic.replication=<num-replication>
confluent.metrics.topic.partitions=<num-partitions>
confluent.metrics.topic.replication=<num-replication>

For more information, see the Control Center Configuration Reference.

Multi-cluster configuration

You can use Control Center to manage and monitor multiple Apache Kafka® clusters. As an alternative, you can use Confluent Health+ to monitor a multi-cluster configuration. For more information, see Enable Health+.

All metric data from the interceptors and metrics reporters is tagged by Kafka cluster ID and aggregated in Control Center by cluster ID. The cluster ID is randomly generated by Apache Kafka, but you can assign meaningful names using Control Center.

Note

For multi-cluster configurations in Control Center, if you are adding additional connection configurations and specifying a cluster <name> instead of a cluster id, do not include .streams in the parameter string. See the connection config setting description for details.

Prerequisites

  • Control Center must be installed, and running in Normal mode.
  • Multiple Kafka clusters must be already running. You cannot deploy new clusters with Control Center.
  • Each Kafka cluster must have Confluent Metrics Reporter configured to enable monitoring.
  • Each Kafka cluster must be specified in the Control Center configuration using its own confluent.controlcenter.kafka.<name>.bootstrap.servers configuration. See Control Center Configuration Reference for more details.

See also

For an example that shows Control Center and a multi-cluster configuration in action, see the Multi-datacenter GitHub demo and refer to the demo’s docker-compose.yml for a configuration reference.

There are two basic methods for configuring the interceptor and metrics reporter plugins in multi-cluster environments: direct and replicated. With either method, you install a single Control Center server and connect to a Kafka cluster. This cluster acts as the storage and coordinator for Control Center.

  • Direct: Using the direct method, the plugins will report the data directly to the Control Center cluster. If your network topology allows direct communication from interceptors and metrics reporters to the Control Center cluster, the direct method is the recommended solution.
  • Replicated: Using the replicated method, the plugins will report data to a local Kafka cluster that they have access to. A replicator process will copy the data to the Control Center cluster. For more information, see the Replicator quick start. The replicated configuration is simpler to use when deploying interceptors, because they will report to the local cluster by default. Use this method if you have a network topology that prevents Control Center plugins from communicating directly with the Control Center cluster, or if you are already using Replicator and you are familiar with its operations.

Direct

You can configure interceptors to send metrics data directly to the Control Center Kafka cluster. This cluster might be separate from the Kafka cluster that the Client being monitored is connected to.

../../_images/kafka_cluster_1_ports.png

Example direct configuration. Solid lines indicate flow of interceptor data.

The primary advantage of this method is its robust protection against availability issues with the cluster being monitored.

The primary disadvantage is that every Kafka client must be configured with the Control Center Kafka cluster connection parameters. This could potentially be more time consuming, particularly if Confluent Control Center Security is enabled.

Here is an example configuration for a client:

bootstrap.servers=kafka-cluster-1:9092 # this is the cluster your clients are talking to
confluent.monitoring.interceptor.bootstrap.servers=kafka-cluster-2:9092 # this is the Control Center cluster

Replicated

By default, interceptors and metric reporters send metric data to the same Kafka cluster they are monitoring. You can use Confluent Replicator to transfer and merge this data into the Kafka cluster that is used by Control Center. The _confluent-monitoring and _confluent-metrics topics must be replicated to the Control Center cluster.

../../_images/kafka_cluster_2_port_sans_mm.png

Example replicated configuration. Solid lines indicate flow of interceptor and cluster data.

Dedicated metric data cluster

You can send your monitoring data to an existing Kafka cluster or configure a dedicated cluster for this purpose.

Advantages to giving Control Center its own Kafka cluster include:

  • By hosting Control Center on its own Kafka cluster, it is independent of the availability of the production cluster it is monitoring. For example, if there are severe production issues, you will continue to receive alerts and be able to view the Control Center monitoring information. A production disaster is when you need these metrics the most.
  • Ease of upgrade. Future versions of Control Center are likely to take advantage of new features of Kafka. If you use a separate Kafka cluster for Control Center, it may be easier for you to take advantage of new features in future versions of Control Center if the upgrade path does not involve any production Kafka cluster.
  • The cluster may have reduced security requirements that could make it easier to implement the direct strategy described.
  • The Control Center requires a significant amount of disk space and throughput for metrics collection. By giving Control Center its own dedicated cluster, you guarantee that Control Center workload will never interfere with production traffic.

The main disadvantage of giving Control Center its own Kafka cluster is that a dedicated cluster requires additional virtual or physical hardware, setup, and maintenance.

Saturation testing

Control Center was saturation-tested on simulated monitoring data. The goal is to find the maximum size cluster that Control Center can successfully monitor, along several important dimensions.

Test setup

Kafka cluster running on Confluent Cloud that consists of:

  • Four Kafka nodes running on AWS EC2 r4.xlarge.
  • 232 initial topic partitions (including internal topics).
    • Replication factor of three. Each topic partition has three replicas and each partition replica sends its own metrics. With X total number of topic partitions, partition-level metrics for 3X partitions are sent.
    • One initial user topic partition.
  • Three Kafka brokers.
  • One Confluent Control Center instance running in Normal mode on AWS EC2 m4.2xlarge.
  • Two nodes generating load; one for broker monitoring and one for stream monitoring.
  • Each user topic is created with 12 partitions.
  • Eight streams threads, which is confluent.controlcenter.streams.num.stream.threads default configuration.
  • JDK 8

Broker monitoring

Kafka metrics were generated to simulate by a cluster with three brokers and no producers and consumers. The number of partitions were increased on the simulated cluster until lag occurred on Control Center.

Result: The number of partitions was increased to 100,000 partitions. Control Center kept up with the incoming metrics.

Caveat: Any change to sizing or network topology would likely give different results.

Streams monitoring

Metrics were generated as if by a cluster with three brokers and 5000 partitions in 250 topics. The number of consumer groups was increased to report consumption completeness and lag data from 1 through 100,000, in 5,000 consumer group increments. Each simulated consumer group included a single consumer reading from a single partition.

Result: At 20,000 consumer groups, Control Center was no longer able to keep up with incoming data on this server size and the reports lagged behind.

Caveat: Up to 20,000 consumers were tested, but no producers. This likely has impact on the monitoring capacity.

Example deployments

The following example Control Center setups were tested internally.

Broker monitoring

Given:
  • 1 Confluent Control Center (running on EC2 m4.2xlarge)
  • 3 Kafka Brokers
  • 1 ZooKeeper
  • 200 Topics
  • 10 Partitions per Topic
  • 3x Replication Factor
  • Default JVM settings
  • Default Control Center config
  • Default Kafka config
Expect:
  • Control Center state store size ~50MB/hr
  • Kafka log size ~500MB/hr (per broker)
  • Average CPU load ~7 %
  • Allocated java on-heap memory ~580 MB and off-heap ~100 MB
  • Total allocated memory including page cache ~3.6 GB
  • Network read utilization ~150 KB/sec
  • Network write utilization ~170 KB/sec

Streams monitoring

Given:
  • 1 Confluent Control Center (running on EC2 m4.2xlarge)
  • 3 Kafka Brokers
  • 1 ZooKeeper
  • 30 Topics
  • 10 Partitions per Topic
  • 150 Consumers
  • 50 Consumer Groups
  • 3x Replication Factor
  • Default JVM settings
  • Default Control Center config
  • Default Kafka config
Expect:
  • Control Center state store size ~1 GB/hr
  • Kafka log size ~1 GB/hr (per broker)
  • Average CPU load ~8 %
  • Allocated java on-heap memory ~600 MB and off-heap ~100 MB
  • Total allocated memory including page cache ~4 GB
  • Network read utilization ~160 KB/sec
  • Network write utilization ~180 KB/sec

Next steps