Operations

This section is not an exhaustive guide to running Control Center in production, but it covers the key things to consider before going live.

Hardware

As of this release, Control Center must run on a single machine. The resources needed for this machine depend primarily on how many producers are monitored and how many partitions each producer writes to. The Stream Monitoring functionality of Control Center is implemented as a Kafka Streams application and consequently benefits from ample memory for RocksDB caches and the OS page cache.

Memory

The more memory you give Control Center, the better, but we recommend at least 32GB of RAM. The JVM heap size can be fairly small (it defaults to 6GB), but the application needs the additional memory for RocksDB in-memory indexes and caches, as well as OS page cache for faster access to persistent data.

CPUs

The Stream Monitoring functionality of Control Center requires significant CPU power for data verification and aggregation. We recommend at least 8 cores. If you have more cores available, you can increase the number of threads in the Kafka Streams pool (confluent.controlcenter.streams.num.stream.threads) and increase the number of partitions on internal Control Center topics (confluent.controlcenter.internal.topics.partitions) for greater parallelism.
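As a sketch, the two parallelism settings above could be raised together in the Control Center properties file. The values below are illustrative for a 16-core machine, not recommendations:

```properties
# Illustrative values, assuming a 16-core machine; tune for your workload.
# Number of threads in the Kafka Streams thread pool.
confluent.controlcenter.streams.num.stream.threads=16
# Number of partitions for Control Center's internal topics; more
# partitions allow more stream threads to work in parallel.
confluent.controlcenter.internal.topics.partitions=16
```

Keeping the partition count at least as high as the thread count ensures every stream thread has partitions to process. Note that partition counts on existing Kafka topics can be increased but not decreased.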

Disks

Control Center relies on local state in RocksDB. We recommend at least 300GB of storage space, preferably SSDs. All local data is kept in the directory specified by the confluent.controlcenter.data.dir config parameter.
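A minimal sketch of the relevant setting; the path shown is an example, not a default:

```properties
# Store RocksDB state and other local data on a fast SSD volume.
# The path below is an example; use a location appropriate for your hosts.
confluent.controlcenter.data.dir=/var/lib/confluent/control-center
```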

Network

Control Center relies heavily on Kafka, so fast and reliable network is important for performance. Modern data-center networking (1 GbE, 10 GbE) should be sufficient.

OS

Control Center needs many open RocksDB files. Make sure the ulimit for the number of open files (ulimit -n) is at least 16384.
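To check the limit before going live, you can query the current soft limit from a shell. The `limits.conf` lines shown in the comments are a sketch assuming Linux with `pam_limits`, and the user name is a placeholder:

```shell
# Query the current soft limit on open file descriptors for this shell.
ulimit -Sn

# A persistent setting can be added to /etc/security/limits.conf
# (assuming Linux with pam_limits; "c3-user" is a placeholder user):
#   c3-user  soft  nofile  16384
#   c3-user  hard  nofile  16384
```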

JVM

We recommend running the latest version of JDK 1.8 with a 6GB max heap size. JDK 1.7 is also supported.
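As a sketch, the heap can be pinned to the recommended 6GB via JVM flags before launching Control Center. The environment variable name below is an assumption; check the launch script you use for the variable it actually honors:

```shell
# Assumed variable name; verify against your Control Center start script.
# -Xms and -Xmx set the initial and maximum heap to 6GB.
export CONTROL_CENTER_HEAP_OPTS="-Xms6g -Xmx6g"
```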

User/Cluster Metadata

Control Center stores cluster metadata and user data (triggers and actions) in the _confluent-command topic. This topic is not changed during an upgrade. To reset it, change the confluent.controlcenter.command.topic config to a different name (e.g. _confluent-command-2) and restart Control Center. This will re-index the cluster metadata and remove all triggers and actions.
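The reset described above amounts to a one-line config change; the new topic name is an example:

```properties
# Point Control Center at a fresh command topic. The old
# _confluent-command topic is left in place but no longer used.
confluent.controlcenter.command.topic=_confluent-command-2
```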

Kafka

The amount of storage space needed in Kafka depends on how many producers and consumers are being monitored as well as the configured retention and replication settings.

By default, Control Center keeps 3 days' worth of data for the monitoring topic _confluent-monitoring and the metrics topic _confluent-metrics, and 24 hours of data for all of its internal topics. This means you can take Control Center down for maintenance for up to 24 hours without data loss. You can change these values by setting the following config parameters:

  • confluent.monitoring.interceptor.topic.retention.ms
  • confluent.metrics.topic.retention.ms
  • confluent.controlcenter.internal.topics.retention.ms
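As a sketch, the defaults above expressed in milliseconds (3 days = 259200000 ms, 24 hours = 86400000 ms) look like this:

```properties
# Monitoring topic (_confluent-monitoring): 3 days
confluent.monitoring.interceptor.topic.retention.ms=259200000
# Metrics topic (_confluent-metrics): 3 days
confluent.metrics.topic.retention.ms=259200000
# Internal Control Center topics: 24 hours
confluent.controlcenter.internal.topics.retention.ms=86400000
```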

By default, Control Center stores 3 copies of each topic partition for availability and fault tolerance.

The full set of configuration options is documented in Configuration.

Example Deployments

Here are some example Control Center setups we tested internally.

Broker Monitoring

Given:
  • 1 Confluent Control Center (running on EC2 m4.2xlarge)
  • 3 Kafka Brokers
  • 1 Zookeeper
  • 200 Topics
  • 10 Partitions per Topic
  • 3x Replication Factor
  • Default JVM settings
  • Default Control Center config
  • Default Kafka config
Expect:
  • Control Center state store size ~50MB/hr
  • Kafka log size ~500MB/hr (per broker)
  • Average CPU load ~7%
  • Allocated Java on-heap memory ~580 MB and off-heap ~100 MB
  • Total allocated memory including page cache ~3.6 GB
  • Network read utilization ~150 KB/sec
  • Network write utilization ~170 KB/sec

Streams Monitoring

Given:
  • 1 Confluent Control Center (running on EC2 m4.2xlarge)
  • 3 Kafka Brokers
  • 1 Zookeeper
  • 30 Topics
  • 10 Partitions per Topic
  • 150 Consumers
  • 50 Consumer Groups
  • 3x Replication Factor
  • Default JVM settings
  • Default Control Center config
  • Default Kafka config
Expect:
  • Control Center state store size ~1GB/hr
  • Kafka log size ~1GB/hr (per broker)
  • Average CPU load ~8%
  • Allocated Java on-heap memory ~600 MB and off-heap ~100 MB
  • Total allocated memory including page cache ~4 GB
  • Network read utilization ~160 KB/sec
  • Network write utilization ~180 KB/sec