.. _controlcenter_install_cp:

==========================
Manage |c3-short| for |cp|
==========================

|c3-short| is a component of |cp| and is :ref:`installed ` as part of the |cp| bundle.

.. include:: ../../includes/cp-cta.rst

^^^^^^^^^^^^^^^^^^^
System Requirements
^^^^^^^^^^^^^^^^^^^

To use |c3-short|, you must have access to the host that runs the application. You can
:ref:`configure ` the network port that |c3-short| uses to serve data. Because |c3-short|
is a web application, you can use a proxy to control and secure access to it.

For the complete |c3-short| system requirements, see the
:ref:`Confluent Platform system requirements `.

^^^^^
Modes
^^^^^

.. include:: ../includes/modes.rst

.. _c3-data-retention:

^^^^^^^^^^^^^^
Data retention
^^^^^^^^^^^^^^

|c3-short| stores cluster metadata and user data (alert triggers and actions) in the
``_confluent-command`` topic. This topic is not changed during an upgrade. To reset the
topic, change the ``confluent.controlcenter.command.topic`` configuration to a different
name (for example, ``_confluent-command-2``) and restart |c3-short|. This re-indexes the
cluster metadata and removes all triggers and actions.

Retention defaults
^^^^^^^^^^^^^^^^^^

|c3-short| has the following retention defaults:

- Monitoring topic (``_confluent-monitoring``): three days' worth of data
- Metrics topic (``_confluent-metrics``): three days' worth of data
- Command topic (``_confluent-command``): one day's worth of data
- Each internal topic: seven days' worth of data, except for the
  :ref:`internal metrics and monitoring topics <retention-internal-topics>`

This means that you can take |c3-short| down for maintenance for as long as 24 hours
without data loss.

You can change these values by setting the following configuration parameters:

- ``confluent.monitoring.interceptor.topic.retention.ms``
- ``confluent.metrics.topic.retention.ms``
- ``confluent.controlcenter.internal.topics.retention.ms``

Although configurable, reducing the retention of the command topic
(``confluent.controlcenter.command.topic.retention.ms``) has a negligible impact on the
|c3-short| footprint.
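As a sketch, these retention parameters can be set in ``control-center.properties``. The
values below simply restate the documented defaults (three days, three days, and seven
days) in milliseconds, and are shown for illustration only:

.. codewithvars:: bash

   # Monitoring topic retention: 3 days (the default)
   confluent.monitoring.interceptor.topic.retention.ms=259200000
   # Metrics topic retention: 3 days (the default)
   confluent.metrics.topic.retention.ms=259200000
   # Other internal topics retention: 7 days (the default)
   confluent.controlcenter.internal.topics.retention.ms=604800000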
.. _retention-internal-topics:

Retention for internal metrics and monitoring
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

|c3-short| also has other internal topics that it uses for aggregations. Data on these
topics is kept with different retention periods based on the data type:

- Internal streams monitoring data is held at two retention levels: 96 hours for granular
  data and 700 days for historical data. For example, if you have the same number of
  clients reading and writing granular data from the same number of topics, the amount of
  space required is about twice the amount needed for running at 96 hours.
- Internal metrics data has a retention period of seven days. With a constant number of
  topic partitions in a cluster, the amount of data used for metrics should grow linearly
  and max out after seven days of accumulation.

By default, |c3-short| stores three copies on all topic partitions for availability and
fault tolerance.

The full set of configuration options is documented in :ref:`controlcenter_configuration`.

.. include:: ../../includes/cp-demo-tip.rst

^^^^^^^^^^^^^^^^^^^^^^^^^^
Partitions and replication
^^^^^^^^^^^^^^^^^^^^^^^^^^

Define the number of partitions and the replication factor for the |c3-short| topics by
adding these lines to the appropriate properties file
(``/etc/confluent-control-center/control-center.properties``):

.. codewithvars:: bash

   confluent.controlcenter.internal.topics.partitions=
   confluent.controlcenter.internal.topics.replication=
   confluent.controlcenter.command.topic.replication=
   confluent.monitoring.interceptor.topic.partitions=
   confluent.monitoring.interceptor.topic.replication=
   confluent.metrics.topic.partitions=
   confluent.metrics.topic.replication=

For more information, see :ref:`controlcenter_configuration`.

.. _config-c3-multi-cluster:

^^^^^^^^^^^^^^^^^^^^^^^^^^^
Multi-cluster configuration
^^^^^^^^^^^^^^^^^^^^^^^^^^^

You can use |c3-short| to manage and monitor multiple |ak-tm| clusters. As an alternative,
you can use :ref:`health-plus` to monitor a multi-cluster configuration. For more
information, see :ref:`enable-health-plus`.

All metric data from the interceptors and metrics reporters is tagged by |ak| cluster ID
and aggregated in |c3-short| by cluster ID. The cluster ID is randomly generated by
Apache |ak|, but you can assign meaningful names using |c3-short|.

For multi-cluster configurations in |c3-short|, if you are adding additional connection
configurations and specifying a cluster name instead of a cluster ID, do not include
``.streams`` in the parameter string. See the :ref:`connection config ` setting
description for details.

Prerequisites
^^^^^^^^^^^^^

- |c3-short| must be installed and running in Normal mode.
- Multiple |ak| clusters must already be running. You cannot deploy new clusters with
  |c3-short|.
- Each |ak| cluster must have :ref:`metrics_reporter` configured to enable monitoring.
- Each |ak| cluster must be specified in the |c3-short| configuration using its own
  ``confluent.controlcenter.kafka.<name>.bootstrap.servers`` configuration, as shown in
  the sketch after this list. See :ref:`controlcenter_configuration` for more details.
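For example, a minimal sketch of a multi-cluster ``control-center.properties`` might look
like the following. The cluster names (``production``, ``staging``) and host names are
placeholders chosen for illustration, not values from your environment:

.. codewithvars:: bash

   # Cluster that Control Center itself uses for storage and coordination
   bootstrap.servers=c3-kafka-1:9092

   # Monitored clusters, each identified by a name of your choosing
   confluent.controlcenter.kafka.production.bootstrap.servers=prod-kafka-1:9092,prod-kafka-2:9092
   confluent.controlcenter.kafka.staging.bootstrap.servers=staging-kafka-1:9092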
.. seealso:: For an example that shows |c3-short| and a multi-cluster configuration in
   action, see the :ref:`Multi-datacenter GitHub demo ` and refer to the demo's
   ``docker-compose.yml`` for a configuration reference.

There are two basic methods for configuring the interceptor and metrics reporter plugins
in multi-cluster environments: *direct* and *replicated*. With either method, you install
a single |c3-short| server and connect it to a |ak| cluster. This cluster acts as the
storage and coordinator for |c3-short|.

- **Direct:** Using the direct method, the plugins report data directly to the |c3-short|
  cluster. If your network topology allows direct communication from interceptors and
  metrics reporters to the |c3-short| cluster, the direct method is the recommended
  solution.
- **Replicated:** Using the replicated method, the plugins report data to a local |ak|
  cluster that they have access to. A replicator process copies the data to the
  |c3-short| cluster. For more information, see the :ref:`Replicator quick start `. The
  replicated configuration is simpler to use when deploying interceptors, because they
  report to the local cluster by default. Use this method if your network topology
  prevents the plugins from communicating directly with the |c3-short| cluster, or if you
  are already using Replicator and are familiar with its operations.

**Direct**

You can configure interceptors to send metrics data directly to the |c3-short| |ak|
cluster. This cluster might be separate from the |ak| cluster that the client being
monitored is connected to.

.. figure:: ../../images/kafka_cluster_1_ports.png
   :scale: 50%

   Example direct configuration. Solid lines indicate the flow of interceptor data.

The primary advantage of this method is that metric collection is protected against
availability issues in the cluster being monitored. The primary disadvantage is that
every |ak| client must be configured with the |c3-short| |ak| cluster connection
parameters, which can be time-consuming, particularly if :ref:`controlcenter_security`
is enabled.

Here is an example configuration for a client:

.. codewithvars:: bash

   # the cluster your clients are talking to
   bootstrap.servers=kafka-cluster-1:9092
   # the Control Center cluster
   confluent.monitoring.interceptor.bootstrap.servers=kafka-cluster-2:9092

**Replicated**

By default, interceptors and metrics reporters send metric data to the same |ak| cluster
they are monitoring. You can use :ref:`Confluent Replicator ` to transfer and merge this
data into the |ak| cluster that is used by |c3-short|. The ``_confluent-monitoring`` and
``_confluent-metrics`` topics must be replicated to the |c3-short| cluster.

.. figure:: ../../images/kafka_cluster_2_port_sans_mm.png
   :scale: 50%

   Example replicated configuration. Solid lines indicate the flow of interceptor and
   cluster data.
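As a sketch only, a Replicator connector configuration that copies just these two topics
from the monitored cluster to the |c3-short| cluster might look like the following. The
connector name and host names are placeholders, and the exact settings depend on how you
deploy Replicator; see the Replicator documentation for authoritative configuration:

.. codewithvars:: bash

   # Hypothetical Replicator (Connect) configuration that copies only the
   # Control Center metric topics from the monitored cluster (kafka-cluster-1)
   # to the Control Center cluster (kafka-cluster-2).
   name=replicate-c3-metric-topics
   connector.class=io.confluent.connect.replicator.ReplicatorSourceConnector
   src.kafka.bootstrap.servers=kafka-cluster-1:9092
   dest.kafka.bootstrap.servers=kafka-cluster-2:9092
   topic.whitelist=_confluent-monitoring,_confluent-metrics
   key.converter=io.confluent.connect.replicator.util.ByteArrayConverter
   value.converter=io.confluent.connect.replicator.util.ByteArrayConverter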
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dedicated metric data cluster
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You can send your monitoring data to an existing |ak| cluster or configure a dedicated
cluster for this purpose. Advantages of giving |c3-short| its own |ak| cluster include:

- By hosting |c3-short| on its own |ak| cluster, it is independent of the availability of
  the production cluster it is monitoring. For example, if there are severe production
  issues, you continue to receive alerts and can view the |c3-short| monitoring
  information. A production disaster is when you need these metrics the most.
- Ease of upgrade. Future versions of |c3-short| are likely to take advantage of new
  features of |ak|. If you use a separate |ak| cluster for |c3-short|, it may be easier
  to take advantage of new features in future versions of |c3-short| if the upgrade path
  does not involve any production |ak| cluster.
- The cluster may have reduced security requirements, which could make it easier to
  implement the direct strategy described above.
- |c3-short| requires a significant amount of disk space and throughput for metrics
  collection. By giving |c3-short| its own dedicated cluster, you guarantee that the
  |c3-short| workload never interferes with production traffic.

The main disadvantage of giving |c3-short| its own |ak| cluster is that a dedicated
cluster requires additional virtual or physical hardware, setup, and maintenance.

.. _c3-saturation-testing:

^^^^^^^^^^^^^^^^^^
Saturation testing
^^^^^^^^^^^^^^^^^^

|c3-short| was saturation-tested on simulated monitoring data. The goal was to find the
maximum cluster size that |c3-short| can successfully monitor along several important
dimensions.

Test setup
^^^^^^^^^^

|ak| cluster running on |ccloud| that consists of:

* Four |ak| nodes running on |aws| EC2 r4.xlarge.
* 232 initial topic partitions (including internal topics).
* Replication factor of three. Each topic partition has three replicas, and each
  partition replica sends its own metrics. With X total topic partitions, partition-level
  metrics for 3X partitions are sent.
* One initial user topic partition.
* Three |ak| brokers.
* One |c3| instance running in Normal mode on |aws| EC2 m4.2xlarge.
* Two nodes generating load; one for broker monitoring and one for streams monitoring.
* Each user topic is created with 12 partitions.
* Eight :ref:`streams threads `, which is the default
  ``confluent.controlcenter.streams.num.stream.threads`` configuration.
* JDK 8

Broker monitoring
^^^^^^^^^^^^^^^^^

|ak| metrics were generated as if by a cluster with three brokers and no producers or
consumers. The number of partitions was increased on the simulated cluster until lag
occurred on |c3-short|.

Result: The number of partitions was increased to 100,000 partitions, and |c3-short|
kept up with the incoming metrics.

Caveat: Any change to sizing or network topology would likely give different results.

Streams monitoring
^^^^^^^^^^^^^^^^^^

Metrics were generated as if by a cluster with three brokers and 5,000 partitions in 250
topics. The number of consumer groups reporting consumption completeness and lag data was
increased from 1 through 100,000, in increments of 5,000 consumer groups. Each simulated
consumer group included a single consumer reading from a single partition.

Result: At 20,000 consumer groups, |c3-short| was no longer able to keep up with incoming
data on this server size, and the reports lagged behind.

Caveat: Up to 20,000 consumers were tested, but no producers. This likely has an impact
on monitoring capacity.

^^^^^^^^^^^^^^^^^^^
Example deployments
^^^^^^^^^^^^^^^^^^^

The following example |c3-short| setups were tested internally.

Broker monitoring
^^^^^^^^^^^^^^^^^

Given:

* 1 |c3| (running on EC2 m4.2xlarge)
* 3 |ak| Brokers
* 1 |zk|
* 200 Topics
* 10 Partitions per Topic
* 3x Replication Factor
* Default JVM settings
* Default |c3-short| config
* Default |ak| config

Expect:

* |c3-short| state store size ~50 MB/hr
* |ak| log size ~500 MB/hr (per broker)
* Average CPU load ~7%
* Allocated Java on-heap memory ~580 MB and off-heap ~100 MB
* Total allocated memory including page cache ~3.6 GB
* Network read utilization ~150 KB/sec
* Network write utilization ~170 KB/sec

Streams monitoring
^^^^^^^^^^^^^^^^^^

Given:

* 1 |c3| (running on EC2 m4.2xlarge)
* 3 |ak| Brokers
* 1 |zk|
* 30 Topics
* 10 Partitions per Topic
* 150 Consumers
* 50 Consumer Groups
* 3x Replication Factor
* Default JVM settings
* Default |c3-short| config
* Default |ak| config

Expect:

* |c3-short| state store size ~1 GB/hr
* |ak| log size ~1 GB/hr (per broker)
* Average CPU load ~8%
* Allocated Java on-heap memory ~600 MB and off-heap ~100 MB
* Total allocated memory including page cache ~4 GB
* Network read utilization ~160 KB/sec
* Network write utilization ~180 KB/sec

Next steps
^^^^^^^^^^

- For troubleshooting information, see :ref:`controlcenter_troubleshooting`.
- For a complete example that includes stream monitoring, see the :ref:`quickstart`.