.. _controlcenter_install_cp:

==========================
Manage |c3-short| for |cp|
==========================

|c3-short| is a component of |cp| and is :ref:`installed ` as part of the |cp| bundle.

.. include:: ../../includes/cp-cta.rst

^^^^^^^^^^^^^^^^^^^
System Requirements
^^^^^^^^^^^^^^^^^^^

To use |c3-short|, you must have access to the host that runs the application. You can
:ref:`configure ` the network port that |c3-short| uses to serve data. Because |c3-short|
is a web application, you can use a proxy to control and secure access to it.

For the complete |c3-short| system requirements, see the
:ref:`Confluent Platform system requirements `.

^^^^^
Modes
^^^^^

.. include:: ../includes/modes.rst

.. _c3-data-retention:

^^^^^^^^^^^^^^
Data retention
^^^^^^^^^^^^^^

|c3-short| stores cluster metadata and user data (alert triggers and actions) in the
``_confluent-command`` topic. This topic is not changed during an upgrade. To reset the
topic, change the ``confluent.controlcenter.command.topic`` configuration to a different
name (for example, ``_confluent-command-2``) and restart |c3-short|. This re-indexes the
cluster metadata and removes all triggers and actions.

Retention defaults
^^^^^^^^^^^^^^^^^^

|c3-short| has the following retention defaults:

- Monitoring topic (``_confluent-monitoring``): three days' worth of data
- Metrics topic (``_confluent-metrics``): three days' worth of data
- Command topic (``_confluent-command``): one day's worth of data
- Each internal topic: seven days' worth of data, except for the
  :ref:`internal metrics and monitoring topics <retention-internal-topics>`

This means that you can take |c3-short| down for maintenance for as long as 24 hours
without data loss.

You can change these values by setting the following configuration parameters:

- ``confluent.monitoring.interceptor.topic.retention.ms``
- ``confluent.metrics.topic.retention.ms``
- ``confluent.controlcenter.internal.topics.retention.ms``

Although configurable, reducing the retention of the command topic
(``confluent.controlcenter.command.topic.retention.ms``) has a negligible impact on the
|c3-short| footprint.
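As a sketch, these retention parameters can be set in ``control-center.properties``. The
values below simply restate the documented defaults (three days, three days, and seven
days) in milliseconds, and are shown for illustration only:

.. codewithvars:: bash

   # Monitoring topic retention: 3 days (the default)
   confluent.monitoring.interceptor.topic.retention.ms=259200000
   # Metrics topic retention: 3 days (the default)
   confluent.metrics.topic.retention.ms=259200000
   # Other internal topics retention: 7 days (the default)
   confluent.controlcenter.internal.topics.retention.ms=604800000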
.. _retention-internal-topics:

Retention for internal metrics and monitoring
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

|c3-short| also has other internal topics that it uses for aggregations. Data on these
topics is kept with different retention periods based on the data type:

- Internal streams monitoring data is held at two retention levels: 96 hours for granular
  data and 700 days for historical data. For example, if you have the same number of
  clients reading and writing granular data from the same number of topics, the amount of
  space required is about twice the amount needed for running at 96 hours.
- Internal metrics data has a retention period of seven days. With a constant number of
  topic partitions in a cluster, the amount of data used for metrics should grow linearly
  and max out after seven days of accumulation.

By default, |c3-short| stores three copies on all topic partitions for availability and
fault tolerance.

The full set of configuration options is documented in :ref:`controlcenter_configuration`.

.. include:: ../../includes/cp-demo-tip.rst

^^^^^^^^^^^^^^^^^^^^^^^^^^
Partitions and replication
^^^^^^^^^^^^^^^^^^^^^^^^^^

Define the number of partitions and the replication factor for the |c3-short| topics by
adding these lines to the appropriate properties file
(``/etc/confluent-control-center/control-center.properties``):

.. codewithvars:: bash

   confluent.controlcenter.internal.topics.partitions=
   confluent.controlcenter.internal.topics.replication=
   confluent.controlcenter.command.topic.replication=
   confluent.monitoring.interceptor.topic.partitions=
   confluent.monitoring.interceptor.topic.replication=
   confluent.metrics.topic.partitions=
   confluent.metrics.topic.replication=

For more information, see :ref:`controlcenter_configuration`.

.. _config-c3-multi-cluster:

^^^^^^^^^^^^^^^^^^^^^^^^^^^
Multi-cluster configuration
^^^^^^^^^^^^^^^^^^^^^^^^^^^

You can use |c3-short| to manage and monitor multiple |ak-tm| clusters. As an alternative,
you can use :ref:`health-plus` to monitor a multi-cluster configuration. For more
information, see :ref:`enable-health-plus`.

All metric data from the interceptors and metrics reporters is tagged by |ak| cluster ID
and aggregated in |c3-short| by cluster ID. The cluster ID is randomly generated by
Apache |ak|, but you can assign meaningful names using |c3-short|.

For multi-cluster configurations in |c3-short|, if you are adding additional connection
configurations and specifying a cluster name instead of a cluster ID, do not include
``.streams`` in the parameter string. See the :ref:`connection config ` setting
description for details.

Prerequisites
^^^^^^^^^^^^^

- |c3-short| must be installed and running in Normal mode.
- Multiple |ak| clusters must already be running. You cannot deploy new clusters with
  |c3-short|.
- Each |ak| cluster must have :ref:`metrics_reporter` configured to enable monitoring.
- Each |ak| cluster must be specified in the |c3-short| configuration using its own
  ``confluent.controlcenter.kafka.<name>.bootstrap.servers`` configuration, as shown in
  the sketch after this list. See :ref:`controlcenter_configuration` for more details.
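For example, a minimal sketch of a multi-cluster ``control-center.properties`` might look
like the following. The cluster names (``production``, ``staging``) and host names are
placeholders chosen for illustration, not values from your environment:

.. codewithvars:: bash

   # Cluster that Control Center itself uses for storage and coordination
   bootstrap.servers=c3-kafka-1:9092

   # Monitored clusters, each identified by a name of your choosing
   confluent.controlcenter.kafka.production.bootstrap.servers=prod-kafka-1:9092,prod-kafka-2:9092
   confluent.controlcenter.kafka.staging.bootstrap.servers=staging-kafka-1:9092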
.. seealso:: For an example that shows |c3-short| and a multi-cluster configuration in
   action, see the :ref:`Multi-datacenter GitHub demo ` and refer to the demo's
   ``docker-compose.yml`` for a configuration reference.

There are two basic methods for configuring the interceptor and metrics reporter plugins
in multi-cluster environments: *direct* and *replicated*. With either method, you install
a single |c3-short| server and connect it to a |ak| cluster. This cluster acts as the
storage and coordinator for |c3-short|.

- **Direct:** Using the direct method, the plugins report data directly to the |c3-short|
  cluster. If your network topology allows direct communication from interceptors and
  metrics reporters to the |c3-short| cluster, the direct method is the recommended
  solution.
- **Replicated:** Using the replicated method, the plugins report data to a local |ak|
  cluster that they have access to. A replicator process copies the data to the
  |c3-short| cluster. For more information, see the :ref:`Replicator quick start `. The
  replicated configuration is simpler to use when deploying interceptors, because they
  report to the local cluster by default. Use this method if your network topology
  prevents the plugins from communicating directly with the |c3-short| cluster, or if you
  are already using Replicator and are familiar with its operations.

**Direct**

You can configure interceptors to send metrics data directly to the |c3-short| |ak|
cluster. This cluster might be separate from the |ak| cluster that the client being
monitored is connected to.

.. figure:: ../../images/kafka_cluster_1_ports.png
   :scale: 50%

   Example direct configuration. Solid lines indicate the flow of interceptor data.

The primary advantage of this method is that metric collection is protected against
availability issues in the cluster being monitored. The primary disadvantage is that
every |ak| client must be configured with the |c3-short| |ak| cluster connection
parameters, which can be time-consuming, particularly if :ref:`controlcenter_security`
is enabled.

Here is an example configuration for a client:

.. codewithvars:: bash

   # the cluster your clients are talking to
   bootstrap.servers=kafka-cluster-1:9092
   # the Control Center cluster
   confluent.monitoring.interceptor.bootstrap.servers=kafka-cluster-2:9092

**Replicated**

By default, interceptors and metrics reporters send metric data to the same |ak| cluster
they are monitoring. You can use :ref:`Confluent Replicator ` to transfer and merge this
data into the |ak| cluster that is used by |c3-short|. The ``_confluent-monitoring`` and
``_confluent-metrics`` topics must be replicated to the |c3-short| cluster.

.. figure:: ../../images/kafka_cluster_2_port_sans_mm.png
   :scale: 50%

   Example replicated configuration. Solid lines indicate the flow of interceptor and
   cluster data.
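As a sketch only, a Replicator connector configuration that copies just these two topics
from the monitored cluster to the |c3-short| cluster might look like the following. The
connector name and host names are placeholders, and the exact settings depend on how you
deploy Replicator; see the Replicator documentation for authoritative configuration:

.. codewithvars:: bash

   # Hypothetical Replicator (Connect) configuration that copies only the
   # Control Center metric topics from the monitored cluster (kafka-cluster-1)
   # to the Control Center cluster (kafka-cluster-2).
   name=replicate-c3-metric-topics
   connector.class=io.confluent.connect.replicator.ReplicatorSourceConnector
   src.kafka.bootstrap.servers=kafka-cluster-1:9092
   dest.kafka.bootstrap.servers=kafka-cluster-2:9092
   topic.whitelist=_confluent-monitoring,_confluent-metrics
   key.converter=io.confluent.connect.replicator.util.ByteArrayConverter
   value.converter=io.confluent.connect.replicator.util.ByteArrayConverter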
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dedicated metric data cluster
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You can send your monitoring data to an existing |ak| cluster or configure a dedicated
cluster for this purpose. Advantages of giving |c3-short| its own |ak| cluster include:

- By hosting |c3-short| on its own |ak| cluster, it is independent of the availability of
  the production cluster it is monitoring. For example, if there are severe production
  issues, you continue to receive alerts and can view the |c3-short| monitoring
  information. A production disaster is when you need these metrics the most.
- Ease of upgrade. Future versions of |c3-short| are likely to take advantage of new
  features of |ak|. If you use a separate |ak| cluster for |c3-short|, it may be easier
  to take advantage of new features in future versions of |c3-short| if the upgrade path
  does not involve any production |ak| cluster.
- The cluster may have reduced security requirements, which could make it easier to
  implement the direct strategy described above.
- |c3-short| requires a significant amount of disk space and throughput for metrics
  collection. By giving |c3-short| its own dedicated cluster, you guarantee that the
  |c3-short| workload never interferes with production traffic.

The main disadvantage of giving |c3-short| its own |ak| cluster is that a dedicated
cluster requires additional virtual or physical hardware, setup, and maintenance.

.. _c3-saturation-testing:

^^^^^^^^^^^^^^^^^^
Saturation testing
^^^^^^^^^^^^^^^^^^

|c3-short| was saturation-tested on simulated monitoring data. The goal was to find the
maximum cluster size that |c3-short| can successfully monitor along several important
dimensions.

Test setup
^^^^^^^^^^

|ak| cluster running on |ccloud| that consists of:

* Four |ak| nodes running on |aws| EC2 r4.xlarge.
* 232 initial topic partitions (including internal topics).
* Replication factor of three. Each topic partition has three replicas, and each
  partition replica sends its own metrics. With X total topic partitions, partition-level
  metrics for 3X partitions are sent.
* One initial user topic partition.
* Three |ak| brokers.
* One |c3| instance running in Normal mode on |aws| EC2 m4.2xlarge.
* Two nodes generating load; one for broker monitoring and one for streams monitoring.
* Each user topic is created with 12 partitions.
* Eight :ref:`streams threads `, which is the default
  ``confluent.controlcenter.streams.num.stream.threads`` configuration.
* JDK 8

Broker monitoring
^^^^^^^^^^^^^^^^^

|ak| metrics were generated as if by a cluster with three brokers and no producers or
consumers. The number of partitions was increased on the simulated cluster until lag
occurred on |c3-short|.

Result: The number of partitions was increased to 100,000 partitions, and |c3-short|
kept up with the incoming metrics.

Caveat: Any change to sizing or network topology would likely give different results.

Streams monitoring
^^^^^^^^^^^^^^^^^^

Metrics were generated as if by a cluster with three brokers and 5,000 partitions in 250
topics. The number of consumer groups reporting consumption completeness and lag data was
increased from 1 through 100,000, in increments of 5,000 consumer groups. Each simulated
consumer group included a single consumer reading from a single partition.

Result: At 20,000 consumer groups, |c3-short| was no longer able to keep up with incoming
data on this server size, and the reports lagged behind.

Caveat: Up to 20,000 consumers were tested, but no producers. This likely has an impact
on monitoring capacity.

^^^^^^^^^^^^^^^^^^^
Example deployments
^^^^^^^^^^^^^^^^^^^

The following example |c3-short| setups were tested internally.

Broker monitoring
^^^^^^^^^^^^^^^^^

Given:

* 1 |c3| (running on EC2 m4.2xlarge)
* 3 |ak| Brokers
* 1 |zk|
* 200 Topics
* 10 Partitions per Topic
* 3x Replication Factor
* Default JVM settings
* Default |c3-short| config
* Default |ak| config

Expect:

* |c3-short| state store size ~50 MB/hr
* |ak| log size ~500 MB/hr (per broker)
* Average CPU load ~7%
* Allocated Java on-heap memory ~580 MB and off-heap ~100 MB
* Total allocated memory including page cache ~3.6 GB
* Network read utilization ~150 KB/sec
* Network write utilization ~170 KB/sec

Streams monitoring
^^^^^^^^^^^^^^^^^^

Given:

* 1 |c3| (running on EC2 m4.2xlarge)
* 3 |ak| Brokers
* 1 |zk|
* 30 Topics
* 10 Partitions per Topic
* 150 Consumers
* 50 Consumer Groups
* 3x Replication Factor
* Default JVM settings
* Default |c3-short| config
* Default |ak| config

Expect:

* |c3-short| state store size ~1 GB/hr
* |ak| log size ~1 GB/hr (per broker)
* Average CPU load ~8%
* Allocated Java on-heap memory ~600 MB and off-heap ~100 MB
* Total allocated memory including page cache ~4 GB
* Network read utilization ~160 KB/sec
* Network write utilization ~180 KB/sec

Next steps
^^^^^^^^^^

- For troubleshooting information, see :ref:`controlcenter_troubleshooting`.
- For a complete example that includes stream monitoring, see the :ref:`quickstart`.