.. _rebalancer:

Auto Data Balancing
===================

The ``confluent-rebalancer`` tool balances data so that the number of leaders and the disk usage
are even across brokers and racks, at both the topic and the cluster level, while minimizing data
movement. It also integrates closely with the replication quotas feature in |ak-tm| to dynamically
throttle data-balancing traffic. The tool is part of |cp| and can also be installed on its own
using the ``confluent-rebalancer`` package.

.. warning:: After you start the rebalancer, you cannot stop it without causing large disruptions
             in |cp|. The rebalancer tool employs an all-or-nothing approach and does not perform
             rebalancing in batches. Therefore, when you have a high number of partition
             reassignments, you must also use the reassignment tool; for example, attempting to
             rebalance over 8000 partitions at once can kill the controller.

The rebalancer tool generates a proposed assignment that you can pass to the reassignment tool to
move partitions:

.. codewithvars:: bash

   confluent-rebalancer proposed-assignment --metrics-bootstrap-server localhost:9092 --zookeeper localhost:2181 > test.json
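For illustration, the saved plan can then be handed to the reassignment tool that ships with |ak|.
This is a minimal sketch, assuming the ZooKeeper-based ``kafka-reassign-partitions`` tool and the
``test.json`` file written above:

.. codewithvars:: bash

   # Execute the proposed assignment; run the same command with --verify
   # afterwards to confirm that the reassignment completed.
   ./bin/kafka-reassign-partitions --zookeeper localhost:2181 \
     --reassignment-json-file test.json --execute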
For details about using the reassignment tool, see :ref:`scaling-the-cluster`.

Requirements
------------

To compute the rebalance plan, the tool relies on metrics collected from the |ak-tm| cluster. This
data is published by the :ref:`Confluent Metrics Reporter ` to a configurable |ak| topic
(``_confluent-metrics`` by default) in a configurable |ak| cluster. To enable the Metrics Reporter,
see the :ref:`installation instructions `.

|cadb-full| Quick Start
-----------------------

Start a |ak| cluster
~~~~~~~~~~~~~~~~~~~~

Start a |zk| server. In this example, assume services run on ``localhost``.

.. codewithvars:: bash

   # Start ZooKeeper. Run this command in its own terminal.
   ./bin/zookeeper-server-start ./etc/kafka/zookeeper.properties

.. include:: ../../includes/installation-types-zip-tar.rst

Copy the broker configuration file to a temporary location, enable the metrics reporter, and
duplicate the configuration for the additional brokers.

.. codewithvars:: bash

   # Copy the configuration files to /tmp
   cp ./etc/kafka/server.properties /tmp/server0.properties

   # Add metrics reporter configurations (alternatively, you could uncomment the configurations)
   echo "" >> /tmp/server0.properties
   echo "metric.reporters=io.confluent.metrics.reporter.ConfluentMetricsReporter" >> /tmp/server0.properties
   echo "confluent.metrics.reporter.bootstrap.servers=localhost:9092" >> /tmp/server0.properties
   echo "confluent.metrics.reporter.topic.replicas=1" >> /tmp/server0.properties

   # properties for broker.id=1
   cp /tmp/server0.properties /tmp/server1.properties
   sed -i'' -e "s/broker.id=0/broker.id=1/g" /tmp/server1.properties
   sed -i'' -e "s/9092/9082/g" /tmp/server1.properties
   sed -i'' -e "s/#listen/listen/g" /tmp/server1.properties
   sed -i'' -e "s/kafka-logs/kafka-logs-1/g" /tmp/server1.properties

   # properties for broker.id=2
   cp /tmp/server0.properties /tmp/server2.properties
   sed -i'' -e "s/broker.id=0/broker.id=2/g" /tmp/server2.properties
   sed -i'' -e "s/9092/9072/g" /tmp/server2.properties
   sed -i'' -e "s/#listen/listen/g" /tmp/server2.properties
   sed -i'' -e "s/kafka-logs/kafka-logs-2/g" /tmp/server2.properties

   # properties for broker.id=3
   cp /tmp/server0.properties /tmp/server3.properties
   sed -i'' -e "s/broker.id=0/broker.id=3/g" /tmp/server3.properties
   sed -i'' -e "s/9092/9062/g" /tmp/server3.properties
   sed -i'' -e "s/#listen/listen/g" /tmp/server3.properties
   sed -i'' -e "s/kafka-logs/kafka-logs-3/g" /tmp/server3.properties

Start the |ak| brokers:

.. codewithvars:: bash

   # Start Kafka. Run these commands in a separate terminal.
   ./bin/kafka-server-start /tmp/server0.properties &
   ./bin/kafka-server-start /tmp/server1.properties &
   ./bin/kafka-server-start /tmp/server2.properties &
   ./bin/kafka-server-start /tmp/server3.properties &

For complete details on getting these services up and running, see the :ref:`quickstart ` for |cp|.

Create topics and produce data
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Create a couple of topics, each with 4 partitions and a replication factor of 2. We intentionally
create an unbalanced assignment:

.. codewithvars:: bash

   ./bin/kafka-topics --create --topic topic-a --replica-assignment 0:1,0:1,0:1,0:1 --bootstrap-server localhost:9092
   ./bin/kafka-topics --create --topic topic-b --replica-assignment 1:0,2:1,1:2,2:1 --bootstrap-server localhost:9092

Examine how they look:

.. codewithvars:: bash

   ./bin/kafka-topics --describe --topic topic-a --bootstrap-server localhost:9092

   Topic:topic-a   PartitionCount:4   ReplicationFactor:2   Configs:
       Topic: topic-a   Partition: 0   Leader: 0   Replicas: 0,1   Isr: 0,1
       Topic: topic-a   Partition: 1   Leader: 0   Replicas: 0,1   Isr: 0,1
       Topic: topic-a   Partition: 2   Leader: 0   Replicas: 0,1   Isr: 0,1
       Topic: topic-a   Partition: 3   Leader: 0   Replicas: 0,1   Isr: 0,1

   ./bin/kafka-topics --describe --topic topic-b --bootstrap-server localhost:9092

   Topic:topic-b   PartitionCount:4   ReplicationFactor:2   Configs:
       Topic: topic-b   Partition: 0   Leader: 1   Replicas: 1,0   Isr: 1,0
       Topic: topic-b   Partition: 1   Leader: 2   Replicas: 2,1   Isr: 2,1
       Topic: topic-b   Partition: 2   Leader: 1   Replicas: 1,2   Isr: 1,2
       Topic: topic-b   Partition: 3   Leader: 2   Replicas: 2,1   Isr: 2,1
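In this layout, broker 0 leads every partition of ``topic-a`` while broker 3 holds no replicas at
all. As a quick, informal check of the skew (a hypothetical helper, not part of the rebalancer
workflow), you can count the leaderships per broker from the ``--describe`` output:

.. codewithvars:: bash

   # Dump the assignments for the two demo topics, then count how many
   # partitions each broker leads.
   ./bin/kafka-topics --describe --topic topic-a --bootstrap-server localhost:9092 >  /tmp/assignments.txt
   ./bin/kafka-topics --describe --topic topic-b --bootstrap-server localhost:9092 >> /tmp/assignments.txt
   for id in 0 1 2 3; do
     echo "broker $id leads $(grep -c "Leader: $id" /tmp/assignments.txt) partition(s)"
   done

For the assignment above, this reports four leaderships for broker 0 and none for broker 3.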
Now produce some data:

.. codewithvars:: bash

   ./bin/kafka-producer-perf-test --topic topic-a --num-records 200000 --record-size 1000 --throughput 10000000 --producer-props bootstrap.servers=localhost:9092
   ./bin/kafka-producer-perf-test --topic topic-b --num-records 800000 --record-size 1000 --throughput 10000000 --producer-props bootstrap.servers=localhost:9092

Finally, force the creation of the offsets topic by running a consumer:

.. codewithvars:: bash

   ./bin/kafka-consumer-perf-test --topic topic-a --broker-list localhost:9092 --messages 10

.. _execute-rebalancer:

Execute the rebalancer
~~~~~~~~~~~~~~~~~~~~~~

Before starting, note that running ``./bin/confluent-rebalancer`` with no arguments outputs the
list of supported commands along with a description of each.

Begin with the ``execute`` command. Specify the bootstrap servers for the |ak| cluster that
contains the metrics topic and the maximum bandwidth (in bytes per second) allocated to moving
replicas. The ``verbose`` flag includes per-broker statistics in the CLI output, which is useful
unless the number of brokers is very high.

.. codewithvars:: bash

   ./bin/confluent-rebalancer execute --bootstrap-server localhost:9092 --metrics-bootstrap-server localhost:9092 --throttle 10000000 --verbose

You are presented with a rebalancing plan, which differs slightly depending on whether the |ak|
brokers are configured to use a single log directory (as in this example) or multiple log
directories. In the single log directory case, the rebalancer ensures that the volume containing
the log directory has at least the specified percentage of free space during and after the
rebalance (the default is 20%).

.. codewithvars:: bash

   Computing the rebalance plan (this may take a while) ...

   You are about to move 17 replica(s) for 14 partitions to 4 broker(s) with total size 827.2 MB.
   The preferred leader for 14 partition(s) will be changed.
   In total, the assignment for 15 partitions will be changed.

   The minimum free volume space is set to 20.0%.
   The following brokers will have less than 40% of free volume space during the rebalance:

   Broker   Current Size (MB)   Size During Rebalance (MB)   Free % During Rebalance   Size After Rebalance (MB)   Free % After Rebalance
   0        413.6               620.4                        30.1                      519.6                       30.5
   2        620.4               723.8                        30.1                      520.8                       30.5
   3        0                   517                          30.1                      520.8                       30.5
   1        1,034               1,034                        30.1                      519.6                       30.5

   Min/max stats for brokers (before -> after):
   Type   Leader Count               Replica Count              Size (MB)
   Min    12 (id: 3) -> 17 (id: 0)   37 (id: 3) -> 43 (id: 3)   0 (id: 3) -> 517 (id: 1)
   Max    21 (id: 0) -> 17 (id: 0)   51 (id: 1) -> 45 (id: 0)   1,034 (id: 1) -> 517 (id: 3)

   No racks are defined.

   Broker stats (before -> after):
   Broker   Leader Count   Replica Count   Size (MB)       Free Space (%)
   0        21 -> 17       48 -> 45        413.6 -> 517    30.5 -> 30.5
   1        20 -> 17       51 -> 44        1,034 -> 517    30.5 -> 30.5
   2        15 -> 17       40 -> 44        620.4 -> 517    30.5 -> 30.5
   3        12 -> 17       37 -> 43        0 -> 517        30.5 -> 30.5

   Would you like to continue? (y/n):

Because all brokers share the same volume in this example, the free space numbers don't change
after the rebalance. If you configure multiple log directories, pay close attention to the
additional disk space required during and after the rebalance to ensure that the move won't cause
a broker to run out of disk space. In addition, the min/max stats provide a quick summary of the
improvement in data balance once the rebalance completes. The goal is for the ``min`` and ``max``
values to be closer to each other after the rebalance; in this case, we achieve near-optimal
balance, so the numbers are virtually identical.
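As a rough back-of-the-envelope estimate (not output from the tool): the plan above moves about
827.2 MB of replicas, and the throttle caps replication traffic at 10000000 bytes/sec (roughly
10 MB/sec), so the data movement alone takes on the order of 827.2 / 10, or about 83 seconds.
Treat this as a lower bound; the actual duration also depends on how the throttled traffic is
distributed across brokers.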
If you proceed, you will see the following (don't ignore the warning):

.. codewithvars:: bash

   Rebalance started, its status can be checked via the status command.

   Warning: You must run the status or finish command periodically, until the rebalance completes,
   to ensure the throttle is removed. You can also alter the throttle by re-running the execute
   command passing a new value.

At this point, the tool exits and the rebalance takes place in the background (driven by the |ak|
controller).

Check status and finish
~~~~~~~~~~~~~~~~~~~~~~~

Check the status of the rebalance:

.. codewithvars:: bash

   ./bin/confluent-rebalancer status --zookeeper localhost:2181

   Partitions being rebalanced:
     Topic topic-a: 1,2
     Topic topic-b: 0

Eventually, the rebalance completes (``finish`` and ``status`` are similar; the latter has more
concise output and returns a 0 exit status code when the rebalance is finished):

.. codewithvars:: bash

   ./bin/confluent-rebalancer finish --zookeeper localhost:2181

   The rebalance has completed and throttling has been disabled

Verify that the replica assignment is balanced:

.. codewithvars:: bash

   ./bin/kafka-topics --describe --topic topic-a --bootstrap-server localhost:9092

   Topic:topic-a   PartitionCount:4   ReplicationFactor:2   Configs:
       Topic: topic-a   Partition: 0   Leader: 3   Replicas: 3,2   Isr: 2,3
       Topic: topic-a   Partition: 1   Leader: 2   Replicas: 2,3   Isr: 2,3
       Topic: topic-a   Partition: 2   Leader: 0   Replicas: 1,0   Isr: 0,1
       Topic: topic-a   Partition: 3   Leader: 0   Replicas: 0,1   Isr: 0,1

   ./bin/kafka-topics --describe --topic topic-b --bootstrap-server localhost:9092

   Topic:topic-b   PartitionCount:4   ReplicationFactor:2   Configs:
       Topic: topic-b   Partition: 0   Leader: 3   Replicas: 3,0   Isr: 0,3
       Topic: topic-b   Partition: 1   Leader: 0   Replicas: 0,3   Isr: 0,3
       Topic: topic-b   Partition: 2   Leader: 1   Replicas: 1,2   Isr: 1,2
       Topic: topic-b   Partition: 3   Leader: 2   Replicas: 2,1   Isr: 2,1

Partition sizes are communicated asynchronously (every 15 seconds by default, configurable via
``confluent.metrics.reporter.publish.ms``), so there may be a delay before the tool reports the
correct information after a rebalance completes. This is particularly noticeable when conducting
local tests such as this one.
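Because the throttle stays in place until ``status`` or ``finish`` observes completion, you may
want to poll from a script. A minimal sketch, assuming the exit-code behavior described above
(``status`` exits non-zero while the rebalance is still in progress):

.. codewithvars:: bash

   # Poll every 30 seconds until the rebalance completes, then make sure
   # the replication throttle is removed.
   until ./bin/confluent-rebalancer status --zookeeper localhost:2181; do
     sleep 30
   done
   ./bin/confluent-rebalancer finish --zookeeper localhost:2181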
Leader balance
~~~~~~~~~~~~~~

If ``auto.leader.rebalance.enable`` is disabled on your brokers, run the preferred leader election
tool after the rebalance completes. This ensures that the actual leaders are balanced, not just the
preferred leaders.

.. sourcecode:: bash

   ./bin/kafka-preferred-replica-election --bootstrap-server localhost:9092

   Created preferred replica election path with _confluent-metrics-7,_confluent-metrics-2,_confluent-metrics-5,_confluent-metrics-11,_confluent-metrics-10,topic-b-2,topic-a-1,_confluent-metrics-1,topic-a-3,__confluent.support.metrics-0,topic-a-0,_confluent-metrics-9,_confluent-metrics-0,topic-a-2,topic-b-3,_confluent-metrics-3,_confluent-metrics-4,_confluent-metrics-6,topic-b-1,_confluent-metrics-8,topic-b-0
   Successfully started preferred replica election for partitions Set(_confluent-metrics-7, _confluent-metrics-2, _confluent-metrics-5, _confluent-metrics-11, _confluent-metrics-10, topic-b-2, topic-a-1, _confluent-metrics-1, topic-a-3, __confluent.support.metrics-0, topic-a-0, _confluent-metrics-9, _confluent-metrics-0, topic-a-2, topic-b-3, _confluent-metrics-3, _confluent-metrics-4, _confluent-metrics-6, topic-b-1, _confluent-metrics-8, topic-b-0)

.. sourcecode:: bash

   # Run the command again to confirm completion.
   ./bin/kafka-preferred-replica-election --bootstrap-server localhost:9092

   Successfully completed preferred replica election for partitions topic-b-3, topic-b-0, _confluent-metrics-1, _confluent-metrics-5, _confluent-metrics-9, topic-a-1, _confluent-metrics-2, topic-b-1, _confluent-metrics-6, __confluent.support.metrics-0, _confluent-metrics-10, topic-a-2, _confluent-metrics-3, topic-b-2, topic-a-3, _confluent-metrics-7, _confluent-metrics-4, _confluent-metrics-11, _confluent-metrics-8, _confluent-metrics-0, topic-a-0

Decommissioning brokers
~~~~~~~~~~~~~~~~~~~~~~~

The ``confluent-rebalancer`` tool automatically generates a rebalance plan for decommissioning
brokers via the ``--remove-broker-ids`` option. The rest of the cluster is balanced in the same
execution.

.. codewithvars:: bash

   ./bin/confluent-rebalancer execute --zookeeper localhost:2181 --metrics-bootstrap-server localhost:9092 --throttle 100000 --remove-broker-ids 1

   Computing the rebalance plan (this may take a while) ...

   You are about to move 48 replica(s) for 48 partitions to 3 broker(s) with total size 775.5 MB.
   The preferred leader for 20 partition(s) will be changed.
   In total, the assignment for 49 partitions will be changed.

   You have requested all replicas to be moved out of 1 broker(s) with ID(s): 1.
   After the rebalance, these broker(s) will have no replicas.

   The following brokers will require more disk space during the rebalance and, in some cases, after the rebalance:

   Broker   Current (MB)   During Rebalance (MB)   After Rebalance (MB)
   0        517            775.5                   568.7
   2        517            775.5                   723.8
   3        517            775.5                   775.5

   Min/max stats for brokers (before -> after):
   Type   Leader Count              Replica Count             Size (MB)
   Min    17 (id: 0) -> 0 (id: 1)   43 (id: 3) -> 0 (id: 1)   517 (id: 2) -> 0 (id: 1)
   Max    17 (id: 0) -> 23 (id: 0)  45 (id: 0) -> 59 (id: 0)  517 (id: 1) -> 775.5 (id: 3)

   No racks are defined.

   Would you like to continue? (y/n):

Given that we have 8 partitions with roughly the same amount of data, it's not possible for the 3
remaining brokers to hold the same amount of data. Broker ``0`` would have 2 of these partitions
after the rebalance and, therefore, less data.

Limiting bandwidth usage during data migration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use the ``--throttle`` option to limit the amount of bandwidth used for replication. You can update
the value while a rebalance is in progress by simply rerunning the tool:

.. codewithvars:: bash

   ./bin/confluent-rebalancer execute --zookeeper localhost:2181 --metrics-bootstrap-server localhost:9092 --throttle 100000

   The throttle rate was updated to 100000 bytes/sec.

   A rebalance is currently in progress for:
     Topic topic-b: 0,1

See :ref:`Throttling ` for more details.

Use JMX for monitoring
~~~~~~~~~~~~~~~~~~~~~~

Authentication is disabled for Java Management Extensions (JMX) by default in |ak|. You must use
environment variables to override these defaults. Use ``KAFKA_JMX_OPTS`` for processes started
using the CLI, or set the appropriate Java system properties. For each platform component, the
:ref:`options to override the JMX defaults ` take the form ``<COMPONENT>_JMX_OPTS``. For the
``confluent-rebalancer``, use the environment variable ``REBALANCER_JMX_OPTS`` for processes
started using the CLI, or set the appropriate Java system properties.
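For example, a minimal sketch that exposes an unauthenticated JMX endpoint for the rebalancer on
port 9999 (the port and settings here are illustrative, and suitable only for local testing):

.. codewithvars:: bash

   # Standard JVM remote JMX system properties; do not disable
   # authentication and SSL outside of local tests.
   export REBALANCER_JMX_OPTS="-Dcom.sun.management.jmxremote \
   -Dcom.sun.management.jmxremote.port=9999 \
   -Dcom.sun.management.jmxremote.authenticate=false \
   -Dcom.sun.management.jmxremote.ssl=false"

   ./bin/confluent-rebalancer status --zookeeper localhost:2181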
Licensing
---------

A license was not specified as part of the quick start. If you have a Confluent license, you can
specify it using the ``confluent.license`` configuration. For more details about licenses, see
:ref:`controlcenter_licenses`.

Suggested Reading
-----------------

- :ref:`Tutorial: Automatic Data Balancing on Docker `
- :ref:`rebalancer-config-options`

.. toctree::
   :maxdepth: 3
   :hidden:

   configuration_options