What are Self-Balancing Clusters?
Confluent Platform deployments can run hundreds of brokers, manage thousands of topics, and
produce billions of messages per hour. Every day, brokers die, new topics are
created and deleted, and partitions must be reassigned to balance the workload.
This can overload teams tasked with managing Confluent Platform runtime operations.
Self-Balancing automates your resource workload balancing, provides failure detection and
self-healing, and allows you to add or decommission brokers as needed, with no
manual tuning required.
- Fully automated load balancing
- Auto-monitoring of clusters for imbalances based on a large set of parameters, configurations, and runtime variables
- Continuous metrics aggregation and rebalancing plans, generated instantaneously in most cases, and executed automatically
- Automatic triggering of rebalance operations based on simple configurations you set in Confluent Control Center or in Kafka. You can choose to auto-balance Only when brokers are added or Anytime, which rebalances for any uneven load.
- At-a-glance visibility into the state of your clusters, and the strategy and progress of auto-balancing through a few key metrics
How it Works
Self-Balancing Clusters make Kafka load-aware. Resource usage within a cluster can be
heterogeneous, and Kafka out of the box does not provide an automated process for improving
cluster balance; instead, it requires manual calculation of how to reassign partitions.
Self-Balancing simplifies this process by moving data to spread the cluster load evenly.
Self-Balancing defines the meaning of “even” based on built-in goals, although you can provide
optional input through configurations.
Architecture of a Self-Balancing Cluster
Kafka brokers collect metrics and feed the data to an internal topic on the controller, which is on the lead broker for the cluster.
The controller, therefore, plays a key role in cluster balancing. You can think of the controller as the node on which Self-Balancing is actively running.
The metrics are processed, and decisions are made based on goals.
This data is fed to other internal Kafka topics for monitoring and potential actions, such as generating a load balancing plan or triggering a rebalance.
You can view all topics, including internal topics, by using the
kafka-topics --list command while Confluent Platform is running, for example:
kafka-topics --list --bootstrap-server localhost:9092. Self-Balancing internal topics are prefixed with _confluent_balancer_.
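For example, a minimal shell sketch to list only the Self-Balancing internal topics (assuming a broker listening on localhost:9092):
  # Filter the full topic list down to the Self-Balancing internal topics:
  kafka-topics --list --bootstrap-server localhost:9092 | grep _confluent_balancer_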
Enabling Self-Balancing Clusters
- For each broker in your cluster, set
confluent.balancer.enable=true to enable Self-Balancing, and make sure this line is uncommented.
Confluent Platform ships with Self-Balancing enabled in the example file
$CONFLUENT_HOME/etc/kafka/server.properties. To learn more, see confluent.balancer.enable. A combined sketch of these broker settings follows this list.
- To make Self-Balancing accessible for Configuration and Monitoring from Control Center, configure the Control Center cluster with REST endpoints
to enable HTTP servers on the brokers, as described in Required Configurations for Control Center.
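Putting the two items above together, a minimal sketch of the broker-side lines might look like the following in each broker's properties file (the REST listener address here is an assumption for a local deployment; use the listener appropriate to your environment):
  # Turn on Self-Balancing for this broker:
  confluent.balancer.enable=true
  # Expose the broker's HTTP/REST endpoint so Control Center can reach Self-Balancing:
  confluent.http.server.listeners=http://0.0.0.0:8090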
What defines a “balanced” cluster and what triggers a rebalance?
At a high level, Self-Balancing distinguishes between two types of imbalances:
- Intentional operational actions, such as adding or removing brokers. In these cases, Self-Balancing saves the operator time and manual steps that would otherwise be required.
- Ongoing cluster operations (such as a hot topic or partition). This balancing is ongoing throughout the life of the cluster.
These map to the two high-level configuration options you have with Self-Balancing enabled:
- Rebalance only when a broker is added
- Rebalance anytime (for any uneven load, including changes in number of available brokers)
The first case is clear-cut: if a broker is added or removed, self-healing
redistributes data onto the new broker or offloads data from the departing broker.
The second case is more nuanced. To achieve ongoing cluster and data balance,
Self-Balancing Clusters optimize on a number of goals and also avoid unnecessary movements
if rebalancing would not materially improve cluster performance. Goals include
considerations for replica placement and capacity, replication factors and
throughput, multiple metrics on topics and partitions, leadership, rack
awareness, disk usage and capacity, processor usage and capacity, network
throughputs per broker, numerous load distribution targets, and more.
Self-Balancing Clusters employ continuous monitoring and data collection to track performance
against these goals, and generate plans for rebalancing. A rebalance may or may
not be triggered based on the implications of all weighted factors for the
topology at a given moment. Furthermore, the balancing algorithm is approximate, and
influenced by the factors described above.
In both cases, the cluster also rebalances when a broker is removed by user request (either from the
command line or from Control Center),
or if a broker goes missing for some period of time (as specified by confluent.balancer.heal.broker.failure.threshold.ms).
Therefore, even with Self-Balancing set to Rebalance only when a broker is added, the cluster will rebalance for missing brokers unless you
intentionally set properties to prevent this (for example, if
confluent.balancer.heal.broker.failure.threshold.ms is set to -1, which disables self-healing for missing brokers).
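For illustration, a hedged broker-configuration sketch covering both the rebalance trigger and the failure threshold (the values shown are assumptions, not recommendations):
  # Rebalance for any uneven load; the alternative value, EMPTY_BROKER,
  # rebalances only when brokers are added:
  confluent.balancer.heal.uneven.load.trigger=ANY_UNEVEN_LOAD
  # How long a broker may be missing before self-healing kicks in
  # (one hour here); setting this to -1 disables rebalancing for
  # missing brokers:
  confluent.balancer.heal.broker.failure.threshold.ms=3600000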
See also Replica Placement and Rack Configurations.
What happens if the lead broker (controller) is removed or lost?
In a multi-broker cluster, one of the brokers is the leader or controller and
plays a key role in Self-Balancing (as described in Architecture of a Self-Balancing Cluster). What happens if the
broker where the controller is running is intentionally removed or crashes?
- There is no impact to cluster integrity.
- When the controller is removed or lost, a new controller is elected.
- A broker removal request persists once it is made. If another broker becomes the controller,
the new controller picks up the removal request, restarts the broker removal process, and completes it, possibly with some delay.
- If the lead broker is lost during an in-progress “add broker” operation, the “add broker” operation will not complete, and is marked as failed.
If the new controller does not pick this up, you may need to restart the broker you were trying to add.
See also Troubleshooting.
How do the brokers leverage Cruise Control?
Confluent Self-Balancing Clusters leverage Cruise Control
for continuous metrics aggregation and reporting, reassignment algorithms and plans, and some
rebalance triggers. Unlike Cruise Control, which has to be managed separately from Kafka, Self-Balancing
is built into the brokers, meaning that data balancing is supported out of the box without additional
dependencies. The result is custom-made, automated load balancing optimized for your Kafka clusters
on Confluent Platform, and designed to work seamlessly with other components like Tiered Storage and Multi-Region Clusters.
Limitations
- Self-Balancing does not support JBOD (just a bunch of disks), also known as spanning, to make multiple disks appear as one.
- If a broker contains the only replica of a partition, Self-Balancing will block elective attempts
to remove the broker to prevent potential data loss (the broker removal operation will fail).
The best way to remedy this is to increase replication factors on topics and internal topics
on the broker you want to remove, as described in Configure replication factors for Self-Balancing in the tutorial.
- Attempts to remove a broker immediately after cluster startup (while Self-Balancing is initializing) can fail due to insufficient metrics,
and attempts to remove the lead broker can also fail at another phase. The solution is to retry broker removal after a period of time.
If the broker is a controller, you must run broker removal from the command line, as it may not be available in Control Center.
To learn more, see Broker removal attempt fails during Self-Balancing initialization in Troubleshooting.
Configuration and Monitoring
Self-Balancing Clusters are self-managed, and can be enabled while the cluster is running. In
most cases, you should not have to tinker with the defaults. Simply enable Self-Balancing
either before or after you start the cluster, and allow it to auto-balance as needed.
In the example server.properties file that ships with Confluent Platform,
confluent.balancer.enable is set to true, which means Self-Balancing is on.
Using Control Center
You can change these Self-Balancing settings from the Confluent Control Center (http://localhost:9021/) while the cluster is running:
1. Select a cluster, click Cluster settings, and select the Self-balancing tab. The current settings for Self-Balancing are shown.
2. Click Edit Settings.
3. Make changes and click Save. The updated settings take effect and are reflected on the Self-balancing tab.
To view the property names for the Self-Balancing settings in effect (while editing or monitoring them), select Show raw configs.
Metrics for Monitoring a Rebalance
Confluent Platform exposes several metrics through Java Management Extensions (JMX) that are useful for monitoring rebalances initiated by Self-Balancing:
- The incoming and outgoing byte rates for reassignments are tracked by
kafka.server:type=BrokerTopicMetrics,name=ReassignmentBytesInPerSec and
kafka.server:type=BrokerTopicMetrics,name=ReassignmentBytesOutPerSec.
These metrics are reported by each broker.
- The number of pending and in-progress reassignment tasks currently tracked by Self-Balancing.
These metrics are reported from the broker with the active data balancer instance (the controller).
- The maximum follower lag on each broker, tracked by the standard
kafka.server:type=ReplicaFetcherManager,name=MaxLag,clientId=Replica metric.
Reassigning partitions will cause this metric to jump to a value corresponding to the size of the partition and then slowly decay as the reassignment progresses.
If this value slowly rises rather than slowly falling over time, the replication throttle is too low.
You can view the full list of metrics for Confluent Platform in Monitoring Kafka.
The Apache Kafka® documentation covers metrics reporting and monitoring by means of JMX.
To monitor Self-Balancing, set the
JMX_PORT environment variable before starting the
cluster, then collect the reported metrics using your usual monitoring tools.
JMXTrans, Graphite, and Grafana are a popular combination for collecting and
reporting JMX metrics from Kafka. Datadog is another popular monitoring solution.
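For example, a minimal sketch of exposing JMX when starting a broker (the port number is an arbitrary assumption; your monitoring agent then attaches to this port):
  # Kafka's startup scripts honor the JMX_PORT environment variable:
  export JMX_PORT=9999
  kafka-server-start $CONFLUENT_HOME/etc/kafka/server.properties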
Replica Placement and Rack Configurations
The following considerations apply to all rebalances.
- Racks are fault domains for Kafka. All brokers in the same rack are vulnerable
to simultaneous failure. Both Kafka and Self-Balancing Clusters use knowledge of rack IDs
to place topics, both initially (Kafka) and during rebalances (Self-Balancing).
- Self-Balancing attempts to distribute replicas evenly across racks if brokers have assigned
rack IDs. Specifically, Self-Balancing tries to ensure that no rack holds more than one more
replica of a partition than any other rack. Because this is for fault tolerance,
if Self-Balancing cannot achieve an even rack balance, it will not attempt any balancing at all.
- Racks should have an approximately equal number of brokers on them. This is particularly
important if there are very few racks in your cluster. At a minimum, the number of brokers
per rack should be at least as many as the expected number of replicas per rack.
Replica Placement and Multi-Region Clusters
- If you are using both Multi-Region Clusters and Self-Balancing Clusters, you must specify the broker rack on all brokers.
Your starting set of brokers, and any brokers you add with Self-Balancing enabled, must have a region or rack specified for
broker.rack in each of their properties files.
- Replica placement rules are used for Multi-Region Clusters, and
overload the definition of “rack” to limit where replicas can be placed. Replica
placement only works when racks are provided for all brokers.
- If replica placement rules are specified for a topic, then this overrides the
standard rack awareness policy.
- Self-Balancing responds quickly to changes in replica placement rules and proactively
moves replicas as needed to satisfy those placement rules.
- If Self-Balancing cannot satisfy replica placements, it will not do any rebalances. See
Debugging Rebalance Failures for how to identify this.
- Self-Balancing is free to move replicas across racks as long as doing so doesn’t
violate rack awareness. This can lead to high network costs both for reassignments
and ongoing replication, so multi-region clusters should specify placement rules
for all topics in the cluster.
Self-Balancing tries to ensure that brokers do not exceed the host’s capacity and,
therefore, will move replicas away from overloaded brokers. Broker capacity is
measured in terms of the following metrics:
- Replica capacity: Brokers should not host more than this many replicas (default 10,000; can be overridden by confluent.balancer.disk.max.replicas).
- Disk capacity: Brokers should not fill their disks beyond this point (default 85%; can be overridden by confluent.balancer.disk.max.load).
- Network bytes in and out (optional): The amount of broker network traffic should not exceed this many bytes. By default this value is set extremely high,
so Self-Balancing will not, by default, ever move replicas off a broker due to excessive network load. You can set
confluent.balancer.network.in.max.bytes.per.second and confluent.balancer.network.out.max.bytes.per.second if you want
network capacity to play a role in Self-Balancing.
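As a hedged sketch, the capacity settings above might look like this in a broker properties file (the values are illustrative assumptions only):
  # Cap the number of replicas hosted per broker (default 10,000):
  confluent.balancer.disk.max.replicas=15000
  # Do not fill disks beyond 80% (default 85%):
  confluent.balancer.disk.max.load=0.80
  # Let outbound network throughput factor into balancing (bytes per second):
  confluent.balancer.network.out.max.bytes.per.second=125000000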
Self-Balancing attempts to ensure even distribution of replicas and disk usage across the cluster but with some caveats:
- Both replicas and disk usage are balanced to within approximately 20 percent. It is normal
to see a range of replicas and disk usage across brokers, even after balancing.
- Disk usage balancing only occurs when at least one broker is using more than 20 percent of its disk storage.
- These resource distribution goals are lower priority than other goals, and will not be met
if meeting them would require violating rack awareness, replica placement, or capacity limits.
- Self-Balancing attempts to balance (below-capacity) network usage and leader distribution when it rebalances,
but these will not trigger a rebalance. For uneven resource distribution, only replica counts
and disk usage above 20 percent are triggering factors.
Debugging Rebalance Failures
If Self-Balancing cannot rebalance due to rack awareness, replica placement, or capacity
problems, it will result in an
OptimizationFailureException that should be
visible in the log of the Self-Balancing broker (which is the same as the cluster
controller). Looking for this exception may help identify such problems.
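For example, a shell sketch for scanning the controller broker's log (the log path is an assumption; adjust it to your deployment):
  # Search the active controller's server log for failed optimization attempts:
  grep OptimizationFailureException /var/log/kafka/server.log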
See also What defines a “balanced” cluster and what triggers a rebalance? and Troubleshooting.
Security Considerations
If you are running Self-Balancing with security configured, you must configure authentication for REST endpoints
on the brokers. Without these configurations in the
broker properties files, Control Center will not have access to Self-Balancing in a secure environment.
If you are using role-based access control (RBAC), the user interacting with Self-Balancing on Control Center must have the
SystemAdmin role on the Kafka cluster to be able to add
or remove brokers, and to perform other Self-Balancing related tasks.
For information about setting up security on Confluent Platform, see Security Overview, the Security Tutorial,
and the overviews on authentication methods, role-based access control (RBAC), and ACLs. The Confluent Platform Demo (cp-demo)
also shows various types of security enabled on an example deployment.
Troubleshooting
Following is a list of problems you may encounter while working with Self-Balancing, and how to solve them.
In addition to the troubleshooting tips below, see Replica Placement and Rack Configurations
for best practices, further discussion of what triggers rebalances, and how to identify and debug rebalance failures.
Self-Balancing options do not show up on Control Center
When Self-Balancing Clusters are enabled, status and configuration options are available on
Control Center Cluster Settings > Self-balancing tab. If, instead, this tab
displays a message about Confluent Platform version requirements and configuring HTTP servers
on the brokers, this indicates something is missing from your configurations or that
you are not running the required version of Confluent Platform.
Also, if you are running Self-Balancing with security enabled, you may get an error message such as: Error 504 Gateway Timeout,
which indicates that you also must configure authentication for REST endpoints in your broker files, as described below.
Solution: Verify that you have the following settings and update your
configuration as needed.
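A hedged sketch of the settings to verify, assuming a single-host development setup (the hostnames and ports are assumptions):
  # In each broker's properties file:
  confluent.balancer.enable=true
  confluent.http.server.listeners=http://0.0.0.0:8090
  # In the Control Center properties file, point Control Center at the
  # brokers' REST endpoints:
  confluent.controlcenter.streams.cprest.url=http://localhost:8090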
Broker metrics are not displayed on Control Center
This issue is not specific to Self-Balancing, but related to proper configuration of multi-broker clusters, in general.
You may encounter a scenario where Self-Balancing is enabled and displaying Self-Balancing options on Control Center, but broker metrics
and per-broker drill-down and management options are not showing up on the Brokers Overview page. The most likely
cause for this is that you did not configure the Metrics Reporter for Control Center. To do so, uncomment the Metrics Reporter lines
in the properties files for all brokers, for example in $CONFLUENT_HOME/etc/kafka/server.properties, as shown below.
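Under a default install, the commented-out lines to enable look like the following (the bootstrap server address is an assumption for a local deployment):
  # Report broker metrics to Confluent Control Center:
  metric.reporters=io.confluent.metrics.reporter.ConfluentMetricsReporter
  confluent.metrics.reporter.bootstrap.servers=localhost:9092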
Solution: To fix this on running clusters, you will need to shut down Control Center and the brokers,
update the Metrics Reporter configurations for the brokers, and restart them.
This configuration is covered in more detail in the Self-Balancing tutorial at Enable the Metrics Reporter for Control Center.
Broker removal attempt fails during Self-Balancing initialization
Self-Balancing requires 30 minutes to initialize and collect metrics from brokers in
the cluster. If you attempt to remove a broker before metrics collection
completes, the broker removal will fail almost immediately due to insufficient
metrics for Self-Balancing. This is the most common cause of a failed broker removal.
The following error message will show on the command line or on Control Center,
depending on which method you used for the remove operation:
Self-balancing requires a few minutes to collect metrics for rebalancing plans. Metrics collection is in process. Please try again after 900 seconds.
Solution: Wait 30 minutes, and then retry the broker removal.
If you want to remove a controller, the same factors are at play, so you should
give some time for Self-Balancing to initialize before attempting a remove operation. In
this case, though, the “remove broker” operation can fail at a later phase after the target broker is already shut down. In an
offline state, the broker will no longer be accessible from Control Center. If this
occurs, wait for 30 minutes, then retry the remove broker operation from the command line. Once Self-Balancing has
initialized and had time to collect metrics, the operation should succeed, and
the rebalancing plan will run.
To learn more about Self-Balancing initialization, see Self-Balancing Initialization.
Broker removal cannot complete due to offline partitions
Broker removal can also fail in cases where taking a broker down will result in
having fewer online brokers than the number of replicas required in your configurations.
The broker status (available with the kafka-remove-brokers command)
will remain as follows until you restart one or more of the offline brokers:
[2020-09-17 23:40:53,743] WARN [AdminClient clientId=adminclient-1] Connection to node -5 (localhost/127.0.0.1:9096) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
Broker 1 removal status:
Partition Reassignment: IN_PROGRESS
Broker Shutdown: COMPLETE
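As a hedged sketch, status output like the above comes from a describe-style invocation along these lines (the exact flags may vary by Confluent Platform version; --broker-id and --describe here are assumptions):
  # Check the removal status of broker 1:
  kafka-remove-brokers --bootstrap-server localhost:9092 --broker-id 1 --describe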
A short-hand way of troubleshooting this is to ask “how many brokers are down?”
and “how many replicas/replication factors must the cluster support?”
Partition reassignment (the last phase in broker removal)
will fail to complete in any case where you have
n brokers down and your configuration requires
n + 1 or more replicas.
Alternatively, you can consider how many online brokers you need to support the required number of replicas.
If you have n brokers online, they can support at most n replicas of any given partition,
because each replica must reside on a distinct broker.
Solution: The solution is to restart the down brokers, and perhaps modify the cluster
configuration as a whole. This might include both adding brokers and modifying
replicas/replication factors (see example below).
Scenarios that lead to this problem can be a combination of under-replicated
topics and topics with too many replicas for the number of online brokers.
Having a topic with a replication factor of 1 does not necessarily lead to a
problem in and of itself.
A quick way to get an overview of configured replicas on a running cluster is to run
kafka-topics --describe on a specified topic, or on the whole cluster
(with no topic specified), as shown in the sketch below.
(with no topic specified). For system topics, you can scan the replication
factors and replicas on system properties (which generate system topics). The
Self-Balancing Tutorial covers these commands, replicas/replication factors, and
the impact of these configurations.
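For example, a quick shell sketch (the topic name is a placeholder):
  # Replicas for one topic:
  kafka-topics --describe --bootstrap-server localhost:9092 --topic my-topic
  # Replicas for every topic in the cluster:
  kafka-topics --describe --bootstrap-server localhost:9092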
Too many excluded topics causes problems with Self-Balancing
Excluding too many topics (and by inference, partitions) can be counter-productive to maintaining
a well-balanced cluster. In internal tests, excluding system topics which accounted for
approximately 100 out of 500 total topics was enough to put Self-Balancing into a less than optimal state.
The visible result of this is that the cluster constantly churns on rebalancing.
Solution: Reduce the number of excluded topics. The relevant configuration options to modify
these settings are confluent.balancer.exclude.topic.names and confluent.balancer.exclude.topic.prefixes.
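For illustration, a hedged sketch of these settings in a broker properties file (the topic names and prefixes are placeholders):
  # Exclude specific topics by exact name (comma-separated):
  confluent.balancer.exclude.topic.names=metrics-internal,audit-log
  # Exclude any topic whose name starts with one of these prefixes:
  confluent.balancer.exclude.topic.prefixes=_internal,staging
Keep such exclusion lists short; as noted above, excluding a large fraction of topics can leave Self-Balancing unable to settle on a stable plan.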