Running ZooKeeper in Production¶
Apache Kafka® uses ZooKeeper to store persistent cluster metadata and is a critical component of the Confluent Platform deployment. For example, if you lost the Kafka data in ZooKeeper, the mapping of replicas to Brokers and topic configurations would be lost as well, making your Kafka cluster no longer functional and potentially resulting in total data loss.
This document provides the key considerations before making your ZooKeeper cluster live, but is not a complete guide for running ZooKeeper in production. For more detailed information, see the ZooKeeper Administrator’s Guide.
Supported version¶
Confluent Platform ships a stable version of ZooKeeper. You can use the ENVI
“four letter word” to find the current version of a running server with nc
.
For example:
echo envi | nc localhost 2181
This will display all of the environment information for the ZooKeeper server, including the version.
Note
Note that the ZooKeeper start script and functionality of ZooKeeper is tested only with this version of ZooKeeper.
Hardware¶
A production ZooKeeper cluster can cover a wide variety of use cases. The recommendations are intended to be general guidelines for choosing proper hardware for a cluster of ZooKeeper servers. Not all use cases can be covered in this document, so consider your use case and business requirements when making a final decision.
Memory¶
In general, ZooKeeper is not a memory intensive application when handling only data stored by Kafka. The physical memory needs of a ZooKeeper server scale with the size of the znodes stored by the ensemble. This is because each ZooKeeper holds all znode contents in memory at any given time. For Kafka, the dominant driver of znode creation is the number of partitions in the cluster. In a typical production use case, a minimum of 4 GB of RAM should be dedicated for ZooKeeper use. Note that ZooKeeper is sensitive to swapping and any host running a ZooKeeper server should avoid swapping.
CPU¶
In general, ZooKeeper as a Kafka metadata store does not heavily consume CPU resources. However, if ZooKeeper is shared, or the host on which the ZooKeeper server is running is shared, CPU should be considered. ZooKeeper provides a latency sensitive function, so if it must compete for CPU with other processes, or if the ZooKeeper ensemble is serving multiple purposes, you should consider providing a dedicated CPU core to ensure context switching is not an issue.
Disks¶
Disk performance is vital to maintaining a healthy ZooKeeper cluster. Solid state drives (SSD) are highly recommended as
ZooKeeper must have low latency disk writes in order to perform optimally. Each request to ZooKeeper must be committed to
to disk on each server in the quorum before the result is available for read. A dedicated SSD of at least 64 GB
in size on each ZooKeeper server is recommended for a production deployment. You can use autopurge.purgeInterval
and
autopurge.snapRetainCount
to automatically cleanup ZooKeeper data and lower maintenance overhead.
JVM¶
ZooKeeper runs as a JVM. It is not notably heap intensive when running for the Kafka use case. A heap size of 1 GB is recommended for most use cases and monitoring heap usage to ensure no delays are caused by garbage collection.
Configuration Options¶
The ZooKeeper configuration properties file is located in /etc/kafka/zookeeper.properties
.
ZooKeeper does not require configuration tuning for most deployments. Below are a few important parameters to consider. A complete list of configurations can be found in the ZooKeeper project page.
clientPort
This is the port where ZooKeeper clients will listen on. This is where the Brokers will connect to ZooKeeper. Typically this is set to 2181.
- Type: int
- Importance: required
dataDir
The directory where ZooKeeper in-memory database snapshots and, unless specified in
dataLogDir
, the transaction log of updates to the database. This location should be a dedicated disk that is ideally an SSD. For more information, see the ZooKeeper Administration Guide.- Type: string
- Importance: required
dataLogDir
The location where the transaction log is written to. If you don’t specify this option, the log is written to dataDir. By specifying this option, you can use a dedicated log device, and help avoid competition between logging and snapshots. For more information, see the ZooKeeper Administration Guide.
- Type: string
- Importance: optional
tickTime
The unit of time for ZooKeeper translated to milliseconds. This governs all ZooKeeper time dependent operations. It is used for heartbeats and timeouts especially. Note that the minimum session timeout will be two ticks.
- Type: int
- Default: 3000
- Importance: high
maxClientCnxns
The maximum allowed number of client connections for a ZooKeeper server. To avoid running out of allowed connections set this to 0 (unlimited).
- Type: int
- Default: 60
- Importance: high
autopurge.snapRetainCount
When enabled, ZooKeeper auto purge feature retains the autopurge.snapRetainCount most recent snapshots and the corresponding transaction logs in the dataDir and dataLogDir respectively and deletes the rest.
- Type: int
- Default: 3
- Importance: high
autopurge.purgeInterval
The time interval in hours for which the purge task has to be triggered. Set to a positive integer (1 and above) to enable the auto purging.
- Type: int
- Default: 0
- Importance: high
Monitoring¶
ZooKeeper servers should be monitored to ensure they are functioning properly and proactively identify issues. In this section, a set of common monitoring best practices is discussed.
Operating System¶
The underlying OS metrics can help predict when the ZooKeeper service will start to struggle. In particular, you should monitor:
- Number of open file handles – this should be done system wide and for the user running the ZooKeeper process. Values should be considered with respect to the maximum allowed number of open file handles. ZooKeeper opens and closes connections often, and needs an available pool of file handles to choose from.
- Network bandwidth usage – because ZooKeeper keeps track of state, it is sensitive to timeouts caused by network latency. If the network bandwidth is saturated, you may experience hard to explain timeouts with client sessions that will make your Kafka cluster less reliable.
“Four Letter Words”¶
ZooKeeper responds to a set of commands, each four letters in length. The documentation
gives a description of each one and what version each became available. To run the commands you must
send a message (via netcat
or telnet
) to the ZooKeeper client port. For example echo stat | nc localhost 2181
would return the output of the STAT
command to stdout.
You should monitor the STAT
and MNTR
four letter words. Each environment will be somewhat different,
but you will be monitoring for any rapid increase or decrease in any metric reported here.
The metrics reported by these commands should be consistent except for number of packets sent and received
which should increase slowly over time.
JMX Monitoring¶
Confluent Control Center monitors the Broker to ZooKeeper connection as shown here.
The ZooKeeper server also provides a number of JMX metrics that are described in the project documentation here.
Here are a few JMX metrics that are important to monitor:
- NumAliveConnections - make sure you are not close to maximum as set with
maxClientCnxns
- OutstandingRequests - should be below 10 in general
- AvgRequestLatency - target below 10 ms
- HeapMemoryUsage (Java built-in) - should be relatively flat and well below max heap size
In addition to JMX metrics ZooKeeper provides, Kafka tracks a number of relevant ZooKeeper
events via the SessionExpireListener
that should be monitored to ensure the
health of ZooKeeper-Kafka interactions. For information about the metrics, see
ZooKeeper Metrics.
- ZooKeeperAuthFailuresPerSec (secure environments only)
- ZooKeeperDisconnectsPerSec
- ZooKeeperExpiresPerSec
- ZooKeeperReadOnlyConnectsPerSec
- ZooKeeperSaslAuthenticationsPerSec (secure environments only)
- ZooKeeperSyncConnectsPerSec
- SessionState
Multi-node Setup¶
In a production environment, the ZooKeeper servers will be deployed on multiple nodes. This is called an ensemble.
An ensemble is a set of 2n + 1
ZooKeeper servers where n
is any number greater than 0. The odd number of servers
allows ZooKeeper to perform majority elections for leadership. At any given time, there can be up to n
failed servers in
an ensemble and the ZooKeeper cluster will keep quorum
. If at any time, quorum
is lost, the ZooKeeper cluster will go
down. A few general considerations for multi-node ZooKeeper ensembles:
- Start with a small ensemble of three or five servers and only scale as truly necessary (or as required for fault tolerance).
Each write must be propagated to a
quorum
of servers in the ensemble. This means that if you choose to run an ensemble of three servers, you are tolerant to one server being lost, and writes must propagate to two servers before they are committed. If you have five servers, you are tolerant to two servers being lost, but writes must propagate to three servers before they are committed. - Redundancy in the physical/hardware/network layout: try not to put them all in the same rack, decent (but don’t go nuts) hardware, try to keep redundant power and network paths, etc.
- Virtualization: place servers in different availability zones to avoid correlated crashes and to make sure that the storage system available can accommodate the requirements of the transaction logging and snapshotting of ZooKeeper.
- When it doubt, keep it simple. ZooKeeper holds important data, so prefer stability and durability over trying a new deployment model in production.
A multi-node setup does require a few additional configurations. There is a comprehensive overview of these in the project documentation. Below is a concrete example configuration to help get you started.
tickTime=2000
dataDir=/var/lib/zookeeper/
clientPort=2181
initLimit=5
syncLimit=2
server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
server.3=zoo3:2888:3888
autopurge.snapRetainCount=3
autopurge.purgeInterval=24
This example is for a 3 node ensemble. Note that this configuration file is expected to be identical across all members of the ensemble.
Breaking this down, you can see tickTime
, dataDir
, and clientPort
are all set to typical single server values.
The initLimit
and syncLimit
are used to govern how long following ZooKeeper servers can take to initialize with the
current leader and how long they can be out of sync with the leader. In this configuration, a follower can take 10000ms to
initialize and may be out of sync for up to 4000ms based on the tickTime
being set to 2000ms.
The server.*
properties set the ensemble membership. The format is server.<myid>=<hostname>:<leaderport>:<electionport>
. Some explanation:
myid
is the server identification number. In this example, there are three servers, so each one will have a differentmyid
with values1
,2
, and3
respectively. Themyid
is set by creating a file namedmyid
in thedataDir
that contains a single integer in human readable ASCII text. This value must match one of themyid
values from the configuration file. If another ensemble member has already been started with a conflictingmyid
value, an error will be thrown upon startup.leaderport
is used by followers to connect to the active leader. This port should be open between all ZooKeeper ensemble members.electionport
is used to perform leader elections between ensemble members. This port should be open between all ZooKeeper ensemble members.
The autopurge.snapRetainCount
and autopurge.purgeInterval
have been set to purge all but three snapshots every 24 hours.
Post Deployment¶
After you deploy your ZooKeeper cluster, ZooKeeper largely runs without much maintenance. The project documentation
contains many helpful tips on operations such as managing cleanup of the dataDir
, logging, and troubleshooting.
Note
In 2021, Kafka will replace its usage of ZooKeeper with its own built-in consensus layer. For information, see Preparing Your Clients and Tools for KIP-500: ZooKeeper Removal from Apache Kafka.
Suggested Resources¶
- Podcast: The Truth About ZooKeeper Removal and the KIP-500 Release in Apache Kafka, featuring Jason Gustafson and Colin McCabe
- Blog posts on ZooKeeper
- KIP-500: Apache Kafka Without ZooKeeper (video talk)
- KIP-500: Apache Kafka Without ZooKeeper (podcast)
- Kafka Needs No ZooKeeper (video talk from Kafka Summit 2019)