.. _zk-prod-deployment:

==========================
Running |zk| in Production
==========================

|ak-tm| uses |zk| to store persistent cluster metadata, which makes |zk| a critical component of a |cp| deployment. For example, if you lost the `Kafka data in ZooKeeper `_, the mapping of replicas to brokers and the topic configurations would be lost as well, making your |ak| cluster no longer functional and potentially resulting in total data loss.

This document covers the key considerations before making your |zk| cluster live, but it is not a complete guide for running |zk| in production. For more detailed information, see the `ZooKeeper Administrator's Guide `_.

.. _kafka_zookeeper_deployment:

Supported version
~~~~~~~~~~~~~~~~~

|cp| ships a stable version of |zk|. You can use the ``envi`` "four letter word" to find the current version of a running server with ``nc``. For example:

.. codewithvars:: bash

    echo envi | nc localhost 2181

This displays all of the environment information for the |zk| server, including the version.

.. note:: The |zk| start script and the functionality of |zk| are tested only with this version of |zk|.

Hardware
~~~~~~~~

A production |zk| cluster can cover a wide variety of use cases. The recommendations below are intended as general guidelines for choosing proper hardware for a cluster of |zk| servers. Not all use cases can be covered in this document, so consider your use case and business requirements when making a final decision.

~~~~~~
Memory
~~~~~~

In general, |zk| is not a memory intensive application when handling only data stored by |ak|. The physical memory needs of a |zk| server scale with the size of the znodes stored by the ensemble. This is because each |zk| server holds all znode contents in memory at any given time. For |ak|, the dominant driver of znode creation is the number of partitions in the cluster.

In a typical production use case, a minimum of 8 GB of RAM should be dedicated to |zk| use. Note that |zk| is sensitive to swapping, and any host running a |zk| server should avoid swapping.

~~~
CPU
~~~

In general, |zk| as a |ak| metadata store does not heavily consume CPU resources. However, if |zk| is shared, or the host on which the |zk| server is running is shared, CPU should be considered. |zk| provides a latency sensitive function, so if it must compete for CPU with other processes, or if the |zk| ensemble is serving multiple purposes, you should consider providing a dedicated CPU core to ensure context switching is not an issue.

~~~~~
Disks
~~~~~

Disk performance is vital to maintaining a healthy |zk| cluster. Solid state drives (SSD) are highly recommended, because |zk| must have low latency disk writes in order to perform optimally. Each request to |zk| must be committed to disk on every server in the quorum before the result is available for read. A dedicated SSD of at least 64 GB on each |zk| server is recommended for a production deployment. You can use ``autopurge.purgeInterval`` and ``autopurge.snapRetainCount`` to automatically clean up |zk| data and lower maintenance overhead.

JVM
~~~

|zk| runs as a JVM process. It is not notably heap intensive when running for the |ak| use case. A heap size of 1 GB is recommended for most use cases, and you should monitor heap usage to ensure that garbage collection does not introduce delays.
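As a minimal sketch of applying this recommendation, the example below overrides the heap size when starting |zk| with the |cp| scripts. It assumes the ``zookeeper-server-start`` wrapper passes the ``KAFKA_HEAP_OPTS`` environment variable through to the JVM, as the |ak| run scripts do; adjust for your packaging and service manager.

.. codewithvars:: bash

    # Start ZooKeeper with a fixed 1 GB heap (assumes the start script
    # forwards KAFKA_HEAP_OPTS to the underlying JVM).
    KAFKA_HEAP_OPTS="-Xms1g -Xmx1g" \
      zookeeper-server-start /etc/kafka/zookeeper.properties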
.. _zk-prod-config:

Configuration Options
~~~~~~~~~~~~~~~~~~~~~

The |zk| configuration properties file is located in ``/etc/kafka/zookeeper.properties``. |zk| does not require configuration tuning for most deployments. Below are a few important parameters to consider. A complete list of configurations can be found in the `ZooKeeper project page `_.

``clientPort``
    The port on which |zk| listens for client connections; this is the port that brokers use to connect to |zk|. Typically this is set to 2181.

    * Type: int
    * Importance: required

``dataDir``
    The directory where |zk| stores the in-memory database snapshots and, unless specified in ``dataLogDir``, the transaction log of updates to the database. This location should be a dedicated disk, ideally an SSD. For more information, see the `ZooKeeper Administration Guide `_.

    * Type: string
    * Importance: required

``dataLogDir``
    The location where the transaction log is written. If you don't specify this option, the log is written to ``dataDir``. By specifying this option, you can use a dedicated log device and avoid competition between logging and snapshots. For more information, see the `ZooKeeper Administration Guide `_.

    * Type: string
    * Importance: optional

``tickTime``
    The basic unit of time for |zk|, in milliseconds. It governs all |zk| time dependent operations and is used especially for heartbeats and timeouts. Note that the minimum session timeout is two ticks.

    * Type: int
    * Default: 2000
    * Importance: high

``maxClientCnxns``
    The maximum allowed number of client connections for a |zk| server. To avoid running out of allowed connections, set this to 0 (unlimited).

    * Type: int
    * Default: 60
    * Importance: high

``autopurge.snapRetainCount``
    When auto purging is enabled, |zk| retains the ``autopurge.snapRetainCount`` most recent snapshots and the corresponding transaction logs in ``dataDir`` and ``dataLogDir`` respectively, and deletes the rest.

    * Type: int
    * Default: 3
    * Importance: high

``autopurge.purgeInterval``
    The time interval, in hours, at which the purge task is triggered. Set to a positive integer (1 and above) to enable auto purging.

    * Type: int
    * Default: 0
    * Importance: high

Monitoring
~~~~~~~~~~

|zk| servers should be monitored to ensure they are functioning properly and to proactively identify issues. This section discusses a set of common monitoring best practices.

~~~~~~~~~~~~~~~~
Operating System
~~~~~~~~~~~~~~~~

The underlying OS metrics can help predict when the |zk| service will start to struggle. In particular, you should monitor:

* Number of open file handles -- this should be done system wide and for the user running the |zk| process. Values should be considered with respect to the maximum allowed number of open file handles. |zk| opens and closes connections often, and needs an available pool of file handles to choose from.
* Network bandwidth usage -- because |zk| keeps track of state, it is sensitive to timeouts caused by network latency. If the network bandwidth is saturated, you may experience hard to explain timeouts with client sessions that will make your |ak| cluster less reliable.

~~~~~~~~~~~~~~~~~~~
"Four Letter Words"
~~~~~~~~~~~~~~~~~~~

|zk| responds to a set of commands, each four letters in length. The `documentation `__ gives a description of each one and the version in which it became available. To run the commands, you must send a message (via ``netcat`` or ``telnet``) to the |zk| client port. For example, ``echo stat | nc localhost 2181`` would return the output of the ``STAT`` command to stdout.

You should monitor the ``STAT`` and ``MNTR`` four letter words. Each environment will be somewhat different, but you should watch for any rapid increase or decrease in any metric reported here. The metrics reported by these commands should be consistent, except for the number of packets sent and received, which should increase slowly over time.
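For example, a simple way to spot-check both commands from a shell is shown below. On newer |zk| versions you may need to add these commands to the ``4lw.commands.whitelist`` server property before they respond; the metric names in the comments are illustrative of typical ``mntr`` output.

.. codewithvars:: bash

    # Dump server statistics: connections, latency, mode, node count
    echo stat | nc localhost 2181

    # Dump the monitoring variables, for example zk_avg_latency,
    # zk_outstanding_requests, and zk_num_alive_connections
    echo mntr | nc localhost 2181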
~~~~~~~~~~~~~~
JMX Monitoring
~~~~~~~~~~~~~~

:ref:`Confluent Control Center ` monitors the broker to |zk| connection as shown :ref:`here `. The |zk| server also provides a number of JMX metrics that are described in the project documentation `here `__.

Here are a few JMX metrics that are important to monitor:

* NumAliveConnections - make sure you are not close to the maximum as set with ``maxClientCnxns``
* OutstandingRequests - should be below 10 in general
* AvgRequestLatency - target below 10 ms
* MaxRequestLatency - target below 20 ms
* HeapMemoryUsage (Java built-in) - should be relatively flat and well below the maximum heap size

In addition to the JMX metrics |zk| provides, |ak| tracks a number of relevant |zk| events via the ``SessionExpireListener`` that should be monitored to ensure the health of |zk|-|ak| interactions:

* ZooKeeperAuthFailuresPerSec (secure environments only)
* ZooKeeperDisconnectsPerSec
* ZooKeeperExpiresPerSec
* ZooKeeperReadOnlyConnectsPerSec
* ZooKeeperSaslAuthenticationsPerSec (secure environments only)
* ZooKeeperSyncConnectsPerSec
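One way to expose and inspect the server-side metrics listed above is sketched below. It assumes the |zk| server is started with the |cp| scripts, which read the ``JMX_PORT`` environment variable, and uses the JDK's ``jconsole`` to browse the ``org.apache.ZooKeeperService`` MBeans; in production you would typically point your monitoring agent at the same JMX port instead.

.. codewithvars:: bash

    # Start ZooKeeper with remote JMX enabled on port 9999 (assumes the
    # start script reads JMX_PORT, as the Kafka run scripts do).
    JMX_PORT=9999 zookeeper-server-start /etc/kafka/zookeeper.properties

    # From another shell, browse the org.apache.ZooKeeperService MBeans
    # (NumAliveConnections, OutstandingRequests, latency metrics, and so on).
    jconsole localhost:9999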
Multi-node Setup
~~~~~~~~~~~~~~~~

In a production environment, the |zk| servers are deployed on multiple nodes. This is called an ensemble. An ensemble is a set of ``2n + 1`` |zk| servers, where ``n`` is any number greater than 0. The odd number of servers allows |zk| to perform majority elections for leadership. At any given time, there can be up to ``n`` failed servers in an ensemble and the |zk| cluster will keep ``quorum``. If ``quorum`` is lost at any time, the |zk| cluster will go down.

A few general considerations for multi-node |zk| ensembles:

* Start with a small ensemble of three or five servers and only scale as truly necessary (or as required for fault tolerance). Each write must be propagated to a ``quorum`` of servers in the ensemble. This means that if you choose to run an ensemble of three servers, you are tolerant to one server being lost, and writes must propagate to two servers before they are committed. If you have five servers, you are tolerant to two servers being lost, but writes must propagate to three servers before they are committed.
* Redundancy in the physical/hardware/network layout: try not to put all servers in the same rack, use decent (but not extravagant) hardware, and try to keep redundant power and network paths.
* Virtualization: place servers in different availability zones to avoid correlated crashes, and make sure that the available storage system can accommodate the requirements of |zk| transaction logging and snapshotting.
* When in doubt, keep it simple. |zk| holds important data, so prefer stability and durability over trying a new deployment model in production.

A multi-node setup does require a few additional configurations. There is a comprehensive overview of these in the `project documentation `__. Below is a concrete example configuration to help get you started.

.. codewithvars:: bash

    tickTime=2000
    dataDir=/var/lib/zookeeper/
    clientPort=2181
    initLimit=5
    syncLimit=2
    server.1=zoo1:2888:3888
    server.2=zoo2:2888:3888
    server.3=zoo3:2888:3888
    autopurge.snapRetainCount=3
    autopurge.purgeInterval=24

This example is for a three node ensemble. Note that this configuration file is expected to be identical across all members of the ensemble. Breaking this down, you can see ``tickTime``, ``dataDir``, and ``clientPort`` are all set to typical single server values.

The ``initLimit`` and ``syncLimit`` govern how long following |zk| servers can take to initialize with the current leader and how long they can be out of sync with the leader. In this configuration, a follower can take 10000 ms to initialize and may be out of sync for up to 4000 ms, based on the ``tickTime`` being set to 2000 ms.

The ``server.*`` properties set the ensemble membership. The format is ``server.<myid>=<hostname>:<leaderport>:<electionport>``. Some explanation:

* ``myid`` is the server identification number. In this example, there are three servers, so each one will have a different ``myid`` with values ``1``, ``2``, and ``3`` respectively. The ``myid`` is set by creating a file named ``myid`` in the ``dataDir`` that contains a single integer in human readable ASCII text. This value must match one of the ``myid`` values from the configuration file. If another ensemble member has already been started with a conflicting ``myid`` value, an error will be thrown upon startup.
* ``leaderport`` is used by followers to connect to the active leader. This port should be open between all |zk| ensemble members.
* ``electionport`` is used to perform leader elections between ensemble members. This port should be open between all |zk| ensemble members.

The ``autopurge.snapRetainCount`` and ``autopurge.purgeInterval`` have been set to purge all but three snapshots every 24 hours.

Post Deployment
~~~~~~~~~~~~~~~

After you deploy your |zk| cluster, |zk| largely runs without much maintenance. The `project documentation `__ contains many helpful tips on operations such as managing cleanup of the ``dataDir``, logging, and troubleshooting.
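For example, if you prefer to trigger cleanup manually rather than relying on ``autopurge``, |zk| ships a ``PurgeTxnLog`` utility. The sketch below is a hedged example that assumes the |zk| classes are available on the |ak| classpath and can be invoked through ``kafka-run-class``; the directories match the example configuration above, where ``dataLogDir`` is not set separately.

.. codewithvars:: bash

    # Keep the three most recent snapshots (and their transaction logs)
    # and delete the rest. Both arguments are the same directory here
    # because dataLogDir defaults to dataDir in the example configuration.
    kafka-run-class org.apache.zookeeper.server.PurgeTxnLog \
      /var/lib/zookeeper /var/lib/zookeeper -n 3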