Mount Docker External Volumes in Confluent Platform
Mount Docker external volumes for Confluent Platform to persist Apache Kafka® data across container restarts, supply secrets to a secured cluster, and add third-party JARs to Kafka Connect. For background on Docker volumes, see the Docker volumes documentation. The following sections describe each use case.
Data Storage: Apache Kafka® and ZooKeeper require externally mounted volumes to persist data if a container stops running or is restarted.
Security: When security is configured, the secrets are stored on the host and made available to the containers using mapped volumes.
Configuring Connect with external JARs: Kafka Connect can be configured to use third-party JARs by storing them on a volume on the host.
Note
If you need to add support for additional use cases for external volumes, see extending the images.
Data volumes for Kafka
To persist Kafka log data, mount a host directory to /var/lib/kafka/data in the Kafka container. Set the host directory to be readable and writable by the container user. In Confluent Platform images, this user is appuser with UID 1000.
When you map volumes from the host, you must use the full path (for example, /var/lib/kafka/data).
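Because the host side of the mapping must be a full path, it can help to resolve a relative directory before passing it to -v. Docker may treat a host value that does not begin with / as a volume name rather than a directory. The following is a minimal sketch: the abs_path function and the kafka-data directory name are hypothetical and are not part of Confluent Platform or Docker.

```shell
#!/bin/sh
# Hypothetical helper (not part of Confluent Platform or Docker):
# print the absolute, symlink-resolved path of an existing directory.
abs_path() {
  # Run cd in a subshell so the caller's working directory is unchanged.
  (cd "$1" 2>/dev/null && pwd -P)
}

# Demonstration in a scratch location; kafka-data is a placeholder name.
WORK_DIR="$(mktemp -d)"
mkdir -p "$WORK_DIR/kafka-data"
cd "$WORK_DIR" || exit 1

# Resolve the relative path to a full path.
KAFKA_DATA_DIR="$(abs_path ./kafka-data)"
echo "$KAFKA_DATA_DIR"

# The resolved value could then be passed to docker run, for example:
#   -v "$KAFKA_DATA_DIR":/var/lib/kafka/data
```

The helper returns a non-zero status when the directory does not exist, so a calling script can stop before starting a container with an unintended mount.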
The following examples show how to use Kafka and ZooKeeper with mounted volumes, and how to configure the volumes when you run the Docker container as a non-root user. In these examples, the containers run as the user appuser with UID=1000 and GID=1000. All Confluent Platform images run their containers as appuser.
For KRaft mode, you only need to mount a volume for Kafka since ZooKeeper is not running.
Important
As of Confluent Platform 8.0, ZooKeeper has been removed. You must use KRaft mode for metadata management. For steps to migrate from ZooKeeper to KRaft, see Migrate from ZooKeeper to KRaft on Confluent Platform. For links to ZooKeeper topics in older versions of the documentation, see ZooKeeper Topic Guide.
Prerequisite:
Create a cluster ID and format storage for your cluster, if you haven’t already. For instructions, see Generate and format IDs.
To mount an external volume for Kafka:
On the Docker host (VirtualBox VM, for example), create the directory:
# Create directory for Kafka data
mkdir -p /vol1/kafka-data
# Make sure the user has read and write permissions.
chown -R 1000:1000 /vol1/kafka-data
Start the containers:
docker run -d \
--name=kafka-vols \
--net=host \
--user=1000 \
-e KAFKA_NODE_ID=1 \
-e KAFKA_LISTENER_SECURITY_PROTOCOL_MAP='CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT' \
-e KAFKA_ADVERTISED_LISTENERS='PLAINTEXT://kafka-vols:29092,PLAINTEXT_HOST://localhost:9092' \
-e KAFKA_PROCESS_ROLES='broker,controller' \
-e KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1 \
-e KAFKA_CONTROLLER_QUORUM_VOTERS='1@kafka-vols:29093' \
-e KAFKA_LISTENERS='PLAINTEXT://kafka-vols:29092,CONTROLLER://kafka-vols:29093,PLAINTEXT_HOST://0.0.0.0:9092' \
-e KAFKA_INTER_BROKER_LISTENER_NAME='PLAINTEXT' \
-e KAFKA_CONTROLLER_LISTENER_NAMES='CONTROLLER' \
-e CLUSTER_ID='MkU3OEVBNTcwNTJENDM2Qk' \
-v /vol1/kafka-data:/var/lib/kafka/data \
confluentinc/cp-kafka:8.2.1
The data volumes are mounted using the -v flag.
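Before starting the container, it can be useful to confirm that the host directory is owned by the UID the container runs as. The following is a minimal sketch: the owned_by function is hypothetical, and the demonstration uses a scratch directory and the current user's UID so that it runs anywhere. For the example above, the call would be owned_by /vol1/kafka-data 1000.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if directory $1 exists and is owned by UID $2.
owned_by() {
  dir="$1"
  want_uid="$2"
  [ -d "$dir" ] || return 1
  # ls -n prints numeric IDs; the third field is the owner's UID.
  have_uid="$(ls -ldn "$dir" | awk '{print $3}')"
  [ "$have_uid" = "$want_uid" ]
}

# Demonstration with a scratch directory owned by the current user.
SCRATCH_DIR="$(mktemp -d)"
if owned_by "$SCRATCH_DIR" "$(id -u)"; then
  echo "ownership ok"
else
  echo "ownership mismatch"
fi
```

This checks ownership only. Whether the owner's permission bits allow reading and writing is a separate question that the sketch does not cover.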
To mount volumes for ZooKeeper and Kafka:
On the Docker host (VirtualBox VM, for example), create the directories:
# Create directories for Kafka and ZooKeeper data.
mkdir -p /vol1/zk-data
mkdir -p /vol2/zk-txn-logs
mkdir -p /vol3/kafka-data
# Make sure the user has read and write permissions.
chown -R 1000:1000 /vol1/zk-data
chown -R 1000:1000 /vol2/zk-txn-logs
chown -R 1000:1000 /vol3/kafka-data
Start the containers:
# Run ZooKeeper with user 1000 and volumes mapped to host volumes
docker run -d \
--name=zk-vols \
--net=host \
--user=1000 \
-e ZOOKEEPER_TICK_TIME=2000 \
-e ZOOKEEPER_CLIENT_PORT=32181 \
-v /vol1/zk-data:/var/lib/zookeeper/data \
-v /vol2/zk-txn-logs:/var/lib/zookeeper/log \
confluentinc/cp-zookeeper:8.2.1

# Run Kafka with user 1000 and its data volume mapped to a host volume
docker run -d \
--name=kafka-vols \
--net=host \
--user=1000 \
-e KAFKA_BROKER_ID=1 \
-e KAFKA_ZOOKEEPER_CONNECT=localhost:32181 \
-e KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://localhost:39092 \
-e KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1 \
-v /vol3/kafka-data:/var/lib/kafka/data \
confluentinc/cp-kafka:8.2.1
The data volumes are mounted using the -v flag.
Security: data volumes for configuring secrets
When security is enabled, the secrets are made available to the containers using volumes. For example, if the host stores the secrets (credentials, keytab, certificates, Kerberos config, JAAS config) in /vol007/kafka-node-1-secrets, configure Kafka as follows to use them:
docker run -d \
--name=kafka-sasl-ssl-1 \
--net=host \
-e KAFKA_BROKER_ID=1 \
-e KAFKA_ZOOKEEPER_CONNECT=localhost:22181,localhost:32181,localhost:42181/saslssl \
-e KAFKA_ADVERTISED_LISTENERS=SASL_SSL://localhost:39094 \
-e KAFKA_SSL_KEYSTORE_FILENAME=kafka.broker3.keystore.jks \
-e KAFKA_SSL_KEYSTORE_CREDENTIALS=broker3_keystore_creds \
-e KAFKA_SSL_KEY_CREDENTIALS=broker3_sslkey_creds \
-e KAFKA_SSL_TRUSTSTORE_FILENAME=kafka.broker3.truststore.jks \
-e KAFKA_SSL_TRUSTSTORE_CREDENTIALS=broker3_truststore_creds \
-e KAFKA_SECURITY_INTER_BROKER_PROTOCOL=SASL_SSL \
-e KAFKA_SASL_MECHANISM_INTER_BROKER_PROTOCOL=GSSAPI \
-e KAFKA_SASL_ENABLED_MECHANISMS=GSSAPI \
-e KAFKA_SASL_KERBEROS_SERVICE_NAME=kafka \
-e KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1 \
-e KAFKA_OPTS="-Djava.security.auth.login.config=/etc/kafka/secrets/host_broker3_jaas.conf -Djava.security.krb5.conf=/etc/kafka/secrets/host_krb.conf" \
-v /vol007/kafka-node-1-secrets:/etc/kafka/secrets \
confluentinc/cp-kafka:latest
In the preceding example, the -v /vol007/kafka-node-1-secrets:/etc/kafka/secrets option mounts the host directory that holds the secrets at /etc/kafka/secrets in the container. The KAFKA_OPTS setting then points the broker at the JAAS and Kerberos configuration files inside that mount. Quote the value, because it contains a space:
-e KAFKA_OPTS="-Djava.security.auth.login.config=/etc/kafka/secrets/host_broker3_jaas.conf -Djava.security.krb5.conf=/etc/kafka/secrets/host_krb.conf"
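Because the broker reads every secret from the mounted directory, a missing file on the host typically only surfaces after the container starts. The following is a minimal sketch of a pre-flight check: the missing_secrets function is hypothetical, and the demonstration uses a scratch directory with an empty placeholder file in place of /vol007/kafka-node-1-secrets. The file names are the ones referenced by KAFKA_OPTS in the example above.

```shell
#!/bin/sh
# Hypothetical helper: print each expected file that is not readable in
# directory $1, and return non-zero if at least one is missing.
missing_secrets() {
  dir="$1"
  shift
  rc=0
  for name in "$@"; do
    if [ ! -r "$dir/$name" ]; then
      echo "missing: $name"
      rc=1
    fi
  done
  return "$rc"
}

# Demonstration: a scratch directory stands in for the host secrets directory.
SECRETS_DIR="$(mktemp -d)"
touch "$SECRETS_DIR/host_broker3_jaas.conf"   # empty placeholder, not a real JAAS file

# host_krb.conf is deliberately absent, so the helper reports it.
MISSING="$(missing_secrets "$SECRETS_DIR" host_broker3_jaas.conf host_krb.conf)" || true
echo "$MISSING"
```

In practice the same call could list the keystore, truststore, and credential files named in the docker run command, with the real host directory as the first argument.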
Configuring Connect with external JARs
Kafka Connect can be configured to use third-party JARs by storing them on a volume on the host and mapping the volume to /etc/kafka-connect/jars on the container.
On the Docker host (VirtualBox VM, for example), download the MySQL driver:
# Create a directory for the JARs
mkdir -p /vol42/kafka-connect/jars
# Download the MySQL JDBC driver and store the JAR in that directory
curl -k -SL "https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.39.tar.gz" | tar -xzf - -C /vol42/kafka-connect/jars --strip-components=1 mysql-connector-java-5.1.39/mysql-connector-java-5.1.39-bin.jar
Then start Kafka Connect, mounting the download directory at /etc/kafka-connect/jars:
docker run -d \
--name=connect-host-json \
--net=host \
-e CONNECT_BOOTSTRAP_SERVERS=localhost:39092 \
-e CONNECT_REST_PORT=28082 \
-e CONNECT_GROUP_ID="default" \
-e CONNECT_CONFIG_STORAGE_TOPIC="default.config" \
-e CONNECT_OFFSET_STORAGE_TOPIC="default.offsets" \
-e CONNECT_STATUS_STORAGE_TOPIC="default.status" \
-e CONNECT_KEY_CONVERTER="org.apache.kafka.connect.json.JsonConverter" \
-e CONNECT_VALUE_CONVERTER="org.apache.kafka.connect.json.JsonConverter" \
-e CONNECT_INTERNAL_KEY_CONVERTER="org.apache.kafka.connect.json.JsonConverter" \
-e CONNECT_INTERNAL_VALUE_CONVERTER="org.apache.kafka.connect.json.JsonConverter" \
-e CONNECT_REST_ADVERTISED_HOST_NAME="localhost" \
-e CONNECT_PLUGIN_PATH=/usr/share/java,/etc/kafka-connect/jars \
-e KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1 \
-v /vol42/kafka-connect/jars:/etc/kafka-connect/jars \
confluentinc/cp-kafka-connect:latest
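If the download step fails silently, the mounted directory is empty and Connect starts without the driver. The following is a minimal sketch of a check that counts the JAR files in the host directory: the count_jars function is hypothetical, and the demonstration uses a scratch directory with an empty placeholder file in place of /vol42/kafka-connect/jars.

```shell
#!/bin/sh
# Hypothetical helper: print the number of .jar files directly inside directory $1.
# -maxdepth is not in POSIX find, but GNU and BSD find both provide it.
count_jars() {
  find "$1" -maxdepth 1 -type f -name '*.jar' | wc -l | tr -d ' '
}

# Demonstration: a scratch directory stands in for the host JAR directory.
JARS_DIR="$(mktemp -d)"
touch "$JARS_DIR/mysql-connector-java-5.1.39-bin.jar"   # empty placeholder, not a real driver

JAR_COUNT="$(count_jars "$JARS_DIR")"
echo "jars found: $JAR_COUNT"
```

A count of zero for the real host directory would suggest repeating the download step before starting the container.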