Important
You are viewing documentation for an older version of Confluent Platform. For the latest, click here.
Docker Developer Guide¶
This document assumes knowledge of Docker and Dockerfiles. To review best practices for writing Dockerfiles, see Docker’s best practices guide.
If you want to contribute back to the project, review the contributing guidelines.
Confluent Platform image bootup process¶
When a container starts, the entry point /etc/confluent/docker/run runs three executable scripts found in the /etc/confluent/docker directory. They are run in the following sequence:
Configure script
The /etc/confluent/docker/configure script does all the necessary configuration for each image. This includes the following:
- Create all configuration files and copy them to their proper location.
- Ensure that mandatory configuration properties are present.
- If required, handle service discovery.
Ensure script
The /etc/confluent/docker/ensure script makes sure that all the prerequisites for launching the service are in place. This includes:
- Ensure the configuration files are present and readable.
- Ensure that you can write/read to the data directory. The directories must be world writable.
- Ensure that supporting services are in the READY state. For example, ensure that ZooKeeper is ready before launching a Kafka broker.
- Ensure supporting systems are configured properly. For example, make sure all topics required for Confluent Control Center are created with proper replication, security and partition settings.
Launch
The /etc/confluent/docker/launch script runs the actual process. The script should ensure that:
- The process is run with process ID 1. Your script should use exec so the program takes over the shell process rather than running as a child process. This way, your program receives signals like SIGTERM directly rather than its parent shell process receiving them.
- The process logs to stdout.
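The effect of exec on the process ID can be checked with a small experiment. The following is a minimal sketch, not the launch script from any image: SERVICE_CMD is a hypothetical stand-in for the real service command. The outer shell prints its PID and then calls exec, so the stand-in should report the same PID.

```shell
#!/bin/sh
# Hypothetical stand-in for the real service command (an assumption for
# illustration; a real launch script would start the component here).
SERVICE_CMD='echo "service pid: $$"'

# The outer shell prints its own PID, then exec replaces it with the
# service command, so no extra child process is created.
output=$(sh -c 'echo "shell pid: $$"; exec sh -c "$1"' sh "$SERVICE_CMD")
echo "$output"

# Extract both PIDs so they can be compared.
shell_pid=$(echo "$output" | sed -n 's/^shell pid: //p')
service_pid=$(echo "$output" | sed -n 's/^service pid: //p')

if [ "$shell_pid" = "$service_pid" ]; then
  echo "exec kept the same PID"
fi
```

Without exec, the service would run as a child of the shell with a different PID, and the shell would be the process that receives SIGTERM.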
Prerequisites¶
Install Docker. The examples in this guide are for running on macOS. For instructions on installing Docker on Linux or Windows, refer to the official Docker Machine documentation.
brew install docker docker-machine
Create a Docker Machine:
docker-machine create --driver virtualbox --virtualbox-memory 6000 confluent
This command creates a local environment, but it is recommended that you create one on AWS. The builds are much faster and more predictable (VirtualBox stops when you close the lid of the laptop and sometimes gets into a weird state). When choosing an instance type, m4.large is a good choice. It has 2 vCPUs with 8 GB RAM and costs around $88 monthly.
export INSTANCE_NAME=$USER-docker-machine
docker-machine create \
  --driver amazonec2 \
  --amazonec2-region us-west-2 \
  --amazonec2-instance-type m4.large \
  --amazonec2-root-size 100 \
  --amazonec2-ami ami-16b1a077 \
  --amazonec2-tags Name,$INSTANCE_NAME \
  $USER-aws-confluent
Configure your terminal window to attach it to your new Docker Machine:
eval $(docker-machine env confluent)
Install Maven:
brew install maven
Build the Confluent Platform images¶
Refer to Docker Image Reference for a list of GitHub repos for the Confluent Platform components.
For each Confluent Platform image you want to build:
Clone the repo.
Check out the release branch.
Get the values for the required and optional arguments for the build command.
For the list of supported arguments, see this README file.
The following are required arguments:
- CONFLUENT_PACKAGES_REPO: Specify the location of the Confluent Platform packages repository. Depending on the type of OS for the image you are building, you might need to provide a Debian or RPM repository.
- CONFLUENT_VERSION: Specify the full Confluent Platform release version, e.g., 5.5.15.
- docker.upstream-registry: Registry to pull base images from. Trailing / is required. Used as DOCKER_UPSTREAM_REGISTRY during docker build.
- docker.upstream-tag: Use the given tag when pulling base images. Used as DOCKER_UPSTREAM_TAG during docker build.
Optionally, you can choose the operating system on which to base your Docker image: Debian or RHEL UBI. To build a RHEL UBI image, pass the following argument to the Maven command:
-Ddocker.os_type=ubi8
From the root folder of the repo, build the Confluent Platform images using Maven.
For example:
mvn clean package \
  -DskipTests -Pdocker \
  -DCONFLUENT_PACKAGES_REPO='https://packages.confluent.io/rpm/5.5' \
  -DCONFLUENT_VERSION=5.5.15 \
  -Ddocker.upstream-registry=docker.io/ \
  -Ddocker.upstream-tag=5.5.15
Extend Confluent Platform images¶
You can extend the images to add or customize connectors, add new software, change the configuration management, and set up external service discovery. The following sections provide examples.
Add Connectors or Software¶
Confluent provides two images for Kafka Connect:
- The Kafka Connect Base image contains Kafka Connect and all of its dependencies. When started, it will run the Connect framework in distributed mode.
- The Kafka Connect image extends the Kafka Connect Base image and includes several of the connectors supported by Confluent: JDBC, Elasticsearch, HDFS, S3, and JMS.
There are currently two ways to add new connectors to these images.
- Use the cp-kafka-connect or cp-kafka-connect-base image as-is and add the connector JARs via volumes.
- Build a new Docker image that has the new connectors installed. See the following examples.
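The first option, mounting connector JARs through a volume, might look like the following. This is a sketch under assumptions: the host directory connect-jars, the mount point /etc/kafka-connect/jars, and the container name are illustrative. It also assumes that the image maps the CONNECT_PLUGIN_PATH environment variable to the worker's plugin.path. The other CONNECT_* settings that a Connect worker needs are omitted. The script only prints the docker run command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Assumed host directory that holds the connector JARs.
JARS_DIR="${JARS_DIR:-$PWD/connect-jars}"
# Assumed mount point inside the container.
MOUNT_POINT="/etc/kafka-connect/jars"

# Print the command by default; set DRY_RUN=0 to execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Other required CONNECT_* settings (bootstrap servers, group ID, and so on)
# are omitted here for brevity.
run docker run -d \
  --name connect-with-jars \
  -v "$JARS_DIR:$MOUNT_POINT" \
  -e CONNECT_PLUGIN_PATH="/usr/share/java,$MOUNT_POINT" \
  confluentinc/cp-kafka-connect-base:5.5.15
```

Keeping the JARs on a volume avoids rebuilding the image when a connector changes, at the cost of managing the host directory separately.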
Create a Docker Image containing Confluent Hub Connectors¶
This example shows how to use the Confluent Hub client to create a Docker image that extends one of Confluent’s Kafka Connect images but contains a custom set of connectors. This may be useful if you’d like to use a connector that isn’t contained in the cp-kafka-connect image, or if you’d like to keep the custom image lightweight and not include any connectors that you don’t plan to use.
Add connectors from Confluent Hub.
Choose an image to extend.
Functionally, the cp-kafka-connect and the cp-kafka-connect-base images are identical. The only difference is that the cp-kafka-connect image already contains several of Confluent’s connectors, whereas the cp-kafka-connect-base image comes with none by default. The cp-kafka-connect-base image is shown in this example.
Choose the connectors from Confluent Hub that you’d like to include in your custom image. Note that the remaining steps result in a custom image containing a MongoDB connector, a Microsoft Azure IoT Hub connector, and a Google BigQuery connector.
Write a Dockerfile.
FROM confluentinc/cp-kafka-connect-base:5.5.15
RUN confluent-hub install --no-prompt hpgrahsl/kafka-connect-mongodb:1.1.0 \
  && confluent-hub install --no-prompt microsoft/kafka-connect-iothub:0.6 \
  && confluent-hub install --no-prompt wepay/kafka-connect-bigquery:1.1.0
Build the Dockerfile.
docker build . -t my-custom-image:1.0.0
The output from that command should resemble:
Step 1/2 : FROM confluentinc/cp-kafka-connect-base
 ---> e0d92da57dc3
...
Running in a "--no-prompt" mode
Implicit acceptance of the license below:
Apache 2.0
https://github.com/wepay/kafka-connect-bigquery/blob/master/LICENSE.md
Implicit confirmation of the question: You are about to install 'kafka-connect-bigquery' from WePay, as published on Confluent Hub.
Downloading component BigQuery Sink Connector 1.1.0, provided by WePay from Confluent Hub and installing into /usr/share/confluent-hub-components
Adding installation directory to plugin path in the following files:
  /etc/kafka/connect-distributed.properties
  /etc/kafka/connect-standalone.properties
  /etc/schema-registry/connect-avro-distributed.properties
  /etc/schema-registry/connect-avro-standalone.properties
Completed
Removing intermediate container 48d4506b8a83
 ---> 496befc3d3f7
Successfully built 496befc3d3f7
Successfully tagged my-custom-image:1.0.0
This results in an image named my-custom-image that contains the MongoDB, Azure IoT Hub, and BigQuery connectors, and which is capable of running any or all of the connectors using the Kafka Connect framework.
If you are using a docker-compose.yml file and the Confluent Hub client to build your Kafka environment, use the following properties to enable a connector.
connect:
  image: confluentinc/kafka-connect-datagen:latest
  build:
    context: .
    dockerfile: Dockerfile-confluenthub
Create a Docker Image containing Local Connectors¶
This example shows how to create a Docker image that extends the cp-kafka-connect-base image to contain one or more local connectors. This is useful if you want to use your own connectors instead of pulling connectors from Confluent Hub.
Package your local connector in a zip file.
Set up the Dockerfile as shown in the example below.
FROM confluentinc/cp-kafka-connect-base:5.5.15
COPY target/components/packages/my-connector-5.5.15.zip /tmp/my-connector-5.5.15.zip
RUN confluent-hub install --no-prompt /tmp/my-connector-5.5.15.zip
Build the Dockerfile.
docker build . -t my-custom-image:1.0.0
Add Additional Software¶
This example shows how to add new software to an image. For example, you might want to extend the Kafka Connect client to include the MySQL JDBC driver. If this approach is used to add new connectors to an image, the connector JARs must be on the plugin.path or the CLASSPATH for the Connect framework.
Write the Dockerfile.
FROM confluentinc/cp-kafka-connect
ENV MYSQL_DRIVER_VERSION 5.1.39
RUN curl -k -SL "https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-${MYSQL_DRIVER_VERSION}.tar.gz" \
  | tar -xzf - -C /usr/share/java/kafka/ --strip-components=1 "mysql-connector-java-${MYSQL_DRIVER_VERSION}/mysql-connector-java-${MYSQL_DRIVER_VERSION}-bin.jar"
Build the image.
docker build -t foo/mysql-connect:latest .
Note
This approach can also be used to create images with your own Kafka Connect Plugins.
Change configuration management¶
This example describes how to change the configuration management. To accomplish this, you override the configure script to download the configuration files from a URL. For example, with the ZooKeeper image, you need the following Dockerfile and configure script. This example assumes that each property file has a URL.
Dockerfile:
FROM confluentinc/cp-zookeeper
COPY include/etc/confluent/docker/configure /etc/confluent/docker/configure
Example Configure Script:
Location: include/etc/confluent/docker/configure
. /etc/confluent/docker/bash-config
# Ensure that URL locations are available.
dub ensure ZOOKEEPER_SERVER_CONFIG_URL
dub ensure ZOOKEEPER_SERVER_ID_URL
dub ensure ZOOKEEPER_LOG_CONFIG_URL
# Ensure that the config location is writable.
dub path /etc/kafka/ writable
# Download each file from its URL.
curl -XGET "$ZOOKEEPER_SERVER_CONFIG_URL" > /etc/kafka/zookeeper.properties
curl -XGET "$ZOOKEEPER_SERVER_ID_URL" > /var/lib/zookeeper/data/myid
curl -XGET "$ZOOKEEPER_LOG_CONFIG_URL" > /etc/kafka/log4j.properties
Build the image:
docker build -t foo/zookeeper:latest .
Run the container:
docker run \
  -e ZOOKEEPER_SERVER_CONFIG_URL=http://foo.com/zk1/server.properties \
  -e ZOOKEEPER_SERVER_ID_URL=http://foo.com/zk1/myid \
  -e ZOOKEEPER_LOG_CONFIG_URL=http://foo.com/zk1/log4j.properties \
  foo/zookeeper:latest
Log to external volumes¶
The images only expose volumes for data and security configuration. But you might want to write to external storage for some use cases. The following example shows how to write the Kafka authorizer logs to a volume for auditing.
Dockerfile:
FROM confluentinc/cp-kafka
# Make sure the log directory is world-writable
RUN echo "===> Creating authorizer logs dir ..." \
    && mkdir -p /var/log/kafka-auth-logs \
    && chmod -R ag+w /var/log/kafka-auth-logs
VOLUME ["/var/lib/${COMPONENT}/data", "/etc/${COMPONENT}/secrets", "/var/log/kafka-auth-logs"]
COPY include/etc/confluent/log4j.properties.template /etc/confluent/docker/log4j.properties.template
log4j.properties.template:
Location: include/etc/confluent/log4j.properties.template
log4j.rootLogger={{ env["KAFKA_LOG4J_ROOT_LOGLEVEL"] | default('INFO') }}, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=[%d] %p %m (%c)%n
log4j.appender.authorizerAppender=org.apache.log4j.DailyRollingFileAppender
log4j.appender.authorizerAppender.DatePattern='.'yyyy-MM-dd-HH
log4j.appender.authorizerAppender.File=/var/log/kafka-auth-logs/kafka-authorizer.log
log4j.appender.authorizerAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.authorizerAppender.layout.ConversionPattern=[%d] %p %m (%c)%n
log4j.additivity.kafka.authorizer.logger=false
{% set loggers = {
'kafka': 'INFO',
'kafka.network.RequestChannel$': 'WARN',
'kafka.producer.async.DefaultEventHandler': 'DEBUG',
'kafka.request.logger': 'WARN',
'kafka.controller': 'TRACE',
'kafka.log.LogCleaner': 'INFO',
'state.change.logger': 'TRACE',
'kafka.authorizer.logger': 'WARN, authorizerAppender'
} -%}
{% if env['KAFKA_LOG4J_LOGGERS'] %}
{% set loggers = parse_log4j_loggers(env['KAFKA_LOG4J_LOGGERS'], loggers) %}
{% endif %}
{% for logger,loglevel in loggers.iteritems() %}
log4j.logger.{{logger}}={{loglevel}}
{% endfor %}
Build the image.
docker build -t foo/kafka-auditable:latest .
Write garbage collection logs to an external volume¶
The following example shows how to log heap dumps and GC logs to an external volume. This is useful for debugging the Kafka image.
Dockerfile:
FROM confluentinc/cp-kafka
# Make sure the jvm log directory is world-writable
RUN echo "===> Creating jvm logs dir ..." \
    && mkdir -p /var/log/jvm-logs \
    && chmod -R ag+w /var/log/jvm-logs
VOLUME ["/var/lib/${COMPONENT}/data", "/etc/${COMPONENT}/secrets", "/var/log/jvm-logs"]
Build the image.
docker build -t foo/kafka-verbose-jvm:latest .
Run the container:
docker run \
  -e KAFKA_HEAP_OPTS="-Xmx256M -Xloggc:/var/log/jvm-logs/verbose-gc.log -verbose:gc -XX:+PrintGCDateStamps -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/jvm-logs" \
  foo/kafka-verbose-jvm:latest
Use external service discovery¶
You can extend the images to support any service discovery mechanism, either by overriding relevant properties or by overriding the configure script as shown in Change Configuration Management.
The Docker images provide Mesos support by overriding relevant properties for Mesos service discovery. See debian/kafka-connect/includes/etc/confluent/docker/mesos-overrides for examples.
Use the Oracle JDK¶
The Confluent Platform images ship with Azul Zulu OpenJDK. If you are required to use Oracle’s version, follow the steps below to modify the cp-base-new image to include Oracle JDK instead of Zulu OpenJDK.
Update the Dockerfile for the OS, for example, Dockerfile.deb8, in the base directory of the Confluent Base image repo.
Replace the following lines for Zulu OpenJDK:
&& echo "Installing Zulu OpenJDK ${ZULU_OPENJDK_VERSION}" \
&& apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 0x219BD9C9 \
&& echo "deb http://repos.azulsystems.com/debian stable main" >> /etc/apt/sources.list.d/zulu.list \
&& apt-get -qq update \
&& apt-get -y install zulu-${ZULU_OPENJDK_VERSION} \
&& rm -rf /var/lib/apt/lists/* \
With the following lines for Oracle JDK:
&& echo "===> Adding webupd8 repository for Oracle JDK..." \
&& echo "deb http://ppa.launchpad.net/webupd8team/java/ubuntu trusty main" | tee /etc/apt/sources.list.d/webupd8team-java.list \
&& echo "deb-src http://ppa.launchpad.net/webupd8team/java/ubuntu trusty main" | tee -a /etc/apt/sources.list.d/webupd8team-java.list \
&& apt-key adv --keyserver keyserver.ubuntu.com --recv-keys EEA14886 \
&& apt-get update \
\
&& echo "===> Installing Oracle JDK 8 ..." \
&& echo debconf shared/accepted-oracle-license-v1-1 select true | debconf-set-selections \
&& echo debconf shared/accepted-oracle-license-v1-1 seen true | debconf-set-selections \
&& DEBIAN_FRONTEND=noninteractive apt-get install -y --force-yes \
    oracle-java8-installer \
    oracle-java8-set-default \
    ca-certificates \
&& rm -rf /var/cache/oracle-jdk8-installer \
&& apt-get clean && rm -rf /tmp/* /var/lib/apt/lists/* \
Rebuild the cp-base-new image. Refer to Build the Confluent Platform images for the steps.
Utility scripts¶
Given the dependencies between the various Confluent Platform components (e.g. ZooKeeper required for Kafka, Kafka and ZooKeeper required for Schema Registry, etc.), it is sometimes necessary to be able to check the status of different services. The following utilities are used during the bootup sequence of the images and in the testing framework.
Docker Utility Belt (dub)¶
Template
usage: dub template [-h] input output
Generate template from env vars.
positional arguments:
  input   Path to template file.
  output  Path of output file.
ensure
usage: dub ensure [-h] name
Check if env var exists.
positional arguments:
  name  Name of env var.
wait
usage: dub wait [-h] host port timeout
Wait for network service to appear.
positional arguments:
  host     Host.
  port     Port.
  timeout  Timeout in secs.
path
usage: dub path [-h] path {writable,readable,executable,exists}
Check for path permissions and existence.
positional arguments:
  path                                    Full path.
  {writable,readable,executable,exists}   One of [writable, readable, executable, exists].
path-wait
usage: dub path-wait [-h] path timeout
Wait for a path to exist.
positional arguments:
  path        Full path.
  timeout     Time in secs to wait for the path to exist.
optional arguments:
  -h, --help  show this help message and exit
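As an illustration, the dub subcommands above could be combined in a configure-style script. This is a sketch, not the script shipped in any image: MY_SERVICE_URL, the template path, and the output path are hypothetical names. The checks run only when dub is on the PATH, which is assumed to be the case inside a Confluent Platform image.

```shell
#!/bin/sh
# Hypothetical paths and variable names, used only for illustration.
TEMPLATE="/etc/confluent/docker/my-service.properties.template"
OUTPUT="/etc/my-service/my-service.properties"

configure_service() {
  # Fail early if the mandatory environment variable is missing.
  dub ensure MY_SERVICE_URL || return 1
  # Check that the config directory is writable before rendering.
  dub path "$(dirname "$OUTPUT")" writable || return 1
  # Render the properties file from the environment.
  dub template "$TEMPLATE" "$OUTPUT" || return 1
}

# Only run when dub is available (for example, inside an image).
if command -v dub >/dev/null 2>&1; then
  configure_service
else
  echo "dub not found; skipping configure_service"
fi
```

Each step returns early on failure, so a missing variable or an unwritable directory stops the sequence before any file is rendered.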
Confluent Platform Utility Belt (cub)¶
zk-ready
Used for checking if ZooKeeper is ready.
usage: cub zk-ready [-h] connect_string timeout retries wait
Check if ZK is ready.
positional arguments:
  connect_string  ZooKeeper connect string.
  timeout         Time in secs to wait for service to be ready.
  retries         No of retries to check if leader election is complete.
  wait            Time in secs between retries
kafka-ready
Used for checking if Kafka is ready.
usage: cub kafka-ready [-h] (-b BOOTSTRAP_BROKER_LIST | -z ZOOKEEPER_CONNECT)
                       [-c CONFIG] [-s SECURITY_PROTOCOL]
                       expected_brokers timeout
Check if Kafka is ready.
positional arguments:
  expected_brokers      Minimum number of brokers to wait for
  timeout               Time in secs to wait for service to be ready.
optional arguments:
  -h, --help            show this help message and exit
  -b BOOTSTRAP_BROKER_LIST, --bootstrap_broker_list BOOTSTRAP_BROKER_LIST
                        List of bootstrap brokers.
  -z ZOOKEEPER_CONNECT, --zookeeper_connect ZOOKEEPER_CONNECT
                        ZooKeeper connect string. This option is deprecated in Confluent Platform 5.5.0 and later.
  -c CONFIG, --config CONFIG
                        Path to config properties file (required when security is enabled).
  -s SECURITY_PROTOCOL, --security-protocol SECURITY_PROTOCOL
                        Security protocol to use when multiple listeners are enabled.
sr-ready
Used for checking if Schema Registry is ready. If you have multiple Schema Registry nodes, you may need to check their availability individually.
usage: cub sr-ready [-h] host port timeout
positional arguments:
  host     Hostname for Schema Registry.
  port     Port for Schema Registry.
  timeout  Time in secs to wait for service to be ready.
kr-ready
Used for checking if the REST Proxy is ready. If you have multiple REST Proxy nodes, you may need to check their availability individually.
usage: cub kr-ready [-h] host port timeout
positional arguments:
  host     Hostname for REST Proxy.
  port     Port for REST Proxy.
  timeout  Time in secs to wait for service to be ready.
connect-ready
Used for checking if Kafka Connect is ready.
usage: cub connect-ready [-h] host port timeout
positional arguments:
  host     Hostname for Connect worker.
  port     Port for Connect worker.
  timeout  Time in secs to wait for service to be ready.
ksql-server-ready
Used for checking if ksqlDB is ready.
usage: cub ksql-server-ready [-h] host port timeout
positional arguments:
  host     Hostname for KSQL server.
  port     Port for KSQL server.
  timeout  Time in secs to wait for service to be ready.
control-center-ready
Used for checking if Confluent Control Center is ready.
usage: cub control-center-ready [-h] host port timeout
positional arguments:
  host     Hostname for Control Center.
  port     Port for Control Center.
  timeout  Time in secs to wait for service to be ready.
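The readiness checks can be chained so that a dependent service starts only after its prerequisites are up. The sketch below follows the usage strings above; the hostnames, ports, retry count, and timeout values are assumptions for a hypothetical single-broker setup. The checks run only when cub is on the PATH.

```shell
#!/bin/sh
# Assumed endpoints and limits for a hypothetical single-broker setup.
ZK_CONNECT="${ZK_CONNECT:-zookeeper:2181}"
BROKERS="${BROKERS:-kafka:9092}"
TIMEOUT_SECS="${TIMEOUT_SECS:-40}"

wait_for_kafka_stack() {
  # cub zk-ready connect_string timeout retries wait
  cub zk-ready "$ZK_CONNECT" "$TIMEOUT_SECS" 10 2 || return 1
  # cub kafka-ready -b BOOTSTRAP_BROKER_LIST expected_brokers timeout
  cub kafka-ready -b "$BROKERS" 1 "$TIMEOUT_SECS" || return 1
  echo "ZooKeeper and Kafka are ready"
}

# Only run when cub is available (for example, inside an image).
if command -v cub >/dev/null 2>&1; then
  wait_for_kafka_stack
else
  echo "cub not found; skipping readiness checks"
fi
```

The example uses -b rather than -z because the ZooKeeper connect option is marked as deprecated in the usage text.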
Client properties¶
The following properties may be configured when using the kafka-ready utility described above.
bootstrap.servers
A list of host/port pairs to use for establishing the initial connection to the Kafka cluster. The client will make use of all servers irrespective of which servers are specified here for bootstrapping; this list only impacts the initial hosts used to discover the full set of servers. This list should be in the form host1:port1,host2:port2,.... Since these servers are just used for the initial connection to discover the full cluster membership (which may change dynamically), this list need not contain the full set of servers (you may want more than one, though, in case a server is down).
- Type: list
- Default:
- Importance: high
ssl.key.password
The password of the private key in the key store file. This is optional for client.
- Type: password
- Importance: high
ssl.keystore.location
The location of the key store file. This is optional for client and can be used for two-way authentication for client.
- Type: string
- Importance: high
ssl.keystore.password
The store password for the key store file. This is optional for client and only needed if ssl.keystore.location is configured.
- Type: password
- Importance: high
ssl.truststore.location
The location of the trust store file.
- Type: string
- Importance: high
ssl.truststore.password
The password for the trust store file.
- Type: password
- Importance: high
sasl.kerberos.service.name
The Kerberos principal name that Kafka runs as. This can be defined either in Kafka’s JAAS config or in Kafka’s config.
- Type: string
- Importance: medium
sasl.mechanism
SASL mechanism used for client connections. This may be any mechanism for which a security provider is available. GSSAPI is the default mechanism.
- Type: string
- Default: “GSSAPI”
- Importance: medium
security.protocol
Protocol used to communicate with brokers. Valid values are: PLAINTEXT, SSL, SASL_PLAINTEXT, SASL_SSL.
- Type: string
- Default: “PLAINTEXT”
- Importance: medium
ssl.enabled.protocols
The list of protocols enabled for SSL connections.
- Type: list
- Default: [TLSv1.2, TLSv1.1, TLSv1]
- Importance: medium
ssl.keystore.type
The file format of the key store file. This is optional for client.
- Type: string
- Default: “JKS”
- Importance: medium
ssl.protocol
The SSL protocol used to generate the SSLContext. Default setting is TLS, which is fine for most cases. Allowed values in recent JVMs are TLS, TLSv1.1 and TLSv1.2. SSL, SSLv2 and SSLv3 may be supported in older JVMs, but their usage is discouraged due to known security vulnerabilities.
- Type: string
- Default: “TLS”
- Importance: medium
ssl.provider
The name of the security provider used for SSL connections. Default value is the default security provider of the JVM.
- Type: string
- Importance: medium
ssl.truststore.type
The file format of the trust store file.
- Type: string
- Default: “JKS”
- Importance: medium
sasl.kerberos.kinit.cmd
Kerberos kinit command path.
- Type: string
- Default: “/usr/bin/kinit”
- Importance: low
sasl.kerberos.min.time.before.relogin
Login thread sleep time between refresh attempts.
- Type: long
- Default: 60000
- Importance: low
sasl.kerberos.ticket.renew.jitter
Percentage of random jitter added to the renewal time.
- Type: double
- Default: 0.05
- Importance: low
sasl.kerberos.ticket.renew.window.factor
Login thread will sleep until the specified window factor of time from last refresh to ticket’s expiry has been reached, at which time it will try to renew the ticket.
- Type: double
- Default: 0.8
- Importance: low
ssl.cipher.suites
A list of cipher suites. This is a named combination of authentication, encryption, MAC and key exchange algorithm used to negotiate the security settings for a network connection using TLS or SSL network protocol. By default all the available cipher suites are supported.
- Type: list
- Importance: low
ssl.endpoint.identification.algorithm
The endpoint identification algorithm to validate server hostname using server certificate.
- Type: string
- Importance: low
ssl.keymanager.algorithm
The algorithm used by key manager factory for SSL connections. Default value is the key manager factory algorithm configured for the Java Virtual Machine.
- Type: string
- Default: “SunX509”
- Importance: low
ssl.trustmanager.algorithm
The algorithm used by trust manager factory for SSL connections. Default value is the trust manager factory algorithm configured for the Java Virtual Machine.
- Type: string
- Default: “PKIX”
- Importance: low
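For example, a properties file for an SSL-secured cluster could be written with a few of the properties above and then passed to kafka-ready with -c. This is a sketch: the file locations, passwords, and broker address are placeholders, not values from any real deployment.

```shell
#!/bin/sh
# Placeholder output location; adjust for your environment.
CONFIG_FILE="${CONFIG_FILE:-${TMPDIR:-/tmp}/kafka-ready-client.properties}"

# Write a minimal SSL client configuration using properties listed above.
# All paths and passwords are placeholders.
cat > "$CONFIG_FILE" <<'EOF'
security.protocol=SSL
ssl.truststore.location=/etc/kafka/secrets/client.truststore.jks
ssl.truststore.password=changeit
ssl.keystore.location=/etc/kafka/secrets/client.keystore.jks
ssl.keystore.password=changeit
ssl.key.password=changeit
EOF

echo "Wrote $CONFIG_FILE"

# Inside an image, the file could then be passed to kafka-ready, for example:
#   cub kafka-ready -b broker:9093 -c "$CONFIG_FILE" 1 30
```

The keystore entries are only needed for two-way authentication; a client that only verifies the broker can omit them.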
Reference¶
See this document for an example for setting up a Dockerized AWS EC2 instance.