
Single Node Basic Deployment on Docker

This tutorial provides a basic guide for deploying a Kafka cluster along with all Confluent Platform components in your Docker environment. By the end, you will have a functional Confluent Platform deployment that you can use to run applications.

This tutorial builds a single-node Docker environment using the Docker client. You will configure Kafka and ZooKeeper to store data locally in their Docker containers.

Prerequisites

  • Docker version 1.11 or later installed and running.

    If you’re running on Windows:

    • You must use Docker Machine to start the Docker host.
    • You must allocate at least 4 GB of RAM (6 GB is recommended) to the Docker Machine.
  • curl

Step 1: Set Up Your Docker Environment

Create a VirtualBox Instance

Create and configure the Docker Machine on VirtualBox.


This step is only for Mac and Windows users who are not using Docker for Mac or Docker for Windows.

  1. Create a VirtualBox virtual machine named confluent, running the Docker daemon, with 6 GB of memory.

    docker-machine create --driver virtualbox --virtualbox-memory 6000 confluent
  2. Configure your terminal window to attach it to your new Docker Machine:

    eval $(docker-machine env confluent)

Create a Docker Network

Create the Docker network that is used to run the Confluent containers.


A Docker network is required to enable DNS resolution across your containers. The default Docker network does not have DNS enabled.

$ docker network create confluent

Step 2: Start the Confluent Platform Components

Start ZooKeeper

  1. Start ZooKeeper and keep this service running.

    $ docker run -d \
        --net=confluent \
        --name=zookeeper \
        -e ZOOKEEPER_CLIENT_PORT=2181 \
        confluentinc/cp-zookeeper:5.0.0

    This command instructs Docker to launch an instance of the confluentinc/cp-zookeeper:5.0.0 container and name it zookeeper. Also, the Docker network confluent and the required ZooKeeper parameter ZOOKEEPER_CLIENT_PORT are specified. For a full list of the available configuration options and more details on passing environment variables into Docker containers, see the configuration reference docs.

  2. Optional: Check the Docker logs to confirm that the container has booted up successfully and started the ZooKeeper service.

    $ docker logs zookeeper

    With this command, you’re referencing the container name that you want to see the logs for. To list all containers (running or failed), you can always run docker ps -a. This is especially useful when running in detached mode.

    When you output the logs for ZooKeeper, you should see the following message at the end of the log output:

    [2016-07-24 05:15:35,453] INFO binding to port (org.apache.zookeeper.server.NIOServerCnxnFactory)

    Note that the message shows the ZooKeeper service listening at the port you passed in as ZOOKEEPER_CLIENT_PORT above.

    If the service is not running, the log messages should provide details to help you identify the problem. A common error is:

    • Insufficient resources. On rare occasions, you may see memory allocation or other low-level failures at startup. This will only happen if you dramatically overload the capacity of your Docker host.

Start Kafka

  1. Start Kafka with this command.

    $ docker run -d \
        --net=confluent \
        --name=kafka \
        -e KAFKA_ZOOKEEPER_CONNECT=zookeeper:2181 \
        -e KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://kafka:9092 \
        -e KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1 \
        confluentinc/cp-kafka:5.0.0

    The KAFKA_ADVERTISED_LISTENERS variable is set to PLAINTEXT://kafka:9092. This makes Kafka accessible to other containers by advertising its location on the Docker network. The same ZooKeeper port is specified here as in the previous container.

    The KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR is set to 1 for a single-node cluster. If you have three or more nodes, you do not need to change this from the default.
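    As an aside, an advertised listener is just a PROTOCOL://host:port string that clients split apart to discover where to connect. A minimal sketch of that parsing (illustrative only, not code from the Kafka client):

    ```python
    # Split an advertised listener such as PLAINTEXT://kafka:9092 into its parts.
    # Illustrative sketch only; real Kafka clients handle this internally.
    def parse_listener(listener):
        protocol, rest = listener.split("://", 1)
        host, port = rest.rsplit(":", 1)
        return protocol, host, int(port)

    print(parse_listener("PLAINTEXT://kafka:9092"))  # → ('PLAINTEXT', 'kafka', 9092)
    ```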

  2. Optional: Check the logs to see that the broker has booted up successfully:

    $ docker logs kafka

    You should see the following at the end of the log output:

    [2016-07-15 23:31:00,295] INFO [Kafka Server 1], started (kafka.server.KafkaServer)
    [2016-07-15 23:31:00,349] INFO [Controller 1]: New broker startup callback for 1 (kafka.controller.KafkaController)
    [2016-07-15 23:31:00,350] INFO [Controller-1-to-broker-1-send-thread], Starting  (kafka.controller.RequestSendThread)

Step 3: Create a Topic and Produce Data

  1. Create a topic. You’ll name it foo and keep things simple by giving it just one partition and one replica.

    $ docker run \
      --net=confluent \
      --rm confluentinc/cp-kafka:5.0.0 \
      kafka-topics --create --topic foo --partitions 1 --replication-factor 1 \
      --if-not-exists --zookeeper zookeeper:2181

    You should see the following:

    Created topic "foo".
  2. Optional: Verify that the topic was successfully created:

    $ docker run \
      --net=confluent \
      --rm \
      confluentinc/cp-kafka:5.0.0 \
      kafka-topics --describe --topic foo --zookeeper zookeeper:2181

    You should see the following:

    Topic:foo   PartitionCount:1    ReplicationFactor:1 Configs:
    Topic: foo  Partition: 0    Leader: 1001    Replicas: 1001  Isr: 1001
  3. Publish data to your new topic:

    $ docker run \
      --net=confluent \
      --rm \
      confluentinc/cp-kafka:5.0.0 \
      bash -c "seq 42 | kafka-console-producer --request-required-acks 1 \
      --broker-list kafka:9092 --topic foo && echo 'Produced 42 messages.'"

    This command will use the built-in Kafka Console Producer to produce 42 simple messages to the topic. After running the command, you should see the following:

    Produced 42 messages.

    To complete the story, you can read the messages back using the built-in Kafka Console Consumer:

    $ docker run \
      --net=confluent \
      --rm \
      confluentinc/cp-kafka:5.0.0 \
      kafka-console-consumer --bootstrap-server kafka:9092 --topic foo --from-beginning --max-messages 42

    If everything is working as expected, the consumer prints each of the original messages, ending with:

    Processed a total of 42 messages

You can now continue to Step 4: Start Schema Registry.

Step 4: Start Schema Registry

In this step, Schema Registry is used to create a new schema and send some Avro data to a Kafka topic. Although you would normally do this from one of your applications, you’ll use a utility provided with Schema Registry to send the data without having to write any code.

  1. Start the Schema Registry container:

    $ docker run -d \
      --net=confluent \
      --name=schema-registry \
      -e SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL=zookeeper:2181 \
      -e SCHEMA_REGISTRY_HOST_NAME=schema-registry \
      confluentinc/cp-schema-registry:5.0.0
  2. Optional: Check that it started correctly by viewing the logs.

    $ docker logs schema-registry
  3. Launch a second Schema Registry container in interactive mode (-it) and then execute the kafka-avro-console-producer utility from there. This publishes data to a new topic that will leverage the Schema Registry.

    $ docker run -it --net=confluent --rm confluentinc/cp-schema-registry:5.0.0 bash
    1. Run the Avro console producer against your Kafka cluster, instructing it to write to the topic bar, read each line of input as an Avro message, validate the schema against the Schema Registry at the specified URL, and use the specified format for the data.

      # /usr/bin/kafka-avro-console-producer \
        --broker-list kafka:9092 --topic bar \
        --property schema.registry.url=http://schema-registry:8081 \
        --property value.schema='{"type":"record","name":"myrecord","fields":[{"name":"f1","type":"string"}]}'

      After it has started, the process waits for you to enter messages, one per line, and will send them immediately when you hit the Enter key.


      If you hit Enter with an empty line, it will be interpreted as a null value and cause an error. You can simply restart the console producer again to continue sending messages.

    2. Enter these messages:

      {"f1": "value1"}
      {"f1": "value2"}
      {"f1": "value3"}
    3. Use Ctrl+C or Ctrl+D to stop the producer client. You can then type exit to leave the container altogether.
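Each line you entered had to match the value.schema passed to the producer: a record with a single string field f1. As a rough local check of that shape (a sketch using plain JSON parsing, not the producer's actual Avro validation):

```python
import json

# The three records entered above; each must be an object with a single string
# field "f1", matching the myrecord schema given to kafka-avro-console-producer.
records = ['{"f1": "value1"}', '{"f1": "value2"}', '{"f1": "value3"}']

def matches_myrecord(line):
    """Loosely check one line against the myrecord schema."""
    obj = json.loads(line)
    return isinstance(obj, dict) and set(obj) == {"f1"} and isinstance(obj["f1"], str)

assert all(matches_myrecord(r) for r in records)
print("all records match the schema")  # → all records match the schema
```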

Step 5: Start REST Proxy

This section describes how to deploy the REST Proxy container and then consume data from the Confluent REST Proxy service. This demonstrates how to read the data produced from outside the container on your local machine via the REST Proxy. The REST Proxy depends on the Schema Registry when producing and consuming Avro data, so you must pass in the details for the detached Schema Registry container that you launched in the previous step.

  1. Start up the REST Proxy:

    $ docker run -d \
      --net=confluent \
      --name=kafka-rest \
      -e KAFKA_REST_ZOOKEEPER_CONNECT=zookeeper:2181 \
      -e KAFKA_REST_SCHEMA_REGISTRY_URL=http://schema-registry:8081 \
      -e KAFKA_REST_HOST_NAME=kafka-rest \
      confluentinc/cp-kafka-rest:5.0.0

    For the next two steps, you’re going to use curl commands to talk to the REST Proxy from a second container on the confluent network. On that network, the REST Proxy service is listening at http://kafka-rest:8082.

  2. Launch a new Docker container to execute your commands from:

    $ docker run -it --net=confluent --rm confluentinc/cp-schema-registry:5.0.0 bash
    1. Create a consumer instance.

      # curl -X POST -H "Content-Type: application/vnd.kafka.v1+json" \
        --data '{"name": "my_consumer_instance", "format": "avro", "auto.offset.reset": "smallest"}' \
        http://kafka-rest:8082/consumers/my_avro_consumer

      The response echoes back the consumer instance name and a base URI for later requests.

    2. Retrieve data from the bar topic in your cluster. The messages will be decoded, translated to JSON, and included in the response. The schema used for deserialization is retrieved automatically from the Schema Registry service. You configured this by setting the KAFKA_REST_SCHEMA_REGISTRY_URL variable on startup.

      # curl -X GET -H "Accept: application/vnd.kafka.avro.v1+json" \
        http://kafka-rest:8082/consumers/my_avro_consumer/instances/my_consumer_instance/topics/bar

      The response contains the Avro messages you produced to the bar topic, decoded to JSON.

    3. Type exit to leave the container.
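The REST Proxy calls above are ordinary HTTP requests with JSON bodies. As a sketch, here is the consumer-creation payload built with Python's standard library (field names copied from the curl command above; no request is actually sent):

```python
import json

# Same fields as the --data payload used to create the consumer instance.
payload = {
    "name": "my_consumer_instance",
    "format": "avro",
    "auto.offset.reset": "smallest",
}
# The v1 consumer API expects this content type on management calls.
headers = {"Content-Type": "application/vnd.kafka.v1+json"}

body = json.dumps(payload)
assert json.loads(body) == payload  # the body round-trips cleanly
print(json.loads(body)["name"])  # → my_consumer_instance
```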

Step 6: Start Control Center

Confluent Control Center is a web-based tool for managing and monitoring Apache Kafka. This portion of the quick start provides an overview of how to use Confluent Control Center with console producers and consumers to monitor consumption and latency.

Stream Monitoring

  1. Start Control Center, binding its data directory to /tmp/control-center/data on your Docker host and its HTTP interface to port 9021.

    $ docker run -d \
      --name=control-center \
      --net=confluent \
      --ulimit nofile=16384:16384 \
      -p 9021:9021 \
      -v /tmp/control-center/data:/var/lib/confluent-control-center \
      -e CONTROL_CENTER_ZOOKEEPER_CONNECT=zookeeper:2181 \
      -e CONTROL_CENTER_BOOTSTRAP_SERVERS=kafka:9092 \
      -e CONTROL_CENTER_REPLICATION_FACTOR=1 \
      -e CONTROL_CENTER_CONNECT_CLUSTER=http://kafka-connect:8082 \
      confluentinc/cp-enterprise-control-center:5.0.0

    Control Center creates the topics it needs in Kafka. The command also passes the URL of a Kafka Connect cluster that you will create in a later step.

  2. Optional: Verify that it started correctly by searching its logs with the following command:

    $ docker logs control-center | grep Started

    You should see the following:

    [2016-08-26 18:47:26,809] INFO Started NetworkTrafficServerConnector@26d96e5{HTTP/1.1}{} (org.eclipse.jetty.server.NetworkTrafficServerConnector)
    [2016-08-26 18:47:26,811] INFO Started @5211ms (org.eclipse.jetty.server.Server)
  3. To see the Control Center UI, open the link http://localhost:9021 in your browser.

    If you are running Docker Machine, the UI will be running at http://<docker-host-ip>:9021. You can find the Docker Host IP by running this command:

    $ docker-machine ip confluent


    If your Docker daemon is running on a remote machine (such as an AWS EC2 instance), you must allow TCP access to that instance on port 9021. This is done in AWS by adding a “Custom TCP Rule” to the instance’s security group; the rule should allow access to port 9021 from any source IP.

    Initially, the Stream Monitoring UI will have no data.


    Confluent Control Center Initial View

  4. Run the console producer and consumer with monitoring interceptors configured and see the data in Control Center. First you must create a topic for testing.

    $ docker run \
      --net=confluent \
      --rm confluentinc/cp-kafka:5.0.0 \
      kafka-topics --create --topic c3-test --partitions 1 --replication-factor 1 --if-not-exists --zookeeper zookeeper:2181
  5. Use the console producer with the monitoring interceptor enabled to send data:

    $ while true; do
      docker run \
        --net=confluent \
        --rm \
        -e CLASSPATH=/usr/share/java/monitoring-interceptors/monitoring-interceptors-5.0.0.jar \
        confluentinc/cp-kafka-connect:5.0.0 \
        bash -c 'seq 10000 | kafka-console-producer --request-required-acks 1 --broker-list kafka:9092 --topic c3-test --producer-property interceptor.classes=io.confluent.monitoring.clients.interceptor.MonitoringProducerInterceptor --producer-property acks=1 && echo "Produced 10000 messages."';
      sleep 10;
      done
  6. The loop above uses the built-in Kafka Console Producer to produce 10000 simple messages to the c3-test topic every 10 seconds. After each iteration, you should see the following:

    Produced 10000 messages.

    The message will repeat every 10 seconds, as successive iterations of the shell loop are executed. You can terminate the client with a Ctrl+C.

  7. Use the console consumer with the monitoring interceptor enabled to read the data. Run this command in a separate terminal window (prepared with the eval $(docker-machine env confluent) as described earlier).

    $ OFFSET=0
    $ while true; do
      docker run \
        --net=confluent \
        --rm \
        -e CLASSPATH=/usr/share/java/monitoring-interceptors/monitoring-interceptors-5.0.0.jar \
        confluentinc/cp-kafka-connect:5.0.0 \
        bash -c 'kafka-console-consumer --consumer-property group.id=qs-consumer --consumer-property interceptor.classes=io.confluent.monitoring.clients.interceptor.MonitoringConsumerInterceptor --bootstrap-server kafka:9092 --topic c3-test --offset '$OFFSET' --partition 0 --max-messages 1000';
      sleep 1;
      let OFFSET=OFFSET+1000;
      done

    If everything is working as expected, each iteration reads back a batch of 1000 messages, ending with:

    Processed a total of 1000 messages

In this step you have intentionally set up a slow consumer that consumes at a rate of 1000 messages per second. You’ll soon reach a steady state where the producer window shows an update every 10 seconds while the consumer window shows bursts of 1000 messages received every second. The monitoring activity should appear in the Control Center UI after 15 to 30 seconds. If you don’t see any activity, use the scaling selector in the upper left-hand corner of the web page to select a smaller time window (the default is 4 hours, and you’ll want to zoom in to a 10-minute scale). You will notice moments where the bars are colored red to reflect the slow consumption of data.
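To see why the consumer falls behind in bursts, consider a toy model of the rates above (a simplified sketch, not Control Center's actual metric computation): the producer writes 10000 messages at the start of each 10-second window while the consumer drains 1000 messages per second.

```python
# Toy simulation of the produce/consume rates in this step.
produced = consumed = 0
max_lag = 0
for t in range(1, 31):                            # simulate 30 seconds, one step per second
    if t % 10 == 1:                               # producer burst every 10 seconds
        produced += 10000
    consumed = min(produced, consumed + 1000)     # consumer reads 1000 msgs/s
    max_lag = max(max_lag, produced - consumed)
print(max_lag)  # → 9000
```

The lag peaks right after each burst and drains back to zero by the end of the window, which matches the periodic red bars in the UI.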



Alerting

Confluent Control Center provides alerting functionality to notify you when anomalous events occur in your cluster. This section assumes the console producer and consumer you launched to illustrate the stream monitoring features are still running in the background.

The Alerts and Overview tabs on the left-hand navigation sidebar display a history of all triggered events. To begin receiving alerts, you need to create a trigger. Click the “Triggers” navigation item and then select “+ New trigger”.

Let’s configure a trigger to fire when the difference between your actual consumption and expected consumption is greater than 1000 messages:


New trigger

Set the trigger name to be “Underconsumption”, which is what will be displayed on the history page when your trigger fires. You must select a specific consumer group (qs-consumer) for this trigger. That’s the name of the group you specified above in your invocation of kafka-console-consumer.

Set the trigger metric to be “Consumption difference” where the condition is “Greater than” 1000 messages. The buffer time (in seconds) is the wall clock time you will wait before firing the trigger to make sure the trigger condition is not too transient.
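The buffer time acts as a debounce: the trigger fires only if the condition holds for the whole buffer window. A toy sketch of that idea (hypothetical logic, not Control Center's implementation):

```python
# Fire only when every lag sample in the trailing buffer window exceeds the threshold.
def should_fire(lag_samples, threshold, buffer_seconds):
    window = lag_samples[-buffer_seconds:]
    return len(window) == buffer_seconds and all(s > threshold for s in window)

assert should_fire([1500, 1600, 1700], threshold=1000, buffer_seconds=3)     # sustained
assert not should_fire([1500, 900, 1700], threshold=1000, buffer_seconds=3)  # transient dip
print("debounce behaves as expected")  # → debounce behaves as expected
```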

After saving the trigger, Control Center prompts you to associate an action that will execute when your newly created trigger fires. For now, the only action is to send an email. Select your new trigger and choose the maximum send rate for your alert email.


New action

Let’s return to your trigger history page. In a short while, you should see a new trigger show up in your alert history. This is because you set up your consumer to consume data at a slower rate than your producer.


A newly triggered event

Step 7: Start Kafka Connect

In this section, a simple data pipeline is created by using Kafka Connect. You will read data from a file and write that data to a new file. You will then extend the pipeline to show how to use Connect to read from a database table.

Kafka Connect stores all its stateful data (configuration, status, and internal offsets for connectors) directly in Kafka topics. You will create these topics in the Kafka cluster you have running from the steps above.


You can allow Connect to auto-create these topics by enabling the broker’s auto.create.topics.enable setting. However, it is recommended that you create these topics manually; manual creation lets you control settings such as replication factor and number of partitions.

  1. Create a topic for storing the Kafka Connect internal offsets.

    $ docker run \
      --net=confluent \
      --rm \
      confluentinc/cp-kafka:5.0.0 \
      kafka-topics --create --topic quickstart-offsets --partitions 1 \
      --replication-factor 1 --if-not-exists --zookeeper zookeeper:2181
  2. Create a topic for storing data that you’ll be sending to Kafka.

    $ docker run \
      --net=confluent \
      --rm \
      confluentinc/cp-kafka:5.0.0 \
      kafka-topics --create --topic quickstart-data --partitions 1 \
      --replication-factor 1 --if-not-exists --zookeeper zookeeper:2181
  3. Optional: Verify that the topics are created before moving on:

    $ docker run \
       --net=confluent \
       --rm \
       confluentinc/cp-kafka:5.0.0 \
       kafka-topics --describe --zookeeper zookeeper:2181
  4. Start a Kafka Connect worker in distributed mode, pointing it at the topics it uses to store connector configuration, offsets, and status, and mounting a host directory for the input and output files.

    $ docker run -d \
      --name=kafka-connect \
      --net=confluent \
      -e CONNECT_PRODUCER_INTERCEPTOR_CLASSES=io.confluent.monitoring.clients.interceptor.MonitoringProducerInterceptor \
      -e CONNECT_CONSUMER_INTERCEPTOR_CLASSES=io.confluent.monitoring.clients.interceptor.MonitoringConsumerInterceptor \
      -e CONNECT_BOOTSTRAP_SERVERS=kafka:9092 \
      -e CONNECT_REST_PORT=8082 \
      -e CONNECT_GROUP_ID="quickstart" \
      -e CONNECT_CONFIG_STORAGE_TOPIC="quickstart-config" \
      -e CONNECT_OFFSET_STORAGE_TOPIC="quickstart-offsets" \
      -e CONNECT_STATUS_STORAGE_TOPIC="quickstart-status" \
      -e CONNECT_KEY_CONVERTER="org.apache.kafka.connect.json.JsonConverter" \
      -e CONNECT_VALUE_CONVERTER="org.apache.kafka.connect.json.JsonConverter" \
      -e CONNECT_INTERNAL_KEY_CONVERTER="org.apache.kafka.connect.json.JsonConverter" \
      -e CONNECT_INTERNAL_VALUE_CONVERTER="org.apache.kafka.connect.json.JsonConverter" \
      -e CONNECT_REST_ADVERTISED_HOST_NAME="kafka-connect" \
      -e CONNECT_PLUGIN_PATH=/usr/share/java \
      -e CONNECT_REST_HOST_NAME="kafka-connect" \
      -v /tmp/quickstart/file:/tmp/quickstart \
      confluentinc/cp-kafka-connect:5.0.0
  5. Optional: Verify that the Connect worker is up by running the following command to search the logs:

    $ docker logs kafka-connect | grep started

    You should see the following:

    [2016-08-25 18:25:19,665] INFO Herder started (org.apache.kafka.connect.runtime.distributed.DistributedHerder)
    [2016-08-25 18:25:19,676] INFO Kafka Connect started (org.apache.kafka.connect.runtime.Connect)
  6. Create the directory to store the input and output data files.

    $ docker exec kafka-connect mkdir -p /tmp/quickstart/file
  7. Create a connector for reading a file from disk.

    1. Create a file with some data.

      $ docker exec kafka-connect sh -c 'seq 1000 > /tmp/quickstart/file/input.txt'
    2. Create the connector using the Kafka Connect REST API.

      $ docker exec kafka-connect curl -s -X POST \
        -H "Content-Type: application/json" \
        --data '{"name": "quickstart-file-source", "config": {"connector.class":"org.apache.kafka.connect.file.FileStreamSourceConnector", "tasks.max":"1", "topic":"quickstart-data", "file": "/tmp/quickstart/file/input.txt"}}' \
        http://kafka-connect:8082/connectors

      The response echoes back the name and configuration of the newly created connector.

    3. Optional: verify the status of the connector using this curl command:

      $ docker exec kafka-connect curl -s -X GET http://kafka-connect:8082/connectors/quickstart-file-source/status

      The response should report the state of the connector and its task as RUNNING.

    4. Optional: Read a sample of 10 records from the quickstart-data topic to check whether the connector is uploading data to Kafka. Run this command in a separate terminal window, keeping your session to the Docker host open for later commands.

      $ docker run \
        --net=confluent \
        --rm \
        confluentinc/cp-kafka:5.0.0 \
        kafka-console-consumer --bootstrap-server kafka:9092 --topic \
        quickstart-data --from-beginning --max-messages 10

      You should see the 10 records, followed by:

           Processed a total of 10 messages

      Success! You now have a functioning source connector!
  8. Launch a File Sink to read from this topic and write to an output file. Run the following command from the Docker Host session started earlier:

    $ docker exec kafka-connect curl -s -X POST -H "Content-Type: application/json" \
        --data '{"name": "quickstart-file-sink", "config": {"connector.class":"org.apache.kafka.connect.file.FileStreamSinkConnector", "tasks.max":"1", "topics":"quickstart-data", "file": "/tmp/quickstart/file/output.txt"}}' \
        http://kafka-connect:8082/connectors

    The response confirms that the quickstart-file-sink connector is created and will write to /tmp/quickstart/file/output.txt.

  9. Optional: Verify the status of the connector:

    $ docker exec kafka-connect curl -s -X GET http://kafka-connect:8082/connectors/quickstart-file-sink/status

    The response should report the connector and its task as RUNNING.

  10. Optional: Check the file to see if the data is present.

    $ docker exec kafka-connect cat /tmp/quickstart/file/output.txt

    You should see all of the data you originally wrote to the input file.
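Both connector payloads in this step share the same JSON shape: a name plus a config map. A short sketch of building them programmatically (values copied from the curl commands above; illustrative only, no request is sent):

```python
import json

def connector_payload(name, connector_class, extra):
    """Build the JSON body the Connect REST API expects: a name plus a config map."""
    config = {"connector.class": connector_class, "tasks.max": "1"}
    config.update(extra)
    return {"name": name, "config": config}

source = connector_payload(
    "quickstart-file-source",
    "org.apache.kafka.connect.file.FileStreamSourceConnector",
    {"topic": "quickstart-data", "file": "/tmp/quickstart/file/input.txt"},
)
sink = connector_payload(
    "quickstart-file-sink",
    "org.apache.kafka.connect.file.FileStreamSinkConnector",
    {"topics": "quickstart-data", "file": "/tmp/quickstart/file/output.txt"},
)

# json.dumps(source) / json.dumps(sink) yield the same bodies POSTed to /connectors.
print(source["name"])  # → quickstart-file-source
```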


Step 8: Monitor in Control Center

Next you’ll see how to monitor the Kafka Connect connectors in Control Center. Because you specified the monitoring interceptors when you deployed the Connect container, the data flowing through your connectors is monitored in the same way as the console producer/consumer tasks you executed above. Additionally, Control Center lets you visually manage and deploy connectors, as you’ll see now.

Select the Management / Kafka Connect link in the Control Center navigation bar. Select the SOURCES and SINKS tabs at the top of the page to see that both the source and sink are running.


Confluent Control Center showing a Connect source


Confluent Control Center showing a Connect sink

You should start to see stream monitoring data from Kafka Connect in the Control Center UI from the running connectors. Remember that the file contained only 1000 messages, so you’ll only see a short spike of topic data.


Confluent Control Center monitoring Kafka Connect


Cleanup

After you’re done, cleanup is simple. Run docker rm -f $(docker ps -a -q) to delete all the containers you created in the steps above, docker volume prune to remove any remaining unused volumes, and docker network rm confluent to delete the network that was created.

If you are running Docker Machine, you can remove the virtual machine with this command: docker-machine rm confluent.

If you used Docker Compose to start the containers, you must explicitly shut them down. For more information, see the docker-compose down documentation. This deletes all of the containers created in this quick start.

$ docker-compose down

Next Steps

  • For examples of how to add mounted volumes to your host machines, see Mounting Docker External Volumes. Mounted volumes provide a persistent storage layer for deployed containers. With a persistent storage layer, you can stop and restart Docker images, such as cp-kafka and cp-zookeeper, without losing their stateful data.
  • For examples of more complex target environments, see the Docker Installation Recipes.
  • For more information about Schema Registry, see Schema Registry.
  • For a more in-depth Kafka Connect example, see Kafka Connect Tutorial on Docker.