Quickstart

You can get up and running with the full Confluent platform quickly on a single server. If you are interested in deploying with Docker, please refer to our Docker Quickstart.

In this quickstart we’ll show how to run ZooKeeper, Kafka, Kafka Connect, and Control Center and then write and read some data to/from Kafka.

  1. Download and install the Confluent platform. In this quickstart we’ll use the zip archive, but there are many other installation options.

    Here is a high-level view of the contents of the package:

    confluent-3.2.1/bin/        # Driver scripts for starting/stopping services
    confluent-3.2.1/etc/        # Configuration files
    confluent-3.2.1/share/java/ # Jars
    

    If you installed from deb or rpm packages, the contents are installed globally and you’ll need to adjust the paths used below:

    /usr/bin/                  # Driver scripts for starting/stopping services, prefixed with <package> names
    /etc/<package>/            # Configuration files
    /usr/share/java/<package>/ # Jars
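
    For reference, here is a minimal sketch of the zip-based install. The archive URL is an assumption based on Confluent's packages.confluent.io layout; confirm the exact link on the download page:

    # Hypothetical archive URL -- verify on the Confluent download page
    $ curl -O http://packages.confluent.io/archive/3.2/confluent-3.2.1-2.11.zip
    $ unzip confluent-3.2.1-2.11.zip
    $ cd confluent-3.2.1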
    
  2. Start ZooKeeper. Since this is a long-running service, you should run it in its own terminal (or at least run it in the background and redirect output to a file):

    # The following commands assume you exactly followed the instructions above.
    # This means, for example, that at this point your current working directory
    # must be confluent-3.2.1/.
    $ ./bin/zookeeper-server-start ./etc/kafka/zookeeper.properties
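
    To confirm ZooKeeper is up, you can send it the standard ruok four-letter command from another terminal; a healthy server answers imok (this check assumes nc/netcat is available):

    # In a separate terminal
    $ echo ruok | nc localhost 2181
    imok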
    
  3. Configure the Confluent Metrics Reporter by uncommenting the following settings in ./etc/kafka/server.properties:

    metric.reporters=io.confluent.metrics.reporter.ConfluentMetricsReporter
    confluent.metrics.reporter.bootstrap.servers=localhost:9092
    confluent.metrics.reporter.zookeeper.connect=localhost:2181
    confluent.metrics.reporter.topic.replicas=1
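
    If these settings are present but commented out with a leading # (as in the stock file), one way to uncomment them is with sed; this is a sketch that assumes the commented lines match exactly (GNU sed shown; on macOS use sed -i ''):

    $ sed -i -e 's/^#metric.reporters/metric.reporters/' \
         -e 's/^#confluent.metrics.reporter/confluent.metrics.reporter/' \
         ./etc/kafka/server.properties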
    
  4. Start Kafka, also in its own terminal.

    $ ./bin/kafka-server-start ./etc/kafka/server.properties
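
    Once the broker logs settle, you can verify it is reachable from another terminal by listing topics (the list may be empty at this point):

    # In a separate terminal
    $ ./bin/kafka-topics --zookeeper localhost:2181 --list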
    
  5. Copy the settings for Kafka Connect, and add support for the interceptors:

    $ cp etc/kafka/connect-distributed.properties /tmp/connect-distributed.properties
    $ echo "" >> /tmp/connect-distributed.properties
    $ cat <<EOF >> /tmp/connect-distributed.properties
    consumer.interceptor.classes=io.confluent.monitoring.clients.interceptor.MonitoringConsumerInterceptor
    producer.interceptor.classes=io.confluent.monitoring.clients.interceptor.MonitoringProducerInterceptor
    EOF
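
    To confirm the two interceptor settings were appended:

    $ tail -n 2 /tmp/connect-distributed.properties
    consumer.interceptor.classes=io.confluent.monitoring.clients.interceptor.MonitoringConsumerInterceptor
    producer.interceptor.classes=io.confluent.monitoring.clients.interceptor.MonitoringProducerInterceptor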
    
  6. Start Kafka Connect in its own terminal.

    $ ./bin/connect-distributed /tmp/connect-distributed.properties
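
    Kafka Connect serves a REST API on port 8083 by default; once the worker finishes starting, you can check it from another terminal (it should return a small JSON object with the worker's version):

    # In a separate terminal
    $ curl http://localhost:8083/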
    
  7. Start Control Center in its own terminal (set to run with one replica):

    $ cp etc/confluent-control-center/control-center.properties /tmp/control-center.properties
    $ cat <<EOF >> /tmp/control-center.properties
    confluent.controlcenter.internal.topics.partitions=1
    confluent.controlcenter.internal.topics.replication=1
    confluent.controlcenter.command.topic.replication=1
    confluent.monitoring.interceptor.topic.partitions=1
    confluent.monitoring.interceptor.topic.replication=1
    EOF
    $ ./bin/control-center-start /tmp/control-center.properties
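
    Control Center can take a little while to start; once it is listening, its default port, 9021, should answer HTTP requests (you will open this in a browser in a later step):

    # In a separate terminal; -I fetches headers only
    $ curl -I http://localhost:9021/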
    
  8. Now we have all the services running and can start building a data pipeline. As an example, let’s create a small job that generates data. Open an editor, enter the following text (our apologies to William Carlos Williams), and save it as “totail.sh”.

    #!/usr/bin/env bash
    
    file=/tmp/totail.txt
    
    while true; do
        echo This is just to say >> ${file}
        echo >> ${file}
        echo I have eaten >> ${file}
        echo the plums >> ${file}
        echo that were in >> ${file}
        echo the icebox >> ${file}
        echo >> ${file}
        echo and which >> ${file}
        echo you were probably >> ${file}
        echo saving >> ${file}
        echo for breakfast >> ${file}
        echo >> ${file}
        echo Forgive me >> ${file}
        echo they were delicious >> ${file}
        echo so sweet >> ${file}
        echo and so cold >> ${file}
        sleep 1
    done
    
  9. Start this script. (It writes the poem to /tmp/totail.txt once per second. We will use Kafka Connect to load that into a Kafka topic.)

    $ bash totail.sh
    
  10. Use the Kafka Topics tool to create a new topic:

    $ ./bin/kafka-topics --zookeeper localhost:2181 --create --topic poem \
       --partitions 1 --replication-factor 1
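
    You can double-check that the topic was created as expected:

    $ ./bin/kafka-topics --zookeeper localhost:2181 --describe --topic poem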
    
  11. Now open your web browser and go to http://localhost:9021/. This opens the web interface for Control Center.

  12. Click on the Kafka Connect button on the left side. On this page you can see a list of sources that have been configured; by default it will be empty. Click the “New source” button.

    From the Connection Class dropdown menu select FileStreamSourceConnector. Specify the Connection Name as Poem File Source. Once you have specified a name for the connection, a set of other configuration options will appear.

    In the General section specify the file as /tmp/totail.txt and the topic as poem.

    Click on the Continue button, and then Save & Finish to apply the new configuration.
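
    If you prefer the command line, an equivalent source connector can be created through the Kafka Connect REST API instead of the UI. This is a sketch; the connector name is illustrative, and the config keys mirror what the UI collects:

    $ curl -X POST -H "Content-Type: application/json" http://localhost:8083/connectors \
        -d '{
          "name": "poem-file-source",
          "config": {
            "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
            "tasks.max": "1",
            "file": "/tmp/totail.txt",
            "topic": "poem"
          }
        }'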

  13. From the Kafka Connect tab click on the “Sinks” tab. Click the “New sink” button. From the Topics dropdown list, choose “poem”. Click Continue.

    In the next screen set the Connection Class to “FileStreamSinkConnector” and set the Connection Name as Poem File Sink. Once you have specified a name for the connection, a set of other configuration options will appear.

    In the General section specify the file as /tmp/sunk.txt. Click Continue and then “Save & Finish”.
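
    As with the source, the sink can be created through the Connect REST API instead of the UI (connector name illustrative; note that sinks take a topics key rather than topic):

    $ curl -X POST -H "Content-Type: application/json" http://localhost:8083/connectors \
        -d '{
          "name": "poem-file-sink",
          "config": {
            "connector.class": "org.apache.kafka.connect.file.FileStreamSinkConnector",
            "tasks.max": "1",
            "file": "/tmp/sunk.txt",
            "topics": "poem"
          }
        }'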

  14. In a terminal window, open the file /tmp/sunk.txt. This file will have almost the same contents as /tmp/totail.txt (it may be a few lines behind, depending on when you check); a way to follow it from the terminal is sketched after this list.
  15. Now that you have data flowing into and out of Kafka, let’s monitor what’s going on! Click on the button on the left side that says “Stream Monitoring.” Very soon (a couple of seconds on a fast server, longer on an overworked laptop), a chart will appear showing the total number of messages produced and consumed on the cluster. If you scroll down, you will see more details on the consumer group for your sink.
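
    For a terminal-side view of the same pipeline, you can follow the sink file as it grows, or read the topic directly with the console consumer:

    $ tail -f /tmp/sunk.txt
    $ ./bin/kafka-console-consumer --bootstrap-server localhost:9092 \
        --topic poem --from-beginning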

When you’re done testing, you can use Ctrl+C to shut down each service, in the reverse order in which you started them.

This simple guide only covered Kafka, Kafka Connect, and Control Center. See the documentation for each component for a quickstart guide specific to that component: