Create an Apache Kafka Client App for Kafka Connect Datagen

In this tutorial, you will run a Kafka Connect Datagen source connector using Kafka Connect Datagen that produces messages to and consumes messages from an Apache Kafka® cluster.

After you run the tutorial, use the provided source code as a reference to develop your own Kafka client application.

Prerequisites

Client

  • Docker
  • Download Confluent Platform 7.2.8

Kafka Cluster

The easiest way to follow this tutorial is with Confluent Cloud because you don’t have to run a local Kafka cluster. From the Console, click on LEARN to provision a cluster and click on Clients to get the cluster-specific configurations and credentials to set for your client application.

You can alternatively use the supported CLI or REST API, or the community-supported ccloud-stack utility for Confluent Cloud.

If you don’t want to use Confluent Cloud, you can also use this tutorial with a Kafka cluster running on your local host or any other remote server.

Setup

  1. Clone the confluentinc/examples GitHub repository and check out the 7.2.8-post branch.

    git clone https://github.com/confluentinc/examples
    cd examples
    git checkout 7.2.8-post
    
  2. Change directory to the example for Kafka Connect Datagen.

    cd clients/cloud/kafka-connect-datagen/
    
  3. Create a local file (for example, at $HOME/.confluent/java.config) with configuration parameters to connect to your Kafka cluster. Starting with one of the templates below, customize the file with connection information to your cluster. Substitute your values for {{ BROKER_ENDPOINT }}, {{CLUSTER_API_KEY }}, and {{ CLUSTER_API_SECRET }} (see Configure Confluent Cloud Clients for instructions on how to manually find these values, or use the ccloud-stack Utility for Confluent Cloud to automatically create them).

    • Template configuration file for Confluent Cloud

      # Required connection configs for Kafka producer, consumer, and admin
      bootstrap.servers={{ BROKER_ENDPOINT }}
      security.protocol=SASL_SSL
      sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username='{{ CLUSTER_API_KEY }}' password='{{ CLUSTER_API_SECRET }}';
      sasl.mechanism=PLAIN
      # Required for correctness in Apache Kafka clients prior to 2.6
      client.dns.lookup=use_all_dns_ips
      
      # Best practice for higher availability in Apache Kafka clients prior to 3.0
      session.timeout.ms=45000
      
      # Best practice for Kafka producer to prevent data loss 
      acks=all
      
    • Template configuration file for local host

      # Kafka
      bootstrap.servers=localhost:9092
      

Basic Producer and Consumer

In this example, the producer application writes Kafka data to a topic in your Kafka cluster. If the topic does not already exist in your Kafka cluster, the producer application will use the Kafka Admin Client API to create the topic. Each record written to Kafka has a key representing a username (for example, alice) and a value of a count, formatted as json (for example, {"count": 0}). The consumer application reads the same Kafka topic and keeps a rolling sum of the count as it processes each record.

Produce Records

  1. Create the topic in Confluent Cloud.

    kafka-topics --bootstrap-server `grep "^\s*bootstrap.server" $HOME/.confluent/java.config | tail -1` --command-config $HOME/.confluent/java.config --topic test1 --create --replication-factor 3 --partitions 6
    
  2. Generate a file of ENV variables used by Docker to set the bootstrap servers and security configuration.

    ../../../ccloud/ccloud-generate-cp-configs.sh $HOME/.confluent/java.config
    
  3. Source the generated file of ENV variables.

    source ./delta_configs/env.delta
    
  4. Start Docker by running the following command:

    docker-compose up -d
    

    You should see:

    Creating connect ... done
    
  5. Wait for about 60 seconds, and then verify Connect is ready by running the following command:

    docker-compose logs -f connect | grep "Finished starting connectors and tasks"
    

    You should see:

    connect    | [2019-05-30 14:43:53,799] INFO Finished starting connectors and tasks (org.apache.kafka.connect.runtime.distributed.DistributedHerder)
    
  6. Verify the Connect Datagen connector plugin is available by running the following command:

    docker-compose logs connect | grep "DatagenConnector"
    

    You should see:

    connect    | [2019-05-30 14:43:41,167] INFO Added plugin 'io.confluent.kafka.connect.datagen.DatagenConnector' (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
    connect    | [2019-05-30 14:43:42,614] INFO Added aliases 'DatagenConnector' and 'Datagen' to plugin 'io.confluent.kafka.connect.datagen.DatagenConnector' (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
    
  7. Submit the kafka-connect-datagen connector.

    ./submit_datagen_orders_config.sh
    
  8. View the kafka-connect-datagen code.

Consume Records

  1. Consume from topic test1 by doing the following:

    • Referencing a properties file

      docker-compose exec connect bash -c 'kafka-console-consumer --topic test1 --bootstrap-server $CONNECT_BOOTSTRAP_SERVERS --consumer.config /tmp/ak-tools-ccloud.delta --max-messages 5'
      
    • Referencing individual properties

      docker-compose exec connect bash -c 'kafka-console-consumer --topic test1 --bootstrap-server $CONNECT_BOOTSTRAP_SERVERS --consumer-property sasl.mechanism=PLAIN --consumer-property security.protocol=SASL_SSL --consumer-property sasl.jaas.config="$SASL_JAAS_CONFIG_PROPERTY_FORMAT" --max-messages 5'
      

    You should see messages as follows:

    {"ordertime":1489322485717,"orderid":15,"itemid":"Item_352","orderunits":9.703502112840228,"address":{"city":"City_48","state":"State_21","zipcode":32731}}
    
  2. When you are done, press CTRL-C.

  3. View the consumer code.

Avro and Confluent Cloud Schema Registry

This example is similar to the previous example, except the value is formatted as Avro and integrates with the Confluent Cloud Schema Registry. Before using Confluent Cloud Schema Registry, check its availability and limits.

  1. As described in the Quick Start for Schema Management on Confluent Cloud in the Confluent Cloud Console, enable Confluent Cloud Schema Registry and create an API key and secret to connect to it.

  2. Verify that your VPC can connect to the Confluent Cloud Schema Registry public internet endpoint.

  3. Update your local configuration file (for example, at $HOME/.confluent/java.config) with parameters to connect to Schema Registry.

    • Template configuration file for Confluent Cloud

      # Required connection configs for Kafka producer, consumer, and admin
      bootstrap.servers={{ BROKER_ENDPOINT }}
      security.protocol=SASL_SSL
      sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username='{{ CLUSTER_API_KEY }}' password='{{ CLUSTER_API_SECRET }}';
      sasl.mechanism=PLAIN
      # Required for correctness in Apache Kafka clients prior to 2.6
      client.dns.lookup=use_all_dns_ips
      
      # Best practice for higher availability in Apache Kafka clients prior to 3.0
      session.timeout.ms=45000
      
      # Best practice for Kafka producer to prevent data loss 
      acks=all
      
      # Required connection configs for Confluent Cloud Schema Registry
      schema.registry.url=https://{{ SR_ENDPOINT }}
      basic.auth.credentials.source=USER_INFO
      basic.auth.user.info={{ SR_API_KEY }}:{{ SR_API_SECRET }}
      
    • Template configuration file for local host

      # Kafka
      bootstrap.servers=localhost:9092
      
      # Confluent Schema Registry
      schema.registry.url=http://localhost:8081
      
  4. Verify your Confluent Cloud Schema Registry credentials by listing the Schema Registry subjects. In the following example, substitute your values for {{ SR_API_KEY }}, {{ SR_API_SECRET }}, and {{ SR_ENDPOINT }}.

    curl -u {{ SR_API_KEY }}:{{ SR_API_SECRET }} https://{{ SR_ENDPOINT }}/subjects
    

Produce Avro Records

  1. Create the topic in Confluent Cloud.

    kafka-topics --bootstrap-server `grep "^\s*bootstrap.server" $HOME/.confluent/java.config | tail -1` --command-config $HOME/.confluent/java.config --topic test2 --create --replication-factor 3 --partitions 6
    
  2. Generate a file of `ENV variables used by Docker to set the bootstrap servers and security configuration.

    ../../../ccloud/ccloud-generate-cp-configs.sh $HOME/.confluent/java.config
    
  3. Source the generated file of ENV variables.

    source ./delta_configs/env.delta
    
  4. Start Docker by running the following command:

    docker-compose up -d
    

    You should see:

    Creating connect ... done
    
  5. Wait for about 60 seconds, and then verify Connect is ready by running the following command:

    docker-compose logs -f connect | grep "Finished starting connectors and tasks"
    

    You should see:

    connect    | [2019-05-30 14:43:53,799] INFO Finished starting connectors and tasks (org.apache.kafka.connect.runtime.distributed.DistributedHerder)
    
  6. Verify the Connect Datagen connector plugin is available by running the following command:

    docker-compose logs connect | grep "DatagenConnector"
    

    You should see:

    connect    | [2019-05-30 14:43:41,167] INFO Added plugin 'io.confluent.kafka.connect.datagen.DatagenConnector' (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
    connect    | [2019-05-30 14:43:42,614] INFO Added aliases 'DatagenConnector' and 'Datagen' to plugin 'io.confluent.kafka.connect.datagen.DatagenConnector' (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
    
  7. Submit the kafka-connect-datagen connector.

    ./submit_datagen_orders_config_avro.sh
    
  8. View the kafka-connect-datagen Avro code.

Consume Avro Records

  1. Consume from topic test2 by doing the following:

    • Referencing a properties file

      docker-compose exec connect bash -c 'kafka-avro-console-consumer --topic test2 --bootstrap-server $CONNECT_BOOTSTRAP_SERVERS --consumer.config /tmp/ak-tools-ccloud.delta --property basic.auth.credentials.source=$CONNECT_VALUE_CONVERTER_BASIC_AUTH_CREDENTIALS_SOURCE --property schema.registry.basic.auth.user.info=$CONNECT_VALUE_CONVERTER_SCHEMA_REGISTRY_BASIC_AUTH_USER_INFO --property schema.registry.url=$CONNECT_VALUE_CONVERTER_SCHEMA_REGISTRY_URL --max-messages 5'
      
    • Referencing individual properties

      docker-compose exec connect bash -c 'kafka-avro-console-consumer --topic test2 --bootstrap-server $CONNECT_BOOTSTRAP_SERVERS --consumer-property sasl.mechanism=PLAIN --consumer-property security.protocol=SASL_SSL --consumer-property sasl.jaas.config="$SASL_JAAS_CONFIG_PROPERTY_FORMAT" --property basic.auth.credentials.source=$CONNECT_VALUE_CONVERTER_BASIC_AUTH_CREDENTIALS_SOURCE --property schema.registry.basic.auth.user.info=$CONNECT_VALUE_CONVERTER_SCHEMA_REGISTRY_BASIC_AUTH_USER_INFO --property schema.registry.url=$CONNECT_VALUE_CONVERTER_SCHEMA_REGISTRY_URL --max-messages 5'
      

    You should see the following messages:

    {"ordertime":{"long":1494153923330},"orderid":{"int":25},"itemid":{"string":"Item_441"},"orderunits":{"double":0.9910185646928878},"address":{"io.confluent.ksql.avro_schemas.KsqlDataSourceSchema_address":{"city":{"string":"City_61"},"state":{"string":"State_41"},"zipcode":{"long":60468}}}}
    
  2. When you are done, press CTRL-C.

  3. View the consumer Avro code.

Confluent Cloud Schema Registry

  1. View the schema subjects registered in Confluent Cloud Schema Registry. In the following output, substitute values for <SR API KEY>, <SR API SECRET>, and <SR ENDPOINT>.

    curl -u <SR API KEY>:<SR API SECRET> https://<SR ENDPOINT>/subjects
    
  2. Verify that the subject test2-value exists.

    ["test2-value"]
    
  3. View the schema information for subject test2-value. In the following output, substitute values for <SR API KEY>, <SR API SECRET>, and <SR ENDPOINT>.

    curl -u <SR API KEY>:<SR API SECRET> https://<SR ENDPOINT>/subjects/test2-value/versions/1
    
  4. Verify the schema information for subject test2-value.

    {"subject":"test2-value","version":1,"id":100001,"schema":"{\"type\":\"record\",\"name\":\"KsqlDataSourceSchema\",\"namespace\":\"io.confluent.ksql.avro_schemas\",\"fields\":[{\"name\":\"ordertime\",\"type\":[\"null\",\"long\"],\"default\":null},{\"name\":\"orderid\",\"type\":[\"null\",\"int\"],\"default\":null},{\"name\":\"itemid\",\"type\":[\"null\",\"string\"],\"default\":null},{\"name\":\"orderunits\",\"type\":[\"null\",\"double\"],\"default\":null},{\"name\":\"address\",\"type\":[\"null\",{\"type\":\"record\",\"name\":\"KsqlDataSourceSchema_address\",\"fields\":[{\"name\":\"city\",\"type\":[\"null\",\"string\"],\"default\":null},{\"name\":\"state\",\"type\":[\"null\",\"string\"],\"default\":null},{\"name\":\"zipcode\",\"type\":[\"null\",\"long\"],\"default\":null}]}],\"default\":null}]}"}