Important

You are viewing documentation for an older version of Confluent Platform.

Python

In this tutorial, you will run a Python client application that produces messages to and consumes messages from an Apache Kafka® cluster.

After you run the tutorial, view the provided source code and use it as a reference to develop your own Kafka client application.

Prerequisites

Client

  • A functioning Python environment with the Confluent Python Client for Apache Kafka installed.

  • You can use Virtualenv and run the following commands to create a virtual environment with the client installed.

    virtualenv ccloud-venv
    source ./ccloud-venv/bin/activate
    pip install -r requirements.txt
    
  • Check your confluent-kafka library version. The requirements.txt file specifies confluent-kafka >= 1.4.2, which is required for the latest Serialization API demonstrated here. If you install the library manually or globally, the same version requirement applies.

Configure SSL trust store

Depending on your operating system or Linux distro you may need to take extra steps to set up the SSL CA root certificates. If your system doesn’t have the SSL CA root certificates properly set up, you may receive an error message similar to the following:

%3|1554125834.196|FAIL|rdkafka#producer-2| [thrd:sasl_ssl://pkc-epgnk.us-central1.gcp.confluent.cloud\:9092/boot]: sasl_ssl://pkc-epgnk.us-central1.gcp.confluent.cloud\:9092/bootstrap: Failed to verify broker certificate: unable to get issuer certificate (after 626ms in state CONNECT)
%3|1554125834.197|ERROR|rdkafka#producer-2| [thrd:sasl_ssl://pkc-epgnk.us-central1.gcp.confluent.cloud\:9092/boot]: sasl_ssl://pkc-epgnk.us-central1.gcp.confluent.cloud\:9092/bootstrap: Failed to verify broker certificate: unable to get issuer certificate (after 626ms in state CONNECT)
%3|1554125834.197|ERROR|rdkafka#producer-2| [thrd:sasl_ssl://pkc-epgnk.us-central1.gcp.confluent.cloud\:9092/boot]: 1/1 brokers are down

macOS

On newer versions of macOS (for example, 10.15), you may need to add an additional dependency:

pip install certifi

Add the ssl.ca.location property to the config dict object in producer.py and consumer.py, setting its value to the location of the appropriate CA certificates file on your host:

ssl.ca.location: '/Library/Python/3.7/site-packages/certifi/cacert.pem'

CentOS

sudo yum reinstall ca-certificates

Add the ssl.ca.location property to the config dict object in producer.py and consumer.py, setting its value to the location of the appropriate CA certificates file on your host:

ssl.ca.location: '/etc/ssl/certs/ca-bundle.crt'

For more information, see the documentation for librdkafka, the library on which this Python client is built.
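As a rough sketch of the ssl.ca.location step described above, the helper below returns a copy of a client config dict with that property set. The name `with_ca_location` is hypothetical and is not part of producer.py or consumer.py. When no path is passed, it tries `certifi.where()` if certifi is installed, and otherwise falls back to the default CA file reported by Python's `ssl` module, which may be unset on some systems.

```python
import ssl


def with_ca_location(conf, ca_path=None):
    """Return a copy of conf with 'ssl.ca.location' set (hypothetical helper)."""
    if ca_path is None:
        try:
            import certifi  # optional dependency: pip install certifi
            ca_path = certifi.where()
        except ImportError:
            # Fall back to the CA file Python's ssl module would use, if any.
            ca_path = ssl.get_default_verify_paths().cafile
    updated = dict(conf)
    if ca_path:  # leave the config unchanged when no CA file was found
        updated["ssl.ca.location"] = ca_path
    return updated


base = {"bootstrap.servers": "localhost:9092"}
print(with_ca_location(base, "/etc/ssl/certs/ca-bundle.crt"))
```

Passing an explicit path, as in the last line, mirrors the CentOS setting above; omitting it lets the helper look for a CA bundle on its own.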

Kafka Cluster

  • You can use this tutorial with a Kafka cluster in any environment:
  • If you are running on Confluent Cloud, you must have access to a Confluent Cloud cluster
    • The first 20 users to sign up for Confluent Cloud and use promo code C50INTEG will receive an additional $50 free usage (details)

Setup

  1. Clone the confluentinc/examples GitHub repository and check out the 5.5.15-post branch.

    git clone https://github.com/confluentinc/examples
    cd examples
    git checkout 5.5.15-post
    
  2. Change directory to the example for Python.

    cd clients/cloud/python/
    
  3. Create a local file (for example, at $HOME/.confluent/librdkafka.config) with configuration parameters to connect to your Kafka cluster. Starting with one of the templates below, customize the file with connection information to your cluster. Substitute your values for {{ BROKER_ENDPOINT }}, {{ CLUSTER_API_KEY }}, and {{ CLUSTER_API_SECRET }} (see Connecting Clients to Confluent Cloud for instructions on how to create or find those values).

    • Template configuration file for Confluent Cloud

      # Kafka
      bootstrap.servers={{ BROKER_ENDPOINT }}
      security.protocol=SASL_SSL
      sasl.mechanisms=PLAIN
      sasl.username={{ CLUSTER_API_KEY }}
      sasl.password={{ CLUSTER_API_SECRET }}
      
    • Template configuration file for local host

      # Kafka
      bootstrap.servers=localhost:9092
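
The templates above are plain key=value files. The following is a minimal sketch of how such a file could be turned into a Python dict, assuming one property per line and `#` for comments; the function name `read_config` is hypothetical, and the example scripts may load the file differently.

```python
def read_config(lines):
    """Parse key=value lines, skipping blank lines and '#' comments."""
    conf = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values that contain '=' stay intact.
        key, _, value = line.partition("=")
        conf[key.strip()] = value.strip()
    return conf


template = """\
# Kafka
bootstrap.servers=localhost:9092
"""
conf = read_config(template.splitlines())
print(conf)  # {'bootstrap.servers': 'localhost:9092'}
```

To read a real file, pass the open file object instead of a list of strings, for example `with open(path) as fh: conf = read_config(fh)`.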
      

Basic Producer and Consumer

In this example, the producer application writes Kafka data to a topic in your Kafka cluster. If the topic does not already exist in your Kafka cluster, the producer application will use the Kafka Admin Client API to create the topic. Each record written to Kafka has a key representing a username (for example, alice) and a value of a count, formatted as JSON (for example, {"count": 0}). The consumer application reads the same Kafka topic and keeps a rolling sum of the count as it processes each record.

Produce Records

  1. Run the producer, passing in arguments for:

    • the local file with configuration parameters to connect to your Kafka cluster
    • the topic name

    ./producer.py -f $HOME/.confluent/librdkafka.config -t test1
    
  2. Verify that the producer sent all the messages. You should see:

    Producing record: alice      {"count": 0}
    Producing record: alice      {"count": 1}
    Producing record: alice      {"count": 2}
    Producing record: alice      {"count": 3}
    Producing record: alice      {"count": 4}
    Producing record: alice      {"count": 5}
    Producing record: alice      {"count": 6}
    Producing record: alice      {"count": 7}
    Producing record: alice      {"count": 8}
    Producing record: alice      {"count": 9}
    Produced record to topic test1 partition [0] @ offset 0
    Produced record to topic test1 partition [0] @ offset 1
    Produced record to topic test1 partition [0] @ offset 2
    Produced record to topic test1 partition [0] @ offset 3
    Produced record to topic test1 partition [0] @ offset 4
    Produced record to topic test1 partition [0] @ offset 5
    Produced record to topic test1 partition [0] @ offset 6
    Produced record to topic test1 partition [0] @ offset 7
    Produced record to topic test1 partition [0] @ offset 8
    Produced record to topic test1 partition [0] @ offset 9
    10 messages were produced to topic test1!
    
  3. View the producer code.
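
The produce loop can be sketched roughly as follows. This is not the contents of producer.py; it is an illustration that assumes a producer object exposing `produce()`, `poll()`, and `flush()` in the way the confluent-kafka `Producer` does. The `FakeProducer` and `FakeMessage` classes are hypothetical in-memory stand-ins so the sketch runs without a broker.

```python
import json


def produce_counts(producer, topic, key="alice", n=10):
    """Produce n JSON count records; return how many were acknowledged."""
    delivered = 0

    def acked(err, msg):
        # Delivery report callback: err is None when delivery succeeded.
        nonlocal delivered
        if err is not None:
            print("Failed to deliver message: {}".format(err))
        else:
            delivered += 1
            print("Produced record to topic {} partition [{}] @ offset {}"
                  .format(msg.topic(), msg.partition(), msg.offset()))

    for i in range(n):
        value = json.dumps({"count": i})
        print("Producing record: {}\t{}".format(key, value))
        producer.produce(topic, key=key, value=value, on_delivery=acked)
        producer.poll(0)  # serve callbacks for earlier produce() calls
    producer.flush()      # wait for outstanding deliveries
    return delivered


class FakeMessage:
    """Hypothetical stand-in for a delivered message."""
    def __init__(self, topic, offset):
        self._topic, self._offset = topic, offset

    def topic(self):
        return self._topic

    def partition(self):
        return 0

    def offset(self):
        return self._offset


class FakeProducer:
    """Hypothetical stand-in: queues records and acknowledges them on flush()."""
    def __init__(self):
        self._pending = []

    def produce(self, topic, key=None, value=None, on_delivery=None):
        offset = len(self._pending)
        self._pending.append((FakeMessage(topic, offset), on_delivery))

    def poll(self, timeout):
        return 0

    def flush(self):
        for msg, callback in self._pending:
            if callback is not None:
                callback(None, msg)
        self._pending = []
        return 0


count = produce_counts(FakeProducer(), "test1")
print("{} messages were produced to topic test1!".format(count))
```

With the client library installed and a reachable cluster, the same function should accept a real `confluent_kafka.Producer` built from your configuration dict in place of `FakeProducer`.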

Consume Records

  1. Run the consumer, passing in arguments for:

    • the local file with configuration parameters to connect to your Kafka cluster
    • the same topic name you used when producing records

    ./consumer.py -f $HOME/.confluent/librdkafka.config -t test1
    
  2. Verify the consumer received all the messages. You should see:

    ...
    Waiting for message or event/error in poll()
    Consumed record with key alice and value {"count": 0}, and updated total count to 0
    Consumed record with key alice and value {"count": 1}, and updated total count to 1
    Consumed record with key alice and value {"count": 2}, and updated total count to 3
    Consumed record with key alice and value {"count": 3}, and updated total count to 6
    Consumed record with key alice and value {"count": 4}, and updated total count to 10
    Consumed record with key alice and value {"count": 5}, and updated total count to 15
    Consumed record with key alice and value {"count": 6}, and updated total count to 21
    Consumed record with key alice and value {"count": 7}, and updated total count to 28
    Consumed record with key alice and value {"count": 8}, and updated total count to 36
    Consumed record with key alice and value {"count": 9}, and updated total count to 45
    Waiting for message or event/error in poll()
    ...
    
  3. View the consumer code.
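
The rolling sum kept by the consumer can be sketched as shown below. This is an illustration, not the contents of consumer.py: it assumes a consumer object exposing `subscribe()`, `poll()`, and `close()` in the way the confluent-kafka `Consumer` does, and, unlike a long-running consumer, it stops after a few empty polls (the `max_idle_polls` parameter is an assumption made so the example terminates). `FakeConsumer` and `FakeMessage` are hypothetical stand-ins.

```python
import json


def consume_counts(consumer, topic, max_idle_polls=3):
    """Poll records and keep a rolling sum of their 'count' values."""
    consumer.subscribe([topic])
    total = 0
    idle = 0
    try:
        while idle < max_idle_polls:
            msg = consumer.poll(1.0)
            if msg is None:
                idle += 1  # nothing arrived within the timeout
                continue
            if msg.error():
                print("error: {}".format(msg.error()))
                continue
            idle = 0
            key = msg.key()
            if isinstance(key, bytes):
                key = key.decode("utf-8")
            count = json.loads(msg.value())["count"]
            total += count
            print("Consumed record with key {} and value {}, "
                  "and updated total count to {}".format(key, count, total))
    finally:
        consumer.close()  # leave the group cleanly
    return total


class FakeMessage:
    """Hypothetical stand-in for a consumed message."""
    def __init__(self, key, value):
        self._key, self._value = key, value

    def error(self):
        return None

    def key(self):
        return self._key

    def value(self):
        return self._value


class FakeConsumer:
    """Hypothetical stand-in that replays a fixed list of messages."""
    def __init__(self, messages):
        self._messages = list(messages)

    def subscribe(self, topics):
        pass

    def poll(self, timeout):
        return self._messages.pop(0) if self._messages else None

    def close(self):
        pass


records = [FakeMessage(b"alice", json.dumps({"count": i}).encode("utf-8"))
           for i in range(10)]
total = consume_counts(FakeConsumer(records), "test1")
print(total)  # 45
```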

Avro and Confluent Cloud Schema Registry

This example is similar to the previous example, except the value is formatted as Avro and integrates with the Confluent Cloud Schema Registry. Before using Confluent Cloud Schema Registry, check its availability and limits.

These examples use the latest Serializer API provided by the confluent-kafka library. The Serializer API replaces the legacy AvroProducer and AvroConsumer classes to provide a more flexible API including additional support for JSON, Protobuf, and Avro data formats. See the latest confluent-kafka documentation for further details.

  1. As described in Schema Registry and Confluent Cloud, use the Confluent Cloud GUI to enable Confluent Cloud Schema Registry and create an API key and secret to connect to it.

  2. Verify that your VPC can connect to the Confluent Cloud Schema Registry public internet endpoint.

  3. Update your local configuration file (for example, at $HOME/.confluent/librdkafka.config) with parameters to connect to Schema Registry.

    • Template configuration file for Confluent Cloud

      # Kafka
      bootstrap.servers={{ BROKER_ENDPOINT }}
      security.protocol=SASL_SSL
      sasl.mechanisms=PLAIN
      sasl.username={{ CLUSTER_API_KEY }}
      sasl.password={{ CLUSTER_API_SECRET }}
      
      # Confluent Cloud Schema Registry
      schema.registry.url=https://{{ SR_ENDPOINT }}
      basic.auth.credentials.source=USER_INFO
      schema.registry.basic.auth.user.info={{ SR_API_KEY }}:{{ SR_API_SECRET }}
      
    • Template configuration file for local host

      # Kafka
      bootstrap.servers=localhost:9092
      
      # Confluent Schema Registry
      schema.registry.url=http://localhost:8081
      
  4. Verify your Confluent Cloud Schema Registry credentials work from your host. In the following example, substitute your values for {{ SR_API_KEY }}, {{ SR_API_SECRET }}, and {{ SR_ENDPOINT }}.

    # View the list of registered subjects
    $ curl -u {{ SR_API_KEY }}:{{ SR_API_SECRET }} https://{{ SR_ENDPOINT }}/subjects
    
    # Same as above, as a single bash command to parse the values out of $HOME/.confluent/librdkafka.config
    $ curl -u $(grep "^schema.registry.basic.auth.user.info"  $HOME/.confluent/librdkafka.config | cut -d'=' -f2) $(grep "^schema.registry.url"  $HOME/.confluent/librdkafka.config | cut -d'=' -f2)/subjects
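
The same check can be sketched in Python using only the standard library. The example below assumes the configuration file has already been parsed into a dict with the property names from the templates above; the function names and the host `sr.example.com` are hypothetical. It builds the request without sending it.

```python
import base64
import urllib.request


def schema_registry_settings(conf):
    """Pick the Schema Registry URL and user info out of a parsed config dict."""
    url = conf["schema.registry.url"]
    user_info = conf.get("schema.registry.basic.auth.user.info")
    return url, user_info


def subjects_request(url, user_info=None):
    """Build (but do not send) a GET request for the /subjects endpoint."""
    request = urllib.request.Request(url.rstrip("/") + "/subjects")
    if user_info:
        # HTTP basic auth: base64-encode the "key:secret" pair.
        token = base64.b64encode(user_info.encode("utf-8")).decode("ascii")
        request.add_header("Authorization", "Basic " + token)
    return request


conf = {
    "schema.registry.url": "https://sr.example.com",  # placeholder endpoint
    "schema.registry.basic.auth.user.info": "key:secret",  # placeholder credentials
}
request = subjects_request(*schema_registry_settings(conf))
print(request.full_url)
```

Passing the request to `urllib.request.urlopen()` would send it; with valid credentials the response body should be the JSON list of registered subjects.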
    

Produce Avro Records

  1. Run the Avro producer, passing in arguments for:

    • the local file with configuration parameters to connect to your Kafka cluster
    • the topic name

    ./producer_ccsr.py -f $HOME/.confluent/librdkafka.config -t test2
    
  2. Verify that the producer sent all the messages. You should see:

    Producing Avro record: alice    0
    Producing Avro record: alice    1
    Producing Avro record: alice    2
    Producing Avro record: alice    3
    Producing Avro record: alice    4
    Producing Avro record: alice    5
    Producing Avro record: alice    6
    Producing Avro record: alice    7
    Producing Avro record: alice    8
    Producing Avro record: alice    9
    Produced record to topic test2 partition [0] @ offset 0
    Produced record to topic test2 partition [0] @ offset 1
    Produced record to topic test2 partition [0] @ offset 2
    Produced record to topic test2 partition [0] @ offset 3
    Produced record to topic test2 partition [0] @ offset 4
    Produced record to topic test2 partition [0] @ offset 5
    Produced record to topic test2 partition [0] @ offset 6
    Produced record to topic test2 partition [0] @ offset 7
    Produced record to topic test2 partition [0] @ offset 8
    Produced record to topic test2 partition [0] @ offset 9
    10 messages were produced to topic test2!
    
  3. View the producer Avro code.

Consume Avro Records

  1. Run the Avro consumer, passing in arguments for:

    • the local file with configuration parameters to connect to your Kafka cluster
    • the same topic name you used when producing the Avro records

    ./consumer_ccsr.py -f $HOME/.confluent/librdkafka.config -t test2
    
  2. Verify the consumer received all the messages. You should see:

    ...
    Waiting for message or event/error in poll()
    Consumed record with key alice and value 0,                       and updated total count to 0
    Consumed record with key alice and value 1,                       and updated total count to 1
    Consumed record with key alice and value 2,                       and updated total count to 3
    Consumed record with key alice and value 3,                       and updated total count to 6
    Consumed record with key alice and value 4,                       and updated total count to 10
    Consumed record with key alice and value 5,                       and updated total count to 15
    Consumed record with key alice and value 6,                       and updated total count to 21
    Consumed record with key alice and value 7,                       and updated total count to 28
    Consumed record with key alice and value 8,                       and updated total count to 36
    Consumed record with key alice and value 9,                       and updated total count to 45
    ...
    
  3. View the consumer Avro code.

Confluent Cloud Schema Registry

  1. View the schema subjects registered in Confluent Cloud Schema Registry. In the following command, substitute values for <SR API KEY>, <SR API SECRET>, and <SR ENDPOINT>.

    curl -u <SR API KEY>:<SR API SECRET> https://<SR ENDPOINT>/subjects
    
  2. Verify that the subject test2-value exists.

    ["test2-value"]
    
  3. View the schema information for subject test2-value. In the following command, substitute values for <SR API KEY>, <SR API SECRET>, and <SR ENDPOINT>.

    curl -u <SR API KEY>:<SR API SECRET> https://<SR ENDPOINT>/subjects/test2-value/versions/1
    
  4. Verify the schema information for subject test2-value.

    {"subject":"test2-value","version":1,"id":100001,"schema":"{\"name\":\"io.confluent.examples.clients.cloud.DataRecordAvro\",\"type\":\"record\",\"fields\":[{\"name\":\"count\",\"type\":\"long\"}]}"}
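
Note that the schema field in this response is itself a JSON-encoded string, so it has to be decoded a second time. A short sketch, using the response shown above as input:

```python
import json

# The response body from the previous step, copied verbatim.
response = r'{"subject":"test2-value","version":1,"id":100001,"schema":"{\"name\":\"io.confluent.examples.clients.cloud.DataRecordAvro\",\"type\":\"record\",\"fields\":[{\"name\":\"count\",\"type\":\"long\"}]}"}'

info = json.loads(response)
schema = json.loads(info["schema"])  # second decode: the Avro schema itself
fields = [(field["name"], field["type"]) for field in schema["fields"]]

print(info["subject"], info["version"])  # test2-value 1
print(schema["name"])
print(fields)  # [('count', 'long')]
```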
    

Run All the Code in Docker

You can also run all the previous code within Docker.

  1. Ensure you have created a local file with configuration parameters to connect to your Kafka cluster at $HOME/.confluent/librdkafka.config.

  2. View the Dockerfile that builds a custom Docker image.

    FROM python:3.7-slim
    
    COPY requirements.txt /tmp/requirements.txt
    RUN pip3 install -U -r /tmp/requirements.txt
    
    COPY *.py ./
    
  3. Build the Docker image using the following command:

    docker build -t cloud-demo-python .
    
  4. Run the Docker image using the following command:

    docker run -v $HOME/.confluent/librdkafka.config:/root/.confluent/librdkafka.config -it --rm cloud-demo-python bash
    
  5. Run the Python applications from within the container shell. See earlier sections for more details.

    root@6970a2a9e65b:/# ./producer.py -f $HOME/.confluent/librdkafka.config -t test1
    root@6970a2a9e65b:/# ./consumer.py -f $HOME/.confluent/librdkafka.config -t test1
    root@6970a2a9e65b:/# ./producer_ccsr.py -f $HOME/.confluent/librdkafka.config -t test2
    root@6970a2a9e65b:/# ./consumer_ccsr.py -f $HOME/.confluent/librdkafka.config -t test2