Python: Code Example for Apache Kafka®¶
In this tutorial, you will run a Python client application that produces messages to and consumes messages from an Apache Kafka® cluster.
After you run the tutorial, use the provided source code as a reference to develop your own Kafka client application.
Prerequisites¶
Client¶
A functioning Python environment with the Confluent Python Client for Apache Kafka installed.
Check your confluent-kafka library version. The requirements.txt file specifies confluent-kafka >= 1.4.2, which is required for the latest Serialization API demonstrated here. If you install the library manually or globally, the same version requirement applies.
You can use Virtualenv and run the following commands to create a virtual environment with the client installed.
virtualenv ccloud-venv
source ./ccloud-venv/bin/activate
pip install -r requirements.txt
Configure SSL trust store¶
Depending on your operating system or Linux distribution, you may need to take extra
steps to set up the SSL CA root certificates. If your system doesn’t have the
SSL CA root certificates properly set up, you may receive an SSL handshake failed
error message similar to the following:
%3|1605776788.619|FAIL|rdkafka#producer-1| [thrd:sasl_ssl://...confluent.cloud:9092/bootstr]: sasl_ssl://...confluent.cloud:9092/bootstrap: SSL handshake failed: error:14090086:SSL routines:ssl3_get_server_certificate:certificate verify failed: broker certificate could not be verified, verify that ssl.ca.location is correctly configured or root CA certificates are installed (brew install openssl) (after 258ms in state CONNECT)
In this case, you need to manually install a bundle of validated CA root certificates and potentially modify the client code to set the ssl.ca.location configuration property.
(For more information, see the documentation for librdkafka, on which this client is built.)
macOS¶
On newer versions of macOS (for example, 10.15), you may need to add an additional dependency.
For the Python client:
pip install certifi
For other clients:
brew install openssl
Once you install the CA root certificates, set the ssl.ca.location property in the client code.
Edit both the producer and consumer code files, and add the ssl.ca.location configuration parameter to the producer and consumer properties.
The value should correspond to the location of the appropriate CA root certificates file on your host.
For the Python client, use certifi.where() to determine the location of the certificate file:
ssl.ca.location: certifi.where()
For other clients, check the install path and provide it in the code:
ssl.ca.location: '/usr/local/etc/openssl@1.1/cert.pem'
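As a rough illustration of the step above, the following sketch adds ssl.ca.location to a client configuration held in a plain Python dict. The helper name with_ca_location is hypothetical and not part of the example code. It assumes the certifi package may or may not be installed, and leaves the configuration unchanged when it is missing and no path is given.

```python
def with_ca_location(conf, ca_path=None):
    """Return a copy of conf with ssl.ca.location set.

    If ca_path is None, try certifi.where(); if certifi is not installed,
    return the copy unchanged.
    """
    updated = dict(conf)
    if ca_path is None:
        try:
            import certifi  # installed with: pip install certifi
        except ImportError:
            return updated
        ca_path = certifi.where()
    updated["ssl.ca.location"] = ca_path
    return updated


# Path taken from the macOS example above; adjust it to your install path.
conf = with_ca_location(
    {"bootstrap.servers": "localhost:9092"},
    ca_path="/usr/local/etc/openssl@1.1/cert.pem",
)
print(conf["ssl.ca.location"])
```

Calling with_ca_location(conf) without a path picks up the certifi bundle when certifi is available.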
CentOS¶
You may need to install CA root certificates in the following way:
sudo yum reinstall ca-certificates
This should be sufficient for the Kafka clients to find the certificates.
However, if you still get the same error, you can set the ssl.ca.location property in the client code.
Edit both the producer and consumer code files, and add the ssl.ca.location configuration parameter to the producer and consumer properties.
The value should correspond to the location of the appropriate CA root certificates file on your host, for example:
ssl.ca.location: '/etc/ssl/certs/ca-bundle.crt'
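If you are unsure where the CA bundle lives on your host, a small script can check a few candidate locations. This is only a sketch: the helper find_ca_bundle is a hypothetical name, and the candidate list combines the CentOS path from this section with the defaults reported by Python's standard ssl module, which may differ from what librdkafka uses.

```python
import os
import ssl


def find_ca_bundle(candidates):
    """Return the first candidate path that exists as a file, else None."""
    for path in candidates:
        if path and os.path.isfile(path):
            return path
    return None


# Candidate locations: the CentOS path from this section plus whatever
# the local OpenSSL build reports as its default CA file.
default_paths = ssl.get_default_verify_paths()
candidates = [
    "/etc/ssl/certs/ca-bundle.crt",
    default_paths.cafile,
    default_paths.openssl_cafile,
]
ca_file = find_ca_bundle(candidates)
if ca_file is None:
    print("No CA bundle found; install the CA root certificates first")
else:
    print("ssl.ca.location:", ca_file)
```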
Kafka Cluster¶
- You can use this tutorial with a Kafka cluster in any environment:
- In Confluent Cloud
- On your local host
- Any remote Kafka cluster
- If you are running on Confluent Cloud, you must have access to a Confluent Cloud cluster with an API key and secret.
- The first 20 users to sign up for Confluent Cloud and use promo code C50INTEG will receive an additional $50 free usage (details).
- For an automated way to create a Kafka cluster, credentials, and ACLs in Confluent Cloud, see ccloud-stack Utility for Confluent Cloud.
Setup¶
Clone the confluentinc/examples GitHub repository and check out the 6.1.0-post branch.
git clone https://github.com/confluentinc/examples
cd examples
git checkout 6.1.0-post
Change directory to the example for Python.
cd clients/cloud/python/
Create a local file (for example, at $HOME/.confluent/librdkafka.config) with configuration parameters to connect to your Kafka cluster. Starting with one of the templates below, customize the file with connection information to your cluster. Substitute your values for {{ BROKER_ENDPOINT }}, {{ CLUSTER_API_KEY }}, and {{ CLUSTER_API_SECRET }} (see Configure Confluent Cloud Clients for instructions on how to manually find these values, or use the ccloud-stack Utility for Confluent Cloud to automatically create them).
Template configuration file for Confluent Cloud
# Kafka
bootstrap.servers={{ BROKER_ENDPOINT }}
security.protocol=SASL_SSL
sasl.mechanisms=PLAIN
sasl.username={{ CLUSTER_API_KEY }}
sasl.password={{ CLUSTER_API_SECRET }}
Template configuration file for local host
# Kafka
bootstrap.servers=localhost:9092
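The configuration file is a list of key=value lines with # comments. The following sketch shows one way such a file could be loaded into a Python dict; read_config is a hypothetical helper written for illustration and is not necessarily how the example code parses the file. The demonstration writes the local-host template to a temporary file so that it runs without any setup.

```python
import os
import tempfile


def read_config(path):
    """Parse key=value lines into a dict, skipping blank lines and # comments."""
    conf = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            # Split on the first "=" only, so values may contain "=" themselves.
            key, sep, value = line.partition("=")
            if sep:
                conf[key.strip()] = value.strip()
    return conf


# Demonstration with the local-host template written to a temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".config", delete=False) as tmp:
    tmp.write("# Kafka\nbootstrap.servers=localhost:9092\n")
conf = read_config(tmp.name)
os.remove(tmp.name)
print(conf)
```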
Basic Producer and Consumer¶
In this example, the producer application writes Kafka data to a topic in your Kafka cluster.
If the topic does not already exist in your Kafka cluster, the producer application will use the Kafka Admin Client API to create the topic.
Each record written to Kafka has a key representing a username (for example, alice) and a value of a count, formatted as JSON (for example, {"count": 0}).
The consumer application reads the same Kafka topic and keeps a rolling sum of the count as it processes each record.
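The record format and the rolling sum can be illustrated without a Kafka cluster. The sketch below builds the ten key/value pairs described above and sums the counts the way the consumer is described as doing; make_records is a hypothetical name used only for this illustration.

```python
import json


def make_records(key="alice", n=10):
    """Yield (key, value) pairs; each value is a JSON string such as {"count": 0}."""
    for i in range(n):
        yield key, json.dumps({"count": i})


total_count = 0
for record_key, record_value in make_records():
    count = json.loads(record_value)["count"]
    total_count += count
    print("key={} value={} total={}".format(record_key, record_value, total_count))
```

With counts 0 through 9, the final total is 45.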
Produce Records¶
Run the producer, passing in arguments for:
- the local file with configuration parameters to connect to your Kafka cluster
- topic name
./producer.py -f $HOME/.confluent/librdkafka.config -t test1
Verify that the producer sent all the messages. You should see:
Producing record: alice {"count": 0}
Producing record: alice {"count": 1}
Producing record: alice {"count": 2}
Producing record: alice {"count": 3}
Producing record: alice {"count": 4}
Producing record: alice {"count": 5}
Producing record: alice {"count": 6}
Producing record: alice {"count": 7}
Producing record: alice {"count": 8}
Producing record: alice {"count": 9}
Produced record to topic test1 partition [0] @ offset 0
Produced record to topic test1 partition [0] @ offset 1
Produced record to topic test1 partition [0] @ offset 2
Produced record to topic test1 partition [0] @ offset 3
Produced record to topic test1 partition [0] @ offset 4
Produced record to topic test1 partition [0] @ offset 5
Produced record to topic test1 partition [0] @ offset 6
Produced record to topic test1 partition [0] @ offset 7
Produced record to topic test1 partition [0] @ offset 8
Produced record to topic test1 partition [0] @ offset 9
10 messages were produced to topic test1!
View the producer code.
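For orientation, here is a minimal sketch of a producer along these lines, written against the confluent-kafka Producer API. It is not the code from the repository: the function names are hypothetical, topic creation through the Admin Client is left out, and conf is assumed to be a dict of client settings such as the ones in your configuration file.

```python
import json


def format_delivery(err, msg):
    """Build the status line for one delivery report."""
    if err is not None:
        return "Failed to deliver message: {}".format(err)
    return "Produced record to topic {} partition [{}] @ offset {}".format(
        msg.topic(), msg.partition(), msg.offset())


def delivery_report(err, msg):
    """Delivery callback; invoked from poll() or flush() once per message."""
    print(format_delivery(err, msg))


def produce_counts(conf, topic, key="alice", n=10):
    """Send n JSON-encoded counts to topic using the settings in conf."""
    # Imported here so this module loads even without confluent-kafka installed.
    from confluent_kafka import Producer

    producer = Producer(conf)
    for i in range(n):
        value = json.dumps({"count": i})
        print("Producing record: {}\t{}".format(key, value))
        producer.produce(topic, key=key, value=value, on_delivery=delivery_report)
        producer.poll(0)  # serve delivery callbacks from earlier produce() calls
    producer.flush()  # block until all outstanding messages are delivered
```

Against a local broker, this could be called as produce_counts({"bootstrap.servers": "localhost:9092"}, "test1"). Because delivery reports arrive asynchronously, the "Produced record" lines may appear after the "Producing record" lines.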
Consume Records¶
Run the consumer, passing in arguments for:
- the local file with configuration parameters to connect to your Kafka cluster
- the same topic name you used to produce records
./consumer.py -f $HOME/.confluent/librdkafka.config -t test1
Verify the consumer received all the messages. You should see:
...
Waiting for message or event/error in poll()
Consumed record with key alice and value {"count": 0}, and updated total count to 0
Consumed record with key alice and value {"count": 1}, and updated total count to 1
Consumed record with key alice and value {"count": 2}, and updated total count to 3
Consumed record with key alice and value {"count": 3}, and updated total count to 6
Consumed record with key alice and value {"count": 4}, and updated total count to 10
Consumed record with key alice and value {"count": 5}, and updated total count to 15
Consumed record with key alice and value {"count": 6}, and updated total count to 21
Consumed record with key alice and value {"count": 7}, and updated total count to 28
Consumed record with key alice and value {"count": 8}, and updated total count to 36
Consumed record with key alice and value {"count": 9}, and updated total count to 45
Waiting for message or event/error in poll()
...
View the consumer code.
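A poll loop that keeps the rolling sum might look roughly like the following. This is a sketch rather than the repository code: the function names are hypothetical, and the group.id and auto.offset.reset values are assumptions chosen for illustration.

```python
import json


def update_total(total, raw_value):
    """Add the count carried in a JSON-encoded record value to the running total."""
    return total + json.loads(raw_value)["count"]


def consume_counts(conf, topic):
    """Poll topic until interrupted, printing each record and the rolling sum."""
    # Imported here so this module loads even without confluent-kafka installed.
    from confluent_kafka import Consumer

    settings = dict(conf)
    # Assumed values for this sketch; pick a group id that suits your application.
    settings.update({"group.id": "python_example_group_1",
                     "auto.offset.reset": "earliest"})
    consumer = Consumer(settings)
    consumer.subscribe([topic])
    total = 0
    try:
        while True:
            msg = consumer.poll(1.0)
            if msg is None:
                print("Waiting for message or event/error in poll()")
                continue
            if msg.error():
                print("error: {}".format(msg.error()))
                continue
            key = msg.key().decode("utf-8")
            value = msg.value().decode("utf-8")
            total = update_total(total, value)
            print("Consumed record with key {} and value {}, "
                  "and updated total count to {}".format(key, value, total))
    except KeyboardInterrupt:
        pass
    finally:
        consumer.close()  # leave the consumer group cleanly
```

Press Ctrl+C to stop the loop; the finally block closes the consumer.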
Avro And Confluent Cloud Schema Registry¶
This example is similar to the previous example, except the value is formatted as Avro and integrates with the Confluent Cloud Schema Registry. Before using Confluent Cloud Schema Registry, check its availability and limits.
These examples use the latest Serializer API provided by the confluent-kafka
library. The Serializer API replaces the legacy AvroProducer and AvroConsumer
classes to provide a more flexible API including additional support for JSON,
Protobuf, and Avro data formats. See the latest confluent-kafka documentation for
further details.
As described in the Quick Start for Schema Management on Confluent Cloud in the Confluent Cloud GUI, enable Confluent Cloud Schema Registry and create an API key and secret to connect to it.
Verify that your VPC can connect to the Confluent Cloud Schema Registry public internet endpoint.
Update your local configuration file (for example, at $HOME/.confluent/librdkafka.config) with parameters to connect to Schema Registry.
Template configuration file for Confluent Cloud
# Kafka
bootstrap.servers={{ BROKER_ENDPOINT }}
security.protocol=SASL_SSL
sasl.mechanisms=PLAIN
sasl.username={{ CLUSTER_API_KEY }}
sasl.password={{ CLUSTER_API_SECRET }}
# Confluent Cloud Schema Registry
schema.registry.url=https://{{ SR_ENDPOINT }}
basic.auth.credentials.source=USER_INFO
basic.auth.user.info={{ SR_API_KEY }}:{{ SR_API_SECRET }}
Template configuration file for local host
# Kafka
bootstrap.servers=localhost:9092
# Confluent Schema Registry
schema.registry.url=http://localhost:8081
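Because the file now mixes Kafka settings with Schema Registry settings, client code typically needs to separate the two groups before building each client. The sketch below shows one possible split; split_config is a hypothetical helper, and the shape of the Schema Registry dict (a url key plus optional basic.auth.user.info) is an assumption based on what the confluent-kafka SchemaRegistryClient accepts.

```python
# Keys from the template above that describe Schema Registry, not the Kafka cluster.
SCHEMA_REGISTRY_KEYS = (
    "schema.registry.url",
    "basic.auth.credentials.source",
    "basic.auth.user.info",
)


def split_config(conf):
    """Split a combined config dict into (kafka_conf, schema_registry_conf)."""
    kafka_conf = {k: v for k, v in conf.items() if k not in SCHEMA_REGISTRY_KEYS}
    sr_conf = {"url": conf["schema.registry.url"]}
    if "basic.auth.user.info" in conf:
        sr_conf["basic.auth.user.info"] = conf["basic.auth.user.info"]
    return kafka_conf, sr_conf


# Values from the local-host template above.
combined = {
    "bootstrap.servers": "localhost:9092",
    "schema.registry.url": "http://localhost:8081",
}
kafka_conf, sr_conf = split_config(combined)
print(kafka_conf)
print(sr_conf)
```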
Verify your Confluent Cloud Schema Registry credentials by listing the Schema Registry subjects. In the following example, substitute your values for {{ SR_API_KEY }}, {{ SR_API_SECRET }}, and {{ SR_ENDPOINT }}.
curl -u {{ SR_API_KEY }}:{{ SR_API_SECRET }} https://{{ SR_ENDPOINT }}/subjects
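The same check can be made from Python with only the standard library. This is a sketch of the curl call above using HTTP Basic authentication; the function names are hypothetical, and the endpoint, key, and secret are values you supply.

```python
import base64
import json
import urllib.request


def subjects_request(endpoint, api_key, api_secret):
    """Build a GET request for https://<endpoint>/subjects with HTTP Basic auth."""
    token = base64.b64encode("{}:{}".format(api_key, api_secret).encode("utf-8"))
    request = urllib.request.Request("https://{}/subjects".format(endpoint))
    request.add_header("Authorization", "Basic " + token.decode("ascii"))
    return request


def list_subjects(endpoint, api_key, api_secret):
    """Return the list of subject names; raises urllib.error.HTTPError on failure."""
    request = subjects_request(endpoint, api_key, api_secret)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

A 401 response surfaces as urllib.error.HTTPError, which usually points to an incorrect API key or secret.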
Produce Avro Records¶
Run the Avro producer, passing in arguments for:
- the local file with configuration parameters to connect to your Kafka cluster
- the topic name
./producer_ccsr.py -f $HOME/.confluent/librdkafka.config -t test2
Verify that the producer sent all the messages. You should see:
Producing Avro record: alice 0
Producing Avro record: alice 1
Producing Avro record: alice 2
Producing Avro record: alice 3
Producing Avro record: alice 4
Producing Avro record: alice 5
Producing Avro record: alice 6
Producing Avro record: alice 7
Producing Avro record: alice 8
Producing Avro record: alice 9
Produced record to topic test2 partition [0] @ offset 0
Produced record to topic test2 partition [0] @ offset 1
Produced record to topic test2 partition [0] @ offset 2
Produced record to topic test2 partition [0] @ offset 3
Produced record to topic test2 partition [0] @ offset 4
Produced record to topic test2 partition [0] @ offset 5
Produced record to topic test2 partition [0] @ offset 6
Produced record to topic test2 partition [0] @ offset 7
Produced record to topic test2 partition [0] @ offset 8
Produced record to topic test2 partition [0] @ offset 9
10 messages were produced to topic test2!
View the producer Avro code.
Consume Avro Records¶
Run the Avro consumer, passing in arguments for:
- the local file with configuration parameters to connect to your Kafka cluster
- the same topic name you used to produce the Avro records
./consumer_ccsr.py -f $HOME/.confluent/librdkafka.config -t test2
Verify the consumer received all the messages. You should see:
./consumer_ccsr.py -f $HOME/.confluent/librdkafka.config -t test2
...
Waiting for message or event/error in poll()
Consumed record with key alice and value 0, and updated total count to 0
Consumed record with key alice and value 1, and updated total count to 1
Consumed record with key alice and value 2, and updated total count to 3
Consumed record with key alice and value 3, and updated total count to 6
Consumed record with key alice and value 4, and updated total count to 10
Consumed record with key alice and value 5, and updated total count to 15
Consumed record with key alice and value 6, and updated total count to 21
Consumed record with key alice and value 7, and updated total count to 28
Consumed record with key alice and value 8, and updated total count to 36
Consumed record with key alice and value 9, and updated total count to 45
...
View the consumer Avro code.
Confluent Cloud Schema Registry¶
View the schema subjects registered in Confluent Cloud Schema Registry. In the following command, substitute values for <SR API KEY>, <SR API SECRET>, and <SR ENDPOINT>.
curl -u <SR API KEY>:<SR API SECRET> https://<SR ENDPOINT>/subjects
Verify that the subject test2-value exists.
["test2-value"]
View the schema information for subject test2-value. In the following command, substitute values for <SR API KEY>, <SR API SECRET>, and <SR ENDPOINT>.
curl -u <SR API KEY>:<SR API SECRET> https://<SR ENDPOINT>/subjects/test2-value/versions/1
Verify the schema information for subject test2-value.
{"subject":"test2-value","version":1,"id":100001,"schema":"{\"name\":\"io.confluent.examples.clients.cloud.DataRecordAvro\",\"type\":\"record\",\"fields\":[{\"name\":\"count\",\"type\":\"long\"}]}"}
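Note that the schema field in this response is itself a JSON document encoded as a string, so it has to be decoded a second time before you can inspect the Avro fields. A short sketch using the response shown above:

```python
import json

# Response body copied from the output above.
response = r'''{"subject":"test2-value","version":1,"id":100001,"schema":"{\"name\":\"io.confluent.examples.clients.cloud.DataRecordAvro\",\"type\":\"record\",\"fields\":[{\"name\":\"count\",\"type\":\"long\"}]}"}'''

info = json.loads(response)          # outer document
schema = json.loads(info["schema"])  # the schema is a JSON-encoded string

print(info["subject"], info["version"], info["id"])
print(schema["name"])
for field in schema["fields"]:
    print(field["name"], field["type"])
```

The decoded schema describes a record with a single count field of type long, which matches the values produced by the Avro producer.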
Run all the code in Docker¶
You can also run all the previous code within Docker.
Ensure you have created a local file with configuration parameters to connect to your Kafka cluster at $HOME/.confluent/librdkafka.config.
View the Dockerfile that builds a custom Docker image.
FROM python:3.7-slim
COPY requirements.txt /tmp/requirements.txt
RUN pip3 install -U -r /tmp/requirements.txt
COPY *.py ./
Build the Docker image using the following command:
docker build -t cloud-demo-python .
Run the Docker image using the following command:
docker run -v $HOME/.confluent/librdkafka.config:/root/.confluent/librdkafka.config -it --rm cloud-demo-python bash
Run the Python applications from within the container shell. See earlier sections for more details.
root@6970a2a9e65b:/# ./producer.py -f $HOME/.confluent/librdkafka.config -t test1
root@6970a2a9e65b:/# ./consumer.py -f $HOME/.confluent/librdkafka.config -t test1
root@6970a2a9e65b:/# ./producer_ccsr.py -f $HOME/.confluent/librdkafka.config -t test2
root@6970a2a9e65b:/# ./consumer_ccsr.py -f $HOME/.confluent/librdkafka.config -t test2