Unit testing, integration testing, and schema compatibility are all important steps in building resilient applications in a CI/CD pipeline.
The Kafka community has developed many resources for helping to test your client applications.
Unit tests run very quickly and verify that isolated functional blocks of code work as expected.
They can test the logic of your application with minimal dependencies on other services.
ksqlDB exposes a command-line test runner, ksql-test-runner, that automatically checks whether your ksqlDB statements behave correctly for a given set of inputs. It runs quickly and doesn’t require a running Kafka or ksqlDB cluster. Refer to the example in Kafka Tutorial. Note that the tool can change and does not have backward compatibility guarantees.
With a Kafka Streams application, use TopologyTestDriver, a test class that tests Kafka Streams logic.
Its start-up time is very fast, and you can test a single message at a time through a Kafka Streams topology, which allows easy debugging and stepping.
Refer to the example in Kafka Tutorial.
If you developed your own Kafka Streams Processor, you may want to unit test it as well.
Because a Processor forwards its results to the context rather than returning them, unit testing requires a mocked context capable of capturing forwarded data for inspection.
For this purpose, use MockProcessorContext; refer to the example in Kafka Streams test.
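The capturing-context idea can be illustrated without any Kafka dependency. The following is a minimal sketch in plain Java: `ForwardingContext`, `UppercaseProcessor`, and `CapturingContext` are hypothetical stand-ins invented for this illustration, not Kafka Streams classes. In a real test, MockProcessorContext plays the role of the capturing context.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for a processor context: results are forwarded, not returned.
interface ForwardingContext {
    void forward(String key, String value);
}

// Hypothetical processor under test: upper-cases each value and forwards it downstream.
final class UppercaseProcessor {
    private final ForwardingContext context;

    UppercaseProcessor(ForwardingContext context) {
        this.context = context;
    }

    // Returns nothing, so the only observable effect is the forward() call.
    void process(String key, String value) {
        context.forward(key, value.toUpperCase(Locale.ROOT));
    }
}

// Test double that records every forwarded key/value pair for later inspection.
final class CapturingContext implements ForwardingContext {
    private final List<String[]> forwarded = new ArrayList<>();

    @Override
    public void forward(String key, String value) {
        forwarded.add(new String[] {key, value});
    }

    List<String[]> forwarded() {
        return forwarded;
    }
}
```

A test would build the processor around a `CapturingContext`, call `process("k1", "hello")`, and then assert on the single captured pair returned by `forwarded()`.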
For basic Producers and Consumers, there are mock interfaces useful in unit tests.
JVM Producer and Consumer unit tests can make use of MockProducer and MockConsumer, which implement the same interfaces and mock all the I/O operations as implemented in KafkaProducer and KafkaConsumer, respectively.
Refer to the MockProducer example and the MockConsumer example in Kafka Tutorial.
For non-JVM librdkafka producers and consumers, the approach varies by language.
You could also use rdkafka_mock, a minimal implementation of the Kafka protocol broker APIs with no other dependencies.
Refer to an example in librdkafka.
Integration tests look at end-to-end testing with other services.
You can use Testcontainers, a Java library that supports JUnit tests by running databases, browsers, and anything else that can run inside a Docker container.
With a Testcontainer, Kafka brokers can get their own JVMs with their own classpath, and the broker’s memory and CPU are isolated from the tests themselves.
You can also throttle performance by allocating just a few processors for Docker.
As an alternative, you could also use EmbeddedKafkaCluster, which provides an asynchronous, multi-threaded cluster.
Unlike Testcontainers, it allocates extra threads and objects in the same JVM as your tests.
It is provided by kafka-streams-test-utils, but it is not a public interface and does not have backward compatibility guarantees.
For non-JVM librdkafka clients, you can use:
Schema Management and Evolution
There is an implicit contract that Kafka producers write data with a schema that can be read by Kafka consumers, even as producers and consumers evolve their schemas.
Kafka applications depend on these schemas and expect that any changes made to schemas are still compatible and able to run.
This is where Confluent Schema Registry helps: It provides centralized schema management and compatibility checks as schemas evolve.
If your application is using Schema Registry, you can simulate a Schema Registry instance in your unit testing.
Use Confluent Schema Registry’s MockSchemaRegistryClient to register and retrieve schemas that enable you to serialize and deserialize data.
Refer to the MockSchemaRegistryClient example in Kafka Tutorial.
For integration testing of streaming applications that use the Kafka Streams API along with Schema Registry, use any of the tools described in Integration Testing.
After your applications are running in production, schemas may evolve but still need to be compatible for all applications that rely on both old and new versions of a schema.
Confluent Schema Registry allows for schema evolution and provides compatibility checks to ensure that the contract between producers and consumers is not broken.
This allows producers and consumers to update independently as well as evolve their schemas independently, with assurances that they can read new and legacy data.
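To make the notion of a compatibility check concrete, here is a deliberately simplified sketch of one rule behind backward compatibility: a reader on the new schema can read old data if every field it expects either exists in the old schema or declares a default. `SimpleSchema` is a hypothetical model invented for this illustration. It is not how Schema Registry implements its checks, which also take field types and the configured compatibility mode into account.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified schema model: field name -> whether the field has a default.
final class SimpleSchema {
    private final Map<String, Boolean> fields = new LinkedHashMap<>();

    // Fluent helper for declaring a field.
    SimpleSchema field(String name, boolean hasDefault) {
        fields.put(name, hasDefault);
        return this;
    }

    // Backward compatible: a reader using newSchema can read data written with oldSchema.
    // In this simplified model, every field the reader expects must either be present
    // in the old data or have a default value to fall back on.
    static boolean isBackwardCompatible(SimpleSchema newSchema, SimpleSchema oldSchema) {
        for (Map.Entry<String, Boolean> field : newSchema.fields.entrySet()) {
            boolean presentInOld = oldSchema.fields.containsKey(field.getKey());
            boolean hasDefault = field.getValue();
            if (!presentInOld && !hasDefault) {
                return false;
            }
        }
        return true;
    }
}
```

Under this model, adding an `email` field with a default to a schema of `id` and `name` passes the check, while adding the same field without a default fails it.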
Confluent provides a Schema Registry Maven plugin, which checks the compatibility of a new schema against previously registered schemas.
Refer to an example of this plugin in Java example client pom.xml.
Though it may take only a few seconds to get your Kafka client application up and running, you should performance-tune your application before going into production.
Since different use cases have different sets of requirements that drive different service goals, you must decide what your service goal is:
Once you have determined your service goal, performance-test your application. Do some benchmark testing to see its current behavior, and then tune it.
Read more for a deeper dive on Optimizing and Tuning.
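A baseline benchmark of the kind described above can start very small. The sketch below, in plain Java, times repeated calls of an operation and reports latency percentiles with the nearest-rank method. `LatencyBenchmark` and its methods are hypothetical names for this illustration, and the operation being timed is left to the caller; in practice it might wrap a produce call and wait for its acknowledgment.

```java
import java.util.Arrays;

// Hypothetical helper for a first-pass latency benchmark.
final class LatencyBenchmark {

    // Times `iterations` calls of the operation; returns per-call latencies in nanoseconds.
    static long[] measure(Runnable operation, int iterations) {
        long[] latencies = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            operation.run();
            latencies[i] = System.nanoTime() - start;
        }
        return latencies;
    }

    // Nearest-rank percentile; p is a whole-number percentile from 1 to 100.
    static long percentile(long[] samples, int p) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        // rank = ceil(p * n / 100), computed with integer arithmetic
        int rank = (int) (((long) p * sorted.length + 99) / 100);
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

Comparing the median (`percentile(samples, 50)`) with the tail (`percentile(samples, 99)`) before and after a configuration change gives a simple way to see whether tuning moved the application toward its service goal. A real benchmark would also discard warm-up iterations, which this sketch omits.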
Chaos testing makes your applications more resilient by proactively testing for failures.
It can help expose some issues your application wouldn’t hit with other types of testing.
It injects random failures or simulates constrained resources to see how the apps or system performs.
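As a small-scale illustration of failure injection, the sketch below wraps a dependency so that a configurable fraction of calls fail, then exercises retry logic against it. Everything here (`ChaosSketch`, `Sender`, `withRandomFailures`, `sendWithRetry`) is hypothetical and written for this example. Dedicated chaos tooling works at the infrastructure level rather than inside application code, so treat this only as a way to see the principle.

```java
import java.io.IOException;
import java.util.Random;

final class ChaosSketch {

    // Hypothetical stand-in for any call to an external dependency.
    interface Sender {
        void send(String message) throws IOException;
    }

    // Wraps a sender so that roughly `failureRate` of calls fail before reaching it.
    static Sender withRandomFailures(Sender delegate, double failureRate, Random random) {
        return message -> {
            if (random.nextDouble() < failureRate) {
                throw new IOException("injected failure");
            }
            delegate.send(message);
        };
    }

    // Code under test: tries up to maxAttempts times; returns true once a send succeeds.
    static boolean sendWithRetry(Sender sender, String message, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                sender.send(message);
                return true;
            } catch (IOException e) {
                // Ignored in this sketch; a real client would log and back off here.
            }
        }
        return false;
    }
}
```

Passing a seeded `Random` keeps a failing run reproducible. A failure rate of 0.0 never injects a fault and 1.0 always does, which gives two deterministic boundary cases to check the retry logic against.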
Some examples of tools for chaos testing include:
Testing on Confluent Cloud
Once your application is up and running on Confluent Cloud, verify that all the functional pieces of the architecture work and check that data flows from end to end.
Additional considerations specifically for Confluent Cloud:
- Network throughput and latency
- Separate environments for development, testing, and production
- Cloud-specific security authentication and authorization
To dive into more involved scenarios, test your client application, or perhaps build a cool Kafka demo for your teammates, you may want to use more realistic datasets.
Real Production Datasets
One option is to test with real production datasets.
If possible, pull real production data (i.e., copy Kafka topics) into your test environment.
You can use Kafka’s native API to copy data from prod to test using:
Or pull data directly from a source system using a pre-built, fully managed, Apache Kafka® connector.
If you are not ready to integrate with a real data source or real cluster, you could still generate interesting test data for your topics.
The easiest way is to use a data generator with predefined schema definitions with complex records and multiple fields.
Using the Kafka Connect Datagen, you can format your data as one of Avro, JSON_SR, JSON, or Protobuf.
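The idea of generating records from a predefined schema can be sketched in a few lines of plain Java. This is not the Kafka Connect Datagen connector or its configuration: `TestRecordGenerator`, `FieldType`, and the value ranges are hypothetical choices made for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical generator that fills every field of a predefined schema with random values.
final class TestRecordGenerator {

    // Hypothetical field types supported by this sketch.
    enum FieldType { INT, STRING, BOOLEAN }

    private final Map<String, FieldType> schema;
    private final Random random;

    TestRecordGenerator(Map<String, FieldType> schema, long seed) {
        this.schema = new LinkedHashMap<>(schema);
        this.random = new Random(seed); // a fixed seed keeps the test data reproducible
    }

    // Produces one row with a random value for each field, in schema order.
    Map<String, Object> next() {
        Map<String, Object> row = new LinkedHashMap<>();
        for (Map.Entry<String, FieldType> field : schema.entrySet()) {
            switch (field.getValue()) {
                case INT:
                    row.put(field.getKey(), random.nextInt(1000));
                    break;
                case STRING:
                    row.put(field.getKey(), "user_" + random.nextInt(10));
                    break;
                case BOOLEAN:
                    row.put(field.getKey(), random.nextBoolean());
                    break;
            }
        }
        return row;
    }
}
```

Two generators built with the same schema and seed yield identical rows, which is useful when a test needs to assert on specific generated values.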