Important Warning

Update July 2016: This Tech Preview documentation from March 2016 is outdated and deprecated. Please use the latest Confluent Platform documentation instead.

Introduction

Kafka Streams, a component of Apache Kafka, is a library for processing and analyzing data stored in Kafka. It builds upon important concepts for stream processing such as properly distinguishing between event time and processing time, windowing support, and simple yet efficient management of application state.

The following list highlights several key capabilities and aspects of Kafka Streams that make it a compelling choice for building stream processing applications and microservices:

  • Designed as a simple and lightweight library in Apache Kafka, much like the Kafka producer and consumer client libraries. You can easily embed and integrate Kafka Streams into your own applications (see the sketch after this list), which is a significant departure from framework-based stream processing tools such as Apache Storm and Spark Streaming that impose many requirements on you, such as forcing you to set up and operate a distributed processing cluster. That said, you can of course also use Kafka Streams as the foundation for building your own stream processing framework.
  • Has no external dependencies on systems other than Apache Kafka and can be used in any Java application.
  • Leverages Kafka as its internal messaging layer instead of (re)implementing a custom messaging layer like many other stream processing tools. Notably, it uses Kafka’s partitioning model to horizontally scale processing while maintaining strong ordering guarantees. This ensures high performance, scalability, and operational simplicity for production environments. A key benefit of this design decision is that you do not have to understand and tune two different messaging layers – one for moving stream data at scale (Kafka) and a separate one for your stream processing tool. Similarly, any performance and reliability improvements of Kafka will automatically be available to Kafka Streams, too, thus tapping into the momentum of Kafka’s strong developer community.
  • Is agnostic to resource management and configuration tools, so it integrates much more easily into the existing development, packaging, deployment, and operational practices of your organization.
  • Supports fault-tolerant local state, which enables very fast and efficient stateful operations like joins and windowed aggregations.
  • Employs one-record-at-a-time processing to achieve low processing latency, which is crucial for a variety of use cases such as fraud detection. This makes Kafka Streams different from micro-batch based stream processing tools.
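
To illustrate the first point, the following sketch shows what embedding Kafka Streams in a plain Java application can look like: a small main() method configures the library, defines a trivial topology that copies records from one topic to another, and starts processing, with no cluster to install or operate. The sketch is written against the generally available Kafka Streams API (Apache Kafka 0.10.0 and later), so class and configuration names may differ in detail from this Tech Preview, and the topic names are hypothetical.

    import java.util.Properties;

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.KStreamBuilder;

    public class PassThroughExample {

        public static void main(String[] args) {
            Properties props = new Properties();
            // All instances that share the same application id form one logical application.
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "pass-through-example");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
            props.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());

            // Define a trivial topology: copy every record from one topic to another.
            KStreamBuilder builder = new KStreamBuilder();
            KStream<String, String> input = builder.stream("input-topic");
            input.to("output-topic");

            // Start the stream processing threads embedded in this application instance.
            KafkaStreams streams = new KafkaStreams(builder, props);
            streams.start();

            // Stop processing cleanly when the JVM shuts down.
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }
    }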

Furthermore, Kafka Streams has a strong focus on usability and a great developer experience. It offers all the necessary stream processing primitives to allow applications to read data from Kafka as streams, process the data, and then either write the resulting data back to Kafka or send the final output to an external system. Developers can choose between a high-level DSL, with commonly used operations such as filter, map, and join, and a low-level API for finer-grained control over the processing topology.
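
For example, a few DSL operations can be chained together to filter and transform a stream and write the result back to Kafka. The sketch below again assumes the generally available Kafka Streams API (Apache Kafka 0.10.0 and later) rather than this Tech Preview; the topic names, keys, and filtering logic are made up for illustration, and joins and windowed aggregations follow the same fluent style.

    // Assuming configuration and startup as in the earlier sketch; only the topology differs.
    KStreamBuilder builder = new KStreamBuilder();

    // Read a stream of page-view events keyed by user id (topic name is hypothetical).
    KStream<String, String> views = builder.stream("pageviews");

    views
        // filter: keep only views of product pages.
        .filter((userId, page) -> page.startsWith("/products/"))
        // map (here on the value only): derive a new value while keeping the key,
        // and thus the partitioning, unchanged.
        .mapValues(page -> page.substring("/products/".length()))
        // Write the transformed stream back to Kafka for downstream applications to consume.
        .to("product-pageviews");

The resulting topology is then passed to new KafkaStreams(builder, props) and started exactly as in the earlier sketch.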

Finally, Kafka Streams helps with scaling developers, too – yes, the human side – because it has a low barrier to entry and a smooth path to scale from development to production: You can quickly write and run a small-scale proof-of-concept on a single machine because you don’t need to install or understand a distributed stream processing cluster; and you only need to run additional instances of your application on multiple machines to scale up to high-volume production workloads. Kafka Streams transparently handles the load balancing of multiple instances of the same application by leveraging Kafka’s parallelism model.

In summary, Kafka Streams is a compelling choice for building stream processing applications and microservices. Give it a try and run your first Hello World Streams application! The next sections in this documentation will get you started.

Requirements

  • Kafka 0.9.1.0-cp1
  • [Optional] For additional Avro schema support: Schema Registry 2.1.0-alpha1 recommended, 1.0 minimum