Important Warning

Update July 2016: This Tech Preview documentation from March 2016 is outdated and deprecated. Please use the latest Confluent Platform documentation instead.

Confluent Platform 2.1.0-alpha1 (Kafka Streams Tech Preview) Release Notes

This alpha release is a Tech Preview that includes a preview version of Apache Kafka and its upcoming Kafka Streams library. Kafka Streams is a stream processing library bundled with Kafka.

Note

Why a Tech Preview? As of March 2016, the Apache Kafka project does not yet provide an official release that includes the new Kafka Streams library (Kafka Streams is expected to be released with the upcoming Kafka 0.10).

We provide this Tech Preview of Kafka Streams to allow Kafka users and developers to take a first look at the upcoming functionality, and also to solicit feedback on Kafka Streams, which you are invited to share via the Kafka mailing lists.

What’s New in CP 2.1.0-alpha1?

The CP 2.1.0-alpha1 release, also known as the Kafka Streams Tech Preview, is an experimental release. It is well suited for technical evaluations and experiments, notably of Kafka Streams functionality, but is not recommended for longer-lived production deployments.

Technically, this Tech Preview is based on CP 2.0.1 with the addition of the latest Kafka Streams implementation, which was backported to Kafka 0.9.

Kafka Streams

This Tech Preview release includes the latest implementation of Kafka Streams, which is not yet officially available through the Apache Kafka open source project.

Kafka Streams, a component of Apache Kafka, is a stream processing library natively integrated with Kafka. It has a very low barrier to entry, easy operationalization, and a high-level DSL for writing stream processing applications. As such it is the most convenient yet scalable option to process and analyze data that is backed by Kafka.

Further information is available in the Kafka Streams documentation. If you want to give it a quick spin, head straight to the Kafka Streams Quickstart.

Previous CP releases

What’s New in CP 2.0.1?

Confluent Platform 2.0.1 contains a number of bug fixes included in the Kafka 0.9.0.1 release. Details of the changes to Kafka in this patch release are found in the Kafka Release Notes. Details of the changes to other components of the Confluent Platform are listed in the respective changelogs, such as the Kafka REST Proxy changelog.

Here is a quick overview of the notable Kafka-related bug fixes in the release, grouped by the affected functionality:

New Java consumer

  • KAFKA-2978: Topic partition is not sometimes consumed after rebalancing of consumer group
  • KAFKA-3179: Kafka consumer delivers message whose offset is earlier than sought offset
  • KAFKA-3157: Mirror maker doesn’t commit offset with new consumer when there is no more messages
  • KAFKA-3170: Default value of fetch_min_bytes in new consumer is 1024 while doc says it is 1

Compatibility

  • KAFKA-2695: Handle null string/bytes protocol primitives
  • KAFKA-3100: Broker.createBroker should work if json is version > 2, but still compatible
  • KAFKA-3012: Avoid reserved.broker.max.id collisions on upgrade

Security

  • KAFKA-3198: Ticket Renewal Thread exits prematurely due to inverted comparison
  • KAFKA-3152: kafka-acl doesn’t allow space in principal name
  • KAFKA-3169: Kafka broker throws OutOfMemory error with invalid SASL packet
  • KAFKA-2878: Kafka broker throws OutOfMemory exception with invalid join group request
  • KAFKA-3166: Disable SSL client authentication for SASL_SSL security protocol

Performance/memory usage

  • KAFKA-3003: The fetch.wait.max.ms is not honored when new log segment rolled for low volume topics
  • KAFKA-3159: Kafka consumer 0.9.0.0 client poll is very CPU intensive under certain conditions
  • KAFKA-2988: Change default configuration of the log cleaner
  • KAFKA-2973: Fix leak of child sensors on remove

Topic deletion

  • KAFKA-2937: Topics marked for delete in Zookeeper may become undeletable

What’s New in CP 2.0.0?

The CP 2.0.0 release includes a range of new features over the previous release CP 1.0.x.

Security

This release includes three key security features built directly into Kafka itself. First, we now authenticate users using either Kerberos or TLS client certificates, so we know who is making each request to Kafka. Second, we have added a Unix-like permissions system (ACLs) to control which users can access which data. Third, we support encryption on the wire using TLS to protect sensitive data on an untrusted network.

For more information on security features and how to enable them, see Kafka Security.

Kafka Connect

Kafka Connect facilitates large-scale, real-time data import and export for Kafka. It abstracts away common problems that each such data integration tool needs to solve to be viable for 24x7 production environments: fault tolerance, partitioning, offset management and delivery semantics, operations, and monitoring. It offers the capability to run a pool of processes that host a large number of Kafka connectors while handling load balancing and fault tolerance.

Confluent Platform includes a file connector for importing data from text files or exporting to text files, a JDBC connector for importing data from relational databases, and an HDFS connector for exporting data to HDFS / Hive in Avro and Parquet formats.

To learn more about Kafka Connect and the available connectors, see Kafka Connect.

User Defined Quotas

Confluent Platform 2.0 and Kafka 0.9 now support user-defined quotas. Users have the ability to enforce quotas on a per-client basis. Producer-side quotas are defined in terms of bytes written per second per client id while consumer quotas are defined in terms of bytes read per second per client id.
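To make the "bytes per second per client id" definition concrete, here is a minimal sketch of how a per-client quota check could be modeled. It is not Kafka's broker implementation: the class name, the default-plus-override structure, and the client ids are assumptions made for this example, and real enforcement (measurement windows, throttling) is not shown.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of per-client byte-rate quotas; not part of Kafka.
final class ClientQuotaSketch {
    private final long defaultBytesPerSecond;
    private final Map<String, Long> overrides = new HashMap<>();

    ClientQuotaSketch(long defaultBytesPerSecond) {
        this.defaultBytesPerSecond = defaultBytesPerSecond;
    }

    // Assigns a specific quota to one client id (assumed override mechanism).
    void setOverride(String clientId, long bytesPerSecond) {
        overrides.put(clientId, bytesPerSecond);
    }

    // Returns the quota that applies to the given client id.
    long quotaFor(String clientId) {
        return overrides.getOrDefault(clientId, defaultBytesPerSecond);
    }

    // True if the client's average byte rate over the window exceeds its quota.
    boolean exceedsQuota(String clientId, long bytesInWindow, double windowSeconds) {
        double bytesPerSecond = bytesInWindow / windowSeconds;
        return bytesPerSecond > quotaFor(clientId);
    }

    public static void main(String[] args) {
        ClientQuotaSketch quotas = new ClientQuotaSketch(1_000_000L);
        quotas.setOverride("reporting-app", 500_000L);
        // Both clients wrote 1.5 MB over 2 seconds, i.e. 750,000 bytes/second.
        System.out.println(quotas.exceedsQuota("reporting-app", 1_500_000L, 2.0));
        System.out.println(quotas.exceedsQuota("ingest-app", 1_500_000L, 2.0));
    }
}
```

The same check applies to both directions: a producer-side quota would be fed bytes written, a consumer-side quota bytes read.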

Learn more about user defined quotas in the Enforcing Client Quotas section of the post-deployment documentation.

New Consumer

This release introduces beta support for the newly redesigned consumer client. At a high level, the primary difference in the new consumer is that it removes the distinction between the “high-level” ZooKeeper-based consumer and the “low-level” SimpleConsumer APIs, and instead offers a unified consumer API.

The new consumer allows the use of the group management facility (like the older high-level consumer) while still offering better control over offset commits at the partition level (like the older low-level consumer). It offers pluggable partition assignment amongst the members of a consumer group and ships with several assignment strategies. This completes a series of projects done in the last few years to fully decouple Kafka clients from ZooKeeper, thus entirely removing the consumer client’s dependency on ZooKeeper.
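As a rough illustration of what this means for configuration, the sketch below assembles settings for the new consumer using only java.util.Properties. The configuration keys are standard new-consumer settings; the class name, broker address, and group name are placeholders chosen for this example. Note that no zookeeper.connect entry is needed.

```java
import java.util.Properties;

// Illustrative helper (hypothetical name); builds settings for the new consumer.
final class NewConsumerConfigSketch {

    static Properties newConsumerProps(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        // The new consumer connects to brokers directly, not to ZooKeeper.
        props.setProperty("bootstrap.servers", bootstrapServers);
        // Group management: consumers sharing a group.id divide the partitions.
        props.setProperty("group.id", groupId);
        // Turn off auto-commit so the application decides when offsets are committed.
        props.setProperty("enable.auto.commit", "false");
        // Pluggable partition assignment; RoundRobinAssignor is one shipped strategy.
        props.setProperty("partition.assignment.strategy",
                "org.apache.kafka.clients.consumer.RoundRobinAssignor");
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }

    public static void main(String[] args) {
        // Placeholder values; print the resulting settings for inspection.
        newConsumerProps("localhost:9092", "example-group").list(System.out);
    }
}
```

With the Kafka clients library on the classpath, these properties would typically be passed to the KafkaConsumer constructor.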

To learn how to use the new consumer, refer to the Kafka Consumers documentation or the API docs.

New Client Library - librdkafka

In this release of the Confluent Platform we are packaging librdkafka. librdkafka is a C/C++ library implementation of the Apache Kafka protocol, containing both Producer and Consumer support. It was designed with message delivery reliability and high performance in mind; current figures exceed 800,000 msgs/second for the producer and 3 million msgs/second for the consumer.

You can learn how to use librdkafka side-by-side with the Java clients in our Kafka Clients documentation.

Proactive Support

Proactive Support is a component of the Confluent Platform that improves Confluent’s support for the platform by collecting and reporting support metrics (“Metrics”) to Confluent. Proactive Support is enabled by default in the Confluent Platform. We do this to provide proactive support to our customers, to help us build better products, to help customers comply with support contracts, and to help guide our marketing efforts. With Metrics enabled, a Kafka broker is configured to collect and report certain broker and cluster metadata (“Metadata”) every 24 hours about your use of the Confluent Platform (including without limitation, your remote internet protocol address) to Confluent, Inc. (“Confluent”) or its parent, subsidiaries, affiliates or service providers.

You can disable Proactive Support by following the instructions in the Proactive Support documentation. Please refer to the Confluent Privacy Policy for an in-depth description of how Confluent processes such information.

Compatibility Notes

  • Kafka 0.9 no longer supports Java 6 or Scala 2.9. If you are still on Java 6, consider upgrading to a supported version.
  • Configuration parameter replica.lag.max.messages was removed. Partition leaders will no longer consider the number of lagging messages when deciding which replicas are in sync.
  • Configuration parameter replica.lag.time.max.ms now refers not just to the time passed since the last fetch request from the replica, but also to the time since the replica last caught up. Replicas that are still fetching messages from leaders but have not caught up to the latest messages within replica.lag.time.max.ms will be considered out of sync.
  • MirrorMaker no longer supports multiple target clusters. As a result, it will only accept a single --consumer.config parameter. To mirror multiple source clusters, you will need at least one MirrorMaker instance per source cluster, each with its own consumer configuration.
  • Broker IDs above 1000 are now reserved by default for automatically assigned broker IDs. If your cluster has existing broker IDs above that threshold, make sure to increase the reserved.broker.max.id broker configuration property accordingly.
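The revised meaning of replica.lag.time.max.ms in the list above can be sketched as a simple time comparison. This is a simplified model rather than the broker's actual code: the class, method, and parameter names are hypothetical, and the 10-second limit is just an example value.

```java
// Hypothetical model of the in-sync rule described above; not Kafka source code.
final class ReplicaLagSketch {

    // A replica counts as in sync only if it was fully caught up to the leader
    // at some point within the last maxLagMs milliseconds. How recently it sent
    // a fetch request does not matter on its own.
    static boolean isInSync(long nowMs, long lastCaughtUpMs, long maxLagMs) {
        return nowMs - lastCaughtUpMs <= maxLagMs;
    }

    public static void main(String[] args) {
        long maxLagMs = 10_000L; // example value for replica.lag.time.max.ms
        long nowMs = 20_000L;
        // Still fetching, but last caught up 15 seconds ago: out of sync.
        System.out.println(isInSync(nowMs, 5_000L, maxLagMs));
        // Last caught up 5 seconds ago: in sync.
        System.out.println(isInSync(nowMs, 15_000L, maxLagMs));
    }
}
```

Under this rule a slow replica that keeps fetching but never reaches the end of the log eventually drops out of the in-sync set, which is the case the removed replica.lag.max.messages setting used to cover.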

How to Download

CP 2.1.0-alpha1 (“Kafka Streams Tech Preview”) is available for download at http://www.confluent.io/developer. See section Installation for detailed information.

Important

No upgrades: Unlike past CP releases, the CP 2.1.0-alpha1 release (the Kafka Streams Tech Preview) is an experimental release for which we support neither upgrade paths from previous CP releases nor upgrade paths to future CP releases. Users are expected to start with a fresh installation of the Tech Preview.

This means that this Tech Preview release is well suited for technical evaluations and experiments – notably Kafka Streams functionality – but is not recommended for longer-lived production deployments.

Questions?

If you have questions regarding this release, feel free to reach out via the Confluent Platform mailing list. Confluent customers are encouraged to contact our support directly.