Configure a Kafka Streams Application for Confluent Platform

You configure a Kafka Streams application through a java.util.Properties instance that supplies required parameters, such as application.id and bootstrap.servers, along with optional parameters that tune processing, state, and client behavior. Configure these Apache Kafka® and Kafka Streams options before using the Streams API.

Create a java.util.Properties instance.

Set the parameters. For example:

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
// Set a few key parameters
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-first-streams-application");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-broker1:9092");
// Add any further settings with props.put(key, value);
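After you build the Properties instance, you pass it to the KafkaStreams constructor together with a topology. The following minimal sketch shows that flow. The topic names input-topic and output-topic are placeholders, and the String default serdes are assumptions made only so that the pass-through topology has serdes to use.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

class FirstStreamsApp {

    // Builds the configuration from the example above, plus default serdes.
    static Properties buildConfig() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-first-streams-application");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-broker1:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        return props;
    }

    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        // Hypothetical topics: copy every record from input-topic to output-topic.
        builder.stream("input-topic").to("output-topic");

        KafkaStreams streams = new KafkaStreams(builder.build(), buildConfig());
        // Close the Streams client cleanly when the JVM shuts down.
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        streams.start();
    }
}
```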

Configuration parameter reference

This section contains the most common Kafka Streams configuration parameters. For more information, see the Streams and Client Javadocs.

Required configuration parameters

Here are the required Kafka Streams configuration parameters.

Parameter Name	Importance	Description	Default Value
application.id	Required	An identifier for the stream processing application. Must be unique within the Kafka cluster.	None
bootstrap.servers	Required	A list of host/port pairs to use for establishing the initial connection to the Kafka cluster.	None

application.id

(Required) The application ID. Each stream processing application must have a unique ID. The same ID must be given to all instances of the application. You should use only alphanumeric characters, . (dot), - (hyphen), and _ (underscore). Examples are "hello_world", "hello_world-v1.0.0".

This ID is used in the following places to isolate resources used by the application from others:

As the default Kafka consumer and producer client.id prefix
As the Kafka consumer group.id for coordination
As the name of the subdirectory in the state directory (see state.dir)
As the prefix of internal Kafka topic names

When an application is updated, you should change the application.id, unless you want to reuse the existing data in internal topics and state stores. For example, you could embed the version information within application.id, such as my-app-v1.0.0 and my-app-v1.0.2.
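The following sketch shows one way to embed a version in application.id and check the result against the recommended character set (alphanumerics, dot, hyphen, and underscore). The class AppIdExample and its method versionedAppId are hypothetical helpers, not part of the Kafka Streams API.

```java
import java.util.regex.Pattern;

class AppIdExample {
    // Recommended characters for application.id: alphanumerics, '.', '-', and '_'.
    private static final Pattern RECOMMENDED = Pattern.compile("[a-zA-Z0-9._-]+");

    // Hypothetical helper: builds "<baseName>-v<version>" and rejects unsupported characters.
    static String versionedAppId(String baseName, String version) {
        String id = baseName + "-v" + version;
        if (!RECOMMENDED.matcher(id).matches()) {
            throw new IllegalArgumentException("application.id contains unsupported characters: " + id);
        }
        return id;
    }

    public static void main(String[] args) {
        System.out.println(versionedAppId("my-app", "1.0.2"));
    }
}
```

Changing the version string gives the updated application a fresh consumer group, internal topics, and state directory; keeping it reuses the existing ones.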

bootstrap.servers

(Required) The Kafka bootstrap servers. This is the same setting that is used by the underlying producer and consumer clients to connect to the Kafka cluster. Example: "kafka-broker1:9092,kafka-broker2:9092".

A Kafka Streams application can communicate with only the single Kafka cluster specified by this configuration value.

Suggested configuration parameters for resiliency

Several Kafka and Kafka Streams configuration options should be set explicitly for resiliency in the face of broker failures:

Parameter Name	Corresponding Client	Default value	Consider setting to
acks	Producer [1]	`acks=all`	`acks=all` [2]
min.insync.replicas	Broker	`1`	`2`
num.standby.replicas	Streams	`0`	`1`
replication.factor (for broker version 2.3 and older)	Streams	`-1`	`3` [3]
state.dir	Streams	`/${java.io.tmpdir}/kafka-streams`	a persistent volume

Increasing the replication factor to 3 ensures that the internal Kafka Streams topics can tolerate up to two broker failures. The tradeoff of moving from the default values to the suggested ones is that you sacrifice some performance and storage space (3x with a replication factor of 3) for more resiliency.

Define these settings by using the StreamsConfig class:

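import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.config.TopicConfig;
import org.apache.kafka.streams.StreamsConfig;

Properties streamsSettings = new Properties();
// for broker version 2.3 or older
//streamsSettings.put(StreamsConfig.REPLICATION_FACTOR_CONFIG, 3);
// for version 2.8 or older
//streamsSettings.put(StreamsConfig.producerPrefix(ProducerConfig.ACKS_CONFIG), "all");
// applies min.insync.replicas=2 to the internal topics that Kafka Streams creates
streamsSettings.put(StreamsConfig.topicPrefix(TopicConfig.MIN_IN_SYNC_REPLICAS_CONFIG), 2);
streamsSettings.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1);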

acks

The number of acknowledgments that the leader must have received before considering a request complete. This controls the durability of records that are sent. The possible values are:

acks=0: The producer does not wait for acknowledgment from the server and the record is immediately added to the socket buffer and considered sent. No guarantee can be made that the server has received the record in this case, and the producer does not generally know of any failures. The offset returned for each record is always set to -1.
acks=1: The leader writes the record to its local log and responds without waiting for full acknowledgment from all followers. If the leader immediately fails after acknowledging the record, but before the followers have replicated it, then the record is lost.
acks=all (the default value): The leader waits for the full set of in-sync replicas to acknowledge the record. This guarantees that the record is not lost if there is at least one in-sync replica alive. This is the strongest available guarantee.

For more information, see the Kafka Producer documentation.

min.insync.replicas

In addition to setting the min.insync.replicas parameter in the broker configuration, min.insync.replicas must be set for the internal topics of your Kafka Streams application, as shown in the previous code example.

For the full description, see min.insync.replicas in the broker and topic configuration reference.

num.standby.replicas

For a description, see num.standby.replicas in Optional configuration parameters.

replication.factor

For a description, see replication.factor in Optional configuration parameters.

state.dir

Although state can be re-created from Kafka, recreating it can require significant time and resources. By default, state.dir is located under java.io.tmpdir, and the /tmp/ directory is sometimes configured as tmpfs, which is backed by memory, so its contents do not survive a restart. Also, many organizations mount tmp directories with the noexec option. For these reasons, use a persistent volume for state.dir in a production environment.
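The following sketch points state.dir at a persistent volume. The path /var/lib/kafka-streams is an assumption for illustration; use whatever mount point your environment provides.

```java
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

class StateDirConfig {
    // Hypothetical mount point of a persistent volume; replace with your own path.
    static final String PERSISTENT_STATE_DIR = "/var/lib/kafka-streams";

    // Copies the base settings and moves local state stores off the temporary directory.
    static Properties withPersistentStateDir(Properties base) {
        Properties props = new Properties();
        props.putAll(base);
        props.put(StreamsConfig.STATE_DIR_CONFIG, PERSISTENT_STATE_DIR);
        return props;
    }

    public static void main(String[] args) {
        Properties props = withPersistentStateDir(new Properties());
        System.out.println(props.getProperty(StreamsConfig.STATE_DIR_CONFIG));
    }
}
```

Kafka Streams still creates a subdirectory named after application.id under this path, so several applications can share the same volume.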

Optional configuration parameters

Here are the optional Streams configuration parameters, with the level of importance indicated for each:

High: These parameters have default values that are most likely not a good fit for production use. Revisit them before going to production.
Medium: The default values of these parameters should work for production for many cases, but it’s not uncommon that they are changed, for example to tune performance.
Low: It is rarely necessary to change the value for these parameters. Change them only if there is a specific issue you must address.

Parameter Name	Importance	Description	Default Value
acceptable.recovery.lag	Medium	The maximum acceptable lag (number of offsets to catch up) for an instance to be considered caught-up and ready for the active task.	10,000
allow.os.group.write.access	Low	Allows state store directories to have write access for the OS group.	`false`
application.server	Low	A host:port pair pointing to an embedded user defined endpoint that can be used for discovering the locations of state stores within a single Kafka Streams application. The value of this must be different for each instance of the application.	the empty string
buffered.records.per.partition	Low	The maximum number of records to buffer per partition.	1000
cache.max.bytes.buffering	Medium	Deprecated in Confluent Platform 7.4. Use `statestore.cache.max.bytes` instead.	10485760 bytes
client.id	Medium	An ID string to pass to the server when making requests. (This setting is passed to the consumer/producer clients used internally by Kafka Streams.)	the empty string
commit.interval.ms	Low	The frequency with which to save the position (offsets in source topics) of tasks. For at-least-once processing, committing means saving the position (offsets) of the processor. For exactly-once processing, it means to commit the transaction, which includes saving the position.	30000 ms (`at_least_once`) / 100 ms (`exactly_once_v2`)
connections.max.idle.ms	Low	The number of milliseconds to wait before closing idle connections.	540000 ms (9 minutes)
default.client.supplier	Low	Client supplier class that implements the `org.apache.kafka.streams.KafkaClientSupplier` interface.	`DefaultKafkaClientSupplier`
default.deserialization.exception.handler	Medium	Deprecated. Use `deserialization.exception.handler` instead.	See default.deserialization.exception.handler
default.dsl.store	Low	Deprecated in Confluent Platform 7.7. The default state store type used by DSL operators.	“rocksDB”
default.key.serde	Medium	Default serializer/deserializer class for record keys. Must implement the `Serde` interface (see also default.value.serde).	`null`
default.production.exception.handler	Medium	Deprecated. Use `production.exception.handler` instead.	See default.production.exception.handler
default.timestamp.extractor	Medium	Default timestamp extractor class that implements the `TimestampExtractor` interface.	See Timestamp Extractor
default.value.serde	Medium	Default serializer/deserializer class for record values. Must implement the `Serde` interface (see also default.key.serde).	`null`
default.windowed.key.serde.inner	Medium	Deprecated in Confluent Platform 7.9. Use `windowed.inner.class.serde` instead.	`Serdes.ByteArray().getClass().getName()`
default.windowed.value.serde.inner	Medium	Deprecated in Confluent Platform 7.9. Use `windowed.inner.class.serde` instead.	`Serdes.ByteArray().getClass().getName()`
deserialization.exception.handler	Medium	Exception handling class that implements the `DeserializationExceptionHandler` interface.	`LogAndFailExceptionHandler`
dsl.store.suppliers.class	Low	Defines a default state store implementation.	`BuiltInDslStoreSuppliers`
dsl.store.format	Low	Controls whether DSL operators materialize headers-aware state stores. Accepted values: `DEFAULT`, `HEADERS`.	`DEFAULT`
enable.metrics.push	Medium	Push client metrics to the cluster, if the cluster has a client metrics subscription that matches this client.
ensure.explicit.internal.resource.naming	Medium	Enables enforcement of explicit naming for all internal resources of the topology, including internal topics.	`false`
group.protocol	Low	The protocol used for group coordination.	`classic`
log.summary.interval.ms	Low	The interval at which summary information is logged. A negative value disables summary logging.	120000 milliseconds (2 minutes)
max.task.idle.ms	Medium	Maximum amount of time Kafka Streams waits to fetch data to ensure in-order processing semantics.	0 milliseconds
max.warmup.replicas	Medium	The maximum number of warmup replicas (extra standbys beyond the configured num.standby.replicas) that can be assigned at once.	2
metadata.max.age.ms	Low	The period of time in milliseconds after which a refresh of metadata is forced.	300000 ms (5 minutes)
metric.reporters	Low	A list of classes to use as metrics reporters.	the empty list
metrics.num.samples	Low	The number of samples maintained to compute metrics.	2
metrics.recording.level	Low	The highest recording level for metrics.	`INFO`
metrics.sample.window.ms	Low	The window of time a metrics sample is computed over.	30000 milliseconds
num.standby.replicas	High	The number of standby replicas for each task.	0
num.stream.threads	Medium	The number of threads to execute stream processing.	1
poll.ms	Low	The amount of time in milliseconds to block waiting for input.	100 milliseconds
probing.rebalance.interval.ms	Low	The maximum time to wait before triggering a rebalance to probe for warmup replicas that have sufficiently caught up.	600000 milliseconds (10 minutes)
processing.exception.handler	Medium	Exception handling class that implements the `ProcessingExceptionHandler` interface.	`LogAndFailProcessingExceptionHandler`
processing.exception.handler.global.enabled	Low	Deprecated. Controls whether the `ProcessingExceptionHandler` is invoked for global store/KTable processing.	`false`
processing.guarantee	Medium	The processing mode. Can be either `at_least_once` (default), or `exactly_once_v2` (for EOS version 2, requires Confluent Platform version 5.5.x / Kafka version 2.5.x or higher). Deprecated config options are `exactly_once` (for EOS version 1) and `exactly_once_beta` (for EOS version 2).	See Processing Guarantee
processor.wrapper.class	Medium	A class or class name implementing the `ProcessorWrapper` interface. Must be passed in when creating the topology via `TopologyConfig`. For more information, see processor.wrapper.class.	`null`
production.exception.handler	Medium	Exception handling class that implements the `ProductionExceptionHandler` interface. For more information, see production.exception.handler.	`DefaultProductionExceptionHandler`
rack.aware.assignment.non_overlap_cost	Low	Cost associated with moving tasks from existing assignment. For more information, see rack.aware.assignment.non_overlap_cost.	`null`
rack.aware.assignment.strategy	Low	The strategy used for rack-aware assignment. Values are “none” (default), “min_traffic”, and “balance_subtopology”. For more information, see rack.aware.assignment.strategy.	`none`
rack.aware.assignment.tags	Low	List of tag keys used to distribute standby replicas across Kafka Streams clients.	the empty list
rack.aware.assignment.traffic_cost	Low	Cost associated with cross-rack traffic. For more information, see rack.aware.assignment.traffic_cost.	`null`
replication.factor	High	The replication factor for changelog topics and repartition topics created by the application. If your broker cluster is on Confluent Platform 5.4.x (Kafka 2.4.x) or newer, you can set -1 to use the broker default replication factor.	-1
retries	Medium	The number of retries for broker requests that return a retryable error.	0
retry.backoff.ms	Medium	The amount of time in milliseconds before a request is retried. This applies if the `retries` parameter is set to a value greater than 0.	100
rocksdb.config.setter	Medium	The RocksDB configuration.
state.cleanup.delay.ms	Low	The amount of time in milliseconds to wait before deleting state when a partition has migrated.	600000 milliseconds
state.cleanup.dir.max.age.ms	Low	Time-based threshold for purging local state directories during startup. Disabled by default.	`-1`
state.dir	High	Directory location for state stores.	`/${java.io.tmpdir}/kafka-streams`
statestore.cache.max.bytes	Medium	Maximum number of memory bytes to be used for record caches across all threads.	10485760 bytes
task.assignor.class	Medium	A task assignor class or class name implementing the `TaskAssignor` interface.	The high-availability task assignor.
task.timeout.ms	Medium	The maximum amount of time in ms a task might stall due to internal errors and retries until an error is raised.	300000 milliseconds (5 minutes)
topology.optimization	Low	Enables/Disables topology optimization.	`NO_OPTIMIZATION`
upgrade.from	Medium	The version you are upgrading from during a rolling upgrade.	See Upgrade From
windowed.inner.class.serde	Medium	Deprecated. For alternatives, see Window Serdes.
windowstore.changelog.additional.retention.ms	Low	Added to a window’s maintainMs to ensure data is not deleted from the log prematurely. Allows for clock drift.	86400000 milliseconds (1 day)
window.size.ms	Low	Deprecated. For alternatives, see Window Serdes.	`null`

acceptable.recovery.lag

The maximum acceptable lag, which is the total number of offsets to catch up from the changelog, for an instance to be considered caught-up and able to receive an active task. Kafka Streams assigns stateful active tasks only to instances whose state stores are within the acceptable recovery lag, if any exist, and assigns warmup replicas to restore state in the background for instances that are not yet caught up. This value should correspond to a recovery time of well under a minute for a given workload. Must be at least 0.
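The following sketch sets acceptable.recovery.lag together with the related high-availability settings from the table. The values are illustrative: the first three restate the documented defaults so that they are visible in code, and num.standby.replicas is raised to 1 as suggested in the resiliency section.

```java
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

class HighAvailabilityConfig {
    static Properties build() {
        Properties props = new Properties();
        // An instance counts as caught up if it is at most 10,000 changelog offsets behind.
        props.put(StreamsConfig.ACCEPTABLE_RECOVERY_LAG_CONFIG, 10_000L);
        // At most 2 warmup replicas restore state in the background at the same time.
        props.put(StreamsConfig.MAX_WARMUP_REPLICAS_CONFIG, 2);
        // Probe every 10 minutes whether a warmup replica has caught up.
        props.put(StreamsConfig.PROBING_REBALANCE_INTERVAL_MS_CONFIG, 600_000L);
        // Keep one standby replica per stateful task for faster failover.
        props.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1);
        return props;
    }
}
```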

allow.os.group.write.access

Allows state store directories that Kafka Streams creates to have write access for the OS group. When you enable this option, Kafka Streams creates the state store directories with group write permissions. The default is false.

application.server

A host:port pair pointing to a user-defined endpoint that can be used for state store discovery and interactive queries on the current KafkaStreams instance.
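The following sketch sets application.server and shows how an instance could look up which instance hosts a given key, using KafkaStreams#queryMetadataForKey. The value host1:7070, the String key type, and the helper names are assumptions for illustration; serving the actual query over your own RPC layer is not shown.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyQueryMetadata;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.state.HostInfo;

class InteractiveQueryConfig {
    // Each instance advertises its own endpoint, for example "host1:7070".
    static Properties build(String hostAndPort) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_SERVER_CONFIG, hostAndPort);
        return props;
    }

    // Hypothetical helper: returns the instance hosting the active copy of `key`
    // in `storeName`, or null if the store is unknown or metadata is not yet
    // available (for example, during a rebalance).
    static HostInfo findHost(KafkaStreams streams, String storeName, String key) {
        KeyQueryMetadata metadata =
            streams.queryMetadataForKey(storeName, key, Serdes.String().serializer());
        if (metadata == null || metadata.equals(KeyQueryMetadata.NOT_AVAILABLE)) {
            return null;
        }
        return metadata.activeHost();
    }
}
```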

built.in.metrics.version

Version of the built-in metrics to use. The default is “latest”.

buffered.records.per.partition

Deprecated in Confluent Platform 7.4.

The maximum number of records to buffer per partition. Kafka Streams attempts to fetch no more than this number of records from a partition at a time and buffers fetched but unprocessed records in memory. The default is 1000. For current recommendations on tuning buffering and memory usage, see Kafka Streams Memory Management for Confluent Platform.

commit.interval.ms

The frequency in milliseconds with which to commit processing progress. For at-least-once processing, committing means to save the position (the offsets) of the processor. For exactly-once processing, it means to commit the transaction, which includes saving the position and making the committed data in the output topic visible to consumers with isolation level read_committed.

If processing.guarantee is set to exactly_once_v2 or exactly_once, the default value is 100; otherwise, the default value is 30000.
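The following sketch enables exactly-once processing and overrides the commit interval. Because output becomes visible to read_committed consumers only when the transaction commits, a larger interval means fewer commits but higher end-to-end latency. The parameter commitIntervalMs is a placeholder for a value you choose for your workload.

```java
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

class CommitIntervalConfig {
    static Properties exactlyOnce(long commitIntervalMs) {
        Properties props = new Properties();
        // With exactly_once_v2, commit.interval.ms defaults to 100 ms.
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);
        // Override the default to commit transactions less (or more) often.
        props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, commitIntervalMs);
        return props;
    }
}
```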

connections.max.idle.ms

The number of milliseconds to wait before closing idle connections.

default.client.supplier

Client supplier class that implements the org.apache.kafka.streams.KafkaClientSupplier interface. The default is org.apache.kafka.streams.processor.internals.DefaultKafkaClientSupplier.

default.dsl.store

Deprecated in Confluent Platform 7.7. The default state store type used by DSL operators. The default is “rocksDB”.

If you currently specify default.dsl.store=ROCKS_DB or default.dsl.store=IN_MEMORY, replace these configurations with dsl.store.suppliers.class=BuiltInDslStoreSuppliers.RocksDBDslStoreSuppliers.class and dsl.store.suppliers.class=BuiltInDslStoreSuppliers.InMemoryDslStoreSuppliers.class, respectively.

dsl.store.suppliers.class

Defines a default state store implementation to be used by any stateful DSL operator that has not explicitly configured the store implementation type. Must implement the org.apache.kafka.streams.state.DslStoreSuppliers interface. The default is BuiltInDslStoreSuppliers.RocksDBDslStoreSuppliers.
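The following sketch makes in-memory stores the default for all DSL operators, which is the replacement for the deprecated in-memory default.dsl.store setting described above. Operators that configure their own store type are not affected.

```java
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.state.BuiltInDslStoreSuppliers;

class InMemoryDslStores {
    static Properties build() {
        Properties props = new Properties();
        // Use in-memory stores for every stateful DSL operator without an explicit store type.
        props.put(StreamsConfig.DSL_STORE_SUPPLIERS_CLASS_CONFIG,
                  BuiltInDslStoreSuppliers.InMemoryDslStoreSuppliers.class);
        return props;
    }
}
```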

dsl.store.format

Controls whether DSL operators materialize headers-aware state stores. Case-insensitive. Accepted values:

DEFAULT: Uses existing timestamped or plain store variants per operator. Does not persist record headers in state. This is the existing behavior, and no change is required for existing applications.
HEADERS: Selects headers-aware stores (introduced by KIP-1271) that persist record headers alongside values and timestamps. Changelog payloads and local RocksDB state are larger than under DEFAULT. Required for using the Schema Registry schema GUID in record header format with Kafka Streams.

For new Kafka Streams applications, set dsl.store.format=HEADERS. This avoids a future migration if you later adopt record headers, such as the Schema Registry schema GUID in header format, or other features that depend on header-aware stores. The runtime overhead for workloads without headers is a single byte per state store entry. Existing applications can stay on DEFAULT and switch when needed. The migration path is described in the following Migration note.

This config applies globally to all DSL operators in the application. Per-operator customization of the store format is possible by providing a custom DslStoreSuppliers implementation through Materialized.withStoreType() or by supplying explicit store suppliers that return headers-aware stores.

This config is orthogonal to dsl.store.suppliers.class, which selects the store implementation (for example, RocksDB versus in-memory). dsl.store.format selects the store format (with or without header preservation). Both can be set independently.
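The following sketch sets both configs independently: dsl.store.format selects the headers-aware format, and dsl.store.suppliers.class selects RocksDB as the implementation. The dsl.store.format key is written as a string literal because this sketch does not assume a particular StreamsConfig constant name for it.

```java
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.state.BuiltInDslStoreSuppliers;

class HeadersAwareStores {
    static Properties build() {
        Properties props = new Properties();
        // Store format: persist record headers in DSL state stores (value is case-insensitive).
        props.put("dsl.store.format", "HEADERS");
        // Store implementation: chosen separately from the format.
        props.put(StreamsConfig.DSL_STORE_SUPPLIERS_CLASS_CONFIG,
                  BuiltInDslStoreSuppliers.RocksDBDslStoreSuppliers.class);
        return props;
    }
}
```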

Migration: Switching from DEFAULT to HEADERS triggers a lazy, per-key migration of existing local RocksDB state. New writes go to a headers-aware column family, and legacy entries are converted on first read. No application downtime is required. Reverting from HEADERS back to DEFAULT requires clearing local state directories and restoring from changelog topics.

Changelog compatibility: The changelog record value format is unchanged regardless of store format. Headers are carried as native Kafka record metadata. Old changelogs restore correctly into HEADERS stores (with empty headers), and new changelogs restore into DEFAULT stores (headers are dropped). No changelog topic recreation is needed.

Performance: Each local state store entry grows by at least 1 byte (empty headers prefix). Changelog records carry additional native Kafka record headers, increasing per-record size proportional to header count and size. For workloads with small or empty headers, the overhead is negligible.

Current limitations: The suppress() operator, left and outer stream-stream joins, and versioned state stores do not support dsl.store.format=HEADERS. When these are used, the HEADERS setting is silently ignored and no headers are stored.

For more information, see KIP-1285.

default.deserialization.exception.handler

The default deserialization exception handler allows you to manage records that fail to deserialize, which can be caused by corrupt data, incorrect serialization logic, or unhandled record types. The implemented exception handler must return FAIL or CONTINUE, depending on the record and the exception thrown. Returning FAIL signals that Kafka Streams should shut down, and CONTINUE signals that Kafka Streams should ignore the issue and continue processing. The default implementation class is LogAndFailExceptionHandler. These exception handlers are available:

LogAndContinueExceptionHandler: This handler logs the deserialization exception and then signals the processing pipeline to continue processing more records. This log-and-skip strategy allows Kafka Streams to make progress instead of failing, if there are records that fail to deserialize.
LogAndFailExceptionHandler: This handler logs the deserialization exception and then signals the processing pipeline to stop processing more records.

You can also provide your own customized exception handler instead of the library-provided handlers to meet your needs. For an example customized exception handler implementation, see the Failure and exception handling FAQ.
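The following sketch selects the library-provided log-and-skip handler. It uses the deserialization.exception.handler key from the table, which replaces the deprecated default.deserialization.exception.handler; the key is written as a string literal so that the sketch does not depend on a particular StreamsConfig constant name.

```java
import java.util.Properties;
import org.apache.kafka.streams.errors.LogAndContinueExceptionHandler;

class SkipBadRecordsConfig {
    static Properties build() {
        Properties props = new Properties();
        // Log records that fail to deserialize and keep processing instead of shutting down.
        props.put("deserialization.exception.handler", LogAndContinueExceptionHandler.class);
        return props;
    }
}
```

Skipping records trades completeness for availability, so consider monitoring the skipped-record metrics when you use this handler.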

default.production.exception.handler

Deprecated in Confluent Platform 8.0 (Kafka Streams 4.0). Use production.exception.handler instead.

The default production exception handler allows you to manage exceptions triggered when interacting with a broker, such as attempting to produce a record that is too large. By default, Kafka Streams provides and uses the DefaultProductionExceptionHandler, which always fails when these exceptions occur.

An exception handler can return FAIL, CONTINUE, or RETRY depending on the record and the exception thrown.

FAIL signals that Kafka Streams should shut down.
CONTINUE signals that Kafka Streams should ignore the issue and continue processing.
For a RetriableException, the handler can return RETRY to indicate to the runtime that it should try again to send the failed record. If RETRY is returned for an exception that is not a RetriableException, it is treated as FAIL.

If you want to provide an exception handler that always ignores records that are too large, you could implement something like the following:

import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.RecordTooLargeException;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.errors.ProductionExceptionHandler;
import org.apache.kafka.streams.errors.ProductionExceptionHandler.ProductionExceptionHandlerResponse;

class IgnoreRecordTooLargeHandler implements ProductionExceptionHandler {
    // No settings are needed; the signature must match Configurable#configure(Map<String, ?>).
    @Override
    public void configure(Map<String, ?> configs) {}

    // Skip records that are too large; fail on every other exception.
    @Override
    public ProductionExceptionHandlerResponse handle(final ProducerRecord<byte[], byte[]> record,
                                                     final Exception exception) {
        if (exception instanceof RecordTooLargeException) {
            return ProductionExceptionHandlerResponse.CONTINUE;
        } else {
            return ProductionExceptionHandlerResponse.FAIL;
        }
    }
}

Properties settings = new Properties();

// other Kafka Streams settings, for example bootstrap servers and application ID

settings.put(StreamsConfig.PRODUCTION_EXCEPTION_HANDLER_CLASS_CONFIG,
             IgnoreRecordTooLargeHandler.class);

default.key.serde

The default Serializer/Deserializer class for record keys. Its value is null until you set it. Serialization and deserialization in Kafka Streams happens whenever data needs to be materialized, for example:

Whenever data is read from or written to a Kafka topic, for example, by the StreamsBuilder#stream() and KStream#to() methods.
Whenever data is read from or written to a state store.

For more information, see Data types and serialization.

default.list.key.serde.inner

Default inner class of list serde for key that implements the org.apache.kafka.common.serialization.Serde interface. This configuration is read only if the default.key.serde configuration is set to org.apache.kafka.common.serialization.Serdes.ListSerde.

default.list.key.serde.type

Default class for key that implements the java.util.List interface. This configuration is read only if the default.key.serde configuration is set to org.apache.kafka.common.serialization.Serdes.ListSerde.

When a list serde class is used, you must set the inner serde class that implements the org.apache.kafka.common.serialization.Serde interface by using default.list.key.serde.inner.

default.list.value.serde.inner

Default inner class of list serde for value that implements the org.apache.kafka.common.serialization.Serde interface. This configuration is read only if the default.value.serde configuration is set to org.apache.kafka.common.serialization.Serdes.ListSerde.

default.list.value.serde.type

Default class for value that implements the java.util.List interface. This configuration is read only if the default.value.serde configuration is set to org.apache.kafka.common.serialization.Serdes.ListSerde.

When a list serde class is used, you must set the inner serde class that implements the org.apache.kafka.common.serialization.Serde interface by using default.list.value.serde.inner.
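The following sketch configures record values as lists of strings by combining default.value.serde with the two list settings. ArrayList and StringSerde are illustrative choices, and the list keys are written as string literals rather than assumed constant names.

```java
import java.util.ArrayList;
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsConfig;

class ListValueSerdeConfig {
    static Properties build() {
        Properties props = new Properties();
        // Values are serialized with ListSerde ...
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.ListSerde.class);
        // ... deserialized into java.util.ArrayList instances ...
        props.put("default.list.value.serde.type", ArrayList.class);
        // ... whose elements use the String serde.
        props.put("default.list.value.serde.inner", Serdes.StringSerde.class);
        return props;
    }
}
```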

default.value.serde

The default Serializer/Deserializer class for record values. Its value is null until you set it. Serialization and deserialization in Kafka Streams happens whenever data needs to be materialized, for example:

Whenever data is read from or written to a Kafka topic, for example, by the StreamsBuilder#stream() and KStream#to() methods.
Whenever data is read from or written to a state store.

For more information, see Data types and serialization.
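The following sketch sets String keys and Long values as the application-wide defaults. These serdes apply only where an operator does not specify its own, for example through Consumed.with() or Produced.with(); the chosen types are illustrative.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsConfig;

class DefaultSerdesConfig {
    static Properties build() {
        Properties props = new Properties();
        // Fallback serdes for record keys and values.
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.StringSerde.class);
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.LongSerde.class);
        return props;
    }
}
```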

default.timestamp.extractor

A timestamp extractor pulls a timestamp from an instance of ConsumerRecord. Timestamps are used to control the progress of streams.

The default extractor is FailOnInvalidTimestamp. This extractor retrieves built-in timestamps that are automatically embedded into Kafka messages by the Kafka producer client since Kafka version 0.10. Depending on the setting of Kafka’s server-side log.message.timestamp.type broker and message.timestamp.type topic parameters, this extractor provides you with:

event-time processing semantics if log.message.timestamp.type is set to CreateTime, also called “producer time” (which is the default). This represents the time when a Kafka producer sent the original message. If you use Kafka’s official producer client or one of Confluent’s producer clients, the timestamp represents milliseconds since the epoch.
ingestion-time processing semantics if log.message.timestamp.type is set to LogAppendTime, also called “broker time”. This represents the time when the Kafka broker received the original message, in milliseconds since the epoch.

The FailOnInvalidTimestamp extractor throws an exception if a record contains an invalid (that is, negative) built-in timestamp, because Kafka Streams would otherwise not process the record but silently drop it. Invalid built-in timestamps can occur for various reasons. For example, you might consume a topic that is written to by pre-0.10 Kafka producer clients or by third-party producer clients that don’t support the Kafka 0.10 message format yet. Another case is after upgrading your Kafka cluster from 0.9 to 0.10, where all the data that was generated with 0.9 does not include the 0.10 message timestamps.

If you have data with invalid timestamps and want to process it, then there are two alternative extractors available. Both work on built-in timestamps, but handle invalid timestamps differently.

LogAndSkipOnInvalidTimestamp: This extractor logs a warning and returns the invalid timestamp to Kafka Streams, which then silently drops the record instead of processing it. This log-and-skip strategy allows Kafka Streams to make progress instead of failing if there are records with an invalid built-in timestamp in your input data.
UsePartitionTimeOnInvalidTimestamp: This extractor returns the record’s built-in timestamp if it is valid, that is, not negative. If the record does not have a valid built-in timestamp, the extractor returns the previously extracted valid timestamp from a record of the same topic partition as the current record as a timestamp estimation. If no timestamp can be estimated, it throws an exception.

Another built-in extractor is WallclockTimestampExtractor. This extractor does not actually “extract” a timestamp from the consumed record but rather returns the current time in milliseconds from the system clock (System.currentTimeMillis()), which effectively means Kafka Streams operates on the basis of the so-called processing-time of events.

You can also provide your own timestamp extractors, for instance to retrieve timestamps embedded in the payload of messages. If you can’t extract a valid timestamp, you can either throw an exception, return a negative timestamp, or estimate a timestamp. Returning a negative timestamp results in data loss, as the corresponding record isn’t processed, but instead, it’s dropped silently. If you want to estimate a new timestamp, you can use the value provided by previousTimestamp, that is, a Kafka Streams timestamp estimation. Here is an example of a custom TimestampExtractor implementation:

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.streams.processor.TimestampExtractor;

// Extracts the embedded timestamp of a record (giving you "event-time" semantics).
public class MyEventTimeExtractor implements TimestampExtractor {

  @Override
  public long extract(final ConsumerRecord<Object, Object> record, final long previousTimestamp) {
    // `Foo` is your own custom class, which we assume has a method that returns
    // the embedded timestamp (milliseconds since midnight, January 1, 1970 UTC).
    long timestamp = -1;
    final Foo myPojo = (Foo) record.value();
    if (myPojo != null) {
      timestamp = myPojo.getTimestampInMillis();
    }
    if (timestamp < 0) {
      // Invalid timestamp!  Attempt to estimate a new timestamp,
      // otherwise fall back to wall-clock time (processing-time).
      if (previousTimestamp >= 0) {
        return previousTimestamp;
      } else {
        return System.currentTimeMillis();
      }
    }
    return timestamp;
  }
}

You would then define the custom timestamp extractor in your Kafka Streams configuration as follows:

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

Properties streamsConfiguration = new Properties();
streamsConfiguration.put(StreamsConfig.DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG, MyEventTimeExtractor.class);

default.windowed.key.serde.inner

Deprecated in Confluent Platform 7.9. For alternatives, see Window Serdes.

The default Serializer/Deserializer class for the inner class of windowed keys. Serialization and deserialization in Kafka Streams happens whenever data needs to be materialized, for example:

Whenever data is read from or written to a Kafka topic, for example, via the StreamsBuilder#stream() and KStream#to() methods.
Whenever data is read from or written to a state store.

For more information, see Kafka Streams Data Types and Serialization for Confluent Platform.

default.windowed.value.serde.inner

Deprecated in Confluent Platform 7.9. For alternatives, see Window Serdes.

The default Serializer/Deserializer class for the inner class of windowed values. Serialization and deserialization in Kafka Streams happens whenever data needs to be materialized, for example:

Whenever data is read from or written to a Kafka topic, for example, via the StreamsBuilder#stream() and KStream#to() methods.
Whenever data is read from or written to a state store.

For more information, see Kafka Streams Data Types and Serialization for Confluent Platform.

enable.metrics.push

Kafka Streams metrics can be pushed to the brokers in the same way as client metrics, and Kafka Streams lets you enable or disable metrics push for each embedded client individually. Pushing Kafka Streams metrics requires that enable.metrics.push is enabled on the main consumer and the admin client.

ensure.explicit.internal.resource.naming

Enforces explicit naming for all internal resources of the topology, including internal topics (for example, changelog and repartition topics) and their associated state stores.

When enabled, the application refuses to start if any internal resource has an auto-generated name.

group.protocol

The group protocol used by the Kafka Streams client for coordination. It determines how the client communicates with the Kafka brokers and other clients in the same group.

The default value is classic, which is the classic consumer group protocol.

You can set it to streams to use the “streams” rebalance protocol, which must also be enabled on the brokers.

For more information, see Streams Rebalance Protocol.

log.summary.interval.ms

Controls how often summary information is logged. If the value is greater than or equal to 0, the summary is logged at that interval; if it’s less than 0, summary logging is disabled.

max.task.idle.ms

Controls how long Kafka Streams waits to fetch data to ensure in-order processing semantics.

The max.task.idle.ms setting controls whether joins and merges may produce out-of-order results. The value is the maximum amount of time, in milliseconds, that a stream task stays idle while it’s fully caught up on some, but not all, of its input partitions and is waiting for producers to send additional records. This idle time avoids potential out-of-order record processing across multiple input streams.

The default is 0. With the default, the task doesn’t wait for producers to send more records, but it does wait to fetch data that’s already present on the brokers, so Kafka Streams processes the records that are already on the brokers in timestamp order.

Set to -1 to disable idling and process any locally available data, even though doing so may produce out-of-order processing.

When processing a task that has multiple input partitions, like in a join or merge, Kafka Streams must choose which partition to process the next record from. When all input partitions have locally buffered data, Kafka Streams chooses the partition with the next record that has the lowest timestamp. This decision collates the input partitions in timestamp order, which is desirable in a streaming join or merge.

But when Kafka Streams doesn’t have any data buffered locally for one of the partitions, it can’t determine whether the next record for that partition has a lower or higher timestamp than the remaining partitions’ records.

There are two cases to consider: either there is data in the partition on the broker that Kafka Streams hasn’t fetched yet, or Kafka Streams is fully caught up with that partition, but the producers haven’t produced any new records since Kafka Streams polled the last batch.

The default value of 0 causes Kafka Streams to delay processing a task when it detects that it has no locally buffered data for a partition but there is data available on the brokers; that is, the partition is empty in the local buffer, but Kafka Streams has a non-zero lag for it. As soon as Kafka Streams catches up to the broker, it continues processing, even if one of the partitions has no data, so it doesn’t wait for new data to be produced. This default sacrifices some throughput in exchange for correct join semantics.

Setting max.task.idle.ms to a value greater than zero specifies the number of additional milliseconds that Kafka Streams waits on a caught-up but empty partition for new data to be produced, which preserves in-order processing when a producer is slow.

Setting max.task.idle.ms to -1 indicates that Kafka Streams never waits for empty partitions to receive data before choosing the next record by timestamp, which achieves maximum throughput at the expense of introducing out-of-order processing.
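The three cases above can be summarized as a small decision function. This is a simplified, hypothetical model for building intuition, not the logic Kafka Streams actually runs (which tracks more state, such as fetch metadata); TaskIdlingModel and its members are illustrative names.

```java
import java.util.List;

// Hypothetical model of the max.task.idle.ms rules described above.
// It is NOT Kafka Streams' implementation; all names are illustrative.
final class TaskIdlingModel {

    /** What the model knows about one input partition of a task. */
    static final class PartitionState {
        final boolean hasBufferedData; // records already fetched into the local buffer
        final long brokerLag;          // records on the broker that are not fetched yet

        PartitionState(boolean hasBufferedData, long brokerLag) {
            this.hasBufferedData = hasBufferedData;
            this.brokerLag = brokerLag;
        }
    }

    /**
     * Returns true if the task may pick its next record (lowest timestamp) now.
     *
     * @param maxTaskIdleMs -1 never waits; 0 waits only for data already on the brokers;
     *                      a positive value also waits up to that long on a caught-up, empty partition
     * @param idledMs       how long the task has already waited on an empty partition
     */
    static boolean readyToProcess(List<PartitionState> partitions, long maxTaskIdleMs, long idledMs) {
        boolean anyBuffered = false;
        for (PartitionState p : partitions) {
            anyBuffered |= p.hasBufferedData;
        }
        if (!anyBuffered) {
            return false; // nothing buffered at all, so there is nothing to process
        }
        if (maxTaskIdleMs < 0) {
            return true; // -1: maximum throughput, possibly out of timestamp order
        }
        for (PartitionState p : partitions) {
            if (p.hasBufferedData) {
                continue;
            }
            if (p.brokerLag > 0) {
                return false; // empty locally, but data is waiting on the broker: fetch it first
            }
            if (idledMs < maxTaskIdleMs) {
                return false; // caught up but empty: give slow producers more time
            }
        }
        return true;
    }
}
```

With the default of 0, the model waits on a partition that still has broker lag but processes immediately once that partition is caught up, while a value of 100 keeps waiting on the caught-up, empty partition until 100 ms have passed.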

max.warmup.replicas

The maximum number of warmup replicas. Warmup replicas are extra standbys, beyond the configured num.standby.replicas, that may be assigned to keep a task available on one instance while it warms up on another instance that it has been reassigned to. This setting throttles how much extra broker traffic and cluster state can be used for high availability. Increasing it enables Kafka Streams to warm up more tasks at once, which shortens the time for reassigned warmups to restore sufficient state to be transitioned to active tasks. Must be at least 1. The default is 2.

One warmup replica corresponds to one stream task. Furthermore, each warmup task can be promoted to an active task only during a rebalance, normally during a so-called “probing rebalance,” which occurs at a frequency specified by the probing.rebalance.interval.ms configuration. This means that max.warmup.replicas and probing.rebalance.interval.ms together determine the maximum rate at which active tasks can be migrated from one Kafka Streams instance to another.
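As rough arithmetic, if warmups are promoted only during probing rebalances, at most max.warmup.replicas tasks can move per probing.rebalance.interval.ms. The sketch below assumes exactly that, ignoring rebalances triggered for other reasons and the time each warmup needs to catch up; WarmupMigrationBound is a hypothetical helper. With the defaults of 2 warmup replicas and a 600000 ms (10 minute) probing interval, the bound is 12 migrations per hour.

```java
// Back-of-the-envelope upper bound on warmup-driven task migrations.
// Assumes warmups are promoted only during probing rebalances, which is a simplification.
final class WarmupMigrationBound {

    static long maxMigrations(int maxWarmupReplicas, long probingRebalanceIntervalMs, long windowMs) {
        if (maxWarmupReplicas < 1) {
            throw new IllegalArgumentException("max.warmup.replicas must be at least 1");
        }
        if (probingRebalanceIntervalMs < 60_000L) {
            throw new IllegalArgumentException("probing.rebalance.interval.ms must be at least 1 minute");
        }
        // Number of probing rebalances that fit into the window, times tasks promoted per rebalance.
        long probingRebalances = windowMs / probingRebalanceIntervalMs;
        return probingRebalances * maxWarmupReplicas;
    }
}
```

Doubling max.warmup.replicas doubles this bound, at the cost of more concurrent restoration traffic.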

metadata.max.age.ms

The period of time in milliseconds after which a refresh of metadata is forced, even in the absence of partition leadership changes, to proactively discover any new brokers or partitions. The default is 300000 ms (5 minutes).

metric.reporters

A list of classes to use as metrics reporters. Implementing the org.apache.kafka.common.metrics.MetricsReporter interface enables plugging in classes that are notified of new metric creation. The JmxReporter is always included to register JMX statistics.

metrics.num.samples

The number of samples maintained to compute metrics. Valid values are integers starting with 1. The default is 2.

metrics.recording.level

The highest recording level for metrics. Valid values are “INFO”, “DEBUG”, and “TRACE”. The default is “INFO”.

metrics.sample.window.ms

The window of time a metrics sample is computed over. The default is 30000 ms (30 seconds).

min.insync.replicas

The minimum number of in-sync replicas that must acknowledge a write for the write to be considered successful when the producer sets acks to “all” (or “-1”). For more information, see min.insync.replicas.
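To see how min.insync.replicas interacts with the replication factor of internal topics, here’s a sketch under stated assumptions. It builds a config fragment with literal keys so it needs only the JDK: replication.factor for internal topics, topic.min.insync.replicas (the form produced by StreamsConfig.topicPrefix) and producer.acks (the form produced by StreamsConfig.producerPrefix). The ResilienceSettings class and its acceptsWrites helper are hypothetical, and acceptsWrites assumes each failed broker held one replica of the partition.

```java
import java.util.Properties;

final class ResilienceSettings {

    /** A resilience-oriented fragment; merge it into your own Kafka Streams properties. */
    static Properties build() {
        Properties props = new Properties();
        // Same key as StreamsConfig.REPLICATION_FACTOR_CONFIG: three replicas for internal topics.
        props.put("replication.factor", "3");
        // "topic." prefix (StreamsConfig.topicPrefix): topic-level config for internal topics.
        props.put("topic.min.insync.replicas", "2");
        // "producer." prefix (StreamsConfig.producerPrefix): setting for the embedded producer.
        props.put("producer.acks", "all");
        return props;
    }

    /** Whether acks=all writes can still succeed, assuming one replica per failed broker. */
    static boolean acceptsWrites(int replicationFactor, int minInsyncReplicas, int failedBrokers) {
        int inSyncReplicas = replicationFactor - failedBrokers;
        return inSyncReplicas >= minInsyncReplicas;
    }
}
```

With a replication factor of 3 and min.insync.replicas of 2, acks="all" writes keep succeeding after one broker failure but are rejected after a second one.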

num.standby.replicas

The number of standby replicas. Standby replicas are shadow copies of local state stores. Kafka Streams attempts to create the specified number of replicas per store and keep them up to date as long as there are enough instances running. Standby replicas minimize the latency of task failover: a task that was running on a failed instance is preferably restarted on an instance that has a standby replica, which minimizes how much of the local state store must be restored from its changelog. For details about how Kafka Streams uses standby replicas to minimize the cost of resuming tasks on failover, see the State section.

Suggestion:

Increase the number of standbys to 1 to get near-instant failover (high availability). More standbys require more client-side storage space; for example, with 1 standby, 2x the space is required.

If you configure n standby replicas, you need to provision n+1 KafkaStreams instances.
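The sizing rules in this suggestion are simple arithmetic. A minimal sketch, with StandbyPlanning as a hypothetical helper and the key written as the literal string behind StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG:

```java
import java.util.Properties;

final class StandbyPlanning {

    /** Smallest number of KafkaStreams instances that can host one active copy plus n standbys. */
    static int minInstances(int numStandbyReplicas) {
        return numStandbyReplicas + 1;
    }

    /** Client-side state storage across the application: one active copy plus n standby copies. */
    static long totalStateBytes(long activeStateBytes, int numStandbyReplicas) {
        return activeStateBytes * (numStandbyReplicas + 1L);
    }

    static Properties withStandbys(int numStandbyReplicas) {
        Properties props = new Properties();
        props.put("num.standby.replicas", Integer.toString(numStandbyReplicas));
        return props;
    }
}
```

For example, 10 GiB of active state with one standby needs about 20 GiB of client-side storage spread over at least two instances.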

num.stream.threads

This specifies the number of stream threads in an instance of the Kafka Streams application. The stream processing code runs in these threads. For more information about the Kafka Streams threading model, see Threading model.

poll.ms

The amount of time in milliseconds to block waiting for input. The default is 100.

probing.rebalance.interval.ms

The maximum time to wait before triggering a rebalance to probe for warmup replicas that have restored enough to be considered caught up. Kafka Streams assigns stateful active tasks only to instances that are caught up and within the acceptable.recovery.lag, if any exist. Probing rebalances query the latest total lag of warmup replicas and transition them to active tasks if they’re ready. They continue to be triggered as long as there are warmup tasks, and until the assignment is balanced. Must be at least 1 minute. The default is 600000 ms (10 minutes).

processing.exception.handler

The processing exception handler enables managing exceptions triggered during the processing of a record. The implemented exception handler must return FAIL or CONTINUE, depending on the record and the exception thrown. Returning FAIL signals that Kafka Streams should shut down, and CONTINUE signals that Kafka Streams should ignore the issue and continue processing. Kafka Streams provides the following built-in exception handlers:

LogAndContinueProcessingExceptionHandler: This handler logs the processing exception and signals the processing pipeline to continue processing more records. This log-and-skip strategy allows Kafka Streams to make progress instead of failing if there are records that fail to be processed.
LogAndFailProcessingExceptionHandler: This handler logs the processing exception and signals the processing pipeline to stop processing more records.

You can also provide your own customized exception handler instead of the library-provided handlers. For example, you can choose to forward corrupted records into a quarantine topic, or “dead letter queue”, for further processing. To do this, use the Producer API to write a corrupted record directly to the quarantine topic. You can create a separate KafkaProducer object outside of the Kafka Streams client and pass this object, together with the dead letter queue topic name, in the Properties map; the handler can then retrieve both in its configure() method.

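import java.util.Map;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.streams.errors.ErrorHandlerContext;
import org.apache.kafka.streams.errors.ProcessingExceptionHandler;
import org.apache.kafka.streams.processor.api.Record;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SendToDeadLetterQueueExceptionHandler implements ProcessingExceptionHandler {
    private static final Logger log = LoggerFactory.getLogger(SendToDeadLetterQueueExceptionHandler.class);

    KafkaProducer<byte[], byte[]> dlqProducer;
    String dlqTopic;

    @Override
    public ProcessingHandlerResponse handle(final ErrorHandlerContext context,
                                            final Record<?, ?> record,
                                            final Exception exception) {

        log.warn("Exception caught during message processing, sending to the dead letter queue topic; " +
            "processor node: {}, taskId: {}, source topic: {}, source partition: {}, source offset: {}",
            context.processorNodeId(), context.taskId(), context.topic(), context.partition(), context.offset(),
            exception);

        // Assumes the failed record's key and value are byte arrays; serialize them first otherwise.
        dlqProducer.send(new ProducerRecord<>(dlqTopic, null, record.timestamp(),
            (byte[]) record.key(), (byte[]) record.value(), record.headers()));

        // Skip the failed record and keep processing.
        return ProcessingHandlerResponse.CONTINUE;
    }

    @Override
    @SuppressWarnings("unchecked")
    public void configure(final Map<String, ?> configs) {
        // "dlq.producer" and "dlq.topic" are hypothetical keys that you put into the Properties yourself.
        dlqProducer = (KafkaProducer<byte[], byte[]>) configs.get("dlq.producer");
        dlqTopic = (String) configs.get("dlq.topic");
    }
}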

The drawback of this approach is that “manual” writes are side effects that are invisible to the Kafka Streams runtime library, so they don’t benefit from the end-to-end processing guarantees of the Streams API.

processing.exception.handler.global.enabled

Deprecated. Controls whether the configured ProcessingExceptionHandler handles exceptions that occur during global store and GlobalKTable processing. When set to true, the exception handler configured by processing.exception.handler also applies to global processors. When set to false, the configured handler doesn’t handle global processing exceptions. The default is false.

Note

This configuration is deprecated and might be removed in a future release. When removed, the behavior enabled by this setting becomes the default.

Dead Letter Queue (DLQ) functionality is not supported for global stores and GlobalKTables. For global store and GlobalKTable exceptions, the record metadata does not include a DLQ topic.

For more information, see KIP-1270.

processing.guarantee

The processing guarantee to use. Possible values are at_least_once (default) and exactly_once_v2. Using exactly_once_v2 requires brokers on Confluent Platform 5.5.x (Kafka 2.5.x) or newer. If exactly-once processing is enabled, the default for commit.interval.ms changes to 100 ms. Additionally, consumers are configured with isolation.level="read_committed" and producers are configured with enable.idempotence=true by default. By default, exactly_once_v2 requires a cluster of at least three brokers, which is the suggested setting for production. For development, you can change this by setting the broker configs transaction.state.log.replication.factor and transaction.state.log.min.isr to the number of brokers you want to use. To learn more, see Processing Guarantees.
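A sketch of the commit-interval rule for this setting, assuming commit.interval.ms defaults to 30000 ms under at_least_once and 100 ms under exactly_once_v2, and that an explicitly set value always wins. ProcessingGuaranteeSketch is a hypothetical helper; the keys are the literal strings behind StreamsConfig.PROCESSING_GUARANTEE_CONFIG and StreamsConfig.COMMIT_INTERVAL_MS_CONFIG.

```java
import java.util.Properties;

final class ProcessingGuaranteeSketch {

    static Properties exactlyOnce() {
        Properties props = new Properties();
        // Same value as StreamsConfig.EXACTLY_ONCE_V2.
        props.put("processing.guarantee", "exactly_once_v2");
        return props;
    }

    /** Effective commit.interval.ms: an explicit setting wins, else the guarantee-dependent default. */
    static long effectiveCommitIntervalMs(Properties props) {
        String explicit = props.getProperty("commit.interval.ms");
        if (explicit != null) {
            return Long.parseLong(explicit);
        }
        boolean exactlyOnce =
            "exactly_once_v2".equals(props.getProperty("processing.guarantee", "at_least_once"));
        return exactlyOnce ? 100L : 30_000L;
    }
}
```

The shorter default under exactly_once_v2 keeps transactions, and therefore end-to-end latency for read_committed consumers, short.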

processor.wrapper.class

A class or class name implementing the ProcessorWrapper interface. This feature allows you to wrap any of the processors in the compiled topology, including both custom processor implementations and those created by Kafka Streams for DSL operators. This can be useful for logging or tracing implementations because it provides access to the otherwise-hidden processor context for DSL operators, and also allows for injecting additional debugging information into an entire application topology with a single configuration.

Important

This configuration must be passed in when creating the topology and does not take effect unless you pass it to the appropriate topology-building constructor. Use the StreamsBuilder(TopologyConfig) constructor for DSL applications, and the Topology(TopologyConfig) constructor for Processor API applications.

production.exception.handler

The production exception handler enables you to manage exceptions triggered when trying to interact with a broker, such as attempting to produce a record that is too large. By default, Kafka Streams provides and uses the DefaultProductionExceptionHandler that always fails when these exceptions occur.

An exception handler can return FAIL, CONTINUE, or RETRY depending on the record and the exception thrown. Returning FAIL signals that Kafka Streams should shut down. CONTINUE signals that Kafka Streams should ignore the issue and continue processing. For RetriableException, the handler can return RETRY to tell the runtime to retry sending the failed record.

If RETRY is returned for a non-RetriableException, it is treated as FAIL.

import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.RecordTooLargeException;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.errors.ErrorHandlerContext;
import org.apache.kafka.streams.errors.ProductionExceptionHandler;
import org.apache.kafka.streams.errors.ProductionExceptionHandler.ProductionExceptionHandlerResponse;

public class IgnoreRecordTooLargeHandler implements ProductionExceptionHandler {
    @Override
    public void configure(final Map<String, ?> configs) {}

    @Override
    public ProductionExceptionHandlerResponse handle(final ErrorHandlerContext context,
                                                     final ProducerRecord<byte[], byte[]> record,
                                                     final Exception exception) {
        // Skip records that the broker rejects as too large; fail on everything else.
        if (exception instanceof RecordTooLargeException) {
            return ProductionExceptionHandlerResponse.CONTINUE;
        } else {
            return ProductionExceptionHandlerResponse.FAIL;
        }
    }
}

Properties settings = new Properties();

// other various Kafka Streams settings, for example, bootstrap servers, application ID, etc.

settings.put(StreamsConfig.PRODUCTION_EXCEPTION_HANDLER_CLASS_CONFIG,
             IgnoreRecordTooLargeHandler.class);

rack.aware.assignment.non_overlap_cost

This configuration sets the cost of moving a task from the original assignment computed either by StickyTaskAssignor or HighAvailabilityTaskAssignor. Together with rack.aware.assignment.traffic_cost, they control whether the optimizer favors minimizing cross-rack traffic or minimizing the movement of tasks in the existing assignment.

If this config is set to a larger value than rack.aware.assignment.traffic_cost, the optimizer tries to maintain the existing assignment computed by the task assignor.

The optimizer uses the ratio of these two configs to decide between maintaining the existing assignment and minimizing traffic cost. For example, setting rack.aware.assignment.non_overlap_cost to 10 and rack.aware.assignment.traffic_cost to 1 is more likely to maintain the existing assignment than setting rack.aware.assignment.non_overlap_cost to 100 and rack.aware.assignment.traffic_cost to 50.

The default value is null, which means the default non_overlap_cost of the assignor in use applies.

In StickyTaskAssignor, it has a default value of 10, and rack.aware.assignment.traffic_cost has a default value of 1, which means that maintaining stickiness is preferred in StickyTaskAssignor.
In HighAvailabilityTaskAssignor, it has a default value of 1, and rack.aware.assignment.traffic_cost has a default value of 10, which means that minimizing cross-rack traffic is preferred in HighAvailabilityTaskAssignor.
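To build intuition for how the two costs trade off, here’s a deliberately simplified, hypothetical scoring model: each candidate assignment is scored as traffic_cost × (cross-rack traffic units) + non_overlap_cost × (tasks moved away from the original assignment), and the lower score wins. The real optimizer is more involved, so use RackAwareCostModel (an illustrative name) only to reason about ratios.

```java
// Hypothetical two-candidate cost comparison; not the Kafka Streams optimizer itself.
final class RackAwareCostModel {

    /** Weighted cost of one candidate assignment under the two rack-aware cost settings. */
    static long cost(long trafficCost, long nonOverlapCost, long crossRackTraffic, long movedTasks) {
        return trafficCost * crossRackTraffic + nonOverlapCost * movedTasks;
    }

    /** Returns "keep" if keeping the existing assignment is no more expensive, otherwise "move". */
    static String choose(long trafficCost, long nonOverlapCost,
                         long keepCrossRackTraffic, long moveCrossRackTraffic, long movedTasks) {
        long keep = cost(trafficCost, nonOverlapCost, keepCrossRackTraffic, 0);
        long move = cost(trafficCost, nonOverlapCost, moveCrossRackTraffic, movedTasks);
        return keep <= move ? "keep" : "move";
    }
}
```

For a scenario where keeping the assignment causes 4 units of cross-rack traffic and moving 2 tasks removes it, the StickyTaskAssignor defaults (traffic cost 1, non-overlap cost 10) score 4 versus 20 and keep the assignment, while the HighAvailabilityTaskAssignor defaults (10 and 1) score 40 versus 2 and move the tasks.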

rack.aware.assignment.strategy

This configuration sets the strategy Kafka Streams uses for rack-aware task assignment, which reduces cross-rack traffic between brokers and clients. It takes effect only when broker.rack is set on the brokers and client.rack is set on the Kafka Streams side. The possible values are:

none: The default value, which disables rack-aware task assignment.
min_traffic: The rack-aware task assignor computes an assignment that tries to minimize cross-rack traffic.
balance_subtopology: The rack-aware task assignor computes an assignment that tries to balance tasks from the same subtopology across different clients and, on top of that, minimize cross-rack traffic.

This config can be used with rack.aware.assignment.non_overlap_cost and rack.aware.assignment.traffic_cost to balance reducing cross-rack traffic and maintaining the existing assignment.

rack.aware.assignment.tags

This configuration sets a list of tag keys used to distribute standby replicas across Kafka Streams clients. When configured, Kafka Streams makes a best effort to distribute the standby tasks over clients with different tag values.

Tags for the Kafka Streams clients can be set by using the client.tag. prefix, for example:

Client-1                                   | Client-2
-------------------------------------------+-----------------------------------------
client.tag.zone: eu-central-1a             | client.tag.zone: eu-central-1b
client.tag.cluster: k8s-cluster1           | client.tag.cluster: k8s-cluster1
rack.aware.assignment.tags: zone,cluster   | rack.aware.assignment.tags: zone,cluster


Client-3                                   | Client-4
-------------------------------------------+-----------------------------------------
client.tag.zone: eu-central-1a             | client.tag.zone: eu-central-1b
client.tag.cluster: k8s-cluster2           | client.tag.cluster: k8s-cluster2
rack.aware.assignment.tags: zone,cluster   | rack.aware.assignment.tags: zone,cluster

This example shows four Kafka Streams clients across two zones (eu-central-1a, eu-central-1b) and across two clusters (k8s-cluster1, k8s-cluster2). For an active task located on Client-1, Kafka Streams allocates a standby task on Client-4, because Client-4 has a different zone and a different cluster than Client-1.
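A toy version of the placement in this example: pick the client whose tag values differ from the active client’s in the most tag keys. This greedy scoring is an illustrative assumption, not Kafka Streams’ standby assignor, which balances many tasks and clients at once; TagAwareStandbyModel is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical greedy illustration of tag-aware standby placement.
final class TagAwareStandbyModel {

    /** Picks the client whose tag values differ from the active client's in the most tag keys. */
    static String pickStandbyClient(String activeClient,
                                    Map<String, Map<String, String>> clientTags,
                                    List<String> tagKeys) {
        Map<String, String> activeTags = clientTags.get(activeClient);
        String best = null;
        int bestScore = -1;
        for (Map.Entry<String, Map<String, String>> candidate : clientTags.entrySet()) {
            if (candidate.getKey().equals(activeClient)) {
                continue; // a standby never shares a client with its active task
            }
            int differingTags = 0;
            for (String key : tagKeys) {
                if (!Objects.equals(activeTags.get(key), candidate.getValue().get(key))) {
                    differingTags++;
                }
            }
            if (differingTags > bestScore) {
                bestScore = differingTags;
                best = candidate.getKey();
            }
        }
        return best;
    }

    /** The four clients from the example (client.tag.zone and client.tag.cluster values). */
    static Map<String, Map<String, String>> exampleClients() {
        Map<String, Map<String, String>> clients = new LinkedHashMap<>();
        clients.put("Client-1", Map.of("zone", "eu-central-1a", "cluster", "k8s-cluster1"));
        clients.put("Client-2", Map.of("zone", "eu-central-1b", "cluster", "k8s-cluster1"));
        clients.put("Client-3", Map.of("zone", "eu-central-1a", "cluster", "k8s-cluster2"));
        clients.put("Client-4", Map.of("zone", "eu-central-1b", "cluster", "k8s-cluster2"));
        return clients;
    }
}
```

For an active task on Client-1 this picks Client-4, matching the explanation above; for an active task on Client-2 it picks Client-3.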

rack.aware.assignment.traffic_cost

This configuration sets the cost of cross-rack traffic. Together with rack.aware.assignment.non_overlap_cost, they control whether the optimizer favors minimizing cross-rack traffic or minimizing the movement of tasks in the existing assignment.

If this config is set to a larger value than rack.aware.assignment.non_overlap_cost, the optimizer tries to compute an assignment that minimizes the cross-rack traffic.

The optimizer uses the ratio of these two configs to decide between maintaining the existing assignment and minimizing traffic cost. For example, setting rack.aware.assignment.non_overlap_cost to 10 and rack.aware.assignment.traffic_cost to 1 is more likely to maintain the existing assignment than setting rack.aware.assignment.non_overlap_cost to 100 and rack.aware.assignment.traffic_cost to 50.

The default value is null, which means the default traffic cost of the assignor in use applies.

In StickyTaskAssignor, it has a default value of 1 and rack.aware.assignment.non_overlap_cost has a default value of 10.
In HighAvailabilityTaskAssignor, it has a default value of 10 and rack.aware.assignment.non_overlap_cost has a default value of 1.

receive.buffer.bytes

The size of the TCP receive buffer (SO_RCVBUF) to use when reading data. If the value is -1, the OS default is used. The default is 32768 (32 kibibytes).

reconnect.backoff.ms

The base amount of time to wait before attempting to reconnect to a given host. This avoids repeatedly connecting to a host in a tight loop. This backoff applies to all connection attempts by the client to a broker. The default is 50 ms.

reconnect.backoff.max.ms

The maximum amount of time in milliseconds to wait when reconnecting to a broker that has repeatedly failed to connect. If provided, the backoff per host increases exponentially for each consecutive connection failure, up to this maximum. After calculating the backoff increase, 20 percent random jitter is added to avoid connection storms. The default is 1000 ms (1 second).
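A sketch of the backoff growth described above, under two assumptions: the backoff doubles with each consecutive failure starting from reconnect.backoff.ms, and the 20 percent jitter is applied as a random factor between 0.8 and 1.2 before the cap. The exact formula in the Kafka client may differ; ReconnectBackoffModel is a hypothetical helper. With the defaults of 50 ms and 1000 ms, the un-jittered sequence is 50, 100, 200, 400, 800, then 1000 ms.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical model of per-host reconnect backoff with exponential growth and jitter.
final class ReconnectBackoffModel {

    static final double JITTER = 0.2; // 20 percent, per the description above

    /**
     * Backoff for the given 0-based reconnect attempt: baseMs doubled per attempt,
     * scaled by randomFactor, then capped at maxMs.
     */
    static long backoffMs(long baseMs, long maxMs, int attempt, double randomFactor) {
        double exponential = baseMs * Math.pow(2, attempt);
        long jittered = (long) (exponential * randomFactor);
        return Math.min(jittered, maxMs);
    }

    /** Same as backoffMs, with a random factor drawn from [1 - JITTER, 1 + JITTER). */
    static long jitteredBackoffMs(long baseMs, long maxMs, int attempt) {
        double factor = ThreadLocalRandom.current().nextDouble(1 - JITTER, 1 + JITTER);
        return backoffMs(baseMs, maxMs, attempt, factor);
    }
}
```

The jitter spreads out reconnects from many clients that lost the same broker at the same moment, which is what avoids connection storms.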

repartition.purge.interval.ms

The frequency in milliseconds with which to delete fully consumed records from repartition topics. Purging occurs after at least this value since the last purge but may be delayed until later.

Unlike commit.interval.ms, the default for this value remains unchanged when processing.guarantee is set to exactly_once_v2.

The default is 30000 ms (30 seconds).

replication.factor

This specifies the replication factor of internal topics that Kafka Streams creates when local states are used or a stream is repartitioned for aggregation. Replication is important for fault tolerance. Without replication, even a single broker failure might prevent progress of the stream processing application. Use a replication factor similar to that of your source topics.

Starting with Kafka Streams 3.0.0, the default is -1, which means internal topics use the broker’s default replication factor. For more information, see KIP-733: change Kafka Streams default replication factor config.

Suggestion: Increase the replication factor to 3 to ensure that the internal Kafka Streams topics can tolerate up to two broker failures. This requires more storage space as well: three times as much with a replication factor of 3.

request.timeout.ms

Controls the maximum amount of time the client waits for the response of a request. If the response is not received before the timeout elapses, the client resends the request if necessary or fails the request if retries are exhausted. The default is 40000 ms (40 seconds).

retries

Setting a value greater than zero causes the client to resend any request that fails with a potentially transient error. Set the value to either zero or MAX_VALUE and use corresponding timeout parameters to control how long a client should retry a request. The default is 0.

retry.backoff.ms

The amount of time to wait before attempting to retry a failed request to a given topic partition. This avoids repeatedly sending requests in a tight loop under some failure scenarios. The default is 100 ms.

rocksdb.config.setter

The RocksDB configuration. Kafka Streams uses RocksDB as the default storage engine for persistent stores. To change the default configuration for RocksDB, implement RocksDBConfigSetter and provide your custom class via rocksdb.config.setter.

Here is an example that adjusts the memory size consumed by RocksDB.

import java.util.Map;
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.Options;

// Declared as a static nested class, for example inside your application's main class.
public static class CustomRocksDBConfig implements RocksDBConfigSetter {

  // This object should be a member variable so it can be closed in RocksDBConfigSetter#close.
  private org.rocksdb.Cache cache = new org.rocksdb.LRUCache(16 * 1024L * 1024L);

  @Override
  public void setConfig(final String storeName, final Options options, final Map<String, Object> configs) {
    // See #1 below.
    BlockBasedTableConfig tableConfig = (BlockBasedTableConfig) options.tableFormatConfig();
    tableConfig.setBlockCache(cache);
    // See #2 below.
    tableConfig.setBlockSize(16 * 1024L);
    // See #3 below.
    tableConfig.setCacheIndexAndFilterBlocks(true);
    options.setTableFormatConfig(tableConfig);
    // See #4 below.
    options.setMaxWriteBufferNumber(2);
  }

  @Override
  public void close(final String storeName, final Options options) {
    // See #5 below.
    cache.close();
  }
}

Properties streamsSettings = new Properties();
streamsSettings.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG, CustomRocksDBConfig.class);

Notes for example:

1. BlockBasedTableConfig tableConfig = (BlockBasedTableConfig) options.tableFormatConfig(); Get a reference to the existing TableFormatConfig rather than creating a new one, so you don’t accidentally overwrite defaults such as the BloomFilter, an important optimization.
2. tableConfig.setBlockSize(16 * 1024L); Modify the default block size per these instructions from the RocksDB GitHub (indexes and filter blocks).
3. tableConfig.setCacheIndexAndFilterBlocks(true); Don’t let the index and filter blocks grow unbounded. For more information, see the RocksDB GitHub (caching index and filter blocks).
4. options.setMaxWriteBufferNumber(2); See the advanced options in the RocksDB GitHub.
5. cache.close(); To avoid memory leaks, you must close any objects you constructed that extend org.rocksdb.RocksObject. See the RocksJava docs for more details.

security.protocol

Protocol used to communicate with brokers. Valid values are: PLAINTEXT, SSL, SASL_PLAINTEXT, SASL_SSL.

send.buffer.bytes

The size of the TCP send buffer (SO_SNDBUF) to use when sending data. If the value is -1, the OS default is used. The default is 131072 (128 kibibytes).

state.cleanup.delay.ms

The amount of time in milliseconds to wait before deleting state when a partition has migrated. Only state directories that have not been modified for at least state.cleanup.delay.ms are removed. The default is 600000 ms (10 minutes).

state.cleanup.dir.max.age.ms

Time-based threshold for purging local state directories and checkpoint files during application startup. During startup, Kafka Streams removes state directories that have not changed for at least state.cleanup.dir.max.age.ms. This addresses scenarios where stale local state persists after changelog tombstones expire during broker-side retention. A value of -1 (the default) disables this feature.

For more information, see KIP-1259.

state.dir

The state directory. Kafka Streams persists local states under the state directory. Each application has a subdirectory on its hosting machine that is located under the state directory. The name of the subdirectory is the application ID. The state stores associated with the application are created under this subdirectory.

statestore.cache.max.bytes

Maximum number of bytes in memory to be used for statestore cache across all threads.
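A back-of-the-envelope sketch of the per-thread budget, assuming the default of 10485760 bytes (10 MiB) and an even split across stream threads, plus the global thread when the topology has GlobalKTables or global stores. Treat the global-thread share as an assumption to check against your Kafka Streams version; CacheSizing is a hypothetical helper, and the keys are the literal strings behind StreamsConfig.STATESTORE_CACHE_MAX_BYTES_CONFIG and StreamsConfig.NUM_STREAM_THREADS_CONFIG.

```java
import java.util.Properties;

final class CacheSizing {

    static final long DEFAULT_CACHE_BYTES = 10L * 1024 * 1024; // 10 MiB

    /** Approximate cache budget per thread, assuming an even split across all threads. */
    static long perThreadCacheBytes(long totalCacheBytes, int numStreamThreads, boolean hasGlobalThread) {
        int threads = numStreamThreads + (hasGlobalThread ? 1 : 0);
        return totalCacheBytes / threads;
    }

    static Properties withCache(long totalCacheBytes, int numStreamThreads) {
        Properties props = new Properties();
        props.put("statestore.cache.max.bytes", Long.toString(totalCacheBytes));
        props.put("num.stream.threads", Integer.toString(numStreamThreads));
        return props;
    }
}
```

Because the budget is shared, adding stream threads without raising statestore.cache.max.bytes shrinks each thread’s cache.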

task.assignor.class

A task assignor class or class name implementing the TaskAssignor interface. The default is the high-availability task assignor.

One possible alternative implementation provided in Kafka is the org.apache.kafka.streams.processor.assignment.assignors.StickyTaskAssignor, which was the default task assignor before KIP-441 and minimizes task movement at the cost of stateful task availability. Alternative implementations of the task-assignment algorithm can be plugged into the application by implementing a custom TaskAssignor and setting this config to the name of the custom task assignor class.

task.timeout.ms

The maximum amount of time in milliseconds a task might stall due to internal errors and retries until an error is raised. For a timeout of 0ms, a task raises an error for the first internal error. For any timeout larger than 0ms, a task retries at least once before an error is raised.

topology.optimization

Indicates that Kafka Streams should apply topology optimizations. By default, optimizations are disabled.

For production code, you should list specific optimizations in the configuration, so the structure of your topology doesn’t change unexpectedly during upgrades of the Kafka Streams library.

The optimizations include:

merge.repartition.topics: move and reduce repartition topics.
reuse.ktable.source.topics: reuse the source topic as the changelog for source KTables.
single.store.self.join: use one state store for Stream-Stream inner self-joins on the primary key. This optimization doesn’t apply to Table-Table inner self-joins or N-way self-joins.

Valid values are:

StreamsConfig.NO_OPTIMIZATION (equivalent to none)
StreamsConfig.OPTIMIZE (equivalent to all)
or a comma-separated list of specific optimizations: StreamsConfig.MERGE_REPARTITION_TOPICS, StreamsConfig.REUSE_KTABLE_SOURCE_TOPICS, StreamsConfig.SINGLE_STORE_SELF_JOIN

You must do two things to enable optimizations:

Set topology.optimization (StreamsConfig.TOPOLOGY_OPTIMIZATION_CONFIG) to StreamsConfig.OPTIMIZE or to a list of specific optimizations.
Pass your configuration properties when building your topology by using the overloaded StreamsBuilder.build(Properties) method, for example:
```
KafkaStreams myStream = new KafkaStreams(streamsBuilder.build(properties), properties);
```
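For example, to pin a specific set of optimizations as recommended for production, you could build the topology.optimization value from the individual optimization names. This minimal sketch uses the literal strings behind StreamsConfig.TOPOLOGY_OPTIMIZATION_CONFIG, StreamsConfig.MERGE_REPARTITION_TOPICS, and StreamsConfig.REUSE_KTABLE_SOURCE_TOPICS, so it needs only the JDK; OptimizationConfig is a hypothetical helper. Pass the same Properties to both StreamsBuilder.build(Properties) and the KafkaStreams constructor.

```java
import java.util.Properties;

final class OptimizationConfig {

    // String values of the corresponding StreamsConfig constants.
    static final String MERGE_REPARTITION_TOPICS = "merge.repartition.topics";
    static final String REUSE_KTABLE_SOURCE_TOPICS = "reuse.ktable.source.topics";

    /** Pins an explicit list of optimizations instead of "all", so upgrades don't add new ones. */
    static Properties pinnedOptimizations() {
        Properties props = new Properties();
        props.put("topology.optimization",
                String.join(",", MERGE_REPARTITION_TOPICS, REUSE_KTABLE_SOURCE_TOPICS));
        return props;
    }
}
```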

upgrade.from

The version you are upgrading from. It is important to set this config when performing a rolling upgrade to certain versions, as described in the Upgrade Guide.

Set this config to the appropriate version before bouncing your instances and upgrading them to the newer version. After all instances are on the newer version, remove this config and do a second rolling bounce. It’s only necessary to set this config and follow the two-bounce upgrade path when upgrading from below version 2.0 (Confluent Platform 5.0), or when upgrading to 2.4 (Confluent Platform 5.4) or later from any version lower than 2.4.
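The rule above can be encoded as a small check on Apache Kafka versions. This is a sketch of only that rule (UpgradeFromRule is hypothetical and compares major.minor versions); the Upgrade Guide may list additional cases for specific target versions, so always confirm there.

```java
// Encodes only the two-bounce rule stated above, using Apache Kafka version numbers.
final class UpgradeFromRule {

    /** Parses "major.minor"; further components such as ".1" are ignored. */
    private static int[] parse(String version) {
        String[] parts = version.split("\\.");
        return new int[] {Integer.parseInt(parts[0]), Integer.parseInt(parts[1])};
    }

    private static boolean lessThan(String a, String b) {
        int[] x = parse(a);
        int[] y = parse(b);
        return x[0] != y[0] ? x[0] < y[0] : x[1] < y[1];
    }

    /** True if upgrade.from and a two-bounce rolling upgrade are needed. */
    static boolean needsTwoBounce(String fromVersion, String toVersion) {
        boolean fromBelow20 = lessThan(fromVersion, "2.0");
        boolean crosses24 = !lessThan(toVersion, "2.4") && lessThan(fromVersion, "2.4");
        return fromBelow20 || crosses24;
    }
}
```

For example, upgrading from 2.3 to 3.0 crosses 2.4 and needs the two-bounce path, while upgrading from 2.5 to 3.9 doesn’t under this rule.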

window.size.ms

Deprecated. For alternatives, see Window Serdes.

Sets window size for the deserializer in order to calculate window end times. Setting this config in a KafkaStreams application results in an error, because it is intended to be used only from a plain consumer client.

windowed.inner.class.serde

Deprecated. For alternatives, see Window Serdes.

Serializer/deserializer for the inner class of a windowed record. Must implement the org.apache.kafka.common.serialization.Serde interface.

This config is used only by plain consumer/producer clients that set a windowed de/serializer type via configs. For Kafka Streams applications that deal with windowed types, you must pass in the inner serde type when you instantiate the windowed serde object for your topology.

windowstore.changelog.additional.retention.ms

Added to a window store's maintainMs (retention time) to ensure that data is not deleted from the changelog topic prematurely. This extra margin allows for clock drift. The default is 86400000 ms (1 day).
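As a worked example of the arithmetic, a window store that retains 7 days of data gets a changelog that retains 8 days with the default setting. The sketch assumes the changelog retention is simply the store retention plus this config; the class and method names are hypothetical.

```java
class ChangelogRetentionSketch {
    // Default of windowstore.changelog.additional.retention.ms: 1 day.
    static final long DEFAULT_ADDITIONAL_RETENTION_MS = 24L * 60 * 60 * 1000;

    // Assumed model: changelog retention = window store retention + additional retention.
    // Math.addExact throws ArithmeticException instead of silently overflowing.
    static long changelogRetentionMs(long storeRetentionMs, long additionalRetentionMs) {
        return Math.addExact(storeRetentionMs, additionalRetentionMs);
    }

    public static void main(String[] args) {
        long sevenDaysMs = 7L * 24 * 60 * 60 * 1000; // 604800000
        // 691200000 (8 days)
        System.out.println(changelogRetentionMs(sevenDaysMs, DEFAULT_ADDITIONAL_RETENTION_MS));
    }
}
```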

Kafka consumers, producer, and admin client configuration parameters

You can specify parameters for the Kafka consumers, producers, and admin client that are used internally. The consumer, producer, and admin client settings are defined by specifying parameters in a StreamsConfig instance.

In this example, the Kafka consumer session timeout is configured to be 60000 milliseconds in the Streams settings:

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.streams.StreamsConfig;

Properties streamsSettings = new Properties();
// Example of a "normal" setting for Kafka Streams
streamsSettings.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-broker-01:9092");
// Customize the Kafka consumer settings of your Streams application
streamsSettings.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 60000);

Naming

Some consumer, producer, and admin client configuration parameters use the same parameter name. For example, send.buffer.bytes and receive.buffer.bytes configure TCP buffers, and request.timeout.ms and retry.backoff.ms control timeouts and retries for client requests. To set a different value for each client, prefix the parameter name with consumer., producer., or admin. (for example, consumer.send.buffer.bytes or producer.send.buffer.bytes). An unprefixed parameter applies to every client that accepts it.

Properties streamsSettings = new Properties();
// same value for consumer and producer
streamsSettings.put("PARAMETER_NAME", "value");
// different values for consumer, producer, and admin client
streamsSettings.put("consumer.PARAMETER_NAME", "consumer-value");
streamsSettings.put("producer.PARAMETER_NAME", "producer-value");
streamsSettings.put("admin.PARAMETER_NAME", "admin-value");
// alternatively, you can use
streamsSettings.put(StreamsConfig.consumerPrefix("PARAMETER_NAME"), "consumer-value");
streamsSettings.put(StreamsConfig.producerPrefix("PARAMETER_NAME"), "producer-value");
streamsSettings.put(StreamsConfig.adminClientPrefix("PARAMETER_NAME"), "admin-value");
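To see how one unprefixed setting and one prefixed override combine, here is a simplified model of the lookup. It is an illustration, not the Kafka Streams implementation: it ignores the fact that Kafka Streams passes an unprefixed parameter only to clients that recognize it. The `effectiveValue` helper is hypothetical.

```java
import java.util.Properties;

class ClientPrefixSketch {
    // A parameter with the client's prefix ("consumer.", "producer.", or "admin.")
    // overrides the unprefixed parameter, but only for that client.
    static String effectiveValue(Properties props, String clientPrefix, String name) {
        return props.getProperty(clientPrefix + name, props.getProperty(name));
    }

    public static void main(String[] args) {
        Properties streamsSettings = new Properties();
        streamsSettings.put("send.buffer.bytes", "131072");          // all clients
        streamsSettings.put("producer.send.buffer.bytes", "262144"); // producer only

        System.out.println(effectiveValue(streamsSettings, "producer.", "send.buffer.bytes")); // 262144
        System.out.println(effectiveValue(streamsSettings, "consumer.", "send.buffer.bytes")); // 131072
        System.out.println(effectiveValue(streamsSettings, "admin.", "send.buffer.bytes"));    // 131072
    }
}
```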

To set a parameter for only one type of consumer (the main consumer, restore consumer, or global consumer), prepend the parameter name with one of the following prefixes.

main.consumer. – for the main consumer, which is the default consumer of a stream source.
restore.consumer. – for the restore consumer, which manages state store recovery.
global.consumer. – for the global consumer, which is used in global KTable construction.

A value set with one of these prefixes overrides the value set with the consumer. prefix, but only for that consumer type. For example, the following configuration overrides the consumer.max.poll.records value for the main and restore consumers.

consumer.max.poll.records = 5
main.consumer.max.poll.records = 100
restore.consumer.max.poll.records = 50

During initialization, these settings have the following effect on consumers.

Consumer Type	max.poll.records Value	Reason
Consumer	5	The `consumer.` prefix sets a default of 5 for all consumer types.
Main Consumer	100	The `main.consumer.` prefix overrides the default value.
Restore Consumer	50	The `restore.consumer.` prefix overrides the default value.
Global Consumer	5	No `global.consumer.` prefix is set, so the default value is used.
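The precedence in this table can be reproduced with a small model: a type-specific prefix beats consumer., which beats the unprefixed name. This is a sketch of the documented precedence, not the Kafka Streams code; the `resolve` helper is hypothetical.

```java
import java.util.Properties;

class ConsumerTypePrefixSketch {
    // typePrefix is "main.consumer.", "restore.consumer.", or "global.consumer.".
    static String resolve(Properties props, String typePrefix, String name) {
        String value = props.getProperty(typePrefix + name);
        if (value == null) {
            value = props.getProperty("consumer." + name);
        }
        if (value == null) {
            value = props.getProperty(name);
        }
        return value;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("consumer.max.poll.records", "5");
        props.put("main.consumer.max.poll.records", "100");
        props.put("restore.consumer.max.poll.records", "50");

        System.out.println(resolve(props, "main.consumer.", "max.poll.records"));    // 100
        System.out.println(resolve(props, "restore.consumer.", "max.poll.records")); // 50
        System.out.println(resolve(props, "global.consumer.", "max.poll.records"));  // 5
    }
}
```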

To configure only the restore consumer without changing the settings of the other consumers, use the restore.consumer. prefix.

Properties streamsSettings = new Properties();
// same config value for all consumer types
streamsSettings.put("consumer.PARAMETER_NAME", "general-consumer-value");
// set a different value for the restore consumer only: the restore consumer uses restore-consumer-value,
// while the main consumer and global consumer keep general-consumer-value
streamsSettings.put("restore.consumer.PARAMETER_NAME", "restore-consumer-value");
// alternatively, you can use
streamsSettings.put(StreamsConfig.restoreConsumerPrefix("PARAMETER_NAME"), "restore-consumer-value");

Internal topic parameters

To configure the internal repartition/changelog topics, you can use the topic. prefix, followed by any of the standard topic configuration properties.

Properties streamsSettings = new Properties();
// Override default for both changelog and repartition topics
streamsSettings.put("topic.PARAMETER_NAME", "topic-value");
// alternatively, you can use
streamsSettings.put(StreamsConfig.topicPrefix("PARAMETER_NAME"), "topic-value");

Default values

Kafka Streams uses different default values for some of the underlying client configurations, which are summarized below. For detailed descriptions of these configurations, see Kafka Producer Configuration Reference for Confluent Platform and Kafka Consumer Configuration Reference for Confluent Platform.

Parameter Name	Corresponding Client	Streams Default
auto.offset.reset	Consumer	earliest
client.id	Producer	`<application.id>-<random-UUID>`
linger.ms	Producer	100
max.poll.records	Consumer	1000

If exactly-once semantics (EOS) are enabled, the following parameters have different default values.

Parameter Name	Corresponding Client	Streams Default
transaction.timeout.ms	Producer	10000
delivery.timeout.ms	Producer	`Integer.MAX_VALUE`
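For reference, EOS is turned on through the processing.guarantee parameter, which this section does not list; the sketch assumes Kafka 3.0 or later, where `exactly_once_v2` is the literal value of `StreamsConfig.EXACTLY_ONCE_V2`. It also shows an optional, producer-only override of the EOS default for transaction.timeout.ms; 30000 is an arbitrary example value.

```java
import java.util.Properties;

class ExactlyOnceSketch {
    static Properties eosProps() {
        Properties props = new Properties();
        props.put("application.id", "my-first-streams-application");
        props.put("bootstrap.servers", "kafka-broker1:9092");
        // Enables EOS version 2 (literal value of StreamsConfig.EXACTLY_ONCE_V2).
        props.put("processing.guarantee", "exactly_once_v2");
        // Optional: replace the EOS default transaction.timeout.ms of 10000,
        // for the producer only, by using the producer. prefix.
        props.put("producer.transaction.timeout.ms", "30000");
        return props;
    }

    public static void main(String[] args) {
        Properties props = eosProps();
        System.out.println(props.getProperty("processing.guarantee"));            // exactly_once_v2
        System.out.println(props.getProperty("producer.transaction.timeout.ms")); // 30000
    }
}
```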

Parameters controlled by Kafka Streams

Kafka Streams assigns values to the following configuration parameters. If you set allow.auto.create.topics, your value is ignored and has no effect in a Kafka Streams application. You can set the other parameters, but Kafka Streams gives them default values that differ from those of a plain KafkaConsumer.

Kafka Streams uses the client.id parameter to compute derived client IDs for internal clients. If you don’t set client.id, Kafka Streams sets it to <application.id>-<random-UUID>.

There are some special considerations when Kafka Streams assigns values to configuration parameters.

There is only one global consumer per Kafka Streams instance.
There is one restore consumer per thread.
Producer client.id: the value depends on the configured processing guarantee.
- EOS disabled or EOS version 2 enabled: There is only one producer per thread.
- EOS version 1 enabled: There is only one producer per task.
partition.assignment.strategy: the assignment strategy parameter affects only the main consumer. The global and restore consumers use manual partition assignment instead of topic subscription and don't form a consumer group, so StreamsPartitionAssignor is never used for them.
This parameter is not supported when the consumer group protocol (group.protocol=consumer) is enabled. The partition.assignment.strategy configuration applies only when you use the classic consumer group protocol (group.protocol=classic), which is the default.

Parameter Name	Corresponding Client	Value Assigned by Kafka Streams
allow.auto.create.topics	Consumer	`false`
auto.offset.reset	Global Consumer	none
auto.offset.reset	Restore Consumer	none
client.id	Admin	`<client.id>-admin`
client.id	Consumer	`<client.id>-StreamThread-<threadIndex>-consumer`
client.id	Global Consumer	`<client.id>-global-consumer`
client.id	Restore Consumer	`<client.id>-StreamThread-<threadIndex>-restore-consumer`
client.id	Producer	EOS v1: `<client.id>-StreamThread-<threadIndex>-<taskId>-producer`; non-EOS and EOS v2: `<client.id>-StreamThread-<threadIndex>-producer`
enable.auto.commit	Consumer	`false`
group.id	Consumer	Equal to `application.id`.
group.id	Global Consumer	`null`
group.id	Restore Consumer	`null`
group.instance.id	Consumer	User-provided setting with the `-<threadIndex>` suffix appended.
partition.assignment.strategy	Consumer	Always set to `StreamsPartitionAssignor`.
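The derived IDs in this table follow simple string patterns. The helpers below are hypothetical and only format those patterns; the client ID `my-client`, the thread index 1, and the task ID `0_1` are example values.

```java
class DerivedClientIdSketch {
    static String mainConsumerId(String clientId, int threadIndex) {
        return clientId + "-StreamThread-" + threadIndex + "-consumer";
    }

    static String restoreConsumerId(String clientId, int threadIndex) {
        return clientId + "-StreamThread-" + threadIndex + "-restore-consumer";
    }

    static String globalConsumerId(String clientId) {
        return clientId + "-global-consumer";
    }

    static String adminId(String clientId) {
        return clientId + "-admin";
    }

    // EOS v1 creates one producer per task, so the task ID is part of the name;
    // without EOS or with EOS v2 there is one producer per thread.
    static String producerId(String clientId, int threadIndex, String taskId, boolean eosV1) {
        String threadPrefix = clientId + "-StreamThread-" + threadIndex;
        return eosV1 ? threadPrefix + "-" + taskId + "-producer" : threadPrefix + "-producer";
    }

    // The user-provided group.instance.id gets the thread index as a suffix.
    static String groupInstanceId(String userGroupInstanceId, int threadIndex) {
        return userGroupInstanceId + "-" + threadIndex;
    }

    public static void main(String[] args) {
        String clientId = "my-client";
        System.out.println(mainConsumerId(clientId, 1));          // my-client-StreamThread-1-consumer
        System.out.println(producerId(clientId, 1, "0_1", true)); // my-client-StreamThread-1-0_1-producer
        System.out.println(producerId(clientId, 1, null, false)); // my-client-StreamThread-1-producer
        System.out.println(groupInstanceId("instance-a", 1));     // instance-a-1
    }
}
```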

enable.auto.commit

Whether the consumer commits offsets automatically. To guarantee at-least-once processing semantics, Kafka Streams turns off auto commits by overriding this consumer config value to false. The consumers commit only explicitly, through commitSync calls, when the Kafka Streams library or a user decides to commit the current processing state.
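The override behavior can be pictured as copying the user's consumer settings and then forcing the values that Kafka Streams controls. This is a simplified model, not the Kafka Streams implementation, and the `mainConsumerProps` helper is hypothetical; it covers only the two parameters this section describes as overridden or ignored.

```java
import java.util.Properties;

class AutoCommitOverrideSketch {
    static Properties mainConsumerProps(Properties userSettings) {
        Properties props = new Properties();
        props.putAll(userSettings);
        // Forced values: the user's choices for these two parameters are discarded.
        props.put("enable.auto.commit", "false");
        props.put("allow.auto.create.topics", "false");
        return props;
    }

    public static void main(String[] args) {
        Properties user = new Properties();
        user.put("enable.auto.commit", "true");
        user.put("max.poll.records", "500");

        Properties effective = mainConsumerProps(user);
        System.out.println(effective.getProperty("enable.auto.commit")); // false
        System.out.println(effective.getProperty("max.poll.records"));   // 500
    }
}
```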

Note

This website includes content developed at the Apache Software Foundation under the terms of the Apache License v2.