High Availability in ksqlDB for Confluent Platform¶
When you run pull queries, it’s often the case that you need your data to remain available for querying even if one server fails. Because ksqlDB supports clustering, it can remain highly available to support pull queries on replicas of your data, even in the face of partial cluster failures.
High availability is turned off by default, but you can enable it with the following server configuration parameters. These parameters must be turned on for all nodes in your ksqlDB cluster.
- Set
ksql.streams.num.standby.replicas
to a value greater than
0
. - Set
ksql.query.pull.enable.standby.reads
to
true
. - Set
ksql.heartbeat.enable
to
true
. - Set
ksql.lag.reporting.enable
to
true
.
In addition, make sure that all of your nodes identify as part of the same cluster by setting ksql.service.id to the same value.
Note
In Confluent Cloud, ksqlDB clusters with 8 or 12 CSUs are automatically configured for high availability. High availability can’t be enabled for Confluent Cloud clusters with fewer than 8 CSUs.
Controlling consistency¶
Because ksqlDB replicates data between its servers asynchronously, you
may want to bound the potential staleness that your query will tolerate.
You can control this per pull query by using the
ksql.query.pull.max.allowed.offset.lag
parameter. For instance, a value of 10,000
means that results of
pull queries forwarded to servers whose current offset is more than
10,000
positions behind the end offset of the changelog topic are
rejected.
Compatability with authentication¶
Set the authentication.skip.paths
config with both /lag
and
/heartbeat
. This enables ksqlDB cluster instances to communicate
without authenticating between each other.
This configuration prevents the following error.
ksqldb-server1 | [...] ERROR Failed to handle request 401 /heartbeat (io.confluent.ksql.api.server.FailureHandler:38)
ksqldb-server1 | io.confluent.ksql.api.server.KsqlApiException: Unauthorized
For security reasons, this configuration is recommended to block the traffic from outside your cluster to those endpoints.