Create Stream Processing Apps with ksqlDB¶
ksqlDB is fully hosted on Confluent Cloud and provides a simple, scalable, resilient, and secure event streaming platform. Sign up for Confluent Cloud to get started.
Confluent Platform ksqlDB
ksqlDB for Confluent Platform is packaged as part of Confluent Platform. This is a commercial component of Confluent Platform. ksqlDB for Confluent Platform includes enterprise features, like role-based access control.
Confluent Cloud ksqlDB¶
ksqlDB is a database purpose-built to help developers create stream processing applications on top of Apache Kafka®. Confluent Cloud provides a fully managed solution for creating and managing ksqlDB clusters.
You can provision ksqlDB clusters by using the Confluent Cloud Console or the Confluent CLI.
For a guided tour, sign up or sign in to the Confluent Cloud Console and follow the in-product tutorial: Get started with ksqlDB.
For more information, see Section 2: Add ksqlDB to the cluster.
Supported Features for ksqlDB in Confluent Cloud¶
- Web interface for managing your ksqlDB cloud environment directly from your browser that exposes all critical ksqlDB information.
- SQL editor to write, develop, and execute SQL queries with autocompletion directly from the web interface.
- Integration with Confluent Cloud Schema Registry, so you can use your existing schemas in your SQL queries.
- SQL-based Connect integration.
- Available in AWS, GCP, and Azure in all regions.
- Private networking with Private Link.
New Features for ksqlDB in Confluent Cloud¶
For the latest features in ksqlDB, see the
Limitations for ksqlDB in Confluent Cloud¶
- Currently, user-defined functions (UDFs, UDAFs, and UDTFs) aren’t supported. For more information, see Functions.
- You can have a maximum of 40 persistent queries per cluster.
- You can have a maximum of 10 ksqlDB clusters per environment. For more information, see Step 1: Create a ksqlDB cluster in Confluent Cloud.
- Pull queries have specific limitations in Confluent Cloud. For more information, see Pull queries in Confluent Cloud.
- You can’t create more than 100 push queries.
- Fully managed ksqlDB is not available on Enterprise clusters. Only self-managed ksqlDB is supported.
Schema context limitations¶
Common use cases with cluster linking work with ksqlDB, because cluster linking yields unique topic names. Schema Registry infers the correct context from the topic name when the context isn’t provided.
Schema contexts can be leveraged if the following are true:
- Topic names are unique across Schema Registry contexts, and
- TopicNameStrategy is used to look up Schema Registry subjects.
ksqlDB may not be able to discover the correct schema when contexts are used in these cases:
- Topic names are not unique across Schema Registry contexts, and/or
- The ksqlDB stream or table configures an explicit schema ID, using the
Global cache limit¶
Each ksqlDB cluster has a global cache limit of 1 GB, and each query defaults to 10 MB of cache usage.
If you have many queries, you may need to set the per-query cache size to 1 GB / number_of_queries so that all of your queries together don’t exceed the limit.
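The cache budget described above is simple arithmetic, and a minimal sketch can make it concrete. The function and constant names below are hypothetical and are not ksqlDB settings. The sketch also assumes that "1 GB" and "10 MB" are binary units (2^30 and 10 × 2^20 bytes), which the text does not state.

```python
# Assumption: "1 GB" is read as 2**30 bytes and "10 MB" as 10 * 2**20 bytes.
GLOBAL_CACHE_LIMIT_BYTES = 1024 ** 3
DEFAULT_QUERY_CACHE_BYTES = 10 * 1024 ** 2


def per_query_cache_bytes(number_of_queries: int) -> int:
    """Largest per-query cache size that keeps the total within the global limit."""
    if number_of_queries < 1:
        raise ValueError("number_of_queries must be at least 1")
    # Integer division rounds down, so the sum never exceeds the limit.
    return GLOBAL_CACHE_LIMIT_BYTES // number_of_queries


def default_fits(number_of_queries: int) -> bool:
    """True if every query can keep the default cache size within the global limit."""
    return number_of_queries * DEFAULT_QUERY_CACHE_BYTES <= GLOBAL_CACHE_LIMIT_BYTES


if __name__ == "__main__":
    for n in (4, 40, 200):
        print(n, per_query_cache_bytes(n), default_fits(n))
```

Under these assumptions, the default per-query size only becomes a problem once the number of queries times 10 MB exceeds the 1 GB limit. Below that point, the division mainly tells you how much headroom each query could be given.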
To request a limit increase, submit a support request.
Pricing for ksqlDB in Confluent Cloud¶
The unit of pricing in Confluent Cloud ksqlDB is the Confluent Streaming Unit (CSU). A CSU is an abstract unit of capacity, and performance scales approximately linearly with the number of CSUs. For example, if a workload gets a certain level of throughput with 4 CSUs, you can expect about three times the throughput with 12 CSUs.
Confluent charges you in CSUs per hour.
You select the number of CSUs for your cluster at provisioning time. You can configure CSUs as follows:
- 1 CSU is the minimum.
- 28 CSUs is the maximum.
- Cluster sizes can be 1, 2, 4, 8, 12, 16, 20, 24, or 28 CSUs.
- Clusters with 1 or 2 CSUs can be scaled up to 4. For clusters with 4 CSUs or more, you can scale up to 28 CSUs in 4-CSU increments.
- Clusters with 8 or more CSUs are configured automatically for high availability.
- High availability cannot be enabled for clusters with fewer than 8 CSUs.
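The sizing rules in this list can be sketched as a few small checks. This is an illustrative sketch only: the function names are hypothetical, nothing here calls a Confluent API, and the throughput estimate simply applies the approximately linear scaling described above.

```python
# Cluster sizes listed in this section.
VALID_CSU_SIZES = (1, 2, 4, 8, 12, 16, 20, 24, 28)
# Clusters at or above this size are configured for high availability.
HA_MIN_CSUS = 8


def is_valid_csu_size(csus: int) -> bool:
    """True if the CSU count is one of the listed cluster sizes."""
    return csus in VALID_CSU_SIZES


def is_highly_available(csus: int) -> bool:
    """True if a cluster of this size is configured for high availability."""
    if not is_valid_csu_size(csus):
        raise ValueError(f"{csus} is not a supported cluster size")
    return csus >= HA_MIN_CSUS


def relative_throughput(from_csus: int, to_csus: int) -> float:
    """Approximate throughput ratio, assuming linear scaling with CSUs."""
    if not (is_valid_csu_size(from_csus) and is_valid_csu_size(to_csus)):
        raise ValueError("both sizes must be supported cluster sizes")
    return to_csus / from_csus


if __name__ == "__main__":
    print(is_valid_csu_size(6))        # 6 is not in the list of sizes
    print(is_highly_available(8))
    print(relative_throughput(4, 12))
```

The ratio is an estimate, not a guarantee, because real workloads depend on query complexity and throughput as well as CSU count.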
Confluent Streaming Unit pricing varies slightly depending on cloud provider and region. Pricing is shown in the web interface as part of ksqlDB provisioning so that you can see exact pricing for your cloud and region of choice.
Scaling CSUs for ksqlDB in Confluent Cloud¶
Scaling cluster CSUs after initial provisioning is not currently supported as a self-service option. Contact your Confluent team to scale the cluster (up or down).
Alternatively, if your cluster requires more CSUs, you can provision a new cluster with the desired number of CSUs, and migrate to your new one.
Sizing guidelines for ksqlDB in Confluent Cloud¶
The number of CSUs needed for your cluster depends on the workload, including the number of queries, query complexity, and throughput. The amount of resources allocated to a cluster is proportional to the number of CSUs defined for the cluster.
Four CSUs are sufficient for many workloads. In general, start with four CSUs and scale out if more capacity is needed, or scale down if less capacity is needed. To identify when more CSUs are needed, check the ksqlDB consumer lag or the CSU Saturation metric. For more information, see Step 9: Monitor persistent queries.
After your ksqlDB cluster is provisioned, you can change its number of CSUs on your own only by migrating to a new cluster with the desired number of CSUs.
CSUs and storage¶
You get 125 GB of storage space with each CSU. For 8 CSUs and higher, clusters are configured with high availability automatically, and ksqlDB maintains replicas of your data that use half of the available storage. For 4 CSUs and 8 CSUs, you get 500 GB of user-available storage, so to expand storage past 500 GB, you must expand to at least 12 CSUs.
The following table shows how storage is provisioned for ksqlDB clusters.
| Number of CSUs | Storage | User-available storage |
|---|---|---|
| 1 | 125 GB | 125 GB |
| 2 | 250 GB | 250 GB |
| 4 | 500 GB | 500 GB |
| 8 | 1000 GB | 500 GB* |
| 12-28 | 1500 GB | 750 GB* |
* High-availability clusters with half of storage used for data replicas.
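The storage table can be expressed as a small lookup, sketched below. The function name is hypothetical. The sketch deliberately encodes the table rows as written rather than deriving values from the 125 GB per CSU figure, because the table lists a single storage value for the whole 12-28 CSU range.

```python
from typing import Dict, Tuple

# (total storage GB, user-available storage GB), taken from the table rows.
_STORAGE_BY_CSUS: Dict[int, Tuple[int, int]] = {
    1: (125, 125),
    2: (250, 250),
    4: (500, 500),
    8: (1000, 500),  # high availability: half of storage holds replicas
}
# The table lists one row for 12-28 CSUs.
for _csus in (12, 16, 20, 24, 28):
    _STORAGE_BY_CSUS[_csus] = (1500, 750)


def storage_for(csus: int) -> Tuple[int, int]:
    """Return (total GB, user-available GB) for a supported cluster size."""
    try:
        return _STORAGE_BY_CSUS[csus]
    except KeyError:
        raise ValueError(f"{csus} is not a supported cluster size") from None


if __name__ == "__main__":
    for size in (4, 8, 12):
        total, usable = storage_for(size)
        print(f"{size} CSUs: {total} GB total, {usable} GB user-available")
```

The lookup shows why moving from 4 to 8 CSUs adds redundancy but no user-available storage: both sizes give 500 GB, and 12 CSUs is the first size that gives more.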
When your cluster has 8 CSUs or more, ksqlDB is configured for high availability automatically, for both processing and storage. If your Kafka cluster is configured for multiple zones, the ksqlDB cluster is also deployed across multiple zones.
For 8 or more CSUs:
- Storage is redundant (2x).
- ksqlDB has multi-node deployment, with 4 CSUs per node.
- When the Kafka cluster is multi-zone, ksqlDB nodes are placed in multiple zones.
For fault tolerance and faster failure recovery, standby replication is enabled for clusters with 8 CSUs or more. With standby replication, a ksqlDB server, besides being the active server for some partitions, also acts as the standby server for other partitions. In this mode, a ksqlDB server subscribes to the changelog topic partition of the active server and continuously replicates updates to its own copy of the table partition. Pull queries that would otherwise fail are routed to other active or standby servers that host the same partition.
The unavailability window during failures is dominated by the table restoration time and the failure detection time. Standby replication significantly reduces the table restoration time. The failure detection time is directly related to the Kafka Streams consumer configuration, and with default settings used in managed ksqlDB, failure detection can take as long as 10–15 seconds.
For more information, see Highly Available, Fault-Tolerant Pull Queries in ksqlDB.
Exclude row data in the ksqlDB processing log¶
With ksqlDB, you can exclude row data from the processing log when queries error out by setting the ksql.logging.processing.rows.include config to false. With ksqlDB in Confluent Cloud, this config is set to true by default, and row data is written to the log. You can exclude row data by using the Confluent Cloud Console or the Confluent CLI.
Configure with Confluent Cloud Console¶
When you create a new ksqlDB cluster, select Advanced under Configuration options and toggle “Hide row data in processing log”.
Configure with Confluent Cloud CLI¶
confluent ksql cluster create <cluster-name> --api-key <api-key> --api-secret <secret> --log-exclude-rows