.. _ksql_faq:

Frequently Asked Questions for |ksqldb| in |cp|
===============================================

==================================
What are the benefits of |ksqldb|?
==================================

|ksqldb| allows you to query, read, write, and process data in |ak-tm| in real-time and at scale using intuitive SQL-like syntax. |ksqldb| does not require proficiency with a programming language such as Java or Scala, and you don’t have to install a separate processing cluster technology.

================================================
What are the technical requirements of |ksqldb|?
================================================

|ksqldb| only requires:

1. A Java runtime environment
2. Access to an Apache Kafka cluster for reading and writing data in real-time. The cluster can be on-premises or in the cloud. |ksqldb| works with clusters running vanilla Apache Kafka as well as with clusters running the Kafka versions included in Confluent Platform.

We recommend the use of `Confluent Platform `__ or `Confluent Cloud `__ for running Apache Kafka.

====================================================
Is |ksqldb| owned by the Apache Software Foundation?
====================================================

No, |ksqldb| is owned and maintained by `Confluent Inc. `__ as part of its `Confluent Platform `__ product. However, |ksqldb| is licensed under the Confluent Community License.

========================================================
How does |ksqldb| compare to Apache Kafka’s Streams API?
========================================================

|ksqldb| is complementary to the Kafka Streams API, and indeed executes queries through Kafka Streams applications.
They share some similarities, such as flexible deployment models, so you can integrate them easily into your existing technical and organizational processes and tooling, regardless of whether you have opted for containers, VMs, bare-metal machines, cloud services, or on-premises environments.

One of the key benefits of |ksqldb| is that it does not require the user to develop any code in Java or Scala. This enables users to leverage a SQL-like interface alone to construct streaming ETL pipelines, to respond to real-time, continuous business requests, to spot anomalies, and more. |ksqldb| is a great fit when your processing logic can be naturally expressed through SQL.

For full-fledged stream processing applications, Kafka Streams remains a more appropriate choice. For example, implementing a finite state machine that is driven by streams of data is easier to achieve in a programming language such as Java or Scala than in SQL. In Kafka Streams you can also choose between the :ref:`DSL ` (a functional programming API) and the :ref:`Processor API ` (an imperative programming API), and even combine the two.

As with many technologies, each has its sweet spot based on technical requirements, mission-criticality, and user skillset.

===========================================================================================================================
Does |ksqldb| work with vanilla Apache Kafka clusters, or does it require the Kafka version included in Confluent Platform?
===========================================================================================================================

|ksqldb| works with both vanilla Apache Kafka clusters and the Kafka versions included in Confluent Platform.

================================================================
Does |ksqldb| support Kafka’s exactly-once processing semantics?
================================================================

Yes, |ksqldb| supports exactly-once processing, which means it will compute correct results even in the face of failures such as machine crashes. This behavior can be configured with the ``processing.guarantee`` setting. For more information, see :ksqldb-docs:`Enable exactly-once semantics|operate-and-deploy/exactly-once-semantics/#enable-exactly-once-semantics`.

==================================================================
Can I use |ksqldb| with my favorite data format (e.g. JSON, Avro)?
==================================================================

|ksqldb| currently supports the following formats:

- DELIMITED (e.g. comma-separated value)
- JSON
- Avro message values are supported. Avro keys are not yet supported. Requires |sr| and ``ksql.schema.registry.url`` in the |ksqldb| server configuration file. For more information, see :ksqldb-docs:`Configure ksqlDB for Avro|operate-and-deploy/installation/server-config/avro-schema/`.
- KAFKA (for example, a ``BIGINT`` that's serialized using Kafka's standard ``LongSerializer``).

See :ksqldb-docs:`Serialization Formats|developer-guide/serialization/#serialization-formats` for more details.

==========================================
Is |ksqldb| fully compliant with ANSI SQL?
==========================================

|ksqldb| is a dialect inspired by ANSI SQL. It has some differences because it is geared toward processing streaming data. For example, ANSI SQL has no notion of “windowing” for use cases such as performing aggregations on data grouped into 5-minute windows, which is commonly required in the streaming world.

==========================================
How do I shut down a |ksqldb| environment?
==========================================

Exit the |ksqldb| CLI:

.. code:: bash

    ksql> exit

If you're running with Confluent CLI, use the ``confluent stop`` command:

.. code:: bash

    confluent stop ksql

If you're running |ksqldb| in Docker containers, stop the ``cp-ksqldb-server`` container:

.. code:: bash

    docker stop

If you're running |ksqldb| as a system service, use the ``systemctl stop`` command:

.. code:: bash

    sudo systemctl stop confluent-ksql

For more information on shutting down |cp|, see :ref:`installation-overview`.

============================================
How do I configure the target Kafka cluster?
============================================

Define ``bootstrap.servers`` in the |ksqldb| server configuration. For more information, see :ksqldb-docs:`Configure ksqlDB Server|operate-and-deploy/installation/server-config/`.

.. _add-ksql-servers:

==============================================================
How do I add |ksqldb| servers to an existing |ksqldb| cluster?
==============================================================

You can add or remove |ksqldb| servers during live operations. |ksqldb| servers that have been configured to use the same Kafka cluster (``bootstrap.servers``) and the same |ksqldb| service ID (``ksql.service.id``) form a given |ksqldb| cluster.

To add a |ksqldb| server to an existing |ksqldb| cluster, configure the server with the same ``bootstrap.servers`` and ``ksql.service.id`` settings as the |ksqldb| cluster it should join. For more information, see :ksqldb-docs:`Configure ksqlDB Server|operate-and-deploy/installation/server-config/` and :ksqldb-docs:`Scaling ksqlDB|operate-and-deploy/capacity-planning/#scaling-ksqldb`.

==========================================================================================
How can I lock down |ksqldb| servers for production and prevent interactive client access?
==========================================================================================

You can configure your servers to run a set of predefined queries by using ``ksql.queries.file`` or the ``--queries-file`` command line flag. For more information, see :ksqldb-docs:`Configure ksqlDB Server|operate-and-deploy/installation/server-config/`.
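
As a rough sketch of the settings mentioned above, a |ksqldb| server properties file for a server that runs predefined queries might contain entries like the following. The host names, service ID, and file path are placeholders chosen for illustration, not values from this documentation.

.. code:: properties

    # Kafka cluster the server connects to (placeholder host names).
    bootstrap.servers=broker1:9092,broker2:9092
    # Servers that share this ID and the same Kafka cluster form one ksqlDB cluster (placeholder value).
    ksql.service.id=example_service_
    # File containing the predefined queries for the server to run (placeholder path).
    ksql.queries.file=/path/to/queries.sql
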

====================================================
How do I use Avro data and integrate with |sr-long|?
====================================================

Configure the ``ksql.schema.registry.url`` property in the |ksqldb| server configuration to point to |sr| (see :ksqldb-docs:`Configure ksqlDB for Avro|operate-and-deploy/installation/server-config/avro-schema/`).

.. important::

    - To use Avro data with |ksqldb| you must have |sr| installed. This is included by default with |cp|.
    - Avro message values are supported. Avro keys are not yet supported.

=============================
How can I scale out |ksqldb|?
=============================

The maximum parallelism depends on the number of partitions.

- To scale out: start additional |ksqldb| servers with the same configuration. This can be done during live operations. See :ref:`add-ksql-servers`.
- To scale in: stop the desired running |ksqldb| servers, but keep at least one server running. This can be done during live operations. The remaining servers should have sufficient capacity to take over work from stopped servers.

.. tip::

    Idle servers consume a small amount of resources. For example, if you have 10 |ksqldb| servers and run a query against a two-partition input topic, only two servers perform the actual work, but the other eight run an "idle" query.

=========================================================
Can |ksqldb| connect to an Apache Kafka cluster over TLS?
=========================================================

Yes. Internally, |ksqldb| uses standard Kafka consumers and producers. The procedure to securely connect |ksqldb| to Kafka is the same as connecting any app to Kafka. For more information, see :ksqldb-docs:`Configure ksqlDB for Secured Apache Kafka clusters|operate-and-deploy/installation/server-config/security/#configure-ksqldb-for-secured-apache-kafka-clusters`.
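
Because |ksqldb| uses standard Kafka clients, the TLS settings are the usual Kafka client properties. The following is a minimal sketch for the |ksqldb| server configuration file, assuming a JKS truststore. The host name, truststore path, and password are placeholders, not values from this documentation.

.. code:: properties

    # TLS listener of the Kafka cluster (placeholder host and port).
    bootstrap.servers=broker1:9093
    # Encrypt traffic between ksqlDB and the brokers.
    security.protocol=SSL
    # Truststore holding the certificate that signed the broker certificates (placeholder path and password).
    ssl.truststore.location=/path/to/kafka.client.truststore.jks
    ssl.truststore.password=<truststore-password>
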

=====================================================================================
Can |ksqldb| connect to an Apache Kafka cluster over TLS and authenticate using SASL?
=====================================================================================

Yes. Internally, |ksqldb| uses standard Kafka consumers and producers. The procedure to securely connect |ksqldb| to Kafka is the same as connecting any app to Kafka. For more information, see :ksqldb-docs:`Configure Kafka Authentication|operate-and-deploy/installation/server-config/security/#configure-kafka-authentication`.

=================================
Will |ksqldb| work with |ccloud|?
=================================

Yes. Running |ksqldb| against an |ak| cluster running in the cloud is straightforward. For more information, see :cloud:`Connecting ksqlDB to Confluent Cloud|cp-component/ksql-cloud-config.html`.

You can also run fully managed |ksqldb| in |ccloud|. For more information, see :cloud:`Create streaming queries in Confluent Cloud ksqlDB|quickstart/ksql.html`.

=========================================================================
Will |ksqldb| work with an Apache Kafka cluster secured using Kafka ACLs?
=========================================================================

Yes. For more information, see :ksqldb-docs:`Configure Authorization of ksqlDB with Kafka ACLs|operate-and-deploy/installation/server-config/security/#configure-authorization-of-ksqldb-with-kafka-acls`.

===========================================
Will |ksqldb| work with an HTTPS |sr-long|?
===========================================

Yes. |ksqldb| can be configured to communicate with |sr-long| over HTTPS. For more information, see :ksqldb-docs:`Configure ksqlDB for Secured Confluent Schema Registry|operate-and-deploy/installation/server-config/security/#configure-ksqldb-for-secured-confluent-schema-registry`.

====================================================
Where are |ksqldb|-related data and metadata stored?
====================================================

In interactive mode, |ksqldb| stores metadata in, and builds metadata from, the |ksqldb| command topic. To secure the metadata, you must secure the command topic.

The |ksqldb| command topic stores all data definition language (DDL) statements: CREATE STREAM, CREATE TABLE, DROP STREAM, and DROP TABLE. The command topic also stores TERMINATE statements, which stop persistent queries based on CREATE STREAM AS SELECT (CSAS) and CREATE TABLE AS SELECT (CTAS).

Currently, data manipulation language (DML) statements, like UPDATE and DELETE, aren't available.

In headless mode, |ksqldb| stores metadata in the config topic. The config topic stores the |ksqldb| properties provided to |ksqldb| when the application was first started. |ksqldb| uses these configs to ensure that your |ksqldb| queries are built compatibly on every restart of the server.

===================================================
Which |ksqldb| queries read or write data to Kafka?
===================================================

SHOW STREAMS and EXPLAIN statements run against the |ksqldb| server that the |ksqldb| client is connected to. They don't communicate directly with Kafka.

CREATE STREAM WITH and CREATE TABLE WITH write metadata to the |ksqldb| command topic.

Persistent queries based on CREATE STREAM AS SELECT and CREATE TABLE AS SELECT read and write to Kafka topics.

Non-persistent queries based on SELECT that are stateless only read from Kafka topics, for example, ``SELECT … FROM foo WHERE …``.

Non-persistent queries that are stateful read and write to Kafka, for example, queries that use COUNT or JOIN. The data in Kafka is deleted automatically when you terminate the query with CTRL-C.

===============================================
How do I check the health of a |ksqldb| server?
===============================================

Use the ``ps`` command to check whether the |ksqldb| server process is running, for example:

.. code:: bash

    ps -aux | grep ksql

Your output should resemble:

.. code:: bash

    jim 2540 5.2 2.3 8923244 387388 tty2 Sl 07:48 0:33 /usr/lib/jvm/java-8-oracle/bin/java -cp /home/jim/confluent-5.0.0/share/java/monitoring-interceptors/* ...

If the process status of the JVM isn't ``Sl`` or ``Ssl``, the |ksqldb| server may be down.

If you're running |ksqldb| server in a Docker container, run the ``docker ps`` or ``docker-compose ps`` command, and check that the status of the ``ksql-server`` container is ``Up``. Check the health of the process in the container by running ``docker logs`` for that container.

Check runtime stats for the |ksqldb| server that you're connected to.

- Run SHOW STREAMS or SHOW TABLES, then run DESCRIBE EXTENDED.
- Run SHOW QUERIES, then run EXPLAIN on a listed query.

The |ksqldb| REST API supports a "server info" request (for example, ``http:///info``), which returns info such as the |ksqldb| version. For more info, see :ksqldb-docs:`REST API Index|developer-guide/api/`.

===============================================
What if automatic topic creation is turned off?
===============================================

If automatic topic creation is disabled, |ksqldb| and Kafka Streams applications continue to work. |ksqldb| and Kafka Streams applications use the Admin Client to create the topics they need, so topics are still created.
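
For reference, automatic topic creation is controlled on the Kafka brokers by the ``auto.create.topics.enable`` setting. A broker configuration with it turned off would contain a line like the following, shown here for illustration only:

.. code:: properties

    # Kafka broker setting; ksqlDB still creates the topics it needs through the Admin Client.
    auto.create.topics.enable=false
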