This tutorial demonstrates a simple workflow using ksqlDB to write streaming queries against messages in Apache Kafka®.
Write Streaming Queries with the ksqlDB CLI
    ===========================================
    =       _              _ ____  ____       =
    =      | | _____  __ _| |  _ \| __ )      =
    =      | |/ / __|/ _` | | | | |  _ \      =
    =      |   <\__ \ (_| | | |_| | |_) |     =
    =      |_|\_\___/\__, |_|____/|____/      =
    =                   |_|                   =
    =  Event Streaming Database purpose-built =
    =        for stream processing apps       =
    ===========================================

    Copyright 2017-2020 Confluent Inc.

    CLI v6.0.15, Server v6.0.15 located at http://localhost:8088

    Having trouble? Type 'help' (case-insensitive) for a rundown of how things work!

    ksql>
Get started with the ksqlDB CLI:
Common use cases
Use ksqlDB to implement solutions for these common use cases.
Create and query a set of materialized views about phone calls made to a call center. This tutorial demonstrates capturing changes from a MySQL database, forwarding them into Kafka, creating materialized views with ksqlDB, and querying them from your applications.
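The idea behind a materialized view can be sketched outside ksqlDB: the view is an aggregate that is updated incrementally as each event arrives, so reads never rescan the full history. The following is a minimal plain-Python illustration of that idea, not ksqlDB code; the event fields (`name`, `duration_seconds`) and the function `apply_event` are hypothetical names chosen for this sketch.

```python
from collections import defaultdict


def apply_event(view, event):
    """Fold one call event into the per-caller view (incremental update)."""
    stats = view[event["name"]]
    stats["total_calls"] += 1
    stats["total_minutes"] += event["duration_seconds"] // 60
    return view


# The "materialized view": one running aggregate per caller.
view = defaultdict(lambda: {"total_calls": 0, "total_minutes": 0})

# Hypothetical call events, in arrival order.
events = [
    {"name": "derek", "duration_seconds": 540},
    {"name": "michael", "duration_seconds": 2020},
    {"name": "derek", "duration_seconds": 180},
]

for event in events:
    apply_event(view, event)

# Querying the view is a key lookup, not a scan over all events.
print(dict(view["derek"]))
```

In ksqlDB, the equivalent aggregate would be declared in SQL and kept up to date by the server; the sketch only shows why lookups against such a view are cheap.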
Streaming ETL pipeline
Create a streaming ETL pipeline that ingests and joins events together to create a cohesive view of orders that have shipped. This tutorial demonstrates capturing changes from Postgres and MongoDB databases, forwarding them into Kafka, joining them together with ksqlDB, and sinking them out to Elasticsearch for analytics.
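The join step can be pictured as a lookup performed while events arrive: order events update a keyed store, and each shipment event is matched against it to produce an enriched record. The plain-Python sketch below illustrates only that semantics. It is not ksqlDB code, it ignores join windows and out-of-order data, and the field names and the function `enrich_shipments` are assumptions made for the example.

```python
def enrich_shipments(events):
    """Process ("order" | "shipment", payload) events in arrival order."""
    orders_by_id = {}    # latest known state of each order
    shipped_orders = []  # joined output records
    for kind, payload in events:
        if kind == "order":
            orders_by_id[payload["order_id"]] = payload
        elif kind == "shipment":
            order = orders_by_id.get(payload["order_id"])
            if order is not None:  # inner join: skip shipments with no known order
                shipped_orders.append({**order, **payload})
    return shipped_orders


# Hypothetical interleaved events from the two sources.
events = [
    ("order", {"order_id": 1, "customer": "c1", "price": 25.0}),
    ("order", {"order_id": 2, "customer": "c2", "price": 40.0}),
    ("shipment", {"order_id": 1, "shipment_id": "s1", "warehouse": "west"}),
    ("shipment", {"order_id": 3, "shipment_id": "s2", "warehouse": "east"}),
]

shipped = enrich_shipments(events)
print(shipped)
```

Only order 1 appears in the output: order 2 has not shipped, and the shipment for order 3 has no matching order.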
Clickstream Data Analysis Pipeline
Clickstream analysis is the process of collecting, analyzing, and reporting aggregate data about which pages a website visitor visits and in what order. The path the visitor takes through a website is called the clickstream.
This tutorial focuses on building real-time analytics of users to determine:
- General website analytics, such as hit count and visitors
- Bandwidth use
- Mapping user-IP addresses to actual users and their location
- Detection of high-bandwidth user sessions
- Error-code occurrence and enrichment
- Sessionization to track user sessions and understand behavior (such as per-user-session bandwidth and per-user-session hits)
The tutorial uses standard streaming functions (such as min and max) and enrichment using child tables, stream-table joins, and different types of windowing functionality.
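Sessionization groups each user's events into sessions separated by periods of inactivity, and then aggregates per session. The plain-Python sketch below illustrates that grouping; it is not ksqlDB code. The event layout `(user, timestamp_seconds, bytes)`, the 30-second gap, and the choice to treat a gap exactly equal to the limit as the same session are all assumptions made for this example.

```python
def sessionize(events, gap_seconds):
    """Group (user, timestamp, bytes) events into per-user sessions.

    A new session starts when the time since that user's previous event
    exceeds gap_seconds.
    """
    sessions = {}  # user -> list of session dicts, oldest first
    for user, ts, nbytes in sorted(events, key=lambda e: e[1]):
        user_sessions = sessions.setdefault(user, [])
        if user_sessions and ts - user_sessions[-1]["end"] <= gap_seconds:
            current = user_sessions[-1]  # extend the open session
            current["end"] = ts
            current["hits"] += 1
            current["bytes"] += nbytes
        else:
            user_sessions.append({"start": ts, "end": ts, "hits": 1, "bytes": nbytes})
    return sessions


# Hypothetical click events: (user, timestamp in seconds, bytes transferred).
events = [
    ("alice", 0, 500),
    ("alice", 10, 1500),
    ("bob", 12, 300),
    ("alice", 100, 700),
]

sessions = sessionize(events, gap_seconds=30)
print(sessions["alice"])
```

Here `alice` ends up with two sessions because 90 seconds pass between her second and third events, which gives per-user-session hits and bandwidth directly from the `hits` and `bytes` fields.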
Get started now with these instructions:
If you don’t have Docker, you can also run an automated version of the Clickstream tutorial designed for local Confluent Platform installs. Running the Clickstream demo locally without Docker requires that you have Confluent Platform installed locally, along with Elasticsearch and Grafana.
These examples provide common ksqlDB usage operations:
You can configure Java streams applications to deserialize and ingest data in multiple ways, including Kafka console producers, JDBC source connectors, and Java client producers. For full code examples, see Pipelining with Kafka Connect and Kafka Streams.
ksqlDB in a Kafka Streaming ETL
To learn how to deploy a Kafka streaming ETL using ksqlDB for stream processing, you can run the Confluent Platform demo. All components in the Confluent Platform demo have encryption, authentication, and authorization configured end-to-end.
|Confluent Platform 5.5: What’s New in ksqlDB||Overview of new features in Confluent Platform ksqlDB.|
|ksqlDB Demo: The Event Streaming Database in Action||Build a movie-rating system by using a connector with ksqlDB.|
Level Up Your KSQL Videos
|KSQL Introduction||Intro to Kafka stream processing, with a focus on KSQL.|
|KSQL Use Cases||Describes several KSQL use cases, like data exploration, arbitrary filtering, streaming ETL, anomaly detection, and real-time monitoring.|
|KSQL and Core Kafka||Describes KSQL dependency on core Kafka, relating KSQL to clients, and describes how KSQL uses Kafka topics.|
|Installing and Running KSQL||How to get KSQL, configure and start the KSQL server, and syntax basics.|
|KSQL Streams and Tables||Explains the difference between a STREAM and TABLE, shows a detailed example, and explains how streaming queries are unbounded.|
|Reading Kafka Data from KSQL||How to explore Kafka topic data, create a STREAM or TABLE from a Kafka topic, and identify fields. Also explains metadata like ROWTIME and TIMESTAMP, and covers different formats like Avro, JSON, and Delimited.|
|Streaming and Unbounded Data in KSQL||More detail on streaming queries, how to read topics from the beginning, the differences between persistent and non-persistent queries, and how streaming queries end.|
|Enriching data with KSQL||Scalar functions, changing field types, filtering data, merging data with JOIN, and rekeying streams.|
|Aggregations in KSQL||How to aggregate data with KSQL, different types of aggregate functions like COUNT, SUM, MAX, MIN, and TOPK, plus windowing and out-of-order data.|
|Taking KSQL to Production||How to use KSQL in streaming ETL pipelines, scale query processing, isolate workloads, and secure your entire deployment.|
|Insert Into||A brief tutorial on how to use INSERT INTO in KSQL by Confluent.|
|Struct (Nested Data)||A brief tutorial on how to use STRUCT in KSQL by Confluent.|
|Stream-Stream Joins||A short tutorial on stream-stream joins in KSQL by Confluent.|
|Table-Table Joins||A short tutorial on table-table joins in KSQL by Confluent.|
|Monitoring KSQL in Confluent Control Center||Monitor performance and end-to-end message delivery of your KSQL queries.|