Track Data with Stream Lineage on Confluent Cloud

To move forward with updates to mission-critical applications, or to answer questions about important subjects like data regulation and compliance, teams need an easy way to understand the big-picture journey of their data in motion.

Stream lineage provides a graphical UI of event streams and data relationships with both a bird’s eye view and drill-down magnification for answering questions like:

  • Where did data come from?
  • Where is it going?
  • Where, when, and how was it transformed?

Answers to questions like these allow developers to trust the data they’ve found, and gain the visibility needed to make sure their changes won’t cause any negative or unexpected downstream impact. Developers can learn and make decisions quickly with live metrics and metadata inspection embedded directly within lineage graphs.

Note

If you are working with secured data:

  • The following tutorial assumes that you have role-based access to the clusters and topics you need.
  • The tutorial assumes that you have role-based access to stream lineage. If you do not have this access, stream lineage will not show up as an option on any screen.
  • Developer roles do not grant access to Stream Lineage. You can give developers access to Stream Lineage by granting them additional roles scoped to their need to know, such as the Operator role at the cluster level. To learn more, see Access control (RBAC) for Stream Lineage and Role-based Access Control (RBAC) in Confluent Cloud.

First look

What stream lineage shows

Stream lineage in Confluent Cloud is represented visually to show the movement of data from source to destination, and how it is transformed as it moves. The lineage graph always shows the activity of producers and consumers of data for the last 10 minutes.

How to access stream lineage views

There are multiple ways to get into the stream lineage view, as described in Summary of navigation paths. This example shows one path.

To view the stream lineage UIs:

  1. Log in to the Confluent Cloud Console.

  2. Select an environment.

  3. Select a cluster.

  4. Select a topic.

  5. Click See in Stream Lineage on the top right of the topic page.

    Tip

    If you do not see the option See in Stream Lineage, then you do not have the required permissions. To learn more, see Access control (RBAC) for Stream Lineage.

  6. The stream lineage for that topic is shown.

../_images/dg-dl-overview.png

The stream lineage shown in this example is the result of setting up a data pipeline based on several ksqlDB query streams. If you haven’t set up a data pipeline yet, your lineage view may only show a single, lonely event node.

To get an interesting lineage like the one shown above, take a spin through the tutorial in the next section!

Summary of navigation paths

You can get into a stream lineage view from any of the following paths and resources on the Cloud Console:

  • On the left menu from anywhere within a cluster on the Cloud Console, click Stream Lineage.

    ../_images/dg-dl-left-menu-lineage.png
  • From inside a topic, click See in Stream Lineage on the top right of the topic page.

    ../_images/dg-dl-left-view-lineage-from-topic.png
  • From a ksqlDB table or stream, click See in Stream Lineage on the top right of the table or stream page.

    ../_images/dg-dl-left-view-lineage-from-ksqlDB-stream.png
  • From a client, such as a producer or consumer, click See in Stream Lineage on the top right of the client page.

    ../_images/dg-dl-left-view-lineage-from-client.png
  • From a connector, click See in Stream Lineage on the top right of the connector page.

    ../_images/dg-dl-left-view-lineage-from-connector.png

Tutorial

In order to really see stream lineage in action, you need to configure topics, producers, and consumers to create a data pipeline. Once you have events flowing into your pipeline, you can use stream lineage to inspect where data is coming from, what transformations are applied to it, and where it’s going.

Select an environment, cluster, and Schema Registry

  1. Add an environment or select an existing one.

  2. Add a cluster or select an existing one on which to run the demo.

    If you create a new cluster:

    • You must select a cluster type. You can choose any cluster type.
    • Choose a cloud provider and region.
    • Click Continue to review the configuration and costs, usage limits, and uptime service level agreement (SLA).

    Then click Launch cluster.

  3. Enable Schema Registry (if it is not already enabled) by navigating to the Schemas page for your cluster and following the prompts to choose a cloud provider and region.

    ../_images/dg-dl-sr-create.png
  4. The Schema Registry settings and information will be available on the Schema Registry tab for the environment.

    ../_images/dg-dl-cluster-settings.png
  5. Generate and save an API key and secret for this Schema Registry. (Save the key; you will provide it later when you run the CLI consumer.)

If you need help with these initial steps, see Quick Start for Confluent Cloud.

Create the “stocks” topic and generate data

  1. (Optional) Create a topic named stocks.

    Tip

    • This step is optional because adding the Datagen connector (as described in the next steps) will automatically create the stocks topic if it does not exist.
    • To learn more about manually creating topics and working with them, see Work With Topics in Confluent Cloud.
  2. Choose Connectors from the menu and select the Datagen source connector.

  3. Add the Connect Datagen source connector to generate sample data to the stocks topic, using these settings:

    • Select the Stock trades template
    • Select AVRO as the output record format

    You’ll also need to generate and save an API key and secret for this cluster, if you have not done so already.

    ../_images/dg-dl-datagen-setup.png
  4. Click Continue and review sizing.

    ../_images/dg-dl-datagen-setup-02.png
  5. Click Continue, and review or update the configuration.

    • Here you can rename the connector; for example, StockSource Connector.
    ../_images/dg-dl-datagen-setup-03.png
  6. Click Continue to start sending data to the target topic.

    ../_images/dg-dl-datagen-setup-02.png

    The connector first shows as Provisioning, then Running when it is fully initiated.

Create a ksqlDB app

  1. Navigate to ksqlDB
  2. Click Create cluster myself.
  3. Select Global access and click Continue.
  4. Provide a cluster name, such as ksqlDB_stocks, and accept the defaults for cluster size.
  5. Click Launch cluster.

Tip

  • Provisioning will take some time. In some cases, it can take up to an hour.
  • By creating the ksqlDB app with global access, you avoid having to create specific ACLs for the app itself. With global access, the ksqlDB cluster is running with the same level of access to Kafka as the user who provisions ksqlDB. If you are interested in learning how to manage ACLs on a ksqlDB cluster with granular access, see Appendix A: Creating a ksqlDB app with granular access and assigning ACLs.

Verify your ksqlDB app is running

Return to the list of ksqlDB apps on the Cloud Console.

Your ksqlDB app should have completed Provisioning, and show a status of Up.

../_images/dg-dl-ksql-app-up.png

Create persistent streams in ksqlDB to filter on stock prices

Navigate to the ksqlDB Editor and click into your ksqlDB app, ksqlDB_stocks (ksqlDB_stocks > Editor), to create the following persistent streams.

Specify each query statement in the Editor and click Run query to start the query. You can click the Streams tab to view a list of running queries.

  1. Create a stream for the stocks topic, then create a persistent stream that filters for stocks with price <= 100. This feeds the results to the stocks_under_100 topic.

    You’ll need to specify and run three separate queries for this step: first create the stocks stream, then the filtered stream, then a transient query to inspect it. After each statement, click Run query, then clear the editor before entering the next one.

    CREATE STREAM stocks WITH (KAFKA_TOPIC = 'stocks', VALUE_FORMAT = 'AVRO');
    
    CREATE STREAM stocks_under_100 WITH (KAFKA_TOPIC='stocks_under_100', PARTITIONS=10, REPLICAS=3) AS SELECT * FROM stocks WHERE (price <= 100);
    
    SELECT * FROM stocks_under_100 EMIT CHANGES;
    

    When you have these running, click the Streams tab. You should have two new streams, STOCKS and STOCKS_UNDER_100. (The last statement is a transient query on the stream, STOCKS_UNDER_100, to get some data onto the UI.)

  2. Create a persistent stream that filters on stocks to BUY, and feed the results to the stocks_buy topic.

    You’ll need to specify and run two separate queries for this step. After each of these, click Run query, then clear the editor to specify the next statement.

    CREATE STREAM stocks_buy WITH (KAFKA_TOPIC='stocks_buy', PARTITIONS=10, REPLICAS=3) AS SELECT * FROM stocks WHERE side='BUY';
    
    SELECT * FROM stocks_buy EMIT CHANGES;
    
  3. Create a persistent stream that filters on stocks to SELL.

    You’ll need to specify and run two separate queries for this step. After each of these, click Run query, then clear the editor to specify the next statement.

    CREATE STREAM stocks_sell WITH (KAFKA_TOPIC='stocks_sell', PARTITIONS=10, REPLICAS=3) AS SELECT * FROM stocks WHERE side='SELL';
    
    SELECT * FROM stocks_sell EMIT CHANGES;
    

When you have completed these steps, click the ksqlDB > Streams tab. You should have four persistent ksqlDB query streams producing data to their associated topics:

  • STOCKS
  • STOCKS_BUY
  • STOCKS_SELL
  • STOCKS_UNDER_100
../_images/dg-dl-ksqldb-streams-all.png

The associated topics and schemas will be listed on those pages, respectively. Here is an example of the Topics page.

../_images/dg-dl-topics.png

Consume events from the “stocks” topic

Now, set up a consumer using the Confluent CLI to consume events from your stocks topic.

Tip

Got Confluent CLI? Make sure it’s up-to-date.

  1. Log on using the Confluent CLI. (Provide username and password at prompts.)

    confluent login --url https://confluent.cloud
    
  2. List the environments to verify you are on the right environment.

    confluent environment list
    
  3. If needed, re-select the environment you’ve been using for this demo.

    confluent environment use <ENVIRONMENT_ID>
    
  4. List the clusters to verify you are on the right cluster.

    confluent kafka cluster list
    
  5. If needed, re-select the cluster you’ve been using for this demo.

    confluent kafka cluster use <KAFKA_CLUSTER_ID>
    
  6. Create Kafka API credentials for the consumer.

    Create an API key.

    confluent api-key create --resource <KAFKA_CLUSTER_ID>
    

    Use the API key.

    confluent api-key use <API_KEY> --resource <KAFKA_CLUSTER_ID>
    

    Alternatively, you can store the key.

    confluent api-key store --resource <KAFKA_CLUSTER_ID>
    
  7. Run a CLI consumer.

    confluent kafka topic consume stocks_buy --value-format avro --group buy_group
    
  8. When prompted, provide the Schema Registry API key you generated in the first steps.

    You should see data being delivered to the consumer at the command line, for example:

    My-MacBook-Pro:~ my$ confluent kafka topic consume stocks_buy --value-format avro --group buy_group
    Starting Kafka Consumer. ^C or ^D to exit
    {"SIDE":{"string":"BUY"},"QUANTITY":{"int":959},"SYMBOL":{"string":"ZVZZT"},"PRICE":{"int":704},"ACCOUNT":{"string":"XYZ789"},"USERID":{"string":"User_8"}}
    {"ACCOUNT":{"string":"ABC123"},"USERID":{"string":"User_1"},"SIDE":{"string":"BUY"},"QUANTITY":{"int":1838},"SYMBOL":{"string":"ZWZZT"},"PRICE":{"int":405}}
    {"QUANTITY":{"int":2163},"SYMBOL":{"string":"ZTEST"},"PRICE":{"int":78},"ACCOUNT":{"string":"ABC123"},"USERID":{"string":"User_8"},"SIDE":{"string":"BUY"}}
    {"PRICE":{"int":165},"ACCOUNT":{"string":"LMN456"},"USERID":{"string":"User_2"},"SIDE":{"string":"BUY"},"QUANTITY":{"int":4675},"SYMBOL":{"string":"ZJZZT"}}
    {"QUANTITY":{"int":1702},"SYMBOL":{"string":"ZJZZT"},"PRICE":{"int":82},"ACCOUNT":{"string":"XYZ789"},"USERID":{"string":"User_7"},"SIDE":{"string":"BUY"}}
    {"ACCOUNT":{"string":"LMN456"},"USERID":{"string":"User_9"},"SIDE":{"string":"BUY"},"QUANTITY":{"int":2982},"SYMBOL":{"string":"ZVV"},"PRICE":{"int":643}}
    {"SIDE":{"string":"BUY"},"QUANTITY":{"int":3687},"SYMBOL":{"string":"ZJZZT"},"PRICE":{"int":514},"ACCOUNT":{"string":"ABC123"},"USERID":{"string":"User_5"}}
    {"USERID":{"string":"User_5"},"SIDE":{"string":"BUY"},"QUANTITY":{"int":289},"SYMBOL":{"string":"ZJZZT"},"PRICE":{"int":465},"ACCOUNT":{"string":"XYZ789"}}
    ...
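Each field in the output above is wrapped in an Avro union envelope, such as {"string":"BUY"} or {"int":959}, which is how Avro's JSON encoding renders union-typed fields. If you post-process the consumer output, you may want to strip those envelopes. The following Python sketch is illustrative only (the unwrap helper is not part of any Confluent tooling); it unwraps two of the sample records shown above:

```python
import json

def unwrap(record):
    """Strip single-entry Avro union envelopes, e.g. {"string": "BUY"} -> "BUY"."""
    return {field: next(iter(value.values())) if isinstance(value, dict) else value
            for field, value in record.items()}

# Two of the sample records from the consumer output above.
lines = [
    '{"SIDE":{"string":"BUY"},"QUANTITY":{"int":959},"SYMBOL":{"string":"ZVZZT"},"PRICE":{"int":704},"ACCOUNT":{"string":"XYZ789"},"USERID":{"string":"User_8"}}',
    '{"QUANTITY":{"int":2163},"SYMBOL":{"string":"ZTEST"},"PRICE":{"int":78},"ACCOUNT":{"string":"ABC123"},"USERID":{"string":"User_8"},"SIDE":{"string":"BUY"}}',
]
records = [unwrap(json.loads(line)) for line in lines]
for r in records:
    print(r["SYMBOL"], r["PRICE"])  # e.g. ZVZZT 704
```

A record such as the first line becomes a flat dict: {"SIDE": "BUY", "QUANTITY": 959, ...}.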
    

Explore the data pipeline in stream lineage

Stream data quick tour

With the producers and consumers up and running, you can use stream lineage to visualize and explore the flow of data from the source connector to the STOCKS topic, where queries filter the data on specified limits and generate lists to your three topics:

  • STOCKS_BUY
  • STOCKS_SELL
  • STOCKS_UNDER_100

  1. Search for the stocks topic in the search box.

    ../_images/dg-dl-search-topic.png
  2. Click See in Stream Lineage on the top right of the stocks topic page.

    The stream lineage for the stocks topic is shown.

    ../_images/dg-dl-overview.png
  3. Hover over a node for a high-level description of the data source and throughput.

    • This example shows a ksqlDB query node:

      ../_images/dg-dl-on-query-hover.png

      The thumbnail in this case shows:

      • Mode and type: persistent stream
      • Total number of bytes in and out of the flow for the last 10 minutes
      • Total number of messages in and out of the flow for the last 10 minutes
    • This example shows a topic node:

      ../_images/dg-dl-on-topic-hover.png

      The thumbnail in this case shows:

      • Topic name
      • Schema format (can be Avro, Protobuf, or JSON schema)
      • Number of partitions for the topic
      • Total number of bytes into the topic during the last 10 minutes
      • Total number of messages received by the topic in the last 10 minutes
  4. Click a node to inspect.

    ../_images/dg-dl-on-topic-drilldown.png
  5. Return to the diagram, and hover on an edge to get a description of the flow between the given nodes.

    ../_images/dg-dl-on-edge-hover.png
  6. Click the edge to inspect.

    ../_images/dg-dl-on-edge-drilldown.png

Tabs on node drilldown to inspect queries

The stream lineage inspect panel surfaces details and metrics about the queries based on the nodes you select. The tabs available and details shown will vary, depending on the query. For example:

  • Overview tab - Shows per topic throughput, along with bytes consumed and produced.

    ../_images/dg-dl-topic-tabs-overview.png
  • Messages tab - Shows the list of messages the topic received.

    ../_images/dg-dl-topic-tabs-messages.png
  • Schema tab - Shows a view-only copy of the schema for the topic. An editable version is available directly from the topic (see Manage Schemas in Confluent Cloud).

    ../_images/dg-dl-topic-tabs-schema.png
  • Query tab - Shows a view-only copy of the persistent query that is sending results to the topic. (For details on stream processing, see ksqlDB Stream Processing.)

    ../_images/dg-dl-topic-tabs-query.png

View and navigate options

From anywhere on the stream lineage view:

  • Click the tab on the left side of the lineage view at any time to show/hide a navigation panel.

    ../_images/dg-dl-left-nav.png

From a drilldown on a ksqlDB query, within the lineage tabs:

  • Click View query at the top of the tabs view to jump directly to the ksqlDB stream associated with the persistent ksqlDB query.

  • Click the tab handle to the left of the tab view to expand the tab to full.

    ../_images/dg-dl-stream-tabs-expand.png

In addition to the above, the lineage view provides options to link directly into topics, schemas, queries, and connectors at various points in the UI.

Try this

  • Click the stocks topic node, and scroll through the message throughput timelines on the Overview tab, then click Edit topic to go directly to the topic.
  • Click the stocks_buy topic node, then click the Schema tab to view its associated schema.
  • Click a query, such as stocks_buy query, and click the Schema tab. This shows you a menu style view of the same schema because the schema associated with the stocks_buy topic is coming from the stocks_buy query.
  • To verify this, click View query to open the ksqlDB_stocks app, click the Flow tab under that app, and click stocks_buy on that diagram. (Note that you can also visualize the data flow for that particular query from directly within the ksqlDB app, but not the combined flows of all queries to all topics, as shown in stream lineage.)

Hide or show internal topics

From any stream lineage graph view, you have the option to hide or show internal (system) topics. System topics are those that manage and track Confluent Cloud metadata, such as replication factors, partition counts, and so forth. Typically, this system metadata is of less interest than data related to your own topics, and you’ll want to hide it.

../_images/dg-dl-hide-internal-topics.png
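Conceptually, the hide/show toggle is a name-based filter. The sketch below is purely illustrative (visible_topics is a hypothetical helper, not Confluent code), and it assumes the common Kafka convention that internal topic names start with an underscore, as in __consumer_offsets:

```python
def visible_topics(topics, show_internal=False):
    """Filter out internal (system) topics, which conventionally start with an underscore."""
    if show_internal:
        return list(topics)
    return [t for t in topics if not t.startswith("_")]

topics = ["stocks", "stocks_buy", "__consumer_offsets", "_confluent-monitoring"]
print(visible_topics(topics))  # only the stocks topics remain
```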

Browsing the diagram view

Set the diagram to a point in time (point in time lineage)

What is it?

Point-in-time lineage lets you visualize the flow of data at a point in time in the past. The Essentials package only lets users visualize the flow of data from the past 10 minutes. With the point-in-time feature, available only in the Advanced package, a user can choose to see the last 10 minutes, 30 minutes, 1 hour, 4 hours, 8 hours, 12 hours, or 24 hours, or a one-hour window on any of the last 7 days.

(Screenshot: point-in-time stream lineage)

Why is it important?

The flow of data is not always continuous: you might ingest data at a regular, non-real-time cadence, or an issue might interrupt the flow. In those cases, looking at the last 10 minutes might not show anything in Stream Lineage. Another important use case is troubleshooting a potential data breach, where you want to navigate to a point in time and understand who all the data consumers were at that moment.

This blog post explains the use of point-in-time lineage and the main use cases for Stream Lineage: How to Visualize Your Apache Kafka Data the Easy Way with Stream Lineage

How to use it

By default, graphs represent the last 10 minutes of data flowing through the system. You can navigate and search the graphs in this default time window, or set a specific time window. These settings apply to all data on the cluster, whether that data is currently on-screen or not.

../_images/dg-dl-point-in-time.png

Pre-set windows are available for:

  • Last 10 minutes
  • Last 30 minutes
  • Last 1 hour
  • Last 4 hours
  • Last 8 hours
  • Last 12 hours
  • Last 24 hours (maximum size of a pre-set time window)
../_images/dg-dl-point-in-time-windows.png

You can also set a custom date and time window for your search, going back 7 days for a selected 1 hour block.

../_images/dg-dl-custom-point-in-time.png

The graphs, nodes, and available data will change depending on the selected time window. For example, a custom setting to show data only from last Friday from 6:00-7:00 AM will not show streams created later in the week. Similarly, the graph search is dependent on the time window setting, and will not find data that isn’t available in the current time window.
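The window rules above amount to: a pre-set window of up to 24 hours ending now, or a custom one-hour block starting no more than 7 days back. Here is a small illustrative Python sketch of the custom-window constraint (valid_custom_window is a hypothetical helper, not a Confluent API):

```python
from datetime import datetime, timedelta

def valid_custom_window(start: datetime, now: datetime) -> bool:
    """A custom window is a one-hour block that starts no more than 7 days ago
    and ends no later than now (so start <= now - 1 hour)."""
    return now - timedelta(days=7) <= start <= now - timedelta(hours=1)

now = datetime(2024, 1, 8, 12, 0)
print(valid_custom_window(datetime(2024, 1, 5, 6, 0), now))    # within the last 7 days -> True
print(valid_custom_window(datetime(2023, 12, 20, 6, 0), now))  # too far back -> False
```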

Export a lineage diagram

To export the current diagram, click the Export icon image_export on the lower right tool panel.

../_images/dg-dl-export-diagram.png

Reset the view

To reset the view to center on the entity that is the original focus of the diagram, click the Reset icon image_reset_view on the lower right tool panel.

../_images/dg-dl-reset-view.png

Reset view is only applicable when you launch the lineage diagram from within an entity, such as a topic, ksqlDB table or stream, producer, consumer, and so forth. It is not applicable if you launch the lineage diagram from the left menu or dashboard because that is a global view, not centered on any specific node to begin with.

Zoom in or out

Use the + and - buttons on the lower right tool panel to zoom in or zoom out on the lineage diagram.

../_images/dg-dl-zoom-diagram.png

Traverse the Diagram

To explore the diagram, click, hold, and drag the cursor, or use analogous actions such as three-finger drag on a Mac trackpad.

All streams

Click All Streams on the lower right of a diagram to view cards representing the data flows.

../_images/dg-dl-all-streams.png

The default view shows Stream 1.

../_images/dg-dl-all-streams-1.png

Click another card to focus in on a particular stream, for example Stream 2. The diagram updates to show only the selected stream.

../_images/dg-dl-all-streams-2.png

Understanding data nodes

Consumers and producers are automatically grouped; that is, a group of consumers or producers is represented as a single node that expands upon drilldown to show the client IDs.

Node Description
node-topic

A topic node shows:

  • Topic name and link to the topic
  • Associated schemas (key and value)
  • Number of partitions
  • Total throughput as a time series line graph
  • Bytes consumed and produced per app, and per partition
  • Messages consumed and produced per query, which is ingress and egress for the stream
node-customApp

A custom application node provides:

  • Name of the application, and its status
  • Total throughput as a time series line graph
  • Bytes produced and consumed per topic
  • Drilldowns to show consumers and producers
node-query

A ksqlDB query node shows:

  • Query name and link to the query
  • Query type, status, and mode
  • Total throughput as a time series line graph
  • Bytes produced and consumed per topic
  • Drilldowns to show the ksqlDB app data
node-kstream

A Kafka Streams app node includes:

  • Application ID
  • Total throughput as a time series line graph
  • Bytes in and bytes out in the past 10 minutes
  • Messages in and messages out in the past 10 minutes
node-connector

A Kafka Connector node shows:

  • Connector name and link to the connector
  • Type of connector (plugin type)
  • Message throughput and lag
  • Total production as a time series line graph
  • Bytes produced per topic as a time series line graph
  • Associated tasks and their statuses
node-cli

A CLI node shows monitoring data on producers and consumers running on the Confluent CLI, producing to or reading from a topic on your Confluent Cloud cluster:

  • For consumers, name or consumer group name, bytes in, number of messages read
  • For producers, client ID, bytes out, number of messages sent
  • Total and per topic number of bytes produced or consumed, as time series graphs
  • Drilldowns to show individual consumers, producers, and messages

Understanding node groups

In cases where a workflow contains a threshold number of nodes of the same type (24 or more like nodes), stream lineage collapses them into a single node group to save screen real estate and improve navigation. This grouping is purely visual: it collects like nodes that are processing data to or from the same connection point.

../_images/dg-dl-super-node-overview.png

To drill down on the individual nodes represented by a node group:

  1. Click the composite node (node group) to display the individual nodes on the right.

    ../_images/dg-dl-super-node-citibike.png
  2. Select a node from the list on the right to inspect details of that node.

    ../_images/dg-dl-super-node-citibike-drilldown.png

Automatic visual grouping of like nodes applies to any node type, but producers and consumers are the most common as there is a tendency to employ large numbers of these.
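The grouping rule can be sketched as follows. This is illustrative pseudologic based only on the 24-node threshold stated above; the actual grouping heuristics are internal to Confluent Cloud:

```python
GROUP_THRESHOLD = 24  # per the docs: 24 or more like nodes collapse into a node group

def collapse_like_nodes(nodes, threshold=GROUP_THRESHOLD):
    """Group nodes by type; render a single group node when a type reaches the threshold."""
    by_type = {}
    for node in nodes:
        by_type.setdefault(node["type"], []).append(node)
    rendered = []
    for node_type, members in by_type.items():
        if len(members) >= threshold:
            rendered.append({"type": "group", "of": node_type, "count": len(members)})
        else:
            rendered.extend(members)
    return rendered

# 30 consumers of the same topic collapse into one node group on the diagram.
consumers = [{"type": "consumer", "id": f"client-{i}"} for i in range(30)]
print(collapse_like_nodes(consumers))
```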

Understanding edges

Edge thumbnails and drilldowns describe the flow between the given nodes. They show:

  • The node where the data came from
  • The node where the data is going to
  • Bytes transferred
  • Number of messages transferred

The relative thickness of an edge indicates the amount of data that is moving through the connected nodes during the selected time range, also known as throughput. Thicker edges have a higher throughput than thinner edges. To get specific throughput numbers, use the drilldowns.
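For intuition, the thickness mapping can be thought of as a clamped linear scale from throughput to stroke width. The sketch below is purely illustrative (edge_width is a hypothetical function; the Cloud Console's actual rendering logic is not public):

```python
def edge_width(throughput_bytes, min_tp, max_tp, min_w=1.0, max_w=8.0):
    """Map a throughput value onto a stroke width, clamped to [min_w, max_w]."""
    if max_tp <= min_tp:
        return min_w
    frac = (throughput_bytes - min_tp) / (max_tp - min_tp)
    frac = max(0.0, min(1.0, frac))  # clamp so outliers don't dominate the diagram
    return min_w + frac * (max_w - min_w)

print(edge_width(5_000, 0, 10_000))  # mid-range throughput -> 4.5
```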

Hovering on an edge gives you the thumbnail.

../_images/dg-dl-edge-summary.png

Drilldown on an edge provides the tab view.

../_images/dg-dl-edge-summary-tab.png

Access control (RBAC) for Stream Lineage

If role-based access control (RBAC) is configured on your clusters, you must make sure you have access to the appropriate resources, such as clusters, topics, and features such as stream lineage views. This section provides a summary of roles related to stream lineage access.

For details on how to manage RBAC for these resources, see List the role bindings for a principal.

  • The following roles have full access to stream lineage views:

    Role              | View Scope                                                      | Admin Scope
    ------------------|-----------------------------------------------------------------|------------
    OrganizationAdmin | All                                                             | All
    EnvironmentAdmin  | Organization, Support Plan, Users                               | All clusters in the environment, Schema Registry, Networking
    CloudClusterAdmin | Organization, Environment, Support Plan, Users, Schema Registry | Specified cluster, Topics, ksqlDB applications, Connectors, Schema Subjects
    Operator          | Organization, Environment, Cluster                              | N/A
    DataDiscovery     | Environment                                                     | N/A
    DataSteward       | Environment                                                     | N/A

Note

  • Developer roles do not grant access to Stream Lineage. You can give developers access to Stream Lineage by granting them additional roles from the table above, scoped to their need to know, such as the Operator role at the cluster level. To learn more, see Role-based Access Control (RBAC) in Confluent Cloud.
  • The DataSteward and DataDiscovery roles have access to view the lineage but cannot access the actual message contents.
  • DataSteward and DataDiscovery do not have access to view details of a ksqlDB node.

To learn more, see Role-based Access Control (RBAC) in Confluent Cloud.

Pause or teardown

When you are ready to quit the demo, don’t forget to either pause or tear down resources so as not to incur unnecessary charges. The extent to which you want to maintain your setup will depend on your use case. Here are some options.

Pause data generation temporarily

If you want to keep your setup but minimize data traffic and cost when the system isn’t in use, do the following.

  1. Stop the consumer.

    On the Confluent CLI where the consumer is running, press Ctrl+C to stop consuming data.

  2. Pause the Kafka Connect Datagen “StockSource” connector.

    On the Confluent Cloud Console, navigate to the Datagen StockSource connector, click it to drill down, and pause the connector.

    This will suspend data generation so that no messages are flowing through your cluster.

    ../_images/dg-dl-stop-datagen-connector.png

Tip

How to restart the demo

In this scenario, you can always resume using the stocks app at any time by clicking Resume on the StockSource connector, and restarting the consumer from the Confluent CLI.

Stop the queries and remove the cluster

To entirely tear down this instance, follow the “temporary pause” steps above, but also perform the following tasks on the Confluent Cloud Console:

  1. Navigate to ksqlDB, click ksqlDB_stocks, click the Persistent queries tab, and click Terminate on each query.
  2. After you pause the Kafka Connect Datagen “StockSource” connector, delete it.
  3. At the Environment level, delete the cluster.

Appendix A: Creating a ksqlDB app with granular access and assigning ACLs

As an alternative to creating the ksqlDB app with global access, you can create the app with granular access, assign a service account to it, and then create ACLs limited specifically to your ksqlDB app. There may be cases where you want to limit access to the ksqlDB cluster to specific topics or actions.

  1. Navigate to ksqlDB

  2. Click Create application myself.

  3. Select Granular access and click Continue.

  4. Under Create a service account:

    • Select Create a new one (unless you already have an account you want to use).
    • Provide a new service account name, such as stocks_trader, and a description.
    • Check the box to add required ACLs when the ksqlDB app is created.
  5. Provide access to the stocks topic (this should already be selected), and click Continue.

  6. Create the ACLs for your ksqlDB app as follows (skip this step if you have done this previously for this app).

  7. Log on to Confluent Cloud using the Confluent CLI. (Provide your username and password at the prompts.)

    confluent login --url https://confluent.cloud
    
  8. List the environments to get the environment ID.

    confluent environment list
    
  9. Select the environment you’ve been using for this demo.

    confluent environment use <ENVIRONMENT_ID>
    
  10. List the clusters to get the right cluster ID.

    confluent kafka cluster list
    
  11. Select the cluster you’ve been using for this demo.

    confluent kafka cluster use <KAFKA_CLUSTER_ID>
    
  12. List the ksqlDB apps to get the ID for your app.

    confluent ksql cluster list
    
  13. Run this command to get the service account ID.

    confluent ksql cluster configure-acls <KSQL_APP_ID> '*' --cluster <KAFKA_CLUSTER_ID> --dry-run
    
  14. Copy the service account ID (after User:<SERVICE_ACCOUNT_ID> in the output).

  15. Allow READ access to all topics on the ksql cluster for your service account ID.

    confluent kafka acl create --allow --service-account <SERVICE_ACCOUNT_ID> --operations read --topic '*'
    
  16. Allow WRITE access to all topics on the ksql cluster for your service account ID.

    confluent kafka acl create --allow --service-account <SERVICE_ACCOUNT_ID> --operations write --topic '*'
    
  17. Allow CREATE access for all topics on the ksql cluster for your service account ID.

    confluent kafka acl create --allow --service-account <SERVICE_ACCOUNT_ID> --operations create --topic '*'