Create streaming queries in Confluent Cloud ksqlDB

Easily build stream processing applications with a simple and lightweight SQL syntax. Continuously transform, enrich, join, and aggregate your Kafka events without writing complex application code. As a fully managed service with a 99.9% uptime SLA, Confluent Cloud ksqlDB eliminates the overhead of provisioning and operating infrastructure, so you can focus on application development.

This quick start gets you up and running with Confluent Cloud ksqlDB. It shows how to create streams and tables, and how to write streaming queries on cloud-hosted ksqlDB.

Important

This quick start assumes that you have completed Quick Start for Apache Kafka using Confluent Cloud, in which you installed a Datagen connector that produces data to the users topic in your Confluent Cloud cluster.

In this quick start, you perform the following steps.

  1. Create a ksqlDB application in Confluent Cloud.
  2. Install the Confluent Cloud CLI.
  3. Enable ksqlDB access in Confluent Cloud.
  4. Create the pageviews topic.
  5. Produce pageview data to Confluent Cloud.
  6. Create a stream in the ksqlDB editor.
  7. Create a table in the ksqlDB editor.
  8. Write a persistent query.
  9. Monitor persistent queries.

Note

This quick start uses the Confluent Cloud UI to create a ksqlDB application. For an introduction that uses the ksqlDB CLI exclusively, see ksqlDB Quickstart for Confluent Cloud.

Create a ksqlDB application in Confluent Cloud

To write queries against streams and tables, create a new ksqlDB application in Confluent Cloud.

  1. In the navigation menu, click ksqlDB.

    Screenshot of Confluent Cloud showing the ksqlDB Add Application page
  2. Click Add Application, and in the Application name field, enter ksqldb-app1. Click Continue. The ksqlDB app should have a PROVISIONING status.

    Screenshot of Confluent Cloud showing the ksqlDB Add Application wizard

    Note

    It may take a few minutes to provision the ksqlDB cluster. When ksqlDB is ready, its Status changes from Provisioning to Up.

  3. Confirm that the new ksqlDB application appears in the Applications list.

    Screenshot of Confluent Cloud showing the ksqlDB Applications page

Install the Confluent Cloud CLI

After you have a working Kafka cluster in Confluent Cloud, you can use the Confluent Cloud CLI to interact with the cluster from your local computer. For example, you can produce messages to and consume messages from your topics by using the Confluent Cloud CLI.

Scripted installation

Run the following command to install the Confluent Cloud CLI. The command creates a bin directory in the location you specify (<path-to-directory>/bin). On Windows, you may need to install an appropriate Linux environment, such as the Windows Subsystem for Linux, to make the curl and sh commands available.

Important

The CLI installation location must be in your PATH (e.g. /usr/local/bin).

curl -L --http1.1 https://cnfl.io/ccloud-cli | sh -s -- -b /<path-to-directory>/bin
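
After the script finishes, you can confirm that the binary is reachable. A quick check, assuming the install directory has been added to your PATH:

ccloud version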

Tarball installation

Download and install the raw binaries by platform.

Tip

For examples of using multiple languages to connect Kafka client applications to Confluent Cloud, see the Code Examples.

In the following steps, you log in to Confluent Cloud and connect to your Kafka cluster with the API key you created in Step 3: Create a Sample Producer.

For more information about Confluent Cloud CLI commands, see Confluent Cloud CLI Command Reference.

  1. Log in to your Confluent Cloud cluster.

    ccloud login
    

    Your output should resemble:

    Enter your Confluent credentials:
    Email: jdoe@myemail.io
    Password:
    
    Logged in as jdoe@myemail.io
    Using environment t118 ("default")
    
  2. View your cluster.

    ccloud kafka cluster list
    

    Your output should resemble:

          Id      |       Name        |     Type     | Provider |  Region  | Availability | Status
    +-------------+-------------------+--------------+----------+----------+--------------+--------+
        lkc-emmox | ccloud-quickstart | BASIC_LEGACY | gcp      | us-west2 | HIGH         | UP
    
  3. Set the active Kafka cluster. In this example, the cluster ID is lkc-emmox.

    ccloud kafka cluster use lkc-emmox
    

    Tip

    The lkc prefix in the cluster ID is an acronym for “logical Kafka cluster”.

  4. Add the API secret with ccloud api-key store <key> <secret>. When you create an API key with the CLI, it is automatically stored locally. However, when you create an API key by using the UI, the API, or the CLI on another machine, the secret is not available for CLI use until you store it. This step is required because secrets cannot be retrieved after creation.

    From the Confluent Cloud CLI, type the following command:

    ccloud api-key store <api-key> <secret> --resource <cluster-id>
    

    For an API key/secret pair on the example cluster with ID lkc-emmox, the command might resemble the following:

    ccloud api-key store LD35EN2ZJTCTRQRM 67JImN+9vk+Hj3eak2/UcwUlbDNlGGC3KAIOy5JNRVSnweumPBVpW31JWZSBeawz --resource lkc-emmox
    
  5. Set the API key to use for Confluent Cloud CLI commands with the ccloud api-key use command.

    ccloud api-key use --resource <resource-id> <api-key>
    

    For an API key/secret pair on the previously shown cluster with ID lkc-emmox, the command might resemble the following:

    ccloud api-key use --resource lkc-emmox LD35EN2ZJTCTRQRM
    
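To confirm which keys are stored for the cluster, you can list them. This uses the same api-key list command that appears later in this guide:

ccloud api-key list --resource lkc-emmox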

Enable ksqlDB access in Confluent Cloud

To grant the ksqlDB application access to your topics, set ACLs by using the Confluent Cloud CLI.

  1. Configure the Confluent Cloud ACLs for ksqlDB. For more information about Confluent Cloud ACLs, see Access Control Lists (ACLs) for Confluent Cloud.

    1. In the Confluent Cloud CLI, find the ksqlDB cluster ID.

      ccloud ksql app list
      

      Your output should resemble:

             Id      |   Name      | Topic Prefix |   Kafka   | Storage |                         Endpoint                          | Status
      +--------------+-------------+--------------+-----------+---------+-----------------------------------------------------------+--------+
        lksqlc-lg0g3 | ksqldb-app1 | pksqlc-4v5nn | lkc-emmox |     500 | https://pksqlc-emko7.us-central1.gcp.confluent.cloud:8088 | UP
      

      Tip

      The lksqlc and pksqlc prefixes are acronyms for “logical ksqlDB cluster” and “physical ksqlDB cluster”.

    2. Grant ksqlDB access to the users topic.

      ccloud ksql app configure-acls <ksqldb-cluster-id> users --cluster <kafka-cluster-id>
      

      For the example clusters shown previously, the following command gives ksqlDB access to the users topic.

      ccloud ksql app configure-acls lksqlc-lg0g3 users --cluster lkc-emmox
      
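Optionally, you can verify the ACLs that the previous command created. A quick check, assuming lkc-emmox is still the active cluster:

ccloud kafka acl list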

Create the pageviews topic

In Quick Start for Apache Kafka using Confluent Cloud, you created the users topic by using the Confluent Cloud UI. In this step, you create the pageviews topic by using the Confluent Cloud CLI.

  1. Run the following command to create the pageviews topic in your Confluent Cloud cluster.

    ccloud kafka topic create pageviews
    

    Your output should resemble:

    Created topic "pageviews".
    
  2. Run the following command to list the topics in your cluster.

    ccloud kafka topic list
    

    Your output should resemble:

                                      Name
    +------------------------------------------------------------------------------+
      _confluent-ksql-pksqlc-4v5nn_command_topic
      pageviews
      pksqlc-4v5nn-processing-log
      users
    
  3. In the Confluent Cloud CLI, grant ksqlDB access to the pageviews topic.

    ccloud ksql app configure-acls <ksqldb-cluster-id> pageviews --cluster <kafka-cluster-id>
    

For more information about Confluent Cloud CLI commands, see Confluent Cloud CLI Command Reference.

Produce pageview data to Confluent Cloud

In this step, you create a Datagen connector for the pageviews topic, using the same procedure that you used to create DatagenSourceConnector_users.

  1. Return to the Confluent Cloud UI. In the navigation bar, click Connectors, and click Add connector.

  2. Select the Datagen Source Connector tile and fill in the form with the following values.

    Field                                       Value
    ------------------------------------------  ------------------------------------------
    Name                                        Enter "DatagenSourceConnector_pageviews"
    Which topic do you want to send data to?    Select pageviews
    Output Messages                             Select JSON
    Quickstart                                  Select PAGEVIEWS
    Max interval between messages               Enter "100" for a 0.1-second interval
    Number of tasks for this connector          Enter "1"

    When the form is filled in, it should resemble the following.

    Screenshot of Confluent Cloud showing the Add Datagen Source Connector page
  3. At the bottom of the form, click Continue to review the details for your connector, and click Launch to start it.

    Screenshot of Confluent Cloud showing two running Datagen Source Connectors

    When the status of the new connector changes from Provisioning to Running, you have two producers sending event streams to topics in your Confluent Cloud cluster.

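Before moving on, you can optionally confirm that the connector is producing records by consuming a few of them from the command line, as mentioned in the CLI section earlier. The -b flag reads from the beginning of the topic; press Ctrl+C to stop consuming.

ccloud kafka topic consume -b pageviews
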
Create a stream and a table

To write streaming queries against the pageviews and users topics, register the topics with ksqlDB as a stream and a table, respectively. You can use the CREATE STREAM and CREATE TABLE statements in the ksqlDB Editor.

These examples query records from the pageviews and users topics using the following schema.

ER diagram showing a pageviews stream and a users table with a common userid column

Create a stream in the ksqlDB editor

You can create a stream or table by using the CREATE STREAM and CREATE TABLE statements in the ksqlDB Editor, similar to how you use them in the ksqlDB CLI.

  1. In the navigation bar, click ksqlDB.

  2. In the ksqlDB applications list, click ksqldb-app1.

  3. Copy the following code into the editor window and click Run.

    CREATE STREAM pageviews_original (viewtime bigint, userid varchar, pageid varchar) WITH
    (kafka_topic='pageviews', value_format='JSON');
    

    Your output should resemble:

    Screenshot of the ksqlDB CREATE STREAM statement in Confluent Cloud
  4. In the editor window, use a SELECT query to inspect records in the pageviews stream.

    SELECT * FROM PAGEVIEWS_ORIGINAL EMIT CHANGES;
    

    Your output should resemble:

    Screenshot of a ksqlDB SELECT query in Confluent Cloud
  5. The query continues until you end it explicitly. Click Stop to end the query.
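
You can also inspect the new stream's schema and metadata with the standard DESCRIBE statement. Run it in the editor window the same way you run a query:

DESCRIBE PAGEVIEWS_ORIGINAL;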

Create a table in the ksqlDB editor

Use the CREATE TABLE statement to register a table on a topic.

  1. Copy the following code into the editor window and click Run.

    CREATE TABLE users (userid VARCHAR PRIMARY KEY, registertime BIGINT, gender VARCHAR, regionid VARCHAR) WITH
    (KAFKA_TOPIC='users', VALUE_FORMAT='JSON');
    

    Your output should resemble:

    Screenshot of the ksqlDB CREATE TABLE statement in Confluent Cloud
  2. In the editor window, use a SELECT query to inspect records in the users table.

    SELECT * FROM users EMIT CHANGES;
    

    Your output should resemble:

    Screenshot of a ksqlDB SELECT query on a table in Confluent Cloud
  3. The query continues until you end it explicitly. Click Stop to end the query.

  4. Click Tables, and in the list, click USERS to open the details page.

    Screenshot of the ksqlDB Table summary page in Confluent Cloud
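
As an optional exercise, you can run an aggregating push query against the table. The following sketch counts users by gender and emits an updated row whenever the underlying table changes; click Stop to end it, as before.

SELECT gender, COUNT(*) AS user_count
FROM users
GROUP BY gender
EMIT CHANGES;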

Write a persistent query

With the pageviews topic registered as a stream, and the users topic registered as a table, you can write a streaming join query that runs until you end it with the TERMINATE statement.

  1. Copy the following code into the editor and click Run.

    CREATE STREAM pageviews_enriched AS
    SELECT users.userid AS userid, pageid, regionid, gender
    FROM pageviews_original
    LEFT JOIN users
      ON pageviews_original.userid = users.userid
    EMIT CHANGES;
    

    Your output should resemble:

    Screenshot of the ksqlDB CREATE STREAM AS SELECT statement in Confluent Cloud
  2. To inspect your persistent queries, navigate to the Running Queries page, which shows details about the pageviews_enriched stream that you created in the previous query.

    Screenshot of the ksqlDB Running Queries page in Confluent Cloud
  3. Click Explain to see the schema and query properties for the persistent query.

    Screenshot of the ksqlDB Explain Query dialog in Confluent Cloud
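
To end a persistent query, use the TERMINATE statement with the query ID shown on the Running Queries page. A sketch, assuming the generated ID is CSAS_PAGEVIEWS_ENRICHED_2 (your ID may differ; run SHOW QUERIES; to find it):

SHOW QUERIES;

TERMINATE CSAS_PAGEVIEWS_ENRICHED_2;

Terminating the query stops processing but keeps the pageviews_enriched stream registered. If you terminate it now, re-create the query before continuing, because the following sections rely on it.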

Use Flow View to inspect data

To visualize data flow in your ksqlDB application, click the Flow tab to open the Flow View page.

Use Flow View to:

  • View the topology of your ksqlDB application.
  • Inspect the details of streams, tables, and the SQL statements that create them.
  • View events as they flow through your application.

ksqlDB application topology on the Flow View page in Confluent Cloud

Click the CREATE-STREAM node to see the query that you used to create the PAGEVIEWS_ENRICHED stream.

Details of a CREATE STREAM statement in the ksqlDB Flow View in Confluent Cloud

Click the PAGEVIEWS_ENRICHED node to see the stream's events and schema.

Details of a stream in the ksqlDB Flow page in Confluent Cloud

Monitor persistent queries

You can monitor your persistent queries visually by using Confluent Cloud.

In the navigation menu, click Consumers and find the consumer group that corresponds to your pageviews_enriched stream, for example _confluent-ksql-pksqlc-4v5nnquery_CSAS_PAGEVIEWS_ENRICHED_2. This view shows how well your persistent query is keeping up with the incoming data.

Screenshot of the Consumer Lag page in Confluent Cloud

Create an API key for Confluent Cloud ksqlDB through the Confluent Cloud CLI

Starting with Confluent Cloud CLI v0.198.0, you can create API keys by using the Confluent Cloud CLI. For more information, see Create Resource-Specific API Keys using the CLI.

Important

The API key and secret that you create in this step are distinct from the key pair that you created when you installed the Confluent Cloud CLI. This key pair is for the ksqlDB cluster specifically and can only be created by using the ccloud api-key create --resource <ksqldb-cluster-id> command.

Run the following command to see a list of all API keys associated with your ksqlDB cluster.

ccloud api-key list --resource <ksqldb-cluster-id>

If the output is empty, follow the steps in Create Resource-Specific API Keys using the CLI.
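
If you need to create a key, the command named in the Important callout above creates a key pair scoped to your ksqlDB cluster:

ccloud api-key create --resource <ksqldb-cluster-id>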

Access a ksqlDB application in Confluent Cloud with an API key

ksqlDB in Confluent Cloud supports authentication with ksqlDB API keys.

Using ksqlDB CLI

To connect the ksqlDB CLI to a cluster, run the following command with your Confluent Cloud ksqlDB server URL specified.

<path-to-confluent>/bin/ksql -u <ksql-api-key> -p <secret> <ccloud-ksql-server-URL>

Using HTTPS Requests

Specify application/vnd.ksql.v1+json in the Accept header of your request. Also send your ksqlDB API key and secret, separated by a colon, as the --user credentials, and include the --basic flag to select HTTP Basic authentication.

  1. Run the ccloud ksql app list command to get the URL of the ksqlDB endpoint.

    ccloud ksql app list
    
           Id      |   Name      | Topic Prefix |   Kafka   | Storage |                         Endpoint                         | Status
    +--------------+-------------+--------------+-----------+---------+----------------------------------------------------------+--------+
      lksqlc-lg0g3 | ksqldb-app1 | pksqlc-4v5nn | lkc-emmox |     500 | https://pksqlc-emko7.us-central1.gcp.confluent.cloud:443 | UP
    
  2. Run the curl command to send a POST request to the ksql endpoint:

    curl -X "POST" "https://<cloud-ksqldb-endpoint>:443/ksql" \
         -H "Accept: application/vnd.ksql.v1+json" \
         --basic --user "<ksqldb-api-key>:<secret>" \
         -d $'{
      "ksql": "LIST STREAMS;",
      "streamsProperties": {}
    }'
    

    An example command for the previous ksqlDB cluster might resemble:

    curl -X "POST" "https://pksqlc-4v5nn.us-west2.gcp.confluent.cloud:443/ksql" \
         -H "Accept: application/vnd.ksql.v1+json" \
         --basic --user "7DOQ6QCUWCE6M0HD:25UCfE+p25wPXXHMJcIBg3dANJvr76LG4VZ5Hold+X2dVD5drvm8CdDaVdHia8d7" \
         -d $'{
      "ksql": "LIST STREAMS;",
      "streamsProperties": {}
    }'
    

    Your output should resemble:

    [
      {
        "@type": "streams",
        "statementText": "LIST STREAMS;",
        "streams": [
          {
            "type": "STREAM",
            "name": "KSQL_PROCESSING_LOG",
            "topic": "pksqlc-4v5nn-processing-log",
            "format": "JSON"
          },
          {
            "type": "STREAM",
            "name": "PAGEVIEWS_ENRICHED",
            "topic": "pksqlc-4v5nnPAGEVIEWS_ENRICHED",
            "format": "JSON"
          },
          {
            "type": "STREAM",
            "name": "PAGEVIEWS_ORIGINAL",
            "topic": "pageviews",
            "format": "JSON"
          }
        ],
        "warnings": []
      }
    ]
    
  3. Run the following command to inspect the streaming data in the PAGEVIEWS_ENRICHED stream. Press Ctrl+C to stop the query.

    curl -X "POST" "https://<cloud-ksqldb-endpoint>:443/query-stream" \
         --basic --user "<ksqldb-api-key>:<secret>" \
          -d $'{
       "sql": "SELECT * FROM PAGEVIEWS_ENRICHED EMIT CHANGES;",
       "streamsProperties": {}
     }'
    

    Your output should resemble:

    {"queryId":"d14df2c1-5995-4855-b300-f7710464c81c","columnNames":["USERID","PAGEID","REGIONID","GENDER"],"columnTypes":["STRING","STRING","STRING","STRING"]}
    ["User_7","Page_85","Region_8","FEMALE"]
    ["User_9","Page_47","Region_7","MALE"]
    ["User_4","Page_98","Region_7","OTHER"]
    ^C
    

See also

For an example that shows fully managed Confluent Cloud connectors in action with Confluent Cloud ksqlDB, see the Cloud ETL Demo. The example also shows how to use the Confluent Cloud CLI to manage your resources in Confluent Cloud.
