### Azure Blob Storage object names

The Azure Blob Storage data model is a flat structure: each bucket stores objects, and the name of each Azure Blob Storage object serves as the unique key. However, a logical hierarchy can be inferred when the Azure Blob Storage object names use directory delimiters, such as `/`. The Azure Blob Storage connector allows you to customize the names of the Azure Blob Storage objects it uploads to the Azure Blob Storage bucket. In general, the names of the Azure Blob Storage objects uploaded by the Azure Blob Storage connector follow this format:

```bash
<prefix>/<topic>/<encodedPartition>/<topic>+<kafkaPartition>+<startOffset>.<format>
```
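
For example, assuming the connector's default `topics.dir` prefix (`topics`) and the default partitioner, a hypothetical topic named `orders` written in Avro format would typically produce object names like the following (all values shown are illustrative):

```bash
# <prefix>/<topic>/<encodedPartition>/<topic>+<kafkaPartition>+<startOffset>.<format>
topics/orders/partition=0/orders+0+0000000000.avro
```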

### Glossary

Admin API : The Admin API is the Kafka REST API that enables administrators to manage and monitor Kafka clusters, topics, brokers, and other Kafka components.

Ansible Playbooks for Confluent Platform : Ansible Playbooks for Confluent Platform is a set of Ansible playbooks and roles that are designed to automate the deployment and management of Confluent Platform.

Apache Flink : Apache Flink is an open source stream processing framework for stateful computations over unbounded and bounded data streams. Flink provides a unified API for batch and stream processing that supports event-time and out-of-order processing, and supports exactly-once semantics. Flink applications include real-time analytics, data pipelines, and event-driven applications. **Related terms**: *bounded stream*, *data stream*, *stream processing*, *unbounded stream* **Related content** - [Apache Flink: Stream Processing and SQL on Confluent Cloud](/cloud/current/flink/index.html) - [What is Apache Flink?](https://www.confluent.io/learn/apache-flink/) - [Apache Flink 101 (Confluent Developer course)](https://developer.confluent.io/courses/apache-flink/intro/)

Apache Kafka : Apache Kafka is an open source event streaming platform that provides a unified, high-throughput, low-latency, fault-tolerant, scalable, distributed, and secure data streaming platform. Kafka is a publish-and-subscribe messaging system that enables distributed applications to ingest, process, and share data in real-time. **Related content** - [Introduction to Kafka](/kafka/introduction.html)

audit log : An audit log is a historical record of actions and operations that are triggered when auditable events occur. Audit log records can be used to troubleshoot system issues, manage security, and monitor compliance, by tracking administrative activity, data access and modification, monitoring sign-in attempts, and reconstructing security breaches and fraudulent activity. **Related terms**: *auditable event* **Related content** - [Audit Log Concepts for Confluent Cloud](/cloud/current/monitoring/audit-logging/cloud-audit-log-concepts.html) - [Audit Log Concepts for Confluent Platform](/platform/current/security/audit-logs/audit-logs-concepts.html)

auditable event : An auditable event is an event that represents an action or operation that can be tracked and monitored for security purposes and compliance. When an auditable event occurs, an auditable event method is triggered and an event message is sent to the audit log cluster and stored as an audit log record. **Related terms**: *audit log*, *event message* **Related content** - [Auditable Events in Confluent Cloud](/cloud/current/monitoring/audit-logging/event-methods/index.html) - [Auditable Events in Confluent Platform](/platform/current/security/audit-logs/auditable-events.html)

authentication : Authentication is the process of verifying the identity of a principal that interacts with a system or application. Authentication is often used in conjunction with authorization to determine whether a principal is allowed to access a resource and perform a specific action or operation on that resource. Digital authentication requires one or more of the following: something a principal knows (a password or security question), something a principal has (a security token or key), or something a principal is (a biometric characteristic, such as a fingerprint or voiceprint). Multi-factor authentication (MFA) requires two or more forms of authentication. **Related terms**: *authorization*, *identity*, *identity provider*, *identity pool*, *principal*, *role*

authorization : Authorization is the process of evaluating and then granting or denying a principal a set of permissions required to access and perform operations on resources. **Related terms**: *authentication*, *group mapping*, *identity*, *identity provider*, *identity pool*, *principal*, *role*

Avro : Avro is a data serialization and exchange framework that provides data structures, remote procedure call (RPC), compact binary data format, a container file, and uses JSON to represent schemas. Avro schemas ensure that every field is properly described and documented for use with serializers and deserializers. You can either send a schema with every message or use Schema Registry to store and receive schemas for use by consumers and producers to save bandwidth and storage space. (A registration example follows this group of terms.) **Related terms**: *data serialization*, *deserializer*, *serializer* **Related content** - [Apache Avro - a data serialization system](https://avro.apache.org/) - Confluent Cloud: [Avro Schema Serializer and Deserializer](/cloud/current/sr/fundamentals/serdes-develop/serdes-avro.html) - Confluent Platform: [Avro Schema Serializer and Deserializer](/platform/current/schema-registry/fundamentals/serdes-develop/serdes-avro.html)

Basic Kafka cluster : A Confluent Cloud cluster type. Basic Kafka clusters are designed for experimentation, early development, and basic use cases.
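
To make the *Avro* entry above concrete, the following sketch registers a small Avro schema with Schema Registry over its REST API. It assumes a Schema Registry instance listening locally on port 8081; the subject name `orders-value` and the schema fields are hypothetical:

```bash
# Register an Avro schema under the subject "orders-value".
curl -s -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  --data '{"schema": "{\"type\":\"record\",\"name\":\"Order\",\"fields\":[{\"name\":\"id\",\"type\":\"string\"},{\"name\":\"amount\",\"type\":\"double\"}]}"}' \
  http://localhost:8081/subjects/orders-value/versions
```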

batch processing : Batch processing is the method of collecting a large volume of data over a specific time interval, after which the data is processed all at once and loaded into a destination system. Batch processing is often used when processing data can occur independently of the source and timing of the data. It is efficient for non-real-time data processing, such as data warehousing, reporting, and analytics. **Related terms**: *bounded stream*, *stream processing*, *unbounded stream*

CIDR block : A CIDR block is a group of IP addresses that are contiguous and can be represented as a single block. CIDR blocks are expressed using Classless Inter-domain Routing (CIDR) notation that includes an IP address and a number of bits in the network mask. **Related content** - [CIDR notation](https://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing#CIDR_notation) - [Classless Inter-domain Routing (CIDR): The Internet Address Assignment and Aggregation Plan [RFC 4632]](https://www.rfc-editor.org/rfc/rfc4632.html)

Cluster Linking : Cluster Linking is a highly performant data replication feature that enables links between Kafka clusters to mirror data from one cluster to another. Cluster Linking creates perfect copies of Kafka topics, which keep data in sync across clusters. Use cases include geo-replication of data, data sharing, migration, disaster recovery, and tiered separation of critical applications. **Related content** - [Geo-replication with Cluster Linking on Confluent Cloud](/cloud/current/multi-cloud/cluster-linking/index.html) - [Cluster Linking for Confluent Platform](/platform/current/multi-dc-deployments/cluster-linking/index.html)

commit log : A commit log is a log of all event messages about commits (changes or operations made) sent to a Kafka topic. A commit log ensures that all event messages are processed at least once and provides a mechanism for recovery in the event of a failure. The commit log is also referred to as a write-ahead log (WAL) or a transaction log. **Related terms**: *event message*

Confluent Cloud : Confluent Cloud is the fully managed, cloud-native event streaming service powered by Kora, the event streaming platform based on Kafka and extended by Confluent to provide high availability, scalability, elasticity, security, and global interconnectivity. Confluent Cloud offers cost-effective multi-tenant configurations as well as dedicated solutions, if stronger isolation is required. **Related terms**: *Apache Kafka*, *Kora* **Related content** - [Confluent Cloud Overview](/cloud/current/index.html) - [Confluent Cloud](https://www.confluent.io/confluent-cloud/)

Confluent Cloud network : A Confluent Cloud network is an abstraction for a single tenant network environment that hosts Dedicated Kafka clusters in Confluent Cloud along with their single tenant services, like ksqlDB clusters and managed connectors. **Related content** - [Confluent Cloud Network Overview](/cloud/current/networking/overview.html#ccloud-networks)

Confluent for Kubernetes (CFK) : *Confluent for Kubernetes (CFK)* is a cloud-native control plane for deploying and managing Confluent in private cloud environments through a declarative API.

Confluent Platform : Confluent Platform is a specialized distribution of Kafka at its core, with additional components for data integration, streaming data pipelines, and stream processing.
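
The *Admin API* entry above refers to the REST interface for administering Kafka; the next entry, *Confluent REST Proxy*, describes the component that serves it. A minimal sketch, assuming a REST Proxy (or Confluent Server REST endpoint) listening locally on port 8082; the cluster ID placeholder must be replaced with the value returned by the first call:

```bash
# List the Kafka clusters known to the REST endpoint; the response contains the cluster ID.
curl -s http://localhost:8082/v3/clusters

# List the topics in that cluster.
curl -s http://localhost:8082/v3/clusters/<cluster-id>/topics
```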

Confluent REST Proxy : Confluent REST Proxy provides a RESTful interface to a Kafka cluster, making it easy to produce and consume messages, view the state of the cluster, and perform administrative actions without using the native Kafka protocol or clients. **Related content** - Confluent Platform: [REST Proxy](/platform/current/kafka-rest/index.html)

Confluent Server : Confluent Server is the default Kafka broker component of Confluent Platform that builds on the foundation of Apache Kafka® and provides enhanced proprietary features designed for enterprise use. Confluent Server is fully compatible with Kafka, and adds Kafka cluster support for Role-Based Access Control, Audit Logs, Schema Validation, Self Balancing Clusters, Tiered Storage, Multi-Region Clusters, and Cluster Linking. **Related terms**: *Confluent Platform*, *Apache Kafka*, *Kafka broker*, *Cluster Linking*, *multi-region cluster (MRC)*

Confluent Unit for Kafka (CKU) : Confluent Unit for Kafka (CKU) is a unit of horizontal scaling for Dedicated Kafka clusters in Confluent Cloud that provide preallocated resources. CKUs determine the capacity of a Dedicated Kafka cluster in Confluent Cloud. **Related content** - [CKU limits per cluster](/cloud/current/clusters/cluster-types.html#cku-limits-per-cluster)

Connect API : The Connect API is the Kafka API that enables a connector to read event streams from a source system and write to a target system.

Connect worker : A Connect worker is a server process that runs a connector and performs the actual work of moving data in and out of Kafka topics. A worker is a server process that runs on hardware independent of the Kafka brokers themselves. It is scalable and fault-tolerant, meaning you can run a cluster of workers that share the load of moving data in and out of Kafka from and to external systems. **Related terms**: *connector*, *Kafka Connect*

connection attempts : In Confluent Cloud, connection attempts are a Kafka cluster billing dimension that defines the maximum number of new TCP connections to the cluster you can create in one second. This includes successful and unsuccessful authentication attempts. Available in the Metrics API as `successful_authentication_count` (only includes successful authentications, not unsuccessful authentication attempts). To reduce usage on connection attempts, use longer-lived connections to the cluster. If you exceed the maximum, connection attempts may be refused.

connector : A connector is an abstract mechanism that enables communication, coordination, or cooperation among components by transferring data elements from one interface to another without changing the data.

connector offset : Connector offset uniquely identifies the position of a connector as it processes data. Connectors use a variety of strategies to implement the connector offset, including everything from monotonically increasing integers to replay ids, lists of files, timestamps and even checkpoint information. Connector offsets keep track of already-processed data in the event of a connector restart or recovery. While sink connectors use a pattern for connector offsets similar to the offset mechanism used throughout Kafka, the implementation details for source connectors are often much different. This is because source connectors track the progress of a source system as it processes data.
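
A minimal sketch tying together the *Connect API*, *Connect worker*, and *connector* entries above: it registers a FileStream source connector through the Connect REST API and then checks its status. It assumes a Connect worker listening locally on port 8083; the connector name, file path, and topic are hypothetical:

```bash
# Register a source connector that tails a local file into a Kafka topic.
curl -s -X POST -H "Content-Type: application/json" http://localhost:8083/connectors \
  -d '{
        "name": "local-file-source",
        "config": {
          "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
          "tasks.max": "1",
          "file": "/tmp/input.txt",
          "topic": "file-events"
        }
      }'

# Check the connector and its tasks on the worker.
curl -s http://localhost:8083/connectors/local-file-source/status
```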

consumer : A consumer is a Kafka client application that subscribes to (reads and processes) event messages from a Kafka topic. The Streams API and the Consumer API are the two APIs that enable consumers to read event streams from Kafka topics. **Related terms**: *Consumer API*, *consumer group*, *producer*, *Streams API*

Consumer API : The Consumer API is the Kafka API used for consuming (reading) event messages or records from Kafka topics and enables a Kafka consumer to subscribe to a topic and read event messages as they arrive. Batch processing is a common use case for the Consumer API.

consumer group : A consumer group is a single logical consumer implemented with multiple physical consumers for reasons of throughput and resilience. Because the partitions of a topic are divided among the consumers in the group, the consumers can process messages in parallel, increasing message throughput and enabling load balancing. **Related terms**: *consumer*, *partition*, *producer*, *topic*

consumer lag : Consumer lag is the number of consumer offsets between the latest message produced in a partition and the last message consumed by a consumer, that is the number of messages pending to be consumed from a particular partition. A large consumer lag, or a quickly growing lag, indicates that the consumer is unable to read from a partition as fast as the messages are available. This can be caused by a slow consumer, slow network, or slow broker. (A command-line example follows this group of terms.)

consumer offset : Consumer offset is the monotonically increasing integer value that uniquely identifies the position of an event record in a partition. Consumers use offsets to track their current position in the Kafka topic, allowing consumers to resume processing from where they left off. Offsets are stored on the Kafka broker, which does not track which records have been read and which have not. It is up to the consumer connection to track this information. When a consumer acknowledges receiving and processing a message, it commits an offset value that is stored in the special internal topic `__consumer_offsets`.

cross-resource RBAC role binding : A cross-resource RBAC role binding is a role binding in Confluent Cloud that is applied at the Organization or Environment scope and grants access to multiple resources. For example, assigning a principal the NetworkAdmin role at the Organization scope lets them administer all networks across all Environments in their Organization. **Related terms**: *identity pool*, *principal*, *role*, *role binding*

CRUD : CRUD is an acronym for the four basic operations that can be performed on data: Create, Read, Update, and Delete.

custom connector : A custom connector is a connector created using Connect plugins uploaded to Confluent Cloud by users. This includes connector plugins that are built from scratch, modified open-source connector plugins, or third-party connector plugins.

data at rest : Data at rest is data that is physically stored on non-volatile media (such as hard drives, solid-state drives, or other storage devices) and is not actively being transmitted or processed by a system.
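
A minimal sketch of the *consumer group*, *consumer lag*, and *consumer offset* entries above, assuming a broker reachable locally on port 9092 and a hypothetical topic `orders` and group `orders-app`:

```bash
# Consume as part of a consumer group; committed offsets are tracked under the group ID.
kafka-console-consumer --bootstrap-server localhost:9092 \
  --topic orders --group orders-app --from-beginning

# Show the committed offset, log-end offset, and lag for each partition assigned to the group.
kafka-consumer-groups --bootstrap-server localhost:9092 --describe --group orders-app
```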

data contract : A data contract is a formal agreement between an upstream component and a downstream component on the structure and semantics of data that is in motion. A schema is a key element of a data contract. The schema, metadata, rules, policies, and evolution plan form the data contract. You can associate data contracts (schemas and more) with [topics](#term-Kafka-topic). **Related content** - Confluent Platform: [Data Contracts for Schema Registry on Confluent Platform](/platform/current/schema-registry/fundamentals/data-contracts.html) - Confluent Cloud: [Data Contracts for Schema Registry on Confluent Cloud](/cloud/current/sr/fundamentals/data-contracts.html) - Cloud Console: [Manage Schemas in Confluent Cloud](/cloud/current/sr/schemas-manage.html)

data encryption key (DEK) : A data encryption key (DEK) is a symmetric key that is used to encrypt and decrypt data. The DEK is used in client-side field level encryption (CSFLE) to encrypt sensitive data. The DEK is itself encrypted using a key encryption key (KEK) that is only accessible to authorized users. The encrypted DEK and encrypted data are stored together. Only users with access to the KEK can decrypt the DEK and access the sensitive data. **Related terms**: *envelope encryption*, *key encryption key (KEK)*

data in motion : Data in motion is data that is actively being transferred between source and destination, typically systems, devices, or networks. Data in motion is also referred to as data in transit or data in flight.

data in use : Data in use is data that is actively being processed or manipulated in memory (RAM, CPU caches, or CPU registers).

data ingestion : Data ingestion is the process of collecting, importing, and integrating data from various sources into a system for further processing, analysis, or storage.

data mapping : Data mapping is the process of defining relationships or associations between source data elements and target data elements. Data mapping is an important process in data integration, data migration, and data transformation, ensuring that data is accurately and consistently represented when it is moved or combined.

data pipeline : A data pipeline is a series of processes and systems that enable the flow of data from sources to destinations, automating the movement and transformation of data for various purposes, such as analytics, reporting, or machine learning. A data pipeline is typically composed of a source system, a data ingestion tool, a data transformation tool, and a target system. A data pipeline covers the following stages: data extraction, data transformation, data loading, and data validation.

Data Portal : Data Portal is a Confluent Cloud application that uses Stream Catalog and Stream Lineage to provide self-service access throughout Confluent Cloud Console for data practitioners to search and discover existing topics using tags and business metadata, request access to topics and data, and access data in topics to build streaming applications and data pipelines. Data Portal provides a data-centric view of Confluent optimized for self-service access to data, where users can search, discover, and understand available data, request access to data, and use data. **Related terms**: *Stream Catalog*, *Stream Lineage* **Related content** - [Data Portal on Confluent Cloud](/cloud/current/stream-governance/data-portal.html)

data serialization : Data serialization is the process of converting data structures or objects into a format that can be stored or transmitted, and reconstructed later in the same or another computer environment. Data serialization is a common technique for implementing data persistence, interprocess communication, and object communication.
Confluent Schema Registry (in Confluent Platform) and Confluent Cloud Schema Registry support data serialization using serializers and deserializers for the following formats: Avro, JSON Schema, and Protobuf. **Related content** - Confluent Cloud: [Formats, Serializers, and Deserializers](/cloud/current/sr/fundamentals/serdes-develop/index.html) - Confluent Platform: [Formats, Serializers, and Deserializers](/platform/current/schema-registry/fundamentals/serdes-develop/index.html)

data steward : A data steward is a person with data-related responsibilities, such as data governance, data quality, and data security.

data stream : A data stream is a continuous flow of data records that are produced and consumed by applications.

dead letter queue (DLQ) : A dead letter queue (DLQ) is a queue where messages that could not be processed successfully by a sink connector are placed. Instead of stopping, the sink connector sends messages that could not be written successfully as event records to the DLQ topic while the sink connector continues processing messages.

Dedicated Kafka cluster : A Confluent Cloud cluster type. Dedicated Kafka clusters are designed for critical production workloads with high traffic or private networking requirements.

deserializer : A deserializer is a tool that converts a serial byte stream back into objects and parallel data. Deserializers work with serializers (known together as Serdes) to support efficient storage and high-speed data transmission over the wire. Confluent provides Serdes for schemas in Avro, Protobuf, and JSON Schema formats. **Related content** - Confluent Cloud: [Formats, Serializers, and Deserializers](/cloud/current/sr/fundamentals/serdes-develop/index.html) - Confluent Platform: [Formats, Serializers, and Deserializers](/platform/current/schema-registry/fundamentals/serdes-develop/index.html)

egress : In general networking, egress refers to outbound traffic leaving a network or a specific network segment. In Confluent Cloud, egress is a Kafka cluster billing dimension that defines the number of bytes consumed from the cluster in one second. Available in the Metrics API as `sent_bytes` (convert from bytes to MB). To reduce egress in Confluent Cloud, compress your messages and ensure each consumer is only consuming from the topics it requires. For compression, use lz4. Avoid gzip because of high overhead on the cluster.

Elastic Confluent Unit for Kafka (eCKU) : Elastic Confluent Unit for Kafka (eCKU) is used to express capacity for Basic, Standard, Enterprise, and Freight Kafka clusters. These clusters automatically scale up to a fixed ceiling. There is no need to resize these cluster types. When you need more capacity, your cluster expands up to the fixed ceiling. If you’re not using capacity above the minimum, you’re not paying for it.

ELT : ELT is an acronym for Extract-Load-Transform, where data is extracted from a source system and loaded into a target system before processing or transformation. Compared to ETL, ELT is a more flexible approach to data ingestion because the data is loaded into the target system before transformation.

Enterprise Kafka cluster : A Confluent Cloud cluster type. Enterprise Kafka clusters are designed for production-ready functionality that requires private endpoint networking capabilities.

envelope encryption : Envelope encryption is a cryptographic technique that uses two keys to encrypt data. The symmetric data encryption key (DEK) is used to encrypt sensitive data.
The separate asymmetric key encryption key (KEK) is the master key used to encrypt the DEK. The DEK and encrypted data are stored together. Only users with access to the KEK can decrypt the DEK and access the sensitive data. In Confluent Cloud, envelope encryption is used to enable client-side field level encryption (CSFLE). CSFLE encrypts sensitive data in a message before it is sent to Confluent Cloud and allows for temporary decryption of sensitive data when required to perform operations on the data. **Related terms**: *data encryption key (DEK)*, *key encryption key (KEK)*

ETL : ETL is an acronym for Extract-Transform-Load, where data is extracted from a source system, transformed into a target format, and loaded into a target system. Compared to ELT, ETL is a more rigid approach to data ingestion because the data is transformed before loading into the target system.

event : An event is a meaningful action or occurrence of something that happened. Events that can be recognized by a program, either human-generated or triggered by software, can be recorded in a log file or other data store. **Related terms**: *event message*, *event record*, *event sink*, *event source*, *event stream*, *event streaming*, *event streaming platform*, *event time*

event message : An event message is a record of an event sent to a Kafka topic, represented as a key-value pair. Each event message consists of a key-value pair, a timestamp, the compression type, headers for metadata (optional), and a partition and offset ID (once the message is written). The key is optional and can be used to identify the event. The value is required and contains details about the event that happened. **Related terms**: *event*, *event record*, *event sink*, *event source*, *event stream*, *event streaming*, *event streaming platform*, *event time*

event record : An event record is the record of an event stored in a Kafka topic. Event records are organized and durably stored in topics. Examples of events include orders, payments, activities, or measurements. An event typically contains one or more data fields that describe the fact, as well as a timestamp that denotes when the event was created by its event source. The event may also contain various metadata, such as its source of origin (for example, the application or cloud service that created the event) and storage-level information (for example, its position in the event stream). **Related terms**: *event*, *event message*, *event sink*, *event source*, *event stream*, *event streaming*, *event streaming platform*, *event time*

event sink : An event sink is a consumer of events, which can include applications, cloud services, databases, IoT sensors, and more. **Related terms**: *event*, *event message*, *event record*, *event source*, *event stream*, *event streaming*, *event streaming platform*, *event time*

event source : An event source is a producer of events, which can include cloud services, databases, IoT sensors, mainframes, and more. **Related terms**: *event*, *event message*, *event record*, *event sink*, *event stream*, *event streaming*, *event streaming platform*, *event time*

event stream : An event stream is a continuous flow of event messages produced by an event source and consumed by one or more consumers.
**Related terms**: *event*, *event message*, *event record*, *event sink*, *event source*, *event streaming*, *event streaming platform*, *event time*

event streaming : Event streaming is the practice of capturing event data in real-time from data sources. Event streaming is a form of data streaming that is used to capture, store, process, and react to data in real-time or retrospectively. **Related terms**: *event*, *event message*, *event record*, *event sink*, *event source*, *event streaming platform*, *event time*

event streaming platform : An event streaming platform is a platform to which events can be written once, allowing distributed functions within an organization to react in real time. **Related terms**: *event*, *event message*, *event record*, *event sink*, *event source*, *event streaming*, *event time*

event time : Event time is the time when an event occurred on the producing device, as opposed to the time when the event was processed or recorded. Event time is often used in stream processing to determine the order of events and to perform windowing operations. **Related terms**: *event*, *event message*, *event record*, *event sink*, *event source*, *event streaming*, *event streaming platform*

exactly-once semantics : Exactly-once semantics is a guarantee that a message is delivered exactly once and in the order that it was sent, even if a producer retries sending a message or a consumer retries processing a message. In Kafka, this guarantee is achieved by combining idempotent producers, where the broker uses a producer ID and per-partition sequence numbers to discard duplicate writes, with transactions, which commit a consumer's offsets atomically with the messages it produces so that reprocessing after a failure does not create duplicate results.

Freight Kafka cluster : A Confluent Cloud cluster type. Freight Kafka clusters are designed for high-throughput, relaxed latency workloads that are less expensive than self-managed open source Kafka.

granularity : Granularity is the degree or level of detail to which an entity (a system, service, or resource) is broken down into subcomponents, parts, or elements. Entities that are *fine-grained* have a higher level of detail, while *coarse-grained* entities have a reduced level of detail, often combining finer parts into a larger whole. In the context of access control, granular permissions provide precise control over resource access. They allow administrators to grant specific operations on distinct resources. This ensures users only have permissions tailored to their needs, minimizing unnecessary or potentially risky access.

group mapping : Group mapping is a set of rules that map groups in your SSO identity provider to Confluent Cloud RBAC roles. When a user signs in to Confluent Cloud using SSO, Confluent Cloud uses the group mapping to grant access to Confluent Cloud resources. **Related terms**: *identity provider*, *identity pool*, *principal*, *role* **Related content** - [Group Mapping for Confluent Cloud](/cloud/current/access-management/authenticate/sso/group-mapping/overview.html)

identity : An identity is a unique identifier that is used to authenticate and authorize users and applications to access resources. Identity is often used in conjunction with access control to determine whether a user or application is allowed to access a resource and perform a specific action or operation on that resource.
**Related terms**: *identity provider*, *identity pool*, *principal*, *role*

identity pool : An identity pool is a collection of identities that can be used to authenticate and authorize users and applications to access resources. Identity pools are used to manage permissions for users and applications that access resources in Confluent Cloud. They are also used to manage permissions for Confluent Cloud service accounts that are used to access resources in Confluent Cloud.

identity provider : An identity provider is a trusted provider that authenticates users and issues security tokens that are used to verify the identity of a user. Identity providers are often used in single sign-on (SSO) scenarios, where a user can log in to multiple applications or services with a single set of credentials.

Infinite Storage : Infinite Storage is the Confluent Cloud storage service that enhances the scalability of Confluent Cloud resources by separating storage and processing. Tiered storage within Confluent Cloud moves data between storage layers based on the needs of the workload, retrieves tiered data when requested, and garbage collects data that is past retention or otherwise deleted. If an application reads historical data, latency is not increased for other applications reading more recent data. Storage resources are decoupled from compute resources, you only pay for what you produce to Confluent Cloud and for storage that you use, and CKUs do not have storage limits. **Related content** - [Infinite Storage in Confluent Cloud for Apache Kafka](https://www.confluent.io/blog/infinite-kafka-data-storage-in-confluent-cloud/)

ingress : In general networking, ingress refers to traffic that enters a network from an external source. In Confluent Cloud, ingress is a Kafka cluster billing dimension that defines the number of bytes produced to the cluster in one second. Available in the Metrics API as `received_bytes` (convert from bytes to MB). To reduce ingress in Confluent Cloud, compress your messages. For compression, use lz4. Avoid gzip because of high overhead on the cluster.

internal topic : An internal topic is a topic, prefixed with double underscores (`__`), that is automatically created by a Kafka component to store metadata about the broker, partition assignment, consumer offsets, and other information. Examples of internal topics: `__cluster_metadata`, `__consumer_offsets`, `__transaction_state`, `__confluent.support.metrics`, and `__confluent.support.metrics-raw`.

JSON Schema : JSON Schema is a declarative language used for data serialization and exchange to define data structures, specify formats, and validate JSON documents. It is a way to encode expected data types, properties, and constraints to ensure that all fields are properly described for use with serializers and deserializers. **Related terms**: *data serialization*, *deserializer*, *serializer* **Related content** - [JSON Schema - a declarative language that allows you to annotate and validate JSON documents.](https://json-schema.org/) - Confluent Cloud: [JSON Schema Serializer and Deserializer](/cloud/current/sr/fundamentals/serdes-develop/serdes-json.html) - Confluent Platform: [JSON Schema Serializer and Deserializer](/platform/current/schema-registry/fundamentals/serdes-develop/serdes-json.html)

Kafka bootstrap server : A Kafka bootstrap server is a Kafka broker that a Kafka client connects to initially to reach a Kafka cluster; the broker returns metadata, which includes the addresses for all of the brokers in the Kafka cluster.
Although only one bootstrap server is required to connect to a Kafka cluster, multiple brokers can be specified in a bootstrap server list to provide high availability and fault tolerance in case a broker is unavailable. In Confluent Cloud, the bootstrap server is the general cluster endpoint.

Kafka broker : A Kafka broker is a server in the Kafka storage layer that stores event streams from one or more sources. A Kafka cluster is typically comprised of several brokers. Every broker in a cluster is also a bootstrap server, meaning if you can connect to one broker in a cluster, you can connect to every broker.

Kafka client : A Kafka client allows you to write distributed applications and microservices that read, write, and process streams of events in parallel, at scale, and in a fault-tolerant manner, even in the case of network problems or machine failures. The Kafka client library provides functions, classes, and utilities that allow developers to create Kafka producer clients (Producers) and consumer clients (Consumers) using various programming languages. The primary way to build production-ready Producers and Consumers is by using your preferred programming language and a Kafka client library. **Related content** - [Build Client Applications for Confluent Cloud](/cloud/current/client-apps/overview.html) - [Build Client Applications for Confluent Platform](/platform/current/clients/index.html) - [Getting Started with Apache Kafka and Java (or Python, Go, .Net, and others)](https://developer.confluent.io/get-started/java/)

Kafka cluster : A Kafka cluster is a group of interconnected Kafka brokers that manage and distribute real-time data streaming, processing, and storage as if they are a single system. By distributing tasks and services across multiple Kafka brokers, the Kafka cluster improves availability, reliability, and performance.

Kafka Connect : Kafka Connect is the component of Kafka that provides data integration between databases, key-value stores, search indexes, file systems, and Kafka brokers. Kafka Connect is an ecosystem of a client application and pluggable connectors. As a client application, Connect is a server process that runs on hardware independent of the Kafka brokers themselves. It is scalable and fault-tolerant, meaning you can run a cluster of Connect workers that share the load of moving data in and out of Kafka from and to external systems. Connect also abstracts the business of writing code away from the user and instead requires only JSON configuration to run. **Related content** - Confluent Cloud: [Kafka Connect](/cloud/current/billing/overview.html#kconnect-long) - Confluent Platform: [Kafka Connect](/platform/current/connect/index.html)

Kafka controller : A Kafka controller is the node in a Kafka cluster that is responsible for managing and changing the metadata of the cluster. This node also communicates metadata changes to the rest of the cluster. When Kafka uses ZooKeeper for metadata management, the controller is a broker, and the broker persists the metadata to ZooKeeper for backup and recovery. With KRaft, you dedicate Kafka nodes to operate as controllers and the metadata is stored in Kafka itself and not persisted to ZooKeeper. KRaft enables faster recovery because of this. For more information, see [KRaft overview](/platform/current/kafka-metadata/kraft.html).
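
A small sketch of the *Kafka bootstrap server* and *Kafka broker* entries above, assuming a broker reachable locally on port 9092:

```bash
# Connect through a single bootstrap server; the metadata response lists every broker
# in the cluster along with the API versions each one supports.
kafka-broker-api-versions --bootstrap-server localhost:9092

# The same bootstrap connection is enough to inspect partition leaders for all topics.
kafka-topics --bootstrap-server localhost:9092 --describe
```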

Kafka listener : A Kafka listener is an endpoint that Kafka brokers bind to and use to communicate with clients. For Kafka clusters, Kafka listeners are configured in the `listeners` property of the `server.properties` file. Advertised listeners are publicly accessible endpoints that are used by clients to connect to the Kafka cluster. **Related content** - [Kafka Listeners – Explained](https://www.confluent.io/blog/kafka-listeners-explained/)

Kafka metadata : Kafka metadata is the information about the Kafka cluster and the topics that are stored in it. This information includes details such as the brokers in the cluster, the topics that are available, the partitions for each topic, and the location of the leader for each partition. Kafka metadata is used by clients to discover the available brokers and topics, and to determine which broker is the leader for a particular partition. This information is essential for clients to be able to send and receive messages to and from Kafka.

Kafka Streams : Kafka Streams is a stream processing library for building streaming applications and microservices that transform (filter, group, aggregate, join, and more) incoming event streams in real time and write the results to Kafka topics stored in a Kafka cluster. The Streams API can be used to build applications that process data in real-time, analyze data continuously, and build data pipelines. **Related content** - [Kafka Streams](/platform/current/streams/overview.html)

Kafka topic : See *topic*.

key encryption key (KEK) : A key encryption key (KEK) is a master key that is used to encrypt and decrypt other keys, specifically the data encryption key (DEK). Only users with access to the KEK can decrypt the DEK and access the sensitive data. **Related terms**: *data encryption key (DEK)*, *envelope encryption*.

Kora : Kora is the cloud-native streaming data service based on Kafka technology that powers the Confluent Cloud event streaming platform for building real-time data pipelines and streaming applications. Kora abstracts low-level resources, such as Kafka brokers, and hides operational complexities, such as system upgrades. Kora is built on the following foundations: a tiered storage layer that improves cost and performance, elasticity and consistent performance through incremental load balancing, cost effective multi-tenancy with dynamic quota management and cell-based isolation, continuous monitoring of both system health and data integrity, and clean abstraction with standard Kafka protocols and CKUs to hide underlying resources. **Related terms**: *Apache Kafka*, *Confluent Cloud*, *Confluent Unit for Kafka (CKU)* **Related content** - [Kora: The Cloud Native Engine for Apache Kafka](https://www.confluent.io/blog/cloud-native-data-streaming-kafka-engine/) - [Kora: A Cloud-Native Event Streaming Platform For Kafka](https://www.vldb.org/pvldb/vol16/p3822-povzner.pdf)

KRaft : KRaft (or Apache Kafka Raft) is a consensus protocol introduced in early access in Kafka 2.8 to provide metadata management for Kafka with the goal of replacing ZooKeeper. KRaft simplifies Kafka because it enables the management of metadata in Kafka itself, rather than splitting it between ZooKeeper and Kafka. As of Confluent Platform 7.5, KRaft is the default method of metadata management in new deployments. For more information, see [KRaft overview](/platform/current/kafka-metadata/kraft.html).

ksqlDB : ksqlDB is a streaming SQL database engine purpose-built for creating stream processing applications on top of Kafka.

logical Kafka cluster (LKC) : A logical Kafka cluster (LKC) is a subset of a physical Kafka cluster (PKC) that is isolated from other logical clusters within Confluent Cloud. Each logical unit of isolation is considered a tenant and maps to a specific organization. If the mapping is one-to-one, one LKC maps to one PKC (a Dedicated cluster). If the mapping is many-to-one, one LKC maps to one of the multitenant Kafka cluster types (Basic, Standard, Enterprise, and Freight). **Related terms**: *Confluent Cloud*, *Kafka cluster*, *physical Kafka cluster (PKC)* **Related content** - [Kafka Cluster Types in Confluent Cloud](/cloud/current/clusters/cluster-types.html) - [Multi-tenancy and Client Quotas on Confluent Cloud](/cloud/current/clusters/client-quotas.html)

multi-region cluster (MRC) : A multi-region cluster (MRC) is a single Kafka cluster that replicates data between datacenters across regional availability zones.

multi-tenancy : Multi-tenancy is a software architecture in which a single physical instance is shared among multiple logical instances, or tenants. In Confluent Cloud, each Basic, Standard, Enterprise, and Freight cluster is a logical Kafka cluster (LKC) that shares a physical Kafka cluster (PKC) with other tenants. Each LKC is isolated from other LKCs and has its own resources, such as memory, compute, and storage. **Related terms**: *Confluent Cloud*, *logical Kafka cluster (LKC)*, *physical Kafka cluster (PKC)* **Related content** * [From On-Prem to Cloud-Native: Multi-Tenancy in Confluent Cloud](https://www.confluent.io/blog/cloud-native-multi-tenant-kafka-with-confluent-cloud/) * [Multi-tenancy and Client Quotas on Confluent Cloud](/cloud/current/clusters/client-quotas.html)

offset : An offset is an integer assigned to each message that uniquely represents its position within the data stream, guaranteeing the ordering of records and allowing offset-based connections to replay messages from any point in time. **Related terms**: *consumer offset*, *connector offset*, *offset commit*, *replayability*

offset commit : An offset commit is the process of keeping track of the current position of an offset-based connection (primarily Kafka consumers and connectors) within the data stream. The offset commit process is not specific to consumers, producers, or connectors. It is a general mechanism in Kafka to track the position of any application that is reading data. When a consumer commits an offset, the offset identifies the next message the consumer should consume. For example, if a consumer has an offset of 5, it has consumed messages 0 through 4 and will next consume message 5. If the consumer crashes or is shut down, its partitions are reassigned to another consumer, which continues consuming from the last committed offset of each partition. The committed offset for consumers is stored on a Kafka broker. When a consumer commits an offset, it sends a commit request to the Kafka cluster, specifying the partition and offset it wants to commit for a particular consumer group. The Kafka broker receiving the commit request then stores this offset in the `__consumer_offsets` internal topic. **Related terms**: *consumer offset*, *offset*

OpenSSL : OpenSSL is an open-source software library and toolkit that implements the Secure Sockets Layer (SSL) and Transport Layer Security (TLS) protocols. For more information, see [Wikipedia](https://en.wikipedia.org/wiki/OpenSSL).
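
The *offset* and *offset commit* entries above note that committed offsets let an application resume, or replay, from any point. A minimal sketch, assuming a local broker and a hypothetical (currently inactive) group `orders-app` reading topic `orders`:

```bash
# Rewind the group's committed offsets to the earliest available offset of each partition,
# so the next run of the group replays the topic from the beginning.
kafka-consumer-groups --bootstrap-server localhost:9092 \
  --group orders-app --topic orders \
  --reset-offsets --to-earliest --execute
```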

parent cluster : The Kafka cluster that a resource belongs to. **Related terms**: *Kafka cluster*

partition : A partition is a unit of data storage that divides a topic into multiple, parallel event streams, each of which is stored on separate Kafka brokers and can be consumed independently. Partitioning is a key concept in Kafka because it allows Kafka to scale horizontally by adding more brokers to the cluster. Partitions are also the unit of parallelism in Kafka. A topic can have one or more partitions, and each partition is an ordered, immutable sequence of event records that is continually appended to a partition log.

partitions (pre-replication) : In Confluent Cloud, partitions are a Kafka cluster billing dimension that defines the maximum number of partitions that can exist on the cluster at one time, before replication. While you are not charged for partitions on any type of Kafka cluster, the number of partitions you use has an impact on eCKU. To determine eCKU limits for partitions, Confluent Cloud counts only pre-replication (leader) partitions across a cluster. All topics that you create (as well as internal topics that are automatically created by Confluent Platform components such as ksqlDB, Kafka Streams, Connect, and Control Center (Legacy)) count towards the cluster partition limit. Confluent prefixes topics created automatically with an underscore (_). Topics that are internal to Kafka itself (consumer offsets) are not visible in Cloud Console and do not count against partition limits or toward partition billing. Available in the Metrics API as `partition_count`. In Confluent Cloud, attempts to create additional partitions beyond the cluster limit fail with an error message. To reduce usage on partitions (pre-replication), delete unused topics and create new topics with fewer partitions. Use the Kafka Admin interface to increase the partition count of an existing topic if the initial partition count is too low.

physical Kafka cluster (PKC) : A physical Kafka cluster (PKC) is a Kafka cluster comprised of multiple brokers. Each physical Kafka cluster is created on a Kubernetes cluster by the control plane. A PKC is not directly accessible by clients.

principal : A principal is an entity that can be authenticated and granted permissions based on roles to access resources and perform operations. An entity can be a user account, service account, group mapping, or identity pool. **Related terms**: *group mapping*, *identity*, *identity pool*, *role*, *service account*, *user account*

private internet : A private internet is a closed, restricted computer network typically used by organizations to provide secure environments for managing sensitive data and resources.

processing time : Processing time is the time when an event is processed or recorded by a system, as opposed to the time when the event occurred on the producing device. Processing time is often used in stream processing to determine the order of events and to perform windowing operations.

producer : A producer is a client application that publishes (writes) data to a topic in a Kafka cluster. Producers write data to a topic and are the only clients that can write data to a topic. Each record written to a topic is appended to the partition of the topic that is selected by the producer.

Producer API : The Producer API is the Kafka API that allows you to write data to a topic in a Kafka cluster. The Producer API is used by producer clients to publish data to a topic in a Kafka cluster.
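
A minimal sketch of the *partition*, *producer*, and *Producer API* entries above (the replication factor term is defined in the next group), assuming a local broker and a hypothetical topic `orders`:

```bash
# Create a topic with 6 partitions, each replicated to 3 brokers.
kafka-topics --bootstrap-server localhost:9092 --create \
  --topic orders --partitions 6 --replication-factor 3

# Produce records interactively; each line typed on stdin becomes one event message.
kafka-console-producer --bootstrap-server localhost:9092 --topic orders
```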

Protobuf : Protobuf (or Protocol Buffers) is an open-source data format used to serialize structured data for storage. **Related terms**: *data serialization*, *deserializer*, *serializer* **Related content** - [Protocol Buffers](https://protobuf.dev/) - [Getting Started with Protobuf in Confluent Cloud](https://www.confluent.io/blog/using-protobuf-in-confluent-cloud/) - Confluent Cloud: [Protobuf Serializer and Deserializer](/cloud/current/sr/fundamentals/serdes-develop/serdes-protobuf.html) - Confluent Platform: [Protobuf Serializer and Deserializer](/platform/current/schema-registry/fundamentals/serdes-develop/serdes-protobuf.html)

public internet : The public internet is the global system of interconnected computers and networks that use TCP/IP to communicate with each other.

rebalancing : Rebalancing is the process of redistributing the partitions of a topic among the consumers of a consumer group for improved performance and scalability. A rebalance can occur if a consumer has failed to send a heartbeat and has been excluded from the group, it voluntarily left the group, metadata has been updated for a consumer, or a consumer has joined the group.

replayability : Replayability is the ability to replay messages from any point in time. **Related terms**: *consumer offset*, *offset*, *offset commit*

replication : Replication is the process of creating and maintaining multiple copies (or *replicas*) of data across different nodes in a distributed system to increase availability, reliability, redundancy, and accessibility.

replication factor : A replication factor is the number of copies of a partition that are distributed across the brokers in a cluster.

requests : In Confluent Cloud, requests are a Kafka cluster billing dimension that defines the number of client requests to the cluster in one second. Available in the Metrics API as `request_count`. To reduce usage on requests, you can adjust producer batching configurations, consumer client batching configurations, and shut down otherwise inactive clients. For Dedicated clusters, a high number of requests per second results in increased load on the cluster.

role : A role is a Confluent-defined job function that is assigned a set of permissions required to perform specific actions or operations on Confluent resources. A role binding binds a role to a principal and a set of Confluent resources. A role can be assigned to a user account, group mapping, service account, or identity pool. **Related terms**: *group mapping*, *identity*, *identity pool*, *principal*, *service account* **Related content** - [Predefined RBAC Roles in Confluent Cloud](/cloud/current/access-management/access-control/rbac/predefined-rbac-roles.html) - [Role-Based Access Control Predefined Roles in Confluent Platform](/platform/current/security/rbac/rbac-predefined-roles.html)

rolling restart : A rolling restart restarts the brokers in a Kafka cluster with zero downtime by incrementally restarting a Kafka broker after verifying that there are no under-replicated partitions on the broker before proceeding to the next broker. Restarting the brokers one at a time allows for software upgrades, broker configuration updates, or cluster maintenance while maintaining high availability by avoiding downtime. **Related content** - [Rolling restart](/platform/current/kafka/post-deployment.html#rolling-restart)
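
The *rolling restart* entry above hinges on verifying that no partitions are under-replicated before moving on to the next broker. A minimal sketch, assuming a local broker:

```bash
# List partitions whose in-sync replicas have fallen below the replication factor;
# an empty result means it is safe to restart the next broker.
kafka-topics --bootstrap-server localhost:9092 --describe --under-replicated-partitions
```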
Schemas are used to validate the structure of data in event messages and ensure that producers and consumers are sending and receiving data in the same format. Schemas are defined in the Schema Registry. Schema Registry : Schema Registry is a centralized repository for managing and validating schemas for topic message data. Schema Registry is built into Confluent Cloud as a managed service, available with the Advanced Stream Governance package, and offered as part of Confluent Enterprise for self-managed deployments. Schema Registry is a RESTful service that is integrated with Kafka and Connect to provide a central location for managing schemas and validating data. Producers and consumers of Kafka topics use schemas to ensure data consistency and compatibility as schemas evolve. Schema Registry is a key component of Stream Governance. **Related content** - Confluent Cloud: [Manage Schemas in Confluent Cloud](/cloud/current/sr/schemas-manage.html) - Confluent Platform: [Schema Registry Overview](/platform/current/schema-registry/index.html) schema subject : A schema subject is the namespace for a schema in Schema Registry. This unique identifier defines a logical grouping of related schemas. Kafka topics contain event messages serialized and deserialized using the structure and rules defined in a schema subject. This ensures compatibility and supports schema evolution. **Related content** - Confluent Cloud: [Manage Schemas in Confluent Cloud](/cloud/current/sr/schemas-manage.html) - Confluent Platform: [Schema Registry Concepts](/platform/current/schema-registry/index.html) - [Understanding Schema Subjects](https://developer.confluent.io/courses/schema-registry/schema-subjects/) Serdes : Serdes are serializers and deserializers that convert objects and parallel data into a serial byte stream for efficient storage and high-speed data transmission over the wire. Confluent provides Serdes for schemas in Avro, Protobuf, and JSON Schema formats. **Related content** - Confluent Cloud: [Formats, Serializers, and Deserializers](/cloud/current/sr/fundamentals/serdes-develop/index.html) - Confluent Platform: [Formats, Serializers, and Deserializers](/platform/current/schema-registry/fundamentals/serdes-develop/index.html) - [Serde](https://serde.rs/) serializer : A serializer is a tool that converts objects and parallel data into a serial byte stream. Serializers work with deserializers (known together as Serdes) to support efficient storage and high-speed data transmission over the wire. Confluent provides serializers for schemas in Avro, Protobuf, and JSON Schema formats. **Related content** - Confluent Cloud: [Formats, Serializers, and Deserializers](/cloud/current/sr/fundamentals/serdes-develop/index.html) - Confluent Platform: [Formats, Serializers, and Deserializers](/platform/current/schema-registry/fundamentals/serdes-develop/index.html) service account : A service account is a non-person entity used by an application or service to access resources and perform operations. Because a service account is an identity independent of the user who created it, it can be used programmatically to authenticate to resources and perform operations without the need for a user to be signed in. **Related content** - [Service Accounts for Confluent Cloud](/cloud/current/access-management/identity/service-accounts.html)
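As a rough illustration of the Schema Registry, Serdes, and serializer entries above, the sketch below configures a producer with Confluent's Avro serializer and a Schema Registry endpoint. The bootstrap address, Schema Registry URL, topic, and schema are placeholders, and the example assumes the `kafka-avro-serializer` dependency is on the classpath.

```java
import java.util.Properties;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AvroProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");                     // placeholder
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://localhost:8081");            // placeholder

        // A tiny illustrative Avro schema for an "Order" record.
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Order\",\"fields\":[{\"name\":\"id\",\"type\":\"string\"}]}");
        GenericRecord order = new GenericData.Record(schema);
        order.put("id", "order-123");

        try (KafkaProducer<String, Object> producer = new KafkaProducer<>(props)) {
            // With default settings the serializer registers the schema under the
            // subject "orders-value" and embeds the schema ID in each message.
            producer.send(new ProducerRecord<>("orders", "order-123", order));
            producer.flush();
        }
    }
}
```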
service quota : A service quota is the limit, or maximum value, for a specific Confluent Cloud resource or operation that might vary by the resource scope it applies to. **Related content** - [Service Quotas for Confluent Cloud](/cloud/current/quotas/index.html) single message transform (SMT) : A single message transform (SMT) is a transformation or operation applied in real time on an individual message that changes the values, keys, or headers of the message before it is sent to a sink connector or after it is read from a source connector. SMTs are convenient for inserting fields, masking information, event routing, and other minor data adjustments. single sign-on (SSO) : Single sign-on (SSO) is a centralized authentication service that allows users to use a single set of credentials to log in to multiple applications or services. **Related terms**: *authentication*, *group mapping*, *identity provider* **Related content** - [Single Sign-On for Confluent Cloud](/cloud/current/access-management/authenticate/sso/index.html) sink connector : A sink connector is a Kafka Connect connector that publishes (writes) data from a Kafka topic to an external system. source connector : A source connector is a Kafka Connect connector that subscribes (reads) data from a source (external system), extracts the payload and schema of the data, and publishes (writes) the data to Kafka topics. standalone : Standalone refers to a configuration in which a software application, system, or service operates independently on a single instance or device. This mode is commonly used for development, testing, and debugging purposes. For Kafka Connect, a standalone worker is a single process responsible for running all connectors and tasks on a single instance. Standard Kafka cluster : A Confluent Cloud cluster type. Standard Kafka clusters are designed for production-ready features and functionality. static egress IP address : A static egress IP address is an IP address used by a Confluent Cloud managed connector to establish outbound connections to endpoints of external data sources and sinks over the public internet. **Related content** - [Use Static IP Addresses on Confluent Cloud for Connectors and Cluster Linking](/cloud/current/networking/static-egress-ip-addresses.html) - [Static Egress IP Addresses for Confluent Cloud Connectors](/cloud/current/connectors/static-egress-ip.html) storage (pre-replication) : In Confluent Cloud, storage is a Kafka cluster billing dimension that defines the number of bytes retained on the cluster, pre-replication. Available in the Metrics API as `retained_bytes` (convert from bytes to TB). The returned value is pre-replication. Standard, Enterprise, Dedicated, and Freight clusters support Infinite Storage. This means there is no maximum size limit for the amount of data that can be stored on the cluster. You can configure policy settings `retention.bytes` and `retention.ms` at the topic level to control exactly how much and how long to retain data in a way that makes sense for your applications and helps control your costs. To reduce storage in Confluent Cloud, compress your messages and reduce retention settings. For compression, use lz4. Avoid gzip because of high overhead on the cluster.
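The `retention.bytes` and `retention.ms` settings mentioned in the storage entry above are ordinary topic configurations; one way to change them programmatically is with the Java `AdminClient`, as in the hedged sketch below (the bootstrap address, topic name, and retention values are illustrative assumptions).

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class TopicRetentionExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder

        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "orders");
            // Keep roughly 1 GB per partition and delete data older than 7 days.
            AlterConfigOp setBytes = new AlterConfigOp(
                new ConfigEntry("retention.bytes", "1073741824"), AlterConfigOp.OpType.SET);
            AlterConfigOp setMs = new AlterConfigOp(
                new ConfigEntry("retention.ms", "604800000"), AlterConfigOp.OpType.SET);
            admin.incrementalAlterConfigs(Map.of(topic, List.of(setBytes, setMs))).all().get();
        }
    }
}
```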
Stream Catalog : Stream Catalog is a pillar of Confluent Cloud Stream Governance that provides a centralized inventory of your organization’s data assets that supports data governance and data discovery. With Data Portal in Confluent Cloud Console, users can find event streams across systems, search topics by name or tags, and enrich event data to increase value and usefulness. REST and GraphQL APIs can be used to search schemas, apply tags to records or fields, manage business metadata, and discover relationships across data assets. **Related content** - [Stream Catalog on Confluent Cloud: User Guide to Manage Tags and Metadata](/cloud/current/stream-governance/stream-catalog.html) - [Stream Catalog in Streaming Data Governance (Confluent Developer course)](https://developer.confluent.io/courses/governing-data-streams/stream-catalog/) Stream Governance : Stream Governance is a collection of tools and features that provide data governance for data in motion. These include data quality tools such as Schema Registry, schema ID validation, and schema linking; built-in data catalog capabilities to classify, organize, and find event streams across systems; and stream lineage to visualize complex data relationships and uncover insights with interactive, end-to-end maps of event streams. Taken together, these and other governance tools enable teams to manage the availability, integrity, and security of data used across organizations, and help with standardization, monitoring, collaboration, reporting, and more. **Related terms**: *Stream Catalog*, *Stream Lineage* **Related content** - [Stream Governance on Confluent Cloud](/cloud/current/stream-governance/index.html) stream lineage : Stream lineage is the life cycle, or history, of data, including its origins, transformations, and consumption, as it moves through various stages in data pipelines, applications, and systems. Stream lineage provides a record of data’s journey from its source to its destination, and is used to track data quality, data governance, and data security. **Related terms**: *Data Portal*, *Stream Governance* **Related content** - [Stream Lineage on Confluent Cloud](/cloud/current/stream-governance/stream-lineage.html) stream processing : Stream processing is the method of collecting event stream data in real-time as it arrives, transforming the data in real-time using operations (such as filters, joins, and aggregations), and publishing the results to one or more target systems. Stream processing can be used to analyze data continuously, build data pipelines, and process time-sensitive data in real-time. Using the Confluent event streaming platform, event streams can be processed in real-time using Kafka Streams, Kafka Connect, or ksqlDB. Streams API : The Streams API is the Kafka API that allows you to build streaming applications and microservices that transform (for example, filter, group, aggregate, join) incoming event streams in real-time and write the results to Kafka topics stored in a Kafka cluster. The Streams API is used by stream processing clients to process data in real-time, analyze data continuously, and build data pipelines. **Related content** - [Introduction to the Kafka Streams API](/platform/current/streams/introduction.html) throttling : Throttling is the process Kafka clusters in Confluent Cloud use to protect themselves from becoming over-utilized. Also known as backpressure, throttling in Confluent Cloud occurs when cluster load reaches 80%.
At this point, applications may start seeing higher latencies or timeouts as the cluster must begin throttling requests or connection attempts. topic : A topic is a user-defined category or feed name where event messages are stored and published by producers and subscribed to by consumers. Each topic is a log of event messages. Topics are stored in one or more partitions, which distribute topic records across brokers in a Kafka cluster. Each partition is an ordered, immutable sequence of records that are continually appended to a topic. **Related content** - [Manage Topics in Confluent Cloud](/cloud/current/client-apps/topics/index.html) total client connections : In Confluent Cloud, total client connections are a Kafka cluster billing dimension that defines the number of TCP connections to the cluster you can open at one time. Available in the Metrics API as `active_connection_count`. Filter by principal to understand how many connections each application is creating. How many connections a cluster supports can vary widely based on several factors, including number of producer clients, number of consumer clients, partition keying strategy, produce patterns per client, and consume patterns per client. For Dedicated clusters, Confluent derives a guideline for total client connections from benchmarking that indicates exceeding this number of connections increases produce latency for test clients. However, this does not apply to all workloads. That is why total client connections are a guideline, not a hard limit for Dedicated Kafka clusters. Monitor the impact on cluster load as connection count increases, because cluster load is the ultimate measure of how a given workload or CKU dimension affects the underlying resources of the cluster. Consider the Confluent guideline a per-CKU guideline. The number of connections tends to increase when you add brokers. In other words, if you significantly exceed the per-CKU guideline, cluster expansion doesn’t always give your cluster more connection count headroom. Transport Layer Security (TLS) : Transport Layer Security (TLS) is a cryptographic protocol that provides secure communication over a network. For more information, see [Wikipedia](https://en.wikipedia.org/wiki/Transport_Layer_Security). unbounded stream : An unbounded stream is a stream of data that is continuously generated in real-time and has no defined end. Examples of unbounded streams include stock prices, sensor data, and social media feeds. Processing unbounded streams requires a different approach than processing bounded streams. Unbounded streams are processed incrementally as data arrives, while bounded streams are processed as a batch after all data has arrived. Kafka Streams and Flink can be used to process unbounded streams. **Related terms**: *bounded stream*, *stream processing* under replication : Under replication is a situation in which the number of in-sync replicas is below the total number of replicas. Under-replicated partitions can occur when a broker is down or cannot replicate fast enough from the leader (replica fetcher lag). user account : A user account is an account representing the identity of a person who can be authenticated and granted access to Confluent Cloud resources. **Related content** - [User Accounts for Confluent Cloud](/cloud/current/access-management/identity/user-accounts/overview.html) watermark : A watermark in Flink is a marker that keeps track of time as data is processed.
A watermark means that all records until the current moment in time have been “seen”. This way, Flink can correctly perform tasks that depend on when things happened, like calculating aggregations over time windows. **Related content** - [Time and Watermarks](/cloud/current/flink/concepts/timely-stream-processing.html) ### Azure Data Lake Storage Gen1 object names The Azure Data Lake Storage Gen1 data model is a flat structure: each bucket stores objects, and the name of each Azure Data Lake Storage Gen1 object serves as the unique key. However, a logical hierarchy can be inferred when the Azure Data Lake Storage Gen1 object names use directory delimiters, such as `/`. The Azure Data Lake Storage Gen1 Sink connector allows you to customize the names of the Azure Data Lake Storage Gen1 objects it uploads to the Azure Data Lake Storage Gen1 bucket. In general, the names of the Azure Data Lake Storage Gen1 objects uploaded by the Azure Data Lake Storage Gen1 Sink connector follow this format:

```bash
<prefix>/<topic>/<encodedPartition>/<topic>+<kafkaPartition>+<startOffset>.<format>
```

Admin API : The Admin API is the Kafka REST API that enables administrators to manage and monitor Kafka clusters, topics, brokers, and other Kafka components. Ansible Playbooks for Confluent Platform : Ansible Playbooks for Confluent Platform is a set of Ansible playbooks and roles that are designed to automate the deployment and management of Confluent Platform. Apache Flink : Apache Flink is an open source stream processing framework for stateful computations over unbounded and bounded data streams. Flink provides a unified API for batch and stream processing that supports event-time and out-of-order processing, and supports exactly-once semantics. Flink applications include real-time analytics, data pipelines, and event-driven applications. **Related terms**: *bounded stream*, *data stream*, *stream processing*, *unbounded stream* **Related content** - [Apache Flink: Stream Processing and SQL on Confluent Cloud](/cloud/current/flink/index.html) - [What is Apache Flink?](https://www.confluent.io/learn/apache-flink/) - [Apache Flink 101 (Confluent Developer course)](https://developer.confluent.io/courses/apache-flink/intro/) Apache Kafka : Apache Kafka is an open source event streaming platform that provides unified, high-throughput, low-latency, fault-tolerant, scalable, distributed, and secure data streaming. Kafka is a publish-and-subscribe messaging system that enables distributed applications to ingest, process, and share data in real-time. **Related content** - [Introduction to Kafka](/kafka/introduction.html) audit log : An audit log is a historical record of actions and operations that are triggered when auditable events occur. Audit log records can be used to troubleshoot system issues, manage security, and monitor compliance by tracking administrative activity, data access and modification, monitoring sign-in attempts, and reconstructing security breaches and fraudulent activity. **Related terms**: *auditable event* **Related content** - [Audit Log Concepts for Confluent Cloud](/cloud/current/monitoring/audit-logging/cloud-audit-log-concepts.html) - [Audit Log Concepts for Confluent Platform](/platform/current/security/audit-logs/audit-logs-concepts.html) auditable event : An auditable event is an event that represents an action or operation that can be tracked and monitored for security purposes and compliance.
When an auditable event occurs, an auditable event method is triggered and an event message is sent to the audit log cluster and stored as an audit log record. **Related terms**: *audit log*, *event message* **Related content** - [Auditable Events in Confluent Cloud](/cloud/current/monitoring/audit-logging/event-methods/index.html) - [Auditable Events in Confluent Platform](/platform/current/security/audit-logs/auditable-events.html) authentication : Authentication is the process of verifying the identity of a principal that interacts with a system or application. Authentication is often used in conjunction with authorization to determine whether a principal is allowed to access a resource and perform a specific action or operation on that resource. Digital authentication requires one or more of the following: something a principal knows (a password or security question), something a principal has (a security token or key), or something a principal is (a biometric characteristic, such as a fingerprint or voiceprint). Multi-factor authentication (MFA) requires two or more forms of authentication. **Related terms**: *authorization*, *identity*, *identity provider*, *identity pool*, *principal*, *role* authorization : Authorization is the process of evaluating and then granting or denying a principal a set of permissions required to access and perform operations on resources. **Related terms**: *authentication*, *group mapping*, *identity*, *identity provider*, *identity pool*, *principal*, *role* Avro : Avro is a data serialization and exchange framework that provides data structures, remote procedure call (RPC), a compact binary data format, and a container file, and uses JSON to represent schemas. Avro schemas ensure that every field is properly described and documented for use with serializers and deserializers. You can either send a schema with every message or use Schema Registry to store and receive schemas for use by consumers and producers to save bandwidth and storage space. **Related terms**: *data serialization*, *deserializer*, *serializer* **Related content** - [Apache Avro - a data serialization system](https://avro.apache.org/) - Confluent Cloud: [Avro Schema Serializer and Deserializer](/cloud/current/sr/fundamentals/serdes-develop/serdes-avro.html) - Confluent Platform: [Avro Schema Serializer and Deserializer](/platform/current/schema-registry/fundamentals/serdes-develop/serdes-avro.html) Basic Kafka cluster : A Confluent Cloud cluster type. Basic Kafka clusters are designed for experimentation, early development, and basic use cases. batch processing : Batch processing is the method of collecting a large volume of data over a specific time interval, after which the data is processed all at once and loaded into a destination system. Batch processing is often used when processing data can occur independently of the source and timing of the data. It is efficient for non-real-time data processing, such as data warehousing, reporting, and analytics. **Related terms**: *bounded stream*, *stream processing*, *unbounded stream* CIDR block : A CIDR block is a group of IP addresses that are contiguous and can be represented as a single block. CIDR blocks are expressed using Classless Inter-domain Routing (CIDR) notation that includes an IP address and a number of bits in the network mask. **Related content** - [CIDR notation](https://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing#CIDR_notation) - [Classless Inter-domain Routing (CIDR): The Internet Address Assignment and Aggregation Plan [RFC 4632]](https://www.rfc-editor.org/rfc/rfc4632.html)
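For Kafka clients, the authentication entry above typically translates into SASL configuration properties. The sketch below shows one plausible SASL/PLAIN client configuration and uses it to describe the cluster; the endpoint and credentials are placeholders (with Confluent Cloud, the username and password would typically be an API key and secret).

```java
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;

public class SaslClientConfigSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Endpoint and credentials are placeholders for this sketch.
        props.put("bootstrap.servers", "broker.example.com:9092");
        props.put("security.protocol", "SASL_SSL");
        props.put("sasl.mechanism", "PLAIN");
        props.put("sasl.jaas.config",
            "org.apache.kafka.common.security.plain.PlainLoginModule required "
                + "username=\"my-api-key\" password=\"my-api-secret\";");

        // The same properties can be passed to KafkaProducer, KafkaConsumer, or AdminClient.
        try (AdminClient admin = AdminClient.create(props)) {
            System.out.println("Connected to cluster: " + admin.describeCluster().clusterId().get());
        }
    }
}
```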
Cluster Linking : Cluster Linking is a highly performant data replication feature that enables links between Kafka clusters to mirror data from one cluster to another. Cluster Linking creates perfect copies of Kafka topics, which keep data in sync across clusters. Use cases include geo-replication of data, data sharing, migration, disaster recovery, and tiered separation of critical applications. **Related content** - [Geo-replication with Cluster Linking on Confluent Cloud](/cloud/current/multi-cloud/cluster-linking/index.html) - [Cluster Linking for Confluent Platform](/platform/current/multi-dc-deployments/cluster-linking/index.html) commit log : A commit log is a log of all event messages about commits (changes or operations made) sent to a Kafka topic. A commit log ensures that all event messages are processed at least once and provides a mechanism for recovery in the event of a failure. The commit log is also referred to as a write-ahead log (WAL) or a transaction log. **Related terms**: *event message* Confluent Cloud : Confluent Cloud is the fully managed, cloud-native event streaming service powered by Kora, the event streaming platform based on Kafka and extended by Confluent to provide high availability, scalability, elasticity, security, and global interconnectivity. Confluent Cloud offers cost-effective multi-tenant configurations as well as dedicated solutions, if stronger isolation is required. **Related terms**: *Apache Kafka*, *Kora* **Related content** - [Confluent Cloud Overview](/cloud/current/index.html) - [Confluent Cloud](https://www.confluent.io/confluent-cloud/) Confluent Cloud network : A Confluent Cloud network is an abstraction for a single tenant network environment that hosts Dedicated Kafka clusters in Confluent Cloud along with their single tenant services, like ksqlDB clusters and managed connectors. **Related content** - [Confluent Cloud Network Overview](/cloud/current/networking/overview.html#ccloud-networks) Confluent for Kubernetes (CFK) : *Confluent for Kubernetes (CFK)* is a cloud-native control plane for deploying and managing Confluent in private cloud environments through a declarative API. Confluent Platform : Confluent Platform is a specialized distribution with Kafka at its core and additional components for data integration, streaming data pipelines, and stream processing. Confluent REST Proxy : Confluent REST Proxy provides a RESTful interface to a Kafka cluster, making it easy to produce and consume messages, view the state of the cluster, and perform administrative actions without using the native Kafka protocol or clients. **Related content** - Confluent Platform: [REST Proxy](/platform/current/kafka-rest/index.html) Confluent Server : Confluent Server is the default Kafka broker component of Confluent Platform that builds on the foundation of Apache Kafka® and provides enhanced proprietary features designed for enterprise use. Confluent Server is fully compatible with Kafka, and adds Kafka cluster support for Role-Based Access Control, Audit Logs, Schema Validation, Self Balancing Clusters, Tiered Storage, Multi-Region Clusters, and Cluster Linking. **Related terms**: *Confluent Platform*, *Apache Kafka*, *Kafka broker*, *Cluster Linking*, *multi-region cluster (MRC)*
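As a small illustration of the Confluent REST Proxy entry above, the following Java sketch produces a JSON record over HTTP using the REST Proxy v2 produce endpoint; the proxy address (the default port 8082 is assumed), topic name, and payload are placeholders.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RestProxyProduceExample {
    public static void main(String[] args) throws Exception {
        // REST Proxy address and topic name are placeholders for this sketch.
        String body = "{\"records\":[{\"value\":{\"id\":1,\"status\":\"new\"}}]}";

        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://localhost:8082/topics/orders"))
            .header("Content-Type", "application/vnd.kafka.json.v2+json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```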
Confluent Unit for Kafka (CKU) : Confluent Unit for Kafka (CKU) is a unit of horizontal scaling for Dedicated Kafka clusters in Confluent Cloud that provides preallocated resources. CKUs determine the capacity of a Dedicated Kafka cluster in Confluent Cloud. **Related content** - [CKU limits per cluster](/cloud/current/clusters/cluster-types.html#cku-limits-per-cluster) Connect API : The Connect API is the Kafka API that enables a connector to read event streams from a source system and write to a target system. Connect worker : A Connect worker is a server process that runs a connector and performs the actual work of moving data in and out of Kafka topics. A worker is a server process that runs on hardware independent of the Kafka brokers themselves. It is scalable and fault-tolerant, meaning you can run a cluster of workers that share the load of moving data in and out of Kafka from and to external systems. **Related terms**: *connector*, *Kafka Connect* connection attempts : In Confluent Cloud, connection attempts are a Kafka cluster billing dimension that defines the maximum number of new TCP connections to the cluster you can create in one second. This includes successful and unsuccessful authentication attempts. Available in the Metrics API as `successful_authentication_count` (only includes successful authentications, not unsuccessful authentication attempts). To reduce usage on connection attempts, use longer-lived connections to the cluster. If you exceed the maximum, connection attempts may be refused. connector : A connector is an abstract mechanism that enables communication, coordination, or cooperation among components by transferring data elements from one interface to another without changing the data. connector offset : Connector offset uniquely identifies the position of a connector as it processes data. Connectors use a variety of strategies to implement the connector offset, including everything from monotonically increasing integers to replay ids, lists of files, timestamps and even checkpoint information. Connector offsets keep track of already-processed data in the event of a connector restart or recovery. While sink connectors use a pattern for connector offsets similar to the offset mechanism used throughout Kafka, the implementation details for source connectors are often much different. This is because source connectors track the progress of a source system as they process data. consumer : A consumer is a Kafka client application that subscribes to (reads and processes) event messages from a Kafka topic. The Streams API and the Consumer API are the two APIs that enable consumers to read event streams from Kafka topics. **Related terms**: *Consumer API*, *consumer group*, *producer*, *Streams API* Consumer API : The Consumer API is the Kafka API used for consuming (reading) event messages or records from Kafka topics and enables a Kafka consumer to subscribe to a topic and read event messages as they arrive. Batch processing is a common use case for the Consumer API. consumer group : A consumer group is a single logical consumer implemented with multiple physical consumers for reasons of throughput and resilience. By dividing a topic's partitions among the consumers in the group, consumers in the group can process messages in parallel, increasing message throughput and enabling load balancing.
**Related terms**: *consumer*, *partition*, *producer*, *topic* consumer lag : Consumer lag is the number of consumer offsets between the latest message produced in a partition and the last message consumed by a consumer, that is, the number of messages pending to be consumed from a particular partition. A large consumer lag, or a quickly growing lag, indicates that the consumer is unable to read from a partition as fast as the messages are available. This can be caused by a slow consumer, slow network, or slow broker. consumer offset : Consumer offset is the monotonically increasing integer value that uniquely identifies the position of an event record in a partition. Consumers use offsets to track their current position in the Kafka topic, allowing consumers to resume processing from where they left off. Offsets are stored on the Kafka broker, which does not track which records have been read and which have not. It is up to the consumer to track this information. When a consumer acknowledges receiving and processing a message, it commits an offset value that is stored in the special internal topic `__consumer_offsets`. cross-resource RBAC role binding : A cross-resource RBAC role binding is a role binding in Confluent Cloud that is applied at the Organization or Environment scope and grants access to multiple resources. For example, assigning a principal the NetworkAdmin role at the Organization scope lets them administer all networks across all Environments in their Organization. **Related terms**: *identity pool*, *principal*, *role*, *role binding* CRUD : CRUD is an acronym for the four basic operations that can be performed on data: Create, Read, Update, and Delete. custom connector : A custom connector is a connector created using Connect plugins uploaded to Confluent Cloud by users. This includes connector plugins that are built from scratch, modified open-source connector plugins, or third-party connector plugins. data at rest : Data at rest is data that is physically stored on non-volatile media (such as hard drives, solid-state drives, or other storage devices) and is not actively being transmitted or processed by a system. data contract : A data contract is a formal agreement between an upstream component and a downstream component on the structure and semantics of data that is in motion. A schema is a key element of a data contract. The schema, metadata, rules, policies, and evolution plan form the data contract. You can associate data contracts (schemas and more) with [topics](#term-Kafka-topic). **Related content** - Confluent Platform: [Data Contracts for Schema Registry on Confluent Platform](/platform/current/schema-registry/fundamentals/data-contracts.html) - Confluent Cloud: [Data Contracts for Schema Registry on Confluent Cloud](/cloud/current/sr/fundamentals/data-contracts.html) - Cloud Console: [Manage Schemas in Confluent Cloud](/cloud/current/sr/schemas-manage.html) data encryption key (DEK) : A data encryption key (DEK) is a symmetric key that is used to encrypt and decrypt data. The DEK is used in client-side field level encryption (CSFLE) to encrypt sensitive data. The DEK is itself encrypted using a key encryption key (KEK) that is only accessible to authorized users. The encrypted DEK and encrypted data are stored together. Only users with access to the KEK can decrypt the DEK and access the sensitive data. **Related terms**: *envelope encryption*, *key encryption key (KEK)*
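The consumer lag and consumer offset entries above can be made concrete with the Java `AdminClient`: lag per partition is the difference between the partition's latest (end) offset and the group's committed offset. The sketch below assumes a hypothetical `orders-processor` consumer group and a local bootstrap address.

```java
import java.util.Map;
import java.util.Properties;
import java.util.stream.Collectors;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.ListOffsetsResult;
import org.apache.kafka.clients.admin.OffsetSpec;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class ConsumerLagExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder

        try (AdminClient admin = AdminClient.create(props)) {
            // Committed offsets for a hypothetical consumer group.
            Map<TopicPartition, OffsetAndMetadata> committed =
                admin.listConsumerGroupOffsets("orders-processor")
                     .partitionsToOffsetAndMetadata().get();

            // Latest (end) offsets for the same partitions.
            Map<TopicPartition, OffsetSpec> request = committed.keySet().stream()
                .collect(Collectors.toMap(tp -> tp, tp -> OffsetSpec.latest()));
            Map<TopicPartition, ListOffsetsResult.ListOffsetsResultInfo> latest =
                admin.listOffsets(request).all().get();

            // Lag per partition = end offset minus committed offset.
            committed.forEach((tp, offset) -> {
                long lag = latest.get(tp).offset() - offset.offset();
                System.out.printf("%s lag=%d%n", tp, lag);
            });
        }
    }
}
```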
data in motion : Data in motion is data that is actively being transferred between source and destination, typically systems, devices, or networks. Data in motion is also referred to as data in transit or data in flight. data in use : Data in use is data that is actively being processed or manipulated in memory (RAM, CPU caches, or CPU registers). data ingestion : Data ingestion is the process of collecting, importing, and integrating data from various sources into a system for further processing, analysis, or storage. data mapping : Data mapping is the process of defining relationships or associations between source data elements and target data elements. Data mapping is an important process in data integration, data migration, and data transformation, ensuring that data is accurately and consistently represented when it is moved or combined. data pipeline : A data pipeline is a series of processes and systems that enable the flow of data from sources to destinations, automating the movement and transformation of data for various purposes, such as analytics, reporting, or machine learning. A data pipeline is typically composed of a source system, a data ingestion tool, a data transformation tool, and a target system. A data pipeline covers the following stages: data extraction, data transformation, data loading, and data validation. Data Portal : Data Portal is a Confluent Cloud application that uses Stream Catalog and Stream Lineage to provide self-service access throughout Confluent Cloud Console for data practitioners to search and discover existing topics using tags and business metadata, request access to topics and data, and access data in topics to build streaming applications and data pipelines. Data Portal leverages Stream Catalog and Stream Lineage to provide a data-centric view of Confluent that is optimized for self-service access, where users can search, discover, and understand available data, request access to it, and use it. **Related terms**: *Stream Catalog*, *Stream Lineage* **Related content** - [Data Portal on Confluent Cloud](/cloud/current/stream-governance/data-portal.html) data serialization : Data serialization is the process of converting data structures or objects into a format that can be stored or transmitted, and reconstructed later in the same or another computer environment. Data serialization is a common technique for implementing data persistence, interprocess communication, and object communication. Confluent Schema Registry (in Confluent Platform) and Confluent Cloud Schema Registry support data serialization using serializers and deserializers for the following formats: Avro, JSON Schema, and Protobuf. **Related content** - Confluent Cloud: [Formats, Serializers, and Deserializers](/cloud/current/sr/fundamentals/serdes-develop/index.html) - Confluent Platform: [Formats, Serializers, and Deserializers](/platform/current/schema-registry/fundamentals/serdes-develop/index.html) data steward : A data steward is a person with data-related responsibilities, such as data governance, data quality, and data security. data stream : A data stream is a continuous flow of data records that are produced and consumed by applications. dead letter queue (DLQ) : A dead letter queue (DLQ) is a queue where messages that could not be processed successfully by a sink connector are placed.
Instead of stopping, the sink connector sends messages that could not be written successfully to the DLQ topic as event records and continues processing messages. Dedicated Kafka cluster : A Confluent Cloud cluster type. Dedicated Kafka clusters are designed for critical production workloads with high traffic or private networking requirements. deserializer : A deserializer is a tool that converts a serial byte stream back into objects and parallel data. Deserializers work with serializers (known together as Serdes) to support efficient storage and high-speed data transmission over the wire. Confluent provides Serdes for schemas in Avro, Protobuf, and JSON Schema formats. **Related content** - Confluent Cloud: [Formats, Serializers, and Deserializers](/cloud/current/sr/fundamentals/serdes-develop/index.html) - Confluent Platform: [Formats, Serializers, and Deserializers](/platform/current/schema-registry/fundamentals/serdes-develop/index.html) egress : In general networking, egress refers to outbound traffic leaving a network or a specific network segment. In Confluent Cloud, egress is a Kafka cluster billing dimension that defines the number of bytes consumed from the cluster in one second. Available in the Metrics API as `sent_bytes` (convert from bytes to MB). To reduce egress in Confluent Cloud, compress your messages and ensure each consumer is only consuming from the topics it requires. For compression, use lz4. Avoid gzip because of high overhead on the cluster. Elastic Confluent Unit for Kafka (eCKU) : Elastic Confluent Unit for Kafka (eCKU) is used to express capacity for Basic, Standard, Enterprise, and Freight Kafka clusters. These clusters automatically scale up to a fixed ceiling. There is no need to resize these types of clusters. When you need more capacity, your cluster expands up to the fixed ceiling. If you’re not using capacity above the minimum, you’re not paying for it. ELT : ELT is an acronym for Extract-Load-Transform, where data is extracted from a source system and loaded into a target system before processing or transformation. Compared to ETL, ELT is a more flexible approach to data ingestion because the data is loaded into the target system before transformation. Enterprise Kafka cluster : A Confluent Cloud cluster type. Enterprise Kafka clusters are designed for production-ready functionality that requires private endpoint networking capabilities. envelope encryption : Envelope encryption is a cryptographic technique that uses two keys to encrypt data. The symmetric data encryption key (DEK) is used to encrypt sensitive data. The separate asymmetric key encryption key (KEK) is the master key used to encrypt the DEK. The DEK and encrypted data are stored together. Only users with access to the KEK can decrypt the DEK and access the sensitive data. In Confluent Cloud, envelope encryption is used to enable client-side field level encryption (CSFLE). CSFLE encrypts sensitive data in a message before it is sent to Confluent Cloud and allows for temporary decryption of sensitive data when required to perform operations on the data. **Related terms**: *data encryption key (DEK)*, *key encryption key (KEK)* ETL : ETL is an acronym for Extract-Transform-Load, where data is extracted from a source system, transformed into a target format, and loaded into a target system. Compared to ELT, ETL is a more rigid approach to data ingestion because the data is transformed before loading into the target system.
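Pulling together the consumer, Consumer API, consumer group, consumer offset, and deserializer entries above, the following minimal Java consumer sketch subscribes to a hypothetical `orders` topic as part of a consumer group and commits offsets after processing; the bootstrap address, group ID, and topic are illustrative placeholders.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class MinimalConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("group.id", "orders-processor");        // consumers sharing this ID form one consumer group
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        props.put("enable.auto.commit", "false");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                        record.partition(), record.offset(), record.value());
                }
                // Committing stores the consumer offsets in the __consumer_offsets internal topic.
                consumer.commitSync();
            }
        }
    }
}
```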
event : An event is a meaningful action or occurrence of something that happened. Events that can be recognized by a program, either human-generated or triggered by software, can be recorded in a log file or other data store. **Related terms**: *event message*, *event record*, *event sink*, *event source*, *event stream*, *event streaming*, *event streaming platform*, *event time* event message : An event message is a record of an event sent to a Kafka topic, represented as a key-value pair. Each event message consists of a key-value pair, a timestamp, the compression type, headers for metadata (optional), and a partition and offset ID (once the message is written). The key is optional and can be used to identify the event. The value is required and contains details about the event that happened. **Related terms**: *event*, *event record*, *event sink*, *event source*, *event stream*, *event streaming*, *event streaming platform*, *event time* event record : An event record is the record of an event stored in a Kafka topic. Event records are organized and durably stored in topics. Examples of events include orders, payments, activities, or measurements. An event typically contains one or more data fields that describe the fact, as well as a timestamp that denotes when the event was created by its event source. The event may also contain various metadata, such as its source of origin (for example, the application or cloud service that created the event) and storage-level information (for example, its position in the event stream). **Related terms**: *event*, *event message*, *event sink*, *event source*, *event stream*, *event streaming*, *event streaming platform*, *event time* event sink : An event sink is a consumer of events, which can include applications, cloud services, databases, IoT sensors, and more. **Related terms**: *event*, *event message*, *event record*, *event source*, *event stream*, *event streaming*, *event streaming platform*, *event time* event source : An event source is a producer of events, which can include cloud services, databases, IoT sensors, mainframes, and more. **Related terms**: *event*, *event message*, *event record*, *event sink*, *event stream*, *event streaming*, *event streaming platform*, *event time* event stream : An event stream is a continuous flow of event messages produced by an event source and consumed by one or more consumers. **Related terms**: *event*, *event message*, *event record*, *event sink*, *event source*, *event streaming*, *event streaming platform*, *event time* event streaming : Event streaming is the practice of capturing event data in real-time from data sources. Event streaming is a form of data streaming that is used to capture, store, process, and react to data in real-time or retrospectively. **Related terms**: *event*, *event message*, *event record*, *event sink*, *event source*, *event streaming platform*, *event time* event streaming platform : An event streaming platform is a platform to which events can be written once, allowing distributed functions within an organization to react in real time. **Related terms**: *event*, *event message*, *event record*, *event sink*, *event source*, *event streaming*, *event time* event time : Event time is the time when an event occurred on the producing device, as opposed to the time when the event was processed or recorded. Event time is often used in stream processing to determine the order of events and to perform windowing operations. **Related terms**: *event*, *event message*, *event record*, *event sink*, *event source*, *event streaming*, *event streaming platform*
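Event time and windowing, as described in the event time entry above (and the stream processing entries earlier), can be sketched with the Kafka Streams DSL: the example below counts records per key in five-minute windows based on each record's timestamp. The application ID, bootstrap address, and `clicks` topic are illustrative placeholders.

```java
import java.time.Duration;
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.TimeWindows;

public class ClicksPerWindow {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "clicks-per-window"); // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder

        StreamsBuilder builder = new StreamsBuilder();
        // Count events per key in five-minute windows based on each record's timestamp
        // (the event time when producers set CreateTime timestamps).
        builder.stream("clicks", Consumed.with(Serdes.String(), Serdes.String()))
               .groupByKey()
               .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(5)))
               .count()
               .toStream()
               .foreach((windowedKey, count) ->
                   System.out.printf("%s -> %d%n", windowedKey, count));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```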
exactly-once semantics : Exactly-once semantics is a guarantee that a message is delivered exactly once and in the order that it was sent. Even if a producer retries sending a message, or a consumer retries processing a message, the message is processed exactly once. In Kafka, this guarantee is achieved by idempotent producers, which attach a producer ID and sequence number to each message so that the broker can discard duplicate writes, and by transactions, which commit consumer offsets atomically with the messages produced while processing them. The consumer offset is committed only after the message is processed, so if a consumer fails before committing, the message is reprocessed without producing duplicate results. Freight Kafka cluster : A Confluent Cloud cluster type. Freight Kafka clusters are designed for high-throughput, relaxed latency workloads that are less expensive than self-managed open source Kafka. granularity : Granularity is the degree or level of detail to which an entity (a system, service, or resource) is broken down into subcomponents, parts, or elements. Entities that are *fine-grained* have a higher level of detail, while *coarse-grained* entities have a reduced level of detail, often combining finer parts into a larger whole. In the context of access control, granular permissions provide precise control over resource access. They allow administrators to grant specific operations on distinct resources. This ensures users only have permissions tailored to their needs, minimizing unnecessary or potentially risky access. group mapping : Group mapping is a set of rules that map groups in your SSO identity provider to Confluent Cloud RBAC roles. When a user signs in to Confluent Cloud using SSO, Confluent Cloud uses the group mapping to grant access to Confluent Cloud resources. **Related terms**: *identity provider*, *identity pool*, *principal*, *role* **Related content** - [Group Mapping for Confluent Cloud](/cloud/current/access-management/authenticate/sso/group-mapping/overview.html) identity : An identity is a unique identifier that is used to authenticate and authorize users and applications to access resources. Identity is often used in conjunction with access control to determine whether a user or application is allowed to access a resource and perform a specific action or operation on that resource. **Related terms**: *identity provider*, *identity pool*, *principal*, *role* identity pool : An identity pool is a collection of identities that can be used to authenticate and authorize users and applications to access resources. Identity pools are used to manage permissions for users and applications that access resources in Confluent Cloud. They are also used to manage permissions for Confluent Cloud service accounts that are used to access resources in Confluent Cloud. identity provider : An identity provider is a trusted provider that authenticates users and issues security tokens that are used to verify the identity of a user. Identity providers are often used in single sign-on (SSO) scenarios, where a user can log in to multiple applications or services with a single set of credentials. Infinite Storage : Infinite Storage is the Confluent Cloud storage service that enhances the scalability of Confluent Cloud resources by separating storage and processing. Tiered storage within Confluent Cloud moves data between storage layers based on the needs of the workload, retrieves tiered data when requested, and garbage collects data that is past retention or otherwise deleted.
If an application reads historical data, latency is not increased for other applications reading more recent data. Storage resources are decoupled from compute resources: you only pay for what you produce to Confluent Cloud and for the storage that you use, and CKUs do not have storage limits. **Related content** - [Infinite Storage in Confluent Cloud for Apache Kafka](https://www.confluent.io/blog/infinite-kafka-data-storage-in-confluent-cloud/) ingress : In general networking, ingress refers to traffic that enters a network from an external source. In Confluent Cloud, ingress is a Kafka cluster billing dimension that defines the number of bytes produced to the cluster in one second. Available in the Metrics API as `received_bytes` (convert from bytes to MB). To reduce ingress in Confluent Cloud, compress your messages. For compression, use lz4. Avoid gzip because of high overhead on the cluster. internal topic : An internal topic is a topic, prefixed with double underscores (“__”), that is automatically created by a Kafka component to store metadata about the broker, partition assignment, consumer offsets, and other information. Examples of internal topics: `__cluster_metadata`, `__consumer_offsets`, `__transaction_state`, `__confluent.support.metrics`, and `__confluent.support.metrics-raw`. JSON Schema : JSON Schema is a declarative language used for data serialization and exchange to define data structures, specify formats, and validate JSON documents. It is a way to encode expected data types, properties, and constraints to ensure that all fields are properly described for use with serializers and deserializers. **Related terms**: *data serialization*, *deserializer*, *serializer* **Related content** - [JSON Schema - a declarative language that allows you to annotate and validate JSON documents.](https://json-schema.org/) - Confluent Cloud: [JSON Schema Serializer and Deserializer](/cloud/current/sr/fundamentals/serdes-develop/serdes-json.html) - Confluent Platform: [JSON Schema Serializer and Deserializer](/platform/current/schema-registry/fundamentals/serdes-develop/serdes-json.html) Kafka bootstrap server : A Kafka bootstrap server is a Kafka broker that a Kafka client first connects to when initiating a connection to a Kafka cluster, and that returns metadata, which includes the addresses for all of the brokers in the Kafka cluster. Although only one bootstrap server is required to connect to a Kafka cluster, multiple brokers can be specified in a bootstrap server list to provide high availability and fault tolerance in case a broker is unavailable. In Confluent Cloud, the bootstrap server is the general cluster endpoint. Kafka broker : A Kafka broker is a server in the Kafka storage layer that stores event streams from one or more sources. A Kafka cluster is typically comprised of several brokers. Every broker in a cluster is also a bootstrap server, meaning if you can connect to one broker in a cluster, you can connect to every broker. Kafka client : A Kafka client allows you to write distributed applications and microservices that read, write, and process streams of events in parallel, at scale, and in a fault-tolerant manner, even in the case of network problems or machine failures. The Kafka client library provides functions, classes, and utilities that allow developers to create Kafka producer clients (Producers) and consumer clients (Consumers) using various programming languages.
The primary way to build production-ready Producers and Consumers is by using your preferred programming language and a Kafka client library. **Related content** - [Build Client Applications for Confluent Cloud](/cloud/current/client-apps/overview.html) - [Build Client Applications for Confluent Platform](/platform/current/clients/index.html) - [Getting Started with Apache Kafka and Java (or Python, Go, .Net, and others)](https://developer.confluent.io/get-started/java/) Kafka cluster : A Kafka cluster is a group of interconnected Kafka brokers that manage and distribute real-time data streaming, processing, and storage as if they are a single system. By distributing tasks and services across multiple Kafka brokers, the Kafka cluster improves availability, reliability, and performance. Kafka Connect : Kafka Connect is the component of Kafka that provides data integration between databases, key-value stores, search indexes, file systems, and Kafka brokers. Kafka Connect is an ecosystem consisting of a client application and pluggable connectors. As a client application, Connect is a server process that runs on hardware independent of the Kafka brokers themselves. It is scalable and fault-tolerant, meaning you can run a cluster of Connect workers that share the load of moving data in and out of Kafka from and to external systems. Connect also abstracts the business of writing code away from the user and instead requires only JSON configuration to run. **Related content** - Confluent Cloud: [Kafka Connect](/cloud/current/billing/overview.html#kconnect-long) - Confluent Platform: [Kafka Connect](/platform/current/connect/index.html) Kafka controller : A Kafka controller is the node in a Kafka cluster that is responsible for managing and changing the metadata of the cluster. This node also communicates metadata changes to the rest of the cluster. When Kafka uses ZooKeeper for metadata management, the controller is a broker, and the broker persists the metadata to ZooKeeper for backup and recovery. With KRaft, you dedicate Kafka nodes to operate as controllers and the metadata is stored in Kafka itself and not persisted to ZooKeeper. KRaft enables faster recovery because of this. For more information, see [KRaft overview](/platform/current/kafka-metadata/kraft.html). Kafka listener : A Kafka listener is an endpoint that Kafka brokers bind to and use to communicate with clients. For Kafka clusters, Kafka listeners are configured in the `listeners` property of the `server.properties` file. Advertised listeners are publicly accessible endpoints that are used by clients to connect to the Kafka cluster. **Related content** - [Kafka Listeners – Explained](https://www.confluent.io/blog/kafka-listeners-explained/) Kafka metadata : Kafka metadata is the information about the Kafka cluster and the topics that are stored in it. This information includes details such as the brokers in the cluster, the topics that are available, the partitions for each topic, and the location of the leader for each partition. Kafka metadata is used by clients to discover the available brokers and topics, and to determine which broker is the leader for a particular partition. This information is essential for clients to be able to send and receive messages to and from Kafka. Kafka Streams : Kafka Streams is a stream processing library for building streaming applications and microservices that transform (filter, group, aggregate, join, and more) incoming event streams in real-time and write the results to Kafka topics stored in a Kafka cluster.
The Streams API can be used to build applications that process data in real-time, analyze data continuously, and build data pipelines. **Related content** - [Kafka Streams](/platform/current/streams/overview.html) Kafka topic : See *topic*. key encryption key (KEK) : A key encryption key (KEK) is a master key that is used to encrypt and decrypt other keys, specifically the data encryption key (DEK). Only users with access to the KEK can decrypt the DEK and access the sensitive data. **Related terms**: *data encryption key (DEK)*, *envelope encryption*. Kora : Kora is the cloud-native streaming data service based on Kafka technology that powers the Confluent Cloud event streaming platform for building real-time data pipelines and streaming applications. Kora abstracts low-level resources, such as Kafka brokers, and hides operational complexities, such as system upgrades. Kora is built on the following foundations: a tiered storage layer that improves cost and performance, elasticity and consistent performance through incremental load balancing, cost effective multi-tenancy with dynamic quota management and cell-based isolation, continuous monitoring of both system health and data integrity, and clean abstraction with standard Kafka protocols and CKUs to hide underlying resources. **Related terms**: *Apache Kafka*, *Confluent Cloud*, *Confluent Unit for Kafka (CKU)* **Related content** - [Kora: The Cloud Native Engine for Apache Kafka](https://www.confluent.io/blog/cloud-native-data-streaming-kafka-engine/) - [Kora: A Cloud-Native Event Streaming Platform For Kafka](https://www.vldb.org/pvldb/vol16/p3822-povzner.pdf) KRaft : KRaft (or Apache Kafka Raft) is a consensus protocol introduced in Kafka 2.8 to provide metadata management for Kafka with the goal of replacing ZooKeeper. KRaft simplifies Kafka because it enables the management of metadata in Kafka itself, rather than splitting it between ZooKeeper and Kafka. As of Confluent Platform 7.5, KRaft is the default method of metadata management in new deployments. For more information, see [KRaft overview](/platform/current/kafka-metadata/kraft.html). ksqlDB : ksqlDB is a streaming SQL database engine purpose-built for creating stream processing applications on top of Kafka. logical Kafka cluster (LKC) : A logical Kafka cluster (LKC) is a subset of a physical Kafka cluster (PKC) that is isolated from other logical clusters within Confluent Cloud. Each logical unit of isolation is considered a tenant and maps to a specific organization. If the mapping is one-to-one, one LKC maps to one PKC (a Dedicated cluster). If the mapping is many-to-one, multiple LKCs map to one PKC, which is the case for the multitenant Kafka cluster types (Basic, Standard, Enterprise, and Freight). **Related terms**: *Confluent Cloud*, *Kafka cluster*, *physical Kafka cluster (PKC)* **Related content** - [Kafka Cluster Types in Confluent Cloud](/cloud/current/clusters/cluster-types.html) - [Multi-tenancy and Client Quotas on Confluent Cloud](/cloud/current/clusters/client-quotas.html) multi-region cluster (MRC) : A multi-region cluster (MRC) is a single Kafka cluster that replicates data between datacenters across regional availability zones. multi-tenancy : Multi-tenancy is a software architecture in which a single physical instance is shared among multiple logical instances, or tenants. In Confluent Cloud, each Basic, Standard, Enterprise, and Freight cluster is a logical Kafka cluster (LKC) that shares a physical Kafka cluster (PKC) with other tenants.
Each LKC is isolated from other LKCs and has its own resources, such as memory, compute, and storage. **Related terms**: *Confluent Cloud*, *logical Kafka cluster (LKC)*, *physical Kafka cluster (PKC)* **Related content** - [From On-Prem to Cloud-Native: Multi-Tenancy in Confluent Cloud](https://www.confluent.io/blog/cloud-native-multi-tenant-kafka-with-confluent-cloud/) - [Multi-tenancy and Client Quotas on Confluent Cloud](/cloud/current/clusters/client-quotas.html) offset : An offset is an integer assigned to each message that uniquely represents its position within the data stream, guaranteeing the ordering of records and allowing offset-based connections to replay messages from any point in time. **Related terms**: *consumer offset*, *connector offset*, *offset commit*, *replayability* offset commit : An offset commit is the process of keeping track of the current position of an offset-based connection (primarily Kafka consumers and connectors) within the data stream. The offset commit process is not specific to consumers, producers, or connectors. It is a general mechanism in Kafka to track the position of any application that is reading data. When a consumer commits an offset, the offset identifies the next message the consumer should consume. For example, if a consumer has an offset of 5, it has consumed messages 0 through 4 and will next consume message 5. If the consumer crashes or is shut down, its partitions are reassigned to another consumer, which resumes consuming from the last committed offset of each partition. The committed offset for consumers is stored on a Kafka broker. When a consumer commits an offset, it sends a commit request to the Kafka cluster, specifying the partition and offset it wants to commit for a particular consumer group. The Kafka broker receiving the commit request then stores this offset in the `__consumer_offsets` internal topic. **Related terms**: *consumer offset*, *offset* OpenSSL : OpenSSL is an open-source software library and toolkit that implements the Secure Sockets Layer (SSL) and Transport Layer Security (TLS) protocols. For more information, see [Wikipedia](https://en.wikipedia.org/wiki/OpenSSL). parent cluster : The Kafka cluster that a resource belongs to. **Related terms**: *Kafka cluster*
physical Kafka cluster (PKC) : A physical Kafka cluster (PKC) is a Kafka cluster comprised of multiple brokers. Each physical Kafka cluster is created on a Kubernetes cluster by the control plane. A PKC is not directly accessible by clients. principal : A principal is an entity that can be authenticated and granted permissions based on roles to access resources and perform operations. An entity can be a user account, service account, group mapping, or identity pool. **Related terms**: *group mapping*, *identity*, *identity pool*, *role*, *service account*, *user account* private internet : A private internet is a closed, restricted computer network typically used by organizations to provide secure environments for managing sensitive data and resources. processing time : Processing time is the time when an event is processed or recorded by a system, as opposed to the time when the event occurred on the producing device. Processing time is often used in stream processing to determine the order of events and to perform windowing operations. producer : A producer is a client application that publishes (writes) data to a topic in a Kafka cluster. Producers write data to a topic and are the only clients that can write data to a topic. Each record written to a topic is appended to the partition of the topic that is selected by the producer. Producer API : The Producer API is the Kafka API that allows you to write data to a topic in a Kafka cluster. The Producer API is used by producer clients to publish data to a topic in a Kafka cluster. Protobuf : Protobuf (or Protocol Buffers) is an open-source data format used to serialize structured data for storage. **Related terms**: *data serialization*, *deserializer*, *serializer* **Related content** - [Protocol Buffers](https://protobuf.dev/) - [Getting Started with Protobuf in Confluent Cloud](https://www.confluent.io/blog/using-protobuf-in-confluent-cloud/) - Confluent Cloud: [Protobuf Serializer and Deserializer](/cloud/current/sr/fundamentals/serdes-develop/serdes-protobuf.html) - Confluent Platform: [Protobuf Serializer and Deserializer](/platform/current/schema-registry/fundamentals/serdes-develop/serdes-protobuf.html) public internet : The public internet is the global system of interconnected computers and networks that use TCP/IP to communicate with each other. rebalancing : Rebalancing is the process of redistributing the partitions of a topic among the consumers of a consumer group for improved performance and scalability. A rebalance can occur when a consumer misses its heartbeat and is excluded from the group, voluntarily leaves the group, has its metadata updated, or when a new consumer joins the group. replayability : Replayability is the ability to replay messages from any point in time. **Related terms**: *consumer offset*, *offset*, *offset commit*
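To make the *producer* and *Producer API* entries above concrete, here is a minimal Java sketch (broker address, topic name, key, and payload are placeholder assumptions). The returned record metadata shows the partition the record was appended to and its offset:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class SimpleProducer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // placeholder broker address
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The record key influences which partition the record is appended to.
            ProducerRecord<String, String> record =
                    new ProducerRecord<>("orders", "order-42", "{\"amount\": 10}"); // placeholder topic and payload
            RecordMetadata metadata = producer.send(record).get();
            System.out.printf("Appended to partition %d at offset %d%n",
                    metadata.partition(), metadata.offset());
        }
    }
}
```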
replication : Replication is the process of creating and maintaining multiple copies (or *replicas*) of data across different nodes in a distributed system to increase availability, reliability, redundancy, and accessibility. replication factor : A replication factor is the number of copies of a partition that are distributed across the brokers in a cluster. requests : In Confluent Cloud, requests are a Kafka cluster billing dimension that defines the number of client requests to the cluster in one second. Available in the Metrics API as `request_count`. To reduce usage on requests, you can adjust producer batching configurations, consumer client batching configurations, and shut down otherwise inactive clients. For Dedicated clusters, a high number of requests per second results in increased load on the cluster. role : A role is a Confluent-defined job function that is assigned the set of permissions required to perform specific actions or operations on Confluent resources. A role is granted to a principal on specific Confluent resources through a role binding. A role can be assigned to a user account, group mapping, service account, or identity pool. **Related terms**: *group mapping*, *identity*, *identity pool*, *principal*, *service account* **Related content** - [Predefined RBAC Roles in Confluent Cloud](/cloud/current/access-management/access-control/rbac/predefined-rbac-roles.html) - [Role-Based Access Control Predefined Roles in Confluent Platform](/platform/current/security/rbac/rbac-predefined-roles.html) rolling restart : A rolling restart restarts the brokers in a Kafka cluster one at a time, verifying that there are no under-replicated partitions on a broker before proceeding to the next broker, so the cluster stays available with zero downtime. Restarting the brokers one at a time allows for software upgrades, broker configuration updates, or cluster maintenance while maintaining high availability. **Related content** - [Rolling restart](/platform/current/kafka/post-deployment.html#rolling-restart) schema : A schema is the structured definition or blueprint used to describe the format and structure of event messages sent through the Kafka event streaming platform. Schemas are used to validate the structure of data in event messages and ensure that producers and consumers are sending and receiving data in the same format. Schemas are defined in the Schema Registry. Schema Registry : Schema Registry is a centralized repository for managing and validating the schemas used for topic message data. Schema Registry is built into Confluent Cloud as a managed service, available with the Advanced Stream Governance package, and offered as part of Confluent Enterprise for self-managed deployments. Schema Registry is a RESTful service that stores and manages schemas for Kafka topics, and it is integrated with Kafka and Connect to provide a central location for managing schemas and validating data. Producers and consumers of Kafka topics use schemas to ensure data consistency and compatibility as schemas evolve. Schema Registry is a key component of Stream Governance. **Related content** - Confluent Cloud: [Manage Schemas in Confluent Cloud](/cloud/current/sr/schemas-manage.html) - Confluent Platform: [Schema Registry Overview](/platform/current/schema-registry/index.html) schema subject : A schema subject is the namespace for a schema in Schema Registry.
This unique identifier defines a logical grouping of related schemas. Kafka topics contain event messages serialized and deserialized using the structure and rules defined in a schema subject. This ensures compatibility and supports schema evolution. **Related content** - Confluent Cloud: [Manage Schemas in Confluent Cloud](/cloud/current/sr/schemas-manage.html) - Confluent Platform: [Schema Registry Concepts](/platform/current/schema-registry/index.html) - [Understanding Schema Subjects](https://developer.confluent.io/courses/schema-registry/schema-subjects/) Serdes : Serdes are serializers and deserializers that convert objects and parallel data into a serial byte stream for efficient storage and high-speed data transmission over the wire. Confluent provides Serdes for schemas in Avro, Protobuf, and JSON Schema formats. **Related content** - Confluent Cloud: [Formats, Serializers, and Deserializers](/cloud/current/sr/fundamentals/serdes-develop/index.html) - Confluent Platform: [Formats, Serializers, and Deserializers](/platform/current/schema-registry/fundamentals/serdes-develop/index.html) - [Serde](https://serde.rs/) serializer : A serializer is a tool that converts objects and parallel data into a serial byte stream. Serializers work with deserializers (known together as Serdes) to support efficient storage and high-speed data transmission over the wire. Confluent provides serializers for schemas in Avro, Protobuf, and JSON Schema formats. **Related content** - Confluent Cloud: [Formats, Serializers, and Deserializers](/cloud/current/sr/fundamentals/serdes-develop/index.html) - Confluent Platform: [Formats, Serializers, and Deserializers](/platform/current/schema-registry/fundamentals/serdes-develop/index.html) service account : A service account is a non-person entity used by an application or service to access resources and perform operations. Because a service account is an identity independent of the user who created it, it can be used programmatically to authenticate to resources and perform operations without the need for a user to be signed in. **Related content** - [Service Accounts for Confluent Cloud](/cloud/current/access-management/identity/service-accounts.html) service quota : A service quota is the limit, or maximum value, for a specific Confluent Cloud resource or operation; the limit can vary by the resource scope it applies to. **Related content** - [Service Quotas for Confluent Cloud](/cloud/current/quotas/index.html) single message transform (SMT) : A single message transform (SMT) is a transformation or operation applied in real time to an individual message to change its values, keys, or headers before the message is passed to a sink connector or after it is read from a source connector. SMTs are convenient for inserting fields, masking information, event routing, and other minor data adjustments. single sign-on (SSO) : Single sign-on (SSO) is a centralized authentication service that allows users to use a single set of credentials to log in to multiple applications or services. **Related terms**: *authentication*, *group mapping*, *identity provider* **Related content** - [Single Sign-On for Confluent Cloud](/cloud/current/access-management/authenticate/sso/index.html) sink connector : A sink connector is a Kafka Connect connector that publishes (writes) data from a Kafka topic to an external system.
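The *serializer* and *Serdes* entries above describe converting objects into bytes for the wire; when Schema Registry is used, the Confluent serializers register and look up schemas automatically. A minimal sketch, assuming the Confluent `kafka-avro-serializer` dependency is on the classpath and using placeholder broker, Schema Registry URL, schema, and topic names (with the default strategy, the subject is derived from the topic name, for example `users-value`):

```java
import java.util.Properties;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AvroProducerExample {
    private static final String USER_SCHEMA =
            "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
            + "{\"name\":\"name\",\"type\":\"string\"}]}";

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");            // placeholder
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        // The Confluent Avro serializer registers/looks up the value schema in Schema Registry.
        props.put("value.serializer",
                "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://localhost:8081");   // placeholder

        Schema schema = new Schema.Parser().parse(USER_SCHEMA);
        GenericRecord user = new GenericData.Record(schema);
        user.put("name", "jane");

        try (KafkaProducer<String, Object> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("users", "user-1", user)).get(); // placeholder topic
        }
    }
}
```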
source connector : A source connector is a Kafka Connect connector that subscribes (reads) data from a source (external system), extracts the payload and schema of the data, and publishes (writes) the data to Kafka topics. standalone : Standalone refers to a configuration in which a software application, system, or service operates independently on a single instance or device. This mode is commonly used for development, testing, and debugging purposes. For Kafka Connect, a standalone worker is a single process responsible for running all connectors and tasks on a single instance. Standard Kafka cluster : A Confluent Cloud cluster type. Standard Kafka clusters are designed for production-ready features and functionality. static egress IP address : A static egress IP address is an IP address used by a Confluent Cloud managed connector to establish outbound connections to endpoints of external data sources and sinks over the public internet. **Related content** - [Use Static IP Addresses on Confluent Cloud for Connectors and Cluster Linking](/cloud/current/networking/static-egress-ip-addresses.html) - [Static Egress IP Addresses for Confluent Cloud Connectors](/cloud/current/connectors/static-egress-ip.html) storage (pre-replication) : In Confluent Cloud, storage is a Kafka cluster billing dimension that defines the number of bytes retained on the cluster, pre-replication. Available in the Metrics API as `retained_bytes` (convert from bytes to TB). The returned value is pre-replication. Standard, Enterprise, Dedicated, and Freight clusters support Infinite Storage. This means there is no maximum size limit for the amount of data that can be stored on the cluster. You can configure policy settings `retention.bytes` and `retention.ms` at the topic level to control exactly how much and how long to retain data in a way that makes sense for your applications and helps control your costs. To reduce storage in Confluent Cloud, compress your messages and reduce retention settings. For compression, use lz4. Avoid gzip because of high overhead on the cluster. Stream Catalog : Stream Catalog is a pillar of Confluent Cloud Stream Governance that provides a centralized inventory of your organization’s data assets that supports data governance and data discovery. With Data Portal in Confluent Cloud Console, users can find event streams across systems, search topics by name or tags, and enrich event data to increase value and usefulness. REST and GraphQL APIs can be used to search schemas, apply tags to records or fields, manage business metadata, and discover relationships across data assets. **Related content** - [Stream Catalog on Confluent Cloud: User Guide to Manage Tags and Metadata](/cloud/current/stream-governance/stream-catalog.html) - [Stream Catalog in Streaming Data Governance (Confluent Developer course)](https://developer.confluent.io/courses/governing-data-streams/stream-catalog/) Stream Governance : Stream Governance is a collection of tools and features that provide data governance for data in motion. These include data quality tools such as Schema Registry, schema ID validation, and schema linking; built-in data catalog capabilities to classify, organize, and find event streams across systems; and stream lineage to visualize complex data relationships and uncover insights with interactive, end-to-end maps of event streams. 
Taken together, these and other governance tools enable teams to manage the availability, integrity, and security of data used across organizations, and help with standardization, monitoring, collaboration, reporting, and more. **Related terms**: *Stream Catalog*, *Stream Lineage* **Related content** - [Stream Governance on Confluent Cloud](/cloud/current/stream-governance/index.html) stream lineage : Stream lineage is the life cycle, or history, of data, including its origins, transformations, and consumption, as it moves through various stages in data pipelines, applications, and systems. Stream lineage provides a record of data’s journey from its source to its destination, and is used to track data quality, data governance, and data security. **Related terms**: *Data Portal*, *Stream Governance* **Related content** - [Stream Lineage on Confluent Cloud](/cloud/current/stream-governance/stream-lineage.html) stream processing : Stream processing is the method of collecting event stream data in real-time as it arrives, transforming the data in real-time using operations (such as filters, joins, and aggregations), and publishing the results to one or more target systems. Stream processing can be used to analyze data continuously, build data pipelines, and process time-sensitive data in real-time. Using the Confluent event streaming platform, event streams can be processed in real-time using Kafka Streams, Kafka Connect, or ksqlDB. Streams API : The Streams API is the Kafka API that allows you to build streaming applications and microservices that transform (for example, filter, group, aggregate, join) incoming event streams in real-time to Kafka topics stored in a Kafka cluster. The Streams API is used by stream processing clients to process data in real-time, analyze data continuously, and build data pipelines. **Related content** - [Introduction to the Kafka Streams API](/platform/current/streams/introduction.html) throttling : Throttling is the process Kafka clusters in Confluent Cloud use to protect themselves from reaching an over-utilized state. Also known as backpressure, throttling in Confluent Cloud occurs when cluster load reaches 80%. At this point, applications may start seeing higher latencies or timeouts as the cluster must begin throttling requests or connection attempts. topic : A topic is a user-defined category or feed name where event messages are stored and published by producers and subscribed to by consumers. Each topic is a log of event messages. Topics are stored in one or more partitions, which distribute topic records across brokers in a Kafka cluster. Each partition is an ordered, immutable sequence of records that are continually appended to a topic. **Related content** - [Manage Topics in Confluent Cloud](/cloud/current/client-apps/topics/index.html) total client connections : In Confluent Cloud, total client connections are a Kafka cluster billing dimension that defines the number of TCP connections to the cluster you can open at one time. Available in the Metrics API as `active_connection_count`. Filter by principal to understand how many connections each application is creating. How many connections a cluster supports can vary widely based on several factors, including number of producer clients, number of consumer clients, partition keying strategy, produce patterns per client, and consume patterns per client.
For Dedicated clusters, Confluent derives a guideline for total client connections from benchmarking that indicates exceeding this number of connections increases produce latency for test clients. However, this does not apply to all workloads. That is why total client connections are a guideline, not a hard limit for Dedicated Kafka clusters. Monitor the impact on cluster load as connection count increases, as this is the final representation of the impact of a given workload or CKU dimension on the underlying resources of the cluster. Consider the Confluent guideline a per-CKU guideline. The number of connections tends to increase when you add brokers. In other words, if you significantly exceed the per-CKU guideline, cluster expansion doesn’t always give your cluster more connection count headroom. Transport Layer Security (TLS) : Transport Layer Security (TLS) is a cryptographic protocol that provides secure communication over a network. For more information, see [Wikipedia](https://en.wikipedia.org/wiki/Transport_Layer_Security). unbounded stream : An unbounded stream is a stream of data that is continuously generated in real-time and has no defined end. Examples of unbounded streams include stock prices, sensor data, and social media feeds. Processing unbounded streams requires a different approach than processing bounded streams. Unbounded streams are processed incrementally as data arrives, while bounded streams are processed as a batch after all data has arrived. Kafka Streams and Flink can be used to process unbounded streams. **Related terms**: *bounded stream*, *stream processing* under replication : Under replication is a situation in which the number of in-sync replicas is below the total number of replicas. Under-replicated partitions can occur when a broker is down or cannot replicate fast enough from the leader (replica fetcher lag). user account : A user account is an account representing the identity of a person who can be authenticated and granted access to Confluent Cloud resources. **Related content** - [User Accounts for Confluent Cloud](/cloud/current/access-management/identity/user-accounts/overview.html) watermark : A watermark in Flink is a marker that keeps track of time as data is processed. A watermark means that all records until the current moment in time have been “seen”. This way, Flink can correctly perform tasks that depend on when things happened, like calculating aggregations over time windows. **Related content** - [Time and Watermarks](/cloud/current/flink/concepts/timely-stream-processing.html) Admin API : The Admin API is the Kafka REST API that enables administrators to manage and monitor Kafka clusters, topics, brokers, and other Kafka components.
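The Admin API entry above refers to the REST interface; the same administrative operations are also available programmatically through the Java `AdminClient`, which additionally illustrates the *replication factor* entry earlier in this glossary. A minimal sketch, where the broker address, topic name, and partition and replica counts are placeholder assumptions:

```java
import java.util.Collections;
import java.util.Properties;
import java.util.Set;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class AdminExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder

        try (AdminClient admin = AdminClient.create(props)) {
            // Create a topic with 6 partitions and a replication factor of 3.
            admin.createTopics(Collections.singleton(
                    new NewTopic("orders", 6, (short) 3))).all().get();

            // Inspect the cluster and list existing topics.
            String clusterId = admin.describeCluster().clusterId().get();
            Set<String> topics = admin.listTopics().names().get();
            System.out.printf("cluster=%s topics=%s%n", clusterId, topics);
        }
    }
}
```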
Ansible Playbooks for Confluent Platform : Ansible Playbooks for Confluent Platform is a set of Ansible playbooks and roles that are designed to automate the deployment and management of Confluent Platform. Apache Flink : Apache Flink is an open source stream processing framework for stateful computations over unbounded and bounded data streams. Flink provides a unified API for batch and stream processing that supports event-time and out-of-order processing and exactly-once semantics. Flink applications include real-time analytics, data pipelines, and event-driven applications. **Related terms**: *bounded stream*, *data stream*, *stream processing*, *unbounded stream* **Related content** - [Apache Flink: Stream Processing and SQL on Confluent Cloud](/cloud/current/flink/index.html) - [What is Apache Flink?](https://www.confluent.io/learn/apache-flink/) - [Apache Flink 101 (Confluent Developer course)](https://developer.confluent.io/courses/apache-flink/intro/) Apache Kafka : Apache Kafka is an open source event streaming platform that provides unified, high-throughput, low-latency, fault-tolerant, scalable, distributed, and secure data streaming. Kafka is a publish-and-subscribe messaging system that enables distributed applications to ingest, process, and share data in real-time. **Related content** - [Introduction to Kafka](/kafka/introduction.html) audit log : An audit log is a historical record of actions and operations that are triggered when auditable events occur. Audit log records can be used to troubleshoot system issues, manage security, and monitor compliance, by tracking administrative activity, data access and modification, monitoring sign-in attempts, and reconstructing security breaches and fraudulent activity. **Related terms**: *auditable event* **Related content** - [Audit Log Concepts for Confluent Cloud](/cloud/current/monitoring/audit-logging/cloud-audit-log-concepts.html) - [Audit Log Concepts for Confluent Platform](/platform/current/security/audit-logs/audit-logs-concepts.html) auditable event : An auditable event is an event that represents an action or operation that can be tracked and monitored for security purposes and compliance. When an auditable event occurs, an auditable event method is triggered and an event message is sent to the audit log cluster and stored as an audit log record. **Related terms**: *audit log*, *event message* **Related content** - [Auditable Events in Confluent Cloud](/cloud/current/monitoring/audit-logging/event-methods/index.html) - [Auditable Events in Confluent Platform](/platform/current/security/audit-logs/auditable-events.html) authentication : Authentication is the process of verifying the identity of a principal that interacts with a system or application. Authentication is often used in conjunction with authorization to determine whether a principal is allowed to access a resource and perform a specific action or operation on that resource. Digital authentication requires one or more of the following: something a principal knows (a password or security question), something a principal has (a security token or key), or something a principal is (a biometric characteristic, such as a fingerprint or voiceprint). Multi-factor authentication (MFA) requires two or more forms of authentication.
**Related terms**: *authorization*, *identity*, *identity provider*, *identity pool*, *principal*, *role* authorization : Authorization is the process of evaluating and then granting or denying a principal a set of permissions required to access and perform operations on resources. **Related terms**: *authentication*, *group mapping*, *identity*, *identity provider*, *identity pool*, *principal*, *role* Avro : Avro is a data serialization and exchange framework that provides data structures, remote procedure call (RPC), compact binary data format, a container file, and uses JSON to represent schemas. Avro schemas ensure that every field is properly described and documented for use with serializers and deserializers. You can either send a schema with every message or use Schema Registry to store and receive schemas for use by consumers and producers to save bandwidth and storage space. **Related terms**: *data serialization*, *deserializer*, *serializer* **Related content** - [Apache Avro - a data serialization system](https://avro.apache.org/) - Confluent Cloud: [Avro Schema Serializer and Deserializer](/cloud/current/sr/fundamentals/serdes-develop/serdes-avro.html) - Confluent Platform: [Avro Schema Serializer and Deserializer](/platform/current/schema-registry/fundamentals/serdes-develop/serdes-avro.html) Basic Kafka cluster : A Confluent Cloud cluster type. Basic Kafka clusters are designed for experimentation, early development, and basic use cases. batch processing : Batch processing is the method of collecting a large volume of data over a specific time interval, after which the data is processed all at once and loaded into a destination system. Batch processing is often used when processing data can occur independently of the source and timing of the data. It is efficient for non-real-time data processing, such as data warehousing, reporting, and analytics. **Related terms**: *bounded stream*, *stream processing*, *unbounded stream* CIDR block : A CIDR block is a group of IP addresses that are contiguous and can be represented as a single block. CIDR blocks are expressed using Classless Inter-domain Routing (CIDR) notation that includes an IP address and a number of bits in the network mask. **Related content** - [CIDR notation](https://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing#CIDR_notation) - [Classless Inter-domain Routing (CIDR): The Internet Address Assignment and Aggregation Plan [RFC 4632]](https://www.rfc-editor.org/rfc/rfc4632.html) Cluster Linking : Cluster Linking is a highly performant data replication feature that enables links between Kafka clusters to mirror data from one cluster to another. Cluster Linking creates perfect copies of Kafka topics, which keep data in sync across clusters. Use cases include geo-replication of data, data sharing, migration, disaster recovery, and tiered separation of critical applications. **Related content** - [Geo-replication with Cluster Linking on Confluent Cloud](/cloud/current/multi-cloud/cluster-linking/index.html) - [Cluster Linking for Confluent Platform](/platform/current/multi-dc-deployments/cluster-linking/index.html) commit log : A commit log is a log of all event messages about commits (changes or operations made) sent to a Kafka topic. A commit log ensures that all event messages are processed at least once and provides a mechanism for recovery in the event of a failure. The commit log is also referred to as a write-ahead log (WAL) or a transaction log. 
**Related terms**: *event message* Confluent Cloud : Confluent Cloud is the fully managed, cloud-native event streaming service powered by Kora, the event streaming platform based on Kafka and extended by Confluent to provide high availability, scalability, elasticity, security, and global interconnectivity. Confluent Cloud offers cost-effective multi-tenant configurations as well as dedicated solutions, if stronger isolation is required. **Related terms**: *Apache Kafka*, *Kora* **Related content** - [Confluent Cloud Overview](/cloud/current/index.html) - [Confluent Cloud](https://www.confluent.io/confluent-cloud/) Confluent Cloud network : A Confluent Cloud network is an abstraction for a single-tenant network environment that hosts Dedicated Kafka clusters in Confluent Cloud along with their single-tenant services, like ksqlDB clusters and managed connectors. **Related content** - [Confluent Cloud Network Overview](/cloud/current/networking/overview.html#ccloud-networks) Confluent for Kubernetes (CFK) : *Confluent for Kubernetes (CFK)* is a cloud-native control plane for deploying and managing Confluent in private cloud environments through a declarative API. Confluent Platform : Confluent Platform is a specialized distribution with Kafka at its core, with additional components for data integration, streaming data pipelines, and stream processing. Confluent REST Proxy : Confluent REST Proxy provides a RESTful interface to a Kafka cluster, making it easy to produce and consume messages, view the state of the cluster, and perform administrative actions without using the native Kafka protocol or clients. **Related content** - Confluent Platform: [REST Proxy](/platform/current/kafka-rest/index.html) Confluent Server : Confluent Server is the default Kafka broker component of Confluent Platform that builds on the foundation of Apache Kafka® and provides enhanced proprietary features designed for enterprise use. Confluent Server is fully compatible with Kafka, and adds Kafka cluster support for Role-Based Access Control, Audit Logs, Schema Validation, Self Balancing Clusters, Tiered Storage, Multi-Region Clusters, and Cluster Linking. **Related terms**: *Confluent Platform*, *Apache Kafka*, *Kafka broker*, *Cluster Linking*, *multi-region cluster (MRC)* Confluent Unit for Kafka (CKU) : Confluent Unit for Kafka (CKU) is a unit of horizontal scaling for Dedicated Kafka clusters in Confluent Cloud that provides pre-allocated resources. CKUs determine the capacity of a Dedicated Kafka cluster in Confluent Cloud. **Related content** - [CKU limits per cluster](/cloud/current/clusters/cluster-types.html#cku-limits-per-cluster) Connect API : The Connect API is the Kafka API that enables a connector to read event streams from a source system and write to a target system. Connect worker : A Connect worker is a server process that runs a connector and performs the actual work of moving data in and out of Kafka topics. A worker runs on hardware independent of the Kafka brokers themselves. It is scalable and fault-tolerant, meaning you can run a cluster of workers that share the load of moving data in and out of Kafka from and to external systems. **Related terms**: *connector*, *Kafka Connect* connection attempts : In Confluent Cloud, connection attempts are a Kafka cluster billing dimension that defines the maximum number of new TCP connections to the cluster you can create in one second. This includes successful and unsuccessful authentication attempts.
Available in the Metrics API as `successful_authentication_count` (only includes successful authentications, not unsuccessful authentication attempts). To reduce usage on connection attempts, use longer-lived connections to the cluster. If you exceed the maximum, connection attempts may be refused. connector : A connector is an abstract mechanism that enables communication, coordination, or cooperation among components by transferring data elements from one interface to another without changing the data. connector offset : A connector offset uniquely identifies the position of a connector as it processes data. Connectors use a variety of strategies to implement the connector offset, including everything from monotonically increasing integers to replay ids, lists of files, timestamps, and even checkpoint information. Connector offsets keep track of already-processed data in the event of a connector restart or recovery. While sink connectors use a pattern for connector offsets similar to the offset mechanism used throughout Kafka, the implementation details for source connectors are often much different. This is because source connectors track the progress of a source system as it processes data. consumer : A consumer is a Kafka client application that subscribes to (reads and processes) event messages from a Kafka topic. The Streams API and the Consumer API are the two APIs that enable consumers to read event streams from Kafka topics. **Related terms**: *Consumer API*, *consumer group*, *producer*, *Streams API* Consumer API : The Consumer API is the Kafka API used for consuming (reading) event messages or records from Kafka topics and enables a Kafka consumer to subscribe to a topic and read event messages as they arrive. Batch processing is a common use case for the Consumer API. consumer group : A consumer group is a single logical consumer implemented with multiple physical consumers for reasons of throughput and resilience. By dividing a topic's partitions among the consumers in the group, the consumers can process messages in parallel, increasing message throughput and enabling load balancing. **Related terms**: *consumer*, *partition*, *producer*, *topic* consumer lag : Consumer lag is the number of consumer offsets between the latest message produced in a partition and the last message consumed by a consumer, that is, the number of messages pending to be consumed from a particular partition. A large consumer lag, or a quickly growing lag, indicates that the consumer is unable to read from a partition as fast as the messages are available. This can be caused by a slow consumer, slow network, or slow broker. consumer offset : Consumer offset is the monotonically increasing integer value that uniquely identifies the position of an event record in a partition. Consumers use offsets to track their current position in the Kafka topic, allowing consumers to resume processing from where they left off. Offsets are stored on the Kafka broker, which does not track which records have been read and which have not. It is up to the consumer to track this information. When a consumer acknowledges receiving and processing a message, it commits an offset value that is stored in the special internal topic `__consumer_offsets`. cross-resource RBAC role binding : A cross-resource RBAC role binding is a role binding in Confluent Cloud that is applied at the Organization or Environment scope and grants access to multiple resources. For example, assigning a principal the NetworkAdmin role at the Organization scope lets them administer all networks across all Environments in their Organization. **Related terms**: *identity pool*, *principal*, *role*, *role binding*
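The *consumer lag* entry above defines lag as the gap between the latest produced offset and the last committed offset in a partition. A hedged sketch of computing that gap with the Java `AdminClient`, where the broker address and consumer group ID are placeholder assumptions:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.ListOffsetsResult;
import org.apache.kafka.clients.admin.OffsetSpec;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class ConsumerLagExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder

        try (AdminClient admin = AdminClient.create(props)) {
            // Committed offsets for the group, keyed by partition.
            Map<TopicPartition, OffsetAndMetadata> committed =
                    admin.listConsumerGroupOffsets("orders-app")   // placeholder group ID
                         .partitionsToOffsetAndMetadata().get();

            // Log-end offsets (latest produced position) for the same partitions.
            Map<TopicPartition, OffsetSpec> request = new HashMap<>();
            committed.keySet().forEach(tp -> request.put(tp, OffsetSpec.latest()));
            Map<TopicPartition, ListOffsetsResult.ListOffsetsResultInfo> latest =
                    admin.listOffsets(request).all().get();

            // Lag = latest produced offset minus last committed offset, per partition.
            committed.forEach((tp, offsetAndMetadata) -> {
                long lag = latest.get(tp).offset() - offsetAndMetadata.offset();
                System.out.printf("%s lag=%d%n", tp, lag);
            });
        }
    }
}
```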
CRUD : CRUD is an acronym for the four basic operations that can be performed on data: Create, Read, Update, and Delete. custom connector : A custom connector is a connector created using Connect plugins uploaded to Confluent Cloud by users. This includes connector plugins that are built from scratch, modified open-source connector plugins, or third-party connector plugins. data at rest : Data at rest is data that is physically stored on non-volatile media (such as hard drives, solid-state drives, or other storage devices) and is not actively being transmitted or processed by a system. data contract : A data contract is a formal agreement between an upstream component and a downstream component on the structure and semantics of data that is in motion. A schema is a key element of a data contract. The schema, metadata, rules, policies, and evolution plan form the data contract. You can associate data contracts (schemas and more) with [topics](#term-Kafka-topic). **Related content** - Confluent Platform: [Data Contracts for Schema Registry on Confluent Platform](/platform/current/schema-registry/fundamentals/data-contracts.html) - Confluent Cloud: [Data Contracts for Schema Registry on Confluent Cloud](/cloud/current/sr/fundamentals/data-contracts.html) - Cloud Console: [Manage Schemas in Confluent Cloud](/cloud/current/sr/schemas-manage.html) data encryption key (DEK) : A data encryption key (DEK) is a symmetric key that is used to encrypt and decrypt data. The DEK is used in client-side field level encryption (CSFLE) to encrypt sensitive data. The DEK is itself encrypted using a key encryption key (KEK) that is only accessible to authorized users. The encrypted DEK and encrypted data are stored together. Only users with access to the KEK can decrypt the DEK and access the sensitive data. **Related terms**: *envelope encryption*, *key encryption key (KEK)* data in motion : Data in motion is data that is actively being transferred between source and destination, typically systems, devices, or networks. Data in motion is also referred to as data in transit or data in flight. data in use : Data in use is data that is actively being processed or manipulated in memory (RAM, CPU caches, or CPU registers). data ingestion : Data ingestion is the process of collecting, importing, and integrating data from various sources into a system for further processing, analysis, or storage. data mapping : Data mapping is the process of defining relationships or associations between source data elements and target data elements. Data mapping is an important process in data integration, data migration, and data transformation, ensuring that data is accurately and consistently represented when it is moved or combined. data pipeline : A data pipeline is a series of processes and systems that enable the flow of data from sources to destinations, automating the movement and transformation of data for various purposes, such as analytics, reporting, or machine learning. A data pipeline is typically composed of a source system, a data ingestion tool, a data transformation tool, and a target system. A data pipeline covers the following stages: data extraction, data transformation, data loading, and data validation.
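To illustrate how the DEK and KEK described above fit together, here is a minimal, self-contained sketch of the envelope-encryption pattern using the JDK's `javax.crypto` classes. It is an illustration of the concept only, not Confluent's CSFLE implementation: it uses a symmetric AES KEK for brevity, and in practice the KEK lives in an external KMS rather than in application code.

```java
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class EnvelopeEncryptionSketch {
    public static void main(String[] args) throws Exception {
        KeyGenerator keyGen = KeyGenerator.getInstance("AES");
        keyGen.init(256);
        SecretKey kek = keyGen.generateKey(); // master key; in practice held in an external KMS
        SecretKey dek = keyGen.generateKey(); // data encryption key

        SecureRandom random = new SecureRandom();
        byte[] dataIv = new byte[12];
        byte[] kekIv = new byte[12];
        random.nextBytes(dataIv);
        random.nextBytes(kekIv);

        // 1. Encrypt the sensitive payload with the DEK.
        Cipher dataCipher = Cipher.getInstance("AES/GCM/NoPadding");
        dataCipher.init(Cipher.ENCRYPT_MODE, dek, new GCMParameterSpec(128, dataIv));
        byte[] ciphertext = dataCipher.doFinal("card=4111".getBytes(StandardCharsets.UTF_8));

        // 2. Encrypt (wrap) the DEK with the KEK; the encrypted DEK is stored next to the data.
        Cipher kekCipher = Cipher.getInstance("AES/GCM/NoPadding");
        kekCipher.init(Cipher.ENCRYPT_MODE, kek, new GCMParameterSpec(128, kekIv));
        byte[] encryptedDek = kekCipher.doFinal(dek.getEncoded());

        // 3. Only a caller with access to the KEK can recover the DEK and decrypt the data.
        Cipher kekDecrypt = Cipher.getInstance("AES/GCM/NoPadding");
        kekDecrypt.init(Cipher.DECRYPT_MODE, kek, new GCMParameterSpec(128, kekIv));
        SecretKey recoveredDek = new SecretKeySpec(kekDecrypt.doFinal(encryptedDek), "AES");

        Cipher dataDecrypt = Cipher.getInstance("AES/GCM/NoPadding");
        dataDecrypt.init(Cipher.DECRYPT_MODE, recoveredDek, new GCMParameterSpec(128, dataIv));
        System.out.println(new String(dataDecrypt.doFinal(ciphertext), StandardCharsets.UTF_8));
    }
}
```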
Data Portal : Data Portal is a Confluent Cloud application that uses Stream Catalog and Stream Lineage to provide self-service access throughout Confluent Cloud Console for data practitioners to search and discover existing topics using tags and business metadata, request access to topics and data, and access data in topics to build streaming applications and data pipelines. Data Portal provides a data-centric view of Confluent that is optimized for self-service access to data, where users can search, discover, and understand available data, request access to it, and use it. **Related terms**: *Stream Catalog*, *Stream Lineage* **Related content** - [Data Portal on Confluent Cloud](/cloud/current/stream-governance/data-portal.html) data serialization : Data serialization is the process of converting data structures or objects into a format that can be stored or transmitted, and reconstructed later in the same or another computer environment. Data serialization is a common technique for implementing data persistence, interprocess communication, and object communication. Confluent Schema Registry (in Confluent Platform) and Confluent Cloud Schema Registry support data serialization using serializers and deserializers for the following formats: Avro, JSON Schema, and Protobuf. **Related content** - Confluent Cloud: [Formats, Serializers, and Deserializers](/cloud/current/sr/fundamentals/serdes-develop/index.html) - Confluent Platform: [Formats, Serializers, and Deserializers](/platform/current/schema-registry/fundamentals/serdes-develop/index.html) data steward : A data steward is a person with data-related responsibilities, such as data governance, data quality, and data security. data stream : A data stream is a continuous flow of data records that are produced and consumed by applications. dead letter queue (DLQ) : A dead letter queue (DLQ) is a queue where messages that could not be processed successfully by a sink connector are placed. Instead of stopping, the sink connector writes messages that could not be processed successfully as event records to the DLQ topic and continues processing the remaining messages. Dedicated Kafka cluster : A Confluent Cloud cluster type. Dedicated Kafka clusters are designed for critical production workloads with high traffic or private networking requirements. deserializer : A deserializer is a tool that converts a serial byte stream back into objects and parallel data. Deserializers work with serializers (known together as Serdes) to support efficient storage and high-speed data transmission over the wire. Confluent provides Serdes for schemas in Avro, Protobuf, and JSON Schema formats. **Related content** - Confluent Cloud: [Formats, Serializers, and Deserializers](/cloud/current/sr/fundamentals/serdes-develop/index.html) - Confluent Platform: [Formats, Serializers, and Deserializers](/platform/current/schema-registry/fundamentals/serdes-develop/index.html) egress : In general networking, egress refers to outbound traffic leaving a network or a specific network segment. In Confluent Cloud, egress is a Kafka cluster billing dimension that defines the number of bytes consumed from the cluster in one second. Available in the Metrics API as `sent_bytes` (convert from bytes to MB). To reduce egress in Confluent Cloud, compress your messages and ensure each consumer is only consuming from the topics it requires. For compression, use lz4. Avoid gzip because of high overhead on the cluster.
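The *ingress*, *egress*, and *requests* entries recommend lz4 compression and producer batching to reduce bytes and request rate. A hedged Java sketch of those producer settings (broker address, topic, and the specific linger and batch values are placeholder assumptions to be tuned per workload):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class TunedProducerConfig {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");

        // Compress batches with lz4 to reduce ingress and egress bytes.
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");
        // Batch more records per request to reduce the request rate:
        // wait up to 20 ms and up to 64 KB per partition before sending (example values).
        props.put(ProducerConfig.LINGER_MS_CONFIG, "20");
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, Integer.toString(64 * 1024));

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("orders", "key", "value")); // placeholder topic
        }
    }
}
```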
Elastic Confluent Unit for Kafka (eCKU) : Elastic Confluent Unit for Kafka (eCKU) is used to express capacity for Basic, Standard, Enterprise, and Freight Kafka clusters. These clusters automatically scale up to a fixed ceiling. There is no need to resize these types of clusters. When you need more capacity, your cluster expands up to the fixed ceiling. If you’re not using capacity above the minimum, you’re not paying for it. ELT : ELT is an acronym for Extract-Load-Transform, where data is extracted from a source system and loaded into a target system before processing or transformation. Compared to ETL, ELT is a more flexible approach to data ingestion because the data is loaded into the target system before transformation. Enterprise Kafka cluster : A Confluent Cloud cluster type. Enterprise Kafka clusters are designed for production-ready functionality that requires private endpoint networking capabilities. envelope encryption : Envelope encryption is a cryptographic technique that uses two keys to encrypt data. The symmetric data encryption key (DEK) is used to encrypt sensitive data. The separate asymmetric key encryption key (KEK) is the master key used to encrypt the DEK. The encrypted DEK and the encrypted data are stored together. Only users with access to the KEK can decrypt the DEK and access the sensitive data. In Confluent Cloud, envelope encryption is used to enable client-side field level encryption (CSFLE). CSFLE encrypts sensitive data in a message before it is sent to Confluent Cloud and allows for temporary decryption of sensitive data when required to perform operations on the data. **Related terms**: *data encryption key (DEK)*, *key encryption key (KEK)* ETL : ETL is an acronym for Extract-Transform-Load, where data is extracted from a source system, transformed into a target format, and loaded into a target system. Compared to ELT, ETL is a more rigid approach to data ingestion because the data is transformed before loading into the target system. event : An event is a meaningful action or occurrence of something that happened. Events that can be recognized by a program, either human-generated or triggered by software, can be recorded in a log file or other data store. **Related terms**: *event message*, *event record*, *event sink*, *event source*, *event stream*, *event streaming*, *event streaming platform*, *event time* event message : An event message is a record of an event sent to a Kafka topic. Each event message consists of a key-value pair, a timestamp, the compression type, optional headers for metadata, and a partition and offset ID (once the message is written). The key is optional and can be used to identify the event. The value is required and contains details about the event that happened. **Related terms**: *event*, *event record*, *event sink*, *event source*, *event stream*, *event streaming*, *event streaming platform*, *event time* event record : An event record is the record of an event stored in a Kafka topic. Event records are organized and durably stored in topics. Examples of events include orders, payments, activities, or measurements. An event typically contains one or more data fields that describe the fact, as well as a timestamp that denotes when the event was created by its event source. The event may also contain various metadata, such as its source of origin (for example, the application or cloud service that created the event) and storage-level information (for example, its position in the event stream).
**Related terms**: *event*, *event message*, *event sink*, *event source*, *event stream*, *event streaming*, *event streaming platform*, *event time* event sink : An event sink is a consumer of events, which can include applications, cloud services, databases, IoT sensors, and more. **Related terms**: *event*, *event message*, *event record*, *event source*, *event stream*, *event streaming*, *event streaming platform*, *event time* event source : An event source is a producer of events, which can include cloud services, databases, IoT sensors, mainframes, and more. **Related terms**: *event*, *event message*, *event record*, *event sink*, *event stream*, *event streaming*, *event streaming platform*, *event time* event stream : An event stream is a continuous flow of event messages produced by an event source and consumed by one or more consumers. **Related terms**: *event*, *event message*, *event record*, *event sink*, *event source*, *event streaming*, *event streaming platform*, *event time* event streaming : Event streaming is the practice of capturing event data in real-time from data sources. Event streaming is a form of data streaming that is used to capture, store, process, and react to data in real-time or retrospectively. **Related terms**: *event*, *event message*, *event record*, *event sink*, *event source*, *event streaming platform*, *event time* event streaming platform : An event streaming platform is a platform to which events are written once, allowing distributed functions within an organization to react in real time. **Related terms**: *event*, *event message*, *event record*, *event sink*, *event source*, *event streaming*, *event time* event time : Event time is the time when an event occurred on the producing device, as opposed to the time when the event was processed or recorded. Event time is often used in stream processing to determine the order of events and to perform windowing operations. **Related terms**: *event*, *event message*, *event record*, *event sink*, *event source*, *event streaming*, *event streaming platform* exactly-once semantics : Exactly-once semantics is a guarantee that a message is processed exactly once and in the order that it was sent, even if a producer retries sending the message or a consumer retries processing it. In Kafka, this guarantee is achieved with idempotent producers, which attach a producer ID and sequence number to each message so the broker can discard duplicate retries, and with transactions, which atomically commit produced messages together with consumer offsets so that reprocessing after a failure does not create duplicate results. Freight Kafka cluster : A Confluent Cloud cluster type. Freight Kafka clusters are designed for high-throughput, relaxed-latency workloads at a lower cost than self-managed open source Kafka. granularity : Granularity is the degree or level of detail to which an entity (a system, service, or resource) is broken down into subcomponents, parts, or elements. Entities that are *fine-grained* have a higher level of detail, while *coarse-grained* entities have a reduced level of detail, often combining finer parts into a larger whole. In the context of access control, granular permissions provide precise control over resource access. They allow administrators to grant specific operations on distinct resources. This ensures users only have permissions tailored to their needs, minimizing unnecessary or potentially risky access.
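The exactly-once semantics entry above mentions idempotent producers and transactions. A minimal sketch of a transactional producer in Java, where the broker address, transactional ID, and topic names are placeholder assumptions (a complete application would also handle fencing errors rather than rethrowing everything):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class TransactionalProducerExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        // Idempotence lets the broker discard duplicate retries; the transactional ID enables transactions.
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
        props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "orders-tx-1"); // placeholder

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            try {
                producer.beginTransaction();
                producer.send(new ProducerRecord<>("orders", "order-42", "created"));   // placeholder topic
                producer.send(new ProducerRecord<>("payments", "order-42", "charged")); // placeholder topic
                // Either both records become visible to read_committed consumers, or neither does.
                producer.commitTransaction();
            } catch (Exception e) {
                producer.abortTransaction();
                throw e;
            }
        }
    }
}
```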
group mapping : Group mapping is a set of rules that map groups in your SSO identity provider to Confluent Cloud RBAC roles. When a user signs in to Confluent Cloud using SSO, Confluent Cloud uses the group mapping to grant access to Confluent Cloud resources. **Related terms**: *identity provider*, *identity pool*, *principal*, *role* **Related content** - [Group Mapping for Confluent Cloud](/cloud/current/access-management/authenticate/sso/group-mapping/overview.html) identity : An identity is a unique identifier that is used to authenticate and authorize users and applications to access resources. Identity is often used in conjunction with access control to determine whether a user or application is allowed to access a resource and perform a specific action or operation on that resource. **Related terms**: *identity provider*, *identity pool*, *principal*, *role* identity pool : An identity pool is a collection of identities that can be used to authenticate and authorize users and applications to access resources. Identity pools are used to manage permissions for users and applications that access resources in Confluent Cloud. They are also used to manage permissions for Confluent Cloud service accounts that are used to access resources in Confluent Cloud. identity provider : An identity provider is a trusted provider that authenticates users and issues security tokens that are used to verify the identity of a user. Identity providers are often used in single sign-on (SSO) scenarios, where a user can log in to multiple applications or services with a single set of credentials. Infinite Storage : Infinite Storage is the Confluent Cloud storage service that enhances the scalability of Confluent Cloud resources by separating storage and processing. Tiered storage within Confluent Cloud moves data between storage layers based on the needs of the workload, retrieves tiered data when requested, and garbage collects data that is past retention or otherwise deleted. If an application reads historical data, latency is not increased for other applications reading more recent data. Storage resources are decoupled from compute resources, so you pay only for what you produce to Confluent Cloud and for the storage that you use, and CKUs do not have storage limits. **Related content** - [Infinite Storage in Confluent Cloud for Apache Kafka](https://www.confluent.io/blog/infinite-kafka-data-storage-in-confluent-cloud/) ingress : In general networking, ingress refers to traffic that enters a network from an external source. In Confluent Cloud, ingress is a Kafka cluster billing dimension that defines the number of bytes produced to the cluster in one second. Available in the Metrics API as `received_bytes` (convert from bytes to MB). To reduce ingress in Confluent Cloud, compress your messages. For compression, use lz4. Avoid gzip because of high overhead on the cluster. internal topic : An internal topic is a topic, prefixed with double underscores (`__`), that is automatically created by a Kafka component to store metadata about the broker, partition assignment, consumer offsets, and other information. Examples of internal topics: `__cluster_metadata`, `__consumer_offsets`, `__transaction_state`, `__confluent.support.metrics`, and `__confluent.support.metrics-raw`. JSON Schema : JSON Schema is a declarative language used for data serialization and exchange to define data structures, specify formats, and validate JSON documents. It is a way to encode expected data types, properties, and constraints to ensure that all fields are properly described for use with serializers and deserializers. **Related terms**: *data serialization*, *deserializer*, *serializer* **Related content** - [JSON Schema - a declarative language that allows you to annotate and validate JSON documents.](https://json-schema.org/) - Confluent Cloud: [JSON Schema Serializer and Deserializer](/cloud/current/sr/fundamentals/serdes-develop/serdes-json.html) - Confluent Platform: [JSON Schema Serializer and Deserializer](/platform/current/schema-registry/fundamentals/serdes-develop/serdes-json.html)
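The storage-related entries in this glossary mention tuning `retention.ms` and `retention.bytes` at the topic level to control how much data is kept. A hedged sketch of changing those settings with the Java `AdminClient` (broker address, topic name, and the specific retention values are placeholder assumptions):

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class TopicRetentionExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder

        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "orders"); // placeholder topic
            Map<ConfigResource, Collection<AlterConfigOp>> updates = new HashMap<>();
            updates.put(topic, Arrays.asList(
                    // Keep data for 7 days ...
                    new AlterConfigOp(new ConfigEntry("retention.ms", "604800000"),
                            AlterConfigOp.OpType.SET),
                    // ... or until a partition reaches roughly 1 GiB, whichever limit is hit first.
                    new AlterConfigOp(new ConfigEntry("retention.bytes", "1073741824"),
                            AlterConfigOp.OpType.SET)));
            admin.incrementalAlterConfigs(updates).all().get();
        }
    }
}
```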
Kafka bootstrap server : A Kafka bootstrap server is a Kafka broker that a Kafka client first connects to when initiating a connection to a Kafka cluster, and which returns metadata that includes the addresses of all the brokers in the Kafka cluster. Although only one bootstrap server is required to connect to a Kafka cluster, multiple brokers can be specified in a bootstrap server list to provide high availability and fault tolerance in case a broker is unavailable. In Confluent Cloud, the bootstrap server is the general cluster endpoint. Kafka broker : A Kafka broker is a server in the Kafka storage layer that stores event streams from one or more sources. A Kafka cluster is typically comprised of several brokers. Every broker in a cluster is also a bootstrap server, meaning if you can connect to one broker in a cluster, you can connect to every broker. Kafka client : A Kafka client allows you to write distributed applications and microservices that read, write, and process streams of events in parallel, at scale, and in a fault-tolerant manner, even in the case of network problems or machine failures. The Kafka client library provides functions, classes, and utilities that allow developers to create Kafka producer clients (Producers) and consumer clients (Consumers) using various programming languages. The primary way to build production-ready Producers and Consumers is by using your preferred programming language and a Kafka client library. **Related content** - [Build Client Applications for Confluent Cloud](/cloud/current/client-apps/overview.html) - [Build Client Applications for Confluent Platform](/platform/current/clients/index.html) - [Getting Started with Apache Kafka and Java (or Python, Go, .Net, and others)](https://developer.confluent.io/get-started/java/) Kafka cluster : A Kafka cluster is a group of interconnected Kafka brokers that manage and distribute real-time data streaming, processing, and storage as if they are a single system. By distributing tasks and services across multiple Kafka brokers, the Kafka cluster improves availability, reliability, and performance. Kafka Connect : Kafka Connect is the component of Kafka that provides data integration between databases, key-value stores, search indexes, file systems, and Kafka brokers. Kafka Connect is an ecosystem of a client application and pluggable connectors. As a client application, Connect is a server process that runs on hardware independent of the Kafka brokers themselves. It is scalable and fault-tolerant, meaning you can run a cluster of Connect workers that share the load of moving data in and out of Kafka from and to external systems. Connect also abstracts the business of code away from the user and instead requires only JSON configuration to run.
**Related content** - Confluent Cloud: [Kafka Connect](/cloud/current/billing/overview.html#kconnect-long) - Confluent Platform: [Kafka Connect](/platform/current/connect/index.html) Kafka controller : A Kafka controller is the node in a Kafka cluster that is responsible for managing and changing the metadata of the cluster. This node also communicates metadata changes to the rest of the cluster. When Kafka uses ZooKeeper for metadata management, the controller is a broker, and the broker persists the metadata to ZooKeeper for backup and recovery. With KRaft, you dedicate Kafka nodes to operate as controllers and the metadata is stored in Kafka itself and not persisted to ZooKeeper. KRaft enables faster recovery because of this. For more information, see [KRaft overview](/platform/current/kafka-metadata/kraft.html). Kafka listener : A Kafka listener is an endpoint that Kafka brokers bind to and use to communicate with clients. For Kafka clusters, Kafka listeners are configured in the `listeners` property of the `server.properties` file. Advertised listeners are publicly accessible endpoints that are used by clients to connect to the Kafka cluster. **Related content** - [Kafka Listeners – Explained](https://www.confluent.io/blog/kafka-listeners-explained/) Kafka metadata : Kafka metadata is the information about the Kafka cluster and the topics that are stored in it. This information includes details such as the brokers in the cluster, the topics that are available, the partitions for each topic, and the location of the leader for each partition. Kafka metadata is used by clients to discover the available brokers and topics, and to determine which broker is the leader for a particular partition. This information is essential for clients to be able to send and receive messages to and from Kafka. Kafka Streams : Kafka Streams is a stream processing library for building streaming applications and microservices that transform (filter, group, aggregate, join, and more) incoming event streams in real-time to Kafka topics stored in a Kafka cluster.
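To make the Kafka Streams entry above concrete, here is a minimal topology sketch in Java. The application ID, broker address, and topic names are placeholder assumptions; the topology reads an input topic, filters and transforms each value, and writes the result to an output topic:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class UppercaseStreamApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-app");      // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // placeholder
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> input = builder.stream("input-topic");         // placeholder topic
        input.filter((key, value) -> value != null)
             .mapValues(value -> value.toUpperCase())
             .to("output-topic");                                              // placeholder topic

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        streams.start();
    }
}
```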
**Related terms**: *Apache Kafka*, *Confluent Cloud*, *Confluent Unit for Kafka (CKU)* **Related content** - [Kora: The Cloud Native Engine for Apache Kafka](https://www.confluent.io/blog/cloud-native-data-streaming-kafka-engine/) - [Kora: A Cloud-Native Event Streaming Platform For Kafka](https://www.vldb.org/pvldb/vol16/p3822-povzner.pdf) KRaft : KRaft (or Apache Kafka Raft) is a consensus protocol introduced in Kafka 2.8 to provide metadata management for Kafka with the goal to replace ZooKeeper. KRaft simplifies Kafka because it enables the management of metadata in Kafka itself, rather than splitting it between ZooKeeper and Kafka. As of Confluent Platform 7.5, KRaft is the default method of metadata management in new deployments. For more information, see [KRaft overview](/platform/current/kafka-metadata/kraft.html). ksqlDB : ksqlDB is a streaming SQL database engine purpose-built for creating stream processing applications on top of Kafka. logical Kafka cluster (LKC) : A logical Kafka cluster (LKC) is a subset of a physical Kafka cluster (PKC) that is isolated from other logical clusters within Confluent Cloud. Each logical unit of isolation is considered a tenant and maps to a specific organization. If the mapping is one-to-one, one LKC maps to one PKC (a Dedicated cluster). If the mapping is many-to-one, one LKC maps to one of the multitenant Kafka cluster types (Basic, Standard, Enterprise, and Freight). **Related terms**: *Confluent Cloud*, *Kafka cluster*, *physical Kafka cluster (PKC)* **Related content** - [Kafka Cluster Types in Confluent Cloud](/cloud/current/clusters/cluster-types.html) - [Multi-tenancy and Client Quotas on Confluent Cloud](/cloud/current/clusters/client-quotas.html) multi-region cluster (MRC) : A multi-region cluster (MRC) is a single Kafka cluster that replicates data between datacenters across regional availability zones. multi-tenancy : Multi-tenancy is a software architecture in which a single physical instance is shared among multiple logical instances, or tenants. In Confluent Cloud, each Basic, Standard, Enterprise, and Freight cluster is a logical Kafka cluster (LKC) that shares a physical Kafka cluster (PKC) with other tenants. Each LKC is isolated from other LKCs and has its own resources, such as memory, compute, and storage. **Related terms**: *Confluent Cloud*, *logical Kafka cluster (LKC)*, *physical Kafka cluster (PKC)* **Related content** * [From On-Prem to Cloud-Native: Multi-Tenancy in Confluent Cloud](https://www.confluent.io/blog/cloud-native-multi-tenant-kafka-with-confluent-cloud/) * [Multi-tenancy and Client Quotas on Confluent Cloud](/cloud/current/clusters/client-quotas.html) offset : An offset is an integer assigned to each message that uniquely represents its position within the data stream, guaranteeing the ordering of records and allowing offset-based connections to replay messages from any point in time. **Related terms**: *consumer offset*, *connector offset*, *offset commit*, *replayability* offset commit : An offset commit is the process of keeping track of the current position of an offset-based connection (primarily Kafka consumers and connectors) within the data stream. The offset commit process is not specific to consumers, producers, or connectors. It is a general mechanism in Kafka to track the position of any application that is reading data. When a consumer commits an offset, the offset identifies the next message the consumer should consume.
For example, if a consumer has an offset of 5, it has consumed messages 0 through 4 and will next consume message 5. If the consumer crashes or is shut down, its partitions are reassigned to another consumer, which resumes consuming from the last committed offset of each partition. The committed offset for consumers is stored on a Kafka broker. When a consumer commits an offset, it sends a commit request to the Kafka cluster, specifying the partition and offset it wants to commit for a particular consumer group. The Kafka broker receiving the commit request then stores this offset in the `__consumer_offsets` internal topic. **Related terms**: *consumer offset*, *offset* OpenSSL : OpenSSL is an open-source software library and toolkit that implements the Secure Sockets Layer (SSL) and Transport Layer Security (TLS) protocols. For more information, see [Wikipedia](https://en.wikipedia.org/wiki/OpenSSL). parent cluster : The Kafka cluster that a resource belongs to. **Related terms**: *Kafka cluster* partition : A partition is a unit of data storage that divides a topic into multiple, parallel event streams, which are distributed across separate Kafka brokers and can be consumed independently. Partitioning is a key concept in Kafka because it allows Kafka to scale horizontally by adding more brokers to the cluster. Partitions are also the unit of parallelism in Kafka. A topic can have one or more partitions, and each partition is an ordered, immutable sequence of event records that is continually appended to a partition log. partitions (pre-replication) : In Confluent Cloud, partitions are a Kafka cluster billing dimension that define the maximum number of partitions that can exist on the cluster at one time, before replication. While you are not charged for partitions on any type of Kafka cluster, the number of partitions you use has an impact on eCKU. To determine eCKU limits for partitions, Confluent Cloud counts only pre-replication (leader) partitions across a cluster. All topics that you create (as well as internal topics that are automatically created by Confluent Platform components such as ksqlDB, Kafka Streams, Connect, and Control Center (Legacy)) count towards the cluster partition limit. Confluent prefixes topics created automatically with an underscore (_). Topics that are internal to Kafka itself (such as consumer offsets) are not visible in Cloud Console and do not count against partition limits or toward partition billing. Available in the Metrics API as `partition_count`. In Confluent Cloud, attempts to create additional partitions beyond the cluster limit fail with an error message. To reduce usage on partitions (pre-replication), delete unused topics and create new topics with fewer partitions. Use the Kafka Admin interface to increase the partition count of an existing topic if the initial partition count is too low. physical Kafka cluster (PKC) : A physical Kafka cluster (PKC) is a Kafka cluster comprised of multiple brokers. Each physical Kafka cluster is created on a Kubernetes cluster by the control plane. A PKC is not directly accessible by clients. principal : A principal is an entity that can be authenticated and granted permissions based on roles to access resources and perform operations. An entity can be a user account, service account, group mapping, or identity pool.
**Related terms**: *group mapping*, *identity*, *identity pool*, *role*, *service account*, *user account* private internet : A private internet is a closed, restricted computer network typically used by organizations to provide secure environments for managing sensitive data and resources. processing time : Processing time is the time when an event is processed or recorded by a system, as opposed to the time when the event occurred on the producing device. Processing time is often used in stream processing to determine the order of events and to perform windowing operations. producer : A producer is a client application that publishes (writes) data to a topic in a Kafka cluster. Producers write data to a topic and are the only clients that can write data to a topic. Each record written to a topic is appended to the partition of the topic that is selected by the producer. Producer API : The Producer API is the Kafka API that allows you to write data to a topic in a Kafka cluster. The Producer API is used by producer clients to publish data to a topic in a Kafka cluster. Protobuf : Protobuf (or Protocol Buffers) is an open-source data format used to serialize structured data for storage. **Related terms**: *data serialization*, *deserializer*, *serializer* **Related content** - [Protocol Buffers](https://protobuf.dev/) - [Getting Started with Protobuf in Confluent Cloud](https://www.confluent.io/blog/using-protobuf-in-confluent-cloud/) - Confluent Cloud: [Protobuf Serializer and Deserializer](/cloud/current/sr/fundamentals/serdes-develop/serdes-protobuf.html) - Confluent Platform: [Protobuf Serializer and Deserializer](/platform/current/schema-registry/fundamentals/serdes-develop/serdes-protobuf.html) public internet : The public internet is the global system of interconnected computers and networks that use TCP/IP to communicate with each other. rebalancing : Rebalancing is the process of redistributing the partitions of a topic among the consumers of a consumer group for improved performance and scalability. A rebalance can occur if a consumer fails its heartbeat and is excluded from the group, voluntarily leaves the group, has its metadata updated, or when a new consumer joins the group. replayability : Replayability is the ability to replay messages from any point in time. **Related terms**: *consumer offset*, *offset*, *offset commit* replication : Replication is the process of creating and maintaining multiple copies (or *replicas*) of data across different nodes in a distributed system to increase availability, reliability, redundancy, and accessibility. replication factor : A replication factor is the number of copies of a partition that are distributed across the brokers in a cluster. requests : In Confluent Cloud, requests are a Kafka cluster billing dimension that defines the number of client requests to the cluster in one second. Available in the Metrics API as `request_count`. To reduce usage on requests, you can adjust producer batching configurations, consumer client batching configurations, and shut down otherwise inactive clients. For Dedicated clusters, a high number of requests per second results in increased load on the cluster. role : A role is a Confluent-defined job function that is assigned the set of permissions required to perform specific actions or operations on Confluent resources. A role binding binds a role to a principal and a set of Confluent resources. A role can be assigned to a user account, group mapping, service account, or identity pool.
**Related terms**: *group mapping*, *identity*, *identity pool*, *principal*, *service account* **Related content** - [Predefined RBAC Roles in Confluent Cloud](/cloud/current/access-management/access-control/rbac/predefined-rbac-roles.html) - [Role-Based Access Control Predefined Roles in Confluent Platform](/platform/current/security/rbac/rbac-predefined-roles.html) rolling restart : A rolling restart restarts the brokers in a Kafka cluster with zero downtime by restarting brokers one at a time, verifying that there are no under-replicated partitions on a broker before proceeding to the next broker. Restarting the brokers one at a time allows for software upgrades, broker configuration updates, or cluster maintenance while maintaining high availability by avoiding downtime. **Related content** - [Rolling restart](/platform/current/kafka/post-deployment.html#rolling-restart) schema : A schema is the structured definition or blueprint used to describe the format and structure of event messages sent through the Kafka event streaming platform. Schemas are used to validate the structure of data in event messages and ensure that producers and consumers are sending and receiving data in the same format. Schemas are defined in the Schema Registry. Schema Registry : Schema Registry is a centralized repository for managing and validating schemas for topic message data. Schema Registry is built into Confluent Cloud as a managed service, available with the Advanced Stream Governance package, and offered as part of Confluent Enterprise for self-managed deployments. Schema Registry is a RESTful service that is integrated with Kafka and Connect to provide a central location for managing schemas and validating data. Producers and consumers of Kafka topics use schemas to ensure data consistency and compatibility as schemas evolve. Schema Registry is a key component of Stream Governance. **Related content** - Confluent Cloud: [Manage Schemas in Confluent Cloud](/cloud/current/sr/schemas-manage.html) - Confluent Platform: [Schema Registry Overview](/platform/current/schema-registry/index.html) schema subject : A schema subject is the namespace for a schema in Schema Registry. This unique identifier defines a logical grouping of related schemas. Kafka topics contain event messages serialized and deserialized using the structure and rules defined in a schema subject. This ensures compatibility and supports schema evolution. **Related content** - Confluent Cloud: [Manage Schemas in Confluent Cloud](/cloud/current/sr/schemas-manage.html) - Confluent Platform: [Schema Registry Concepts](/platform/current/schema-registry/index.html) - [Understanding Schema Subjects](https://developer.confluent.io/courses/schema-registry/schema-subjects/) Serdes : Serdes are serializers and deserializers that convert objects and parallel data into a serial byte stream for efficient storage and high-speed data transmission over the wire. Confluent provides Serdes for schemas in Avro, Protobuf, and JSON Schema formats.
**Related content** - Confluent Cloud: [Formats, Serializers, and Deserializers](/cloud/current/sr/fundamentals/serdes-develop/index.html) - Confluent Platform: [Formats, Serializers, and Deserializers](/platform/current/schema-registry/fundamentals/serdes-develop/index.html) - [Serde](https://serde.rs/) serializer : A serializer is a tool that converts objects and parallel data into a serial byte stream. Serializers work with deserializers (known together as Serdes) to support efficient storage and high-speed data transmission over the wire. Confluent provides serializers for schemas in Avro, Protobuf, and JSON Schema formats. **Related content** - Confluent Cloud: [Formats, Serializers, and Deserializers](/cloud/current/sr/fundamentals/serdes-develop/index.html) - Confluent Platform: [Formats, Serializers, and Deserializers](/platform/current/schema-registry/fundamentals/serdes-develop/index.html) service account : A service account is a non-person entity used by an application or service to access resources and perform operations. Because a service account is an identity independent of the user who created it, it can be used programmatically to authenticate to resources and perform operations without the need for a user to be signed in. **Related content** - [Service Accounts for Confluent Cloud](/cloud/current/access-management/identity/service-accounts.html) service quota : A service quota is the limit, or maximum value, for a specific Confluent Cloud resource or operation that might vary by the resource scope it applies to. **Related content** - [Service Quotas for Confluent Cloud](/cloud/current/quotas/index.html) single message transform (SMT) : A single message transform (SMT) is a transformation or operation applied in realtime on an individual message that changes the values, keys, or headers of a message before being sent to a sink connector or after being read from a source connector. SMTs are convenient for inserting fields, masking information, event routing, and other minor data adjustments. single sign-on (SSO) : Single sign-on (SSO) is a centralized authentication service that allows users to use a single set of credentials to log in to multiple applications or services. **Related terms**: *authentication*, *group mapping*, *identity provider* **Related content** - [Single Sign-On for Confluent Cloud](/cloud/current/access-management/authenticate/sso/index.html) sink connector : A sink connector is a Kafka Connect connector that publishes (writes) data from a Kafka topic to an external system. source connector : A source connector is a Kafka Connect connector that subscribes (reads) data from a source (external system), extracts the payload and schema of the data, and publishes (writes) the data to Kafka topics. standalone : Standalone refers to a configuration in which a software application, system, or service operates independently on a single instance or device. This mode is commonly used for development, testing, and debugging purposes. For Kafka Connect, a standalone worker is a single process responsible for running all connectors and tasks on a single instance. Standard Kafka cluster : A Confluent Cloud cluster type. Standard Kafka clusters are designed for production-ready features and functionality. static egress IP address : A static egress IP address is an IP address used by a Confluent Cloud managed connector to establish outbound connections to endpoints of external data sources and sinks over the public internet. 
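For the single message transform (SMT) entry above, a minimal sketch of what such a transform chain looks like in connector configuration, assuming the open-source `InsertField` transform and a hypothetical FileStream source connector; the field names and values are illustrative, not prescribed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SmtConfigExample {
    public static void main(String[] args) {
        // Connector configuration with a single message transform (SMT) chain.
        Map<String, String> config = new LinkedHashMap<>();
        config.put("connector.class", "org.apache.kafka.connect.file.FileStreamSourceConnector");
        config.put("tasks.max", "1");
        config.put("file", "/tmp/demo-input.txt");
        config.put("topic", "demo-topic");

        // Declare one transform, then configure it: InsertField adds a static
        // field to every record value before it is written to Kafka.
        config.put("transforms", "addSource");
        config.put("transforms.addSource.type", "org.apache.kafka.connect.transforms.InsertField$Value");
        config.put("transforms.addSource.static.field", "data_source");
        config.put("transforms.addSource.static.value", "file-demo");

        config.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```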
**Related content** - [Use Static IP Addresses on Confluent Cloud for Connectors and Cluster Linking](/cloud/current/networking/static-egress-ip-addresses.html) - [Static Egress IP Addresses for Confluent Cloud Connectors](/cloud/current/connectors/static-egress-ip.html) storage (pre-replication) : In Confluent Cloud, storage is a Kafka cluster billing dimension that defines the number of bytes retained on the cluster, pre-replication. Available in the Metrics API as `retained_bytes` (convert from bytes to TB). The returned value is pre-replication. Standard, Enterprise, Dedicated, and Freight clusters support Infinite Storage. This means there is no maximum size limit for the amount of data that can be stored on the cluster. You can configure policy settings `retention.bytes` and `retention.ms` at the topic level to control exactly how much and how long to retain data in a way that makes sense for your applications and helps control your costs. To reduce storage in Confluent Cloud, compress your messages and reduce retention settings. For compression, use lz4. Avoid gzip because of high overhead on the cluster. Stream Catalog : Stream Catalog is a pillar of Confluent Cloud Stream Governance that provides a centralized inventory of your organization’s data assets that supports data governance and data discovery. With Data Portal in Confluent Cloud Console, users can find event streams across systems, search topics by name or tags, and enrich event data to increase value and usefulness. REST and GraphQL APIs can be used to search schemas, apply tags to records or fields, manage business metadata, and discover relationships across data assets. **Related content** - [Stream Catalog on Confluent Cloud: User Guide to Manage Tags and Metadata](/cloud/current/stream-governance/stream-catalog.html) - [Stream Catalog in Streaming Data Governance (Confluent Developer course)](https://developer.confluent.io/courses/governing-data-streams/stream-catalog/) Stream Governance : Stream Governance is a collection of tools and features that provide data governance for data in motion. These include data quality tools such as Schema Registry, schema ID validation, and schema linking; built-in data catalog capabilities to classify, organize, and find event streams across systems; and stream lineage to visualize complex data relationships and uncover insights with interactive, end-to-end maps of event streams. Taken together, these and other governance tools enable teams to manage the availability, integrity, and security of data used across organizations, and help with standardization, monitoring, collaboration, reporting, and more. **Related terms**: *Stream Catalog*, *Stream Lineage* **Related content** - [Stream Governance on Confluent Cloud](/cloud/current/stream-governance/index.html) stream lineage : Stream lineage is the life cycle, or history, of data, including its origins, transformations, and consumption, as it moves through various stages in data pipelines, applications, and systems. Stream lineage provides a record of data’s journey from its source to its destination, and is used to track data quality, data governance, and data security. 
**Related terms**: *Data Portal*, *Stream Governance* **Related content** - [Stream Lineage on Confluent Cloud](/cloud/current/stream-governance/stream-lineage.html) stream processing : Stream processing is the method of collecting event stream data in real-time as it arrives, transforming the data in real-time using operations (such as filters, joins, and aggregations), and publishing the results to one or more target systems. Stream processing can be used to analyze data continuously, build data pipelines, and process time-sensitive data in real-time. Using the Confluent event streaming platform, event streams can be processed in real-time using Kafka Streams, Kafka Connect, or ksqlDB. Streams API : The Streams API is the Kafka API that allows you to build streaming applications and microservices that transform (for example, filter, group, aggregate, join) incoming event streams in real-time and write the results to Kafka topics stored in a Kafka cluster. The Streams API is used by stream processing clients to process data in real-time, analyze data continuously, and build data pipelines. **Related content** - [Introduction to the Kafka Streams API](/platform/current/streams/introduction.html) throttling : Throttling is the process Kafka clusters in Confluent Cloud use to protect themselves from becoming over-utilized. Also known as backpressure, throttling in Confluent Cloud occurs when cluster load reaches 80%. At this point, applications may start seeing higher latencies or timeouts as the cluster must begin throttling requests or connection attempts. topic : A topic is a user-defined category or feed name where event messages are stored and published by producers and subscribed to by consumers. Each topic is a log of event messages. Topics are stored in one or more partitions, which distribute topic records across brokers in a Kafka cluster. Each partition is an ordered, immutable sequence of records that are continually appended to a topic. **Related content** - [Manage Topics in Confluent Cloud](/cloud/current/client-apps/topics/index.html) total client connections : In Confluent Cloud, total client connections are a Kafka cluster billing dimension that defines the number of TCP connections to the cluster you can open at one time. Available in the Metrics API as `active_connection_count`. Filter by principal to understand how many connections each application is creating. How many connections a cluster supports can vary widely based on several factors, including number of producer clients, number of consumer clients, partition keying strategy, produce patterns per client, and consume patterns per client. For Dedicated clusters, Confluent derives a guideline for total client connections from benchmarking that indicates exceeding this number of connections increases produce latency for test clients. However, this does not apply to all workloads. That is why total client connections are a guideline, not a hard limit for Dedicated Kafka clusters. Monitor the impact on cluster load as connection count increases, as this is the final representation of the impact of a given workload or CKU dimension on the underlying resources of the cluster. Consider the Confluent guideline a per-CKU guideline. The number of connections tends to increase when you add brokers. In other words, if you significantly exceed the per-CKU guideline, cluster expansion doesn’t always give your cluster more connection count headroom.
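As a minimal sketch of the Streams API entry above: the topology below reads an input topic, filters out empty values, upper-cases the rest, and writes the results to an output topic. The topic names, application ID, and bootstrap address are illustrative assumptions.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class UppercaseStreamsApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "glossary-uppercase-demo"); // also used as the consumer group ID
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");       // assumed local broker
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> input = builder.stream("demo-input");

        input.filter((key, value) -> value != null && !value.isBlank()) // drop empty events
             .mapValues(value -> value.toUpperCase())                   // transform each record
             .to("demo-output");                                        // write results to a topic

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```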
Transport Layer Security (TLS) : Transport Layer Security (TLS) is a cryptographic protocol that provides secure communication over a network. For more information, see [Wikipedia](https://en.wikipedia.org/wiki/Transport_Layer_Security). unbounded stream : An unbounded stream is a stream of data that is continuously generated in real-time and has no defined end. Examples of unbounded streams include stock prices, sensor data, and social media feeds. Processing unbounded streams requires a different approach than processing bounded streams. Unbounded streams are processed incrementally as data arrives, while bounded streams are processed as a batch after all data has arrived. Kafka Streams and Flink can be used to process unbounded streams. **Related terms**: *bounded stream*, *stream processing* under replication : Under replication is a situation in which the number of in-sync replicas is below the total number of replicas. Under-replicated partitions can occur when a broker is down or cannot replicate fast enough from the leader (replica fetcher lag). user account : A user account is an account representing the identity of a person who can be authenticated and granted access to Confluent Cloud resources. **Related content** - [User Accounts for Confluent Cloud](/cloud/current/access-management/identity/user-accounts/overview.html) watermark : A watermark in Flink is a marker that keeps track of time as data is processed. A watermark indicates that all records with timestamps up to the watermark’s time have been seen. This way, Flink can correctly perform tasks that depend on when things happened, like calculating aggregations over time windows. **Related content** - [Time and Watermarks](/cloud/current/flink/concepts/timely-stream-processing.html) Admin API : The Admin API is the Kafka REST API that enables administrators to manage and monitor Kafka clusters, topics, brokers, and other Kafka components. Ansible Playbooks for Confluent Platform : Ansible Playbooks for Confluent Platform is a set of Ansible playbooks and roles that are designed to automate the deployment and management of Confluent Platform. Apache Flink : Apache Flink is an open source stream processing framework for stateful computations over unbounded and bounded data streams. Flink provides a unified API for batch and stream processing that supports event-time and out-of-order processing, and provides exactly-once semantics. Flink applications include real-time analytics, data pipelines, and event-driven applications. **Related terms**: *bounded stream*, *data stream*, *stream processing*, *unbounded stream* **Related content** - [Apache Flink: Stream Processing and SQL on Confluent Cloud](/cloud/current/flink/index.html#) - [What is Apache Flink?](https://www.confluent.io/learn/apache-flink/) - [Apache Flink 101 (Confluent Developer course)](https://developer.confluent.io/courses/apache-flink/intro/) Apache Kafka : Apache Kafka is an open source event streaming platform that provides a unified, high-throughput, low-latency, fault-tolerant, scalable, distributed, and secure data streaming platform. Kafka is a publish-and-subscribe messaging system that enables distributed applications to ingest, process, and share data in real-time.
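The Admin API entry above describes REST access; the same administrative operations are also available programmatically, for example through the Java Admin client. The following minimal sketch describes a cluster and lists its topics, assuming a local broker at `localhost:9092` and omitting error handling.

```java
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.DescribeClusterResult;

public class DescribeClusterExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed local broker

        try (AdminClient admin = AdminClient.create(props)) {
            // Cluster metadata: broker nodes and the current controller.
            DescribeClusterResult cluster = admin.describeCluster();
            System.out.println("Cluster ID: " + cluster.clusterId().get());
            System.out.println("Controller: " + cluster.controller().get());
            System.out.println("Brokers:    " + cluster.nodes().get());

            // Topic names visible to this client (internal topics are excluded by default).
            System.out.println("Topics:     " + admin.listTopics().names().get());
        }
    }
}
```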
**Related content** - [Introduction to Kafka](/kafka/introduction.html) audit log : An audit log is a historical record of actions and operations that are triggered when auditable events occur. Audit log records can be used to troubleshoot system issues, manage security, and monitor compliance by tracking administrative activity, data access and modification, and sign-in attempts, and by reconstructing security breaches and fraudulent activity. **Related terms**: *auditable event* **Related content** - [Audit Log Concepts for Confluent Cloud](/cloud/current/monitoring/audit-logging/cloud-audit-log-concepts.html) - [Audit Log Concepts for Confluent Platform](/platform/current/security/audit-logs/audit-logs-concepts.html) auditable event : An auditable event is an event that represents an action or operation that can be tracked and monitored for security purposes and compliance. When an auditable event occurs, an auditable event method is triggered and an event message is sent to the audit log cluster and stored as an audit log record. **Related terms**: *audit log*, *event message* **Related content** - [Auditable Events in Confluent Cloud](/cloud/current/monitoring/audit-logging/event-methods/index.html) - [Auditable Events in Confluent Platform](/platform/current/security/audit-logs/auditable-events.html) authentication : Authentication is the process of verifying the identity of a principal that interacts with a system or application. Authentication is often used in conjunction with authorization to determine whether a principal is allowed to access a resource and perform a specific action or operation on that resource. Digital authentication requires one or more of the following: something a principal knows (a password or security question), something a principal has (a security token or key), or something a principal is (a biometric characteristic, such as a fingerprint or voiceprint). Multi-factor authentication (MFA) requires two or more forms of authentication. **Related terms**: *authorization*, *identity*, *identity provider*, *identity pool*, *principal*, *role* authorization : Authorization is the process of evaluating and then granting or denying a principal a set of permissions required to access and perform operations on resources. **Related terms**: *authentication*, *group mapping*, *identity*, *identity provider*, *identity pool*, *principal*, *role* Avro : Avro is a data serialization and exchange framework that provides data structures, remote procedure call (RPC), a compact binary data format, and a container file format, and uses JSON to represent schemas. Avro schemas ensure that every field is properly described and documented for use with serializers and deserializers. You can either send a schema with every message or use Schema Registry to store and receive schemas for use by consumers and producers to save bandwidth and storage space. **Related terms**: *data serialization*, *deserializer*, *serializer* **Related content** - [Apache Avro - a data serialization system](https://avro.apache.org/) - Confluent Cloud: [Avro Schema Serializer and Deserializer](/cloud/current/sr/fundamentals/serdes-develop/serdes-avro.html) - Confluent Platform: [Avro Schema Serializer and Deserializer](/platform/current/schema-registry/fundamentals/serdes-develop/serdes-avro.html) Basic Kafka cluster : A Confluent Cloud cluster type. Basic Kafka clusters are designed for experimentation, early development, and basic use cases.
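To illustrate the Avro entry above, a minimal sketch of a producer that serializes a record with Confluent’s `KafkaAvroSerializer`, which registers the schema in Schema Registry as needed. The broker and Schema Registry addresses, topic name, and schema are illustrative assumptions.

```java
import java.util.Properties;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AvroProduceExample {
    private static final String USER_SCHEMA = """
        {"type":"record","name":"User","fields":[
          {"name":"name","type":"string"},
          {"name":"age","type":"int"}]}""";

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");                 // assumed local broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://localhost:8081");        // assumed local Schema Registry

        Schema schema = new Schema.Parser().parse(USER_SCHEMA);
        GenericRecord user = new GenericData.Record(schema);
        user.put("name", "Ada");
        user.put("age", 36);

        try (KafkaProducer<String, GenericRecord> producer = new KafkaProducer<>(props)) {
            // The serializer registers the schema (if needed) and embeds its ID in each message.
            producer.send(new ProducerRecord<>("users-avro", "user-1", user));
        }
    }
}
```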
batch processing : Batch processing is the method of collecting a large volume of data over a specific time interval, after which the data is processed all at once and loaded into a destination system. Batch processing is often used when processing data can occur independently of the source and timing of the data. It is efficient for non-real-time data processing, such as data warehousing, reporting, and analytics. **Related terms**: *bounded stream*, *stream processing*, *unbounded stream* CIDR block : A CIDR block is a group of IP addresses that are contiguous and can be represented as a single block. CIDR blocks are expressed using Classless Inter-domain Routing (CIDR) notation that includes an IP address and a number of bits in the network mask. **Related content** - [CIDR notation](https://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing#CIDR_notation) - [Classless Inter-domain Routing (CIDR): The Internet Address Assignment and Aggregation Plan [RFC 4632]](https://www.rfc-editor.org/rfc/rfc4632.html) Cluster Linking : Cluster Linking is a highly performant data replication feature that enables links between Kafka clusters to mirror data from one cluster to another. Cluster Linking creates perfect copies of Kafka topics, which keep data in sync across clusters. Use cases include geo-replication of data, data sharing, migration, disaster recovery, and tiered separation of critical applications. **Related content** - [Geo-replication with Cluster Linking on Confluent Cloud](/cloud/current/multi-cloud/cluster-linking/index.html) - [Cluster Linking for Confluent Platform](/platform/current/multi-dc-deployments/cluster-linking/index.html) commit log : A commit log is a log of all event messages about commits (changes or operations made) sent to a Kafka topic. A commit log ensures that all event messages are processed at least once and provides a mechanism for recovery in the event of a failure. The commit log is also referred to as a write-ahead log (WAL) or a transaction log. **Related terms**: *event message* Confluent Cloud : Confluent Cloud is the fully managed, cloud-native event streaming service powered by Kora, the event streaming platform based on Kafka and extended by Confluent to provide high availability, scalability, elasticity, security, and global interconnectivity. Confluent Cloud offers cost-effective multi-tenant configurations as well as dedicated solutions, if stronger isolation is required. **Related terms**: *Apache Kafka*, *Kora* **Related content** - [Confluent Cloud Overview](/cloud/current/index.html) - [Confluent Cloud](https://www.confluent.io/confluent-cloud/) Confluent Cloud network : A Confluent Cloud network is an abstraction for a single tenant network environment that hosts Dedicated Kafka clusters in Confluent Cloud along with their single tenant services, like ksqlDB clusters and managed connectors. **Related content** - [Confluent Cloud Network Overview](/cloud/current/networking/overview.html#ccloud-networks) Confluent for Kubernetes (CFK) : *Confluent for Kubernetes (CFK)* is a cloud-native control plane for deploying and managing Confluent in private cloud environments through declarative API. Confluent Platform : Confluent Platform is a specialized distribution of Kafka at its core, with additional components for data integration, streaming data pipelines, and stream processing. 
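To make the Confluent Cloud entry above concrete: clients connect to a Confluent Cloud cluster with ordinary Kafka client configuration plus TLS and SASL authentication. The sketch below shows the typical properties; the bootstrap endpoint and the API key and secret placeholders are illustrative assumptions, not real values.

```java
import java.util.Properties;

public class CloudClientConfig {
    public static Properties cloudProperties() {
        Properties props = new Properties();
        // Bootstrap endpoint of the Confluent Cloud cluster (placeholder value).
        props.put("bootstrap.servers", "pkc-xxxxx.us-east-1.aws.confluent.cloud:9092");

        // Confluent Cloud uses TLS plus SASL/PLAIN authentication with an API key and secret.
        props.put("security.protocol", "SASL_SSL");
        props.put("sasl.mechanism", "PLAIN");
        props.put("sasl.jaas.config",
            "org.apache.kafka.common.security.plain.PlainLoginModule required "
                + "username='<API_KEY>' password='<API_SECRET>';");
        return props;
    }
}
```

The same properties object can then be passed to a producer, consumer, or admin client.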
Confluent REST Proxy : Confluent REST Proxy provides a RESTful interface to a Kafka cluster, making it easy to produce and consume messages, view the state of the cluster, and perform administrative actions without using the native Kafka protocol or clients. **Related content** - Confluent Platform: [REST Proxy](/platform/current/kafka-rest/index.html) Confluent Server : Confluent Server is the default Kafka broker component of Confluent Platform that builds on the foundation of Apache Kafka® and provides enhanced proprietary features designed for enterprise use. Confluent Server is fully compatible with Kafka, and adds Kafka cluster support for Role-Based Access Control, Audit Logs, Schema Validation, Self Balancing Clusters, Tiered Storage, Multi-Region Clusters, and Cluster Linking. **Related terms**: *Confluent Platform*, *Apache Kafka*, *Kafka broker*, *Cluster Linking*, *multi-region cluster (MRC)* Confluent Unit for Kafka (CKU) : Confluent Unit for Kafka (CKU) is a unit of horizontal scaling for Dedicated Kafka clusters in Confluent Cloud that provides preallocated resources. CKUs determine the capacity of a Dedicated Kafka cluster in Confluent Cloud. **Related content** - [CKU limits per cluster](/cloud/current/clusters/cluster-types.html#cku-limits-per-cluster) Connect API : The Connect API is the Kafka API that enables a connector to read event streams from a source system and write to a target system. Connect worker : A Connect worker is a server process that runs a connector and performs the actual work of moving data in and out of Kafka topics. A worker is a server process that runs on hardware independent of the Kafka brokers themselves. It is scalable and fault-tolerant, meaning you can run a cluster of workers that share the load of moving data in and out of Kafka from and to external systems. **Related terms**: *connector*, *Kafka Connect* connection attempts : In Confluent Cloud, connection attempts are a Kafka cluster billing dimension that defines the maximum number of new TCP connections to the cluster you can create in one second. This includes successful and unsuccessful authentication attempts. Available in the Metrics API as `successful_authentication_count` (only includes successful authentications, not unsuccessful authentication attempts). To reduce usage on connection attempts, use longer-lived connections to the cluster. If you exceed the maximum, connection attempts may be refused. connector : A connector is an abstract mechanism that enables communication, coordination, or cooperation among components by transferring data elements from one interface to another without changing the data. connector offset : A connector offset uniquely identifies the position of a connector as it processes data. Connectors use a variety of strategies to implement the connector offset, including everything from monotonically increasing integers to replay ids, lists of files, timestamps, and even checkpoint information. Connector offsets keep track of already-processed data in the event of a connector restart or recovery. While sink connectors use a pattern for connector offsets similar to the offset mechanism used throughout Kafka, the implementation details for source connectors are often much different. This is because source connectors track the progress of a source system as they process data. consumer : A consumer is a Kafka client application that subscribes to (reads and processes) event messages from a Kafka topic.
The Streams API and the Consumer API are the two APIs that enable consumers to read event streams from Kafka topics. **Related terms**: *Consumer API*, *consumer group*, *producer*, *Streams API* Consumer API : The Consumer API is the Kafka API used for consuming (reading) event messages or records from Kafka topics and enables a Kafka consumer to subscribe to a topic and read event messages as they arrive. Batch processing is a common use case for the Consumer API. consumer group : A consumer group is a single logical consumer implemented with multiple physical consumers for reasons of throughput and resilience. By dividing a topic's partitions among the consumers in the group, the consumers can process messages in parallel, increasing message throughput and enabling load balancing. **Related terms**: *consumer*, *partition*, *producer*, *topic* consumer lag : Consumer lag is the number of consumer offsets between the latest message produced in a partition and the last message consumed by a consumer, that is, the number of messages pending to be consumed from a particular partition. A large consumer lag, or a quickly growing lag, indicates that the consumer is unable to read from a partition as fast as the messages are available. This can be caused by a slow consumer, slow network, or slow broker. consumer offset : A consumer offset is the monotonically increasing integer value that uniquely identifies the position of an event record in a partition. Consumers use offsets to track their current position in the Kafka topic, allowing consumers to resume processing from where they left off. Offsets are stored on the Kafka broker, which does not track which records have been read and which have not. It is up to the consumer to track this information. When a consumer acknowledges receiving and processing a message, it commits an offset value that is stored in the special internal topic `__consumer_offsets`. cross-resource RBAC role binding : A cross-resource RBAC role binding is a role binding in Confluent Cloud that is applied at the Organization or Environment scope and grants access to multiple resources. For example, assigning a principal the NetworkAdmin role at the Organization scope lets them administer all networks across all Environments in their Organization. **Related terms**: *identity pool*, *principal*, *role*, *role binding* CRUD : CRUD is an acronym for the four basic operations that can be performed on data: Create, Read, Update, and Delete. custom connector : A custom connector is a connector created using Connect plugins uploaded to Confluent Cloud by users. This includes connector plugins that are built from scratch, modified open-source connector plugins, or third-party connector plugins. data at rest : Data at rest is data that is physically stored on non-volatile media (such as hard drives, solid-state drives, or other storage devices) and is not actively being transmitted or processed by a system. data contract : A data contract is a formal agreement between an upstream component and a downstream component on the structure and semantics of data that is in motion. A schema is a key element of a data contract. The schema, metadata, rules, policies, and evolution plan form the data contract. You can associate data contracts (schemas and more) with [topics](#term-Kafka-topic).
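Tying together the Consumer API, consumer group, and consumer offset entries above, a minimal sketch: a consumer joins a group, polls records, and commits offsets only after processing, so a restart resumes from the last committed position. The topic, group ID, and bootstrap address are illustrative assumptions.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class CommitAfterProcessing {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");    // assumed local broker
        props.put("group.id", "demo-consumer-group");         // consumers sharing this ID split the partitions
        props.put("enable.auto.commit", "false");             // commit manually, only after processing succeeds
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("demo-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                        record.partition(), record.offset(), record.value());
                }
                // Stores the group's next position in __consumer_offsets on the broker.
                consumer.commitSync();
            }
        }
    }
}
```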
**Related content** - Confluent Platform: [Data Contracts for Schema Registry on Confluent Platform](/platform/current/schema-registry/fundamentals/data-contracts.html) - Confluent Cloud: [Data Contracts for Schema Registry on Confluent Cloud](/cloud/current/sr/fundamentals/data-contracts.html) - Cloud Console: [Manage Schemas in Confluent Cloud](/cloud/current/sr/schemas-manage.html) data encryption key (DEK) : A data encryption key (DEK) is a symmetric key that is used to encrypt and decrypt data. The DEK is used in client-side field level encryption (CSFLE) to encrypt sensitive data. The DEK is itself encrypted using a key encryption key (KEK) that is only accessible to authorized users. The encrypted DEK and encrypted data are stored together. Only users with access to the KEK can decrypt the DEK and access the sensitive data. **Related terms**: *envelope encryption*, *key encryption key (KEK)* data in motion : Data in motion is data that is actively being transferred between source and destination, typically systems, devices, or networks. Data in motion is also referred to as data in transit or data in flight. data in use : Data in use is data that is actively being processed or manipulated in memory (RAM, CPU caches, or CPU registers). data ingestion : Data ingestion is the process of collecting, importing, and integrating data from various sources into a system for further processing, analysis, or storage. data mapping : Data mapping is the process of defining relationships or associations between source data elements and target data elements. Data mapping is an important process in data integration, data migration, and data transformation, ensuring that data is accurately and consistently represented when it is moved or combined. data pipeline : A data pipeline is a series of processes and systems that enable the flow of data from sources to destinations, automating the movement and transformation of data for various purposes, such as analytics, reporting, or machine learning. A data pipeline is typically comprised of a source system, a data ingestion tool, a data transformation tool, and a target system. A data pipeline covers the following stages: data extraction, data transformation, data loading, and data validation. Data Portal : Data Portal is a Confluent Cloud application that uses Stream Catalog and Stream Lineage to provide self-service access throughout Confluent Cloud Console for data practitioners to search and discover existing topics using tags and business metadata, request access to topics and data, and access data in topics to build streaming applications and data pipelines. Data Portal provides a data-centric view of Confluent that is optimized for self-service access, where users can search, discover, and understand available data, request access to data, and use data. **Related terms**: *Stream Catalog*, *Stream Lineage* **Related content** - [Data Portal on Confluent Cloud](/cloud/current/stream-governance/data-portal.html) data serialization : Data serialization is the process of converting data structures or objects into a format that can be stored or transmitted, and reconstructed later in the same or another computer environment. Data serialization is a common technique for implementing data persistence, interprocess communication, and object communication.
Confluent Schema Registry (in Confluent Platform) and Confluent Cloud Schema Registry support data serialization using serializers and deserializers for the following formats: Avro, JSON Schema, and Protobuf. **Related content** - Confluent Cloud: [Formats, Serializers, and Deserializers](/cloud/current/sr/fundamentals/serdes-develop/index.html) - Confluent Platform: [Formats, Serializers, and Deserializers](/platform/current/schema-registry/fundamentals/serdes-develop/index.html) data steward : A data steward is a person with data-related responsibilities, such as data governance, data quality, and data security. data stream : A data stream is a continuous flow of data records that are produced and consumed by applications. dead letter queue (DLQ) : A dead letter queue (DLQ) is a queue where messages that could not be processed successfully by a sink connector are placed. Instead of stopping, the sink connector sends the messages that could not be written successfully to the DLQ topic as event records and continues processing messages. Dedicated Kafka cluster : A Confluent Cloud cluster type. Dedicated Kafka clusters are designed for critical production workloads with high traffic or private networking requirements. deserializer : A deserializer is a tool that converts a serial byte stream back into objects and parallel data. Deserializers work with serializers (known together as Serdes) to support efficient storage and high-speed data transmission over the wire. Confluent provides Serdes for schemas in Avro, Protobuf, and JSON Schema formats. **Related content** - Confluent Cloud: [Formats, Serializers, and Deserializers](/cloud/current/sr/fundamentals/serdes-develop/index.html) - Confluent Platform: [Formats, Serializers, and Deserializers](/platform/current/schema-registry/fundamentals/serdes-develop/index.html) egress : In general networking, egress refers to outbound traffic leaving a network or a specific network segment. In Confluent Cloud, egress is a Kafka cluster billing dimension that defines the number of bytes consumed from the cluster in one second. Available in the Metrics API as `sent_bytes` (convert from bytes to MB). To reduce egress in Confluent Cloud, compress your messages and ensure each consumer is only consuming from the topics it requires. For compression, use lz4. Avoid gzip because of high overhead on the cluster. Elastic Confluent Unit for Kafka (eCKU) : Elastic Confluent Unit for Kafka (eCKU) is used to express capacity for Basic, Standard, Enterprise, and Freight Kafka clusters. These clusters automatically scale up to a fixed ceiling. There is no need to resize these types of clusters. When you need more capacity, your cluster expands up to the fixed ceiling. If you’re not using capacity above the minimum, you’re not paying for it. ELT : ELT is an acronym for Extract-Load-Transform, where data is extracted from a source system and loaded into a target system before processing or transformation. Compared to ETL, ELT is a more flexible approach to data ingestion because the data is loaded into the target system before transformation. Enterprise Kafka cluster : A Confluent Cloud cluster type. Enterprise Kafka clusters are designed for production-ready functionality that requires private endpoint networking capabilities. envelope encryption : Envelope encryption is a cryptographic technique that uses two keys to encrypt data. The symmetric data encryption key (DEK) is used to encrypt sensitive data.
The separate asymmetric key encryption key (KEK) is the master key used to encrypt the DEK. The DEK and encrypted data are stored together. Only users with access to the KEK can decrypt the DEK and access the sensitive data. In Confluent Cloud, envelope encryption is used to enable client-side field level encryption (CSFLE). CSFLE encrypts sensitive data in a message before it is sent to Confluent Cloud and allows for temporary decryption of sensitive data when required to perform operations on the data. **Related terms**: *data encryption key (DEK)*, *key encryption key (KEK)* ETL : ETL is an acronym for Extract-Transform-Load, where data is extracted from a source system, transformed into a target format, and loaded into a target system. Compared to ELT, ETL is a more rigid approach to data ingestion because the data is transformed before loading into the target system. event : An event is a meaningful action or occurrence of something that happened. Events that can be recognized by a program, either human-generated or triggered by software, can be recorded in a log file or other data store. **Related terms**: *event message*, *event record*, *event sink*, *event source*, *event stream*, *event streaming*, *event streaming platform*, *event time* event message : An event message is a record of an event sent to a Kafka topic, represented as a key-value pair. Each event message consists of a key-value pair, a timestamp, the compression type, headers for metadata (optional), and a partition and offset ID (once the message is written). The key is optional and can be used to identify the event. The value is required and contains details about the event that happened. **Related terms**: *event*, *event record*, *event sink*, *event source*, *event stream*, *event streaming*, *event streaming platform*, *event time* event record : An event record is the record of an event stored in a Kafka topic. Event records are organized and durably stored in topics. Examples of events include orders, payments, activities, or measurements. An event typically contains one or more data fields that describe the fact, as well as a timestamp that denotes when the event was created by its event source. The event may also contain various metadata, such as its source of origin (for example, the application or cloud service that created the event) and storage-level information (for example, its position in the event stream). **Related terms**: *event*, *event message*, *event sink*, *event source*, *event stream*, *event streaming*, *event streaming platform*, *event time* event sink : An event sink is a consumer of events, which can include applications, cloud services, databases, IoT sensors, and more. **Related terms**: *event*, *event message*, *event record*, *event source*, *event stream*, *event streaming*, *event streaming platform*, *event time* event source : An event source is a producer of events, which can include cloud services, databases, IoT sensors, mainframes, and more. **Related terms**: *event*, *event message*, *event record*, *event sink*, *event stream*, *event streaming*, *event streaming platform*, *event time* event stream : An event stream is a continuous flow of event messages produced by an event source and consumed by one or more consumers.
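To illustrate the event message entry above (a key-value pair plus a timestamp and optional headers), a minimal sketch that constructs and sends one such record; the topic, key, value, and header names are illustrative assumptions.

```java
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.header.Header;
import org.apache.kafka.common.header.internals.RecordHeader;

public class EventMessageExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // Optional metadata headers travel with the event message.
        List<Header> headers = List.of(new RecordHeader("source", "checkout-service".getBytes()));

        // Key identifies the event subject; value carries the event details;
        // the timestamp records when the event occurred (event time).
        ProducerRecord<String, String> event = new ProducerRecord<>(
            "orders",                       // topic
            null,                           // partition: let the producer choose from the key
            System.currentTimeMillis(),     // timestamp
            "order-1001",                   // key
            "{\"status\":\"created\"}",     // value
            headers);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(event);
        }
    }
}
```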
**Related terms**: *event*, *event message*, *event record*, *event sink*, *event source*, *event streaming*, *event streaming platform*, *event time* event streaming : Event streaming is the practice of capturing event data in real-time from data sources. Event streaming is a form of data streaming that is used to capture, store, process, and react to data in real-time or retrospectively. **Related terms**: *event*, *event message*, *event record*, *event sink*, *event source*, *event streaming platform*, *event time* event streaming platform : An event streaming platform is a platform to which events are written once, allowing distributed functions within an organization to react in real time. **Related terms**: *event*, *event message*, *event record*, *event sink*, *event source*, *event streaming*, *event time* event time : Event time is the time when an event occurred on the producing device, as opposed to the time when the event was processed or recorded. Event time is often used in stream processing to determine the order of events and to perform windowing operations. **Related terms**: *event*, *event message*, *event record*, *event sink*, *event source*, *event streaming*, *event streaming platform* exactly-once semantics : Exactly-once semantics is a guarantee that a message is delivered exactly once and in the order that it was sent. Even if a producer retries sending a message, or a consumer retries processing a message, the result is as if the message were delivered and processed exactly once. In Kafka, this guarantee is achieved with idempotent producers, which use a producer ID and per-partition sequence numbers so that broker-side retries do not create duplicates, and with transactions, which commit consumer offsets atomically with the messages produced from them. Freight Kafka cluster : A Confluent Cloud cluster type. Freight Kafka clusters are designed for high-throughput, relaxed-latency workloads at a lower cost than self-managed open source Kafka. granularity : Granularity is the degree or level of detail to which an entity (a system, service, or resource) is broken down into subcomponents, parts, or elements. Entities that are *fine-grained* have a higher level of detail, while *coarse-grained* entities have a reduced level of detail, often combining finer parts into a larger whole. In the context of access control, granular permissions provide precise control over resource access. They allow administrators to grant specific operations on distinct resources. This ensures users only have permissions tailored to their needs, minimizing unnecessary or potentially risky access. group mapping : Group mapping is a set of rules that map groups in your SSO identity provider to Confluent Cloud RBAC roles. When a user signs in to Confluent Cloud using SSO, Confluent Cloud uses the group mapping to grant access to Confluent Cloud resources. **Related terms**: *identity provider*, *identity pool*, *principal*, *role* **Related content** - [Group Mapping for Confluent Cloud](/cloud/current/access-management/authenticate/sso/group-mapping/overview.html) identity : An identity is a unique identifier that is used to authenticate and authorize users and applications to access resources. Identity is often used in conjunction with access control to determine whether a user or application is allowed to access a resource and perform a specific action or operation on that resource.
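As a minimal sketch of the producer side of the exactly-once semantics entry above: enabling idempotence and transactions lets a batch of sends commit atomically, so retries and failures do not surface duplicates to consumers reading with `isolation.level=read_committed`. The transactional ID, topic, and bootstrap address are illustrative assumptions.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class TransactionalProduceExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");        // assumed local broker
        props.put("enable.idempotence", "true");                  // broker de-duplicates producer retries
        props.put("transactional.id", "demo-payments-producer");  // stable ID required for transactions
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            producer.beginTransaction();
            try {
                producer.send(new ProducerRecord<>("payments", "pay-1", "captured"));
                producer.send(new ProducerRecord<>("payments", "pay-2", "captured"));
                producer.commitTransaction();  // both records become visible atomically
            } catch (Exception e) {
                producer.abortTransaction();   // neither record is exposed to read_committed consumers
                throw e;
            }
        }
    }
}
```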
**Related terms**: *identity provider*, *identity pool*, *principal*, *role* identity pool : An identity pool is a collection of identities that can be used to authenticate and authorize users and applications to access resources. Identity pools are used to manage permissions for users and applications that access resources in Confluent Cloud. They are also used to manage permissions for Confluent Cloud service accounts that are used to access resources in Confluent Cloud. identity provider : An identity provider is a trusted provider that authenticates users and issues security tokens that are used to verify the identity of a user. Identity providers are often used in single sign-on (SSO) scenarios, where a user can log in to multiple applications or services with a single set of credentials. Infinite Storage : Infinite Storage is the Confluent Cloud storage service that enhances the scalability of Confluent Cloud resources by separating storage and processing. Tiered storage within Confluent Cloud moves data between storage layers based on the needs of the workload, retrieves tiered data when requested, and garbage collects data that is past retention or otherwise deleted. If an application reads historical data, latency is not increased for other applications reading more recent data. Storage resources are decoupled from compute resources, so you only pay for what you produce to Confluent Cloud and for the storage that you use, and CKUs do not have storage limits. **Related content** - [Infinite Storage in Confluent Cloud for Apache Kafka](https://www.confluent.io/blog/infinite-kafka-data-storage-in-confluent-cloud/) ingress : In general networking, ingress refers to traffic that enters a network from an external source. In Confluent Cloud, ingress is a Kafka cluster billing dimension that defines the number of bytes produced to the cluster in one second. Available in the Metrics API as `received_bytes` (convert from bytes to MB). To reduce ingress in Confluent Cloud, compress your messages. For compression, use lz4. Avoid gzip because of high overhead on the cluster. internal topic : An internal topic is a topic, prefixed with double underscores (`__`), that is automatically created by a Kafka component to store metadata about the broker, partition assignment, consumer offsets, and other information. Examples of internal topics: `__cluster_metadata`, `__consumer_offsets`, `__transaction_state`, `__confluent.support.metrics`, and `__confluent.support.metrics-raw`.
Although only one bootstrap server is required to connect to a Kafka cluster, multiple brokers can be specified in a bootstrap server list to provide high availability and fault tolerance in case a broker is unavailable. In Confluent Cloud, the bootstrap server is the general cluster endpoint. Kafka broker : A Kafka broker is a server in the Kafka storage layer that stores event streams from one or more sources. A Kafka cluster is typically comprised of several brokers. Every broker in a cluster is also a bootstrap server, meaning if you can connect to one broker in a cluster, you can connect to every broker. Kafka client : A Kafka client allows you to write distributed applications and microservices that read, write, and process streams of events in parallel, at scale, and in a fault-tolerant manner, even in the case of network problems or machine failures. The Kafka client library provides functions, classes, and utilities that allow developers to create Kafka producer clients (Producers) and consumer clients (Consumers) using various programming languages. The primary way to build production-ready Producers and Consumers is by using your preferred programming language and a Kafka client library. **Related content** - [Build Client Applications for Confluent Cloud](/cloud/current/client-apps/overview.html) - [Build Client Applications for Confluent Platform](/platform/current/clients/index.html) - [Getting Started with Apache Kafka and Java (or Python, Go, .Net, and others)](https://developer.confluent.io/get-started/java/) Kafka cluster : A Kafka cluster is a group of interconnected Kafka brokers that manage and distribute real-time data streaming, processing, and storage as if they are a single system. By distributing tasks and services across multiple Kafka brokers, the Kafka cluster improves availability, reliability, and performance. Kafka Connect : Kafka Connect is the component of Kafka that provides data integration between databases, key-value stores, search indexes, file systems, and Kafka brokers. Kafka Connect is an ecosystem of a client application and pluggable connectors. As a client application, Connect is a server process that runs on hardware independent of the Kafka brokers themselves. It is scalable and fault-tolerant, meaning you can run a cluster of Connect workers that share the load of moving data in and out of Kafka from and to external systems. Connect also abstracts the business of code away from the user and instead requires only JSON configuration to run. **Related content** - Confluent Cloud: [Kafka Connect](/cloud/current/billing/overview.html#kconnect-long) - Confluent Platform: [Kafka Connect](/platform/current/connect/index.html) Kafka controller : A Kafka controller is the node in a Kafka cluster that is responsible for managing and changing the metadata of the cluster. This node also communicates metadata changes to the rest of the cluster. When Kafka uses ZooKeeper for metadata management, the controller is a broker, and the broker persists the metadata to ZooKeeper for backup and recovery. With KRaft, you dedicate Kafka nodes to operate as controllers and the metadata is stored in Kafka itself and not persisted to ZooKeeper. KRaft enables faster recovery because of this. For more information, see [KRaft overview](/platform/current/kafka-metadata/kraft.html). Kafka listener : A Kafka listener is an endpoint that Kafka brokers bind to use to communicate with clients. 
For Kafka clusters, Kafka listeners are configured in the `listeners` property of the `server.properties` file. Advertised listeners are publicly accessible endpoints that are used by clients to connect to the Kafka cluster. **Related content** - [Kafka Listeners – Explained](https://www.confluent.io/blog/kafka-listeners-explained/) Kafka metadata : Kafka metadata is the information about the Kafka cluster and the topics that are stored in it. This information includes details such as the brokers in the cluster, the topics that are available, the partitions for each topic, and the location of the leader for each partition. Kafka metadata is used by clients to discover the available brokers and topics, and to determine which broker is the leader for a particular partition. This information is essential for clients to be able to send and receive messages to and from Kafka. Kafka Streams : Kafka Streams is a stream processing library for building streaming applications and microservices that transform (filter, group, aggregate, join, and more) incoming event streams in real-time to Kafka topics stored in a Kafka cluster. The Streams API can be used to build applications that process data in real-time, analyze data continuously, and build data pipelines. **Related content** - [Kafka Streams](/platform/current/streams/overview.html) Kafka topic : See *topic*. key encryption key (KEK) : A key encryption key (KEK) is a master key that is used to encrypt and decrypt other keys, specifically the data encryption key (DEK). Only users with access to the KEK can decrypt the DEK and access the sensitive data. **Related terms**: *data encryption key (DEK)*, *envelope encryption* Kora : Kora is the cloud-native streaming data service based on Kafka technology that powers the Confluent Cloud event streaming platform for building real-time data pipelines and streaming applications. Kora abstracts low-level resources, such as Kafka brokers, and hides operational complexities, such as system upgrades. Kora is built on the following foundations: a tiered storage layer that improves cost and performance, elasticity and consistent performance through incremental load balancing, cost-effective multi-tenancy with dynamic quota management and cell-based isolation, continuous monitoring of both system health and data integrity, and clean abstraction with standard Kafka protocols and CKUs to hide underlying resources. **Related terms**: *Apache Kafka*, *Confluent Cloud*, *Confluent Unit for Kafka (CKU)* **Related content** - [Kora: The Cloud Native Engine for Apache Kafka](https://www.confluent.io/blog/cloud-native-data-streaming-kafka-engine/) - [Kora: A Cloud-Native Event Streaming Platform For Kafka](https://www.vldb.org/pvldb/vol16/p3822-povzner.pdf) KRaft : KRaft (or Apache Kafka Raft) is the consensus protocol that provides metadata management for Kafka and replaces ZooKeeper. KRaft simplifies Kafka because it enables the management of metadata in Kafka itself, rather than splitting it between ZooKeeper and Kafka. As of Confluent Platform 7.5, KRaft is the default method of metadata management in new deployments. For more information, see [KRaft overview](/platform/current/kafka-metadata/kraft.html). ksqlDB : ksqlDB is a streaming SQL database engine purpose-built for creating stream processing applications on top of Kafka.
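For illustration of the Kafka Streams entry above, the following is a minimal Java topology sketch that filters and counts records per key. It is not taken from the product documentation; the application ID, broker address, and the `pageviews` and `pageview-counts` topic names are hypothetical assumptions.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

public class PageViewCountSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "pageview-count-sketch"); // hypothetical application ID
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");     // assumed local broker
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        // Read the hypothetical "pageviews" topic, keep records with a non-empty key,
        // and maintain a running count of views per page key.
        KStream<String, String> views = builder.stream("pageviews");
        KTable<String, Long> counts = views
                .filter((page, user) -> page != null && !page.isEmpty())
                .groupByKey()
                .count();

        // Write the continuously updated counts to another hypothetical topic.
        counts.toStream().to("pageview-counts", Produced.with(Serdes.String(), Serdes.Long()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```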
logical Kafka cluster (LKC) : A logical Kafka cluster (LKC) is a subset of a physical Kafka cluster (PKC) that is isolated from other logical clusters within Confluent Cloud. Each logical unit of isolation is considered a tenant and maps to a specific organization. If the mapping is one-to-one, one LKC maps to one PKC (a Dedicated cluster). If the mapping is many-to-one, one LKC maps to one of the multitenant Kafka cluster types (Basic, Standard, Enterprise, and Freight). **Related terms**: *Confluent Cloud*, *Kafka cluster*, *physical Kafka cluster (PKC)* **Related content** - [Kafka Cluster Types in Confluent Cloud](/cloud/current/clusters/cluster-types.html) - [Multi-tenancy and Client Quotas on Confluent Cloud](/cloud/current/clusters/client-quotas.html) multi-region cluster (MRC) : A multi-region cluster (MRC) is a single Kafka cluster that replicates data between datacenters across regional availability zones. multi-tenancy : Multi-tenancy is a software architecture in which a single physical instance is shared among multiple logical instances, or tenants. In Confluent Cloud, each Basic, Standard, Enterprise, and Freight cluster is a logical Kafka cluster (LKC) that shares a physical Kafka cluster (PKC) with other tenants. Each LKC is isolated from other LKCs and has its own resources, such as memory, compute, and storage. **Related terms**: *Confluent Cloud*, *logical Kafka cluster (LKC)*, *physical Kafka cluster (PKC)* **Related content** * [From On-Prem to Cloud-Native: Multi-Tenancy in Confluent Cloud](https://www.confluent.io/blog/cloud-native-multi-tenant-kafka-with-confluent-cloud/) * [Multi-tenancy and Client Quotas on Confluent Cloud](/cloud/current/clusters/client-quotas.html) offset : An offset is an integer assigned to each message that uniquely represents its position within the data stream, guaranteeing the ordering of records and allowing offset-based connections to replay messages from any point in time. **Related terms**: *consumer offset*, *connector offset*, *offset commit*, *replayability* offset commit : An offset commit is the process of keeping track of the current position of an offset-based connection (primarily Kafka consumers and connectors) within the data stream. The offset commit process is not specific to consumers, producers, or connectors. It is a general mechanism in Kafka to track the position of any application that is reading data. When a consumer commits an offset, the offset identifies the next message the consumer should consume. For example, if a consumer has an offset of 5, it has consumed messages 0 through 4 and will next consume message 5. If the consumer crashes or is shut down, its partitions are reassigned to another consumer, which resumes consuming from the last committed offset of each partition. The committed offset for consumers is stored on a Kafka broker. When a consumer commits an offset, it sends a commit request to the Kafka cluster, specifying the partition and offset it wants to commit for a particular consumer group. The Kafka broker receiving the commit request then stores this offset in the `__consumer_offsets` internal topic. **Related terms**: *consumer offset*, *offset* OpenSSL : OpenSSL is an open-source software library and toolkit that implements the Secure Sockets Layer (SSL) and Transport Layer Security (TLS) protocols. For more information, see [Wikipedia](https://en.wikipedia.org/wiki/OpenSSL). parent cluster : The Kafka cluster that a resource belongs to. **Related terms**: *Kafka cluster*
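To make the offset and offset commit entries above concrete, here is a minimal Java consumer sketch that disables auto-commit and commits offsets synchronously after processing each batch. The broker address, consumer group ID, and `orders` topic name are illustrative assumptions.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ManualCommitSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "orders-processor");        // hypothetical consumer group
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");         // commit offsets manually

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));                            // hypothetical topic
            while (true) {  // poll loop runs until the process is stopped
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Process the record before committing its offset.
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
                // Synchronously commit the offsets of the records just processed;
                // the broker stores them in the __consumer_offsets internal topic.
                consumer.commitSync();
            }
        }
    }
}
```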
partition : A partition is a unit of data storage that divides a topic into multiple, parallel event streams, each of which is stored on separate Kafka brokers and can be consumed independently. Partitioning is a key concept in Kafka because it allows Kafka to scale horizontally by adding more brokers to the cluster. Partitions are also the unit of parallelism in Kafka. A topic can have one or more partitions, and each partition is an ordered, immutable sequence of event records that is continually appended to a partition log. partitions (pre-replication) : In Confluent Cloud, partitions are a Kafka cluster billing dimension that defines the maximum number of partitions that can exist on the cluster at one time, before replication. While you are not charged for partitions on any type of Kafka cluster, the number of partitions you use has an impact on eCKU. To determine eCKU limits for partitions, Confluent Cloud bills only for pre-replication (leader partitions) across a cluster. All topics that you create (as well as internal topics that are automatically created by Confluent Platform components such as ksqlDB, Kafka Streams, Connect, and Control Center (Legacy)) count towards the cluster partition limit. Confluent prefixes topics created automatically with an underscore (_). Topics that are internal to Kafka itself (consumer offsets) are not visible in Cloud Console and do not count against partition limits or toward partition billing. Available in the Metrics API as `partition_count`. In Confluent Cloud, attempts to create additional partitions beyond the cluster limit fail with an error message. To reduce usage on partitions (pre-replication), delete unused topics and create new topics with fewer partitions. Use the Kafka Admin interface to increase the partition count of an existing topic if the initial partition count is too low. physical Kafka cluster (PKC) : A physical Kafka cluster (PKC) is a Kafka cluster comprised of multiple brokers. Each physical Kafka cluster is created on a Kubernetes cluster by the control plane. A PKC is not directly accessible by clients. principal : A principal is an entity that can be authenticated and granted permissions based on roles to access resources and perform operations. An entity can be a user account, service account, group mapping, or identity pool. **Related terms**: *group mapping*, *identity*, *identity pool*, *role*, *service account*, *user account* private internet : A private internet is a closed, restricted computer network typically used by organizations to provide secure environments for managing sensitive data and resources. processing time : Processing time is the time when an event is processed or recorded by a system, as opposed to the time when the event occurred on the producing device. Processing time is often used in stream processing to determine the order of events and to perform windowing operations. producer : A producer is a client application that publishes (writes) data to a topic in a Kafka cluster. Producers write data to a topic and are the only clients that can write data to a topic. Each record written to a topic is appended to the partition of the topic that is selected by the producer. Producer API : The Producer API is the Kafka API that allows you to write data to a topic in a Kafka cluster. The Producer API is used by producer clients to publish data to a topic in a Kafka cluster.
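As a concrete illustration of the producer and Producer API entries above, here is a minimal Java producer sketch. The broker address, the `orders` topic, the record key and payload, and the choice of lz4 compression are illustrative assumptions.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // assumed broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");               // compress messages, as suggested above

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The key determines the partition; records with the same key land in the same partition.
            ProducerRecord<String, String> record =
                    new ProducerRecord<>("orders", "order-1001", "{\"amount\": 42.50}"); // hypothetical topic and payload
            producer.send(record, (metadata, exception) -> {
                if (exception != null) {
                    exception.printStackTrace();
                } else {
                    System.out.printf("wrote to %s-%d at offset %d%n",
                            metadata.topic(), metadata.partition(), metadata.offset());
                }
            });
            producer.flush();
        }
    }
}
```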
Protobuf : Protobuf (or Protocol Buffers) is an open-source data format used to serialize structured data for storage. **Related terms**: *data serialization*, *deserializer*, *serializer* **Related content** - [Protocol Buffers](https://protobuf.dev/) - [Getting Started with Protobuf in Confluent Cloud](https://www.confluent.io/blog/using-protobuf-in-confluent-cloud/) - Confluent Cloud: [Protobuf Serializer and Deserializer](/cloud/current/sr/fundamentals/serdes-develop/serdes-protobuf.html) - Confluent Platform: [Protobuf Serializer and Deserializer](/platform/current/schema-registry/fundamentals/serdes-develop/serdes-protobuf.html) public internet : The public internet is the global system of interconnected computers and networks that use TCP/IP to communicate with each other. rebalancing : Rebalancing is the process of redistributing the partitions of a topic among the consumers of a consumer group for improved performance and scalability. A rebalance can occur if a consumer fails its heartbeat and is excluded from the group, voluntarily leaves the group, has its metadata updated, or joins the group. replayability : Replayability is the ability to replay messages from any point in time. **Related terms**: *consumer offset*, *offset*, *offset commit* replication : Replication is the process of creating and maintaining multiple copies (or *replicas*) of data across different nodes in a distributed system to increase availability, reliability, redundancy, and accessibility. replication factor : A replication factor is the number of copies of a partition that are distributed across the brokers in a cluster. requests : In Confluent Cloud, requests are a Kafka cluster billing dimension that defines the number of client requests to the cluster in one second. Available in the Metrics API as `request_count`. To reduce usage on requests, you can adjust producer and consumer client batching configurations and shut down otherwise inactive clients. For Dedicated clusters, a high number of requests per second results in increased load on the cluster. role : A role is a Confluent-defined job function that carries a set of permissions required to perform specific actions or operations on Confluent resources, and is granted by binding the role to a principal and Confluent resources. A role can be assigned to a user account, group mapping, service account, or identity pool. **Related terms**: *group mapping*, *identity*, *identity pool*, *principal*, *service account* **Related content** - [Predefined RBAC Roles in Confluent Cloud](/cloud/current/access-management/access-control/rbac/predefined-rbac-roles.html) - [Role-Based Access Control Predefined Roles in Confluent Platform](/platform/current/security/rbac/rbac-predefined-roles.html) rolling restart : A rolling restart restarts the brokers in a Kafka cluster with zero downtime by incrementally restarting a Kafka broker after verifying that there are no under-replicated partitions on the broker before proceeding to the next broker. Restarting the brokers one at a time allows for software upgrades, broker configuration updates, or cluster maintenance while maintaining high availability by avoiding downtime. **Related content** - [Rolling restart](/platform/current/kafka/post-deployment.html#rolling-restart) schema : A schema is the structured definition or blueprint used to describe the format and structure of event messages sent through the Kafka event streaming platform.
Schemas are used to validate the structure of data in event messages and ensures that producers and consumers are sending and receiving data in the same format. Schemas are defined in the Schema Registry. Schema Registry : Schema Registry is a centralized repository for managing and validating schemas for topic message data that stores and manages schemas for Kafka topics. Schema Registry is built into Confluent Cloud as a managed service, available with the Advanced Stream Governance package, and offered as part of Confluent Enterprise for self-managed deployments. The Schema Registry is a RESTful service that stores and manages schemas for Kafka topics. The Schema Registry is integrated with Kafka and Connect to provide a central location for managing schemas and validating data. Producers and consumers to Kafka topics use schemas to ensure data consistency and compatibility as schemas evolve. Schema Registry is a key component of Stream Governance. **Related content** - Confluent Cloud: [Manage Schemas in Confluent Cloud](/cloud/current/sr/schemas-manage.html) - Confluent Platform: [Schema Registry Overview](/platform/current/schema-registry/index.html) schema subject : A schema subject is the namespace for a schema in Schema Registry. This unique identifier defines a logical grouping of related schemas. Kafka topics contain event messages serialized and deserialized using the structure and rules defined in a schema subject. This ensures compatibility and supports schema evolution. **Related content** - Confluent Cloud: [Manage Schemas in Confluent Cloud](/cloud/current/sr/schemas-manage.html) - Confluent Platform: [Schema Registry Concepts](/platform/current/schema-registry/index.html) - [Understanding Schema Subjects](https://developer.confluent.io/courses/schema-registry/schema-subjects/) Serdes : Serdes are serializers and deserializers that convert objects and parallel data into a serial byte stream for efficient storage and high-speed data transmission over the wire. Confluent provides Serdes for schemas in Avro, Protobuf, and JSON Schema formats. **Related content** - Confluent Cloud: [Formats, Serializers, and Deserializers](/cloud/current/sr/fundamentals/serdes-develop/index.html) - Confluent Platform: [Formats, Serializers, and Deserializers](/platform/current/schema-registry/fundamentals/serdes-develop/index.html) - [Serde](https://serde.rs/) serializer : A serializer is a tool that converts objects and parallel data into a serial byte stream. Serializers work with deserializers (known together as Serdes) to support efficient storage and high-speed data transmission over the wire. Confluent provides serializers for schemas in Avro, Protobuf, and JSON Schema formats. **Related content** - Confluent Cloud: [Formats, Serializers, and Deserializers](/cloud/current/sr/fundamentals/serdes-develop/index.html) - Confluent Platform: [Formats, Serializers, and Deserializers](/platform/current/schema-registry/fundamentals/serdes-develop/index.html) service account : A service account is a non-person entity used by an application or service to access resources and perform operations. Because a service account is an identity independent of the user who created it, it can be used programmatically to authenticate to resources and perform operations without the need for a user to be signed in. 
**Related content** - [Service Accounts for Confluent Cloud](/cloud/current/access-management/identity/service-accounts.html) service quota : A service quota is the limit, or maximum value, for a specific Confluent Cloud resource or operation that might vary by the resource scope it applies to. **Related content** - [Service Quotas for Confluent Cloud](/cloud/current/quotas/index.html) single message transform (SMT) : A single message transform (SMT) is a transformation or operation applied in realtime on an individual message that changes the values, keys, or headers of a message before being sent to a sink connector or after being read from a source connector. SMTs are convenient for inserting fields, masking information, event routing, and other minor data adjustments. single sign-on (SSO) : Single sign-on (SSO) is a centralized authentication service that allows users to use a single set of credentials to log in to multiple applications or services. **Related terms**: *authentication*, *group mapping*, *identity provider* **Related content** - [Single Sign-On for Confluent Cloud](/cloud/current/access-management/authenticate/sso/index.html) sink connector : A sink connector is a Kafka Connect connector that publishes (writes) data from a Kafka topic to an external system. source connector : A source connector is a Kafka Connect connector that subscribes (reads) data from a source (external system), extracts the payload and schema of the data, and publishes (writes) the data to Kafka topics. standalone : Standalone refers to a configuration in which a software application, system, or service operates independently on a single instance or device. This mode is commonly used for development, testing, and debugging purposes. For Kafka Connect, a standalone worker is a single process responsible for running all connectors and tasks on a single instance. Standard Kafka cluster : A Confluent Cloud cluster type. Standard Kafka clusters are designed for production-ready features and functionality. static egress IP address : A static egress IP address is an IP address used by a Confluent Cloud managed connector to establish outbound connections to endpoints of external data sources and sinks over the public internet. **Related content** - [Use Static IP Addresses on Confluent Cloud for Connectors and Cluster Linking](/cloud/current/networking/static-egress-ip-addresses.html) - [Static Egress IP Addresses for Confluent Cloud Connectors](/cloud/current/connectors/static-egress-ip.html) storage (pre-replication) : In Confluent Cloud, storage is a Kafka cluster billing dimension that defines the number of bytes retained on the cluster, pre-replication. Available in the Metrics API as `retained_bytes` (convert from bytes to TB). The returned value is pre-replication. Standard, Enterprise, Dedicated, and Freight clusters support Infinite Storage. This means there is no maximum size limit for the amount of data that can be stored on the cluster. You can configure policy settings `retention.bytes` and `retention.ms` at the topic level to control exactly how much and how long to retain data in a way that makes sense for your applications and helps control your costs. To reduce storage in Confluent Cloud, compress your messages and reduce retention settings. For compression, use lz4. Avoid gzip because of high overhead on the cluster. 
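The storage entry above mentions the `retention.bytes` and `retention.ms` topic settings. For illustration, the following Java AdminClient sketch applies both settings to a topic; the broker address, the `orders` topic name, and the chosen retention values are illustrative assumptions.

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class RetentionConfigSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker

        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "orders"); // hypothetical topic

            // Keep at most ~1 GB per partition and delete segments older than 7 days.
            List<AlterConfigOp> ops = List.of(
                    new AlterConfigOp(new ConfigEntry("retention.bytes", "1073741824"), AlterConfigOp.OpType.SET),
                    new AlterConfigOp(new ConfigEntry("retention.ms", "604800000"), AlterConfigOp.OpType.SET));

            admin.incrementalAlterConfigs(Map.of(topic, ops)).all().get();
            System.out.println("Updated retention settings for topic 'orders'");
        }
    }
}
```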
Stream Catalog : Stream Catalog is a pillar of Confluent Cloud Stream Governance that provides a centralized inventory of your organization’s data assets that supports data governance and data discovery. With Data Portal in Confluent Cloud Console, users can find event streams across systems, search topics by name or tags, and enrich event data to increase value and usefulness. REST and GraphQL APIs can be used to search schemas, apply tags to records or fields, manage business metadata, and discover relationships across data assets. **Related content** - [Stream Catalog on Confluent Cloud: User Guide to Manage Tags and Metadata](/cloud/current/stream-governance/stream-catalog.html) - [Stream Catalog in Streaming Data Governance (Confluent Developer course)](https://developer.confluent.io/courses/governing-data-streams/stream-catalog/) Stream Governance : Stream Governance is a collection of tools and features that provide data governance for data in motion. These include data quality tools such as Schema Registry, schema ID validation, and schema linking; built-in data catalog capabilities to classify, organize, and find event streams across systems; and stream lineage to visualize complex data relationships and uncover insights with interactive, end-to-end maps of event streams. Taken together, these and other governance tools enable teams to manage the availability, integrity, and security of data used across organizations, and help with standardization, monitoring, collaboration, reporting, and more. **Related terms**: *Stream Catalog*, *Stream Lineage* **Related content** - [Stream Governance on Confluent Cloud](/cloud/current/stream-governance/index.html) stream lineage : Stream lineage is the life cycle, or history, of data, including its origins, transformations, and consumption, as it moves through various stages in data pipelines, applications, and systems. Stream lineage provides a record of data’s journey from its source to its destination, and is used to track data quality, data governance, and data security. **Related terms**: **Data Portal**, *Stream Governance* **Related content** - [Stream Lineage on Confluent Cloud](/cloud/current/stream-governance/stream-lineage.html) stream processing : Stream processing is the method of collecting event stream data in real-time as it arrives, transforming the data in real-time using operations (such as filters, joins, and aggregations), and publishing the results to one or more target systems. Stream processing can be used to analyze data continuously, build data pipelines, and process time-sensitive data in real-time. Using the Confluent event streaming platform, event streams can be processed in real-time using Kafka Streams, Kafka Connect, or ksqlDB. Streams API : The Streams API is the Kafka API that allows you to build streaming applications and microservices that transform (for example, filter, group, aggregate, join) incoming event streams in real-time to Kafka topics stored in a Kafka cluster. The Streams API is used by stream processing clients to process data in real-time, analyze data continuously, and build data pipelines. **Related content** - [Introduction Kafka Streams API](/platform/current/streams/introduction.html) throttling : Throttling is the process Kafka clusters in Confluent Cloud use to protect themselves from getting to an over-utilized state. Also known as backpressure, throttling in Confluent Cloud occurs when cluster load reaches 80%. 
At this point, applications may start seeing higher latencies or timeouts as the cluster must begin throttling requests or connection attempts. topic : A topic is a user-defined category or feed name where event messages are stored and published by producers and subscribed to by consumers. Each topic is a log of event messages. Topics are stored in one or more partitions, which distribute topic records across brokers in a Kafka cluster. Each partition is an ordered, immutable sequence of records that are continually appended to a topic. **Related content** - [Manage Topics in Confluent Cloud](/cloud/current/client-apps/topics/index.html) total client connections : In Confluent Cloud, total client connections are a Kafka cluster billing dimension that defines the number of TCP connections to the cluster you can open at one time. Available in the Metrics API as `active_connection_count`. Filter by principal to understand how many connections each application is creating. How many connections a cluster supports can vary widely based on several factors, including number of producer clients, number of consumer clients, partition keying strategy, produce patterns per client, and consume patterns per client. For Dedicated clusters, Confluent derives a guideline for total client connections from benchmarking that indicates exceeding this number of connections increases produce latency for test clients. However, this does not apply to all workloads. That is why total client connections are a guideline, not a hard limit for Dedicated Kafka clusters. Monitor the impact on cluster load as connection count increases, as this is the final representation of the impact of a given workload or CKU dimension on the underlying resources of the cluster. Consider the Confluent guideline a per-CKU guideline. The number of connections tends to increase when you add brokers. In other words, if you significantly exceed the per-CKU guideline, cluster expansion doesn’t always give your cluster more connection count headroom. Transport Layer Security (TLS) : Transport Layer Security (TLS) is a cryptographic protocol that provides secure communication over a network. For more information, see [Wikipedia](https://en.wikipedia.org/wiki/Transport_Layer_Security). unbounded stream : An unbounded stream is a stream of data that is continuously generated in real-time and has no defined end. Examples of unbounded streams include stock prices, sensor data, and social media feeds. Processing unbounded streams requires a different approach than processing bounded streams. Unbounded streams are processed incrementally as data arrives, while bounded streams are processed as a batch after all data has arrived. Kafka Streams and Flink can be used to process unbounded streams. **Related terms**: *bounded stream*, *stream processing* under replication : Under replication is a situation when the number of in-sync replicas is below the number of all replicas. Under-replicated partitions can occur when a broker is down or cannot replicate fast enough from the leader (replica fetcher lag). user account : A user account is an account representing the identity of a person who can be authenticated and granted access to Confluent Cloud resources. **Related content** - [User Accounts for Confluent Cloud](/cloud/current/access-management/identity/user-accounts/overview.html) watermark : A watermark in Flink is a marker that keeps track of time as data is processed. A watermark signals that all records up to the current moment in time have been “seen”. This way, Flink can correctly perform tasks that depend on when things happened, like calculating aggregations over time windows. **Related content** - [Time and Watermarks](/cloud/current/flink/concepts/timely-stream-processing.html)
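To make the watermark entry concrete, here is a small Flink DataStream API sketch in Java. It is not from the Flink or Confluent documentation; the `Reading` event type, sensor IDs, timestamps, the five-second out-of-orderness bound, and the window size are all illustrative assumptions. The sketch assigns event-time timestamps and watermarks, then aggregates over tumbling event-time windows.

```java
import java.time.Duration;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class WatermarkSketch {

    /** Simple POJO event that carries its own event-time timestamp. */
    public static class Reading {
        public String sensorId;
        public double value;
        public long eventTimeMillis;

        public Reading() {}
        public Reading(String sensorId, double value, long eventTimeMillis) {
            this.sensorId = sensorId;
            this.value = value;
            this.eventTimeMillis = eventTimeMillis;
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<Reading> readings = env.fromElements(
                new Reading("s1", 20.1, 1_000L),
                new Reading("s1", 20.7, 9_000L),
                new Reading("s1", 19.9, 4_000L)); // arrives out of order

        // Watermarks track event time and tolerate events arriving up to 5 seconds out of order.
        DataStream<Reading> withEventTime = readings.assignTimestampsAndWatermarks(
                WatermarkStrategy.<Reading>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                        .withTimestampAssigner((reading, recordTs) -> reading.eventTimeMillis));

        // Sum values per sensor over 10-second event-time windows; windows close as watermarks advance.
        withEventTime
                .keyBy(reading -> reading.sensorId)
                .window(TumblingEventTimeWindows.of(Time.seconds(10)))
                .reduce((a, b) -> new Reading(a.sensorId, a.value + b.value,
                        Math.max(a.eventTimeMillis, b.eventTimeMillis)))
                .print();

        env.execute("watermark-sketch");
    }
}
```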
### GCS object names

The GCS data model is a flat structure: each bucket stores objects, and the name of each GCS object serves as the unique key. However, a logical hierarchy can be inferred when GCS object names use directory delimiters, such as `/`. The GCS connector allows you to customize the names of the GCS objects it uploads to the GCS bucket. In general, the names of the GCS objects uploaded by the GCS connector follow this format:

```
<prefix>/<topic>/<encodedPartition>/<topic>+<kafkaPartition>+<startOffset>.<format>
```

Admin API : The Admin API is the Kafka REST API that enables administrators to manage and monitor Kafka clusters, topics, brokers, and other Kafka components. Ansible Playbooks for Confluent Platform : Ansible Playbooks for Confluent Platform is a set of Ansible playbooks and roles that are designed to automate the deployment and management of Confluent Platform. Apache Flink : Apache Flink is an open source stream processing framework for stateful computations over unbounded and bounded data streams. Flink provides a unified API for batch and stream processing that supports event-time and out-of-order processing, and supports exactly-once semantics. Flink applications include real-time analytics, data pipelines, and event-driven applications. **Related terms**: *bounded stream*, *data stream*, *stream processing*, *unbounded stream* **Related content** - [Apache Flink: Stream Processing and SQL on Confluent Cloud](/cloud/current/flink/index.html) - [What is Apache Flink?](https://www.confluent.io/learn/apache-flink/) - [Apache Flink 101 (Confluent Developer course)](https://developer.confluent.io/courses/apache-flink/intro/) Apache Kafka : Apache Kafka is an open source event streaming platform that provides a unified, high-throughput, low-latency, fault-tolerant, scalable, distributed, and secure data streaming platform. Kafka is a publish-and-subscribe messaging system that enables distributed applications to ingest, process, and share data in real-time. **Related content** - [Introduction to Kafka](/kafka/introduction.html) audit log : An audit log is a historical record of actions and operations that are triggered when auditable events occur. Audit log records can be used to troubleshoot system issues, manage security, and monitor compliance, by tracking administrative activity, data access and modification, monitoring sign-in attempts, and reconstructing security breaches and fraudulent activity. **Related terms**: *auditable event* **Related content** - [Audit Log Concepts for Confluent Cloud](/cloud/current/monitoring/audit-logging/cloud-audit-log-concepts.html) - [Audit Log Concepts for Confluent Platform](/platform/current/security/audit-logs/audit-logs-concepts.html) auditable event : An auditable event is an event that represents an action or operation that can be tracked and monitored for security purposes and compliance. When an auditable event occurs, an auditable event method is triggered and an event message is sent to the audit log cluster and stored as an audit log record.
**Related terms**: *audit log*, *event message* **Related content** - [Auditable Events in Confluent Cloud](/cloud/current/monitoring/audit-logging/event-methods/index.html) - [Auditable Events in Confluent Platform](/platform/current/security/audit-logs/auditable-events.html) authentication : Authentication is the process of verifying the identity of a principal that interacts with a system or application. Authentication is often used in conjunction with authorization to determine whether a principal is allowed to access a resource and perform a specific action or operation on that resource. Digital authentication requires one or more of the following: something a principal knows (a password or security question), something a principal has (a security token or key), or something a principal is (a biometric characteristic, such as a fingerprint or voiceprint). Multi-factor authentication (MFA) requires two or more forms of authentication. **Related terms**: *authorization*, *identity*, *identity provider*, *identity pool*, *principal*, *role* authorization : Authorization is the process of evaluating and then granting or denying a principal a set of permissions required to access and perform operations on resources. **Related terms**: *authentication*, *group mapping*, *identity*, *identity provider*, *identity pool*, *principal*, *role* Avro : Avro is a data serialization and exchange framework that provides data structures, remote procedure call (RPC), compact binary data format, a container file, and uses JSON to represent schemas. Avro schemas ensure that every field is properly described and documented for use with serializers and deserializers. You can either send a schema with every message or use Schema Registry to store and receive schemas for use by consumers and producers to save bandwidth and storage space. **Related terms**: *data serialization*, *deserializer*, *serializer* **Related content** - [Apache Avro - a data serialization system](https://avro.apache.org/) - Confluent Cloud: [Avro Schema Serializer and Deserializer](/cloud/current/sr/fundamentals/serdes-develop/serdes-avro.html) - Confluent Platform: [Avro Schema Serializer and Deserializer](/platform/current/schema-registry/fundamentals/serdes-develop/serdes-avro.html) Basic Kafka cluster : A Confluent Cloud cluster type. Basic Kafka clusters are designed for experimentation, early development, and basic use cases. batch processing : Batch processing is the method of collecting a large volume of data over a specific time interval, after which the data is processed all at once and loaded into a destination system. Batch processing is often used when processing data can occur independently of the source and timing of the data. It is efficient for non-real-time data processing, such as data warehousing, reporting, and analytics. **Related terms**: *bounded stream*, *stream processing*, *unbounded stream* CIDR block : A CIDR block is a group of IP addresses that are contiguous and can be represented as a single block. CIDR blocks are expressed using Classless Inter-domain Routing (CIDR) notation that includes an IP address and a number of bits in the network mask. 
**Related content** - [CIDR notation](https://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing#CIDR_notation) - [Classless Inter-domain Routing (CIDR): The Internet Address Assignment and Aggregation Plan [RFC 4632]](https://www.rfc-editor.org/rfc/rfc4632.html) Cluster Linking : Cluster Linking is a highly performant data replication feature that enables links between Kafka clusters to mirror data from one cluster to another. Cluster Linking creates perfect copies of Kafka topics, which keep data in sync across clusters. Use cases include geo-replication of data, data sharing, migration, disaster recovery, and tiered separation of critical applications. **Related content** - [Geo-replication with Cluster Linking on Confluent Cloud](/cloud/current/multi-cloud/cluster-linking/index.html) - [Cluster Linking for Confluent Platform](/platform/current/multi-dc-deployments/cluster-linking/index.html) commit log : A commit log is a log of all event messages about commits (changes or operations made) sent to a Kafka topic. A commit log ensures that all event messages are processed at least once and provides a mechanism for recovery in the event of a failure. The commit log is also referred to as a write-ahead log (WAL) or a transaction log. **Related terms**: *event message* Confluent Cloud : Confluent Cloud is the fully managed, cloud-native event streaming service powered by Kora, the event streaming platform based on Kafka and extended by Confluent to provide high availability, scalability, elasticity, security, and global interconnectivity. Confluent Cloud offers cost-effective multi-tenant configurations as well as dedicated solutions, if stronger isolation is required. **Related terms**: *Apache Kafka*, *Kora* **Related content** - [Confluent Cloud Overview](/cloud/current/index.html) - [Confluent Cloud](https://www.confluent.io/confluent-cloud/) Confluent Cloud network : A Confluent Cloud network is an abstraction for a single tenant network environment that hosts Dedicated Kafka clusters in Confluent Cloud along with their single tenant services, like ksqlDB clusters and managed connectors. **Related content** - [Confluent Cloud Network Overview](/cloud/current/networking/overview.html#ccloud-networks) Confluent for Kubernetes (CFK) : *Confluent for Kubernetes (CFK)* is a cloud-native control plane for deploying and managing Confluent in private cloud environments through declarative API. Confluent Platform : Confluent Platform is a specialized distribution of Kafka at its core, with additional components for data integration, streaming data pipelines, and stream processing. Confluent REST Proxy : Confluent REST Proxy provides a RESTful interface to an Kafka cluster, making it easy to produce and consume messages, view the state of the cluster, and perform administrative actions without using the native Kafka protocol or clients. **Related content** - Confluent Platform: [REST Proxy](/platform/current/kafka-rest/index.html) Confluent Server : Confluent Server is the default Kafka broker component of Confluent Platform that builds on the foundation of Apache Kafka® and provides enhanced proprietary features designed for enterprise use. Confluent Server is fully compatible with Kafka, and adds Kafka cluster support for Role-Based Access Control, Audit Logs, Schema Validation, Self Balancing Clusters, Tiered Storage, Multi-Region Clusters, and Cluster Linking. 
**Related terms**: *Confluent Platform*, *Apache Kafka*, *Kafka broker*, *Cluster Linking*, *multi-region cluster (MRC)* Confluent Unit for Kafka (CKU) : Confluent Unit for Kafka (CKU) is a unit of horizontal scaling for Dedicated Kafka clusters in Confluent Cloud that provide preallocated resources. CKUs determine the capacity of a Dedicated Kafka cluster in Confluent Cloud. **Related content** - [CKU limits per cluster](/cloud/current/clusters/cluster-types.html#cku-limits-per-cluster) Connect API : The Connect API is the Kafka API that enables a connector to read event streams from a source system and write to a target system. Connect worker : A Connect worker is a server process that runs a connector and performs the actual work of moving data in and out of Kafka topics. A worker is a server process that runs on hardware independent of the Kafka brokers themselves. It is scalable and fault-tolerant, meaning you can run a cluster of workers that share the load of moving data in and out of Kafka from and to external systems. **Related terms**: *connector*, *Kafka Connect* connection attempts : In Confluent Cloud, connection attempts are a Kafka cluster billing dimension that defines the maximum number of new TCP connections to the cluster you can create in one second. This includes successful and unsuccessful authentication attempts. Available in the Metrics API as `successful_authentication_count` (only includes successful authentications, not unsuccessful authentication attempts). To reduce usage on connection attempts, use longer-lived connections to the cluster. If you exceed the maximum, connection attempts may be refused. connector : A connector is an abstract mechanism that enables communication, coordination, or cooperation among components by transferring data elements from one interface to another without changing the data. connector offset : Connector offset uniquely identifies the position of a connector as it processes data. Connectors use a variety of strategies to implement the connector offset, including everything from monotonically increasing integers to replay ids, lists of files, timestamps and even checkpoint information. Connector offsets keep track of already-processed data in the event of a connector restart or recovery. While sink connectors use a pattern for connector offsets similar to the offset mechanism used throughout Kafka, the implementation details for source connectors are often much different. This is because source connectors track the progress of a source system as it process data. consumer : A consumer is a Kafka client application that subscribes to (reads and processes) event messages from a Kafka topic. The Streams API and the Consumer API are the two APIs that enable consumers to read event streams from Kafka topics. **Related terms**: *Consumer API*, *consumer group*, *producer*, *Streams API* Consumer API : The Consumer API is the Kafka API used for consuming (reading) event messages or records from Kafka topics and enables a Kafka consumer to subscribe to a topic and read event messages as they arrive. Batch processing is a common use case for the Consumer API. consumer group : A consumer group is a single logical consumer implemented with multiple physical consumers for reasons of throughput and resilience. By dividing topics among consumers in the group into partitions, consumers in the group can process messages in parallel, increasing message throughput and enabling load balancing. 
**Related terms**: *consumer*, *partition*, *producer*, *topic* consumer lag : Consumer lag is the number of consumer offsets between the latest message produced in a partition and the last message consumed by a consumer, that is, the number of messages pending to be consumed from a particular partition. A large consumer lag, or a quickly growing lag, indicates that the consumer is unable to read from a partition as fast as the messages are available. This can be caused by a slow consumer, slow network, or slow broker. consumer offset : Consumer offset is the monotonically increasing integer value that uniquely identifies the position of an event record in a partition. Consumers use offsets to track their current position in the Kafka topic, allowing consumers to resume processing from where they left off. Offsets are stored on the Kafka broker, which does not track which records have been read and which have not. It is up to the consumer to track this information. When a consumer acknowledges receiving and processing a message, it commits an offset value that is stored in the special internal topic `__consumer_offsets`. cross-resource RBAC role binding : A cross-resource RBAC role binding is a role binding in Confluent Cloud that is applied at the Organization or Environment scope and grants access to multiple resources. For example, assigning a principal the NetworkAdmin role at the Organization scope lets them administer all networks across all Environments in their Organization. **Related terms**: *identity pool*, *principal*, *role*, *role binding* CRUD : CRUD is an acronym for the four basic operations that can be performed on data: Create, Read, Update, and Delete. custom connector : A custom connector is a connector created using Connect plugins uploaded to Confluent Cloud by users. This includes connector plugins that are built from scratch, modified open-source connector plugins, or third-party connector plugins. data at rest : Data at rest is data that is physically stored on non-volatile media (such as hard drives, solid-state drives, or other storage devices) and is not actively being transmitted or processed by a system. data contract : A data contract is a formal agreement between an upstream component and a downstream component on the structure and semantics of data that is in motion. A schema is a key element of a data contract. The schema, metadata, rules, policies, and evolution plan form the data contract. You can associate data contracts (schemas and more) with [topics](#term-Kafka-topic). **Related content** - Confluent Platform: [Data Contracts for Schema Registry on Confluent Platform](/platform/current/schema-registry/fundamentals/data-contracts.html) - Confluent Cloud: [Data Contracts for Schema Registry on Confluent Cloud](/cloud/current/sr/fundamentals/data-contracts.html) - Cloud Console: [Manage Schemas in Confluent Cloud](/cloud/current/sr/schemas-manage.html) data encryption key (DEK) : A data encryption key (DEK) is a symmetric key that is used to encrypt and decrypt data. The DEK is used in client-side field level encryption (CSFLE) to encrypt sensitive data. The DEK is itself encrypted using a key encryption key (KEK) that is only accessible to authorized users. The encrypted DEK and encrypted data are stored together. Only users with access to the KEK can decrypt the DEK and access the sensitive data. **Related terms**: *envelope encryption*, *key encryption key (KEK)*
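As an illustration of the DEK and KEK relationship described above (and of the envelope encryption entry later in this glossary), here is a small, self-contained Java sketch using the standard `javax.crypto` APIs. The key sizes, AES-GCM parameters, and sample plaintext are illustrative assumptions; in a real deployment the KEK would live in an external KMS, not alongside the data.

```java
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class EnvelopeEncryptionSketch {

    static byte[] encrypt(SecretKey key, byte[] plaintext, byte[] iv) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        return cipher.doFinal(plaintext);
    }

    public static void main(String[] args) throws Exception {
        SecureRandom random = new SecureRandom();
        KeyGenerator keyGen = KeyGenerator.getInstance("AES");
        keyGen.init(256);

        // KEK: the master key. In practice this is held in a KMS and never stored with the data.
        SecretKey kek = keyGen.generateKey();

        // DEK: a fresh symmetric key used to encrypt the sensitive field.
        SecretKey dek = keyGen.generateKey();

        byte[] dataIv = new byte[12];
        byte[] dekIv = new byte[12];
        random.nextBytes(dataIv);
        random.nextBytes(dekIv);

        // 1. Encrypt the sensitive data with the DEK.
        byte[] encryptedData = encrypt(dek, "ssn=123-45-6789".getBytes(StandardCharsets.UTF_8), dataIv);

        // 2. Encrypt (wrap) the DEK with the KEK.
        byte[] encryptedDek = encrypt(kek, dek.getEncoded(), dekIv);

        // The wrapped DEK and the encrypted data are stored together;
        // only a holder of the KEK can unwrap the DEK and read the data.
        System.out.printf("ciphertext bytes=%d, wrapped DEK bytes=%d%n",
                encryptedData.length, encryptedDek.length);
    }
}
```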
data in motion : Data in motion is data that is actively being transferred between source and destination, typically systems, devices, or networks. Data in motion is also referred to as data in transit or data in flight. data in use : Data in use is data that is actively being processed or manipulated in memory (RAM, CPU caches, or CPU registers). data ingestion : Data ingestion is the process of collecting, importing, and integrating data from various sources into a system for further processing, analysis, or storage. data mapping : Data mapping is the process of defining relationships or associations between source data elements and target data elements. Data mapping is an important process in data integration, data migration, and data transformation, ensuring that data is accurately and consistently represented when it is moved or combined. data pipeline : A data pipeline is a series of processes and systems that enable the flow of data from sources to destinations, automating the movement and transformation of data for various purposes, such as analytics, reporting, or machine learning. A data pipeline is typically composed of a source system, a data ingestion tool, a data transformation tool, and a target system. A data pipeline covers the following stages: data extraction, data transformation, data loading, and data validation. Data Portal : Data Portal is a Confluent Cloud application that uses Stream Catalog and Stream Lineage to provide self-service access throughout Confluent Cloud Console for data practitioners to search and discover existing topics using tags and business metadata, request access to topics and data, and access data in topics to build streaming applications and data pipelines. Data Portal provides a data-centric view of Confluent optimized for self-service access to data, where users can search, discover, and understand available data, request access to data, and use data. **Related terms**: *Stream Catalog*, *Stream Lineage* **Related content** - [Data Portal on Confluent Cloud](/cloud/current/stream-governance/data-portal.html) data serialization : Data serialization is the process of converting data structures or objects into a format that can be stored or transmitted, and reconstructed later in the same or another computer environment. Data serialization is a common technique for implementing data persistence, interprocess communication, and object communication. Confluent Schema Registry (in Confluent Platform) and Confluent Cloud Schema Registry support data serialization using serializers and deserializers for the following formats: Avro, JSON Schema, and Protobuf. **Related content** - Confluent Cloud: [Formats, Serializers, and Deserializers](/cloud/current/sr/fundamentals/serdes-develop/index.html) - Confluent Platform: [Formats, Serializers, and Deserializers](/platform/current/schema-registry/fundamentals/serdes-develop/index.html) data steward : A data steward is a person with data-related responsibilities, such as data governance, data quality, and data security. data stream : A data stream is a continuous flow of data records that are produced and consumed by applications. dead letter queue (DLQ) : A dead letter queue (DLQ) is a queue where messages that could not be processed successfully by a sink connector are placed.
Instead of stopping, the sink connector sends messages that could not be written successfully as event records to the DLQ topic while the sink connector continues processing messages. Dedicated Kafka cluster : A Confluent Cloud cluster type. Dedicated Kafka clusters are designed for critical production workloads with high traffic or private networking requirements. deserializer : A deserializer is a tool that converts a serial byte stream back into objects and parallel data. Deserializers work with serializers (known together as Serdes) to support efficient storage and high-speed data transmission over the wire. Confluent provides Serdes for schemas in Avro, Protobuf, and JSON Schema formats. **Related content** - Confluent Cloud: [Formats, Serializers, and Deserializers](/cloud/current/sr/fundamentals/serdes-develop/index.html) - Confluent Platform: [Formats, Serializers, and Deserializers](/platform/current/schema-registry/fundamentals/serdes-develop/index.html) egress : In general networking, egress refers to outbound traffic leaving a network or a specific network segment. In Confluent Cloud, egress is a Kafka cluster billing dimension that defines the number of bytes consumed from the cluster in one second. Available in the Metrics API as `sent_bytes` (convert from bytes to MB). To reduce egress in Confluent Cloud, compress your messages and ensure each consumer is only consuming from the topics it requires. For compression, use lz4. Avoid gzip because of high overhead on the cluster. Elastic Confluent Unit for Kafka (eCKU) : Elastic Confluent Unit for Kafka (eCKU) is used to express capacity for Basic, Standard, Enterprise, and Freight Kafka clusters. These clusters automatically scale up to a fixed ceiling. There is no need to resize these type clusters. When you need more capacity, your cluster expands up to the fixed ceiling. If you’re not using capacity above the minimum, you’re not paying for it. ELT : ELT is an acronym for Extract-Load-Transform, where data is extracted from a source system and loaded into a target system before processing or transformation. Compared to ETL, ELT is a more flexible approach to data ingestion because the data is loaded into the target system before transformation. Enterprise Kafka cluster : A Confluent Cloud cluster type. Enterprise Kafka clusters are designed for production-ready functionality that requires private endpoint networking capabilities. envelope encryption : Envelope encryption is a cryptographic technique that uses two keys to encrypt data. The symmetric data encryption key (DEK) is used to encrypt sensitive data. The separate asymmetric key encryption key (KEK) is the master key used to encrypt the DEK. The DEK and encrypted data are stored together. Only users with access to the KEK can decrypt the DEK and access the sensitive data. In Confluent Cloud, envelope encryption is used to enable client-side field level encryption (CSFLE). CSFLE encrypts sensitive data in a message before it is sent to Confluent Cloud and allows for temporary decryption of sensitive data when required to perform operations on the data. **Related terms**: *data encryption key (DEK)*, *key encryption key (KEK)* ETL : ETL is an acronym for Extract-Transform-Load, where data is extracted from a source system, transformed into a target format, and loaded into a target system. Compared to ELT, ETL is a more rigid approach to data ingestion because the data is transformed before loading into the target system. 
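To ground the deserializer entry above, here is a minimal, hypothetical Java implementation of Kafka's `Deserializer` interface that decodes a UTF-8 `sensorId,celsius` payload. The `Temperature` type and the text encoding are illustrative assumptions; in practice you would typically use the Avro, Protobuf, or JSON Schema deserializers described elsewhere in this glossary.

```java
import java.nio.charset.StandardCharsets;
import org.apache.kafka.common.serialization.Deserializer;

/** Hypothetical value type: "sensorId,celsius" encoded as UTF-8 text. */
record Temperature(String sensorId, double celsius) {}

public class TemperatureDeserializer implements Deserializer<Temperature> {

    @Override
    public Temperature deserialize(String topic, byte[] data) {
        if (data == null) {
            return null; // null values (for example, tombstones) pass through unchanged
        }
        // Split the UTF-8 payload into the sensor ID and the temperature reading.
        String[] parts = new String(data, StandardCharsets.UTF_8).split(",", 2);
        return new Temperature(parts[0], Double.parseDouble(parts[1]));
    }
}
```

A consumer would reference such a class through its `value.deserializer` configuration property.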
event : An event is a meaningful action or occurrence of something that happened. Events that can be recognized by a program, either human-generated or triggered by software, can be recorded in a log file or other data store. **Related terms**: *event message*, *event record*, *event sink*, *event source*, *event stream*, *event streaming*, *event streaming platform*, *event time* event message : An event message is a record of an event sent to a Kafka topic, represented as a key-value pair. Each event message consists of a key-value pair, a timestamp, the compression type, headers for metadata (optional), and a partition and offset ID (once the message is written). The key is optional and can be used to identify the event. The value is required and contains details about the event that happened. **Related terms**: *event*, *event record*, *event sink*, *event source*, *event stream*, *event streaming*, *event streaming platform*, *event time* event record : An event record is the record of an event stored in a Kafka topic. Event records are organized and durably stored in topics. Examples of events include orders, payments, activities, or measurements. An event typically contains one or more data fields that describe the fact, as well as a timestamp that denotes when the event was created by its event source. The event may also contain various metadata, such as its source of origin (for example, the application or cloud service that created the event) and storage-level information (for example, its position in the event stream). **Related terms**: *event*, *event message*, *event sink*, *event source*, *event stream*, *event streaming*, *event streaming platform*, *event time* event sink : An event sink is a consumer of events, which can include applications, cloud services, databases, IoT sensors, and more. **Related terms**: *event*, *event message, \*event record*, *event source*, *event stream*, *event streaming*, *event streaming platform*, *event time* event source : An event source is a producer of events, which can include cloud services, databases, IoT sensors, mainframes, and more. **Related terms**: *event*, *event message*, *event record*, *event sink*, *event stream*, *event streaming*, *event streaming platform*, *event time* event stream : An event stream is a continuous flow of event messages produced by an event source and consumed by one or more consumers. **Related terms**: *event*, *event message*, *event record*, *event sink*, *event source*, *event streaming*, *event streaming platform*, *event time* event streaming : Event streaming is the practice of capturing event data in real-time from data sources. Event streaming is a form of data streaming that is used to capture, store, process, and react to data in real-time or retrospectively. **Related terms**: *event*, *event message*, *event record*, *event sink*, *event source*, *event streaming platform*, *event time* event streaming platform : An event streaming platform is a platform that events can be written to once, allowing distributed functions within an organization to react in realtime. **Related terms**: *event*, *event message*, *event record*, *event sink*, *event source*, *event streaming*, *event time* event time : Event time is the time when an event occurred on the producing device, as opposed to the time when the event was processed or recorded. Event time is often used in stream processing to determine the order of events and to perform windowing operations. 
**Related terms**: *event*, *event message*, *event record*, *event sink*, *event source*, *event streaming*, *event streaming platform* exactly-once semantics : Exactly-once semantics is a guarantee that a message is delivered exactly once and in the order that it was sent. Even if a producer retries sending a message, or a consumer retries processing a message, the message is delivered exactly once. On the produce side, this guarantee is achieved with idempotent producers: each message carries a producer ID and sequence number, so the broker can discard duplicates caused by retries. On the consume side, transactions allow consumer offsets to be committed atomically with the processing results, so a message that is reprocessed after a failure is not reflected twice in the output. Freight Kafka cluster : A Confluent Cloud cluster type. Freight Kafka clusters are designed for high-throughput, relaxed latency workloads that are less expensive than self-managed open source Kafka. granularity : Granularity is the degree or level of detail to which an entity (a system, service, or resource) is broken down into subcomponents, parts, or elements. Entities that are *fine-grained* have a higher level of detail, while *coarse-grained* entities have a reduced level of detail, often combining finer parts into a larger whole. In the context of access control, granular permissions provide precise control over resource access. They allow administrators to grant specific operations on distinct resources. This ensures users only have permissions tailored to their needs, minimizing unnecessary or potentially risky access. group mapping : Group mapping is a set of rules that map groups in your SSO identity provider to Confluent Cloud RBAC roles. When a user signs in to Confluent Cloud using SSO, Confluent Cloud uses the group mapping to grant access to Confluent Cloud resources. **Related terms**: *identity provider*, *identity pool*, *principal*, *role* **Related content** - [Group Mapping for Confluent Cloud](/cloud/current/access-management/authenticate/sso/group-mapping/overview.html) identity : An identity is a unique identifier that is used to authenticate and authorize users and applications to access resources. Identity is often used in conjunction with access control to determine whether a user or application is allowed to access a resource and perform a specific action or operation on that resource. **Related terms**: *identity provider*, *identity pool*, *principal*, *role* identity pool : An identity pool is a collection of identities that can be used to authenticate and authorize users and applications to access resources. Identity pools are used to manage permissions for users and applications that access resources in Confluent Cloud. They are also used to manage permissions for Confluent Cloud service accounts that are used to access resources in Confluent Cloud. identity provider : An identity provider is a trusted provider that authenticates users and issues security tokens that are used to verify the identity of a user. Identity providers are often used in single sign-on (SSO) scenarios, where a user can log in to multiple applications or services with a single set of credentials. Infinite Storage : Infinite Storage is the Confluent Cloud storage service that enhances the scalability of Confluent Cloud resources by separating storage and processing. Tiered storage within Confluent Cloud moves data between storage layers based on the needs of the workload, retrieves tiered data when requested, and garbage collects data that is past retention or otherwise deleted.
If an application reads historical data, latency is not increased for other applications reading more recent data. Storage resources are decoupled from compute resources, so you pay only for what you produce to Confluent Cloud and for the storage that you use, and CKUs do not have storage limits. **Related content** - [Infinite Storage in Confluent Cloud for Apache Kafka](https://www.confluent.io/blog/infinite-kafka-data-storage-in-confluent-cloud/) ingress : In general networking, ingress refers to traffic that enters a network from an external source. In Confluent Cloud, ingress is a Kafka cluster billing dimension that defines the number of bytes produced to the cluster in one second. Available in the Metrics API as `received_bytes` (convert from bytes to MB). To reduce ingress in Confluent Cloud, compress your messages. For compression, use lz4. Avoid gzip because of high overhead on the cluster. internal topic : An internal topic is a topic, prefixed with double underscores (`__`), that is automatically created by a Kafka component to store metadata about the broker, partition assignment, consumer offsets, and other information. Examples of internal topics: `__cluster_metadata`, `__consumer_offsets`, `__transaction_state`, `__confluent.support.metrics`, and `__confluent.support.metrics-raw`. JSON Schema : JSON Schema is a declarative language used for data serialization and exchange to define data structures, specify formats, and validate JSON documents. It is a way to encode expected data types, properties, and constraints to ensure that all fields are properly described for use with serializers and deserializers. **Related terms**: *data serialization*, *deserializer*, *serializer* **Related content** - [JSON Schema - a declarative language that allows you to annotate and validate JSON documents.](https://json-schema.org/) - Confluent Cloud: [JSON Schema Serializer and Deserializer](/cloud/current/sr/fundamentals/serdes-develop/serdes-avro.html) - Confluent Platform: [JSON Schema Serializer and Deserializer](/platform/current/schema-registry/fundamentals/serdes-develop/serdes-json.html) Kafka bootstrap server : A Kafka bootstrap server is a Kafka broker that a Kafka client initially connects to when establishing a connection to a Kafka cluster; the broker returns metadata, which includes the addresses for all of the brokers in the Kafka cluster. Although only one bootstrap server is required to connect to a Kafka cluster, multiple brokers can be specified in a bootstrap server list to provide high availability and fault tolerance in case a broker is unavailable. In Confluent Cloud, the bootstrap server is the general cluster endpoint. Kafka broker : A Kafka broker is a server in the Kafka storage layer that stores event streams from one or more sources. A Kafka cluster is typically comprised of several brokers. Every broker in a cluster is also a bootstrap server, meaning if you can connect to one broker in a cluster, you can connect to every broker. Kafka client : A Kafka client allows you to write distributed applications and microservices that read, write, and process streams of events in parallel, at scale, and in a fault-tolerant manner, even in the case of network problems or machine failures. The Kafka client library provides functions, classes, and utilities that allow developers to create Kafka producer clients (Producers) and consumer clients (Consumers) using various programming languages.
The primary way to build production-ready Producers and Consumers is by using your preferred programming language and a Kafka client library. **Related content** - [Build Client Applications for Confluent Cloud](/cloud/current/client-apps/overview.html) - [Build Client Applications for Confluent Platform](/platform/current/clients/index.html) - [Getting Started with Apache Kafka and Java (or Python, Go, .Net, and others)](https://developer.confluent.io/get-started/java/) Kafka cluster : A Kafka cluster is a group of interconnected Kafka brokers that manage and distribute real-time data streaming, processing, and storage as if they are a single system. By distributing tasks and services across multiple Kafka brokers, the Kafka cluster improves availability, reliability, and performance. Kafka Connect : Kafka Connect is the component of Kafka that provides data integration between databases, key-value stores, search indexes, file systems, and Kafka brokers. Kafka Connect is an ecosystem of a client application and pluggable connectors. As a client application, Connect is a server process that runs on hardware independent of the Kafka brokers themselves. It is scalable and fault-tolerant, meaning you can run a cluster of Connect workers that share the load of moving data in and out of Kafka from and to external systems. Connect also abstracts the business of writing code away from the user and instead requires only JSON configuration to run. **Related content** - Confluent Cloud: [Kafka Connect](/cloud/current/billing/overview.html#kconnect-long) - Confluent Platform: [Kafka Connect](/platform/current/connect/index.html) Kafka controller : A Kafka controller is the node in a Kafka cluster that is responsible for managing and changing the metadata of the cluster. This node also communicates metadata changes to the rest of the cluster. When Kafka uses ZooKeeper for metadata management, the controller is a broker, and the broker persists the metadata to ZooKeeper for backup and recovery. With KRaft, you dedicate Kafka nodes to operate as controllers and the metadata is stored in Kafka itself and not persisted to ZooKeeper. KRaft enables faster recovery because of this. For more information, see [KRaft overview](/platform/current/kafka-metadata/kraft.html). Kafka listener : A Kafka listener is an endpoint that Kafka brokers bind to and use to communicate with clients. For Kafka clusters, Kafka listeners are configured in the `listeners` property of the `server.properties` file. Advertised listeners are publicly accessible endpoints that are used by clients to connect to the Kafka cluster. **Related content** - [Kafka Listeners – Explained](https://www.confluent.io/blog/kafka-listeners-explained/) Kafka metadata : Kafka metadata is the information about the Kafka cluster and the topics that are stored in it. This information includes details such as the brokers in the cluster, the topics that are available, the partitions for each topic, and the location of the leader for each partition. Kafka metadata is used by clients to discover the available brokers and topics, and to determine which broker is the leader for a particular partition. This information is essential for clients to be able to send and receive messages to and from Kafka. Kafka Streams : Kafka Streams is a stream processing library for building streaming applications and microservices that transform (filter, group, aggregate, join, and more) incoming event streams in real-time to Kafka topics stored in a Kafka cluster.
The Streams API can be used to build applications that process data in real-time, analyze data continuously, and build data pipelines. **Related content** - [Kafka Streams](/platform/current/streams/overview.html) Kafka topic : See *topic*. key encryption key (KEK) : A key encryption key (KEK) is a master key that is used to encrypt and decrypt other keys, specifically the data encryption key (DEK). Only users with access to the KEK can decrypt the DEK and access the sensitive data. **Related terms**: *data encryption key (DEK)*, *envelope encryption*. Kora : Kora is the cloud-native streaming data service based on Kafka technology that powers the Confluent Cloud event streaming platform for building real-time data pipelines and streaming applications. Kora abstracts low-level resources, such as Kafka brokers, and hides operational complexities, such as system upgrades. Kora is built on the following foundations: a tiered storage layer that improves cost and performance, elasticity and consistent performance through incremental load balancing, cost effective multi-tenancy with dynamic quota management and cell-based isolation, continuous monitoring of both system health and data integrity, and clean abstraction with standard Kafka protocols and CKUs to hide underlying resources. **Related terms**: *Apache Kafka*, *Confluent Cloud*, *Confluent Unit for Kafka (CKU)* **Related content** - [Kora: The Cloud Native Engine for Apache Kafka](https://www.confluent.io/blog/cloud-native-data-streaming-kafka-engine/) - [Kora: A Cloud-Native Event Streaming Platform For Kafka](https://www.vldb.org/pvldb/vol16/p3822-povzner.pdf) KRaft : KRaft (or Apache Kafka Raft) is a consensus protocol introduced in Kafka 2.8 to provide metadata management for Kafka with the goal to replace ZooKeeper. KRaft simplifies Kafka because it enables the management of metadata in Kafka itself, rather than splitting it between ZooKeeper and Kafka. As of Confluent Platform 7.5, KRaft is the default method of metadata management in new deployments. For more information, see [KRaft overview](/platform/current/kafka-metadata/kraft.html). ksqlDB : ksqlDB is a streaming SQL database engine purpose-built for creating stream processing applications on top of Kafka. logical Kafka cluster (LKC) : A logical Kafka cluster (LKC) is a subset of a physical Kafka cluster (PKC) that is isolated from other logical clusters within Confluent Cloud. Each logical unit of isolation is considered a tenant and maps to a specific organization. If the mapping is one-to-one, one LKC maps to one PKC (a Dedicated cluster). If the mapping is many-to-one, one LKC maps to one of the multitenant Kafka cluster types (Basic, Standard, Enterprise, and Freight). **Related terms**: *Confluent Cloud*, *Kafka cluster*, *physical Kafka cluster (PKC)* **Related content** - [Kafka Cluster Types in Confluent Cloud](/cloud/current/clusters/cluster-types.html) - [Multi-tenancy and Client Quotas on Confluent Cloud](/cloud/current/clusters/client-quotas.html) multi-region cluster (MRC) : A multi-region cluster (MRC) is a single Kafka cluster that replicates data between datacenters across regional availability zones. multi-tenancy : Multi-tenancy is a software architecture in which a single physical instance is shared among multiple logical instances, or tenants. In Confluent Cloud, each Basic, Standard, Enterprise, and Freight cluster is a logical Kafka cluster (LKC) that shares a physical Kafka cluster (PKC) with other tenants.
Each LKC is isolated from other LKCs and has its own resources, such as memory, compute, and storage. **Related terms**: *Confluent Cloud*, *logical Kafka cluster (LKC)*, *physical Kafka cluster (PKC)* **Related content** * [From On-Prem to Cloud-Native: Multi-Tenancy in Confluent Cloud](https://www.confluent.io/blog/cloud-native-multi-tenant-kafka-with-confluent-cloud/) * [Multi-tenancy and Client Quotas on Confluent Cloud](/cloud/current/clusters/client-quotas.html) offset : An offset is an integer assigned to each message that uniquely represents its position within the data stream, guaranteeing the ordering of records and allowing offset-based connections to replay messages from any point in time. **Related terms**: *consumer offset*, *connector offset*, *offset commit*, *replayability* offset commit : An offset commit is the process of keeping track of the current position of an offset-based connection (primarily Kafka consumers and connectors) within the data stream. The offset commit process is not specific to consumers, producers, or connectors. It is a general mechanism in Kafka to track the position of any application that is reading data. When a consumer commits an offset, the offset identifies the next message the consumer should consume. For example, if a consumer has an offset of 5, it has consumed messages 0 through 4 and will next consume message 5. If the consumer crashes or is shut down, its partitions are reassigned to another consumer, which resumes consuming from the last committed offset of each partition. The committed offset for consumers is stored on a Kafka broker. When a consumer commits an offset, it sends a commit request to the Kafka cluster, specifying the partition and offset it wants to commit for a particular consumer group. The Kafka broker receiving the commit request then stores this offset in the `__consumer_offsets` internal topic. **Related terms**: *consumer offset*, *offset* OpenSSL : OpenSSL is an open-source software library and toolkit that implements the Secure Sockets Layer (SSL) and Transport Layer Security (TLS) protocols. For more information, see [Wikipedia](https://en.wikipedia.org/wiki/OpenSSL). parent cluster : The Kafka cluster that a resource belongs to. **Related terms**: *Kafka cluster* partition : A partition is a unit of data storage that divides a topic into multiple, parallel event streams, each of which is stored on separate Kafka brokers and can be consumed independently. Partitioning is a key concept in Kafka because it allows Kafka to scale horizontally by adding more brokers to the cluster. Partitions are also the unit of parallelism in Kafka. A topic can have one or more partitions, and each partition is an ordered, immutable sequence of event records that is continually appended to a partition log. partitions (pre-replication) : In Confluent Cloud, partitions are a Kafka cluster billing dimension that define the maximum number of partitions that can exist on the cluster at one time, before replication. While you are not charged for partitions on any type of Kafka cluster, the number of partitions you use has an impact on eCKU. To determine eCKU limits for partitions, Confluent Cloud counts only pre-replication (leader) partitions across a cluster. All topics that you create (as well as internal topics that are automatically created by Confluent Platform components such as ksqlDB, Kafka Streams, Connect, and Control Center (Legacy)) count towards the cluster partition limit.
Confluent prefixes topics created automatically with an underscore (_). Topics that are internal to Kafka itself (consumer offsets) are not visible in Cloud Console and do not count against partition limits or toward partition billing. Available in the Metrics API as `partition_count`. In Confluent Cloud, attempts to create additional partitions beyond the cluster limit fail with an error message. To reduce usage on partitions (pre-replication), delete unused topics and create new topics with fewer partitions. Use the Kafka Admin interface to increase the partition count of an existing topic if the initial partition count is too low. physical Kafka cluster (PKC) : A physical Kafka cluster (PKC) is a Kafka cluster comprised of multiple brokers. Each physical Kafka cluster is created on a Kubernetes cluster by the control plane. A PKC is not directly accessible by clients. principal : A principal is an entity that can be authenticated and granted permissions based on roles to access resources and perform operations. An entity can be a user account, service account, group mapping, or identity pool. **Related terms**: *group mapping*, *identity*, *identity pool*, *role*, *service account*, *user account* private internet : A private internet is a closed, restricted computer network typically used by organizations to provide secure environments for managing sensitive data and resources. processing time : Processing time is the time when an event is processed or recorded by a system, as opposed to the time when the event occurred on the producing device. Processing time is often used in stream processing to determine the order of events and to perform windowing operations. producer : A producer is a client application that publishes (writes) data to a topic in a Kafka cluster. Producers write data to a topic and are the only clients that can write data to a topic. Each record written to a topic is appended to the partition of the topic that is selected by the producer. Producer API : The Producer API is the Kafka API that allows you to write data to a topic in a Kafka cluster. The Producer API is used by producer clients to publish data to a topic in a Kafka cluster. Protobuf : Protobuf (or Protocol Buffers) is an open-source data format used to serialize structured data for storage. **Related terms**: *data serialization*, *deserializer*, *serializer* **Related content** - [Protocol Buffers](https://protobuf.dev/) - [Getting Started with Protobuf in Confluent Cloud](https://www.confluent.io/blog/using-protobuf-in-confluent-cloud/) - Confluent Cloud: [Protobuf Serializer and Deserializer](/cloud/current/sr/fundamentals/serdes-develop/serdes-protobuf.html) - Confluent Platform: [Protobuf Serializer and Deserializer](/platform/current/schema-registry/fundamentals/serdes-develop/serdes-protobuf.html) public internet : The public internet is the global system of interconnected computers and networks that use TCP/IP to communicate with each other. rebalancing : Rebalancing is the process of redistributing the partitions of a topic among the consumers of a consumer group for improved performance and scalability. A rebalance can occur if a consumer has failed the heartbeat and has been excluded from the group, it voluntarily left the group, metadata has been updated for a consumer, or a consumer has joined the group. replayability : Replayability is the ability to replay messages from any point in time.
**Related terms**: *consumer offset*, *offset*, *offset commit* replication : Replication is the process of creating and maintaining multiple copies (or *replicas*) of data across different nodes in a distributed system to increase availability, reliability, redundancy, and accessibility. replication factor : A replication factor is the number of copies of a partition that are distributed across the brokers in a cluster. requests : In Confluent Cloud, requests are a Kafka cluster billing dimension that defines the number of client requests to the cluster in one second. Available in the Metrics API as `request_count`. To reduce usage on requests, you can adjust producer batching configurations, consumer client batching configurations, and shut down otherwise inactive clients. For Dedicated clusters, a high number of requests per second results in increased load on the cluster. role : A role is a Confluent-defined job function that is assigned a set of permissions required to perform specific actions or operations on Confluent resources, and is bound to a principal and Confluent resources through a role binding. A role can be assigned to a user account, group mapping, service account, or identity pool. **Related terms**: *group mapping*, *identity*, *identity pool*, *principal*, *service account* **Related content** - [Predefined RBAC Roles in Confluent Cloud](/cloud/current/access-management/access-control/rbac/predefined-rbac-roles.html) - [Role-Based Access Control Predefined Roles in Confluent Platform](/platform/current/security/rbac/rbac-predefined-roles.html) rolling restart : A rolling restart restarts the brokers in a Kafka cluster with zero downtime by incrementally restarting a Kafka broker after verifying that there are no under-replicated partitions on the broker before proceeding to the next broker. Restarting the brokers one at a time allows for software upgrades, broker configuration updates, or cluster maintenance while maintaining high availability by avoiding downtime. **Related content** - [Rolling restart](/platform/current/kafka/post-deployment.html#rolling-restart) schema : A schema is the structured definition or blueprint used to describe the format and structure of event messages sent through the Kafka event streaming platform. Schemas are used to validate the structure of data in event messages and ensure that producers and consumers are sending and receiving data in the same format. Schemas are defined in the Schema Registry. Schema Registry : Schema Registry is a centralized repository for managing and validating schemas for topic message data. Schema Registry is built into Confluent Cloud as a managed service, available with the Advanced Stream Governance package, and offered as part of Confluent Enterprise for self-managed deployments. Schema Registry is a RESTful service that stores and manages schemas for Kafka topics and is integrated with Kafka and Connect to provide a central location for managing schemas and validating data. Producers and consumers to Kafka topics use schemas to ensure data consistency and compatibility as schemas evolve. Schema Registry is a key component of Stream Governance. **Related content** - Confluent Cloud: [Manage Schemas in Confluent Cloud](/cloud/current/sr/schemas-manage.html) - Confluent Platform: [Schema Registry Overview](/platform/current/schema-registry/index.html) schema subject : A schema subject is the namespace for a schema in Schema Registry.
This unique identifier defines a logical grouping of related schemas. Kafka topics contain event messages serialized and deserialized using the structure and rules defined in a schema subject. This ensures compatibility and supports schema evolution. **Related content** - Confluent Cloud: [Manage Schemas in Confluent Cloud](/cloud/current/sr/schemas-manage.html) - Confluent Platform: [Schema Registry Concepts](/platform/current/schema-registry/index.html) - [Understanding Schema Subjects](https://developer.confluent.io/courses/schema-registry/schema-subjects/) Serdes : Serdes are serializers and deserializers that convert objects and parallel data into a serial byte stream for efficient storage and high-speed data transmission over the wire. Confluent provides Serdes for schemas in Avro, Protobuf, and JSON Schema formats. **Related content** - Confluent Cloud: [Formats, Serializers, and Deserializers](/cloud/current/sr/fundamentals/serdes-develop/index.html) - Confluent Platform: [Formats, Serializers, and Deserializers](/platform/current/schema-registry/fundamentals/serdes-develop/index.html) - [Serde](https://serde.rs/) serializer : A serializer is a tool that converts objects and parallel data into a serial byte stream. Serializers work with deserializers (known together as Serdes) to support efficient storage and high-speed data transmission over the wire. Confluent provides serializers for schemas in Avro, Protobuf, and JSON Schema formats. **Related content** - Confluent Cloud: [Formats, Serializers, and Deserializers](/cloud/current/sr/fundamentals/serdes-develop/index.html) - Confluent Platform: [Formats, Serializers, and Deserializers](/platform/current/schema-registry/fundamentals/serdes-develop/index.html) service account : A service account is a non-person entity used by an application or service to access resources and perform operations. Because a service account is an identity independent of the user who created it, it can be used programmatically to authenticate to resources and perform operations without the need for a user to be signed in. **Related content** - [Service Accounts for Confluent Cloud](/cloud/current/access-management/identity/service-accounts.html) service quota : A service quota is the limit, or maximum value, for a specific Confluent Cloud resource or operation that might vary by the resource scope it applies to. **Related content** - [Service Quotas for Confluent Cloud](/cloud/current/quotas/index.html) single message transform (SMT) : A single message transform (SMT) is a transformation or operation applied in real time to an individual message that changes the values, keys, or headers of the message before it is sent to a sink connector or after it is read from a source connector. SMTs are convenient for inserting fields, masking information, event routing, and other minor data adjustments. single sign-on (SSO) : Single sign-on (SSO) is a centralized authentication service that allows users to use a single set of credentials to log in to multiple applications or services. **Related terms**: *authentication*, *group mapping*, *identity provider* **Related content** - [Single Sign-On for Confluent Cloud](/cloud/current/access-management/authenticate/sso/index.html) sink connector : A sink connector is a Kafka Connect connector that publishes (writes) data from a Kafka topic to an external system.
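The Serdes, serializer, and deserializer entries above describe converting objects to and from byte streams using schemas. The following is a minimal sketch, not taken from the glossary, of Avro serialization backed by Schema Registry using the confluent-kafka Python package (with the Avro extras installed); the Schema Registry URL, topic name, and record schema are illustrative assumptions.

```python
# Minimal sketch (not from the glossary) of Avro serialization with Schema Registry,
# using the confluent-kafka Python package (install confluent-kafka[avro]).
# The Schema Registry URL, topic name, and record schema are placeholders.
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer, AvroDeserializer
from confluent_kafka.serialization import SerializationContext, MessageField

schema_str = """
{
  "type": "record",
  "name": "Order",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "amount", "type": "double"}
  ]
}
"""

sr_client = SchemaRegistryClient({"url": "http://localhost:8081"})  # placeholder URL
serializer = AvroSerializer(sr_client, schema_str)        # dict -> Avro bytes (registers/looks up the schema)
deserializer = AvroDeserializer(sr_client, schema_str)    # Avro bytes -> dict

ctx = SerializationContext("orders", MessageField.VALUE)   # topic and field determine the subject name
payload = serializer({"id": "o-1001", "amount": 9.99}, ctx)
print(deserializer(payload, ctx))                          # {'id': 'o-1001', 'amount': 9.99}
```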
source connector : A source connector is a Kafka Connect connector that subscribes (reads) data from a source (external system), extracts the payload and schema of the data, and publishes (writes) the data to Kafka topics. standalone : Standalone refers to a configuration in which a software application, system, or service operates independently on a single instance or device. This mode is commonly used for development, testing, and debugging purposes. For Kafka Connect, a standalone worker is a single process responsible for running all connectors and tasks on a single instance. Standard Kafka cluster : A Confluent Cloud cluster type. Standard Kafka clusters are designed for production-ready features and functionality. static egress IP address : A static egress IP address is an IP address used by a Confluent Cloud managed connector to establish outbound connections to endpoints of external data sources and sinks over the public internet. **Related content** - [Use Static IP Addresses on Confluent Cloud for Connectors and Cluster Linking](/cloud/current/networking/static-egress-ip-addresses.html) - [Static Egress IP Addresses for Confluent Cloud Connectors](/cloud/current/connectors/static-egress-ip.html) storage (pre-replication) : In Confluent Cloud, storage is a Kafka cluster billing dimension that defines the number of bytes retained on the cluster, pre-replication. Available in the Metrics API as `retained_bytes` (convert from bytes to TB). The returned value is pre-replication. Standard, Enterprise, Dedicated, and Freight clusters support Infinite Storage. This means there is no maximum size limit for the amount of data that can be stored on the cluster. You can configure policy settings `retention.bytes` and `retention.ms` at the topic level to control exactly how much and how long to retain data in a way that makes sense for your applications and helps control your costs. To reduce storage in Confluent Cloud, compress your messages and reduce retention settings. For compression, use lz4. Avoid gzip because of high overhead on the cluster. Stream Catalog : Stream Catalog is a pillar of Confluent Cloud Stream Governance that provides a centralized inventory of your organization’s data assets that supports data governance and data discovery. With Data Portal in Confluent Cloud Console, users can find event streams across systems, search topics by name or tags, and enrich event data to increase value and usefulness. REST and GraphQL APIs can be used to search schemas, apply tags to records or fields, manage business metadata, and discover relationships across data assets. **Related content** - [Stream Catalog on Confluent Cloud: User Guide to Manage Tags and Metadata](/cloud/current/stream-governance/stream-catalog.html) - [Stream Catalog in Streaming Data Governance (Confluent Developer course)](https://developer.confluent.io/courses/governing-data-streams/stream-catalog/) Stream Governance : Stream Governance is a collection of tools and features that provide data governance for data in motion. These include data quality tools such as Schema Registry, schema ID validation, and schema linking; built-in data catalog capabilities to classify, organize, and find event streams across systems; and stream lineage to visualize complex data relationships and uncover insights with interactive, end-to-end maps of event streams. 
Taken together, these and other governance tools enable teams to manage the availability, integrity, and security of data used across organizations, and help with standardization, monitoring, collaboration, reporting, and more. **Related terms**: *Stream Catalog*, *Stream Lineage* **Related content** - [Stream Governance on Confluent Cloud](/cloud/current/stream-governance/index.html) stream lineage : Stream lineage is the life cycle, or history, of data, including its origins, transformations, and consumption, as it moves through various stages in data pipelines, applications, and systems. Stream lineage provides a record of data’s journey from its source to its destination, and is used to track data quality, data governance, and data security. **Related terms**: *Data Portal*, *Stream Governance* **Related content** - [Stream Lineage on Confluent Cloud](/cloud/current/stream-governance/stream-lineage.html) stream processing : Stream processing is the method of collecting event stream data in real-time as it arrives, transforming the data in real-time using operations (such as filters, joins, and aggregations), and publishing the results to one or more target systems. Stream processing can be used to analyze data continuously, build data pipelines, and process time-sensitive data in real-time. Using the Confluent event streaming platform, event streams can be processed in real-time using Kafka Streams, Kafka Connect, or ksqlDB. Streams API : The Streams API is the Kafka API that allows you to build streaming applications and microservices that transform (for example, filter, group, aggregate, join) incoming event streams in real-time to Kafka topics stored in a Kafka cluster. The Streams API is used by stream processing clients to process data in real-time, analyze data continuously, and build data pipelines. **Related content** - [Introduction to the Kafka Streams API](/platform/current/streams/introduction.html) throttling : Throttling is the process Kafka clusters in Confluent Cloud use to protect themselves from becoming over-utilized. Also known as backpressure, throttling in Confluent Cloud occurs when cluster load reaches 80%. At this point, applications may start seeing higher latencies or timeouts as the cluster must begin throttling requests or connection attempts. topic : A topic is a user-defined category or feed name where event messages are stored and published by producers and subscribed to by consumers. Each topic is a log of event messages. Topics are stored in one or more partitions, which distribute topic records across brokers in a Kafka cluster. Each partition is an ordered, immutable sequence of records that are continually appended to a topic. **Related content** - [Manage Topics in Confluent Cloud](/cloud/current/client-apps/topics/index.html) total client connections : In Confluent Cloud, total client connections are a Kafka cluster billing dimension that defines the number of TCP connections to the cluster you can open at one time. Available in the Metrics API as `active_connection_count`. Filter by principal to understand how many connections each application is creating. How many connections a cluster supports can vary widely based on several factors, including number of producer clients, number of consumer clients, partition keying strategy, produce patterns per client, and consume patterns per client.
For Dedicated clusters, Confluent derives a guideline for total client connections from benchmarking that indicates exceeding this number of connections increases produce latency for test clients. However, this does not apply to all workloads. That is why total client connections are a guideline, not a hard limit for Dedicated Kafka clusters. Monitor the impact on cluster load as connection count increases, as this is the final representation of the impact of a given workload or CKU dimension on the underlying resources of the cluster. Consider the Confluent guideline a per-CKU guideline. The number of connections tends to increase when you add brokers. In other words, if you significantly exceed the per-CKU guideline, cluster expansion doesn’t always give your cluster more connection count headroom. Transport Layer Security (TLS) : Transport Layer Security (TLS) is a cryptographic protocol that provides secure communication over a network. For more information, see [Wikipedia](https://en.wikipedia.org/wiki/Transport_Layer_Security). unbounded stream : An unbounded stream is a stream of data that is continuously generated in real-time and has no defined end. Examples of unbounded streams include stock prices, sensor data, and social media feeds. Processing unbounded streams requires a different approach than processing bounded streams. Unbounded streams are processed incrementally as data arrives, while bounded streams are processed as a batch after all data has arrived. Kafka Streams and Flink can be used to process unbounded streams. **Related terms**: *bounded stream*, *stream processing*, *unbounded stream* under replication : Under replication is a situation in which the number of in-sync replicas is below the number of all replicas. Under-replicated partitions can occur when a broker is down or cannot replicate fast enough from the leader (replica fetcher lag). user account : A user account is an account representing the identity of a person who can be authenticated and granted access to Confluent Cloud resources. **Related content** - [User Accounts for Confluent Cloud](/cloud/current/access-management/identity/user-accounts/overview.html) watermark : A watermark in Flink is a marker that keeps track of time as data is processed. A watermark means that all records until the current moment in time have been “seen”. This way, Flink can correctly perform tasks that depend on when things happened, like calculating aggregations over time windows. **Related content** - [Time and Watermarks](/cloud/current/flink/concepts/timely-stream-processing.html)

### S3 Object Names

The S3 data model is a flat structure: each bucket stores objects, and the name of each S3 object serves as the unique key. However, a logical hierarchy can be inferred when the S3 object names use directory delimiters, such as `/`. The S3 connector allows you to customize the names of the S3 objects it uploads to the S3 bucket. In general, the names of the S3 objects uploaded by the S3 connector follow this format:

```bash
<prefix>/<topic>/<encodedPartition>/<topic>+<kafkaPartition>+<startOffset>.<format>
```

Admin API : The Admin API is the Kafka REST API that enables administrators to manage and monitor Kafka clusters, topics, brokers, and other Kafka components. Ansible Playbooks for Confluent Platform : Ansible Playbooks for Confluent Platform is a set of Ansible playbooks and roles that are designed to automate the deployment and management of Confluent Platform.
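The Admin API entry above covers administrative operations such as creating and inspecting topics. As a rough illustration only, the sketch below performs the same kind of operation with the Kafka AdminClient from the confluent-kafka Python package rather than the REST API; the broker address and topic settings are placeholders.

```python
# Minimal sketch (not from the glossary): creating a topic with the Kafka AdminClient
# from the confluent-kafka Python package. Broker address and topic settings are placeholders.
from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({"bootstrap.servers": "localhost:9092"})  # placeholder address

# Request creation of a topic with 6 partitions and replication factor 3.
futures = admin.create_topics([NewTopic("orders", num_partitions=6, replication_factor=3)])

for topic, future in futures.items():
    try:
        future.result()  # block until the controller confirms; raises on failure
        print(f"Created topic {topic}")
    except Exception as err:
        print(f"Failed to create topic {topic}: {err}")
```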
Apache Flink : Apache Flink is an open source stream processing framework for stateful computations over unbounded and bounded data streams. Flink provides a unified API for batch and stream processing that supports event-time and out-of-order processing, and supports exactly-once semantics. Flink applications include real-time analytics, data pipelines, and event-driven applications. **Related terms**: *bounded stream*, *data stream*, *stream processing*, *unbounded stream* **Related content** - [Apache Flink: Stream Processing and SQL on Confluent Cloud](/cloud/current/flink/index.html#) - [What is Apache Flink?](https://www.confluent.io/learn/apache-flink/) - [Apache Flink 101 (Confluent Developer course)](https://developer.confluent.io/courses/apache-flink/intro/) Apache Kafka : Apache Kafka is an open source event streaming platform that provides a unified, high-throughput, low-latency, fault-tolerant, scalable, distributed, and secure data streaming platform. Kafka is a publish-and-subscribe messaging system that enables distributed applications to ingest, process, and share data in real-time. **Related content** - [Introduction to Kafka](/kafka/introduction.html) audit log : An audit log is a historical record of actions and operations that are triggered when auditable events occur. Audit log records can be used to troubleshoot system issues, manage security, and monitor compliance, by tracking administrative activity, data access and modification, monitoring sign-in attempts, and reconstructing security breaches and fraudulent activity. **Related terms**: *auditable event* **Related content** - [Audit Log Concepts for Confluent Cloud](/cloud/current/monitoring/audit-logging/cloud-audit-log-concepts.html) - [Audit Log Concepts for Confluent Platform](/platform/current/security/audit-logs/audit-logs-concepts.html) auditable event : An auditable event is an event that represents an action or operation that can be tracked and monitored for security purposes and compliance. When an auditable event occurs, an auditable event method is triggered and an event message is sent to the audit log cluster and stored as an audit log record. **Related terms**: *audit log*, *event message* **Related content** - [Auditable Events in Confluent Cloud](/cloud/current/monitoring/audit-logging/event-methods/index.html) - [Auditable Events in Confluent Platform](/platform/current/security/audit-logs/auditable-events.html) authentication : Authentication is the process of verifying the identity of a principal that interacts with a system or application. Authentication is often used in conjunction with authorization to determine whether a principal is allowed to access a resource and perform a specific action or operation on that resource. Digital authentication requires one or more of the following: something a principal knows (a password or security question), something a principal has (a security token or key), or something a principal is (a biometric characteristic, such as a fingerprint or voiceprint). Multi-factor authentication (MFA) requires two or more forms of authentication. **Related terms**: *authorization*, *identity*, *identity provider*, *identity pool*, *principal*, *role* authorization : Authorization is the process of evaluating and then granting or denying a principal a set of permissions required to access and perform operations on resources.
**Related terms**: *authentication*, *group mapping*, *identity*, *identity provider*, *identity pool*, *principal*, *role* Avro : Avro is a data serialization and exchange framework that provides data structures, remote procedure call (RPC), compact binary data format, a container file, and uses JSON to represent schemas. Avro schemas ensure that every field is properly described and documented for use with serializers and deserializers. You can either send a schema with every message or use Schema Registry to store and receive schemas for use by consumers and producers to save bandwidth and storage space. **Related terms**: *data serialization*, *deserializer*, *serializer* **Related content** - [Apache Avro - a data serialization system](https://avro.apache.org/) - Confluent Cloud: [Avro Schema Serializer and Deserializer](/cloud/current/sr/fundamentals/serdes-develop/serdes-avro.html) - Confluent Platform: [Avro Schema Serializer and Deserializer](/platform/current/schema-registry/fundamentals/serdes-develop/serdes-avro.html) Basic Kafka cluster : A Confluent Cloud cluster type. Basic Kafka clusters are designed for experimentation, early development, and basic use cases. batch processing : Batch processing is the method of collecting a large volume of data over a specific time interval, after which the data is processed all at once and loaded into a destination system. Batch processing is often used when processing data can occur independently of the source and timing of the data. It is efficient for non-real-time data processing, such as data warehousing, reporting, and analytics. **Related terms**: *bounded stream*, *stream processing*, *unbounded stream* CIDR block : A CIDR block is a group of IP addresses that are contiguous and can be represented as a single block. CIDR blocks are expressed using Classless Inter-domain Routing (CIDR) notation that includes an IP address and a number of bits in the network mask. **Related content** - [CIDR notation](https://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing#CIDR_notation) - [Classless Inter-domain Routing (CIDR): The Internet Address Assignment and Aggregation Plan [RFC 4632]](https://www.rfc-editor.org/rfc/rfc4632.html) Cluster Linking : Cluster Linking is a highly performant data replication feature that enables links between Kafka clusters to mirror data from one cluster to another. Cluster Linking creates perfect copies of Kafka topics, which keep data in sync across clusters. Use cases include geo-replication of data, data sharing, migration, disaster recovery, and tiered separation of critical applications. **Related content** - [Geo-replication with Cluster Linking on Confluent Cloud](/cloud/current/multi-cloud/cluster-linking/index.html) - [Cluster Linking for Confluent Platform](/platform/current/multi-dc-deployments/cluster-linking/index.html) commit log : A commit log is a log of all event messages about commits (changes or operations made) sent to a Kafka topic. A commit log ensures that all event messages are processed at least once and provides a mechanism for recovery in the event of a failure. The commit log is also referred to as a write-ahead log (WAL) or a transaction log. **Related terms**: *event message* Confluent Cloud : Confluent Cloud is the fully managed, cloud-native event streaming service powered by Kora, the event streaming platform based on Kafka and extended by Confluent to provide high availability, scalability, elasticity, security, and global interconnectivity. 
Confluent Cloud offers cost-effective multi-tenant configurations as well as dedicated solutions, if stronger isolation is required. **Related terms**: *Apache Kafka*, *Kora* **Related content** - [Confluent Cloud Overview](/cloud/current/index.html) - [Confluent Cloud](https://www.confluent.io/confluent-cloud/) Confluent Cloud network : A Confluent Cloud network is an abstraction for a single tenant network environment that hosts Dedicated Kafka clusters in Confluent Cloud along with their single tenant services, like ksqlDB clusters and managed connectors. **Related content** - [Confluent Cloud Network Overview](/cloud/current/networking/overview.html#ccloud-networks) Confluent for Kubernetes (CFK) : *Confluent for Kubernetes (CFK)* is a cloud-native control plane for deploying and managing Confluent in private cloud environments through a declarative API. Confluent Platform : Confluent Platform is a specialized distribution of Kafka at its core, with additional components for data integration, streaming data pipelines, and stream processing. Confluent REST Proxy : Confluent REST Proxy provides a RESTful interface to a Kafka cluster, making it easy to produce and consume messages, view the state of the cluster, and perform administrative actions without using the native Kafka protocol or clients. **Related content** - Confluent Platform: [REST Proxy](/platform/current/kafka-rest/index.html) Confluent Server : Confluent Server is the default Kafka broker component of Confluent Platform that builds on the foundation of Apache Kafka® and provides enhanced proprietary features designed for enterprise use. Confluent Server is fully compatible with Kafka, and adds Kafka cluster support for Role-Based Access Control, Audit Logs, Schema Validation, Self Balancing Clusters, Tiered Storage, Multi-Region Clusters, and Cluster Linking. **Related terms**: *Confluent Platform*, *Apache Kafka*, *Kafka broker*, *Cluster Linking*, *multi-region cluster (MRC)* Confluent Unit for Kafka (CKU) : Confluent Unit for Kafka (CKU) is a unit of horizontal scaling for Dedicated Kafka clusters in Confluent Cloud that provides preallocated resources. CKUs determine the capacity of a Dedicated Kafka cluster in Confluent Cloud. **Related content** - [CKU limits per cluster](/cloud/current/clusters/cluster-types.html#cku-limits-per-cluster) Connect API : The Connect API is the Kafka API that enables a connector to read event streams from a source system and write to a target system. Connect worker : A Connect worker is a server process that runs a connector and performs the actual work of moving data in and out of Kafka topics. A worker runs on hardware independent of the Kafka brokers themselves. It is scalable and fault-tolerant, meaning you can run a cluster of workers that share the load of moving data in and out of Kafka from and to external systems. **Related terms**: *connector*, *Kafka Connect* connection attempts : In Confluent Cloud, connection attempts are a Kafka cluster billing dimension that defines the maximum number of new TCP connections to the cluster you can create in one second. This includes successful and unsuccessful authentication attempts. Available in the Metrics API as `successful_authentication_count` (only includes successful authentications, not unsuccessful authentication attempts). To reduce usage on connection attempts, use longer-lived connections to the cluster. If you exceed the maximum, connection attempts may be refused.
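The connection attempts entry above recommends longer-lived connections. The sketch below, which is not from the glossary, contrasts creating a short-lived producer per message with reusing a single long-lived producer in the confluent-kafka Python client; the broker address and topic name are placeholders.

```python
# Minimal sketch (not from the glossary): reuse one long-lived producer instead of
# creating a new client per message, which drives up connection attempts.
from confluent_kafka import Producer

conf = {"bootstrap.servers": "localhost:9092"}  # placeholder address

# Anti-pattern: a new Producer per message opens fresh TCP connections every time
# (and the buffered message may be dropped because the short-lived client is never flushed).
def send_once_bad(topic, value):
    Producer(conf).produce(topic, value=value)

# Preferred: create the client once and reuse it for the life of the application.
producer = Producer(conf)

def send(topic, value):
    producer.produce(topic, value=value)
    producer.poll(0)  # serve delivery callbacks without blocking

for i in range(100):
    send("metrics", f"sample-{i}")
producer.flush()  # block until all buffered messages are delivered
```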
connector : A connector is an abstract mechanism that enables communication, coordination, or cooperation among components by transferring data elements from one interface to another without changing the data. connector offset : Connector offset uniquely identifies the position of a connector as it processes data. Connectors use a variety of strategies to implement the connector offset, including everything from monotonically increasing integers to replay ids, lists of files, timestamps and even checkpoint information. Connector offsets keep track of already-processed data in the event of a connector restart or recovery. While sink connectors use a pattern for connector offsets similar to the offset mechanism used throughout Kafka, the implementation details for source connectors are often much different. This is because source connectors track the progress of a source system as it processes data. consumer : A consumer is a Kafka client application that subscribes to (reads and processes) event messages from a Kafka topic. The Streams API and the Consumer API are the two APIs that enable consumers to read event streams from Kafka topics. **Related terms**: *Consumer API*, *consumer group*, *producer*, *Streams API* Consumer API : The Consumer API is the Kafka API used for consuming (reading) event messages or records from Kafka topics and enables a Kafka consumer to subscribe to a topic and read event messages as they arrive. Batch processing is a common use case for the Consumer API. consumer group : A consumer group is a single logical consumer implemented with multiple physical consumers for reasons of throughput and resilience. By dividing a topic’s partitions among the consumers in the group, consumers in the group can process messages in parallel, increasing message throughput and enabling load balancing. **Related terms**: *consumer*, *partition*, *producer*, *topic* consumer lag : Consumer lag is the number of consumer offsets between the latest message produced in a partition and the last message consumed by a consumer, that is, the number of messages pending to be consumed from a particular partition. A large consumer lag, or a quickly growing lag, indicates that the consumer is unable to read from a partition as fast as the messages are available. This can be caused by a slow consumer, slow network, or slow broker. consumer offset : Consumer offset is the monotonically increasing integer value that uniquely identifies the position of an event record in a partition. Consumers use offsets to track their current position in the Kafka topic, allowing consumers to resume processing from where they left off. Offsets are stored on the Kafka broker, which does not track which records have been read and which have not. It is up to the consumer connection to track this information. When a consumer acknowledges receiving and processing a message, it commits an offset value that is stored in the special internal topic `__consumer_offsets`. cross-resource RBAC role binding : A cross-resource RBAC role binding is a role binding in Confluent Cloud that is applied at the Organization or Environment scope and grants access to multiple resources. For example, assigning a principal the NetworkAdmin role at the Organization scope lets them administer all networks across all Environments in their Organization.
**Related terms**: *identity pool*, *principal*, *role*, *role binding* CRUD : CRUD is an acronym for the four basic operations that can be performed on data: Create, Read, Update, and Delete. custom connector : A custom connector is a connector created using Connect plugins uploaded to Confluent Cloud by users. This includes connector plugins that are built from scratch, modified open-source connector plugins, or third-party connector plugins. data at rest : Data at rest is data that is physically stored on non-volatile media (such as hard drives, solid-state drives, or other storage devices) and is not actively being transmitted or processed by a system. data contract : A data contract is a formal agreement between an upstream component and a downstream component on the structure and semantics of data that is in motion. A schema is a key element of a data contract. The schema, metadata, rules, policies, and evolution plan form the data contract. You can associate data contracts (schemas and more) with [topics](#term-Kafka-topic). **Related content** - Confluent Platform: [Data Contracts for Schema Registry on Confluent Platform](/platform/current/schema-registry/fundamentals/data-contracts.html) - Confluent Cloud: [Data Contracts for Schema Registry on Confluent Cloud](/cloud/current/sr/fundamentals/data-contracts.html) - Cloud Console: [Manage Schemas in Confluent Cloud](/cloud/current//sr/schemas-manage.html) data encryption key (DEK) : A data encryption key (DEK) is a symmetric key that is used to encrypt and decrypt data. The DEK is used in client-side field level encryption (CSFLE) to encrypt sensitive data. The DEK is itself encrypted using a key encryption key (KEK) that is only accessible to authorized users. The encrypted DEK and encrypted data are stored together. Only users with access to the KEK can decrypt the DEK and access the sensitive data. **Related terms**: *envelope encryption*, *key encryption key (KEK)* data in motion : Data in motion is data that is actively being transferred between source and destination, typically systems, devices, or networks. Data in motion is also referred to as data in transit or data in flight. data in use : Data in use is data that is actively being processed or manipulated in memory (RAM, CPU caches, or CPU registers). data ingestion : Data ingestion is the process of collecting, importing, and integrating data from various sources into a system for further processing, analysis, or storage. data mapping : Data mapping is the process of defining relationships or associations between source data elements and target data elements. Data mapping is an important process in data integration, data migration, and data transformation, ensuring that data is accurately and consistently represented when it is moved or combined. data pipeline : A data pipeline is a series of processes and systems that enable the flow of data from sources to destinations, automating the movement and transformation of data for various purposes, such as analytics, reporting, or machine learning. A data pipeline is typically composed of a source system, a data ingestion tool, a data transformation tool, and a target system. A data pipeline covers the following stages: data extraction, data transformation, data loading, and data validation.
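The consumer, consumer group, and consumer offset entries above describe how committed offsets track a consumer's position. The following minimal sketch, not taken from the glossary, shows a consumer that commits offsets only after processing each message, using the confluent-kafka Python client; the broker address, group ID, and topic are placeholders.

```python
# Minimal sketch (not from the glossary): a consumer that commits offsets only after
# a message has been processed. Broker address, group ID, and topic are placeholders.
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",  # placeholder address
    "group.id": "orders-processor",         # consumers sharing this ID form one consumer group
    "auto.offset.reset": "earliest",        # where to start when no committed offset exists
    "enable.auto.commit": False,            # commit explicitly after processing instead
})
consumer.subscribe(["orders"])

try:
    while True:
        msg = consumer.poll(1.0)            # wait up to 1 second for a record
        if msg is None:
            continue
        if msg.error():
            print(f"Consumer error: {msg.error()}")
            continue
        print(f"partition={msg.partition()} offset={msg.offset()} value={msg.value()}")
        # Committing msg records offset + 1, the next message this group should consume.
        consumer.commit(message=msg, asynchronous=False)
except KeyboardInterrupt:
    pass
finally:
    consumer.close()
```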
Data Portal : Data Portal is a Confluent Cloud application that uses Stream Catalog and Stream Lineage to provide self-service access throughout Confluent Cloud Console for data practitioners to search and discover existing topics using tags and business metadata, request access to topics and data, and access data in topics to build streaming applications and data pipelines. Data Portal provides a data-centric view of Confluent optimized for self-service access to data, where users can search, discover, and understand available data, request access to data, and use data. **Related terms**: *Stream Catalog*, *Stream Lineage* **Related content** - [Data Portal on Confluent Cloud](/cloud/current/stream-governance/data-portal.html) data serialization : Data serialization is the process of converting data structures or objects into a format that can be stored or transmitted, and reconstructed later in the same or another computer environment. Data serialization is a common technique for implementing data persistence, interprocess communication, and object communication. Confluent Schema Registry (in Confluent Platform) and Confluent Cloud Schema Registry support data serialization using serializers and deserializers for the following formats: Avro, JSON Schema, and Protobuf. **Related content** - Confluent Cloud: [Formats, Serializers, and Deserializers](/cloud/current/sr/fundamentals/serdes-develop/index.html) - Confluent Platform: [Formats, Serializers, and Deserializers](/platform/current/schema-registry/fundamentals/serdes-develop/index.html) data steward : A data steward is a person with data-related responsibilities, such as data governance, data quality, and data security. data stream : A data stream is a continuous flow of data records that are produced and consumed by applications.
Elastic Confluent Unit for Kafka (eCKU) : Elastic Confluent Unit for Kafka (eCKU) is used to express capacity for Basic, Standard, Enterprise, and Freight Kafka clusters. These clusters automatically scale up to a fixed ceiling. There is no need to resize these types of clusters. When you need more capacity, your cluster expands up to the fixed ceiling. If you’re not using capacity above the minimum, you’re not paying for it. ELT : ELT is an acronym for Extract-Load-Transform, where data is extracted from a source system and loaded into a target system before processing or transformation. Compared to ETL, ELT is a more flexible approach to data ingestion because the data is loaded into the target system before transformation. Enterprise Kafka cluster : A Confluent Cloud cluster type. Enterprise Kafka clusters are designed for production-ready functionality that requires private endpoint networking capabilities. envelope encryption : Envelope encryption is a cryptographic technique that uses two keys to encrypt data. The symmetric data encryption key (DEK) is used to encrypt sensitive data. The separate asymmetric key encryption key (KEK) is the master key used to encrypt the DEK. The DEK and encrypted data are stored together. Only users with access to the KEK can decrypt the DEK and access the sensitive data. In Confluent Cloud, envelope encryption is used to enable client-side field level encryption (CSFLE). CSFLE encrypts sensitive data in a message before it is sent to Confluent Cloud and allows for temporary decryption of sensitive data when required to perform operations on the data. **Related terms**: *data encryption key (DEK)*, *key encryption key (KEK)* ETL : ETL is an acronym for Extract-Transform-Load, where data is extracted from a source system, transformed into a target format, and loaded into a target system. Compared to ELT, ETL is a more rigid approach to data ingestion because the data is transformed before loading into the target system. event : An event is a meaningful action or occurrence of something that happened. Events that can be recognized by a program, either human-generated or triggered by software, can be recorded in a log file or other data store. **Related terms**: *event message*, *event record*, *event sink*, *event source*, *event stream*, *event streaming*, *event streaming platform*, *event time* event message : An event message is a record of an event sent to a Kafka topic, represented as a key-value pair. Each event message consists of a key-value pair, a timestamp, the compression type, headers for metadata (optional), and a partition and offset ID (once the message is written). The key is optional and can be used to identify the event. The value is required and contains details about the event that happened. **Related terms**: *event*, *event record*, *event sink*, *event source*, *event stream*, *event streaming*, *event streaming platform*, *event time* event record : An event record is the record of an event stored in a Kafka topic. Event records are organized and durably stored in topics. Examples of events include orders, payments, activities, or measurements. An event typically contains one or more data fields that describe the fact, as well as a timestamp that denotes when the event was created by its event source. The event may also contain various metadata, such as its source of origin (for example, the application or cloud service that created the event) and storage-level information (for example, its position in the event stream).
**Related terms**: *event*, *event message*, *event sink*, *event source*, *event stream*, *event streaming*, *event streaming platform*, *event time* event sink : An event sink is a consumer of events, which can include applications, cloud services, databases, IoT sensors, and more. **Related terms**: *event*, *event message*, *event record*, *event source*, *event stream*, *event streaming*, *event streaming platform*, *event time* event source : An event source is a producer of events, which can include cloud services, databases, IoT sensors, mainframes, and more. **Related terms**: *event*, *event message*, *event record*, *event sink*, *event stream*, *event streaming*, *event streaming platform*, *event time* event stream : An event stream is a continuous flow of event messages produced by an event source and consumed by one or more consumers. **Related terms**: *event*, *event message*, *event record*, *event sink*, *event source*, *event streaming*, *event streaming platform*, *event time* event streaming : Event streaming is the practice of capturing event data in real-time from data sources. Event streaming is a form of data streaming that is used to capture, store, process, and react to data in real-time or retrospectively. **Related terms**: *event*, *event message*, *event record*, *event sink*, *event source*, *event streaming platform*, *event time* event streaming platform : An event streaming platform is a platform to which events can be written once, allowing distributed functions within an organization to react in real time. **Related terms**: *event*, *event message*, *event record*, *event sink*, *event source*, *event streaming*, *event time* event time : Event time is the time when an event occurred on the producing device, as opposed to the time when the event was processed or recorded. Event time is often used in stream processing to determine the order of events and to perform windowing operations. **Related terms**: *event*, *event message*, *event record*, *event sink*, *event source*, *event streaming*, *event streaming platform* exactly-once semantics : Exactly-once semantics is a guarantee that a message is delivered exactly once and in the order that it was sent, even if a producer retries sending a message or a consumer retries processing a message. In Kafka, this guarantee is achieved with idempotent producers, which attach a producer ID and sequence number to each message so that brokers can discard duplicates caused by retries, and with transactions, which commit produced messages and consumer offsets atomically so that reprocessing does not produce duplicate results (a configuration sketch follows the granularity entry below). Freight Kafka cluster : A Confluent Cloud cluster type. Freight Kafka clusters are designed for high-throughput, relaxed latency workloads that are less expensive than self-managed open source Kafka. granularity : Granularity is the degree or level of detail to which an entity (a system, service, or resource) is broken down into subcomponents, parts, or elements. Entities that are *fine-grained* have a higher level of detail, while *coarse-grained* entities have a reduced level of detail, often combining finer parts into a larger whole. In the context of access control, granular permissions provide precise control over resource access. They allow administrators to grant specific operations on distinct resources. This ensures users only have permissions tailored to their needs, minimizing unnecessary or potentially risky access.
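To make the exactly-once semantics entry above concrete, here is a minimal sketch of the producer and consumer settings involved. The bootstrap server, topic name, group ID, and transactional ID are placeholders, and a real transactional consume-process-produce loop would also call `sendOffsetsToTransaction` so that consumer offsets commit atomically with the produced records.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class ExactlyOnceSketch {
    public static void main(String[] args) {
        // Producer side: idempotence lets brokers discard duplicates caused by retries,
        // and the transactional ID enables atomic writes across partitions.
        Properties producerProps = new Properties();
        producerProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        producerProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        producerProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        producerProps.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
        producerProps.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "orders-processor-1");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            producer.initTransactions();
            producer.beginTransaction();
            producer.send(new ProducerRecord<>("orders", "key-1", "value-1"));
            producer.commitTransaction();
        }

        // Consumer side: read_committed hides records from aborted transactions.
        Properties consumerProps = new Properties();
        consumerProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        consumerProps.put(ConsumerConfig.GROUP_ID_CONFIG, "orders-readers");
        consumerProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        consumerProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        consumerProps.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
            consumer.subscribe(List.of("orders"));
            consumer.poll(Duration.ofSeconds(1));
        }
    }
}
```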
group mapping : Group mapping is a set of rules that map groups in your SSO identity provider to Confluent Cloud RBAC roles. When a user signs in to Confluent Cloud using SSO, Confluent Cloud uses the group mapping to grant access to Confluent Cloud resources. **Related terms**: *identity provider*, *identity pool*, *principal*, *role* **Related content** - [Group Mapping for Confluent Cloud](/cloud/current/access-management/authenticate/sso/group-mapping/overview.html) identity : An identity is a unique identifier that is used to authenticate and authorize users and applications to access resources. Identity is often used in conjunction with access control to determine whether a user or application is allowed to access a resource and perform a specific action or operation on that resource. **Related terms**: *identity provider*, *identity pool*, *principal*, *role* identity pool : An identity pool is a collection of identities that can be used to authenticate and authorize users and applications to access resources. Identity pools are used to manage permissions for users and applications that access resources in Confluent Cloud. They are also used to manage permissions for Confluent Cloud service accounts that are used to access resources in Confluent Cloud. identity provider : An identity provider is a trusted provider that authenticates users and issues security tokens that are used to verify the identity of a user. Identity providers are often used in single sign-on (SSO) scenarios, where a user can log in to multiple applications or services with a single set of credentials. Infinite Storage : Infinite Storage is the Confluent Cloud storage service that enhances the scalability of Confluent Cloud resources by separating storage and processing. Tiered storage within Confluent Cloud moves data between storage layers based on the needs of the workload, retrieves tiered data when requested, and garbage collects data that is past retention or otherwise deleted. If an application reads historical data, latency is not increased for other applications reading more recent data. Storage resources are decoupled from compute resources, you only pay for what you produce to Confluent Cloud and for storage that you use, and CKUs do not have storage limits. **Related content** - [Infinite Storage in Confluent Cloud for Apache Kafka](https://www.confluent.io/blog/infinite-kafka-data-storage-in-confluent-cloud/) ingress : In general networking, ingress refers to traffic that enters a network from an external source. In Confluent Cloud, ingress is a Kafka cluster billing dimension that defines the number of bytes produced to the cluster in one second. Available in the Metrics API as `received_bytes` (convert from bytes to MB). To reduce ingress in Confluent Cloud, compress your messages. For compression, use lz4. Avoid gzip because of high overhead on the cluster. internal topic : An internal topic is a topic, prefixed with double underscores (`__`), that is automatically created by a Kafka component to store metadata about the broker, partition assignment, consumer offsets, and other information. Examples of internal topics: `__cluster_metadata`, `__consumer_offsets`, `__transaction_state`, `__confluent.support.metrics`, and `__confluent.support.metrics-raw`. JSON Schema : JSON Schema is a declarative language used for data serialization and exchange to define data structures, specify formats, and validate JSON documents.
It is a way to encode expected data types, properties, and constraints to ensure that all fields are properly described for use with serializers and deserializers. **Related terms**: *data serialization*, *deserializer*, *serializer* **Related content** - [JSON Schema - a declarative language that allows you to annotate and validate JSON documents.](https://json-schema.org/) - Confluent Cloud: [JSON Schema Serializer and Deserializer](/cloud/current/sr/fundamentals/serdes-develop/serdes-json.html) - Confluent Platform: [JSON Schema Serializer and Deserializer](/platform/current/schema-registry/fundamentals/serdes-develop/serdes-json.html) Kafka bootstrap server : A Kafka bootstrap server is a Kafka broker that a Kafka client connects to first when initiating a connection to a Kafka cluster; the broker returns metadata that includes the addresses for all of the brokers in the Kafka cluster. Although only one bootstrap server is required to connect to a Kafka cluster, multiple brokers can be specified in a bootstrap server list to provide high availability and fault tolerance in case a broker is unavailable. In Confluent Cloud, the bootstrap server is the general cluster endpoint. Kafka broker : A Kafka broker is a server in the Kafka storage layer that stores event streams from one or more sources. A Kafka cluster is typically comprised of several brokers. Every broker in a cluster is also a bootstrap server, meaning if you can connect to one broker in a cluster, you can connect to every broker. Kafka client : A Kafka client allows you to write distributed applications and microservices that read, write, and process streams of events in parallel, at scale, and in a fault-tolerant manner, even in the case of network problems or machine failures. The Kafka client library provides functions, classes, and utilities that allow developers to create Kafka producer clients (Producers) and consumer clients (Consumers) using various programming languages. The primary way to build production-ready Producers and Consumers is by using your preferred programming language and a Kafka client library. **Related content** - [Build Client Applications for Confluent Cloud](/cloud/current/client-apps/overview.html) - [Build Client Applications for Confluent Platform](/platform/current/clients/index.html) - [Getting Started with Apache Kafka and Java (or Python, Go, .Net, and others)](https://developer.confluent.io/get-started/java/) Kafka cluster : A Kafka cluster is a group of interconnected Kafka brokers that manage and distribute real-time data streaming, processing, and storage as if they are a single system. By distributing tasks and services across multiple Kafka brokers, the Kafka cluster improves availability, reliability, and performance. Kafka Connect : Kafka Connect is the component of Kafka that provides data integration between databases, key-value stores, search indexes, file systems, and Kafka brokers. Kafka Connect is an ecosystem of a client application and pluggable connectors. As a client application, Connect is a server process that runs on hardware independent of the Kafka brokers themselves. It is scalable and fault-tolerant, meaning you can run a cluster of Connect workers that share the load of moving data in and out of Kafka from and to external systems. Connect also abstracts the need to write code away from the user and instead requires only JSON configuration to run.
**Related content** - Confluent Cloud: [Kafka Connect](/cloud/current/billing/overview.html#kconnect-long) - Confluent Platform: [Kafka Connect](/platform/current/connect/index.html) Kafka controller : A Kafka controller is the node in a Kafka cluster that is responsible for managing and changing the metadata of the cluster. This node also communicates metadata changes to the rest of the cluster. When Kafka uses ZooKeeper for metadata management, the controller is a broker, and the broker persists the metadata to ZooKeeper for backup and recovery. With KRaft, you dedicate Kafka nodes to operate as controllers and the metadata is stored in Kafka itself and not persisted to ZooKeeper. KRaft enables faster recovery because of this. For more information, see [KRaft overview](/platform/current/kafka-metadata/kraft.html). Kafka listener : A Kafka listener is an endpoint that Kafka brokers bind to and use to communicate with clients. For Kafka clusters, Kafka listeners are configured in the `listeners` property of the `server.properties` file. Advertised listeners are publicly accessible endpoints that are used by clients to connect to the Kafka cluster. **Related content** - [Kafka Listeners – Explained](https://www.confluent.io/blog/kafka-listeners-explained/) Kafka metadata : Kafka metadata is the information about the Kafka cluster and the topics that are stored in it. This information includes details such as the brokers in the cluster, the topics that are available, the partitions for each topic, and the location of the leader for each partition. Kafka metadata is used by clients to discover the available brokers and topics, and to determine which broker is the leader for a particular partition. This information is essential for clients to be able to send and receive messages to and from Kafka. Kafka Streams : Kafka Streams is a stream processing library for building streaming applications and microservices that transform (filter, group, aggregate, join, and more) incoming event streams in real-time to Kafka topics stored in a Kafka cluster. The Streams API can be used to build applications that process data in real-time, analyze data continuously, and build data pipelines. **Related content** - [Kafka Streams](/platform/current/streams/overview.html) Kafka topic : See *topic*. key encryption key (KEK) : A key encryption key (KEK) is a master key that is used to encrypt and decrypt other keys, specifically the data encryption key (DEK). Only users with access to the KEK can decrypt the DEK and access the sensitive data. **Related terms**: *data encryption key (DEK)*, *envelope encryption*. Kora : Kora is the cloud-native streaming data service based on Kafka technology that powers the Confluent Cloud event streaming platform for building real-time data pipelines and streaming applications. Kora abstracts low-level resources, such as Kafka brokers, and hides operational complexities, such as system upgrades. Kora is built on the following foundations: a tiered storage layer that improves cost and performance, elasticity and consistent performance through incremental load balancing, cost effective multi-tenancy with dynamic quota management and cell-based isolation, continuous monitoring of both system health and data integrity, and clean abstraction with standard Kafka protocols and CKUs to hide underlying resources.
**Related terms**: *Apache Kafka*, *Confluent Cloud*, *Confluent Unit for Kafka (CKU)* **Related content** - [Kora: The Cloud Native Engine for Apache Kafka](https://www.confluent.io/blog/cloud-native-data-streaming-kafka-engine/) - [Kora: A Cloud-Native Event Streaming Platform For Kafka](https://www.vldb.org/pvldb/vol16/p3822-povzner.pdf) KRaft : KRaft (or Apache Kafka Raft) is a consensus protocol introduced in Kafka 2.8 to provide metadata management for Kafka with the goal of replacing ZooKeeper. KRaft simplifies Kafka because it enables the management of metadata in Kafka itself, rather than splitting it between ZooKeeper and Kafka. As of Confluent Platform 7.5, KRaft is the default method of metadata management in new deployments. For more information, see [KRaft overview](/platform/current/kafka-metadata/kraft.html). ksqlDB : ksqlDB is a streaming SQL database engine purpose-built for creating stream processing applications on top of Kafka. logical Kafka cluster (LKC) : A logical Kafka cluster (LKC) is a subset of a physical Kafka cluster (PKC) that is isolated from other logical clusters within Confluent Cloud. Each logical unit of isolation is considered a tenant and maps to a specific organization. If the mapping is one-to-one, one LKC maps to one PKC (a Dedicated cluster). If the mapping is many-to-one, one LKC maps to one of the multitenant Kafka cluster types (Basic, Standard, Enterprise, and Freight). **Related terms**: *Confluent Cloud*, *Kafka cluster*, *physical Kafka cluster (PKC)* **Related content** - [Kafka Cluster Types in Confluent Cloud](/cloud/current/clusters/cluster-types.html) - [Multi-tenancy and Client Quotas on Confluent Cloud](/cloud/current/clusters/client-quotas.html) multi-region cluster (MRC) : A multi-region cluster (MRC) is a single Kafka cluster that replicates data between datacenters across regional availability zones. multi-tenancy : Multi-tenancy is a software architecture in which a single physical instance is shared among multiple logical instances, or tenants. In Confluent Cloud, each Basic, Standard, Enterprise, and Freight cluster is a logical Kafka cluster (LKC) that shares a physical Kafka cluster (PKC) with other tenants. Each LKC is isolated from other LKCs and has its own resources, such as memory, compute, and storage. **Related terms**: *Confluent Cloud*, *logical Kafka cluster (LKC)*, *physical Kafka cluster (PKC)* **Related content** - [From On-Prem to Cloud-Native: Multi-Tenancy in Confluent Cloud](https://www.confluent.io/blog/cloud-native-multi-tenant-kafka-with-confluent-cloud/) - [Multi-tenancy and Client Quotas on Confluent Cloud](/cloud/current/clusters/client-quotas.html) offset : An offset is an integer assigned to each message that uniquely represents its position within a partition, guaranteeing the ordering of records within that partition and allowing offset-based connections to replay messages from any point in time. **Related terms**: *consumer offset*, *connector offset*, *offset commit*, *replayability* offset commit : An offset commit is the process of keeping track of the current position of an offset-based connection (primarily Kafka consumers and connectors) within the data stream. The offset commit process is not specific to consumers, producers, or connectors. It is a general mechanism in Kafka to track the position of any application that is reading data. When a consumer commits an offset, the offset identifies the next message the consumer should consume.
For example, if a consumer has an offset of 5, it has consumed messages 0 through 4 and will next consume message 5. If the consumer crashes or is shut down, its partitions are reassigned to another consumer, which resumes consuming from the last committed offset of each partition. The committed offset for consumers is stored on a Kafka broker. When a consumer commits an offset, it sends a commit request to the Kafka cluster, specifying the partition and offset it wants to commit for a particular consumer group. The Kafka broker receiving the commit request then stores this offset in the `__consumer_offsets` internal topic. **Related terms**: *consumer offset*, *offset* OpenSSL : OpenSSL is an open-source software library and toolkit that implements the Secure Sockets Layer (SSL) and Transport Layer Security (TLS) protocols. For more information, see [Wikipedia](https://en.wikipedia.org/wiki/OpenSSL). parent cluster : The Kafka cluster that a resource belongs to. **Related terms**: *Kafka cluster* partition : A partition is a unit of data storage that divides a topic into multiple, parallel event streams, each of which is stored on separate Kafka brokers and can be consumed independently. Partitioning is a key concept in Kafka because it allows Kafka to scale horizontally by adding more brokers to the cluster. Partitions are also the unit of parallelism in Kafka. A topic can have one or more partitions, and each partition is an ordered, immutable sequence of event records that is continually appended to a partition log. partitions (pre-replication) : In Confluent Cloud, partitions are a Kafka cluster billing dimension that define the maximum number of partitions that can exist on the cluster at one time, before replication. While you are not charged for partitions on any type of Kafka cluster, the number of partitions you use has an impact on eCKU. To determine eCKU limits for partitions, Confluent Cloud bills only for pre-replication (leader partitions) across a cluster. All topics that you create (as well as internal topics that are automatically created by Confluent Platform components such as ksqlDB, Kafka Streams, Connect, and Control Center (Legacy)) count towards the cluster partition limit. Confluent prefixes topics created automatically with an underscore (_). Topics that are internal to Kafka itself (consumer offsets) are not visible in Cloud Console and do not count against partition limits or toward partition billing. Available in the Metrics API as `partition_count`. In Confluent Cloud, attempts to create additional partitions beyond the cluster limit fail with an error message. To reduce usage on partitions (pre-replication), delete unused topics and create new topics with fewer partitions. Use the Kafka Admin interface to increase the partition count of an existing topic if the initial partition count is too low. physical Kafka cluster (PKC) : A physical Kafka cluster (PKC) is a Kafka cluster comprised of multiple brokers. Each physical Kafka cluster is created on a Kubernetes cluster by the control plane. A PKC is not directly accessible by clients. principal : A principal is an entity that can be authenticated and granted permissions based on roles to access resources and perform operations. An entity can be a user account, service account, group mapping, or identity pool.
**Related terms**: *group mapping*, *identity*, *identity pool*, *role*, *service account*, *user account* private internet : A private internet is a closed, restricted computer network typically used by organizations to provide secure environments for managing sensitive data and resources. processing time : Processing time is the time when an event is processed or recorded by a system, as opposed to the time when the event occurred on the producing device. Processing time is often used in stream processing to determine the order of events and to perform windowing operations. producer : A producer is a client application that publishes (writes) data to a topic in a Kafka cluster. Producers write data to a topic and are the only clients that can write data to a topic. Each record written to a topic is appended to the partition of the topic that is selected by the producer. Producer API : The Producer API is the Kafka API that allows you to write data to a topic in a Kafka cluster. The Producer API is used by producer clients to publish data to a topic in a Kafka cluster. Protobuf : Protobuf (or Protocol Buffers) is an open-source data format used to serialize structured data for storage. **Related terms**: *data serialization*, *deserializer*, *serializer* **Related content** - [Protocol Buffers](https://protobuf.dev/) - [Getting Started with Protobuf in Confluent Cloud](https://www.confluent.io/blog/using-protobuf-in-confluent-cloud/) - Confluent Cloud: [Protobuf Serializer and Deserializer](/cloud/current/sr/fundamentals/serdes-develop/serdes-protobuf.html) - Confluent Platform: [Protobuf Serializer and Deserializer](/platform/current/schema-registry/fundamentals/serdes-develop/serdes-protobuf.html) public internet : The public internet is the global system of interconnected computers and networks that use TCP/IP to communicate with each other. rebalancing : Rebalancing is the process of redistributing the partitions of a topic among the consumers of a consumer group for improved performance and scalability. A rebalance can occur when a consumer fails to send heartbeats and is excluded from the group, voluntarily leaves the group, has its metadata updated, or when a new consumer joins the group. replayability : Replayability is the ability to replay messages from any point in time. **Related terms**: *consumer offset*, *offset*, *offset commit* replication : Replication is the process of creating and maintaining multiple copies (or *replicas*) of data across different nodes in a distributed system to increase availability, reliability, redundancy, and accessibility. replication factor : A replication factor is the number of copies of a partition that are distributed across the brokers in a cluster. requests : In Confluent Cloud, requests are a Kafka cluster billing dimension that defines the number of client requests to the cluster in one second. Available in the Metrics API as `request_count`. To reduce usage on requests, you can adjust producer batching configurations, consumer client batching configurations, and shut down otherwise inactive clients. For Dedicated clusters, a high number of requests per second results in increased load on the cluster. role : A role is a Confluent-defined job function that carries the set of permissions required to perform specific actions or operations on Confluent resources. A role is granted to a principal on a set of Confluent resources through a role binding, and can be assigned to a user account, group mapping, service account, or identity pool.
**Related terms**: *group mapping*, *identity*, *identity pool*, *principal*, *service account* **Related content** - [Predefined RBAC Roles in Confluent Cloud](/cloud/current/access-management/access-control/rbac/predefined-rbac-roles.html) - [Role-Based Access Control Predefined Roles in Confluent Platform](/platform/current/security/rbac/rbac-predefined-roles.html) rolling restart : A rolling restart restarts the brokers in a Kafka cluster one at a time with zero downtime, verifying that there are no under-replicated partitions on a broker before proceeding to the next broker. Restarting the brokers one at a time allows for software upgrades, broker configuration updates, or cluster maintenance while maintaining high availability by avoiding downtime. **Related content** - [Rolling restart](/platform/current/kafka/post-deployment.html#rolling-restart) schema : A schema is the structured definition or blueprint used to describe the format and structure of event messages sent through the Kafka event streaming platform. Schemas are used to validate the structure of data in event messages and ensure that producers and consumers are sending and receiving data in the same format. Schemas are defined in the Schema Registry. Schema Registry : Schema Registry is a centralized repository for managing and validating the schemas used for topic message data. Schema Registry is built into Confluent Cloud as a managed service, available with the Stream Governance packages, and offered as part of Confluent Enterprise for self-managed deployments. Schema Registry is a RESTful service that stores and manages schemas for Kafka topics and is integrated with Kafka and Connect to provide a central location for managing schemas and validating data. Producers and consumers of Kafka topics use schemas to ensure data consistency and compatibility as schemas evolve. Schema Registry is a key component of Stream Governance. **Related content** - Confluent Cloud: [Manage Schemas in Confluent Cloud](/cloud/current/sr/schemas-manage.html) - Confluent Platform: [Schema Registry Overview](/platform/current/schema-registry/index.html) schema subject : A schema subject is the namespace for a schema in Schema Registry. This unique identifier defines a logical grouping of related schemas. Kafka topics contain event messages serialized and deserialized using the structure and rules defined in a schema subject. This ensures compatibility and supports schema evolution. **Related content** - Confluent Cloud: [Manage Schemas in Confluent Cloud](/cloud/current/sr/schemas-manage.html) - Confluent Platform: [Schema Registry Concepts](/platform/current/schema-registry/index.html) - [Understanding Schema Subjects](https://developer.confluent.io/courses/schema-registry/schema-subjects/) Serdes : Serdes are serializers and deserializers that convert objects and parallel data into a serial byte stream for efficient storage and high-speed data transmission over the wire. Confluent provides Serdes for schemas in Avro, Protobuf, and JSON Schema formats.
**Related content** - Confluent Cloud: [Formats, Serializers, and Deserializers](/cloud/current/sr/fundamentals/serdes-develop/index.html) - Confluent Platform: [Formats, Serializers, and Deserializers](/platform/current/schema-registry/fundamentals/serdes-develop/index.html) - [Serde](https://serde.rs/) serializer : A serializer is a tool that converts objects and parallel data into a serial byte stream. Serializers work with deserializers (known together as Serdes) to support efficient storage and high-speed data transmission over the wire. Confluent provides serializers for schemas in Avro, Protobuf, and JSON Schema formats. **Related content** - Confluent Cloud: [Formats, Serializers, and Deserializers](/cloud/current/sr/fundamentals/serdes-develop/index.html) - Confluent Platform: [Formats, Serializers, and Deserializers](/platform/current/schema-registry/fundamentals/serdes-develop/index.html) service account : A service account is a non-person entity used by an application or service to access resources and perform operations. Because a service account is an identity independent of the user who created it, it can be used programmatically to authenticate to resources and perform operations without the need for a user to be signed in. **Related content** - [Service Accounts for Confluent Cloud](/cloud/current/access-management/identity/service-accounts.html) service quota : A service quota is the limit, or maximum value, for a specific Confluent Cloud resource or operation that might vary by the resource scope it applies to. **Related content** - [Service Quotas for Confluent Cloud](/cloud/current/quotas/index.html) single message transform (SMT) : A single message transform (SMT) is a transformation or operation applied in real time to an individual message that changes the values, keys, or headers of a message before being sent to a sink connector or after being read from a source connector. SMTs are convenient for inserting fields, masking information, event routing, and other minor data adjustments. single sign-on (SSO) : Single sign-on (SSO) is a centralized authentication service that allows users to use a single set of credentials to log in to multiple applications or services. **Related terms**: *authentication*, *group mapping*, *identity provider* **Related content** - [Single Sign-On for Confluent Cloud](/cloud/current/access-management/authenticate/sso/index.html) sink connector : A sink connector is a Kafka Connect connector that publishes (writes) data from a Kafka topic to an external system. source connector : A source connector is a Kafka Connect connector that subscribes to (reads) data from a source (external system), extracts the payload and schema of the data, and publishes (writes) the data to Kafka topics. standalone : Standalone refers to a configuration in which a software application, system, or service operates independently on a single instance or device. This mode is commonly used for development, testing, and debugging purposes. For Kafka Connect, a standalone worker is a single process responsible for running all connectors and tasks on a single instance. Standard Kafka cluster : A Confluent Cloud cluster type. Standard Kafka clusters are designed for production-ready features and functionality. static egress IP address : A static egress IP address is an IP address used by a Confluent Cloud managed connector to establish outbound connections to endpoints of external data sources and sinks over the public internet.
**Related content** - [Use Static IP Addresses on Confluent Cloud for Connectors and Cluster Linking](/cloud/current/networking/static-egress-ip-addresses.html) - [Static Egress IP Addresses for Confluent Cloud Connectors](/cloud/current/connectors/static-egress-ip.html) storage (pre-replication) : In Confluent Cloud, storage is a Kafka cluster billing dimension that defines the number of bytes retained on the cluster, pre-replication. Available in the Metrics API as `retained_bytes` (convert from bytes to TB). The returned value is pre-replication. Standard, Enterprise, Dedicated, and Freight clusters support Infinite Storage. This means there is no maximum size limit for the amount of data that can be stored on the cluster. You can configure policy settings `retention.bytes` and `retention.ms` at the topic level to control exactly how much and how long to retain data in a way that makes sense for your applications and helps control your costs. To reduce storage in Confluent Cloud, compress your messages and reduce retention settings. For compression, use lz4. Avoid gzip because of high overhead on the cluster. Stream Catalog : Stream Catalog is a pillar of Confluent Cloud Stream Governance that provides a centralized inventory of your organization’s data assets that supports data governance and data discovery. With Data Portal in Confluent Cloud Console, users can find event streams across systems, search topics by name or tags, and enrich event data to increase value and usefulness. REST and GraphQL APIs can be used to search schemas, apply tags to records or fields, manage business metadata, and discover relationships across data assets. **Related content** - [Stream Catalog on Confluent Cloud: User Guide to Manage Tags and Metadata](/cloud/current/stream-governance/stream-catalog.html) - [Stream Catalog in Streaming Data Governance (Confluent Developer course)](https://developer.confluent.io/courses/governing-data-streams/stream-catalog/) Stream Governance : Stream Governance is a collection of tools and features that provide data governance for data in motion. These include data quality tools such as Schema Registry, schema ID validation, and schema linking; built-in data catalog capabilities to classify, organize, and find event streams across systems; and stream lineage to visualize complex data relationships and uncover insights with interactive, end-to-end maps of event streams. Taken together, these and other governance tools enable teams to manage the availability, integrity, and security of data used across organizations, and help with standardization, monitoring, collaboration, reporting, and more. **Related terms**: *Stream Catalog*, *Stream Lineage* **Related content** - [Stream Governance on Confluent Cloud](/cloud/current/stream-governance/index.html) stream lineage : Stream lineage is the life cycle, or history, of data, including its origins, transformations, and consumption, as it moves through various stages in data pipelines, applications, and systems. Stream lineage provides a record of data’s journey from its source to its destination, and is used to track data quality, data governance, and data security. 
**Related terms**: *Data Portal*, *Stream Governance* **Related content** - [Stream Lineage on Confluent Cloud](/cloud/current/stream-governance/stream-lineage.html) stream processing : Stream processing is the method of collecting event stream data in real-time as it arrives, transforming the data in real-time using operations (such as filters, joins, and aggregations), and publishing the results to one or more target systems. Stream processing can be used to analyze data continuously, build data pipelines, and process time-sensitive data in real-time. Using the Confluent event streaming platform, event streams can be processed in real-time using Kafka Streams, Kafka Connect, or ksqlDB. Streams API : The Streams API is the Kafka API that allows you to build streaming applications and microservices that transform (for example, filter, group, aggregate, join) incoming event streams in real-time to Kafka topics stored in a Kafka cluster. The Streams API is used by stream processing clients to process data in real-time, analyze data continuously, and build data pipelines (a minimal topology sketch follows the total client connections entry below). **Related content** - [Introduction to the Kafka Streams API](/platform/current/streams/introduction.html) throttling : Throttling is the process Kafka clusters in Confluent Cloud use to protect themselves from getting to an over-utilized state. Also known as backpressure, throttling in Confluent Cloud occurs when cluster load reaches 80%. At this point, applications may start seeing higher latencies or timeouts as the cluster must begin throttling requests or connection attempts. topic : A topic is a user-defined category or feed name where event messages are stored and published by producers and subscribed to by consumers. Each topic is a log of event messages. Topics are stored in one or more partitions, which distribute topic records across brokers in a Kafka cluster. Each partition is an ordered, immutable sequence of records that is continually appended to a partition log. **Related content** - [Manage Topics in Confluent Cloud](/cloud/current/client-apps/topics/index.html) total client connections : In Confluent Cloud, total client connections are a Kafka cluster billing dimension that defines the number of TCP connections to the cluster you can open at one time. Available in the Metrics API as `active_connection_count`. Filter by principal to understand how many connections each application is creating. How many connections a cluster supports can vary widely based on several factors, including number of producer clients, number of consumer clients, partition keying strategy, produce patterns per client, and consume patterns per client. For Dedicated clusters, Confluent derives a guideline for total client connections from benchmarking that indicates exceeding this number of connections increases produce latency for test clients. However, this does not apply to all workloads. That is why total client connections are a guideline, not a hard limit for Dedicated Kafka clusters. Monitor the impact on cluster load as connection count increases, as this is the final representation of the impact of a given workload or CKU dimension on the underlying resources of the cluster. Consider the Confluent guideline a per-CKU guideline. The number of connections tends to increase when you add brokers. In other words, if you significantly exceed the per-CKU guideline, cluster expansion doesn’t always give your cluster more connection count headroom.
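As a concrete illustration of the Streams API entry above, the following is a minimal Kafka Streams topology sketch that filters one topic into another. The application ID, bootstrap server, topic names, and filter condition are placeholders.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;

public class StreamsSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder application ID and bootstrap server.
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "orders-filter-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Read an incoming event stream, filter it, and write the result to another topic.
        KStream<String, String> orders = builder.stream("orders");
        orders.filter((key, value) -> value != null && value.contains("priority=high"))
              .to("high-priority-orders", Produced.with(Serdes.String(), Serdes.String()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```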
Transport Layer Security (TLS) : Transport Layer Security (TLS) is a cryptographic protocol that provides secure communication over a network. For more information, see [Wikipedia](https://en.wikipedia.org/wiki/Transport_Layer_Security). unbounded stream : An unbounded stream is a stream of data that is continuously generated in real-time and has no defined end. Examples of unbounded streams include stock prices, sensor data, and social media feeds. Processing unbounded streams requires a different approach than processing bounded streams. Unbounded streams are processed incrementally as data arrives, while bounded streams are processed as a batch after all data has arrived. Kafka Streams and Flink can be used to process unbounded streams. **Related terms**: *bounded stream*, *stream processing* under replication : Under replication is a situation when the number of in-sync replicas is below the number of all replicas. Under-replicated partitions can occur when a broker is down or cannot replicate fast enough from the leader (replica fetcher lag). user account : A user account is an account representing the identity of a person who can be authenticated and granted access to Confluent Cloud resources. **Related content** - [User Accounts for Confluent Cloud](/cloud/current/access-management/identity/user-accounts/overview.html) watermark : A watermark in Flink is a marker that keeps track of time as data is processed. A watermark asserts that all records with event timestamps up to the watermark’s timestamp have been “seen”. This way, Flink can correctly perform tasks that depend on when things happened, like calculating aggregations over time windows. **Related content** - [Time and Watermarks](/cloud/current/flink/concepts/timely-stream-processing.html)
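To make the watermark entry above concrete, here is a generic Apache Flink DataStream sketch (assuming a Flink 1.x environment) that assigns bounded-out-of-orderness watermarks so event-time windows know when they can close. The event type, sample values, and the five-second bound are illustrative assumptions.

```java
import java.time.Duration;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class WatermarkSketch {

    // Illustrative event type: a sensor reading carrying its own event-time timestamp.
    public static class SensorReading {
        public String sensorId;
        public long timestampMillis;
        public double value;

        public SensorReading() {}

        public SensorReading(String sensorId, long timestampMillis, double value) {
            this.sensorId = sensorId;
            this.timestampMillis = timestampMillis;
            this.value = value;
        }

        @Override
        public String toString() {
            return sensorId + "@" + timestampMillis + "=" + value;
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<SensorReading> readings = env.fromElements(
                new SensorReading("s1", 1_000L, 20.5),
                new SensorReading("s1", 9_000L, 21.0),
                new SensorReading("s1", 4_000L, 20.7)); // arrives out of order

        // The watermark trails the largest event timestamp seen so far by 5 seconds,
        // signalling that earlier events are assumed to have been seen.
        DataStream<SensorReading> withTimestamps = readings.assignTimestampsAndWatermarks(
                WatermarkStrategy.<SensorReading>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                        .withTimestampAssigner((reading, recordTimestamp) -> reading.timestampMillis));

        // With watermarks assigned, event-time windows can close and emit results.
        withTimestamps
                .keyBy(reading -> reading.sensorId)
                .window(TumblingEventTimeWindows.of(Time.seconds(10)))
                .reduce((a, b) -> a.value >= b.value ? a : b) // keep the max reading per window
                .print();

        env.execute("watermark-sketch");
    }
}
```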
**Related content** - [Introduction to Kafka](/kafka/introduction.html) audit log : An audit log is a historical record of actions and operations that are triggered when auditable events occurs. Audit log records can be used to troubleshoot system issues, manage security, and monitor compliance, by tracking administrative activity, data access and modification, monitoring sign-in attempts, and reconstructing security breaches and fraudulent activity. **Related terms**: *auditable event* **Related content** - [Audit Log Concepts for Confluent Cloud](/cloud/current/monitoring/audit-logging/cloud-audit-log-concepts.html) - [Audit Log Concepts for Confluent Platform](/platform/current/security/audit-logs/audit-logs-concepts.html) auditable event : An auditable event is an event that represents an action or operation that can be tracked and monitored for security purposes and compliance. When an auditable event occurs, an auditable event method is triggered and an event message is sent to the audit log cluster and stored as an audit log record. **Related terms**: *audit log*, *event message* **Related content** - [Auditable Events in Confluent Cloud](/cloud/current/monitoring/audit-logging/event-methods/index.html) - [Auditable Events in Confluent Platform](/platform/current/security/audit-logs/auditable-events.html) authentication : Authentication is the process of verifying the identity of a principal that interacts with a system or application. Authentication is often used in conjunction with authorization to determine whether a principal is allowed to access a resource and perform a specific action or operation on that resource. Digital authentication requires one or more of the following: something a principal knows (a password or security question), something a principal has (a security token or key), or something a principal is (a biometric characteristic, such as a fingerprint or voiceprint). Multi-factor authentication (MFA) requires two or more forms of authentication. **Related terms**: *authorization*, *identity*, *identity provider*, *identity pool*, *principal*, *role* authorization : Authorization is the process of evaluating and then granting or denying a principal a set of permissions required to access and perform operations on resources. **Related terms**: *authentication*, *group mapping*, *identity*, *identity provider*, *identity pool*, *principal*, *role* Avro : Avro is a data serialization and exchange framework that provides data structures, remote procedure call (RPC), compact binary data format, a container file, and uses JSON to represent schemas. Avro schemas ensure that every field is properly described and documented for use with serializers and deserializers. You can either send a schema with every message or use Schema Registry to store and receive schemas for use by consumers and producers to save bandwidth and storage space. **Related terms**: *data serialization*, *deserializer*, *serializer* **Related content** - [Apache Avro - a data serialization system](https://avro.apache.org/) - Confluent Cloud: [Avro Schema Serializer and Deserializer](/cloud/current/sr/fundamentals/serdes-develop/serdes-avro.html) - Confluent Platform: [Avro Schema Serializer and Deserializer](/platform/current/schema-registry/fundamentals/serdes-develop/serdes-avro.html) Basic Kafka cluster : A Confluent Cloud cluster type. Basic Kafka clusters are designed for experimentation, early development, and basic use cases. 
batch processing : Batch processing is the method of collecting a large volume of data over a specific time interval, after which the data is processed all at once and loaded into a destination system. Batch processing is often used when processing data can occur independently of the source and timing of the data. It is efficient for non-real-time data processing, such as data warehousing, reporting, and analytics. **Related terms**: *bounded stream*, *stream processing*, *unbounded stream* CIDR block : A CIDR block is a group of IP addresses that are contiguous and can be represented as a single block. CIDR blocks are expressed using Classless Inter-domain Routing (CIDR) notation that includes an IP address and a number of bits in the network mask. **Related content** - [CIDR notation](https://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing#CIDR_notation) - [Classless Inter-domain Routing (CIDR): The Internet Address Assignment and Aggregation Plan [RFC 4632]](https://www.rfc-editor.org/rfc/rfc4632.html) Cluster Linking : Cluster Linking is a highly performant data replication feature that enables links between Kafka clusters to mirror data from one cluster to another. Cluster Linking creates perfect copies of Kafka topics, which keep data in sync across clusters. Use cases include geo-replication of data, data sharing, migration, disaster recovery, and tiered separation of critical applications. **Related content** - [Geo-replication with Cluster Linking on Confluent Cloud](/cloud/current/multi-cloud/cluster-linking/index.html) - [Cluster Linking for Confluent Platform](/platform/current/multi-dc-deployments/cluster-linking/index.html) commit log : A commit log is a log of all event messages about commits (changes or operations made) sent to a Kafka topic. A commit log ensures that all event messages are processed at least once and provides a mechanism for recovery in the event of a failure. The commit log is also referred to as a write-ahead log (WAL) or a transaction log. **Related terms**: *event message* Confluent Cloud : Confluent Cloud is the fully managed, cloud-native event streaming service powered by Kora, the event streaming platform based on Kafka and extended by Confluent to provide high availability, scalability, elasticity, security, and global interconnectivity. Confluent Cloud offers cost-effective multi-tenant configurations as well as dedicated solutions, if stronger isolation is required. **Related terms**: *Apache Kafka*, *Kora* **Related content** - [Confluent Cloud Overview](/cloud/current/index.html) - [Confluent Cloud](https://www.confluent.io/confluent-cloud/) Confluent Cloud network : A Confluent Cloud network is an abstraction for a single tenant network environment that hosts Dedicated Kafka clusters in Confluent Cloud along with their single tenant services, like ksqlDB clusters and managed connectors. **Related content** - [Confluent Cloud Network Overview](/cloud/current/networking/overview.html#ccloud-networks) Confluent for Kubernetes (CFK) : *Confluent for Kubernetes (CFK)* is a cloud-native control plane for deploying and managing Confluent in private cloud environments through declarative API. Confluent Platform : Confluent Platform is a specialized distribution of Kafka at its core, with additional components for data integration, streaming data pipelines, and stream processing. 
Confluent REST Proxy : Confluent REST Proxy provides a RESTful interface to an Kafka cluster, making it easy to produce and consume messages, view the state of the cluster, and perform administrative actions without using the native Kafka protocol or clients. **Related content** - Confluent Platform: [REST Proxy](/platform/current/kafka-rest/index.html) Confluent Server : Confluent Server is the default Kafka broker component of Confluent Platform that builds on the foundation of Apache Kafka® and provides enhanced proprietary features designed for enterprise use. Confluent Server is fully compatible with Kafka, and adds Kafka cluster support for Role-Based Access Control, Audit Logs, Schema Validation, Self Balancing Clusters, Tiered Storage, Multi-Region Clusters, and Cluster Linking. **Related terms**: *Confluent Platform*, *Apache Kafka*, *Kafka broker*, *Cluster Linking*, *multi-region cluster (MRC)* Confluent Unit for Kafka (CKU) : Confluent Unit for Kafka (CKU) is a unit of horizontal scaling for Dedicated Kafka clusters in Confluent Cloud that provide preallocated resources. CKUs determine the capacity of a Dedicated Kafka cluster in Confluent Cloud. **Related content** - [CKU limits per cluster](/cloud/current/clusters/cluster-types.html#cku-limits-per-cluster) Connect API : The Connect API is the Kafka API that enables a connector to read event streams from a source system and write to a target system. Connect worker : A Connect worker is a server process that runs a connector and performs the actual work of moving data in and out of Kafka topics. A worker is a server process that runs on hardware independent of the Kafka brokers themselves. It is scalable and fault-tolerant, meaning you can run a cluster of workers that share the load of moving data in and out of Kafka from and to external systems. **Related terms**: *connector*, *Kafka Connect* connection attempts : In Confluent Cloud, connection attempts are a Kafka cluster billing dimension that defines the maximum number of new TCP connections to the cluster you can create in one second. This includes successful and unsuccessful authentication attempts. Available in the Metrics API as `successful_authentication_count` (only includes successful authentications, not unsuccessful authentication attempts). To reduce usage on connection attempts, use longer-lived connections to the cluster. If you exceed the maximum, connection attempts may be refused. connector : A connector is an abstract mechanism that enables communication, coordination, or cooperation among components by transferring data elements from one interface to another without changing the data. connector offset : Connector offset uniquely identifies the position of a connector as it processes data. Connectors use a variety of strategies to implement the connector offset, including everything from monotonically increasing integers to replay ids, lists of files, timestamps and even checkpoint information. Connector offsets keep track of already-processed data in the event of a connector restart or recovery. While sink connectors use a pattern for connector offsets similar to the offset mechanism used throughout Kafka, the implementation details for source connectors are often much different. This is because source connectors track the progress of a source system as it process data. consumer : A consumer is a Kafka client application that subscribes to (reads and processes) event messages from a Kafka topic. 
The Streams API and the Consumer API are the two APIs that enable consumers to read event streams from Kafka topics. **Related terms**: *Consumer API*, *consumer group*, *producer*, *Streams API* Consumer API : The Consumer API is the Kafka API used for consuming (reading) event messages or records from Kafka topics and enables a Kafka consumer to subscribe to a topic and read event messages as they arrive. Batch processing is a common use case for the Consumer API. consumer group : A consumer group is a single logical consumer implemented with multiple physical consumers for reasons of throughput and resilience. By dividing topics among consumers in the group into partitions, consumers in the group can process messages in parallel, increasing message throughput and enabling load balancing. **Related terms**: *consumer*, *partition*, *partition*, *producer*, *topic* consumer lag : Consumer lag is the number of consumer offsets between the latest message produced in a partition and the last message consumed by a consumer, that is the number of messages pending to be consumed from a particular partition. A large consumer lag, or a quickly growing lag, indicates that the consumer is unable to read from a partition as fast as the messages are available. This can be caused by a slow consumer, slow network, or slow broker. consumer offset : Consumer offset is the unique and monotonically increasing integer value that uniquely identifies the position of an event record in a partition. Consumers use offsets to track their current position in the Kafka topic, allowing consumers to resume processing from where they left off. Offsets are stored on the Kafka broker, which does not track which records have been read and which have not. It is up to the consumer connection to track this information. When a consumer acknowledges receiving and processing a message, it commits an offset value that is stored in the special internal topic `__commit_offsets`. cross-resource RBAC role binding : A cross-resource RBAC role binding is a role binding in Confluent Cloud that is applied at the Organization or Environment scope and grants access to multiple resources. For example, assigning a principal the NetworkAdmin role at the Organization scope lets them administer all networks across all Environments in their Organization. **Related terms**: *identity pool*, *principal*, *role*, *role binding* CRUD : CRUD is an acronym for the four basic operations that can be performed on data: Create, Read, Update, and Delete. custom connector : A custom connector is a connector created using Connect plugins uploaded to Confluent Cloud by users. This includes connector plugins that are built from scratch, modified open-source connector plugins, or third-party connector plugins. data at rest : Data at rest is data that is physically stored on non-volatile media (such as hard drives, solid-state drives, or other storage devices) and is not actively being transmitted or processed by a system. data contract : A data contract is a formal agreement between an upstream component and a downstream component on the structure and semantics of data that is in motion. A schema is a key element of a data contract. The schema, metadata, rules, policies, and evolution plan form the data contract. You can associate data contracts (schemas and more) with [topics](#term-Kafka-topic). 
**Related content**
- Confluent Platform: [Data Contracts for Schema Registry on Confluent Platform](/platform/current/schema-registry/fundamentals/data-contracts.html)
- Confluent Cloud: [Data Contracts for Schema Registry on Confluent Cloud](/cloud/current/sr/fundamentals/data-contracts.html)
- Cloud Console: [Manage Schemas in Confluent Cloud](/cloud/current/sr/schemas-manage.html)

data encryption key (DEK) : A data encryption key (DEK) is a symmetric key that is used to encrypt and decrypt data. The DEK is used in client-side field level encryption (CSFLE) to encrypt sensitive data. The DEK is itself encrypted using a key encryption key (KEK) that is only accessible to authorized users. The encrypted DEK and encrypted data are stored together. Only users with access to the KEK can decrypt the DEK and access the sensitive data.
**Related terms**: *envelope encryption*, *key encryption key (KEK)*

data in motion : Data in motion is data that is actively being transferred between source and destination, typically systems, devices, or networks. Data in motion is also referred to as data in transit or data in flight.

data in use : Data in use is data that is actively being processed or manipulated in memory (RAM, CPU caches, or CPU registers).

data ingestion : Data ingestion is the process of collecting, importing, and integrating data from various sources into a system for further processing, analysis, or storage.

data mapping : Data mapping is the process of defining relationships or associations between source data elements and target data elements. Data mapping is an important process in data integration, data migration, and data transformation, ensuring that data is accurately and consistently represented when it is moved or combined.

data pipeline : A data pipeline is a series of processes and systems that enable the flow of data from sources to destinations, automating the movement and transformation of data for various purposes, such as analytics, reporting, or machine learning. A data pipeline is typically comprised of a source system, a data ingestion tool, a data transformation tool, and a target system. A data pipeline covers the following stages: data extraction, data transformation, data loading, and data validation.

Data Portal : Data Portal is a Confluent Cloud application that uses Stream Catalog and Stream Lineage to provide self-service access throughout Confluent Cloud Console for data practitioners to search and discover existing topics using tags and business metadata, request access to topics and data, and access data in topics to build streaming applications and data pipelines. Data Portal provides a data-centric view of Confluent optimized for self-service access, where users can search, discover, and understand available data, request access to data, and use data.
**Related terms**: *Stream Catalog*, *Stream Lineage*
**Related content**
- [Data Portal on Confluent Cloud](/cloud/current/stream-governance/data-portal.html)

data serialization : Data serialization is the process of converting data structures or objects into a format that can be stored or transmitted, and reconstructed later in the same or another computer environment. Data serialization is a common technique for implementing data persistence, interprocess communication, and object communication.
Confluent Schema Registry (in Confluent Platform) and Confluent Cloud Schema Registry support data serialization using serializers and deserializers for the following formats: Avro, JSON Schema, and Protobuf.
**Related content**
- Confluent Cloud: [Formats, Serializers, and Deserializers](/cloud/current/sr/fundamentals/serdes-develop/index.html)
- Confluent Platform: [Formats, Serializers, and Deserializers](/platform/current/schema-registry/fundamentals/serdes-develop/index.html)

data steward : A data steward is a person with data-related responsibilities, such as data governance, data quality, and data security.

data stream : A data stream is a continuous flow of data records that are produced and consumed by applications.

dead letter queue (DLQ) : A dead letter queue (DLQ) is a queue where messages that could not be processed successfully by a sink connector are placed. Instead of stopping, the sink connector sends messages that could not be written successfully as event records to the DLQ topic while the sink connector continues processing messages.

Dedicated Kafka cluster : A Confluent Cloud cluster type. Dedicated Kafka clusters are designed for critical production workloads with high traffic or private networking requirements.

deserializer : A deserializer is a tool that converts a serial byte stream back into objects and parallel data. Deserializers work with serializers (known together as Serdes) to support efficient storage and high-speed data transmission over the wire. Confluent provides Serdes for schemas in Avro, Protobuf, and JSON Schema formats.
**Related content**
- Confluent Cloud: [Formats, Serializers, and Deserializers](/cloud/current/sr/fundamentals/serdes-develop/index.html)
- Confluent Platform: [Formats, Serializers, and Deserializers](/platform/current/schema-registry/fundamentals/serdes-develop/index.html)

egress : In general networking, egress refers to outbound traffic leaving a network or a specific network segment. In Confluent Cloud, egress is a Kafka cluster billing dimension that defines the number of bytes consumed from the cluster in one second. Available in the Metrics API as `sent_bytes` (convert from bytes to MB). To reduce egress in Confluent Cloud, compress your messages and ensure each consumer is only consuming from the topics it requires. For compression, use lz4. Avoid gzip because of high overhead on the cluster.

Elastic Confluent Unit for Kafka (eCKU) : Elastic Confluent Unit for Kafka (eCKU) is used to express capacity for Basic, Standard, Enterprise, and Freight Kafka clusters. These clusters automatically scale up to a fixed ceiling. There is no need to resize these types of clusters. When you need more capacity, your cluster expands up to the fixed ceiling. If you’re not using capacity above the minimum, you’re not paying for it.

ELT : ELT is an acronym for Extract-Load-Transform, where data is extracted from a source system and loaded into a target system before processing or transformation. Compared to ETL, ELT is a more flexible approach to data ingestion because the data is loaded into the target system before transformation.

Enterprise Kafka cluster : A Confluent Cloud cluster type. Enterprise Kafka clusters are designed for production-ready functionality that requires private endpoint networking capabilities.

envelope encryption : Envelope encryption is a cryptographic technique that uses two keys to encrypt data. The symmetric data encryption key (DEK) is used to encrypt sensitive data.
The separate asymmetric key encryption key (KEK) is the master key used to encrypt the DEK. The DEK and encrypted data are stored together. Only users with access to the KEK can decrypt the DEK and access the sensitive data. In Confluent Cloud, envelope encryption is used to enable client-side field level encryption (CSFLE). CSFLE encrypts sensitive data in a message before it is sent to Confluent Cloud and allows for temporary decryption of sensitive data when required to perform operations on the data.
**Related terms**: *data encryption key (DEK)*, *key encryption key (KEK)*

ETL : ETL is an acronym for Extract-Transform-Load, where data is extracted from a source system, transformed into a target format, and loaded into a target system. Compared to ELT, ETL is a more rigid approach to data ingestion because the data is transformed before loading into the target system.

event : An event is a meaningful action or occurrence of something that happened. Events that can be recognized by a program, either human-generated or triggered by software, can be recorded in a log file or other data store.
**Related terms**: *event message*, *event record*, *event sink*, *event source*, *event stream*, *event streaming*, *event streaming platform*, *event time*

event message : An event message is a record of an event sent to a Kafka topic, represented as a key-value pair. Each event message consists of a key-value pair, a timestamp, the compression type, headers for metadata (optional), and a partition and offset ID (once the message is written). The key is optional and can be used to identify the event. The value is required and contains details about the event that happened.
**Related terms**: *event*, *event record*, *event sink*, *event source*, *event stream*, *event streaming*, *event streaming platform*, *event time*

event record : An event record is the record of an event stored in a Kafka topic. Event records are organized and durably stored in topics. Examples of events include orders, payments, activities, or measurements. An event typically contains one or more data fields that describe the fact, as well as a timestamp that denotes when the event was created by its event source. The event may also contain various metadata, such as its source of origin (for example, the application or cloud service that created the event) and storage-level information (for example, its position in the event stream).
**Related terms**: *event*, *event message*, *event sink*, *event source*, *event stream*, *event streaming*, *event streaming platform*, *event time*

event sink : An event sink is a consumer of events, which can include applications, cloud services, databases, IoT sensors, and more.
**Related terms**: *event*, *event message*, *event record*, *event source*, *event stream*, *event streaming*, *event streaming platform*, *event time*

event source : An event source is a producer of events, which can include cloud services, databases, IoT sensors, mainframes, and more.
**Related terms**: *event*, *event message*, *event record*, *event sink*, *event stream*, *event streaming*, *event streaming platform*, *event time*

event stream : An event stream is a continuous flow of event messages produced by an event source and consumed by one or more consumers.
**Related terms**: *event*, *event message*, *event record*, *event sink*, *event source*, *event streaming*, *event streaming platform*, *event time*

event streaming : Event streaming is the practice of capturing event data in real-time from data sources. Event streaming is a form of data streaming that is used to capture, store, process, and react to data in real-time or retrospectively.
**Related terms**: *event*, *event message*, *event record*, *event sink*, *event source*, *event streaming platform*, *event time*

event streaming platform : An event streaming platform is a platform to which events can be written once, allowing distributed functions within an organization to react in real time.
**Related terms**: *event*, *event message*, *event record*, *event sink*, *event source*, *event streaming*, *event time*

event time : Event time is the time when an event occurred on the producing device, as opposed to the time when the event was processed or recorded. Event time is often used in stream processing to determine the order of events and to perform windowing operations.
**Related terms**: *event*, *event message*, *event record*, *event sink*, *event source*, *event streaming*, *event streaming platform*

exactly-once semantics : Exactly-once semantics is a guarantee that a message is delivered exactly once and in the order that it was sent, even if a producer retries sending a message or a consumer retries processing a message. Kafka achieves this guarantee with idempotent producers, which attach a producer ID and sequence number to each message so that the broker can discard duplicates caused by retries, and with transactions, which commit produced messages and consumer offsets atomically so that reprocessing after a failure does not produce duplicate results for consumers reading committed data.

Freight Kafka cluster : A Confluent Cloud cluster type. Freight Kafka clusters are designed for high-throughput, relaxed latency workloads that are less expensive than self-managed open source Kafka.

granularity : Granularity is the degree or level of detail to which an entity (a system, service, or resource) is broken down into subcomponents, parts, or elements. Entities that are *fine-grained* have a higher level of detail, while *coarse-grained* entities have a reduced level of detail, often combining finer parts into a larger whole. In the context of access control, granular permissions provide precise control over resource access. They allow administrators to grant specific operations on distinct resources. This ensures users only have permissions tailored to their needs, minimizing unnecessary or potentially risky access.

group mapping : Group mapping is a set of rules that map groups in your SSO identity provider to Confluent Cloud RBAC roles. When a user signs in to Confluent Cloud using SSO, Confluent Cloud uses the group mapping to grant access to Confluent Cloud resources.
**Related terms**: *identity provider*, *identity pool*, *principal*, *role*
**Related content**
- [Group Mapping for Confluent Cloud](/cloud/current/access-management/authenticate/sso/group-mapping/overview.html)

identity : An identity is a unique identifier that is used to authenticate and authorize users and applications to access resources. Identity is often used in conjunction with access control to determine whether a user or application is allowed to access a resource and perform a specific action or operation on that resource.
**Related terms**: *identity provider*, *identity pool*, *principal*, *role*

identity pool : An identity pool is a collection of identities that can be used to authenticate and authorize users and applications to access resources. Identity pools are used to manage permissions for users and applications that access resources in Confluent Cloud. They are also used to manage permissions for Confluent Cloud service accounts that are used to access resources in Confluent Cloud.

identity provider : An identity provider is a trusted provider that authenticates users and issues security tokens that are used to verify the identity of a user. Identity providers are often used in single sign-on (SSO) scenarios, where a user can log in to multiple applications or services with a single set of credentials.

Infinite Storage : Infinite Storage is the Confluent Cloud storage service that enhances the scalability of Confluent Cloud resources by separating storage and processing. Tiered storage within Confluent Cloud moves data between storage layers based on the needs of the workload, retrieves tiered data when requested, and garbage collects data that is past retention or otherwise deleted. If an application reads historical data, latency is not increased for other applications reading more recent data. Storage resources are decoupled from compute resources, so you only pay for what you produce to Confluent Cloud and for the storage that you use, and CKUs do not have storage limits.
**Related content**
- [Infinite Storage in Confluent Cloud for Apache Kafka](https://www.confluent.io/blog/infinite-kafka-data-storage-in-confluent-cloud/)

ingress : In general networking, ingress refers to traffic that enters a network from an external source. In Confluent Cloud, ingress is a Kafka cluster billing dimension that defines the number of bytes produced to the cluster in one second. Available in the Metrics API as `received_bytes` (convert from bytes to MB). To reduce ingress in Confluent Cloud, compress your messages. For compression, use lz4. Avoid gzip because of high overhead on the cluster.

internal topic : An internal topic is a topic, prefixed with double underscores (`__`), that is automatically created by a Kafka component to store metadata about the broker, partition assignment, consumer offsets, and other information. Examples of internal topics: `__cluster_metadata`, `__consumer_offsets`, `__transaction_state`, `__confluent.support.metrics`, and `__confluent.support.metrics-raw`.

JSON Schema : JSON Schema is a declarative language used for data serialization and exchange to define data structures, specify formats, and validate JSON documents. It is a way to encode expected data types, properties, and constraints to ensure that all fields are properly described for use with serializers and deserializers.
**Related terms**: *data serialization*, *deserializer*, *serializer*
**Related content**
- [JSON Schema - a declarative language that allows you to annotate and validate JSON documents.](https://json-schema.org/)
- Confluent Cloud: [JSON Schema Serializer and Deserializer](/cloud/current/sr/fundamentals/serdes-develop/serdes-json.html)
- Confluent Platform: [JSON Schema Serializer and Deserializer](/platform/current/schema-registry/fundamentals/serdes-develop/serdes-json.html)

Kafka bootstrap server : A Kafka bootstrap server is a Kafka broker that a Kafka client initially connects to in order to reach a Kafka cluster; the broker returns metadata, which includes the addresses for all of the brokers in the Kafka cluster.
Although only one bootstrap server is required to connect to a Kafka cluster, multiple brokers can be specified in a bootstrap server list to provide high availability and fault tolerance in case a broker is unavailable. In Confluent Cloud, the bootstrap server is the general cluster endpoint.

Kafka broker : A Kafka broker is a server in the Kafka storage layer that stores event streams from one or more sources. A Kafka cluster is typically comprised of several brokers. Every broker in a cluster is also a bootstrap server, meaning if you can connect to one broker in a cluster, you can connect to every broker.

Kafka client : A Kafka client allows you to write distributed applications and microservices that read, write, and process streams of events in parallel, at scale, and in a fault-tolerant manner, even in the case of network problems or machine failures. The Kafka client library provides functions, classes, and utilities that allow developers to create Kafka producer clients (Producers) and consumer clients (Consumers) using various programming languages. The primary way to build production-ready Producers and Consumers is by using your preferred programming language and a Kafka client library.
**Related content**
- [Build Client Applications for Confluent Cloud](/cloud/current/client-apps/overview.html)
- [Build Client Applications for Confluent Platform](/platform/current/clients/index.html)
- [Getting Started with Apache Kafka and Java (or Python, Go, .Net, and others)](https://developer.confluent.io/get-started/java/)

Kafka cluster : A Kafka cluster is a group of interconnected Kafka brokers that manage and distribute real-time data streaming, processing, and storage as if they are a single system. By distributing tasks and services across multiple Kafka brokers, the Kafka cluster improves availability, reliability, and performance.

Kafka Connect : Kafka Connect is the component of Kafka that provides data integration between databases, key-value stores, search indexes, file systems, and Kafka brokers. Kafka Connect is an ecosystem of a client application and pluggable connectors. As a client application, Connect is a server process that runs on hardware independent of the Kafka brokers themselves. It is scalable and fault-tolerant, meaning you can run a cluster of Connect workers that share the load of moving data in and out of Kafka from and to external systems. Connect also abstracts the coding work away from the user and instead requires only JSON configuration to run.
**Related content**
- Confluent Cloud: [Kafka Connect](/cloud/current/billing/overview.html#kconnect-long)
- Confluent Platform: [Kafka Connect](/platform/current/connect/index.html)

Kafka controller : A Kafka controller is the node in a Kafka cluster that is responsible for managing and changing the metadata of the cluster. This node also communicates metadata changes to the rest of the cluster. When Kafka uses ZooKeeper for metadata management, the controller is a broker, and the broker persists the metadata to ZooKeeper for backup and recovery. With KRaft, you dedicate Kafka nodes to operate as controllers and the metadata is stored in Kafka itself and not persisted to ZooKeeper. KRaft enables faster recovery because of this. For more information, see [KRaft overview](/platform/current/kafka-metadata/kraft.html).

Kafka listener : A Kafka listener is an endpoint that Kafka brokers bind to and use to communicate with clients.
For Kafka clusters, Kafka listeners are configured in the `listeners` property of the `server.properties` file. Advertised listeners are publicly accessible endpoints that are used by clients to connect to the Kafka cluster.
**Related content**
- [Kafka Listeners – Explained](https://www.confluent.io/blog/kafka-listeners-explained/)

Kafka metadata : Kafka metadata is the information about the Kafka cluster and the topics that are stored in it. This information includes details such as the brokers in the cluster, the topics that are available, the partitions for each topic, and the location of the leader for each partition. Kafka metadata is used by clients to discover the available brokers and topics, and to determine which broker is the leader for a particular partition. This information is essential for clients to be able to send and receive messages to and from Kafka.

Kafka Streams : Kafka Streams is a stream processing library for building streaming applications and microservices that transform (filter, group, aggregate, join, and more) incoming event streams in real-time to Kafka topics stored in a Kafka cluster. The Streams API can be used to build applications that process data in real-time, analyze data continuously, and build data pipelines. (A minimal sketch appears after the ksqlDB entry below.)
**Related content**
- [Kafka Streams](/platform/current/streams/overview.html)

Kafka topic : See *topic*.

key encryption key (KEK) : A key encryption key (KEK) is a master key that is used to encrypt and decrypt other keys, specifically the data encryption key (DEK). Only users with access to the KEK can decrypt the DEK and access the sensitive data.
**Related terms**: *data encryption key (DEK)*, *envelope encryption*

Kora : Kora is the cloud-native streaming data service based on Kafka technology that powers the Confluent Cloud event streaming platform for building real-time data pipelines and streaming applications. Kora abstracts low-level resources, such as Kafka brokers, and hides operational complexities, such as system upgrades. Kora is built on the following foundations: a tiered storage layer that improves cost and performance, elasticity and consistent performance through incremental load balancing, cost-effective multi-tenancy with dynamic quota management and cell-based isolation, continuous monitoring of both system health and data integrity, and clean abstraction with standard Kafka protocols and CKUs to hide underlying resources.
**Related terms**: *Apache Kafka*, *Confluent Cloud*, *Confluent Unit for Kafka (CKU)*
**Related content**
- [Kora: The Cloud Native Engine for Apache Kafka](https://www.confluent.io/blog/cloud-native-data-streaming-kafka-engine/)
- [Kora: A Cloud-Native Event Streaming Platform For Kafka](https://www.vldb.org/pvldb/vol16/p3822-povzner.pdf)

KRaft : KRaft (or Apache Kafka Raft) is a consensus protocol introduced in Kafka 2.8 to provide metadata management for Kafka with the goal of replacing ZooKeeper. KRaft simplifies Kafka because it enables the management of metadata in Kafka itself, rather than splitting it between ZooKeeper and Kafka. As of Confluent Platform 7.5, KRaft is the default method of metadata management in new deployments. For more information, see [KRaft overview](/platform/current/kafka-metadata/kraft.html).

ksqlDB : ksqlDB is a streaming SQL database engine purpose-built for creating stream processing applications on top of Kafka.
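To make the Kafka Streams and ksqlDB entries above more concrete, the following is a minimal sketch of a Kafka Streams topology in Java. The topic names (`orders`, `large-orders`), the bootstrap address, and the filter predicate are illustrative assumptions, not values from this documentation; the same filtering logic could also be expressed declaratively in ksqlDB with a `CREATE STREAM ... AS SELECT` statement.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class LargeOrderFilter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "large-order-filter"); // also used as the consumer group id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumed broker address
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Read the hypothetical "orders" topic, keep a subset of records, and write them to "large-orders".
        KStream<String, String> orders = builder.stream("orders");
        orders.filter((orderId, value) -> value != null && value.length() > 100) // placeholder predicate
              .to("large-orders");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

Running a sketch like this requires the `kafka-streams` library on the classpath and the input topic to exist; a real application would deserialize the order value and filter on its fields rather than on string length.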
logical Kafka cluster (LKC) : A logical Kafka cluster (LKC) is a subset of a physical Kafka cluster (PKC) that is isolated from other logical clusters within Confluent Cloud. Each logical unit of isolation is considered a tenant and maps to a specific organization. If the mapping is one-to-one, one LKC maps to one PKC (a Dedicated cluster). If the mapping is many-to-one, one LKC maps to one of the multitenant Kafka cluster types (Basic, Standard, Enterprise, and Freight).
**Related terms**: *Confluent Cloud*, *Kafka cluster*, *physical Kafka cluster (PKC)*
**Related content**
- [Kafka Cluster Types in Confluent Cloud](/cloud/current/clusters/cluster-types.html)
- [Multi-tenancy and Client Quotas on Confluent Cloud](/cloud/current/clusters/client-quotas.html)

multi-region cluster (MRC) : A multi-region cluster (MRC) is a single Kafka cluster that replicates data between datacenters across regional availability zones.

multi-tenancy : Multi-tenancy is a software architecture in which a single physical instance is shared among multiple logical instances, or tenants. In Confluent Cloud, each Basic, Standard, Enterprise, and Freight cluster is a logical Kafka cluster (LKC) that shares a physical Kafka cluster (PKC) with other tenants. Each LKC is isolated from other LKCs and has its own resources, such as memory, compute, and storage.
**Related terms**: *Confluent Cloud*, *logical Kafka cluster (LKC)*, *physical Kafka cluster (PKC)*
**Related content**
- [From On-Prem to Cloud-Native: Multi-Tenancy in Confluent Cloud](https://www.confluent.io/blog/cloud-native-multi-tenant-kafka-with-confluent-cloud/)
- [Multi-tenancy and Client Quotas on Confluent Cloud](/cloud/current/clusters/client-quotas.html)

offset : An offset is an integer assigned to each message that uniquely represents its position within the data stream, guaranteeing the ordering of records and allowing offset-based connections to replay messages from any point in time.
**Related terms**: *consumer offset*, *connector offset*, *offset commit*, *replayability*

offset commit : An offset commit is the process of keeping track of the current position of an offset-based connection (primarily Kafka consumers and connectors) within the data stream. The offset commit process is not specific to consumers, producers, or connectors. It is a general mechanism in Kafka to track the position of any application that is reading data. When a consumer commits an offset, the offset identifies the next message the consumer should consume. For example, if a consumer has an offset of 5, it has consumed messages 0 through 4 and will next consume message 5. If the consumer crashes or is shut down, its partitions are reassigned to another consumer, which resumes consuming from the last committed offset of each partition. The committed offset for consumers is stored on a Kafka broker. When a consumer commits an offset, it sends a commit request to the Kafka cluster, specifying the partition and offset it wants to commit for a particular consumer group. The Kafka broker receiving the commit request then stores this offset in the `__consumer_offsets` internal topic.
**Related terms**: *consumer offset*, *offset*

OpenSSL : OpenSSL is an open-source software library and toolkit that implements the Secure Sockets Layer (SSL) and Transport Layer Security (TLS) protocols. For more information, see [Wikipedia](https://en.wikipedia.org/wiki/OpenSSL).

parent cluster : The Kafka cluster that a resource belongs to.
**Related terms**: *Kafka cluster*

partition : A partition is a unit of data storage that divides a topic into multiple, parallel event streams, each of which is stored on separate Kafka brokers and can be consumed independently. Partitioning is a key concept in Kafka because it allows Kafka to scale horizontally by adding more brokers to the cluster. Partitions are also the unit of parallelism in Kafka. A topic can have one or more partitions, and each partition is an ordered, immutable sequence of event records that is continually appended to a partition log.

partitions (pre-replication) : In Confluent Cloud, partitions are a Kafka cluster billing dimension that define the maximum number of partitions that can exist on the cluster at one time, before replication. While you are not charged for partitions on any type of Kafka cluster, the number of partitions you use has an impact on eCKU. To determine eCKU limits for partitions, Confluent Cloud bills only for pre-replication (leader partitions) across a cluster. All topics that you create (as well as internal topics that are automatically created by Confluent Platform components such as ksqlDB, Kafka Streams, Connect, and Control Center (Legacy)) count towards the cluster partition limit. Confluent prefixes topics created automatically with an underscore (_). Topics that are internal to Kafka itself (consumer offsets) are not visible in Cloud Console and do not count against partition limits or toward partition billing. Available in the Metrics API as `partition_count`. In Confluent Cloud, attempts to create additional partitions beyond the cluster limit fail with an error message. To reduce usage on partitions (pre-replication), delete unused topics and create new topics with fewer partitions. Use the Kafka Admin interface to increase the partition count of an existing topic if the initial partition count is too low.

physical Kafka cluster (PKC) : A physical Kafka cluster (PKC) is a Kafka cluster comprised of multiple brokers. Each physical Kafka cluster is created on a Kubernetes cluster by the control plane. A PKC is not directly accessible by clients.

principal : A principal is an entity that can be authenticated and granted permissions based on roles to access resources and perform operations. An entity can be a user account, service account, group mapping, or identity pool.
**Related terms**: *group mapping*, *identity*, *identity pool*, *role*, *service account*, *user account*

private internet : A private internet is a closed, restricted computer network typically used by organizations to provide secure environments for managing sensitive data and resources.

processing time : Processing time is the time when an event is processed or recorded by a system, as opposed to the time when the event occurred on the producing device. Processing time is often used in stream processing to determine the order of events and to perform windowing operations.

producer : A producer is a client application that publishes (writes) data to a topic in a Kafka cluster. Producers write data to a topic and are the only clients that can write data to a topic. Each record written to a topic is appended to the partition of the topic that is selected by the producer.

Producer API : The Producer API is the Kafka API that allows you to write data to a topic in a Kafka cluster. The Producer API is used by producer clients to publish data to a topic in a Kafka cluster.
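As a minimal illustration of the producer and Producer API entries above, the following Java sketch sends a single record with the standard `KafkaProducer` client. The bootstrap address, the `orders` topic, and the record contents are illustrative assumptions.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class SimpleProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The record key influences which partition of the topic the record is appended to.
            ProducerRecord<String, String> record =
                new ProducerRecord<>("orders", "order-42", "{\"amount_usd\": 99.95}");
            producer.send(record, (metadata, exception) -> {
                if (exception != null) {
                    exception.printStackTrace();
                } else {
                    System.out.printf("Wrote to %s-%d at offset %d%n",
                        metadata.topic(), metadata.partition(), metadata.offset());
                }
            });
            producer.flush(); // block until outstanding records are acknowledged
        }
    }
}
```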
Protobuf : Protobuf (or Protocol Buffers) is an open-source data format used to serialize structured data for storage.
**Related terms**: *data serialization*, *deserializer*, *serializer*
**Related content**
- [Protocol Buffers](https://protobuf.dev/)
- [Getting Started with Protobuf in Confluent Cloud](https://www.confluent.io/blog/using-protobuf-in-confluent-cloud/)
- Confluent Cloud: [Protobuf Serializer and Deserializer](/cloud/current/sr/fundamentals/serdes-develop/serdes-protobuf.html)
- Confluent Platform: [Protobuf Serializer and Deserializer](/platform/current/schema-registry/fundamentals/serdes-develop/serdes-protobuf.html)

public internet : The public internet is the global system of interconnected computers and networks that use TCP/IP to communicate with each other.

rebalancing : Rebalancing is the process of redistributing the partitions of a topic among the consumers of a consumer group for improved performance and scalability. A rebalance can occur if a consumer has failed its heartbeats and has been excluded from the group, has voluntarily left the group, has had its metadata updated, or has joined the group.

replayability : Replayability is the ability to replay messages from any point in time.
**Related terms**: *consumer offset*, *offset*, *offset commit*

replication : Replication is the process of creating and maintaining multiple copies (or *replicas*) of data across different nodes in a distributed system to increase availability, reliability, redundancy, and accessibility.

replication factor : A replication factor is the number of copies of a partition that are distributed across the brokers in a cluster.

requests : In Confluent Cloud, requests are a Kafka cluster billing dimension that defines the number of client requests to the cluster in one second. Available in the Metrics API as `request_count`. To reduce usage on requests, you can adjust producer batching configurations, adjust consumer client batching configurations, and shut down otherwise inactive clients. For Dedicated clusters, a high number of requests per second results in increased load on the cluster.

role : A role is a Confluent-defined job function that is assigned the set of permissions required to perform specific actions or operations on Confluent resources. A role is granted to a principal on Confluent resources through a role binding, and can be assigned to a user account, group mapping, service account, or identity pool.
**Related terms**: *group mapping*, *identity*, *identity pool*, *principal*, *service account*
**Related content**
- [Predefined RBAC Roles in Confluent Cloud](/cloud/current/access-management/access-control/rbac/predefined-rbac-roles.html)
- [Role-Based Access Control Predefined Roles in Confluent Platform](/platform/current/security/rbac/rbac-predefined-roles.html)

rolling restart : A rolling restart restarts the brokers in a Kafka cluster with zero downtime by incrementally restarting a Kafka broker after verifying that there are no under-replicated partitions on the broker before proceeding to the next broker. Restarting the brokers one at a time allows for software upgrades, broker configuration updates, or cluster maintenance while maintaining high availability by avoiding downtime.
**Related content**
- [Rolling restart](/platform/current/kafka/post-deployment.html#rolling-restart)

schema : A schema is the structured definition or blueprint used to describe the format and structure of event messages sent through the Kafka event streaming platform.
Schemas are used to validate the structure of data in event messages and ensure that producers and consumers are sending and receiving data in the same format. Schemas are defined in the Schema Registry.

Schema Registry : Schema Registry is a centralized repository for managing and validating schemas for topic message data. Schema Registry is built into Confluent Cloud as a managed service, available with the Advanced Stream Governance package, and offered as part of Confluent Enterprise for self-managed deployments. Schema Registry is a RESTful service that stores and manages schemas for Kafka topics and is integrated with Kafka and Connect to provide a central location for managing schemas and validating data. Producers and consumers of Kafka topics use schemas to ensure data consistency and compatibility as schemas evolve. Schema Registry is a key component of Stream Governance.
**Related content**
- Confluent Cloud: [Manage Schemas in Confluent Cloud](/cloud/current/sr/schemas-manage.html)
- Confluent Platform: [Schema Registry Overview](/platform/current/schema-registry/index.html)

schema subject : A schema subject is the namespace for a schema in Schema Registry. This unique identifier defines a logical grouping of related schemas. Kafka topics contain event messages serialized and deserialized using the structure and rules defined in a schema subject. This ensures compatibility and supports schema evolution.
**Related content**
- Confluent Cloud: [Manage Schemas in Confluent Cloud](/cloud/current/sr/schemas-manage.html)
- Confluent Platform: [Schema Registry Concepts](/platform/current/schema-registry/index.html)
- [Understanding Schema Subjects](https://developer.confluent.io/courses/schema-registry/schema-subjects/)

Serdes : Serdes are serializers and deserializers that convert objects and parallel data into a serial byte stream for efficient storage and high-speed data transmission over the wire. Confluent provides Serdes for schemas in Avro, Protobuf, and JSON Schema formats.
**Related content**
- Confluent Cloud: [Formats, Serializers, and Deserializers](/cloud/current/sr/fundamentals/serdes-develop/index.html)
- Confluent Platform: [Formats, Serializers, and Deserializers](/platform/current/schema-registry/fundamentals/serdes-develop/index.html)
- [Serde](https://serde.rs/)

serializer : A serializer is a tool that converts objects and parallel data into a serial byte stream. Serializers work with deserializers (known together as Serdes) to support efficient storage and high-speed data transmission over the wire. Confluent provides serializers for schemas in Avro, Protobuf, and JSON Schema formats.
**Related content**
- Confluent Cloud: [Formats, Serializers, and Deserializers](/cloud/current/sr/fundamentals/serdes-develop/index.html)
- Confluent Platform: [Formats, Serializers, and Deserializers](/platform/current/schema-registry/fundamentals/serdes-develop/index.html)

service account : A service account is a non-person entity used by an application or service to access resources and perform operations. Because a service account is an identity independent of the user who created it, it can be used programmatically to authenticate to resources and perform operations without the need for a user to be signed in.
**Related content**
- [Service Accounts for Confluent Cloud](/cloud/current/access-management/identity/service-accounts.html)

service quota : A service quota is the limit, or maximum value, for a specific Confluent Cloud resource or operation that might vary by the resource scope it applies to.
**Related content**
- [Service Quotas for Confluent Cloud](/cloud/current/quotas/index.html)

single message transform (SMT) : A single message transform (SMT) is a transformation or operation applied in real-time on an individual message that changes the values, keys, or headers of a message before being sent to a sink connector or after being read from a source connector. SMTs are convenient for inserting fields, masking information, event routing, and other minor data adjustments.

single sign-on (SSO) : Single sign-on (SSO) is a centralized authentication service that allows users to use a single set of credentials to log in to multiple applications or services.
**Related terms**: *authentication*, *group mapping*, *identity provider*
**Related content**
- [Single Sign-On for Confluent Cloud](/cloud/current/access-management/authenticate/sso/index.html)

sink connector : A sink connector is a Kafka Connect connector that publishes (writes) data from a Kafka topic to an external system.

source connector : A source connector is a Kafka Connect connector that subscribes to (reads) data from a source (external system), extracts the payload and schema of the data, and publishes (writes) the data to Kafka topics.

standalone : Standalone refers to a configuration in which a software application, system, or service operates independently on a single instance or device. This mode is commonly used for development, testing, and debugging purposes. For Kafka Connect, a standalone worker is a single process responsible for running all connectors and tasks on a single instance.

Standard Kafka cluster : A Confluent Cloud cluster type. Standard Kafka clusters are designed for production-ready features and functionality.

static egress IP address : A static egress IP address is an IP address used by a Confluent Cloud managed connector to establish outbound connections to endpoints of external data sources and sinks over the public internet.
**Related content**
- [Use Static IP Addresses on Confluent Cloud for Connectors and Cluster Linking](/cloud/current/networking/static-egress-ip-addresses.html)
- [Static Egress IP Addresses for Confluent Cloud Connectors](/cloud/current/connectors/static-egress-ip.html)

storage (pre-replication) : In Confluent Cloud, storage is a Kafka cluster billing dimension that defines the number of bytes retained on the cluster, pre-replication. Available in the Metrics API as `retained_bytes` (convert from bytes to TB). The returned value is pre-replication. Standard, Enterprise, Dedicated, and Freight clusters support Infinite Storage. This means there is no maximum size limit for the amount of data that can be stored on the cluster. You can configure policy settings `retention.bytes` and `retention.ms` at the topic level to control exactly how much and how long to retain data in a way that makes sense for your applications and helps control your costs. To reduce storage in Confluent Cloud, compress your messages and reduce retention settings. For compression, use lz4. Avoid gzip because of high overhead on the cluster.
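As a hedged illustration of setting `retention.bytes` and `retention.ms` at the topic level (mentioned in the storage entry above), the following Java sketch creates a topic with those configs through the Kafka Admin API. The topic name, partition count, replication factor, and retention values are illustrative assumptions.

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicWithRetention {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address

        try (Admin admin = Admin.create(props)) {
            // 6 partitions, replication factor 3; retain at most ~1 GiB or 7 days of data per partition.
            NewTopic topic = new NewTopic("clickstream", 6, (short) 3)
                .configs(Map.of(
                    "retention.bytes", "1073741824",
                    "retention.ms", "604800000"));
            admin.createTopics(List.of(topic)).all().get(); // block until the topic is created
        }
    }
}
```

The same settings can also be changed on an existing topic through the Admin API's alter-config operations, the Confluent CLI, or Cloud Console; the values above are examples only.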
Stream Catalog : Stream Catalog is a pillar of Confluent Cloud Stream Governance that provides a centralized inventory of your organization’s data assets that supports data governance and data discovery. With Data Portal in Confluent Cloud Console, users can find event streams across systems, search topics by name or tags, and enrich event data to increase value and usefulness. REST and GraphQL APIs can be used to search schemas, apply tags to records or fields, manage business metadata, and discover relationships across data assets.
**Related content**
- [Stream Catalog on Confluent Cloud: User Guide to Manage Tags and Metadata](/cloud/current/stream-governance/stream-catalog.html)
- [Stream Catalog in Streaming Data Governance (Confluent Developer course)](https://developer.confluent.io/courses/governing-data-streams/stream-catalog/)

Stream Governance : Stream Governance is a collection of tools and features that provide data governance for data in motion. These include data quality tools such as Schema Registry, schema ID validation, and schema linking; built-in data catalog capabilities to classify, organize, and find event streams across systems; and stream lineage to visualize complex data relationships and uncover insights with interactive, end-to-end maps of event streams. Taken together, these and other governance tools enable teams to manage the availability, integrity, and security of data used across organizations, and help with standardization, monitoring, collaboration, reporting, and more.
**Related terms**: *Stream Catalog*, *Stream Lineage*
**Related content**
- [Stream Governance on Confluent Cloud](/cloud/current/stream-governance/index.html)

stream lineage : Stream lineage is the life cycle, or history, of data, including its origins, transformations, and consumption, as it moves through various stages in data pipelines, applications, and systems. Stream lineage provides a record of data’s journey from its source to its destination, and is used to track data quality, data governance, and data security.
**Related terms**: *Data Portal*, *Stream Governance*
**Related content**
- [Stream Lineage on Confluent Cloud](/cloud/current/stream-governance/stream-lineage.html)

stream processing : Stream processing is the method of collecting event stream data in real-time as it arrives, transforming the data in real-time using operations (such as filters, joins, and aggregations), and publishing the results to one or more target systems. Stream processing can be used to analyze data continuously, build data pipelines, and process time-sensitive data in real-time. Using the Confluent event streaming platform, event streams can be processed in real-time using Kafka Streams, Kafka Connect, or ksqlDB.

Streams API : The Streams API is the Kafka API that allows you to build streaming applications and microservices that transform (for example, filter, group, aggregate, join) incoming event streams in real-time to Kafka topics stored in a Kafka cluster. The Streams API is used by stream processing clients to process data in real-time, analyze data continuously, and build data pipelines.
**Related content**
- [Introduction to the Kafka Streams API](/platform/current/streams/introduction.html)

throttling : Throttling is the process Kafka clusters in Confluent Cloud use to protect themselves from reaching an over-utilized state. Also known as backpressure, throttling in Confluent Cloud occurs when cluster load reaches 80%.
At this point, applications may start seeing higher latencies or timeouts as the cluster must begin throttling requests or connection attempts.

topic : A topic is a user-defined category or feed name where event messages are stored and published by producers and subscribed to by consumers. Each topic is a log of event messages. Topics are stored in one or more partitions, which distribute topic records across brokers in a Kafka cluster. Each partition is an ordered, immutable sequence of records to which new records are continually appended.
**Related content**
- [Manage Topics in Confluent Cloud](/cloud/current/client-apps/topics/index.html)

total client connections : In Confluent Cloud, total client connections are a Kafka cluster billing dimension that defines the number of TCP connections to the cluster you can open at one time. Available in the Metrics API as `active_connection_count`. Filter by principal to understand how many connections each application is creating. How many connections a cluster supports can vary widely based on several factors, including number of producer clients, number of consumer clients, partition keying strategy, produce patterns per client, and consume patterns per client. For Dedicated clusters, Confluent derives a guideline for total client connections from benchmarking that indicates exceeding this number of connections increases produce latency for test clients. However, this does not apply to all workloads. That is why total client connections are a guideline, not a hard limit, for Dedicated Kafka clusters. Monitor the impact on cluster load as connection count increases, as this is the final representation of the impact of a given workload or CKU dimension on the underlying resources of the cluster. Consider the Confluent guideline a per-CKU guideline. The number of connections tends to increase when you add brokers. In other words, if you significantly exceed the per-CKU guideline, cluster expansion doesn’t always give your cluster more connection count headroom.

Transport Layer Security (TLS) : Transport Layer Security (TLS) is a cryptographic protocol that provides secure communication over a network. For more information, see [Wikipedia](https://en.wikipedia.org/wiki/Transport_Layer_Security).

unbounded stream : An unbounded stream is a stream of data that is continuously generated in real-time and has no defined end. Examples of unbounded streams include stock prices, sensor data, and social media feeds. Processing unbounded streams requires a different approach than processing bounded streams. Unbounded streams are processed incrementally as data arrives, while bounded streams are processed as a batch after all data has arrived. Kafka Streams and Flink can be used to process unbounded streams.
**Related terms**: *bounded stream*, *stream processing*

under replication : Under replication is a situation in which the number of in-sync replicas is below the number of all replicas. Under-replicated partitions can occur when a broker is down or cannot replicate fast enough from the leader (replica fetcher lag).

user account : A user account is an account representing the identity of a person who can be authenticated and granted access to Confluent Cloud resources.
**Related content**
- [User Accounts for Confluent Cloud](/cloud/current/access-management/identity/user-accounts/overview.html)

watermark : A watermark in Flink is a marker that keeps track of time as data is processed.
A watermark means that all records until the current moment in time have been “seen”. This way, Flink can correctly perform tasks that depend on when things happened, like calculating aggregations over time windows.
**Related content**
- [Time and Watermarks](/cloud/current/flink/concepts/timely-stream-processing.html)
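To make the watermark idea concrete, here is a small, self-contained Java sketch of the underlying bookkeeping (a toy model, not the Flink API): the watermark trails the maximum event time seen so far by an allowed lateness, and a time window can be finalized once the watermark passes its end. All names and values are illustrative assumptions.

```java
import java.time.Duration;
import java.time.Instant;

/** Toy watermark tracker; illustrates the concept only, not Flink's API. */
public class WatermarkSketch {
    private Instant maxEventTime = Instant.EPOCH;
    private final Duration allowedLateness = Duration.ofSeconds(5); // assumed bound on out-of-orderness

    /** Call for every record as it arrives, with the record's event time. */
    public void observe(Instant eventTime) {
        if (eventTime.isAfter(maxEventTime)) {
            maxEventTime = eventTime;
        }
    }

    /** The watermark trails the largest event time seen so far. */
    public Instant currentWatermark() {
        return maxEventTime.minus(allowedLateness);
    }

    /** A window ending at windowEnd can be finalized once the watermark has passed it. */
    public boolean canCloseWindow(Instant windowEnd) {
        return !currentWatermark().isBefore(windowEnd);
    }
}
```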
Authentication is often used in conjunction with authorization to determine whether a principal is allowed to access a resource and perform a specific action or operation on that resource. Digital authentication requires one or more of the following: something a principal knows (a password or security question), something a principal has (a security token or key), or something a principal is (a biometric characteristic, such as a fingerprint or voiceprint). Multi-factor authentication (MFA) requires two or more forms of authentication. **Related terms**: *authorization*, *identity*, *identity provider*, *identity pool*, *principal*, *role* authorization : Authorization is the process of evaluating and then granting or denying a principal a set of permissions required to access and perform operations on resources. **Related terms**: *authentication*, *group mapping*, *identity*, *identity provider*, *identity pool*, *principal*, *role* Avro : Avro is a data serialization and exchange framework that provides data structures, remote procedure call (RPC), compact binary data format, a container file, and uses JSON to represent schemas. Avro schemas ensure that every field is properly described and documented for use with serializers and deserializers. You can either send a schema with every message or use Schema Registry to store and receive schemas for use by consumers and producers to save bandwidth and storage space. **Related terms**: *data serialization*, *deserializer*, *serializer* **Related content** - [Apache Avro - a data serialization system](https://avro.apache.org/) - Confluent Cloud: [Avro Schema Serializer and Deserializer](/cloud/current/sr/fundamentals/serdes-develop/serdes-avro.html) - Confluent Platform: [Avro Schema Serializer and Deserializer](/platform/current/schema-registry/fundamentals/serdes-develop/serdes-avro.html) Basic Kafka cluster : A Confluent Cloud cluster type. Basic Kafka clusters are designed for experimentation, early development, and basic use cases. batch processing : Batch processing is the method of collecting a large volume of data over a specific time interval, after which the data is processed all at once and loaded into a destination system. Batch processing is often used when processing data can occur independently of the source and timing of the data. It is efficient for non-real-time data processing, such as data warehousing, reporting, and analytics. **Related terms**: *bounded stream*, *stream processing*, *unbounded stream* CIDR block : A CIDR block is a group of IP addresses that are contiguous and can be represented as a single block. CIDR blocks are expressed using Classless Inter-domain Routing (CIDR) notation that includes an IP address and a number of bits in the network mask. **Related content** - [CIDR notation](https://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing#CIDR_notation) - [Classless Inter-domain Routing (CIDR): The Internet Address Assignment and Aggregation Plan [RFC 4632]](https://www.rfc-editor.org/rfc/rfc4632.html) Cluster Linking : Cluster Linking is a highly performant data replication feature that enables links between Kafka clusters to mirror data from one cluster to another. Cluster Linking creates perfect copies of Kafka topics, which keep data in sync across clusters. Use cases include geo-replication of data, data sharing, migration, disaster recovery, and tiered separation of critical applications. 
**Related content** - [Geo-replication with Cluster Linking on Confluent Cloud](/cloud/current/multi-cloud/cluster-linking/index.html) - [Cluster Linking for Confluent Platform](/platform/current/multi-dc-deployments/cluster-linking/index.html) commit log : A commit log is a log of all event messages about commits (changes or operations made) sent to a Kafka topic. A commit log ensures that all event messages are processed at least once and provides a mechanism for recovery in the event of a failure. The commit log is also referred to as a write-ahead log (WAL) or a transaction log. **Related terms**: *event message* Confluent Cloud : Confluent Cloud is the fully managed, cloud-native event streaming service powered by Kora, the event streaming platform based on Kafka and extended by Confluent to provide high availability, scalability, elasticity, security, and global interconnectivity. Confluent Cloud offers cost-effective multi-tenant configurations as well as dedicated solutions, if stronger isolation is required. **Related terms**: *Apache Kafka*, *Kora* **Related content** - [Confluent Cloud Overview](/cloud/current/index.html) - [Confluent Cloud](https://www.confluent.io/confluent-cloud/) Confluent Cloud network : A Confluent Cloud network is an abstraction for a single tenant network environment that hosts Dedicated Kafka clusters in Confluent Cloud along with their single tenant services, like ksqlDB clusters and managed connectors. **Related content** - [Confluent Cloud Network Overview](/cloud/current/networking/overview.html#ccloud-networks) Confluent for Kubernetes (CFK) : *Confluent for Kubernetes (CFK)* is a cloud-native control plane for deploying and managing Confluent in private cloud environments through declarative API. Confluent Platform : Confluent Platform is a specialized distribution of Kafka at its core, with additional components for data integration, streaming data pipelines, and stream processing. Confluent REST Proxy : Confluent REST Proxy provides a RESTful interface to an Kafka cluster, making it easy to produce and consume messages, view the state of the cluster, and perform administrative actions without using the native Kafka protocol or clients. **Related content** - Confluent Platform: [REST Proxy](/platform/current/kafka-rest/index.html) Confluent Server : Confluent Server is the default Kafka broker component of Confluent Platform that builds on the foundation of Apache Kafka® and provides enhanced proprietary features designed for enterprise use. Confluent Server is fully compatible with Kafka, and adds Kafka cluster support for Role-Based Access Control, Audit Logs, Schema Validation, Self Balancing Clusters, Tiered Storage, Multi-Region Clusters, and Cluster Linking. **Related terms**: *Confluent Platform*, *Apache Kafka*, *Kafka broker*, *Cluster Linking*, *multi-region cluster (MRC)* Confluent Unit for Kafka (CKU) : Confluent Unit for Kafka (CKU) is a unit of horizontal scaling for Dedicated Kafka clusters in Confluent Cloud that provide preallocated resources. CKUs determine the capacity of a Dedicated Kafka cluster in Confluent Cloud. **Related content** - [CKU limits per cluster](/cloud/current/clusters/cluster-types.html#cku-limits-per-cluster) Connect API : The Connect API is the Kafka API that enables a connector to read event streams from a source system and write to a target system. Connect worker : A Connect worker is a server process that runs a connector and performs the actual work of moving data in and out of Kafka topics. 
A worker is a server process that runs on hardware independent of the Kafka brokers themselves. It is scalable and fault-tolerant, meaning you can run a cluster of workers that share the load of moving data in and out of Kafka from and to external systems. **Related terms**: *connector*, *Kafka Connect* connection attempts : In Confluent Cloud, connection attempts are a Kafka cluster billing dimension that defines the maximum number of new TCP connections to the cluster you can create in one second. This includes successful and unsuccessful authentication attempts. Available in the Metrics API as `successful_authentication_count` (only includes successful authentications, not unsuccessful authentication attempts). To reduce usage on connection attempts, use longer-lived connections to the cluster. If you exceed the maximum, connection attempts may be refused. connector : A connector is an abstract mechanism that enables communication, coordination, or cooperation among components by transferring data elements from one interface to another without changing the data. connector offset : A connector offset uniquely identifies the position of a connector as it processes data. Connectors use a variety of strategies to implement the connector offset, including everything from monotonically increasing integers to replay IDs, lists of files, timestamps, and even checkpoint information. Connector offsets keep track of already-processed data in the event of a connector restart or recovery. While sink connectors use a pattern for connector offsets similar to the offset mechanism used throughout Kafka, the implementation details for source connectors are often much different. This is because source connectors track the progress of a source system as they process data. consumer : A consumer is a Kafka client application that subscribes to (reads and processes) event messages from a Kafka topic. The Streams API and the Consumer API are the two APIs that enable consumers to read event streams from Kafka topics. **Related terms**: *Consumer API*, *consumer group*, *producer*, *Streams API* Consumer API : The Consumer API is the Kafka API used for consuming (reading) event messages or records from Kafka topics. It enables a Kafka consumer to subscribe to a topic and read event messages as they arrive. Batch processing is a common use case for the Consumer API. consumer group : A consumer group is a single logical consumer implemented with multiple physical consumers for reasons of throughput and resilience. By dividing a topic's partitions among the consumers in the group, the consumers can process messages in parallel, increasing message throughput and enabling load balancing. **Related terms**: *consumer*, *partition*, *producer*, *topic* consumer lag : Consumer lag is the number of consumer offsets between the latest message produced in a partition and the last message consumed by a consumer, that is, the number of messages pending to be consumed from a particular partition. A large consumer lag, or a quickly growing lag, indicates that the consumer is unable to read from a partition as fast as the messages are available. This can be caused by a slow consumer, slow network, or slow broker. consumer offset : A consumer offset is the monotonically increasing integer value that uniquely identifies the position of an event record in a partition. Consumers use offsets to track their current position in a Kafka topic, allowing consumers to resume processing from where they left off.
Offsets are stored on the Kafka brokers, but the brokers do not track which records a consumer has read; it is up to each consumer to track its own position. When a consumer acknowledges receiving and processing a message, it commits an offset value, which is stored in the special internal topic `__consumer_offsets` (see the consumer sketch after the *data mapping* entry below). cross-resource RBAC role binding : A cross-resource RBAC role binding is a role binding in Confluent Cloud that is applied at the Organization or Environment scope and grants access to multiple resources. For example, assigning a principal the NetworkAdmin role at the Organization scope lets them administer all networks across all Environments in their Organization. **Related terms**: *identity pool*, *principal*, *role*, *role binding* CRUD : CRUD is an acronym for the four basic operations that can be performed on data: Create, Read, Update, and Delete. custom connector : A custom connector is a connector created using Connect plugins uploaded to Confluent Cloud by users. This includes connector plugins that are built from scratch, modified open-source connector plugins, or third-party connector plugins. data at rest : Data at rest is data that is physically stored on non-volatile media (such as hard drives, solid-state drives, or other storage devices) and is not actively being transmitted or processed by a system. data contract : A data contract is a formal agreement between an upstream component and a downstream component on the structure and semantics of data that is in motion. A schema is a key element of a data contract. The schema, metadata, rules, policies, and evolution plan form the data contract. You can associate data contracts (schemas and more) with [topics](#term-Kafka-topic). **Related content** - Confluent Platform: [Data Contracts for Schema Registry on Confluent Platform](/platform/current/schema-registry/fundamentals/data-contracts.html) - Confluent Cloud: [Data Contracts for Schema Registry on Confluent Cloud](/cloud/current/sr/fundamentals/data-contracts.html) - Cloud Console: [Manage Schemas in Confluent Cloud](/cloud/current/sr/schemas-manage.html) data encryption key (DEK) : A data encryption key (DEK) is a symmetric key that is used to encrypt and decrypt data. The DEK is used in client-side field level encryption (CSFLE) to encrypt sensitive data. The DEK is itself encrypted using a key encryption key (KEK) that is only accessible to authorized users. The encrypted DEK and encrypted data are stored together. Only users with access to the KEK can decrypt the DEK and access the sensitive data. **Related terms**: *envelope encryption*, *key encryption key (KEK)* data in motion : Data in motion is data that is actively being transferred between source and destination, typically systems, devices, or networks. Data in motion is also referred to as data in transit or data in flight. data in use : Data in use is data that is actively being processed or manipulated in memory (RAM, CPU caches, or CPU registers). data ingestion : Data ingestion is the process of collecting, importing, and integrating data from various sources into a system for further processing, analysis, or storage. data mapping : Data mapping is the process of defining relationships or associations between source data elements and target data elements. Data mapping is an important process in data integration, data migration, and data transformation, ensuring that data is accurately and consistently represented when it is moved or combined.
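The *consumer group*, *consumer offset*, and *offset commit* entries above describe how consumers track their position in a topic. The following minimal sketch, which assumes the `confluent_kafka` Python client and uses illustrative broker, topic, and group names, shows a consumer that commits the offset of each record after processing it:

```python
# Minimal sketch, assuming the confluent_kafka Python client; the broker
# address, topic name, and group id below are illustrative placeholders.
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "orders-readers",      # consumers sharing this id form one consumer group
    "auto.offset.reset": "earliest",   # where to start when no committed offset exists
    "enable.auto.commit": False,       # commit offsets explicitly after processing
})
consumer.subscribe(["orders"])

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None:
            continue
        if msg.error():
            print(f"Consumer error: {msg.error()}")
            continue
        # Process the record, then commit its offset; the broker stores the
        # committed offset in the __consumer_offsets internal topic.
        print(f"partition={msg.partition()} offset={msg.offset()} value={msg.value()}")
        consumer.commit(message=msg)
finally:
    consumer.close()
```

With `enable.auto.commit` turned off, a crash before `commit()` causes the record to be redelivered on restart, which is the at-least-once behavior described under *commit log*.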
data pipeline : A data pipeline is a series of processes and systems that enable the flow of data from sources to destinations, automating the movement and transformation of data for various purposes, such as analytics, reporting, or machine learning. A data pipeline is typically composed of a source system, a data ingestion tool, a data transformation tool, and a target system. A data pipeline covers the following stages: data extraction, data transformation, data loading, and data validation. Data Portal : Data Portal is a Confluent Cloud application that uses Stream Catalog and Stream Lineage to provide self-service access throughout Confluent Cloud Console for data practitioners to search and discover existing topics using tags and business metadata, request access to topics and data, and access data in topics to build streaming applications and data pipelines. Data Portal provides a data-centric view of Confluent optimized for self-service access to data, where users can search, discover, and understand available data, request access to data, and use data. **Related terms**: *Stream Catalog*, *Stream Lineage* **Related content** - [Data Portal on Confluent Cloud](/cloud/current/stream-governance/data-portal.html) data serialization : Data serialization is the process of converting data structures or objects into a format that can be stored or transmitted, and reconstructed later in the same or another computer environment. Data serialization is a common technique for implementing data persistence, interprocess communication, and object communication. Confluent Schema Registry (in Confluent Platform) and Confluent Cloud Schema Registry support data serialization using serializers and deserializers for the following formats: Avro, JSON Schema, and Protobuf. **Related content** - Confluent Cloud: [Formats, Serializers, and Deserializers](/cloud/current/sr/fundamentals/serdes-develop/index.html) - Confluent Platform: [Formats, Serializers, and Deserializers](/platform/current/schema-registry/fundamentals/serdes-develop/index.html) data steward : A data steward is a person with data-related responsibilities, such as data governance, data quality, and data security. data stream : A data stream is a continuous flow of data records that are produced and consumed by applications. dead letter queue (DLQ) : A dead letter queue (DLQ) is a queue where messages that could not be processed successfully by a sink connector are placed. Instead of stopping, the sink connector sends messages that could not be written successfully to the DLQ topic as event records and continues processing messages. Dedicated Kafka cluster : A Confluent Cloud cluster type. Dedicated Kafka clusters are designed for critical production workloads with high traffic or private networking requirements. deserializer : A deserializer is a tool that converts a serial byte stream back into objects and parallel data. Deserializers work with serializers (known together as Serdes) to support efficient storage and high-speed data transmission over the wire. Confluent provides Serdes for schemas in Avro, Protobuf, and JSON Schema formats.
**Related content** - Confluent Cloud: [Formats, Serializers, and Deserializers](/cloud/current/sr/fundamentals/serdes-develop/index.html) - Confluent Platform: [Formats, Serializers, and Deserializers](/platform/current/schema-registry/fundamentals/serdes-develop/index.html) egress : In general networking, egress refers to outbound traffic leaving a network or a specific network segment. In Confluent Cloud, egress is a Kafka cluster billing dimension that defines the number of bytes consumed from the cluster in one second. Available in the Metrics API as `sent_bytes` (convert from bytes to MB). To reduce egress in Confluent Cloud, compress your messages and ensure each consumer is only consuming from the topics it requires. For compression, use lz4. Avoid gzip because of high overhead on the cluster. Elastic Confluent Unit for Kafka (eCKU) : Elastic Confluent Unit for Kafka (eCKU) is used to express capacity for Basic, Standard, Enterprise, and Freight Kafka clusters. These clusters automatically scale up to a fixed ceiling. There is no need to resize these types of clusters. When you need more capacity, your cluster expands up to the fixed ceiling. If you’re not using capacity above the minimum, you’re not paying for it. ELT : ELT is an acronym for Extract-Load-Transform, where data is extracted from a source system and loaded into a target system before processing or transformation. Compared to ETL, ELT is a more flexible approach to data ingestion because the data is loaded into the target system before transformation. Enterprise Kafka cluster : A Confluent Cloud cluster type. Enterprise Kafka clusters are designed for production-ready functionality that requires private endpoint networking capabilities. envelope encryption : Envelope encryption is a cryptographic technique that uses two keys to encrypt data. The symmetric data encryption key (DEK) is used to encrypt sensitive data. The separate asymmetric key encryption key (KEK) is the master key used to encrypt the DEK. The DEK and encrypted data are stored together. Only users with access to the KEK can decrypt the DEK and access the sensitive data. In Confluent Cloud, envelope encryption is used to enable client-side field level encryption (CSFLE). CSFLE encrypts sensitive data in a message before it is sent to Confluent Cloud and allows for temporary decryption of sensitive data when required to perform operations on the data. **Related terms**: *data encryption key (DEK)*, *key encryption key (KEK)* ETL : ETL is an acronym for Extract-Transform-Load, where data is extracted from a source system, transformed into a target format, and loaded into a target system. Compared to ELT, ETL is a more rigid approach to data ingestion because the data is transformed before loading into the target system. event : An event is a meaningful action or occurrence of something that happened. Events that can be recognized by a program, either human-generated or triggered by software, can be recorded in a log file or other data store. **Related terms**: *event message*, *event record*, *event sink*, *event source*, *event stream*, *event streaming*, *event streaming platform*, *event time* event message : An event message is a record of an event sent to a Kafka topic, represented as a key-value pair. Each event message consists of a key-value pair, a timestamp, the compression type, headers for metadata (optional), and a partition and offset ID (once the message is written). The key is optional and can be used to identify the event.
The value is required and contains details about the event that happened. **Related terms**: *event*, *event record*, *event sink*, *event source*, *event stream*, *event streaming*, *event streaming platform*, *event time* event record : An event record is the record of an event stored in a Kafka topic. Event records are organized and durably stored in topics. Examples of events include orders, payments, activities, or measurements. An event typically contains one or more data fields that describe the fact, as well as a timestamp that denotes when the event was created by its event source. The event may also contain various metadata, such as its source of origin (for example, the application or cloud service that created the event) and storage-level information (for example, its position in the event stream). **Related terms**: *event*, *event message*, *event sink*, *event source*, *event stream*, *event streaming*, *event streaming platform*, *event time* event sink : An event sink is a consumer of events, which can include applications, cloud services, databases, IoT sensors, and more. **Related terms**: *event*, *event message*, *event record*, *event source*, *event stream*, *event streaming*, *event streaming platform*, *event time* event source : An event source is a producer of events, which can include cloud services, databases, IoT sensors, mainframes, and more. **Related terms**: *event*, *event message*, *event record*, *event sink*, *event stream*, *event streaming*, *event streaming platform*, *event time* event stream : An event stream is a continuous flow of event messages produced by an event source and consumed by one or more consumers. **Related terms**: *event*, *event message*, *event record*, *event sink*, *event source*, *event streaming*, *event streaming platform*, *event time* event streaming : Event streaming is the practice of capturing event data in real-time from data sources. Event streaming is a form of data streaming that is used to capture, store, process, and react to data in real-time or retrospectively. **Related terms**: *event*, *event message*, *event record*, *event sink*, *event source*, *event streaming platform*, *event time* event streaming platform : An event streaming platform is a platform to which events can be written once, allowing distributed functions within an organization to react in real time. **Related terms**: *event*, *event message*, *event record*, *event sink*, *event source*, *event streaming*, *event time* event time : Event time is the time when an event occurred on the producing device, as opposed to the time when the event was processed or recorded. Event time is often used in stream processing to determine the order of events and to perform windowing operations. **Related terms**: *event*, *event message*, *event record*, *event sink*, *event source*, *event streaming*, *event streaming platform* exactly-once semantics : Exactly-once semantics is a guarantee that a message is delivered exactly once and in the order that it was sent. Even if a producer retries sending a message, or a consumer retries processing a message, the message is delivered exactly once. This guarantee is achieved through idempotent producers, where the broker uses a producer ID and per-partition sequence numbers to discard duplicate writes, and through transactions, which commit a consumer's offsets and its produced results atomically. The consumer offset is committed only after the message is processed; if the consumer fails to process the message, the message is redelivered and processed again. Freight Kafka cluster : A Confluent Cloud cluster type.
Freight Kafka clusters are designed for high-throughput, relaxed-latency workloads that are less expensive than self-managed open source Kafka. granularity : Granularity is the degree or level of detail to which an entity (a system, service, or resource) is broken down into subcomponents, parts, or elements. Entities that are *fine-grained* have a higher level of detail, while *coarse-grained* entities have a reduced level of detail, often combining finer parts into a larger whole. In the context of access control, granular permissions provide precise control over resource access. They allow administrators to grant specific operations on distinct resources. This ensures users only have permissions tailored to their needs, minimizing unnecessary or potentially risky access. group mapping : Group mapping is a set of rules that map groups in your SSO identity provider to Confluent Cloud RBAC roles. When a user signs in to Confluent Cloud using SSO, Confluent Cloud uses the group mapping to grant access to Confluent Cloud resources. **Related terms**: *identity provider*, *identity pool*, *principal*, *role* **Related content** - [Group Mapping for Confluent Cloud](/cloud/current/access-management/authenticate/sso/group-mapping/overview.html) identity : An identity is a unique identifier that is used to authenticate and authorize users and applications to access resources. Identity is often used in conjunction with access control to determine whether a user or application is allowed to access a resource and perform a specific action or operation on that resource. **Related terms**: *identity provider*, *identity pool*, *principal*, *role* identity pool : An identity pool is a collection of identities that can be used to authenticate and authorize users and applications to access resources. Identity pools are used to manage permissions for users and applications that access resources in Confluent Cloud. They are also used to manage permissions for Confluent Cloud service accounts that are used to access resources in Confluent Cloud. identity provider : An identity provider is a trusted provider that authenticates users and issues security tokens that are used to verify the identity of a user. Identity providers are often used in single sign-on (SSO) scenarios, where a user can log in to multiple applications or services with a single set of credentials. Infinite Storage : Infinite Storage is the Confluent Cloud storage service that enhances the scalability of Confluent Cloud resources by separating storage and processing. Tiered storage within Confluent Cloud moves data between storage layers based on the needs of the workload, retrieves tiered data when requested, and garbage collects data that is past retention or otherwise deleted. If an application reads historical data, latency is not increased for other applications reading more recent data. Storage resources are decoupled from compute resources: you only pay for what you produce to Confluent Cloud and for storage that you use, and CKUs do not have storage limits. **Related content** - [Infinite Storage in Confluent Cloud for Apache Kafka](https://www.confluent.io/blog/infinite-kafka-data-storage-in-confluent-cloud/) ingress : In general networking, ingress refers to traffic that enters a network from an external source. In Confluent Cloud, ingress is a Kafka cluster billing dimension that defines the number of bytes produced to the cluster in one second. Available in the Metrics API as `received_bytes` (convert from bytes to MB).
To reduce ingress in Confluent Cloud, compress your messages. For compression, use lz4. Avoid gzip because of high overhead on the cluster. internal topic : An internal topic is a topic, prefixed with double underscores (`__`), that is automatically created by a Kafka component to store metadata about the broker, partition assignment, consumer offsets, and other information. Examples of internal topics: `__cluster_metadata`, `__consumer_offsets`, `__transaction_state`, `__confluent.support.metrics`, and `__confluent.support.metrics-raw`. JSON Schema : JSON Schema is a declarative language used for data serialization and exchange to define data structures, specify formats, and validate JSON documents. It is a way to encode expected data types, properties, and constraints to ensure that all fields are properly described for use with serializers and deserializers. **Related terms**: *data serialization*, *deserializer*, *serializer* **Related content** - [JSON Schema - a declarative language that allows you to annotate and validate JSON documents.](https://json-schema.org/) - Confluent Cloud: [JSON Schema Serializer and Deserializer](/cloud/current/sr/fundamentals/serdes-develop/serdes-json.html) - Confluent Platform: [JSON Schema Serializer and Deserializer](/platform/current/schema-registry/fundamentals/serdes-develop/serdes-json.html) Kafka bootstrap server : A Kafka bootstrap server is a Kafka broker that a Kafka client connects to first when initiating a connection to a Kafka cluster; the broker returns metadata that includes the addresses of all of the brokers in the Kafka cluster. Although only one bootstrap server is required to connect to a Kafka cluster, multiple brokers can be specified in a bootstrap server list to provide high availability and fault tolerance in case a broker is unavailable. In Confluent Cloud, the bootstrap server is the general cluster endpoint. Kafka broker : A Kafka broker is a server in the Kafka storage layer that stores event streams from one or more sources. A Kafka cluster is typically comprised of several brokers. Every broker in a cluster is also a bootstrap server, meaning if you can connect to one broker in a cluster, you can connect to every broker. Kafka client : A Kafka client allows you to write distributed applications and microservices that read, write, and process streams of events in parallel, at scale, and in a fault-tolerant manner, even in the case of network problems or machine failures. The Kafka client library provides functions, classes, and utilities that allow developers to create Kafka producer clients (Producers) and consumer clients (Consumers) using various programming languages (a minimal producer sketch appears after the *Kafka cluster* entry below). The primary way to build production-ready Producers and Consumers is by using your preferred programming language and a Kafka client library. **Related content** - [Build Client Applications for Confluent Cloud](/cloud/current/client-apps/overview.html) - [Build Client Applications for Confluent Platform](/platform/current/clients/index.html) - [Getting Started with Apache Kafka and Java (or Python, Go, .Net, and others)](https://developer.confluent.io/get-started/java/) Kafka cluster : A Kafka cluster is a group of interconnected Kafka brokers that manage and distribute real-time data streaming, processing, and storage as if they are a single system. By distributing tasks and services across multiple Kafka brokers, the Kafka cluster improves availability, reliability, and performance.
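To make the *Kafka bootstrap server*, *Kafka client*, and *Kafka cluster* entries above concrete, here is a minimal producer sketch. It assumes the `confluent_kafka` Python client; the broker addresses and topic name are illustrative placeholders:

```python
# Minimal sketch, assuming the confluent_kafka Python client; the broker
# addresses and topic name are illustrative placeholders.
from confluent_kafka import Producer

producer = Producer({
    # One reachable broker is enough to bootstrap; the client then discovers
    # the rest of the cluster from the metadata that broker returns.
    "bootstrap.servers": "broker-1:9092,broker-2:9092",
})

def on_delivery(err, msg):
    # Called once the broker acknowledges (or rejects) the message.
    if err is not None:
        print(f"Delivery failed: {err}")
    else:
        print(f"Delivered to {msg.topic()} [{msg.partition()}] at offset {msg.offset()}")

producer.produce("orders", key="order-42", value='{"amount": 9.99}', on_delivery=on_delivery)
producer.flush()  # wait for outstanding deliveries before exiting
```

Listing more than one broker is optional, but it protects the initial bootstrap step if a single broker happens to be unavailable.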
Kafka Connect : Kafka Connect is the component of Kafka that provides data integration between databases, key-value stores, search indexes, file systems, and Kafka brokers. Kafka Connect is an ecosystem of a client application and pluggable connectors. As a client application, Connect is a server process that runs on hardware independent of the Kafka brokers themselves. It is scalable and fault-tolerant, meaning you can run a cluster of Connect workers that share the load of moving data in and out of Kafka from and to external systems. Connect also abstracts the business of writing code away from the user and instead requires only JSON configuration to run. **Related content** - Confluent Cloud: [Kafka Connect](/cloud/current/billing/overview.html#kconnect-long) - Confluent Platform: [Kafka Connect](/platform/current/connect/index.html) Kafka controller : A Kafka controller is the node in a Kafka cluster that is responsible for managing and changing the metadata of the cluster. This node also communicates metadata changes to the rest of the cluster. When Kafka uses ZooKeeper for metadata management, the controller is a broker, and the broker persists the metadata to ZooKeeper for backup and recovery. With KRaft, you dedicate Kafka nodes to operate as controllers and the metadata is stored in Kafka itself and not persisted to ZooKeeper. KRaft enables faster recovery because of this. For more information, see [KRaft overview](/platform/current/kafka-metadata/kraft.html). Kafka listener : A Kafka listener is an endpoint that Kafka brokers bind to in order to communicate with clients. For Kafka clusters, Kafka listeners are configured in the `listeners` property of the `server.properties` file. Advertised listeners are publicly accessible endpoints that are used by clients to connect to the Kafka cluster. **Related content** - [Kafka Listeners – Explained](https://www.confluent.io/blog/kafka-listeners-explained/) Kafka metadata : Kafka metadata is the information about the Kafka cluster and the topics that are stored in it. This information includes details such as the brokers in the cluster, the topics that are available, the partitions for each topic, and the location of the leader for each partition. Kafka metadata is used by clients to discover the available brokers and topics, and to determine which broker is the leader for a particular partition. This information is essential for clients to be able to send and receive messages to and from Kafka. Kafka Streams : Kafka Streams is a stream processing library for building streaming applications and microservices that transform (filter, group, aggregate, join, and more) incoming event streams in real-time to Kafka topics stored in a Kafka cluster. The Streams API can be used to build applications that process data in real-time, analyze data continuously, and build data pipelines. **Related content** - [Kafka Streams](/platform/current/streams/overview.html) Kafka topic : See *topic*. key encryption key (KEK) : A key encryption key (KEK) is a master key that is used to encrypt and decrypt other keys, specifically the data encryption key (DEK). Only users with access to the KEK can decrypt the DEK and access the sensitive data. **Related terms**: *data encryption key (DEK)*, *envelope encryption*. Kora : Kora is the cloud-native streaming data service based on Kafka technology that powers the Confluent Cloud event streaming platform for building real-time data pipelines and streaming applications.
Kora abstracts low-level resources, such as Kafka brokers, and hides operational complexities, such as system upgrades. Kora is built on the following foundations: a tiered storage layer that improves cost and performance, elasticity and consistent performance through incremental load balancing, cost-effective multi-tenancy with dynamic quota management and cell-based isolation, continuous monitoring of both system health and data integrity, and clean abstraction with standard Kafka protocols and CKUs to hide underlying resources. **Related terms**: *Apache Kafka*, *Confluent Cloud*, *Confluent Unit for Kafka (CKU)* **Related content** - [Kora: The Cloud Native Engine for Apache Kafka](https://www.confluent.io/blog/cloud-native-data-streaming-kafka-engine/) - [Kora: A Cloud-Native Event Streaming Platform For Kafka](https://www.vldb.org/pvldb/vol16/p3822-povzner.pdf) KRaft : KRaft (or Apache Kafka Raft) is a consensus protocol introduced in Kafka 2.8 to provide metadata management for Kafka, with the goal of replacing ZooKeeper. KRaft simplifies Kafka because it enables the management of metadata in Kafka itself, rather than splitting it between ZooKeeper and Kafka. As of Confluent Platform 7.5, KRaft is the default method of metadata management in new deployments. For more information, see [KRaft overview](/platform/current/kafka-metadata/kraft.html). ksqlDB : ksqlDB is a streaming SQL database engine purpose-built for creating stream processing applications on top of Kafka. logical Kafka cluster (LKC) : A logical Kafka cluster (LKC) is a subset of a physical Kafka cluster (PKC) that is isolated from other logical clusters within Confluent Cloud. Each logical unit of isolation is considered a tenant and maps to a specific organization. If the mapping is one-to-one, one LKC maps to one PKC (a Dedicated cluster). If the mapping is many-to-one, one LKC maps to one of the multitenant Kafka cluster types (Basic, Standard, Enterprise, and Freight). **Related terms**: *Confluent Cloud*, *Kafka cluster*, *physical Kafka cluster (PKC)* **Related content** - [Kafka Cluster Types in Confluent Cloud](/cloud/current/clusters/cluster-types.html) - [Multi-tenancy and Client Quotas on Confluent Cloud](/cloud/current/clusters/client-quotas.html) multi-region cluster (MRC) : A multi-region cluster (MRC) is a single Kafka cluster that replicates data between datacenters across regional availability zones. multi-tenancy : Multi-tenancy is a software architecture in which a single physical instance is shared among multiple logical instances, or tenants. In Confluent Cloud, each Basic, Standard, Enterprise, and Freight cluster is a logical Kafka cluster (LKC) that shares a physical Kafka cluster (PKC) with other tenants. Each LKC is isolated from other LKCs and has its own resources, such as memory, compute, and storage. **Related terms**: *Confluent Cloud*, *logical Kafka cluster (LKC)*, *physical Kafka cluster (PKC)* **Related content** - [From On-Prem to Cloud-Native: Multi-Tenancy in Confluent Cloud](https://www.confluent.io/blog/cloud-native-multi-tenant-kafka-with-confluent-cloud/) - [Multi-tenancy and Client Quotas on Confluent Cloud](/cloud/current/clusters/client-quotas.html) offset : An offset is an integer assigned to each message that uniquely represents its position within the data stream, guaranteeing the ordering of records and allowing offset-based connections to replay messages from any point in time.
**Related terms**: *consumer offset*, *connector offset*, *offset commit*, *replayability* offset commit : An offset commit is the process of keeping track of the current position of an offset-based connection (primarily Kafka consumers and connectors) within the data stream. The offset commit process is not specific to consumers, producers, or connectors. It is a general mechanism in Kafka to track the position of any application that is reading data. When a consumer commits an offset, the offset identifies the next message the consumer should consume. For example, if a consumer has an offset of 5, it has consumed messages 0 through 4 and will next consume message 5. If the consumer crashes or is shut down, its partitions are reassigned to another consumer, which resumes consuming from the last committed offset of each partition. The committed offset for consumers is stored on a Kafka broker. When a consumer commits an offset, it sends a commit request to the Kafka cluster, specifying the partition and offset it wants to commit for a particular consumer group. The Kafka broker receiving the commit request then stores this offset in the `__consumer_offsets` internal topic. **Related terms**: *consumer offset*, *offset* OpenSSL : OpenSSL is an open-source software library and toolkit that implements the Secure Sockets Layer (SSL) and Transport Layer Security (TLS) protocols. For more information, see [Wikipedia](https://en.wikipedia.org/wiki/OpenSSL). parent cluster : The Kafka cluster that a resource belongs to. **Related terms**: *Kafka cluster* partition : A partition is a unit of data storage that divides a topic into multiple, parallel event streams, each of which is stored on a separate Kafka broker and can be consumed independently. Partitioning is a key concept in Kafka because it allows Kafka to scale horizontally by adding more brokers to the cluster. Partitions are also the unit of parallelism in Kafka. A topic can have one or more partitions, and each partition is an ordered, immutable sequence of event records that is continually appended to a partition log. partitions (pre-replication) : In Confluent Cloud, partitions are a Kafka cluster billing dimension that defines the maximum number of partitions that can exist on the cluster at one time, before replication. While you are not charged for partitions on any type of Kafka cluster, the number of partitions you use has an impact on eCKU. To determine eCKU limits for partitions, Confluent Cloud counts only pre-replication (leader) partitions across a cluster. All topics that you create (as well as internal topics that are automatically created by Confluent Platform components such as ksqlDB, Kafka Streams, Connect, and Control Center (Legacy)) count towards the cluster partition limit. Confluent prefixes topics created automatically with an underscore (_). Topics that are internal to Kafka itself (consumer offsets) are not visible in Cloud Console and do not count against partition limits or toward partition billing. Available in the Metrics API as `partition_count`. In Confluent Cloud, attempts to create additional partitions beyond the cluster limit fail with an error message. To reduce usage on partitions (pre-replication), delete unused topics and create new topics with fewer partitions. Use the Kafka Admin interface to increase the partition count of an existing topic if the initial partition count is too low. physical Kafka cluster (PKC) : A physical Kafka cluster (PKC) is a Kafka cluster comprised of multiple brokers.
Each physical Kafka cluster is created on a Kubernetes cluster by the control plane. A PKC is not directly accessible by clients. principal : A principal is an entity that can be authenticated and granted permissions based on roles to access resources and perform operations. An entity can be a user account, service account, group mapping, or identity pool. **Related terms**: *group mapping*, *identity*, *identity pool*, *role*, *service account*, *user account* private internet : A private internet is a closed, restricted computer network typically used by organizations to provide secure environments for managing sensitive data and resources. processing time : Processing time is the time when an event is processed or recorded by a system, as opposed to the time when the event occurred on the producing device. Processing time is often used in stream processing to determine the order of events and to perform windowing operations. producer : A producer is a client application that publishes (writes) data to a topic in a Kafka cluster. Producers write data to a topic and are the only clients that can write data to a topic. Each record written to a topic is appended to the partition of the topic that is selected by the producer. Producer API : The Producer API is the Kafka API that allows you to write data to a topic in a Kafka cluster. The Producer API is used by producer clients to publish data to a topic in a Kafka cluster. Protobuf : Protobuf (or Protocol Buffers) is an open-source data format used to serialize structured data for storage. **Related terms**: *data serialization*, *deserializer*, *serializer* **Related content** - [Protocol Buffers](https://protobuf.dev/) - [Getting Started with Protobuf in Confluent Cloud](https://www.confluent.io/blog/using-protobuf-in-confluent-cloud/) - Confluent Cloud: [Protobuf Serializer and Deserializer](/cloud/current/sr/fundamentals/serdes-develop/serdes-protobuf.html) - Confluent Platform: [Protobuf Serializer and Deserializer](/platform/current/schema-registry/fundamentals/serdes-develop/serdes-protobuf.html) public internet : The public internet is the global system of interconnected computers and networks that use TCP/IP to communicate with each other. rebalancing : Rebalancing is the process of redistributing the partitions of a topic among the consumers of a consumer group for improved performance and scalability. A rebalance can occur if a consumer has failed to send heartbeats and has been excluded from the group, a consumer has voluntarily left the group, metadata has been updated for a consumer, or a new consumer has joined the group. replayability : Replayability is the ability to replay messages from any point in time. **Related terms**: *consumer offset*, *offset*, *offset commit* replication : Replication is the process of creating and maintaining multiple copies (or *replicas*) of data across different nodes in a distributed system to increase availability, reliability, redundancy, and accessibility. replication factor : A replication factor is the number of copies of a partition that are distributed across the brokers in a cluster. requests : In Confluent Cloud, requests are a Kafka cluster billing dimension that defines the number of client requests to the cluster in one second. Available in the Metrics API as `request_count`. To reduce usage on requests, you can adjust producer batching configurations, consumer client batching configurations, and shut down otherwise inactive clients.
For Dedicated clusters, a high number of requests per second results in increased load on the cluster. role : A role is a Confluent-defined job function that is granted a set of permissions required to perform specific actions or operations on Confluent resources, and is bound to a principal and a set of Confluent resources through a role binding. A role can be assigned to a user account, group mapping, service account, or identity pool. **Related terms**: *group mapping*, *identity*, *identity pool*, *principal*, *service account* **Related content** - [Predefined RBAC Roles in Confluent Cloud](/cloud/current/access-management/access-control/rbac/predefined-rbac-roles.html) - [Role-Based Access Control Predefined Roles in Confluent Platform](/platform/current/security/rbac/rbac-predefined-roles.html) rolling restart : A rolling restart restarts the brokers in a Kafka cluster with zero downtime by incrementally restarting each Kafka broker, verifying that there are no under-replicated partitions on the broker before proceeding to the next broker. Restarting the brokers one at a time allows for software upgrades, broker configuration updates, or cluster maintenance while maintaining high availability by avoiding downtime. **Related content** - [Rolling restart](/platform/current/kafka/post-deployment.html#rolling-restart) schema : A schema is the structured definition or blueprint used to describe the format and structure of event messages sent through the Kafka event streaming platform. Schemas are used to validate the structure of data in event messages and ensure that producers and consumers are sending and receiving data in the same format. Schemas are defined in the Schema Registry. Schema Registry : Schema Registry is a centralized repository for managing and validating schemas for topic message data. Schema Registry is built into Confluent Cloud as a managed service, available with the Advanced Stream Governance package, and offered as part of Confluent Enterprise for self-managed deployments. The Schema Registry is a RESTful service that stores and manages schemas for Kafka topics. The Schema Registry is integrated with Kafka and Connect to provide a central location for managing schemas and validating data. Producers and consumers of Kafka topics use schemas to ensure data consistency and compatibility as schemas evolve. Schema Registry is a key component of Stream Governance. **Related content** - Confluent Cloud: [Manage Schemas in Confluent Cloud](/cloud/current/sr/schemas-manage.html) - Confluent Platform: [Schema Registry Overview](/platform/current/schema-registry/index.html) schema subject : A schema subject is the namespace for a schema in Schema Registry. This unique identifier defines a logical grouping of related schemas. Kafka topics contain event messages serialized and deserialized using the structure and rules defined in a schema subject. This ensures compatibility and supports schema evolution. **Related content** - Confluent Cloud: [Manage Schemas in Confluent Cloud](/cloud/current/sr/schemas-manage.html) - Confluent Platform: [Schema Registry Concepts](/platform/current/schema-registry/index.html) - [Understanding Schema Subjects](https://developer.confluent.io/courses/schema-registry/schema-subjects/) Serdes : Serdes are serializers and deserializers that convert objects and parallel data into a serial byte stream for efficient storage and high-speed data transmission over the wire.
Confluent provides Serdes for schemas in Avro, Protobuf, and JSON Schema formats. **Related content** - Confluent Cloud: [Formats, Serializers, and Deserializers](/cloud/current/sr/fundamentals/serdes-develop/index.html) - Confluent Platform: [Formats, Serializers, and Deserializers](/platform/current/schema-registry/fundamentals/serdes-develop/index.html) - [Serde](https://serde.rs/) serializer : A serializer is a tool that converts objects and parallel data into a serial byte stream. Serializers work with deserializers (known together as Serdes) to support efficient storage and high-speed data transmission over the wire. Confluent provides serializers for schemas in Avro, Protobuf, and JSON Schema formats (a serialization sketch appears after the *Standard Kafka cluster* entry below). **Related content** - Confluent Cloud: [Formats, Serializers, and Deserializers](/cloud/current/sr/fundamentals/serdes-develop/index.html) - Confluent Platform: [Formats, Serializers, and Deserializers](/platform/current/schema-registry/fundamentals/serdes-develop/index.html) service account : A service account is a non-person entity used by an application or service to access resources and perform operations. Because a service account is an identity independent of the user who created it, it can be used programmatically to authenticate to resources and perform operations without the need for a user to be signed in. **Related content** - [Service Accounts for Confluent Cloud](/cloud/current/access-management/identity/service-accounts.html) service quota : A service quota is the limit, or maximum value, for a specific Confluent Cloud resource or operation that might vary by the resource scope it applies to. **Related content** - [Service Quotas for Confluent Cloud](/cloud/current/quotas/index.html) single message transform (SMT) : A single message transform (SMT) is a transformation or operation applied in real-time to an individual message that changes the values, keys, or headers of the message before it is sent to a sink connector or after it is read from a source connector. SMTs are convenient for inserting fields, masking information, event routing, and other minor data adjustments. single sign-on (SSO) : Single sign-on (SSO) is a centralized authentication service that allows users to use a single set of credentials to log in to multiple applications or services. **Related terms**: *authentication*, *group mapping*, *identity provider* **Related content** - [Single Sign-On for Confluent Cloud](/cloud/current/access-management/authenticate/sso/index.html) sink connector : A sink connector is a Kafka Connect connector that publishes (writes) data from a Kafka topic to an external system. source connector : A source connector is a Kafka Connect connector that subscribes (reads) data from a source (external system), extracts the payload and schema of the data, and publishes (writes) the data to Kafka topics. standalone : Standalone refers to a configuration in which a software application, system, or service operates independently on a single instance or device. This mode is commonly used for development, testing, and debugging purposes. For Kafka Connect, a standalone worker is a single process responsible for running all connectors and tasks on a single instance. Standard Kafka cluster : A Confluent Cloud cluster type. Standard Kafka clusters are designed for production-ready features and functionality.
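The *schema*, *Schema Registry*, *schema subject*, and *serializer* entries above come together roughly as in the following sketch. It assumes the `confluent_kafka` Python client with its Schema Registry support installed; the registry URL, topic, and schema are illustrative:

```python
# Minimal sketch, assuming the confluent_kafka Python client with Schema
# Registry support (and fastavro) installed; the URL, topic, and schema are
# illustrative placeholders.
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer
from confluent_kafka.serialization import SerializationContext, MessageField

schema_str = """
{
  "type": "record",
  "name": "Payment",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "amount", "type": "double"}
  ]
}
"""

registry = SchemaRegistryClient({"url": "http://localhost:8081"})
avro_serializer = AvroSerializer(registry, schema_str)

# Serialize a value destined for the "payments" topic; under the default
# subject naming strategy the schema is registered under "payments-value".
payload = avro_serializer(
    {"id": "p-1", "amount": 9.99},
    SerializationContext("payments", MessageField.VALUE),
)
print(f"{len(payload)} serialized bytes")
```

The returned bytes embed the registered schema ID, so any consumer with access to the same registry can look up the schema and deserialize the record.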
static egress IP address : A static egress IP address is an IP address used by a Confluent Cloud managed connector to establish outbound connections to endpoints of external data sources and sinks over the public internet. **Related content** - [Use Static IP Addresses on Confluent Cloud for Connectors and Cluster Linking](/cloud/current/networking/static-egress-ip-addresses.html) - [Static Egress IP Addresses for Confluent Cloud Connectors](/cloud/current/connectors/static-egress-ip.html) storage (pre-replication) : In Confluent Cloud, storage is a Kafka cluster billing dimension that defines the number of bytes retained on the cluster, pre-replication. Available in the Metrics API as `retained_bytes` (convert from bytes to TB). The returned value is pre-replication. Standard, Enterprise, Dedicated, and Freight clusters support Infinite Storage. This means there is no maximum size limit for the amount of data that can be stored on the cluster. You can configure policy settings `retention.bytes` and `retention.ms` at the topic level to control exactly how much and how long to retain data in a way that makes sense for your applications and helps control your costs. To reduce storage in Confluent Cloud, compress your messages and reduce retention settings. For compression, use lz4. Avoid gzip because of high overhead on the cluster. Stream Catalog : Stream Catalog is a pillar of Confluent Cloud Stream Governance that provides a centralized inventory of your organization’s data assets that supports data governance and data discovery. With Data Portal in Confluent Cloud Console, users can find event streams across systems, search topics by name or tags, and enrich event data to increase value and usefulness. REST and GraphQL APIs can be used to search schemas, apply tags to records or fields, manage business metadata, and discover relationships across data assets. **Related content** - [Stream Catalog on Confluent Cloud: User Guide to Manage Tags and Metadata](/cloud/current/stream-governance/stream-catalog.html) - [Stream Catalog in Streaming Data Governance (Confluent Developer course)](https://developer.confluent.io/courses/governing-data-streams/stream-catalog/) Stream Governance : Stream Governance is a collection of tools and features that provide data governance for data in motion. These include data quality tools such as Schema Registry, schema ID validation, and schema linking; built-in data catalog capabilities to classify, organize, and find event streams across systems; and stream lineage to visualize complex data relationships and uncover insights with interactive, end-to-end maps of event streams. Taken together, these and other governance tools enable teams to manage the availability, integrity, and security of data used across organizations, and help with standardization, monitoring, collaboration, reporting, and more. **Related terms**: *Stream Catalog*, *Stream Lineage* **Related content** - [Stream Governance on Confluent Cloud](/cloud/current/stream-governance/index.html) stream lineage : Stream lineage is the life cycle, or history, of data, including its origins, transformations, and consumption, as it moves through various stages in data pipelines, applications, and systems. Stream lineage provides a record of data’s journey from its source to its destination, and is used to track data quality, data governance, and data security. 
**Related terms**: *Data Portal*, *Stream Governance* **Related content** - [Stream Lineage on Confluent Cloud](/cloud/current/stream-governance/stream-lineage.html) stream processing : Stream processing is the method of collecting event stream data in real-time as it arrives, transforming the data in real-time using operations (such as filters, joins, and aggregations), and publishing the results to one or more target systems. Stream processing can be used to analyze data continuously, build data pipelines, and process time-sensitive data in real-time. Using the Confluent event streaming platform, event streams can be processed in real-time using Kafka Streams, Kafka Connect, or ksqlDB. Streams API : The Streams API is the Kafka API that allows you to build streaming applications and microservices that transform (for example, filter, group, aggregate, join) incoming event streams in real-time to Kafka topics stored in a Kafka cluster. The Streams API is used by stream processing clients to process data in real-time, analyze data continuously, and build data pipelines. **Related content** - [Introduction to the Kafka Streams API](/platform/current/streams/introduction.html) throttling : Throttling is the process Kafka clusters in Confluent Cloud use to protect themselves from reaching an over-utilized state. Also known as backpressure, throttling in Confluent Cloud occurs when cluster load reaches 80%. At this point, applications may start seeing higher latencies or timeouts as the cluster must begin throttling requests or connection attempts. topic : A topic is a user-defined category or feed name where event messages are stored and published by producers and subscribed to by consumers. Each topic is a log of event messages. Topics are stored in one or more partitions, which distribute topic records across brokers in a Kafka cluster. Each partition is an ordered, immutable sequence of records that are continually appended to the topic (a topic-creation sketch appears after the *total client connections* entry below). **Related content** - [Manage Topics in Confluent Cloud](/cloud/current/client-apps/topics/index.html) total client connections : In Confluent Cloud, total client connections are a Kafka cluster billing dimension that defines the number of TCP connections to the cluster you can open at one time. Available in the Metrics API as `active_connection_count`. Filter by principal to understand how many connections each application is creating. How many connections a cluster supports can vary widely based on several factors, including number of producer clients, number of consumer clients, partition keying strategy, produce patterns per client, and consume patterns per client. For Dedicated clusters, Confluent derives a guideline for total client connections from benchmarking that indicates exceeding this number of connections increases produce latency for test clients. However, this does not apply to all workloads. That is why total client connections are a guideline, not a hard limit, for Dedicated Kafka clusters. Monitor the impact on cluster load as connection count increases, as this is the final representation of the impact of a given workload or CKU dimension on the underlying resources of the cluster. Consider the Confluent guideline a per-CKU guideline. The number of connections tends to increase when you add brokers. In other words, if you significantly exceed the per-CKU guideline, cluster expansion doesn’t always give your cluster more connection count headroom.
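The *partition*, *replication factor*, and *topic* entries, together with the `retention.bytes` and `retention.ms` settings mentioned under *storage (pre-replication)*, map onto topic creation roughly as in the following sketch. It assumes the `confluent_kafka` Python client; the topic name, counts, and settings are illustrative:

```python
# Minimal sketch, assuming the confluent_kafka Python client and a reachable
# cluster; the topic name, counts, and retention settings are illustrative.
from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({"bootstrap.servers": "localhost:9092"})

topic = NewTopic(
    "clickstream",
    num_partitions=6,        # unit of parallelism; consumers in a group divide these
    replication_factor=3,    # copies of each partition spread across brokers
    config={
        "retention.ms": str(7 * 24 * 60 * 60 * 1000),  # keep records for 7 days
        "retention.bytes": "-1",                       # no size-based limit
    },
)

# create_topics() is asynchronous and returns a dict of topic name -> future.
for name, future in admin.create_topics([topic]).items():
    try:
        future.result()
        print(f"Created topic {name}")
    except Exception as exc:
        print(f"Failed to create {name}: {exc}")
```

Partition count can later be increased (but not decreased), and `retention.*` settings can be changed through the same Admin interface, as noted under *partitions (pre-replication)*.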
Transport Layer Security (TLS) : Transport Layer Security (TLS) is a cryptographic protocol that provides secure communication over a network (a client configuration sketch using TLS appears at the end of this glossary). For more information, see [Wikipedia](https://en.wikipedia.org/wiki/Transport_Layer_Security). unbounded stream : An unbounded stream is a stream of data that is continuously generated in real-time and has no defined end. Examples of unbounded streams include stock prices, sensor data, and social media feeds. Processing unbounded streams requires a different approach than processing bounded streams. Unbounded streams are processed incrementally as data arrives, while bounded streams are processed as a batch after all data has arrived. Kafka Streams and Flink can be used to process unbounded streams. **Related terms**: *bounded stream*, *stream processing* under replication : Under replication is a situation when the number of in-sync replicas is below the number of all replicas. Under-replicated partitions can occur when a broker is down or cannot replicate fast enough from the leader (replica fetcher lag). user account : A user account is an account representing the identity of a person who can be authenticated and granted access to Confluent Cloud resources. **Related content** - [User Accounts for Confluent Cloud](/cloud/current/access-management/identity/user-accounts/overview.html) watermark : A watermark in Flink is a marker that keeps track of time as data is processed. A watermark means that all records until the current moment in time have been “seen”. This way, Flink can correctly perform tasks that depend on when things happened, like calculating aggregations over time windows. **Related content** - [Time and Watermarks](/cloud/current/flink/concepts/timely-stream-processing.html)
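As a closing illustration of the *Transport Layer Security (TLS)*, *authentication*, and related entries, the following sketch shows one way a client might be configured to connect over an encrypted, authenticated channel. It assumes the `confluent_kafka` Python client; the endpoint and credentials are placeholders in the style commonly used for Confluent Cloud:

```python
# Minimal sketch, assuming the confluent_kafka Python client; the endpoint
# and credentials are illustrative placeholders.
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "pkc-xxxxx.us-east-1.aws.confluent.cloud:9092",
    "security.protocol": "SASL_SSL",   # encrypt traffic with TLS
    "sasl.mechanisms": "PLAIN",        # authenticate with an API key and secret
    "sasl.username": "<api-key>",
    "sasl.password": "<api-secret>",
})

producer.produce("orders", value=b"hello, secured cluster")
producer.flush()
```

`SASL_SSL` encrypts the connection with TLS while SASL/PLAIN carries the credentials; other SASL mechanisms or mutual TLS can be substituted depending on how the cluster is configured.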
**Related terms**: *bounded stream*, *data stream*, *stream processing*, *unbounded stream* **Related content** - [Apache Flink: Stream Processing and SQL on Confluent Cloud](/cloud/current/flink/index.html#) - [What is Apache Flink?](https://www.confluent.io/learn/apache-flink/) - [Apache Flink 101 (Confluent Developer course)](https://developer.confluent.io/courses/apache-flink/intro/) Apache Kafka : Apache Kafka is an open source event streaming platform that provides a unified, high-throughput, low-latency, fault-tolerant, scalable, distributed, and secure data streaming platform. Kafka is a publish-and-subscribe messaging system that enables distributed applications to ingest, process, and share data in real-time. **Related content** - [Introduction to Kafka](/kafka/introduction.html) audit log : An audit log is a historical record of actions and operations that are triggered when auditable events occurs. Audit log records can be used to troubleshoot system issues, manage security, and monitor compliance, by tracking administrative activity, data access and modification, monitoring sign-in attempts, and reconstructing security breaches and fraudulent activity. **Related terms**: *auditable event* **Related content** - [Audit Log Concepts for Confluent Cloud](/cloud/current/monitoring/audit-logging/cloud-audit-log-concepts.html) - [Audit Log Concepts for Confluent Platform](/platform/current/security/audit-logs/audit-logs-concepts.html) auditable event : An auditable event is an event that represents an action or operation that can be tracked and monitored for security purposes and compliance. When an auditable event occurs, an auditable event method is triggered and an event message is sent to the audit log cluster and stored as an audit log record. **Related terms**: *audit log*, *event message* **Related content** - [Auditable Events in Confluent Cloud](/cloud/current/monitoring/audit-logging/event-methods/index.html) - [Auditable Events in Confluent Platform](/platform/current/security/audit-logs/auditable-events.html) authentication : Authentication is the process of verifying the identity of a principal that interacts with a system or application. Authentication is often used in conjunction with authorization to determine whether a principal is allowed to access a resource and perform a specific action or operation on that resource. Digital authentication requires one or more of the following: something a principal knows (a password or security question), something a principal has (a security token or key), or something a principal is (a biometric characteristic, such as a fingerprint or voiceprint). Multi-factor authentication (MFA) requires two or more forms of authentication. **Related terms**: *authorization*, *identity*, *identity provider*, *identity pool*, *principal*, *role* authorization : Authorization is the process of evaluating and then granting or denying a principal a set of permissions required to access and perform operations on resources. **Related terms**: *authentication*, *group mapping*, *identity*, *identity provider*, *identity pool*, *principal*, *role* Avro : Avro is a data serialization and exchange framework that provides data structures, remote procedure call (RPC), compact binary data format, a container file, and uses JSON to represent schemas. Avro schemas ensure that every field is properly described and documented for use with serializers and deserializers. 
You can either send a schema with every message or use Schema Registry to store and receive schemas for use by consumers and producers to save bandwidth and storage space. **Related terms**: *data serialization*, *deserializer*, *serializer* **Related content** - [Apache Avro - a data serialization system](https://avro.apache.org/) - Confluent Cloud: [Avro Schema Serializer and Deserializer](/cloud/current/sr/fundamentals/serdes-develop/serdes-avro.html) - Confluent Platform: [Avro Schema Serializer and Deserializer](/platform/current/schema-registry/fundamentals/serdes-develop/serdes-avro.html) Basic Kafka cluster : A Confluent Cloud cluster type. Basic Kafka clusters are designed for experimentation, early development, and basic use cases. batch processing : Batch processing is the method of collecting a large volume of data over a specific time interval, after which the data is processed all at once and loaded into a destination system. Batch processing is often used when processing data can occur independently of the source and timing of the data. It is efficient for non-real-time data processing, such as data warehousing, reporting, and analytics. **Related terms**: *bounded stream*, *stream processing*, *unbounded stream* CIDR block : A CIDR block is a group of IP addresses that are contiguous and can be represented as a single block. CIDR blocks are expressed using Classless Inter-domain Routing (CIDR) notation that includes an IP address and a number of bits in the network mask. **Related content** - [CIDR notation](https://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing#CIDR_notation) - [Classless Inter-domain Routing (CIDR): The Internet Address Assignment and Aggregation Plan [RFC 4632]](https://www.rfc-editor.org/rfc/rfc4632.html) Cluster Linking : Cluster Linking is a highly performant data replication feature that enables links between Kafka clusters to mirror data from one cluster to another. Cluster Linking creates perfect copies of Kafka topics, which keep data in sync across clusters. Use cases include geo-replication of data, data sharing, migration, disaster recovery, and tiered separation of critical applications. **Related content** - [Geo-replication with Cluster Linking on Confluent Cloud](/cloud/current/multi-cloud/cluster-linking/index.html) - [Cluster Linking for Confluent Platform](/platform/current/multi-dc-deployments/cluster-linking/index.html) commit log : A commit log is a log of all event messages about commits (changes or operations made) sent to a Kafka topic. A commit log ensures that all event messages are processed at least once and provides a mechanism for recovery in the event of a failure. The commit log is also referred to as a write-ahead log (WAL) or a transaction log. **Related terms**: *event message* Confluent Cloud : Confluent Cloud is the fully managed, cloud-native event streaming service powered by Kora, the event streaming platform based on Kafka and extended by Confluent to provide high availability, scalability, elasticity, security, and global interconnectivity. Confluent Cloud offers cost-effective multi-tenant configurations as well as dedicated solutions, if stronger isolation is required. 
**Related terms**: *Apache Kafka*, *Kora* **Related content** - [Confluent Cloud Overview](/cloud/current/index.html) - [Confluent Cloud](https://www.confluent.io/confluent-cloud/) Confluent Cloud network : A Confluent Cloud network is an abstraction for a single tenant network environment that hosts Dedicated Kafka clusters in Confluent Cloud along with their single tenant services, like ksqlDB clusters and managed connectors. **Related content** - [Confluent Cloud Network Overview](/cloud/current/networking/overview.html#ccloud-networks) Confluent for Kubernetes (CFK) : *Confluent for Kubernetes (CFK)* is a cloud-native control plane for deploying and managing Confluent in private cloud environments through a declarative API. Confluent Platform : Confluent Platform is a specialized distribution of Kafka at its core, with additional components for data integration, streaming data pipelines, and stream processing. Confluent REST Proxy : Confluent REST Proxy provides a RESTful interface to a Kafka cluster, making it easy to produce and consume messages, view the state of the cluster, and perform administrative actions without using the native Kafka protocol or clients. **Related content** - Confluent Platform: [REST Proxy](/platform/current/kafka-rest/index.html) Confluent Server : Confluent Server is the default Kafka broker component of Confluent Platform that builds on the foundation of Apache Kafka® and provides enhanced proprietary features designed for enterprise use. Confluent Server is fully compatible with Kafka, and adds Kafka cluster support for Role-Based Access Control, Audit Logs, Schema Validation, Self Balancing Clusters, Tiered Storage, Multi-Region Clusters, and Cluster Linking. **Related terms**: *Confluent Platform*, *Apache Kafka*, *Kafka broker*, *Cluster Linking*, *multi-region cluster (MRC)* Confluent Unit for Kafka (CKU) : Confluent Unit for Kafka (CKU) is a unit of horizontal scaling for Dedicated Kafka clusters in Confluent Cloud that provides preallocated resources. CKUs determine the capacity of a Dedicated Kafka cluster in Confluent Cloud. **Related content** - [CKU limits per cluster](/cloud/current/clusters/cluster-types.html#cku-limits-per-cluster) Connect API : The Connect API is the Kafka API that enables a connector to read event streams from a source system and write to a target system. Connect worker : A Connect worker is a server process that runs a connector and performs the actual work of moving data in and out of Kafka topics. A worker is a server process that runs on hardware independent of the Kafka brokers themselves. It is scalable and fault-tolerant, meaning you can run a cluster of workers that share the load of moving data in and out of Kafka from and to external systems. **Related terms**: *connector*, *Kafka Connect* connection attempts : In Confluent Cloud, connection attempts are a Kafka cluster billing dimension that defines the maximum number of new TCP connections to the cluster you can create in one second. This includes successful and unsuccessful authentication attempts. Available in the Metrics API as `successful_authentication_count` (only includes successful authentications, not unsuccessful authentication attempts). To reduce usage on connection attempts, use longer-lived connections to the cluster. If you exceed the maximum, connection attempts may be refused.
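As a rough illustration of the Confluent REST Proxy entry above, the following sketch produces a record over HTTP using the REST Proxy v2 produce endpoint and the Python `requests` package; the host, port, topic, and payload are placeholder values, not part of this glossary.

```python
import requests

# Hypothetical REST Proxy endpoint and topic.
url = "http://localhost:8082/topics/orders"
headers = {
    "Content-Type": "application/vnd.kafka.json.v2+json",
    "Accept": "application/vnd.kafka.v2+json",
}
body = {"records": [{"key": "o-123", "value": {"amount": 19.99}}]}

# Produce over HTTP instead of the native Kafka protocol.
resp = requests.post(url, json=body, headers=headers)
resp.raise_for_status()
print(resp.json())  # includes the offsets of the written records
```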
connector : A connector is an abstract mechanism that enables communication, coordination, or cooperation among components by transferring data elements from one interface to another without changing the data. connector offset : Connector offset uniquely identifies the position of a connector as it processes data. Connectors use a variety of strategies to implement the connector offset, including everything from monotonically increasing integers to replay ids, lists of files, timestamps and even checkpoint information. Connector offsets keep track of already-processed data in the event of a connector restart or recovery. While sink connectors use a pattern for connector offsets similar to the offset mechanism used throughout Kafka, the implementation details for source connectors are often much different. This is because source connectors track the progress of a source system as it processes data. consumer : A consumer is a Kafka client application that subscribes to (reads and processes) event messages from a Kafka topic. The Streams API and the Consumer API are the two APIs that enable consumers to read event streams from Kafka topics. **Related terms**: *Consumer API*, *consumer group*, *producer*, *Streams API* Consumer API : The Consumer API is the Kafka API used for consuming (reading) event messages or records from Kafka topics and enables a Kafka consumer to subscribe to a topic and read event messages as they arrive. Batch processing is a common use case for the Consumer API. consumer group : A consumer group is a single logical consumer implemented with multiple physical consumers for reasons of throughput and resilience. By dividing topics among consumers in the group into partitions, consumers in the group can process messages in parallel, increasing message throughput and enabling load balancing. **Related terms**: *consumer*, *partition*, *producer*, *topic* consumer lag : Consumer lag is the number of consumer offsets between the latest message produced in a partition and the last message consumed by a consumer, that is, the number of messages pending to be consumed from a particular partition. A large consumer lag, or a quickly growing lag, indicates that the consumer is unable to read from a partition as fast as the messages are available. This can be caused by a slow consumer, slow network, or slow broker. consumer offset : Consumer offset is the monotonically increasing integer value that uniquely identifies the position of an event record in a partition. Consumers use offsets to track their current position in the Kafka topic, allowing consumers to resume processing from where they left off. Offsets are stored on the Kafka broker, which does not track which records have been read and which have not. It is up to the consumer to track this information. When a consumer acknowledges receiving and processing a message, it commits an offset value that is stored in the special internal topic `__consumer_offsets`.
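The consumer, consumer group, consumer offset, and consumer lag entries above can be illustrated with a minimal Python `confluent-kafka` consumer; the broker address, group ID, and topic are placeholders, and the lag calculation is only a rough per-partition estimate.

```python
from confluent_kafka import Consumer, TopicPartition

# Hypothetical consumer that is part of the "billing" consumer group.
consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "billing",
    "auto.offset.reset": "earliest",
    "enable.auto.commit": False,   # commit offsets explicitly after processing
})
consumer.subscribe(["orders"])

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None:
            continue
        if msg.error():
            raise Exception(msg.error())

        print(msg.key(), msg.value())                      # placeholder "processing"
        consumer.commit(message=msg, asynchronous=False)   # committed offset = msg.offset() + 1

        # Rough consumer lag for this partition:
        # latest offset in the partition minus the next offset to consume.
        _, high = consumer.get_watermark_offsets(TopicPartition(msg.topic(), msg.partition()))
        print("lag:", high - (msg.offset() + 1))
finally:
    consumer.close()
```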
cross-resource RBAC role binding : A cross-resource RBAC role binding is a role binding in Confluent Cloud that is applied at the Organization or Environment scope and grants access to multiple resources. For example, assigning a principal the NetworkAdmin role at the Organization scope lets them administer all networks across all Environments in their Organization. **Related terms**: *identity pool*, *principal*, *role*, *role binding* CRUD : CRUD is an acronym for the four basic operations that can be performed on data: Create, Read, Update, and Delete. custom connector : A custom connector is a connector created using Connect plugins uploaded to Confluent Cloud by users. This includes connector plugins that are built from scratch, modified open-source connector plugins, or third-party connector plugins. data at rest : Data at rest is data that is physically stored on non-volatile media (such as hard drives, solid-state drives, or other storage devices) and is not actively being transmitted or processed by a system. data contract : A data contract is a formal agreement between an upstream component and a downstream component on the structure and semantics of data that is in motion. A schema is a key element of a data contract. The schema, metadata, rules, policies, and evolution plan form the data contract. You can associate data contracts (schemas and more) with [topics](#term-Kafka-topic). **Related content** - Confluent Platform: [Data Contracts for Schema Registry on Confluent Platform](/platform/current/schema-registry/fundamentals/data-contracts.html) - Confluent Cloud: [Data Contracts for Schema Registry on Confluent Cloud](/cloud/current/sr/fundamentals/data-contracts.html) - Cloud Console: [Manage Schemas in Confluent Cloud](/cloud/current/sr/schemas-manage.html) data encryption key (DEK) : A data encryption key (DEK) is a symmetric key that is used to encrypt and decrypt data. The DEK is used in client-side field level encryption (CSFLE) to encrypt sensitive data. The DEK is itself encrypted using a key encryption key (KEK) that is only accessible to authorized users. The encrypted DEK and encrypted data are stored together. Only users with access to the KEK can decrypt the DEK and access the sensitive data. **Related terms**: *envelope encryption*, *key encryption key (KEK)* data in motion : Data in motion is data that is actively being transferred between source and destination, typically systems, devices, or networks. Data in motion is also referred to as data in transit or data in flight. data in use : Data in use is data that is actively being processed or manipulated in memory (RAM, CPU caches, or CPU registers). data ingestion : Data ingestion is the process of collecting, importing, and integrating data from various sources into a system for further processing, analysis, or storage. data mapping : Data mapping is the process of defining relationships or associations between source data elements and target data elements. Data mapping is an important process in data integration, data migration, and data transformation, ensuring that data is accurately and consistently represented when it is moved or combined. data pipeline : A data pipeline is a series of processes and systems that enable the flow of data from sources to destinations, automating the movement and transformation of data for various purposes, such as analytics, reporting, or machine learning. A data pipeline typically comprises a source system, a data ingestion tool, a data transformation tool, and a target system. A data pipeline covers the following stages: data extraction, data transformation, data loading, and data validation.
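As a toy sketch of how a data encryption key (DEK) is wrapped by a key encryption key (KEK), the following example uses the `cryptography` package's Fernet primitive; in a real CSFLE setup the KEK lives in a KMS and is never stored with the data, so treat the key handling here as purely illustrative.

```python
from cryptography.fernet import Fernet

# Key encryption key (KEK): in practice held in a KMS, shown locally only for illustration.
kek = Fernet(Fernet.generate_key())

# Data encryption key (DEK): generated for the sensitive field.
dek_bytes = Fernet.generate_key()
dek = Fernet(dek_bytes)

plaintext = b"4111-1111-1111-1111"          # sensitive field value
encrypted_data = dek.encrypt(plaintext)      # encrypt the data with the DEK
encrypted_dek = kek.encrypt(dek_bytes)       # encrypt (wrap) the DEK with the KEK

# The encrypted DEK travels alongside the encrypted data.
stored = {"ciphertext": encrypted_data, "encrypted_dek": encrypted_dek}

# Only a client with access to the KEK can unwrap the DEK and decrypt the field.
recovered_dek = Fernet(kek.decrypt(stored["encrypted_dek"]))
assert recovered_dek.decrypt(stored["ciphertext"]) == plaintext
```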
Data Portal : Data Portal is a Confluent Cloud application that uses Stream Catalog and Stream Lineage to provide a data-centric, self-service view of Confluent throughout the Confluent Cloud Console. Data practitioners can search and discover existing topics using tags and business metadata, request access to topics and data, and use data in topics to build streaming applications and data pipelines. **Related terms**: *Stream Catalog*, *Stream Lineage* **Related content** - [Data Portal on Confluent Cloud](/cloud/current/stream-governance/data-portal.html) data serialization : Data serialization is the process of converting data structures or objects into a format that can be stored or transmitted, and reconstructed later in the same or another computer environment. Data serialization is a common technique for implementing data persistence, interprocess communication, and object communication. Confluent Schema Registry (in Confluent Platform) and Confluent Cloud Schema Registry support data serialization using serializers and deserializers for the following formats: Avro, JSON Schema, and Protobuf. **Related content** - Confluent Cloud: [Formats, Serializers, and Deserializers](/cloud/current/sr/fundamentals/serdes-develop/index.html) - Confluent Platform: [Formats, Serializers, and Deserializers](/platform/current/schema-registry/fundamentals/serdes-develop/index.html) data steward : A data steward is a person with data-related responsibilities, such as data governance, data quality, and data security. data stream : A data stream is a continuous flow of data records that are produced and consumed by applications. dead letter queue (DLQ) : A dead letter queue (DLQ) is a queue where messages that could not be processed successfully by a sink connector are placed. Instead of stopping, the sink connector sends messages that could not be written successfully as event records to the DLQ topic and continues processing messages. Dedicated Kafka cluster : A Confluent Cloud cluster type. Dedicated Kafka clusters are designed for critical production workloads with high traffic or private networking requirements. deserializer : A deserializer is a tool that converts a serial byte stream back into objects and parallel data. Deserializers work with serializers (known together as Serdes) to support efficient storage and high-speed data transmission over the wire. Confluent provides Serdes for schemas in Avro, Protobuf, and JSON Schema formats. **Related content** - Confluent Cloud: [Formats, Serializers, and Deserializers](/cloud/current/sr/fundamentals/serdes-develop/index.html) - Confluent Platform: [Formats, Serializers, and Deserializers](/platform/current/schema-registry/fundamentals/serdes-develop/index.html) egress : In general networking, egress refers to outbound traffic leaving a network or a specific network segment. In Confluent Cloud, egress is a Kafka cluster billing dimension that defines the number of bytes consumed from the cluster in one second. Available in the Metrics API as `sent_bytes` (convert from bytes to MB). To reduce egress in Confluent Cloud, compress your messages and ensure each consumer is only consuming from the topics it requires. For compression, use lz4. Avoid gzip because of high overhead on the cluster.
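To make the data serialization and deserializer entries above (and the earlier Avro entry) concrete, the sketch below registers an Avro schema and serializes one record with the Python `confluent-kafka` Schema Registry client; the Schema Registry URL, topic, and field names are hypothetical.

```python
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer
from confluent_kafka.serialization import SerializationContext, MessageField

# Hypothetical Avro schema: every field is typed and documented.
order_schema = """
{
  "type": "record",
  "name": "Order",
  "fields": [
    {"name": "order_id", "type": "string"},
    {"name": "amount",   "type": "double"}
  ]
}
"""

sr_client = SchemaRegistryClient({"url": "http://localhost:8081"})
serializer = AvroSerializer(sr_client, order_schema)

# Serialize a record destined for the "orders" topic. The schema ID is embedded in the
# payload, so consumers fetch the schema from Schema Registry instead of receiving it
# with every message.
payload = serializer({"order_id": "o-123", "amount": 19.99},
                     SerializationContext("orders", MessageField.VALUE))
print(len(payload), "bytes")
```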
Elastic Confluent Unit for Kafka (eCKU) : Elastic Confluent Unit for Kafka (eCKU) is used to express capacity for Basic, Standard, Enterprise, and Freight Kafka clusters. These clusters automatically scale up to a fixed ceiling. There is no need to resize these types of clusters. When you need more capacity, your cluster expands up to the fixed ceiling. If you’re not using capacity above the minimum, you’re not paying for it. ELT : ELT is an acronym for Extract-Load-Transform, where data is extracted from a source system and loaded into a target system before processing or transformation. Compared to ETL, ELT is a more flexible approach to data ingestion because the data is loaded into the target system before transformation. Enterprise Kafka cluster : A Confluent Cloud cluster type. Enterprise Kafka clusters are designed for production-ready functionality that requires private endpoint networking capabilities. envelope encryption : Envelope encryption is a cryptographic technique that uses two keys to encrypt data. The symmetric data encryption key (DEK) is used to encrypt sensitive data. The separate asymmetric key encryption key (KEK) is the master key used to encrypt the DEK. The encrypted DEK and the encrypted data are stored together. Only users with access to the KEK can decrypt the DEK and access the sensitive data. In Confluent Cloud, envelope encryption is used to enable client-side field level encryption (CSFLE). CSFLE encrypts sensitive data in a message before it is sent to Confluent Cloud and allows for temporary decryption of sensitive data when required to perform operations on the data. **Related terms**: *data encryption key (DEK)*, *key encryption key (KEK)* ETL : ETL is an acronym for Extract-Transform-Load, where data is extracted from a source system, transformed into a target format, and loaded into a target system. Compared to ELT, ETL is a more rigid approach to data ingestion because the data is transformed before loading into the target system. event : An event is a meaningful action or occurrence of something that happened. Events that can be recognized by a program, either human-generated or triggered by software, can be recorded in a log file or other data store. **Related terms**: *event message*, *event record*, *event sink*, *event source*, *event stream*, *event streaming*, *event streaming platform*, *event time* event message : An event message is a record of an event sent to a Kafka topic, represented as a key-value pair. Each event message consists of a key-value pair, a timestamp, the compression type, headers for metadata (optional), and a partition and offset ID (once the message is written). The key is optional and can be used to identify the event. The value is required and contains details about the event that happened. **Related terms**: *event*, *event record*, *event sink*, *event source*, *event stream*, *event streaming*, *event streaming platform*, *event time* event record : An event record is the record of an event stored in a Kafka topic. Event records are organized and durably stored in topics. Examples of events include orders, payments, activities, or measurements. An event typically contains one or more data fields that describe the fact, as well as a timestamp that denotes when the event was created by its event source. The event may also contain various metadata, such as its source of origin (for example, the application or cloud service that created the event) and storage-level information (for example, its position in the event stream).
**Related terms**: *event*, *event message*, *event sink*, *event source*, *event stream*, *event streaming*, *event streaming platform*, *event time* event sink : An event sink is a consumer of events, which can include applications, cloud services, databases, IoT sensors, and more. **Related terms**: *event*, *event message*, *event record*, *event source*, *event stream*, *event streaming*, *event streaming platform*, *event time* event source : An event source is a producer of events, which can include cloud services, databases, IoT sensors, mainframes, and more. **Related terms**: *event*, *event message*, *event record*, *event sink*, *event stream*, *event streaming*, *event streaming platform*, *event time* event stream : An event stream is a continuous flow of event messages produced by an event source and consumed by one or more consumers. **Related terms**: *event*, *event message*, *event record*, *event sink*, *event source*, *event streaming*, *event streaming platform*, *event time* event streaming : Event streaming is the practice of capturing event data in real-time from data sources. Event streaming is a form of data streaming that is used to capture, store, process, and react to data in real-time or retrospectively. **Related terms**: *event*, *event message*, *event record*, *event sink*, *event source*, *event streaming platform*, *event time* event streaming platform : An event streaming platform is a platform to which events can be written once, allowing distributed functions within an organization to react in real time. **Related terms**: *event*, *event message*, *event record*, *event sink*, *event source*, *event streaming*, *event time* event time : Event time is the time when an event occurred on the producing device, as opposed to the time when the event was processed or recorded. Event time is often used in stream processing to determine the order of events and to perform windowing operations. **Related terms**: *event*, *event message*, *event record*, *event sink*, *event source*, *event streaming*, *event streaming platform* exactly-once semantics : Exactly-once semantics is a guarantee that a message is delivered exactly once and in the order that it was sent. Even if a producer retries sending a message, or a consumer retries processing a message, the message is delivered and processed exactly once. This guarantee is achieved with idempotent producers, which attach a producer ID and sequence number to each message so that brokers can discard duplicate writes, and with transactions, which commit the consumer offset together with the produced results only after the message has been fully processed. If processing fails before the transaction commits, the message is reprocessed without producing duplicate results. Freight Kafka cluster : A Confluent Cloud cluster type. Freight Kafka clusters are designed for high-throughput, relaxed latency workloads that are less expensive than self-managed open source Kafka. granularity : Granularity is the degree or level of detail to which an entity (a system, service, or resource) is broken down into subcomponents, parts, or elements. Entities that are *fine-grained* have a higher level of detail, while *coarse-grained* entities have a reduced level of detail, often combining finer parts into a larger whole. In the context of access control, granular permissions provide precise control over resource access. They allow administrators to grant specific operations on distinct resources. This ensures users only have permissions tailored to their needs, minimizing unnecessary or potentially risky access.
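A hedged sketch of the transactional consume-transform-produce pattern behind the exactly-once semantics entry above, using the Python `confluent-kafka` client; the topics, transactional ID, and error handling are simplified placeholders rather than a production recipe.

```python
from confluent_kafka import Consumer, Producer

# Transactional consume-transform-produce loop with placeholder names.
consumer = Consumer({"bootstrap.servers": "localhost:9092",
                     "group.id": "pricing",
                     "isolation.level": "read_committed",
                     "enable.auto.commit": False})
producer = Producer({"bootstrap.servers": "localhost:9092",
                     "transactional.id": "pricing-app-1"})

consumer.subscribe(["orders"])
producer.init_transactions()

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    producer.begin_transaction()
    try:
        producer.produce("priced-orders", value=msg.value())
        # Commit the consumed offsets inside the same transaction so the read
        # and the write succeed or fail together.
        producer.send_offsets_to_transaction(
            consumer.position(consumer.assignment()),
            consumer.consumer_group_metadata())
        producer.commit_transaction()
    except Exception:
        producer.abort_transaction()
```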
group mapping : Group mapping is a set of rules that map groups in your SSO identity provider to Confluent Cloud RBAC roles. When a user signs in to Confluent Cloud using SSO, Confluent Cloud uses the group mapping to grant access to Confluent Cloud resources. **Related terms**: *identity provider*, *identity pool*, *principal*, *role* **Related content** - [Group Mapping for Confluent Cloud](/cloud/current/access-management/authenticate/sso/group-mapping/overview.html) identity : An identity is a unique identifier that is used to authenticate and authorize users and applications to access resources. Identity is often used in conjunction with access control to determine whether a user or application is allowed to access a resource and perform a specific action or operation on that resource. **Related terms**: *identity provider*, *identity pool*, *principal*, *role* identity pool : An identity pool is a collection of identities that can be used to authenticate and authorize users and applications to access resources. Identity pools are used to manage permissions for users and applications that access resources in Confluent Cloud. They are also used to manage permissions for Confluent Cloud service accounts that are used to access resources in Confluent Cloud. identity provider : An identity provider is a trusted provider that authenticates users and issues security tokens that are used to verify the identity of a user. Identity providers are often used in single sign-on (SSO) scenarios, where a user can log in to multiple applications or services with a single set of credentials. Infinite Storage : Infinite Storage is the Confluent Cloud storage service that enhances the scalability of Confluent Cloud resources by separating storage and processing. Tiered storage within Confluent Cloud moves data between storage layers based on the needs of the workload, retrieves tiered data when requested, and garbage collects data that is past retention or otherwise deleted. If an application reads historical data, latency is not increased for other applications reading more recent data. Storage resources are decoupled from compute resources, you only pay for what you produce to Confluent Cloud and for storage that you use, and CKUs do not have storage limits. **Related content** - [Infinite Storage in Confluent Cloud for Apache Kafka](https://www.confluent.io/blog/infinite-kafka-data-storage-in-confluent-cloud/) ingress : In general networking, ingress refers to traffic that enters a network from an external source. In Confluent Cloud, ingress is a Kafka cluster billing dimension that defines the number of bytes produced to the cluster in one second. Available in the Metrics API as `received_bytes` (convert from bytes to MB). To reduce ingress in Confluent Cloud, compress your messages. For compression, use lz4. Avoid gzip because of high overhead on the cluster. internal topic : An internal topic is a topic, prefixed with double underscores (`__`), that is automatically created by a Kafka component to store metadata about the broker, partition assignment, consumer offsets, and other information. Examples of internal topics: `__cluster_metadata`, `__consumer_offsets`, `__transaction_state`, `__confluent.support.metrics`, and `__confluent.support.metrics-raw`.
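The internal topic entry above can be observed directly from cluster metadata; the following sketch lists topic names with the Python AdminClient and flags the ones with the double-underscore prefix. The broker address is a placeholder.

```python
from confluent_kafka.admin import AdminClient

admin = AdminClient({"bootstrap.servers": "localhost:9092"})

# Cluster metadata includes every topic the broker reports to this client.
metadata = admin.list_topics(timeout=10)

for name in sorted(metadata.topics):
    kind = "internal" if name.startswith("__") else "user"
    print(f"{kind:8} {name}")
```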
JSON Schema : JSON Schema is a declarative language used for data serialization and exchange to define data structures, specify formats, and validate JSON documents. It is a way to encode expected data types, properties, and constraints to ensure that all fields are properly described for use with serializers and deserializers. **Related terms**: *data serialization*, *deserializer*, *serializer* **Related content** - [JSON Schema - a declarative language that allows you to annotate and validate JSON documents.](https://json-schema.org/) - Confluent Cloud: [JSON Schema Serializer and Deserializer](/cloud/current/sr/fundamentals/serdes-develop/serdes-json.html) - Confluent Platform: [JSON Schema Serializer and Deserializer](/platform/current/schema-registry/fundamentals/serdes-develop/serdes-json.html) Kafka bootstrap server : A Kafka bootstrap server is a Kafka broker that a Kafka client connects to when it initiates a connection to a Kafka cluster; the broker returns metadata that includes the addresses of all the brokers in the Kafka cluster. Although only one bootstrap server is required to connect to a Kafka cluster, multiple brokers can be specified in a bootstrap server list to provide high availability and fault tolerance in case a broker is unavailable. In Confluent Cloud, the bootstrap server is the general cluster endpoint. Kafka broker : A Kafka broker is a server in the Kafka storage layer that stores event streams from one or more sources. A Kafka cluster is typically comprised of several brokers. Every broker in a cluster is also a bootstrap server, meaning if you can connect to one broker in a cluster, you can connect to every broker. Kafka client : A Kafka client allows you to write distributed applications and microservices that read, write, and process streams of events in parallel, at scale, and in a fault-tolerant manner, even in the case of network problems or machine failures. The Kafka client library provides functions, classes, and utilities that allow developers to create Kafka producer clients (Producers) and consumer clients (Consumers) using various programming languages. The primary way to build production-ready Producers and Consumers is by using your preferred programming language and a Kafka client library. **Related content** - [Build Client Applications for Confluent Cloud](/cloud/current/client-apps/overview.html) - [Build Client Applications for Confluent Platform](/platform/current/clients/index.html) - [Getting Started with Apache Kafka and Java (or Python, Go, .Net, and others)](https://developer.confluent.io/get-started/java/) Kafka cluster : A Kafka cluster is a group of interconnected Kafka brokers that manage and distribute real-time data streaming, processing, and storage as if they are a single system. By distributing tasks and services across multiple Kafka brokers, the Kafka cluster improves availability, reliability, and performance. Kafka Connect : Kafka Connect is the component of Kafka that provides data integration between databases, key-value stores, search indexes, file systems, and Kafka brokers. Kafka Connect is an ecosystem of a client application and pluggable connectors. As a client application, Connect is a server process that runs on hardware independent of the Kafka brokers themselves. It is scalable and fault-tolerant, meaning you can run a cluster of Connect workers that share the load of moving data in and out of Kafka from and to external systems. Connect also abstracts the coding work away from the user and instead requires only JSON configuration to run. **Related content** - Confluent Cloud: [Kafka Connect](/cloud/current/billing/overview.html#kconnect-long) - Confluent Platform: [Kafka Connect](/platform/current/connect/index.html)
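As an illustration of the Kafka Connect entry above, the sketch below submits a connector configuration to a self-managed Connect worker's REST API; the worker URL and the FileStream example settings are placeholder values.

```python
import requests

# A FileStreamSource connector that tails a file into a topic (illustrative values).
connector = {
    "name": "file-source-demo",
    "config": {
        "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
        "tasks.max": "1",
        "file": "/tmp/input.txt",
        "topic": "connect-demo",
    },
}

# Connect workers expose a REST API (port 8083 by default) for managing connectors.
resp = requests.post("http://localhost:8083/connectors", json=connector)
resp.raise_for_status()
print(resp.json()["name"], "created")
```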
Kafka controller : A Kafka controller is the node in a Kafka cluster that is responsible for managing and changing the metadata of the cluster. This node also communicates metadata changes to the rest of the cluster. When Kafka uses ZooKeeper for metadata management, the controller is a broker, and the broker persists the metadata to ZooKeeper for backup and recovery. With KRaft, you dedicate Kafka nodes to operate as controllers and the metadata is stored in Kafka itself and not persisted to ZooKeeper. KRaft enables faster recovery because of this. For more information, see [KRaft overview](/platform/current/kafka-metadata/kraft.html). Kafka listener : A Kafka listener is an endpoint that Kafka brokers bind to and use to communicate with clients. For Kafka clusters, Kafka listeners are configured in the `listeners` property of the `server.properties` file. Advertised listeners are publicly accessible endpoints that are used by clients to connect to the Kafka cluster. **Related content** - [Kafka Listeners – Explained](https://www.confluent.io/blog/kafka-listeners-explained/) Kafka metadata : Kafka metadata is the information about the Kafka cluster and the topics that are stored in it. This information includes details such as the brokers in the cluster, the topics that are available, the partitions for each topic, and the location of the leader for each partition. Kafka metadata is used by clients to discover the available brokers and topics, and to determine which broker is the leader for a particular partition. This information is essential for clients to be able to send and receive messages to and from Kafka. Kafka Streams : Kafka Streams is a stream processing library for building streaming applications and microservices that transform (filter, group, aggregate, join, and more) incoming event streams in real-time to Kafka topics stored in a Kafka cluster. The Streams API can be used to build applications that process data in real-time, analyze data continuously, and build data pipelines. **Related content** - [Kafka Streams](/platform/current/streams/overview.html) Kafka topic : See *topic*. key encryption key (KEK) : A key encryption key (KEK) is a master key that is used to encrypt and decrypt other keys, specifically the data encryption key (DEK). Only users with access to the KEK can decrypt the DEK and access the sensitive data. **Related terms**: *data encryption key (DEK)*, *envelope encryption* Kora : Kora is the cloud-native streaming data service based on Kafka technology that powers the Confluent Cloud event streaming platform for building real-time data pipelines and streaming applications. Kora abstracts low-level resources, such as Kafka brokers, and hides operational complexities, such as system upgrades. Kora is built on the following foundations: a tiered storage layer that improves cost and performance, elasticity and consistent performance through incremental load balancing, cost effective multi-tenancy with dynamic quota management and cell-based isolation, continuous monitoring of both system health and data integrity, and clean abstraction with standard Kafka protocols and CKUs to hide underlying resources.
**Related terms**: *Apache Kafka*, *Confluent Cloud*, *Confluent Unit for Kafka (CKU)* **Related content** - [Kora: The Cloud Native Engine for Apache Kafka](https://www.confluent.io/blog/cloud-native-data-streaming-kafka-engine/) - [Kora: A Cloud-Native Event Streaming Platform For Kafka](https://www.vldb.org/pvldb/vol16/p3822-povzner.pdf) KRaft : KRaft (or Apache Kafka Raft) is a consensus protocol introduced in Apache Kafka 2.8 to provide metadata management for Kafka with the goal of replacing ZooKeeper. KRaft simplifies Kafka because it enables the management of metadata in Kafka itself, rather than splitting it between ZooKeeper and Kafka. As of Confluent Platform 7.5, KRaft is the default method of metadata management in new deployments. For more information, see [KRaft overview](/platform/current/kafka-metadata/kraft.html). ksqlDB : ksqlDB is a streaming SQL database engine purpose-built for creating stream processing applications on top of Kafka. logical Kafka cluster (LKC) : A logical Kafka cluster (LKC) is a subset of a physical Kafka cluster (PKC) that is isolated from other logical clusters within Confluent Cloud. Each logical unit of isolation is considered a tenant and maps to a specific organization. If the mapping is one-to-one, one LKC maps to one PKC (a Dedicated cluster). If the mapping is many-to-one, one LKC maps to one of the multitenant Kafka cluster types (Basic, Standard, Enterprise, and Freight). **Related terms**: *Confluent Cloud*, *Kafka cluster*, *physical Kafka cluster (PKC)* **Related content** - [Kafka Cluster Types in Confluent Cloud](/cloud/current/clusters/cluster-types.html) - [Multi-tenancy and Client Quotas on Confluent Cloud](/cloud/current/clusters/client-quotas.html) multi-region cluster (MRC) : A multi-region cluster (MRC) is a single Kafka cluster that replicates data between datacenters across regional availability zones. multi-tenancy : Multi-tenancy is a software architecture in which a single physical instance is shared among multiple logical instances, or tenants. In Confluent Cloud, each Basic, Standard, Enterprise, and Freight cluster is a logical Kafka cluster (LKC) that shares a physical Kafka cluster (PKC) with other tenants. Each LKC is isolated from other LKCs and has its own resources, such as memory, compute, and storage. **Related terms**: *Confluent Cloud*, *logical Kafka cluster (LKC)*, *physical Kafka cluster (PKC)* **Related content** * [From On-Prem to Cloud-Native: Multi-Tenancy in Confluent Cloud](https://www.confluent.io/blog/cloud-native-multi-tenant-kafka-with-confluent-cloud/) * [Multi-tenancy and Client Quotas on Confluent Cloud](/cloud/current/clusters/client-quotas.html) offset : An offset is an integer assigned to each message that uniquely represents its position within the data stream, guaranteeing the ordering of records and allowing offset-based connections to replay messages from any point in time. **Related terms**: *consumer offset*, *connector offset*, *offset commit*, *replayability* offset commit : An offset commit is the process of keeping track of the current position of an offset-based connection (primarily Kafka consumers and connectors) within the data stream. The offset commit process is not specific to consumers, producers, or connectors. It is a general mechanism in Kafka to track the position of any application that is reading data. When a consumer commits an offset, the offset identifies the next message the consumer should consume.
For example, if a consumer has an offset of 5, it has consumed messages 0 through 4 and will next consume message 5. If the consumer crashes or is shut down, its partitions are reassigned to another consumer, which resumes consuming from the last committed offset of each partition. The committed offset for consumers is stored on a Kafka broker. When a consumer commits an offset, it sends a commit request to the Kafka cluster, specifying the partition and offset it wants to commit for a particular consumer group. The Kafka broker receiving the commit request then stores this offset in the `__consumer_offsets` internal topic. **Related terms**: *consumer offset*, *offset* OpenSSL : OpenSSL is an open-source software library and toolkit that implements the Secure Sockets Layer (SSL) and Transport Layer Security (TLS) protocols. For more information, see [Wikipedia](https://en.wikipedia.org/wiki/OpenSSL). parent cluster : The Kafka cluster that a resource belongs to. **Related terms**: *Kafka cluster* partition : A partition is a unit of data storage that divides a topic into multiple, parallel event streams, each of which is stored on separate Kafka brokers and can be consumed independently. Partitioning is a key concept in Kafka because it allows Kafka to scale horizontally by adding more brokers to the cluster. Partitions are also the unit of parallelism in Kafka. A topic can have one or more partitions, and each partition is an ordered, immutable sequence of event records that is continually appended to a partition log. partitions (pre-replication) : In Confluent Cloud, partitions are a Kafka cluster billing dimension that define the maximum number of partitions that can exist on the cluster at one time, before replication. While you are not charged for partitions on any type of Kafka cluster, the number of partitions you use has an impact on eCKU. To determine eCKU limits for partitions, Confluent Cloud counts only pre-replication (leader) partitions across a cluster. All topics that you create (as well as internal topics that are automatically created by Confluent Platform components such as ksqlDB, Kafka Streams, Connect, and Control Center (Legacy)) count towards the cluster partition limit. Confluent prefixes topics created automatically with an underscore (_). Topics that are internal to Kafka itself (consumer offsets) are not visible in Cloud Console and do not count against partition limits or toward partition billing. Available in the Metrics API as `partition_count`. In Confluent Cloud, attempts to create additional partitions beyond the cluster limit fail with an error message. To reduce usage on partitions (pre-replication), delete unused topics and create new topics with fewer partitions. Use the Kafka Admin interface to increase the partition count of an existing topic if the initial partition count is too low. physical Kafka cluster (PKC) : A physical Kafka cluster (PKC) is a Kafka cluster comprised of multiple brokers. Each physical Kafka cluster is created on a Kubernetes cluster by the control plane. A PKC is not directly accessible by clients. principal : A principal is an entity that can be authenticated and granted permissions based on roles to access resources and perform operations. An entity can be a user account, service account, group mapping, or identity pool. **Related terms**: *group mapping*, *identity*, *identity pool*, *role*, *service account*, *user account*
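The partition entry above can be illustrated by creating a topic with an explicit partition count and replication factor through the Python AdminClient; the broker address, topic name, and sizing are placeholders.

```python
from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({"bootstrap.servers": "localhost:9092"})

# Six partitions allow up to six consumers in one group to read in parallel;
# each partition is replicated to three brokers.
futures = admin.create_topics([NewTopic("orders", num_partitions=6, replication_factor=3)])

for topic, future in futures.items():
    future.result()          # raises if creation failed (for example, a partition limit was reached)
    print(f"created {topic}")
```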
private internet : A private internet is a closed, restricted computer network typically used by organizations to provide secure environments for managing sensitive data and resources. processing time : Processing time is the time when an event is processed or recorded by a system, as opposed to the time when the event occurred on the producing device. Processing time is often used in stream processing to determine the order of events and to perform windowing operations. producer : A producer is a client application that publishes (writes) data to a topic in a Kafka cluster. Producers write data to a topic and are the only clients that can write data to a topic. Each record written to a topic is appended to the partition of the topic that is selected by the producer. Producer API : The Producer API is the Kafka API that allows you to write data to a topic in a Kafka cluster. The Producer API is used by producer clients to publish data to a topic in a Kafka cluster. Protobuf : Protobuf (or Protocol Buffers) is an open-source data format used to serialize structured data for storage. **Related terms**: *data serialization*, *deserializer*, *serializer* **Related content** - [Protocol Buffers](https://protobuf.dev/) - [Getting Started with Protobuf in Confluent Cloud](https://www.confluent.io/blog/using-protobuf-in-confluent-cloud/) - Confluent Cloud: [Protobuf Serializer and Deserializer](/cloud/current/sr/fundamentals/serdes-develop/serdes-protobuf.html) - Confluent Platform: [Protobuf Serializer and Deserializer](/platform/current/schema-registry/fundamentals/serdes-develop/serdes-protobuf.html) public internet : The public internet is the global system of interconnected computers and networks that use TCP/IP to communicate with each other. rebalancing : Rebalancing is the process of redistributing the partitions of a topic among the consumers of a consumer group for improved performance and scalability. A rebalance can occur if a consumer has failed the heartbeat and has been excluded from the group, it voluntarily left the group, metadata has been updated for a consumer, or a consumer has joined the group. replayability : Replayability is the ability to replay messages from any point in time. **Related terms**: *consumer offset*, *offset*, *offset commit* replication : Replication is the process of creating and maintaining multiple copies (or *replicas*) of data across different nodes in a distributed system to increase availability, reliability, redundancy, and accessibility. replication factor : A replication factor is the number of copies of a partition that are distributed across the brokers in a cluster. requests : In Confluent Cloud, requests are a Kafka cluster billing dimension that defines the number of client requests to the cluster in one second. Available in the Metrics API as `request_count`. To reduce usage on requests, you can adjust producer batching configurations, consumer client batching configurations, and shut down otherwise inactive clients. For Dedicated clusters, a high number of requests per second results in increased load on the cluster. role : A role is a Confluent-defined job function that is granted a set of permissions required to perform specific actions or operations on Confluent resources. A role binding binds a role to a principal and a set of Confluent resources. A role can be assigned to a user account, group mapping, service account, or identity pool. **Related terms**: *group mapping*, *identity*, *identity pool*, *principal*, *service account* **Related content** - [Predefined RBAC Roles in Confluent Cloud](/cloud/current/access-management/access-control/rbac/predefined-rbac-roles.html) - [Role-Based Access Control Predefined Roles in Confluent Platform](/platform/current/security/rbac/rbac-predefined-roles.html)
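A minimal sketch of the producer and Producer API entries above, using the Python `confluent-kafka` client with a record key and a delivery callback; the broker, topic, and payload are placeholders.

```python
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})

def on_delivery(err, msg):
    # Called once per message after the broker acknowledges (or rejects) the write.
    if err is not None:
        print("delivery failed:", err)
    else:
        print(f"delivered to {msg.topic()} [{msg.partition()}] @ {msg.offset()}")

# The key determines the partition; records with the same key land in the same
# partition and therefore keep their relative order.
producer.produce("orders", key="customer-42", value=b'{"amount": 19.99}',
                 on_delivery=on_delivery)
producer.flush()
```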
rolling restart : A rolling restart restarts the brokers in a Kafka cluster with zero downtime by incrementally restarting a Kafka broker after verifying that there are no under-replicated partitions on the broker before proceeding to the next broker. Restarting the brokers one at a time allows for software upgrades, broker configuration updates, or cluster maintenance while maintaining high availability by avoiding downtime. **Related content** - [Rolling restart](/platform/current/kafka/post-deployment.html#rolling-restart) schema : A schema is the structured definition or blueprint used to describe the format and structure of event messages sent through the Kafka event streaming platform. Schemas are used to validate the structure of data in event messages and ensure that producers and consumers are sending and receiving data in the same format. Schemas are defined in the Schema Registry. Schema Registry : Schema Registry is a centralized repository for managing and validating schemas for topic message data. Schema Registry is built into Confluent Cloud as a managed service, available with the Advanced Stream Governance package, and offered as part of Confluent Enterprise for self-managed deployments. Schema Registry is a RESTful service that stores and manages schemas for Kafka topics, and it is integrated with Kafka and Connect to provide a central location for managing schemas and validating data. Producers and consumers of Kafka topics use schemas to ensure data consistency and compatibility as schemas evolve. Schema Registry is a key component of Stream Governance. **Related content** - Confluent Cloud: [Manage Schemas in Confluent Cloud](/cloud/current/sr/schemas-manage.html) - Confluent Platform: [Schema Registry Overview](/platform/current/schema-registry/index.html) schema subject : A schema subject is the namespace for a schema in Schema Registry. This unique identifier defines a logical grouping of related schemas. Kafka topics contain event messages serialized and deserialized using the structure and rules defined in a schema subject. This ensures compatibility and supports schema evolution. **Related content** - Confluent Cloud: [Manage Schemas in Confluent Cloud](/cloud/current/sr/schemas-manage.html) - Confluent Platform: [Schema Registry Concepts](/platform/current/schema-registry/index.html) - [Understanding Schema Subjects](https://developer.confluent.io/courses/schema-registry/schema-subjects/) Serdes : Serdes are serializers and deserializers that convert objects and parallel data into a serial byte stream for efficient storage and high-speed data transmission over the wire. Confluent provides Serdes for schemas in Avro, Protobuf, and JSON Schema formats.
**Related content** - Confluent Cloud: [Formats, Serializers, and Deserializers](/cloud/current/sr/fundamentals/serdes-develop/index.html) - Confluent Platform: [Formats, Serializers, and Deserializers](/platform/current/schema-registry/fundamentals/serdes-develop/index.html) - [Serde](https://serde.rs/) serializer : A serializer is a tool that converts objects and parallel data into a serial byte stream. Serializers work with deserializers (known together as Serdes) to support efficient storage and high-speed data transmission over the wire. Confluent provides serializers for schemas in Avro, Protobuf, and JSON Schema formats. **Related content** - Confluent Cloud: [Formats, Serializers, and Deserializers](/cloud/current/sr/fundamentals/serdes-develop/index.html) - Confluent Platform: [Formats, Serializers, and Deserializers](/platform/current/schema-registry/fundamentals/serdes-develop/index.html) service account : A service account is a non-person entity used by an application or service to access resources and perform operations. Because a service account is an identity independent of the user who created it, it can be used programmatically to authenticate to resources and perform operations without the need for a user to be signed in. **Related content** - [Service Accounts for Confluent Cloud](/cloud/current/access-management/identity/service-accounts.html) service quota : A service quota is the limit, or maximum value, for a specific Confluent Cloud resource or operation that might vary by the resource scope it applies to. **Related content** - [Service Quotas for Confluent Cloud](/cloud/current/quotas/index.html) single message transform (SMT) : A single message transform (SMT) is a transformation or operation applied in realtime on an individual message that changes the values, keys, or headers of a message before being sent to a sink connector or after being read from a source connector. SMTs are convenient for inserting fields, masking information, event routing, and other minor data adjustments. single sign-on (SSO) : Single sign-on (SSO) is a centralized authentication service that allows users to use a single set of credentials to log in to multiple applications or services. **Related terms**: *authentication*, *group mapping*, *identity provider* **Related content** - [Single Sign-On for Confluent Cloud](/cloud/current/access-management/authenticate/sso/index.html) sink connector : A sink connector is a Kafka Connect connector that publishes (writes) data from a Kafka topic to an external system. source connector : A source connector is a Kafka Connect connector that subscribes (reads) data from a source (external system), extracts the payload and schema of the data, and publishes (writes) the data to Kafka topics. standalone : Standalone refers to a configuration in which a software application, system, or service operates independently on a single instance or device. This mode is commonly used for development, testing, and debugging purposes. For Kafka Connect, a standalone worker is a single process responsible for running all connectors and tasks on a single instance. Standard Kafka cluster : A Confluent Cloud cluster type. Standard Kafka clusters are designed for production-ready features and functionality. static egress IP address : A static egress IP address is an IP address used by a Confluent Cloud managed connector to establish outbound connections to endpoints of external data sources and sinks over the public internet. 
**Related content** - [Use Static IP Addresses on Confluent Cloud for Connectors and Cluster Linking](/cloud/current/networking/static-egress-ip-addresses.html) - [Static Egress IP Addresses for Confluent Cloud Connectors](/cloud/current/connectors/static-egress-ip.html) storage (pre-replication) : In Confluent Cloud, storage is a Kafka cluster billing dimension that defines the number of bytes retained on the cluster, pre-replication. Available in the Metrics API as `retained_bytes` (convert from bytes to TB). The returned value is pre-replication. Standard, Enterprise, Dedicated, and Freight clusters support Infinite Storage. This means there is no maximum size limit for the amount of data that can be stored on the cluster. You can configure policy settings `retention.bytes` and `retention.ms` at the topic level to control exactly how much and how long to retain data in a way that makes sense for your applications and helps control your costs. To reduce storage in Confluent Cloud, compress your messages and reduce retention settings. For compression, use lz4. Avoid gzip because of high overhead on the cluster. Stream Catalog : Stream Catalog is a pillar of Confluent Cloud Stream Governance that provides a centralized inventory of your organization’s data assets that supports data governance and data discovery. With Data Portal in Confluent Cloud Console, users can find event streams across systems, search topics by name or tags, and enrich event data to increase value and usefulness. REST and GraphQL APIs can be used to search schemas, apply tags to records or fields, manage business metadata, and discover relationships across data assets. **Related content** - [Stream Catalog on Confluent Cloud: User Guide to Manage Tags and Metadata](/cloud/current/stream-governance/stream-catalog.html) - [Stream Catalog in Streaming Data Governance (Confluent Developer course)](https://developer.confluent.io/courses/governing-data-streams/stream-catalog/) Stream Governance : Stream Governance is a collection of tools and features that provide data governance for data in motion. These include data quality tools such as Schema Registry, schema ID validation, and schema linking; built-in data catalog capabilities to classify, organize, and find event streams across systems; and stream lineage to visualize complex data relationships and uncover insights with interactive, end-to-end maps of event streams. Taken together, these and other governance tools enable teams to manage the availability, integrity, and security of data used across organizations, and help with standardization, monitoring, collaboration, reporting, and more. **Related terms**: *Stream Catalog*, *Stream Lineage* **Related content** - [Stream Governance on Confluent Cloud](/cloud/current/stream-governance/index.html) stream lineage : Stream lineage is the life cycle, or history, of data, including its origins, transformations, and consumption, as it moves through various stages in data pipelines, applications, and systems. Stream lineage provides a record of data’s journey from its source to its destination, and is used to track data quality, data governance, and data security. 
**Related terms**: *Data Portal*, *Stream Governance* **Related content** - [Stream Lineage on Confluent Cloud](/cloud/current/stream-governance/stream-lineage.html) stream processing : Stream processing is the method of collecting event stream data in real-time as it arrives, transforming the data in real-time using operations (such as filters, joins, and aggregations), and publishing the results to one or more target systems. Stream processing can be used to analyze data continuously, build data pipelines, and process time-sensitive data in real-time. Using the Confluent event streaming platform, event streams can be processed in real-time using Kafka Streams, Kafka Connect, or ksqlDB. Streams API : The Streams API is the Kafka API that allows you to build streaming applications and microservices that transform (for example, filter, group, aggregate, join) incoming event streams in real-time to Kafka topics stored in a Kafka cluster. The Streams API is used by stream processing clients to process data in real-time, analyze data continuously, and build data pipelines. **Related content** - [Introduction to the Kafka Streams API](/platform/current/streams/introduction.html) throttling : Throttling is the process Kafka clusters in Confluent Cloud use to protect themselves from getting into an over-utilized state. Also known as backpressure, throttling in Confluent Cloud occurs when cluster load reaches 80%. At this point, applications may start seeing higher latencies or timeouts as the cluster must begin throttling requests or connection attempts. topic : A topic is a user-defined category or feed name where event messages are stored and published by producers and subscribed to by consumers. Each topic is a log of event messages. Topics are stored in one or more partitions, which distribute topic records across brokers in a Kafka cluster. Each partition is an ordered, immutable sequence of records that are continually appended to a topic. **Related content** - [Manage Topics in Confluent Cloud](/cloud/current/client-apps/topics/index.html) total client connections : In Confluent Cloud, total client connections are a Kafka cluster billing dimension that defines the number of TCP connections to the cluster you can open at one time. Available in the Metrics API as `active_connection_count`. Filter by principal to understand how many connections each application is creating. How many connections a cluster supports can vary widely based on several factors, including number of producer clients, number of consumer clients, partition keying strategy, produce patterns per client, and consume patterns per client. For Dedicated clusters, Confluent derives a guideline for total client connections from benchmarking that indicates exceeding this number of connections increases produce latency for test clients. However, this does not apply to all workloads. That is why total client connections are a guideline, not a hard limit for Dedicated Kafka clusters. Monitor the impact on cluster load as connection count increases, as this is the final representation of the impact of a given workload or CKU dimension on the underlying resources of the cluster. Consider the Confluent guideline a per-CKU guideline. The number of connections tends to increase when you add brokers. In other words, if you significantly exceed the per-CKU guideline, cluster expansion doesn’t always give your cluster more connection count headroom.
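The stream processing entry above describes a consume-transform-produce pattern. Kafka Streams and ksqlDB provide this with richer operations (joins, windows, state), but the basic shape can be sketched with a plain Python client as below; the topics and the filtering rule are hypothetical.

```python
from confluent_kafka import Consumer, Producer

consumer = Consumer({"bootstrap.servers": "localhost:9092",
                     "group.id": "order-filter",
                     "auto.offset.reset": "earliest"})
producer = Producer({"bootstrap.servers": "localhost:9092"})
consumer.subscribe(["orders"])

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    value = msg.value().decode("utf-8")
    # Transform step: keep only records that mention "priority" and tag them.
    if "priority" in value:
        producer.produce("priority-orders", key=msg.key(), value=value.upper())
        producer.poll(0)   # serve delivery callbacks
```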
Transport Layer Security (TLS) : Transport Layer Security (TLS) is a cryptographic protocol that provides secure communication over a network. A minimal client configuration example follows this group of entries. For more information, see [Wikipedia](https://en.wikipedia.org/wiki/Transport_Layer_Security). unbounded stream : An unbounded stream is a stream of data that is continuously generated in real-time and has no defined end. Examples of unbounded streams include stock prices, sensor data, and social media feeds. Processing unbounded streams requires a different approach than processing bounded streams. Unbounded streams are processed incrementally as data arrives, while bounded streams are processed as a batch after all data has arrived. Kafka Streams and Flink can be used to process unbounded streams. **Related terms**: *bounded stream*, *stream processing* under replication : Under replication is a situation in which the number of in-sync replicas is below the total number of replicas. Under-replicated partitions can occur when a broker is down or cannot replicate fast enough from the leader (replica fetcher lag). user account : A user account is an account representing the identity of a person who can be authenticated and granted access to Confluent Cloud resources. **Related content** - [User Accounts for Confluent Cloud](/cloud/current/access-management/identity/user-accounts/overview.html) watermark : A watermark in Flink is a marker that keeps track of time as data is processed. A watermark means that all records up to the current moment in time have been “seen”. This way, Flink can correctly perform tasks that depend on when things happened, like calculating aggregations over time windows. **Related content** - [Time and Watermarks](/cloud/current/flink/concepts/timely-stream-processing.html) Admin API : The Admin API is the Kafka REST API that enables administrators to manage and monitor Kafka clusters, topics, brokers, and other Kafka components. Ansible Playbooks for Confluent Platform : Ansible Playbooks for Confluent Platform is a set of Ansible playbooks and roles that are designed to automate the deployment and management of Confluent Platform. Apache Flink : Apache Flink is an open source stream processing framework for stateful computations over unbounded and bounded data streams. Flink provides a unified API for batch and stream processing that supports event-time and out-of-order processing, and supports exactly-once semantics. Flink applications include real-time analytics, data pipelines, and event-driven applications. **Related terms**: *bounded stream*, *data stream*, *stream processing*, *unbounded stream* **Related content** - [Apache Flink: Stream Processing and SQL on Confluent Cloud](/cloud/current/flink/index.html) - [What is Apache Flink?](https://www.confluent.io/learn/apache-flink/) - [Apache Flink 101 (Confluent Developer course)](https://developer.confluent.io/courses/apache-flink/intro/)
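As an illustration of the TLS entry above, here is a minimal sketch of the TLS-related properties a Java Kafka client might use. The broker address, port, truststore path, and password are placeholder assumptions, and a real deployment would typically layer authentication (for example, SASL_SSL) on top of encryption.

```java
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;

public class TlsClientConfig {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1.example.com:9093");              // hypothetical TLS listener
        props.put("security.protocol", "SSL");                                   // encrypt traffic with TLS
        props.put("ssl.truststore.location", "/etc/kafka/client.truststore.jks"); // placeholder truststore path
        props.put("ssl.truststore.password", "changeit");                        // placeholder password

        try (AdminClient admin = AdminClient.create(props)) {
            // Listing topics over the encrypted connection confirms the TLS handshake succeeds.
            System.out.println(admin.listTopics().names().get());
        }
    }
}
```

Note that the Admin API entry above refers to the REST-based administrative API; the Java AdminClient is used here only as a convenient way to exercise a TLS connection.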
Apache Kafka : Apache Kafka is an open source event streaming platform that provides unified, high-throughput, low-latency, fault-tolerant, scalable, distributed, and secure data streaming. Kafka is a publish-and-subscribe messaging system that enables distributed applications to ingest, process, and share data in real-time. **Related content** - [Introduction to Kafka](/kafka/introduction.html) audit log : An audit log is a historical record of actions and operations that are triggered when auditable events occur. Audit log records can be used to troubleshoot system issues, manage security, and monitor compliance by tracking administrative activity, data access and modification, monitoring sign-in attempts, and reconstructing security breaches and fraudulent activity. **Related terms**: *auditable event* **Related content** - [Audit Log Concepts for Confluent Cloud](/cloud/current/monitoring/audit-logging/cloud-audit-log-concepts.html) - [Audit Log Concepts for Confluent Platform](/platform/current/security/audit-logs/audit-logs-concepts.html) auditable event : An auditable event is an event that represents an action or operation that can be tracked and monitored for security purposes and compliance. When an auditable event occurs, an auditable event method is triggered and an event message is sent to the audit log cluster and stored as an audit log record. **Related terms**: *audit log*, *event message* **Related content** - [Auditable Events in Confluent Cloud](/cloud/current/monitoring/audit-logging/event-methods/index.html) - [Auditable Events in Confluent Platform](/platform/current/security/audit-logs/auditable-events.html) authentication : Authentication is the process of verifying the identity of a principal that interacts with a system or application. Authentication is often used in conjunction with authorization to determine whether a principal is allowed to access a resource and perform a specific action or operation on that resource. Digital authentication requires one or more of the following: something a principal knows (a password or security question), something a principal has (a security token or key), or something a principal is (a biometric characteristic, such as a fingerprint or voiceprint). Multi-factor authentication (MFA) requires two or more forms of authentication. **Related terms**: *authorization*, *identity*, *identity provider*, *identity pool*, *principal*, *role* authorization : Authorization is the process of evaluating and then granting or denying a principal a set of permissions required to access and perform operations on resources. **Related terms**: *authentication*, *group mapping*, *identity*, *identity provider*, *identity pool*, *principal*, *role* Avro : Avro is a data serialization and exchange framework that provides rich data structures, remote procedure call (RPC), a compact binary data format, and a container file, and uses JSON to represent schemas. Avro schemas ensure that every field is properly described and documented for use with serializers and deserializers.
You can either send a schema with every message or use Schema Registry to store and receive schemas for use by consumers and producers to save bandwidth and storage space. **Related terms**: *data serialization*, *deserializer*, *serializer* **Related content** - [Apache Avro - a data serialization system](https://avro.apache.org/) - Confluent Cloud: [Avro Schema Serializer and Deserializer](/cloud/current/sr/fundamentals/serdes-develop/serdes-avro.html) - Confluent Platform: [Avro Schema Serializer and Deserializer](/platform/current/schema-registry/fundamentals/serdes-develop/serdes-avro.html) Basic Kafka cluster : A Confluent Cloud cluster type. Basic Kafka clusters are designed for experimentation, early development, and basic use cases. batch processing : Batch processing is the method of collecting a large volume of data over a specific time interval, after which the data is processed all at once and loaded into a destination system. Batch processing is often used when processing data can occur independently of the source and timing of the data. It is efficient for non-real-time data processing, such as data warehousing, reporting, and analytics. **Related terms**: *bounded stream*, *stream processing*, *unbounded stream* CIDR block : A CIDR block is a group of IP addresses that are contiguous and can be represented as a single block. CIDR blocks are expressed using Classless Inter-domain Routing (CIDR) notation that includes an IP address and a number of bits in the network mask. **Related content** - [CIDR notation](https://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing#CIDR_notation) - [Classless Inter-domain Routing (CIDR): The Internet Address Assignment and Aggregation Plan [RFC 4632]](https://www.rfc-editor.org/rfc/rfc4632.html) Cluster Linking : Cluster Linking is a highly performant data replication feature that enables links between Kafka clusters to mirror data from one cluster to another. Cluster Linking creates perfect copies of Kafka topics, which keep data in sync across clusters. Use cases include geo-replication of data, data sharing, migration, disaster recovery, and tiered separation of critical applications. **Related content** - [Geo-replication with Cluster Linking on Confluent Cloud](/cloud/current/multi-cloud/cluster-linking/index.html) - [Cluster Linking for Confluent Platform](/platform/current/multi-dc-deployments/cluster-linking/index.html) commit log : A commit log is a log of all event messages about commits (changes or operations made) sent to a Kafka topic. A commit log ensures that all event messages are processed at least once and provides a mechanism for recovery in the event of a failure. The commit log is also referred to as a write-ahead log (WAL) or a transaction log. **Related terms**: *event message* Confluent Cloud : Confluent Cloud is the fully managed, cloud-native event streaming service powered by Kora, the event streaming platform based on Kafka and extended by Confluent to provide high availability, scalability, elasticity, security, and global interconnectivity. Confluent Cloud offers cost-effective multi-tenant configurations as well as dedicated solutions, if stronger isolation is required. 
**Related terms**: *Apache Kafka*, *Kora* **Related content** - [Confluent Cloud Overview](/cloud/current/index.html) - [Confluent Cloud](https://www.confluent.io/confluent-cloud/) Confluent Cloud network : A Confluent Cloud network is an abstraction for a single tenant network environment that hosts Dedicated Kafka clusters in Confluent Cloud along with their single tenant services, like ksqlDB clusters and managed connectors. **Related content** - [Confluent Cloud Network Overview](/cloud/current/networking/overview.html#ccloud-networks) Confluent for Kubernetes (CFK) : *Confluent for Kubernetes (CFK)* is a cloud-native control plane for deploying and managing Confluent in private cloud environments through a declarative API. Confluent Platform : Confluent Platform is a specialized distribution with Kafka at its core, plus additional components for data integration, streaming data pipelines, and stream processing. Confluent REST Proxy : Confluent REST Proxy provides a RESTful interface to a Kafka cluster, making it easy to produce and consume messages, view the state of the cluster, and perform administrative actions without using the native Kafka protocol or clients. **Related content** - Confluent Platform: [REST Proxy](/platform/current/kafka-rest/index.html) Confluent Server : Confluent Server is the default Kafka broker component of Confluent Platform that builds on the foundation of Apache Kafka® and provides enhanced proprietary features designed for enterprise use. Confluent Server is fully compatible with Kafka, and adds Kafka cluster support for Role-Based Access Control, Audit Logs, Schema Validation, Self-Balancing Clusters, Tiered Storage, Multi-Region Clusters, and Cluster Linking. **Related terms**: *Confluent Platform*, *Apache Kafka*, *Kafka broker*, *Cluster Linking*, *multi-region cluster (MRC)* Confluent Unit for Kafka (CKU) : A Confluent Unit for Kafka (CKU) is a unit of horizontal scaling for Dedicated Kafka clusters in Confluent Cloud. CKUs provide preallocated resources and determine the capacity of a Dedicated Kafka cluster. **Related content** - [CKU limits per cluster](/cloud/current/clusters/cluster-types.html#cku-limits-per-cluster) Connect API : The Connect API is the Kafka API that enables a connector to read event streams from a source system and write to a target system. Connect worker : A Connect worker is a server process that runs a connector and performs the actual work of moving data in and out of Kafka topics. A worker runs on hardware independent of the Kafka brokers themselves. It is scalable and fault-tolerant, meaning you can run a cluster of workers that share the load of moving data in and out of Kafka from and to external systems. **Related terms**: *connector*, *Kafka Connect* connection attempts : In Confluent Cloud, connection attempts are a Kafka cluster billing dimension that defines the maximum number of new TCP connections to the cluster you can create in one second. This includes successful and unsuccessful authentication attempts. Available in the Metrics API as `successful_authentication_count` (only includes successful authentications, not unsuccessful authentication attempts). To reduce usage on connection attempts, use longer-lived connections to the cluster. If you exceed the maximum, connection attempts may be refused.
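To illustrate the "use longer-lived connections" advice in the connection attempts entry, the following is a minimal sketch of an application that shares one long-lived KafkaProducer instead of creating a producer (and new TCP connections) per message. The class name and bootstrap address are placeholder assumptions.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class SharedProducer {
    // One long-lived producer per application process: it holds its TCP connections open and
    // reuses them, instead of paying the connection and authentication cost on every send.
    private static final KafkaProducer<String, String> PRODUCER = createProducer();

    private static KafkaProducer<String, String> createProducer() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker endpoint
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        return new KafkaProducer<>(props);
    }

    public static void send(String topic, String key, String value) {
        PRODUCER.send(new ProducerRecord<>(topic, key, value));
    }
}
```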
connector : A connector is an abstract mechanism that enables communication, coordination, or cooperation among components by transferring data elements from one interface to another without changing the data. connector offset : A connector offset uniquely identifies the position of a connector as it processes data. Connectors use a variety of strategies to implement the connector offset, including everything from monotonically increasing integers to replay IDs, lists of files, timestamps, and even checkpoint information. Connector offsets keep track of already-processed data in the event of a connector restart or recovery. While sink connectors use a pattern for connector offsets similar to the offset mechanism used throughout Kafka, the implementation details for source connectors are often much different. This is because source connectors track the progress of a source system as it processes data. consumer : A consumer is a Kafka client application that subscribes to (reads and processes) event messages from a Kafka topic. The Streams API and the Consumer API are the two APIs that enable consumers to read event streams from Kafka topics. A minimal consumer example follows this group of entries. **Related terms**: *Consumer API*, *consumer group*, *producer*, *Streams API* Consumer API : The Consumer API is the Kafka API used for consuming (reading) event messages or records from Kafka topics and enables a Kafka consumer to subscribe to a topic and read event messages as they arrive. Batch processing is a common use case for the Consumer API. consumer group : A consumer group is a single logical consumer implemented with multiple physical consumers for reasons of throughput and resilience. Because the partitions of a topic are divided among the consumers in the group, the consumers can process messages in parallel, increasing message throughput and enabling load balancing. **Related terms**: *consumer*, *partition*, *producer*, *topic* consumer lag : Consumer lag is the number of consumer offsets between the latest message produced in a partition and the last message consumed by a consumer, that is, the number of messages pending to be consumed from a particular partition. A large consumer lag, or a quickly growing lag, indicates that the consumer is unable to read from a partition as fast as the messages are available. This can be caused by a slow consumer, slow network, or slow broker. consumer offset : A consumer offset is the monotonically increasing integer value that uniquely identifies the position of an event record in a partition. Consumers use offsets to track their current position in the Kafka topic, allowing consumers to resume processing from where they left off. Offsets are stored on the Kafka broker, but the broker does not track which records have been read and which have not; it is up to the consumer to commit this information. When a consumer acknowledges receiving and processing a message, it commits an offset value that is stored in the special internal topic `__consumer_offsets`. cross-resource RBAC role binding : A cross-resource RBAC role binding is a role binding in Confluent Cloud that is applied at the Organization or Environment scope and grants access to multiple resources. For example, assigning a principal the NetworkAdmin role at the Organization scope lets them administer all networks across all Environments in their Organization. **Related terms**: *identity pool*, *principal*, *role*, *role binding*
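The following is a minimal sketch of the consumer pattern described above: a consumer in a group subscribes to a topic, polls for records, and explicitly commits its offsets. The bootstrap address, group ID, and topic name are placeholder assumptions.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class OrdersConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");            // assumed broker
        props.put("group.id", "orders-billing");                     // members of this group share the partitions
        props.put("enable.auto.commit", "false");                    // offsets are committed explicitly below
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));                   // hypothetical topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
                consumer.commitSync();   // stores the group's position in __consumer_offsets
            }
        }
    }
}
```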
CRUD : CRUD is an acronym for the four basic operations that can be performed on data: Create, Read, Update, and Delete. custom connector : A custom connector is a connector created using Connect plugins uploaded to Confluent Cloud by users. This includes connector plugins that are built from scratch, modified open-source connector plugins, or third-party connector plugins. data at rest : Data at rest is data that is physically stored on non-volatile media (such as hard drives, solid-state drives, or other storage devices) and is not actively being transmitted or processed by a system. data contract : A data contract is a formal agreement between an upstream component and a downstream component on the structure and semantics of data that is in motion. A schema is a key element of a data contract. The schema, metadata, rules, policies, and evolution plan form the data contract. You can associate data contracts (schemas and more) with [topics](#term-Kafka-topic). **Related content** - Confluent Platform: [Data Contracts for Schema Registry on Confluent Platform](/platform/current/schema-registry/fundamentals/data-contracts.html) - Confluent Cloud: [Data Contracts for Schema Registry on Confluent Cloud](/cloud/current/sr/fundamentals/data-contracts.html) - Cloud Console: [Manage Schemas in Confluent Cloud](/cloud/current/sr/schemas-manage.html) data encryption key (DEK) : A data encryption key (DEK) is a symmetric key that is used to encrypt and decrypt data. The DEK is used in client-side field level encryption (CSFLE) to encrypt sensitive data. The DEK is itself encrypted using a key encryption key (KEK) that is only accessible to authorized users. The encrypted DEK and encrypted data are stored together. Only users with access to the KEK can decrypt the DEK and access the sensitive data. A sketch of this envelope-encryption pattern follows this group of entries. **Related terms**: *envelope encryption*, *key encryption key (KEK)* data in motion : Data in motion is data that is actively being transferred between source and destination, typically systems, devices, or networks. Data in motion is also referred to as data in transit or data in flight. data in use : Data in use is data that is actively being processed or manipulated in memory (RAM, CPU caches, or CPU registers). data ingestion : Data ingestion is the process of collecting, importing, and integrating data from various sources into a system for further processing, analysis, or storage. data mapping : Data mapping is the process of defining relationships or associations between source data elements and target data elements. Data mapping is an important process in data integration, data migration, and data transformation, ensuring that data is accurately and consistently represented when it is moved or combined. data pipeline : A data pipeline is a series of processes and systems that enable the flow of data from sources to destinations, automating the movement and transformation of data for various purposes, such as analytics, reporting, or machine learning. A data pipeline is typically composed of a source system, a data ingestion tool, a data transformation tool, and a target system. A data pipeline covers the following stages: data extraction, data transformation, data loading, and data validation.
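The DEK and KEK entries above describe the envelope-encryption pattern. The following is a generic, minimal sketch of that pattern using the standard Java Cryptography Extension. It is an illustration only, not Confluent's CSFLE implementation: it uses a symmetric AES KEK generated in the application for simplicity, whereas in practice the KEK lives in an external KMS and may be asymmetric.

```java
import java.security.SecureRandom;

import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class EnvelopeEncryptionSketch {
    public static void main(String[] args) throws Exception {
        // Key encryption key (KEK): in practice this is held by a KMS, never stored with the data.
        SecretKey kek = KeyGenerator.getInstance("AES").generateKey();
        // Data encryption key (DEK): generated per record or per field.
        SecretKey dek = KeyGenerator.getInstance("AES").generateKey();

        // 1. Encrypt the sensitive payload with the DEK.
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);
        Cipher dataCipher = Cipher.getInstance("AES/GCM/NoPadding");
        dataCipher.init(Cipher.ENCRYPT_MODE, dek, new GCMParameterSpec(128, iv));
        byte[] ciphertext = dataCipher.doFinal("ssn=123-45-6789".getBytes());

        // 2. Wrap (encrypt) the DEK with the KEK; the wrapped DEK is stored next to the ciphertext.
        Cipher wrapCipher = Cipher.getInstance("AESWrap");
        wrapCipher.init(Cipher.WRAP_MODE, kek);
        byte[] wrappedDek = wrapCipher.wrap(dek);

        System.out.printf("ciphertext=%d bytes, wrapped DEK=%d bytes%n",
                ciphertext.length, wrappedDek.length);
    }
}
```

Only a party with access to the KEK can unwrap the DEK and decrypt the payload, which is the access-control property the DEK entry describes.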
Data Portal : Data Portal is a Confluent Cloud application that uses Stream Catalog and Stream Lineage to provide self-service access throughout Confluent Cloud Console for data practitioners to search and discover existing topics using tags and business metadata, request access to topics and data, and access data in topics to build streaming applications and data pipelines. Data Portal provides a data-centric view of Confluent optimized for self-service access, where users can search, discover, and understand available data, request access to data, and use data. **Related terms**: *Stream Catalog*, *Stream Lineage* **Related content** - [Data Portal on Confluent Cloud](/cloud/current/stream-governance/data-portal.html) data serialization : Data serialization is the process of converting data structures or objects into a format that can be stored or transmitted, and reconstructed later in the same or another computer environment. Data serialization is a common technique for implementing data persistence, interprocess communication, and object communication. Confluent Schema Registry (in Confluent Platform) and Confluent Cloud Schema Registry support data serialization using serializers and deserializers for the following formats: Avro, JSON Schema, and Protobuf. A short serializer example follows this group of entries. **Related content** - Confluent Cloud: [Formats, Serializers, and Deserializers](/cloud/current/sr/fundamentals/serdes-develop/index.html) - Confluent Platform: [Formats, Serializers, and Deserializers](/platform/current/schema-registry/fundamentals/serdes-develop/index.html) data steward : A data steward is a person with data-related responsibilities, such as data governance, data quality, and data security. data stream : A data stream is a continuous flow of data records that are produced and consumed by applications. dead letter queue (DLQ) : A dead letter queue (DLQ) is a queue where messages that could not be processed successfully by a sink connector are placed. Instead of stopping, the sink connector sends the failed messages to the DLQ topic as event records and continues processing. Dedicated Kafka cluster : A Confluent Cloud cluster type. Dedicated Kafka clusters are designed for critical production workloads with high traffic or private networking requirements. deserializer : A deserializer is a tool that converts a serial byte stream back into objects and parallel data. Deserializers work with serializers (known together as Serdes) to support efficient storage and high-speed data transmission over the wire. Confluent provides Serdes for schemas in Avro, Protobuf, and JSON Schema formats. **Related content** - Confluent Cloud: [Formats, Serializers, and Deserializers](/cloud/current/sr/fundamentals/serdes-develop/index.html) - Confluent Platform: [Formats, Serializers, and Deserializers](/platform/current/schema-registry/fundamentals/serdes-develop/index.html) egress : In general networking, egress refers to outbound traffic leaving a network or a specific network segment. In Confluent Cloud, egress is a Kafka cluster billing dimension that defines the number of bytes consumed from the cluster in one second. Available in the Metrics API as `sent_bytes` (convert from bytes to MB). To reduce egress in Confluent Cloud, compress your messages and ensure each consumer is only consuming from the topics it requires. For compression, use lz4. Avoid gzip because of high overhead on the cluster.
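As a concrete example of the data serialization and deserializer entries, the following is a minimal sketch of a producer that uses the Confluent Avro serializer together with Schema Registry. The schema, topic name, bootstrap address, and Schema Registry URL are placeholder assumptions; a Confluent Cloud cluster would also require authentication settings omitted here.

```java
import java.util.Properties;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AvroProduceSketch {
    // Hypothetical Avro schema describing an order event.
    private static final String ORDER_SCHEMA =
        "{\"type\":\"record\",\"name\":\"Order\",\"fields\":["
        + "{\"name\":\"id\",\"type\":\"string\"},"
        + "{\"name\":\"amount\",\"type\":\"double\"}]}";

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");                          // assumed broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://localhost:8081");                 // assumed Schema Registry

        Schema schema = new Schema.Parser().parse(ORDER_SCHEMA);
        GenericRecord order = new GenericData.Record(schema);
        order.put("id", "o-1001");
        order.put("amount", 42.50);

        try (KafkaProducer<String, Object> producer = new KafkaProducer<>(props)) {
            // The serializer registers (or looks up) the schema in Schema Registry and writes
            // only a schema ID plus the compact Avro-encoded payload to the topic.
            producer.send(new ProducerRecord<>("orders-avro", "o-1001", order));
        }
    }
}
```

A consumer would reverse the process with the matching Avro deserializer, which fetches the schema by ID and reconstructs the record.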
Elastic Confluent Unit for Kafka (eCKU) : Elastic Confluent Unit for Kafka (eCKU) is used to express capacity for Basic, Standard, Enterprise, and Freight Kafka clusters. These clusters automatically scale up to a fixed ceiling, so there is no need to resize them; when you need more capacity, your cluster expands up to the fixed ceiling. If you're not using capacity above the minimum, you're not paying for it. ELT : ELT is an acronym for Extract-Load-Transform, where data is extracted from a source system and loaded into a target system before processing or transformation. Compared to ETL, ELT is a more flexible approach to data ingestion because the data is loaded into the target system before transformation. Enterprise Kafka cluster : A Confluent Cloud cluster type. Enterprise Kafka clusters are designed for production-ready functionality that requires private endpoint networking capabilities. envelope encryption : Envelope encryption is a cryptographic technique that uses two keys to encrypt data. The symmetric data encryption key (DEK) is used to encrypt sensitive data. The separate asymmetric key encryption key (KEK) is the master key used to encrypt the DEK. The DEK and encrypted data are stored together. Only users with access to the KEK can decrypt the DEK and access the sensitive data. In Confluent Cloud, envelope encryption is used to enable client-side field level encryption (CSFLE). CSFLE encrypts sensitive data in a message before it is sent to Confluent Cloud and allows for temporary decryption of sensitive data when required to perform operations on the data. **Related terms**: *data encryption key (DEK)*, *key encryption key (KEK)* ETL : ETL is an acronym for Extract-Transform-Load, where data is extracted from a source system, transformed into a target format, and loaded into a target system. Compared to ELT, ETL is a more rigid approach to data ingestion because the data is transformed before loading into the target system. event : An event is a meaningful action or occurrence of something that happened. Events that can be recognized by a program, either human-generated or triggered by software, can be recorded in a log file or other data store. **Related terms**: *event message*, *event record*, *event sink*, *event source*, *event stream*, *event streaming*, *event streaming platform*, *event time* event message : An event message is a record of an event sent to a Kafka topic, represented as a key-value pair. Each event message consists of a key-value pair, a timestamp, the compression type, headers for metadata (optional), and a partition and offset ID (once the message is written). The key is optional and can be used to identify the event. The value is required and contains details about the event that happened. **Related terms**: *event*, *event record*, *event sink*, *event source*, *event stream*, *event streaming*, *event streaming platform*, *event time* event record : An event record is the record of an event stored in a Kafka topic. Event records are organized and durably stored in topics. Examples of events include orders, payments, activities, or measurements. An event typically contains one or more data fields that describe the fact, as well as a timestamp that denotes when the event was created by its event source. The event may also contain various metadata, such as its source of origin (for example, the application or cloud service that created the event) and storage-level information (for example, its position in the event stream).
**Related terms**: *event*, *event message*, *event sink*, *event source*, *event stream*, *event streaming*, *event streaming platform*, *event time* event sink : An event sink is a consumer of events, which can include applications, cloud services, databases, IoT sensors, and more. **Related terms**: *event*, *event message*, *event record*, *event source*, *event stream*, *event streaming*, *event streaming platform*, *event time* event source : An event source is a producer of events, which can include cloud services, databases, IoT sensors, mainframes, and more. **Related terms**: *event*, *event message*, *event record*, *event sink*, *event stream*, *event streaming*, *event streaming platform*, *event time* event stream : An event stream is a continuous flow of event messages produced by an event source and consumed by one or more consumers. **Related terms**: *event*, *event message*, *event record*, *event sink*, *event source*, *event streaming*, *event streaming platform*, *event time* event streaming : Event streaming is the practice of capturing event data in real-time from data sources. Event streaming is a form of data streaming that is used to capture, store, process, and react to data in real-time or retrospectively. **Related terms**: *event*, *event message*, *event record*, *event sink*, *event source*, *event streaming platform*, *event time* event streaming platform : An event streaming platform is a platform to which events can be written once, allowing distributed functions within an organization to react in real time. **Related terms**: *event*, *event message*, *event record*, *event sink*, *event source*, *event streaming*, *event time* event time : Event time is the time when an event occurred on the producing device, as opposed to the time when the event was processed or recorded. Event time is often used in stream processing to determine the order of events and to perform windowing operations. **Related terms**: *event*, *event message*, *event record*, *event sink*, *event source*, *event streaming*, *event streaming platform* exactly-once semantics : Exactly-once semantics is a guarantee that a message is delivered exactly once and in the order that it was sent, even if a producer retries sending the message or a consumer retries processing it. On the produce side, Kafka achieves this with idempotent producers: the broker uses a producer ID and per-partition sequence numbers to discard duplicate retries. End to end, transactions commit produced messages and consumer offsets atomically, so a consume-process-produce step either takes effect exactly once or not at all. A minimal transactional producer sketch follows this group of entries. Freight Kafka cluster : A Confluent Cloud cluster type. Freight Kafka clusters are designed for high-throughput, relaxed-latency workloads, at a lower cost than self-managed open source Kafka. granularity : Granularity is the degree or level of detail to which an entity (a system, service, or resource) is broken down into subcomponents, parts, or elements. Entities that are *fine-grained* have a higher level of detail, while *coarse-grained* entities have a reduced level of detail, often combining finer parts into a larger whole. In the context of access control, granular permissions provide precise control over resource access. They allow administrators to grant specific operations on distinct resources. This ensures users only have permissions tailored to their needs, minimizing unnecessary or potentially risky access.
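The following is a minimal sketch of the transactional-producer side of exactly-once semantics. The topic names, transactional ID, and local broker address are placeholder assumptions, and a full end-to-end pipeline would also send consumer offsets to the transaction and use `read_committed` consumers.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class TransactionalProduceSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");         // assumed broker
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        props.put("enable.idempotence", "true");                  // brokers discard duplicate retries
        props.put("transactional.id", "payments-processor-1");    // stable ID identifies this producer across restarts

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            producer.beginTransaction();
            try {
                producer.send(new ProducerRecord<>("payments", "p-1", "charged"));
                producer.send(new ProducerRecord<>("payment-audit", "p-1", "charged"));
                producer.commitTransaction();   // both records become visible atomically
            } catch (Exception e) {
                producer.abortTransaction();    // neither record is exposed to read_committed consumers
                throw e;
            }
        }
    }
}
```

This sketch simplifies error handling; production code distinguishes fatal producer exceptions (which require closing the producer) from retriable ones.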
group mapping : Group mapping is a set of rules that map groups in your SSO identity provider to Confluent Cloud RBAC roles. When a user signs in to Confluent Cloud using SSO, Confluent Cloud uses the group mapping to grant access to Confluent Cloud resources. **Related terms**: *identity provider*, *identity pool*, *principal*, *role* **Related content** - [Group Mapping for Confluent Cloud](/cloud/current/access-management/authenticate/sso/group-mapping/overview.html) identity : An identity is a unique identifier that is used to authenticate and authorize users and applications to access resources. Identity is often used in conjunction with access control to determine whether a user or application is allowed to access a resource and perform a specific action or operation on that resource. **Related terms**: *identity provider*, *identity pool*, *principal*, *role* identity pool : An identity pool is a collection of identities that can be used to authenticate and authorize users and applications to access resources. Identity pools are used to manage permissions for users and applications that access resources in Confluent Cloud, as well as for Confluent Cloud service accounts that access those resources. identity provider : An identity provider is a trusted provider that authenticates users and issues security tokens that are used to verify the identity of a user. Identity providers are often used in single sign-on (SSO) scenarios, where a user can log in to multiple applications or services with a single set of credentials. Infinite Storage : Infinite Storage is the Confluent Cloud storage service that enhances the scalability of Confluent Cloud resources by separating storage and processing. Tiered storage within Confluent Cloud moves data between storage layers based on the needs of the workload, retrieves tiered data when requested, and garbage collects data that is past retention or otherwise deleted. If an application reads historical data, latency is not increased for other applications reading more recent data. Storage resources are decoupled from compute resources, you only pay for what you produce to Confluent Cloud and for storage that you use, and CKUs do not have storage limits. **Related content** - [Infinite Storage in Confluent Cloud for Apache Kafka](https://www.confluent.io/blog/infinite-kafka-data-storage-in-confluent-cloud/) ingress : In general networking, ingress refers to traffic that enters a network from an external source. In Confluent Cloud, ingress is a Kafka cluster billing dimension that defines the number of bytes produced to the cluster in one second. Available in the Metrics API as `received_bytes` (convert from bytes to MB). To reduce ingress in Confluent Cloud, compress your messages. For compression, use lz4. Avoid gzip because of high overhead on the cluster. internal topic : An internal topic is a topic, prefixed with double underscores (`__`), that is automatically created by a Kafka component to store metadata about the broker, partition assignment, consumer offsets, and other information. Examples of internal topics: `__cluster_metadata`, `__consumer_offsets`, `__transaction_state`, `__confluent.support.metrics`, and `__confluent.support.metrics-raw`. JSON Schema : JSON Schema is a declarative language used for data serialization and exchange to define data structures, specify formats, and validate JSON documents.
It is a way to encode expected data types, properties, and constraints to ensure that all fields are properly described for use with serializers and deserializers. **Related terms**: *data serialization*, *deserializer*, *serializer* **Related content** - [JSON Schema - a declarative language that allows you to annotate and validate JSON documents](https://json-schema.org/) - Confluent Cloud: [JSON Schema Serializer and Deserializer](/cloud/current/sr/fundamentals/serdes-develop/serdes-json.html) - Confluent Platform: [JSON Schema Serializer and Deserializer](/platform/current/schema-registry/fundamentals/serdes-develop/serdes-json.html) Kafka bootstrap server : A Kafka bootstrap server is a Kafka broker that a Kafka client first connects to when initiating a connection to a Kafka cluster; the bootstrap broker returns metadata that includes the addresses of all of the brokers in the cluster. Although only one bootstrap server is required to connect to a Kafka cluster, multiple brokers can be specified in a bootstrap server list to provide high availability and fault tolerance in case a broker is unavailable. In Confluent Cloud, the bootstrap server is the general cluster endpoint. A minimal client configuration example follows this group of entries. Kafka broker : A Kafka broker is a server in the Kafka storage layer that stores event streams from one or more sources. A Kafka cluster is typically comprised of several brokers. Every broker in a cluster is also a bootstrap server, meaning if you can connect to one broker in a cluster, you can connect to every broker. Kafka client : A Kafka client allows you to write distributed applications and microservices that read, write, and process streams of events in parallel, at scale, and in a fault-tolerant manner, even in the case of network problems or machine failures. The Kafka client library provides functions, classes, and utilities that allow developers to create Kafka producer clients (Producers) and consumer clients (Consumers) using various programming languages. The primary way to build production-ready Producers and Consumers is by using your preferred programming language and a Kafka client library. **Related content** - [Build Client Applications for Confluent Cloud](/cloud/current/client-apps/overview.html) - [Build Client Applications for Confluent Platform](/platform/current/clients/index.html) - [Getting Started with Apache Kafka and Java (or Python, Go, .Net, and others)](https://developer.confluent.io/get-started/java/) Kafka cluster : A Kafka cluster is a group of interconnected Kafka brokers that manage and distribute real-time data streaming, processing, and storage as if they are a single system. By distributing tasks and services across multiple Kafka brokers, the Kafka cluster improves availability, reliability, and performance. Kafka Connect : Kafka Connect is the component of Kafka that provides data integration between databases, key-value stores, search indexes, file systems, and Kafka brokers. Kafka Connect is an ecosystem of a client application and pluggable connectors. As a client application, Connect is a server process that runs on hardware independent of the Kafka brokers themselves. It is scalable and fault-tolerant, meaning you can run a cluster of Connect workers that share the load of moving data in and out of Kafka from and to external systems. Connect also abstracts the business of code away from the user and instead requires only JSON configuration to run. **Related content** - Confluent Cloud: [Kafka Connect](/cloud/current/billing/overview.html#kconnect-long) - Confluent Platform: [Kafka Connect](/platform/current/connect/index.html)
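Tying together the Kafka bootstrap server and Kafka client entries with the lz4 compression advice from the ingress and egress entries, here is a minimal sketch of producer configuration. The broker addresses and topic are placeholder assumptions.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class CompressedProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Several brokers in the bootstrap list for fault tolerance; any one of them is enough
        // for the client to fetch metadata about the rest of the cluster.
        props.put("bootstrap.servers", "broker1:9092,broker2:9092,broker3:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        props.put("compression.type", "lz4");    // compress batches to reduce ingress, egress, and storage
        props.put("linger.ms", "20");            // small delay so records batch (and compress) together

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("clickstream", "user-42", "page_view"));
        }
    }
}
```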
Kafka controller : A Kafka controller is the node in a Kafka cluster that is responsible for managing and changing the metadata of the cluster. This node also communicates metadata changes to the rest of the cluster. When Kafka uses ZooKeeper for metadata management, the controller is a broker, and the broker persists the metadata to ZooKeeper for backup and recovery. With KRaft, you dedicate Kafka nodes to operate as controllers and the metadata is stored in Kafka itself and not persisted to ZooKeeper. KRaft enables faster recovery because of this. For more information, see [KRaft overview](/platform/current/kafka-metadata/kraft.html). Kafka listener : A Kafka listener is an endpoint that Kafka brokers bind to and use to communicate with clients. For Kafka clusters, Kafka listeners are configured in the `listeners` property of the `server.properties` file. Advertised listeners are publicly accessible endpoints that are used by clients to connect to the Kafka cluster. **Related content** - [Kafka Listeners – Explained](https://www.confluent.io/blog/kafka-listeners-explained/) Kafka metadata : Kafka metadata is the information about the Kafka cluster and the topics that are stored in it. This information includes details such as the brokers in the cluster, the topics that are available, the partitions for each topic, and the location of the leader for each partition. Kafka metadata is used by clients to discover the available brokers and topics, and to determine which broker is the leader for a particular partition. This information is essential for clients to be able to send and receive messages to and from Kafka. Kafka Streams : Kafka Streams is a stream processing library for building streaming applications and microservices that transform (filter, group, aggregate, join, and more) incoming event streams in real-time to Kafka topics stored in a Kafka cluster. The Streams API can be used to build applications that process data in real-time, analyze data continuously, and build data pipelines. **Related content** - [Kafka Streams](/platform/current/streams/overview.html) Kafka topic : See *topic*. key encryption key (KEK) : A key encryption key (KEK) is a master key that is used to encrypt and decrypt other keys, specifically the data encryption key (DEK). Only users with access to the KEK can decrypt the DEK and access the sensitive data. **Related terms**: *data encryption key (DEK)*, *envelope encryption*. Kora : Kora is the cloud-native streaming data service based on Kafka technology that powers the Confluent Cloud event streaming platform for building real-time data pipelines and streaming applications. Kora abstracts low-level resources, such as Kafka brokers, and hides operational complexities, such as system upgrades. Kora is built on the following foundations: a tiered storage layer that improves cost and performance, elasticity and consistent performance through incremental load balancing, cost effective multi-tenancy with dynamic quota management and cell-based isolation, continuous monitoring of both system health and data integrity, and clean abstraction with standard Kafka protocols and CKUs to hide underlying resources.
**Related terms**: *Apache Kafka*, *Confluent Cloud*, *Confluent Unit for Kafka (CKU)* **Related content** - [Kora: The Cloud Native Engine for Apache Kafka](https://www.confluent.io/blog/cloud-native-data-streaming-kafka-engine/) - [Kora: A Cloud-Native Event Streaming Platform For Kafka](https://www.vldb.org/pvldb/vol16/p3822-povzner.pdf) KRaft : KRaft (or Apache Kafka Raft) is a consensus protocol introduced in Kafka 2.8 to provide metadata management for Kafka with the goal to replace ZooKeeper. KRaft simplifies Kafka because it enables the management of metadata in Kafka itself, rather than splitting it between ZooKeeper and Kafka. As of Confluent Platform 7.5, KRaft is the default method of metadata management in new deployments. For more information, see [KRaft overview](/platform/current/kafka-metadata/kraft.html). ksqlDB : ksqlDB is a streaming SQL database engine purpose-built for creating stream processing applications on top of Kafka. logical Kafka cluster (LKC) : A logical Kafka cluster (LKC) is a subset of a physical Kafka cluster (PKC) that is isolated from other logical clusters within Confluent Cloud. Each logical unit of isolation is considered a tenant and maps to a specific organization. If the mapping is one-to-one, one LKC maps to one PKC (a Dedicated cluster). If the mapping is many-to-one, one LKC maps to one of the multitenant Kafka cluster types (Basic, Standard, Enterprise, and Freight). **Related terms**: *Confluent Cloud*, *Kafka cluster*, *physical Kafka cluster (PKC)* **Related content** - [Kafka Cluster Types in Confluent Cloud](/cloud/current/clusters/cluster-types.html) - [Multi-tenancy and Client Quotas on Confluent Cloud](/cloud/current/clusters/client-quotas.html) multi-region cluster (MRC) : A multi-region cluster (MRC) is a single Kafka cluster that replicates data between datacenters across regional availability zones. multi-tenancy : Multi-tenancy is a software architecture in which a single physical instance is shared among multiple logical instances, or tenants. In Confluent Cloud, each Basic, Standard, Enterprise, and Freight cluster is a logical Kafka cluster (LKC) that shares a physical Kafka cluster (PKC) with other tenants. Each LKC is isolated from other LKCs and has its own resources, such as memory, compute, and storage. **Related terms**: *Confluent Cloud*, *logical Kafka cluster (LKC)*, *physical Kafka cluster (PKC)* **Related content** - [From On-Prem to Cloud-Native: Multi-Tenancy in Confluent Cloud](https://www.confluent.io/blog/cloud-native-multi-tenant-kafka-with-confluent-cloud/) - [Multi-tenancy and Client Quotas on Confluent Cloud](/cloud/current/clusters/client-quotas.html) offset : An offset is an integer assigned to each message that uniquely represents its position within the data stream, guaranteeing the ordering of records and allowing offset-based connections to replay messages from any point in time. **Related terms**: *consumer offset*, *connector offset*, *offset commit*, *replayability* offset commit : An offset commit is the process of keeping track of the current position of an offset-based connection (primarily Kafka consumers and connectors) within the data stream. The offset commit process is not specific to consumers, producers, or connectors; it is a general mechanism in Kafka to track the position of any application that is reading data. When a consumer commits an offset, the offset identifies the next message the consumer should consume.
For example, if a consumer has a committed offset of 5, it has consumed messages 0 through 4 and will next consume message 5. If the consumer crashes or is shut down, its partitions are reassigned to another consumer, which resumes consuming from the last committed offset of each partition. The committed offset for consumers is stored on a Kafka broker. When a consumer commits an offset, it sends a commit request to the Kafka cluster, specifying the partition and offset it wants to commit for a particular consumer group. The Kafka broker receiving the commit request then stores this offset in the `__consumer_offsets` internal topic. **Related terms**: *consumer offset*, *offset* OpenSSL : OpenSSL is an open-source software library and toolkit that implements the Secure Sockets Layer (SSL) and Transport Layer Security (TLS) protocols. For more information, see [Wikipedia](https://en.wikipedia.org/wiki/OpenSSL). parent cluster : The Kafka cluster that a resource belongs to. **Related terms**: *Kafka cluster* partition : A partition is a unit of data storage that divides a topic into multiple, parallel event streams, each of which is stored on separate Kafka brokers and can be consumed independently. Partitioning is a key concept in Kafka because it allows Kafka to scale horizontally by adding more brokers to the cluster. Partitions are also the unit of parallelism in Kafka. A topic can have one or more partitions, and each partition is an ordered, immutable sequence of event records that is continually appended to a partition log. partitions (pre-replication) : In Confluent Cloud, partitions are a Kafka cluster billing dimension that define the maximum number of partitions that can exist on the cluster at one time, before replication. While you are not charged for partitions on any type of Kafka cluster, the number of partitions you use has an impact on eCKUs. To determine eCKU limits for partitions, Confluent Cloud counts only pre-replication (leader) partitions across a cluster. All topics that you create (as well as internal topics that are automatically created by Confluent Platform components such as ksqlDB, Kafka Streams, Connect, and Control Center (Legacy)) count towards the cluster partition limit. Confluent prefixes topics created automatically with an underscore (_). Topics that are internal to Kafka itself (consumer offsets) are not visible in Cloud Console and do not count against partition limits or toward partition billing. Available in the Metrics API as `partition_count`. In Confluent Cloud, attempts to create additional partitions beyond the cluster limit fail with an error message. To reduce usage on partitions (pre-replication), delete unused topics and create new topics with fewer partitions. Use the Kafka Admin interface to increase the partition count of an existing topic if the initial partition count is too low; an example follows this group of entries. physical Kafka cluster (PKC) : A physical Kafka cluster (PKC) is a Kafka cluster comprised of multiple brokers. Each physical Kafka cluster is created on a Kubernetes cluster by the control plane. A PKC is not directly accessible by clients. principal : A principal is an entity that can be authenticated and granted permissions based on roles to access resources and perform operations. An entity can be a user account, service account, group mapping, or identity pool. **Related terms**: *group mapping*, *identity*, *identity pool*, *role*, *service account*, *user account*
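As referenced in the partitions entry above, here is a minimal sketch of increasing a topic's partition count with the Kafka Admin client. The topic name, target count, and bootstrap address are placeholder assumptions, and a Confluent Cloud connection would also need authentication settings.

```java
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewPartitions;

public class IncreasePartitionsSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");    // assumed broker endpoint

        try (AdminClient admin = AdminClient.create(props)) {
            // Raise the partition count of the hypothetical "clickstream" topic to 12.
            // Partition counts can only grow, and adding partitions changes which partition
            // a given key maps to going forward.
            admin.createPartitions(Map.of("clickstream", NewPartitions.increaseTo(12))).all().get();
        }
    }
}
```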
private internet : A private internet is a closed, restricted computer network typically used by organizations to provide secure environments for managing sensitive data and resources. processing time : Processing time is the time when an event is processed or recorded by a system, as opposed to the time when the event occurred on the producing device. Processing time is often used in stream processing to determine the order of events and to perform windowing operations. producer : A producer is a client application that publishes (writes) data to a topic in a Kafka cluster. Producers write data to a topic and are the only clients that can write data to a topic. Each record written to a topic is appended to the partition of the topic that is selected by the producer. Producer API : The Producer API is the Kafka API that allows you to write data to a topic in a Kafka cluster. The Producer API is used by producer clients to publish data to a topic in a Kafka cluster. Protobuf : Protobuf (or Protocol Buffers) is an open-source data format used to serialize structured data for storage. **Related terms**: *data serialization*, *deserializer*, *serializer* **Related content** - [Protocol Buffers](https://protobuf.dev/) - [Getting Started with Protobuf in Confluent Cloud](https://www.confluent.io/blog/using-protobuf-in-confluent-cloud/) - Confluent Cloud: [Protobuf Serializer and Deserializer](/cloud/current/sr/fundamentals/serdes-develop/serdes-protobuf.html) - Confluent Platform: [Protobuf Serializer and Deserializer](/platform/current/schema-registry/fundamentals/serdes-develop/serdes-protobuf.html) public internet : The public internet is the global system of interconnected computers and networks that use TCP/IP to communicate with each other. rebalancing : Rebalancing is the process of redistributing the partitions of a topic among the consumers of a consumer group for improved performance and scalability. A rebalance can occur when a consumer fails its heartbeat and is excluded from the group, voluntarily leaves the group, has its metadata updated, or when a new consumer joins the group. replayability : Replayability is the ability to replay messages from any point in time. **Related terms**: *consumer offset*, *offset*, *offset commit* replication : Replication is the process of creating and maintaining multiple copies (or *replicas*) of data across different nodes in a distributed system to increase availability, reliability, redundancy, and accessibility. replication factor : A replication factor is the number of copies of a partition that are distributed across the brokers in a cluster. requests : In Confluent Cloud, requests are a Kafka cluster billing dimension that defines the number of client requests to the cluster in one second. Available in the Metrics API as `request_count`. To reduce usage on requests, you can adjust producer batching configurations, consumer client batching configurations, and shut down otherwise inactive clients. For Dedicated clusters, a high number of requests per second results in increased load on the cluster. role : A role is a Confluent-defined job function that is granted a set of permissions required to perform specific actions or operations on Confluent resources. A role binding binds a role to a principal and a set of Confluent resources. A role can be assigned to a user account, group mapping, service account, or identity pool.
**Related terms**: *group mapping*, *identity*, *identity pool*, *principal*, *service account* **Related content** - [Predefined RBAC Roles in Confluent Cloud](/cloud/current/access-management/access-control/rbac/predefined-rbac-roles.html) - [Role-Based Access Control Predefined Roles in Confluent Platform](/platform/current/security/rbac/rbac-predefined-roles.html) rolling restart : A rolling restart restarts the brokers in a Kafka cluster with zero downtime by incrementally restarting a Kafka broker after verifying that there are no under-replicated partitions on the broker before proceeding to the next broker. Restarting the brokers one at a time allows for software upgrades, broker configuration updates, or cluster maintenance while maintaining high availability by avoiding downtime. **Related content** - [Rolling restart](/platform/current/kafka/post-deployment.html#rolling-restart) schema : A schema is the structured definition or blueprint used to describe the format and structure of event messages sent through the Kafka event streaming platform. Schemas are used to validate the structure of data in event messages and ensure that producers and consumers are sending and receiving data in the same format. Schemas are defined in Schema Registry. Schema Registry : Schema Registry is a centralized repository for managing and validating the schemas used for topic message data. Schema Registry is built into Confluent Cloud as a managed service, available with the Advanced Stream Governance package, and offered as part of Confluent Enterprise for self-managed deployments. It is a RESTful service that stores schemas and is integrated with Kafka and Connect to provide a central location for managing schemas and validating data. Producers and consumers of Kafka topics use schemas to ensure data consistency and compatibility as schemas evolve. Schema Registry is a key component of Stream Governance. **Related content** - Confluent Cloud: [Manage Schemas in Confluent Cloud](/cloud/current/sr/schemas-manage.html) - Confluent Platform: [Schema Registry Overview](/platform/current/schema-registry/index.html) schema subject : A schema subject is the namespace for a schema in Schema Registry. This unique identifier defines a logical grouping of related schemas. Kafka topics contain event messages serialized and deserialized using the structure and rules defined in a schema subject. This ensures compatibility and supports schema evolution. **Related content** - Confluent Cloud: [Manage Schemas in Confluent Cloud](/cloud/current/sr/schemas-manage.html) - Confluent Platform: [Schema Registry Concepts](/platform/current/schema-registry/index.html) - [Understanding Schema Subjects](https://developer.confluent.io/courses/schema-registry/schema-subjects/) Serdes : Serdes are serializers and deserializers that convert objects and parallel data into a serial byte stream for efficient storage and high-speed data transmission over the wire. Confluent provides Serdes for schemas in Avro, Protobuf, and JSON Schema formats.
**Related content** - Confluent Cloud: [Formats, Serializers, and Deserializers](/cloud/current/sr/fundamentals/serdes-develop/index.html) - Confluent Platform: [Formats, Serializers, and Deserializers](/platform/current/schema-registry/fundamentals/serdes-develop/index.html) - [Serde](https://serde.rs/) serializer : A serializer is a tool that converts objects and parallel data into a serial byte stream. Serializers work with deserializers (known together as Serdes) to support efficient storage and high-speed data transmission over the wire. Confluent provides serializers for schemas in Avro, Protobuf, and JSON Schema formats. **Related content** - Confluent Cloud: [Formats, Serializers, and Deserializers](/cloud/current/sr/fundamentals/serdes-develop/index.html) - Confluent Platform: [Formats, Serializers, and Deserializers](/platform/current/schema-registry/fundamentals/serdes-develop/index.html) service account : A service account is a non-person entity used by an application or service to access resources and perform operations. Because a service account is an identity independent of the user who created it, it can be used programmatically to authenticate to resources and perform operations without the need for a user to be signed in. **Related content** - [Service Accounts for Confluent Cloud](/cloud/current/access-management/identity/service-accounts.html) service quota : A service quota is the limit, or maximum value, for a specific Confluent Cloud resource or operation that might vary by the resource scope it applies to. **Related content** - [Service Quotas for Confluent Cloud](/cloud/current/quotas/index.html) single message transform (SMT) : A single message transform (SMT) is a transformation or operation applied in real time to an individual message that changes the values, keys, or headers of the message before it is sent to a sink connector or after it is read from a source connector. SMTs are convenient for inserting fields, masking information, event routing, and other minor data adjustments (see the configuration example below). single sign-on (SSO) : Single sign-on (SSO) is a centralized authentication service that allows users to use a single set of credentials to log in to multiple applications or services. **Related terms**: *authentication*, *group mapping*, *identity provider* **Related content** - [Single Sign-On for Confluent Cloud](/cloud/current/access-management/authenticate/sso/index.html) sink connector : A sink connector is a Kafka Connect connector that publishes (writes) data from a Kafka topic to an external system. source connector : A source connector is a Kafka Connect connector that subscribes to (reads) data from a source (external system), extracts the payload and schema of the data, and publishes (writes) the data to Kafka topics. standalone : Standalone refers to a configuration in which a software application, system, or service operates independently on a single instance or device. This mode is commonly used for development, testing, and debugging purposes. For Kafka Connect, a standalone worker is a single process responsible for running all connectors and tasks on a single instance. Standard Kafka cluster : A Confluent Cloud cluster type. Standard Kafka clusters are designed for production-ready features and functionality. static egress IP address : A static egress IP address is an IP address used by a Confluent Cloud managed connector to establish outbound connections to endpoints of external data sources and sinks over the public internet.
**Related content** - [Use Static IP Addresses on Confluent Cloud for Connectors and Cluster Linking](/cloud/current/networking/static-egress-ip-addresses.html) - [Static Egress IP Addresses for Confluent Cloud Connectors](/cloud/current/connectors/static-egress-ip.html) storage (pre-replication) : In Confluent Cloud, storage is a Kafka cluster billing dimension that defines the number of bytes retained on the cluster, pre-replication. Available in the Metrics API as `retained_bytes` (convert from bytes to TB). The returned value is pre-replication. Standard, Enterprise, Dedicated, and Freight clusters support Infinite Storage. This means there is no maximum size limit for the amount of data that can be stored on the cluster. You can configure policy settings `retention.bytes` and `retention.ms` at the topic level to control exactly how much and how long to retain data in a way that makes sense for your applications and helps control your costs. To reduce storage in Confluent Cloud, compress your messages and reduce retention settings. For compression, use lz4. Avoid gzip because of high overhead on the cluster. Stream Catalog : Stream Catalog is a pillar of Confluent Cloud Stream Governance that provides a centralized inventory of your organization’s data assets that supports data governance and data discovery. With Data Portal in Confluent Cloud Console, users can find event streams across systems, search topics by name or tags, and enrich event data to increase value and usefulness. REST and GraphQL APIs can be used to search schemas, apply tags to records or fields, manage business metadata, and discover relationships across data assets. **Related content** - [Stream Catalog on Confluent Cloud: User Guide to Manage Tags and Metadata](/cloud/current/stream-governance/stream-catalog.html) - [Stream Catalog in Streaming Data Governance (Confluent Developer course)](https://developer.confluent.io/courses/governing-data-streams/stream-catalog/) Stream Governance : Stream Governance is a collection of tools and features that provide data governance for data in motion. These include data quality tools such as Schema Registry, schema ID validation, and schema linking; built-in data catalog capabilities to classify, organize, and find event streams across systems; and stream lineage to visualize complex data relationships and uncover insights with interactive, end-to-end maps of event streams. Taken together, these and other governance tools enable teams to manage the availability, integrity, and security of data used across organizations, and help with standardization, monitoring, collaboration, reporting, and more. **Related terms**: *Stream Catalog*, *Stream Lineage* **Related content** - [Stream Governance on Confluent Cloud](/cloud/current/stream-governance/index.html) stream lineage : Stream lineage is the life cycle, or history, of data, including its origins, transformations, and consumption, as it moves through various stages in data pipelines, applications, and systems. Stream lineage provides a record of data’s journey from its source to its destination, and is used to track data quality, data governance, and data security. 
**Related terms**: *Data Portal*, *Stream Governance* **Related content** - [Stream Lineage on Confluent Cloud](/cloud/current/stream-governance/stream-lineage.html) stream processing : Stream processing is the method of collecting event stream data in real-time as it arrives, transforming the data in real-time using operations (such as filters, joins, and aggregations), and publishing the results to one or more target systems. Stream processing can be used to analyze data continuously, build data pipelines, and process time-sensitive data in real-time. Using the Confluent event streaming platform, event streams can be processed in real-time using Kafka Streams, Kafka Connect, or ksqlDB. Streams API : The Streams API is the Kafka API that allows you to build streaming applications and microservices that transform (for example, filter, group, aggregate, join) incoming event streams in real-time and write the results to Kafka topics stored in a Kafka cluster. The Streams API is used by stream processing clients to process data in real-time, analyze data continuously, and build data pipelines. (A minimal Streams topology sketch appears after the total client connections entry below.) **Related content** - [Introduction to the Kafka Streams API](/platform/current/streams/introduction.html) throttling : Throttling is the process Kafka clusters in Confluent Cloud use to protect themselves from getting to an over-utilized state. Also known as backpressure, throttling in Confluent Cloud occurs when cluster load reaches 80%. At this point, applications may start seeing higher latencies or timeouts as the cluster must begin throttling requests or connection attempts. topic : A topic is a user-defined category or feed name where event messages are stored and published by producers and subscribed to by consumers. Each topic is a log of event messages. Topics are stored in one or more partitions, which distribute topic records across the brokers in a Kafka cluster. Each partition is an ordered, immutable sequence of records to which new records are continually appended. **Related content** - [Manage Topics in Confluent Cloud](/cloud/current/client-apps/topics/index.html) total client connections : In Confluent Cloud, total client connections are a Kafka cluster billing dimension that defines the number of TCP connections to the cluster you can open at one time. Available in the Metrics API as `active_connection_count`. Filter by principal to understand how many connections each application is creating. How many connections a cluster supports can vary widely based on several factors, including number of producer clients, number of consumer clients, partition keying strategy, produce patterns per client, and consume patterns per client. For Dedicated clusters, Confluent derives a guideline for total client connections from benchmarking that indicates exceeding this number of connections increases produce latency for test clients. However, this does not apply to all workloads. That is why total client connections are a guideline, not a hard limit for Dedicated Kafka clusters. Monitor the impact on cluster load as connection count increases, as this is the final representation of the impact of a given workload or CKU dimension on the underlying resources of the cluster. Consider the Confluent guideline a per-CKU guideline. The number of connections tends to increase when you add brokers. In other words, if you significantly exceed the per-CKU guideline, cluster expansion doesn’t always give your cluster more connection count headroom. 
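To make the Streams API entry above concrete, the following is a minimal sketch of a Kafka Streams topology that filters one event stream into another. The broker address `localhost:9092` and the topic names `orders` and `orders-eur` are hypothetical placeholders; this is an illustrative sketch, not a production configuration.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class StreamsApiSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "orders-filter-app");   // hypothetical application id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // hypothetical broker address
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Read an incoming event stream, filter it, and write the result to another topic.
        KStream<String, String> orders = builder.stream("orders");             // hypothetical source topic
        orders.filter((key, value) -> value != null && value.contains("EUR"))
              .to("orders-eur");                                               // hypothetical target topic

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

The same builder supports the other operations named in the entry (grouping, aggregating, joining); only a filter is shown to keep the sketch short.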
Transport Layer Security (TLS) : Transport Layer Security (TLS) is a cryptographic protocol that provides secure communication over a network. For more information, see [Wikipedia](https://en.wikipedia.org/wiki/Transport_Layer_Security). unbounded stream : An unbounded stream is a stream of data that is continuously generated in real-time and has no defined end. Examples of unbounded streams include stock prices, sensor data, and social media feeds. Processing unbounded streams requires a different approach than processing bounded streams. Unbounded streams are processed incrementally as data arrives, while bounded streams are processed as a batch after all data has arrived. Kafka Streams and Flink can be used to process unbounded streams. **Related terms**: *bounded stream*, *stream processing* under replication : Under replication is a situation in which the number of in-sync replicas is below the total number of replicas. Under-replicated partitions can occur when a broker is down or cannot replicate fast enough from the leader (replica fetcher lag). user account : A user account is an account representing the identity of a person who can be authenticated and granted access to Confluent Cloud resources. **Related content** - [User Accounts for Confluent Cloud](/cloud/current/access-management/identity/user-accounts/overview.html) watermark : A watermark in Flink is a marker that keeps track of time as data is processed. A watermark indicates that all records up to the current moment in time have been “seen”. This way, Flink can correctly perform tasks that depend on when things happened, like calculating aggregations over time windows. **Related content** - [Time and Watermarks](/cloud/current/flink/concepts/timely-stream-processing.html)

## HDFS 2 Source Connector Partitions

The connector comes out of the box with partitioners that support default partitioning based on Kafka partitions, field partitioning, and time-based partitioning in days or hours. You may implement your own partitioners by extending the Partitioner class. The following partitioners are available by default:

* **DefaultPartitioner**: To use `DefaultPartitioner`, configure `partition.class` as `io.confluent.connect.storage.partitioner.DefaultPartitioner`. This partitioner reads data from hadoop2 files that follow the default Kafka-partition-based directory and file layout.

Admin API : The Admin API is the Kafka REST API that enables administrators to manage and monitor Kafka clusters, topics, brokers, and other Kafka components. Ansible Playbooks for Confluent Platform : Ansible Playbooks for Confluent Platform is a set of Ansible playbooks and roles that are designed to automate the deployment and management of Confluent Platform. Apache Flink : Apache Flink is an open source stream processing framework for stateful computations over unbounded and bounded data streams. Flink provides a unified API for batch and stream processing that supports event-time and out-of-order processing, as well as exactly-once semantics. Flink applications include real-time analytics, data pipelines, and event-driven applications. 
**Related terms**: *bounded stream*, *data stream*, *stream processing*, *unbounded stream* **Related content** - [Apache Flink: Stream Processing and SQL on Confluent Cloud](/cloud/current/flink/index.html#) - [What is Apache Flink?](https://www.confluent.io/learn/apache-flink/) - [Apache Flink 101 (Confluent Developer course)](https://developer.confluent.io/courses/apache-flink/intro/) Apache Kafka : Apache Kafka is an open source, distributed event streaming platform that provides unified, high-throughput, low-latency, fault-tolerant, scalable, and secure data streaming. Kafka is a publish-and-subscribe messaging system that enables distributed applications to ingest, process, and share data in real-time. **Related content** - [Introduction to Kafka](/kafka/introduction.html) audit log : An audit log is a historical record of actions and operations that are triggered when auditable events occur. Audit log records can be used to troubleshoot system issues, manage security, and monitor compliance by tracking administrative activity, data access and modification, monitoring sign-in attempts, and reconstructing security breaches and fraudulent activity. **Related terms**: *auditable event* **Related content** - [Audit Log Concepts for Confluent Cloud](/cloud/current/monitoring/audit-logging/cloud-audit-log-concepts.html) - [Audit Log Concepts for Confluent Platform](/platform/current/security/audit-logs/audit-logs-concepts.html) auditable event : An auditable event is an event that represents an action or operation that can be tracked and monitored for security purposes and compliance. When an auditable event occurs, an auditable event method is triggered and an event message is sent to the audit log cluster and stored as an audit log record. **Related terms**: *audit log*, *event message* **Related content** - [Auditable Events in Confluent Cloud](/cloud/current/monitoring/audit-logging/event-methods/index.html) - [Auditable Events in Confluent Platform](/platform/current/security/audit-logs/auditable-events.html) authentication : Authentication is the process of verifying the identity of a principal that interacts with a system or application. Authentication is often used in conjunction with authorization to determine whether a principal is allowed to access a resource and perform a specific action or operation on that resource. Digital authentication requires one or more of the following: something a principal knows (a password or security question), something a principal has (a security token or key), or something a principal is (a biometric characteristic, such as a fingerprint or voiceprint). Multi-factor authentication (MFA) requires two or more forms of authentication. **Related terms**: *authorization*, *identity*, *identity provider*, *identity pool*, *principal*, *role* authorization : Authorization is the process of evaluating and then granting or denying a principal a set of permissions required to access and perform operations on resources. **Related terms**: *authentication*, *group mapping*, *identity*, *identity provider*, *identity pool*, *principal*, *role* Avro : Avro is a data serialization and exchange framework that provides data structures, remote procedure call (RPC), a compact binary data format, and a container file, and uses JSON to represent schemas. Avro schemas ensure that every field is properly described and documented for use with serializers and deserializers. 
You can either send a schema with every message or use Schema Registry to store and receive schemas for use by consumers and producers to save bandwidth and storage space. **Related terms**: *data serialization*, *deserializer*, *serializer* **Related content** - [Apache Avro - a data serialization system](https://avro.apache.org/) - Confluent Cloud: [Avro Schema Serializer and Deserializer](/cloud/current/sr/fundamentals/serdes-develop/serdes-avro.html) - Confluent Platform: [Avro Schema Serializer and Deserializer](/platform/current/schema-registry/fundamentals/serdes-develop/serdes-avro.html) Basic Kafka cluster : A Confluent Cloud cluster type. Basic Kafka clusters are designed for experimentation, early development, and basic use cases. batch processing : Batch processing is the method of collecting a large volume of data over a specific time interval, after which the data is processed all at once and loaded into a destination system. Batch processing is often used when processing data can occur independently of the source and timing of the data. It is efficient for non-real-time data processing, such as data warehousing, reporting, and analytics. **Related terms**: *bounded stream*, *stream processing*, *unbounded stream* CIDR block : A CIDR block is a group of IP addresses that are contiguous and can be represented as a single block. CIDR blocks are expressed using Classless Inter-domain Routing (CIDR) notation that includes an IP address and a number of bits in the network mask. **Related content** - [CIDR notation](https://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing#CIDR_notation) - [Classless Inter-domain Routing (CIDR): The Internet Address Assignment and Aggregation Plan [RFC 4632]](https://www.rfc-editor.org/rfc/rfc4632.html) Cluster Linking : Cluster Linking is a highly performant data replication feature that enables links between Kafka clusters to mirror data from one cluster to another. Cluster Linking creates perfect copies of Kafka topics, which keep data in sync across clusters. Use cases include geo-replication of data, data sharing, migration, disaster recovery, and tiered separation of critical applications. **Related content** - [Geo-replication with Cluster Linking on Confluent Cloud](/cloud/current/multi-cloud/cluster-linking/index.html) - [Cluster Linking for Confluent Platform](/platform/current/multi-dc-deployments/cluster-linking/index.html) commit log : A commit log is a log of all event messages about commits (changes or operations made) sent to a Kafka topic. A commit log ensures that all event messages are processed at least once and provides a mechanism for recovery in the event of a failure. The commit log is also referred to as a write-ahead log (WAL) or a transaction log. **Related terms**: *event message* Confluent Cloud : Confluent Cloud is the fully managed, cloud-native event streaming service powered by Kora, the event streaming platform based on Kafka and extended by Confluent to provide high availability, scalability, elasticity, security, and global interconnectivity. Confluent Cloud offers cost-effective multi-tenant configurations as well as dedicated solutions, if stronger isolation is required. 
**Related terms**: *Apache Kafka*, *Kora* **Related content** - [Confluent Cloud Overview](/cloud/current/index.html) - [Confluent Cloud](https://www.confluent.io/confluent-cloud/) Confluent Cloud network : A Confluent Cloud network is an abstraction for a single tenant network environment that hosts Dedicated Kafka clusters in Confluent Cloud along with their single tenant services, like ksqlDB clusters and managed connectors. **Related content** - [Confluent Cloud Network Overview](/cloud/current/networking/overview.html#ccloud-networks) Confluent for Kubernetes (CFK) : *Confluent for Kubernetes (CFK)* is a cloud-native control plane for deploying and managing Confluent in private cloud environments through a declarative API. Confluent Platform : Confluent Platform is a specialized distribution with Kafka at its core and additional components for data integration, streaming data pipelines, and stream processing. Confluent REST Proxy : Confluent REST Proxy provides a RESTful interface to a Kafka cluster, making it easy to produce and consume messages, view the state of the cluster, and perform administrative actions without using the native Kafka protocol or clients. (A produce request sketch appears after the connection attempts entry below.) **Related content** - Confluent Platform: [REST Proxy](/platform/current/kafka-rest/index.html) Confluent Server : Confluent Server is the default Kafka broker component of Confluent Platform that builds on the foundation of Apache Kafka® and provides enhanced proprietary features designed for enterprise use. Confluent Server is fully compatible with Kafka, and adds Kafka cluster support for Role-Based Access Control, Audit Logs, Schema Validation, Self Balancing Clusters, Tiered Storage, Multi-Region Clusters, and Cluster Linking. **Related terms**: *Confluent Platform*, *Apache Kafka*, *Kafka broker*, *Cluster Linking*, *multi-region cluster (MRC)* Confluent Unit for Kafka (CKU) : Confluent Unit for Kafka (CKU) is a unit of horizontal scaling for Dedicated Kafka clusters in Confluent Cloud that provides preallocated resources. CKUs determine the capacity of a Dedicated Kafka cluster in Confluent Cloud. **Related content** - [CKU limits per cluster](/cloud/current/clusters/cluster-types.html#cku-limits-per-cluster) Connect API : The Connect API is the Kafka API that enables a connector to read event streams from a source system and write to a target system. Connect worker : A Connect worker is a server process that runs connectors and performs the actual work of moving data in and out of Kafka topics. Workers run on hardware independent of the Kafka brokers themselves and are scalable and fault-tolerant, meaning you can run a cluster of workers that share the load of moving data in and out of Kafka from and to external systems. **Related terms**: *connector*, *Kafka Connect* connection attempts : In Confluent Cloud, connection attempts are a Kafka cluster billing dimension that defines the maximum number of new TCP connections to the cluster you can create in one second. This includes successful and unsuccessful authentication attempts. Available in the Metrics API as `successful_authentication_count` (only includes successful authentications, not unsuccessful authentication attempts). To reduce usage on connection attempts, use longer-lived connections to the cluster. If you exceed the maximum, connection attempts may be refused. 
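As a companion to the Confluent REST Proxy entry above, the sketch below produces a record over HTTP using the REST Proxy v2 embedded-format API and Java's built-in `HttpClient`. The endpoint `http://localhost:8082`, the topic `orders`, and the record contents are hypothetical placeholders; this is a minimal sketch rather than a hardened client.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RestProxyProduceSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical local REST Proxy endpoint and topic name.
        String url = "http://localhost:8082/topics/orders";
        // v2 embedded-format payload: a list of records, each with an optional key and a value.
        String body = "{\"records\":[{\"key\":\"order-1\",\"value\":{\"item\":\"book\",\"qty\":2}}]}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(url))
                .header("Content-Type", "application/vnd.kafka.json.v2+json") // JSON embedded format
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        // The response reports the partition and offset assigned to each record, or an error code.
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```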
connector : A connector is an abstract mechanism that enables communication, coordination, or cooperation among components by transferring data elements from one interface to another without changing the data. connector offset : Connector offset uniquely identifies the position of a connector as it processes data. Connectors use a variety of strategies to implement the connector offset, including everything from monotonically increasing integers to replay ids, lists of files, timestamps and even checkpoint information. Connector offsets keep track of already-processed data in the event of a connector restart or recovery. While sink connectors use a pattern for connector offsets similar to the offset mechanism used throughout Kafka, the implementation details for source connectors are often much different. This is because source connectors track the progress of a source system as they process data. consumer : A consumer is a Kafka client application that subscribes to (reads and processes) event messages from a Kafka topic. The Streams API and the Consumer API are the two APIs that enable consumers to read event streams from Kafka topics. (A minimal consumer sketch appears after the data pipeline entry below.) **Related terms**: *Consumer API*, *consumer group*, *producer*, *Streams API* Consumer API : The Consumer API is the Kafka API used for consuming (reading) event messages or records from Kafka topics and enables a Kafka consumer to subscribe to a topic and read event messages as they arrive. Batch processing is a common use case for the Consumer API. consumer group : A consumer group is a single logical consumer implemented with multiple physical consumers for reasons of throughput and resilience. By dividing a topic's partitions among the consumers in the group, consumers can process messages in parallel, increasing message throughput and enabling load balancing. **Related terms**: *consumer*, *partition*, *producer*, *topic* consumer lag : Consumer lag is the number of consumer offsets between the latest message produced in a partition and the last message consumed by a consumer, that is, the number of messages pending to be consumed from a particular partition. A large consumer lag, or a quickly growing lag, indicates that the consumer is unable to read from a partition as fast as the messages are available. This can be caused by a slow consumer, slow network, or slow broker. consumer offset : A consumer offset is a monotonically increasing integer value that uniquely identifies the position of an event record in a partition. Consumers use offsets to track their current position in the Kafka topic, allowing consumers to resume processing from where they left off. Offsets are stored on the Kafka broker, which does not track which records have been read and which have not. It is up to the consumer to track this information. When a consumer acknowledges receiving and processing a message, it commits an offset value that is stored in the special internal topic `__consumer_offsets`. cross-resource RBAC role binding : A cross-resource RBAC role binding is a role binding in Confluent Cloud that is applied at the Organization or Environment scope and grants access to multiple resources. For example, assigning a principal the NetworkAdmin role at the Organization scope lets them administer all networks across all Environments in their Organization. 
**Related terms**: *identity pool*, *principal*, *role*, *role binding* CRUD : CRUD is an acronym for the four basic operations that can be performed on data: Create, Read, Update, and Delete. custom connector : A custom connector is a connector created using Connect plugins uploaded to Confluent Cloud by users. This includes connector plugins that are built from scratch, modified open-source connector plugins, or third-party connector plugins. data at rest : Data at rest is data that is physically stored on non-volatile media (such as hard drives, solid-state drives, or other storage devices) and is not actively being transmitted or processed by a system. data contract : A data contract is a formal agreement between an upstream component and a downstream component on the structure and semantics of data that is in motion. A schema is a key element of a data contract. The schema, metadata, rules, policies, and evolution plan form the data contract. You can associate data contracts (schemas and more) with [topics](#term-Kafka-topic). **Related content** - Confluent Platform: [Data Contracts for Schema Registry on Confluent Platform](/platform/current/schema-registry/fundamentals/data-contracts.html) - Confluent Cloud: [Data Contracts for Schema Registry on Confluent Cloud](/cloud/current/sr/fundamentals/data-contracts.html) - Cloud Console: [Manage Schemas in Confluent Cloud](/cloud/current//sr/schemas-manage.html) data encryption key (DEK) : A data encryption key (DEK) is a symmetric key that is used to encrypt and decrypt data. The DEK is used in client-side field level encryption (CSFLE) to encrypt sensitive data. The DEK is itself encrypted using a key encryption key (KEK) that is only accessible to authorized users. The encrypted DEK and encrypted data are stored together. Only users with access to the KEK can decrypt the DEK and access the sensitive data. **Related terms**: *envelope encryption*, *key encryption key (KEK)* data in motion : Data in motion is data that is actively being transferred between source and destination, typically systems, devices, or networks. Data in motion is also referred to as data in transit or data in flight. data in use : Data in use is data that is actively being processed or manipulated in memory (RAM, CPU caches, or CPU registers). data ingestion : Data ingestion is the process of collecting, importing, and integrating data from various sources into a system for further processing, analysis, or storage. data mapping : Data mapping is the process of defining relationships or associations between source data elements and target data elements. Data mapping is an important process in data integration, data migration, and data transformation, ensuring that data is accurately and consistently represented when it is moved or combined. data pipeline : A data pipeline is a series of processes and systems that enable the flow of data from sources to destinations, automating the movement and transformation of data for various purposes, such as analytics, reporting, or machine learning. A data pipeline is typically composed of a source system, a data ingestion tool, a data transformation tool, and a target system. A data pipeline covers the following stages: data extraction, data transformation, data loading, and data validation. 
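The sketch below, referenced from the consumer and consumer offset entries above, shows a minimal Java consumer that joins a consumer group, polls records, and commits offsets explicitly. The broker address, group ID, and topic name are hypothetical placeholders.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // hypothetical broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "orders-readers");          // hypothetical consumer group
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");         // commit offsets explicitly

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));                            // hypothetical topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
                // The committed offset marks where this group resumes after a restart or rebalance.
                consumer.commitSync();
            }
        }
    }
}
```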
Data Portal : Data Portal is a Confluent Cloud application that uses Stream Catalog and Stream Lineage to provide self-service access throughout Confluent Cloud Console for data practitioners to search and discover existing topics using tags and business metadata, request access to topics and data, and access data in topics to build streaming applications and data pipelines. It provides a data-centric view of Confluent optimized for self-service access, where users can search, discover, and understand available data, request access to data, and use data. **Related terms**: *Stream Catalog*, *Stream Lineage* **Related content** - [Data Portal on Confluent Cloud](/cloud/current/stream-governance/data-portal.html) data serialization : Data serialization is the process of converting data structures or objects into a format that can be stored or transmitted, and reconstructed later in the same or another computer environment. Data serialization is a common technique for implementing data persistence, interprocess communication, and object communication. Confluent Schema Registry (in Confluent Platform) and Confluent Cloud Schema Registry support data serialization using serializers and deserializers for the following formats: Avro, JSON Schema, and Protobuf. (A serializer configuration sketch appears after the egress entry below.) **Related content** - Confluent Cloud: [Formats, Serializers, and Deserializers](/cloud/current/sr/fundamentals/serdes-develop/index.html) - Confluent Platform: [Formats, Serializers, and Deserializers](/platform/current/schema-registry/fundamentals/serdes-develop/index.html) data steward : A data steward is a person with data-related responsibilities, such as data governance, data quality, and data security. data stream : A data stream is a continuous flow of data records that are produced and consumed by applications. dead letter queue (DLQ) : A dead letter queue (DLQ) is a queue where messages that could not be processed successfully by a sink connector are placed. Instead of stopping, the sink connector sends messages that could not be written successfully as event records to the DLQ topic while the sink connector continues processing messages. Dedicated Kafka cluster : A Confluent Cloud cluster type. Dedicated Kafka clusters are designed for critical production workloads with high traffic or private networking requirements. deserializer : A deserializer is a tool that converts a serial byte stream back into objects and parallel data. Deserializers work with serializers (known together as Serdes) to support efficient storage and high-speed data transmission over the wire. Confluent provides Serdes for schemas in Avro, Protobuf, and JSON Schema formats. **Related content** - Confluent Cloud: [Formats, Serializers, and Deserializers](/cloud/current/sr/fundamentals/serdes-develop/index.html) - Confluent Platform: [Formats, Serializers, and Deserializers](/platform/current/schema-registry/fundamentals/serdes-develop/index.html) egress : In general networking, egress refers to outbound traffic leaving a network or a specific network segment. In Confluent Cloud, egress is a Kafka cluster billing dimension that defines the number of bytes consumed from the cluster in one second. Available in the Metrics API as `sent_bytes` (convert from bytes to MB). To reduce egress in Confluent Cloud, compress your messages and ensure each consumer is only consuming from the topics it requires. For compression, use lz4. Avoid gzip because of high overhead on the cluster. 
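To illustrate the data serialization and deserializer entries above, here is a hedged sketch of a producer configured with the Confluent Avro serializer and a Schema Registry URL. It assumes the `kafka-avro-serializer` and Apache Avro libraries are on the classpath; the broker address, Schema Registry URL, schema, and topic name are hypothetical placeholders.

```java
import java.util.Properties;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class AvroSerializationSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");          // hypothetical broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "io.confluent.kafka.serializers.KafkaAvroSerializer");                  // Avro value serializer
        props.put("schema.registry.url", "http://localhost:8081");                      // hypothetical Schema Registry

        // A hypothetical Avro schema; in practice this is usually generated or loaded from a file.
        Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"Order\",\"fields\":[{\"name\":\"item\",\"type\":\"string\"}]}");
        GenericRecord order = new GenericData.Record(schema);
        order.put("item", "book");

        try (KafkaProducer<String, GenericRecord> producer = new KafkaProducer<>(props)) {
            // The serializer encodes the record as Avro and registers the schema with Schema Registry
            // before the bytes are written to the topic.
            producer.send(new ProducerRecord<>("orders", "order-1", order));            // hypothetical topic
        }
    }
}
```

A matching consumer would use the corresponding Avro deserializer class and the same Schema Registry URL to reconstruct the record.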
Elastic Confluent Unit for Kafka (eCKU) : Elastic Confluent Unit for Kafka (eCKU) is used to express capacity for Basic, Standard, Enterprise, and Freight Kafka clusters. These clusters automatically scale up to a fixed ceiling. There is no need to resize these types of clusters. When you need more capacity, your cluster expands up to the fixed ceiling. If you’re not using capacity above the minimum, you’re not paying for it. ELT : ELT is an acronym for Extract-Load-Transform, where data is extracted from a source system and loaded into a target system before processing or transformation. Compared to ETL, ELT is a more flexible approach to data ingestion because the data is loaded into the target system before transformation. Enterprise Kafka cluster : A Confluent Cloud cluster type. Enterprise Kafka clusters are designed for production-ready functionality that requires private endpoint networking capabilities. envelope encryption : Envelope encryption is a cryptographic technique that uses two keys to encrypt data. The symmetric data encryption key (DEK) is used to encrypt sensitive data. The separate asymmetric key encryption key (KEK) is the master key used to encrypt the DEK. The DEK and encrypted data are stored together. Only users with access to the KEK can decrypt the DEK and access the sensitive data. In Confluent Cloud, envelope encryption is used to enable client-side field level encryption (CSFLE). CSFLE encrypts sensitive data in a message before it is sent to Confluent Cloud and allows for temporary decryption of sensitive data when required to perform operations on the data. **Related terms**: *data encryption key (DEK)*, *key encryption key (KEK)* ETL : ETL is an acronym for Extract-Transform-Load, where data is extracted from a source system, transformed into a target format, and loaded into a target system. Compared to ELT, ETL is a more rigid approach to data ingestion because the data is transformed before loading into the target system. event : An event is a meaningful action or occurrence of something that happened. Events that can be recognized by a program, either human-generated or triggered by software, can be recorded in a log file or other data store. **Related terms**: *event message*, *event record*, *event sink*, *event source*, *event stream*, *event streaming*, *event streaming platform*, *event time* event message : An event message is a record of an event sent to a Kafka topic, represented as a key-value pair. Each event message consists of a key-value pair, a timestamp, the compression type, headers for metadata (optional), and a partition and offset ID (once the message is written). The key is optional and can be used to identify the event. The value is required and contains details about the event that happened. **Related terms**: *event*, *event record*, *event sink*, *event source*, *event stream*, *event streaming*, *event streaming platform*, *event time* event record : An event record is the record of an event stored in a Kafka topic. Event records are organized and durably stored in topics. Examples of events include orders, payments, activities, or measurements. An event typically contains one or more data fields that describe the fact, as well as a timestamp that denotes when the event was created by its event source. The event may also contain various metadata, such as its source of origin (for example, the application or cloud service that created the event) and storage-level information (for example, its position in the event stream). 
**Related terms**: *event*, *event message*, *event sink*, *event source*, *event stream*, *event streaming*, *event streaming platform*, *event time* event sink : An event sink is a consumer of events, which can include applications, cloud services, databases, IoT sensors, and more. **Related terms**: *event*, *event message*, *event record*, *event source*, *event stream*, *event streaming*, *event streaming platform*, *event time* event source : An event source is a producer of events, which can include cloud services, databases, IoT sensors, mainframes, and more. **Related terms**: *event*, *event message*, *event record*, *event sink*, *event stream*, *event streaming*, *event streaming platform*, *event time* event stream : An event stream is a continuous flow of event messages produced by an event source and consumed by one or more consumers. **Related terms**: *event*, *event message*, *event record*, *event sink*, *event source*, *event streaming*, *event streaming platform*, *event time* event streaming : Event streaming is the practice of capturing event data in real-time from data sources. Event streaming is a form of data streaming that is used to capture, store, process, and react to data in real-time or retrospectively. **Related terms**: *event*, *event message*, *event record*, *event sink*, *event source*, *event streaming platform*, *event time* event streaming platform : An event streaming platform is a platform that events can be written to once, allowing distributed functions within an organization to react in real time. **Related terms**: *event*, *event message*, *event record*, *event sink*, *event source*, *event streaming*, *event time* event time : Event time is the time when an event occurred on the producing device, as opposed to the time when the event was processed or recorded. Event time is often used in stream processing to determine the order of events and to perform windowing operations. **Related terms**: *event*, *event message*, *event record*, *event sink*, *event source*, *event streaming*, *event streaming platform* exactly-once semantics : Exactly-once semantics is a guarantee that a message is delivered exactly once and in the order that it was sent. Even if a producer retries sending a message, or a consumer retries processing a message, the message is processed exactly once. This guarantee is achieved by combining idempotent producers, which attach a producer ID and sequence number to each message so the broker can discard duplicates caused by retries, with transactions, which commit consumer offsets atomically with the records that a processing step produces. If the consumer fails to process a message, the message is redelivered and processed again without the results being duplicated. (A transactional producer sketch appears after the granularity entry below.) Freight Kafka cluster : A Confluent Cloud cluster type. Freight Kafka clusters are designed for high-throughput, relaxed latency workloads that are less expensive than self-managed open source Kafka. granularity : Granularity is the degree or level of detail to which an entity (a system, service, or resource) is broken down into subcomponents, parts, or elements. Entities that are *fine-grained* have a higher level of detail, while *coarse-grained* entities have a reduced level of detail, often combining finer parts into a larger whole. In the context of access control, granular permissions provide precise control over resource access. They allow administrators to grant specific operations on distinct resources. This ensures users only have permissions tailored to their needs, minimizing unnecessary or potentially risky access. 
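As a companion to the exactly-once semantics entry above, the sketch below configures an idempotent, transactional Java producer. The broker address, transactional ID, and topic are hypothetical placeholders, and a complete exactly-once pipeline would also send consumer offsets to the transaction; this is only a minimal illustration of the producer side.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ExactlyOnceProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // hypothetical broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");            // broker discards duplicate retries
        props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "orders-producer-1"); // hypothetical transactional id

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            producer.beginTransaction();
            producer.send(new ProducerRecord<>("orders", "order-1", "{\"item\":\"book\"}")); // hypothetical topic
            // Either every record in the transaction becomes visible to consumers, or none do.
            producer.commitTransaction();
        }
    }
}
```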
group mapping : Group mapping is a set of rules that map groups in your SSO identity provider to Confluent Cloud RBAC roles. When a user signs in to Confluent Cloud using SSO, Confluent Cloud uses the group mapping to grant access to Confluent Cloud resources. **Related terms**: *identity provider*, *identity pool*, *principal*, *role* **Related content** - [Group Mapping for Confluent Cloud](/cloud/current/access-management/authenticate/sso/group-mapping/overview.html) identity : An identity is a unique identifier that is used to authenticate and authorize users and applications to access resources. Identity is often used in conjunction with access control to determine whether a user or application is allowed to access a resource and perform a specific action or operation on that resource. **Related terms**: *identity provider*, *identity pool*, *principal*, *role* identity pool : An identity pool is a collection of identities that can be used to authenticate and authorize users and applications to access resources. Identity pools are used to manage permissions for users and applications that access resources in Confluent Cloud. They are also used to manage permissions for Confluent Cloud service accounts that are used to access resources in Confluent Cloud. identity provider : An identity provider is a trusted provider that authenticates users and issues security tokens that are used to verify the identity of a user. Identity providers are often used in single sign-on (SSO) scenarios, where a user can log in to multiple applications or services with a single set of credentials. Infinite Storage : Infinite Storage is the Confluent Cloud storage service that enhances the scalability of Confluent Cloud resources by separating storage and processing. Tiered storage within Confluent Cloud moves data between storage layers based on the needs of the workload, retrieves tiered data when requested, and garbage collects data that is past retention or otherwise deleted. If an application reads historical data, latency is not increased for other applications reading more recent data. Storage resources are decoupled from compute resources: you pay only for what you produce to Confluent Cloud and for the storage that you use, and CKUs do not have storage limits. **Related content** - [Infinite Storage in Confluent Cloud for Apache Kafka](https://www.confluent.io/blog/infinite-kafka-data-storage-in-confluent-cloud/) ingress : In general networking, ingress refers to traffic that enters a network from an external source. In Confluent Cloud, ingress is a Kafka cluster billing dimension that defines the number of bytes produced to the cluster in one second. Available in the Metrics API as `received_bytes` (convert from bytes to MB). To reduce ingress in Confluent Cloud, compress your messages. For compression, use lz4. Avoid gzip because of high overhead on the cluster. internal topic : An internal topic is a topic, prefixed with double underscores (`__`), that is automatically created by a Kafka component to store metadata about the broker, partition assignment, consumer offsets, and other information. Examples of internal topics: `__cluster_metadata`, `__consumer_offsets`, `__transaction_state`, `__confluent.support.metrics`, and `__confluent.support.metrics-raw`. JSON Schema : JSON Schema is a declarative language used for data serialization and exchange to define data structures, specify formats, and validate JSON documents. 
It is a way to encode expected data types, properties, and constraints to ensure that all fields are properly described for use with serializers and deserializers. **Related terms**: *data serialization*, *deserializer*, *serializer* **Related content** - [JSON Schema - a declarative language that allows you to annotate and validate JSON documents.](https://json-schema.org/) - Confluent Cloud: [JSON Schema Serializer and Deserializer](/cloud/current/sr/fundamentals/serdes-develop/serdes-avro.html) - Confluent Platform: [JSON Schema Serializer and Deserializer](/platform/current/schema-registry/fundamentals/serdes-develop/serdes-json.html) Kafka bootstrap server : A Kafka bootstrap server is a Kafka broker that a Kafka client contacts to initiate a connection to a Kafka cluster; the broker returns metadata that includes the addresses of all brokers in the Kafka cluster. Although only one bootstrap server is required to connect to a Kafka cluster, multiple brokers can be specified in a bootstrap server list to provide high availability and fault tolerance in case a broker is unavailable. In Confluent Cloud, the bootstrap server is the general cluster endpoint. Kafka broker : A Kafka broker is a server in the Kafka storage layer that stores event streams from one or more sources. A Kafka cluster is typically comprised of several brokers. Every broker in a cluster is also a bootstrap server, meaning if you can connect to one broker in a cluster, you can connect to every broker. Kafka client : A Kafka client allows you to write distributed applications and microservices that read, write, and process streams of events in parallel, at scale, and in a fault-tolerant manner, even in the case of network problems or machine failures. The Kafka client library provides functions, classes, and utilities that allow developers to create Kafka producer clients (Producers) and consumer clients (Consumers) using various programming languages. The primary way to build production-ready Producers and Consumers is by using your preferred programming language and a Kafka client library. **Related content** - [Build Client Applications for Confluent Cloud](/cloud/current/client-apps/overview.html) - [Build Client Applications for Confluent Platform](/platform/current/clients/index.html) - [Getting Started with Apache Kafka and Java (or Python, Go, .Net, and others)](https://developer.confluent.io/get-started/java/) Kafka cluster : A Kafka cluster is a group of interconnected Kafka brokers that manage and distribute real-time data streaming, processing, and storage as if they are a single system. By distributing tasks and services across multiple Kafka brokers, the Kafka cluster improves availability, reliability, and performance. Kafka Connect : Kafka Connect is the component of Kafka that provides data integration between databases, key-value stores, search indexes, file systems, and Kafka brokers. Kafka Connect is an ecosystem of a client application and pluggable connectors. As a client application, Connect is a server process that runs on hardware independent of the Kafka brokers themselves. It is scalable and fault-tolerant, meaning you can run a cluster of Connect workers that share the load of moving data in and out of Kafka from and to external systems. Connect also abstracts the business of writing code away from the user and instead requires only JSON configuration to run. 
**Related content** - Confluent Cloud: [Kafka Connect](/cloud/current/billing/overview.html#kconnect-long) - Confluent Platform: [Kafka Connect](/platform/current/connect/index.html) Kafka controller : A Kafka controller is the node in a Kafka cluster that is responsible for managing and changing the metadata of the cluster. This node also communicates metadata changes to the rest of the cluster. When Kafka uses ZooKeeper for metadata management, the controller is a broker, and the broker persists the metadata to ZooKeeper for backup and recovery. With KRaft, you dedicate Kafka nodes to operate as controllers and the metadata is stored in Kafka itself and not persisted to ZooKeeper. KRaft enables faster recovery because of this. For more information, see [KRaft overview](/platform/current/kafka-metadata/kraft.html). Kafka listener : A Kafka listener is an endpoint that Kafka brokers bind to and use to communicate with clients. For Kafka clusters, Kafka listeners are configured in the `listeners` property of the `server.properties` file. Advertised listeners are publicly accessible endpoints that are used by clients to connect to the Kafka cluster. **Related content** - [Kafka Listeners – Explained](https://www.confluent.io/blog/kafka-listeners-explained/) Kafka metadata : Kafka metadata is the information about the Kafka cluster and the topics that are stored in it. This information includes details such as the brokers in the cluster, the topics that are available, the partitions for each topic, and the location of the leader for each partition. Kafka metadata is used by clients to discover the available brokers and topics, and to determine which broker is the leader for a particular partition. This information is essential for clients to be able to send and receive messages to and from Kafka. Kafka Streams : Kafka Streams is a stream processing library for building streaming applications and microservices that transform (filter, group, aggregate, join, and more) incoming event streams in real-time to Kafka topics stored in a Kafka cluster. The Streams API can be used to build applications that process data in real-time, analyze data continuously, and build data pipelines. **Related content** - [Kafka Streams](/platform/current/streams/overview.html) Kafka topic : See *topic*. key encryption key (KEK) : A key encryption key (KEK) is a master key that is used to encrypt and decrypt other keys, specifically the data encryption key (DEK). Only users with access to the KEK can decrypt the DEK and access the sensitive data. **Related terms**: *data encryption key (DEK)*, *envelope encryption*. Kora : Kora is the cloud-native streaming data service based on Kafka technology that powers the Confluent Cloud event streaming platform for building real-time data pipelines and streaming applications. Kora abstracts low-level resources, such as Kafka brokers, and hides operational complexities, such as system upgrades. Kora is built on the following foundations: a tiered storage layer that improves cost and performance, elasticity and consistent performance through incremental load balancing, cost effective multi-tenancy with dynamic quota management and cell-based isolation, continuous monitoring of both system health and data integrity, and clean abstraction with standard Kafka protocols and CKUs to hide underlying resources. 
**Related terms**: *Apache Kafka*, *Confluent Cloud*, *Confluent Unit for Kafka (CKU)* **Related content** - [Kora: The Cloud Native Engine for Apache Kafka](https://www.confluent.io/blog/cloud-native-data-streaming-kafka-engine/) - [Kora: A Cloud-Native Event Streaming Platform For Kafka](https://www.vldb.org/pvldb/vol16/p3822-povzner.pdf) KRaft : KRaft (or Apache Kafka Raft) is a consensus protocol, introduced through KIP-500, that provides metadata management in Kafka itself with the goal of replacing ZooKeeper. KRaft simplifies Kafka because it enables the management of metadata in Kafka itself, rather than splitting it between ZooKeeper and Kafka. As of Confluent Platform 7.5, KRaft is the default method of metadata management in new deployments. For more information, see [KRaft overview](/platform/current/kafka-metadata/kraft.html). ksqlDB : ksqlDB is a streaming SQL database engine purpose-built for creating stream processing applications on top of Kafka. logical Kafka cluster (LKC) : A logical Kafka cluster (LKC) is a subset of a physical Kafka cluster (PKC) that is isolated from other logical clusters within Confluent Cloud. Each logical unit of isolation is considered a tenant and maps to a specific organization. If the mapping is one-to-one, one LKC maps to one PKC (a Dedicated cluster). If the mapping is many-to-one, one LKC maps to one of the multitenant Kafka cluster types (Basic, Standard, Enterprise, and Freight). **Related terms**: *Confluent Cloud*, *Kafka cluster*, *physical Kafka cluster (PKC)* **Related content** - [Kafka Cluster Types in Confluent Cloud](/cloud/current/clusters/cluster-types.html) - [Multi-tenancy and Client Quotas on Confluent Cloud](/cloud/current/clusters/client-quotas.html) multi-region cluster (MRC) : A multi-region cluster (MRC) is a single Kafka cluster that replicates data between datacenters across regional availability zones. multi-tenancy : Multi-tenancy is a software architecture in which a single physical instance is shared among multiple logical instances, or tenants. In Confluent Cloud, each Basic, Standard, Enterprise, and Freight cluster is a logical Kafka cluster (LKC) that shares a physical Kafka cluster (PKC) with other tenants. Each LKC is isolated from other LKCs and has its own resources, such as memory, compute, and storage. **Related terms**: *Confluent Cloud*, *logical Kafka cluster (LKC)*, *physical Kafka cluster (PKC)* **Related content** * [From On-Prem to Cloud-Native: Multi-Tenancy in Confluent Cloud](https://www.confluent.io/blog/cloud-native-multi-tenant-kafka-with-confluent-cloud/) * [Multi-tenancy and Client Quotas on Confluent Cloud](/cloud/current/clusters/client-quotas.html) offset : An offset is an integer assigned to each message that uniquely represents its position within the data stream, guaranteeing the ordering of records and allowing offset-based connections to replay messages from any point in time. **Related terms**: *consumer offset*, *connector offset*, *offset commit*, *replayability* offset commit : An offset commit is the process of keeping track of the current position of an offset-based connection (primarily Kafka consumers and connectors) within the data stream. The offset commit process is not specific to consumers or connectors. It is a general mechanism in Kafka to track the position of any application that is reading data. When a consumer commits an offset, the offset identifies the next message the consumer should consume. 
For example, if a consumer has an offset of 5, it has consumed messages 0 through 4 and will next consume message 5. If the consumer crashes or is shut down, its partitions are reassigned to another consumer, which resumes consuming from the last committed offset of each partition. The committed offset for consumers is stored on a Kafka broker. When a consumer commits an offset, it sends a commit request to the Kafka cluster, specifying the partition and offset it wants to commit for a particular consumer group. The Kafka broker receiving the commit request then stores this offset in the `__consumer_offsets` internal topic. **Related terms**: *consumer offset*, *offset* OpenSSL : OpenSSL is an open-source software library and toolkit that implements the Secure Sockets Layer (SSL) and Transport Layer Security (TLS) protocols. For more information, see [Wikipedia](https://en.wikipedia.org/wiki/OpenSSL). parent cluster : The Kafka cluster that a resource belongs to. **Related terms**: *Kafka cluster* partition : A partition is a unit of data storage that divides a topic into multiple, parallel event streams, each of which can be stored on a different Kafka broker and consumed independently. Partitioning is a key concept in Kafka because it allows Kafka to scale horizontally by adding more brokers to the cluster. Partitions are also the unit of parallelism in Kafka. A topic can have one or more partitions, and each partition is an ordered, immutable sequence of event records that is continually appended to a partition log. partitions (pre-replication) : In Confluent Cloud, partitions are a Kafka cluster billing dimension that define the maximum number of partitions that can exist on the cluster at one time, before replication. While you are not charged for partitions on any type of Kafka cluster, the number of partitions you use has an impact on eCKU. To determine eCKU limits for partitions, Confluent Cloud counts only pre-replication (leader) partitions across a cluster. All topics that you create (as well as internal topics that are automatically created by Confluent Platform components such as ksqlDB, Kafka Streams, Connect, and Control Center (Legacy)) count towards the cluster partition limit. Confluent prefixes topics created automatically with an underscore (_). Topics that are internal to Kafka itself (consumer offsets) are not visible in Cloud Console and do not count against partition limits or toward partition billing. Available in the Metrics API as `partition_count`. In Confluent Cloud, attempts to create additional partitions beyond the cluster limit fail with an error message. To reduce usage on partitions (pre-replication), delete unused topics and create new topics with fewer partitions. Use the Kafka Admin interface to increase the partition count of an existing topic if the initial partition count is too low. physical Kafka cluster (PKC) : A physical Kafka cluster (PKC) is a Kafka cluster comprised of multiple brokers. Each physical Kafka cluster is created on a Kubernetes cluster by the control plane. A PKC is not directly accessible by clients. principal : A principal is an entity that can be authenticated and granted permissions based on roles to access resources and perform operations. An entity can be a user account, service account, group mapping, or identity pool. 
**Related terms**: *group mapping*, *identity*, *identity pool*, *role*, *service account*, *user account* private internet : A private internet is a closed, restricted computer network typically used by organizations to provide secure environments for managing sensitive data and resources. processing time : Processing time is the time when an event is processed or recorded by a system, as opposed to the time when the event occurred on the producing device. Processing time is often used in stream processing to determine the order of events and to perform windowing operations. producer : A producer is a client application that publishes (writes) data to a topic in a Kafka cluster. Producers write data to a topic and are the only clients that can write data to a topic. Each record written to a topic is appended to the partition of the topic that is selected by the producer. Producer API : The Producer API is the Kafka API that allows you to write data to a topic in a Kafka cluster. The Producer API is used by producer clients to publish data to a topic in a Kafka cluster. Protobuf : Protobuf (or Protocol Buffers) is an open-source data format used to serialize structured data for storage. **Related terms**: *data serialization*, *deserializer*, *serializer* **Related content** - [Protocol Buffers](https://protobuf.dev/) - [Getting Started with Protobuf in Confluent Cloud](https://www.confluent.io/blog/using-protobuf-in-confluent-cloud/) - Confluent Cloud: [Protobuf Serializer and Deserializer](/cloud/current/sr/fundamentals/serdes-develop/serdes-protobuf.html) - Confluent Platform: [Protobuf Serializer and Deserializer](/platform/current/schema-registry/fundamentals/serdes-develop/serdes-protobuf.html) public internet : The public internet is the global system of interconnected computers and networks that use TCP/IP to communicate with each other. rebalancing : Rebalancing is the process of redistributing the partitions of a topic among the consumers of a consumer group for improved performance and scalability. A rebalance can occur when a consumer fails its heartbeat and is excluded from the group, voluntarily leaves the group, has its metadata updated, or when a new consumer joins the group. replayability : Replayability is the ability to replay messages from any point in time. **Related terms**: *consumer offset*, *offset*, *offset commit* replication : Replication is the process of creating and maintaining multiple copies (or *replicas*) of data across different nodes in a distributed system to increase availability, reliability, redundancy, and accessibility. replication factor : A replication factor is the number of copies of a partition that are distributed across the brokers in a cluster. requests : In Confluent Cloud, requests are a Kafka cluster billing dimension that defines the number of client requests to the cluster in one second. Available in the Metrics API as `request_count`. To reduce usage on requests, you can adjust producer batching configurations, consumer client batching configurations, and shut down otherwise inactive clients. For Dedicated clusters, a high number of requests per second results in increased load on the cluster. role : A role is a Confluent-defined job function that is assigned a set of permissions required to perform specific actions or operations on Confluent resources. A role binding binds a role to a principal and to Confluent resources. A role can be assigned to a user account, group mapping, service account, or identity pool. 
**Related terms**: *group mapping*, *identity*, *identity pool*, *principal*, *service account* **Related content** - [Predefined RBAC Roles in Confluent Cloud](/cloud/current/access-management/access-control/rbac/predefined-rbac-roles.html) - [Role-Based Access Control Predefined Roles in Confluent Platform](/platform/current/security/rbac/rbac-predefined-roles.html) rolling restart : A rolling restart restarts the brokers in a Kafka cluster with zero downtime by incrementally restarting one Kafka broker at a time, verifying that there are no under-replicated partitions on the broker before proceeding to the next broker. Restarting the brokers one at a time allows for software upgrades, broker configuration updates, or cluster maintenance while maintaining high availability by avoiding downtime. **Related content** - [Rolling restart](/platform/current/kafka/post-deployment.html#rolling-restart) schema : A schema is the structured definition or blueprint used to describe the format and structure of event messages sent through the Kafka event streaming platform. Schemas are used to validate the structure of data in event messages and ensure that producers and consumers are sending and receiving data in the same format. Schemas are defined in Schema Registry. Schema Registry : Schema Registry is a centralized repository for managing and validating schemas for topic message data. Schema Registry is built into Confluent Cloud as a managed service, available with the Advanced Stream Governance package, and offered as part of Confluent Enterprise for self-managed deployments. Schema Registry is a RESTful service that stores and manages schemas for Kafka topics, and is integrated with Kafka and Connect to provide a central location for managing schemas and validating data. Producers and consumers of Kafka topics use schemas to ensure data consistency and compatibility as schemas evolve. Schema Registry is a key component of Stream Governance. **Related content** - Confluent Cloud: [Manage Schemas in Confluent Cloud](/cloud/current/sr/schemas-manage.html) - Confluent Platform: [Schema Registry Overview](/platform/current/schema-registry/index.html) schema subject : A schema subject is the namespace for a schema in Schema Registry. This unique identifier defines a logical grouping of related schemas. Kafka topics contain event messages serialized and deserialized using the structure and rules defined in a schema subject. This ensures compatibility and supports schema evolution. **Related content** - Confluent Cloud: [Manage Schemas in Confluent Cloud](/cloud/current/sr/schemas-manage.html) - Confluent Platform: [Schema Registry Concepts](/platform/current/schema-registry/index.html) - [Understanding Schema Subjects](https://developer.confluent.io/courses/schema-registry/schema-subjects/) Serdes : Serdes are serializers and deserializers that convert objects and parallel data into a serial byte stream for efficient storage and high-speed data transmission over the wire. Confluent provides Serdes for schemas in Avro, Protobuf, and JSON Schema formats.
**Related content** - Confluent Cloud: [Formats, Serializers, and Deserializers](/cloud/current/sr/fundamentals/serdes-develop/index.html) - Confluent Platform: [Formats, Serializers, and Deserializers](/platform/current/schema-registry/fundamentals/serdes-develop/index.html) - [Serde](https://serde.rs/) serializer : A serializer is a tool that converts objects and parallel data into a serial byte stream. Serializers work with deserializers (known together as Serdes) to support efficient storage and high-speed data transmission over the wire. Confluent provides serializers for schemas in Avro, Protobuf, and JSON Schema formats. **Related content** - Confluent Cloud: [Formats, Serializers, and Deserializers](/cloud/current/sr/fundamentals/serdes-develop/index.html) - Confluent Platform: [Formats, Serializers, and Deserializers](/platform/current/schema-registry/fundamentals/serdes-develop/index.html) service account : A service account is a non-person entity used by an application or service to access resources and perform operations. Because a service account is an identity independent of the user who created it, it can be used programmatically to authenticate to resources and perform operations without the need for a user to be signed in. **Related content** - [Service Accounts for Confluent Cloud](/cloud/current/access-management/identity/service-accounts.html) service quota : A service quota is the limit, or maximum value, for a specific Confluent Cloud resource or operation that might vary by the resource scope it applies to. **Related content** - [Service Quotas for Confluent Cloud](/cloud/current/quotas/index.html) single message transform (SMT) : A single message transform (SMT) is a transformation or operation applied in real-time to an individual message that changes the values, keys, or headers of the message before it is sent to a sink connector or after it is read from a source connector. SMTs are convenient for inserting fields, masking information, event routing, and other minor data adjustments. single sign-on (SSO) : Single sign-on (SSO) is a centralized authentication service that allows users to use a single set of credentials to log in to multiple applications or services. **Related terms**: *authentication*, *group mapping*, *identity provider* **Related content** - [Single Sign-On for Confluent Cloud](/cloud/current/access-management/authenticate/sso/index.html) sink connector : A sink connector is a Kafka Connect connector that publishes (writes) data from a Kafka topic to an external system. source connector : A source connector is a Kafka Connect connector that subscribes (reads) data from a source (external system), extracts the payload and schema of the data, and publishes (writes) the data to Kafka topics. standalone : Standalone refers to a configuration in which a software application, system, or service operates independently on a single instance or device. This mode is commonly used for development, testing, and debugging purposes. For Kafka Connect, a standalone worker is a single process responsible for running all connectors and tasks on a single instance. Standard Kafka cluster : A Confluent Cloud cluster type. Standard Kafka clusters are designed for production-ready features and functionality. static egress IP address : A static egress IP address is an IP address used by a Confluent Cloud managed connector to establish outbound connections to endpoints of external data sources and sinks over the public internet.
**Related content** - [Use Static IP Addresses on Confluent Cloud for Connectors and Cluster Linking](/cloud/current/networking/static-egress-ip-addresses.html) - [Static Egress IP Addresses for Confluent Cloud Connectors](/cloud/current/connectors/static-egress-ip.html) storage (pre-replication) : In Confluent Cloud, storage is a Kafka cluster billing dimension that defines the number of bytes retained on the cluster, pre-replication. Available in the Metrics API as `retained_bytes` (convert from bytes to TB). The returned value is pre-replication. Standard, Enterprise, Dedicated, and Freight clusters support Infinite Storage. This means there is no maximum size limit for the amount of data that can be stored on the cluster. You can configure policy settings `retention.bytes` and `retention.ms` at the topic level to control exactly how much and how long to retain data in a way that makes sense for your applications and helps control your costs. To reduce storage in Confluent Cloud, compress your messages and reduce retention settings. For compression, use lz4. Avoid gzip because of high overhead on the cluster. Stream Catalog : Stream Catalog is a pillar of Confluent Cloud Stream Governance that provides a centralized inventory of your organization’s data assets that supports data governance and data discovery. With Data Portal in Confluent Cloud Console, users can find event streams across systems, search topics by name or tags, and enrich event data to increase value and usefulness. REST and GraphQL APIs can be used to search schemas, apply tags to records or fields, manage business metadata, and discover relationships across data assets. **Related content** - [Stream Catalog on Confluent Cloud: User Guide to Manage Tags and Metadata](/cloud/current/stream-governance/stream-catalog.html) - [Stream Catalog in Streaming Data Governance (Confluent Developer course)](https://developer.confluent.io/courses/governing-data-streams/stream-catalog/) Stream Governance : Stream Governance is a collection of tools and features that provide data governance for data in motion. These include data quality tools such as Schema Registry, schema ID validation, and schema linking; built-in data catalog capabilities to classify, organize, and find event streams across systems; and stream lineage to visualize complex data relationships and uncover insights with interactive, end-to-end maps of event streams. Taken together, these and other governance tools enable teams to manage the availability, integrity, and security of data used across organizations, and help with standardization, monitoring, collaboration, reporting, and more. **Related terms**: *Stream Catalog*, *Stream Lineage* **Related content** - [Stream Governance on Confluent Cloud](/cloud/current/stream-governance/index.html) stream lineage : Stream lineage is the life cycle, or history, of data, including its origins, transformations, and consumption, as it moves through various stages in data pipelines, applications, and systems. Stream lineage provides a record of data’s journey from its source to its destination, and is used to track data quality, data governance, and data security. 
**Related terms**: *Data Portal*, *Stream Governance* **Related content** - [Stream Lineage on Confluent Cloud](/cloud/current/stream-governance/stream-lineage.html) stream processing : Stream processing is the method of collecting event stream data in real-time as it arrives, transforming the data in real-time using operations (such as filters, joins, and aggregations), and publishing the results to one or more target systems. Stream processing can be used to analyze data continuously, build data pipelines, and process time-sensitive data in real-time. Using the Confluent event streaming platform, event streams can be processed in real-time using Kafka Streams, Kafka Connect, or ksqlDB. Streams API : The Streams API is the Kafka API that allows you to build streaming applications and microservices that transform (for example, filter, group, aggregate, join) incoming event streams in real-time and write the results to Kafka topics stored in a Kafka cluster (see the sketch following these entries). The Streams API is used by stream processing clients to process data in real-time, analyze data continuously, and build data pipelines. **Related content** - [Introduction to the Kafka Streams API](/platform/current/streams/introduction.html) throttling : Throttling is the process Kafka clusters in Confluent Cloud use to protect themselves from reaching an over-utilized state. Also known as backpressure, throttling in Confluent Cloud occurs when cluster load reaches 80%. At this point, applications may start seeing higher latencies or timeouts as the cluster must begin throttling requests or connection attempts. topic : A topic is a user-defined category or feed name where event messages are stored and published by producers and subscribed to by consumers. Each topic is a log of event messages. Topics are stored in one or more partitions, which distribute topic records across the brokers in a Kafka cluster. Each partition is an ordered, immutable sequence of records that are continually appended to a topic. **Related content** - [Manage Topics in Confluent Cloud](/cloud/current/client-apps/topics/index.html) total client connections : In Confluent Cloud, total client connections are a Kafka cluster billing dimension that defines the number of TCP connections to the cluster you can open at one time. Available in the Metrics API as `active_connection_count`. Filter by principal to understand how many connections each application is creating. How many connections a cluster supports can vary widely based on several factors, including number of producer clients, number of consumer clients, partition keying strategy, produce patterns per client, and consume patterns per client. For Dedicated clusters, Confluent derives a guideline for total client connections from benchmarking that indicates exceeding this number of connections increases produce latency for test clients. However, this does not apply to all workloads. That is why total client connections are a guideline, not a hard limit for Dedicated Kafka clusters. Monitor the impact on cluster load as connection count increases, as this is the final representation of the impact of a given workload or CKU dimension on the underlying resources of the cluster. Consider the Confluent guideline a per-CKU guideline. The number of connections tends to increase when you add brokers. In other words, if you significantly exceed the per-CKU guideline, cluster expansion doesn’t always give your cluster more connection count headroom.
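To make the *stream processing* and *Streams API* entries above concrete, the following minimal sketch shows a Kafka Streams topology that reads one topic, filters records, and writes the result to another topic. The topic names (`orders`, `large-orders`), the bootstrap address, and the filter predicate are illustrative placeholders, not values taken from Confluent documentation.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class LargeOrderFilter {
    public static void main(String[] args) {
        Properties props = new Properties();
        // The application ID also names the consumer group and prefixes internal topics.
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "large-order-filter");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder endpoint
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Continuously read the unbounded "orders" stream, keep a subset of records,
        // and publish the filtered stream to "large-orders".
        KStream<String, String> orders = builder.stream("orders");
        orders.filter((key, value) -> value != null && value.length() > 100) // placeholder predicate
              .to("large-orders");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

Because the topology is declared once and then runs continuously, the same application scales out by starting more instances with the same application ID; Kafka rebalances the input topic's partitions across them.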
Transport Layer Security (TLS) : Transport Layer Security (TLS) is a cryptographic protocol that provides secure communication over a network. For more information, see [Wikipedia](https://en.wikipedia.org/wiki/Transport_Layer_Security). unbounded stream : An unbounded stream is a stream of data that is continuously generated in real-time and has no defined end. Examples of unbounded streams include stock prices, sensor data, and social media feeds. Processing unbounded streams requires a different approach than processing bounded streams. Unbounded streams are processed incrementally as data arrives, while bounded streams are processed as a batch after all data has arrived. Kafka Streams and Flink can be used to process unbounded streams. **Related terms**: *bounded stream*, *stream processing* under replication : Under replication is a situation in which the number of in-sync replicas is below the number of all replicas. Under-replicated partitions can occur when a broker is down or cannot replicate fast enough from the leader (replica fetcher lag). user account : A user account is an account representing the identity of a person who can be authenticated and granted access to Confluent Cloud resources. **Related content** - [User Accounts for Confluent Cloud](/cloud/current/access-management/identity/user-accounts/overview.html) watermark : A watermark in Flink is a marker that keeps track of time as data is processed. A watermark means that all records until the current moment in time have been “seen”. This way, Flink can correctly perform tasks that depend on when things happened, like calculating aggregations over time windows. **Related content** - [Time and Watermarks](/cloud/current/flink/concepts/timely-stream-processing.html) ## HDFS 3 Source Connector Partitions The connector comes out of the box with partitioners that support default partitioning based on Kafka partitions, field partitioning, and time-based partitioning in days or hours. You may implement your own partitioners by extending the Partitioner class. The following partitioners are available by default: * **DefaultPartitioner** : To use `DefaultPartitioner`, set `partition.class` to `io.confluent.connect.storage.partitioner.DefaultPartitioner`. This partitioner reads data from HDFS 3 (Hadoop 3) files whose paths have the form `<topics directory>/<topic>/partition=<partition>/<topic>+<partition>+<start offset>+<end offset>.<format>`. Admin API : The Admin API is the Kafka REST API that enables administrators to manage and monitor Kafka clusters, topics, brokers, and other Kafka components. Ansible Playbooks for Confluent Platform : Ansible Playbooks for Confluent Platform is a set of Ansible playbooks and roles that are designed to automate the deployment and management of Confluent Platform. Apache Flink : Apache Flink is an open source stream processing framework for stateful computations over unbounded and bounded data streams. Flink provides a unified API for batch and stream processing that supports event-time and out-of-order processing, and supports exactly-once semantics. Flink applications include real-time analytics, data pipelines, and event-driven applications.
**Related terms**: *bounded stream*, *data stream*, *stream processing*, *unbounded stream* **Related content** - [Apache Flink: Stream Processing and SQL on Confluent Cloud](/cloud/current/flink/index.html) - [What is Apache Flink?](https://www.confluent.io/learn/apache-flink/) - [Apache Flink 101 (Confluent Developer course)](https://developer.confluent.io/courses/apache-flink/intro/) Apache Kafka : Apache Kafka is an open source event streaming platform that provides unified, high-throughput, low-latency, fault-tolerant, scalable, distributed, and secure data streaming. Kafka is a publish-and-subscribe messaging system that enables distributed applications to ingest, process, and share data in real-time. **Related content** - [Introduction to Kafka](/kafka/introduction.html) audit log : An audit log is a historical record of actions and operations that are triggered when auditable events occur. Audit log records can be used to troubleshoot system issues, manage security, and monitor compliance by tracking administrative activity, data access and modification, monitoring sign-in attempts, and reconstructing security breaches and fraudulent activity. **Related terms**: *auditable event* **Related content** - [Audit Log Concepts for Confluent Cloud](/cloud/current/monitoring/audit-logging/cloud-audit-log-concepts.html) - [Audit Log Concepts for Confluent Platform](/platform/current/security/audit-logs/audit-logs-concepts.html) auditable event : An auditable event is an event that represents an action or operation that can be tracked and monitored for security purposes and compliance. When an auditable event occurs, an auditable event method is triggered and an event message is sent to the audit log cluster and stored as an audit log record. **Related terms**: *audit log*, *event message* **Related content** - [Auditable Events in Confluent Cloud](/cloud/current/monitoring/audit-logging/event-methods/index.html) - [Auditable Events in Confluent Platform](/platform/current/security/audit-logs/auditable-events.html) authentication : Authentication is the process of verifying the identity of a principal that interacts with a system or application. Authentication is often used in conjunction with authorization to determine whether a principal is allowed to access a resource and perform a specific action or operation on that resource. Digital authentication requires one or more of the following: something a principal knows (a password or security question), something a principal has (a security token or key), or something a principal is (a biometric characteristic, such as a fingerprint or voiceprint). Multi-factor authentication (MFA) requires two or more forms of authentication. **Related terms**: *authorization*, *identity*, *identity provider*, *identity pool*, *principal*, *role* authorization : Authorization is the process of evaluating and then granting or denying a principal a set of permissions required to access and perform operations on resources. **Related terms**: *authentication*, *group mapping*, *identity*, *identity provider*, *identity pool*, *principal*, *role* Avro : Avro is a data serialization and exchange framework that provides data structures, remote procedure call (RPC), a compact binary data format, and a container file, and uses JSON to represent schemas. Avro schemas ensure that every field is properly described and documented for use with serializers and deserializers.
You can either send a schema with every message or use Schema Registry to store and receive schemas for use by consumers and producers to save bandwidth and storage space. **Related terms**: *data serialization*, *deserializer*, *serializer* **Related content** - [Apache Avro - a data serialization system](https://avro.apache.org/) - Confluent Cloud: [Avro Schema Serializer and Deserializer](/cloud/current/sr/fundamentals/serdes-develop/serdes-avro.html) - Confluent Platform: [Avro Schema Serializer and Deserializer](/platform/current/schema-registry/fundamentals/serdes-develop/serdes-avro.html) Basic Kafka cluster : A Confluent Cloud cluster type. Basic Kafka clusters are designed for experimentation, early development, and basic use cases. batch processing : Batch processing is the method of collecting a large volume of data over a specific time interval, after which the data is processed all at once and loaded into a destination system. Batch processing is often used when processing data can occur independently of the source and timing of the data. It is efficient for non-real-time data processing, such as data warehousing, reporting, and analytics. **Related terms**: *bounded stream*, *stream processing*, *unbounded stream* CIDR block : A CIDR block is a group of IP addresses that are contiguous and can be represented as a single block. CIDR blocks are expressed using Classless Inter-domain Routing (CIDR) notation that includes an IP address and a number of bits in the network mask. **Related content** - [CIDR notation](https://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing#CIDR_notation) - [Classless Inter-domain Routing (CIDR): The Internet Address Assignment and Aggregation Plan [RFC 4632]](https://www.rfc-editor.org/rfc/rfc4632.html) Cluster Linking : Cluster Linking is a highly performant data replication feature that enables links between Kafka clusters to mirror data from one cluster to another. Cluster Linking creates perfect copies of Kafka topics, which keep data in sync across clusters. Use cases include geo-replication of data, data sharing, migration, disaster recovery, and tiered separation of critical applications. **Related content** - [Geo-replication with Cluster Linking on Confluent Cloud](/cloud/current/multi-cloud/cluster-linking/index.html) - [Cluster Linking for Confluent Platform](/platform/current/multi-dc-deployments/cluster-linking/index.html) commit log : A commit log is a log of all event messages about commits (changes or operations made) sent to a Kafka topic. A commit log ensures that all event messages are processed at least once and provides a mechanism for recovery in the event of a failure. The commit log is also referred to as a write-ahead log (WAL) or a transaction log. **Related terms**: *event message* Confluent Cloud : Confluent Cloud is the fully managed, cloud-native event streaming service powered by Kora, the event streaming platform based on Kafka and extended by Confluent to provide high availability, scalability, elasticity, security, and global interconnectivity. Confluent Cloud offers cost-effective multi-tenant configurations as well as dedicated solutions, if stronger isolation is required. 
**Related terms**: *Apache Kafka*, *Kora* **Related content** - [Confluent Cloud Overview](/cloud/current/index.html) - [Confluent Cloud](https://www.confluent.io/confluent-cloud/) Confluent Cloud network : A Confluent Cloud network is an abstraction for a single tenant network environment that hosts Dedicated Kafka clusters in Confluent Cloud along with their single tenant services, like ksqlDB clusters and managed connectors. **Related content** - [Confluent Cloud Network Overview](/cloud/current/networking/overview.html#ccloud-networks) Confluent for Kubernetes (CFK) : *Confluent for Kubernetes (CFK)* is a cloud-native control plane for deploying and managing Confluent in private cloud environments through a declarative API. Confluent Platform : Confluent Platform is a specialized distribution of Kafka at its core, with additional components for data integration, streaming data pipelines, and stream processing. Confluent REST Proxy : Confluent REST Proxy provides a RESTful interface to a Kafka cluster, making it easy to produce and consume messages, view the state of the cluster, and perform administrative actions without using the native Kafka protocol or clients. **Related content** - Confluent Platform: [REST Proxy](/platform/current/kafka-rest/index.html) Confluent Server : Confluent Server is the default Kafka broker component of Confluent Platform that builds on the foundation of Apache Kafka® and provides enhanced proprietary features designed for enterprise use. Confluent Server is fully compatible with Kafka, and adds Kafka cluster support for Role-Based Access Control, Audit Logs, Schema Validation, Self Balancing Clusters, Tiered Storage, Multi-Region Clusters, and Cluster Linking. **Related terms**: *Confluent Platform*, *Apache Kafka*, *Kafka broker*, *Cluster Linking*, *multi-region cluster (MRC)* Confluent Unit for Kafka (CKU) : Confluent Unit for Kafka (CKU) is a unit of horizontal scaling for Dedicated Kafka clusters in Confluent Cloud that provides preallocated resources. CKUs determine the capacity of a Dedicated Kafka cluster in Confluent Cloud. **Related content** - [CKU limits per cluster](/cloud/current/clusters/cluster-types.html#cku-limits-per-cluster) Connect API : The Connect API is the Kafka API that enables a connector to read event streams from a source system and write to a target system. Connect worker : A Connect worker is a server process that runs a connector and performs the actual work of moving data in and out of Kafka topics. A worker is a server process that runs on hardware independent of the Kafka brokers themselves. It is scalable and fault-tolerant, meaning you can run a cluster of workers that share the load of moving data in and out of Kafka from and to external systems. **Related terms**: *connector*, *Kafka Connect* connection attempts : In Confluent Cloud, connection attempts are a Kafka cluster billing dimension that defines the maximum number of new TCP connections to the cluster you can create in one second. This includes successful and unsuccessful authentication attempts. Available in the Metrics API as `successful_authentication_count` (only includes successful authentications, not unsuccessful authentication attempts). To reduce usage on connection attempts, use longer-lived connections to the cluster. If you exceed the maximum, connection attempts may be refused.
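As a minimal illustration of the advice in the *connection attempts* entry above (use longer-lived connections), the sketch below creates a single `KafkaProducer` and reuses it for every send, rather than opening a new producer, and therefore new TCP connections and authentication handshakes, per message. The endpoint, class name, and topic are placeholders, not part of any Confluent example.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public final class SharedProducer {
    // One long-lived producer for the whole application.
    private static final KafkaProducer<String, String> PRODUCER = create();

    private static KafkaProducer<String, String> create() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder endpoint
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        return new KafkaProducer<>(props);
    }

    public static void send(String topic, String key, String value) {
        // Reusing the producer keeps existing TCP connections open instead of
        // re-authenticating and reconnecting on every send.
        PRODUCER.send(new ProducerRecord<>(topic, key, value));
    }

    public static void shutdown() {
        PRODUCER.close(); // flush and release connections once, at exit
    }
}
```

`KafkaProducer` is thread safe, so sharing one instance across application threads is the usual pattern.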
connector : A connector is an abstract mechanism that enables communication, coordination, or cooperation among components by transferring data elements from one interface to another without changing the data. connector offset : Connector offset uniquely identifies the position of a connector as it processes data. Connectors use a variety of strategies to implement the connector offset, including everything from monotonically increasing integers to replay IDs, lists of files, timestamps, and even checkpoint information. Connector offsets keep track of already-processed data in the event of a connector restart or recovery. While sink connectors use a pattern for connector offsets similar to the offset mechanism used throughout Kafka, the implementation details for source connectors are often much different. This is because source connectors track the progress of a source system as it processes data. consumer : A consumer is a Kafka client application that subscribes to (reads and processes) event messages from a Kafka topic. The Streams API and the Consumer API are the two APIs that enable consumers to read event streams from Kafka topics. **Related terms**: *Consumer API*, *consumer group*, *producer*, *Streams API* Consumer API : The Consumer API is the Kafka API used for consuming (reading) event messages or records from Kafka topics and enables a Kafka consumer to subscribe to a topic and read event messages as they arrive. Batch processing is a common use case for the Consumer API. consumer group : A consumer group is a single logical consumer implemented with multiple physical consumers for reasons of throughput and resilience. By dividing the partitions of a topic among the consumers in the group, consumers in the group can process messages in parallel, increasing message throughput and enabling load balancing. **Related terms**: *consumer*, *partition*, *producer*, *topic* consumer lag : Consumer lag is the number of consumer offsets between the latest message produced in a partition and the last message consumed by a consumer, that is, the number of messages pending consumption from a particular partition. A large consumer lag, or a quickly growing lag, indicates that the consumer is unable to read from a partition as fast as the messages are available. This can be caused by a slow consumer, slow network, or slow broker. consumer offset : A consumer offset is a monotonically increasing integer value that uniquely identifies the position of an event record in a partition. Consumers use offsets to track their current position in the Kafka topic, allowing consumers to resume processing from where they left off. Offsets are stored on the Kafka broker, which does not track which records have been read and which have not. It is up to the consumer to track this information. When a consumer acknowledges receiving and processing a message, it commits an offset value that is stored in the special internal topic `__consumer_offsets`. cross-resource RBAC role binding : A cross-resource RBAC role binding is a role binding in Confluent Cloud that is applied at the Organization or Environment scope and grants access to multiple resources. For example, assigning a principal the NetworkAdmin role at the Organization scope lets them administer all networks across all Environments in their Organization.
**Related terms**: *identity pool*, *principal*, *role*, *role binding* CRUD : CRUD is an acronym for the four basic operations that can be performed on data: Create, Read, Update, and Delete. custom connector : A custom connector is a connector created using Connect plugins uploaded to Confluent Cloud by users. This includes connector plugins that are built from scratch, modified open-source connector plugins, or third-party connector plugins. data at rest : Data at rest is data that is physically stored on non-volatile media (such as hard drives, solid-state drives, or other storage devices) and is not actively being transmitted or processed by a system. data contract : A data contract is a formal agreement between an upstream component and a downstream component on the structure and semantics of data that is in motion. A schema is a key element of a data contract. The schema, metadata, rules, policies, and evolution plan form the data contract. You can associate data contracts (schemas and more) with [topics](#term-Kafka-topic). **Related content** - Confluent Platform: [Data Contracts for Schema Registry on Confluent Platform](/platform/current/schema-registry/fundamentals/data-contracts.html) - Confluent Cloud: [Data Contracts for Schema Registry on Confluent Cloud](/cloud/current/sr/fundamentals/data-contracts.html) - Cloud Console: [Manage Schemas in Confluent Cloud](/cloud/current/sr/schemas-manage.html) data encryption key (DEK) : A data encryption key (DEK) is a symmetric key that is used to encrypt and decrypt data. The DEK is used in client-side field level encryption (CSFLE) to encrypt sensitive data. The DEK is itself encrypted using a key encryption key (KEK) that is only accessible to authorized users. The encrypted DEK and encrypted data are stored together. Only users with access to the KEK can decrypt the DEK and access the sensitive data. A generic code sketch of this DEK and KEK pattern follows the *data pipeline* entry below. **Related terms**: *envelope encryption*, *key encryption key (KEK)* data in motion : Data in motion is data that is actively being transferred between source and destination, typically systems, devices, or networks. Data in motion is also referred to as data in transit or data in flight. data in use : Data in use is data that is actively being processed or manipulated in memory (RAM, CPU caches, or CPU registers). data ingestion : Data ingestion is the process of collecting, importing, and integrating data from various sources into a system for further processing, analysis, or storage. data mapping : Data mapping is the process of defining relationships or associations between source data elements and target data elements. Data mapping is an important process in data integration, data migration, and data transformation, ensuring that data is accurately and consistently represented when it is moved or combined. data pipeline : A data pipeline is a series of processes and systems that enable the flow of data from sources to destinations, automating the movement and transformation of data for various purposes, such as analytics, reporting, or machine learning. A data pipeline is typically composed of a source system, a data ingestion tool, a data transformation tool, and a target system. A data pipeline covers the following stages: data extraction, data transformation, data loading, and data validation.
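The *data encryption key (DEK)* and *key encryption key (KEK)* entries above describe envelope encryption in general terms. The sketch below illustrates only that generic DEK/KEK relationship using standard JDK cryptography; it is not Confluent's CSFLE implementation, and in practice the KEK would be held in an external KMS rather than generated in process.

```java
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class EnvelopeEncryptionSketch {
    public static void main(String[] args) throws Exception {
        SecureRandom random = new SecureRandom();

        // Key encryption key (KEK): normally managed by a KMS, never stored with the data.
        SecretKey kek = newAesKey();
        // Data encryption key (DEK): generated per record or per batch.
        SecretKey dek = newAesKey();

        // 1. Encrypt the sensitive payload with the DEK.
        byte[] payload = "card=4111-1111".getBytes(StandardCharsets.UTF_8); // placeholder data
        byte[] payloadIv = randomIv(random);
        byte[] encryptedPayload = aesGcm(Cipher.ENCRYPT_MODE, dek, payloadIv, payload);

        // 2. Encrypt ("wrap") the DEK with the KEK; store it alongside the ciphertext.
        byte[] dekIv = randomIv(random);
        byte[] encryptedDek = aesGcm(Cipher.ENCRYPT_MODE, kek, dekIv, dek.getEncoded());

        // 3. Decryption: only a holder of the KEK can unwrap the DEK and read the data.
        SecretKey unwrappedDek =
                new SecretKeySpec(aesGcm(Cipher.DECRYPT_MODE, kek, dekIv, encryptedDek), "AES");
        byte[] decrypted = aesGcm(Cipher.DECRYPT_MODE, unwrappedDek, payloadIv, encryptedPayload);
        System.out.println(new String(decrypted, StandardCharsets.UTF_8));
    }

    private static SecretKey newAesKey() throws Exception {
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(256);
        return gen.generateKey();
    }

    private static byte[] randomIv(SecureRandom random) {
        byte[] iv = new byte[12]; // 96-bit IV, the usual size for AES-GCM
        random.nextBytes(iv);
        return iv;
    }

    private static byte[] aesGcm(int mode, SecretKey key, byte[] iv, byte[] input) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(mode, key, new GCMParameterSpec(128, iv));
        return cipher.doFinal(input);
    }
}
```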
Data Portal : Data Portal is a Confluent Cloud application that uses Stream Catalog and Stream Lineage to provide self-service access throughout Confluent Cloud Console for data practitioners to search and discover existing topics using tags and business metadata, request access to topics and data, and access data in topics to build streaming applications and data pipelines. The result is a data-centric view of Confluent optimized for self-service access, where users can search for, discover, and understand available data, request access to it, and use it. **Related terms**: *Stream Catalog*, *Stream Lineage* **Related content** - [Data Portal on Confluent Cloud](/cloud/current/stream-governance/data-portal.html) data serialization : Data serialization is the process of converting data structures or objects into a format that can be stored or transmitted, and reconstructed later in the same or another computer environment. Data serialization is a common technique for implementing data persistence, interprocess communication, and object communication. Confluent Schema Registry (in Confluent Platform) and Confluent Cloud Schema Registry support data serialization using serializers and deserializers for the following formats: Avro, JSON Schema, and Protobuf. **Related content** - Confluent Cloud: [Formats, Serializers, and Deserializers](/cloud/current/sr/fundamentals/serdes-develop/index.html) - Confluent Platform: [Formats, Serializers, and Deserializers](/platform/current/schema-registry/fundamentals/serdes-develop/index.html) data steward : A data steward is a person with data-related responsibilities, such as data governance, data quality, and data security. data stream : A data stream is a continuous flow of data records that are produced and consumed by applications. dead letter queue (DLQ) : A dead letter queue (DLQ) is a queue where messages that could not be processed successfully by a sink connector are placed. Instead of stopping, the sink connector sends messages that could not be written successfully as event records to the DLQ topic while the sink connector continues processing messages. Dedicated Kafka cluster : A Confluent Cloud cluster type. Dedicated Kafka clusters are designed for critical production workloads with high traffic or private networking requirements. deserializer : A deserializer is a tool that converts a serial byte stream back into objects and parallel data. Deserializers work with serializers (known together as Serdes) to support efficient storage and high-speed data transmission over the wire. Confluent provides Serdes for schemas in Avro, Protobuf, and JSON Schema formats. **Related content** - Confluent Cloud: [Formats, Serializers, and Deserializers](/cloud/current/sr/fundamentals/serdes-develop/index.html) - Confluent Platform: [Formats, Serializers, and Deserializers](/platform/current/schema-registry/fundamentals/serdes-develop/index.html) egress : In general networking, egress refers to outbound traffic leaving a network or a specific network segment. In Confluent Cloud, egress is a Kafka cluster billing dimension that defines the number of bytes consumed from the cluster in one second. Available in the Metrics API as `sent_bytes` (convert from bytes to MB). To reduce egress in Confluent Cloud, compress your messages and ensure each consumer is only consuming from the topics it requires. For compression, use lz4. Avoid gzip because of high overhead on the cluster.
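The *egress* entry above (and the *ingress* entry later in this glossary) recommends lz4 compression to reduce bytes transferred. Compression is a producer-side setting; the broker address, topic, and record below are placeholders used only to show where the configuration goes.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class Lz4ProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder endpoint
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Batches are compressed with lz4 before they leave the client,
        // reducing bytes produced to (and later consumed from) the cluster.
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");
        // Larger batches generally compress better; linger.ms trades a little latency for batching.
        props.put(ProducerConfig.LINGER_MS_CONFIG, 20);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("metrics", "host-1", "{\"cpu\":0.42}")); // placeholder record
        }
    }
}
```

Consumers decompress transparently, so no consumer-side change is needed.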
Elastic Confluent Unit for Kafka (eCKU) : Elastic Confluent Unit for Kafka (eCKU) is used to express capacity for Basic, Standard, Enterprise, and Freight Kafka clusters. These clusters automatically scale up to a fixed ceiling. There is no need to resize these types of clusters. When you need more capacity, your cluster expands up to the fixed ceiling. If you’re not using capacity above the minimum, you’re not paying for it. ELT : ELT is an acronym for Extract-Load-Transform, where data is extracted from a source system and loaded into a target system before processing or transformation. Compared to ETL, ELT is a more flexible approach to data ingestion because the data is loaded into the target system before transformation. Enterprise Kafka cluster : A Confluent Cloud cluster type. Enterprise Kafka clusters are designed for production-ready functionality that requires private endpoint networking capabilities. envelope encryption : Envelope encryption is a cryptographic technique that uses two keys to encrypt data. The symmetric data encryption key (DEK) is used to encrypt sensitive data. The separate asymmetric key encryption key (KEK) is the master key used to encrypt the DEK. The DEK and encrypted data are stored together. Only users with access to the KEK can decrypt the DEK and access the sensitive data. In Confluent Cloud, envelope encryption is used to enable client-side field level encryption (CSFLE). CSFLE encrypts sensitive data in a message before it is sent to Confluent Cloud and allows for temporary decryption of sensitive data when required to perform operations on the data. **Related terms**: *data encryption key (DEK)*, *key encryption key (KEK)* ETL : ETL is an acronym for Extract-Transform-Load, where data is extracted from a source system, transformed into a target format, and loaded into a target system. Compared to ELT, ETL is a more rigid approach to data ingestion because the data is transformed before loading into the target system. event : An event is a meaningful action or occurrence of something that happened. Events that can be recognized by a program, either human-generated or triggered by software, can be recorded in a log file or other data store. **Related terms**: *event message*, *event record*, *event sink*, *event source*, *event stream*, *event streaming*, *event streaming platform*, *event time* event message : An event message is a record of an event sent to a Kafka topic, represented as a key-value pair. Each event message consists of a key-value pair, a timestamp, the compression type, headers for metadata (optional), and a partition and offset ID (once the message is written). The key is optional and can be used to identify the event. The value is required and contains details about the event that happened. **Related terms**: *event*, *event record*, *event sink*, *event source*, *event stream*, *event streaming*, *event streaming platform*, *event time* event record : An event record is the record of an event stored in a Kafka topic. Event records are organized and durably stored in topics. Examples of events include orders, payments, activities, or measurements. An event typically contains one or more data fields that describe the fact, as well as a timestamp that denotes when the event was created by its event source. The event may also contain various metadata, such as its source of origin (for example, the application or cloud service that created the event) and storage-level information (for example, its position in the event stream).
**Related terms**: *event*, *event message*, *event sink*, *event source*, *event stream*, *event streaming*, *event streaming platform*, *event time* event sink : An event sink is a consumer of events, which can include applications, cloud services, databases, IoT sensors, and more. **Related terms**: *event*, *event message*, *event record*, *event source*, *event stream*, *event streaming*, *event streaming platform*, *event time* event source : An event source is a producer of events, which can include cloud services, databases, IoT sensors, mainframes, and more. **Related terms**: *event*, *event message*, *event record*, *event sink*, *event stream*, *event streaming*, *event streaming platform*, *event time* event stream : An event stream is a continuous flow of event messages produced by an event source and consumed by one or more consumers. **Related terms**: *event*, *event message*, *event record*, *event sink*, *event source*, *event streaming*, *event streaming platform*, *event time* event streaming : Event streaming is the practice of capturing event data in real-time from data sources. Event streaming is a form of data streaming that is used to capture, store, process, and react to data in real-time or retrospectively. **Related terms**: *event*, *event message*, *event record*, *event sink*, *event source*, *event streaming platform*, *event time* event streaming platform : An event streaming platform is a platform to which events can be written once, allowing distributed functions within an organization to react in real-time. **Related terms**: *event*, *event message*, *event record*, *event sink*, *event source*, *event streaming*, *event time* event time : Event time is the time when an event occurred on the producing device, as opposed to the time when the event was processed or recorded. Event time is often used in stream processing to determine the order of events and to perform windowing operations. **Related terms**: *event*, *event message*, *event record*, *event sink*, *event source*, *event streaming*, *event streaming platform* exactly-once semantics : Exactly-once semantics is a guarantee that a message is delivered exactly once and in the order that it was sent. Even if a producer retries sending a message, or a consumer retries processing a message, the message is delivered exactly once. This guarantee is achieved by assigning a unique ID and sequence number to each message so that the broker can discard duplicates from producer retries, and by committing the consumer offset to the broker only after the message is processed. If the consumer fails to process the message, the message is redelivered and processed again. (A producer-side configuration sketch follows the *granularity* entry below.) Freight Kafka cluster : A Confluent Cloud cluster type. Freight Kafka clusters are designed for high-throughput, relaxed latency workloads that are less expensive than self-managed open source Kafka. granularity : Granularity is the degree or level of detail to which an entity (a system, service, or resource) is broken down into subcomponents, parts, or elements. Entities that are *fine-grained* have a higher level of detail, while *coarse-grained* entities have a reduced level of detail, often combining finer parts into a larger whole. In the context of access control, granular permissions provide precise control over resource access. They allow administrators to grant specific operations on distinct resources. This ensures users only have permissions tailored to their needs, minimizing unnecessary or potentially risky access.
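As a companion to the *exactly-once semantics* entry above, the sketch below shows one common way to request that guarantee from the Java producer: enable idempotence so broker-side retries are deduplicated, and wrap related sends in a transaction so they commit or abort atomically. The bootstrap address, topic, and `transactional.id` are placeholders; downstream consumers would also need `isolation.level=read_committed`, which is not shown.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class TransactionalProducerSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder endpoint
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Idempotence lets the broker deduplicate retried batches from this producer.
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
        // A transactional ID lets a set of writes commit or abort atomically.
        props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "payments-writer-1"); // placeholder ID

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            try {
                producer.beginTransaction();
                producer.send(new ProducerRecord<>("payments", "order-42", "captured")); // placeholder record
                producer.commitTransaction();
            } catch (Exception e) {
                producer.abortTransaction(); // none of the writes become visible
                throw e;
            }
        }
    }
}
```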
group mapping : Group mapping is a set of rules that map groups in your SSO identity provider to Confluent Cloud RBAC roles. When a user signs in to Confluent Cloud using SSO, Confluent Cloud uses the group mapping to grant access to Confluent Cloud resources. **Related terms**: *identity provider*, *identity pool*, *principal*, *role* **Related content** - [Group Mapping for Confluent Cloud](/cloud/current/access-management/authenticate/sso/group-mapping/overview.html) identity : An identity is a unique identifier that is used to authenticate and authorize users and applications to access resources. Identity is often used in conjunction with access control to determine whether a user or application is allowed to access a resource and perform a specific action or operation on that resource. **Related terms**: *identity provider*, *identity pool*, *principal*, *role* identity pool : An identity pool is a collection of identities that can be used to authenticate and authorize users and applications to access resources. Identity pools are used to manage permissions for users and applications that access resources in Confluent Cloud. They are also used to manage permissions for Confluent Cloud service accounts that are used to access resources in Confluent Cloud. identity provider : An identity provider is a trusted provider that authenticates users and issues security tokens that are used to verify the identity of a user. Identity providers are often used in single sign-on (SSO) scenarios, where a user can log in to multiple applications or services with a single set of credentials. Infinite Storage : Infinite Storage is the Confluent Cloud storage service that enhances the scalability of Confluent Cloud resources by separating storage and processing. Tiered storage within Confluent Cloud moves data between storage layers based on the needs of the workload, retrieves tiered data when requested, and garbage collects data that is past retention or otherwise deleted. If an application reads historical data, latency is not increased for other applications reading more recent data. Storage resources are decoupled from compute resources, you only pay for what you produce to Confluent Cloud and for storage that you use, and CKUs do not have storage limits. **Related content** - [Infinite Storage in Confluent Cloud for Apache Kafka](https://www.confluent.io/blog/infinite-kafka-data-storage-in-confluent-cloud/) ingress : In general networking, ingress refers to traffic that enters a network from an external source. In Confluent Cloud, ingress is a Kafka cluster billing dimension that defines the number of bytes produced to the cluster in one second. Available in the Metrics API as `received_bytes` (convert from bytes to MB). To reduce ingress in Confluent Cloud, compress your messages. For compression, use lz4. Avoid gzip because of high overhead on the cluster. internal topic : An internal topic is a topic, prefixed with double underscores (`__`), that is automatically created by a Kafka component to store metadata about the broker, partition assignment, consumer offsets, and other information. Examples of internal topics: `__cluster_metadata`, `__consumer_offsets`, `__transaction_state`, `__confluent.support.metrics`, and `__confluent.support.metrics-raw`. JSON Schema : JSON Schema is a declarative language used for data serialization and exchange to define data structures, specify formats, and validate JSON documents.
It is a way to encode expected data types, properties, and constraints to ensure that all fields are properly described for use with serializers and deserializers. **Related terms**: *data serialization*, *deserializer*, *serializer* **Related content** - [JSON Schema - a declarative language that allows you to annotate and validate JSON documents.](https://json-schema.org/) - Confluent Cloud: [JSON Schema Serializer and Deserializer](/cloud/current/sr/fundamentals/serdes-develop/serdes-json.html) - Confluent Platform: [JSON Schema Serializer and Deserializer](/platform/current/schema-registry/fundamentals/serdes-develop/serdes-json.html) Kafka bootstrap server : A Kafka bootstrap server is a Kafka broker that a Kafka client initially connects to in order to reach a Kafka cluster, and that returns metadata that includes the addresses of all of the brokers in the Kafka cluster. Although only one bootstrap server is required to connect to a Kafka cluster, multiple brokers can be specified in a bootstrap server list to provide high availability and fault tolerance in case a broker is unavailable. In Confluent Cloud, the bootstrap server is the general cluster endpoint. Kafka broker : A Kafka broker is a server in the Kafka storage layer that stores event streams from one or more sources. A Kafka cluster is typically comprised of several brokers. Every broker in a cluster is also a bootstrap server, meaning if you can connect to one broker in a cluster, you can connect to every broker. Kafka client : A Kafka client allows you to write distributed applications and microservices that read, write, and process streams of events in parallel, at scale, and in a fault-tolerant manner, even in the case of network problems or machine failures. The Kafka client library provides functions, classes, and utilities that allow developers to create Kafka producer clients (Producers) and consumer clients (Consumers) using various programming languages. The primary way to build production-ready Producers and Consumers is by using your preferred programming language and a Kafka client library. **Related content** - [Build Client Applications for Confluent Cloud](/cloud/current/client-apps/overview.html) - [Build Client Applications for Confluent Platform](/platform/current/clients/index.html) - [Getting Started with Apache Kafka and Java (or Python, Go, .Net, and others)](https://developer.confluent.io/get-started/java/) Kafka cluster : A Kafka cluster is a group of interconnected Kafka brokers that manage and distribute real-time data streaming, processing, and storage as if they are a single system. By distributing tasks and services across multiple Kafka brokers, the Kafka cluster improves availability, reliability, and performance. Kafka Connect : Kafka Connect is the component of Kafka that provides data integration between databases, key-value stores, search indexes, file systems, and Kafka brokers. Kafka Connect is an ecosystem of a client application and pluggable connectors. As a client application, Connect is a server process that runs on hardware independent of the Kafka brokers themselves. It is scalable and fault-tolerant, meaning you can run a cluster of Connect workers that share the load of moving data in and out of Kafka from and to external systems. Connect also abstracts the coding work away from the user and instead requires only JSON configuration to run.
**Related content** - Confluent Cloud: [Kafka Connect](/cloud/current/billing/overview.html#kconnect-long) - Confluent Platform: [Kafka Connect](/platform/current/connect/index.html) Kafka controller : A Kafka controller is the node in a Kafka cluster that is responsible for managing and changing the metadata of the cluster. This node also communicates metadata changes to the rest of the cluster. When Kafka uses ZooKeeper for metadata management, the controller is a broker, and the broker persists the metadata to ZooKeeper for backup and recovery. With KRaft, you dedicate Kafka nodes to operate as controllers and the metadata is stored in Kafka itself and not persisted to ZooKeeper. KRaft enables faster recovery because of this. For more information, see [KRaft overview](/platform/current/kafka-metadata/kraft.html). Kafka listener : A Kafka listener is an endpoint that Kafka brokers bind to and use to communicate with clients. For Kafka clusters, Kafka listeners are configured in the `listeners` property of the `server.properties` file. Advertised listeners are publicly accessible endpoints that are used by clients to connect to the Kafka cluster. **Related content** - [Kafka Listeners – Explained](https://www.confluent.io/blog/kafka-listeners-explained/) Kafka metadata : Kafka metadata is the information about the Kafka cluster and the topics that are stored in it. This information includes details such as the brokers in the cluster, the topics that are available, the partitions for each topic, and the location of the leader for each partition. Kafka metadata is used by clients to discover the available brokers and topics, and to determine which broker is the leader for a particular partition. This information is essential for clients to be able to send and receive messages to and from Kafka. Kafka Streams : Kafka Streams is a stream processing library for building streaming applications and microservices that transform (filter, group, aggregate, join, and more) incoming event streams in real-time and write the results to Kafka topics stored in a Kafka cluster. The Streams API can be used to build applications that process data in real-time, analyze data continuously, and build data pipelines. **Related content** - [Kafka Streams](/platform/current/streams/overview.html) Kafka topic : See *topic*. key encryption key (KEK) : A key encryption key (KEK) is a master key that is used to encrypt and decrypt other keys, specifically the data encryption key (DEK). Only users with access to the KEK can decrypt the DEK and access the sensitive data. **Related terms**: *data encryption key (DEK)*, *envelope encryption*. Kora : Kora is the cloud-native streaming data service based on Kafka technology that powers the Confluent Cloud event streaming platform for building real-time data pipelines and streaming applications. Kora abstracts low-level resources, such as Kafka brokers, and hides operational complexities, such as system upgrades. Kora is built on the following foundations: a tiered storage layer that improves cost and performance, elasticity and consistent performance through incremental load balancing, cost effective multi-tenancy with dynamic quota management and cell-based isolation, continuous monitoring of both system health and data integrity, and clean abstraction with standard Kafka protocols and CKUs to hide underlying resources.
**Related terms**: *Apache Kafka*, *Confluent Cloud*, *Confluent Unit for Kafka (CKU)* **Related content** - [Kora: The Cloud Native Engine for Apache Kafka](https://www.confluent.io/blog/cloud-native-data-streaming-kafka-engine/) - [Kora: A Cloud-Native Event Streaming Platform For Kafka](https://www.vldb.org/pvldb/vol16/p3822-povzner.pdf) KRaft : KRaft (or Apache Kafka Raft) is a consensus protocol introduced in Apache Kafka 2.8 to provide metadata management for Kafka with the goal of replacing ZooKeeper. KRaft simplifies Kafka because it enables the management of metadata in Kafka itself, rather than splitting it between ZooKeeper and Kafka. As of Confluent Platform 7.5, KRaft is the default method of metadata management in new deployments. For more information, see [KRaft overview](/platform/current/kafka-metadata/kraft.html). ksqlDB : ksqlDB is a streaming SQL database engine purpose-built for creating stream processing applications on top of Kafka. logical Kafka cluster (LKC) : A logical Kafka cluster (LKC) is a subset of a physical Kafka cluster (PKC) that is isolated from other logical clusters within Confluent Cloud. Each logical unit of isolation is considered a tenant and maps to a specific organization. If the mapping is one-to-one, one LKC maps to one PKC (a Dedicated cluster). If the mapping is many-to-one, one LKC maps to one of the multitenant Kafka cluster types (Basic, Standard, Enterprise, and Freight). **Related terms**: *Confluent Cloud*, *Kafka cluster*, *physical Kafka cluster (PKC)* **Related content** - [Kafka Cluster Types in Confluent Cloud](/cloud/current/clusters/cluster-types.html) - [Multi-tenancy and Client Quotas on Confluent Cloud](/cloud/current/clusters/client-quotas.html) multi-region cluster (MRC) : A multi-region cluster (MRC) is a single Kafka cluster that replicates data between datacenters across regional availability zones. multi-tenancy : Multi-tenancy is a software architecture in which a single physical instance is shared among multiple logical instances, or tenants. In Confluent Cloud, each Basic, Standard, Enterprise, and Freight cluster is a logical Kafka cluster (LKC) that shares a physical Kafka cluster (PKC) with other tenants. Each LKC is isolated from other LKCs and has its own resources, such as memory, compute, and storage. **Related terms**: *Confluent Cloud*, *logical Kafka cluster (LKC)*, *physical Kafka cluster (PKC)* **Related content** - [From On-Prem to Cloud-Native: Multi-Tenancy in Confluent Cloud](https://www.confluent.io/blog/cloud-native-multi-tenant-kafka-with-confluent-cloud/) - [Multi-tenancy and Client Quotas on Confluent Cloud](/cloud/current/clusters/client-quotas.html) offset : An offset is an integer assigned to each message that uniquely represents its position within the data stream, guaranteeing the ordering of records and allowing offset-based connections to replay messages from any point in time. **Related terms**: *consumer offset*, *connector offset*, *offset commit*, *replayability* offset commit : An offset commit is the process of keeping track of the current position of an offset-based connection (primarily Kafka consumers and connectors) within the data stream. The offset commit process is not specific to consumers, producers, or connectors. It is a general mechanism in Kafka to track the position of any application that is reading data. When a consumer commits an offset, the offset identifies the next message the consumer should consume.
For example, if a consumer has an offset of 5, it has consumed messages 0 through 4 and will next consume message 5. If the consumer crashes or is shut down, its partitions are reassigned to another consumer, which resumes consuming from the last committed offset of each partition. The committed offset for consumers is stored on a Kafka broker. When a consumer commits an offset, it sends a commit request to the Kafka cluster, specifying the partition and offset it wants to commit for a particular consumer group. The Kafka broker receiving the commit request then stores this offset in the `__consumer_offsets` internal topic. **Related terms**: *consumer offset*, *offset* OpenSSL : OpenSSL is an open-source software library and toolkit that implements the Secure Sockets Layer (SSL) and Transport Layer Security (TLS) protocols. For more information, see [Wikipedia](https://en.wikipedia.org/wiki/OpenSSL). parent cluster : The Kafka cluster that a resource belongs to. **Related terms**: *Kafka cluster* partition : A partition is a unit of data storage that divides a topic into multiple, parallel event streams, each of which can be stored on a separate Kafka broker and consumed independently. Partitioning is a key concept in Kafka because it allows Kafka to scale horizontally by adding more brokers to the cluster. Partitions are also the unit of parallelism in Kafka. A topic can have one or more partitions, and each partition is an ordered, immutable sequence of event records that is continually appended to a partition log. partitions (pre-replication) : In Confluent Cloud, partitions are a Kafka cluster billing dimension that defines the maximum number of partitions that can exist on the cluster at one time, before replication. While you are not charged for partitions on any type of Kafka cluster, the number of partitions you use has an impact on eCKU. To determine eCKU limits for partitions, Confluent Cloud counts only pre-replication (leader) partitions across a cluster. All topics that you create (as well as internal topics that are automatically created by Confluent Platform components such as ksqlDB, Kafka Streams, Connect, and Control Center (Legacy)) count towards the cluster partition limit. Confluent prefixes topics created automatically with an underscore (_). Topics that are internal to Kafka itself (such as consumer offsets) are not visible in Cloud Console and do not count against partition limits or toward partition billing. Available in the Metrics API as `partition_count`. In Confluent Cloud, attempts to create additional partitions beyond the cluster limit fail with an error message. To reduce usage on partitions (pre-replication), delete unused topics and create new topics with fewer partitions. Use the Kafka Admin interface to increase the partition count of an existing topic if the initial partition count is too low. physical Kafka cluster (PKC) : A physical Kafka cluster (PKC) is a Kafka cluster composed of multiple brokers. Each physical Kafka cluster is created on a Kubernetes cluster by the control plane. A PKC is not directly accessible by clients. principal : A principal is an entity that can be authenticated and granted permissions based on roles to access resources and perform operations. An entity can be a user account, service account, group mapping, or identity pool.
**Related terms**: *group mapping*, *identity*, *identity pool*, *role*, *service account*, *user account* private internet : A private internet is a closed, restricted computer network typically used by organizations to provide secure environments for managing sensitive data and resources. processing time : Processing time is the time when an event is processed or recorded by a system, as opposed to the time when the event occurred on the producing device. Processing time is often used in stream processing to determine the order of events and to perform windowing operations. producer : A producer is a client application that publishes (writes) data to a topic in a Kafka cluster. Producers write data to a topic and are the only clients that can write data to a topic. Each record written to a topic is appended to the partition of the topic that is selected by the producer. Producer API : The Producer API is the Kafka API that allows you to write data to a topic in a Kafka cluster. The Producer API is used by producer clients to publish data to a topic in a Kafka cluster. Protobuf : Protobuf (or Protocol Buffers) is an open-source data format used to serialize structured data for storage. **Related terms**: *data serialization*, *deserializer*, *serializer* **Related content** - [Protocol Buffers](https://protobuf.dev/) - [Getting Started with Protobuf in Confluent Cloud](https://www.confluent.io/blog/using-protobuf-in-confluent-cloud/) - Confluent Cloud: [Protobuf Serializer and Deserializer](/cloud/current/sr/fundamentals/serdes-develop/serdes-protobuf.html) - Confluent Platform: [Protobuf Serializer and Deserializer](/platform/current/schema-registry/fundamentals/serdes-develop/serdes-protobuf.html) public internet : The public internet is the global system of interconnected computers and networks that use TCP/IP to communicate with each other. rebalancing : Rebalancing is the process of redistributing the partitions of a topic among the consumers of a consumer group for improved performance and scalability. A rebalance can occur when a consumer fails its heartbeat and is excluded from the group, voluntarily leaves the group, has its metadata updated, or joins the group. replayability : Replayability is the ability to replay messages from any point in time. **Related terms**: *consumer offset*, *offset*, *offset commit* replication : Replication is the process of creating and maintaining multiple copies (or *replicas*) of data across different nodes in a distributed system to increase availability, reliability, redundancy, and accessibility. replication factor : A replication factor is the number of copies of a partition that are distributed across the brokers in a cluster. requests : In Confluent Cloud, requests are a Kafka cluster billing dimension that defines the number of client requests to the cluster in one second. Available in the Metrics API as `request_count`. To reduce usage on requests, you can adjust producer batching configurations, consumer client batching configurations, and shut down otherwise inactive clients. For Dedicated clusters, a high number of requests per second results in increased load on the cluster. role : A role is a Confluent-defined job function that is assigned a set of permissions required to perform specific actions or operations on Confluent resources, and is bound to a principal and one or more Confluent resources. A role can be assigned to a user account, group mapping, service account, or identity pool.
**Related terms**: *group mapping*, *identity*, *identity pool*, *principal*, *service account* **Related content** - [Predefined RBAC Roles in Confluent Cloud](/cloud/current/access-management/access-control/rbac/predefined-rbac-roles.html) - [Role-Based Access Control Predefined Roles in Confluent Platform](/platform/current/security/rbac/rbac-predefined-roles.html) rolling restart : A rolling restart restarts the brokers in a Kafka cluster with zero downtime by restarting one broker at a time, verifying that there are no under-replicated partitions on the broker before proceeding to the next broker. Restarting the brokers one at a time allows for software upgrades, broker configuration updates, or cluster maintenance while maintaining high availability by avoiding downtime. **Related content** - [Rolling restart](/platform/current/kafka/post-deployment.html#rolling-restart) schema : A schema is the structured definition or blueprint used to describe the format and structure of event messages sent through the Kafka event streaming platform. Schemas are used to validate the structure of data in event messages and ensure that producers and consumers are sending and receiving data in the same format. Schemas are defined in the Schema Registry. Schema Registry : Schema Registry is a centralized repository for managing and validating the schemas used for topic message data. Schema Registry is built into Confluent Cloud as a managed service, available with the Advanced Stream Governance package, and offered as part of Confluent Enterprise for self-managed deployments. Schema Registry is a RESTful service that stores and manages schemas for Kafka topics, and is integrated with Kafka and Connect to provide a central location for managing schemas and validating data. Producers and consumers of Kafka topics use schemas to ensure data consistency and compatibility as schemas evolve. Schema Registry is a key component of Stream Governance. **Related content** - Confluent Cloud: [Manage Schemas in Confluent Cloud](/cloud/current/sr/schemas-manage.html) - Confluent Platform: [Schema Registry Overview](/platform/current/schema-registry/index.html) schema subject : A schema subject is the namespace for a schema in Schema Registry. This unique identifier defines a logical grouping of related schemas. Kafka topics contain event messages serialized and deserialized using the structure and rules defined in a schema subject. This ensures compatibility and supports schema evolution. **Related content** - Confluent Cloud: [Manage Schemas in Confluent Cloud](/cloud/current/sr/schemas-manage.html) - Confluent Platform: [Schema Registry Concepts](/platform/current/schema-registry/index.html) - [Understanding Schema Subjects](https://developer.confluent.io/courses/schema-registry/schema-subjects/) Serdes : Serdes are serializers and deserializers that convert objects and parallel data into a serial byte stream for efficient storage and high-speed data transmission over the wire. Confluent provides Serdes for schemas in Avro, Protobuf, and JSON Schema formats.
**Related content** - Confluent Cloud: [Formats, Serializers, and Deserializers](/cloud/current/sr/fundamentals/serdes-develop/index.html) - Confluent Platform: [Formats, Serializers, and Deserializers](/platform/current/schema-registry/fundamentals/serdes-develop/index.html) - [Serde](https://serde.rs/) serializer : A serializer is a tool that converts objects and parallel data into a serial byte stream. Serializers work with deserializers (known together as Serdes) to support efficient storage and high-speed data transmission over the wire. Confluent provides serializers for schemas in Avro, Protobuf, and JSON Schema formats. **Related content** - Confluent Cloud: [Formats, Serializers, and Deserializers](/cloud/current/sr/fundamentals/serdes-develop/index.html) - Confluent Platform: [Formats, Serializers, and Deserializers](/platform/current/schema-registry/fundamentals/serdes-develop/index.html) service account : A service account is a non-person entity used by an application or service to access resources and perform operations. Because a service account is an identity independent of the user who created it, it can be used programmatically to authenticate to resources and perform operations without the need for a user to be signed in. **Related content** - [Service Accounts for Confluent Cloud](/cloud/current/access-management/identity/service-accounts.html) service quota : A service quota is the limit, or maximum value, for a specific Confluent Cloud resource or operation that might vary by the resource scope it applies to. **Related content** - [Service Quotas for Confluent Cloud](/cloud/current/quotas/index.html) single message transform (SMT) : A single message transform (SMT) is a transformation or operation applied in realtime on an individual message that changes the values, keys, or headers of a message before being sent to a sink connector or after being read from a source connector. SMTs are convenient for inserting fields, masking information, event routing, and other minor data adjustments. single sign-on (SSO) : Single sign-on (SSO) is a centralized authentication service that allows users to use a single set of credentials to log in to multiple applications or services. **Related terms**: *authentication*, *group mapping*, *identity provider* **Related content** - [Single Sign-On for Confluent Cloud](/cloud/current/access-management/authenticate/sso/index.html) sink connector : A sink connector is a Kafka Connect connector that publishes (writes) data from a Kafka topic to an external system. source connector : A source connector is a Kafka Connect connector that subscribes (reads) data from a source (external system), extracts the payload and schema of the data, and publishes (writes) the data to Kafka topics. standalone : Standalone refers to a configuration in which a software application, system, or service operates independently on a single instance or device. This mode is commonly used for development, testing, and debugging purposes. For Kafka Connect, a standalone worker is a single process responsible for running all connectors and tasks on a single instance. Standard Kafka cluster : A Confluent Cloud cluster type. Standard Kafka clusters are designed for production-ready features and functionality. static egress IP address : A static egress IP address is an IP address used by a Confluent Cloud managed connector to establish outbound connections to endpoints of external data sources and sinks over the public internet. 
**Related content** - [Use Static IP Addresses on Confluent Cloud for Connectors and Cluster Linking](/cloud/current/networking/static-egress-ip-addresses.html) - [Static Egress IP Addresses for Confluent Cloud Connectors](/cloud/current/connectors/static-egress-ip.html) storage (pre-replication) : In Confluent Cloud, storage is a Kafka cluster billing dimension that defines the number of bytes retained on the cluster, pre-replication. Available in the Metrics API as `retained_bytes` (convert from bytes to TB). The returned value is pre-replication. Standard, Enterprise, Dedicated, and Freight clusters support Infinite Storage. This means there is no maximum size limit for the amount of data that can be stored on the cluster. You can configure policy settings `retention.bytes` and `retention.ms` at the topic level to control exactly how much and how long to retain data in a way that makes sense for your applications and helps control your costs. To reduce storage in Confluent Cloud, compress your messages and reduce retention settings. For compression, use lz4. Avoid gzip because of high overhead on the cluster. Stream Catalog : Stream Catalog is a pillar of Confluent Cloud Stream Governance that provides a centralized inventory of your organization’s data assets that supports data governance and data discovery. With Data Portal in Confluent Cloud Console, users can find event streams across systems, search topics by name or tags, and enrich event data to increase value and usefulness. REST and GraphQL APIs can be used to search schemas, apply tags to records or fields, manage business metadata, and discover relationships across data assets. **Related content** - [Stream Catalog on Confluent Cloud: User Guide to Manage Tags and Metadata](/cloud/current/stream-governance/stream-catalog.html) - [Stream Catalog in Streaming Data Governance (Confluent Developer course)](https://developer.confluent.io/courses/governing-data-streams/stream-catalog/) Stream Governance : Stream Governance is a collection of tools and features that provide data governance for data in motion. These include data quality tools such as Schema Registry, schema ID validation, and schema linking; built-in data catalog capabilities to classify, organize, and find event streams across systems; and stream lineage to visualize complex data relationships and uncover insights with interactive, end-to-end maps of event streams. Taken together, these and other governance tools enable teams to manage the availability, integrity, and security of data used across organizations, and help with standardization, monitoring, collaboration, reporting, and more. **Related terms**: *Stream Catalog*, *Stream Lineage* **Related content** - [Stream Governance on Confluent Cloud](/cloud/current/stream-governance/index.html) stream lineage : Stream lineage is the life cycle, or history, of data, including its origins, transformations, and consumption, as it moves through various stages in data pipelines, applications, and systems. Stream lineage provides a record of data’s journey from its source to its destination, and is used to track data quality, data governance, and data security. 
**Related terms**: *Data Portal*, *Stream Governance* **Related content** - [Stream Lineage on Confluent Cloud](/cloud/current/stream-governance/stream-lineage.html) stream processing : Stream processing is the method of collecting event stream data in real-time as it arrives, transforming the data in real-time using operations (such as filters, joins, and aggregations), and publishing the results to one or more target systems. Stream processing can be used to analyze data continuously, build data pipelines, and process time-sensitive data in real-time. Using the Confluent event streaming platform, event streams can be processed in real-time using Kafka Streams, Kafka Connect, or ksqlDB. Streams API : The Streams API is the Kafka API that allows you to build streaming applications and microservices that transform (for example, filter, group, aggregate, join) incoming event streams in real-time to Kafka topics stored in a Kafka cluster. The Streams API is used by stream processing clients to process data in real-time, analyze data continuously, and build data pipelines. **Related content** - [Introduction to the Kafka Streams API](/platform/current/streams/introduction.html) throttling : Throttling is the process Kafka clusters in Confluent Cloud use to protect themselves from getting into an over-utilized state. Also known as backpressure, throttling in Confluent Cloud occurs when cluster load reaches 80%. At this point, applications may start seeing higher latencies or timeouts as the cluster must begin throttling requests or connection attempts. topic : A topic is a user-defined category or feed name where event messages are stored and published by producers and subscribed to by consumers. Each topic is a log of event messages. Topics are stored in one or more partitions, which distribute topic records across brokers in a Kafka cluster. Each partition is an ordered, immutable sequence of records that is continually appended to the partition log. **Related content** - [Manage Topics in Confluent Cloud](/cloud/current/client-apps/topics/index.html) total client connections : In Confluent Cloud, total client connections are a Kafka cluster billing dimension that defines the number of TCP connections to the cluster you can open at one time. Available in the Metrics API as `active_connection_count`. Filter by principal to understand how many connections each application is creating. How many connections a cluster supports can vary widely based on several factors, including number of producer clients, number of consumer clients, partition keying strategy, produce patterns per client, and consume patterns per client. For Dedicated clusters, Confluent derives a guideline for total client connections from benchmarking that indicates exceeding this number of connections increases produce latency for test clients. However, this does not apply to all workloads. That is why total client connections are a guideline, not a hard limit, for Dedicated Kafka clusters. Monitor the impact on cluster load as connection count increases, as this is the final representation of the impact of a given workload or CKU dimension on the underlying resources of the cluster. Consider the Confluent guideline a per-CKU guideline. The number of connections tends to increase when you add brokers. In other words, if you significantly exceed the per-CKU guideline, cluster expansion doesn’t always give your cluster more connection count headroom.
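The entry above notes that `active_connection_count` can be filtered by principal. As a rough, non-authoritative sketch of what such a Metrics API query can look like (the endpoint, metric name, label names, and the `lkc-XXXXX` cluster ID are illustrative assumptions; confirm them against the Confluent Cloud Metrics API reference):

```bash
# Illustrative sketch: query active_connection_count for one cluster, grouped by principal.
# Endpoint, metric, and label names are assumptions -- verify against the Metrics API reference.
curl -s -u {{ CLOUD_API_KEY }}:{{ CLOUD_API_SECRET }} \
  -H "Content-Type: application/json" \
  https://api.telemetry.confluent.cloud/v2/metrics/cloud/query \
  --data '{
    "aggregations": [{"metric": "io.confluent.kafka.server/active_connection_count"}],
    "filter": {"field": "resource.kafka.id", "op": "EQ", "value": "lkc-XXXXX"},
    "group_by": ["metric.principal_id"],
    "granularity": "PT1H",
    "intervals": ["2024-01-01T00:00:00Z/PT1H"]
  }'
```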
Transport Layer Security (TLS) : Transport Layer Security (TLS) is a cryptographic protocol that provides secure communication over a network. For more information, see [Wikipedia](https://en.wikipedia.org/wiki/Transport_Layer_Security). unbounded stream : An unbounded stream is a stream of data that is continuously generated in real-time and has no defined end. Examples of unbounded streams include stock prices, sensor data, and social media feeds. Processing unbounded streams requires a different approach than processing bounded streams. Unbounded streams are processed incrementally as data arrives, while bounded streams are processed as a batch after all data has arrived. Kafka Streams and Flink can be used to process unbounded streams. **Related terms**: *bounded stream*, *stream processing* under replication : Under replication is a situation in which the number of in-sync replicas is below the number of all replicas. Under-replicated partitions can occur when a broker is down or cannot replicate fast enough from the leader (replica fetcher lag). user account : A user account is an account representing the identity of a person who can be authenticated and granted access to Confluent Cloud resources. **Related content** - [User Accounts for Confluent Cloud](/cloud/current/access-management/identity/user-accounts/overview.html) watermark : A watermark in Flink is a marker that keeps track of time as data is processed. A watermark means that all records until the current moment in time have been “seen”. This way, Flink can correctly perform tasks that depend on when things happened, like calculating aggregations over time windows. **Related content** - [Time and Watermarks](/cloud/current/flink/concepts/timely-stream-processing.html)

#### Migrate Schemas

To migrate Schema Registry and associated schemas to Confluent Cloud, follow these steps:

1. Start the origin cluster. If you are running a local cluster (for example, from a [Quick Start for Confluent Platform](../../get-started/platform-quickstart.md#quickstart) download), start only Schema Registry for the purposes of this tutorial using the Confluent CLI [confluent local](https://docs.confluent.io/confluent-cli/current/command-reference/local/index.html) commands.

```bash
confluent local services schema-registry start
```

2. Verify that `schema-registry`, `kafka`, and the cluster metadata service (ZooKeeper or KRaft, depending on your deployment) are running. For example, run `confluent local services status`:

```none
Schema Registry is [UP]
Kafka is [UP]
Zookeeper is [UP]
```

3. Verify that no subjects exist on the destination Schema Registry in Confluent Cloud.

```bash
curl -u {{ SR_API_KEY }}:{{ SR_API_SECRET }} https://{{ DESTINATION_SR_ENDPOINT }}/subjects
```

If no subjects exist, your output will be empty (`[]`), which is what you want. If subjects exist, delete them. For example:

```bash
curl -X DELETE -u {{ SR_API_KEY }}:{{ SR_API_SECRET }} https://{{ DESTINATION_SR_ENDPOINT }}/subjects/my-existing-subject
```

4. Set the destination Schema Registry to IMPORT mode. For example:

```bash
curl -u {{ SR_API_KEY }}:{{ SR_API_SECRET }} -X PUT -H "Content-Type: application/json" "https://{{ DESTINATION_SR_ENDPOINT }}/mode" --data '{"mode": "IMPORT"}'
```

5. Configure a Replicator worker to specify the addresses of brokers in the destination cluster, as described in [Configure and run Replicator](../../multi-dc-deployments/replicator/replicator-quickstart.md#config-and-run-replicator). The worker configuration file is in `CONFLUENT_HOME/etc/kafka/connect-standalone.properties`.

```properties
# Connect Standalone Worker configuration
bootstrap.servers={{ DESTINATION_BROKER_ENDPOINT }}:9092
```
6. Configure [Replicator](../../multi-dc-deployments/replicator/replicator-quickstart.md#replicator-quickstart) with Schema Registry and destination cluster information.

- For a standalone Connect instance, configure the following properties in `CONFLUENT_HOME/etc/kafka-connect-replicator/quickstart-replicator.properties`:

```properties
# basic connector configuration
name=replicator-source
connector.class=io.confluent.connect.replicator.ReplicatorSourceConnector
key.converter=io.confluent.connect.replicator.util.ByteArrayConverter
value.converter=io.confluent.connect.replicator.util.ByteArrayConverter
header.converter=io.confluent.connect.replicator.util.ByteArrayConverter
tasks.max=4

# source cluster connection info
src.kafka.bootstrap.servers=localhost:9092

# destination cluster connection info
dest.kafka.ssl.endpoint.identification.algorithm=https
dest.kafka.sasl.mechanism=PLAIN
dest.kafka.request.timeout.ms=20000
dest.kafka.bootstrap.servers={{ DESTINATION_BROKER_ENDPOINT }}:9092
retry.backoff.ms=500
dest.kafka.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="{{ CLUSTER_API_KEY }}" password="{{ CLUSTER_API_SECRET }}";
dest.kafka.security.protocol=SASL_SSL

# Schema Registry migration topics to replicate from source to destination
# topic.whitelist indicates which topics are of interest to replicator
topic.whitelist=_schemas
# schema.registry.topic indicates which of the topics in the whitelist contains schemas
schema.registry.topic=_schemas

# Connection settings for destination Confluent Cloud Schema Registry
schema.registry.url=https://{{ DESTINATION_SR_ENDPOINT }}
schema.registry.client.basic.auth.credentials.source=USER_INFO
schema.registry.client.basic.auth.user.info={{ SR_API_KEY }}:{{ SR_API_SECRET }}
```

- If your clusters have TLS/SSL enabled, you must set the TLS/SSL configurations as appropriate for Schema Registry clients.

```properties
# TLS/SSL configurations for clients to Schema Registry
schema.registry.client.schema.registry.ssl.truststore.location
schema.registry.client.schema.registry.ssl.truststore.type
schema.registry.client.schema.registry.ssl.truststore.password
schema.registry.client.schema.registry.ssl.keystore.location
schema.registry.client.schema.registry.ssl.keystore.type
schema.registry.client.schema.registry.ssl.keystore.password
schema.registry.client.schema.registry.ssl.key.password
```

7. In `quickstart-replicator.properties`, the replication factor is set to `1` for demo purposes on a development cluster with one broker. For this schema migration tutorial, and in production, change this to at least `3`:

```none
confluent.topic.replication.factor=3
```

#### SEE ALSO

For an example of a JSON configuration for Replicator in distributed mode, see [submit_replicator_schema_migration_config.sh](https://github.com/confluentinc/examples/tree/latest/ccloud/connectors/submit_replicator_schema_migration_config.sh) on GitHub in the [examples repository](https://github.com/confluentinc/examples).

8. Start Replicator so that it can perform the schema migration. For example:

```bash
connect-standalone ${CONFLUENT_HOME}/etc/kafka/connect-standalone.properties \
${CONFLUENT_HOME}/etc/kafka-connect-replicator/quickstart-replicator.properties
```

The method or commands you use to start Replicator depend on your application setup, and may differ from this example.
For more information, see [Tutorial: Configure and Run Replicator for Confluent Platform as an Executable or Connector](../../multi-dc-deployments/replicator/replicator-run.md#replicator-run) and [Configure and run Replicator](../../multi-dc-deployments/replicator/replicator-quickstart.md#config-and-run-replicator).

9. Stop all producers that are producing to Kafka.

10. Wait until the replication lag is 0. For more information, see [Monitoring Replicator lag (Legacy versions only)](../../multi-dc-deployments/replicator/replicator-monitoring.md#monitor-replicator-lag).

11. Stop Replicator.

12. Enable mode changes in the self-managed source Schema Registry properties file by adding the following to the configuration and restarting.

```none
mode.mutability=true
```

#### IMPORTANT

Modes are only supported starting with version 5.2 of Schema Registry. This step and the one following (set Schema Registry to READONLY) are precautionary and not strictly necessary. If using version 5.1 of Schema Registry or earlier, you can skip these two steps if you make certain to stop all producers so that no further schemas are registered in the source Schema Registry.

13. Set the source Schema Registry to READONLY mode.

```bash
curl -u {{ SOURCE_SR_USERNAME }}:{{ SOURCE_SR_PASSWORD }} -X PUT -H "Content-Type: application/json" "https://{{ SOURCE_SR_ENDPOINT }}/mode" --data '{"mode": "READONLY"}'
```

14. Set the destination Schema Registry to READWRITE mode.

```bash
curl -u {{ SR_API_KEY }}:{{ SR_API_SECRET }} -X PUT -H "Content-Type: application/json" "https://{{ DESTINATION_SR_ENDPOINT }}/mode" --data '{"mode": "READWRITE"}'
```

15. Stop all consumers.

16. Configure all consumers to point to the destination Schema Registry in the cloud and restart them. For example, if you are configuring Schema Registry in a Java client, change the Schema Registry URL from source to destination, either in the code or in a properties file that specifies the Schema Registry URL, the authentication type (USER_INFO), and credentials. For more examples, see [Java Consumers](../schema_registry_onprem_tutorial.md#sr-tutorial-java-consumers).

17. Configure all producers to point to the destination Schema Registry in the cloud and restart them. For more examples, see [Java Producers](../schema_registry_onprem_tutorial.md#sr-tutorial-java-producers).

18. (Optional) Stop the source Schema Registry.
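After step 18, it can be helpful to confirm that the migration succeeded before reconnecting clients. The following is a minimal, optional sketch that reuses the placeholder credentials and endpoint from the steps above; `my-subject-value` is a hypothetical subject name.

```bash
# List the subjects now registered on the destination Schema Registry.
curl -u {{ SR_API_KEY }}:{{ SR_API_SECRET }} https://{{ DESTINATION_SR_ENDPOINT }}/subjects

# Confirm the destination Schema Registry mode is back to READWRITE.
curl -u {{ SR_API_KEY }}:{{ SR_API_SECRET }} https://{{ DESTINATION_SR_ENDPOINT }}/mode

# Spot-check one migrated subject by fetching its latest schema version.
curl -u {{ SR_API_KEY }}:{{ SR_API_SECRET }} https://{{ DESTINATION_SR_ENDPOINT }}/subjects/my-subject-value/versions/latest
```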
# Kafka Streams for Confluent Platform * [Overview](overview.md) * [Quick Start](quickstart.md) * [Streams API](introduction.md) * [Tutorial: Streaming Application Development Basics on Confluent Platform](microservices-orders.md) * [Connect Streams to Confluent Cloud](https://docs.confluent.io/cloud/current/cp-component/streams-cloud-config.html) * [Concepts](concepts.md) * [Architecture](architecture.md) * [Examples](code-examples.md) * [Developer Guide](developer-guide/index.md) * [Overview](developer-guide/overview.md) * [Write a Streams Application](developer-guide/write-streams.md) * [Configure](developer-guide/config-streams.md) * [Run a Streams Application](developer-guide/running-app.md) * [Test](developer-guide/test-streams.md) * [Domain Specific Language](developer-guide/dsl-api.md) * [Name Domain Specific Language Topologies](developer-guide/dsl-topology-naming.md) * [Optimize Topologies](developer-guide/optimizing-streams.md) * [Processor API](developer-guide/processor-api.md) * [Data Types and Serialization](developer-guide/datatypes.md) * [Interactive Queries](developer-guide/interactive-queries.md) * [Memory](developer-guide/memory-mgmt.md) * [Manage Application Topics](developer-guide/manage-topics.md) * [Security](developer-guide/security.md) * [Reset Streams Applications](developer-guide/app-reset-tool.md) * [Build Pipeline with Connect and Streams](connect-streams-pipeline.md) * [Operations](operations.md) * [Metrics](kafka-streams-metrics.md) * [Monitor Kafka Streams Applications in Confluent Platform](monitoring.md) * [Integration with Confluent Control Center](monitoring.md#integration-with-c3) * [Plan and Size](sizing.md) * [Upgrade](upgrade-guide.md) * [Frequently Asked Questions](faq.md) * [Javadocs](javadocs.md) * [ksqlDB](../ksqldb/index.md) * [Overview](../ksqldb/overview.md) * [Quick Start](../ksqldb/quickstart.md) * [Install](../ksqldb/installing.md) * [Operate](../ksqldb/operations.md) * [Upgrade](../ksqldb/upgrading.md) * [Concepts](../ksqldb/concepts/index.md) * [Overview](../ksqldb/concepts/overview.md) * [Kafka Primer](../ksqldb/concepts/apache-kafka-primer.md) * [Connectors](../ksqldb/concepts/connectors.md) * [Events](../ksqldb/concepts/events.md) * [Functions](../ksqldb/concepts/functions.md) * [Lambda Functions](../ksqldb/concepts/lambda-functions.md) * [Materialized Views](../ksqldb/concepts/materialized-views.md) * [Queries](../ksqldb/concepts/queries.md) * [Streams](../ksqldb/concepts/streams.md) * [Stream Processing](../ksqldb/concepts/stream-processing.md) * [Tables](../ksqldb/concepts/tables.md) * [Time and Windows in ksqlDB Queries](../ksqldb/concepts/time-and-windows-in-ksqldb-queries.md) * [How-to Guides](../ksqldb/how-to-guides/index.md) * [Overview](../ksqldb/how-to-guides/overview.md) * [Control the Case of Identifiers](../ksqldb/how-to-guides/control-the-case-of-identifiers.md) * [Convert a Changelog to a Table](../ksqldb/how-to-guides/convert-changelog-to-table.md) * [Create a User-defined Function](../ksqldb/how-to-guides/create-a-user-defined-function.md) * [Manage Connectors](../ksqldb/how-to-guides/use-connector-management.md) * [Query Structured Data](../ksqldb/how-to-guides/query-structured-data.md) * [Test an Application](../ksqldb/how-to-guides/test-an-app.md) * [Update a Running Persistent Query](../ksqldb/how-to-guides/update-a-running-persistent-query.md) * [Use Variables in SQL Statements](../ksqldb/how-to-guides/substitute-variables.md) * [Use a Custom Timestamp 
Column](../ksqldb/how-to-guides/use-a-custom-timestamp-column.md) * [Use Lambda Functions](../ksqldb/how-to-guides/use-lambda-functions.md) * [Develop Applications](../ksqldb/developer-guide/index.md) * [Overview](../ksqldb/developer-guide/overview.md) * [Joins](../ksqldb/developer-guide/joins/index.md) * [Reference](../ksqldb/developer-guide/ksqldb-reference/index.md) * [REST API](../ksqldb/developer-guide/ksqldb-rest-api/index.md) * [Java Client](../ksqldb/developer-guide/java-client/java-client.md) * [Operate and Deploy](../ksqldb/operate-and-deploy/index.md) * [Overview](../ksqldb/operate-and-deploy/overview.md) * [Installation](../ksqldb/operate-and-deploy/installation/index.md) * [ksqlDB Architecture](../ksqldb/operate-and-deploy/how-it-works.md) * [Capacity Planning](../ksqldb/operate-and-deploy/capacity-planning.md) * [Changelog](../ksqldb/operate-and-deploy/changelog.md) * [Processing Guarantees](../ksqldb/operate-and-deploy/processing-guarantees.md) * [High Availability](../ksqldb/operate-and-deploy/high-availability.md) * [High Availability Pull Queries](../ksqldb/operate-and-deploy/high-availability-pull-queries.md) * [KSQL versus ksqlDB](../ksqldb/operate-and-deploy/ksql-vs-ksqldb.md) * [Logging](../ksqldb/operate-and-deploy/logging.md) * [Manage Metadata Schemas](../ksqldb/operate-and-deploy/migrations-tool.md) * [Monitoring](../ksqldb/operate-and-deploy/monitoring.md) * [Performance Guidelines](../ksqldb/operate-and-deploy/performance-guidelines.md) * [Schema Inference With ID](../ksqldb/operate-and-deploy/schema-inference-with-id.md) * [Schema Inference](../ksqldb/operate-and-deploy/schema-registry-integration.md) * [Reference](../ksqldb/reference/index.md) * [Overview](../ksqldb/reference/overview.md) * [SQL](../ksqldb/reference/sql/index.md) * [Metrics](../ksqldb/reference/metrics.md) * [Migrations Tool](../ksqldb/reference/migrations-tool-configuration.md) * [Processing Log](../ksqldb/reference/processing-log.md) * [Serialization Formats](../ksqldb/reference/serialization.md) * [Server Configuration Parameters](../ksqldb/reference/server-configuration.md) * [User-defined functions (UDFs)](../ksqldb/reference/user-defined-functions.md) * [Run ksqlDB in Confluent Cloud](https://docs.confluent.io/cloud/current/ksqldb/ksqldb-quick-start.html) * [Connect Local ksqlDB to Confluent Cloud](https://docs.confluent.io/cloud/current/cp-component/ksql-cloud-config.html) * [Connect ksqlDB to Control Center](../ksqldb/integrate-ksql-with-confluent-control-center.md) * [Secure ksqlDB with RBAC](../ksqldb/ksqldb-redirect.md) * [Frequently Asked Questions](../ksqldb/faq.md) * [Troubleshoot](../ksqldb/troubleshoot-ksql.md) * [Tutorials and Examples](../ksqldb/tutorials/index.md) ### Environment Setup 1. Use the [Quick Start for Confluent Platform](../get-started/platform-quickstart.md#quickstart) to bring up a single-node Confluent Platform development environment. With a single-line [confluent local](https://docs.confluent.io/confluent-cli/current/command-reference/local/index.html) command, you can have a basic Kafka cluster with Schema Registry, Control Center, and other services running on your local machine. ```bash confluent local start ``` Your output should resemble: ```bash Starting zookeeper zookeeper is [UP] Starting kafka kafka is [UP] Starting schema-registry schema-registry is [UP] Starting kafka-rest kafka-rest is [UP] Starting connect connect is [UP] Starting ksql-server ksql-server is [UP] Starting control-center control-center is [UP] ``` 2. 
Clone the Confluent [examples](https://github.com/confluentinc/examples) repo from GitHub and work in the `clients/avro/` subdirectory, which provides the sample code you will compile and run in this tutorial. ```bash git clone https://github.com/confluentinc/examples.git ``` ```bash cd examples/clients/avro ``` ```bash git checkout 8.1.0-post ``` 3. Create a local configuration file with all the Kafka and Schema Registry connection information that is running on your local machine, and save it to `$HOME/.confluent/java.config`, where [$HOME](https://en.wikipedia.org/wiki/Environment_variable#Syntax) represents your user home directory. It should resemble below: ```none # Required connection configs for Kafka producer, consumer, and admin bootstrap.servers={{ BROKER_ENDPOINT }} security.protocol=SASL_SSL sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username='{{ CLUSTER_API_KEY }}' password='{{ CLUSTER_API_SECRET }}'; sasl.mechanism=PLAIN # Required for correctness in Apache Kafka clients prior to 2.6 client.dns.lookup=use_all_dns_ips # Best practice for higher availability in Apache Kafka clients prior to 3.0 session.timeout.ms=45000 # Best practice for Kafka producer to prevent data loss acks=all # Required connection configs for Confluent Cloud Schema Registry schema.registry.url=https://{{ SR_ENDPOINT }} basic.auth.credentials.source=USER_INFO basic.auth.user.info={{ SR_API_KEY }}:{{ SR_API_SECRET }} ``` # CONFLUENT FOR KUBERNETES * [Overview](overview.md) * [Quick Start](co-quickstart.md) * [Plan for Deployment](co-plan.md) * [Prepare Kubernetes Cluster](co-prepare.md) * [Deploy CFK](co-deploy-cfk.md) * [Configure Confluent Platform](co-configure.md) * [Overview](co-configure-overview.md) * [Configure Storage](co-storage.md) * [Manage License](co-license.md) * [Use Custom Docker Registry](co-custom-registry.md) * [Configure CPU and Memory](co-resources.md) * [Configure Networking](co-networking.md) * [Overview](co-networking-overview.md) * [Configure Load Balancers](co-loadbalancers.md) * [Configure Node Ports](co-nodeports.md) * [Configure Port-Based Static Access](co-staticportbased.md) * [Configure Host-Based Static Access](co-statichostbased.md) * [Configure Routes](co-routes.md) * [Configure Security](co-security.md) * [Overview](co-security-overview.md) * [Authentication](co-authenticate.md) * [Authorization](co-authorize.md) * [Network Encryption](co-network-encryption.md) * [Security Compliance](co-security-compliance.md) * [Credentials and Certificates](co-credentials.md) * [Configure Pod Scheduling](co-schedule-workloads.md) * [Configure Connect](co-configure-connect.md) * [Configure Replicator](co-configure-replicator.md) * [Configure Rack Awareness](co-configure-rack-awareness.md) * [Configure REST Proxy](co-configure-rest-proxy.md) * [Configure KRaft](co-configure-kraft.md) * [Configure Unified Stream Manager](co-configure-usm.md) * [Advanced Configuration](co-configure-misc.md) * [Deploy Confluent Platform](co-deploy-cp.md) * [Manage Confluent Platform](co-manage-cp.md) * [Overview](co-manage-overview.md) * [Manage Flink](co-manage-flink.md) * [Manage Kafka Admin REST Class](co-manage-rest-api.md) * [Manage Kafka Topics](co-manage-topics.md) * [Manage Schemas](co-manage-schemas-index.md) * [Manage Schemas](co-manage-schemas.md) * [Link Schemas](co-link-schemas.md) * [Schema Registry Switchover using Unified Stream Manager](co-schema-registry-switchover.md) * [Manage Connectors](co-manage-connectors.md) * [Scale 
Clusters](co-scale-cluster.md) * [Scale Storage](co-scale-storage.md) * [Link Kafka Clusters](co-link-clusters.md) * [Manage Security](co-manage-security.md) * [Overview](co-manage-security-overview.md) * [Manage Authentication](co-manage-authentication.md) * [Manage RBAC](co-manage-rbac.md) * [Manage Certificates](co-manage-certificates.md) * [Manage Password Encoder Secret](co-password-encoder-secret.md) * [Restart Confluent Components](co-roll-cluster.md) * [Delete Confluent Deployment](co-delete-deployment.md) * [Manage Confluent Cloud](co-manage-ccloud.md) * [Monitor Confluent Platform](co-monitor-cp.md) * [Upgrade](co-upgrade.md) * [Upgrade Overview](co-upgrade-overview.md) * [Upgrade Confluent for Kubernetes](co-upgrade-cfk.md) * [Upgrade Confluent Platform](co-upgrade-cp.md) * [Migrate Zookeeper to KRaft](co-migrate-kraft.md) * [Migrate On-Premise Deployment to Confluent for Kubernetes](co-migrate-onprem.md) * [Migrate from Operator to Confluent for Kubernetes](co-migration.md) * [Deployment Scenarios](co-scenarios.md) * [Overview](co-scenarios-overview.md) * [Multi-AZ Deployment](co-multi-az.md) * [Multi-Region Deployment](co-multi-region.md) * [Hybrid Deployment with Confluent Cloud](co-hybrid.md) * [Troubleshoot](co-troubleshooting.md) * [Manage Confluent Gateway](gateway/co-gateway-index.md) * [Overview](gateway/co-gateway-overview.md) * [Deploy Confluent Gateway](gateway/co-gateway-deploy.md) * [Configure Security for Confluent Gateway](gateway/co-gateway-security.md) * [API Reference](co-api.md) * [Confluent Plugin Reference](co-plugin-cli-index.md) * [Release Notes](release-notes.md) * [Glossary](_glossary.md) ### Development and connectivity features To supplement Kafka’s Java APIs, and to help you connect all of your systems to Kafka, Confluent Platform provides the following features: - Confluent Connectors, which leverage the Kafka Connect API to connect Kafka to other systems such as databases, key-value stores, search indexes, and file systems. Confluent Hub has downloadable connectors for the most popular data sources and sinks. These include fully tested and supported versions of these connectors with Confluent Platform. See the following documentation for more information: - [How to Use Kafka Connect - Get Started](../connect/userguide.md#connect-userguide) - [Supported Self-Managed Connectors](../connect/supported.md#connect-bundled-connectors) - [Preview Self-Managed Connectors](../connect/preview.md#connect-preview-connectors) Confluent provides both commercial and community licensed connectors. For details, and to download connectors , see [Confluent Hub](https://www.confluent.io/hub/). - [Non-java clients](../clients/overview.md#kafka-clients) such as a [C/C++](/kafka-clients/librdkafka/current/overview.html), [Python](/kafka-clients/python/current/overview.html), [Go](/kafka-clients/go/current/overview.html), and [.NET client](/kafka-clients/dotnet/current/overview.html) libraries in addition to the Java client. These clients are full-featured and performant. For more information, see the [Build Streaming Applications on Confluent Platform](../clients/overview.md#kafka-clients). - A [REST Proxy](../kafka-rest/index.md#kafkarest-intro), which leverages the Admin API and makes it easy to work with Kafka from any language by providing a RESTful HTTP service for interacting with Kafka clusters. 
The REST Proxy supports all the core functionality: sending messages to Kafka, reading messages, both individually and as part of a consumer group, and inspecting cluster metadata, such as the list of topics and their settings. You get the full benefits of the high-quality, officially maintained Java clients from any language. The REST Proxy also integrates with Schema Registry. Because it automatically translates JSON data to and from Avro, you can get all the benefits of centralized schema management from any language using only HTTP and JSON. (A minimal usage sketch follows this feature list.)
- All of the Kafka command-line tools and additional tools, including the [Confluent CLI](https://docs.confluent.io/confluent-cli/current/overview.html). You can find a list of all of these tools in [CLI Tools Bundled With Confluent Platform](../tools/cli-reference.md#cp-all-cli).
- [Schema Registry](/platform/current/schema-registry/index.html), which provides a centralized repository for managing and validating schemas for topic message data, and for serialization and deserialization of data over a network. With a messaging service like Kafka, services that interact with each other must agree on a common format, called a schema, for messages. Schema Registry helps enable safe, zero-downtime evolution of schemas by centralizing schema management. It provides a RESTful interface for storing and retrieving Avro®, JSON Schema, and Protobuf schemas. Schema Registry tracks all versions of schemas and enables the evolution of schemas according to user-defined compatibility settings. Schema Registry also includes plugins for Kafka clients that handle schema storage and retrieval for Kafka messages that are sent in the Avro format. For more information, see the [Schema Registry Documentation](/platform/current/schema-registry/index.html). For a hands-on introduction to working with schemas, see the [On-Premises Schema Registry Tutorial](/platform/current/schema-registry/schema_registry_onprem_tutorial.html). For a deep dive into supported serialization and deserialization formats, see [Formats, Serializers, and Deserializers](/platform/current/schema-registry/fundamentals/serdes-develop/index.html).
- [ksqlDB](../ksqldb/overview.md#ksql-home), a streaming SQL engine for Kafka. It provides an interactive SQL interface for stream processing on Kafka, without the need to write code in a programming language such as Java or Python. ksqlDB is scalable, elastic, fault-tolerant, and real-time. It supports a wide range of streaming operations, including data filtering, transformations, aggregations, joins, windowing, and sessionization. For more information, see the [ksqlDB Documentation](../ksqldb/overview.md#ksql-home), or the [ksqlDB Quick Start](../ksqldb/quickstart.md#ksqldb-quick-start).
- An [MQTT Proxy](../kafka-mqtt/intro.md#kafka-mqtt-intro), which provides a way to publish data directly to Kafka from MQTT devices and gateways without the need for an MQTT broker in the middle. For more information, see [MQTT Proxy](../kafka-mqtt/intro.md#kafka-mqtt-intro).
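To make the REST Proxy item above concrete, here is a minimal sketch of producing and inspecting data over HTTP. It assumes a REST Proxy instance listening on `localhost:8082` and an existing topic named `test-topic`; both names are illustrative, and the API versions available depend on your deployment.

```bash
# Produce a JSON-encoded record to test-topic through the REST Proxy v2 API.
curl -X POST -H "Content-Type: application/vnd.kafka.json.v2+json" \
  --data '{"records":[{"key":"order-1","value":{"item":"book","qty":2}}]}' \
  http://localhost:8082/topics/test-topic

# List the topics visible to the REST Proxy.
curl http://localhost:8082/topics

# Inspect partition metadata for the topic.
curl http://localhost:8082/topics/test-topic/partitions
```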
# ksqlDB for Confluent Platform * [Overview](overview.md) * [Quick Start](quickstart.md) * [Install](installing.md) * [Operate](operations.md) * [Upgrade](upgrading.md) * [Concepts](concepts/index.md) * [Overview](concepts/overview.md) * [Kafka Primer](concepts/apache-kafka-primer.md) * [Connectors](concepts/connectors.md) * [Events](concepts/events.md) * [Functions](concepts/functions.md) * [Lambda Functions](concepts/lambda-functions.md) * [Materialized Views](concepts/materialized-views.md) * [Queries](concepts/queries.md) * [Streams](concepts/streams.md) * [Stream Processing](concepts/stream-processing.md) * [Tables](concepts/tables.md) * [Time and Windows in ksqlDB Queries](concepts/time-and-windows-in-ksqldb-queries.md) * [How-to Guides](how-to-guides/index.md) * [Overview](how-to-guides/overview.md) * [Control the Case of Identifiers](how-to-guides/control-the-case-of-identifiers.md) * [Convert a Changelog to a Table](how-to-guides/convert-changelog-to-table.md) * [Create a User-defined Function](how-to-guides/create-a-user-defined-function.md) * [Manage Connectors](how-to-guides/use-connector-management.md) * [Query Structured Data](how-to-guides/query-structured-data.md) * [Test an Application](how-to-guides/test-an-app.md) * [Update a Running Persistent Query](how-to-guides/update-a-running-persistent-query.md) * [Use Variables in SQL Statements](how-to-guides/substitute-variables.md) * [Use a Custom Timestamp Column](how-to-guides/use-a-custom-timestamp-column.md) * [Use Lambda Functions](how-to-guides/use-lambda-functions.md) * [Develop Applications](developer-guide/index.md) * [Overview](developer-guide/overview.md) * [Joins](developer-guide/joins/index.md) * [Reference](developer-guide/ksqldb-reference/index.md) * [REST API](developer-guide/ksqldb-rest-api/index.md) * [Java Client](developer-guide/java-client/java-client.md) * [Operate and Deploy](operate-and-deploy/index.md) * [Overview](operate-and-deploy/overview.md) * [Installation](operate-and-deploy/installation/index.md) * [ksqlDB Architecture](operate-and-deploy/how-it-works.md) * [Capacity Planning](operate-and-deploy/capacity-planning.md) * [Changelog](operate-and-deploy/changelog.md) * [Processing Guarantees](operate-and-deploy/processing-guarantees.md) * [High Availability](operate-and-deploy/high-availability.md) * [High Availability Pull Queries](operate-and-deploy/high-availability-pull-queries.md) * [KSQL versus ksqlDB](operate-and-deploy/ksql-vs-ksqldb.md) * [Logging](operate-and-deploy/logging.md) * [Manage Metadata Schemas](operate-and-deploy/migrations-tool.md) * [Monitoring](operate-and-deploy/monitoring.md) * [Performance Guidelines](operate-and-deploy/performance-guidelines.md) * [Schema Inference With ID](operate-and-deploy/schema-inference-with-id.md) * [Schema Inference](operate-and-deploy/schema-registry-integration.md) * [Reference](reference/index.md) * [Overview](reference/overview.md) * [SQL](reference/sql/index.md) * [Metrics](reference/metrics.md) * [Migrations Tool](reference/migrations-tool-configuration.md) * [Processing Log](reference/processing-log.md) * [Serialization Formats](reference/serialization.md) * [Server Configuration Parameters](reference/server-configuration.md) * [User-defined functions (UDFs)](reference/user-defined-functions.md) * [Run ksqlDB in Confluent Cloud](https://docs.confluent.io/cloud/current/ksqldb/ksqldb-quick-start.html) * [Connect Local ksqlDB to Confluent Cloud](https://docs.confluent.io/cloud/current/cp-component/ksql-cloud-config.html) * [Connect ksqlDB to 
Control Center](integrate-ksql-with-confluent-control-center.md) * [Secure ksqlDB with RBAC](ksqldb-redirect.md) * [Frequently Asked Questions](faq.md) * [Troubleshoot](troubleshoot-ksql.md) * [Tutorials and Examples](tutorials/index.md) # Confluent Platform Documentation

An enterprise-grade distribution of Apache Kafka® that is available on-premises as self-managed software, complete with enterprise-grade security, stream processing, and governance tooling.

Quick Start

Products:

- Schema Registry: Schema Registry provides a serving layer for your metadata. It provides a RESTful interface for storing and retrieving your Avro®, JSON Schema, and Protobuf schemas.
- Kafka Clients: Clients make it fast and easy to produce and consume messages through Apache Kafka. Official Confluent clients are available for Java, along with librdkafka and derived clients.
- Kafka Connect: Use connectors to stream data between Apache Kafka and other systems that you want to pull data from or push data to.
- Unified Stream Manager: Unified Stream Manager provides a single, centralized interface to manage and monitor both self-managed Confluent Platform clusters and fully-managed Confluent Cloud clusters.
- Confluent Platform for Apache Flink: Use Confluent Platform for Apache Flink® to run complex, stateful, and low-latency streaming applications.
- Confluent CLI: Command line interface for administering your streaming service, including Apache Kafka topics, clusters, schemas, Connectors, security, billing, and more.
- Ansible Playbooks: Automate configuration and deployment of Confluent Platform on multiple hosts.
- Confluent for Kubernetes: Deploy and manage Confluent Platform as a cloud-native system on Kubernetes.
- Kafka Streams: Build applications and microservices using Kafka Streams.
- Kafka APIs: Apache Kafka provides APIs for Producer, Consumer, Streams, Connect, and Admin.
- Security: Security protects your mission-critical services and data, end-to-end across all Confluent Platform components.

Get started:

- Kafka Configs: The Kafka configuration reference provides broker, topic, producer, consumer, Connect, Streams, and AdminClient configuration properties.
- Kafka Connect: Kafka Connect is a tool for scalably and reliably streaming data between Apache Kafka and other data systems.
- Authorization using ACLs: Apache Kafka ships with a pluggable, out-of-the-box Authorizer implementation that uses Apache ZooKeeper to store all the ACLs.

Learning resources:

- Apache Kafka 101: Learn the fundamentals of Apache Kafka with this video course.
- Introduction to Kafka Connect: Learn the fundamentals of Kafka Connect with this video course.
- Kafka Streams 101: Learn the fundamentals of Kafka Streams with this video course.

## Avro and Confluent Cloud Schema Registry

This example is similar to the previous example, except the value is formatted as Avro and integrates with the Confluent Cloud Schema Registry. Before using Confluent Cloud Schema Registry, check its [availability and limits](https://docs.confluent.io/cloud/current/overview.html).

1. As described in the [Quick Start for Schema Management on Confluent Cloud](https://docs.confluent.io/cloud/current/get-started/schema-registry.html) in the Confluent Cloud Console, enable Confluent Cloud Schema Registry and create an API key and secret to connect to it.
2. Verify that your VPC can connect to the Confluent Cloud Schema Registry public internet endpoint.
3. Update your local configuration file (for example, at `$HOME/.confluent/java.config`) with parameters to connect to Schema Registry.
- Template configuration file for Confluent Cloud ```none # Required connection configs for Kafka producer, consumer, and admin bootstrap.servers={{ BROKER_ENDPOINT }} security.protocol=SASL_SSL sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username='{{ CLUSTER_API_KEY }}' password='{{ CLUSTER_API_SECRET }}'; sasl.mechanism=PLAIN # Required for correctness in Apache Kafka clients prior to 2.6 client.dns.lookup=use_all_dns_ips # Best practice for higher availability in Apache Kafka clients prior to 3.0 session.timeout.ms=45000 # Best practice for Kafka producer to prevent data loss acks=all # Required connection configs for Confluent Cloud Schema Registry schema.registry.url=https://{{ SR_ENDPOINT }} basic.auth.credentials.source=USER_INFO basic.auth.user.info={{ SR_API_KEY }}:{{ SR_API_SECRET }} ``` - Template configuration file for local host ```none # Kafka bootstrap.servers=localhost:9092 # Confluent Schema Registry schema.registry.url=http://localhost:8081 ``` 4. Verify your Confluent Cloud Schema Registry credentials by listing the Schema Registry subjects. In the following example, substitute your values for `{{ SR_API_KEY }}`, `{{ SR_API_SECRET }}`, and `{{ SR_ENDPOINT }}`. ```text curl -u {{ SR_API_KEY }}:{{ SR_API_SECRET }} https://{{ SR_ENDPOINT }}/subjects ``` # Connect Local ksqlDB to Confluent Cloud You can connect ksqlDB to your Apache Kafka® cluster in Confluent Cloud. The ksqlDB servers must be configured to use Confluent Cloud. The ksqlDB CLI does not require configuration. **Prerequisites** - [Confluent Platform](/platform/current/installation/index.html) - [Confluent Cloud CLI](https://docs.confluent.io/confluent-cli/current/overview.html) 1. Use the Confluent CLI to log in to your Confluent Cloud cluster, and run the `confluent kafka cluster list` command to get the Kafka cluster ID. ```bash confluent kafka cluster list ``` Your output should resemble: ```none Id | Name | Type | Cloud | Region | Availability | Status +-------------+-------------------+--------------+----------+----------+--------------+--------+ lkc-a123b | ksqldb-quickstart | BASIC_LEGACY | gcp | us-west2 | multi-zone | UP ``` 2. Run the `confluent kafka cluster describe` command to get the endpoint for your Confluent Cloud cluster. ```bash confluent kafka cluster describe lkc-a123b ``` Your output should resemble: ```text +--------------+--------------------------------------------------------+ | Id | lkc-a123b | | Name | ksqldb-quickstart | | Type | BASIC | | Ingress | 100 | | Egress | 100 | | Storage | 5000 | | Cloud | azure | | Availability | single-zone | | Region | us-west2 | | Status | UP | | Endpoint | SASL_SSL://pkc-4s987.us-west2.gcp.confluent.cloud:9092 | | ApiEndpoint | https://pkac-42kz6.us-west2.gcp.confluent.cloud | +--------------+--------------------------------------------------------+ ``` Save the `Endpoint` value, which you’ll use in a later step. 3. Create a service account named `my-ksqldb-app`. You must include a description. ```bash confluent iam service-account create my-ksqldb-app --description "My ksqlDB API and secrets service account." ``` Your output should resemble: ```text +-------------+--------------------------------+ | Id | 123456 | | Resource ID | sa-efg123 | | Name | my-ksqldb-app | | Description | My ksqlDB API and secrets | | | service account. | +-------------+--------------------------------+ ``` Save the service account ID, which you’ll use in later steps. 4. Create an API key and secret for service account `123456`. 
Be sure to replace the service account ID and Kafka cluster ID values shown here with your own: ```bash confluent api-key create --service-account 123456 --resource lkc-a123b ``` Your output should resemble: ```text It may take a couple of minutes for the API key to be ready. Save the API key and secret. The secret is not retrievable later. +---------+------------------------------------------------------------------+ | API key | ABCXQHYDZXMMUDEF | | Secret | aBCde3s54+4Xv36YKPLDKy2aklGr6x/ShUrEX5D1Te4AzRlphFlr6eghmPX81HTF | +---------+------------------------------------------------------------------+ ``` #### IMPORTANT **Save the API key and secret.** You require this information to configure your client applications. Be aware that this is the *only* time that you can access and view the API key and secret. 5. Customize your `/etc/ksqldb/ksql-server.properties` properties file. The following example shows the minimum configuration required to use ksqlDB with Confluent Cloud. You should also review the [Recommended ksqlDB production settings](/platform/current/ksqldb/operate-and-deploy/installation/server-config.html). Replace `{{ CLUSTER_API_KEY }}` and `{{ CLUSTER_API_SECRET }}` with the API key and secret that you generated previously. ```properties # For bootstrap.servers, assign the Endpoint value from the "confluent kafka cluster describe" command. # eg. pkc-4s087.us-west2.gcp.confluent.cloud:9092 bootstrap.servers={{ BROKER_ENDPOINT }} ksql.internal.topic.replicas=3 ksql.streams.replication.factor=3 ksql.logging.processing.topic.replication.factor=3 listeners=http://0.0.0.0:8088 security.protocol=SASL_SSL sasl.mechanism=PLAIN # Replace {{ CLUSTER_API_KEY }} and {{ CLUSTER_API_SECRET }} with your API key and secret. sasl.jaas.config=\ org.apache.kafka.common.security.plain.PlainLoginModule required \ username="{{ CLUSTER_API_KEY }}" \ password="{{ CLUSTER_API_SECRET }}"; ``` 6. (Optional) Add configs for Confluent Cloud Schema Registry per the example in [ksql-server-ccloud.delta](https://github.com/confluentinc/examples/tree/latest/ccloud/template_delta_configs/ksql-server-ccloud.delta) on GitHub at [ccloud/examples/template_delta_configs](https://github.com/confluentinc/examples/tree/latest/ccloud/template_delta_configs). ```properties # Confluent Schema Registry configuration for ksqlDB Server ksql.schema.registry.basic.auth.credentials.source=USER_INFO ksql.schema.registry.basic.auth.user.info={{ SR_API_KEY }}:{{ SR_API_SECRET }} ksql.schema.registry.url=https://{{ SR_ENDPOINT }} ``` 7. Restart the ksqlDB server. The steps to restart are [dependent on your environment](/platform/current/installation/installing_cp/index.html). For more information, see the [ksqlDB Configuration Parameter Reference](/platform/current/ksqldb/operate-and-deploy/installation/server-config.html). ### Configure and connect 1. Configure Schema Registry by modifying `etc/schema-registry/schema-registry.properties`. The minimally required Schema Registry property settings for Confluent Cloud are provided below: ```bash # If set to true, API requests that fail will include extra debugging information, including stack traces. debug=false # REQUIRED: Specifies the bootstrap servers for your Kafka cluster. It is used for selecting the primary # Schema Registry instance and for storing the registered schema data. kafkastore.bootstrap.servers={{ BROKER_ENDPOINT }} # REQUIRED: Specifies Confluent Cloud authentication. kafkastore.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \ username="{{ CLUSTER_API_KEY }}" \ password="{{ CLUSTER_API_SECRET }}"; # Configures Schema Registry to use SASL authentication. kafkastore.sasl.mechanism=PLAIN # Configures Schema Registry for SSL encryption. 
kafkastore.security.protocol=SASL_SSL # Specifies the name of the topic to store schemas in. kafkastore.topic=_schemas # Specifies the address the socket server listens on. The format is # "listeners = listener_name://host_name:port". For example, "listeners = PLAINTEXT://your.host.name:9092". listeners=http://0.0.0.0:8081 ``` For more information, see [Schema Registry configuration options](/platform/current/schema-registry/installation/deployment.html), [Configuring PLAIN](/platform/current/kafka/authentication_sasl/authentication_sasl_plain.html#sr), and [Quick Start for Schema Management on Confluent Cloud](../get-started/schema-registry.md#cloud-sr-config) (for native cloud Schema Registry). 2. Start Schema Registry with the `schema-registry.properties` file specified. ```bash bin/schema-registry-start etc/schema-registry/schema-registry.properties ``` # Configure and Manage Confluent Platform Use these resources to administer Confluent Platform in your environment. * [Overview](config-manage/overview.md) * [Configuration Reference](installation/configuration/config-index.md) * [Overview](installation/configuration/index.md) * [Configure Brokers and Controllers](installation/configuration/broker-configs.md) * [Configure Topics](installation/configuration/topic-configs.md) * [Configure Consumers](installation/configuration/consumer-configs.md) * [Configure Producers](installation/configuration/producer-configs.md) * [Configure Connect](installation/configuration/connect/overview.md) * [Overview](installation/configuration/connect/index.md) * [Configure Sink Connectors](installation/configuration/connect/sink-connect-configs.md) * [Configure Source Connectors](installation/configuration/connect/source-connect-configs.md) * [Configure AdminClient](installation/configuration/admin-configs.md) * [Configure Licenses](installation/configuration/license-configs.md) * [Configure Streams](installation/configuration/streams-configs.md) * [CLI Tools for Use with Confluent Platform](tools/cli-reference-overview.md) * [Change Configurations Without Restart](kafka/dynamic-config.md) * [Manage Clusters](clusters/index.md) * [Overview](clusters/overview.md) * [Cluster Metadata Management](kafka-metadata/index.md) * [Overview](kafka-metadata/overview.md) * [KRaft Overview](kafka-metadata/kraft.md) * [Configure KRaft](kafka-metadata/config-kraft.md) * [Find ZooKeeper Resources](kafka-metadata/zk-production.md) * [Manage Self-Balancing Clusters](clusters/sbc/overview.md) * [Overview](clusters/sbc/index.md) * [Tutorial: Adding and Remove Brokers](clusters/sbc/sbc-tutorial.md) * [Configure](clusters/sbc/configuration-options.md) * [Performance and Resource Usage](clusters/sbc/performance.md) * [Auto Data Balancing](clusters/rebalancer/overview.md) * [Overview](clusters/rebalancer/index.md) * [Quick Start](clusters/rebalancer/quickstart.md) * [Tutorial: Add and Remove Brokers](clusters/rebalancer/adb-docker-tutorial.md) * [Configure](clusters/rebalancer/configuration-options.md) * [Tiered Storage](clusters/tiered-storage.md) * [Metadata Service (MDS) in Confluent Platform](kafka/configure-mds/overview.md) * [Configure MDS](kafka/configure-mds/index.md) * [Configure Communication with MDS over TLS](kafka/configure-mds/mds-ssl-config-for-components.md) * [Configure mTLS Authentication and RBAC for Kafka Brokers](kafka/configure-mds/mutual-tls-auth-rbac.md) * [Configure Kerberos Authentication for Brokers Running MDS](kafka/configure-mds/kerberos-auth-config.md) * [Configure LDAP 
Authentication](kafka/configure-mds/ldap-auth-mds.md) * [Configure LDAP Group-Based Authorization for MDS](kafka/configure-mds/ldap-auth-config.md) * [MDS as token issuer](kafka/configure-mds/mds-token-issuer.md) * [Metadata Service Configuration Settings](kafka/configure-mds/mds-configuration.md) * [MDS File-Based Authentication for Confluent Platform](kafka/configure-mds/mds-file-configuration.md) * [Docker Operations for Confluent Platform](installation/docker/operations/overview.md) * [Overview](installation/docker/operations/index.md) * [Monitor and Track Metrics Using JMX](installation/docker/operations/monitoring.md) * [Configure Logs](installation/docker/operations/logging.md) * [Mount External Volumes](installation/docker/operations/external-volumes.md) * [Configure a Multi-Node Environment](kafka/multi-node.md) * [Run Kafka in Production](kafka/deployment.md) * [Production Best Practices](kafka/post-deployment.md) ### Security and resilience features Confluent Platform also offers a number of features that build on Kafka’s security features to help ensure your deployment stays secure and resilient. - You can set authorization by role with Confluent’s [Role-based Access Control (RBAC)](../security/authorization/rbac/overview.md#rbac-overview) feature. - If you use Control Center, you can set up [Single Sign On (SSO)](../security/authentication/sso-for-c3/overview.md#sso-for-c3) that integrates with a supported OIDC identity provider, and enable additional security measures such as multi-factor authentication. - The [REST Proxy Security Plugins in Confluent Platform](../confluent-security-plugins/kafka-rest.md#kafka-rest-security-plugins-install) and [Schema Registry Security Plugin for Confluent Platform](../confluent-security-plugins/schema-registry/introduction.md#confluentsecurityplugins-schema-registry-security-plugin) add security capabilities to the Confluent Platform REST Proxy and Schema Registry. The Confluent REST Proxy Security Plugin helps in authenticating the incoming requests and propagating the authenticated principal to requests to Kafka. This enables Confluent REST Proxy clients to utilize the multi-tenant security features of the Kafka broker. The Schema Registry Security Plugin supports authorization for both role-based access control (RBAC) and ACLs. - [Audit logs](../security/compliance/audit-logs/audit-logs-concepts.md#audit-logs-concepts) provide the ability to capture, protect, and preserve authorization activity into topics in Kafka clusters on Confluent Platform using [Confluent Server Authorizer](../security/csa-introduction.md#confluent-server-authorizer). - The [Cluster Linking](../multi-dc-deployments/cluster-linking/index.md#cluster-linking) feature enables you to directly connect clusters and mirror topics from one cluster to another. This makes it easier to build multi-datacenter, multi-region and hybrid cloud deployments. - [Confluent Replicator](../multi-dc-deployments/replicator/index.md#replicator-detail) makes it easier to maintain multiple Kafka clusters in multiple data centers. Managing replication of data and topic configuration between data centers enables use-cases such as active geo-localized deployments, centralized analytics and cloud migration. You can use Replicator to configure and manage replication for all these scenarios from either Control Center or command-line tools. 
To get started, see the [Replicator documentation](../multi-dc-deployments/replicator/index.md#replicator-detail), including the [Replicator Quick Start](../multi-dc-deployments/replicator/replicator-quickstart.md#replicator-quickstart). # CONTROL CENTER * [Overview](overview.md) * [Install and Configure](installation/index.md) * [Installation](installation/overview.md) * [System Requirements](installation/system-requirements.md) * [Support Policy](installation/support-policy.md) * [Sample Configurations](installation/properties.md) * [Configuration Reference](installation/configuration.md) * [Data Retention](installation/data-retention.md) * [Manage Licenses](installation/license.md) * [Monitor Logs](installation/logging.md) * [Manage Updates](installation/auto-update-ui.md) * [Troubleshoot](installation/troubleshooting.md) * [Migrate Alerts](installation/alert-migrate.md) * [Upgrade](installation/upgrade.md) * [Security](security/index.md) * [Overview](security/overview.md) * [Configure TLS](security/ssl.md) * [Configure SASL](security/sasl.md) * [Configure HTTP Basic Authentication](security/authentication.md) * [Configure LDAP](security/c3-auth-ldap.md) * [Configure RBAC](security/c3-rbac.md) * [Authorize with Kafka ACLs](security/config-c3-for-kafka-acls.md) * [Add mTLS Authentication for Monitoring and Alerting](security/mtls-to-alert.md) * [Add Basic Authentication for Monitoring and Alerting](security/broker-to-alert.md) * [Manage and View RBAC Roles](security/c3-rbac-roles.md) * [Sign in to Control Center when RBAC enabled on Confluent Platform](security/c3-rbac-login.md) * [Manage RBAC roles with Control Center on Confluent Platform](security/c3-rbac-manage-roles-ui.md) * [View your RBAC roles in Control Center on Confluent Platform](security/c3-rbac-view-roles-ui.md) * [Manage Clusters](clusters.md) * [Manage Brokers](brokers.md) * [Manage Topics](topics/index.md) * [Overview](topics/overview.md) * [Create Topics](topics/create.md) * [Topic Metrics](topics/view.md) * [View Messages](topics/messages.md) * [Configure Topics](topics/edit.md) * [Delete Topics](topics/delete.md) * [Connect](connect.md) * [Manage Flink](cmf.md) * [ksqlDB](ksql.md) * [Clients](clients/index.md) * [Overview](clients/overview.md) * [Consumers Groups](clients/consumers.md) * [Reset Offsets](clients/reset-offsets.md) * [Configure Cluster](clients/cluster-configuration.md) * [Copy Topics](replicators.md) * [Alerts](alerts/index.md) * [Overview](alerts/concepts.md) * [Manage Alerts](alerts/navigate.md) * [Configure Alerts](alerts/configure.md) * [Manage Triggers](alerts/triggers.md) * [Manage Actions](alerts/actions.md) * [Configure PagerDuty](alerts/pagerduty.md) * [REST API](alerts/rest.md) * [Usage Examples](alerts/examples.md) * [Troubleshoot](alerts/trouble.md) * [Release Notes](release-notes.md) # Manage Schemas on Confluent Platform * [Overview](index.md) * [Get Started with Schema Registry Tutorial](schema_registry_onprem_tutorial.md) * [Install and Configure](installation/overview.md) * [Fundamentals](fundamentals/overview.md) * [Concepts](fundamentals/index.md) * [Schema Evolution and Compatibility](fundamentals/schema-evolution.md) * [Schema Formats](fundamentals/serdes-develop/overview.md) * [Serializers and Deserializers Overview](fundamentals/serdes-develop/index.md) * [Avro](fundamentals/serdes-develop/serdes-avro.md) * [Protobuf](fundamentals/serdes-develop/serdes-protobuf.md) * [JSON Schema](fundamentals/serdes-develop/serdes-json.md) * [Data Contracts](fundamentals/data-contracts.md) * 
[Manage Schemas](schemas-overview.md) * [Work with Schemas in Control Center](schema.md) * [Schema Contexts](schema-contexts-cp.md) * [Schema Linking](schema-linking-cp.md) * [Validate Schema IDs](schema-validation.md) * [Monitor](monitoring.md) * [Delete Schemas](schema-deletion-guidelines.md) * [Integrate Schemas from Connectors](connect.md) * [Security](security/overview.md) * [Overview](security/index.md) * [Configure Role-Based Access Control](security/rbac-schema-registry.md) * [Configure OAuth](security/oauth-schema-registry.md) * [Schema Registry Security Plugin](../confluent-security-plugins/schema-registry/overview.md) * [Overview](../confluent-security-plugins/schema-registry/introduction.md) * [Install](../confluent-security-plugins/schema-registry/install.md) * [Schema Registry Authorization](../confluent-security-plugins/schema-registry/authorization/overview.md) * [Operation and Resource Support](../confluent-security-plugins/schema-registry/authorization/index.md) * [Role-Based Access Control](security/rbac-schema-registry.md) * [ACL Authorizer](../confluent-security-plugins/schema-registry/authorization/sracl_authorizer.md) * [Topic ACL Authorizer](../confluent-security-plugins/schema-registry/authorization/topicacl_authorizer.md) * [Passwordless authentication for Schema Registry](security/passwordless-auth.md) * [Reference](develop/overview.md) * [Overview](develop/index.md) * [Maven Plugin](develop/maven-plugin.md) * [API](develop/api.md) * [API Examples](develop/using.md) * [FAQ](faqs-cp.md) ### Start the stack To get started, create the following `docker-compose.yml` file. This specifies all the infrastructure that you’ll need to run this tutorial: ```yaml version: '2' services: broker: image: confluentinc/cp-kafka:8.1.0 hostname: broker container_name: broker ports: - "29092:29092" environment: KAFKA_BROKER_ID: 1 KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:9092,PLAINTEXT_HOST://localhost:29092 KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1 KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0 KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1 KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1 schema-registry: image: confluentinc/cp-schema-registry:8.1.0 hostname: schema-registry container_name: schema-registry depends_on: - broker ports: - "8081:8081" environment: SCHEMA_REGISTRY_HOST_NAME: schema-registry SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS: "PLAINTEXT://broker:9092" ksqldb-server: image: confluentinc/cp-ksqldb-server:8.1.0 hostname: ksqldb-server container_name: ksqldb-server depends_on: - broker - schema-registry ports: - "8088:8088" environment: KSQL_LISTENERS: "http://0.0.0.0:8088" KSQL_BOOTSTRAP_SERVERS: "broker:9092" KSQL_KSQL_SCHEMA_REGISTRY_URL: "http://schema-registry:8081" KSQL_KSQL_LOGGING_PROCESSING_STREAM_AUTO_CREATE: "true" KSQL_KSQL_LOGGING_PROCESSING_TOPIC_AUTO_CREATE: "true" # Configuration to embed Kafka Connect support. 
KSQL_CONNECT_GROUP_ID: "ksql-connect-cluster" KSQL_CONNECT_BOOTSTRAP_SERVERS: "broker:9092" KSQL_CONNECT_KEY_CONVERTER: "org.apache.kafka.connect.storage.StringConverter" KSQL_CONNECT_VALUE_CONVERTER: "io.confluent.connect.avro.AvroConverter" KSQL_CONNECT_VALUE_CONVERTER_SCHEMA_REGISTRY_URL: "http://schema-registry:8081" KSQL_CONNECT_CONFIG_STORAGE_TOPIC: "_ksql-connect-configs" KSQL_CONNECT_OFFSET_STORAGE_TOPIC: "_ksql-connect-offsets" KSQL_CONNECT_STATUS_STORAGE_TOPIC: "_ksql-connect-statuses" KSQL_CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR: 1 KSQL_CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR: 1 KSQL_CONNECT_STATUS_STORAGE_REPLICATION_FACTOR: 1 KSQL_CONNECT_PLUGIN_PATH: "/usr/share/kafka/plugins" ksqldb-cli: image: confluentinc/cp-ksqldb-cli:8.1.0 container_name: ksqldb-cli depends_on: - broker - ksqldb-server entrypoint: /bin/sh tty: true ``` Bring up the stack by running: ```bash docker-compose up ``` ## Configuration steps Following are the basic configuration steps: 1. Using an account with [OrganizationAdmin access](../security/access-control/rbac/predefined-rbac-roles.md#organizationadmin-role), create an API key and secret to connect to Confluent Cloud. For details, refer to [Use API Keys to Authenticate to Confluent Cloud](../security/authenticate/workload-identities/service-accounts/api-keys/overview.md#cloud-api-keys). 2. Validate that Confluent Cloud can be accessed from the machine where you are installing Control Center (Legacy). - Check the connection by using `confluent kafka topic list`. - Try producing or consuming from the machine. 3. Install Control Center (Legacy) using the [documentation](/platform/current/control-center/installation/configure-control-center.html). 4. Configure Control Center (Legacy) with the Confluent Cloud-specific settings. A minimum valid configuration is shown below. These settings are different from the standard Confluent Cloud configuration. Customize the `bootstrap.servers` and `confluent.controlcenter.streams.sasl.jaas.config` for your Confluent Cloud cluster. ```bash bootstrap.servers= confluent.controlcenter.streams.security.protocol=SASL_SSL confluent.controlcenter.streams.sasl.mechanism=PLAIN confluent.controlcenter.streams.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \ username="" \ password=""; confluent.metrics.topic.max.message.bytes=8388608 confluent.controlcenter.streams.ssl.endpoint.identification.algorithm=https ``` #### IMPORTANT The `confluent.metrics.topic.max.message.bytes` property must be set to `8388608`. See [Control Center Cannot Connect to Confluent Cloud](/platform/current/control-center/installation/troubleshooting.html#c3-connect-ccloud-max-bytes) for details. 5. Configure data stream interceptors by following the [documentation](/platform/current/control-center/installation/clients.html). The following security configuration must be added: ```bash confluent.monitoring.interceptor.security.protocol=SASL_SSL confluent.monitoring.interceptor.sasl.mechanism=PLAIN confluent.monitoring.interceptor.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="" password=""; confluent.monitoring.interceptor.ssl.endpoint.identification.algorithm=https ``` 6. 
(Optional) Add configs for Confluent Cloud Schema Registry per the example in [control-center-ccloud.delta](https://github.com/confluentinc/examples/tree/latest/ccloud/template_delta_configs/control-center-ccloud.delta) on GitHub at [ccloud/examples/template_delta_configs](https://github.com/confluentinc/examples/tree/latest/ccloud/template_delta_configs). The `schema.registry.url` for Control Center (Legacy) is specified using an HTTPS protocol prefix which requires an explicit `443` port, as shown in the example. ```bash # Confluent Schema Registry configuration for Confluent Control Center confluent.controlcenter.schema.registry.basic.auth.credentials.source=USER_INFO confluent.controlcenter.schema.registry.basic.auth.user.info=: confluent.controlcenter.schema.registry.url=https://:443 ``` ### Distributed Cluster 1. Create a distributed properties file named `my-connect-distributed.properties` in the config directory. The contents of this distributed properties file should resemble the following example. Note the security properties with `consumer.*` and `producer.*` prefixes. ```bash bootstrap.servers= group.id=connect-cluster key.converter=org.apache.kafka.connect.json.JsonConverter value.converter=org.apache.kafka.connect.json.JsonConverter key.converter.schemas.enable=false value.converter.schemas.enable=false internal.key.converter=org.apache.kafka.connect.json.JsonConverter internal.value.converter=org.apache.kafka.connect.json.JsonConverter internal.key.converter.schemas.enable=false internal.value.converter.schemas.enable=false # Connect clusters create three topics to manage offsets, configs, and status # information. Note that these contribute towards the total partition limit quota. offset.storage.topic=connect-offsets offset.storage.replication.factor=3 offset.storage.partitions=3 config.storage.topic=connect-configs config.storage.replication.factor=3 status.storage.topic=connect-status status.storage.replication.factor=3 offset.flush.interval.ms=10000 ssl.endpoint.identification.algorithm=https sasl.mechanism=PLAIN sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \ username="" password=""; security.protocol=SASL_SSL consumer.ssl.endpoint.identification.algorithm=https consumer.sasl.mechanism=PLAIN consumer.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \ username="" password=""; consumer.security.protocol=SASL_SSL producer.ssl.endpoint.identification.algorithm=https producer.sasl.mechanism=PLAIN producer.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \ username="" password=""; producer.security.protocol=SASL_SSL # Set to a list of filesystem paths separated by commas (,) to enable class loading isolation for plugins # (connectors, converters, transformations). plugin.path=/usr/share/java,/Users//confluent-6.2.1/share/confluent-hub-components ``` 2. (Optional) Add the configuration properties below to the `my-connect-distributed.properties` file. This allows connections to Confluent Cloud Schema Registry. For an example, see [connect-ccloud.delta](https://github.com/confluentinc/examples/tree/latest/ccloud/template_delta_configs/connect-ccloud.delta) on the [ccloud/examples/template_delta_configs](https://github.com/confluentinc/examples/tree/latest/ccloud/template_delta_configs). 
```bash # Confluent Schema Registry for Kafka Connect value.converter=io.confluent.connect.avro.AvroConverter value.converter.basic.auth.credentials.source=USER_INFO value.converter.schema.registry.basic.auth.user.info=: value.converter.schema.registry.url=https:// ``` 3. Run Connect using the following command: ```bash ./bin/connect-distributed ./etc/my-connect-distributed.properties ``` To test if the workers came up correctly, you can set up another file sink as follows. Create a file `my-file-sink.json` whose contents are as follows: ```text { "name": "my-file-sink", "config": { "connector.class": "org.apache.kafka.connect.file.FileStreamSinkConnector", "tasks.max": 3, "topics": "page_visits", "file": "my_file.txt" } } ``` #### IMPORTANT You must include the following properties in the connector configuration if you are using a self-managed connector that requires an enterprise license. ```text "confluent.topic.bootstrap.servers":"", "confluent.topic.sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"\" password=\"\";", "confluent.topic.security.protocol":"SASL_SSL", "confluent.topic.sasl.mechanism":"PLAIN" ``` #### IMPORTANT You must include the following configuration properties if you are using a self-managed connector that uses Reporter to write response back to Kafka (for example, the [Azure Functions Sink Connector for Confluent Platform](../../../kafka-connect-azure-functions/current/index.html) or the [Google Cloud Functions Sink Connector for Confluent Platform](../../../kafka-connect-gcp-functions/current/index.html) connector) . ```text "reporter.admin.bootstrap.servers":"", "reporter.admin.sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"\" password=\"\";", "reporter.admin.security.protocol":"SASL_SSL", "reporter.admin.sasl.mechanism":"PLAIN", "reporter.producer.bootstrap.servers":"", "reporter.producer.sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"\" password=\"\";", "reporter.producer.security.protocol":"SASL_SSL", "reporter.producer.sasl.mechanism":"PLAIN" ``` #### IMPORTANT You must include the following properties in the connector configuration if you are using the following connectors: ### Debezium 2 and later ```text "schema.history.internal.kafka.bootstrap.servers": "", "schema.history.internal.consumer.security.protocol": "SASL_SSL", "schema.history.internal.consumer.ssl.endpoint.identification.algorithm": "https", "schema.history.internal.consumer.sasl.mechanism": "PLAIN", "schema.history.internal.consumer.sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"\" password=\"\";", "schema.history.internal.producer.security.protocol": "SASL_SSL", "schema.history.internal.producer.ssl.endpoint.identification.algorithm": "https", "schema.history.internal.producer.sasl.mechanism": "PLAIN", "schema.history.internal.producer.sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"\" password=\"\";" ``` ### Debezium 1.9 and earlier ```text "database.history.kafka.bootstrap.servers": "", "database.history.consumer.security.protocol": "SASL_SSL", "database.history.consumer.ssl.endpoint.identification.algorithm": "https", "database.history.consumer.sasl.mechanism": "PLAIN", "database.history.consumer.sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"\" password=\"\";", 
"database.history.producer.security.protocol": "SASL_SSL", "database.history.producer.ssl.endpoint.identification.algorithm": "https", "database.history.producer.sasl.mechanism": "PLAIN", "database.history.producer.sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"\" password=\"\";" ``` ### Oracle XStream CDC ```text "schema.history.internal.kafka.bootstrap.servers": "", "schema.history.internal.consumer.security.protocol": "SASL_SSL", "schema.history.internal.consumer.ssl.endpoint.identification.algorithm": "https", "schema.history.internal.consumer.sasl.mechanism": "PLAIN", "schema.history.internal.consumer.sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"\" password=\"\";", "schema.history.internal.producer.security.protocol": "SASL_SSL", "schema.history.internal.producer.ssl.endpoint.identification.algorithm": "https", "schema.history.internal.producer.sasl.mechanism": "PLAIN", "schema.history.internal.producer.sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"\" password=\"\";", # Uncomment and include the following properties only if the connector is configured to use Kafka topics for signaling #"signal.kafka.bootstrap.servers": "", #"signal.consumer.security.protocol": "SASL_SSL", #"signal.consumer.ssl.endpoint.identification.algorithm": "https", #"signal.consumer.sasl.mechanism": "PLAIN", #"signal.consumer.sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"\" password=\"\";" ``` 4. Post this connector config to the worker using the curl command: ```bash curl -s -H "Content-Type: application/json" -X POST -d @my-file-sink.json http://localhost:8083/connectors/ | jq . ``` This should give the following response: ```text { "name": "my-file-sink", "config": { "connector.class": "org.apache.kafka.connect.file.FileStreamSinkConnector", "tasks.max": "1", "topics": "page_visits", "file": "my_file", "name": "my-file-sink" }, "tasks": [], "type": null } ``` 5. Produce some records using Confluent Cloud and tail this file to check if the connectors were successfully created. # Connect Self-Managed Kafka Streams to Confluent Cloud You can connect Kafka Streams to your Confluent Platform Apache Kafka® cluster in Confluent Cloud. **Prerequisites** - [Confluent Platform](/platform/current/installation/index.html) 1. Use the Confluent CLI to log in to your Confluent Cloud cluster, and run the `confluent kafka cluster list` command to get the Kafka cluster ID. ```bash confluent kafka cluster list ``` Your output should resemble: ```none Current | ID | Name | Type | Cloud | Region | Availability | Status ----------+------------+------------+-------+----------+----------+--------------+--------- * | lkc-a123b | my-cluster | BASIC | azure | westus2 | single-zone | UP ``` 2. Run the `confluent kafka cluster describe` command to get the endpoint for your Confluent Cloud cluster. 
```bash confluent kafka cluster describe lkc-a123b ``` Your output should resemble: ```text +----------------------+---------------------------------------------------------+ | Current | true | | ID | lkc-a123b | | Name | wikiedits_cluster | | Type | BASIC | | Ingress Limit (MB/s) | 250 | | Egress Limit (MB/s) | 750 | | Storage | 5 TB | | Cloud | azure | | Region | westus2 | | Availability | single-zone | | Status | UP | | Endpoint | SASL_SSL://pkc-41973.westus2.azure.confluent.cloud:9092 | | REST Endpoint | https://pkc-41973.westus2.azure.confluent.cloud:443 | | Topic Count | 30 | +----------------------+---------------------------------------------------------+ ``` Save the `Endpoint` value, which you’ll use in a later step. 3. Create a service account named `my-streams-app`. You must include a description. ```bash confluent iam service-account create my-streams-app --description "My Streams API and secrets service account." ``` Your output should resemble: ```text +-------------+--------------------------------+ | ID | sa-ab01cd | | Name | my-streams-app | | Description | My Streams API and secrets | | | service account. | +-------------+--------------------------------+ ``` Save the service account ID, which you’ll use in later steps. 4. Create an API key and secret for service account `sa-ab01cd`. Be sure to replace the service account ID and Kafka cluster ID values shown here with your own: ```bash confluent api-key create --service-account sa-ab01cd --resource lkc-a123b ``` Your output should resemble: ```text It may take a couple of minutes for the API key to be ready. Save the API key and secret. The secret is not retrievable later. +---------+------------------------------------------------------------------+ | API Key | ABCXQHYDZXMMUDEF | | Secret | aBCde3s54+4Xv36YKPLDKy2aklGr6x/ShUrEX5D1Te4AzRlphFlr6eghmPX81HTF | +---------+------------------------------------------------------------------+ ``` #### IMPORTANT **Save the API key and secret.** You require this information to configure your client applications. Be aware that this is the *only* time that you can access and view the key and secret. To connect Streams to Confluent Cloud, update your [existing Streams configs](/platform/current/streams/developer-guide/config-streams.html) with the properties described here. 1. Create a `java.util.Properties` instance. 2. Configure your streams application. Kafka and Kafka Streams configuration options must be configured in the `java.util.Properties` instance before using Streams. In this example you must configure the Confluent Cloud broker endpoints (`StreamsConfig.BOOTSTRAP_SERVERS_CONFIG`) and SASL config (`SASL_JAAS_CONFIG`) ```java import java.util.Properties; import org.apache.kafka.clients.producer.ProducerConfig; import org.apache.kafka.common.config.SaslConfigs; import org.apache.kafka.streams.StreamsConfig; Properties props = new Properties(); // Comma-separated list of the Confluent Cloud broker endpoints. 
For example: // r0.great-app.confluent.aws.prod.cloud:9092,r1.great-app.confluent.aws.prod.cloud:9093,r2.great-app.confluent.aws.prod.cloud:9094 props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, ""); props.put(StreamsConfig.REPLICATION_FACTOR_CONFIG, 3); props.put(StreamsConfig.SECURITY_PROTOCOL_CONFIG, "SASL_SSL"); props.put(SaslConfigs.SASL_MECHANISM, "PLAIN"); props.put(SaslConfigs.SASL_JAAS_CONFIG, "org.apache.kafka.common.security.plain.PlainLoginModule required \ username=\"\" password=\"\";"); // Recommended performance/resilience settings props.put(StreamsConfig.producerPrefix(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG), 2147483647); props.put(StreamsConfig.producerPrefix(ProducerConfig.MAX_BLOCK_MS_CONFIG), 9223372036854775807L); // Any further settings props.put(... , ...); ``` 3. (Optional) Add configs for Confluent Cloud Schema Registry to your streams application per the example in [java_streams.delta](https://github.com/confluentinc/examples/tree/latest/ccloud/template_delta_configs/java_streams.delta) on GitHub at [ccloud/examples/template_delta_configs](https://github.com/confluentinc/examples/tree/latest/ccloud/template_delta_configs). ```java // Confluent Schema Registry for Java props.put("basic.auth.credentials.source", "USER_INFO"); props.put("schema.registry.basic.auth.user.info", ":"); props.put("schema.registry.url", "https://"); ``` A short sketch that shows these properties in use follows the reference links below. - For more information, see [Configuring a Streams Application](/platform/current/streams/developer-guide/config-streams.html). - To view a working example of hybrid Apache Kafka® clusters from self-hosted to Confluent Cloud, see [cp-demo](/platform/current/tutorials/cp-demo/docs/index.html). - For example configs for all Confluent Platform components and clients connecting to Confluent Cloud, see [template examples for components](https://github.com/confluentinc/examples/tree/latest/ccloud/template_delta_configs). - To look at all the code used in the Confluent Cloud demo, see the [Confluent Cloud demo examples](https://github.com/confluentinc/examples/tree/latest/ccloud).
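To tie the configuration steps together, the following minimal sketch (not part of the Confluent examples; the application ID, topic names, and Serdes are illustrative placeholders) shows the `props` object built in the previous steps being passed to a `KafkaStreams` instance:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

// Continue with the "props" object configured in the previous steps.
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");   // placeholder application ID
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

// Placeholder topology: copy records from one topic to another.
StreamsBuilder builder = new StreamsBuilder();
builder.stream("input-topic").to("output-topic");

KafkaStreams streams = new KafkaStreams(builder.build(), props);
streams.start();

// Close the application cleanly on shutdown.
Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
```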
## Quick Start This quick start uses the DynamoDB connector to export data produced by the Avro console producer to DynamoDB. Before you begin, you must grant the user or IAM role that runs the connector [write and create access](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/authentication-and-access-control.html) to DynamoDB. 1. Install the connector through the [Confluent Hub Client](/kafka-connectors/self-managed/confluent-hub/client.html). ```bash # run from your CP installation directory confluent connect plugin install confluentinc/kafka-connect-aws-dynamodb:latest ``` 2. Start the services using the Confluent CLI. ```bash confluent local start ``` Every service starts in order, printing a message with its status. ```bash Starting Zookeeper Zookeeper is [UP] Starting Kafka Kafka is [UP] Starting Schema Registry Schema Registry is [UP] Starting Kafka REST Kafka REST is [UP] Starting Connect Connect is [UP] Starting KSQL Server KSQL Server is [UP] Starting Control Center Control Center is [UP] ``` #### NOTE You must ensure that the connector user has write access to DynamoDB and that credentials are deployed [appropriately](https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/credentials.html). You can also pass more properties to the credentials provider. For details, refer to [AWS Credentials](https://docs.confluent.io/kafka-connect-s3-sink/current/index.html#aws-credentials). 3. Start the Avro console producer to import a few records with a simple schema in Kafka. Use the following command: ```bash ./bin/kafka-avro-console-producer --broker-list localhost:9092 --topic dynamodb_topic \ --property value.schema='{"type":"record","name":"myrecord","fields":[{"name":"f1","type":"string"}]}' ``` 4. Enter the following in the console producer: ```bash {"f1": "value1"} {"f1": "value2"} {"f1": "value3"} {"f1": "value4"} {"f1": "value5"} {"f1": "value6"} {"f1": "value7"} {"f1": "value8"} {"f1": "value9"} ``` The records are published to the Kafka topic `dynamodb_topic` in Avro format. 5. Find the region that the DynamoDB instance is running in (for example, `us-east-2`) and create a config file with the following contents. Save it as `quickstart-dynamodb.properties`. #### NOTE In the following example, a DynamoDB table called `dynamodb_topic` will be created in your DynamoDB instance. ```none name=dynamodb-sink connector.class=io.confluent.connect.aws.dynamodb.DynamoDbSinkConnector tasks.max=1 topics=dynamodb_topic # use the region to populate the next two properties aws.dynamodb.region= aws.dynamodb.endpoint=https://dynamodb..amazonaws.com confluent.topic.bootstrap.servers=localhost:9092 confluent.topic.replication.factor=1 ``` 6. Start the DynamoDB connector by loading its configuration with the following command: ```bash confluent local load dynamodb-sink --config quickstart-dynamodb.properties { "name": "dynamodb-sink", "config": { "connector.class": "io.confluent.connect.aws.dynamodb.DynamoDbSinkConnector", "tasks.max": "1", "topics": "dynamodb_topic", "aws.dynamodb.region": "", "aws.dynamodb.endpoint": "https://dynamodb..amazonaws.com", "confluent.topic.bootstrap.servers": "localhost:9092", "confluent.topic.replication.factor": "1", "name": "dynamodb-sink" }, "tasks": [], "type": "sink" } ``` #### IMPORTANT Don’t use the [Confluent CLI](https://docs.confluent.io/confluent-cli/current/index.html) commands in production environments. 7. Confirm that the connector is in a `RUNNING` state. ```bash confluent local status dynamodb-sink ``` 8. After the connector has ingested some records, use the AWS CLI to check that the data is available in DynamoDB. ```bash aws dynamodb scan --table-name dynamodb_topic --region us-east-2 ``` You should see nine items with keys: ```bash { "Items": [ { "partition": { "N": "0" },"offset": { "N": "0" },"name": { "S": "f1" },"type": { "S": "value1" } },{ "partition": { "N": "0" },"offset": { "N": "1" },"name": { "S": "f1" },"type": { "S": "value2" } },{ "partition": { "N": "0" },"offset": { "N": "2" },"name": { "S": "f1" },"type": { "S": "value3" } },{ "partition": { "N": "0" },"offset": { "N": "3" },"name": { "S": "f1" },"type": { "S": "value4" } },{ "partition": { "N": "0" },"offset": { "N": "4" },"name": { "S": "f1" },"type": { "S": "value5" } },{ "partition": { "N": "0" },"offset": { "N": "5" },"name": { "S": "f1" },"type": { "S": "value6" } },{ "partition": { "N": "0" },"offset": { "N": "6" },"name": { "S": "f1" },"type": { "S": "value7" } },{ "partition": { "N": "0" },"offset": { "N": "7" },"name": { "S": "f1" },"type": { "S": "value8" } },{ "partition": { "N": "0" },"offset": { "N": "8" },"name": { "S": "f1" },"type": { "S": "value9" } } ], "Count": 9, "ScannedCount": 9, "ConsumedCapacity": null } ``` 9. 
Enter the following command to stop the Connect worker and all services: ```bash confluent local stop ``` Your output should resemble: ```none Stopping Control Center Control Center is [DOWN] Stopping KSQL Server KSQL Server is [DOWN] Stopping Connect Connect is [DOWN] Stopping Kafka REST Kafka REST is [DOWN] Stopping Schema Registry Schema Registry is [DOWN] Stopping Kafka Kafka is [DOWN] Stopping Zookeeper Zookeeper is [DOWN] ``` Or, enter the following command to stop all services and delete all generated data: ```bash confluent local destroy ``` Your output should resemble: ```bash Stopping Control Center Control Center is [DOWN] Stopping KSQL Server KSQL Server is [DOWN] Stopping Connect Connect is [DOWN] Stopping Kafka REST Kafka REST is [DOWN] Stopping Schema Registry Schema Registry is [DOWN] Stopping Kafka Kafka is [DOWN] Stopping Zookeeper Zookeeper is [DOWN] Deleting: /var/folders/ty/rqbqmjv54rg_v10ykmrgd1_80000gp/T/confluent.PkQpsKfE ``` #### Reporter example 1. Run the demo app with the `basic-auth` Spring profile. ```bash mvn spring-boot:run -Dspring.profiles.active=basic-auth ``` 2. Create a `http-sink.properties` file with the following contents: ```text name=ReporterExample topics=http-messages tasks.max=1 connector.class=io.confluent.connect.http.HttpSinkConnector # key/val converters key.converter=org.apache.kafka.connect.storage.StringConverter value.converter=org.apache.kafka.connect.storage.StringConverter # licensing for local single-node Kafka cluster confluent.topic.bootstrap.servers=localhost:9092 confluent.topic.replication.factor=1 # http sink connector configs http.api.url=http://localhost:8080/api/messages auth.type=BASIC connection.user=admin connection.password=password behavior.on.null.values=delete # reporter configurations reporter.bootstrap.servers=localhost:9092 reporter.result.topic.name=success-responses reporter.result.topic.replication.factor=1 reporter.error.topic.name=error-responses reporter.error.topic.replication.factor=1 reporter.admin.bootstrap.servers=.confluent.cloud:9092 reporter.admin.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule \ required username= password=; reporter.admin.security.protocol=SASL_SSL reporter.admin.sasl.mechanism=PLAIN reporter.producer.bootstrap.servers=.confluent.cloud:9092 reporter.producer.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule \ required username= password=; reporter.producer.security.protocol=SASL_SSL reporter.producer.sasl.mechanism=PLAIN ``` For additional information about Connect Reporter for secure environments, see [Kafka Connect Reporter](/platform/current/connect/security.html#kconnect-reporter). 3. Publish messages with keys and values to the topic. ```bash confluent local produce http-messages --property parse.key=true --property key.separator=, > 1,message-value > 2,another-message ``` 4. Run and validate the connector as described in the [Quick start](#http-connector-quickstart). 5. Consume the records from the `success-responses` and `error-responses` topics to see the HTTP operation responses. 
```bash kafkacat -C -b localhost:9092 -t success-responses -J |jq ``` ```json { "topic": "success-responses", "partition": 0, "offset": 0, "tstype": "create", "ts": 1581579911854, "headers": [ "input_record_offset", "0", "input_record_timestamp", "1581488456476", "input_record_partition", "0", "input_record_topic", "http-connect" ], "key": null, "payload": "{\"id\":1,\"message\":\"1,message-value\"}" } { "topic": "success-responses", "partition": 0, "offset": 1, "tstype": "create", "ts": 1581579911854, "headers": [ "input_record_offset", "1", "input_record_timestamp", "1581488456476", "input_record_partition", "0", "input_record_topic", "http-connect" ], "key": null, "payload": "{\"id\":2,\"message\":\"2,message-value\"}" } ``` In case of retryable errors (that is, errors with a 5xx status code), a response like the one shown below is included in the error-responses topic. ```bash kafkacat -C -b localhost:9092 -t error-responses -J |jq ``` ```json { "topic": "error-responses", "partition": 0, "offset": 0, "tstype": "create", "ts": 1581579911854, "headers": [ "input_record_offset", "0", "input_record_timestamp", "1581579931450", "input_record_partition", "0", "input_record_topic", "http-messages" ], "key": null, "payload": "Retry time lapsed, unable to process HTTP request. Error while processing HTTP request with Url : http://localhost:8080/api/messages, Payload : 6,test, Status code : 500, Reason Phrase : , Response Content : {\"timestamp\":\"2020-02-11T10:44:41.574+0000\",\"status\":500,\"error\":\"Internal Server Error\",\"message\":\"Unresolved compilation problem: \\n\\tlog cannot be resolved\\n\",\"path\":\"/api/messages\"}, " } ``` #### Distributed worker configuration 1. Create your `my-connect-distributed.properties` file based on the following example. 
```properties bootstrap.servers= key.converter=org.apache.kafka.connect.storage.StringConverter value.converter=io.confluent.connect.avro.AvroConverter ssl.endpoint.identification.algorithm=https security.protocol=SASL_SSL sasl.mechanism=PLAIN sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="" password=""; request.timeout.ms=20000 retry.backoff.ms=500 producer.bootstrap.servers= producer.ssl.endpoint.identification.algorithm=https producer.security.protocol=SASL_SSL producer.sasl.mechanism=PLAIN producer.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="" password=""; producer.request.timeout.ms=20000 producer.retry.backoff.ms=500 consumer.bootstrap.servers= consumer.ssl.endpoint.identification.algorithm=https consumer.security.protocol=SASL_SSL consumer.sasl.mechanism=PLAIN consumer.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="" password=""; consumer.request.timeout.ms=20000 consumer.retry.backoff.ms=500 offset.flush.interval.ms=10000 offset.storage.file.filename=/tmp/connect.offsets group.id=connect-cluster offset.storage.topic=connect-offsets offset.storage.replication.factor=3 offset.storage.partitions=3 config.storage.topic=connect-configs config.storage.replication.factor=3 status.storage.topic=connect-status status.storage.replication.factor=3 # Schema Registry specific settings # We recommend you use Confluent Cloud Schema Registry if you run Oracle CDC Source against Confluent Cloud value.converter.basic.auth.credentials.source=USER_INFO value.converter.schema.registry.basic.auth.user.info=: value.converter.schema.registry.url= # Confluent license settings confluent.topic.bootstrap.servers= confluent.topic.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="" password=""; confluent.topic.security.protocol=SASL_SSL confluent.topic.sasl.mechanism=PLAIN # Set to a list of filesystem paths separated by commas (,) to enable class loading isolation for plugins # (connectors, converters, transformations). The list should consist of top level directories that include # any combination of: # a) directories immediately containing jars with plugins and their dependencies # b) uber-jars with plugins and their dependencies # c) directories immediately containing the package directory structure of classes of plugins and their dependencies # Examples: # plugin.path=/usr/local/share/java,/usr/local/share/kafka/plugins,/opt/connectors, plugin.path=/usr/share/java,/confluent-6.0.0/share/confluent-hub-components ``` 2. Start Kafka Connect with the following command: ```text /bin/connect-distributed my-connect-distributed.properties ``` ### Known issues * When deploying CFK to Red Hat OpenShift with Red Hat’s Operator Lifecycle Manager using the OperatorHub, you must use OpenShift version 4.9 or higher. This OpenShift version restriction does not apply when deploying CFK to Red Hat OpenShift in the standard way without using the Red Hat Operator Lifecycle Manager. 
* When CFK is deployed on an OpenShift cluster with Red Hat’s Operator Lifecycle Manager/OperatorHub, capturing the support bundle for CFK using the command, `kubectl confluent support-bundle --namespace `, can fail with the following error message: ```text panic: runtime error: index out of range [0] with length 0 ``` * If the ksqlDB REST endpoint is using the auto-generated certificates, the ksqlDB deployment that points to Confluent Cloud requires trusting the Let’s Encrypt CA. For this to work, you must provide the ksqlDB CR with a CA bundle through `cacerts.pem` that contains both (1) the Confluent Cloud CA and (2) the self-signed CA. * When TLS is enabled, and when Confluent Control Center (Legacy) uses a different TLS certificate to communicate with MDS or Confluent Cloud Schema Registry, Control Center (Legacy) (used with Confluent Platform 7.x) cannot use an auto-generated TLS certificate to connect to MDS or Confluent Cloud Schema Registry. See the [Troubleshooting Guide](co-troubleshooting.md#co-c3-mds-certificates) for a workaround. * When deploying the Schema Registry and Kafka CRs simultaneously, Schema Registry could fail because it cannot create topics with a replication factor of 3. This is because the Kafka brokers have not fully started. The workaround is to delete the Schema Registry deployment and re-deploy once Kafka is fully up. * When deploying an RBAC-enabled Kafka cluster in centralized mode, where another “secondary” Kafka is being used to store RBAC metadata, an error, “License Topic could not be created”, may be returned on the secondary Kafka cluster. * A periodic Kubernetes TCP probe on ZooKeeper (in Confluent Platform 7.x) causes frequent warning messages “client has closed socket” when warning logs are enabled. * REST Proxy configured with monitoring interceptors is missing the callback handler properties when RBAC is enabled. The interceptor will not work, and you will see an error message in the KafkaRestProxy log. As a workaround, manually add configuration overrides as shown in the following KafkaRestProxy CR: ```yaml configOverrides: server: - confluent.monitoring.interceptor.sasl.login.callback.handler.class=io.confluent.kafka.clients.plugins.auth.token.TokenUserLoginCallbackHandler - consumer.confluent.monitoring.interceptor.sasl.login.callback.handler.class=io.confluent.kafka.clients.plugins.auth.token.TokenUserLoginCallbackHandler - producer.confluent.monitoring.interceptor.sasl.login.callback.handler.class=io.confluent.kafka.clients.plugins.auth.token.TokenUserLoginCallbackHandler ``` * When configuring source-initiated cluster links with CFK where the source cluster has TLS enabled, set the following in the Source mode ClusterLink CR, under the `spec.configs` section: * `local.security.protocol: SSL` and `local.listener.name: ` for mTLS. * `local.security.protocol: SASL_SSL` for SASL authentication with TLS. For details about configuring source-initiated Cluster Linking, see [Configure the source-initiated cluster link on the source cluster](co-link-clusters.md#co-clusterlink-source-initiated-connection-source-mode). * The CFK [support bundle plugin](co-troubleshooting.md#co-support-bundle) on Windows systems does not capture all the logs. As a workaround, specify the `--out-dir` flag in the `kubectl confluent support-bundle` command to provide the output location for the support bundle. * When you have external access enabled with load balancer type for both Control Center and Prometheus, only `controlcenter-next-gen-prometheus-bootstrap-lb` gets created. 
The workaround is to enable Control Center external access first, and then add Prometheus external access. After you do this, both the Control Center (`controlcenter-next-gen-bootstrap-lb`) and Prometheus (`controlcenter-next-gen-prometheus-bootstrap-lb`) external bootstrap load balancers are created. ## Preparation Follow these guidelines when you prepare to upgrade. * Read the [Release Notes for Confluent Platform 8.1](../release-notes/index.md#release-notes) and review the [Changelogs](../release-notes/changelog.md#cp-changelog) for your Confluent Platform components. : The release notes contain important information about noteworthy features and changes to configurations that may impact your upgrade. Changelogs note updates to third-party components such as Jolokia or JMX exporters that might affect your system. * Form a plan. : Read the documentation and draft an upgrade plan that matches your specific requirements and environment before starting the upgrade process. In other words, don’t start working through this guide on a live cluster. Read the guide entirely, make a plan, then execute the plan. * Perform backups. : Before upgrading, always back up all configuration and unit files with their file permissions, ownership, and customizations. Confluent Platform may not run if the proper ownership isn’t preserved on configuration files. By default, configuration files are located in the `$CONFLUENT_HOME/etc` directory and are organized by component. * Upgrade all components. : Upgrade your entire platform deployment so that all components are running the same version. Do not bridge component versions. * Consider upgrade order. : The general recommended upgrade order is to upgrade your server-side components (brokers, controllers, and Confluent Control Center) before you upgrade your client applications. Newer brokers are designed to work with compliant older clients. As noted in the [KIP-896](https://cwiki.apache.org/confluence/display/KAFKA/KIP-896%3A+Remove+old+client+protocol+API+versions+in+Kafka+4.0) warning, this compatibility guarantee has changed. Confluent Platform 8.1 won’t communicate with clients using a protocol older than Kafka 2.1.0. Therefore, the upgrade process must be: 1. Ensure you’re running Confluent Platform 7.7, 7.8, or 7.9. 2. Identify and upgrade all non-compliant clients. 3. Upgrade Confluent Control Center to a version compatible with 8.1. 4. Upgrade your brokers, controllers, and other server-side components to 8.1. 5. Upgrade your compliant clients to the 8.1 libraries as needed to access new features. Clients include any application that uses Kafka producers or consumers, command-line tools, Schema Registry, REST Proxy, Kafka Connect, and Kafka Streams. * Determine if clients are colocated with brokers. : Although not recommended, some deployments have clients colocated with brokers (on the same node). In these cases, brokers and clients share the same packages. In this colocation case, ensure that client processes are not upgraded until *all* Kafka brokers have been upgraded. * Decide between a rolling upgrade or a downtime upgrade. : Confluent Platform supports both rolling upgrades, meaning you upgrade one broker at a time to avoid cluster downtime, and downtime upgrades, meaning you take down the entire cluster, upgrade it, and bring everything back up. * Use Confluent Control Center to monitor a rolling restart. 
: Consider using [Control Center](/control-center/current/installation/overview.html) to monitor broker status during the [rolling restart](../kafka/post-deployment.md#rolling-restart). * Set a license string. : The Confluent Platform package includes Confluent Server by default and requires a `confluent.license` key in your `server.properties` file. The Confluent Server broker checks for a license during start up. You must supply a license string in each broker’s properties file using the `confluent.license` property as shown in the following code: ```none confluent.license=LICENCE_STRING_HERE_NO_QUOTES ``` If you want to use the Kafka broker, download the `confluent-community` package. The Kafka broker is the default in all Debian or RHEL and CentOS packages. * Run the correct version of Java. : Determine and install the appropriate Java version. See [Supported Java Versions](versions-interoperability.md#java-sys-req) for a list of Confluent Platform versions and the corresponding Java version support before you upgrade. For complete compatibility information, see the [Supported Versions and Interoperability for Confluent Platform](versions-interoperability.md#interoperability-versions). ### Print schema IDs with command line consumer utilities You can use the `kafka-avro-console-consumer`, `kafka-protobuf-console-consumer`, and `kafka-json-schema-console-consumer` utilities to get the schema IDs for all messages on a topic, or for a specified subset of messages. This can be useful for exploring or troubleshooting schemas. To print schema IDs, run the consumer with `--property print.schema.ids=true` and `--property print.key=true`. The basic command syntax for Avro is as follows: ```bash kafka-avro-console-consumer --bootstrap-server $BOOTSTRAP_SERVER \ --property basic.auth.credentials.source="USER_INFO" \ --property print.key=true --property print.schema.ids=true \ --property key.deserializer=org.apache.kafka.common.serialization.StringDeserializer \ --property schema.registry.url=$SCHEMA_REGISTRY_URL \ --consumer.config /Users/vicky/creds.config \ --topic --from-beginning \ --property schema.registry.basic.auth.user.info=$SR_APIKEY:$SR_APISECRET ``` Note that to run this command against Confluent Cloud, you must have an API key and secret for the Kafka cluster and for the Schema Registry cluster associated with the environment. To specify the value for `$BOOTSTRAP_SERVER`, you must use the Endpoint URL on Confluent Cloud or the host and port as specified in your properties files for Confluent Platform. - To find the Endpoint URL on Confluent Cloud to use as the value for $BOOTSTRAP_SERVER, on the Cloud Console navigate to **Cluster settings** and find the URL for **Bootstrap server** under **Endpoints**. Alternatively, use the Confluent CLI command [confluent kafka cluster describe](https://docs.confluent.io/confluent-cli/current/command-reference/kafka/cluster/confluent_kafka_cluster_describe.html) to find the value given for **Endpoint**, minus the security protocol prefix. For Confluent Cloud, this will always be in the form of `URL:port`, such as `pkc-12576z.us-west2.gcp.confluent.cloud:9092`. - The examples use shell environment variables to indicate values for `--bootstrap-server`, `schema.registry.url`, API key and secret, and so forth. You may want to store the values for these properties in local shell environment variables to make testing at the command line easier. (For example: `export API_KEY=xyz`.) 
You can check the contents of a variable with `echo` (for example, `echo $API_KEY`), then use it in subsequent commands and config files. - The user’s credentials are in a local file called `creds.config`, which contains the following information: ```bash # Required connection configs for Kafka producer, consumer, and admin bootstrap.servers= security.protocol=SASL_SSL sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="" password=""; sasl.mechanism=PLAIN # Required for correctness in Apache Kafka clients prior to 2.6 client.dns.lookup=use_all_dns_ips # Best practice for higher availability in Apache Kafka clients prior to 3.0 session.timeout.ms=45000 # Best practice for Kafka producer to prevent data loss acks=all ``` The subsequent examples use basic authentication and API keys. To learn more about authentication on Confluent Cloud, see [security/authenticate/workload-identities/service-accounts/api-keys/manage-api-keys.html#add-an-api-key](/cloud/current/security/authenticate/workload-identities/service-accounts/api-keys/manage-api-keys.html#add-an-api-key). To learn more about authentication on Confluent Platform, see [Use HTTP Basic Authentication in Confluent Platform](/platform/current/security/authentication/http-basic-auth/overview.html). #### IMPORTANT If you are configuring this for Schema Registry or REST Proxy, you must prefix each parameter with `confluent.license`. For example, `sasl.mechanism` becomes `confluent.license.sasl.mechanism`. For additional information, see [Configure license clients to authenticate to Kafka](../../../installation/license.md#kafka-rest-and-sasl-ssl-configs). The new Producer and Consumer clients support security for Kafka versions 0.9.0 and higher. If you are using the Kafka Streams API, you can read about how to configure equivalent [SSL](/platform/current/clients/javadocs/javadoc/org/apache/kafka/common/config/SslConfigs.html) and [SASL](/platform/current/clients/javadocs/javadoc/org/apache/kafka/common/config/SaslConfigs.html) parameters. In the following configuration example, the underlying assumption is that client authentication is required by the broker, so the client settings are stored in a properties file named `client-ssl.properties`. Because this file stores passwords directly, it is important to restrict access to it using file system permissions. ```bash bootstrap.servers=kafka1:9093 security.protocol=SSL ssl.truststore.location=/var/private/ssl/kafka.client.truststore.jks ssl.truststore.password=test1234 ssl.keystore.location=/var/private/ssl/kafka.client.keystore.jks ssl.keystore.password=test1234 ssl.key.password=test1234 ``` Note that `ssl.truststore.password` is technically optional, but strongly recommended. If a password is not set, access to the truststore is still available, but integrity checking is disabled. 
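As an illustration of how an application client might reuse this file, the following minimal sketch (the class name, the topic name `test`, and the relative path to `client-ssl.properties` are assumptions for this example, not part of the official examples) loads the same properties straight into a Java producer:

```java
import java.io.FileInputStream;
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class SslClientExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Load the same TLS settings used by the console clients.
        try (FileInputStream in = new FileInputStream("client-ssl.properties")) {
            props.load(in);
        }
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Send a single test record over the TLS-secured listener.
            producer.send(new ProducerRecord<>("test", "hello", "tls")).get();
        }
    }
}
```

The console tools that follow do the same thing by passing the file with `--producer.config` and `--consumer.config`.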
The following examples use `kafka-console-producer` and `kafka-console-consumer`, and pass in the `client-ssl.properties` defined above: ```bash kafka-console-producer --bootstrap-server kafka1:9093 --topic test --producer.config client-ssl.properties kafka-console-consumer --bootstrap-server kafka1:9093 --topic test --consumer.config client-ssl.properties --from-beginning ``` # API Reference for Confluent Cloud * [Confluent Cloud APIs](api.md) * [Kafka Admin and Produce REST APIs](kafka-rest/kafka-rest-cc.md) * [Using Base64 Encoded Data and Credentials](kafka-rest/kafka-rest-cc.md#using-base64-encoded-data-and-credentials) * [Use cases](kafka-rest/kafka-rest-cc.md#use-cases) * [Cluster Admin API](kafka-rest/kafka-rest-cc.md#cluster-admin-api) * [Principal IDs for ACLs](kafka-rest/kafka-rest-cc.md#principal-ids-for-acls) * [Produce API](kafka-rest/kafka-rest-cc.md#produce-api) * [Streaming mode (recommended)](kafka-rest/kafka-rest-cc.md#streaming-mode-recommended) * [Non-streaming mode (not recommended)](kafka-rest/kafka-rest-cc.md#non-streaming-mode-not-recommended) * [Producing a single record to a topic](kafka-rest/kafka-rest-cc.md#producing-a-single-record-to-a-topic) * [Producing a batch of records to a topic](kafka-rest/kafka-rest-cc.md#producing-a-batch-of-records-to-a-topic) * [Data payload specification](kafka-rest/kafka-rest-cc.md#data-payload-specification) * [Connection bias and request limits](kafka-rest/kafka-rest-cc.md#connection-bias-and-request-limits) * [Suggested Resources](kafka-rest/kafka-rest-cc.md#suggested-resources) * [Connect API](connectors/connect-api-section.md) * [Prerequisites](connectors/connect-api-section.md#prerequisites) * [Managed and Custom Connector API Examples](connectors/connect-api-section.md#managed-and-custom-connector-api-examples) * [Get a list of connectors](connectors/connect-api-section.md#get-a-list-of-connectors) * [Create a connector](connectors/connect-api-section.md#create-a-connector) * [Custom connector configuration payload](connectors/connect-api-section.md#custom-connector-configuration-payload) * [Raw JSON payload example](connectors/connect-api-section.md#raw-json-payload-example) * [JSON file payload example](connectors/connect-api-section.md#json-file-payload-example) * [List Java and Kafka runtimes](connectors/connect-api-section.md#list-java-and-ak-runtimes) * [Update (or create) a connector](connectors/connect-api-section.md#update-or-create-a-connector) * [Read a connector configuration](connectors/connect-api-section.md#read-a-connector-configuration) * [Migrate a connector configuration](connectors/connect-api-section.md#migrate-a-connector-configuration) * [Query a sink connector for metrics](connectors/connect-api-section.md#query-a-sink-connector-for-metrics) * [Delete a connector](connectors/connect-api-section.md#delete-a-connector) * [Fully-Managed and Custom Connector Plugin API Examples](connectors/connect-api-section.md#fully-managed-and-custom-connector-plugin-api-examples) * [Fully-managed connector plugin examples](connectors/connect-api-section.md#fully-managed-connector-plugin-examples) * [List fully-managed plugins](connectors/connect-api-section.md#list-fully-managed-plugins) * [Validate a fully-managed plugin](connectors/connect-api-section.md#validate-a-fully-managed-plugin) * [Custom Connector Plugin API examples](connectors/connect-api-section.md#custom-connector-plugin-api-examples) * [List custom plugins](connectors/connect-api-section.md#list-custom-plugins) * [Request a presigned 
URL](connectors/connect-api-section.md#request-a-presigned-url) * [Upload a custom plugin](connectors/connect-api-section.md#upload-a-custom-plugin) * [Create a custom plugin](connectors/connect-api-section.md#create-a-custom-plugin) * [Read a custom plugin](connectors/connect-api-section.md#read-a-custom-plugin) * [Update a custom plugin](connectors/connect-api-section.md#update-a-custom-plugin) * [Delete a custom plugin](connectors/connect-api-section.md#delete-a-custom-plugin) * [Custom Connector Plugin version API examples](connectors/connect-api-section.md#custom-connector-plugin-version-api-examples) * [Create a custom connector plugin version](connectors/connect-api-section.md#create-a-custom-connector-plugin-version) * [List custom connector plugin version](connectors/connect-api-section.md#list-custom-connector-plugin-version) * [Describe custom connector plugin version](connectors/connect-api-section.md#describe-custom-connector-plugin-version) * [Delete custom connector plugin version](connectors/connect-api-section.md#delete-custom-connector-plugin-version) * [Next Steps](connectors/connect-api-section.md#next-steps) * [Client APIs](client-api.md) * [C++ Client API](https://docs.confluent.io/platform/current/clients/api-docs/librdkafka.html) * [Python Client API](https://docs.confluent.io/platform/current/clients/api-docs/confluent-kafka-python.html) * [Go Client API](https://docs.confluent.io/platform/current/clients/api-docs/confluent-kafka-go.html) * [.NET Client API](https://docs.confluent.io/platform/current/clients/api-docs/confluent-kafka-dotnet.html) * [Provider Integration API](pi-api.md) * [Prerequisites](pi-api.md#prerequisites) * [Manage provider integration](pi-api.md#manage-provider-integration) * [List provider integrations](pi-api.md#list-provider-integrations) * [Register a provider integration](pi-api.md#register-a-provider-integration) * [Read a provider integration](pi-api.md#read-a-provider-integration) * [Delete a provider integration](pi-api.md#delete-a-provider-integration) * [Flink REST API](flink/operate-and-deploy/flink-rest-api.md) * [Prerequisites](flink/operate-and-deploy/flink-rest-api.md#prerequisites) * [Rate limits](flink/operate-and-deploy/flink-rest-api.md#rate-limits) * [Private networking endpoints](flink/operate-and-deploy/flink-rest-api.md#private-networking-endpoints) * [Generate a Flink API key](flink/operate-and-deploy/flink-rest-api.md#generate-a-af-api-key) * [Manage statements](flink/operate-and-deploy/flink-rest-api.md#manage-statements) * [Flink SQL statement schema](flink/operate-and-deploy/flink-rest-api.md#flink-sql-statement-schema) * [Submit a statement](flink/operate-and-deploy/flink-rest-api.md#submit-a-statement) * [Get a statement](flink/operate-and-deploy/flink-rest-api.md#get-a-statement) * [List statements](flink/operate-and-deploy/flink-rest-api.md#list-statements) * [Update metadata for a statement](flink/operate-and-deploy/flink-rest-api.md#update-metadata-for-a-statement) * [Delete a statement](flink/operate-and-deploy/flink-rest-api.md#delete-a-statement) * [Manage compute pools](flink/operate-and-deploy/flink-rest-api.md#manage-compute-pools) * [List Flink compute pools](flink/operate-and-deploy/flink-rest-api.md#list-af-compute-pools) * [Create a Flink compute pool](flink/operate-and-deploy/flink-rest-api.md#create-a-af-compute-pool) * [Read a Flink compute pool](flink/operate-and-deploy/flink-rest-api.md#read-a-af-compute-pool) * [Update a Flink compute 
pool](flink/operate-and-deploy/flink-rest-api.md#update-a-af-compute-pool) * [Delete a Flink compute pool](flink/operate-and-deploy/flink-rest-api.md#delete-a-af-compute-pool) * [List Flink regions](flink/operate-and-deploy/flink-rest-api.md#list-af-regions) * [Manage Flink artifacts](flink/operate-and-deploy/flink-rest-api.md#manage-af-artifacts) * [List Flink artifacts](flink/operate-and-deploy/flink-rest-api.md#list-af-artifacts) * [Create a Flink artifact](flink/operate-and-deploy/flink-rest-api.md#create-a-af-artifact) * [Read an artifact](flink/operate-and-deploy/flink-rest-api.md#read-an-artifact) * [Update an artifact](flink/operate-and-deploy/flink-rest-api.md#update-an-artifact) * [Delete an artifact](flink/operate-and-deploy/flink-rest-api.md#delete-an-artifact) * [Manage UDF logging](flink/operate-and-deploy/flink-rest-api.md#manage-udf-logging) * [Enable logging](flink/operate-and-deploy/flink-rest-api.md#enable-logging) * [List UDF logs](flink/operate-and-deploy/flink-rest-api.md#list-udf-logs) * [Disable a UDF log](flink/operate-and-deploy/flink-rest-api.md#disable-a-udf-log) * [View log details](flink/operate-and-deploy/flink-rest-api.md#view-log-details) * [Update the logging level for a UDF log](flink/operate-and-deploy/flink-rest-api.md#update-the-logging-level-for-a-udf-log) * [Manage connections](flink/operate-and-deploy/flink-rest-api.md#manage-connections) * [Create a connection](flink/operate-and-deploy/flink-rest-api.md#create-a-connection) * [Delete a connection](flink/operate-and-deploy/flink-rest-api.md#delete-a-connection) * [List connections](flink/operate-and-deploy/flink-rest-api.md#list-connections) * [Related content](flink/operate-and-deploy/flink-rest-api.md#related-content) * [Metrics API](https://api.telemetry.confluent.cloud/docs) * [Stream Catalog REST API Usage](stream-governance/stream-catalog-rest-apis.md) * [Catalog API usage examples](stream-governance/stream-catalog-rest-apis.md#catalog-api-usage-examples) * [What’s in this guide](stream-governance/stream-catalog-rest-apis.md#what-s-in-this-guide) * [What this guide doesn’t cover](stream-governance/stream-catalog-rest-apis.md#what-this-guide-doesn-t-cover) * [Catalog API usage limitations and best practices](stream-governance/stream-catalog-rest-apis.md#catalog-api-usage-limitations-and-best-practices) * [Character limits on tag and business metadata definitions](stream-governance/stream-catalog-rest-apis.md#character-limits-on-tag-and-business-metadata-definitions) * [Rate limits on searches](stream-governance/stream-catalog-rest-apis.md#rate-limits-on-searches) * [Limits on topic listings](stream-governance/stream-catalog-rest-apis.md#limits-on-topic-listings) * [Global sorting of search results](stream-governance/stream-catalog-rest-apis.md#global-sorting-of-search-results) * [Business metadata support on Unified Stream Manager entities](stream-governance/stream-catalog-rest-apis.md#business-metadata-support-on-usm-entities) * [Setup and suggestions](stream-governance/stream-catalog-rest-apis.md#setup-and-suggestions) * [OAuth for Confluent Cloud Stream Catalog REST API](stream-governance/stream-catalog-rest-apis.md#oauth-for-ccloud-sg-catalog-rest-api) * [Entity types](stream-governance/stream-catalog-rest-apis.md#entity-types) * [How entities are identified](stream-governance/stream-catalog-rest-apis.md#how-entities-are-identified) * [Qualified name definitions](stream-governance/stream-catalog-rest-apis.md#qualified-name-definitions) * [Examples of qualified 
names](stream-governance/stream-catalog-rest-apis.md#examples-of-qualified-names) * [Entity APIs](stream-governance/stream-catalog-rest-apis.md#entity-apis) * [Searching](stream-governance/stream-catalog-rest-apis.md#searching) * [Tags API examples](stream-governance/stream-catalog-rest-apis.md#tags-api-examples) * [Create a tag](stream-governance/stream-catalog-rest-apis.md#create-a-tag) * [Create a generic tag applicable to a topic or any entity](stream-governance/stream-catalog-rest-apis.md#create-a-generic-tag-applicable-to-a-topic-or-any-entity) * [List all tags (with definitions)](stream-governance/stream-catalog-rest-apis.md#list-all-tags-with-definitions) * [Get tag definition](stream-governance/stream-catalog-rest-apis.md#get-tag-definition) * [Search fields by name](stream-governance/stream-catalog-rest-apis.md#search-fields-by-name) * [Search fields by tag](stream-governance/stream-catalog-rest-apis.md#search-fields-by-tag) * [Search schema record by name](stream-governance/stream-catalog-rest-apis.md#search-schema-record-by-name) * [Search schema by tag](stream-governance/stream-catalog-rest-apis.md#search-schema-by-tag) * [Tag a field in Avro](stream-governance/stream-catalog-rest-apis.md#tag-a-field-in-avro) * [Get the tag attributes from a field](stream-governance/stream-catalog-rest-apis.md#get-the-tag-attributes-from-a-field) * [Tag a schema version](stream-governance/stream-catalog-rest-apis.md#tag-a-schema-version) * [Get schemas with a given a subject name prefix](stream-governance/stream-catalog-rest-apis.md#get-schemas-with-a-given-a-subject-name-prefix) * [Delete a tag](stream-governance/stream-catalog-rest-apis.md#delete-a-tag) * [Topics](stream-governance/stream-catalog-rest-apis.md#topics) * [List all topics](stream-governance/stream-catalog-rest-apis.md#list-all-topics) * [Search for topics by name](stream-governance/stream-catalog-rest-apis.md#search-for-topics-by-name) * [Search for topics by tag](stream-governance/stream-catalog-rest-apis.md#search-for-topics-by-tag) * [Tag a topic](stream-governance/stream-catalog-rest-apis.md#tag-a-topic) * [Add a topic owner and email](stream-governance/stream-catalog-rest-apis.md#add-a-topic-owner-and-email) * [Create a tag for a topic](stream-governance/stream-catalog-rest-apis.md#create-a-tag-for-a-topic) * [Connectors](stream-governance/stream-catalog-rest-apis.md#connectors) * [List all connectors](stream-governance/stream-catalog-rest-apis.md#list-all-connectors) * [Search for connectors by name](stream-governance/stream-catalog-rest-apis.md#search-for-connectors-by-name) * [Tag a connector](stream-governance/stream-catalog-rest-apis.md#tag-a-connector) * [Business metadata API examples](stream-governance/stream-catalog-rest-apis.md#business-metadata-api-examples) * [Create a schema](stream-governance/stream-catalog-rest-apis.md#create-a-schema) * [Create your first business metadata definition](stream-governance/stream-catalog-rest-apis.md#create-your-first-business-metadata-definition) * [Create a business metadata definition for a topic](stream-governance/stream-catalog-rest-apis.md#create-a-business-metadata-definition-for-a-topic) * [Get all the business metadata definitions created so far](stream-governance/stream-catalog-rest-apis.md#get-all-the-business-metadata-definitions-created-so-far) * [Get a specific business metadata definition by its name](stream-governance/stream-catalog-rest-apis.md#get-a-specific-business-metadata-definition-by-its-name) * [Update business metadata 
definitions](stream-governance/stream-catalog-rest-apis.md#update-business-metadata-definitions) * [Add business metadata to a schema-related entity](stream-governance/stream-catalog-rest-apis.md#add-business-metadata-to-a-schema-related-entity) * [Add business metadata to a topic](stream-governance/stream-catalog-rest-apis.md#add-business-metadata-to-a-topic) * [Add business metadata to a connector](stream-governance/stream-catalog-rest-apis.md#add-business-metadata-to-a-connector) * [Get business metadata associated with an instance of an entity](stream-governance/stream-catalog-rest-apis.md#get-business-metadata-associated-with-an-instance-of-an-entity) * [Update business metadata associated with entity](stream-governance/stream-catalog-rest-apis.md#update-business-metadata-associated-with-entity) * [Search for business metadata associated with entity](stream-governance/stream-catalog-rest-apis.md#search-for-business-metadata-associated-with-entity) * [Remove business metadata associated with an entity](stream-governance/stream-catalog-rest-apis.md#remove-business-metadata-associated-with-an-entity) * [Delete business metadata definitions](stream-governance/stream-catalog-rest-apis.md#delete-business-metadata-definitions) * [Related content](stream-governance/stream-catalog-rest-apis.md#related-content) * [GraphQL API](stream-governance/graphql.md) * [Overview](stream-governance/graphql.md#overview) * [What is it?](stream-governance/graphql.md#what-is-it) * [Why is it important?](stream-governance/graphql.md#why-is-it-important) * [When to use REST API and when to use GraphQL API](stream-governance/graphql.md#when-to-use-rest-api-and-when-to-use-graphql-api) * [Getting started](stream-governance/graphql.md#getting-started) * [GraphQL endpoint](stream-governance/graphql.md#graphql-endpoint) * [GraphQL schema](stream-governance/graphql.md#graphql-schema) * [Entity queries](stream-governance/graphql.md#entity-queries) * [Fetch list of entities](stream-governance/graphql.md#fetch-list-of-entities) * [Fetch nested entities using relationships](stream-governance/graphql.md#fetch-nested-entities-using-relationships) * [Filtering using the “where” argument](stream-governance/graphql.md#filtering-using-the-where-argument) * [Sort Using the “order_by” Argument](stream-governance/graphql.md#sort-using-the-order-by-argument) * [Pagination with the “limit” and “offset” Arguments](stream-governance/graphql.md#pagination-with-the-limit-and-offset-arguments) * [Filtering by tag with the “tags” argument](stream-governance/graphql.md#filtering-by-tag-with-the-tags-argument) * [Including deleted objects with the “deleted” argument](stream-governance/graphql.md#including-deleted-objects-with-the-deleted-argument) * [GraphQL API usage limitations and best practices](stream-governance/graphql.md#graphql-api-usage-limitations-and-best-practices) * [Global sorting of search results](stream-governance/graphql.md#global-sorting-of-search-results) * [API limits](stream-governance/graphql.md#api-limits) * [Query limits](stream-governance/graphql.md#query-limits) * [Time limits](stream-governance/graphql.md#time-limits) * [Rate limits](stream-governance/graphql.md#rate-limits) * [API reference](stream-governance/graphql.md#api-reference) * [Related content](stream-governance/graphql.md#related-content) * [Service Quotas API](quotas/quotas.md) * [Get an API key and secret](quotas/quotas.md#get-an-api-key-and-secret) * [Service Quotas API endpoints](quotas/quotas.md#quotas-api-endpoints) * [Query a Service Quotas 
endpoint](quotas/quotas.md#query-a-service-quotas-endpoint) * [Paged responses](quotas/quotas.md#paged-responses) * [RBAC Model](quotas/quotas.md#rbac-model) * [Example requests](quotas/quotas.md#example-requests) * [Filtering](quotas/quotas.md#filtering) * [Query for scopes](quotas/quotas.md#query-for-scopes) * [Organization quotas](quotas/quotas.md#organization-quotas) * [Max BYOK keys per organization](quotas/quotas.md#max-byok-keys-per-organization) * [Max API keys scoped for resource management for an organization](quotas/quotas.md#max-api-keys-scoped-for-resource-management-for-an-organization) * [Max environments for an organization](quotas/quotas.md#max-environments-for-an-organization) * [Max Kafka clusters for an organization](quotas/quotas.md#max-ak-clusters-for-an-organization) * [Max service accounts for an organization](quotas/quotas.md#max-service-accounts-for-an-organization) * [Max user accounts for an organization](quotas/quotas.md#max-user-accounts-for-an-organization) * [Max pending invitations for an organization](quotas/quotas.md#max-pending-invitations-for-an-organization) * [Max audit log consumer API keys per organization](quotas/quotas.md#max-audit-log-consumer-api-keys-per-organization) * [Max Kafka cluster provisioning requests per day](quotas/quotas.md#max-ak-cluster-provisioning-requests-per-day) * [Service account quotas](quotas/quotas.md#service-account-quotas) * [Maximum API keys scoped for resource management per service account](quotas/quotas.md#maximum-api-keys-scoped-for-resource-management-per-service-account) * [Max Kafka API keys per service account](quotas/quotas.md#max-ak-api-keys-per-service-account) * [User account quotas](quotas/quotas.md#user-account-quotas) * [Maximum API keys scoped for resource management per user](quotas/quotas.md#maximum-api-keys-scoped-for-resource-management-per-user) * [Max Cluster API keys per user](quotas/quotas.md#max-cluster-api-keys-per-user) * [Environment quotas](quotas/quotas.md#environment-quotas) * [Get your environment ID](quotas/quotas.md#get-your-environment-id) * [Max clusters for an environment](quotas/quotas.md#max-clusters-for-an-environment) * [Max cluster CKUs for an environment](quotas/quotas.md#max-cluster-ckus-for-an-environment) * [Max pending clusters for an environment](quotas/quotas.md#max-pending-clusters-for-an-environment) * [Max ksqlDB clusters for an environment](quotas/quotas.md#max-ksqldb-clusters-for-an-environment) * [Kafka cluster quotas](quotas/quotas.md#ak-cluster-quotas) * [Get the Kafka cluster ID](quotas/quotas.md#get-the-ak-cluster-id) * [Max API keys per Kafka cluster](quotas/quotas.md#max-api-keys-per-ak-cluster) * [Max private links per Kafka cluster](quotas/quotas.md#max-private-links-per-ak-cluster) * [Max peering connections per Kafka cluster](quotas/quotas.md#max-peering-connections-per-ak-cluster) * [Max CKUs per Kafka cluster](quotas/quotas.md#max-ckus-per-ak-cluster) * [Network quotas](quotas/quotas.md#network-quotas) * [Max peering connections per network](quotas/quotas.md#max-peering-connections-per-network) * [Max private link connections per network](quotas/quotas.md#max-private-link-connections-per-network) * [Related content](quotas/quotas.md#related-content) ## Can I access logs for Confluent Cloud services? 
Internal service logs for Confluent Cloud managed services (such as Kafka brokers, Schema Registry, and other infrastructure components) are not directly accessible to customers, but there are several tools and approaches to help you debug and monitor your streaming applications. **General monitoring and debugging tools:** The [Confluent Cloud Metrics API](monitoring/metrics-api.md#metrics-api) provides actionable operational metrics about your Confluent Cloud deployment. The Confluent Cloud Console shows cluster activity and usage relative to your cluster’s capacity. The Cloud Console also includes [topic management](topics/overview.md#cloud-topics-manage) and [consumer lag monitoring](monitoring/monitor-lag.md#cloud-monitoring-lag). [Build Streaming Applications](client-apps/index.md#ccloud-best-practices) details best practices for configuring, monitoring, and debugging Kafka clients. **Component-specific logging options:** For some Confluent Cloud services, specific logging and monitoring capabilities are available: - **Audit logs**: [Confluent Cloud audit logs](monitoring/audit-logging/cloud-audit-log-concepts.md#cloud-audit-logs) track administrative and data plane activities within your organization. - **Connector events**: [View connector events](connectors/logging-cloud-connectors.md#ccloud-connector-logging) to monitor and troubleshoot your connectors. - **Flink user-defined functions**: [Enable logging in Flink UDFs](flink/how-to-guides/enable-udf-logging.md#flink-sql-enable-udf-logging) for custom application debugging. - **ksqlDB processing logs**: Monitor ksqlDB application health using [ksqlDB processing logs](ksqldb/monitoring-ksqldb.md#cloud-ksql-monitor). For comprehensive monitoring guidance, see [Confluent Cloud Metrics](monitoring/metrics-api.md#metrics-api). ## Kafka * [PR-20633](https://github.com/apache/kafka/pull/20633) - KAFKA-19748: Fix metrics leak in Kafka Streams (#20633) * [PR-20618](https://github.com/apache/kafka/pull/20618) - KAFKA-19690 Add epoch check before verification guard check to prevent unexpected fatal error (#20618) * [PR-20583](https://github.com/apache/kafka/pull/20583) - [MINOR] Cleaning ignored streams test (#20583) * [PR-20604](https://github.com/apache/kafka/pull/20604) - KAFKA-19719 --no-initial-controllers should not assume kraft.version=1 (#20604) * [PR-19961](https://github.com/apache/kafka/pull/19961) - KAFKA-19390: Call safeForceUnmap() in AbstractIndex.resize() on Linux to prevent stale mmap of index files (#19961) * [PR-20591](https://github.com/apache/kafka/pull/20591) - KAFKA-19732, KAFKA-19716: Clear out coordinator snapshots periodically while loading (#20591) * [PR-20581](https://github.com/apache/kafka/pull/20581) - KAFKA-19546: Rebalance should be triggered by subscription change during group protocol downgrade (#20581) * [PR-20519](https://github.com/apache/kafka/pull/20519) - KAFKA-19695: Fix bug in redundant offset calculation.
(#20516) (#20519) * [PR-20512](https://github.com/apache/kafka/pull/20512) - KAFKA-19679: Fix NoSuchElementException in oldest open iterator metric (#20512) * [PR-20470](https://github.com/apache/kafka/pull/20470) - KAFKA-19668: processValue() must be declared as value-changing operation (#20470) * [13f70256](https://github.com/apache/kafka/commit/13f70256db3c994c590e5d262a7cc50b9e973204) - Bump version to 4.1.0 * [70dd1ca2](https://github.com/apache/kafka/commit/70dd1ca2cab81f78c68782659db1d8453b1de5d6) - Revert “Bump version to 4.1.0” * [PR-20405](https://github.com/apache/kafka/pull/20405) - KAFKA-19642 Replace dynamicPerBrokerConfigs with dynamicDefaultConfigs (#20405) * [PR-1777](https://github.com/confluentinc/kafka/pull/1777) - KSECURITY-2558: Bump jetty to version 12.0.25 in 4.1 * [PR-20070](https://github.com/apache/kafka/pull/20070) - KAFKA-19429: Deflake streams_smoke_test, again (#20070) * [PR-20398](https://github.com/apache/kafka/pull/20398) - Revert “KAFKA-13722: remove usage of old ProcessorContext (#18292)” (#20398) * [PR-1765](https://github.com/confluentinc/kafka/pull/1765) - DPA-1801 Add run_tags to worker-ami and aws-packer * [PR-1746](https://github.com/confluentinc/kafka/pull/1746) - Change ci_tools import path * [23b64404](https://github.com/apache/kafka/commit/23b64404ae7ba98d89a2d456991abaf2f32af35f) - Bump version to 4.1.0 * [6340f437](https://github.com/apache/kafka/commit/6340f437cd2d15be4180febb9505437266080002) - Revert “Bump version to 4.1.0” * [de16dd10](https://github.com/apache/kafka/commit/de16dd103af93bb68a329987ff19469941f85cbc) - KAFKA-19581: Temporary fix for Streams system tests * [PR-20269](https://github.com/apache/kafka/pull/20269) - KAFKA-19576 Fix typo in state-change log filename after rotate (#20269) * [PR-20274](https://github.com/apache/kafka/pull/20274) - KAFKA-19529: State updater sensor names should be unique (#20262) (#20274) * [PR-1708](https://github.com/confluentinc/kafka/pull/1708) - DPA-1675: In case of infra failure in ccs-kafka tag that as infra failure in testbreak * [PR-20165](https://github.com/apache/kafka/pull/20165) - KAFKA-19501 Update OpenJDK base image from buster to bullseye (#20165) * [e14d849c](https://github.com/apache/kafka/commit/e14d849cbf8836cc9e4a592342baf19a1fbd93c9) - Bump version to 4.1.0 * [PR-20200](https://github.com/apache/kafka/pull/20200) - KAFKA-19522: avoid electing fenced lastKnownLeader (#20200) * [PR-20196](https://github.com/apache/kafka/pull/20196) - KAFKA-19520 Bump Commons-Lang for CVE-2025-48924 (#20196) * [PR-20040](https://github.com/apache/kafka/pull/20040) - KAFKA-19427 Allow the coordinator to grow its buffer dynamically (#20040) * [PR-20166](https://github.com/apache/kafka/pull/20166) - KAFKA-19504: Remove unused metrics reporter initialization in KafkaAdminClient (#20166) * [PR-20151](https://github.com/apache/kafka/pull/20151) - KAFKA-19495: Update config for native image (v4.1.0) (#20151) * [610f0765](https://github.com/apache/kafka/commit/610f076542e1ac177c4b97ea7d6ca1335f9a3065) - Bump version to 4.1.0 * [PR-1684](https://github.com/confluentinc/kafka/pull/1684) - DPA-1489 migrate from vagrant to terraform * [PR-1693](https://github.com/confluentinc/kafka/pull/1693) - Revert “Temporarily disable artifact publishing for the 4.1 branch.” * [57e81f20](https://github.com/apache/kafka/commit/57e81f201055b58f94febf0509bfc8acba632854) - Bump version to 4.1.0 * [PR-20071](https://github.com/apache/kafka/pull/20071) - KAFKA-19184: Add documentation for upgrading the kraft version (#20071) * 
[PR-20116](https://github.com/apache/kafka/pull/20116) - KAFKA-19444: Add back JoinGroup v0 & v1 (#20116) * [PR-19964](https://github.com/apache/kafka/pull/19964) - KAFKA-19397: Ensure consistent metadata usage in produce request and response (#19964) * [PR-19971](https://github.com/apache/kafka/pull/19971) - KAFKA-19042 Move ProducerSendWhileDeletionTest to client-integration-tests module (#19971) * [PR-20100](https://github.com/apache/kafka/pull/20100) - KAFKA-19453: Ignore group not found in share group record replay (#20100) * [PR-20025](https://github.com/apache/kafka/pull/20025) - KAFKA-19152: Add top-level documentation for OAuth flows (#20025) * [PR-20029](https://github.com/apache/kafka/pull/20029) - KAFKA-19379: Basic upgrade guide for KIP-1071 EA (#20029) * [PR-20062](https://github.com/apache/kafka/pull/20062) - KAFKA-19445: Fix coordinator runtime metrics sharing sensors (#20062) * [PR-19704](https://github.com/apache/kafka/pull/19704) - KAFKA-19246; OffsetFetch API does not return group level errors correctly with version 1 (#19704) * [PR-19985](https://github.com/apache/kafka/pull/19985) - KAFKA-19414: Remove 2PC public APIs from 4.1 until release (KIP-939) (#19985) * [PR-1672](https://github.com/confluentinc/kafka/pull/1672) - DPA-1593 exclude newly added files to fix build * [PR-1663](https://github.com/confluentinc/kafka/pull/1663) - DPA-1593 add cloudwatch metrics to view cpu, memory and disk usage * [PR-20022](https://github.com/apache/kafka/pull/20022) - KAFKA-19398: (De)Register oldest-iterator-open-since-ms metric dynamically (#20022) * [PR-20033](https://github.com/apache/kafka/pull/20033) - KAFKA-19383: Handle the deleted topics when applying ClearElrRecord (#20033) * [PR-19745](https://github.com/apache/kafka/pull/19745) - KAFKA-19294: Fix BrokerLifecycleManager RPC timeouts (#19745) * [PR-19974](https://github.com/apache/kafka/pull/19974) - KAFKA-19411: Fix deleteAcls bug which allows more deletions than max records per user op (#19974) * [PR-19972](https://github.com/apache/kafka/pull/19972) - KAFKA-19407 Fix potential IllegalStateException when appending to timeIndex (#19972) * [PR-1659](https://github.com/confluentinc/kafka/pull/1659) - Reapply “KAFKA-18296 Remove deprecated KafkaBasedLog constructor (#18 * [PR-20019](https://github.com/apache/kafka/pull/20019) - KAFKA-19429: Deflake streams_smoke_test (#20019) * [PR-19999](https://github.com/apache/kafka/pull/19999) - KAFKA-19421: Deflake streams_broker_down_resilience_test (#19999) * [PR-20004](https://github.com/apache/kafka/pull/20004) - KAFKA-19422: Deflake streams_application_upgrade_test (#20004) * [PR-20005](https://github.com/apache/kafka/pull/20005) - KAFKA-19423: Deflake streams_broker_bounce_test (#20005) * [PR-19983](https://github.com/apache/kafka/pull/19983) - KAFKA-19356: Prevent new consumer fetch assigned partitions not in explicit subscription (#19983) * [PR-19917](https://github.com/apache/kafka/pull/19917) - KAFKA-19297: Refactor AsyncKafkaConsumer’s use of Java Streams APIs in critical sections (#19917) * [PR-19981](https://github.com/apache/kafka/pull/19981) - KAFKA-19413: Extended AuthorizerIntegrationTest to cover StreamsGroupDescribe (#19981) * [PR-19978](https://github.com/apache/kafka/pull/19978) - KAFKA-19412: Extended AuthorizerIntegrationTest to cover StreamsGroupHeartbeat (#19978) * [PR-19976](https://github.com/apache/kafka/pull/19976) - KAFKA-19367: Follow up bug fix (#19976) * [PR-19800](https://github.com/apache/kafka/pull/19800) - KAFKA-14145; Faster KRaft HWM replication 
(#19800) * [PR-1655](https://github.com/confluentinc/kafka/pull/1655) - Add back deprecated constructors in KafkaBasedLog * [PR-19938](https://github.com/apache/kafka/pull/19938) - KAFKA-19153: Add OAuth integration tests (#19938) * [PR-19910](https://github.com/apache/kafka/pull/19910) - KAFKA-19367: Fix InitProducerId with TV2 double-increments epoch if ongoing transaction is aborted (#19910) * [PR-19814](https://github.com/apache/kafka/pull/19814) - KAFKA-18117; KAFKA-18729: Use assigned topic IDs to avoid full metadata requests on broker-side regex (#19814) * [PR-19904](https://github.com/apache/kafka/pull/19904) - KAFKA-18961: Time-based refresh for server-side RE2J regex (#19904) * [PR-19939](https://github.com/apache/kafka/pull/19939) - KAFKA-19359: force bump commons-beanutils for CVE-2025-48734 (#19939) * [b311ac7d](https://github.com/apache/kafka/commit/b311ac7dd5bce649fd5bd83a948f95c8c468a9aa) - Temporarily disable artifact publishing for the 4.1 branch. * [PR-19607](https://github.com/apache/kafka/pull/19607) - KAFKA-19221 Propagate IOException on LogSegment#close (#19607) * [PR-19928](https://github.com/apache/kafka/pull/19928) - KAFKA-19389: Fix memory consumption for completed share fetch requests (#19928) * [PR-19895](https://github.com/apache/kafka/pull/19895) - KAFKA-19244: Add support for kafka-streams-groups.sh options (delete offsets) [4/N] (#19895) * [PR-19908](https://github.com/apache/kafka/pull/19908) - KAFKA-19376: Throw an error message if any unsupported feature is used with KIP-1071 (#19908) * [PR-19936](https://github.com/apache/kafka/pull/19936) - KAFKA-19392 Fix metadata.log.segment.ms not being applied (#19936) * [PR-19919](https://github.com/apache/kafka/pull/19919) - KAFKA-19382:Upgrade junit from 5.10 to 5.13 (#19919) * [PR-19929](https://github.com/apache/kafka/pull/19929) - KAFKA-18486 Remove becomeLeaderOrFollower from readFromLogWithOffsetOutOfRange and other related methods. 
(#19929) * [PR-19931](https://github.com/apache/kafka/pull/19931) - KAFKA-19283: Update transaction exception handling documentation (#19931) * [PR-19832](https://github.com/apache/kafka/pull/19832) - KAFKA-19271: allow intercepting internal method call (#19832) * [PR-19918](https://github.com/apache/kafka/pull/19918) - KAFKA-19386: Correcting ExpirationReaper thread names from Purgatory (#19918) * [PR-19817](https://github.com/apache/kafka/pull/19817) - KAFKA-19334 MetadataShell execution unintentionally deletes lock file (#19817) * [PR-19922](https://github.com/apache/kafka/pull/19922) - KAFKA-18486 Remove ReplicaManager#becomeLeaderOrFollower from testReplicaAlterLogDirs (#19922) * [PR-19827](https://github.com/apache/kafka/pull/19827) - KAFKA-19042 Move PlaintextConsumerSubscriptionTest to client-integration-tests module (#19827) * [PR-19883](https://github.com/apache/kafka/pull/19883) - KAFKA-18486 Update testExceptionWhenUnverifiedTransactionHasMultipleProducerIds (#19883) * [PR-19890](https://github.com/apache/kafka/pull/19890) - KAFKA-18486 Update activeProducerState wih KRaft mechanism in ReplicaManagerTest (#19890) * [PR-19879](https://github.com/apache/kafka/pull/19879) - KAFKA-14895 [1/N] Move AddPartitionsToTxnManager files to java (#19879) * [PR-19915](https://github.com/apache/kafka/pull/19915) - KAFKA-19295: Remove AsyncKafkaConsumer event ID generation (#19915) * [PR-19902](https://github.com/apache/kafka/pull/19902) - KAFKA-18202: Add rejection for non-zero sequences in TV2 (KIP-890) (#19902) * [PR-15913](https://github.com/apache/kafka/pull/15913) - KAFKA-19309 : Add transaction client template code in kafka examples (#15913) * [PR-19900](https://github.com/apache/kafka/pull/19900) - KAFKA-19369: Add group.share.assignors config and integration test (#19900) * [PR-19815](https://github.com/apache/kafka/pull/19815) - KAFKA-19290: Exploit mapKey optimisation in protocol requests and responses (wip) (#19815) * [PR-19889](https://github.com/apache/kafka/pull/19889) - KAFKA-18913: Start state updater in task manager (#19889) * [PR-19907](https://github.com/apache/kafka/pull/19907) - KAFKA-19370: Create JMH benchmark for share group assignor (#19907) * [PR-19773](https://github.com/apache/kafka/pull/19773) - KAFKA-19042 Move PlaintextConsumerAssignTest to clients-integration-tests module (#19773) * [PR-18739](https://github.com/apache/kafka/pull/18739) - KAFKA-16505: Add source raw key and value (#18739) * [PR-19903](https://github.com/apache/kafka/pull/19903) - KAFKA-19373 Fix protocol name comparison (#19903) * [PR-18325](https://github.com/apache/kafka/pull/18325) - KAFKA-19248: Multiversioning in Kafka Connect - Plugin Loading Isolation Tests (#18325) * [PR-19844](https://github.com/apache/kafka/pull/19844) - KAFKA-18042: Reject the produce request with lower producer epoch early (KIP-890) (#19844) * [PR-19901](https://github.com/apache/kafka/pull/19901) - KAFKA-19372: StreamsGroup not subscribed to a topic when empty (#19901) * [PR-19722](https://github.com/apache/kafka/pull/19722) - KAFKA-19044: Handle tasks that are not present in the current topology (#19722) * [PR-19856](https://github.com/apache/kafka/pull/19856) - KAFKA-17747: [7/N] Add consumer group integration test for rack aware assignment (#19856) * [PR-19898](https://github.com/apache/kafka/pull/19898) - KAFKA-19347 Deduplicate ACLs when creating (#19898) * [PR-19872](https://github.com/apache/kafka/pull/19872) - KAFKA-19328: SharePartitionManagerTest testMultipleConcurrentShareFetches doAnswer chaining needs 
verification (#19872) * [PR-19758](https://github.com/apache/kafka/pull/19758) - KAFKA-19244: Add support for kafka-streams-groups.sh options (delete all groups) [2/N] (#19758) * [PR-19754](https://github.com/apache/kafka/pull/19754) - KAFKA-18573: Add support for OAuth jwt-bearer grant type (#19754) * [PR-19656](https://github.com/apache/kafka/pull/19656) - KAFKA-19250 : txnProducer.abortTransaction() API should not return abortable exception (#19656) * [PR-19522](https://github.com/apache/kafka/pull/19522) - KAFKA-19176: Update Transactional producer to translate retriable into abortable exceptions (#19522) * [PR-19796](https://github.com/apache/kafka/pull/19796) - KAFKA-17747: [6/N] Replace subscription metadata with metadata hash in share group (#19796) * [PR-19802](https://github.com/apache/kafka/pull/19802) - KAFKA-17747: [5/N] Replace subscription metadata with metadata hash in stream group (#19802) * [PR-19861](https://github.com/apache/kafka/pull/19861) - KAFKA-19338: Error on read/write of uninitialized share part. (#19861) * [PR-19878](https://github.com/apache/kafka/pull/19878) - KAFKA-19358: Updated share_consumer_test.py tests to use set_group_offset_reset_strategy (#19878) * [PR-19849](https://github.com/apache/kafka/pull/19849) - KAFKA-19349 Move CreateTopicsRequestWithPolicyTest to clients-integration-tests (#19849) * [PR-19877](https://github.com/apache/kafka/pull/19877) - KAFKA-18904: [4/N] Add ListClientMetricsResources metric if request is v0 ListConfigResources (#19877) * [PR-19811](https://github.com/apache/kafka/pull/19811) - KAFKA-19320: Added share_consume_bench_test.py system tests (#19811) * [PR-19831](https://github.com/apache/kafka/pull/19831) - KAFKA-16894: share.version becomes stable feature for preview (#19831) * [PR-19836](https://github.com/apache/kafka/pull/19836) - KAFKA-19321: Added share_consumer_performance.py and related system tests (#19836) * [PR-19866](https://github.com/apache/kafka/pull/19866) - KAFKA-19355 Remove interBrokerListenerName from ClusterControlManager (#19866) * [PR-19327](https://github.com/apache/kafka/pull/19327) - KAFKA-19053 Remove FetchResponse#of which is not used in production … (#19327) * [PR-19728](https://github.com/apache/kafka/pull/19728) - KAFKA-19284 Add documentation to clarify the behavior of null values for all partitionsToOffsetAndMetadata methods. 
(#19728) * [PR-19864](https://github.com/apache/kafka/pull/19864) - KAFKA-19311 Document commitAsync behavioral differences between Classic and Async Consumer (#19864) * [PR-19685](https://github.com/apache/kafka/pull/19685) - KAFKA-19042 Move GroupAuthorizerIntegrationTest to clients-integration-tests module (#19685) * [PR-19651](https://github.com/apache/kafka/pull/19651) - KAFKA-19042 Move BaseConsumerTest, SaslPlainPlaintextConsumerTest to client-integration-tests module (#19651) * [PR-19846](https://github.com/apache/kafka/pull/19846) - KAFKA-19346: Move LogReadResult to server module (#19846) * [PR-19855](https://github.com/apache/kafka/pull/19855) - KAFKA-19351: AsyncConsumer#commitAsync should copy the input offsets (#19855) * [PR-19810](https://github.com/apache/kafka/pull/19810) - KAFKA-19042 move ConsumerWithLegacyMessageFormatIntegrationTest to clients-integration-tests module (#19810) * [PR-19808](https://github.com/apache/kafka/pull/19808) - KAFKA-18904: kafka-configs.sh return resource doesn’t exist message [3/N] (#19808) * [PR-19714](https://github.com/apache/kafka/pull/19714) - KAFKA-19082:[4/4] Complete Txn Client Side Changes (KIP-939) (#19714) * [PR-19404](https://github.com/apache/kafka/pull/19404) - KAFKA-6629: parameterise SegmentedCacheFunctionTest for session key schemas (#19404) * [PR-19843](https://github.com/apache/kafka/pull/19843) - KAFKA-19337: Write state writes snapshot for higher state epoch. (#19843) * [PR-19741](https://github.com/apache/kafka/pull/19741) - KAFKA-19056 Rewrite EndToEndClusterIdTest in Java and move it to the server module (#19741) * [PR-19774](https://github.com/apache/kafka/pull/19774) - KAFKA-19316: added share_group_command_test.py system tests (#19774) * [PR-19838](https://github.com/apache/kafka/pull/19838) - KAFKA-19344: Replace desc.assignablePartitions with spec.isPartitionAssignable. 
(#19838) * [PR-19840](https://github.com/apache/kafka/pull/19840) - KAFKA-19347 Don’t update timeline data structures in createAcls (#19840) * [PR-19826](https://github.com/apache/kafka/pull/19826) - KAFKA-19342: Authorization tests for alter share-group offsets (#19826) * [PR-19818](https://github.com/apache/kafka/pull/19818) - KAFKA-19335: Membership managers send negative epoch in JOINING (#19818) * [PR-19778](https://github.com/apache/kafka/pull/19778) - KAFKA-19285: Added more tests in SharePartitionManagerTest (#19778) * [PR-19823](https://github.com/apache/kafka/pull/19823) - KAFKA-19310: (MINOR) Missing mocks for DelayedShareFetchTest tests related to Memory Records slicing (#19823) * [PR-19761](https://github.com/apache/kafka/pull/19761) - KAFKA-17747: [4/N] Replace subscription metadata with metadata hash in consumer group (#19761) * [PR-19835](https://github.com/apache/kafka/pull/19835) - KAFKA-19336 Upgrade Jackson to 2.19.0 (#19835) * [PR-19744](https://github.com/apache/kafka/pull/19744) - KAFKA-19154; Offset Fetch API should return INVALID_OFFSET if requested topic id does not match persisted one (#19744) * [PR-19812](https://github.com/apache/kafka/pull/19812) - KAFKA-19330 Change MockSerializer/Deserializer to use String serializer instead of byte[] (#19812) * [PR-19790](https://github.com/apache/kafka/pull/19790) - KAFKA-18687: Setting the subscriptionMetadata during conversion to consumer group (#19790) * [PR-19786](https://github.com/apache/kafka/pull/19786) - KAFKA-19268 Missing mocks for SharePartitionManagerTest tests (#19786) * [PR-19798](https://github.com/apache/kafka/pull/19798) - KAFKA-19322 Remove the DelayedOperation constructor that accepts an external lock (#19798) * [PR-19779](https://github.com/apache/kafka/pull/19779) - KAFKA-19300 AsyncConsumer#unsubscribe always timeout due to GroupAuthorizationException (#19779) * [PR-19093](https://github.com/apache/kafka/pull/19093) - KAFKA-18424: Consider splitting PlaintextAdminIntegrationTest#testConsumerGroups (#19093) * [PR-19371](https://github.com/apache/kafka/pull/19371) - KAFKA-19080 The constraint on segment.ms is not enforced at topic level (#19371) * [PR-19681](https://github.com/apache/kafka/pull/19681) - KAFKA-19034 [1/N] Rewrite RemoteTopicCrudTest by ClusterTest and move it to storage module (#19681) * [PR-19759](https://github.com/apache/kafka/pull/19759) - KAFKA-19312 Avoiding concurrent execution of onComplete and tryComplete (#19759) * [PR-19767](https://github.com/apache/kafka/pull/19767) - KAFKA-19313 Replace LogOffsetMetadata#UNIFIED_LOG_UNKNOWN_OFFSET by UnifiedLog.UNKNOWN_OFFSET (#19767) * [PR-19747](https://github.com/apache/kafka/pull/19747) - KAFKA-18345; Wait the entire election timeout on election loss (#19747) * [PR-19687](https://github.com/apache/kafka/pull/19687) - KAFKA-19260 Move LoggingController to server module (#19687) * [PR-18929](https://github.com/apache/kafka/pull/18929) - KAFKA-16717 [2/N]: Add AdminClient.alterShareGroupOffsets (#18929) * [PR-19729](https://github.com/apache/kafka/pull/19729) - KAFKA-19069 DumpLogSegments does not dump the LEADER_CHANGE record (#19729) * [PR-19781](https://github.com/apache/kafka/pull/19781) - KAFKA-19204: Add timestamp to share state metadata init maps [1/N] (#19781) * [PR-19582](https://github.com/apache/kafka/pull/19582) - KAFKA-19042 Move PlaintextConsumerPollTest to client-integration-tests module (#19582) * [PR-19763](https://github.com/apache/kafka/pull/19763) - KAFKA-19314 Remove unnecessary code of closing snapshotWriter (#19763) 
* [PR-19743](https://github.com/apache/kafka/pull/19743) - KAFKA-18904: Add Admin#listConfigResources [2/N] (#19743) * [PR-19757](https://github.com/apache/kafka/pull/19757) - KAFKA-19291: Increase the timeout of remote storage share fetch requests in purgatory (#19757) * [PR-18951](https://github.com/apache/kafka/pull/18951) - KAFKA-4650: Add unit tests for GraphNode class (#18951) * [PR-19749](https://github.com/apache/kafka/pull/19749) - KAFKA-19287 document all group coordinator metrics (#19749) * [PR-19731](https://github.com/apache/kafka/pull/19731) - KAFKA-18783 : Extend InvalidConfigurationException related exceptions (#19731) * [PR-19658](https://github.com/apache/kafka/pull/19658) - KAFKA-18345; Prevent livelocked elections (#19658) * [PR-1627](https://github.com/confluentinc/kafka/pull/1627) - Trigger cp-jar-build to verify CP packaging in after_pipeline job * [PR-19755](https://github.com/apache/kafka/pull/19755) - KAFKA-19302 Move ReplicaState and Replica to server module (#19755) * [PR-19389](https://github.com/apache/kafka/pull/19389) - KAFKA-19042 Move PlaintextConsumerCommitTest to client-integration-tests module (#19389) * [PR-19611](https://github.com/apache/kafka/pull/19611) - KAFKA-17747: [3/N] Get rid of TopicMetadata in SubscribedTopicDescriberImpl (#19611) * [PR-19700](https://github.com/apache/kafka/pull/19700) - KAFKA-19202: Enable KIP-1071 in streams_eos_test (#19700) * [PR-19717](https://github.com/apache/kafka/pull/19717) - KAFKA-19280: Fix NoSuchElementException in UnifiedLog (#19717) * [PR-19691](https://github.com/apache/kafka/pull/19691) - KAFKA-19256: Only send IQ metadata on assignment changes (#19691) * [PR-19708](https://github.com/apache/kafka/pull/19708) - KAFKA-19226: Added test_console_share_consumer.py (#19708) * [PR-19683](https://github.com/apache/kafka/pull/19683) - KAFKA-19141; Persist topic id in OffsetCommit record (#19683) * [PR-19697](https://github.com/apache/kafka/pull/19697) - KAFKA-19271: Add internal ConsumerWrapper (#19697) * [PR-1625](https://github.com/confluentinc/kafka/pull/1625) - Increase timeout for Connect tests * [PR-19734](https://github.com/apache/kafka/pull/19734) - KAFKA-19217: Fix ShareConsumerTest.testComplexConsumer flakiness. (#19734) * [PR-19507](https://github.com/apache/kafka/pull/19507) - KAFKA-19171: Kafka Streams crashes with UnsupportedOperationException (#19507) * [PR-19709](https://github.com/apache/kafka/pull/19709) - KAFKA-19267 the min version used by ListOffsetsRequest should be 1 rather than 0 (#19709) * [PR-19580](https://github.com/apache/kafka/pull/19580) - KAFKA-19208: KStream-GlobalKTable join should not drop left-null-key record (#19580) * [PR-19493](https://github.com/apache/kafka/pull/19493) - KAFKA-18904: [1/N] Change ListClientMetricsResources API to ListConfigResources (#19493) * [PR-19713](https://github.com/apache/kafka/pull/19713) - KAFKA-19274; Group Coordinator Shards are not unloaded when \_\_consumer_offsets topic is deleted (#19713) * [PR-19701](https://github.com/apache/kafka/pull/19701) - KAFKA-19231-1: Handle fetch request when share session cache is full (#19701) * [PR-19721](https://github.com/apache/kafka/pull/19721) - KAFKA-19281: Add share enable flag to periodic jobs. (#19721) * [PR-19523](https://github.com/apache/kafka/pull/19523) - KAFKA-17747: [2/N] Add compute topic and group hash (#19523) * [PR-19698](https://github.com/apache/kafka/pull/19698) - KAFKA-19269 Unexpected error .. 
should not happen when the delete.topic.enable is false (#19698) * [PR-19718](https://github.com/apache/kafka/pull/19718) - KAFKA-19270: Remove Optional from ClusterInstance#controllerListenerName() return type (#19718) * [PR-19539](https://github.com/apache/kafka/pull/19539) - KAFKA-19082:[3/4] Add prepare txn method (KIP-939) (#19539) * [PR-19586](https://github.com/apache/kafka/pull/19586) - KAFKA-18666: Controller-side monitoring for broker shutdown and startup (#19586) * [PR-19635](https://github.com/apache/kafka/pull/19635) - KAFKA-19234: broker should return UNAUTHORIZATION error for non-existing topic in produce request (#19635) * [PR-19702](https://github.com/apache/kafka/pull/19702) - KAFKA-19273 Ensure the delete policy is configured when the tiered storage is enabled (#19702) * [PR-19553](https://github.com/apache/kafka/pull/19553) - KAFKA-19091 Fix race condition in DelayedFutureTest (#19553) * [PR-19666](https://github.com/apache/kafka/pull/19666) - KAFKA-19116, KAFKA-19258: Handling share group member change events (#19666) * [PR-19569](https://github.com/apache/kafka/pull/19569) - KAFKA-19206 ConsumerNetworkThread.cleanup() throws NullPointerException if initializeResources() previously failed (#19569) * [PR-19712](https://github.com/apache/kafka/pull/19712) - KAFKA-19275 client-state and thread-state metrics are always “Unavailable” (#19712) * [PR-19630](https://github.com/apache/kafka/pull/19630) - KAFKA-19145 Move LeaderEndPoint to Server module (#19630) * [PR-19622](https://github.com/apache/kafka/pull/19622) - KAFKA-18847: Refactor OAuth layer to improve reusability 1/N (#19622) * [PR-19677](https://github.com/apache/kafka/pull/19677) - KAFKA-18688: Fix uniform homogeneous assignor stability (#19677) * [PR-19659](https://github.com/apache/kafka/pull/19659) - KAFKA-19253: Improve metadata handling for share version using feature listeners (1/N) (#19659) * [PR-19559](https://github.com/apache/kafka/pull/19559) - KAFKA-19201: Handle deletion of user topics part of share partitions. (#19559) * [PR-19515](https://github.com/apache/kafka/pull/19515) - KAFKA-14691; Add TopicId to OffsetFetch API (#19515) * [PR-19705](https://github.com/apache/kafka/pull/19705) - KAFKA-19245: Updated default locks config for share group (#19705) * [PR-19496](https://github.com/apache/kafka/pull/19496) - KAFKA-19163: Avoid deleting groups with pending transactional offsets (#19496) * [PR-1554](https://github.com/confluentinc/kafka/pull/1554) - Chore: update repo by service bot * [PR-19644](https://github.com/apache/kafka/pull/19644) - KAFKA-18905; Disable idempotent producer to remove test flakiness (#19644) * [PR-19631](https://github.com/apache/kafka/pull/19631) - KAFKA-19242: Fix commit bugs caused by race condition during rebalancing. 
(#19631) * [PR-19497](https://github.com/apache/kafka/pull/19497) - KAFKA-19160;KAFKA-19164; Improve performance of fetching stable offsets (#19497) * [PR-19633](https://github.com/apache/kafka/pull/19633) - KAFKA-18695 Remove quorum=kraft and kip932 from all integration tests (#19633) * [PR-19673](https://github.com/apache/kafka/pull/19673) - KAFKA-19264 Remove fallback for thread pool sizes in RemoteLogManagerConfig (#19673) * [PR-19346](https://github.com/apache/kafka/pull/19346) - KAFKA-19068 Eliminate the duplicate type check in creating ControlRecord (#19346) * [PR-19543](https://github.com/apache/kafka/pull/19543) - KAFKA-19109 Don’t print null in kafka-metadata-quorum describe status (#19543) * [PR-19650](https://github.com/apache/kafka/pull/19650) - KAFKA-19220 Add tests to ensure the internal configs don’t return by public APIs by default (#19650) * [PR-1623](https://github.com/confluentinc/kafka/pull/1623) - KBROKER-295: Ignore failing quota_test * [PR-1622](https://github.com/confluentinc/kafka/pull/1622) - KBROKER-295: Ignore failing quota_test * [PR-19508](https://github.com/apache/kafka/pull/19508) - KAFKA-17897: Deprecate Admin.listConsumerGroups [2/N] (#19508) * [PR-19657](https://github.com/apache/kafka/pull/19657) - KAFKA-19209: Clarify index.interval.bytes impact on offset and time index (#19657) * [PR-18391](https://github.com/apache/kafka/pull/18391) - KAFKA-18115; Fix for loading big files while performing load tests (#18391) * [PR-19608](https://github.com/apache/kafka/pull/19608) - KAFKA-19182 Move SchedulerTest to server module (#19608) * [PR-19568](https://github.com/apache/kafka/pull/19568) - KAFKA-19087 Move TransactionState to transaction-coordinator module (#19568) * [PR-19581](https://github.com/apache/kafka/pull/19581) - KAFKA-18855 Slice API for MemoryRecords (#19581) * [PR-19590](https://github.com/apache/kafka/pull/19590) - KAFKA-19212: Correct the unclean leader election metric calculation (#19590) * [PR-19609](https://github.com/apache/kafka/pull/19609) - KAFKA-19214: Clean up use of Optionals in RequestManagers.entries() (#19609) * [PR-19640](https://github.com/apache/kafka/pull/19640) - KAFKA-19241: Updated tests in ShareFetchAcknowledgeRequestTest to reuse the socket for subsequent requests (#19640) * [PR-19598](https://github.com/apache/kafka/pull/19598) - KAFKA-19215: Handle share partition fetch lock cleanly using tokens (#19598) * [PR-19625](https://github.com/apache/kafka/pull/19625) - KAFKA-19202: Enable KIP-1071 in streams_standby_replica_test.py (#19625) * [PR-19602](https://github.com/apache/kafka/pull/19602) - KAFKA-19218: Add missing leader epoch to share group state summary response (#19602) * [PR-19574](https://github.com/apache/kafka/pull/19574) - KAFKA-19207 Move ForwardingManagerMetrics and ForwardingManagerMetricsTest to server module (#19574) * [PR-19528](https://github.com/apache/kafka/pull/19528) - KAFKA-19170 Move MetricsDuringTopicCreationDeletionTest to client-integration-tests module (#19528) * [PR-19612](https://github.com/apache/kafka/pull/19612) - KAFKA-19227: Piggybacked share fetch acknowledgements performance issue (#19612) * [PR-19639](https://github.com/apache/kafka/pull/19639) - KAFKA-19216: Eliminate flakiness in kafka.server.share.SharePartitionTest (#19639) * [PR-19592](https://github.com/apache/kafka/pull/19592) - KAFKA-19133: Support fetching for multiple remote fetch topic partitions in a single share fetch request (#19592) * [PR-19641](https://github.com/apache/kafka/pull/19641) - KAFKA-19240 Move 
MetadataVersionIntegrationTest to clients-integration-tests module (#19641) * [PR-19619](https://github.com/apache/kafka/pull/19619) - KAFKA-19232: Handle Share session limit reached exception in clients. (#19619) * [PR-19629](https://github.com/apache/kafka/pull/19629) - KAFKA-19131: Adjust remote storage reader thread maximum pool size to avoid illegal argument (#19629) * [PR-19393](https://github.com/apache/kafka/pull/19393) - KAFKA-19060 Documented null edge cases in the Clients API JavaDoc (#19393) * [PR-19578](https://github.com/apache/kafka/pull/19578) - KAFKA-19205: inconsistent result of beginningOffsets/endoffset between classic and async consumer with 0 timeout (#19578) * [PR-19571](https://github.com/apache/kafka/pull/19571) - KAFKA-18267 Add unit tests for CloseOptions (#19571) * [PR-19603](https://github.com/apache/kafka/pull/19603) - KAFKA-19204: Allow persister retry of initializing topics. (#19603) * [PR-1621](https://github.com/confluentinc/kafka/pull/1621) - Dexcom fix master * [PR-1620](https://github.com/confluentinc/kafka/pull/1620) - Dexcom fix 4.0 * [PR-19475](https://github.com/apache/kafka/pull/19475) - KAFKA-19146 Merge OffsetAndEpoch from raft to server-common (#19475) * [PR-19606](https://github.com/apache/kafka/pull/19606) - KAFKA-16894 Correct definition of ShareVersion (#19606) * [PR-19355](https://github.com/apache/kafka/pull/19355) - KAFKA-19073 add transactional ID pattern filter to ListTransactions (#19355) * [PR-19430](https://github.com/apache/kafka/pull/19430) - KAFKA-17541:[1/2] Improve handling of delivery count (#19430) * [PR-19329](https://github.com/apache/kafka/pull/19329) - KAFKA-19015: Remove share session from cache on share consumer connection drop (#19329) * [PR-19540](https://github.com/apache/kafka/pull/19540) - KAFKA-19169: Enhance AuthorizerIntegrationTest for share group APIs (#19540) * [PR-19604](https://github.com/apache/kafka/pull/19604) - KAFKA-19202: Enable KIP-1071 in streams_relational_smoke_test (#19604) * [PR-19587](https://github.com/apache/kafka/pull/19587) - KAFKA-16718-4/n: ShareGroupCommand changes for DeleteShareGroupOffsets admin call (#19587) * [PR-19601](https://github.com/apache/kafka/pull/19601) - KAFKA-19210: resolved the flakiness in testShareGroupHeartbeatInitializeOnPartitionUpdate (#19601) * [PR-19542](https://github.com/apache/kafka/pull/19542) - KAFKA-16894: Exploit share feature [3/N] (#19542) * [PR-19594](https://github.com/apache/kafka/pull/19594) - KAFKA-19202: Enable KIP-1071 in streams_broker_down_resilience_test (#19594) * [PR-19509](https://github.com/apache/kafka/pull/19509) - KAFKA-19173: Add Feature for “streams” group (#19509) * [PR-19191](https://github.com/apache/kafka/pull/19191) - KAFKA-18760: Deprecate Optional and return String from public Endpoint#listener (#19191) * [PR-19519](https://github.com/apache/kafka/pull/19519) - KAFKA-19139 Plugin#wrapInstance should use LinkedHashMap instead of Map (#19519) * [PR-19588](https://github.com/apache/kafka/pull/19588) - KAFKA-19135 Migrate initial IQ support for KIP-1071 from feature branch to trunk (#19588) * [PR-15968](https://github.com/apache/kafka/pull/15968) - KAFKA-10551: Add topic id support to produce request and response (#15968) * [PR-19470](https://github.com/apache/kafka/pull/19470) - KAFKA-19082: [2/4] Add preparedTxnState class to Kafka Producer (KIP-939) (#19470) * [PR-19584](https://github.com/apache/kafka/pull/19584) - KAFKA-19202: Enable KIP-1071 in streams_broker_bounce_test.py (#19584) * 
[PR-19593](https://github.com/apache/kafka/pull/19593) - KAFKA-19181-2: Increased offsets.commit.timeout.ms value as a temporary solution for the system test test_broker_failure failure (#19593) * [PR-19560](https://github.com/apache/kafka/pull/19560) - KAFKA-19202: Enable KIP-1071 in streams_smoke_test.py (#19560) * [PR-19555](https://github.com/apache/kafka/pull/19555) - KAFKA-19195: Only send the right group ID subset to each GC shard (#19555) * [PR-19535](https://github.com/apache/kafka/pull/19535) - KAFKA-19183 Replace Pool with ConcurrentHashMap (#19535) * [PR-19529](https://github.com/apache/kafka/pull/19529) - KAFKA-19178 Replace Vector by ArrayList for PluginClassLoader#getResources (#19529) * [PR-19478](https://github.com/apache/kafka/pull/19478) - KAFKA-16718-3/n: Added the ShareGroupStatePartitionMetadata record during deletion of share group offsets (#19478) * [PR-19520](https://github.com/apache/kafka/pull/19520) - KAFKA-19042 Move PlaintextConsumerFetchTest to client-integration-tests module (#19520) * [PR-19532](https://github.com/apache/kafka/pull/19532) - KAFKA-19131: Adjust remote storage reader thread maximum pool size to avoid illegal argument (#19532) * [PR-19504](https://github.com/apache/kafka/pull/19504) - KAFKA-17747: [1/N] Add MetadataHash field to Consumer/Share/StreamGroupMetadataValue (#19504) * [PR-19544](https://github.com/apache/kafka/pull/19544) - KAFKA-19190: Handle shutdown application correctly (#19544) * [PR-19552](https://github.com/apache/kafka/pull/19552) - KAFKA-19198: Resolve NPE when topic assigned in share group is deleted (#19552) * [PR-19548](https://github.com/apache/kafka/pull/19548) - KAFKA-19195: Only send the right group ID subset to each GC shard (#19548) * [PR-19450](https://github.com/apache/kafka/pull/19450) - KAFKA-19128: Kafka Streams should not get offsets when close dirty (#19450) * [PR-19545](https://github.com/apache/kafka/pull/19545) - KAFKA-19192; Old bootstrap checkpoint files cause problems for updated servers (#19545) * [PR-17988](https://github.com/apache/kafka/pull/17988) - KAFKA-18988: Connect Multiversion Support (Updates to status and metrics) (#17988) * [PR-19429](https://github.com/apache/kafka/pull/19429) - KAFKA-19082: [1/4] Add client config for enable2PC and overloaded initProducerId (KIP-939) (#19429) * [PR-19536](https://github.com/apache/kafka/pull/19536) - KAFKA-18889: Make records in ShareFetchResponse non-nullable (#19536) * [PR-19457](https://github.com/apache/kafka/pull/19457) - KAFKA-19110: Add missing unit test for Streams-consumer integration (#19457) * [PR-19440](https://github.com/apache/kafka/pull/19440) - KAFKA-15767 Refactor TransactionManager to avoid use of ThreadLocal (#19440) * [PR-19453](https://github.com/apache/kafka/pull/19453) - KAFKA-19124: Follow up on code improvements (#19453) * [PR-19443](https://github.com/apache/kafka/pull/19443) - KAFKA-18170: Add scheduled job to snapshot cold share partitions. 
(#19443) * [PR-19505](https://github.com/apache/kafka/pull/19505) - KAFKA-19156: Streamlined share group configs, with usage in ShareSessionCache (#19505) * [PR-19541](https://github.com/apache/kafka/pull/19541) - KAFKA-19181: removed assertions in test_share_multiple_partitions as a result of change in assignor algorithm (#19541) * [PR-19461](https://github.com/apache/kafka/pull/19461) - KAFKA-14690; Add TopicId to OffsetCommit API (#19461) * [PR-19416](https://github.com/apache/kafka/pull/19416) - KAFKA-16538; Enable upgrading kraft version for existing clusters (#19416) * [PR-18673](https://github.com/apache/kafka/pull/18673) - KAFKA-18572: Update Kafka Streams metric documentation (#18673) * [PR-19500](https://github.com/apache/kafka/pull/19500) - KAFKA-19159: Removed time based evictions for share sessions (#19500) * [PR-19378](https://github.com/apache/kafka/pull/19378) - KAFKA-19057: Stabilize KIP-932 RPCs for AK 4.1 (#19378) * [PR-19518](https://github.com/apache/kafka/pull/19518) - KAFKA-19166: Fix RC tag in release script (#19518) * [PR-19525](https://github.com/apache/kafka/pull/19525) - KAFKA-19179: remove the dot from thread_dump_url (#19525) * [PR-19437](https://github.com/apache/kafka/pull/19437) - KAFKA-19019: Add support for remote storage fetch for share groups (#19437) * [PR-17099](https://github.com/apache/kafka/pull/17099) - KAFKA-8830 make Record Headers available in onAcknowledgement (#17099) * [PR-19526](https://github.com/apache/kafka/pull/19526) - KAFKA-19180 Fix the hanging testPendingTaskSize (#19526) * [PR-19302](https://github.com/apache/kafka/pull/19302) - KAFKA-14487: Move LogManager static methods/fields to storage module (#19302) * [PR-19487](https://github.com/apache/kafka/pull/19487) - KAFKA-18854 remove DynamicConfig inner class (#19487) * [PR-19286](https://github.com/apache/kafka/pull/19286) - KAFKA-18891: Add KIP-877 support to RemoteLogMetadataManager and RemoteStorageManager (#19286) * [PR-19462](https://github.com/apache/kafka/pull/19462) - KAFKA-17184: Fix the error thrown while accessing the RemoteIndexCache (#19462) * [PR-19477](https://github.com/apache/kafka/pull/19477) - KAFKA-17897 Deprecate Admin.listConsumerGroups (#19477) * [PR-18926](https://github.com/apache/kafka/pull/18926) - KAFKA-18332 fix ClassDataAbstractionCoupling problem in KafkaRaftClientTest(1/2) (#18926) * [PR-19465](https://github.com/apache/kafka/pull/19465) - KAFKA-19136 Move metadata-related configs from KRaftConfigs to MetadataLogConfig (#19465) * [PR-19503](https://github.com/apache/kafka/pull/19503) - KAFKA-19157: added group.share.max.share.sessions config (#19503) * [PR-1614](https://github.com/confluentinc/kafka/pull/1614) - CONFLUENT: Fix tools-log4j files in the scripts * [PR-1613](https://github.com/confluentinc/kafka/pull/1613) - CONFLUENT: Fix tools-log4j files names in the scripts * [PR-19474](https://github.com/apache/kafka/pull/19474) - KAFKA-14523: Move kafka.log.remote classes to storage (#19474) * [PR-19491](https://github.com/apache/kafka/pull/19491) - KAFKA-19162: Topology metadata contains non-deterministically ordered topic configs (#19491) * [PR-19394](https://github.com/apache/kafka/pull/19394) - KAFKA-19054: StreamThread exception handling with SHUTDOWN_APPLICATION may trigger a tight loop with MANY logs (#19394) * [PR-19454](https://github.com/apache/kafka/pull/19454) - KAFKA-19130: Do not add fenced brokers to BrokerRegistrationTracker on startup (#19454) * [PR-19460](https://github.com/apache/kafka/pull/19460) - KAFKA-19002 Rewrite 
ListOffsetsIntegrationTest and move it to clients-integration-test (#19460) * [PR-19492](https://github.com/apache/kafka/pull/19492) - KAFKA-19158: Add SHARE_SESSION_LIMIT_REACHED error code (#19492) * [PR-19488](https://github.com/apache/kafka/pull/19488) - KAFKA-19147: Start authorizer before group coordinator to ensure coordinator authorizes regex topics (#19488) * [PR-19298](https://github.com/apache/kafka/pull/19298) - KAFKA-19042 Move PlaintextConsumerCallbackTest to client-integration-tests module (#19298) * [PR-19472](https://github.com/apache/kafka/pull/19472) - KAFKA-13610: Deprecate log.cleaner.enable configuration (#19472) * [PR-19050](https://github.com/apache/kafka/pull/19050) - KAFKA-18888: Add KIP-877 support to Authorizer (#19050) * [PR-19420](https://github.com/apache/kafka/pull/19420) - KAFKA-18983 Ensure all README.md(s) are mentioned by the root README.md (#19420) * [PR-19433](https://github.com/apache/kafka/pull/19433) - KAFKA-18288: Add support kafka-streams-groups.sh --describe (#19433) * [PR-19464](https://github.com/apache/kafka/pull/19464) - KAFKA-19137 Use StandardCharsets.UTF_8 instead of StandardCharsets.UTF_8.name() (#19464) * [PR-19364](https://github.com/apache/kafka/pull/19364) - KAFKA-15370: ACL changes to support 2PC (KIP-939) (#19364) * [PR-19417](https://github.com/apache/kafka/pull/19417) - KAFKA-18900: Implement share.acknowledgement.mode to choose acknowledgement mode (#19417) * [PR-19463](https://github.com/apache/kafka/pull/19463) - KAFKA-18629: Account for existing deleting topics in share group delete. (#19463) * [PR-19319](https://github.com/apache/kafka/pull/19319) - KAFKA-19042 Move ProducerCompressionTest, ProducerFailureHandlingTest, and ProducerIdExpirationTest to client-integration-tests module (#19319) * [PR-19391](https://github.com/apache/kafka/pull/19391) - KAFKA-14523: Decouple RemoteLogManager and Partition (#19391) * [PR-19469](https://github.com/apache/kafka/pull/19469) - KAFKA-18172 Move RemoteIndexCacheTest to the storage module (#19469) * [PR-19426](https://github.com/apache/kafka/pull/19426) - KAFKA-19119 Move ApiVersionManager/SimpleApiVersionManager to server (#19426) * [PR-19439](https://github.com/apache/kafka/pull/19439) - KAFKA-19121 Move AddPartitionsToTxnConfig and TransactionStateManagerConfig out of KafkaConfig (#19439) * [PR-19424](https://github.com/apache/kafka/pull/19424) - KAFKA-19113: Migrate DelegationTokenManager to server module (#19424) * [PR-19347](https://github.com/apache/kafka/pull/19347) - KAFKA-19027 Replace ConsumerGroupCommandTestUtils#generator by ClusterTestDefaults (#19347) * [PR-19431](https://github.com/apache/kafka/pull/19431) - KAFKA-19115: Utilize initialized topics info to verify delete share group offsets (#19431) * [PR-19419](https://github.com/apache/kafka/pull/19419) - KAFKA-15371 MetadataShell is stuck when bootstrapping (#19419) * [PR-19345](https://github.com/apache/kafka/pull/19345) - KAFKA-19071: Fix doc for remote.storage.enable (#19345) * [PR-19374](https://github.com/apache/kafka/pull/19374) - KAFKA-19030 Remove metricNamePrefix from RequestChannel (#19374) * [PR-19387](https://github.com/apache/kafka/pull/19387) - KAFKA-14485: Move LogCleaner to storage module (#19387) * [PR-19293](https://github.com/apache/kafka/pull/19293) - KAFKA-16894: Define feature to enable share groups (#19293) * [PR-19436](https://github.com/apache/kafka/pull/19436) - KAFKA-19127: Integration test for altering and describing streams group configs (#19436) * 
[PR-19441](https://github.com/apache/kafka/pull/19441) - KAFKA-19103 Remove OffsetConfig (#19441) * [PR-19438](https://github.com/apache/kafka/pull/19438) - KAFKA-19118: Enable KIP-1071 in StandbyTaskCreationIntegrationTest (#19438) * [PR-19423](https://github.com/apache/kafka/pull/19423) - KAFKA-18286: Implement support for streams groups in kafka-groups.sh (#19423) * [PR-19289](https://github.com/apache/kafka/pull/19289) - KAFKA-19042 Move TransactionsWithMaxInFlightOneTest to client-integration-tests module (#19289) * [PR-19410](https://github.com/apache/kafka/pull/19410) - KAFKA-19101 Remove ControllerMutationQuotaManager#throttleTimeMs unused parameter (#19410) * [PR-19354](https://github.com/apache/kafka/pull/19354) - KAFKA-18782: Extend ApplicationRecoverableException related exceptions (#19354) * [PR-19363](https://github.com/apache/kafka/pull/19363) - KAFKA-18629: Utilize share group partition metadata for delete group. (#19363) * [PR-19421](https://github.com/apache/kafka/pull/19421) - KAFKA-19124: Use consumer background event queue for Streams events (#19421) * [PR-19425](https://github.com/apache/kafka/pull/19425) - KAFKA-19118: Enable KIP-1071 in InternalTopicIntegrationTest (#19425) * [PR-19432](https://github.com/apache/kafka/pull/19432) - KAFKA-18170: Add create and write timestamp fields in share snapshot [1/N] (#19432) * [PR-19167](https://github.com/apache/kafka/pull/19167) - KAFKA-18935: Ensure brokers do not return null records in FetchResponse (#19167) * [PR-19261](https://github.com/apache/kafka/pull/19261) - KAFKA-16729: Support isolation level for share consumer (#19261) * [PR-19188](https://github.com/apache/kafka/pull/19188) - KAFKA-18962: Fix onBatchRestored call in GlobalStateManagerImpl (#19188) * [83f6a1d7](https://github.com/apache/kafka/commit/83f6a1d7e6dfce4a78e1192a8fecf523b39ddaab) - KAFKA-18991; Missing change for cherry-pick * [PR-19223](https://github.com/apache/kafka/pull/19223) - KAFKA-18991: FetcherThread should match leader epochs between fetch request and fetch state (#19223) * [PR-19422](https://github.com/apache/kafka/pull/19422) - KAFKA-18287: Add support for kafka-streams-groups.sh --list (#19422) * [PR-18852](https://github.com/apache/kafka/pull/18852) - KAFKA-18723; Better handle invalid records during replication (#18852) * [PR-19377](https://github.com/apache/kafka/pull/19377) - KAFKA-19037: Integrate consumer-side code with Streams (#19377) * [PR-1611](https://github.com/confluentinc/kafka/pull/1611) - Fix build failure (#1582) * [PR-19390](https://github.com/apache/kafka/pull/19390) - KAFKA-19090: Move DelayedFuture and DelayedFuturePurgatory to server module (#19390) * [PR-19213](https://github.com/apache/kafka/pull/19213) - KAFKA-18984: Reset interval.ms By Using kafka-client-metrics.sh (#19213) * [PR-18976](https://github.com/apache/kafka/pull/18976) - KAFKA-16718-2/n: KafkaAdminClient and GroupCoordinator implementation for DeleteShareGroupOffsets RPC (#18976) * [PR-19384](https://github.com/apache/kafka/pull/19384) - KAFKA-19093 Change the “Handler on Broker” to “Handler on Controller” for controller server (#19384) * [PR-19296](https://github.com/apache/kafka/pull/19296) - KAFKA-19047: Allow quickly re-registering brokers that are in controlled shutdown (#19296) * [PR-19413](https://github.com/apache/kafka/pull/19413) - KAFKA-19099 Remove GroupSyncKey, GroupJoinKey, and MemberKey (#19413) * [PR-19068](https://github.com/apache/kafka/pull/19068) - KAFKA-18892: Add KIP-877 support for ClientQuotaCallback (#19068) * 
[PR-19406](https://github.com/apache/kafka/pull/19406) - KAFKA-19100: Use ProcessRole instead of String in AclApis (#19406) * [PR-19398](https://github.com/apache/kafka/pull/19398) - KAFKA-19098 Remove lastOffset from PartitionResponse (#19398) * [PR-19359](https://github.com/apache/kafka/pull/19359) - KAFKA-19077: Propagate shutdownRequested field (#19359) * [PR-19219](https://github.com/apache/kafka/pull/19219) - KAFKA-19001: Use streams group-level configurations in heartbeat (#19219) * [PR-19369](https://github.com/apache/kafka/pull/19369) - KAFKA-19084: Port KAFKA-16224, KAFKA-16764 for ShareConsumers (#19369) * [PR-19392](https://github.com/apache/kafka/pull/19392) - KAFKA-19076 replace String by Supplier for UnifiedLog#maybeHandleIOException (#19392) * [PR-17614](https://github.com/apache/kafka/pull/17614) - KAFKA-16758: Extend Consumer#close with an option to leave the group or not (#17614) * [PR-19303](https://github.com/apache/kafka/pull/19303) - KAFKA-16407: Fix foreign key INNER join on change of FK from/to a null value (#19303) * [PR-19242](https://github.com/apache/kafka/pull/19242) - KAFKA-19013 Reformat PR body to 72 characters (#19242) * [PR-19288](https://github.com/apache/kafka/pull/19288) - KAFKA-19042 Move TransactionsExpirationTest to client-integration-tests module (#19288) * [PR-19357](https://github.com/apache/kafka/pull/19357) - KAFKA-19074 Remove the cached responseData from ShareFetchResponse (#19357) * [PR-19285](https://github.com/apache/kafka/pull/19285) - KAFKA-14523: Move DelayedRemoteListOffsets to the storage module (#19285) * [PR-19323](https://github.com/apache/kafka/pull/19323) - KAFKA-13747: refactor TopologyTest to test different store type parametrized (#19323) * [PR-19370](https://github.com/apache/kafka/pull/19370) - KAFKA-19085: SharePartitionManagerTest testMultipleConcurrentShareFetches throws silent exception and works incorrectly (#19370) * [PR-19218](https://github.com/apache/kafka/pull/19218) - KAFKA-7952: use in memory stores for KTable test (#19218) * [PR-19328](https://github.com/apache/kafka/pull/19328) - KAFKA-18761: [2/N] List share group offsets with state and auth (#19328) * [PR-19005](https://github.com/apache/kafka/pull/19005) - KAFKA-18713: Fix FK Left-Join result race condition (#19005) * [PR-19269](https://github.com/apache/kafka/pull/19269) - KAFKA-18067: Add a flag to disable producer reset during active task creator shutting down (#19269) * [PR-19320](https://github.com/apache/kafka/pull/19320) - KAFKA-19055 Cleanup the 0.10.x information from clients module (#19320) * [PR-19348](https://github.com/apache/kafka/pull/19348) - KAFKA-19075: Included other share group dynamic configs in extractShareGroupConfigMap method in ShareGroupConfig (#19348) * [PR-19333](https://github.com/apache/kafka/pull/19333) - KAFKA-19064: Handle exceptions from deferred events in coordinator (#19333) * [PR-19339](https://github.com/apache/kafka/pull/19339) - KAFKA-18827: Incorporate initializing topics in share group heartbeat [4/N] (#19339) * [PR-19111](https://github.com/apache/kafka/pull/19111) - KAFKA-18923: resource leak in RSM fetchIndex inputStream (#19111) * [PR-19317](https://github.com/apache/kafka/pull/19317) - KAFKA-18949 add consumer protocol to testDeleteRecordsAfterCorruptRecords (#19317) * [PR-19324](https://github.com/apache/kafka/pull/19324) - KAFKA-19058 Running the streams/streams-scala module tests produces a streams-scala.log (#19324) * [PR-19276](https://github.com/apache/kafka/pull/19276) - KAFKA-19003: Add 
forceTerminateTransaction command to CLI tools (#19276) * [PR-19226](https://github.com/apache/kafka/pull/19226) - KAFKA-19004 Move DelayedDeleteRecords to server-common module (#19226) * [PR-18953](https://github.com/apache/kafka/pull/18953) - KAFKA-18826: Add global thread metrics (#18953) * [PR-19343](https://github.com/apache/kafka/pull/19343) - KAFKA-19016: Updated the retention behaviour of share groups to retain them forever (#19343) * [PR-19344](https://github.com/apache/kafka/pull/19344) - KAFKA-19072: Add system test for ELR (#19344) * [PR-19331](https://github.com/apache/kafka/pull/19331) - KAFKA-15931: Cancel RemoteLogReader gracefully (#19331) * [PR-19338](https://github.com/apache/kafka/pull/19338) - KAFKA-18796-2: Corrected the check for acquisition lock timeout in Sh… (#19338) * [PR-19335](https://github.com/apache/kafka/pull/19335) - KAFKA-19062: Port changes from KAFKA-18645 to share-consumers (#19335) * [PR-19334](https://github.com/apache/kafka/pull/19334) - KAFKA-19018,KAFKA-19063: Implement maxRecords and acquisition lock timeout in share fetch request and response resp. (#19334) * [PR-18383](https://github.com/apache/kafka/pull/18383) - KAFKA-18613: Unit tests for usage of incorrect RPCs (#18383) * [PR-19189](https://github.com/apache/kafka/pull/19189) - KAFKA-18613: Improve test coverage for missing topics (#19189) * [PR-18510](https://github.com/apache/kafka/pull/18510) - KAFKA-18409: ShareGroupStateMessageFormatter should use CoordinatorRecordMessageFormatter (#18510) * [PR-19274](https://github.com/apache/kafka/pull/19274) - KAFKA-18959 increase the num_workers from 9 to 14 (#19274) * [PR-19283](https://github.com/apache/kafka/pull/19283) - KAFKA-19042 Move ConsumerTopicCreationTest to client-integration-tests module (#19283) * [PR-18297](https://github.com/apache/kafka/pull/18297) - KAFKA-16260: Deprecate window.size.ms and window.inner.class.serde in StreamsConfig (#18297) * [PR-19114](https://github.com/apache/kafka/pull/19114) - KAFKA-18613: Add StreamsGroupHeartbeat handler in the group coordinator (#19114) * [PR-19270](https://github.com/apache/kafka/pull/19270) - KAFKA-19032 Remove TestInfoUtils.TestWithParameterizedQuorumAndGroupProtocolNames (#19270) * [PR-19268](https://github.com/apache/kafka/pull/19268) - KAFKA-19005 improve the documentation of DescribeTopicsOptions#partitionSizeLimitPerResponse (#19268) * [PR-19282](https://github.com/apache/kafka/pull/19282) - KAFKA-19036 Rewrite LogAppendTimeTest and move it to storage module (#19282) * [PR-19299](https://github.com/apache/kafka/pull/19299) - KAFKA-19049 Remove the @ExtendWith(ClusterTestExtensions.class) from code base (#19299) * [PR-19076](https://github.com/apache/kafka/pull/19076) - KAFKA-17830 Cover unit tests for TBRLMM init failure scenarios (#19076) * [PR-19216](https://github.com/apache/kafka/pull/19216) - KAFKA-14486 Move LogCleanerManager to storage module (#19216) * [PR-19026](https://github.com/apache/kafka/pull/19026) - KAFKA-18827: Initialize share group state group coordinator impl. [3/N] (#19026) * [PR-18695](https://github.com/apache/kafka/pull/18695) - KAFKA-18616; Refactor Tools’s ApiMessageFormatter (#18695) * [PR-19192](https://github.com/apache/kafka/pull/19192) - KAFKA-18899: Improve handling of timeouts for commitAsync() in ShareConsumer. 
(#19192) * [PR-19154](https://github.com/apache/kafka/pull/19154) - KAFKA-18914 Migrate ConsumerRebootstrapTest to use new test infra (#19154) * [PR-19233](https://github.com/apache/kafka/pull/19233) - KAFKA-18736: Add pollOnClose() and maximumTimeToWait() (#19233) * [PR-19230](https://github.com/apache/kafka/pull/19230) - KAFKA-18736: Handle errors in the Streams group heartbeat request manager (#19230) * [PR-18711](https://github.com/apache/kafka/pull/18711) - KAFKA-18576 Convert ConfigType to Enum (#18711) * [PR-19247](https://github.com/apache/kafka/pull/19247) - KAFKA-18796: Added more information to error message when assertion fails for acquisition lock timeout (#19247) * [PR-19046](https://github.com/apache/kafka/pull/19046) - KAFKA-18276 Migrate ProducerRebootstrapTest to new test infra (#19046) * [PR-19207](https://github.com/apache/kafka/pull/19207) - KAFKA-18980 OffsetMetadataManager#cleanupExpiredOffsets should record the number of records rather than topic partitions (#19207) * [PR-19227](https://github.com/apache/kafka/pull/19227) - KAFKA-18999 Remove BrokerMetadata (#19227) * [PR-19256](https://github.com/apache/kafka/pull/19256) - KAFKA-17806 remove this-escape suppress warnings in AclCommand (#19256) * [PR-19255](https://github.com/apache/kafka/pull/19255) - KAFKA-18329; [3/3] Delete old group coordinator (KIP-848) (#19255) * [PR-19064](https://github.com/apache/kafka/pull/19064) - KAFKA-18893: Add KIP-877 support to ReplicaSelector (#19064) * [PR-19251](https://github.com/apache/kafka/pull/19251) - KAFKA-18329; [2/3] Delete old group coordinator (KIP-848) (#19251) * [PR-19254](https://github.com/apache/kafka/pull/19254) - KAFKA-19017: Changed consumer.config to command-config in verifiable_share_consumer.py (#19254) * [PR-19246](https://github.com/apache/kafka/pull/19246) - KAFKA-15599 Move MetadataLogConfig to raft module (#19246) * [PR-19180](https://github.com/apache/kafka/pull/19180) - KAFKA-18954: Add ELR election rate metric (#19180) * [PR-19197](https://github.com/apache/kafka/pull/19197) - KAFKA-15931: Cancel RemoteLogReader gracefully (#19197) * [PR-19243](https://github.com/apache/kafka/pull/19243) - KAFKA-18329; [1/3] Delete old group coordinator (KIP-848) (#19243) * [PR-19174](https://github.com/apache/kafka/pull/19174) - KAFKA-18946 Move BrokerReconfigurable and DynamicProducerStateManagerConfig to server module (#19174) * [PR-18842](https://github.com/apache/kafka/pull/18842) - KAFKA-806 Index may not always observe log.index.interval.bytes (#18842) * [PR-19214](https://github.com/apache/kafka/pull/19214) - KAFKA-18989 Optimize FileRecord#searchForOffsetWithSize (#19214) * [PR-19183](https://github.com/apache/kafka/pull/19183) - KAFKA-18819 StreamsGroupHeartbeat API and StreamsGroupDescribe API check topic describe (#19183) * [PR-19217](https://github.com/apache/kafka/pull/19217) - KAFKA-18975 Move clients-integration-test out of core module (#19217) * [PR-19193](https://github.com/apache/kafka/pull/19193) - KAFKA-18953: [1/N] Add broker side handling for 2 PC (KIP-939) (#19193) * [PR-18949](https://github.com/apache/kafka/pull/18949) - KAFKA-17431: Support invalid static configs for KRaft so long as dynamic configs are valid (#18949) * [PR-19202](https://github.com/apache/kafka/pull/19202) - KAFKA-18969 Rewrite ShareConsumerTest#setup and move to clients-integration-tests module (#19202) * [PR-19165](https://github.com/apache/kafka/pull/19165) - KAFKA-18955: Fix infinite loop and standardize options in MetadataSchemaCheckerTool (#19165) * 
[PR-18966](https://github.com/apache/kafka/pull/18966) - KAFKA-18808 add test to ensure the name= is not equal to default quota (#18966) * [PR-18463](https://github.com/apache/kafka/pull/18463) - KAFKA-17171 Add test cases for STATIC_BROKER_CONFIG in kraft mode (#18463) * [PR-18801](https://github.com/apache/kafka/pull/18801) - KAFKA-17565 Move MetadataCache interface to metadata module (#18801) * [PR-19181](https://github.com/apache/kafka/pull/19181) - KAFKA-18736: Do not send fields if not needed (#19181) * [PR-19215](https://github.com/apache/kafka/pull/19215) - KAFKA-18990 Avoid redundant MetricName creation in BaseQuotaTest#produceUntilThrottled (#19215) * [PR-19027](https://github.com/apache/kafka/pull/19027) - KAFKA-18859 honor the error message of UnregisterBrokerResponse (#19027) * [PR-19212](https://github.com/apache/kafka/pull/19212) - KAFKA-18993 Remove confusing notable change section from upgrade.html (#19212) * [PR-19129](https://github.com/apache/kafka/pull/19129) - KAFKA-18703 Remove unused class PayloadKeyType (#19129) * [PR-19187](https://github.com/apache/kafka/pull/19187) - KAFKA-18915 Rewrite AdminClientRebootstrapTest to cover the current scenario (#19187) * [PR-19147](https://github.com/apache/kafka/pull/19147) - KAFKA-18924 Running the storage module tests produces a storage/storage.log file (#19147) * [PR-19136](https://github.com/apache/kafka/pull/19136) - KAFKA-18781: Extend RefreshRetriableException related exceptions (#19136) * [PR-18994](https://github.com/apache/kafka/pull/18994) - KAFKA-18843: MirrorMaker2 unique workerId (#18994) * [PR-17264](https://github.com/apache/kafka/pull/17264) - KAFKA-17516 Synonyms for client metrics configs (#17264) * [PR-19134](https://github.com/apache/kafka/pull/19134) - KAFKA-18927 Remove LATEST_0_11, LATEST_1_0, LATEST_1_1, LATEST_2_0 (#19134) * [PR-19164](https://github.com/apache/kafka/pull/19164) - KAFKA-18943: Kafka Streams incorrectly commits TX during task revocation (#19164) * [PR-19205](https://github.com/apache/kafka/pull/19205) - KAFKA-18979; Report correct kraft.version in ApiVersions (#19205) * [PR-19176](https://github.com/apache/kafka/pull/19176) - KAFKA-18651: Add Streams-specific broker configurations (#19176) * [PR-19040](https://github.com/apache/kafka/pull/19040) - KAFKA-18858 Refactor FeatureControlManager to avoid using uninitialized MV (#19040) * [PR-19030](https://github.com/apache/kafka/pull/19030) - KAFKA-14484: Move UnifiedLog to storage module (#19030) * [PR-18662](https://github.com/apache/kafka/pull/18662) - KAFKA-18617 Allow use of ClusterInstance inside BeforeEach (#18662) * [PR-18018](https://github.com/apache/kafka/pull/18018) - KAFKA-18142 Switch to com.gradleup.shadow (#18018) * [PR-19169](https://github.com/apache/kafka/pull/19169) - KAFKA-18947 Remove unused raftManager in metadataShell (#19169) * [PR-18998](https://github.com/apache/kafka/pull/18998) - KAFKA-18837: Ensure controller quorum timeouts and backoffs are at least 0 (#18998) * [PR-19119](https://github.com/apache/kafka/pull/19119) - KAFKA-18422 Adjust Kafka client upgrade path section (#19119) * [PR-19168](https://github.com/apache/kafka/pull/19168) - KAFKA-18942: Add reviewers to PR body with committer-tools (#19168) * [PR-19148](https://github.com/apache/kafka/pull/19148) - KAFKA-18932: Removed usage of partition max bytes from share fetch requests (#19148) * [PR-19145](https://github.com/apache/kafka/pull/19145) - KAFKA-18936: Fix share fetch when records are larger than max bytes (#19145) * 
[PR-18091](https://github.com/apache/kafka/pull/18091) - KAFKA-18074: Add kafka client compatibility matrix (#18091) * [PR-18258](https://github.com/apache/kafka/pull/18258) - KAFKA-18195: Fix Kafka Streams broker compatibility matrix (#18258) * [PR-19171](https://github.com/apache/kafka/pull/19171) - KAFKA-17808: Fix id typo for connector-dlq-adminclient (#19171) * [PR-19144](https://github.com/apache/kafka/pull/19144) - KAFKA-18933 Add client integration tests module (#19144) * [PR-19155](https://github.com/apache/kafka/pull/19155) - KAFKA-18925: Add streams groups support to Admin.listGroups (#19155) * [PR-19142](https://github.com/apache/kafka/pull/19142) - KAFKA-18901: [1/N] Improved homogeneous SimpleAssignor (#19142) * [PR-19162](https://github.com/apache/kafka/pull/19162) - KAFKA-18941: Do not test 3.3 in upgrade_tests.py (#19162) * [PR-19121](https://github.com/apache/kafka/pull/19121) - KAFKA-18736: Decide when a heartbeat should be sent (#19121) * [PR-19173](https://github.com/apache/kafka/pull/19173) - KAFKA-18931: added a share group session timeout task when group coordinator is loaded (#19173) * [PR-19099](https://github.com/apache/kafka/pull/19099) - KAFKA-18637: Fix max connections per ip and override reconfigurations (#19099) * [PR-17767](https://github.com/apache/kafka/pull/17767) - KAFKA-17856 Move ConfigCommandTest and ConfigCommandIntegrationTest to tool module (#17767) * [PR-18802](https://github.com/apache/kafka/pull/18802) - KAFKA-18706 Move AclPublisher to metadata module (#18802) * [PR-19166](https://github.com/apache/kafka/pull/19166) - KAFKA-18944 Remove unused setters from ClusterConfig (#19166) * [PR-19081](https://github.com/apache/kafka/pull/19081) - KAFKA-18909 Move DynamicThreadPool to server module (#19081) * [PR-19062](https://github.com/apache/kafka/pull/19062) - KAFKA-18700 Migrate SnapshotPath, Entry, OffsetAndEpoch, LogFetchInfo, and LogAppendInfo to record classes (#19062) * [PR-19156](https://github.com/apache/kafka/pull/19156) - KAFKA-18940: fix electionWasClean (#19156) * [PR-19127](https://github.com/apache/kafka/pull/19127) - KAFKA-18920: The kcontrollers must set kraft.version in ApiVersionsResponse (#19127) * [PR-19116](https://github.com/apache/kafka/pull/19116) - KAFKA-18285: Add describeStreamsGroup to Admin API (#19116) * [PR-18684](https://github.com/apache/kafka/pull/18684) - KAFKA-18461 Add Objects.requireNotNull to Snapshot (#18684) * [PR-18299](https://github.com/apache/kafka/pull/18299) - KAFKA-17607: Add CI step to verify LICENSE-binary (#18299) * [PR-19137](https://github.com/apache/kafka/pull/19137) - KAFKA-18929: Log a warning when time based segment delete is blocked by a future timestamp (#19137) * [PR-15241](https://github.com/apache/kafka/pull/15241) - KAFKA-15931: Reopen TransactionIndex if channel is closed (#15241) * [PR-19138](https://github.com/apache/kafka/pull/19138) - KAFKA-18046; High CPU usage when using Log4j2 (#19138) * [PR-19094](https://github.com/apache/kafka/pull/19094) - KAFKA-18915: Migrate AdminClientRebootstrapTest to use new test infra (#19094) * [PR-19113](https://github.com/apache/kafka/pull/19113) - KAFKA-18900: Experimental share consumer acknowledge mode config (#19113) * [PR-19131](https://github.com/apache/kafka/pull/19131) - KAFKA-18648: Make records in FetchResponse nullable again (#19131) * [PR-19120](https://github.com/apache/kafka/pull/19120) - KAFKA-18887: Implement Streams Admin APIs (#19120) * [PR-19130](https://github.com/apache/kafka/pull/19130) - KAFKA-18811: Added command configs to 
admin client as well in VerifiableShareConsumer (#19130) * [PR-19112](https://github.com/apache/kafka/pull/19112) - KAFKA-18910 Remove kafka.utils.json (#19112) * [4a500418](https://github.com/apache/kafka/commit/4a500418c63a063198c5f6ce256bfef9ffd74e3a) - Revert “KAFKA-18246 Fix ConnectRestApiTest.test_rest_api by adding multiversioning configs (#18191)” * [d86cb597](https://github.com/apache/kafka/commit/d86cb597902d32ce83f27d65b60df6700cb7a61d) - Revert “KAFKA-18887: Implement Streams Admin APIs (#19049)” * [PR-19049](https://github.com/apache/kafka/pull/19049) - KAFKA-18887: Implement Streams Admin APIs (#19049) * [PR-19104](https://github.com/apache/kafka/pull/19104) - KAFKA-18919 Clarify that KafkaPrincipalBuilder classes must also implement KafkaPrincipalSerde (#19104) * [PR-19054](https://github.com/apache/kafka/pull/19054) - KAFKA-18882 Remove BaseKey, TxnKey, and UnknownKey (#19054) * [PR-19083](https://github.com/apache/kafka/pull/19083) - KAFKA-18817: ShareGroupHeartbeat and ShareGroupDescribe API must check topic describe (#19083) * [PR-18983](https://github.com/apache/kafka/pull/18983) - KAFKA-14121: AlterPartitionReassignments API should allow callers to specify the option of preserving the replication factor (#18983) * [PR-18918](https://github.com/apache/kafka/pull/18918) - KAFKA-18804 Remove slf4j warning when using tool script (#18918) * [PR-9766](https://github.com/apache/kafka/pull/9766) - KAFKA-10864 Convert end txn marker schema to use auto-generated protocol (#9766) * [PR-19087](https://github.com/apache/kafka/pull/19087) - KAFKA-18886 add behavior change of CreateTopicPolicy and AlterConfigPolicy to zk2kraft (#19087) * [PR-19097](https://github.com/apache/kafka/pull/19097) - KAFKA-18422 add link of KIP-1124 to “rolling upgrade” section (#19097) * [PR-19089](https://github.com/apache/kafka/pull/19089) - KAFKA-18917: TransformValues throws NPE (#19089) * [PR-19065](https://github.com/apache/kafka/pull/19065) - KAFKA-18876 4.0 documentation improvement (#19065) * [PR-19086](https://github.com/apache/kafka/pull/19086) - Fix typos in multiple files (#19086) * [PR-19091](https://github.com/apache/kafka/pull/19091) - KAFKA-18918: Correcting releasing of locks on exception (#19091) * [PR-19088](https://github.com/apache/kafka/pull/19088) - KAFKA-18916; Resolved regular expressions must update the group by topics data structure (#19088) * [PR-19075](https://github.com/apache/kafka/pull/19075) - KAFKA-18867 add tests to describe topic configs with empty name (#19075) * [PR-18449](https://github.com/apache/kafka/pull/18449) - KAFKA-18500 Build PRs at HEAD commit (#18449) * [PR-19059](https://github.com/apache/kafka/pull/19059) - KAFKA-18878: Added share session cache and delayed share fetch metrics (KIP-1103) (#19059) * [PR-18997](https://github.com/apache/kafka/pull/18997) - KAFKA-18844: Stale features information in QuorumController#registerBroker (#18997) * [PR-19036](https://github.com/apache/kafka/pull/19036) - KAFKA-18864:remove the Evolving tag from stable public interfaces (#19036) * [PR-19055](https://github.com/apache/kafka/pull/19055) - KAFKA-18817:[1/N] ShareGroupHeartbeat and ShareGroupDescribe API must check topic describe (#19055) * [PR-18981](https://github.com/apache/kafka/pull/18981) - KAFKA-18613: Auto-creation of internal topics in streams group heartbeat (#18981) * [PR-19056](https://github.com/apache/kafka/pull/19056) - KAFKA-18881 Document the ConsumerRecord as non-thread safe (#19056) * [PR-18752](https://github.com/apache/kafka/pull/18752) - 
KAFKA-18168: Adding checkpointing for GlobalKTable during restoration and closing (#18752) * [PR-19070](https://github.com/apache/kafka/pull/19070) - KAFKA-18907 Add suitable error message when the appended value is too large (#19070) * [PR-19067](https://github.com/apache/kafka/pull/19067) - KAFKA-18908 Document that the size of appended value can’t be larger than Short.MAX_VALUE (#19067) * [PR-19047](https://github.com/apache/kafka/pull/19047) - KAFKA-18880 Remove kafka.cluster.Broker and BrokerEndPointNotAvailableException (#19047) * [PR-19063](https://github.com/apache/kafka/pull/19063) - KAFKA-17039 KIP-919 supports for unregisterBroker (#19063) * [PR-17771](https://github.com/apache/kafka/pull/17771) - KAFKA-17981 add Integration test for ConfigCommand to add config key=[val1,val2] (#17771) * [PR-19045](https://github.com/apache/kafka/pull/19045) - KAFKA-18734: Implemented share partition metrics (KIP-1103) (#19045) * [PR-19048](https://github.com/apache/kafka/pull/19048) - KAFKA-18860 Remove Missing Features section (#19048) * [PR-18349](https://github.com/apache/kafka/pull/18349) - KAFKA-18371 TopicBasedRemoteLogMetadataManagerConfig exposes sensitive configuration data in logs (#18349) * [PR-19020](https://github.com/apache/kafka/pull/19020) - KAFKA-18780: Extend RetriableException related exceptions (#19020) * [PR-19037](https://github.com/apache/kafka/pull/19037) - KAFKA-18869 add remote storage threads to “Updating Thread Configs” section (#19037) * [PR-17743](https://github.com/apache/kafka/pull/17743) - KAFKA-18863: Connect Multiversion Support (Versioned Connector Creation and related changes) (#17743) * [PR-19042](https://github.com/apache/kafka/pull/19042) - KAFKA-18813: ConsumerGroupHeartbeat API and ConsumerGroupDescribe API… (#19042) * [PR-18989](https://github.com/apache/kafka/pull/18989) - KAFKA-18813: ConsumerGroupHeartbeat API and ConsumerGroupDescribe API must check topic describe (#18989) * [PR-18979](https://github.com/apache/kafka/pull/18979) - KAFKA-18614, KAFKA-18613: Add streams group request plumbing (#18979) * [PR-18864](https://github.com/apache/kafka/pull/18864) - KAFKA-18757: Create full-function SimpleAssignor to match KIP-932 description (#18864) * [PR-18988](https://github.com/apache/kafka/pull/18988) - KAFKA-18839: Drop EAGER rebalancing support in Kafka Streams (#18988) * [PR-18985](https://github.com/apache/kafka/pull/18985) - KAFKA-18792 Add workflow to check PR format (#18985) * [PR-19010](https://github.com/apache/kafka/pull/19010) - KAFKA-17351: Improved handling of compacted topics in share partition (2/N) (#19010) * [PR-19021](https://github.com/apache/kafka/pull/19021) - KAFKA-17836 Move RackAwareTest to server module (#19021) * [PR-18803](https://github.com/apache/kafka/pull/18803) - KAFKA-18712 Move Endpoint to server module (#18803) * [PR-18387](https://github.com/apache/kafka/pull/18387) - KAFKA-18281: Kafka is improperly validating non-advertised listeners for routable controller addresses (#18387) * [PR-18900](https://github.com/apache/kafka/pull/18900) - KAFKA-17937 Cleanup AbstractFetcherThreadTest (#18900) * [PR-18898](https://github.com/apache/kafka/pull/18898) - KIP-966 part 1 release doc (#18898) * [PR-18770](https://github.com/apache/kafka/pull/18770) - KAFKA-18748 Run new tests separately in PRs (#18770) * [PR-18804](https://github.com/apache/kafka/pull/18804) - KAFKA-18522: Slice records for share fetch (#18804) * [PR-18233](https://github.com/apache/kafka/pull/18233) - KAFKA-18023: Enforcing Explicit Naming for Kafka Streams 
Internal Topics (#18233) * [PR-18939](https://github.com/apache/kafka/pull/18939) - KAFKA-18779: Validate responses from broker in client for ShareFetch and ShareAcknowledge RPCs. (#18939) * [PR-18992](https://github.com/apache/kafka/pull/18992) - KAFKA-18827: Initialize share group state persister impl [2/N]. (#18992) * [PR-18880](https://github.com/apache/kafka/pull/18880) - KAFKA-15583 doc update for the “strict min ISR” rule (#18880) * [PR-18928](https://github.com/apache/kafka/pull/18928) - KAFKA-18629: ShareGroupDeleteState admin client impl. (#18928) * [PR-18978](https://github.com/apache/kafka/pull/18978) - KAFKA-17351: Update tests and acquire API to allow discard batches from compacted topics (1/N) (#18978) * [PR-18968](https://github.com/apache/kafka/pull/18968) - KAFKA-18827: Initialize share state, share coordinator impl. [1/N] (#18968) * [PR-19000](https://github.com/apache/kafka/pull/19000) - Revert “KAFKA-16803: Change fork, update ShadowJavaPlugin to 8.1.7 (#16295)” (#19000) * [PR-18897](https://github.com/apache/kafka/pull/18897) - KAFKA-18795 Remove Records#downConvert (#18897) * [PR-18996](https://github.com/apache/kafka/pull/18996) - KAFKA-18813: [3/N] Client support for TopicAuthException in DescribeConsumerGroup path (#18996) * [PR-18959](https://github.com/apache/kafka/pull/18959) - KAFKA-18733: Implemented fetch ratio and partition acquire time metrics (3/N) (#18959) * [PR-18986](https://github.com/apache/kafka/pull/18986) - KAFKA-18813: [2/N] Client support for TopicAuthException in HB path (#18986) * [PR-18844](https://github.com/apache/kafka/pull/18844) - KAFKA-18737 KafkaDockerWrapper setup functions fails due to storage format command (#18844) * [PR-18848](https://github.com/apache/kafka/pull/18848) - KAFKA-18629: Delete share group state RPC group coordinator impl. [3/N] (#18848) * [PR-18982](https://github.com/apache/kafka/pull/18982) - KAFKA-18829: Added check before converting to IMPLICIT mode (#18964) (Cherry-pick) (#18982) * [PR-18969](https://github.com/apache/kafka/pull/18969) - KAFKA-18831 Migrating to log4j2 introduce behavior changes of adjusting level dynamically (#18969) * [PR-18737](https://github.com/apache/kafka/pull/18737) - KAFKA-18641: AsyncKafkaConsumer could lose records with auto offset commit (#18737) * [PR-18962](https://github.com/apache/kafka/pull/18962) - KAFKA-18828: Update share group metrics per new init and call mechanism. (#18962) * [PR-18891](https://github.com/apache/kafka/pull/18891) - KAFKA-16918 TestUtils#assertFutureThrows should use future.get with timeout (#18891) * [PR-18965](https://github.com/apache/kafka/pull/18965) - MINOR: Remove redundant quorum parameter from `*AdminIntegrationTest` classes (#18965) * [PR-18967](https://github.com/apache/kafka/pull/18967) - KAFKA-18791 Set default commit to PR title and description [2/n] (#18967) * [PR-18964](https://github.com/apache/kafka/pull/18964) - KAFKA-18829: Added check before converting to IMPLICIT mode (#18964) * [PR-18955](https://github.com/apache/kafka/pull/18955) - KAFKA-18791 Enable new asf.yaml parser [1/n] (#18955) * [PR-18845](https://github.com/apache/kafka/pull/18845) - KAFKA-18601: Assume a baseline of 3.3 for server protocol versions (#18845) * [PR-18944](https://github.com/apache/kafka/pull/18944) - KAFKA-18198: Added check to prevent acknowledgements on initial ShareFetchRequest. 
(#18944) * [PR-18946](https://github.com/apache/kafka/pull/18946) - KAFKA-18799 Remove AdminUtils (#18946) * [PR-18757](https://github.com/apache/kafka/pull/18757) - KAFKA-18667 Add replication system test case for combined broker + controller failure (#18757) * [PR-18872](https://github.com/apache/kafka/pull/18872) - KAFKA-18773 Migrate the log4j1 config to log4j 2 for native image and README (#18872) * [PR-18004](https://github.com/apache/kafka/pull/18004) - KAFKA-18089: Upgrade Caffeine lib to 3.1.8 (#18004) * [PR-18850](https://github.com/apache/kafka/pull/18850) - KAFKA-18767: Add client side config check for shareConsumer (#18850) * [PR-18460](https://github.com/apache/kafka/pull/18460) - KAFKA-14484: Decouple UnifiedLog and RemoteLogManager (#18460) * [PR-18927](https://github.com/apache/kafka/pull/18927) - KAFKA-16718 [1/n]: Added DeleteShareGroupOffsets request and response schema (#18927) * [PR-18870](https://github.com/apache/kafka/pull/18870) - KAFKA-18736: Add Streams group heartbeat request manager (1/N) (#18870) * [PR-18914](https://github.com/apache/kafka/pull/18914) - KAFKA-18798 The replica placement policy used by ReassignPartitionsCommand is not aligned with kraft controller (#18914) * [PR-18888](https://github.com/apache/kafka/pull/18888) - KAFKA-18787: RemoteIndexCache fails to delete invalid files on init (#18888) * [PR-18934](https://github.com/apache/kafka/pull/18934) - KAFKA-18807; Fix thread idle ratio metric (#18934) * [PR-18871](https://github.com/apache/kafka/pull/18871) - KAFKA-18684: Add base exception classes (#18871) * [PR-18924](https://github.com/apache/kafka/pull/18924) - KAFKA-18733: Updating share group record acks metric (2/N) (#18924) * [PR-18907](https://github.com/apache/kafka/pull/18907) - KAFKA-18801 Remove ClusterGenerator and revise ClusterTemplate javadoc (#18907) * [PR-18809](https://github.com/apache/kafka/pull/18809) - KAFKA-18730: Add replaying streams group state from offset topic (#18809) * [PR-18889](https://github.com/apache/kafka/pull/18889) - KAFKA-18784 Fix ConsumerWithLegacyMessageFormatIntegrationTest (#18889) * [PR-18920](https://github.com/apache/kafka/pull/18920) - KAFKA-18805: add synchronized block for Consumer Heartbeat close (#18920) * [PR-18908](https://github.com/apache/kafka/pull/18908) - KAFKA-18755 Align timeout in kafka-share-groups.sh (#18908) * [PR-18922](https://github.com/apache/kafka/pull/18922) - KAFKA-18809 Set min in sync replicas for `__share_group_state`. 
(#18922) * [PR-18916](https://github.com/apache/kafka/pull/18916) - KAFKA-18803 The acls would appear at the wrong level of the metadata shell “tree” (#18916) * [PR-18906](https://github.com/apache/kafka/pull/18906) - KAFKA-18790 Fix testCustomQuotaCallback (#18906) * [PR-18894](https://github.com/apache/kafka/pull/18894) - KAFKA-18761: Complete listing of share group offsets [1/N] (#18894) * [PR-18819](https://github.com/apache/kafka/pull/18819) - KAFKA-16717 [1/2]: Add AdminClient.alterShareGroupOffsets (#18819) * [PR-18899](https://github.com/apache/kafka/pull/18899) - KAFKA-18772 Define share group config defaults for Docker (#18899) * [PR-18826](https://github.com/apache/kafka/pull/18826) - KAFKA-18733: Updating share group metrics (1/N) (#18826) * [PR-18680](https://github.com/apache/kafka/pull/18680) - KAFKA-18634: Fix ELR metadata version issues (#18680) * [PR-18795](https://github.com/apache/kafka/pull/18795) - KAFKA-17182: Consumer fetch sessions are evicted too quickly with AsyncKafkaConsumer (#18795) * [PR-18834](https://github.com/apache/kafka/pull/18834) - KAFKA-16720: Support multiple groups in DescribeShareGroupOffsets RPC (#18834) * [PR-18810](https://github.com/apache/kafka/pull/18810) - KAFKA-18654[2/2]: Transaction V2 retry add partitions on the server side when handling produce request. (#18810) * [PR-18756](https://github.com/apache/kafka/pull/18756) - KAFKA-17298: Update upgrade notes for 4.0 KIP-848 (#18756) * [PR-18807](https://github.com/apache/kafka/pull/18807) - KAFKA-18728 Move ListOffsetsPartitionStatus to server module (#18807) * [PR-18851](https://github.com/apache/kafka/pull/18851) - KAFKA-18769: Improve leadership changes handling in ShareConsumeRequestManager. (#18851) * [PR-18869](https://github.com/apache/kafka/pull/18869) - KAFKA-18777 add PartitionsWithLateTransactionsCount to BrokerMetricNamesTest (#18869) * [PR-18729](https://github.com/apache/kafka/pull/18729) - KAFKA-18323: Add StreamsGroup class (#18729) * [PR-18275](https://github.com/apache/kafka/pull/18275) - KAFKA-15443: Upgrade RocksDB to 9.7.3 (#18275) * [PR-18451](https://github.com/apache/kafka/pull/18451) - KAFKA-18035: TransactionsTest testBumpTransactionalEpochWithTV2Disabled failed on trunk (#18451) * [PR-17804](https://github.com/apache/kafka/pull/17804) - KAFKA-15995: Adding KIP-877 support to Connect (#17804) * [PR-18829](https://github.com/apache/kafka/pull/18829) - KAFKA-18756: Enabled share group configs for queues related system tests (#18829) * [PR-18858](https://github.com/apache/kafka/pull/18858) - Fix bug in json naming (#18858) * [PR-18833](https://github.com/apache/kafka/pull/18833) - KAFKA-18758: NullPointerException in shutdown following InvalidConfigurationException (#18833) * [PR-18855](https://github.com/apache/kafka/pull/18855) - KAFKA-18764: Throttle on share state RPCs auth failure. 
(#18855) * [PR-18039](https://github.com/apache/kafka/pull/18039) - KAFKA-14484: Move UnifiedLog static methods to storage (#18039) * [PR-18394](https://github.com/apache/kafka/pull/18394) - KAFKA-18396: Migrate log4j1 configuration to log4j2 in KafkaDockerWrapper (#18394) * [PR-18853](https://github.com/apache/kafka/pull/18853) - KAFKA-18770 close the RM created by testDelayedShareFetchPurgatoryOperationExpiration (#18853) * [PR-18820](https://github.com/apache/kafka/pull/18820) - KAFKA-18366 Remove KafkaConfig.interBrokerProtocolVersion (#18820) * [PR-18812](https://github.com/apache/kafka/pull/18812) - KAFKA-18658 add import control for examples module (#18812) * [PR-18821](https://github.com/apache/kafka/pull/18821) - KAFKA-18743 Remove leader.imbalance.per.broker.percentage as it is not supported by Kraft (#18821) * [PR-1578](https://github.com/confluentinc/kafka/pull/1578) - CCS CP release test regex updates * [PR-18196](https://github.com/apache/kafka/pull/18196) - KAFKA-18225 ClientQuotaCallback#updateClusterMetadata is unsupported by kraft (#18196) * [PR-1582](https://github.com/confluentinc/kafka/pull/1582) - Fix build failure * [PR-18846](https://github.com/apache/kafka/pull/18846) - KAFKA-18763: changed the assertion statement for acknowledgements to include only successful acks (#18846) * [PR-18824](https://github.com/apache/kafka/pull/18824) - KAFKA-18745: Handle network related errors in persister. (#18824) * [PR-18252](https://github.com/apache/kafka/pull/18252) - KAFKA-17833: Convert DescribeAuthorizedOperationsTest to use KRaft (#18252) * [PR-1577](https://github.com/confluentinc/kafka/pull/1577) - CCS CP release test regex updates * [PR-18381](https://github.com/apache/kafka/pull/18381) - KAFKA-18275 Restarting broker in testing should use the same port (#18381) * [PR-18818](https://github.com/apache/kafka/pull/18818) - KAFKA-18741 document the removal of inter.broker.protocol.version (#18818) * [PR-18496](https://github.com/apache/kafka/pull/18496) - KAFKA-18483 Disable Log4jController and Loggers if Log4j Core absent (#18496) * [PR-18672](https://github.com/apache/kafka/pull/18672) - KAFKA-18618: Improve leader change handling of acknowledgements [1/N] (#18672) * [PR-18566](https://github.com/apache/kafka/pull/18566) - KAFKA-18360 Remove zookeeper configurations (#18566) * [PR-18641](https://github.com/apache/kafka/pull/18641) - KAFKA-18530 Remove ZooKeeperInternals (#18641) * [PR-18583](https://github.com/apache/kafka/pull/18583) - KAFKA-18499 Clean up zookeeper from LogConfig (#18583) * [PR-18771](https://github.com/apache/kafka/pull/18771) - KAFKA-18689: Improve metric calculation to avoid NoSuchElementException (#18771) * [PR-18189](https://github.com/apache/kafka/pull/18189) - KAFKA-18206: EmbeddedKafkaCluster must set features (#18189) * [PR-18765](https://github.com/apache/kafka/pull/18765) - KAFKA-17379: Fix unexpected state transition from ERROR to PENDING_SHUTDOWN (#18765) * [PR-18696](https://github.com/apache/kafka/pull/18696) - KAFKA-18494-3: solution for the bug relating to gaps in the share partition cachedStates post initialization (#18696) * [PR-18748](https://github.com/apache/kafka/pull/18748) - KAFKA-18629: Add persister impl and tests for DeleteShareGroupState RPC. 
[2/N] (#18748) * [PR-18671](https://github.com/apache/kafka/pull/18671) - [KAFKA-16720] AdminClient Support for ListShareGroupOffsets (2/2) (#18671) * [PR-18702](https://github.com/apache/kafka/pull/18702) - KAFKA-18645: New consumer should align close timeout handling with classic consumer (#18702) * [PR-18791](https://github.com/apache/kafka/pull/18791) - KAFKA-18722: Remove the unreferenced methods in TBRLMM and ConsumerManager (#18791) * [PR-18782](https://github.com/apache/kafka/pull/18782) - KAFKA-18694: Migrate suitable classes to records in coordinator-common module (#18782) * [PR-18784](https://github.com/apache/kafka/pull/18784) - KAFKA-18705: Move ConfigRepository to metadata module (#18784) * [PR-18783](https://github.com/apache/kafka/pull/18783) - KAFKA-18698: Migrate suitable classes to records in server and server-common modules (#18783) * [PR-18781](https://github.com/apache/kafka/pull/18781) - KAFKA-18675 Add tests for valid and invalid broker addresses (#18781) * [PR-18304](https://github.com/apache/kafka/pull/18304) - KAFKA-16524; Metrics for KIP-853 (#18304) * [PR-18277](https://github.com/apache/kafka/pull/18277) - KAFKA-18635: reenable the unclean shutdown detection (#18277) * [PR-18708](https://github.com/apache/kafka/pull/18708) - KAFKA-18649: complete ClearElrRecord handling (#18708) * [PR-18148](https://github.com/apache/kafka/pull/18148) - KAFKA-16540: Clear ELRs when min.insync.replicas is changed. (#18148) * [PR-17952](https://github.com/apache/kafka/pull/17952) - KAFKA-16540: enforce min.insync.replicas config invariants for ELR (#17952) * [PR-15622](https://github.com/apache/kafka/pull/15622) - KAFKA-16446: Improve controller event duration logging (#15622) * [PR-18028](https://github.com/apache/kafka/pull/18028) - KAFKA-18131: Improve logs for voters (#18028) * [PR-18222](https://github.com/apache/kafka/pull/18222) - KAFKA-18305: validate controller.listener.names is not in inter.broker.listener.name for kcontrollers (#18222) * [PR-18777](https://github.com/apache/kafka/pull/18777) - KAFKA-18690: Keep leader metadata for RE2J-assigned partitions (#18777) * [PR-18551](https://github.com/apache/kafka/pull/18551) - KAFKA-18538: Add Streams membership manager (#18551) * [PR-18165](https://github.com/apache/kafka/pull/18165) - KAFKA-18230: Handle not controller or not leader error in admin client (#18165) * [PR-18700](https://github.com/apache/kafka/pull/18700) - KAFKA-18644: improve generic type names for internal FK-join classes (#18700) * [PR-18790](https://github.com/apache/kafka/pull/18790) - KAFKA-18693 Remove PasswordEncoder (#18790) * [PR-18720](https://github.com/apache/kafka/pull/18720) - KAFKA-18654 [1/2]: Transaction Version 2 performance regression due to early return (#18720) * [PR-18592](https://github.com/apache/kafka/pull/18592) - KAFKA-18545: Remove Zookeeper logic from LogManager (#18592) * [PR-18676](https://github.com/apache/kafka/pull/18676) - KAFKA-18325: Add TargetAssignmentBuilder (#18676) * [PR-18786](https://github.com/apache/kafka/pull/18786) - KAFKA-18672; CoordinatorRecordSerde must validate value version (4.0) (#18786) * [PR-18717](https://github.com/apache/kafka/pull/18717) - KAFKA-18655: Implement the consumer group size counter with scheduled task (#18717) * [PR-18764](https://github.com/apache/kafka/pull/18764) - KAFKA-18685: Cleanup DynamicLogConfig constructor (#18764) * [PR-18785](https://github.com/apache/kafka/pull/18785) - KAFKA-18676; Update Benchmark system tests (#18785) * 
[PR-18330](https://github.com/apache/kafka/pull/18330) - KAFKA-17631 Convert SaslApiVersionsRequestTest to kraft (#18330) * [PR-18749](https://github.com/apache/kafka/pull/18749) - KAFKA-18672; CoordinatorRecordSerde must validate value version (#18749) * [PR-18768](https://github.com/apache/kafka/pull/18768) - KAFKA-18678 Update TestVerifiableProducer system test (#18768) * [PR-18652](https://github.com/apache/kafka/pull/18652) - KAFKA-17125: Streams Sticky Task Assignor (#18652) * [PR-18751](https://github.com/apache/kafka/pull/18751) - KAFKA-18674 Document the incompatible changes in parsing --bootstrap-server (#18751) * [PR-18727](https://github.com/apache/kafka/pull/18727) - KAFKA-18659: librdkafka compressed produce fails unless api versions returns produce v0 (#18727) * [PR-18759](https://github.com/apache/kafka/pull/18759) - KAFKA-18683: Handle slicing of file records for updated start position (#18759) * [fc3dca4e](https://github.com/apache/kafka/commit/fc3dca4ed08a6acdcb5b1d5a4ed5b8a7095d318b) - Revert “KAFKA-17182: Consumer fetch sessions are evicted too quickly with AsyncKafkaConsumer (#17700)” * [7920fadb](https://github.com/apache/kafka/commit/7920fadbb586a9430ce1a45936d6bbd1555baa2d) - Revert “KAFKA-17182: Consumer fetch sessions are evicted too quickly with AsyncKafkaConsumer (#17700)” * [PR-18758](https://github.com/apache/kafka/pull/18758) - KAFKA-18660: Transactions Version 2 doesn’t handle epoch overflow correctly (#18730) (#18758) * [PR-18750](https://github.com/apache/kafka/pull/18750) - KAFKA-18320; Ensure that assignors are at the right place (#18750) * [PR-1541](https://github.com/confluentinc/kafka/pull/1541) - Merge trunk * [PR-17700](https://github.com/apache/kafka/pull/17700) - KAFKA-17182: Consumer fetch sessions are evicted too quickly with AsyncKafkaConsumer (#17700) * [PR-18766](https://github.com/apache/kafka/pull/18766) - KAFKA-18146; tests/kafkatest/tests/core/upgrade_test.py needs to be re-added as KRaft (#18766) * [PR-18763](https://github.com/apache/kafka/pull/18763) - KAFKA-18677; Update ConsoleConsumerTest system test (#18763) * [PR-17511](https://github.com/apache/kafka/pull/17511) - KAFKA-15995: Initial API + make Producer/Consumer plugins Monitorable (#17511) * [PR-18722](https://github.com/apache/kafka/pull/18722) - KAFKA-18644: improve generic type names for KStreamImpl and KTableImpl (#18722) * [PR-18754](https://github.com/apache/kafka/pull/18754) - KAFKA-18034: CommitRequestManager should fail pending requests on fatal coordinator errors (#18754) * [PR-18730](https://github.com/apache/kafka/pull/18730) - KAFKA-18660: Transactions Version 2 doesn’t handle epoch overflow correctly (#18730) * [PR-1556](https://github.com/confluentinc/kafka/pull/1556) - MINOR: Disable publish artifacts for 4.0 * [PR-18548](https://github.com/apache/kafka/pull/18548) - KAFKA-18034: CommitRequestManager should fail pending requests on fatal coordinator errors (#18548) * [PR-18731](https://github.com/apache/kafka/pull/18731) - KAFKA-18570: Update documentation to add remainingLogsToRecover, remainingSegmentsToRecover and LogDirectoryOffline metrics (#18731) * [PR-18669](https://github.com/apache/kafka/pull/18669) - KAFKA-18621: Add StreamsCoordinatorRecordHelpers (#18669) * [PR-18681](https://github.com/apache/kafka/pull/18681) - KAFKA-18636 Fix how we handle Gradle exits in CI (#18681) * [PR-18590](https://github.com/apache/kafka/pull/18590) - KAFKA-18569: New consumer close may wait on unneeded FindCoordinator (#18590) * 
[PR-18698](https://github.com/apache/kafka/pull/18698) - KAFKA-13722: remove internal usage of old ProcessorContext (#18698) * [PR-18314](https://github.com/apache/kafka/pull/18314) - KAFKA-16339: Add Kafka Streams migrating guide from transform to process (#18314) * [PR-18732](https://github.com/apache/kafka/pull/18732) - KAFKA-18498: Update lock ownership from main thread (#18732) * [PR-18478](https://github.com/apache/kafka/pull/18478) - KAFKA-18383 Remove reserved.broker.max.id and broker.id.generation.enable (#18478) * [PR-18733](https://github.com/apache/kafka/pull/18733) - KAFKA-18662: Return CONCURRENT_TRANSACTIONS on produce request in TV2 (#18733) * [PR-18718](https://github.com/apache/kafka/pull/18718) - KAFKA-18632: Multibroker test improvements. (#18718) * [PR-18725](https://github.com/apache/kafka/pull/18725) - KAFKA-18653: Fix mocks and potential thread leak issues causing silent RejectedExecutionException in share group broker tests (#18725) * [PR-18726](https://github.com/apache/kafka/pull/18726) - KAFKA-18646: Null records in fetch response breaks librdkafka (#18726) * [PR-18668](https://github.com/apache/kafka/pull/18668) - KAFKA-18619: New consumer topic metadata events should set requireMetadata flag (#18668) * [PR-18728](https://github.com/apache/kafka/pull/18728) - KAFKA-18488: Improve KafkaShareConsumerTest (#18728) * [PR-18716](https://github.com/apache/kafka/pull/18716) - KAFKA-18648: Add back support for metadata version 0-3 (#18716) * [PR-18555](https://github.com/apache/kafka/pull/18555) - KAFKA-18528: MultipleListenersWithSameSecurityProtocolBaseTest and GssapiAuthenticationTest should run for async consumer (#18555) * [PR-18651](https://github.com/apache/kafka/pull/18651) - KAFKA-17951: Share partition rotate strategy (#18651) * [PR-18712](https://github.com/apache/kafka/pull/18712) - KAFKA-18629: Delete share group state impl [1/N] (#18712) * [PR-18570](https://github.com/apache/kafka/pull/18570) - KAFKA-17162: join() started thread in DefaultTaskManagerTest (#18570) * [PR-18602](https://github.com/apache/kafka/pull/18602) - KAFKA-17587 Refactor test infrastructure (#18602) * [PR-18693](https://github.com/apache/kafka/pull/18693) - KAFKA-18631 Remove ZkConfigs (#18693) * [PR-18699](https://github.com/apache/kafka/pull/18699) - KAFKA-18642: Increased the timeouts in share_consumer_test.py system tests (#18699) * [PR-18632](https://github.com/apache/kafka/pull/18632) - KAFKA-18555 Avoid casting MetadataCache to KRaftMetadataCache (#18632) * [PR-18547](https://github.com/apache/kafka/pull/18547) - KAFKA-18533 Remove KafkaConfig zookeeper related logic (#18547) * [PR-18554](https://github.com/apache/kafka/pull/18554) - KAFKA-18529: ConsumerRebootstrapTest should run for async consumer (#18554) * [PR-18292](https://github.com/apache/kafka/pull/18292) - KAFKA-13722: remove usage of old ProcessorContext (#18292) * [PR-18444](https://github.com/apache/kafka/pull/18444) - KAFKA-17894: Implemented broker topic metrics for Share Group 1/N (KIP-1103) (#18444) * [PR-18687](https://github.com/apache/kafka/pull/18687) - KAFKA-18630: Clean ReplicaManagerBuilder (#18687) * [PR-18477](https://github.com/apache/kafka/pull/18477) - KAFKA-18474: Remove zkBroker listener (#18477) * [PR-18688](https://github.com/apache/kafka/pull/18688) - KAFKA-18616; Refactor DumpLogSegments’s MessageParsers (#18688) * [PR-15574](https://github.com/apache/kafka/pull/15574) - KAFKA-16372 Fix producer doc discrepancy with the exception behavior (#15574) *
[PR-18618](https://github.com/apache/kafka/pull/18618) - KAFKA-18590 Cleanup DelegationTokenManager (#18618) * [PR-18593](https://github.com/apache/kafka/pull/18593) - KAFKA-18559 Cleanup FinalizedFeatures (#18593) * [PR-18627](https://github.com/apache/kafka/pull/18627) - KAFKA-18597 Fix max-buffer-utilization-percent is always 0 (#18627) * [PR-18686](https://github.com/apache/kafka/pull/18686) - KAFKA-18620: Remove UnifiedLog#legacyFetchOffsetsBefore (#18686) * [PR-18621](https://github.com/apache/kafka/pull/18621) - KAFKA-18592 Cleanup ReplicaManager (#18621) * [PR-18476](https://github.com/apache/kafka/pull/18476) - KAFKA-18324: Add CurrentAssignmentBuilder (#18476) * [PR-12042](https://github.com/apache/kafka/pull/12042) - KAFKA-13810: Document behavior of KafkaProducer.flush() w.r.t callbacks (#12042) * [PR-18667](https://github.com/apache/kafka/pull/18667) - KAFKA-18484 [2/2]; Handle exceptions during coordinator unload (#18667) * [PR-18601](https://github.com/apache/kafka/pull/18601) - KAFKA-18488: Additional protocol tests for share consumption (#18601) * [PR-18666](https://github.com/apache/kafka/pull/18666) - KAFKA-18486; [1/2] Update LocalLeaderEndPointTest (#18666) * [d2024436](https://github.com/apache/kafka/commit/d2024436218343a127385e0149a692caf432b772) - KAFKA-18575: Transaction Version 2 doesn’t correctly handle race condition with completing and new transaction (#18604) * [PR-18532](https://github.com/apache/kafka/pull/18532) - KAFKA-18517: Enable ConsumerBounceTest to run for new async consumer (#18532) * [PR-18614](https://github.com/apache/kafka/pull/18614) - KAFKA-18519: Remove Json.scala, cleanup AclEntry.scala (#18614) * [PR-18630](https://github.com/apache/kafka/pull/18630) - KAFKA-18599: Remove Optional wrapping for forwardingManager in ApiVersionManager (#18630) * [PR-18389](https://github.com/apache/kafka/pull/18389) - KAFKA-18229: Move configs out of “kraft” directory (#18389) * [PR-18661](https://github.com/apache/kafka/pull/18661) - KAFKA-18484 [1/N]; Handle exceptions from deferred events in coordinator (#18661) * [PR-18649](https://github.com/apache/kafka/pull/18649) - KAFKA-18392: Ensure client sets member ID for share group (#18649) * [PR-18527](https://github.com/apache/kafka/pull/18527) - KAFKA-18518: Add processor to handle rebalance events (#18527) * [PR-18607](https://github.com/apache/kafka/pull/18607) - KAFKA-17402: DefaultStateUpdater should transition task atomically (#18607) * [PR-18539](https://github.com/apache/kafka/pull/18539) - KAFKA-18454 Publish build scans to develocity.apache.org (#18539) * [PR-18512](https://github.com/apache/kafka/pull/18512) - KAFKA-18302; Update CoordinatorRecord (#18512) * [PR-18316](https://github.com/apache/kafka/pull/18316) - KAFKA-15370: Support Participation in 2PC (KIP-939) (2/N) (#18316) * [PR-18587](https://github.com/apache/kafka/pull/18587) - KAFKA-8862: Improve Producer error message for failed metadata update (#18587) * [PR-18581](https://github.com/apache/kafka/pull/18581) - KAFKA-17561: add processId tag to thread-state metric (#18581) * [PR-18629](https://github.com/apache/kafka/pull/18629) - KAFKA-18598: Remove ControllerMetadataMetrics ZK-related Metrics (#18629) * [PR-18611](https://github.com/apache/kafka/pull/18611) - KAFKA-18585 Fix fail test ValuesTest#shouldConvertDateValues (#18611) * [PR-18647](https://github.com/apache/kafka/pull/18647) - KAFKA-18487; Remove ReplicaManager#stopReplicas (#18647) * [PR-18635](https://github.com/apache/kafka/pull/18635) - KAFKA-18583; Fix
getPartitionReplicaEndpoints for KRaft (#18635) * [PR-18442](https://github.com/apache/kafka/pull/18442) - KAFKA-18311: Internal Topic Manager (5/5) (#18442) * [PR-18636](https://github.com/apache/kafka/pull/18636) - KAFKA-18604; Update transaction coordinator (#18636) * [PR-18497](https://github.com/apache/kafka/pull/18497) - KAFKA-14552: Assume a baseline of 3.0 for server protocol versions (#18497) * [PR-18346](https://github.com/apache/kafka/pull/18346) - KAFKA-18363: Remove ZooKeeper mentions in broker configs (#18346) * [PR-18631](https://github.com/apache/kafka/pull/18631) - KAFKA-18595: Remove AuthorizerUtils#sessionToRequestContext (#18631) * [PR-18626](https://github.com/apache/kafka/pull/18626) - KAFKA-18594: Cleanup BrokerLifecycleManager (#18626) * [PR-18174](https://github.com/apache/kafka/pull/18174) - KAFKA-18232: Add share group state topic prune metrics. (#18174) * [PR-18567](https://github.com/apache/kafka/pull/18567) - KAFKA-18553: Update javadoc and comments of ConfigType (#18567) * [PR-18571](https://github.com/apache/kafka/pull/18571) - [KAFKA-16720] AdminClient Support for ListShareGroupOffsets (1/n) (#18571) * [PR-18624](https://github.com/apache/kafka/pull/18624) - KAFKA-18588 Remove TopicKey.scala (#18624) * [PR-18628](https://github.com/apache/kafka/pull/18628) - KAFKA-18578: Remove UpdateMetadataRequest from MetadataCacheTest (#18628) * [PR-18625](https://github.com/apache/kafka/pull/18625) - KAFKA-18593 Remove ZkCachedControllerId In MetadataCache (#18625) * [PR-17390](https://github.com/apache/kafka/pull/17390) - KAFKA-17668: Clean-up LogCleaner#maxOverCleanerThreads and LogCleanerManager#maintainUncleanablePartitions (#17390) * [PR-18616](https://github.com/apache/kafka/pull/18616) - KAFKA-18429 Remove ZkFinalizedFeatureCache and StateChangeFailedException (#18616) * [PR-18619](https://github.com/apache/kafka/pull/18619) - KAFKA-18589 Remove unused interBrokerProtocolVersion from GroupMetadataManager (#18619) * [PR-18598](https://github.com/apache/kafka/pull/18598) - KAFKA-18516 Remove RackAwareMode (#18598) * [PR-18608](https://github.com/apache/kafka/pull/18608) - KAFKA-18492 Cleanup RequestHandlerHelper (#18608) * [PR-18613](https://github.com/apache/kafka/pull/18613) - KAFKA-18427: Remove ZooKeeperClient (#18613) * [PR-18591](https://github.com/apache/kafka/pull/18591) - KAFKA-18540: Remove UpdateMetadataRequest from KafkaApisTest (#18591) * [PR-18594](https://github.com/apache/kafka/pull/18594) - KAFKA-18532: Clean Partition.scala zookeeper logic (#18594) * [PR-18605](https://github.com/apache/kafka/pull/18605) - KAFKA-18423: Remove ZkData and related unused references (#18605) * [PR-18586](https://github.com/apache/kafka/pull/18586) - KAFKA-18565 Cleanup SaslSetup (#18586) * [PR-18606](https://github.com/apache/kafka/pull/18606) - KAFKA-18430 Remove ZkNodeChangeNotificationListener (#18606) * [PR-18492](https://github.com/apache/kafka/pull/18492) - KAFKA-18480 Fix fail e2e test_offset_truncate (#18492) * [PR-18012](https://github.com/apache/kafka/pull/18012) - KAFKA-806: Index may not always observe log.index.interval.bytes (#18012) * [PR-18595](https://github.com/apache/kafka/pull/18595) - KAFKA-18515 Remove DelegationTokenManagerZk (#18595) * [PR-18579](https://github.com/apache/kafka/pull/18579) - Remove casts to KRaftMetadataCache (#18579) * [PR-18577](https://github.com/apache/kafka/pull/18577) - Convert BrokerEndPoint to record (#18577) * [PR-18240](https://github.com/apache/kafka/pull/18240) - KAFKA-17642: PreVote response handling and
ProspectiveState (#18240) * [PR-18585](https://github.com/apache/kafka/pull/18585) - KAFKA-18413: Remove AdminZkClient (#18585) * [PR-18406](https://github.com/apache/kafka/pull/18406) - KAFKA-18318: Add logs for online/offline migration indication (#18406) * [PR-18224](https://github.com/apache/kafka/pull/18224) - KAFKA-18150; Downgrade group on classic leave of last consumer member (#18224) * [PR-18209](https://github.com/apache/kafka/pull/18209) - Infrastructure for system tests for the new share consumer client (#18209) * [PR-18553](https://github.com/apache/kafka/pull/18553) - KAFKA-18373: Remove ZkMetadataCache (#18553) * [PR-18582](https://github.com/apache/kafka/pull/18582) - KAFKA-18557 streamline codebase with testConfig() (#18582) * [PR-18573](https://github.com/apache/kafka/pull/18573) - KAFKA-18431: Remove KafkaController (#18573) * [PR-18574](https://github.com/apache/kafka/pull/18574) - KAFKA-18407: Remove ZkAdminManager, DelayedCreatePartitions, CreatePartitionsMetadata, ZkConfigRepository, DelayedDeleteTopics (#18574) * [PR-18568](https://github.com/apache/kafka/pull/18568) - KAFKA-18556: Remove JaasModule#zkDigestModule, JaasTestUtils#zkSections (#18568) * [PR-18534](https://github.com/apache/kafka/pull/18534) - KAFKA-14485: Move LogCleaner exceptions to storage module (#18534) * [PR-18565](https://github.com/apache/kafka/pull/18565) - KAFKA-18546: Use mocks instead of a real DNS lookup to the outside (#18565) * [PR-18140](https://github.com/apache/kafka/pull/18140) - KAFKA-16368: Add a new constraint for segment.bytes to min 1MB for KIP-1030 (#18140) * [PR-18106](https://github.com/apache/kafka/pull/18106) - KAFKA-16368: Update defaults for LOG_MESSAGE_TIMESTAMP_AFTER_MAX_MS_DEFAULT and NUM_RECOVERY_THREADS_PER_DATA_DIR_CONFIG (#18106) * [PR-18374](https://github.com/apache/kafka/pull/18374) - KAFKA-7776: Tests for ISO8601 in Connect value parsing (#18374) * [PR-18562](https://github.com/apache/kafka/pull/18562) - KAFKA-18558: Added check before adding previously subscribed partitions (#18562) * [PR-18535](https://github.com/apache/kafka/pull/18535) - KAFKA-18521 Cleanup NodeApiVersions zkMigrationEnabled field (#18535) * [PR-18552](https://github.com/apache/kafka/pull/18552) - KAFKA-18542 Cleanup AlterPartitionManager (#18552) * [PR-18561](https://github.com/apache/kafka/pull/18561) - KAFKA-18406 Remove ZkBrokerEpochManager.scala (#18561) * [PR-18508](https://github.com/apache/kafka/pull/18508) - KAFKA-18405 Remove ZooKeeper logic from DynamicBrokerConfig (#18508) * [PR-18080](https://github.com/apache/kafka/pull/18080) - KAFKA-16368: Update default linger.ms to 5ms for KIP-1030 (#18080) * [PR-18524](https://github.com/apache/kafka/pull/18524) - KAFKA-18514: Refactor share module code to server and server-common (#18524) * [PR-18414](https://github.com/apache/kafka/pull/18414) - KAFKA-18331: Make process.roles and node.id required configs (#18414) * [PR-18559](https://github.com/apache/kafka/pull/18559) - KAFKA-18552: Remove unnecessary version check in `testHandleOffsetFetch*` (#18559) * [PR-18483](https://github.com/apache/kafka/pull/18483) - KAFKA-18472: Remove MetadataSupport (#18483) * [PR-18342](https://github.com/apache/kafka/pull/18342) - KAFKA-18026: KIP-1112, clean up graph node grace period resolution (#18342) * [PR-18491](https://github.com/apache/kafka/pull/18491) - KAFKA-18479: Remove keepPartitionMetadataFile in UnifiedLog and LogMan… (#18491) * [PR-18365](https://github.com/apache/kafka/pull/18365) - KAFKA-18364 add document to show the changes of 
metrics and configs after removing zookeeper (#18365) * [PR-18550](https://github.com/apache/kafka/pull/18550) - KAFKA-18539 Remove optional managers in KafkaApis (#18550) * [PR-18563](https://github.com/apache/kafka/pull/18563) - Use version.py get_version to get version (#18563) * [PR-18459](https://github.com/apache/kafka/pull/18459) - KAFKA-18452: Implemented batch size in acquired records (#18459) * [PR-18448](https://github.com/apache/kafka/pull/18448) - KAFKA-18401: Transaction version 2 does not support commit transaction without records (#18448) * [PR-18490](https://github.com/apache/kafka/pull/18490) - KAFKA-18479: RocksDBTimeOrderedKeyValueBuffer not initialized correctly (#18490) * [PR-18536](https://github.com/apache/kafka/pull/18536) - KAFKA-18514 Remove server dependency on share coordinator (#18536) * [PR-18521](https://github.com/apache/kafka/pull/18521) - KAFKA-18513: Validate share state topic records produced in tests. (#18521) * [PR-18542](https://github.com/apache/kafka/pull/18542) - KAFKA-18399 Remove ZooKeeper from KafkaApis (12/N): clean up ZKMetadataCache, KafkaController and raftSupport (#18542) * [PR-18386](https://github.com/apache/kafka/pull/18386) - KAFKA-18346 Fix e2e TestKRaftUpgrade for v3.3.2 (#18386) * [PR-18530](https://github.com/apache/kafka/pull/18530) - KAFKA-18520: Remove ZooKeeper logic from JaasUtils (#18530) * [PR-18540](https://github.com/apache/kafka/pull/18540) - KAFKA-18399 Remove ZooKeeper from KafkaApis (11/N): CREATE_ACLS and DELETE_ACLS (#18540) * [PR-18432](https://github.com/apache/kafka/pull/18432) - KAFKA-18399 Remove ZooKeeper from KafkaApis (10/N): ALTER_CONFIG and INCREMENTAL_ALTER_CONFIG (#18432) * [PR-18544](https://github.com/apache/kafka/pull/18544) - Revert “KAFKA-18034: CommitRequestManager should fail pending requests on fatal coordinator errors (#18050)” (#18544) * [PR-18487](https://github.com/apache/kafka/pull/18487) - KAFKA-18476: KafkaStreams should swallow TransactionAbortedException (#18487) * [PR-18195](https://github.com/apache/kafka/pull/18195) - KAFKA-18026: KIP-1112, clean up StatefulProcessorNode (#18195) * [PR-18518](https://github.com/apache/kafka/pull/18518) - KAFKA-18502 Remove kafka.controller.Election (#18518) * [PR-18281](https://github.com/apache/kafka/pull/18281) - KAFKA-18330: Update documentation to remove controller deployment limitations (#18281) * [PR-18465](https://github.com/apache/kafka/pull/18465) - KAFKA-18399 Remove ZooKeeper from KafkaApis (9/N): ALTER_CLIENT_QUOTAS and ALLOCATE_PRODUCER_IDS (#18465) * [PR-18511](https://github.com/apache/kafka/pull/18511) - KAFKA-18493: Fix configure :streams:integration-tests project error (#18511) * [PR-18453](https://github.com/apache/kafka/pull/18453) - KAFKA-18399 Remove ZooKeeper from KafkaApis (8/N): ELECT_LEADERS, ALTER_PARTITION, UPDATE_FEATURES (#18453) * [PR-18525](https://github.com/apache/kafka/pull/18525) - Rename the variable to reflect its purpose (#18525) * [PR-18403](https://github.com/apache/kafka/pull/18403) - KAFKA-18211: Override class loaders for class graph scanning in connect.
(#18403) * [PR-18500](https://github.com/apache/kafka/pull/18500) - Add DescribeShareGroupOffsets API [KIP-932] (#18500) * [PR-17669](https://github.com/apache/kafka/pull/17669) - KAFKA-17915: Convert Kafka Client system tests to use KRaft (#17669) * [PR-17901](https://github.com/apache/kafka/pull/17901) - KAFKA-18064: SASL mechanisms should throw exception on wrap/unwrap (#17901) * [PR-18507](https://github.com/apache/kafka/pull/18507) - KAFKA-18491 Remove zkClient & maybeUpdateMetadataCache from ReplicaManager (#18507) * [PR-18337](https://github.com/apache/kafka/pull/18337) - KAFKA-18274 Failed to restart controller in testing due to closed socket channel [2/2] (#18337) * [PR-18475](https://github.com/apache/kafka/pull/18475) - KAFKA-18469;KAFKA-18036: AsyncConsumer should request metadata update if ListOffsetRequest encounters a retriable error (#18475) * [PR-17728](https://github.com/apache/kafka/pull/17728) - KAFKA-17973: Relax Restriction for Voters Set Change (#17728) * [PR-18050](https://github.com/apache/kafka/pull/18050) - KAFKA-18034: CommitRequestManager should fail pending requests on fatal coordinator errors (#18050) * [PR-17870](https://github.com/apache/kafka/pull/17870) - KAFKA-18404: Remove partitionMaxBytes usage from DelayedShareFetch (#17870) * [PR-18504](https://github.com/apache/kafka/pull/18504) - KAFKA-18485; Update log4j2.yaml (#18504) * [PR-18433](https://github.com/apache/kafka/pull/18433) - KAFKA-18399 Remove ZooKeeper from KafkaApis (7/N): CREATE_TOPICS, DELETE_TOPICS, CREATE_PARTITIONS (#18433) * [PR-18320](https://github.com/apache/kafka/pull/18320) - KAFKA-18341: Remove KafkaConfig GroupType config check and warn log (#18320) * [PR-18480](https://github.com/apache/kafka/pull/18480) - KAFKA-18457; Update DumpLogSegments to use coordinator record json converters (#18480) * [PR-18447](https://github.com/apache/kafka/pull/18447) - KAFKA-18399 Remove ZooKeeper from KafkaApis (6/N): handleCreateTokenRequest, handleRenewTokenRequestZk, handleExpireTokenRequestZk (#18447) * [PR-18464](https://github.com/apache/kafka/pull/18464) - KAFKA-18399 Remove ZooKeeper from KafkaApis (5/N): ALTER_PARTITION_REASSIGNMENTS, LIST_PARTITION_REASSIGNMENTS (#18464) * [PR-18461](https://github.com/apache/kafka/pull/18461) - KAFKA-18399 Remove ZooKeeper from KafkaApis (4/N): OFFSET_COMMIT and OFFSET_FETCH (#18461) * [PR-18456](https://github.com/apache/kafka/pull/18456) - KAFKA-18399 Remove ZooKeeper from KafkaApis (3/N): USER_SCRAM_CREDENTIALS (#18456) * [PR-18472](https://github.com/apache/kafka/pull/18472) - KAFKA-18466 Remove log4j-1.2-api from runtime scope while keeping it in distribution package (#18472) * [PR-18404](https://github.com/apache/kafka/pull/18404) - KAFKA-18400: Don’t use YYYY when formatting/parsing dates in Java client (#18404) * [PR-18437](https://github.com/apache/kafka/pull/18437) - KAFKA-18446 Remove MetadataCacheControllerNodeProvider (#18437) * [PR-18468](https://github.com/apache/kafka/pull/18468) - KAFKA-18465: Remove MetadataVersions older than 3.0-IV1 (#18468) * [PR-18467](https://github.com/apache/kafka/pull/18467) - KAFKA-18464: Empty Abort Transaction can fence producer incorrectly with Transactions V2 (#18467) * [PR-18471](https://github.com/apache/kafka/pull/18471) - KAFKA-8116: Update Kafka Streams archetype for Java 11 (#18471) * [PR-17510](https://github.com/apache/kafka/pull/17510) - KAFKA-17792: Efficiently parse decimals with large exponents in Connect Values (#17510) * [PR-18679](https://github.com/apache/kafka/pull/18679) - KAFKA-18632: 
Added few share consumer multibroker tests. (#18679) * [82ccf75a](https://github.com/apache/kafka/commit/82ccf75ae091bffb94cbb3fd173240c48627db17) - KAFKA-18575: Transaction Version 2 doesn’t correctly handle race condition with completing and new transaction(#18604) * [94a1bfb1](https://github.com/apache/kafka/commit/94a1bfb1281f06263976b1ba8bba8c5ac5d7f2ce) - KAFKA-18575: Transaction Version 2 doesn’t correctly handle race condition with completing and new transaction(#18604) * [PR-18340](https://github.com/apache/kafka/pull/18340) - KAFKA-18339: Fix parseRequestHeader error handling (#18340) * [PR-18643](https://github.com/apache/kafka/pull/18643) - Revert “KAFKA-18404: Remove partitionMaxBytes usage from DelayedShareFetch (#17870)” (#18643) * [21c4539d](https://github.com/apache/kafka/commit/21c4539dfe1134e60a7d8680d9ea19ae48f569a3) - Revert “KAFKA-18034: CommitRequestManager should fail pending requests on fatal coordinator errors (#18050)” * [PR-18150](https://github.com/apache/kafka/pull/18150) - KAFKA-18026: KIP-1112 migrate KTableSuppressProcessorSupplier (#18150) * [0186534a](https://github.com/apache/kafka/commit/0186534a992a123a7f53dd32860c6ba5787dbb18) - Revert “KAFKA-17411: Create local state Standbys on start (#16922)” and “KAFKA-17978: Fix invalid topology on Task assignment (#17778)” * [PR-18378](https://github.com/apache/kafka/pull/18378) - KAFKA-18340: Change Dockerfile to use log4j2 yaml instead log4j properties (#18378) * [PR-18397](https://github.com/apache/kafka/pull/18397) - KAFKA-18311: Enforcing copartitioned topics (4/N) (#18397) * [PR-17454](https://github.com/apache/kafka/pull/17454) - KAFKA-17671: Create better documentation for transactions (#17454) * [PR-18455](https://github.com/apache/kafka/pull/18455) - KAFKA-18308; Update CoordinatorSerde (#18455) * [PR-18435](https://github.com/apache/kafka/pull/18435) - KAFKA-18440: Convert AuthorizationException to fatal error in AdminClient (#18435) * [PR-18458](https://github.com/apache/kafka/pull/18458) - KAFKA-18304; Introduce json converter generator (#18458) * [PR-18422](https://github.com/apache/kafka/pull/18422) - KAFKA-18399 Remove ZooKeeper from KafkaApis (2/N): CONTROLLED_SHUTDOWN and ENVELOPE (#18422) * [PR-18146](https://github.com/apache/kafka/pull/18146) - KAFKA-18073: Prevent dropped records from failed retriable exceptions (#18146) * [PR-18321](https://github.com/apache/kafka/pull/18321) - KAFKA-13093: Log compaction should write new segments with record version v2 (KIP-724) (#18321) * [PR-18100](https://github.com/apache/kafka/pull/18100) - KAFKA-18180: Move OffsetResultHolder to storage module (#18100) * [PR-17527](https://github.com/apache/kafka/pull/17527) - KAFKA-17455: fix stuck producer when throttling or retrying (#17527) * [PR-18367](https://github.com/apache/kafka/pull/18367) - KAFKA-17915: Convert remaining Kafka Client system tests to use KRaft (#18367) * [PR-18247](https://github.com/apache/kafka/pull/18247) - KAFKA-18277 Convert network_degrade_test to Kraft mode (#18247) * [PR-18175](https://github.com/apache/kafka/pull/18175) - KAFKA-17986 Fix ConsumerRebootstrapTest and ProducerRebootstrapTest (#18175) * [PR-18445](https://github.com/apache/kafka/pull/18445) - KAFKA-18445 Remove LazyDownConversionRecords and LazyDownConversionRecordsSend (#18445) * [PR-18417](https://github.com/apache/kafka/pull/18417) - KAFKA-18399 Remove ZooKeeper from KafkaApis (1/N): LEADER_AND_ISR, STOP_REPLICA, UPDATE_METADATA (#18417) * [PR-18382](https://github.com/apache/kafka/pull/18382) - KAFKA-17730 
ReplicaFetcherThreadBenchmark is broken (#18382) * [PR-18423](https://github.com/apache/kafka/pull/18423) - KAFKA-18437: Correct version of ShareUpdateRecord value (#18423) * [PR-18457](https://github.com/apache/kafka/pull/18457) - KAFKA-18397: Added null check before sending background event from ShareConsumeRequestManager. (#18419) (#18457) * [PR-18462](https://github.com/apache/kafka/pull/18462) - KAFKA-18449: Add share group state configs to reconfig-server.properties (#18440) (#18462) * [PR-18395](https://github.com/apache/kafka/pull/18395) - KAFKA-18311: Configuring repartition topics (3/N) (#18395) * [PR-18446](https://github.com/apache/kafka/pull/18446) - KAFKA-18453: Add StreamsTopology class to group coordinator (#18446) * [PR-18450](https://github.com/apache/kafka/pull/18450) - KAFKA-18435 Remove zookeeper dependencies in build.gradle (#18450) * [PR-18452](https://github.com/apache/kafka/pull/18452) - KAFKA-18111: Add Kafka Logo to README (#18452) * [PR-18438](https://github.com/apache/kafka/pull/18438) - KAFKA-18432 Remove unused code from AutoTopicCreationManager (#18438) * [PR-18436](https://github.com/apache/kafka/pull/18436) - KAFKA-18434: enrich the authorization error message of connecting to controller (#18436) * [PR-18441](https://github.com/apache/kafka/pull/18441) - KAFKA-18426 Remove FinalizedFeatureChangeListener (#18441) * [PR-18276](https://github.com/apache/kafka/pull/18276) - KAFKA-18321: Add StreamsGroupMember, MemberState and Assignment classes (#18276) * [PR-18443](https://github.com/apache/kafka/pull/18443) - KAFKA-18425 Remove OffsetTrackingListener (#18443) * [PR-18439](https://github.com/apache/kafka/pull/18439) - KAFKA-18433: Add BatchSize to ShareFetch request (1/N) (#18439) * [PR-18428](https://github.com/apache/kafka/pull/18428) - Backport some GHA changes from trunk (#18428) * [PR-18415](https://github.com/apache/kafka/pull/18415) - KAFKA-18428: Measure share consumers performance (#18415) * [PR-18419](https://github.com/apache/kafka/pull/18419) - KAFKA-18397: Added null check before sending background event from ShareConsumeRequestManager. 
(#18419) * [PR-18440](https://github.com/apache/kafka/pull/18440) - KAFKA-18449: Add share group state configs to reconfig-server.properties (#18440) * [PR-18296](https://github.com/apache/kafka/pull/18296) - KAFKA-18173 Remove duplicate assertFutureError (#18296) * [PR-18094](https://github.com/apache/kafka/pull/18094) - KAFKA-15599: Move SegmentPosition/TimingWheelExpirationService to raft module (#18094) * [PR-18329](https://github.com/apache/kafka/pull/18329) - KAFKA-18353 Remove zk config control.plane.listener.name (#18329) * [PR-18429](https://github.com/apache/kafka/pull/18429) - KAFKA-18443 Remove ZkFourLetterWords (#18429) * [PR-18431](https://github.com/apache/kafka/pull/18431) - KAFKA-18417 Remove controlled.shutdown.max.retries and controlled.shutdown.retry.backoff.ms (#18431) * [PR-18287](https://github.com/apache/kafka/pull/18287) - KAFKA-18326: fix merge iterator with cache tombstones (#18287) * [PR-18413](https://github.com/apache/kafka/pull/18413) - KAFKA-18411 Remove ZkProducerIdManager (#18413) * [PR-18421](https://github.com/apache/kafka/pull/18421) - KAFKA-18408 tweak the ‘tag’ field for BrokerHeartbeatRequest.json, BrokerRegistrationChangeRecord.json and RegisterBrokerRecord.json (#18421) * [PR-18401](https://github.com/apache/kafka/pull/18401) - KAFKA-18414 Remove KRaftRegistrationResult (#18401) * [PR-17671](https://github.com/apache/kafka/pull/17671) - KAFKA-17921 Support SASL_PLAINTEXT protocol with java.security.auth.login.config (#17671) * [PR-18411](https://github.com/apache/kafka/pull/18411) - KAFKA-18436: Revert Multiversioning Changes from 4.0 release. (#18411) * [PR-18364](https://github.com/apache/kafka/pull/18364) - KAFKA-18384 Remove ZkAlterPartitionManager (#18364) * [PR-17946](https://github.com/apache/kafka/pull/17946) - KAFKA-10790: Add deadlock detection to producer#flush (#17946) * [PR-18399](https://github.com/apache/kafka/pull/18399) - KAFKA-18412: Remove EmbeddedZookeeper (#18399) * [PR-18352](https://github.com/apache/kafka/pull/18352) - KAFKA-18368 Remove TestUtils#MockZkConnect and remove zkConnect from TestUtils#createBrokerConfig (#18352) * [PR-18396](https://github.com/apache/kafka/pull/18396) - KAFKA-18303; Update ShareCoordinator to use new record format (#18396) * [PR-18370](https://github.com/apache/kafka/pull/18370) - KAFKA-18388 test-kraft-server-start.sh should use log4j2.yaml (#18370) * [PR-17742](https://github.com/apache/kafka/pull/17742) - KAFKA-18419: KIP-891 Connect Multiversion Support (Transformation and Predicate Changes) (#17742) * [PR-18355](https://github.com/apache/kafka/pull/18355) - KAFKA-18374 Remove EncryptingPasswordEncoder, CipherParamsEncoder, GcmParamsEncoder, IvParamsEncoder, and the unused static variables in PasswordEncoder (#18355) * [PR-18379](https://github.com/apache/kafka/pull/18379) - KAFKA-18311: Configuring changelog topics (2/N) (#18379) * [PR-18318](https://github.com/apache/kafka/pull/18318) - KAFKA-18307: Don’t report on disabled/removed tests (#18318) * [PR-17801](https://github.com/apache/kafka/pull/17801) - KAFKA-17278; Add KRaft RPC compatibility tests (#17801) * [PR-18377](https://github.com/apache/kafka/pull/18377) - KAFKA-17539: Application metrics extension for share consumer (#18377) * [PR-18384](https://github.com/apache/kafka/pull/18384) - KAFKA-17616: Remove KafkaServer (#18384) * [PR-18268](https://github.com/apache/kafka/pull/18268) - KAFKA-18311: Add internal datastructure for configuring topologies (1/N) (#18268) * [PR-18343](https://github.com/apache/kafka/pull/18343) - 
KAFKA-18358: Replace Deprecated $buildDir variable in build.gradle (#18343) * [PR-18353](https://github.com/apache/kafka/pull/18353) - KAFKA-18365 Remove zookeeper.connect in Test (#18353) * [PR-18373](https://github.com/apache/kafka/pull/18373) - Use instanceof pattern to avoid explicit cast (#18373) * [PR-18270](https://github.com/apache/kafka/pull/18270) - KAFKA-18319: Add task assignor interfaces (#18270) * [PR-18259](https://github.com/apache/kafka/pull/18259) - KAFKA-18273: KIP-1099 verbose display share group options (#18259) * [PR-18363](https://github.com/apache/kafka/pull/18363) - KAFKA-18367 Remove ZkConfigManager (#18363) * [PR-18351](https://github.com/apache/kafka/pull/18351) - KAFKA-18347 Add tools-log4j2.yaml to config and remove unused tools-log4j.properties from config (#18351) * [PR-18359](https://github.com/apache/kafka/pull/18359) - KAFKA-18375 Update the LICENSE-binary (#18359) * [PR-18345](https://github.com/apache/kafka/pull/18345) - KAFKA-18026: KIP-1112, configure all StoreBuilder & StoreFactory layers (#18345) * [PR-18232](https://github.com/apache/kafka/pull/18232) - KAFKA-12469: Deprecated and corrected topic metrics for consumer (KIP-1109) (#18232) * [PR-18254](https://github.com/apache/kafka/pull/18254) - KAFKA-17421 Add integration tests for ConsumerRecord#leaderEpoch (#18254) * [PR-18347](https://github.com/apache/kafka/pull/18347) - KAFKA-18361 Remove PasswordEncoderConfigs (#18347) * [PR-18271](https://github.com/apache/kafka/pull/18271) - KAFKA-17615 Remove KafkaServer from tests (#18271) * [PR-18308](https://github.com/apache/kafka/pull/18308) - KAFKA-18280 fix e2e TestSecurityRollingUpgrade.test_rolling_upgrade_sasl_mechanism_phase_one (#18308) * [PR-18327](https://github.com/apache/kafka/pull/18327) - KAFKA-18313 Fix to Kraft or remove tests associated with Zk Broker config in SocketServerTest and ReplicaFetcherThreadTest (#18327) * [PR-18279](https://github.com/apache/kafka/pull/18279) - KAFKA-18316 Fix to Kraft or remove tests associated with Zk Broker config in ConnectionQuotasTest (#18279) * [PR-18185](https://github.com/apache/kafka/pull/18185) - KAFKA-18243 Fix compatibility of Loggers class between log4j and log4j2 (#18185) * [PR-18269](https://github.com/apache/kafka/pull/18269) - KAFKA-18315 Fix to Kraft or remove tests associated with Zk Broker config in DynamicBrokerConfigTest, ReplicaManagerTest, DescribeTopicPartitionsRequestHandlerTest, KafkaConfigTest (#18269) * [PR-18338](https://github.com/apache/kafka/pull/18338) - KAFKA-18354 Use log4j2 APIs to refactor LogCaptureAppender (#18338) * [PR-18309](https://github.com/apache/kafka/pull/18309) - KAFKA-18314 Fix to Kraft or remove tests associated with Zk Broker config in KafkaApisTest (#18309) * [PR-18344](https://github.com/apache/kafka/pull/18344) - KAFKA-18359 Set zkConnect to null in LocalLeaderEndPointTest, HighwatermarkPersistenceTest, IsrExpirationTest, ReplicaManagerQuotasTest, OffsetsForLeaderEpochTest (#18344) * [PR-18101](https://github.com/apache/kafka/pull/18101) - KAFKA-18135: ShareConsumer HB UnsupportedVersion msg mixed with Consumer HB (#18101) * [PR-18283](https://github.com/apache/kafka/pull/18283) - KAFKA-18317 Remove zookeeper.connect from RemoteLogManagerTest (#18283) * [PR-18295](https://github.com/apache/kafka/pull/18295) - KAFKA-18339: Remove raw unversioned direct SASL protocol (KIP-896) (#18295) * [PR-18313](https://github.com/apache/kafka/pull/18313) - KAFKA-18272: Deprecated protocol api usage should be logged at info level (#18313) *
[PR-18282](https://github.com/apache/kafka/pull/18282) - KAFKA-18295 Remove deprecated function Partitioner#onNewBatch (#18282) * [PR-18317](https://github.com/apache/kafka/pull/18317) - KAFKA-18348 Remove the deprecated MockConsumer#setException (#18317) * [PR-18324](https://github.com/apache/kafka/pull/18324) - KAFKA-18352: Add back DeleteGroups v0, it was incorrectly tagged as deprecated (#18324) * [PR-18310](https://github.com/apache/kafka/pull/18310) - KAFKA-18274 Failed to restart controller in testing due to closed socket channel [1/2] (#18310) * [PR-18250](https://github.com/apache/kafka/pull/18250) - KAFKA-18093 Remove deprecated DeleteTopicsResult#values (#18250) * [PR-18312](https://github.com/apache/kafka/pull/18312) - KAFKA-18343: Use java_pids to implement pids (#18312) * [PR-18294](https://github.com/apache/kafka/pull/18294) - KAFKA-18338 add log4j.yaml to test-common-api and remove unused log4j.properties from test-common (#18294) * [PR-18306](https://github.com/apache/kafka/pull/18306) - KAFKA-18342 Use File.exist instead of File.exists to ensure the Vagrantfile works with Ruby 3.2+ (#18306) * [PR-18246](https://github.com/apache/kafka/pull/18246) - KAFKA-18290 Remove deprecated methods of FeatureUpdate (#18246) * [PR-18255](https://github.com/apache/kafka/pull/18255) - KAFKA-18289 Remove deprecated methods of DescribeTopicsResult (#18255) * [PR-18265](https://github.com/apache/kafka/pull/18265) - KAFKA-18291 Remove deprecated methods of ListConsumerGroupOffsetsOptions (#18265) * [PR-18223](https://github.com/apache/kafka/pull/18223) - KAFKA-18278: Correct name and description for run-gradle step (#18223) * [PR-18267](https://github.com/apache/kafka/pull/18267) - KAFKA-17393: Remove log.message.format.version/message.format.version (KIP-724) (#18267) * [PR-18132](https://github.com/apache/kafka/pull/18132) - KAFKA-17705: Add Transactions V2 system tests and mark as production ready (#18132) * [PR-18291](https://github.com/apache/kafka/pull/18291) - KAFKA-18269: Remove deprecated protocol APIs support (KIP-896, KIP-724) (#18291) * [PR-18218](https://github.com/apache/kafka/pull/18218) - KAFKA-18269: Remove deprecated protocol APIs support (KIP-896, KIP-724) (#18218) * [PR-18288](https://github.com/apache/kafka/pull/18288) - KAFKA-18334: Produce v4-v6 should be undeprecated (#18288) * [PR-18262](https://github.com/apache/kafka/pull/18262) - KAFKA-18270: FindCoordinator v0 incorrectly tagged as deprecated (#18262) * [PR-18221](https://github.com/apache/kafka/pull/18221) - KAFKA-18270: SaslHandshake v0 incorrectly tagged as deprecated (#18221) * [PR-18249](https://github.com/apache/kafka/pull/18249) - KAFKA-13722: code cleanup after deprecated StateStore.init() was removed (#18249) * [PR-17687](https://github.com/apache/kafka/pull/17687) - KAFKA-15370: Support Participation in 2PC (KIP-939) (1/N) (#17687) * [PR-18285](https://github.com/apache/kafka/pull/18285) - KAFKA-18312: Added entityType: topicName to SubscribedTopicNames in ShareGroupHeartbeatRequest.json (#18285) * [PR-18261](https://github.com/apache/kafka/pull/18261) - KAFKA-18301; Make coordinator records first class citizen (#18261) * [PR-18204](https://github.com/apache/kafka/pull/18204) - KAFKA-18262 Remove DefaultPartitioner and UniformStickyPartitioner (#18204) * [PR-18257](https://github.com/apache/kafka/pull/18257) - KAFKA-18296 Remove deprecated KafkaBasedLog constructor (#18257) * [PR-18238](https://github.com/apache/kafka/pull/18238) - KAFKA-12829: Remove old Processor and ProcessorSupplier interfaces (#18238) *
[PR-18245](https://github.com/apache/kafka/pull/18245) - KAFKA-18292 Remove deprecated methods of UpdateFeaturesOptions (#18245) * [PR-18154](https://github.com/apache/kafka/pull/18154) - KAFKA-12829: Remove deprecated Topology#addProcessor of old Processor API (#18154) * [PR-18136](https://github.com/apache/kafka/pull/18136) - KAFKA-18207: Serde for handling transaction records (#18136) * [PR-18243](https://github.com/apache/kafka/pull/18243) - KAFKA-13722: Refactor Kafka Streams store interfaces (#18243) * [PR-18241](https://github.com/apache/kafka/pull/18241) - KAFKA-17131: Refactor TimeDefinitions (#18241) * [PR-18228](https://github.com/apache/kafka/pull/18228) - KAFKA-18284: Add group coordinator records for Streams rebalance protocol (#18228) * [PR-18242](https://github.com/apache/kafka/pull/18242) - KAFKA-13722: Refactor SerdeGetter (#18242) * [PR-18176](https://github.com/apache/kafka/pull/18176) - KAFKA-18227: Ensure v2 partitions are not added to last transaction during upgrade (#18176) * [PR-18251](https://github.com/apache/kafka/pull/18251) - Add IT for share consumer with duration-based offset auto reset (#18251) * [PR-18230](https://github.com/apache/kafka/pull/18230) - KAFKA-18283: Add StreamsGroupDescribe RPC definitions (#18230) * [PR-18260](https://github.com/apache/kafka/pull/18260) - KAFKA-18294 Remove deprecated SourceTask#commitRecord (#18260) * [PR-18211](https://github.com/apache/kafka/pull/18211) - KAFKA-18264 Remove NotLeaderForPartitionException (#18211) * [PR-18248](https://github.com/apache/kafka/pull/18248) - KAFKA-18094 Remove deprecated TopicListing(String, Boolean) (#18248) * [PR-18227](https://github.com/apache/kafka/pull/18227) - KAFKA-18282: Add StreamsGroupHeartbeat RPC definitions (#18227) * [PR-18205](https://github.com/apache/kafka/pull/18205) - KAFKA-18026: transition KTable#filter impl to use processor wrapper (#18205) * [PR-18244](https://github.com/apache/kafka/pull/18244) - KAFKA-18293 Remove org.apache.kafka.common.security.oauthbearer.secured.OAuthBearerLoginCallbackHandler and org.apache.kafka.common.security.oauthbearer.secured.OAuthBearerValidatorCallbackHandler (#18244) * [PR-18234](https://github.com/apache/kafka/pull/18234) - KAFKA-17960; PlaintextAdminIntegrationTest.testConsumerGroups fails with CONSUMER group protocol (#18234) * [PR-18144](https://github.com/apache/kafka/pull/18144) - KAFKA-18200; Handle empty batches in coordinator runtime (#18144) * [PR-18180](https://github.com/apache/kafka/pull/18180) - KAFKA-18237: Upgrade system tests from using 3.7.1 to 3.7.2 (#18180) * [PR-18210](https://github.com/apache/kafka/pull/18210) - KAFKA-18259: Documentation for consumer auto.offset.reset contains invalid HTML (#18210) * [PR-18207](https://github.com/apache/kafka/pull/18207) - KAFKA-18263; Group lock must be acquired when reverting static membership rejoin (#18207) * [PR-18190](https://github.com/apache/kafka/pull/18190) - KAFKA-18244: Fix empty SHA on “Pull Request Labeled” workflow (#18190) * [PR-18166](https://github.com/apache/kafka/pull/18166) - KAFKA-18226: Disable CustomQuotaCallbackTest and remove isKRaftTest (#18166)

#### Kafka

* [PR-20633](https://github.com/apache/kafka/pull/20633) - KAFKA-19748: Fix metrics leak in Kafka Streams (#20633) * [PR-20618](https://github.com/apache/kafka/pull/20618) - KAFKA-19690 Add epoch check before verification guard check to prevent unexpected fatal error (#20618) * [PR-20583](https://github.com/apache/kafka/pull/20583) - [MINOR] Cleaning ignored streams test (#20583) *
[PR-20604](https://github.com/apache/kafka/pull/20604) - KAFKA-19719 --no-initial-controllers should not assume kraft.version=1 (#20604) * [PR-19961](https://github.com/apache/kafka/pull/19961) - KAFKA-19390: Call safeForceUnmap() in AbstractIndex.resize() on Linux to prevent stale mmap of index files (#19961) * [PR-20591](https://github.com/apache/kafka/pull/20591) - KAFKA-19732, KAFKA-19716: Clear out coordinator snapshots periodically while loading (#20591) * [PR-20581](https://github.com/apache/kafka/pull/20581) - KAFKA-19546: Rebalance should be triggered by subscription change during group protocol downgrade (#20581) * [PR-20519](https://github.com/apache/kafka/pull/20519) - KAFKA-19695: Fix bug in redundant offset calculation. (#20516) (#20519) * [PR-20512](https://github.com/apache/kafka/pull/20512) - KAFKA-19679: Fix NoSuchElementException in oldest open iterator metric (#20512) * [PR-20470](https://github.com/apache/kafka/pull/20470) - KAFKA-19668: processValue() must be declared as value-changing operation (#20470) * [13f70256](https://github.com/apache/kafka/commit/13f70256db3c994c590e5d262a7cc50b9e973204) - Bump version to 4.1.0 * [70dd1ca2](https://github.com/apache/kafka/commit/70dd1ca2cab81f78c68782659db1d8453b1de5d6) - Revert “Bump version to 4.1.0” * [PR-20405](https://github.com/apache/kafka/pull/20405) - KAFKA-19642 Replace dynamicPerBrokerConfigs with dynamicDefaultConfigs (#20405) * [PR-1777](https://github.com/confluentinc/kafka/pull/1777) - KSECURITY-2558: Bump jetty to version 12.0.25 in 4.1 * [PR-20070](https://github.com/apache/kafka/pull/20070) - KAFKA-19429: Deflake streams_smoke_test, again (#20070) * [PR-20398](https://github.com/apache/kafka/pull/20398) - Revert “KAFKA-13722: remove usage of old ProcessorContext (#18292)” (#20398) * [PR-1765](https://github.com/confluentinc/kafka/pull/1765) - DPA-1801 Add run_tags to worker-ami and aws-packer * [PR-1746](https://github.com/confluentinc/kafka/pull/1746) - Change ci_tools import path * [23b64404](https://github.com/apache/kafka/commit/23b64404ae7ba98d89a2d456991abaf2f32af35f) - Bump version to 4.1.0 * [6340f437](https://github.com/apache/kafka/commit/6340f437cd2d15be4180febb9505437266080002) - Revert “Bump version to 4.1.0” * [de16dd10](https://github.com/apache/kafka/commit/de16dd103af93bb68a329987ff19469941f85cbc) - KAFKA-19581: Temporary fix for Streams system tests * [PR-20269](https://github.com/apache/kafka/pull/20269) - KAFKA-19576 Fix typo in state-change log filename after rotate (#20269) * [PR-20274](https://github.com/apache/kafka/pull/20274) - KAFKA-19529: State updater sensor names should be unique (#20262) (#20274) * [PR-1708](https://github.com/confluentinc/kafka/pull/1708) - DPA-1675: In case of infra failure in ccs-kafka tag that as infra failure in testbreak * [PR-20165](https://github.com/apache/kafka/pull/20165) - KAFKA-19501 Update OpenJDK base image from buster to bullseye (#20165) * [e14d849c](https://github.com/apache/kafka/commit/e14d849cbf8836cc9e4a592342baf19a1fbd93c9) - Bump version to 4.1.0 * [PR-20200](https://github.com/apache/kafka/pull/20200) - KAFKA-19522: avoid electing fenced lastKnownLeader (#20200) * [PR-20196](https://github.com/apache/kafka/pull/20196) - KAFKA-19520 Bump Commons-Lang for CVE-2025-48924 (#20196) * [PR-20040](https://github.com/apache/kafka/pull/20040) - KAFKA-19427 Allow the coordinator to grow its buffer dynamically (#20040) * [PR-20166](https://github.com/apache/kafka/pull/20166) - KAFKA-19504: Remove unused metrics reporter initialization in
KafkaAdminClient (#20166) * [PR-20151](https://github.com/apache/kafka/pull/20151) - KAFKA-19495: Update config for native image (v4.1.0) (#20151) * [610f0765](https://github.com/apache/kafka/commit/610f076542e1ac177c4b97ea7d6ca1335f9a3065) - Bump version to 4.1.0 * [PR-1684](https://github.com/confluentinc/kafka/pull/1684) - DPA-1489 migrate from vagrant to terraform * [PR-1693](https://github.com/confluentinc/kafka/pull/1693) - Revert “Temporarily disable artifact publishing for the 4.1 branch.” * [57e81f20](https://github.com/apache/kafka/commit/57e81f201055b58f94febf0509bfc8acba632854) - Bump version to 4.1.0 * [PR-20071](https://github.com/apache/kafka/pull/20071) - KAFKA-19184: Add documentation for upgrading the kraft version (#20071) * [PR-20116](https://github.com/apache/kafka/pull/20116) - KAFKA-19444: Add back JoinGroup v0 & v1 (#20116) * [PR-19964](https://github.com/apache/kafka/pull/19964) - KAFKA-19397: Ensure consistent metadata usage in produce request and response (#19964) * [PR-19971](https://github.com/apache/kafka/pull/19971) - KAFKA-19042 Move ProducerSendWhileDeletionTest to client-integration-tests module (#19971) * [PR-20100](https://github.com/apache/kafka/pull/20100) - KAFKA-19453: Ignore group not found in share group record replay (#20100) * [PR-20025](https://github.com/apache/kafka/pull/20025) - KAFKA-19152: Add top-level documentation for OAuth flows (#20025) * [PR-20029](https://github.com/apache/kafka/pull/20029) - KAFKA-19379: Basic upgrade guide for KIP-1071 EA (#20029) * [PR-20062](https://github.com/apache/kafka/pull/20062) - KAFKA-19445: Fix coordinator runtime metrics sharing sensors (#20062) * [PR-19704](https://github.com/apache/kafka/pull/19704) - KAFKA-19246; OffsetFetch API does not return group level errors correctly with version 1 (#19704) * [PR-19985](https://github.com/apache/kafka/pull/19985) - KAFKA-19414: Remove 2PC public APIs from 4.1 until release (KIP-939) (#19985) * [PR-1672](https://github.com/confluentinc/kafka/pull/1672) - DPA-1593 exclude newly added files to fix build * [PR-1663](https://github.com/confluentinc/kafka/pull/1663) - DPA-1593 add cloudwatch metrics to view cpu, memory and disk usage * [PR-20022](https://github.com/apache/kafka/pull/20022) - KAFKA-19398: (De)Register oldest-iterator-open-since-ms metric dynamically (#20022) * [PR-20033](https://github.com/apache/kafka/pull/20033) - KAFKA-19383: Handle the deleted topics when applying ClearElrRecord (#20033) * [PR-19745](https://github.com/apache/kafka/pull/19745) - KAFKA-19294: Fix BrokerLifecycleManager RPC timeouts (#19745) * [PR-19974](https://github.com/apache/kafka/pull/19974) - KAFKA-19411: Fix deleteAcls bug which allows more deletions than max records per user op (#19974) * [PR-19972](https://github.com/apache/kafka/pull/19972) - KAFKA-19407 Fix potential IllegalStateException when appending to timeIndex (#19972) * [PR-1659](https://github.com/confluentinc/kafka/pull/1659) - Reapply “KAFKA-18296 Remove deprecated KafkaBasedLog constructor (#18 * [PR-20019](https://github.com/apache/kafka/pull/20019) - KAFKA-19429: Deflake streams_smoke_test (#20019) * [PR-19999](https://github.com/apache/kafka/pull/19999) - KAFKA-19421: Deflake streams_broker_down_resilience_test (#19999) * [PR-20004](https://github.com/apache/kafka/pull/20004) - KAFKA-19422: Deflake streams_application_upgrade_test (#20004) * [PR-20005](https://github.com/apache/kafka/pull/20005) - KAFKA-19423: Deflake streams_broker_bounce_test (#20005) * 
[PR-19983](https://github.com/apache/kafka/pull/19983) - KAFKA-19356: Prevent new consumer fetch assigned partitions not in explicit subscription (#19983) * [PR-19917](https://github.com/apache/kafka/pull/19917) - KAFKA-19297: Refactor AsyncKafkaConsumer’s use of Java Streams APIs in critical sections (#19917) * [PR-19981](https://github.com/apache/kafka/pull/19981) - KAFKA-19413: Extended AuthorizerIntegrationTest to cover StreamsGroupDescribe (#19981) * [PR-19978](https://github.com/apache/kafka/pull/19978) - KAFKA-19412: Extended AuthorizerIntegrationTest to cover StreamsGroupHeartbeat (#19978) * [PR-19976](https://github.com/apache/kafka/pull/19976) - KAFKA-19367: Follow up bug fix (#19976) * [PR-19800](https://github.com/apache/kafka/pull/19800) - KAFKA-14145; Faster KRaft HWM replication (#19800) * [PR-1655](https://github.com/confluentinc/kafka/pull/1655) - Add back deprecated constructors in KafkaBasedLog * [PR-19938](https://github.com/apache/kafka/pull/19938) - KAFKA-19153: Add OAuth integration tests (#19938) * [PR-19910](https://github.com/apache/kafka/pull/19910) - KAFKA-19367: Fix InitProducerId with TV2 double-increments epoch if ongoing transaction is aborted (#19910) * [PR-19814](https://github.com/apache/kafka/pull/19814) - KAFKA-18117; KAFKA-18729: Use assigned topic IDs to avoid full metadata requests on broker-side regex (#19814) * [PR-19904](https://github.com/apache/kafka/pull/19904) - KAFKA-18961: Time-based refresh for server-side RE2J regex (#19904) * [PR-19939](https://github.com/apache/kafka/pull/19939) - KAFKA-19359: force bump commons-beanutils for CVE-2025-48734 (#19939) * [b311ac7d](https://github.com/apache/kafka/commit/b311ac7dd5bce649fd5bd83a948f95c8c468a9aa) - Temporarily disable artifact publishing for the 4.1 branch. * [PR-19607](https://github.com/apache/kafka/pull/19607) - KAFKA-19221 Propagate IOException on LogSegment#close (#19607) * [PR-19928](https://github.com/apache/kafka/pull/19928) - KAFKA-19389: Fix memory consumption for completed share fetch requests (#19928) * [PR-19895](https://github.com/apache/kafka/pull/19895) - KAFKA-19244: Add support for kafka-streams-groups.sh options (delete offsets) [4/N] (#19895) * [PR-19908](https://github.com/apache/kafka/pull/19908) - KAFKA-19376: Throw an error message if any unsupported feature is used with KIP-1071 (#19908) * [PR-19936](https://github.com/apache/kafka/pull/19936) - KAFKA-19392 Fix metadata.log.segment.ms not being applied (#19936) * [PR-19919](https://github.com/apache/kafka/pull/19919) - KAFKA-19382:Upgrade junit from 5.10 to 5.13 (#19919) * [PR-19929](https://github.com/apache/kafka/pull/19929) - KAFKA-18486 Remove becomeLeaderOrFollower from readFromLogWithOffsetOutOfRange and other related methods. 
(#19929) * [PR-19931](https://github.com/apache/kafka/pull/19931) - KAFKA-19283: Update transaction exception handling documentation (#19931) * [PR-19832](https://github.com/apache/kafka/pull/19832) - KAFKA-19271: allow intercepting internal method call (#19832) * [PR-19918](https://github.com/apache/kafka/pull/19918) - KAFKA-19386: Correcting ExpirationReaper thread names from Purgatory (#19918) * [PR-19817](https://github.com/apache/kafka/pull/19817) - KAFKA-19334 MetadataShell execution unintentionally deletes lock file (#19817) * [PR-19922](https://github.com/apache/kafka/pull/19922) - KAFKA-18486 Remove ReplicaManager#becomeLeaderOrFollower from testReplicaAlterLogDirs (#19922) * [PR-19827](https://github.com/apache/kafka/pull/19827) - KAFKA-19042 Move PlaintextConsumerSubscriptionTest to client-integration-tests module (#19827) * [PR-19883](https://github.com/apache/kafka/pull/19883) - KAFKA-18486 Update testExceptionWhenUnverifiedTransactionHasMultipleProducerIds (#19883) * [PR-19890](https://github.com/apache/kafka/pull/19890) - KAFKA-18486 Update activeProducerState with KRaft mechanism in ReplicaManagerTest (#19890) * [PR-19879](https://github.com/apache/kafka/pull/19879) - KAFKA-14895 [1/N] Move AddPartitionsToTxnManager files to java (#19879) * [PR-19915](https://github.com/apache/kafka/pull/19915) - KAFKA-19295: Remove AsyncKafkaConsumer event ID generation (#19915) * [PR-19902](https://github.com/apache/kafka/pull/19902) - KAFKA-18202: Add rejection for non-zero sequences in TV2 (KIP-890) (#19902) * [PR-15913](https://github.com/apache/kafka/pull/15913) - KAFKA-19309: Add transaction client template code in kafka examples (#15913) * [PR-19900](https://github.com/apache/kafka/pull/19900) - KAFKA-19369: Add group.share.assignors config and integration test (#19900) * [PR-19815](https://github.com/apache/kafka/pull/19815) - KAFKA-19290: Exploit mapKey optimisation in protocol requests and responses (wip) (#19815) * [PR-19889](https://github.com/apache/kafka/pull/19889) - KAFKA-18913: Start state updater in task manager (#19889) * [PR-19907](https://github.com/apache/kafka/pull/19907) - KAFKA-19370: Create JMH benchmark for share group assignor (#19907) * [PR-19773](https://github.com/apache/kafka/pull/19773) - KAFKA-19042 Move PlaintextConsumerAssignTest to clients-integration-tests module (#19773) * [PR-18739](https://github.com/apache/kafka/pull/18739) - KAFKA-16505: Add source raw key and value (#18739) * [PR-19903](https://github.com/apache/kafka/pull/19903) - KAFKA-19373 Fix protocol name comparison (#19903) * [PR-18325](https://github.com/apache/kafka/pull/18325) - KAFKA-19248: Multiversioning in Kafka Connect - Plugin Loading Isolation Tests (#18325) * [PR-19844](https://github.com/apache/kafka/pull/19844) - KAFKA-18042: Reject the produce request with lower producer epoch early (KIP-890) (#19844) * [PR-19901](https://github.com/apache/kafka/pull/19901) - KAFKA-19372: StreamsGroup not subscribed to a topic when empty (#19901) * [PR-19722](https://github.com/apache/kafka/pull/19722) - KAFKA-19044: Handle tasks that are not present in the current topology (#19722) * [PR-19856](https://github.com/apache/kafka/pull/19856) - KAFKA-17747: [7/N] Add consumer group integration test for rack aware assignment (#19856) * [PR-19898](https://github.com/apache/kafka/pull/19898) - KAFKA-19347 Deduplicate ACLs when creating (#19898) * [PR-19872](https://github.com/apache/kafka/pull/19872) - KAFKA-19328: SharePartitionManagerTest testMultipleConcurrentShareFetches doAnswer chaining needs
verification (#19872) * [PR-19758](https://github.com/apache/kafka/pull/19758) - KAFKA-19244: Add support for kafka-streams-groups.sh options (delete all groups) [2/N] (#19758) * [PR-19754](https://github.com/apache/kafka/pull/19754) - KAFKA-18573: Add support for OAuth jwt-bearer grant type (#19754) * [PR-19656](https://github.com/apache/kafka/pull/19656) - KAFKA-19250 : txnProducer.abortTransaction() API should not return abortable exception (#19656) * [PR-19522](https://github.com/apache/kafka/pull/19522) - KAFKA-19176: Update Transactional producer to translate retriable into abortable exceptions (#19522) * [PR-19796](https://github.com/apache/kafka/pull/19796) - KAFKA-17747: [6/N] Replace subscription metadata with metadata hash in share group (#19796) * [PR-19802](https://github.com/apache/kafka/pull/19802) - KAFKA-17747: [5/N] Replace subscription metadata with metadata hash in stream group (#19802) * [PR-19861](https://github.com/apache/kafka/pull/19861) - KAFKA-19338: Error on read/write of uninitialized share part. (#19861) * [PR-19878](https://github.com/apache/kafka/pull/19878) - KAFKA-19358: Updated share_consumer_test.py tests to use set_group_offset_reset_strategy (#19878) * [PR-19849](https://github.com/apache/kafka/pull/19849) - KAFKA-19349 Move CreateTopicsRequestWithPolicyTest to clients-integration-tests (#19849) * [PR-19877](https://github.com/apache/kafka/pull/19877) - KAFKA-18904: [4/N] Add ListClientMetricsResources metric if request is v0 ListConfigResources (#19877) * [PR-19811](https://github.com/apache/kafka/pull/19811) - KAFKA-19320: Added share_consume_bench_test.py system tests (#19811) * [PR-19831](https://github.com/apache/kafka/pull/19831) - KAFKA-16894: share.version becomes stable feature for preview (#19831) * [PR-19836](https://github.com/apache/kafka/pull/19836) - KAFKA-19321: Added share_consumer_performance.py and related system tests (#19836) * [PR-19866](https://github.com/apache/kafka/pull/19866) - KAFKA-19355 Remove interBrokerListenerName from ClusterControlManager (#19866) * [PR-19327](https://github.com/apache/kafka/pull/19327) - KAFKA-19053 Remove FetchResponse#of which is not used in production … (#19327) * [PR-19728](https://github.com/apache/kafka/pull/19728) - KAFKA-19284 Add documentation to clarify the behavior of null values for all partitionsToOffsetAndMetadata methods. 
(#19728) * [PR-19864](https://github.com/apache/kafka/pull/19864) - KAFKA-19311 Document commitAsync behavioral differences between Classic and Async Consumer (#19864) * [PR-19685](https://github.com/apache/kafka/pull/19685) - KAFKA-19042 Move GroupAuthorizerIntegrationTest to clients-integration-tests module (#19685) * [PR-19651](https://github.com/apache/kafka/pull/19651) - KAFKA-19042 Move BaseConsumerTest, SaslPlainPlaintextConsumerTest to client-integration-tests module (#19651) * [PR-19846](https://github.com/apache/kafka/pull/19846) - KAFKA-19346: Move LogReadResult to server module (#19846) * [PR-19855](https://github.com/apache/kafka/pull/19855) - KAFKA-19351: AsyncConsumer#commitAsync should copy the input offsets (#19855) * [PR-19810](https://github.com/apache/kafka/pull/19810) - KAFKA-19042 move ConsumerWithLegacyMessageFormatIntegrationTest to clients-integration-tests module (#19810) * [PR-19808](https://github.com/apache/kafka/pull/19808) - KAFKA-18904: kafka-configs.sh return resource doesn’t exist message [3/N] (#19808) * [PR-19714](https://github.com/apache/kafka/pull/19714) - KAFKA-19082:[4/4] Complete Txn Client Side Changes (KIP-939) (#19714) * [PR-19404](https://github.com/apache/kafka/pull/19404) - KAFKA-6629: parameterise SegmentedCacheFunctionTest for session key schemas (#19404) * [PR-19843](https://github.com/apache/kafka/pull/19843) - KAFKA-19337: Write state writes snapshot for higher state epoch. (#19843) * [PR-19741](https://github.com/apache/kafka/pull/19741) - KAFKA-19056 Rewrite EndToEndClusterIdTest in Java and move it to the server module (#19741) * [PR-19774](https://github.com/apache/kafka/pull/19774) - KAFKA-19316: added share_group_command_test.py system tests (#19774) * [PR-19838](https://github.com/apache/kafka/pull/19838) - KAFKA-19344: Replace desc.assignablePartitions with spec.isPartitionAssignable. 
(#19838) * [PR-19840](https://github.com/apache/kafka/pull/19840) - KAFKA-19347 Don’t update timeline data structures in createAcls (#19840) * [PR-19826](https://github.com/apache/kafka/pull/19826) - KAFKA-19342: Authorization tests for alter share-group offsets (#19826) * [PR-19818](https://github.com/apache/kafka/pull/19818) - KAFKA-19335: Membership managers send negative epoch in JOINING (#19818) * [PR-19778](https://github.com/apache/kafka/pull/19778) - KAFKA-19285: Added more tests in SharePartitionManagerTest (#19778) * [PR-19823](https://github.com/apache/kafka/pull/19823) - KAFKA-19310: (MINOR) Missing mocks for DelayedShareFetchTest tests related to Memory Records slicing (#19823) * [PR-19761](https://github.com/apache/kafka/pull/19761) - KAFKA-17747: [4/N] Replace subscription metadata with metadata hash in consumer group (#19761) * [PR-19835](https://github.com/apache/kafka/pull/19835) - KAFKA-19336 Upgrade Jackson to 2.19.0 (#19835) * [PR-19744](https://github.com/apache/kafka/pull/19744) - KAFKA-19154; Offset Fetch API should return INVALID_OFFSET if requested topic id does not match persisted one (#19744) * [PR-19812](https://github.com/apache/kafka/pull/19812) - KAFKA-19330 Change MockSerializer/Deserializer to use String serializer instead of byte[] (#19812) * [PR-19790](https://github.com/apache/kafka/pull/19790) - KAFKA-18687: Setting the subscriptionMetadata during conversion to consumer group (#19790) * [PR-19786](https://github.com/apache/kafka/pull/19786) - KAFKA-19268 Missing mocks for SharePartitionManagerTest tests (#19786) * [PR-19798](https://github.com/apache/kafka/pull/19798) - KAFKA-19322 Remove the DelayedOperation constructor that accepts an external lock (#19798) * [PR-19779](https://github.com/apache/kafka/pull/19779) - KAFKA-19300 AsyncConsumer#unsubscribe always timeout due to GroupAuthorizationException (#19779) * [PR-19093](https://github.com/apache/kafka/pull/19093) - KAFKA-18424: Consider splitting PlaintextAdminIntegrationTest#testConsumerGroups (#19093) * [PR-19371](https://github.com/apache/kafka/pull/19371) - KAFKA-19080 The constraint on segment.ms is not enforced at topic level (#19371) * [PR-19681](https://github.com/apache/kafka/pull/19681) - KAFKA-19034 [1/N] Rewrite RemoteTopicCrudTest by ClusterTest and move it to storage module (#19681) * [PR-19759](https://github.com/apache/kafka/pull/19759) - KAFKA-19312 Avoiding concurrent execution of onComplete and tryComplete (#19759) * [PR-19767](https://github.com/apache/kafka/pull/19767) - KAFKA-19313 Replace LogOffsetMetadata#UNIFIED_LOG_UNKNOWN_OFFSET by UnifiedLog.UNKNOWN_OFFSET (#19767) * [PR-19747](https://github.com/apache/kafka/pull/19747) - KAFKA-18345; Wait the entire election timeout on election loss (#19747) * [PR-19687](https://github.com/apache/kafka/pull/19687) - KAFKA-19260 Move LoggingController to server module (#19687) * [PR-18929](https://github.com/apache/kafka/pull/18929) - KAFKA-16717 [2/N]: Add AdminClient.alterShareGroupOffsets (#18929) * [PR-19729](https://github.com/apache/kafka/pull/19729) - KAFKA-19069 DumpLogSegments does not dump the LEADER_CHANGE record (#19729) * [PR-19781](https://github.com/apache/kafka/pull/19781) - KAFKA-19204: Add timestamp to share state metadata init maps [1/N] (#19781) * [PR-19582](https://github.com/apache/kafka/pull/19582) - KAFKA-19042 Move PlaintextConsumerPollTest to client-integration-tests module (#19582) * [PR-19763](https://github.com/apache/kafka/pull/19763) - KAFKA-19314 Remove unnecessary code of closing snapshotWriter (#19763) 
* [PR-19743](https://github.com/apache/kafka/pull/19743) - KAFKA-18904: Add Admin#listConfigResources [2/N] (#19743) * [PR-19757](https://github.com/apache/kafka/pull/19757) - KAFKA-19291: Increase the timeout of remote storage share fetch requests in purgatory (#19757) * [PR-18951](https://github.com/apache/kafka/pull/18951) - KAFKA-4650: Add unit tests for GraphNode class (#18951) * [PR-19749](https://github.com/apache/kafka/pull/19749) - KAFKA-19287 document all group coordinator metrics (#19749) * [PR-19731](https://github.com/apache/kafka/pull/19731) - KAFKA-18783 : Extend InvalidConfigurationException related exceptions (#19731) * [PR-19658](https://github.com/apache/kafka/pull/19658) - KAFKA-18345; Prevent livelocked elections (#19658) * [PR-1627](https://github.com/confluentinc/kafka/pull/1627) - Trigger cp-jar-build to verify CP packaging in after_pipeline job * [PR-19755](https://github.com/apache/kafka/pull/19755) - KAFKA-19302 Move ReplicaState and Replica to server module (#19755) * [PR-19389](https://github.com/apache/kafka/pull/19389) - KAFKA-19042 Move PlaintextConsumerCommitTest to client-integration-tests module (#19389) * [PR-19611](https://github.com/apache/kafka/pull/19611) - KAFKA-17747: [3/N] Get rid of TopicMetadata in SubscribedTopicDescriberImpl (#19611) * [PR-19700](https://github.com/apache/kafka/pull/19700) - KAFKA-19202: Enable KIP-1071 in streams_eos_test (#19700) * [PR-19717](https://github.com/apache/kafka/pull/19717) - KAFKA-19280: Fix NoSuchElementException in UnifiedLog (#19717) * [PR-19691](https://github.com/apache/kafka/pull/19691) - KAFKA-19256: Only send IQ metadata on assignment changes (#19691) * [PR-19708](https://github.com/apache/kafka/pull/19708) - KAFKA-19226: Added test_console_share_consumer.py (#19708) * [PR-19683](https://github.com/apache/kafka/pull/19683) - KAFKA-19141; Persist topic id in OffsetCommit record (#19683) * [PR-19697](https://github.com/apache/kafka/pull/19697) - KAFKA-19271: Add internal ConsumerWrapper (#19697) * [PR-1625](https://github.com/confluentinc/kafka/pull/1625) - Increase timeout for Connect tests * [PR-19734](https://github.com/apache/kafka/pull/19734) - KAFKA-19217: Fix ShareConsumerTest.testComplexConsumer flakiness. (#19734) * [PR-19507](https://github.com/apache/kafka/pull/19507) - KAFKA-19171: Kafka Streams crashes with UnsupportedOperationException (#19507) * [PR-19709](https://github.com/apache/kafka/pull/19709) - KAFKA-19267 the min version used by ListOffsetsRequest should be 1 rather than 0 (#19709) * [PR-19580](https://github.com/apache/kafka/pull/19580) - KAFKA-19208: KStream-GlobalKTable join should not drop left-null-key record (#19580) * [PR-19493](https://github.com/apache/kafka/pull/19493) - KAFKA-18904: [1/N] Change ListClientMetricsResources API to ListConfigResources (#19493) * [PR-19713](https://github.com/apache/kafka/pull/19713) - KAFKA-19274; Group Coordinator Shards are not unloaded when \_\_consumer_offsets topic is deleted (#19713) * [PR-19701](https://github.com/apache/kafka/pull/19701) - KAFKA-19231-1: Handle fetch request when share session cache is full (#19701) * [PR-19721](https://github.com/apache/kafka/pull/19721) - KAFKA-19281: Add share enable flag to periodic jobs. (#19721) * [PR-19523](https://github.com/apache/kafka/pull/19523) - KAFKA-17747: [2/N] Add compute topic and group hash (#19523) * [PR-19698](https://github.com/apache/kafka/pull/19698) - KAFKA-19269 Unexpected error .. 
should not happen when the delete.topic.enable is false (#19698) * [PR-19718](https://github.com/apache/kafka/pull/19718) - KAFKA-19270: Remove Optional from ClusterInstance#controllerListenerName() return type (#19718) * [PR-19539](https://github.com/apache/kafka/pull/19539) - KAFKA-19082:[3/4] Add prepare txn method (KIP-939) (#19539) * [PR-19586](https://github.com/apache/kafka/pull/19586) - KAFKA-18666: Controller-side monitoring for broker shutdown and startup (#19586) * [PR-19635](https://github.com/apache/kafka/pull/19635) - KAFKA-19234: broker should return UNAUTHORIZATION error for non-existing topic in produce request (#19635) * [PR-19702](https://github.com/apache/kafka/pull/19702) - KAFKA-19273 Ensure the delete policy is configured when the tiered storage is enabled (#19702) * [PR-19553](https://github.com/apache/kafka/pull/19553) - KAFKA-19091 Fix race condition in DelayedFutureTest (#19553) * [PR-19666](https://github.com/apache/kafka/pull/19666) - KAFKA-19116, KAFKA-19258: Handling share group member change events (#19666) * [PR-19569](https://github.com/apache/kafka/pull/19569) - KAFKA-19206 ConsumerNetworkThread.cleanup() throws NullPointerException if initializeResources() previously failed (#19569) * [PR-19712](https://github.com/apache/kafka/pull/19712) - KAFKA-19275 client-state and thread-state metrics are always “Unavailable” (#19712) * [PR-19630](https://github.com/apache/kafka/pull/19630) - KAFKA-19145 Move LeaderEndPoint to Server module (#19630) * [PR-19622](https://github.com/apache/kafka/pull/19622) - KAFKA-18847: Refactor OAuth layer to improve reusability 1/N (#19622) * [PR-19677](https://github.com/apache/kafka/pull/19677) - KAFKA-18688: Fix uniform homogeneous assignor stability (#19677) * [PR-19659](https://github.com/apache/kafka/pull/19659) - KAFKA-19253: Improve metadata handling for share version using feature listeners (1/N) (#19659) * [PR-19559](https://github.com/apache/kafka/pull/19559) - KAFKA-19201: Handle deletion of user topics part of share partitions. (#19559) * [PR-19515](https://github.com/apache/kafka/pull/19515) - KAFKA-14691; Add TopicId to OffsetFetch API (#19515) * [PR-19705](https://github.com/apache/kafka/pull/19705) - KAFKA-19245: Updated default locks config for share group (#19705) * [PR-19496](https://github.com/apache/kafka/pull/19496) - KAFKA-19163: Avoid deleting groups with pending transactional offsets (#19496) * [PR-1554](https://github.com/confluentinc/kafka/pull/1554) - Chore: update repo by service bot * [PR-19644](https://github.com/apache/kafka/pull/19644) - KAFKA-18905; Disable idempotent producer to remove test flakiness (#19644) * [PR-19631](https://github.com/apache/kafka/pull/19631) - KAFKA-19242: Fix commit bugs caused by race condition during rebalancing. 
(#19631) * [PR-19497](https://github.com/apache/kafka/pull/19497) - KAFKA-19160;KAFKA-19164; Improve performance of fetching stable offsets (#19497) * [PR-19633](https://github.com/apache/kafka/pull/19633) - KAFKA-18695 Remove quorum=kraft and kip932 from all integration tests (#19633) * [PR-19673](https://github.com/apache/kafka/pull/19673) - KAFKA-19264 Remove fallback for thread pool sizes in RemoteLogManagerConfig (#19673) * [PR-19346](https://github.com/apache/kafka/pull/19346) - KAFKA-19068 Eliminate the duplicate type check in creating ControlRecord (#19346) * [PR-19543](https://github.com/apache/kafka/pull/19543) - KAFKA-19109 Don’t print null in kafka-metadata-quorum describe status (#19543) * [PR-19650](https://github.com/apache/kafka/pull/19650) - KAFKA-19220 Add tests to ensure the internal configs don’t return by public APIs by default (#19650) * [PR-1623](https://github.com/confluentinc/kafka/pull/1623) - KBROKER-295: Ignore failing quota_test * [PR-1622](https://github.com/confluentinc/kafka/pull/1622) - KBROKER-295: Ignore failing quota_test * [PR-19508](https://github.com/apache/kafka/pull/19508) - KAFKA-17897: Deprecate Admin.listConsumerGroups [2/N] (#19508) * [PR-19657](https://github.com/apache/kafka/pull/19657) - KAFKA-19209: Clarify index.interval.bytes impact on offset and time index (#19657) * [PR-18391](https://github.com/apache/kafka/pull/18391) - KAFKA-18115; Fix for loading big files while performing load tests (#18391) * [PR-19608](https://github.com/apache/kafka/pull/19608) - KAFKA-19182 Move SchedulerTest to server module (#19608) * [PR-19568](https://github.com/apache/kafka/pull/19568) - KAFKA-19087 Move TransactionState to transaction-coordinator module (#19568) * [PR-19581](https://github.com/apache/kafka/pull/19581) - KAFKA-18855 Slice API for MemoryRecords (#19581) * [PR-19590](https://github.com/apache/kafka/pull/19590) - KAFKA-19212: Correct the unclean leader election metric calculation (#19590) * [PR-19609](https://github.com/apache/kafka/pull/19609) - KAFKA-19214: Clean up use of Optionals in RequestManagers.entries() (#19609) * [PR-19640](https://github.com/apache/kafka/pull/19640) - KAFKA-19241: Updated tests in ShareFetchAcknowledgeRequestTest to reuse the socket for subsequent requests (#19640) * [PR-19598](https://github.com/apache/kafka/pull/19598) - KAFKA-19215: Handle share partition fetch lock cleanly using tokens (#19598) * [PR-19625](https://github.com/apache/kafka/pull/19625) - KAFKA-19202: Enable KIP-1071 in streams_standby_replica_test.py (#19625) * [PR-19602](https://github.com/apache/kafka/pull/19602) - KAFKA-19218: Add missing leader epoch to share group state summary response (#19602) * [PR-19574](https://github.com/apache/kafka/pull/19574) - KAFKA-19207 Move ForwardingManagerMetrics and ForwardingManagerMetricsTest to server module (#19574) * [PR-19528](https://github.com/apache/kafka/pull/19528) - KAFKA-19170 Move MetricsDuringTopicCreationDeletionTest to client-integration-tests module (#19528) * [PR-19612](https://github.com/apache/kafka/pull/19612) - KAFKA-19227: Piggybacked share fetch acknowledgements performance issue (#19612) * [PR-19639](https://github.com/apache/kafka/pull/19639) - KAFKA-19216: Eliminate flakiness in kafka.server.share.SharePartitionTest (#19639) * [PR-19592](https://github.com/apache/kafka/pull/19592) - KAFKA-19133: Support fetching for multiple remote fetch topic partitions in a single share fetch request (#19592) * [PR-19641](https://github.com/apache/kafka/pull/19641) - KAFKA-19240 Move 
MetadataVersionIntegrationTest to clients-integration-tests module (#19641) * [PR-19619](https://github.com/apache/kafka/pull/19619) - KAFKA-19232: Handle Share session limit reached exception in clients. (#19619) * [PR-19629](https://github.com/apache/kafka/pull/19629) - KAFKA-19131: Adjust remote storage reader thread maximum pool size to avoid illegal argument (#19629) * [PR-19393](https://github.com/apache/kafka/pull/19393) - KAFKA-19060 Documented null edge cases in the Clients API JavaDoc (#19393) * [PR-19578](https://github.com/apache/kafka/pull/19578) - KAFKA-19205: inconsistent result of beginningOffsets/endoffset between classic and async consumer with 0 timeout (#19578) * [PR-19571](https://github.com/apache/kafka/pull/19571) - KAFKA-18267 Add unit tests for CloseOptions (#19571) * [PR-19603](https://github.com/apache/kafka/pull/19603) - KAFKA-19204: Allow persister retry of initializing topics. (#19603) * [PR-1621](https://github.com/confluentinc/kafka/pull/1621) - Dexcom fix master * [PR-1620](https://github.com/confluentinc/kafka/pull/1620) - Dexcom fix 4.0 * [PR-19475](https://github.com/apache/kafka/pull/19475) - KAFKA-19146 Merge OffsetAndEpoch from raft to server-common (#19475) * [PR-19606](https://github.com/apache/kafka/pull/19606) - KAFKA-16894 Correct definition of ShareVersion (#19606) * [PR-19355](https://github.com/apache/kafka/pull/19355) - KAFKA-19073 add transactional ID pattern filter to ListTransactions (#19355) * [PR-19430](https://github.com/apache/kafka/pull/19430) - KAFKA-17541:[1/2] Improve handling of delivery count (#19430) * [PR-19329](https://github.com/apache/kafka/pull/19329) - KAFKA-19015: Remove share session from cache on share consumer connection drop (#19329) * [PR-19540](https://github.com/apache/kafka/pull/19540) - KAFKA-19169: Enhance AuthorizerIntegrationTest for share group APIs (#19540) * [PR-19604](https://github.com/apache/kafka/pull/19604) - KAFKA-19202: Enable KIP-1071 in streams_relational_smoke_test (#19604) * [PR-19587](https://github.com/apache/kafka/pull/19587) - KAFKA-16718-4/n: ShareGroupCommand changes for DeleteShareGroupOffsets admin call (#19587) * [PR-19601](https://github.com/apache/kafka/pull/19601) - KAFKA-19210: resolved the flakiness in testShareGroupHeartbeatInitializeOnPartitionUpdate (#19601) * [PR-19542](https://github.com/apache/kafka/pull/19542) - KAFKA-16894: Exploit share feature [3/N] (#19542) * [PR-19594](https://github.com/apache/kafka/pull/19594) - KAFKA-19202: Enable KIP-1071 in streams_broker_down_resilience_test (#19594) * [PR-19509](https://github.com/apache/kafka/pull/19509) - KAFKA-19173: Add Feature for “streams” group (#19509) * [PR-19191](https://github.com/apache/kafka/pull/19191) - KAFKA-18760: Deprecate Optional and return String from public Endpoint#listener (#19191) * [PR-19519](https://github.com/apache/kafka/pull/19519) - KAFKA-19139 Plugin#wrapInstance should use LinkedHashMap instead of Map (#19519) * [PR-19588](https://github.com/apache/kafka/pull/19588) - KAFKA-19135 Migrate initial IQ support for KIP-1071 from feature branch to trunk (#19588) * [PR-15968](https://github.com/apache/kafka/pull/15968) - KAFKA-10551: Add topic id support to produce request and response (#15968) * [PR-19470](https://github.com/apache/kafka/pull/19470) - KAFKA-19082: [2/4] Add preparedTxnState class to Kafka Producer (KIP-939) (#19470) * [PR-19584](https://github.com/apache/kafka/pull/19584) - KAFKA-19202: Enable KIP-1071 in streams_broker_bounce_test.py (#19584) * 
[PR-19593](https://github.com/apache/kafka/pull/19593) - KAFKA-19181-2: Increased offsets.commit.timeout.ms value as a temporary solution for the system test test_broker_failure failure (#19593) * [PR-19560](https://github.com/apache/kafka/pull/19560) - KAFKA-19202: Enable KIP-1071 in streams_smoke_test.py (#19560) * [PR-19555](https://github.com/apache/kafka/pull/19555) - KAFKA-19195: Only send the right group ID subset to each GC shard (#19555) * [PR-19535](https://github.com/apache/kafka/pull/19535) - KAFKA-19183 Replace Pool with ConcurrentHashMap (#19535) * [PR-19529](https://github.com/apache/kafka/pull/19529) - KAFKA-19178 Replace Vector by ArrayList for PluginClassLoader#getResources (#19529) * [PR-19478](https://github.com/apache/kafka/pull/19478) - KAFKA-16718-3/n: Added the ShareGroupStatePartitionMetadata record during deletion of share group offsets (#19478) * [PR-19520](https://github.com/apache/kafka/pull/19520) - KAFKA-19042 Move PlaintextConsumerFetchTest to client-integration-tests module (#19520) * [PR-19532](https://github.com/apache/kafka/pull/19532) - KAFKA-19131: Adjust remote storage reader thread maximum pool size to avoid illegal argument (#19532) * [PR-19504](https://github.com/apache/kafka/pull/19504) - KAFKA-17747: [1/N] Add MetadataHash field to Consumer/Share/StreamGroupMetadataValue (#19504) * [PR-19544](https://github.com/apache/kafka/pull/19544) - KAFKA-19190: Handle shutdown application correctly (#19544) * [PR-19552](https://github.com/apache/kafka/pull/19552) - KAFKA-19198: Resolve NPE when topic assigned in share group is deleted (#19552) * [PR-19548](https://github.com/apache/kafka/pull/19548) - KAFKA-19195: Only send the right group ID subset to each GC shard (#19548) * [PR-19450](https://github.com/apache/kafka/pull/19450) - KAFKA-19128: Kafka Streams should not get offsets when close dirty (#19450) * [PR-19545](https://github.com/apache/kafka/pull/19545) - KAFKA-19192; Old bootstrap checkpoint files cause problems updated servers (#19545) * [PR-17988](https://github.com/apache/kafka/pull/17988) - KAFKA-18988: Connect Multiversion Support (Updates to status and metrics) (#17988) * [PR-19429](https://github.com/apache/kafka/pull/19429) - KAFKA-19082: [1/4] Add client config for enable2PC and overloaded initProducerId (KIP-939) (#19429) * [PR-19536](https://github.com/apache/kafka/pull/19536) - KAFKA-18889: Make records in ShareFetchResponse non-nullable (#19536) * [PR-19457](https://github.com/apache/kafka/pull/19457) - KAFKA-19110: Add missing unit test for Streams-consumer integration (#19457) * [PR-19440](https://github.com/apache/kafka/pull/19440) - KAFKA-15767 Refactor TransactionManager to avoid use of ThreadLocal (#19440) * [PR-19453](https://github.com/apache/kafka/pull/19453) - KAFKA-19124: Follow up on code improvements (#19453) * [PR-19443](https://github.com/apache/kafka/pull/19443) - KAFKA-18170: Add scheduled job to snapshot cold share partitions. 
(#19443) * [PR-19505](https://github.com/apache/kafka/pull/19505) - KAFKA-19156: Streamlined share group configs, with usage in ShareSessionCache (#19505) * [PR-19541](https://github.com/apache/kafka/pull/19541) - KAFKA-19181: removed assertions in test_share_multiple_partitions as a result of change in assignor algorithm (#19541) * [PR-19461](https://github.com/apache/kafka/pull/19461) - KAFKA-14690; Add TopicId to OffsetCommit API (#19461) * [PR-19416](https://github.com/apache/kafka/pull/19416) - KAFKA-16538; Enable upgrading kraft version for existing clusters (#19416) * [PR-18673](https://github.com/apache/kafka/pull/18673) - KAFKA-18572: Update Kafka Streams metric documentation (#18673) * [PR-19500](https://github.com/apache/kafka/pull/19500) - KAFKA-19159: Removed time based evictions for share sessions (#19500) * [PR-19378](https://github.com/apache/kafka/pull/19378) - KAFKA-19057: Stabilize KIP-932 RPCs for AK 4.1 (#19378) * [PR-19518](https://github.com/apache/kafka/pull/19518) - KAFKA-19166: Fix RC tag in release script (#19518) * [PR-19525](https://github.com/apache/kafka/pull/19525) - KAFKA-19179: remove the dot from thread_dump_url (#19525) * [PR-19437](https://github.com/apache/kafka/pull/19437) - KAFKA-19019: Add support for remote storage fetch for share groups (#19437) * [PR-17099](https://github.com/apache/kafka/pull/17099) - KAFKA-8830 make Record Headers available in onAcknowledgement (#17099) * [PR-19526](https://github.com/apache/kafka/pull/19526) - KAFKA-19180 Fix the hanging testPendingTaskSize (#19526) * [PR-19302](https://github.com/apache/kafka/pull/19302) - KAFKA-14487: Move LogManager static methods/fields to storage module (#19302) * [PR-19487](https://github.com/apache/kafka/pull/19487) - KAFKA-18854 remove DynamicConfig inner class (#19487) * [PR-19286](https://github.com/apache/kafka/pull/19286) - KAFKA-18891: Add KIP-877 support to RemoteLogMetadataManager and RemoteStorageManager (#19286) * [PR-19462](https://github.com/apache/kafka/pull/19462) - KAFKA-17184: Fix the error thrown while accessing the RemoteIndexCache (#19462) * [PR-19477](https://github.com/apache/kafka/pull/19477) - KAFKA-17897 Deprecate Admin.listConsumerGroups (#19477) * [PR-18926](https://github.com/apache/kafka/pull/18926) - KAFKA-18332 fix ClassDataAbstractionCoupling problem in KafkaRaftClientTest(1/2) (#18926) * [PR-19465](https://github.com/apache/kafka/pull/19465) - KAFKA-19136 Move metadata-related configs from KRaftConfigs to MetadataLogConfig (#19465) * [PR-19503](https://github.com/apache/kafka/pull/19503) - KAFKA-19157: added group.share.max.share.sessions config (#19503) * [PR-1614](https://github.com/confluentinc/kafka/pull/1614) - CONFLUENT: Fix tools-log4j files in the scripts * [PR-1613](https://github.com/confluentinc/kafka/pull/1613) - CONFLUENT: Fix tools-log4j files names in the scripts * [PR-19474](https://github.com/apache/kafka/pull/19474) - KAFKA-14523: Move kafka.log.remote classes to storage (#19474) * [PR-19491](https://github.com/apache/kafka/pull/19491) - KAFKA-19162: Topology metadata contains non-deterministically ordered topic configs (#19491) * [PR-19394](https://github.com/apache/kafka/pull/19394) - KAFKA-19054: StreamThread exception handling with SHUTDOWN_APPLICATION may trigger a tight loop with MANY logs (#19394) * [PR-19454](https://github.com/apache/kafka/pull/19454) - KAFKA-19130: Do not add fenced brokers to BrokerRegistrationTracker on startup (#19454) * [PR-19460](https://github.com/apache/kafka/pull/19460) - KAFKA-19002 Rewrite 
ListOffsetsIntegrationTest and move it to clients-integration-test (#19460) * [PR-19492](https://github.com/apache/kafka/pull/19492) - KAFKA-19158: Add SHARE_SESSION_LIMIT_REACHED error code (#19492) * [PR-19488](https://github.com/apache/kafka/pull/19488) - KAFKA-19147: Start authorizer before group coordinator to ensure coordinator authorizes regex topics (#19488) * [PR-19298](https://github.com/apache/kafka/pull/19298) - KAFKA-19042 Move PlaintextConsumerCallbackTest to client-integration-tests module (#19298) * [PR-19472](https://github.com/apache/kafka/pull/19472) - KAFKA-13610: Deprecate log.cleaner.enable configuration (#19472) * [PR-19050](https://github.com/apache/kafka/pull/19050) - KAFKA-18888: Add KIP-877 support to Authorizer (#19050) * [PR-19420](https://github.com/apache/kafka/pull/19420) - KAFKA-18983 Ensure all README.md(s) are mentioned by the root README.md (#19420) * [PR-19433](https://github.com/apache/kafka/pull/19433) - KAFKA-18288: Add support kafka-streams-groups.sh --describe (#19433) * [PR-19464](https://github.com/apache/kafka/pull/19464) - KAFKA-19137 Use StandardCharsets.UTF_8 instead of StandardCharsets.UTF_8.name() (#19464) * [PR-19364](https://github.com/apache/kafka/pull/19364) - KAFKA-15370: ACL changes to support 2PC (KIP-939) (#19364) * [PR-19417](https://github.com/apache/kafka/pull/19417) - KAFKA-18900: Implement share.acknowledgement.mode to choose acknowledgement mode (#19417) * [PR-19463](https://github.com/apache/kafka/pull/19463) - KAFKA-18629: Account for existing deleting topics in share group delete. (#19463) * [PR-19319](https://github.com/apache/kafka/pull/19319) - KAFKA-19042 Move ProducerCompressionTest, ProducerFailureHandlingTest, and ProducerIdExpirationTest to client-integration-tests module (#19319) * [PR-19391](https://github.com/apache/kafka/pull/19391) - KAFKA-14523: Decouple RemoteLogManager and Partition (#19391) * [PR-19469](https://github.com/apache/kafka/pull/19469) - KAFKA-18172 Move RemoteIndexCacheTest to the storage module (#19469) * [PR-19426](https://github.com/apache/kafka/pull/19426) - KAFKA-19119 Move ApiVersionManager/SimpleApiVersionManager to server (#19426) * [PR-19439](https://github.com/apache/kafka/pull/19439) - KAFKA-19121 Move AddPartitionsToTxnConfig and TransactionStateManagerConfig out of KafkaConfig (#19439) * [PR-19424](https://github.com/apache/kafka/pull/19424) - KAFKA-19113: Migrate DelegationTokenManager to server module (#19424) * [PR-19347](https://github.com/apache/kafka/pull/19347) - KAFKA-19027 Replace ConsumerGroupCommandTestUtils#generator by ClusterTestDefaults (#19347) * [PR-19431](https://github.com/apache/kafka/pull/19431) - KAFKA-19115: Utilize initialized topics info to verify delete share group offsets (#19431) * [PR-19419](https://github.com/apache/kafka/pull/19419) - KAFKA-15371 MetadataShell is stuck when bootstrapping (#19419) * [PR-19345](https://github.com/apache/kafka/pull/19345) - KAFKA-19071: Fix doc for remote.storage.enable (#19345) * [PR-19374](https://github.com/apache/kafka/pull/19374) - KAFKA-19030 Remove metricNamePrefix from RequestChannel (#19374) * [PR-19387](https://github.com/apache/kafka/pull/19387) - KAFKA-14485: Move LogCleaner to storage module (#19387) * [PR-19293](https://github.com/apache/kafka/pull/19293) - KAFKA-16894: Define feature to enable share groups (#19293) * [PR-19436](https://github.com/apache/kafka/pull/19436) - KAFKA-19127: Integration test for altering and describing streams group configs (#19436) * 
[PR-19441](https://github.com/apache/kafka/pull/19441) - KAFKA-19103 Remove OffsetConfig (#19441) * [PR-19438](https://github.com/apache/kafka/pull/19438) - KAFKA-19118: Enable KIP-1071 in StandbyTaskCreationIntegrationTest (#19438) * [PR-19423](https://github.com/apache/kafka/pull/19423) - KAFKA-18286: Implement support for streams groups in kafka-groups.sh (#19423) * [PR-19289](https://github.com/apache/kafka/pull/19289) - KAFKA-19042 Move TransactionsWithMaxInFlightOneTest to client-integration-tests module (#19289) * [PR-19410](https://github.com/apache/kafka/pull/19410) - KAFKA-19101 Remove ControllerMutationQuotaManager#throttleTimeMs unused parameter (#19410) * [PR-19354](https://github.com/apache/kafka/pull/19354) - KAFKA-18782: Extend ApplicationRecoverableException related exceptions (#19354) * [PR-19363](https://github.com/apache/kafka/pull/19363) - KAFKA-18629: Utilize share group partition metadata for delete group. (#19363) * [PR-19421](https://github.com/apache/kafka/pull/19421) - KAFKA-19124: Use consumer background event queue for Streams events (#19421) * [PR-19425](https://github.com/apache/kafka/pull/19425) - KAFKA-19118: Enable KIP-1071 in InternalTopicIntegrationTest (#19425) * [PR-19432](https://github.com/apache/kafka/pull/19432) - KAFKA-18170: Add create and write timestamp fields in share snapshot [1/N] (#19432) * [PR-19167](https://github.com/apache/kafka/pull/19167) - KAFKA-18935: Ensure brokers do not return null records in FetchResponse (#19167) * [PR-19261](https://github.com/apache/kafka/pull/19261) - KAFKA-16729: Support isolation level for share consumer (#19261) * [PR-19188](https://github.com/apache/kafka/pull/19188) - KAFKA-18962: Fix onBatchRestored call in GlobalStateManagerImpl (#19188) * [83f6a1d7](https://github.com/apache/kafka/commit/83f6a1d7e6dfce4a78e1192a8fecf523b39ddaab) - KAFKA-18991; Missing change for cherry-pick * [PR-19223](https://github.com/apache/kafka/pull/19223) - KAFKA-18991: FetcherThread should match leader epochs between fetch request and fetch state (#19223) * [PR-19422](https://github.com/apache/kafka/pull/19422) - KAFKA-18287: Add support for kafka-streams-groups.sh --list (#19422) * [PR-18852](https://github.com/apache/kafka/pull/18852) - KAFKA-18723; Better handle invalid records during replication (#18852) * [PR-19377](https://github.com/apache/kafka/pull/19377) - KAFKA-19037: Integrate consumer-side code with Streams (#19377) * [PR-1611](https://github.com/confluentinc/kafka/pull/1611) - Fix build failure (#1582) * [PR-19390](https://github.com/apache/kafka/pull/19390) - KAFKA-19090: Move DelayedFuture and DelayedFuturePurgatory to server module (#19390) * [PR-19213](https://github.com/apache/kafka/pull/19213) - KAFKA-18984: Reset interval.ms By Using kafka-client-metrics.sh (#19213) * [PR-18976](https://github.com/apache/kafka/pull/18976) - KAFKA-16718-2/n: KafkaAdminClient and GroupCoordinator implementation for DeleteShareGroupOffsets RPC (#18976) * [PR-19384](https://github.com/apache/kafka/pull/19384) - KAFKA-19093 Change the “Handler on Broker” to “Handler on Controller” for controller server (#19384) * [PR-19296](https://github.com/apache/kafka/pull/19296) - KAFKA-19047: Allow quickly re-registering brokers that are in controlled shutdown (#19296) * [PR-19413](https://github.com/apache/kafka/pull/19413) - KAFKA-19099 Remove GroupSyncKey, GroupJoinKey, and MemberKey (#19413) * [PR-19068](https://github.com/apache/kafka/pull/19068) - KAFKA-18892: Add KIP-877 support for ClientQuotaCallback (#19068) * 
[PR-19406](https://github.com/apache/kafka/pull/19406) - KAFKA-19100: Use ProcessRole instead of String in AclApis (#19406) * [PR-19398](https://github.com/apache/kafka/pull/19398) - KAFKA-19098 Remove lastOffset from PartitionResponse (#19398) * [PR-19359](https://github.com/apache/kafka/pull/19359) - KAFKA-19077: Propagate shutdownRequested field (#19359) * [PR-19219](https://github.com/apache/kafka/pull/19219) - KAFKA-19001: Use streams group-level configurations in heartbeat (#19219) * [PR-19369](https://github.com/apache/kafka/pull/19369) - KAFKA-19084: Port KAFKA-16224, KAFKA-16764 for ShareConsumers (#19369) * [PR-19392](https://github.com/apache/kafka/pull/19392) - KAFKA-19076 replace String by Supplier for UnifiedLog#maybeHandleIOException (#19392) * [PR-17614](https://github.com/apache/kafka/pull/17614) - KAFKA-16758: Extend Consumer#close with an option to leave the group or not (#17614) * [PR-19303](https://github.com/apache/kafka/pull/19303) - KAFKA-16407: Fix foreign key INNER join on change of FK from/to a null value (#19303) * [PR-19242](https://github.com/apache/kafka/pull/19242) - KAFKA-19013 Reformat PR body to 72 characters (#19242) * [PR-19288](https://github.com/apache/kafka/pull/19288) - KAFKA-19042 Move TransactionsExpirationTest to client-integration-tests module (#19288) * [PR-19357](https://github.com/apache/kafka/pull/19357) - KAFKA-19074 Remove the cached responseData from ShareFetchResponse (#19357) * [PR-19285](https://github.com/apache/kafka/pull/19285) - KAFKA-14523: Move DelayedRemoteListOffsets to the storage module (#19285) * [PR-19323](https://github.com/apache/kafka/pull/19323) - KAFKA-13747: refactor TopologyTest to test different store type parametrized (#19323) * [PR-19370](https://github.com/apache/kafka/pull/19370) - KAFKA-19085: SharePartitionManagerTest testMultipleConcurrentShareFetches throws silent exception and works incorrectly (#19370) * [PR-19218](https://github.com/apache/kafka/pull/19218) - KAFKA-7952: use in memory stores for KTable test (#19218) * [PR-19328](https://github.com/apache/kafka/pull/19328) - KAFKA-18761: [2/N] List share group offsets with state and auth (#19328) * [PR-19005](https://github.com/apache/kafka/pull/19005) - KAFKA-18713: Fix FK Left-Join result race condition (#19005) * [PR-19269](https://github.com/apache/kafka/pull/19269) - KAFKA-18067: Add a flag to disable producer reset during active task creator shutting down (#19269) * [PR-19320](https://github.com/apache/kafka/pull/19320) - KAFKA-19055 Cleanup the 0.10.x information from clients module (#19320) * [PR-19348](https://github.com/apache/kafka/pull/19348) - KAFKA-19075: Included other share group dynamic configs in extractShareGroupConfigMap method in ShareGroupConfig (#19348) * [PR-19333](https://github.com/apache/kafka/pull/19333) - KAFKA-19064: Handle exceptions from deferred events in coordinator (#19333) * [PR-19339](https://github.com/apache/kafka/pull/19339) - KAFKA-18827: Incorporate initializing topics in share group heartbeat [4/N] (#19339) * [PR-19111](https://github.com/apache/kafka/pull/19111) - KAFKA-18923: resource leak in RSM fetchIndex inputStream (#19111) * [PR-19317](https://github.com/apache/kafka/pull/19317) - KAFKA-18949 add consumer protocol to testDeleteRecordsAfterCorruptRecords (#19317) * [PR-19324](https://github.com/apache/kafka/pull/19324) - KAFKA-19058 Running the streams/streams-scala module tests produces a streams-scala.log (#19324) * [PR-19276](https://github.com/apache/kafka/pull/19276) - KAFKA-19003: Add 
forceTerminateTransaction command to CLI tools (#19276) * [PR-19226](https://github.com/apache/kafka/pull/19226) - KAFKA-19004 Move DelayedDeleteRecords to server-common module (#19226) * [PR-18953](https://github.com/apache/kafka/pull/18953) - KAFKA-18826: Add global thread metrics (#18953) * [PR-19343](https://github.com/apache/kafka/pull/19343) - KAFKA-19016: Updated the retention behaviour of share groups to retain them forever (#19343) * [PR-19344](https://github.com/apache/kafka/pull/19344) - KAFKA-19072: Add system test for ELR (#19344) * [PR-19331](https://github.com/apache/kafka/pull/19331) - KAFKA-15931: Cancel RemoteLogReader gracefully (#19331) * [PR-19338](https://github.com/apache/kafka/pull/19338) - KAFKA-18796-2: Corrected the check for acquisition lock timeout in Sh… (#19338) * [PR-19335](https://github.com/apache/kafka/pull/19335) - KAFKA-19062: Port changes from KAFKA-18645 to share-consumers (#19335) * [PR-19334](https://github.com/apache/kafka/pull/19334) - KAFKA-19018,KAFKA-19063: Implement maxRecords and acquisition lock timeout in share fetch request and response resp. (#19334) * [PR-18383](https://github.com/apache/kafka/pull/18383) - KAFKA-18613: Unit tests for usage of incorrect RPCs (#18383) * [PR-19189](https://github.com/apache/kafka/pull/19189) - KAFKA-18613: Improve test coverage for missing topics (#19189) * [PR-18510](https://github.com/apache/kafka/pull/18510) - KAFKA-18409: ShareGroupStateMessageFormatter should use CoordinatorRecordMessageFormatter (#18510) * [PR-19274](https://github.com/apache/kafka/pull/19274) - KAFKA-18959 increase the num_workers from 9 to 14 (#19274) * [PR-19283](https://github.com/apache/kafka/pull/19283) - KAFKA-19042 Move ConsumerTopicCreationTest to client-integration-tests module (#19283) * [PR-18297](https://github.com/apache/kafka/pull/18297) - KAFKA-16260: Deprecate window.size.ms and window.inner.class.serde in StreamsConfig (#18297) * [PR-19114](https://github.com/apache/kafka/pull/19114) - KAFKA-18613: Add StreamsGroupHeartbeat handler in the group coordinator (#19114) * [PR-19270](https://github.com/apache/kafka/pull/19270) - KAFKA-19032 Remove TestInfoUtils.TestWithParameterizedQuorumAndGroupProtocolNames (#19270) * [PR-19268](https://github.com/apache/kafka/pull/19268) - KAFKA-19005 improve the documentation of DescribeTopicsOptions#partitionSizeLimitPerResponse (#19268) * [PR-19282](https://github.com/apache/kafka/pull/19282) - KAFKA-19036 Rewrite LogAppendTimeTest and move it to storage module (#19282) * [PR-19299](https://github.com/apache/kafka/pull/19299) - KAFKA-19049 Remove the @ExtendWith(ClusterTestExtensions.class) from code base (#19299) * [PR-19076](https://github.com/apache/kafka/pull/19076) - KAFKA-17830 Cover unit tests for TBRLMM init failure scenarios (#19076) * [PR-19216](https://github.com/apache/kafka/pull/19216) - KAFKA-14486 Move LogCleanerManager to storage module (#19216) * [PR-19026](https://github.com/apache/kafka/pull/19026) - KAFKA-18827: Initialize share group state group coordinator impl. [3/N] (#19026) * [PR-18695](https://github.com/apache/kafka/pull/18695) - KAFKA-18616; Refactor Tools’s ApiMessageFormatter (#18695) * [PR-19192](https://github.com/apache/kafka/pull/19192) - KAFKA-18899: Improve handling of timeouts for commitAsync() in ShareConsumer. 
(#19192) * [PR-19154](https://github.com/apache/kafka/pull/19154) - KAFKA-18914 Migrate ConsumerRebootstrapTest to use new test infra (#19154) * [PR-19233](https://github.com/apache/kafka/pull/19233) - KAFKA-18736: Add pollOnClose() and maximumTimeToWait() (#19233) * [PR-19230](https://github.com/apache/kafka/pull/19230) - KAFKA-18736: Handle errors in the Streams group heartbeat request manager (#19230) * [PR-18711](https://github.com/apache/kafka/pull/18711) - KAFKA-18576 Convert ConfigType to Enum (#18711) * [PR-19247](https://github.com/apache/kafka/pull/19247) - KAFKA-18796: Added more information to error message when assertion fails for acquisition lock timeout (#19247) * [PR-19046](https://github.com/apache/kafka/pull/19046) - KAFKA-18276 Migrate ProducerRebootstrapTest to new test infra (#19046) * [PR-19207](https://github.com/apache/kafka/pull/19207) - KAFKA-18980 OffsetMetadataManager#cleanupExpiredOffsets should record the number of records rather than topic partitions (#19207) * [PR-19227](https://github.com/apache/kafka/pull/19227) - KAFKA-18999 Remove BrokerMetadata (#19227) * [PR-19256](https://github.com/apache/kafka/pull/19256) - KAFKA-17806 remove this-escape suppress warnings in AclCommand (#19256) * [PR-19255](https://github.com/apache/kafka/pull/19255) - KAFKA-18329; [3/3] Delete old group coordinator (KIP-848) (#19255) * [PR-19064](https://github.com/apache/kafka/pull/19064) - KAFKA-18893: Add KIP-877 support to ReplicaSelector (#19064) * [PR-19251](https://github.com/apache/kafka/pull/19251) - KAFKA-18329; [2/3] Delete old group coordinator (KIP-848) (#19251) * [PR-19254](https://github.com/apache/kafka/pull/19254) - KAFKA-19017: Changed consumer.config to command-config in verifiable_share_consumer.py (#19254) * [PR-19246](https://github.com/apache/kafka/pull/19246) - KAFKA-15599 Move MetadataLogConfig to raft module (#19246) * [PR-19180](https://github.com/apache/kafka/pull/19180) - KAFKA-18954: Add ELR election rate metric (#19180) * [PR-19197](https://github.com/apache/kafka/pull/19197) - KAFKA-15931: Cancel RemoteLogReader gracefully (#19197) * [PR-19243](https://github.com/apache/kafka/pull/19243) - KAFKA-18329; [1/3] Delete old group coordinator (KIP-848) (#19243) * [PR-19174](https://github.com/apache/kafka/pull/19174) - KAFKA-18946 Move BrokerReconfigurable and DynamicProducerStateManagerConfig to server module (#19174) * [PR-18842](https://github.com/apache/kafka/pull/18842) - KAFKA-806 Index may not always observe log.index.interval.bytes (#18842) * [PR-19214](https://github.com/apache/kafka/pull/19214) - KAFKA-18989 Optimize FileRecord#searchForOffsetWithSize (#19214) * [PR-19183](https://github.com/apache/kafka/pull/19183) - KAFKA-18819 StreamsGroupHeartbeat API and StreamsGroupDescribe API check topic describe (#19183) * [PR-19217](https://github.com/apache/kafka/pull/19217) - KAFKA-18975 Move clients-integration-test out of core module (#19217) * [PR-19193](https://github.com/apache/kafka/pull/19193) - KAFKA-18953: [1/N] Add broker side handling for 2 PC (KIP-939) (#19193) * [PR-18949](https://github.com/apache/kafka/pull/18949) - KAFKA-17431: Support invalid static configs for KRaft so long as dynamic configs are valid (#18949) * [PR-19202](https://github.com/apache/kafka/pull/19202) - KAFKA-18969 Rewrite ShareConsumerTest#setup and move to clients-integration-tests module (#19202) * [PR-19165](https://github.com/apache/kafka/pull/19165) - KAFKA-18955: Fix infinite loop and standardize options in MetadataSchemaCheckerTool (#19165) * 
[PR-18966](https://github.com/apache/kafka/pull/18966) - KAFKA-18808 add test to ensure the name= is not equal to default quota (#18966) * [PR-18463](https://github.com/apache/kafka/pull/18463) - KAFKA-17171 Add test cases for STATIC_BROKER_CONFIG in kraft mode (#18463) * [PR-18801](https://github.com/apache/kafka/pull/18801) - KAFKA-17565 Move MetadataCache interface to metadata module (#18801) * [PR-19181](https://github.com/apache/kafka/pull/19181) - KAFKA-18736: Do not send fields if not needed (#19181) * [PR-19215](https://github.com/apache/kafka/pull/19215) - KAFKA-18990 Avoid redundant MetricName creation in BaseQuotaTest#produceUntilThrottled (#19215) * [PR-19027](https://github.com/apache/kafka/pull/19027) - KAFKA-18859 honor the error message of UnregisterBrokerResponse (#19027) * [PR-19212](https://github.com/apache/kafka/pull/19212) - KAFKA-18993 Remove confusing notable change section from upgrade.html (#19212) * [PR-19129](https://github.com/apache/kafka/pull/19129) - KAFKA-18703 Remove unused class PayloadKeyType (#19129) * [PR-19187](https://github.com/apache/kafka/pull/19187) - KAFKA-18915 Rewrite AdminClientRebootstrapTest to cover the current scenario (#19187) * [PR-19147](https://github.com/apache/kafka/pull/19147) - KAFKA-18924 Running the storage module tests produces a storage/storage.log file (#19147) * [PR-19136](https://github.com/apache/kafka/pull/19136) - KAFKA-18781: Extend RefreshRetriableException related exceptions (#19136) * [PR-18994](https://github.com/apache/kafka/pull/18994) - KAFKA-18843: MirrorMaker2 unique workerId (#18994) * [PR-17264](https://github.com/apache/kafka/pull/17264) - KAFKA-17516 Synonyms for client metrics configs (#17264) * [PR-19134](https://github.com/apache/kafka/pull/19134) - KAFKA-18927 Remove LATEST_0_11, LATEST_1_0, LATEST_1_1, LATEST_2_0 (#19134) * [PR-19164](https://github.com/apache/kafka/pull/19164) - KAFKA-18943: Kafka Streams incorrectly commits TX during task revocation (#19164) * [PR-19205](https://github.com/apache/kafka/pull/19205) - KAFKA-18979; Report correct kraft.version in ApiVersions (#19205) * [PR-19176](https://github.com/apache/kafka/pull/19176) - KAFKA-18651: Add Streams-specific broker configurations (#19176) * [PR-19040](https://github.com/apache/kafka/pull/19040) - KAFKA-18858 Refactor FeatureControlManager to avoid using uninitialized MV (#19040) * [PR-19030](https://github.com/apache/kafka/pull/19030) - KAFKA-14484: Move UnifiedLog to storage module (#19030) * [PR-18662](https://github.com/apache/kafka/pull/18662) - KAFKA-18617 Allow use of ClusterInstance inside BeforeEach (#18662) * [PR-18018](https://github.com/apache/kafka/pull/18018) - KAFKA-18142 Switch to com.gradleup.shadow (#18018) * [PR-19169](https://github.com/apache/kafka/pull/19169) - KAFKA-18947 Remove unused raftManager in metadataShell (#19169) * [PR-18998](https://github.com/apache/kafka/pull/18998) - KAFKA-18837: Ensure controller quorum timeouts and backoffs are at least 0 (#18998) * [PR-19119](https://github.com/apache/kafka/pull/19119) - KAFKA-18422 Adjust Kafka client upgrade path section (#19119) * [PR-19168](https://github.com/apache/kafka/pull/19168) - KAFKA-18942: Add reviewers to PR body with committer-tools (#19168) * [PR-19148](https://github.com/apache/kafka/pull/19148) - KAFKA-18932: Removed usage of partition max bytes from share fetch requests (#19148) * [PR-19145](https://github.com/apache/kafka/pull/19145) - KAFKA-18936: Fix share fetch when records are larger than max bytes (#19145) * 
[PR-18091](https://github.com/apache/kafka/pull/18091) - KAFKA-18074: Add kafka client compatibility matrix (#18091) * [PR-18258](https://github.com/apache/kafka/pull/18258) - KAFKA-18195: Fix Kafka Streams broker compatibility matrix (#18258) * [PR-19171](https://github.com/apache/kafka/pull/19171) - KAFKA-17808: Fix id typo for connector-dlq-adminclient (#19171) * [PR-19144](https://github.com/apache/kafka/pull/19144) - KAFKA-18933 Add client integration tests module (#19144) * [PR-19155](https://github.com/apache/kafka/pull/19155) - KAFKA-18925: Add streams groups support to Admin.listGroups (#19155) * [PR-19142](https://github.com/apache/kafka/pull/19142) - KAFKA-18901: [1/N] Improved homogeneous SimpleAssignor (#19142) * [PR-19162](https://github.com/apache/kafka/pull/19162) - KAFKA-18941: Do not test 3.3 in upgrade_tests.py (#19162) * [PR-19121](https://github.com/apache/kafka/pull/19121) - KAFKA-18736: Decide when a heartbeat should be sent (#19121) * [PR-19173](https://github.com/apache/kafka/pull/19173) - KAFKA-18931: added a share group session timeout task when group coordinator is loaded (#19173) * [PR-19099](https://github.com/apache/kafka/pull/19099) - KAFKA-18637: Fix max connections per ip and override reconfigurations (#19099) * [PR-17767](https://github.com/apache/kafka/pull/17767) - KAFKA-17856 Move ConfigCommandTest and ConfigCommandIntegrationTest to tool module (#17767) * [PR-18802](https://github.com/apache/kafka/pull/18802) - KAFKA-18706 Move AclPublisher to metadata module (#18802) * [PR-19166](https://github.com/apache/kafka/pull/19166) - KAFKA-18944 Remove unused setters from ClusterConfig (#19166) * [PR-19081](https://github.com/apache/kafka/pull/19081) - KAFKA-18909 Move DynamicThreadPool to server module (#19081) * [PR-19062](https://github.com/apache/kafka/pull/19062) - KAFKA-18700 Migrate SnapshotPath, Entry, OffsetAndEpoch, LogFetchInfo, and LogAppendInfo to record classes (#19062) * [PR-19156](https://github.com/apache/kafka/pull/19156) - KAFKA-18940: fix electionWasClean (#19156) * [PR-19127](https://github.com/apache/kafka/pull/19127) - KAFKA-18920: The kcontrollers must set kraft.version in ApiVersionsResponse (#19127) * [PR-19116](https://github.com/apache/kafka/pull/19116) - KAFKA-18285: Add describeStreamsGroup to Admin API (#19116) * [PR-18684](https://github.com/apache/kafka/pull/18684) - KAFKA-18461 Add Objects.requireNotNull to Snapshot (#18684) * [PR-18299](https://github.com/apache/kafka/pull/18299) - KAFKA-17607: Add CI step to verify LICENSE-binary (#18299) * [PR-19137](https://github.com/apache/kafka/pull/19137) - KAFKA-18929: Log a warning when time based segment delete is blocked by a future timestamp (#19137) * [PR-15241](https://github.com/apache/kafka/pull/15241) - KAFKA-15931: Reopen TransactionIndex if channel is closed (#15241) * [PR-19138](https://github.com/apache/kafka/pull/19138) - KAFKA-18046; High CPU usage when using Log4j2 (#19138) * [PR-19094](https://github.com/apache/kafka/pull/19094) - KAFKA-18915: Migrate AdminClientRebootstrapTest to use new test infra (#19094) * [PR-19113](https://github.com/apache/kafka/pull/19113) - KAFKA-18900: Experimental share consumer acknowledge mode config (#19113) * [PR-19131](https://github.com/apache/kafka/pull/19131) - KAFKA-18648: Make records in FetchResponse nullable again (#19131) * [PR-19120](https://github.com/apache/kafka/pull/19120) - KAFKA-18887: Implement Streams Admin APIs (#19120) * [PR-19130](https://github.com/apache/kafka/pull/19130) - KAFKA-18811: Added command configs to 
admin client as well in VerifiableShareConsumer (#19130) * [PR-19112](https://github.com/apache/kafka/pull/19112) - KAFKA-18910 Remove kafka.utils.json (#19112) * [4a500418](https://github.com/apache/kafka/commit/4a500418c63a063198c5f6ce256bfef9ffd74e3a) - Revert “KAFKA-18246 Fix ConnectRestApiTest.test_rest_api by adding multiversioning configs (#18191)” * [d86cb597](https://github.com/apache/kafka/commit/d86cb597902d32ce83f27d65b60df6700cb7a61d) - Revert “KAFKA-18887: Implement Streams Admin APIs (#19049)” * [PR-19049](https://github.com/apache/kafka/pull/19049) - KAFKA-18887: Implement Streams Admin APIs (#19049) * [PR-19104](https://github.com/apache/kafka/pull/19104) - KAFKA-18919 Clarify that KafkaPrincipalBuilder classes must also implement KafkaPrincipalSerde (#19104) * [PR-19054](https://github.com/apache/kafka/pull/19054) - KAFKA-18882 Remove BaseKey, TxnKey, and UnknownKey (#19054) * [PR-19083](https://github.com/apache/kafka/pull/19083) - KAFKA-18817: ShareGroupHeartbeat and ShareGroupDescribe API must check topic describe (#19083) * [PR-18983](https://github.com/apache/kafka/pull/18983) - KAFKA-14121: AlterPartitionReassignments API should allow callers to specify the option of preserving the replication factor (#18983) * [PR-18918](https://github.com/apache/kafka/pull/18918) - KAFKA-18804 Remove slf4j warning when using tool script (#18918) * [PR-9766](https://github.com/apache/kafka/pull/9766) - KAFKA-10864 Convert end txn marker schema to use auto-generated protocol (#9766) * [PR-19087](https://github.com/apache/kafka/pull/19087) - KAFKA-18886 add behavior change of CreateTopicPolicy and AlterConfigPolicy to zk2kraft (#19087) * [PR-19097](https://github.com/apache/kafka/pull/19097) - KAFKA-18422 add link of KIP-1124 to “rolling upgrade” section (#19097) * [PR-19089](https://github.com/apache/kafka/pull/19089) - KAFKA-18917: TransformValues throws NPE (#19089) * [PR-19065](https://github.com/apache/kafka/pull/19065) - KAFKA-18876 4.0 documentation improvement (#19065) * [PR-19086](https://github.com/apache/kafka/pull/19086) - Fix typos in multiple files (#19086) * [PR-19091](https://github.com/apache/kafka/pull/19091) - KAFKA-18918: Correcting releasing of locks on exception (#19091) * [PR-19088](https://github.com/apache/kafka/pull/19088) - KAFKA-18916; Resolved regular expressions must update the group by topics data structure (#19088) * [PR-19075](https://github.com/apache/kafka/pull/19075) - KAFKA-18867 add tests to describe topic configs with empty name (#19075) * [PR-18449](https://github.com/apache/kafka/pull/18449) - KAFKA-18500 Build PRs at HEAD commit (#18449) * [PR-19059](https://github.com/apache/kafka/pull/19059) - KAFKA-18878: Added share session cache and delayed share fetch metrics (KIP-1103) (#19059) * [PR-18997](https://github.com/apache/kafka/pull/18997) - KAFKA-18844: Stale features information in QuorumController#registerBroker (#18997) * [PR-19036](https://github.com/apache/kafka/pull/19036) - KAFKA-18864:remove the Evolving tag from stable public interfaces (#19036) * [PR-19055](https://github.com/apache/kafka/pull/19055) - KAFKA-18817:[1/N] ShareGroupHeartbeat and ShareGroupDescribe API must check topic describe (#19055) * [PR-18981](https://github.com/apache/kafka/pull/18981) - KAFKA-18613: Auto-creation of internal topics in streams group heartbeat (#18981) * [PR-19056](https://github.com/apache/kafka/pull/19056) - KAFKA-18881 Document the ConsumerRecord as non-thread safe (#19056) * [PR-18752](https://github.com/apache/kafka/pull/18752) - 
KAFKA-18168: Adding checkpointing for GlobalKTable during restoration and closing (#18752) * [PR-19070](https://github.com/apache/kafka/pull/19070) - KAFKA-18907 Add suitable error message when the appended value is too large (#19070) * [PR-19067](https://github.com/apache/kafka/pull/19067) - KAFKA-18908 Document that the size of appended value can’t be larger than Short.MAX_VALUE (#19067) * [PR-19047](https://github.com/apache/kafka/pull/19047) - KAFKA-18880 Remove kafka.cluster.Broker and BrokerEndPointNotAvailableException (#19047) * [PR-19063](https://github.com/apache/kafka/pull/19063) - KAFKA-17039 KIP-919 supports for unregisterBroker (#19063) * [PR-17771](https://github.com/apache/kafka/pull/17771) - KAFKA-17981 add Integration test for ConfigCommand to add config key=[val1,val2] (#17771) * [PR-19045](https://github.com/apache/kafka/pull/19045) - KAFKA-18734: Implemented share partition metrics (KIP-1103) (#19045) * [PR-19048](https://github.com/apache/kafka/pull/19048) - KAFKA-18860 Remove Missing Features section (#19048) * [PR-18349](https://github.com/apache/kafka/pull/18349) - KAFKA-18371 TopicBasedRemoteLogMetadataManagerConfig exposes sensitive configuration data in logs (#18349) * [PR-19020](https://github.com/apache/kafka/pull/19020) - KAFKA-18780: Extend RetriableException related exceptions (#19020) * [PR-19037](https://github.com/apache/kafka/pull/19037) - KAFKA-18869 add remote storage threads to “Updating Thread Configs” section (#19037) * [PR-17743](https://github.com/apache/kafka/pull/17743) - KAFKA-18863: Connect Multiversion Support (Versioned Connector Creation and related changes) (#17743) * [PR-19042](https://github.com/apache/kafka/pull/19042) - KAFKA-18813: ConsumerGroupHeartbeat API and ConsumerGroupDescribe API… (#19042) * [PR-18989](https://github.com/apache/kafka/pull/18989) - KAFKA-18813: ConsumerGroupHeartbeat API and ConsumerGroupDescribe API must check topic describe (#18989) * [PR-18979](https://github.com/apache/kafka/pull/18979) - KAFKA-18614, KAFKA-18613: Add streams group request plumbing (#18979) * [PR-18864](https://github.com/apache/kafka/pull/18864) - KAFKA-18757: Create full-function SimpleAssignor to match KIP-932 description (#18864) * [PR-18988](https://github.com/apache/kafka/pull/18988) - KAFKA-18839: Drop EAGER rebalancing support in Kafka Streams (#18988) * [PR-18985](https://github.com/apache/kafka/pull/18985) - KAFKA-18792 Add workflow to check PR format (#18985) * [PR-19010](https://github.com/apache/kafka/pull/19010) - KAFKA-17351: Improved handling of compacted topics in share partition (2/N) (#19010) * [PR-19021](https://github.com/apache/kafka/pull/19021) - KAFKA-17836 Move RackAwareTest to server module (#19021) * [PR-18803](https://github.com/apache/kafka/pull/18803) - KAFKA-18712 Move Endpoint to server module (#18803) * [PR-18387](https://github.com/apache/kafka/pull/18387) - KAFKA-18281: Kafka is improperly validating non-advertised listeners for routable controller addresses (#18387) * [PR-18900](https://github.com/apache/kafka/pull/18900) - KAFKA-17937 Cleanup AbstractFetcherThreadTest (#18900) * [PR-18898](https://github.com/apache/kafka/pull/18898) - KIP-966 part 1 release doc (#18898) * [PR-18770](https://github.com/apache/kafka/pull/18770) - KAFKA-18748 Run new tests separately in PRs (#18770) * [PR-18804](https://github.com/apache/kafka/pull/18804) - KAFKA-18522: Slice records for share fetch (#18804) * [PR-18233](https://github.com/apache/kafka/pull/18233) - KAFKA-18023: Enforcing Explicit Naming for Kafka Streams 
Internal Topics (#18233) * [PR-18939](https://github.com/apache/kafka/pull/18939) - KAFKA-18779: Validate responses from broker in client for ShareFetch and ShareAcknowledge RPCs. (#18939) * [PR-18992](https://github.com/apache/kafka/pull/18992) - KAFKA-18827: Initialize share group state persister impl [2/N]. (#18992) * [PR-18880](https://github.com/apache/kafka/pull/18880) - KAFKA-15583 doc update for the “strict min ISR” rule (#18880) * [PR-18928](https://github.com/apache/kafka/pull/18928) - KAFKA-18629: ShareGroupDeleteState admin client impl. (#18928) * [PR-18978](https://github.com/apache/kafka/pull/18978) - KAFKA-17351: Update tests and acquire API to allow discard batches from compacted topics (1/N) (#18978) * [PR-18968](https://github.com/apache/kafka/pull/18968) - KAFKA-18827: Initialize share state, share coordinator impl. [1/N] (#18968) * [PR-19000](https://github.com/apache/kafka/pull/19000) - Revert “KAFKA-16803: Change fork, update ShadowJavaPlugin to 8.1.7 (#16295)” (#19000) * [PR-18897](https://github.com/apache/kafka/pull/18897) - KAFKA-18795 Remove Records#downConvert (#18897) * [PR-18996](https://github.com/apache/kafka/pull/18996) - KAFKA-18813: [3/N] Client support for TopicAuthException in DescribeConsumerGroup path (#18996) * [PR-18959](https://github.com/apache/kafka/pull/18959) - KAFKA-18733: Implemented fetch ratio and partition acquire time metrics (3/N) (#18959) * [PR-18986](https://github.com/apache/kafka/pull/18986) - KAFKA-18813: [2/N] Client support for TopicAuthException in HB path (#18986) * [PR-18844](https://github.com/apache/kafka/pull/18844) - KAFKA-18737 KafkaDockerWrapper setup functions fails due to storage format command (#18844) * [PR-18848](https://github.com/apache/kafka/pull/18848) - KAFKA-18629: Delete share group state RPC group coordinator impl. [3/N] (#18848) * [PR-18982](https://github.com/apache/kafka/pull/18982) - KAFKA-18829: Added check before converting to IMPLICIT mode (#18964) (Cherry-pick) (#18982) * [PR-18969](https://github.com/apache/kafka/pull/18969) - KAFKA-18831 Migrating to log4j2 introduce behavior changes of adjusting level dynamically (#18969) * [PR-18737](https://github.com/apache/kafka/pull/18737) - KAFKA-18641: AsyncKafkaConsumer could lose records with auto offset commit (#18737) * [PR-18962](https://github.com/apache/kafka/pull/18962) - KAFKA-18828: Update share group metrics per new init and call mechanism. (#18962) * [PR-18891](https://github.com/apache/kafka/pull/18891) - KAFKA-16918 TestUtils#assertFutureThrows should use future.get with timeout (#18891) * [PR-18965](https://github.com/apache/kafka/pull/18965) - MINOR: Remove redundant quorum parameter from `*AdminIntegrationTest` classes (#18965) * [PR-18967](https://github.com/apache/kafka/pull/18967) - KAFKA-18791 Set default commit to PR title and description [2/n] (#18967) * [PR-18964](https://github.com/apache/kafka/pull/18964) - KAFKA-18829: Added check before converting to IMPLICIT mode (#18964) * [PR-18955](https://github.com/apache/kafka/pull/18955) - KAFKA-18791 Enable new asf.yaml parser [1/n] (#18955) * [PR-18845](https://github.com/apache/kafka/pull/18845) - KAFKA-18601: Assume a baseline of 3.3 for server protocol versions (#18845) * [PR-18944](https://github.com/apache/kafka/pull/18944) - KAFKA-18198: Added check to prevent acknowledgements on initial ShareFetchRequest. 
(#18944) * [PR-18946](https://github.com/apache/kafka/pull/18946) - KAFKA-18799 Remove AdminUtils (#18946) * [PR-18757](https://github.com/apache/kafka/pull/18757) - KAFKA-18667 Add replication system test case for combined broker + controller failure (#18757) * [PR-18872](https://github.com/apache/kafka/pull/18872) - KAFKA-18773 Migrate the log4j1 config to log4j 2 for native image and README (#18872) * [PR-18004](https://github.com/apache/kafka/pull/18004) - KAFKA-18089: Upgrade Caffeine lib to 3.1.8 (#18004) * [PR-18850](https://github.com/apache/kafka/pull/18850) - KAFKA-18767: Add client side config check for shareConsumer (#18850) * [PR-18460](https://github.com/apache/kafka/pull/18460) - KAFKA-14484: Decouple UnifiedLog and RemoteLogManager (#18460) * [PR-18927](https://github.com/apache/kafka/pull/18927) - KAFKA-16718 [1/n]: Added DeleteShareGroupOffsets request and response schema (#18927) * [PR-18870](https://github.com/apache/kafka/pull/18870) - KAFKA-18736: Add Streams group heartbeat request manager (1/N) (#18870) * [PR-18914](https://github.com/apache/kafka/pull/18914) - KAFKA-18798 The replica placement policy used by ReassignPartitionsCommand is not aligned with kraft controller (#18914) * [PR-18888](https://github.com/apache/kafka/pull/18888) - KAFKA-18787: RemoteIndexCache fails to delete invalid files on init (#18888) * [PR-18934](https://github.com/apache/kafka/pull/18934) - KAFKA-18807; Fix thread idle ratio metric (#18934) * [PR-18871](https://github.com/apache/kafka/pull/18871) - KAFKA-18684: Add base exception classes (#18871) * [PR-18924](https://github.com/apache/kafka/pull/18924) - KAFKA-18733: Updating share group record acks metric (2/N) (#18924) * [PR-18907](https://github.com/apache/kafka/pull/18907) - KAFKA-18801 Remove ClusterGenerator and revise ClusterTemplate javadoc (#18907) * [PR-18809](https://github.com/apache/kafka/pull/18809) - KAFKA-18730: Add replaying streams group state from offset topic (#18809) * [PR-18889](https://github.com/apache/kafka/pull/18889) - KAFKA-18784 Fix ConsumerWithLegacyMessageFormatIntegrationTest (#18889) * [PR-18920](https://github.com/apache/kafka/pull/18920) - KAFKA-18805: add synchronized block for Consumer Heartbeat close (#18920) * [PR-18908](https://github.com/apache/kafka/pull/18908) - KAFKA-18755 Align timeout in kafka-share-groups.sh (#18908) * [PR-18922](https://github.com/apache/kafka/pull/18922) - KAFKA-18809 Set min in sync replicas for \_\_share_group_state. 
(#18922) * [PR-18916](https://github.com/apache/kafka/pull/18916) - KAFKA-18803 The acls would appear at the wrong level of the metadata shell “tree” (#18916) * [PR-18906](https://github.com/apache/kafka/pull/18906) - KAFKA-18790 Fix testCustomQuotaCallback (#18906) * [PR-18894](https://github.com/apache/kafka/pull/18894) - KAFKA-18761: Complete listing of share group offsets [1/N] (#18894) * [PR-18819](https://github.com/apache/kafka/pull/18819) - KAFKA-16717 [1/2]: Add AdminClient.alterShareGroupOffsets (#18819) * [PR-18899](https://github.com/apache/kafka/pull/18899) - KAFKA-18772 Define share group config defaults for Docker (#18899) * [PR-18826](https://github.com/apache/kafka/pull/18826) - KAFKA-18733: Updating share group metrics (1/N) (#18826) * [PR-18680](https://github.com/apache/kafka/pull/18680) - KAFKA-18634: Fix ELR metadata version issues (#18680) * [PR-18795](https://github.com/apache/kafka/pull/18795) - KAFKA-17182: Consumer fetch sessions are evicted too quickly with AsyncKafkaConsumer (#18795) * [PR-18834](https://github.com/apache/kafka/pull/18834) - KAFKA-16720: Support multiple groups in DescribeShareGroupOffsets RPC (#18834) * [PR-18810](https://github.com/apache/kafka/pull/18810) - KAFKA-18654[2/2]: Transction V2 retry add partitions on the server side when handling produce request. (#18810) * [PR-18756](https://github.com/apache/kafka/pull/18756) - KAFKA-17298: Update upgrade notes for 4.0 KIP-848 (#18756) * [PR-18807](https://github.com/apache/kafka/pull/18807) - KAFKA-18728 Move ListOffsetsPartitionStatus to server module (#18807) * [PR-18851](https://github.com/apache/kafka/pull/18851) - KAFKA-18769: Improve leadership changes handling in ShareConsumeRequestManager. (#18851) * [PR-18869](https://github.com/apache/kafka/pull/18869) - KAFKA-18777 add PartitionsWithLateTransactionsCount to BrokerMetricNamesTest (#18869) * [PR-18729](https://github.com/apache/kafka/pull/18729) - KAFKA-18323: Add StreamsGroup class (#18729) * [PR-18275](https://github.com/apache/kafka/pull/18275) - KAFKA-15443: Upgrade RocksDB to 9.7.3 (#18275) * [PR-18451](https://github.com/apache/kafka/pull/18451) - KAFKA-18035: TransactionsTest testBumpTransactionalEpochWithTV2Disabled failed on trunk (#18451) * [PR-17804](https://github.com/apache/kafka/pull/17804) - KAFKA-15995: Adding KIP-877 support to Connect (#17804) * [PR-18829](https://github.com/apache/kafka/pull/18829) - KAFKA-18756: Enabled share group configs for queues related system tests (#18829) * [PR-18858](https://github.com/apache/kafka/pull/18858) - Fix bug in json naming (#18858) * [PR-18833](https://github.com/apache/kafka/pull/18833) - KAFKA-18758: NullPointerException in shutdown following InvalidConfigurationException (#18833) * [PR-18855](https://github.com/apache/kafka/pull/18855) - KAFKA-18764: Throttle on share state RPCs auth failure. 
(#18855) * [PR-18039](https://github.com/apache/kafka/pull/18039) - KAFKA-14484: Move UnifiedLog static methods to storage (#18039) * [PR-18394](https://github.com/apache/kafka/pull/18394) - KAFKA-18396: Migrate log4j1 configuration to log4j2 in KafkaDockerWrapper (#18394) * [PR-18853](https://github.com/apache/kafka/pull/18853) - KAFKA-18770 close the RM created by testDelayedShareFetchPurgatoryOperationExpiration (#18853) * [PR-18820](https://github.com/apache/kafka/pull/18820) - KAFKA-18366 Remove KafkaConfig.interBrokerProtocolVersion (#18820) * [PR-18812](https://github.com/apache/kafka/pull/18812) - KAFKA-18658 add import control for examples module (#18812) * [PR-18821](https://github.com/apache/kafka/pull/18821) - KAFKA-18743 Remove leader.imbalance.per.broker.percentage as it is not supported by Kraft (#18821) * [PR-1578](https://github.com/confluentinc/kafka/pull/1578) - CCS CP release test regex updates * [PR-18196](https://github.com/apache/kafka/pull/18196) - KAFKA-18225 ClientQuotaCallback#updateClusterMetadata is unsupported by kraft (#18196) * [PR-1582](https://github.com/confluentinc/kafka/pull/1582) - Fix build failure * [PR-18846](https://github.com/apache/kafka/pull/18846) - KAFKA-18763: changed the assertion statement for acknowledgements to include only successful acks (#18846) * [PR-18824](https://github.com/apache/kafka/pull/18824) - KAFKA-18745: Handle network related errors in persister. (#18824) * [PR-18252](https://github.com/apache/kafka/pull/18252) - KAFKA-17833: Convert DescribeAuthorizedOperationsTest to use KRaft (#18252) * [PR-1577](https://github.com/confluentinc/kafka/pull/1577) - CCS CP release test regex updates * [PR-18381](https://github.com/apache/kafka/pull/18381) - KAFKA-18275 Restarting broker in testing should use the same port (#18381) * [PR-18818](https://github.com/apache/kafka/pull/18818) - KAFKA-18741 document the removal of inter.broker.protocol.version (#18818) * [PR-18496](https://github.com/apache/kafka/pull/18496) - KAFKA-18483 Disable Log4jController and Loggers if Log4j Core absent (#18496) * [PR-18672](https://github.com/apache/kafka/pull/18672) - KAFKA-18618: Improve leader change handling of acknowledgements [1/N] (#18672) * [PR-18566](https://github.com/apache/kafka/pull/18566) - KAFKA-18360 Remove zookeeper configurations (#18566) * [PR-18641](https://github.com/apache/kafka/pull/18641) - KAFKA-18530 Remove ZooKeeperInternals (#18641) * [PR-18583](https://github.com/apache/kafka/pull/18583) - KAFKA-18499 Clean up zookeeper from LogConfig (#18583) * [PR-18771](https://github.com/apache/kafka/pull/18771) - KAFKA-18689: Improve metric calculation to avoid NoSuchElementException (#18771) * [PR-18189](https://github.com/apache/kafka/pull/18189) - KAFKA-18206: EmbeddedKafkaCluster must set features (#18189) * [PR-18765](https://github.com/apache/kafka/pull/18765) - KAFKA-17379: Fix inexpected state transition from ERROR to PENDING_SHUTDOWN (#18765) * [PR-18696](https://github.com/apache/kafka/pull/18696) - KAFKA-18494-3: solution for the bug relating to gaps in the share partition cachedStates post initialization (#18696) * [PR-18748](https://github.com/apache/kafka/pull/18748) - KAFKA-18629: Add persister impl and tests for DeleteShareGroupState RPC. 
[2/N] (#18748) * [PR-18671](https://github.com/apache/kafka/pull/18671) - [KAFKA-16720] AdminClient Support for ListShareGroupOffsets (2/2) (#18671) * [PR-18702](https://github.com/apache/kafka/pull/18702) - KAFKA-18645: New consumer should align close timeout handling with classic consumer (#18702) * [PR-18791](https://github.com/apache/kafka/pull/18791) - KAFKA-18722: Remove the unreferenced methods in TBRLMM and ConsumerManager (#18791) * [PR-18782](https://github.com/apache/kafka/pull/18782) - KAFKA-18694: Migrate suitable classes to records in coordinator-common module (#18782) * [PR-18784](https://github.com/apache/kafka/pull/18784) - KAFKA-18705: Move ConfigRepository to metadata module (#18784) * [PR-18783](https://github.com/apache/kafka/pull/18783) - KAFKA-18698: Migrate suitable classes to records in server and server-common modules (#18783) * [PR-18781](https://github.com/apache/kafka/pull/18781) - KAFKA-18675 Add tests for valid and invalid broker addresses (#18781) * [PR-18304](https://github.com/apache/kafka/pull/18304) - KAFKA-16524; Metrics for KIP-853 (#18304) * [PR-18277](https://github.com/apache/kafka/pull/18277) - KAFKA-18635: reenable the unclean shutdown detection (#18277) * [PR-18708](https://github.com/apache/kafka/pull/18708) - KAFKA-18649: complete ClearElrRecord handling (#18708) * [PR-18148](https://github.com/apache/kafka/pull/18148) - KAFKA-16540: Clear ELRs when min.insync.replicas is changed. (#18148) * [PR-17952](https://github.com/apache/kafka/pull/17952) - KAFKA-16540: enforce min.insync.replicas config invariants for ELR (#17952) * [PR-15622](https://github.com/apache/kafka/pull/15622) - KAFKA-16446: Improve controller event duration logging (#15622) * [PR-18028](https://github.com/apache/kafka/pull/18028) - KAFKA-18131: Improve logs for voters (#18028) * [PR-18222](https://github.com/apache/kafka/pull/18222) - KAFKA-18305: validate controller.listener.names is not in inter.broker.listener.name for kcontrollers (#18222) * [PR-18777](https://github.com/apache/kafka/pull/18777) - KAFKA-18690: Keep leader metadata for RE2J-assigned partitions (#18777) * [PR-18551](https://github.com/apache/kafka/pull/18551) - KAFKA-18538: Add Streams membership manager (#18551) * [PR-18165](https://github.com/apache/kafka/pull/18165) - KAFKA-18230: Handle not controller or not leader error in admin client (#18165) * [PR-18700](https://github.com/apache/kafka/pull/18700) - KAFKA-18644: improve generic type names for internal FK-join classes (#18700) * [PR-18790](https://github.com/apache/kafka/pull/18790) - KAFKA-18693 Remove PasswordEncoder (#18790) * [PR-18720](https://github.com/apache/kafka/pull/18720) - KAFKA-18654 [1/2]: Transaction Version 2 performance regression due to early return (#18720) * [PR-18592](https://github.com/apache/kafka/pull/18592) - KAFKA-18545: Remove Zookeeper logic from LogManager (#18592) * [PR-18676](https://github.com/apache/kafka/pull/18676) - KAFKA-18325: Add TargetAssignmentBuilder (#18676) * [PR-18786](https://github.com/apache/kafka/pull/18786) - KAFKA-18672; CoordinatorRecordSerde must validate value version (4.0) (#18786) * [PR-18717](https://github.com/apache/kafka/pull/18717) - KAFKA-18655: Implement the consumer group size counter with scheduled task (#18717) * [PR-18764](https://github.com/apache/kafka/pull/18764) - KAFKA-18685: Cleanup DynamicLogConfig constructor (#18764) * [PR-18785](https://github.com/apache/kafka/pull/18785) - KAFKA-18676; Update Benchmark system tests (#18785) * 
[PR-18330](https://github.com/apache/kafka/pull/18330) - KAFKA-17631 Convert SaslApiVersionsRequestTest to kraft (#18330) * [PR-18749](https://github.com/apache/kafka/pull/18749) - KAFKA-18672; CoordinatorRecordSerde must validate value version (#18749) * [PR-18768](https://github.com/apache/kafka/pull/18768) - KAFKA-18678 Update TestVerifiableProducer system test (#18768) * [PR-18652](https://github.com/apache/kafka/pull/18652) - KAFKA-17125: Streams Sticky Task Assignor (#18652) * [PR-18751](https://github.com/apache/kafka/pull/18751) - KAFKA-18674 Document the incompatible changes in parsing –bootstrap-server (#18751) * [PR-18727](https://github.com/apache/kafka/pull/18727) - KAFKA-18659: librdkafka compressed produce fails unless api versions returns produce v0 (#18727) * [PR-18759](https://github.com/apache/kafka/pull/18759) - KAFKA-18683: Handle slicing of file records for updated start position (#18759) * [fc3dca4e](https://github.com/apache/kafka/commit/fc3dca4ed08a6acdcb5b1d5a4ed5b8a7095d318b) - Revert “KAFKA-17182: Consumer fetch sessions are evicted too quickly with AsyncKafkaConsumer (#17700)” * [7920fadb](https://github.com/apache/kafka/commit/7920fadbb586a9430ce1a45936d6bbd1555baa2d) - Revert “KAFKA-17182: Consumer fetch sessions are evicted too quickly with AsyncKafkaConsumer (#17700)” * [PR-18758](https://github.com/apache/kafka/pull/18758) - KAFKA-18660: Transactions Version 2 doesn’t handle epoch overflow correctly (#18730) (#18758) * [PR-18750](https://github.com/apache/kafka/pull/18750) - KAFKA-18320; Ensure that assignors are at the right place (#18750) * [PR-1541](https://github.com/confluentinc/kafka/pull/1541) - Merge trunk * [PR-17700](https://github.com/apache/kafka/pull/17700) - KAFKA-17182: Consumer fetch sessions are evicted too quickly with AsyncKafkaConsumer (#17700) * [PR-18766](https://github.com/apache/kafka/pull/18766) - KAFKA-18146; tests/kafkatest/tests/core/upgrade_test.py needs to be re-added as KRaft (#18766) * [PR-18763](https://github.com/apache/kafka/pull/18763) - KAFKA-18677; Update ConsoleConsumerTest system test (#18763) * [PR-17511](https://github.com/apache/kafka/pull/17511) - KAFKA-15995: Initial API + make Producer/Consumer plugins Monitorable (#17511) * [PR-18722](https://github.com/apache/kafka/pull/18722) - KAFKA-18644: improve generic type names for KStreamImpl and KTableImpl (#18722) * [PR-18754](https://github.com/apache/kafka/pull/18754) - KAFKA-18034: CommitRequestManager should fail pending requests on fatal coordinator errors (#18754) * [PR-18730](https://github.com/apache/kafka/pull/18730) - KAFKA-18660: Transactions Version 2 doesn’t handle epoch overflow correctly (#18730) * [PR-1556](https://github.com/confluentinc/kafka/pull/1556) - MINOR: Disable publish artifacts for 4.0 * [PR-18548](https://github.com/apache/kafka/pull/18548) - KAFKA-18034: CommitRequestManager should fail pending requests on fatal coordinator errors (#18548) * [PR-18731](https://github.com/apache/kafka/pull/18731) - KAFKA-18570: Update documentation to add remainingLogsToRecover, remainingSegmentsToRecover and LogDirectoryOffline metrics (#18731) * [PR-18669](https://github.com/apache/kafka/pull/18669) - KAFKA-18621: Add StreamsCoordinatorRecordHelpers (#18669) * [PR-18681](https://github.com/apache/kafka/pull/18681) - KAFKA-18636 Fix how we handle Gradle exits in CI (#18681) * [PR-18590](https://github.com/apache/kafka/pull/18590) - KAFKA-18569: New consumer close may wait on unneeded FindCoordinator (#18590) * 
[PR-18698](https://github.com/apache/kafka/pull/18698) - KAFKA-13722: remove internal usage of old ProcessorContext (#18698) * [PR-18314](https://github.com/apache/kafka/pull/18314) - KAFKA-16339: Add Kafka Streams migrating guide from transform to process (#18314) * [PR-18732](https://github.com/apache/kafka/pull/18732) - KAFKA-18498: Update lock ownership from main thread (#18732) * [PR-18478](https://github.com/apache/kafka/pull/18478) - KAFKA-18383 Remove reserved.broker.max.id and broker.id.generation.enable (#18478) * [PR-18733](https://github.com/apache/kafka/pull/18733) - KAFKA-18662: Return CONCURRENT_TRANSACTIONS on produce request in TV2 (#18733) * [PR-18718](https://github.com/apache/kafka/pull/18718) - KAFKA-18632: Multibroker test improvements. (#18718) * [PR-18725](https://github.com/apache/kafka/pull/18725) - KAFKA-18653: Fix mocks and potential thread leak issues causing silent RejectedExecutionException in share group broker tests (#18725) * [PR-18726](https://github.com/apache/kafka/pull/18726) - KAFKA-18646: Null records in fetch response breaks librdkafka (#18726) * [PR-18668](https://github.com/apache/kafka/pull/18668) - KAFKA-18619: New consumer topic metadata events should set requireMetadata flag (#18668) * [PR-18728](https://github.com/apache/kafka/pull/18728) - KAFKA-18488: Improve KafkaShareConsumerTest (#18728) * [PR-18716](https://github.com/apache/kafka/pull/18716) - KAFKA-18648: Add back support for metadata version 0-3 (#18716) * [PR-18555](https://github.com/apache/kafka/pull/18555) - KAFKA-18528: MultipleListenersWithSameSecurityProtocolBaseTest and GssapiAuthenticationTest should run for async consumer (#18555) * [PR-18651](https://github.com/apache/kafka/pull/18651) - KAFKA-17951: Share parition rotate strategy (#18651) * [PR-18712](https://github.com/apache/kafka/pull/18712) - KAFKA-18629: Delete share group state impl [1/N] (#18712) * [PR-18570](https://github.com/apache/kafka/pull/18570) - KAFKA-17162: join() started thread in DefaultTaskManagerTest (#18570) * [PR-18602](https://github.com/apache/kafka/pull/18602) - KAFKA-17587 Refactor test infrastructure (#18602) * [PR-18693](https://github.com/apache/kafka/pull/18693) - KAFKA-18631 Remove ZkConfigs (#18693) * [PR-18699](https://github.com/apache/kafka/pull/18699) - KAFKA-18642: Increased the timeouts in share_consumer_test.py system tests (#18699) * [PR-18632](https://github.com/apache/kafka/pull/18632) - KAFKA-18555 Avoid casting MetadataCache to KRaftMetadataCache (#18632) * [PR-18547](https://github.com/apache/kafka/pull/18547) - KAFKA-18533 Remove KafkaConfig zookeeper related logic (#18547) * [PR-18554](https://github.com/apache/kafka/pull/18554) - KAFKA-18529: ConsumerRebootstrapTest should run for async consumer (#18554) * [PR-18292](https://github.com/apache/kafka/pull/18292) - KAFKA-13722: remove usage of old ProcessorContext (#18292) * [PR-18444](https://github.com/apache/kafka/pull/18444) - KAFKA-17894: Implemented broker topic metrics for Share Group 1/N (KIP-1103) (#18444) * [PR-18687](https://github.com/apache/kafka/pull/18687) - KAFKA-18630: Clean ReplicaManagerBuilder (#18687) * [PR-18477](https://github.com/apache/kafka/pull/18477) - KAFKA-18474: Remove zkBroker listener (#18477) * [PR-18688](https://github.com/apache/kafka/pull/18688) - KAFKA-18616; Refactor DumpLogSegments’s MessageParsers (#18688) * [PR-15574](https://github.com/apache/kafka/pull/15574) - KAFKA-16372 Fix producer doc discrepancy with the exception behavior (#15574) * 
[PR-18618](https://github.com/apache/kafka/pull/18618) - KAFKA-18590 Cleanup DelegationTokenManager (#18618) * [PR-18593](https://github.com/apache/kafka/pull/18593) - KAFKA-18559 Cleanup FinalizedFeatures (#18593) * [PR-18627](https://github.com/apache/kafka/pull/18627) - KAFKA-18597 Fix max-buffer-utilization-percent is always 0 (#18627) * [PR-18686](https://github.com/apache/kafka/pull/18686) - KAFKA-18620: Remove UnifiedLog#legacyFetchOffsetsBefore (#18686) * [PR-18621](https://github.com/apache/kafka/pull/18621) - KAFKA-18592 Cleanup ReplicaManager (#18621) * [PR-18476](https://github.com/apache/kafka/pull/18476) - KAFKA-18324: Add CurrentAssignmentBuilder (#18476) * [PR-12042](https://github.com/apache/kafka/pull/12042) - KAFKA-13810: Document behavior of KafkaProducer.flush() w.r.t callbacks (#12042) * [PR-18667](https://github.com/apache/kafka/pull/18667) - KAFKA-18484 [2/2]; Handle exceptions during coordinator unload (#18667) * [PR-18601](https://github.com/apache/kafka/pull/18601) - KAFKA-18488: Additional protocol tests for share consumption (#18601) * [PR-18666](https://github.com/apache/kafka/pull/18666) - KAFKA-18486; [1/2] Update LocalLeaderEndPointTest (#18666) * [d2024436](https://github.com/apache/kafka/commit/d2024436218343a127385e0149a692caf432b772) - KAFKA-18575: Transaction Version 2 doesn’t correctly handle race condition with completing and new transaction(#18604) * [PR-18532](https://github.com/apache/kafka/pull/18532) - KAFKA-18517: Enable ConsumerBounceTest to run for new async consumer (#18532) * [PR-18614](https://github.com/apache/kafka/pull/18614) - KAFKA-18519: Remove Json.scala, cleanup AclEntry.scala (#18614) * [PR-18630](https://github.com/apache/kafka/pull/18630) - KAFKA-18599: Remove Optional wrapping for forwardingManager in ApiVersionManager (#18630) * [PR-18389](https://github.com/apache/kafka/pull/18389) - KAFKA-18229: Move configs out of “kraft” directory (#18389) * [PR-18661](https://github.com/apache/kafka/pull/18661) - KAFKA-18484 [1/N]; Handle exceptions from deferred events in coordinator (#18661) * [PR-18649](https://github.com/apache/kafka/pull/18649) - KAFKA-18392: Ensure client sets member ID for share group (#18649) * [PR-18527](https://github.com/apache/kafka/pull/18527) - KAFKA-18518: Add processor to handle rebalance events (#18527) * [PR-18607](https://github.com/apache/kafka/pull/18607) - KAFKA-17402: DefaultStateUpdated should transite task atomically (#18607) * [PR-18539](https://github.com/apache/kafka/pull/18539) - KAFKA-18454 Publish build scans to develocity.apache.org (#18539) * [PR-18512](https://github.com/apache/kafka/pull/18512) - KAFKA-18302; Update CoordinatorRecord (#18512) * [PR-18316](https://github.com/apache/kafka/pull/18316) - KAFKA-15370: Support Participation in 2PC (KIP-939) (2/N) (#18316) * [PR-18587](https://github.com/apache/kafka/pull/18587) - KAFKA-8862: Improve Producer error message for failed metadata update (#18587) * [PR-18581](https://github.com/apache/kafka/pull/18581) - KAFKA-17561: add processId tag to thread-state metric (#18581) * [PR-18629](https://github.com/apache/kafka/pull/18629) - KAFKA-18598: Remove ControllerMetadataMetrics ZK-related Metrics (#18629) * [PR-18611](https://github.com/apache/kafka/pull/18611) - KAFKA-18585 Fix fail test ValuesTest#shouldConvertDateValues (#18611) * [PR-18647](https://github.com/apache/kafka/pull/18647) - KAFKA-18487; Remove ReplicaManager#stopReplicas (#18647) * [PR-18635](https://github.com/apache/kafka/pull/18635) - KAFKA-18583; Fix 
getPartitionReplicaEndpoints for KRaft (#18635) * [PR-18442](https://github.com/apache/kafka/pull/18442) - KAFKA-18311: Internal Topic Manager (5/5) (#18442) * [PR-18636](https://github.com/apache/kafka/pull/18636) - KAFKA-18604; Update transaction coordinator (#18636) * [PR-18497](https://github.com/apache/kafka/pull/18497) - KAFKA-14552: Assume a baseline of 3.0 for server protocol versions (#18497) * [PR-18346](https://github.com/apache/kafka/pull/18346) - KAFKA-18363: Remove ZooKeeper mentiosn in broker configs (#18346) * [PR-18631](https://github.com/apache/kafka/pull/18631) - KAFKA-18595: Remove AuthorizerUtils#sessionToRequestContext (#18631) * [PR-18626](https://github.com/apache/kafka/pull/18626) - KAFKA-18594: Cleanup BrokerLifecycleManager (#18626) * [PR-18174](https://github.com/apache/kafka/pull/18174) - KAFKA-18232: Add share group state topic prune metrics. (#18174) * [PR-18567](https://github.com/apache/kafka/pull/18567) - KAFKA-18553: Update javadoc and comments of ConfigType (#18567) * [PR-18571](https://github.com/apache/kafka/pull/18571) - [KAFKA-16720] AdminClient Support for ListShareGroupOffsets (1/n) (#18571) * [PR-18624](https://github.com/apache/kafka/pull/18624) - KAFKA-18588 Remove TopicKey.scala (#18624) * [PR-18628](https://github.com/apache/kafka/pull/18628) - KAFKA-18578: Remove UpdateMetadataRequest from MetadataCacheTest (#18628) * [PR-18625](https://github.com/apache/kafka/pull/18625) - KAFKA-18593 Remove ZkCachedControllerId In MetadataCache (#18625) * [PR-17390](https://github.com/apache/kafka/pull/17390) - KAFKA-17668: Clean-up LogCleaner#maxOverCleanerThreads and LogCleanerManager#maintainUncleanablePartitions (#17390) * [PR-18616](https://github.com/apache/kafka/pull/18616) - KAFKA-18429 Remove ZkFinalizedFeatureCache and StateChangeFailedException (#18616) * [PR-18619](https://github.com/apache/kafka/pull/18619) - KAFKA-18589 Remove unused interBrokerProtocolVersion from GroupMetadataManager (#18619) * [PR-18598](https://github.com/apache/kafka/pull/18598) - KAFKA-18516 Remove RackAwareMode (#18598) * [PR-18608](https://github.com/apache/kafka/pull/18608) - KAFKA-18492 Cleanup RequestHandlerHelper (#18608) * [PR-18613](https://github.com/apache/kafka/pull/18613) - KAFKA-18427: Remove ZooKeeperClient (#18613) * [PR-18591](https://github.com/apache/kafka/pull/18591) - KAFKA-18540: Remove UpdataMetadataRequest from KafkaApisTest (#18591) * [PR-18594](https://github.com/apache/kafka/pull/18594) - KAFKA-18532: Clean Partition.scala zookeeper logic (#18594) * [PR-18605](https://github.com/apache/kafka/pull/18605) - KAFKA-18423: Remove ZkData and related unused references (#18605) * [PR-18586](https://github.com/apache/kafka/pull/18586) - KAFKA-18565 Cleanup SaslSetup (#18586) * [PR-18606](https://github.com/apache/kafka/pull/18606) - KAFKA-18430 Remove ZkNodeChangeNotificationListener (#18606) * [PR-18492](https://github.com/apache/kafka/pull/18492) - KAFKA-18480 Fix fail e2e test_offset_truncate (#18492) * [PR-18012](https://github.com/apache/kafka/pull/18012) - KAFKA-806: Index may not always observe log.index.interval.bytes (#18012) * [PR-18595](https://github.com/apache/kafka/pull/18595) - KAFKA-18515 Remove DelegationTokenManagerZk (#18595) * [PR-18579](https://github.com/apache/kafka/pull/18579) - Remove casts to KRaftMetadataCache (#18579) * [PR-18577](https://github.com/apache/kafka/pull/18577) - Convert BrokerEndPoint to record (#18577) * [PR-18240](https://github.com/apache/kafka/pull/18240) - KAFKA-17642: PreVote response handling and 
ProspectiveState (#18240) * [PR-18585](https://github.com/apache/kafka/pull/18585) - KAFKA-18413: Remove AdminZkClient (#18585) * [PR-18406](https://github.com/apache/kafka/pull/18406) - KAFKA-18318: Add logs for online/offline migration indication (#18406) * [PR-18224](https://github.com/apache/kafka/pull/18224) - KAFKA-18150; Downgrade group on classic leave of last consumer member (#18224) * [PR-18209](https://github.com/apache/kafka/pull/18209) - Infrastructure for system tests for the new share consumer client (#18209) * [PR-18553](https://github.com/apache/kafka/pull/18553) - KAFKA-18373: Remove ZkMetadataCache (#18553) * [PR-18582](https://github.com/apache/kafka/pull/18582) - KAFKA-18557 streamline codebase with testConfig() (#18582) * [PR-18573](https://github.com/apache/kafka/pull/18573) - KAFKA-18431: Remove KafkaController (#18573) * [PR-18574](https://github.com/apache/kafka/pull/18574) - KAFKA-18407: Remove ZkAdminManager, DelayedCreatePartitions, CreatePartitionsMetadata, ZkConfigRepository, DelayedDeleteTopics (#18574) * [PR-18568](https://github.com/apache/kafka/pull/18568) - KAFKA-18556: Remove JaasModule#zkDigestModule, JaasTestUtils#zkSections (#18568) * [PR-18534](https://github.com/apache/kafka/pull/18534) - KAFKA-14485: Move LogCleaner exceptions to storage module (#18534) * [PR-18565](https://github.com/apache/kafka/pull/18565) - KAFKA-18546: Use mocks instead of a real DNS lookup to the outside (#18565) * [PR-18140](https://github.com/apache/kafka/pull/18140) - KAFKA-16368: Add a new constraint for segment.bytes to min 1MB for KIP-1030 (#18140) * [PR-18106](https://github.com/apache/kafka/pull/18106) - KAFKA-16368: Update defaults for LOG_MESSAGE_TIMESTAMP_AFTER_MAX_MS_DEFAULT and NUM_RECOVERY_THREADS_PER_DATA_DIR_CONFIG (#18106) * [PR-18374](https://github.com/apache/kafka/pull/18374) - KAFKA-7776: Tests for ISO8601 in Connect value parsing (#18374) * [PR-18562](https://github.com/apache/kafka/pull/18562) - KAFKA-18558: Added check before adding previously subscribed partitions (#18562) * [PR-18535](https://github.com/apache/kafka/pull/18535) - KAFKA-18521 Cleanup NodeApiVersions zkMigrationEnabled field (#18535) * [PR-18552](https://github.com/apache/kafka/pull/18552) - KAFKA-18542 Cleanup AlterPartitionManager (#18552) * [PR-18561](https://github.com/apache/kafka/pull/18561) - KAFKA-18406 Remove ZkBrokerEpochManager.scala (#18561) * [PR-18508](https://github.com/apache/kafka/pull/18508) - KAFKA-18405 Remove ZooKeeper logic from DynamicBrokerConfig (#18508) * [PR-18080](https://github.com/apache/kafka/pull/18080) - KAFKA-16368: Update default linger.ms to 5ms for KIP-1030 (#18080) * [PR-18524](https://github.com/apache/kafka/pull/18524) - KAFKA-18514: Refactor share module code to server and server-common (#18524) * [PR-18414](https://github.com/apache/kafka/pull/18414) - KAFKA-18331: Make process.roles and node.id required configs (#18414) * [PR-18559](https://github.com/apache/kafka/pull/18559) - KAFKA-18552: Remove unnecessary version check in `testHandleOffsetFetch*` (#18559) * [PR-18483](https://github.com/apache/kafka/pull/18483) - KAFKA-18472: Remove MetadataSupport (#18483) * [PR-18342](https://github.com/apache/kafka/pull/18342) - KAFKA-18026: KIP-1112, clean up graph node grace period resolution (#18342) * [PR-18491](https://github.com/apache/kafka/pull/18491) - KAFKA-18479: Remove keepPartitionMetadataFile in UnifiedLog and LogMan… (#18491) * [PR-18365](https://github.com/apache/kafka/pull/18365) - KAFKA-18364 add document to show the changes of 
metrics and configs after removing zookeeper (#18365) * [PR-18550](https://github.com/apache/kafka/pull/18550) - KAFKA-18539 Remove optional managers in KafkaApis (#18550) * [PR-18563](https://github.com/apache/kafka/pull/18563) - Use version.py get_version to get version (#18563) * [PR-18459](https://github.com/apache/kafka/pull/18459) - KAFKA-18452: Implemented batch size in acquired records (#18459) * [PR-18448](https://github.com/apache/kafka/pull/18448) - KAFKA-18401: Transaction version 2 does not support commit transaction without records (#18448) * [PR-18490](https://github.com/apache/kafka/pull/18490) - KAFKA-18479: RocksDBTimeOrderedKeyValueBuffer not initialized correctly (#18490) * [PR-18536](https://github.com/apache/kafka/pull/18536) - KAFKA-18514 Remove server dependency on share coordinator (#18536) * [PR-18521](https://github.com/apache/kafka/pull/18521) - KAFKA-18513: Validate share state topic records produced in tests. (#18521) * [PR-18542](https://github.com/apache/kafka/pull/18542) - KAFKA-18399 Remove ZooKeeper from KafkaApis (12/N): clean up ZKMetadataCache, KafkaController and raftSupport (#18542) * [PR-18386](https://github.com/apache/kafka/pull/18386) - KAFKA-18346 Fix e2e TestKRaftUpgrade for v3.3.2 (#18386) * [PR-18530](https://github.com/apache/kafka/pull/18530) - KAFKA-18520: Remove ZooKeeper logic from JaasUtils (#18530) * [PR-18540](https://github.com/apache/kafka/pull/18540) - KAFKA-18399 Remove ZooKeeper from KafkaApis (11/N): CREATE_ACLS and DELETE_ACLS (#18540) * [PR-18432](https://github.com/apache/kafka/pull/18432) - KAFKA-18399 Remove ZooKeeper from KafkaApis (10/N): ALTER_CONFIG and INCREMENETAL_ALTER_CONFIG (#18432) * [PR-18544](https://github.com/apache/kafka/pull/18544) - Revert “KAFKA-18034: CommitRequestManager should fail pending requests on fatal coordinator errors (#18050)” (#18544) * [PR-18487](https://github.com/apache/kafka/pull/18487) - KAFKA-18476: KafkaStreams should swallow TransactionAbortedException (#18487) * [PR-18195](https://github.com/apache/kafka/pull/18195) - KAFKA-18026: KIP-1112, clean up StatefulProcessorNode (#18195) * [PR-18518](https://github.com/apache/kafka/pull/18518) - KAFKA-18502 Remove kafka.controller.Election (#18518) * [PR-18281](https://github.com/apache/kafka/pull/18281) - KAFKA-18330: Update documentation to remove controller deployment limitations (#18281) * [PR-18465](https://github.com/apache/kafka/pull/18465) - KAFKA-18399 Remove ZooKeeper from KafkaApis (9/N): ALTER_CLIENT_QUOTAS and ALLOCATE_PRODUCER_IDS (#18465) * [PR-18511](https://github.com/apache/kafka/pull/18511) - KAFKA-18493: Fix configure :streams:integration-tests project error (#18511) * [PR-18453](https://github.com/apache/kafka/pull/18453) - KAFKA-18399 Remove ZooKeeper from KafkaApis (8/N): ELECT_LEADERS , ALTER_PARTITION, UPDATE_FEATURES (#18453) * [PR-18525](https://github.com/apache/kafka/pull/18525) - Rename the variable to reflect its purpose (#18525) * [PR-18403](https://github.com/apache/kafka/pull/18403) - KAFKA-18211: Override class loaders for class graph scanning in connect. 
(#18403) * [PR-18500](https://github.com/apache/kafka/pull/18500) - Add DescribeShareGroupOffsets API [KIP-932] (#18500) * [PR-17669](https://github.com/apache/kafka/pull/17669) - KAFKA-17915: Convert Kafka Client system tests to use KRaft (#17669) * [PR-17901](https://github.com/apache/kafka/pull/17901) - KAFKA-18064: SASL mechanisms should throw exception on wrap/unwrap (#17901) * [PR-18507](https://github.com/apache/kafka/pull/18507) - KAFKA-18491 Remove zkClient & maybeUpdateMetadataCache from ReplicaManager (#18507) * [PR-18337](https://github.com/apache/kafka/pull/18337) - KAFKA-18274 Failed to restart controller in testing due to closed socket channel [2/2] (#18337) * [PR-18475](https://github.com/apache/kafka/pull/18475) - KAFKA-18469;KAFKA-18036: AsyncConsumer should request metadata update if ListOffsetRequest encounters a retriable error (#18475) * [PR-17728](https://github.com/apache/kafka/pull/17728) - KAFKA-17973: Relax Restriction for Voters Set Change (#17728) * [PR-18050](https://github.com/apache/kafka/pull/18050) - KAFKA-18034: CommitRequestManager should fail pending requests on fatal coordinator errors (#18050) * [PR-17870](https://github.com/apache/kafka/pull/17870) - KAFKA-18404: Remove partitionMaxBytes usage from DelayedShareFetch (#17870) * [PR-18504](https://github.com/apache/kafka/pull/18504) - KAFKA-18485; Update log4j2.yaml (#18504) * [PR-18433](https://github.com/apache/kafka/pull/18433) - KAFKA-18399 Remove ZooKeeper from KafkaApis (7/N): CREATE_TOPICS, DELETE_TOPICS, CREATE_PARTITIONS (#18433) * [PR-18320](https://github.com/apache/kafka/pull/18320) - KAFKA-18341: Remove KafkaConfig GroupType config check and warn log (#18320) * [PR-18480](https://github.com/apache/kafka/pull/18480) - KAFKA-18457; Update DumpLogSegments to use coordinator record json converters (#18480) * [PR-18447](https://github.com/apache/kafka/pull/18447) - KAFKA-18399 Remove ZooKeeper from KafkaApis (6/N): handleCreateTokenRequest, handleRenewTokenRequestZk, handleExpireTokenRequestZk (#18447) * [PR-18464](https://github.com/apache/kafka/pull/18464) - KAFKA-18399 Remove ZooKeeper from KafkaApis (5/N): ALTER_PARTITION_REASSIGNMENTS, LIST_PARTITION_REASSIGNMENTS (#18464) * [PR-18461](https://github.com/apache/kafka/pull/18461) - KAFKA-18399 Remove ZooKeeper from KafkaApis (4/N): OFFSET_COMMIT and OFFSET_FETCH (#18461) * [PR-18456](https://github.com/apache/kafka/pull/18456) - KAFKA-18399 Remove ZooKeeper from KafkaApis (3/N): USER_SCRAM_CREDENTIALS (#18456) * [PR-18472](https://github.com/apache/kafka/pull/18472) - KAFKA-18466 Remove log4j-1.2-api from runtime scope while keeping it in distribution package (#18472) * [PR-18404](https://github.com/apache/kafka/pull/18404) - KAFKA-18400: Don’t use YYYY when formatting/parsing dates in Java client (#18404) * [PR-18437](https://github.com/apache/kafka/pull/18437) - KAFKA-18446 Remove MetadataCacheControllerNodeProvider (#18437) * [PR-18468](https://github.com/apache/kafka/pull/18468) - KAFKA-18465: Remove MetadataVersions older than 3.0-IV1 (#18468) * [PR-18467](https://github.com/apache/kafka/pull/18467) - KAFKA-18464: Empty Abort Transaction can fence producer incorrectly with Transactions V2 (#18467) * [PR-18471](https://github.com/apache/kafka/pull/18471) - KAFKA-8116: Update Kafka Streams archetype for Java 11 (#18471) * [PR-17510](https://github.com/apache/kafka/pull/17510) - KAFKA-17792: Efficiently parse decimals with large exponents in Connect Values (#17510) * [PR-18679](https://github.com/apache/kafka/pull/18679) - KAFKA-18632: 
Added few share consumer multibroker tests. (#18679) * [82ccf75a](https://github.com/apache/kafka/commit/82ccf75ae091bffb94cbb3fd173240c48627db17) - KAFKA-18575: Transaction Version 2 doesn’t correctly handle race condition with completing and new transaction(#18604) * [94a1bfb1](https://github.com/apache/kafka/commit/94a1bfb1281f06263976b1ba8bba8c5ac5d7f2ce) - KAFKA-18575: Transaction Version 2 doesn’t correctly handle race condition with completing and new transaction(#18604) * [PR-18340](https://github.com/apache/kafka/pull/18340) - KAFKA-18339: Fix parseRequestHeader error handling (#18340) * [PR-18643](https://github.com/apache/kafka/pull/18643) - Revert “KAFKA-18404: Remove partitionMaxBytes usage from DelayedShareFetch (#17870)” (#18643) * [21c4539d](https://github.com/apache/kafka/commit/21c4539dfe1134e60a7d8680d9ea19ae48f569a3) - Revert “KAFKA-18034: CommitRequestManager should fail pending requests on fatal coordinator errors (#18050)” * [PR-18150](https://github.com/apache/kafka/pull/18150) - KAFKA-18026: KIP-1112 migrate KTableSuppressProcessorSupplier (#18150) * [0186534a](https://github.com/apache/kafka/commit/0186534a992a123a7f53dd32860c6ba5787dbb18) - Revert “KAFKA-17411: Create local state Standbys on start (#16922)” and “KAFKA-17978: Fix invalid topology on Task assignment (#17778)” * [PR-18378](https://github.com/apache/kafka/pull/18378) - KAFKA-18340: Change Dockerfile to use log4j2 yaml instead log4j properties (#18378) * [PR-18397](https://github.com/apache/kafka/pull/18397) - KAFKA-18311: Enforcing copartitioned topics (4/N) (#18397) * [PR-17454](https://github.com/apache/kafka/pull/17454) - KAFKA-17671: Create better documentation for transactions (#17454) * [PR-18455](https://github.com/apache/kafka/pull/18455) - KAFKA-18308; Update CoordinatorSerde (#18455) * [PR-18435](https://github.com/apache/kafka/pull/18435) - KAFKA-18440: Convert AuthorizationException to fatal error in AdminClient (#18435) * [PR-18458](https://github.com/apache/kafka/pull/18458) - KAFKA-18304; Introduce json converter generator (#18458) * [PR-18422](https://github.com/apache/kafka/pull/18422) - KAFKA-18399 Remove ZooKeeper from KafkaApis (2/N): CONTROLLED_SHUTDOWN and ENVELOPE (#18422) * [PR-18146](https://github.com/apache/kafka/pull/18146) - KAFKA-18073: Prevent dropped records from failed retriable exceptions (#18146) * [PR-18321](https://github.com/apache/kafka/pull/18321) - KAFKA-13093: Log compaction should write new segments with record version v2 (KIP-724) (#18321) * [PR-18100](https://github.com/apache/kafka/pull/18100) - KAFKA-18180: Move OffsetResultHolder to storage module (#18100) * [PR-17527](https://github.com/apache/kafka/pull/17527) - KAFKA-17455: fix stuck producer when throttling or retrying (#17527) * [PR-18367](https://github.com/apache/kafka/pull/18367) - KAFKA-17915: Convert remaining Kafka Client system tests to use KRaft (#18367) * [PR-18247](https://github.com/apache/kafka/pull/18247) - KAFKA-18277 Convert network_degrade_test to Kraft mode (#18247) * [PR-18175](https://github.com/apache/kafka/pull/18175) - KAFKA-17986 Fix ConsumerRebootstrapTest and ProducerRebootstrapTest (#18175) * [PR-18445](https://github.com/apache/kafka/pull/18445) - KAFKA-18445 Remove LazyDownConversionRecords and LazyDownConversionRecordsSend (#18445) * [PR-18417](https://github.com/apache/kafka/pull/18417) - KAFKA-18399 Remove ZooKeeper from KafkaApis (1/N): LEADER_AND_ISR, STOP_REPLICA, UPDATE_METADATA (#18417) * [PR-18382](https://github.com/apache/kafka/pull/18382) - KAFKA-17730 
ReplicaFetcherThreadBenchmark is broken (#18382) * [PR-18423](https://github.com/apache/kafka/pull/18423) - KAFKA-18437: Correct version of ShareUpdateRecord value (#18423) * [PR-18457](https://github.com/apache/kafka/pull/18457) - KAFKA-18397: Added null check before sending background event from ShareConsumeRequestManager. (#18419) (#18457) * [PR-18462](https://github.com/apache/kafka/pull/18462) - KAFKA-18449: Add share group state configs to reconfig-server.properties (#18440) (#18462) * [PR-18395](https://github.com/apache/kafka/pull/18395) - KAFKA-18311: Configuring repartition topics (3/N) (#18395) * [PR-18446](https://github.com/apache/kafka/pull/18446) - KAFKA-18453: Add StreamsTopology class to group coordinator (#18446) * [PR-18450](https://github.com/apache/kafka/pull/18450) - KAFKA-18435 Remove zookeeper dependencies in build.gradle (#18450) * [PR-18452](https://github.com/apache/kafka/pull/18452) - KAFKA-18111: Add Kafka Logo to README (#18452) * [PR-18438](https://github.com/apache/kafka/pull/18438) - KAFKA-18432 Remove unused code from AutoTopicCreationManager (#18438) * [PR-18436](https://github.com/apache/kafka/pull/18436) - KAFKA-18434: enrich the authorization error message of connecting to controller (#18436) * [PR-18441](https://github.com/apache/kafka/pull/18441) - KAFKA-18426 Remove FinalizedFeatureChangeListener (#18441) * [PR-18276](https://github.com/apache/kafka/pull/18276) - KAFKA-18321: Add StreamsGroupMember, MemberState and Assignment classes (#18276) * [PR-18443](https://github.com/apache/kafka/pull/18443) - KAFKA-18425 Remove OffsetTrackingListener (#18443) * [PR-18439](https://github.com/apache/kafka/pull/18439) - KAFKA-18433: Add BatchSize to ShareFetch request (1/N) (#18439) * [PR-18428](https://github.com/apache/kafka/pull/18428) - Backport some GHA changes from trunk (#18428) * [PR-18415](https://github.com/apache/kafka/pull/18415) - KAFKA-18428: Measure share consumers performance (#18415) * [PR-18419](https://github.com/apache/kafka/pull/18419) - KAFKA-18397: Added null check before sending background event from ShareConsumeRequestManager. 
(#18419) * [PR-18440](https://github.com/apache/kafka/pull/18440) - KAFKA-18449: Add share group state configs to reconfig-server.properties (#18440) * [PR-18296](https://github.com/apache/kafka/pull/18296) - KAFKA-18173 Remove duplicate assertFutureError (#18296) * [PR-18094](https://github.com/apache/kafka/pull/18094) - KAFKA-15599: Move SegmentPosition/TimingWheelExpirationService to raft module (#18094) * [PR-18329](https://github.com/apache/kafka/pull/18329) - KAFKA-18353 Remove zk config control.plane.listener.name (#18329) * [PR-18429](https://github.com/apache/kafka/pull/18429) - KAFKA-18443 Remove ZkFourLetterWords (#18429) * [PR-18431](https://github.com/apache/kafka/pull/18431) - KAFKA-18417 Remove controlled.shutdown.max.retries and controlled.shutdown.retry.backoff.ms (#18431) * [PR-18287](https://github.com/apache/kafka/pull/18287) - KAFKA-18326: fix merge iterator with cache tombstones (#18287) * [PR-18413](https://github.com/apache/kafka/pull/18413) - KAFKA-18411 Remove ZkProducerIdManager (#18413) * [PR-18421](https://github.com/apache/kafka/pull/18421) - KAFKA-18408 tweak the ‘tag’ field for BrokerHeartbeatRequest.json, BrokerRegistrationChangeRecord.json and RegisterBrokerRecord.json (#18421) * [PR-18401](https://github.com/apache/kafka/pull/18401) - KAFKA-18414 Remove KRaftRegistrationResult (#18401) * [PR-17671](https://github.com/apache/kafka/pull/17671) - KAFKA-17921 Support SASL_PLAINTEXT protocol with java.security.auth.login.config (#17671) * [PR-18411](https://github.com/apache/kafka/pull/18411) - KAFKA-18436: Revert Multiversioning Changes from 4.0 release. (#18411) * [PR-18364](https://github.com/apache/kafka/pull/18364) - KAFKA-18384 Remove ZkAlterPartitionManager (#18364) * [PR-17946](https://github.com/apache/kafka/pull/17946) - KAFKA-10790: Add deadlock detection to producer#flush (#17946) * [PR-18399](https://github.com/apache/kafka/pull/18399) - KAFKA-18412: Remove EmbeddedZookeeper (#18399) * [PR-18352](https://github.com/apache/kafka/pull/18352) - KAFKA-18368 Remove TestUtils#MockZkConnect and remove zkConnect from TestUtils#createBrokerConfig (#18352) * [PR-18396](https://github.com/apache/kafka/pull/18396) - KAFKA-18303; Update ShareCoordinator to use new record format (#18396) * [PR-18370](https://github.com/apache/kafka/pull/18370) - KAFKA-18388 test-kraft-server-start.sh should use log4j2.yaml (#18370) * [PR-17742](https://github.com/apache/kafka/pull/17742) - KAFKA-18419: KIP-891 Connect Multiversion Support (Transformation and Predicate Changes) (#17742) * [PR-18355](https://github.com/apache/kafka/pull/18355) - KAFKA-18374 Remove EncryptingPasswordEncoder, CipherParamsEncoder, GcmParamsEncoder, IvParamsEncoder, and the unused static variables in PasswordEncoder (#18355) * [PR-18379](https://github.com/apache/kafka/pull/18379) - KAFKA-18311: Configuring changelog topics (2/N) (#18379) * [PR-18318](https://github.com/apache/kafka/pull/18318) - KAFKA-18307: Don’t report on disabled/removed tests (#18318) * [PR-17801](https://github.com/apache/kafka/pull/17801) - KAFKA-17278; Add KRaft RPC compatibility tests (#17801) * [PR-18377](https://github.com/apache/kafka/pull/18377) - KAFKA-17539: Application metrics extension for share consumer (#18377) * [PR-18384](https://github.com/apache/kafka/pull/18384) - KAFKA-17616: Remove KafkaServer (#18384) * [PR-18268](https://github.com/apache/kafka/pull/18268) - KAFKA-18311: Add internal datastructure for configuring topologies (1/N) (#18268) * [PR-18343](https://github.com/apache/kafka/pull/18343) - 
KAFKA-18358: Replace Deprecated $buildDir variable in build.gradle (#18343) * [PR-18353](https://github.com/apache/kafka/pull/18353) - KAFKA-18365 Remove zookeeper.connect in Test (#18353) * [PR-18373](https://github.com/apache/kafka/pull/18373) - Use instanceof pattern to avoid explicit cast (#18373) * [PR-18270](https://github.com/apache/kafka/pull/18270) - KAFKA-18319: Add task assignor interfaces (#18270) * [PR-18259](https://github.com/apache/kafka/pull/18259) - KAFKA-18273: KIP-1099 verbose display share group options (#18259) * [PR-18363](https://github.com/apache/kafka/pull/18363) - KAFKA-18367 Remove ZkConfigManager (#18363) * [PR-18351](https://github.com/apache/kafka/pull/18351) - KAFKA-18347 Add tools-log4j2.yaml to config and remove unsed tools-log4j.properties from config (#18351) * [PR-18359](https://github.com/apache/kafka/pull/18359) - KAFKA-18375 Update the LICENSE-binary (#18359) * [PR-18345](https://github.com/apache/kafka/pull/18345) - KAFKA-18026: KIP-1112, configure all StoreBuilder & StoreFactory layers (#18345) * [PR-18232](https://github.com/apache/kafka/pull/18232) - KAFKA-12469: Deprecated and corrected topic metrics for consumer (KIP-1109) (#18232) * [PR-18254](https://github.com/apache/kafka/pull/18254) - KAFKA-17421 Add integration tests for ConsumerRecord#leaderEpoch (#18254) * [PR-18347](https://github.com/apache/kafka/pull/18347) - KAFKA-18361 Remove PasswordEncoderConfigs (#18347) * [PR-18271](https://github.com/apache/kafka/pull/18271) - KAFKA-17615 Remove KafkaServer from tests (#18271) * [PR-18308](https://github.com/apache/kafka/pull/18308) - KAFKA-18280 fix e2e TestSecurityRollingUpgrade.test_rolling_upgrade_sasl_mechanism_phase_one (#18308) * [PR-18327](https://github.com/apache/kafka/pull/18327) - KAFKA-18313 Fix to Kraft or remove tests associate with Zk Broker config in SocketServerTest and ReplicaFetcherThreadTest (#18327) * [PR-18279](https://github.com/apache/kafka/pull/18279) - KAFKA-18316 Fix to Kraft or remove tests associate with Zk Broker config in ConnectionQuotasTest (#18279) * [PR-18185](https://github.com/apache/kafka/pull/18185) - KAFKA-18243 Fix compatibility of Loggers class between log4j and log4j2 (#18185) * [PR-18269](https://github.com/apache/kafka/pull/18269) - KAFKA-18315 Fix to Kraft or remove tests associate with Zk Broker config in DynamicBrokerConfigTest, ReplicaManagerTest, DescribeTopicPartitionsRequestHandlerTest, KafkaConfigTest (#18269) * [PR-18338](https://github.com/apache/kafka/pull/18338) - KAFKA-18354 Use log4j2 APIs to refactor LogCaptureAppender (#18338) * [PR-18309](https://github.com/apache/kafka/pull/18309) - KAFKA-18314 Fix to Kraft or remove tests associate with Zk Broker config in KafkaApisTest (#18309) * [PR-18344](https://github.com/apache/kafka/pull/18344) - KAFKA-18359 Set zkConnect to null in LocalLeaderEndPointTest, HighwatermarkPersistenceTest, IsrExpirationTest, ReplicaManagerQuotasTest, OffsetsForLeaderEpochTest (#18344) * [PR-18101](https://github.com/apache/kafka/pull/18101) - KAFKA-18135: ShareConsumer HB UnsupportedVersion msg mixed with Consumer HB (#18101) * [PR-18283](https://github.com/apache/kafka/pull/18283) - KAFKA-18317 Remove zookeeper.connect from RemoteLogManagerTest (#18283) * [PR-18295](https://github.com/apache/kafka/pull/18295) - KAFKA-18339: Remove raw unversioned direct SASL protocol (KIP-896) (#18295) * [PR-18313](https://github.com/apache/kafka/pull/18313) - KAFKA-18272: Deprecated protocol api usage should be logged at info level (#18313) * 
[PR-18282](https://github.com/apache/kafka/pull/18282) - KAFKA-18295 Remove deprecated function Partitioner#onNewBatch (#18282) * [PR-18317](https://github.com/apache/kafka/pull/18317) - KAFKA-18348 Remove the deprecated MockConsumer#setException (#18317) * [PR-18324](https://github.com/apache/kafka/pull/18324) - KAFKA-18352: Add back DeleteGroups v0, it incorrectly tagged as deprecated (#18324) * [PR-18310](https://github.com/apache/kafka/pull/18310) - KAFKA-18274 Failed to restart controller in testing due to closed socket channel [1/2] (#18310) * [PR-18250](https://github.com/apache/kafka/pull/18250) - KAFKA-18093 Remove deprecated DeleteTopicsResult#values (#18250) * [PR-18312](https://github.com/apache/kafka/pull/18312) - KAFKA-18343: Use java_pids to implement pids (#18312) * [PR-18294](https://github.com/apache/kafka/pull/18294) - KAFKA-18338 add log4j.yaml to test-common-api and remove unsed log4j.properties from test-common (#18294) * [PR-18306](https://github.com/apache/kafka/pull/18306) - KAFKA-18342 Use File.exist instead of File.exists to ensure the Vagrantfile works with Ruby 3.2+ (#18306) * [PR-18246](https://github.com/apache/kafka/pull/18246) - KAFKA-18290 Remove deprecated methods of FeatureUpdate (#18246) * [PR-18255](https://github.com/apache/kafka/pull/18255) - KAFKA-18289 Remove deprecated methods of DescribeTopicsResult (#18255) * [PR-18265](https://github.com/apache/kafka/pull/18265) - KAFKA-18291 Remove deprecated methods of ListConsumerGroupOffsetsOptions (#18265) * [PR-18223](https://github.com/apache/kafka/pull/18223) - KAFKA-18278: Correct name and description for run-gradle step (#18223) * [PR-18267](https://github.com/apache/kafka/pull/18267) - KAFKA-17393: Remove log.message.format.version/message.format.version (KIP-724) (#18267) * [PR-18132](https://github.com/apache/kafka/pull/18132) - KAFKA-17705: Add Transactions V2 system tests and mark as production ready (#18132) * [PR-18291](https://github.com/apache/kafka/pull/18291) - KAFKA-18269: Remove deprecated protocol APIs support (KIP-896, KIP-724) (#18291) * [PR-18218](https://github.com/apache/kafka/pull/18218) - KAFKA-18269: Remove deprecated protocol APIs support (KIP-896, KIP-724) (#18218) * [PR-18288](https://github.com/apache/kafka/pull/18288) - KAFKA-18334: Produce v4-v6 should be undeprecated (#18288) * [PR-18262](https://github.com/apache/kafka/pull/18262) - KAFKA-18270: FindCoordinator v0 incorrectly tagged as deprecated (#18262) * [PR-18221](https://github.com/apache/kafka/pull/18221) - KAFKA-18270: SaslHandshake v0 incorrectly tagged as deprecated (#18221) * [PR-18249](https://github.com/apache/kafka/pull/18249) - KAFKA-13722: code cleanup after deprecated StateStore.init() was removed (#18249) * [PR-17687](https://github.com/apache/kafka/pull/17687) - KAFKA-15370: Support Participation in 2PC (KIP-939) (1/N) (#17687) * [PR-18285](https://github.com/apache/kafka/pull/18285) - KAFKA-18312: Added entityType: topicName to SubscribedTopicNames in ShareGroupHeartbeatRequest.json (#18285) * [PR-18261](https://github.com/apache/kafka/pull/18261) - KAFKA-18301; Make coordinator records first class citizen (#18261) * [PR-18204](https://github.com/apache/kafka/pull/18204) - KAFKA-18262 Remove DefaultPartitioner and UniformStickyPartitioner (#18204) * [PR-18257](https://github.com/apache/kafka/pull/18257) - KAFKA-18296 Remove deprecated KafkaBasedLog constructor (#18257) * [PR-18238](https://github.com/apache/kafka/pull/18238) - KAFKA-12829: Remove old Processor and ProcessorSupplier interfaces (#18238) * 
[PR-18245](https://github.com/apache/kafka/pull/18245) - KAFKA-18292 Remove deprecated methods of UpdateFeaturesOptions (#18245) * [PR-18154](https://github.com/apache/kafka/pull/18154) - KAFKA-12829: Remove deprecated Topology#addProcessor of old Processor API (#18154) * [PR-18136](https://github.com/apache/kafka/pull/18136) - KAFKA-18207: Serde for handling transaction records (#18136) * [PR-18243](https://github.com/apache/kafka/pull/18243) - KAFKA-13722: Refactor Kafka Streams store interfaces (#18243) * [PR-18241](https://github.com/apache/kafka/pull/18241) - KAFKA-17131: Refactor TimeDefinitions (#18241) * [PR-18228](https://github.com/apache/kafka/pull/18228) - KAFKA-18284: Add group coordinator records for Streams rebalance protocol (#18228) * [PR-18242](https://github.com/apache/kafka/pull/18242) - KAFKA-13722: Refactor SerdeGetter (#18242) * [PR-18176](https://github.com/apache/kafka/pull/18176) - KAFKA-18227: Ensure v2 partitions are not added to last transaction during upgrade (#18176) * [PR-18251](https://github.com/apache/kafka/pull/18251) - Add IT for share consumer with duration base offet auto reset (#18251) * [PR-18230](https://github.com/apache/kafka/pull/18230) - KAFKA-18283: Add StreamsGroupDescribe RPC definitions (#18230) * [PR-18260](https://github.com/apache/kafka/pull/18260) - KAFKA-18294 Remove deprecated SourceTask#commitRecord (#18260) * [PR-18211](https://github.com/apache/kafka/pull/18211) - KAFKA-18264 Remove NotLeaderForPartitionException (#18211) * [PR-18248](https://github.com/apache/kafka/pull/18248) - KAFKA-18094 Remove deprecated TopicListing(String, Boolean) (#18248) * [PR-18227](https://github.com/apache/kafka/pull/18227) - KAFKA-18282: Add StreamsGroupHeartbeat RPC definitions (#18227) * [PR-18205](https://github.com/apache/kafka/pull/18205) - KAFKA-18026: transition KTable#filter impl to use processor wrapper (#18205) * [PR-18244](https://github.com/apache/kafka/pull/18244) - KAFKA-18293 Remove org.apache.kafka.common.security.oauthbearer.secured.OAuthBearerLoginCallbackHandler and org.apache.kafka.common.security.oauthbearer.secured.OAuthBearerValidatorCallbackHandler (#18244) * [PR-18234](https://github.com/apache/kafka/pull/18234) - KAFKA-17960; PlaintextAdminIntegrationTest.testConsumerGroups fails with CONSUMER group protocol (#18234) * [PR-18144](https://github.com/apache/kafka/pull/18144) - KAFKA-18200; Handle empty batches in coordinator runtime (#18144) * [PR-18180](https://github.com/apache/kafka/pull/18180) - KAFKA-18237: Upgrade system tests from using 3.7.1 to 3.7.2 (#18180) * [PR-18210](https://github.com/apache/kafka/pull/18210) - KAFKA-18259: Documentation for consumer auto.offset.reset contains invalid HTML (#18210) * [PR-18207](https://github.com/apache/kafka/pull/18207) - KAFKA-18263; Group lock must be acquired when reverting static membership rejoin (#18207) * [PR-18190](https://github.com/apache/kafka/pull/18190) - KAFKA-18244: Fix empty SHA on “Pull Request Labeled” workflow (#18190) * [PR-18166](https://github.com/apache/kafka/pull/18166) - KAFKA-18226: Disable CustomQuotaCallbackTest and remove isKRaftTest (#18166) #### NOTE Ansible offers a simpler way to configure and deploy RBAC and MDS. Refer to [Ansible RBAC settings](https://docs.confluent.io/ansible/current/ansible-authorize.html) for details. To set up RBAC: * [Install Confluent Platform](../../../installation/overview.md#installation), including the `confluent-server` commercial component. 
For more information, see [Migrate Confluent Platform to Confluent Server](../../../installation/migrate-confluent-server.md#migrate-confluent-server).
* Work with your security team to evaluate the needs of the users in your organization and, based on the resources they require to perform their duties, identify which roles should be assigned to users and groups. For a description of some typical use cases and the required roles for each, refer to [RBAC role use cases](rbac-predefined-roles.md#rbac-roles-use-cases). To bootstrap RBAC, you must identify an ACL-level `super.user` in the Confluent Server broker’s `server.properties` file on the cluster that hosts MDS. This `super.user` can then assign the SystemAdmin role to another user who can create the required clusters and scope the required role bindings for users and groups. Be sure to identify which user will serve as the bootstrap `super.user`. For details, refer to [Use Predefined RBAC Roles in Confluent Platform](rbac-predefined-roles.md#rbac-predefined-roles).
* [Configure the Metadata Service (MDS)](../../../kafka/configure-mds/index.md#rbac-mds-config). The MDS implements the core RBAC functionality and [communicates with LDAP](../../../kafka/configure-mds/ldap-auth-mds.md#ldap-auth-mds) to get user and group information and to authenticate users. After configuring MDS, you can proceed with role bindings and configuration of other Confluent Platform components. Refer to [Configure the LDAP identity provider](../../../kafka/configure-mds/index.md#mds-id-provider-settings) to view an LDAP configuration for MDS.
* After you have determined which roles must be assigned to users and groups, create the appropriate [role bindings](rbac-cli-quickstart.md#rbac-rolebinding-sysadmin-role) for users to access the resources they require to perform their duties (for example, Schema Registry, ksqlDB, Connect, and Confluent Control Center).
* Confirm the user and group roles you defined using the [confluent iam rbac role-binding list](https://docs.confluent.io/confluent-cli/current/command-reference/iam/rbac/role-binding/confluent_iam_rbac_role-binding_list.html) command; see the example after this list.
* Configure Confluent Platform components to communicate with MDS for authentication and authorization. For details, see:
  * [Configure RBAC for Control Center on Confluent Platform](/control-center/current/security/c3-rbac.html)
  * [Kafka Connect and RBAC](../../../connect/rbac-index.md#connect-rbac-index)
  * [Deploy Secure ksqlDB with RBAC in Confluent Platform](ksql-rbac.md#ksql-rbac)
  * [Configure Role-Based Access Control for Schema Registry in Confluent Platform](../../../schema-registry/security/rbac-schema-registry.md#schemaregistry-rbac)
  * [Configure RBAC for REST Proxy](../../../kafka-rest/production-deployment/rest-proxy/security.md#rbac-rest-proxy-security)
  * [Configure RBAC using the REST API in Confluent Platform](rbac-config-using-rest-api.md#rbac-config-using-rest-api)
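The following is a minimal sketch of the bootstrap and verification steps described above, assuming the Confluent CLI is already authenticated against MDS. The principal `User:alice`, the MDS URL, and the cluster ID are placeholders, and flag names vary between CLI versions (newer releases use `--kafka-cluster` rather than `--kafka-cluster-id`), so check your CLI reference before copying.

```bash
# Assumption: you are logged in to MDS as the bootstrap super.user,
# for example with: confluent login --url http://mds-host:8090
# Placeholder principal and cluster ID throughout.

# Grant SystemAdmin to the user who will continue the RBAC setup.
confluent iam rbac role-binding create \
  --principal User:alice \
  --role SystemAdmin \
  --kafka-cluster-id <kafka-cluster-id>

# Confirm which role bindings the principal now has.
confluent iam rbac role-binding list \
  --principal User:alice \
  --kafka-cluster-id <kafka-cluster-id>
```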
When a role is assigned at the **cluster-level** (Kafka cluster, Schema Registry cluster, ksqlDB cluster, or Connect cluster) it means that users who are assigned this role have access to all resources in a cluster. For example, the `ClusterAdmin` of a Kafka cluster has access to Confluent Control Center alerts. There are corresponding resource types for each cluster type. For example, you can assign the `ResourceOwner` role to the resource types `KsqlCluster:ksql-cluster` or `Cluster:kafka-cluster` to provide a user all the `ResourceOwner` privileges for a ksqlDB or Kafka cluster. When a role is assigned at the **resource-level** it means that users assigned this role only have access to specific resources as defined in the role binding. The resource types for which you can assign RBAC roles and role bindings are: - Kafka cluster - Topic - Consumer group - TransactionalID - Schema Registry cluster - Schema Registry subject - ksqlDB cluster - Connect cluster - Connector - Kek (CSFLE key resource in Schema Registry) Confluent Platform provides the following predefined roles: | Role Name | Level of Role Scope | View Role Bindings of Others | Manage Role Bindings | Monitor | Resource Read | Resource Write | Resource Manage | |-------------------|-----------------------|--------------------------------|------------------------|-----------|-----------------|------------------|------------------------| | `super.user` | Cluster | Yes | Yes | Yes | Yes | Yes | Yes | | `SystemAdmin` | Cluster | Yes | Yes | Yes | Yes | Yes | Yes | | `ClusterAdmin` | Cluster | No | No | Yes | No | No | Yes | | `UserAdmin` | Cluster | Yes | Yes | No | No | No | No | | `SecurityAdmin` | Cluster | Yes | No | No | No | No | No | | `AuditAdmin` | Cluster | No | No | No | No | No | Yes [(1)](#role-notes) | | `Operator` | Cluster | No | No | Yes | No | No | Yes [(2)](#role-notes) | | `ResourceOwner` | Resource | Yes | Yes | No | Yes | Yes | Yes | | `DeveloperRead` | Resource | No | No | No | Yes | No | No | | `DeveloperWrite` | Resource | No | No | No | No | Yes | No | | `DeveloperManage` | Resource | No | No | No | No | No | Yes | Notes: : 1. The AuditAdmin role provides sufficient access for creating and managing the audit log configuration. 2. For Operator Resource Manage, Operators can only pause, resume, and scale Connectors. super.user : The purpose of `super.user` is to have a bootstrap user who can initially grant another user the `SystemAdmin` role. Technically speaking, `super.user` is not a predefined role. It is a `server.properties` attribute that defines a user who has full access to all resources within a Metadata Service (MDS) cluster. A `super.user` has no access to resources in other clusters (unless also configured as a `super.user` on other clusters). The primary use of super.user is to bootstrap Confluent Platform and assign a SystemAdmin. On MDS clusters, `super.user` can create role bindings for all other clusters. Permissions granted by `super.user` apply only to the broker where the `super.user` attribute is specified, and not to other brokers, clusters, or Confluent Platform components. No authorization is enforced on users defined as `super.user`. It is strongly recommended that this role is assigned only to a limited number of users (for example, 1-2 users who are responsible for bootstrapping). SystemAdmin : Provides full access to all scoped [resources](overview.md#rbac-resource) in the cluster (ksqlDB cluster, Kafka cluster, or Schema Registry cluster). 
It is strongly recommended that this role is assigned only to a limited number of users (one or two per cluster) who need full permission for initial setup or to address urgent issues when absolutely necessary in production instances. You may wish to assign this role more liberally in small test and development use cases, or when working in ksqlDB clusters that are primarily single tenant. Otherwise, it is recommended that you do not assign this role. ClusterAdmin : Sets up clusters (ksqlDB cluster, Kafka cluster, or Schema Registry cluster). Responsible for setting up and managing Kafka clusters, brokers, networking, ksqlDB clusters, Connect clusters, and adding or removing nodes and performing upgrades. The `ClusterAdmin` typically creates topics and sets the properties of those topics, for example performance and capacity, but cannot read or write to topics, and has no access to data. For monitoring applications, it is recommended that this role is delegated to the operator who monitors your applications. Typically, the `ClusterAdmin` user does not have knowledge of the content of the cluster data and delegates ownership responsibility for those resources to users assigned the `ResourceOwner` role. For example, after creating topics the `ClusterAdmin` can set ownership to a specific user familiar with the topic data. UserAdmin : Manages role bindings for users and groups in all clusters managed by MDS. Manages users and groups in a cluster, including the mapping of users and groups to roles. Has no access to any other resources. Typically, users with the `UserAdmin` role are tasked with setting up access to [resources](overview.md#rbac-resource). Users granted this role should be extremely trustworthy because they can grant roles to themselves and others. You can monitor the actions of the `UserAdmin` using audit logs. SecurityAdmin : Enables management of platform-wide security initiatives. Sets up security-related features (for example, encryption, tracking of audit logs, and watching for abnormal behavior). Provides a dedicated set of users for the initial setup and ongoing management of security functions. AuditAdmin : Users or groups assigned this role on the MDS cluster and every registered Kafka cluster can manage the audit log configuration using the [Confluent Metadata API](mds-api.md#mds-api). Operator : Provides operational management of clusters and scales applications as needed. Monitors the health of applications and clusters, including monitoring uptime. This role cannot create applications, nor does it allow you to view or edit the content of the topics. However, you can view what topics and partitions exist. ResourceOwner : Transfers the ownership of critical resources and scales the ability to manage authorizations for those resources. Owns the [resource](overview.md#rbac-resource) and has full access to it, including read, write, and list. ResourceOwner can grant permission to others who need access to resources. The owner cannot change some configurations, for example, the number of partitions. Must own the resource to grant others access to it. Enables scaling of authorization for critical resources. DeveloperRead, DeveloperWrite, DeveloperManage : Allows developers to drive the implementation of applications they are working on and manage the content. ### Load the Connector This quick start assumes that security is not configured for HDFS and Hive metastore.
To make the necessary security configurations, see [Secure HDFS and Hive Metastore](#dataproc-secure-hdfs-hive-metastore). First, start all the necessary services using the Confluent CLI. ```bash confluent local start ``` Next, start the Avro console producer to import a few records to Kafka: ```bash ./bin/kafka-avro-console-producer --broker-list localhost:9092 --topic test_dataproc \ --property value.schema='{"type":"record","name":"myrecord","fields":[{"name":"f1","type":"string"}]}' ``` Then in the console producer, type: ```bash {"f1": "value1"} {"f1": "value2"} {"f1": "value3"} ``` The three records entered are published to the Kafka topic `test_dataproc` in Avro format. Before starting the connector, make sure that the configurations in `etc/gcp-dataproc-sink-quickstart.properties` are properly set to match your Dataproc configuration. For example, `$home` is replaced by your home directory path; `YOUR-PROJECT-ID`, `YOUR-CLUSTER-REGION`, and `YOUR-CLUSTER-NAME` are replaced by your respective values. Then, start the connector by loading its configuration with the following command. ```bash confluent local load dataproc-sink --config etc/gcp-dataproc-sink-quickstart.properties { "name": "dataproc-sink", "config": { "topics": "test_dataproc", "tasks.max": "1", "flush.size": "3", "connector.class": "io.confluent.connect.gcp.dataproc.DataprocSinkConnector", "gcp.dataproc.credentials.path": "/home/user/credentials.json", "gcp.dataproc.projectId": "dataproc-project-id", "gcp.dataproc.region": "us-west1", "gcp.dataproc.cluster": "dataproc-cluster-name", "confluent.license": "", "confluent.topic.bootstrap.servers": "localhost:9092", "confluent.topic.replication.factor": "1", "name": "dataproc-sink" }, "tasks": [], "type": "sink" } ``` To check that the connector started successfully, view the Connect worker’s log by running: ```bash confluent local services connect log ``` Towards the end of the log you should see that the connector starts, logs a few messages, and then exports data from Kafka to HDFS. After the connector finishes ingesting data to HDFS, check that the data is available in HDFS. From the HDFS namenode in Dataproc: ```bash hadoop fs -ls /topics/test_dataproc/partition=0 ``` You should see a file with the name `/topics/test_dataproc/partition=0/test_dataproc+0+0000000000+0000000002.avro` The file name is encoded as `topic+kafkaPartition+startOffset+endOffset.format`. You can use `avro-tools-1.9.1.jar` (available in [Apache mirrors](https://archive.apache.org/dist/avro/avro-1.9.1/java/avro-tools-1.9.1.jar)) to extract the content of the file. Run `avro-tools` directly on Hadoop as: ```bash hadoop jar avro-tools-1.9.1.jar tojson \ hdfs:///topics/test_dataproc/partition=0/test_dataproc+0+0000000000+0000000002.avro ``` where “” is the HDFS Namenode hostname. Usually, the Namenode hostname is your cluster name with a “-m” suffix.
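If you want to spot-check everything the connector has written so far, a small helper loop like the following can be used. This is only a sketch: it assumes you run it on the Dataproc namenode with `avro-tools-1.9.1.jar` in the working directory, and it simply prints the record count of each Avro file for the topic.

```bash
# Sketch: count the records in every Avro file the connector wrote for this topic.
# Assumes avro-tools-1.9.1.jar is in the working directory on the namenode.
for f in $(hadoop fs -ls -C /topics/test_dataproc/partition=0/*.avro); do
  echo "== $f"
  hadoop jar avro-tools-1.9.1.jar tojson "hdfs://$f" | wc -l
done
```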
Or, if you experience issues, first copy the avro file from HDFS to the local filesystem and try again with Java: ```bash hadoop fs -copyToLocal /topics/test_dataproc/partition=0/test_dataproc+0+0000000000+0000000002.avro \ /tmp/test_dataproc+0+0000000000+0000000002.avro java -jar avro-tools-1.9.1.jar tojson /tmp/test_dataproc+0+0000000000+0000000002.avro ``` You should see the following output: ```bash {"f1":"value1"} {"f1":"value2"} {"f1":"value3"} ``` Finally, stop the Connect worker as well as all the rest of Confluent Platform by running: ```bash confluent local stop ``` or stop all the services and additionally wipe out any data generated during this quick start by running: ```bash confluent local destroy ``` # Comma separate input topic list topics=connect-test ``` Note that the configuration contains similar settings to the file source. A key difference is that multiple input topics are specified with `topics` whereas the file source allows for only one output topic specified with `topic`. Now start the FileStreamSinkConnector. The sink connector will run within the same worker as the source connector, but each connector task will have its own dedicated thread. ```bash confluent local load file-sink { "name": "file-sink", "config": { "connector.class": "FileStreamSink", "tasks.max": "1", "file": "test.sink.txt", "topics": "connect-test", "name": "file-sink" }, "tasks": [] } ``` To ensure the sink connector is up and running, use the following command to get the state of the connector: ```bash confluent local status file-sink { "name": "file-sink", "connector": { "state": "RUNNING", "worker_id": "192.168.10.1:8083" }, "tasks": [ { "state": "RUNNING", "id": 0, "worker_id": "192.168.10.1:8083" } ] } ``` as well as the list of all loaded connectors: ```bash confluent local status connectors [ "file-source", "file-sink" ] ``` By opening the file `test.sink.txt` you should see the two log lines written to it by the sink connector. With both connectors running, you can see data flowing end-to-end in real time. To check this out, use another terminal to tail the output file: ```bash tail -f test.sink.txt ``` and in a different terminal start appending additional lines to the text file: ```bash for i in {4..1000}; do echo "log line $i"; done >> test.txt ``` You should see the lines being added to `test.sink.txt`. The new data was picked up by the source connector, written to Kafka, read by the sink connector from Kafka, and finally appended to the file. ```bash "log line 1" "log line 2" "log line 3" "log line 4" "log line 5" ... ``` After you are done experimenting with reading from and writing to a file with Connect, you have a few options with respect to shutting down the connectors: * Unload the connectors but leave the Connect worker running. ```bash confluent local unload file-source confluent local unload file-sink ``` * Stop the Connect worker altogether. ```bash confluent local services connect stop Stopping Connect Connect is [DOWN] ``` * Stop the Connect worker as well as all the rest Confluent services. ```bash confluent local stop ``` Your output should resemble: ```none ksqlDB Server is [DOWN] Connect is [DOWN] Kafka REST is [DOWN] Schema Registry is [DOWN] Kafka is [DOWN] KRaft Controller is [DOWN] ``` * Stop all the services and wipe out any data of this particular run of Confluent services. 
```bash confluent local destroy ``` Your output should resemble: ```bash ksqlDB Server is [DOWN] Connect is [DOWN] Kafka REST is [DOWN] Schema Registry is [DOWN] Kafka is [DOWN] KRaft Controller is [DOWN] Deleting: /var/folders/ty/rqbqmjv54rg_v10ykmrgd1_80000gp/T/confluent.PkQpsKfE ``` Both source and sink connectors can track offsets, so you can start and stop the process any number of times and add more data to the input file and both will resume where they previously left off. The connectors demonstrated in this tutorial are intentionally simple so no additional dependencies are necessary. Most connectors will require a bit more configuration to specify how to connect to the source or sink system and what data to copy, and for many you will want to execute on a Kafka Connect cluster for scalability and fault tolerance. To get started with Kafka Connect you’ll want to see the [user guide](/kafka-connectors/self-managed/userguide.html) for more details on running and managing Kafka Connect, including how to run in distributed mode. The [Connectors](/kafka-connectors/self-managed/supported.html) section includes details on configuring and deploying the connectors that ship with Confluent Platform. #### NOTE If you’re deploying with Docker, you can skip setting `ksql.connect.worker.config`. ksqlDB will look for environment variables prefixed with `KSQL_CONNECT_`. If it finds any, it will remove the `KSQL_` prefix and place them into a Connect configuration file. Embedded mode will use that configuration file. This is a convenience to avoid creating and mounting a separate configuration file. To get started, here is a Docker Compose example with a server configured for embedded mode. All `KSQL_` environment variables are converted automatically to server configuration properties. Any connectors installed on your host at `confluent-hub-components` are loaded. Save this in a file named `docker-compose.yml`: ```yaml version: '2' services: broker: image: confluentinc/cp-enterprise-kafka:8.1.0 hostname: broker container_name: broker ports: - "29092:29092" environment: KAFKA_BROKER_ID: 1 KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:9092,PLAINTEXT_HOST://localhost:29092 KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1 KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0 KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1 KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1 schema-registry: image: confluentinc/cp-schema-registry:8.1.0 hostname: schema-registry container_name: schema-registry depends_on: - broker ports: - "8081:8081" environment: SCHEMA_REGISTRY_HOST_NAME: schema-registry ksqldb-server: image: confluentinc/ksqldb-server:8.1.0 hostname: ksqldb-server container_name: ksqldb-server depends_on: - broker - schema-registry ports: - "8088:8088" volumes: - "./confluent-hub-components:/usr/share/kafka/plugins" environment: KSQL_LISTENERS: "http://0.0.0.0:8088" KSQL_BOOTSTRAP_SERVERS: "broker:9092" KSQL_KSQL_SCHEMA_REGISTRY_URL: "http://schema-registry:8081" KSQL_KSQL_LOGGING_PROCESSING_STREAM_AUTO_CREATE: "true" KSQL_KSQL_LOGGING_PROCESSING_TOPIC_AUTO_CREATE: "true" # Configuration to embed Kafka Connect support. 
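      # Each KSQL_CONNECT_* variable below is copied (with the KSQL_ prefix removed, as
      # described in the note above) into the Connect worker configuration that ksqlDB
      # generates for embedded mode; for example, KSQL_CONNECT_GROUP_ID corresponds to
      # the embedded worker's group.id setting.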
KSQL_CONNECT_GROUP_ID: "ksql-connect-cluster" KSQL_CONNECT_BOOTSTRAP_SERVERS: "broker:9092" KSQL_CONNECT_KEY_CONVERTER: "org.apache.kafka.connect.storage.StringConverter" KSQL_CONNECT_VALUE_CONVERTER: "io.confluent.connect.avro.AvroConverter" KSQL_CONNECT_KEY_CONVERTER_SCHEMA_REGISTRY_URL: "http://schema-registry:8081" KSQL_CONNECT_VALUE_CONVERTER_SCHEMA_REGISTRY_URL: "http://schema-registry:8081" KSQL_CONNECT_VALUE_CONVERTER_SCHEMAS_ENABLE: "false" KSQL_CONNECT_CONFIG_STORAGE_TOPIC: "ksql-connect-configs" KSQL_CONNECT_OFFSET_STORAGE_TOPIC: "ksql-connect-offsets" KSQL_CONNECT_STATUS_STORAGE_TOPIC: "ksql-connect-statuses" KSQL_CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR: 1 KSQL_CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR: 1 KSQL_CONNECT_STATUS_STORAGE_REPLICATION_FACTOR: 1 KSQL_CONNECT_PLUGIN_PATH: "/usr/share/kafka/plugins" ksqldb-cli: image: confluentinc/ksqldb-cli:8.1.0 container_name: ksqldb-cli depends_on: - broker - ksqldb-server entrypoint: /bin/sh tty: true ``` Bring up the stack with: ```bash docker-compose up ``` ### Resources Users access and perform operations on specific Kafka and Confluent Platform resources. A resource can be a cluster, group, Kafka topic, transactional ID, or Delegation token. ACLs specify which users can access a specified resource and the operations they can perform on that resource. Within Kafka, resources include: Cluster : The Kafka cluster. To run operations that impact the entire Kafka cluster, such as a controlled shutdown or creating new topics, must be assigned privileges on the cluster resource. Delegation Token : Delegation tokens are shared secrets between Apache Kafka® brokers and clients. Authentication based on delegation tokens is a lightweight authentication mechanism that you can use to complement existing SASL/SSL methods. Refer to [Use Delegation Tokens for Authentication in Confluent Platform](../../authentication/delegation-tokens/overview.md#kafka-sasl-delegate-auth) for more details. Group : Groups in the brokers. All protocol calls that work with groups, such as joining a group, must have corresponding privileges with the group in the subject. Group (`group.id`) includes Consumer Group, Stream Group (`application.id`), Connect Worker Group, or any other group that uses the Consumer Group protocol, like Schema Registry cluster. When using the `kafka-acls` command’s `--group` flag with a wildcard, you must encapsulate the wildcard with quotes. Failure to do this can result in unexpected results. Topic : All Kafka messages are organized into topics (and partitions). To access a topic, you must have a corresponding operation (such as READ or WRITE) defined in an ACL. When using the `kafka-acls` command’s `--topic` flag with a wildcard, you must encapsulate the wildcard with quotes. Failure to do this can result in unexpected results. Transactional ID : A transactional ID (`transactional.id`) identifies a single producer instance across application restarts and provides a way to ensure a single writer; this is necessary for exactly-once semantics (EOS). Only one producer can be active for each `transactional.id`. When a producer starts, it first checks whether or not there is a pending transaction by a producer with its own `transactional.id`. If there is, then it waits until the transaction has finished (abort or commit). This guarantees that the producer always starts from a consistent state. When used, a producer must be able to manipulate transactional IDs and have all the permissions set. 
For example, the following ACL allows all users in the system access to an EOS producer: ```shell kafka-acls --bootstrap-server localhost:9092 \ --command-config adminclient-configs.conf \ --add \ --transactional-id * \ --allow-principal User:* \ --operation write ``` In cases where you need to create ACLs for a Kafka cluster to allow Streams exactly once (EOS) processing: ```shell # Allow Streams EOS: kafka-acls ... --add \ --allow-principal User:team1 \ --operation WRITE \ --operation DESCRIBE \ --transactional-id team1-streams-app1 \ --resource-pattern-type prefixed ``` For additional information about the role of transactional IDs, refer to [Transactions in Apache Kafka](https://www.confluent.io/blog/transactions-apache-kafka). The [Operations](#acl-operations) available to a user depend on the resources to which the user has been granted access. All resources have a unique resource identifier. For example, for the topic resource type, the resource identity is the topic name, and for the group resource type, the resource identity is the group name. You can view the ACLs for a specific resource using the `--list` option. For example, to view all ACLs for the topic `test-topic` run the following command: ```shell kafka-acls --bootstrap-server localhost:9092 \ --command-config adminclient-configs.conf \ --list \ --topic test-topic ``` ### ksqlDB In this example, ksqlDB is authenticated and authorized to connect to the secured Kafka cluster, and it is already running queries as defined in the [ksqlDB command file](https://github.com/confluentinc/cp-demo/tree/latest/scripts/ksqlDB/statements.sql). Its embedded producer is configured to be idempotent, exactly-once in order semantics per partition (in the event of an error that causes a producer retry, the same message—which is still sent by the producer multiple times—will only be written to the Kafka log on the broker once). 1. In the navigation bar, click **ksqlDB**. 2. From the list of ksqlDB applications, select `wikipedia`. ![image](tutorials/cp-demo/images/ksql_link.png) 3. View the ksqlDB Flow to see the streams and tables created in the example, and how they relate to one another. ![image](tutorials/cp-demo/images/ksqldb_flow.png) 4. Use Confluent Control Center to interact with ksqlDB, or run ksqlDB CLI to get to the ksqlDB CLI prompt. ```bash docker compose exec ksqldb-cli bash -c 'ksql -u ksqlDBUser -p ksqlDBUser http://ksqldb-server:8088' ``` 5. View the existing ksqlDB streams. (If you are using the ksqlDB CLI, at the `ksql>` prompt, type `SHOW STREAMS;`) ![image](tutorials/cp-demo/images/ksql_streams_list.png) 6. Click **WIKIPEDIA** to describe the schema (fields or columns) of an existing ksqlDB stream. (If you are using the ksqlDB CLI, at the `ksql>` prompt, type `DESCRIBE WIKIPEDIA;`) ![image](tutorials/cp-demo/images/wikipedia_describe.png) 7. View the existing ksqlDB tables. (If you are using the ksqlDB CLI, at the `ksql>` prompt, type `SHOW TABLES;`). One table is called `WIKIPEDIA_COUNT_GT_1`, which counts occurrences within a [tumbling window](../../ksqldb/concepts/time-and-windows-in-ksqldb-queries.md#ksqldb-time-and-windows-tumbling-window). ![image](tutorials/cp-demo/images/ksql_tables_list.png) 8. View the existing ksqlDB queries, which are continuously running. (If you are using the ksqlDB CLI, at the `ksql>` prompt, type `SHOW QUERIES;`). ![image](tutorials/cp-demo/images/ksql_queries_list.png) 9. View messages from different ksqlDB streams and tables. 
Click on your stream of choice and then click **Query stream** to open the Query Editor. The editor shows a pre-populated query, like `select * from WIKIPEDIA EMIT CHANGES;`, and it shows results for newly arriving data. ![image](tutorials/cp-demo/images/ksql_query_topic.png) 10. Click **ksqlDB Editor** and run the `SHOW PROPERTIES;` statement. You can see the configured ksqlDB server properties and check these values with the [docker-compose.yml](https://github.com/confluentinc/cp-demo/tree/latest/docker-compose.yml) file. ![image](tutorials/cp-demo/images/ksql_properties.png) 11. The [ksqlDB processing log](../../ksqldb/reference/processing-log.md#ksqldb-reference-processing-log) captures per-record errors during processing to help developers debug their ksqlDB queries. In this example, the processing log uses mutual TLS (mTLS) authentication, as configured in the custom [log4j properties file](https://github.com/confluentinc/cp-demo/tree/latest/scripts/helper/log4j-secure.properties), to write entries into a Kafka topic. To see it in action, in the ksqlDB editor run the following “bad” query for 20 seconds: ```bash SELECT 1/0 FROM wikipedia EMIT CHANGES; ``` No records should be returned from this query. ksqlDB writes errors into the processing log for each record. View the processing log topic `ksql-clusterksql_processing_log` with topic inspection (jump to offset 0/partition 0) or the corresponding ksqlDB stream `KSQL_PROCESSING_LOG` with the ksqlDB editor (set `auto.offset.reset=earliest`). ```bash SELECT * FROM KSQL_PROCESSING_LOG EMIT CHANGES; ``` ## Data governance with Schema Registry All the applications and connectors used in this example are configured to automatically read and write Avro-formatted data, leveraging the [Confluent Schema Registry](../../schema-registry/index.md#schemaregistry-intro). The security in place between Schema Registry and the end clients, e.g. `appSA`, is as follows: - Encryption: TLS, e.g. client has `schema.registry.ssl.truststore.*` configurations - Authentication: bearer token authentication from HTTP basic auth headers, e.g. client has `basic.auth.user.info` and `basic.auth.credentials.source` configurations - Authorization: Schema Registry uses the bearer token with RBAC to authorize the client 1. View the Schema Registry subjects for topics that have registered schemas for their keys and/or values. Notice the `curl` arguments include (a) TLS information required to interact with Schema Registry which is listening for HTTPS on port 8085, and (b) authentication credentials required for RBAC (using superUser:superUser to see all of them). ```text docker exec schemaregistry curl -s -X GET \ --tlsv1.2 \ --cacert /etc/kafka/secrets/snakeoil-ca-1.crt \ -u superUser:superUser \ https://schemaregistry:8085/subjects | jq . ``` Your output should resemble: ```JSON [ "WIKIPEDIA_COUNT_GT_1-value", "wikipedia-activity-monitor-KSTREAM-AGGREGATE-STATE-STORE-0000000003-repartition-value", "wikipedia.parsed.replica-value", "WIKIPEDIABOT-value", "WIKIPEDIANOBOT-value", "_confluent-ksql-ksql-clusterquery_CTAS_WIKIPEDIA_COUNT_GT_1_7-Aggregate-GroupBy-repartition-value", "wikipedia.parsed.count-by-domain-value", "wikipedia.parsed-value", "_confluent-ksql-ksql-clusterquery_CTAS_WIKIPEDIA_COUNT_GT_1_7-Aggregate-Aggregate-Materialize-changelog-value" ] ``` 2. 
Instead of using the superUser credentials, now use client credentials noexist:noexist (user does not exist in LDAP) to try to register a new Avro schema (a record with two fields `username` and `userid`) into Schema Registry for the value of a new topic `users`. It should fail due to an authorization error. ```text docker compose exec schemaregistry curl -X POST \ -H "Content-Type: application/vnd.schemaregistry.v1+json" \ --tlsv1.2 \ --cacert /etc/kafka/secrets/snakeoil-ca-1.crt \ --data '{ "schema": "[ { \"type\":\"record\", \"name\":\"user\", \"fields\": [ {\"name\":\"userid\",\"type\":\"long\"}, {\"name\":\"username\",\"type\":\"string\"} ]} ]" }' \ -u noexist:noexist \ https://schemaregistry:8085/subjects/users-value/versions ``` Your output should resemble: ```JSON {"error_code":401,"message":"Unauthorized"} ``` 3. Instead of using credentials for a user that does not exist, now use the client credentials appSA:appSA (the user appSA exists in LDAP) to try to register a new Avro schema (a record with two fields `username` and `userid`) into Schema Registry for the value of a new topic `users`. It should fail due to an authorization error, with a different message than above. ```text docker compose exec schemaregistry curl -X POST \ -H "Content-Type: application/vnd.schemaregistry.v1+json" \ --tlsv1.2 \ --cacert /etc/kafka/secrets/snakeoil-ca-1.crt \ --data '{ "schema": "[ { \"type\":\"record\", \"name\":\"user\", \"fields\": [ {\"name\":\"userid\",\"type\":\"long\"}, {\"name\":\"username\",\"type\":\"string\"} ]} ]" }' \ -u appSA:appSA \ https://schemaregistry:8085/subjects/users-value/versions ``` Your output should resemble: ```JSON {"error_code":40301,"message":"User is denied operation Write on Subject: users-value"} ``` 4. Create a role binding for the `appSA` client permitting it access to Schema Registry. Get the Kafka cluster ID: ```none KAFKA_CLUSTER_ID=$(curl -s https://localhost:8091/v1/metadata/id --tlsv1.2 --cacert scripts/security/snakeoil-ca-1.crt | jq -r ".id") ``` Create the role binding: ```text # Create the role binding for the subject ``users-value``, i.e., the topic-value (versus the topic-key) docker compose exec tools bash -c "confluent iam rbac role-binding create \ --principal User:appSA \ --role ResourceOwner \ --resource Subject:users-value \ --kafka-cluster-id $KAFKA_CLUSTER_ID \ --schema-registry-cluster schema-registry" ``` 5. Again try to register the schema. It should pass this time. Note the schema id that it returns, e.g. below schema id is `9`. ```text docker compose exec schemaregistry curl -X POST \ -H "Content-Type: application/vnd.schemaregistry.v1+json" \ --tlsv1.2 \ --cacert /etc/kafka/secrets/snakeoil-ca-1.crt \ --data '{ "schema": "[ { \"type\":\"record\", \"name\":\"user\", \"fields\": [ {\"name\":\"userid\",\"type\":\"long\"}, {\"name\":\"username\",\"type\":\"string\"} ]} ]" }' \ -u appSA:appSA \ https://schemaregistry:8085/subjects/users-value/versions ``` Your output should resemble: ```JSON {"id":9} ``` 6. View the new schema for the subject `users-value`. From Confluent Control Center, click **Topics**. Scroll down to and click on the topic users and select “SCHEMA”. ![image](tutorials/cp-demo/images/schema1.png) You may alternatively request the schema via the command line: ```text docker exec schemaregistry curl -s -X GET \ --tlsv1.2 \ --cacert /etc/kafka/secrets/snakeoil-ca-1.crt \ -u appSA:appSA \ https://schemaregistry:8085/subjects/users-value/versions/1 | jq . 
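# Note: dropping the trailing /1 (GET .../subjects/users-value/versions) lists all registered version numbers for this subject.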
``` Your output should resemble: ```JSON { "subject": "users-value", "version": 1, "id": 9, "schema": "{\"type\":\"record\",\"name\":\"user\",\"fields\":[{\"name\":\"username\",\"type\":\"string\"},{\"name\":\"userid\",\"type\":\"long\"}]}" } ``` 7. Describe the topic `users`. Notice that it has a special configuration `confluent.value.schema.validation=true` which enables [Schema Validation](../../schema-registry/schema-validation.md#schema-validation), a data governance feature in Confluent Server that gives operators a centralized location within the Kafka cluster itself to enforce data format correctness. Enabling Schema ID Validation allows brokers configured with `confluent.schema.registry.url` to validate that data produced to the topic is using a valid schema. ```bash docker compose exec kafka1 kafka-topics \ --describe \ --topic users \ --bootstrap-server kafka1:9091 \ --command-config /etc/kafka/secrets/client_sasl_plain.config ``` Your output should resemble: ```bash Topic: users PartitionCount: 2 ReplicationFactor: 2 Configs: confluent.value.schema.validation=true Topic: users Partition: 0 Leader: 1 Replicas: 1,2 Isr: 1,2 Offline: Topic: users Partition: 1 Leader: 2 Replicas: 2,1 Isr: 2,1 Offline: ``` 8. Now produce a non-Avro message to this topic using `kafka-console-producer`. ```bash docker compose exec connect kafka-console-producer \ --topic users \ --broker-list kafka1:11091 \ --producer-property security.protocol=SSL \ --producer-property ssl.truststore.location=/etc/kafka/secrets/kafka.appSA.truststore.jks \ --producer-property ssl.truststore.password=confluent \ --producer-property ssl.keystore.location=/etc/kafka/secrets/kafka.appSA.keystore.jks \ --producer-property ssl.keystore.password=confluent \ --producer-property ssl.key.password=confluent ``` After starting the console producer, it will wait for input. Enter a few characters and press enter. It should result in a failure with an error message that resembles: ```bash ERROR Error when sending message to topic users with key: null, value: 5 bytes with error: (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback) org.apache.kafka.common.InvalidRecordException: This record has failed the validation on broker and hence be rejected. ``` Close the console producer by entering `CTRL+C`. 9. Describe the topic `wikipedia.parsed`, which is the topic that the kafka-connect-sse source connector is writing to. Notice that it also has enabled Schema ID Validation. ```bash docker compose exec kafka1 kafka-topics \ --describe \ --topic wikipedia.parsed \ --bootstrap-server kafka1:9091 \ --command-config /etc/kafka/secrets/client_sasl_plain.config ``` 10. Next step: Learn more about Schema Registry with the [Schema Registry Tutorial](../../schema-registry/schema_registry_onprem_tutorial.md#schema-registry-tutorial). ## Quick Start Use this quick start to get up and running with the Confluent Cloud Amazon Redshift Sink connector. The quick start provides the basics of selecting the connector and configuring it to stream events to Amazon Redshift. Prerequisites : - Authorized access to a [Confluent Cloud](https://www.confluent.io/confluent-cloud/) cluster on Amazon Web Services. - The Confluent CLI installed and configured for the cluster. See [Install the Confluent CLI](https://docs.confluent.io/confluent-cli/current/install.html). - [Schema Registry](../get-started/schema-registry.md#cloud-sr-config) must be enabled to use a Schema Registry-based format (for example, Avro, JSON_SR (JSON Schema), or Protobuf). 
See [Schema Registry Enabled Environments](limits.md#connect-ccloud-environment-limits) for additional information. - The Amazon Redshift database must be in the same region as your Confluent Cloud cluster. - For networking considerations, see [Networking and DNS](overview.md#connect-internet-access-resources). To use a set of public egress IP addresses, see [Public Egress IP Addresses for Confluent Cloud Connectors](static-egress-ip.md#cc-static-egress-ips). - The connector configuration requires a Redshift user (and password) with Redshift database privileges. For example: ```sql CREATE DATABASE ; CREATE USER PASSWORD ''; GRANT USAGE ON SCHEMA public TO ; GRANT CREATE ON SCHEMA public TO ; GRANT SELECT ON ALL TABLES IN SCHEMA public TO ; GRANT ALL ON SCHEMA public TO ; GRANT CREATE ON DATABASE TO ; ``` For additional information, see the [Redshift docs](https://docs.aws.amazon.com/redshift/latest/gsg/database-tasks.html). - Kafka cluster credentials. The following lists the different ways you can provide credentials. - Enter an existing [service account](service-account.md#s3-cloud-service-account) resource ID. - Create a Confluent Cloud [service account](service-account.md#s3-cloud-service-account) for the connector. Make sure to review the ACL entries required in the [service account documentation](service-account.md#s3-cloud-service-account). Some connectors have specific ACL requirements. - Create a Confluent Cloud API key and secret. To create a key and secret, you can use [confluent api-key create](https://docs.confluent.io/confluent-cli/current/command-reference/api-key/confluent_api-key_create.html) *or* you can autogenerate the API key and secret directly in the Cloud Console when setting up the connector. ### Standalone Cluster 1. Create `my-connect-standalone.properties` in the config directory, whose contents look like the following (note the security configs with `consumer.*` and `producer.*` prefixes). ```bash bootstrap.servers= # The converters specify the format of data in Kafka and how to translate it into Connect data. Every Connect user will # need to configure these based on the format they want their data in when loaded from or stored into Kafka key.converter=org.apache.kafka.connect.json.JsonConverter value.converter=org.apache.kafka.connect.json.JsonConverter # Converter-specific settings can be passed in by prefixing the Converter's setting with the converter you want to apply # it to key.converter.schemas.enable=false value.converter.schemas.enable=false # The internal converter used for offsets and config data is configurable and must be specified, but most users will # always want to use the built-in default. Offset and config data is never visible outside of Kafka Connect in this format. 
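# (Note: recent Kafka Connect releases deprecate these internal.*.converter settings; on those versions they can simply be omitted.)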
internal.key.converter=org.apache.kafka.connect.json.JsonConverter internal.value.converter=org.apache.kafka.connect.json.JsonConverter internal.key.converter.schemas.enable=false internal.value.converter.schemas.enable=false # Store offsets on local filesystem offset.storage.file.filename=/tmp/connect.offsets # Flush much faster than normal, which is useful for testing/debugging offset.flush.interval.ms=10000 ssl.endpoint.identification.algorithm=https sasl.mechanism=PLAIN sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \ username="" password=""; security.protocol=SASL_SSL consumer.ssl.endpoint.identification.algorithm=https consumer.sasl.mechanism=PLAIN consumer.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \ username="" password=""; consumer.security.protocol=SASL_SSL producer.ssl.endpoint.identification.algorithm=https producer.sasl.mechanism=PLAIN producer.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \ username="" password=""; producer.security.protocol=SASL_SSL # Set to a list of filesystem paths separated by commas (,) to enable class loading isolation for plugins # (connectors, converters, transformations). plugin.path=/usr/share/java,/Users//confluent-6.2.1/share/confluent-hub-components ``` 2. (Optional) Add the configs to `my-connect-standalone.properties` to connect to Confluent Cloud Schema Registry per the example in [connect-ccloud.delta](https://github.com/confluentinc/examples/tree/latest/ccloud/template_delta_configs/connect-ccloud.delta) on GitHub at [ccloud/examples/template_delta_configs](https://github.com/confluentinc/examples/tree/latest/ccloud/template_delta_configs). ```bash # Confluent Schema Registry for Kafka Connect value.converter=io.confluent.connect.avro.AvroConverter value.converter.basic.auth.credentials.source=USER_INFO value.converter.schema.registry.basic.auth.user.info=: value.converter.schema.registry.url=https:// ``` In addition to the above settings shown in the referenced GitHub example, add these key and value converter configurations to provide valid credentials. ```bash "key.converter": "io.confluent.connect.avro.AvroConverter", "value.converter": "io.confluent.connect.avro.AvroConverter", "key.converter.schema.registry.url": "${file:/data/confluent-cloud/server.properties:SCHEMA_REGISTRY_URL}", "key.converter.schema.registry.basic.auth.credentials.source":"USER_INFO", "key.converter.schema.registry.basic.auth.user.info": "${file:/data/confluent-cloud/server.properties:BASIC_AUTH_INFO}", "value.converter.schema.registry.url": "${file:/data/confluent-cloud/server.properties:SCHEMA_REGISTRY_URL}", "value.converter.schema.registry.basic.auth.credentials.source":"USER_INFO", "value.converter.schema.registry.basic.auth.user.info": "${file:/data/confluent-cloud/server.properties:BASIC_AUTH_INFO}", ``` 3. Create `my-file-sink.properties` in the config directory, whose contents look like the following (note the security configs with `consumer.*` prefix): ```text name=my-file-sink connector.class=org.apache.kafka.connect.file.FileStreamSinkConnector tasks.max=1 topics=page_visits file=my_file.txt ``` #### IMPORTANT You must include the following properties in the connector configuration if you are using a self-managed connector that requires an enterprise license. 
```text confluent.topic.bootstrap.servers= confluent.topic.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule \ required username="" password=""; confluent.topic.security.protocol=SASL_SSL confluent.topic.sasl.mechanism=PLAIN ``` #### IMPORTANT You must include the following properties in the connector configuration if you are using a self-managed connector that uses Reporter to write response back to Kafka (for example, the [Azure Functions Sink Connector for Confluent Platform](../../../kafka-connect-azure-functions/current/index.html) or the [Google Cloud Functions Sink Connector for Confluent Platform](../../../kafka-connect-gcp-functions/current/index.html) connector) . ```text reporter.admin.bootstrap.servers= reporter.admin.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule \ required username="" password=""; reporter.admin.security.protocol=SASL_SSL reporter.admin.sasl.mechanism=PLAIN reporter.producer.bootstrap.servers= reporter.producer.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule \ required username="" password=""; reporter.producer.security.protocol=SASL_SSL reporter.producer.sasl.mechanism=PLAIN ``` #### IMPORTANT You must include the following properties in the connector configuration if you are using the following connectors: ### Debezium 2 and later ```text "schema.history.internal.kafka.bootstrap.servers": "", "schema.history.internal.consumer.security.protocol": "SASL_SSL", "schema.history.internal.consumer.ssl.endpoint.identification.algorithm": "https", "schema.history.internal.consumer.sasl.mechanism": "PLAIN", "schema.history.internal.consumer.sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"\" password=\"\";", "schema.history.internal.producer.security.protocol": "SASL_SSL", "schema.history.internal.producer.ssl.endpoint.identification.algorithm": "https", "schema.history.internal.producer.sasl.mechanism": "PLAIN", "schema.history.internal.producer.sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"\" password=\"\";" ``` ### Debezium 1.9 and earlier ```text "database.history.kafka.bootstrap.servers": "", "database.history.consumer.security.protocol": "SASL_SSL", "database.history.consumer.ssl.endpoint.identification.algorithm": "https", "database.history.consumer.sasl.mechanism": "PLAIN", "database.history.consumer.sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"\" password=\"\";", "database.history.producer.security.protocol": "SASL_SSL", "database.history.producer.ssl.endpoint.identification.algorithm": "https", "database.history.producer.sasl.mechanism": "PLAIN", "database.history.producer.sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"\" password=\"\";" ``` ### Oracle XStream CDC Source ```text "schema.history.internal.kafka.bootstrap.servers": "", "schema.history.internal.consumer.security.protocol": "SASL_SSL", "schema.history.internal.consumer.ssl.endpoint.identification.algorithm": "https", "schema.history.internal.consumer.sasl.mechanism": "PLAIN", "schema.history.internal.consumer.sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"\" password=\"\";", "schema.history.internal.producer.security.protocol": "SASL_SSL", "schema.history.internal.producer.ssl.endpoint.identification.algorithm": "https", "schema.history.internal.producer.sasl.mechanism": "PLAIN", 
"schema.history.internal.producer.sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"\" password=\"\";", # Uncomment and include the following properties only if the connector is configured to use Kafka topics for signaling #"signal.kafka.bootstrap.servers": "", #"signal.consumer.security.protocol": "SASL_SSL", #"signal.consumer.ssl.endpoint.identification.algorithm": "https", #"signal.consumer.sasl.mechanism": "PLAIN", #"signal.consumer.sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"\" password=\"\";" ``` 4. Run the `connect-standalone` script with the filenames as arguments: ```bash ./bin/connect-standalone ./etc/my-connect-standalone.properties ./etc/my-file-sink.properties ``` This should start a connect worker on your machine which will consume the records produced earlier using the `ccloud` command. If you tail the contents of `my_file.txt`, it should resemble the following: ```text tail -f my_file.txt {"field1": "hello", "field2": 1} {"field1": "hello", "field2": 2} {"field1": "hello", "field2": 3} {"field1": "hello", "field2": 4} {"field1": "hello", "field2": 5} {"field1": "hello", "field2": 6} ``` ## Procedure 1. Download [Confluent Platform](https://www.confluent.io/download/) and extract the contents. 2. Create a topic named `rest-proxy-test` by using the Confluent CLI: ```bash confluent kafka topic create --partitions 4 rest-proxy-test ``` 3. Create a properties file. 1. Find the client settings for your cluster by clicking **CLI & client configuration** from the Cloud Console interface. 2. Click the **Clients** tab. 3. Click the **Java** client selection. This example uses the Java client. ![Java client configuration properties](images/ccloud-client-rest-config.png) 4. Create a properties file named `ccloud-kafka-rest.properties` where the Confluent Platform files are location. ```none cd ``` ```none touch ccloud-kafka-rest.properties ``` 5. Copy and paste the Java client configuration properties into the file. Add the `client.` prefix to each of security properties. For example: ```none # Kafka bootstrap.servers=.cloud:9092 security.protocol=SASL_SSL sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule \ required username="" password=""; ssl.endpoint.identification.algorithm=https sasl.mechanism=PLAIN client.bootstrap.servers=.cloud:9092 client.security.protocol=SASL_SSL client.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule \ required username="" password=""; client.ssl.endpoint.identification.algorithm=https client.sasl.mechanism=PLAIN # Confluent Cloud Schema Registry schema.registry.url= client.basic.auth.credentials.source=USER_INFO client.schema.registry.basic.auth.user.info=: ``` Producers, consumers, and the admin client share the `client.` properties. Refer to the following table to specify additional properties for the producer, consumer, or admin client. 
| Component | Prefix | Example | |--------------|-------------|-----------------------------| | Admin Client | `admin.` | admin.request.timeout.ms | | Consumer | `consumer.` | consumer.request.timeout.ms | | Producer | `producer.` | producer.acks | An example of adding these properties is shown below: ```none # Kafka bootstrap.servers=.cloud:9092 security.protocol=SASL_SSL client.security.protocol=SASL_SSL client.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="" password=""; client.ssl.endpoint.identification.algorithm=https sasl.mechanism=PLAIN client.sasl.mechanism=PLAIN # Confluent Cloud Schema Registry schema.registry.url= client.basic.auth.credentials.source=USER_INFO client.schema.registry.basic.auth.user.info=: # consumer only properties must be prefixed with consumer. consumer.retry.backoff.ms=600 consumer.request.timeout.ms=25000 # producer only properties must be prefixed with producer. producer.acks=1 # admin client only properties must be prefixed with admin. admin.request.timeout.ms=50000 ``` For details about how to create a Confluent Cloud API key and API secret so that you can communicate with the REST API, refer to [Create credentials to access the Kafka cluster resources](../kafka-rest/krest-qs.md#rest-api-qs-create-creds). 4. Start the REST Proxy. ```none ./bin/kafka-rest-start ccloud-kafka-rest.properties ``` 5. Make REST calls using [REST API v2](/platform/current/kafka-rest/api.html#crest_v2_api). Example request: ```none GET /topics/test HTTP/1.1 Accept: application/vnd.kafka.v2+json ``` #### Distributed worker configuration 1. Create your `my-connect-distributed-json.properties` file based on the following example. ```text bootstrap.servers= key.converter=org.apache.kafka.connect.storage.StringConverter value.converter=org.apache.kafka.connect.json.JsonConverter value.converter.schemas.enable=false ssl.endpoint.identification.algorithm=https security.protocol=SASL_SSL sasl.mechanism=PLAIN sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="" password=""; request.timeout.ms=20000 retry.backoff.ms=500 producer.bootstrap.servers= producer.ssl.endpoint.identification.algorithm=https producer.security.protocol=SASL_SSL producer.sasl.mechanism=PLAIN producer.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="" password=""; producer.request.timeout.ms=20000 producer.retry.backoff.ms=500 consumer.bootstrap.servers= consumer.ssl.endpoint.identification.algorithm=https consumer.security.protocol=SASL_SSL consumer.sasl.mechanism=PLAIN consumer.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="" password=""; consumer.request.timeout.ms=20000 consumer.retry.backoff.ms=500 offset.flush.interval.ms=10000 offset.storage.file.filename=/tmp/connect.offsets group.id=connect-cluster offset.storage.topic=connect-offsets offset.storage.replication.factor=3 offset.storage.partitions=3 config.storage.topic=connect-configs config.storage.replication.factor=3 status.storage.topic=connect-status status.storage.replication.factor=3 # Confluent license settings confluent.topic.bootstrap.servers= confluent.topic.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="" password=""; confluent.topic.security.protocol=SASL_SSL confluent.topic.sasl.mechanism=PLAIN # Set to a list of filesystem paths separated by commas (,) to enable class loading isolation for plugins # (connectors, converters, 
transformations). The list should consist of top level directories that include # any combination of: # a) directories immediately containing jars with plugins and their dependencies # b) uber-jars with plugins and their dependencies # c) directories immediately containing the package directory structure of classes of plugins and their dependencies # Examples: # plugin.path=/usr/local/share/java,/usr/local/share/kafka/plugins,/opt/connectors, plugin.path=/usr/share/java,/confluent-6.0.0/share/confluent-hub-components # Enable source connectors to create topics # KIP-158 topic.creation.enable=true ``` 2. Start Kafka Connect with the following command: ```text /bin/connect-distributed my-connect-distributed-json.properties ``` ## Quick start In this quick start guide, you use the SQS Source connector to export messages from an SQS FIFO queue to a Kafka topic. Before running the quick start, ensure the following: - [Confluent Platform](/platform/current/installation/installing_cp/index.html) is installed and services are running by using the [Confluent CLI](https://docs.confluent.io/confluent-cli/current/index.html). This quick start assumes that you are using the Confluent CLI, but standalone installations are also supported. By default ZooKeeper, Kafka, Schema Registry, Connect, and the Connect REST API are started with the `confluent local start` command. For more information, see [Quick Start for Apache Kafka using Confluent Platform (Local)](/platform/current/quickstart/ce-quickstart.html). Note that as of Confluent Platform 7.5, ZooKeeper is deprecated for new deployments. Confluent recommends KRaft mode for new deployments. - You must install the [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html) and configure it by running the `aws configure` command. - Ensure the IAM user or role you configure has full access to SQS. - Create a Kafka topic called `sqs-quickstart`. 1. Create a FIFO queue by running the following command: ```bash aws sqs create-queue --queue-name sqs-source-connector-demo ``` You should see output similar to the following: ```bash { "QueueUrl": "https://queue.amazonaws.com/940887362971/sqs-source-connector-demo" } ``` 2. Add some records to the newly created queue by first creating a file called `send-message-batch.json` with the following content: ```bash [ { "Id":"FuelReport-0001-2015-09-16T140731Z", "MessageBody":"Fuel report for account 0001 on 2015-09-16 at 02:07:31 PM.", "DelaySeconds":10, "MessageAttributes":{ "SellerName":{ "DataType":"String", "StringValue":"Example Store" }, "City":{ "DataType":"String", "StringValue":"Any City" }, "Region":{ "DataType":"String", "StringValue":"WA" }, "PostalCode":{ "DataType":"String", "StringValue":"99065" }, "PricePerGallon":{ "DataType":"Number", "StringValue":"1.99" } } }, { "Id":"FuelReport-0002-2015-09-16T140930Z", "MessageBody":"Fuel report for account 0002 on 2015-09-16 at 02:09:30 PM.", "DelaySeconds":10, "MessageAttributes":{ "SellerName":{ "DataType":"String", "StringValue":"Example Fuels" }, "City":{ "DataType":"String", "StringValue":"North Town" }, "Region":{ "DataType":"String", "StringValue":"WA" }, "PostalCode":{ "DataType":"String", "StringValue":"99123" }, "PricePerGallon":{ "DataType":"Number", "StringValue":"1.87" } } } ] ``` 3. Add the records to the queue by running the following command: ```bash aws sqs send-message-batch --queue-url https://queue.amazonaws.com/940887362971/sqs-source-connector-demo --entries file://send-message-batch.json`` ``` 4. 
Load the SQS Source connector. Note that you must ensure the `sqs.url` configuration parameter points to the correct SQS URL. The `sqs.url` parameter format is: `sqs.url=https://sqs..amazonaws.com//`. For example, if the AWS CLI returns the queue URL: `https://eu-central-1.queue.amazonaws.com/829250931565/sqs-source-connector-demo`, the `sqs.url` for the SQS Source connector is `https://sqs.eu-central-1.amazonaws.com/829250931565/sqs-source-connector-demo`. ```bash confluent local load sqs-source ``` Your output should resemble: ```bash { "name": "sqs-source", "config": { "connector.class": "io.confluent.connect.sqs.source.SqsSourceConnector", "tasks.max": "1", "kafka.topic": "test-sqs-source", "sqs.url": "https://sqs.us-east-1.amazonaws.com/942288736285822/sqs-fifo-queue.fifo", "name": "sqs-source" }, "tasks": [], "type": null } ``` 5. After the connector finishes ingesting data to Kafka, check that the data is available in the Kafka topic. ```bash bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test-sqs-source --from-beginning ``` You should see two records, similar to the following: ```bash { "schema":{ "type":"struct", "fields":[ { "type":"int64", "optional":false, "field":"ApproximateFirstReceiveTimestamp" }, { "type":"int32", "optional":false, "field":"ApproximateReceiveCount" }, { "type":"string", "optional":false, "field":"SenderId" }, { "type":"int64", "optional":false, "field":"SentTimestamp" }, { "type":"string", "optional":true, "field":"MessageDeduplicationId" }, { "type":"string", "optional":true, "field":"MessageGroupId" }, { "type":"string", "optional":true, "field":"SequenceNumber" }, { "type":"string", "optional":false, "field":"Body" } ], "optional":false, "version":1 }, "payload":{ "ApproximateFirstReceiveTimestamp":1563430750668, "ApproximateReceiveCount":2, "SenderId":"AIDA5WEKBZWN3QYIY7KAJ", "SentTimestamp":1563430591780, "MessageDeduplicationId":null, "MessageGroupId":null, "SequenceNumber":null, "Body":"Fuel report for account 0001 on 2015-09-16 at 02:07:31 PM." } } ``` ## Requirements From a high level, Replicator works like a consumer group with the partitions of the replicated topics from the source cluster divided between the connector’s tasks. Replicator periodically polls the source cluster for changes to the configuration of replicated topics and the number of partitions, and updates the destination cluster accordingly by creating topics or updating configuration. For this to work correctly, the following is required: * The Origin and Destination clusters must be Apache Kafka® or Confluent Platform. For version compatibility see [connector interoperability](../../installation/versions-interoperability.md#interoperability-versions-connectors) * The Replicator version must match the Kafka Connect version it is deployed on. For instance Replicator 8.1 should only be deployed to Kafka Connect 8.1. * The ACLs mentioned in [here](#replicator-security-overview) are required. * The default topic configurations in the source and destination clusters must match. In general, aside from any broker-specific settings (such as `broker.id`), you should use the same broker configuration in both clusters. * The destination Kafka cluster must have a similar capacity as the source cluster. In particular, since Replicator will preserve the replication factor of topics in the source cluster, which means that there must be at least as many brokers as the maximum replication factor used. 
If not, topic creation will fail until the destination cluster has the capacity to support the same replication factor. Note in this case, that topic creation will be retried automatically by the connector, so replication will begin as soon as the destination cluster has enough brokers. * The `dest.kafka.bootstrap.servers` destination connection setting in the Replicator properties file must be configured to use a single destination cluster, even when using multiple source clusters. For example, the figure shown at the start of this section shows two source clusters in different datacenters targeting a single *aggregate* destination cluster. Note that the aggregate destination cluster must have a similar capacity as the total of all associated source clusters. * On Confluent Platform versions 5.3.0 and later, Confluent Replicator requires the enterprise edition of [Kafka Connect](../../connect/index.md#kafka-connect). Replicator does not support the community edition of Connect. You can install the enterprise edition of Connect as part of the Confluent Platform on-premises bundle, as described in [Production Environments](../../installation/overview.md#on-prem-production) and in the [Quick Start for Confluent Platform](../../get-started/platform-quickstart.md#quickstart) (choose self-managed Confluent Platform). Demos of enterprise Connect are available at [Quick Start for Confluent Platform](../../get-started/platform-quickstart.md#quickstart) and on Docker Hub at [confluentinc/cp-server-connect](https://hub.docker.com/r/confluentinc/cp-server-connect). * The `timestamp-interceptor` for consumers supports only Java clients, as described in [Configuring the consumer for failover (timestamp preservation)](replicator-failover.md#configuring-the-consumer-for-failover). ## Resume applications after failover After a disaster event occurs, switch your Java consumer application to a different datacenter, and then it can automatically restart consuming data in the destination cluster where it left off in the origin cluster. To use this capability, configure Java consumer applications with the [Consumer Timestamps Interceptor](replicator-failover.md#configuring-the-consumer-for-failover), which is shown in this [sample code](https://github.com/confluentinc/examples/tree/latest/multi-datacenter/src/main/java/io/confluent/examples/clients/ConsumerMultiDatacenterExample.java). 1. After starting the demo (see [previous section](#start-the-services)), run the consumer to connect to the `dc1` Kafka cluster. It automatically configures the consumer group ID as `java-consumer-topic1` and uses the Consumer Timestamps Interceptor. ```bash mvn clean package mvn exec:java -Dexec.mainClass=io.confluent.examples.clients.ConsumerMultiDatacenterExample -Dexec.args="topic1 localhost:9091 http://localhost:8081 localhost:9092" ``` 2. Verify in the consumer output that it is reading data originating from both dc1 and dc2: ```bash ... key = User_1, value = {"userid": "User_1", "dc": "dc1"} key = User_9, value = {"userid": "User_9", "dc": "dc2"} key = User_6, value = {"userid": "User_6", "dc": "dc2"} ... ``` 3. Even though the consumer is consuming from dc1, there are dc2 consumer offsets committed for the consumer group `java-consumer-topic1`. Run the following command to read from the `__consumer_offsets` topic in dc2. 
```bash docker-compose exec broker-dc2 \ kafka-console-consumer \ --topic __consumer_offsets \ --bootstrap-server localhost:9092 \ --formatter "kafka.coordinator.group.GroupMetadataManager\$OffsetsMessageFormatter" | grep java-consumer ``` 4. Verify that there are committed offsets: ```bash ... [java-consumer-topic1,topic1,0]::OffsetAndMetadata(offset=1142, leaderEpoch=Optional.empty, metadata=, commitTimestamp=1547146285084, expireTimestamp=None) [java-consumer-topic1,topic1,0]::OffsetAndMetadata(offset=1146, leaderEpoch=Optional.empty, metadata=, commitTimestamp=1547146286082, expireTimestamp=None) [java-consumer-topic1,topic1,0]::OffsetAndMetadata(offset=1150, leaderEpoch=Optional.empty, metadata=, commitTimestamp=1547146287084, expireTimestamp=None) ... ``` 5. Kafka clients include any application that uses the Apache Kafka client API to connect to Kafka brokers, such as custom client code or any service that has embedded producers or consumers, such as Kafka Connect, ksqlDB, or a Kafka Streams application. Control Center uses the `_confluent-monitoring` topic to ensure that all messages are delivered and to provide statistics on throughput and latency performance. From that same topic, you can also derive which producers are writing to which topics and which consumers are reading from which topics; an example [script](https://github.com/confluentinc/examples/tree/latest/multi-datacenter/map_topics_clients.py) is provided with the repo. ```bash ./map_topics_clients.py ``` #### NOTE This script is for demo purposes only. It is not suitable for production. 6. In steady state with the Java consumer running, you should see: ```bash Reading topic _confluent-monitoring for 60 seconds...please wait __consumer_timestamps producers consumer-1 producer-10 producer-11 producer-6 producer-8 consumers replicator-dc1-to-dc2-topic1 replicator-dc1-to-dc2-topic2 replicator-dc2-to-dc1-topic1 _schemas producers connect-worker-producer-dc2 consumers replicator-dc1-to-dc2-topic1 topic1 producers connect-worker-producer-dc1 connect-worker-producer-dc2 datagen-dc1-topic1 datagen-dc2-topic1 consumers java-consumer-topic1 replicator-dc1-to-dc2-topic1 replicator-dc2-to-dc1-topic1 topic2 producers datagen-dc1-topic2 consumers replicator-dc1-to-dc2-topic2 topic2.replica producers connect-worker-producer-dc2 ``` 7. Shut down `dc1`: ```bash docker-compose stop connect-dc1 schema-registry-dc1 broker-dc1 zookeeper-dc1 ``` 8. Stop and restart the consumer to connect to the `dc2` Kafka cluster. It will still use the same consumer group ID `java-consumer-topic1` so it can resume where it left off: ```bash mvn exec:java -Dexec.mainClass=io.confluent.examples.clients.ConsumerMultiDatacenterExample -Dexec.args="topic1 localhost:9092 http://localhost:8082 localhost:9092" ``` 9. Verify that you see data sourced only from `dc2`: ```bash ... key = User_8, value = {"userid": "User_8", "dc": "dc2"} key = User_9, value = {"userid": "User_9", "dc": "dc2"} key = User_5, value = {"userid": "User_5", "dc": "dc2"} ... ``` ## Kafka Connect This section describes how to enable security for Kafka Connect. Securing Kafka Connect requires that you configure security for: 1. Kafka Connect workers: part of the Kafka Connect API, a worker is essentially an advanced client under the covers 2. Kafka Connect connectors: connectors may have embedded producers or consumers, so you must override the default configurations for Connect producers used with source connectors and Connect consumers used with sink connectors 3. 
Kafka Connect REST: Kafka Connect exposes a REST API that can be configured to use TLS/SSL using [additional properties](../../protect-data/encrypt-tls.md#encryption-ssl-rest). Configure security for Kafka Connect as described in the section below. Additionally, if you are using Confluent Control Center streams monitoring for Kafka Connect, configure security for: * [Confluent Metrics Reporter](#authentication-ssl-metrics-reporter) Configure the top-level settings in the Connect workers to use TLS by adding these properties in `connect-distributed.properties`. These top-level settings are used by the Connect worker for group coordination and to read and write to the internal topics that are used to track the cluster’s state (for example, configs and offsets). The assumption here is that client authentication is required by the brokers. ```bash bootstrap.servers=kafka1:9093 security.protocol=SSL ssl.truststore.location=/var/private/ssl/kafka.client.truststore.jks ssl.truststore.password=test1234 ssl.keystore.location=/var/private/ssl/kafka.client.keystore.jks ssl.keystore.password=test1234 ssl.key.password=test1234 ``` Connect workers manage the producers used by source connectors and the consumers used by sink connectors. So, for the connectors to leverage security, you must also override the default producer and consumer configurations that the worker uses. The assumption here is that client authentication is required by the brokers. * For source connectors: configure the same properties, adding the `producer` prefix. ```bash producer.bootstrap.servers=kafka1:9093 producer.security.protocol=SSL producer.ssl.truststore.location=/var/private/ssl/kafka.client.truststore.jks producer.ssl.truststore.password=test1234 producer.ssl.keystore.location=/var/private/ssl/kafka.client.keystore.jks producer.ssl.keystore.password=test1234 producer.ssl.key.password=test1234 ``` * For sink connectors: configure the same properties, adding the `consumer` prefix. ```bash consumer.bootstrap.servers=kafka1:9093 consumer.security.protocol=SSL consumer.ssl.truststore.location=/var/private/ssl/kafka.client.truststore.jks consumer.ssl.truststore.password=test1234 consumer.ssl.keystore.location=/var/private/ssl/kafka.client.keystore.jks consumer.ssl.keystore.password=test1234 consumer.ssl.key.password=test1234 ``` ## 1 - Establish trust between the IdP and Confluent Platform Use the procedure below to ensure that you configure your IdP correctly. Under each step, select the tab that corresponds to your IdP for details. 1. Create an OIDC client application configured with an authorization code grant type in your IdP. The following tabs contain provider-specific configuration instructions: ### Okta In the Okta documentation, complete [Create OIDC app integrations](https://help.okta.com/en-us/content/topics/apps/apps_app_integration_wizard_oidc.htm). ### Keycloak > In the Keycloak documentation, complete [Managing OpenID Connect clients](https://www.keycloak.org/docs/latest/server_admin/index.html#oidc-clients). ### Microsoft Entra ID In the Microsoft Azure documentation, complete [Quickstart: Register an application with the Microsoft identity platform](https://docs.microsoft.com/en-us/azure/active-directory/develop/quickstart-register-app). #### WARNING Microsoft Entra ID users must create a separate registered application for OAuth. Do not combine OAuth with SAML in a single application configuration. Such a dual configuration can lead to issues with single sign-on (SSO) and other authentication flows. 2. 
Add a redirect (callback) URL to Confluent Control Center on Confluent Platform in the client application. The URL should follow this format: ```html https://<host>:<port>/api/metadata/security/1.0/oidc/authorization-code/callback ``` 3. Enable identity tokens. Identity tokens are enabled by default when you create an OIDC application in your IdP. ### Okta > Creating the authorization server by default enables identity (ID) tokens. For more information, see [ID tokens](https://developer.okta.com/docs/reference/api/oidc/#id-token). ### Keycloak > For more information, see [Server Administration Guide](https://www.keycloak.org/docs/latest/server_admin/) in the > Keycloak documentation. ### Microsoft Entra ID For more information on enabling identity tokens, see [ID tokens in the Microsoft identity platform](https://learn.microsoft.com/en-us/azure/active-directory/develop/id-tokens) in the Azure documentation. 4. Enable refresh tokens. ### Okta > Check the **Refresh Token** option in the **Grant type** section of > the **Applications** page. ### Keycloak > Refresh tokens are enabled by default. For more information, see > [Authorization Code Flow](https://www.keycloak.org/docs/latest/server_admin/#_oidc-auth-flows-authorization) > in the Keycloak documentation. ### Microsoft Entra ID See [Refresh tokens](https://docs.microsoft.com/en-us/azure/active-directory/develop/refresh-tokens) in the Microsoft Azure documentation. 5. Include group claims in the ID tokens. Following are some details and links to help you get started. ### Okta > 1. Navigate to your authorization server under **Security** > **API**. > 2. Go to **Claims** and configure a claim for groups. > 3. [Add a Groups claim for the org authorization server](https://developer.okta.com/docs/guides/customize-tokens-groups-claim/main/#request-a-token-that-contains-the-custom-claim). ### Keycloak > Configure a new **Group Membership mapper**. Make sure to have **Full > group path** enabled. ### Microsoft Entra ID To add group claims, navigate to **App registrations** > **Token configuration** and follow the instructions in [Configuring group claims and app roles in tokens](https://learn.microsoft.com/en-us/security/zero-trust/develop/configure-tokens-group-claims-app-roles) in the Microsoft Azure documentation. 6. Assign users to the client application in your IdP. If you are using groups to control access to Confluent Control Center, you will assign users to the groups in the group configuration steps below. 7. Get the IdP endpoints. You can use the OpenID provider configuration response to get the identity provider endpoints required to fetch, authorize, and verify tokens: * Token endpoint URL (`token_endpoint`) * Authorization endpoint URL (`authorization_endpoint`) * JSON Web Key Set (JWKS) URL (`jwks_uri`) * Issuer URL (`issuer`) Use the OIDC metadata discovery URI listed below for your IdP to get these endpoints and save them for later use: ### Okta > ```html > https://<okta-domain>/oauth2/default/.well-known/openid-configuration > ``` > For more information, see [/.well-known/openid-configuration [Okta > documentation]](https://developer.okta.com/docs/reference/api/oidc/#well-known-openid-configuration). ### Keycloak > ```html > https://<keycloak-host>/realms/<realm>/.well-known/openid-configuration > ``` > For more information, see [Using OpenID Connect to secure applications > and services](https://www.keycloak.org/docs/latest/securing_apps/#_oidc). 
### Microsoft Entra ID ```html https://login.microsoftonline.com/<tenant-id>/v2.0/.well-known/openid-configuration ``` For more information, see [OpenID Connect authentication with Azure Active Directory](https://learn.microsoft.com/en-us/azure/active-directory/architecture/auth-oidc#implement-oidc-with-azure-ad) and [OpenID Connect on the Microsoft identity platform](https://learn.microsoft.com/en-us/azure/active-directory/develop/v2-protocols-oidc). 8. Get the client credential details. From the client application you created in the IdP, get the following client credentials and save them for later use: * Client ID (`client_id`) * Client secret (`client_secret`) 9. Configure IdP client credentials and endpoints. On each Confluent Server broker node, add or update the following parameters in the Confluent Platform broker configuration file using the endpoints obtained in step 7 above. ```properties confluent.oidc.idp.issuer=<issuer-url> confluent.oidc.idp.jwks.endpoint.uri=<jwks-endpoint-url> confluent.oidc.idp.authorize.base.endpoint.uri=<authorization-endpoint-url> confluent.oidc.idp.token.base.endpoint.uri=<token-endpoint-url> confluent.oidc.idp.client.id=<client-id> confluent.oidc.idp.client.secret=<client-secret> ``` 10. Configure groups in your IdP. In Confluent Platform, you can use group authorization to control user access to any resource, for example, a Confluent Platform cluster or a topic. To support this behavior, you must create groups and assign users to them in your IdP. 11. Add the following `confluent.oidc.idp.groups.claim.name` parameter to the Confluent Platform broker configuration file on each Confluent Server broker. ```properties confluent.oidc.idp.groups.claim.name=groups ``` The `confluent.oidc.idp.groups.claim.name` parameter is required and must match the name of the groups claim configured in your IdP. The default value is `groups`. If the values do not match, problems occur during authorization. 12. For KRaft clusters, you must add the following parameter to each Confluent Platform controller configuration file for the listener that is used for inter-broker communication. ```properties listener.name.${listenerName}.principal.builder.class=io.confluent.kafka.security.authenticator.OAuthKafkaPrincipalBuilder ``` The `io.confluent.kafka.security.authenticator.OAuthKafkaPrincipalBuilder` parameter enables administration requests to process the group extraction logic. Without this parameter, group-based authorization does not work. In a Confluent Platform cluster, the `DefaultPrincipalBuilder` creates a `KafkaPrincipal` that does not include groups. This becomes evident during interactions between Confluent Control Center and Confluent Server brokers. For example, when using the `KafkaAdminClient` to retrieve topics, the `DefaultPrincipalBuilder` produces the `KafkaPrincipal`. In contrast, the `OAuthKafkaPrincipalBuilder` uses an `OAuthBearer` token to generate a `KafkaPrincipal` that also incorporates groups, allowing them to be passed to the authorizer where necessary. ## Security and connection requirements All connections to Confluent Cloud are encrypted using [Transport Layer Security (TLS)](../security/encrypt/tls.md#manage-data-in-transit-with-tls) and require specific client configurations for successful connection. For comprehensive information about TLS encryption, SNI requirements, certificate management, and client prerequisites, see [Client Configuration Properties](client-configs.md#client-producer-consumer-config-recs-cc). 
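As a rough illustration only (not the authoritative settings, which are in the linked client configuration reference), a TLS-encrypted, SASL-authenticated client connection is typically configured along these lines; the bootstrap endpoint, API key, and API secret placeholders are assumptions you would replace with your own values:

```properties
# Minimal sketch of a TLS (SASL_SSL) client configuration; all values are placeholders.
bootstrap.servers=<bootstrap-endpoint>:9092
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
  username='<api-key>' \
  password='<api-secret>';
```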
You can use Apache Kafka® clients to write distributed applications and microservices that read, write, and process streams of events in parallel, at scale, and in a fault-tolerant manner, even in the presence of network problems or machine failures. The Kafka client library provides functions, classes, and utilities that you can use to create Kafka [producer](../_glossary.md#term-producer) clients and [consumer](../_glossary.md#term-consumer) clients using your choice of programming languages. The primary way to build production-ready producers and consumers is by using a programming language and a Kafka client library. The official Confluent-supported clients are: * Java: The official Java client library supports the producer, consumer, Streams, and Connect APIs. * [librdkafka](https://docs.confluent.io/platform/current/clients/librdkafka/html/md_INTRODUCTION.html): librdkafka and the following derived client libraries support only the admin, producer, and consumer APIs. * C/C++ * Python * Go * .NET * JavaScript When you use the official Confluent-supported clients, you get the same enterprise-level support that you get with the rest of Confluent Platform: * Confluent-provided clients follow the Confluent Platform release cycle, as opposed to the Kafka release cycle. * Confluent Platform maintenance fixes are provided for 2-3 years (2 years with Standard Support and 3 years with Platinum Support) after the initial release of a minor version. Additional open-source and community-developed Kafka client libraries are available for other programming languages, including Scala, Ruby, Rust, PHP, and Elixir. The core APIs in the Kafka client library are: * Producer API: This API provides classes and methods for creating and sending messages to Kafka topics. It allows developers to specify message payloads, keys, and metadata and to control message delivery and acknowledgment. * Consumer API: This API provides classes and methods for consuming messages from Kafka topics. It allows developers to subscribe to one or more topics, receive messages in batches or individually, and process messages using custom logic. * Streams API: This API provides a high-level abstraction for building real-time data processing applications that consume, transform, and produce data streams from Kafka topics. * Connector API: This API provides a framework for building connectors that can transfer data between Kafka topics and external data systems, such as databases, message queues, and cloud storage services. * Admin API: This API provides functions for managing Kafka topics, partitions, and configurations. It allows developers to create, delete, and update topics and retrieve metadata about Kafka clusters and brokers. In addition to these core APIs, the Kafka client library includes various tools and utilities for configuring and monitoring Kafka clients and clusters, handling errors and exceptions, and optimizing client performance and scalability. 
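To make the producer and consumer APIs above concrete, the following is a minimal Java sketch that sends one record and reads it back; the broker address (`localhost:9092`), topic name, and group ID are illustrative assumptions rather than values taken from this documentation:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ProducerConsumerSketch {
    public static void main(String[] args) throws Exception {
        // Producer configuration; "localhost:9092" is an assumed local broker.
        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // Producer API: send one record and wait for the broker acknowledgment.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            producer.send(new ProducerRecord<>("example-topic", "key-1", "hello")).get();
        }

        // Consumer configuration for a hypothetical consumer group.
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "example-group");
        consumerProps.put("auto.offset.reset", "earliest");
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        // Consumer API: subscribe to the topic and poll for records.
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
            consumer.subscribe(List.of("example-topic"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("key=%s value=%s%n", record.key(), record.value());
            }
        }
    }
}
```

The Admin and Streams APIs follow the same general pattern of building a `Properties` configuration and passing it to the corresponding client object.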
# Configure Clients * [Consumer](consumer.md) * [A Quick Consumer Review](consumer.md#a-quick-consumer-review) * [Consumer groups](consumer.md#consumer-groups) * [Groups and rebalance protocols](consumer.md#groups-and-rebalance-protocols) * [Overview of the rebalance protocols](consumer.md#overview-of-the-rebalance-protocols) * [How a group’s rebalance protocol is determined](consumer.md#how-a-group-s-rebalance-protocol-is-determined) * [Upgrading or switching consumer protocols](consumer.md#upgrading-or-switching-consumer-protocols) * [Before migrating to the consumer rebalance protocol](consumer.md#before-migrating-to-the-consumer-rebalance-protocol) * [How to do a rolling deployment](consumer.md#how-to-do-a-rolling-deployment) * [How to do an empty group restart](consumer.md#how-to-do-an-empty-group-restart) * [Offset management](consumer.md#offset-management) * [Committing offsets and reset policy](consumer.md#committing-offsets-and-reset-policy) * [Auto-commit offsets](consumer.md#auto-commit-offsets) * [Manual commit API](consumer.md#manual-commit-api) * [Asynchronous commits](consumer.md#asynchronous-commits) * [Dealing with commit failures and rebalances](consumer.md#dealing-with-commit-failures-and-rebalances) * [Sync vs async: safety and performance tradeoffs](consumer.md#sync-vs-async-safety-and-performance-tradeoffs) * [Coordinating offset commits with external systems](consumer.md#coordinating-offset-commits-with-external-systems) * [Exactly-once processing and transactions](consumer.md#exactly-once-processing-and-transactions) * [Kafka consumer configuration](consumer.md#ak-consumer-configuration) * [Core configuration properties](consumer.md#core-configuration-properties) * [Consumer rebalance protocol configuration](consumer.md#consumer-rebalance-protocol-configuration) * [Configure a consumer for classic rebalance protocol](consumer.md#configure-a-consumer-for-classic-rebalance-protocol) * [Configure partition assignment](consumer.md#configure-partition-assignment) * [For the new consumer rebalance protocol](consumer.md#for-the-new-consumer-rebalance-protocol) * [For the classic rebalance protocol](consumer.md#for-the-classic-rebalance-protocol) * [Message handling](consumer.md#message-handling) * [Kafka consumer group tool](consumer.md#ak-consumer-group-tool) * [List consumer groups](consumer.md#list-consumer-groups) * [Describe groups](consumer.md#describe-groups) * [Reset offsets](consumer.md#reset-offsets) * [Consumer examples](consumer.md#consumer-examples) * [Related content](consumer.md#related-content) * [Share Consumers](sharegroups.md) * [Key capabilities and differences from consumer groups](sharegroups.md#key-capabilities-and-differences-from-consumer-groups) * [Configuration for client developers](sharegroups.md#configuration-for-client-developers) * [Manage share groups using the kafka-share-groups tool](sharegroups.md#manage-share-groups-using-the-kafka-share-groups-tool) * [Monitoring Share Groups](sharegroups.md#monitoring-share-groups) * [Using the KafkaShareConsumer for Share Groups](sharegroups.md#using-the-kafkashareconsumer-for-share-groups) * [Configuration](sharegroups.md#configuration) * [Subscribing to Topics](sharegroups.md#subscribing-to-topics) * [Polling for Records and Liveness](sharegroups.md#polling-for-records-and-liveness) * [Record Delivery and Acknowledgement](sharegroups.md#record-delivery-and-acknowledgement) * [Implicit Acknowledgement](sharegroups.md#implicit-acknowledgement) * [Explicit 
Acknowledgement](sharegroups.md#explicit-acknowledgement) * [Multithreaded Processing](sharegroups.md#multithreaded-processing) * [Transactional Records and Isolation Level](sharegroups.md#transactional-records-and-isolation-level) * [Producer](producer.md) * [Kafka Producer Configuration](producer.md#ak-producer-configuration) * [Core Configuration](producer.md#core-configuration) * [Message Durability](producer.md#message-durability) * [Message Ordering](producer.md#message-ordering) * [Batching and Compression](producer.md#batching-and-compression) * [Queuing Limit](producer.md#queuing-limit) * [Producer examples](producer.md#producer-examples) * [Learn More](producer.md#learn-more) * [Configuration Properties](client-configs.md) * [Recommendations](client-configs.md#recommendations) * [Transport Layer Security (TLS) and Connection Requirements](client-configs.md#transport-layer-security-tls-and-connection-requirements) * [TLS SNI extension requirement](client-configs.md#tls-sni-extension-requirement) * [Manage TLS certificates](client-configs.md#manage-tls-certificates) * [Client Prerequisites and Version Requirements](client-configs.md#client-prerequisites-and-version-requirements) * [JVM settings for Java clients](client-configs.md#jvm-settings-for-java-clients) * [Cluster upgrades and error handling](client-configs.md#cluster-upgrades-and-error-handling) * [Client configuration properties](client-configs.md#client-configuration-properties) * [Why tuning client configurations is important](client-configs.md#why-tuning-client-configurations-is-important) * [Configuration categories](client-configs.md#configuration-categories) * [Configuration properties](client-configs.md#configuration-properties) * [Before you modify properties](client-configs.md#before-you-modify-properties) * [Common properties](client-configs.md#common-properties) * [Producer properties](client-configs.md#producer-properties) * [Consumer properties](client-configs.md#consumer-properties) * [OpenId Connect (OIDC) and token retry behavior](client-configs.md#openid-connect-oidc-and-token-retry-behavior) * [Java Client](client-configs.md#java) * [Schema Registry Java Client](client-configs.md#sr-java) * [JavaScript Client for Kafka](client-configs.md#nodejs-for-ak) * [Schema Registry JavaScript Client](client-configs.md#sr-nodejs) * [librdkafka derived (non-Java) clients](client-configs.md#librdkafka-derived-non-java-clients) ### Confluent for Kubernetes For details about the supported Kubernetes environments, refer to [Confluent for Kubernetes Supported Environments](https://docs.confluent.io/operator/current/co-plan.html#supported-environments-and-prerequisites). The following table summarizes the Confluent Platform features supported with Confluent for Kubernetes. 
| Confluent Platform 8.1 Feature | Availability in CFK 3.00\* |
|---------------------------------------|--------------------------------------|
| Kafka Broker | Available, only via Confluent Server |
| Schema Registry | Available |
| REST Proxy | Available |
| ksqlDB | Available |
| Connect | Available |
| Control Center | Available |
| Replicator | Available |
| Security: Role-based Access Control | Available [2] |
| Security: Authentication | Available [3] |
| Security: Network Encryption | Available |
| Structured Audit Logs | Available [4] |
| MDS-based Access Control Lists (ACLs) | Available |
| Secrets Protection | Available [5] |
| Schema Validation | Available |
| FIPS | Available |
| Multi-region Clusters | Available |
| Tiered Storage | Available |
| Self-Balancing Clusters | Available |
| Auto Data Balancer | Use Self-Balancing Clusters |
| Confluent REST API | Available |
| Cluster Registry | Not Available |
| Cluster Linking | Available |
| Health+ | Available |

- [1] Confluent Control Center is a separate download. See [Installation](/control-center/current/installation/overview.html).
- [2] Only available for new installations.
- [3] Supports SASL/Plain and mTLS for Kafka authentication. Does not support Kerberos or SASL/Scram.
- [4] Supported through [Kafka configuration overrides](https://docs.confluent.io/operator/current/co-configure.html). See [Use Properties Files to Configure Audit Logs in Confluent Platform](../security/compliance/audit-logs/audit-logs-properties-config.md#audit-logs-properties-config) for the properties you need to set in config overrides. Does not support centrally managed Audit Logs.
- [5] Kubernetes Secrets are integrated. CFK does not enable you to use Confluent Secret Protection.

# Metadata Service Configuration Settings

To enable the [Metadata Service](../../security/authorization/rbac/overview.md#metadata-service) (also known as the [Confluent Server Authorizer](../../security/csa-introduction.md#confluent-server-authorizer)), the broker configuration in the `server.properties` file must set `authorizer.class.name` to `io.confluent.kafka.security.authorizer.ConfluentServerAuthorizer`. To retain ACLs (that have already been enabled) and enable RBAC, set `confluent.authorizer.access.rule.providers=ZK_ACL,CONFLUENT`. For more details about how to configure RBAC, refer to [Enable RBAC for Authorization on a Running Cluster in Confluent Platform](../../security/authorization/rbac/enable-rbac-running-cluster.md#enable-rbac-running-cluster). 
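As a sketch, the two broker settings described above appear in `server.properties` as follows; whether you keep `ZK_ACL` in the provider list depends on your deployment, so treat this only as an illustration of the property names:

```properties
# Enable the Confluent Server Authorizer, which backs the Metadata Service (MDS)
authorizer.class.name=io.confluent.kafka.security.authorizer.ConfluentServerAuthorizer
# Retain previously enabled ACLs while also enabling RBAC
confluent.authorizer.access.rule.providers=ZK_ACL,CONFLUENT
```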
RBAC supports the following Kafka configurations of the Metadata Service (MDS) back end, which you can override by using the prefixes specified below: * [Topic configurations](../../installation/configuration/topic-configs.md#cp-config-topics) used for creating the security metadata topics (`confluent.metadata.topic.`) * [Administration Client configurations](../../installation/configuration/admin-configs.md#cp-config-admin) used for creating administration clients (`confluent.metadata.admin.`) * [Consumer Coordinator configurations](../../installation/configuration/consumer-configs.md#cp-config-consumer) used for creating consumers (`confluent.metadata.coordinator.`) * [Producer configurations](../../installation/configuration/producer-configs.md#cp-config-producer) used for creating producers (`confluent.metadata.producer.`) * [HTTP configurations](#https-configs-for-ssl) used for connecting to MDS over HTTPS (`confluent.metadata.server.ssl.`) * [Centralized Audit Log configurations](../../security/compliance/audit-logs/mds-config-for-centralized-audit-logs.md#mds-config-for-centralized-audit-logs) used to provide API endpoints to register a list of the Kafka clusters in an organization and to centrally manage the audit log configurations of those clusters (`confluent.security.event.logger.destination.admin.`). ## ksqlDB videos See the latest videos on Confluent Platform ksqlDB and Confluent Cloud ksqlDB at the [Confluent YouTube channel](https://www.youtube.com/channel/UCmZz-Gj3caLLzEWBtbYUXaA). | Video | Description | |-----------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------| | [Flink vs Kafka Streams/ksqlDB](https://www.youtube.com/watch?v=Wqko7MunKZs) | Jeff Bean and Matthias Sax compare stream processing tools. | | [Build a Plant Monitoring System with ksqlDB](https://www.youtube.com/watch?v=yi-KOg2LSY4) | Robin Moffatt’s quick videos about ksqlDB, based on demo scripts that you can run for yourself. | | [Apache Kafka 101: ksqlDB](https://www.youtube.com/watch?v=Da6MwowCGHo) | Tim Berglund provides a gentle introduction to ksqlDB concepts and queries. | | [Confluent Cloud Quick Start, ksqlDB, and Project Reactor (Redux)](https://www.youtube.com/watch?v=xorcbmFDwYA) | Viktor Gamov provisions Kafka, Connect, and ksqlDB clusters in Confluent Cloud and accesses them with the ksqlDB Reactor client. | | [Demo: The Event Streaming Database in Action](https://www.youtube.com/watch?v=D5QMqapzX8o) | Tim Berglund builds a movie rating system with ksqlDB to write movie records into a Kafka topic. | | [Demo: Seamless Stream Processing with Kafka Connect & ksqlDB](https://www.youtube.com/watch?v=4odZGWl-yZo) | Set up and build ksqlDB applications using the AWS source, Azure sink, and MongoDB source connectors in Confluent Cloud. | | [Introduction to ksqlDB and stream processing](https://www.youtube.com/watch?v=-kFU6mCnOFw) | Vish Srinivasan talks Kafka stream processing fundamentals and discusses ksqlDB. | | [Ask Confluent #16: ksqlDB edition](https://www.youtube.com/watch?v=SHKjuN2iXyk) | Gwen Shapira hosts Vinoth Chandar in a wide-ranging talk on ksqlDB. | | [An introduction to ksqlDB](https://www.youtube.com/watch?v=7mGBxG2NhVQ) | Robin Moffatt describes how ksqlDB helps you build scalable and fault-tolerant stream processing systems. 
| | [ksqlDB and the Kafka Connect JDBC Sink](https://www.youtube.com/watch?v=ad02yDTAZx0) | Robin Moffatt demonstrates how to use ksqlDB with the Connect JDBC sink. | | [How to Transform a Stream of Events Using ksqlDB](https://www.youtube.com/watch?v=PaHv4fGq-9k) | Viktor Gamov demonstrates how to transform a stream of movie data. | | [ksqlDB Java Client and Confluent Cloud](https://www.youtube.com/watch?v=6mBY_GL_D5g) | Viktor Gamov takes the ksqlDB Java client for a spin and tests it against Confluent Cloud. | ## Known limitations and best practices - When deleting a cluster link, first check that all mirror topics are in the `STOPPED` state. If any are in the `PENDING_STOPPED` state, deleting a cluster link can cause irrecoverable errors on those mirror topics due to a temporary limitation. - In Confluent Platform 7.1 and later, REST API calls to list and get source-initiated cluster links will have their destination cluster IDs returned under the parameter `destination_cluster_id`, or with Confluent CLI v4 as `destination_cluster`. (This is a change from previous releases, where these were returned under `source_cluster_id`.) - For Confluent Platform in general, you should not use unauthenticated listeners. For Cluster Linking, this is even more important because Cluster Linking can access the listeners. As a best practice, always configure authentication on listeners. To learn more, see the [Enable Security for a KRaft-Based Cluster in Confluent Platform](../../security/security_tutorial.md#security-tutorial), the [Authentication in Confluent Platform](../../security/authentication/overview.md#authentication-overview), and the listener configuration examples in the brokers for the various protocols such as [SASL](../../security/authentication/overview.md#kafka-sasl-auth) and [Use TLS Authentication in Confluent Platform](../../security/authentication/mutual-tls/overview.md#kafka-ssl-authentication). See also, [Manage Security for Cluster Linking on Confluent Platform](security.md#cluster-link-security). - All TLS/SSL key stores, trust stores and Kerberos keytab files must be stored at the same location on each broker in a given cluster. If not, cluster links may fail. Alternatively, you can [configure a PEM certificate in-line](https://cwiki.apache.org/confluence/display/KAFKA/KIP-651+-+Support+PEM+format+for+SSL+certificates+and+private+key) on the cluster link configuration. - Cluster link configurations stored in files (TLS/SSL key stores, trust stores, Kerberos keytab files) should not be stored in `/tmp` because `/tmp` files may get deleted, leaving links and mirrors in a bad state on some brokers. - Confluent Control Center will only display mirror topics correctly if the Confluent Platform cluster and Control Center are connected to a [REST Proxy API v3](../../kafka-rest/api.md#rest-proxy-v3). If not connected to the v3 Confluent REST API, Control Center will display mirror topics as regular topics, which can lead to showing features that are not actually available on mirror topics; for example, producing messages or editing configurations. To learn how to configure these clusters for the v3 REST API, see [Required Configurations for Control Center](configs.md#cluster-linking-configs-c3). - Prerequisites are provided per tutorial or use case because these differ depending on the context. 
Tutorials are provided on [topic data sharing](topic-data-sharing.md#tutorial-topic-data-sharing) and [Tutorial: Link Confluent Platform and Confluent Cloud Clusters](hybrid-cp.md#cluster-link-hybrid-cp). Additional requirements for secure setups are provided in [Manage Security for Cluster Linking on Confluent Platform](security.md#cluster-link-security). - Cluster Linking has not yet been fully tested to mirror topics that contain records produced using the Kafka transactions feature. Therefore, using Cluster Linking to mirror such topics is not supported and not recommended. - [Cluster Linking for Confluent Platform](#cluster-linking) between a source cluster running Confluent Platform 7.0.x or earlier (non-KRaft) and a destination cluster running in KRaft mode is not supported. Link creation may succeed, but the connection will ultimately fail (with a `SOURCE_UNAVAILABLE` error message). To work around this issue, make sure the source cluster is running Confluent Platform version 7.1.0 or later. If you have links from a Confluent Platform source cluster to a Confluent Cloud destination cluster, you must upgrade your source clusters to Confluent Platform 7.1.0 or later to avoid this issue. - ACL migration (ACL sync), previously available in Confluent Platform 6.0.0 through 6.2.x, was removed in Confluent Platform 7.0.0 due to a security vulnerability, then re-introduced in Confluent Platform 7.1.0 with the vulnerability resolved. If you are using ACL migration in your pre-7.1.0 deployments, you should disable it or upgrade to 7.1.x. To learn more, see [Authorization (ACLs)](security.md#cluster-link-acls). - Any customer-owned firewall that allows the cluster link connection from source cluster brokers to destination cluster brokers must allow the TCP connection to persist in order for Cluster Linking to work. - Prefixing is not supported in 7.1.0. For more information, see the note at the top of this section: [Prefix Mirror Topics and Consumer Group Names](mirror-topics-cp.md#cluster-link-prefix-concepts). - Cluster Linking cannot replicate messages that use the v0 or v1 message format from the earliest versions of Kafka. Cluster Linking can replicate messages in the v2 format (introduced in Apache Kafka® v 0.11) and later. If Cluster Linking encounters a message with the v0 or v1 format, it will fail that mirror topic; that is, it will transition to a FAILED state and stop replication for that topic. To replicate a topic that contains messages in the v0 or v1 format, either begin replication for that topic after the last message in the v0 or v1 format, using the cluster link configuration `mirror.start.offset.spec`, or use [Confluent Replicator](../replicator/index.md#replicator-detail) to replicate topics and messages. - An issue exists where [consumer group offsets](mirror-topics-cp.md#mirror-topics-consumer-offsets) that are deleted on the destination cluster (especially auto-deleted) persist, instead of being removed as expected. (Under the hood, the offsets are being re-replicated to the destination before retention settings delete the offsets from source. This results in extended retention of inactive consumer group offsets.) To prevent this from happening, you can extend retention on the destination to make sure data is deleted on the source before it is deleted on the destination. To do this, increase `offsets.retention.minutes` on destination cluster by at least double `offsets.retention.check.interval.ms`. 
- Cluster Linking does not support the use of a proxy for authentication to the cluster. For supported security configurations, see [Manage Security for Cluster Linking on Confluent Platform](security.md#cluster-link-security). ### Other Kafka Clients The objective of this tutorial is to learn about Avro and Schema Registry centralized schema management and compatibility checks. To keep examples simple, this tutorial focuses on Java producers and consumers, but other Kafka clients work in similar ways. For examples of other Kafka clients interoperating with Avro and Schema Registry: * [Other client languages](/platform/current/clients/index.html#kafka-clients) * [Configure ksqlDB for Avro](/platform/current/ksqldb/operate-and-deploy/installation/avro-schema.html) * [Kafka Streams](/platform/current/streams/developer-guide/datatypes.html#streams-data-avro) * [Kafka Connect](/platform/current/schema-registry/connect.html#schemaregistry-kafka-connect) * [Confluent REST Proxy](/platform/current/kafka-rest/api.html#post-topic-string-avro) ## Features Confluent Schema Registry currently supports all Kafka security features, including: * Encryption * [TLS/SSL encryption](../../security/protect-data/encrypt-tls.md#encryption-ssl-schema-registry) with a secure Kafka cluster * [End-user REST API calls over HTTPS](#schema-registry-http-https) * Authentication * [Open Authentication (OAuth)](oauth-schema-registry.md#schemaregistry-oauth) for Schema Registry server * [TLS/SSL authentication](../../security/authentication/mutual-tls/overview.md#authentication-ssl-schema-registry) with a secure Kafka Cluster * [SASL authentication](../../security/authentication/overview.md#kafka-sasl-auth) with a secure Kafka Cluster * Jetty authentication as described in [Role-Based Access Control](rbac-schema-registry.md#schemaregistry-rbac) steps * Authorization (provided through the [Schema Registry Security Plugin for Confluent Platform](../../confluent-security-plugins/schema-registry/introduction.md#confluentsecurityplugins-schema-registry-security-plugin)) * [Role-Based Access Control](rbac-schema-registry.md#schemaregistry-rbac) * [Schema Registry ACL Authorizer for Confluent Platform](../../confluent-security-plugins/schema-registry/authorization/sracl_authorizer.md#confluentsecurityplugins-sracl-authorizer) * [Schema Registry Topic ACL Authorizer for Confluent Platform](../../confluent-security-plugins/schema-registry/authorization/topicacl_authorizer.md#confluentsecurityplugins-topicacl-authorizer) * [Schema Registry Authorization (reference of supported operations and resource URIs)](../../confluent-security-plugins/schema-registry/authorization/index.md#confluentsecurityplugins-schema-registry-authorization) For configuration details, check the [configuration options](../installation/config.md#schemaregistry-config). # Secure Deployment for Kafka Streams in Confluent Platform Kafka Streams natively integrates with the Apache Kafka® [security features](../../security/overview.md#security) and supports all of the client-side security features in Kafka. Kafka Streams leverages the [Java Producer and Consumer API](../../clients/overview.md#kafka-clients). To secure your Stream processing applications, configure the security settings in the corresponding Kafka producer and consumer clients, and then specify the corresponding configuration settings in your Kafka Streams application. 
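As an illustration of that pattern, here is a minimal Kafka Streams sketch that passes client-side TLS settings (mirroring the truststore values used earlier for Connect workers) through the Streams configuration; the application ID, topic names, and broker address are assumptions for the example only:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class SecureStreamsApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "secure-streams-app");   // hypothetical application ID
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka1:9093");       // assumed TLS listener
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        // Security settings are forwarded to the embedded producer and consumer clients.
        props.put("security.protocol", "SSL");
        props.put("ssl.truststore.location", "/var/private/ssl/kafka.client.truststore.jks");
        props.put("ssl.truststore.password", "test1234");

        // Trivial topology: copy records from one hypothetical topic to another.
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input-topic").to("output-topic");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

Because Kafka Streams forwards these client-level `security.protocol` and `ssl.*` settings to its embedded producer and consumer, the same values used for other Java clients generally apply here unchanged.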
Kafka supports cluster encryption and authentication, including a mix of authenticated and unauthenticated, and encrypted and non-encrypted clients. Using security is optional. Here are a few relevant client-side security features: Encrypt data in transit between your applications and Kafka brokers: You can enable the encryption of the client-server communication between your applications and the Kafka brokers. For example, you can configure your applications to always use encryption when reading and writing data to and from Kafka. This is critical when reading and writing data across security domains such as internal network, public internet, and partner networks. Client authentication: You can enable client authentication for connections from your application to Kafka brokers. For example, you can define that only specific applications are allowed to connect to your Kafka cluster. Client authorization: You can enable client authorization of read and write operations by your applications. For example, you can define that only specific applications are allowed to read from a Kafka topic. You can also restrict write access to Kafka topics to prevent data pollution or fraudulent activities. For more information about the security features in Kafka, see [Kafka Security](../../security/overview.md#security) and the blog post [Apache Kafka Security 101](http://www.confluent.io/blog/apache-kafka-security-authorization-authentication-encryption). ## Quick Start This quick start uses the FTPS Sink connector to export data produced by the Avro console producer to an FTPS directory. 1. Start all the necessary services using the Confluent CLI. ```bash confluent local start ``` Every service will start in order, printing a message with its status: ```bash Starting Zookeeper Zookeeper is [UP] Starting Kafka Kafka is [UP] Starting Schema Registry Schema Registry is [UP] Starting Kafka REST Kafka REST is [UP] Starting Connect Connect is [UP] Starting KSQL Server KSQL Server is [UP] Starting Control Center Control Center is [UP] ``` 2. Next, start the Avro console producer to import a few records to Kafka: ```bash ./bin/kafka-avro-console-producer --broker-list localhost:9092 --topic test_ftps_sink \ --property value.schema='{"type":"record","name":"myrecord","fields":[{"name":"f1","type":"string"}]}' ``` 3. In the console producer, enter the following: ```bash {"f1": "value1"} {"f1": "value2"} {"f1": "value3"} ``` The three records entered are published to the Kafka topic `test_ftps_sink` in Avro format. 4. Configure your connector by first creating a `.properties` file named `quickstart-ftps.properties` with the following properties. ```bash # substitute <> with your information name=FTPSConnector connector.class=io.confluent.connect.ftps.FtpsSinkConnector key.converter=io.confluent.connect.avro.AvroConverter key.converter.schema.registry.url=http://localhost:8081 value.converter=io.confluent.connect.avro.AvroConverter value.converter.schema.registry.url=http://localhost:8081 tasks.max=3 topics=test_ftps_sink confluent.topic.bootstrap.servers=localhost:9092 confluent.topic.replication.factor=1 confluent.license= format.class=io.confluent.connect.ftps.sink.format.avro.AvroFormat flush.size=100 ftps.host= ftps.port= ftps.username= ftps.password= ftps.working.dir= ftps.ssl.key.password= ftps.ssl.keystore.location= ftps.ssl.keystore.password= ftps.ssl.truststore.location= ftps.ssl.truststore.password= ``` 5. 
Start the connector by loading its configuration: ```bash confluent local load ftps-sink --config etc/kafka-connect-ftps/quickstart-ftps.properties ``` 6. Confirm that the connector is in a `RUNNING` state. ```bash confluent local status FTPSConnector ``` 7. After some time, check that the data is available in the FTPS working directory. You should see a file named `/test_ftps_sink/partition=0/test_ftps_sink+0+0000000000.avro`. The file name is encoded as `topic+kafkaPartition+startOffset+endOffset.format`. To extract the content of the file, you can use `avro-tools-1.8.2.jar` (available in the [Apache Archives](https://archive.apache.org/dist/avro/avro-1.8.2/java/)). 8. Move `avro-tools-1.8.2.jar` to the FTPS working directory and run the following command: ```bash java -jar avro-tools-1.8.2.jar tojson <ftps-working-directory>/test_ftps_sink/partition=0/test_ftps_sink+0+0000000000.avro ``` You should see the following output: ```bash {"f1":"value1"} {"f1":"value2"} {"f1":"value3"} ``` ## Quick start This quick start uses the HDFS connector to export data produced by the Avro console producer to HDFS and assumes the following: - You have started the required services with the default configurations; make any necessary changes according to the actual configurations used. - Security is not configured for HDFS and Hive metastore. To make the necessary security configurations, see [Secure HDFS and Hive metastore](#secure-hdfs-hive-metastore). Before you start Confluent Platform, make sure Hadoop is running locally or remotely and that you know the HDFS URL. For Hive integration, you need to have Hive installed and to know the metastore thrift URI. You also need to ensure the connector user has write access to the directories specified in `topics.dir` and `logs.dir`. The default value of `topics.dir` is `/topics` and the default value of `logs.dir` is `/logs`. If you don’t specify these two configurations, make sure that the connector user has write access to `/topics` and `/logs`. You may need to create `/topics` and `/logs` before running the connector because the connector usually doesn’t have write access to `/`. Complete the following steps: 1. Start all the necessary services using the Confluent CLI. If not already in your PATH, add Confluent’s `bin` directory by running: `export PATH=<path-to-confluent>/bin:$PATH` ```bash confluent local start ``` Every service will start in order, printing a message with its status: ```bash Starting Zookeeper Zookeeper is [UP] Starting Kafka Kafka is [UP] Starting Schema Registry Schema Registry is [UP] Starting Kafka REST Kafka REST is [UP] Starting Connect Connect is [UP] Starting KSQL Server KSQL Server is [UP] Starting Control Center Control Center is [UP] ``` 2. Start the Avro console producer to import a few records to Kafka: ```bash ./bin/kafka-avro-console-producer --broker-list localhost:9092 --topic test_hdfs \ --property value.schema='{"type":"record","name":"myrecord","fields":[{"name":"f1","type":"string"}]}' ``` 3. In the console producer, enter the following: ```bash {"f1": "value1"} {"f1": "value2"} {"f1": "value3"} ``` The three records entered are published to the Kafka topic `test_hdfs` in Avro format. 4. Before starting the connector, ensure the configurations in `etc/kafka-connect-hdfs/quickstart-hdfs.properties` are properly set to your configurations of Hadoop (for example, ensure `hdfs.url` points to the proper HDFS and uses the FQDN of the host). Then start the connector by loading its configuration with the following command. 
Note that you must include a double dash (`--`) between the topic name and your flag. For more information, see [this post](https://unix.stackexchange.com/questions/11376/what-does-double-dash-mean-also-known-as-bare-double-dash). ```bash confluent local load hdfs-sink --config etc/kafka-connect-hdfs/quickstart-hdfs.properties { "name": "hdfs-sink", "config": { "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector", "tasks.max": "1", "topics": "test_hdfs", "hdfs.url": "hdfs://localhost:9000", "flush.size": "3", "name": "hdfs-sink" }, "tasks": [] } ``` 5. Verify the connector started successfully by viewing the Connect worker’s log: ```bash confluent local services connect log ``` Towards the end of the log you should see that the connector starts, logs a few messages, and then exports data from Kafka to HDFS. Once the connector finishes ingesting data to HDFS, check that the data is available in HDFS: ```bash hadoop fs -ls /topics/test_hdfs/partition=0 ``` You should see a file with the name `/topics/test_hdfs/partition=0/test_hdfs+0+0000000000+0000000002.avro`. The file name is encoded as `topic+kafkaPartition+startOffset+endOffset.format`. You can use `avro-tools-1.8.2.jar` (available in [Apache mirrors](https://archive.apache.org/dist/avro/avro-1.8.2/java/avro-tools-1.8.2.jar)) to extract the content of the file. Run `avro-tools` directly on Hadoop as: ```bash hadoop jar avro-tools-1.8.2.jar tojson \ hdfs://<namenode>/topics/test_hdfs/partition=0/test_hdfs+0+0000000000+0000000002.avro ``` where `<namenode>` is the HDFS name node hostname. Or, if you experience issues, first copy the Avro file from HDFS to the local filesystem and try again with Java: ```bash hadoop fs -copyToLocal /topics/test_hdfs/partition=0/test_hdfs+0+0000000000+0000000002.avro \ /tmp/test_hdfs+0+0000000000+0000000002.avro java -jar avro-tools-1.8.2.jar tojson /tmp/test_hdfs+0+0000000000+0000000002.avro ``` You should see the following output: ```bash {"f1":"value1"} {"f1":"value2"} {"f1":"value3"} ``` 6. 
Stop the Kafka Connect worker as well as all the rest of Confluent Platform by running: ```bash confluent local stop ``` Your output should resemble: ```none Stopping Control Center Control Center is [DOWN] Stopping KSQL Server KSQL Server is [DOWN] Stopping Connect Connect is [DOWN] Stopping Kafka REST Kafka REST is [DOWN] Stopping Schema Registry Schema Registry is [DOWN] Stopping Kafka Kafka is [DOWN] Stopping Zookeeper Zookeeper is [DOWN] ``` You may also stop all the services and wipe out any data generated during this quick start by running the following command: ```bash confluent local destroy ``` Your output should resemble: ```bash Stopping Control Center Control Center is [DOWN] Stopping KSQL Server KSQL Server is [DOWN] Stopping Connect Connect is [DOWN] Stopping Kafka REST Kafka REST is [DOWN] Stopping Schema Registry Schema Registry is [DOWN] Stopping Kafka Kafka is [DOWN] Stopping Zookeeper Zookeeper is [DOWN] Deleting: /var/folders/ty/rqbqmjv54rg_v10ykmrgd1_80000gp/T/confluent.PkQpsKfE ``` Note that if you want to run the quick start with Hive integration, before starting the connector, you need to add the following configurations to `etc/kafka-connect-hdfs/quickstart-hdfs.properties`: ```text hive.integration=true hive.metastore.uris=thrift uri to your Hive metastore schema.compatibility=BACKWARD ``` After the connector finishes ingesting data to HDFS, you can use Hive to check the data: ```text $hive>SELECT * FROM test_hdfs; ``` If you leave the `hive.metastore.uris` empty, an embedded Hive metastore will be created in the directory the connector is started. You need to start Hive in that specific directory to query the data. ## Quick start This quick start uses the HDFS 3 Sink connector to export data produced by the Avro console producer to HDFS. Before you start Confluent Platform, ensure the following: - Hadoop is running locally or remotely and that you know the HDFS URL. For Hive integration, you must have Hive installed and know the metastore thrift URI. - The connector user has write access to the directories specified in `topics.dir` and `logs.dir`. The default value of `topics.dir` is `/topics` and the default value of `logs.dir` is `/logs`. If you don’t specify the two configurations, ensure the connector user has write access to `/topics` and `/logs`. You may need to create `/topics` and `/logs` before running the connector, as the connector likely doesn’t have write access to `/`. This quick start assumes that you started the required services with the default configurations; you should make necessary changes according to the actual configurations used. This quick start also assumes that security is not configured for HDFS and Hive metastore. To make the necessary security configurations, see the [Secure HDFS and Hive Metastore](#hdfs3-connector) section. To get started, complete the following steps: 1. Install the connector using the following [CLI command](https://docs.confluent.io/confluent-cli/current/command-reference/connect/plugin/confluent_connect_plugin_install.html): ```bash # run from your Confluent Platform installation directory confluent connect plugin install confluentinc/kafka-connect-hdfs3:latest ``` 2. Start Confluent Platform. ```bash confluent local start ``` 3. [Produce](https://docs.confluent.io/current/cli/command-reference/confluent-produce.html) test Avro data to the `test_hdfs` topic in Kafka. 
```bash ./bin/kafka-avro-console-producer --broker-list localhost:9092 --topic test_hdfs \ --property value.schema='{"type":"record","name":"myrecord","fields":[{"name":"f1","type":"string"}]}' # paste each of these messages {"f1": "value1"} {"f1": "value2"} {"f1": "value3"} ``` 4. Create an `hdfs3-sink.json` file with the following contents: ```json { "name": "hdfs3-sink", "config": { "connector.class": "io.confluent.connect.hdfs3.Hdfs3SinkConnector", "tasks.max": "1", "topics": "test_hdfs", "hdfs.url": "hdfs://localhost:9000", "flush.size": "3", "key.converter": "org.apache.kafka.connect.storage.StringConverter", "value.converter": "io.confluent.connect.avro.AvroConverter", "value.converter.schema.registry.url":"http://localhost:8081", "confluent.topic.bootstrap.servers": "localhost:9092", "confluent.topic.replication.factor": "1" } } ``` Note that the first few settings are common settings you’ll specify for all connectors. The `topics` parameter specifies the topics to export data from. In this case, `test_hdfs`. The HDFS connection URL, `hdfs.url`, specifies the HDFS to export data to. You should set this according to your configuration. `flush.size` specifies the number of records the connector needs to write before invoking file commits. For high availability HDFS deployments, set `hadoop.conf.dir` to a directory that includes `hdfs-site.xml` and `core-site.xml`. After `hdfs-site.xml` is in place and `hadoop.conf.dir` has been set, `hdfs.url` may be set to the namenode’s nameservice ID, such as `nameservice1`. 5. Load the HDFS 3 Sink connector. ```bash confluent local load hdfs3-sink --config hdfs3-sink.json ``` 6. Confirm that the connector is in a `RUNNING` state. ```bash confluent local status hdfs3-sink ``` 7. Validate that the Avro data is in HDFS. ```bash # list files in partition 0 hadoop fs -ls /topics/test_hdfs/partition=0 # the following should appear in the list # /topics/test_hdfs/partition=0/test_hdfs+0+0000000000+0000000002.avro ``` The file name is encoded as `topic+kafkaPartition+startOffset+endOffset.format`. 8. Extract the contents of the file using the [avro-tools-1.8.2.jar](https://repo1.maven.org/maven2/org/apache/avro/avro-tools/1.8.2/avro-tools-1.8.2.jar). ```bash # substitute "<namenode>" for the HDFS name node hostname hadoop jar avro-tools-1.8.2.jar tojson \ hdfs://<namenode>/topics/test_hdfs/partition=0/test_hdfs+0+0000000000+0000000002.avro ``` 9. If you experience issues with the previous step, first copy the Avro file from HDFS to the local filesystem and try again with Java. ```bash hadoop fs -copyToLocal /topics/test_hdfs/partition=0/test_hdfs+0+0000000000+0000000002.avro \ /tmp/test_hdfs+0+0000000000+0000000002.avro java -jar avro-tools-1.8.2.jar tojson /tmp/test_hdfs+0+0000000000+0000000002.avro # expected output {"f1":"value1"} {"f1":"value2"} {"f1":"value3"} ``` If you want to run the quick start with Hive integration, add the following configurations to `hdfs3-sink.json`: ```text "hive.integration": "true", "hive.metastore.uris": "", "schema.compatibility": "BACKWARD" ``` After the connector finishes ingesting data to HDFS, you can use Hive to check the data: ```text beeline -e "SELECT * FROM test_hdfs;" ``` If the `hive.metastore.uris` setting is empty, an embedded Hive metastore is created in the directory the connector is started in. Start Hive in that specific directory to query the data. ## Quick Start The following uses the `S3SinkConnector` to write a file from the Kafka topic named `s3_topic` to S3. 
## Quick Start The following uses the `S3SinkConnector` to write a file from the Kafka topic named `s3_topic` to S3. Then, the `S3SourceConnector` loads that Avro file from S3 to the Kafka topic named `copy_of_s3_topic`. 1. Follow the instructions from [the S3 Sink connector quick start](https://docs.confluent.io/kafka-connect-s3-sink/current/overview.html#quick-start) to set up the data to use below. 2. Install the connector through the [Confluent Hub Client](https://docs.confluent.io/current/connect/managing/confluent-hub/client.html). ```bash # run from your Confluent Platform installation directory confluent connect plugin install confluentinc/kafka-connect-s3-source:latest ``` 3. Create a `quickstart-s3source.properties` file with the following contents: ```properties name=s3-source tasks.max=1 connector.class=io.confluent.connect.s3.source.S3SourceConnector s3.bucket.name=confluent-kafka-connect-s3-testing format.class=io.confluent.connect.s3.format.avro.AvroFormat confluent.license= confluent.topic.bootstrap.servers=localhost:9092 confluent.topic.replication.factor=1 ``` 4. Edit the `quickstart-s3source.properties` file to add the following properties: ```properties transforms=AddPrefix transforms.AddPrefix.type=org.apache.kafka.connect.transforms.RegexRouter transforms.AddPrefix.regex=.* transforms.AddPrefix.replacement=copy_of_$0 ``` #### IMPORTANT Adding this transform renames the output topic of the messages to `copy_of_s3_topic`. This prevents a continuous feedback loop of messages. 5. Load the Backup and Restore S3 Source connector. ```bash confluent local load s3-source --config quickstart-s3source.properties ``` #### IMPORTANT Don’t use the [Confluent CLI](https://docs.confluent.io/confluent-cli/current/index.html) in production environments. 6. Confirm that the connector is in a `RUNNING` state. ```bash confluent local status s3-source ``` 7. Confirm that the messages are being sent to Kafka. ```bash kafka-avro-console-consumer \ --bootstrap-server localhost:9092 \ --property schema.registry.url=http://localhost:8081 \ --topic copy_of_s3_topic \ --from-beginning | jq '.' ``` 8. The response should be 18 records as follows. ```bash {"f1": "value1"} {"f1": "value2"} {"f1": "value3"} {"f1": "value4"} {"f1": "value5"} {"f1": "value6"} {"f1": "value7"} {"f1": "value8"} {"f1": "value9"} {"f1": "value1"} {"f1": "value2"} {"f1": "value3"} {"f1": "value4"} {"f1": "value5"} {"f1": "value6"} {"f1": "value7"} {"f1": "value8"} {"f1": "value9"} ``` ## Quick Start This quick start uses the SFTP Sink connector to export data produced by the Avro console producer to an SFTP directory. First, start all the necessary services using the Confluent CLI. ```bash confluent local start ``` Every service starts in order, printing a message with its status: ```bash Starting Zookeeper Zookeeper is [UP] Starting Kafka Kafka is [UP] Starting Schema Registry Schema Registry is [UP] Starting Kafka REST Kafka REST is [UP] Starting Connect Connect is [UP] Starting KSQL Server KSQL Server is [UP] Starting Control Center Control Center is [UP] ``` Next, start the Avro console producer to import a few records to Kafka: ```bash ./bin/kafka-avro-console-producer --broker-list localhost:9092 --topic test_sftp_sink \ --property value.schema='{"type":"record","name":"myrecord","fields":[{"name":"f1","type":"string"}]}' ``` In the console producer, enter the following: ```bash {"f1": "value1"} {"f1": "value2"} {"f1": "value3"} ``` The three records entered are published to the Kafka topic `test_sftp_sink` in Avro format.
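The `value.schema` passed to the console producer above is a plain Avro record schema with a single string field, `f1`. As an illustration only, the following sketch serializes one of the quick start records to raw Avro bytes with the `fastavro` library (an external library, not something this quick start installs); note that the console producer additionally registers the schema with Schema Registry and prefixes each message with the Confluent wire-format header (magic byte plus schema ID):

```python
import io
import json
from fastavro import parse_schema, schemaless_writer  # external library, assumption

# The same value.schema string passed to kafka-avro-console-producer above.
schema = parse_schema(json.loads(
    '{"type":"record","name":"myrecord","fields":[{"name":"f1","type":"string"}]}'
))

# Serialize one of the quick start records to raw Avro bytes.
buf = io.BytesIO()
schemaless_writer(buf, schema, {"f1": "value1"})
print(len(buf.getvalue()), "Avro-encoded bytes")

# Note: the console producer also prepends the Confluent wire-format header
# (magic byte + schema ID) before the Avro payload shown here.
```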
Before starting the connector, ensure the configuration in `etc/kafka-connect-sftp/quickstart-sftp.properties` matches your SFTP setup (for example, `sftp.hostname` must point to the proper SFTP host). Then, start the connector by loading its configuration with the following command: ```bash confluent local load sftp-sink --config etc/kafka-connect-sftp/quickstart-sftp.properties { "name": "sftp-sink", "config": { "topics": "test_sftp_sink", "tasks.max": "1", "connector.class": "io.confluent.connect.sftp.SftpSinkConnector", "confluent.topic.bootstrap.servers": "localhost:9092", "partitioner.class": "io.confluent.connect.storage.partitioner.DefaultPartitioner", "schema.generator.class": "io.confluent.connect.storage.hive.schema.DefaultSchemaGenerator", "flush.size": "3", "schema.compatibility": "NONE", "format.class": "io.confluent.connect.sftp.sink.format.avro.AvroFormat", "storage.class": "io.confluent.connect.sftp.sink.storage.SftpSinkStorage", "sftp.host": "localhost", "sftp.port": "2222", "sftp.username": "foo", "sftp.password": "pass", "sftp.working.dir": "/share", "name": "sftpconnector" }, "tasks": [] } ``` To check that the connector started successfully, view the Connect worker’s log by entering: ```bash confluent local services connect log ``` Towards the end of the log you should see that the connector starts, logs a few messages, and then exports data from Kafka to SFTP. Once the connector finishes ingesting data to SFTP, check that the data is available in the SFTP working directory. You should see a file named `/topics/test_sftp_sink/partition=0/test_sftp_sink+0+0000000000.avro`. The file name is encoded as `topic+kafkaPartition+startOffset+endOffset.format`. To extract the contents of the file, use `avro-tools-1.8.2.jar` (available in the [Apache Archives](http://archive.apache.org/dist/avro/avro-1.8.2/java/avro-tools-1.8.2.jar)). Move `avro-tools-1.8.2.jar` to SFTP’s working directory and run the following command: ```bash java -jar avro-tools-1.8.2.jar tojson //topics/test_sftp_sink/partition=0/test_sftp_sink+0+0000000000.avro ``` You should see the following output: ```bash {"f1":"value1"} {"f1":"value2"} {"f1":"value3"} ``` Finally, stop the Connect worker as well as all the rest of Confluent Platform by running: ```bash confluent local stop ``` Your output should resemble: ```none Stopping Control Center Control Center is [DOWN] Stopping KSQL Server KSQL Server is [DOWN] Stopping Connect Connect is [DOWN] Stopping Kafka REST Kafka REST is [DOWN] Stopping Schema Registry Schema Registry is [DOWN] Stopping Kafka Kafka is [DOWN] Stopping Zookeeper Zookeeper is [DOWN] ``` *Or*, stop all the services and additionally wipe out any data generated during this quick start by running: ```bash confluent local destroy ``` Your output should resemble: ```bash Stopping Control Center Control Center is [DOWN] Stopping KSQL Server KSQL Server is [DOWN] Stopping Connect Connect is [DOWN] Stopping Kafka REST Kafka REST is [DOWN] Stopping Schema Registry Schema Registry is [DOWN] Stopping Kafka Kafka is [DOWN] Stopping Zookeeper Zookeeper is [DOWN] Deleting: /var/folders/ty/rqbqmjv54rg_v10ykmrgd1_80000gp/T/confluent.PkQpsKfE ```
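As an alternative to moving `avro-tools` onto the SFTP host in the SFTP quick start above, you can confirm the exported object directly over SFTP. The following is a minimal sketch using the `paramiko` library (an assumption, not part of Confluent Platform), with the host, port, credentials, and working directory taken from the quick start properties shown earlier:

```python
import paramiko  # external library, assumption; not installed by the quick start

# Connection details matching the quickstart-sftp.properties shown above.
transport = paramiko.Transport(("localhost", 2222))
transport.connect(username="foo", password="pass")
sftp = paramiko.SFTPClient.from_transport(transport)

# The connector writes under the SFTP working directory (/share in this quick start);
# adjust the path if your SFTP server chroots users into that directory.
partition_dir = "/share/topics/test_sftp_sink/partition=0"
for name in sftp.listdir(partition_dir):
    print(name)  # expect test_sftp_sink+0+0000000000.avro once the flush completes

sftp.close()
transport.close()
```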
### Property-based example 1. Create a `snmp-trap-source-quickstart.properties` file with the following contents. This configuration is typically used with [standalone workers](/platform/current/connect/concepts.html#standalone-workers): ```properties name=SnmpTrapSourceConnector tasks.max=1 connector.class=io.confluent.connect.snmp.SnmpTrapSourceConnector kafka.topic=snmp-kafka-topic snmp.batch.size=50 snmp.listen.address= snmp.listen.port= snmp.v3.enabled=true v3.security.context.users= v3.$username.auth.password= v3.$username.privacy.password= confluent.topic.bootstrap.servers=localhost:9092 confluent.topic.replication.factor=1 confluent.license= ``` The license-related properties define the Confluent license stored in Kafka, so you need the Kafka bootstrap addresses. The `replication.factor` may not be larger than the number of Kafka brokers in the destination cluster, so it is set to `1` here for demonstration purposes. Always set it to a value of at least 3 in production configurations. 2. Load the SNMP Trap Source connector. ```bash confluent local load snmp-trap-source --config snmp-trap-source-quickstart.properties ``` It’s important that you don’t use the [Confluent CLI](https://docs.confluent.io/confluent-cli/current/index.html) in production environments. 3. Confirm that the connector is in a `RUNNING` state. ```bash confluent local status snmp-trap-source ``` 4. The SNMP device should be running and generating PDUs. The connector will listen and push PDUs of type trap to a Kafka topic. 5. Confirm that the messages are being sent to Kafka. ```bash kafka-avro-console-consumer --bootstrap-server localhost:9092 --property schema.registry.url=http://localhost:8081 --topic snmp-kafka-topic --from-beginning ``` A sample SNMP PDU of type trap for the `sysDescr` OID (see [https://www.alvestrand.no/objectid/1.3.6.1.2.1.1.1.html](https://www.alvestrand.no/objectid/1.3.6.1.2.1.1.1.html)) might look like the following: ```bash TRAP[ { contextEngineID=80:00:00:59:03:78:d2:94:b8:9f:95, contextName= }, requestID=2058388122, errorStatus=0, errorIndex=0, VBS[ 1.3.6.1.2.1.1.1.0 = 24-Port Gigabit Smart Switch with PoE and 4 SFP uplinks ] ] ``` Data in the Kafka topic: ```bash { "peerAddress":"127.0.0.1/55159", "securityName":"admin", "variables":[ { "oid":"1.3.6.1.2.1.1.1.0", "type":"octetString", "counter32":null, "counter64":null, "gauge32":null, "integer":null, "ipaddress":null, "null":null, "objectIdentifier":null, "octetString":null, "opaque":null, "timeticks":null, "metadata":{ "string":"24-Port Gigabit Smart Switch with PoE and 4 SFP uplinks" } }] } ```
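Each record the SNMP Trap Source connector writes carries its variable bindings in the `variables` array, as shown above. The following is a minimal sketch, assuming the record shape shown above with hypothetical values, that extracts the OID/value pairs from one such record:

```python
import json

# A record from snmp-kafka-topic, shaped like the sample above (hypothetical values).
record = json.loads("""
{
  "peerAddress": "127.0.0.1/55159",
  "securityName": "admin",
  "variables": [
    {
      "oid": "1.3.6.1.2.1.1.1.0",
      "type": "octetString",
      "octetString": null,
      "metadata": {"string": "24-Port Gigabit Smart Switch with PoE and 4 SFP uplinks"}
    }
  ]
}
""")

# Print each variable binding as an (OID, value) pair. The populated field matches
# the "type" attribute; in the sample above the decoded string is carried in metadata.
for binding in record["variables"]:
    value = binding.get(binding["type"]) or binding.get("metadata", {}).get("string")
    print(binding["oid"], "=", value)
```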
## Enable monitoring You must enable monitoring explicitly on each ksqlDB server. To enable it in a Docker-based deployment, export an environment variable named `KSQL_JMX_OPTS` with your JMX configuration and expose the port that JMX will communicate over. The following Docker Compose example shows how you can configure monitoring for a ksqlDB server. The surrounding components, like the broker and CLI, are omitted for brevity. You can see an example of a complete setup in the [ksqlDB Quick Start](../quickstart.md#ksqldb-quick-start). ```yaml ksqldb-server: image: confluentinc/cp-ksqldb-server:8.1.0 hostname: ksqldb-server container_name: ksqldb-server depends_on: - broker - schema-registry ports: - "8088:8088" - "1099:1099" environment: KSQL_LISTENERS: "http://0.0.0.0:8088" KSQL_BOOTSTRAP_SERVERS: "broker:9092" KSQL_KSQL_SCHEMA_REGISTRY_URL: "http://schema-registry:8081" KSQL_KSQL_LOGGING_PROCESSING_STREAM_AUTO_CREATE: "true" KSQL_KSQL_LOGGING_PROCESSING_TOPIC_AUTO_CREATE: "true" KSQL_KSQL_QUERY_PULL_METRICS_ENABLED: "true" KSQL_JMX_OPTS: > -Djava.rmi.server.hostname=localhost -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=1099 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.rmi.port=1099 ``` With respect to monitoring, here is what this configuration does: - The environment variable `KSQL_JMX_OPTS` is supplied to the server with various arguments. The `>` character lets you write a multi-line string in YAML, which makes this long argument easier to read. The advertised hostname, port, and security settings are configured. JMX has a wide range of [configuration options](https://docs.oracle.com/javase/8/docs/technotes/guides/management/agent.html), and you can set these however you like. - Port `1099` is exposed, which corresponds to the JMX port set in the `KSQL_JMX_OPTS` configuration. This enables remote monitoring tools to communicate with ksqlDB’s process. ## Pre-flight checks Before going through the tutorial, check that the environment has started correctly. If any of these pre-flight checks fails, consult the [Troubleshooting the scripted demo](teardown.md#cp-demo-troubleshooting) section. 1. Verify that the Docker containers show an `Up` state. ```bash docker compose ps ``` Your output should resemble: ```text NAME IMAGE COMMAND SERVICE CREATED STATUS PORTS alertmanager confluentinc/cp-enterprise-alertmanager:2.2.0 "alertmanager-start" alertmanager 2 hours ago Up 2 hours 0.0.0.0:9093->9093/tcp, [::]:9093->9093/tcp connect localbuild/connect:8.0.0-8.0.0 "/etc/confluent/dock…" connect 2 hours ago Up 2 hours (healthy) 0.0.0.0:8083->8083/tcp, [::]:8083->8083/tcp control-center confluentinc/cp-enterprise-control-center-next-gen:2.2.0 "/etc/confluent/dock…" control-center 2 hours ago Up 2 hours (healthy) 0.0.0.0:9021-9022->9021-9022/tcp, [::]:9021-9022->9021-9022/tcp elasticsearch docker.elastic.co/elasticsearch/elasticsearch-oss:7.10.0 "/tini -- /usr/local…" elasticsearch 2 hours ago Up 2 hours (healthy) 0.0.0.0:9200->9200/tcp, [::]:9200->9200/tcp, 0.0.0.0:9300->9300/tcp, [::]:9300->9300/tcp kafka1 confluentinc/cp-server:8.0.0 "bash -c 'if [ ! -f …" kafka1 2 hours ago Up 2 hours (healthy) 0.0.0.0:8091->8091/tcp, [::]:8091->8091/tcp, 0.0.0.0:9091->9091/tcp, [::]:9091->9091/tcp, 0.0.0.0:10091->10091/tcp, [::]:10091->10091/tcp, 0.0.0.0:11091->11091/tcp, [::]:11091->11091/tcp, 0.0.0.0:12091->12091/tcp, [::]:12091->12091/tcp kafka2 confluentinc/cp-server:8.0.0 "bash -c 'if [ !
-f …" kafka2 2 hours ago Up 2 hours (healthy) 0.0.0.0:8092->8092/tcp, [::]:8092->8092/tcp, 0.0.0.0:9092->9092/tcp, [::]:9092->9092/tcp, 0.0.0.0:10092->10092/tcp, [::]:10092->10092/tcp, 0.0.0.0:11092->11092/tcp, [::]:11092->11092/tcp, 0.0.0.0:12092->12092/tcp, [::]:12092->12092/tcp kibana docker.elastic.co/kibana/kibana-oss:7.10.0 "/usr/local/bin/dumb…" kibana 2 hours ago Up 2 hours (healthy) 0.0.0.0:5601->5601/tcp, [::]:5601->5601/tcp ksqldb-cli confluentinc/cp-ksqldb-cli:8.0.0 "/bin/sh" ksqldb-cli 2 hours ago Up 2 hours ksqldb-server confluentinc/cp-ksqldb-server:8.0.0 "/etc/confluent/dock…" ksqldb-server 2 hours ago Up 2 hours (healthy) 0.0.0.0:8088-8089->8088-8089/tcp, [::]:8088-8089->8088-8089/tcp openldap osixia/openldap:1.3.0 "/container/tool/run…" openldap 2 hours ago Up 2 hours 389/tcp, 636/tcp prometheus confluentinc/cp-enterprise-prometheus:2.2.0 "prometheus-start" prometheus 2 hours ago Up 2 hours 0.0.0.0:9090->9090/tcp, [::]:9090->9090/tcp restproxy confluentinc/cp-kafka-rest:8.0.0 "/etc/confluent/dock…" restproxy 2 hours ago Up 2 hours 0.0.0.0:8086->8086/tcp, [::]:8086->8086/tcp schemaregistry confluentinc/cp-schema-registry:8.0.0 "/etc/confluent/dock…" schemaregistry 2 hours ago Up 2 hours (healthy) 0.0.0.0:8085->8085/tcp, [::]:8085->8085/tcp streams-demo cnfldemos/cp-demo-kstreams:0.0.12 "/app/start.sh" streams-demo 2 hours ago Up 2 hours 9092/tcp tools cnfldemos/tools:0.3 "/bin/bash" tools 2 hours ago Up 2 hours ``` 2. Jump to the end of the entire `cp-demo` pipeline and view the Kibana dashboard at [http://localhost:5601/app/dashboards#/view/Overview](http://localhost:5601/app/dashboards#/view/Overview) . This is a cool view and validates that the `cp-demo` start script completed successfully. ![image](tutorials/cp-demo/images/kibana-dashboard.png) 3. View the full Confluent Platform configuration in the [docker-compose.yml](https://github.com/confluentinc/cp-demo/tree/latest/docker-compose.yml) file. 4. View the Kafka Streams application configuration in the [client configuration](https://github.com/confluentinc/cp-demo/tree/latest/env_files/streams-demo.env) file, set with security parameters to the Kafka cluster and Schema Registry. ### Authorization with RBAC 1. Verify which users are configured to be super users. ```bash docker compose logs kafka1 | grep "super.users =" ``` Your output should resemble the following. Notice this authorizes each service name which authenticates as itself, as well as the unauthenticated `PLAINTEXT` which authenticates as `ANONYMOUS` (for demo purposes only): ```bash kafka1 | super.users = User:admin;User:mds;User:superUser;User:ANONYMOUS ``` 2. From the Confluent Control Center UI, in the Administration menu, click the **Manage role assignments** option. Click **Assignments**, and then the Kafka cluster ID. 3. From the **Topic** list, verify that the LDAP user `appSA` is allowed to access a few topics, including any topic whose name starts with **wikipedia**. This role assignment was done during `cp-demo` startup in the [create-role-bindings.sh script](https://github.com/confluentinc/cp-demo/tree/latest/scripts/helper/create-role-bindings.sh). 4. Verify that LDAP user `appSA` (which is not a super user) can consume messages from topic `wikipedia.parsed`. Notice that it is configured to authenticate to brokers with mTLS and authenticate to Schema Registry with LDAP. 
```bash docker compose exec connect kafka-avro-console-consumer \ --bootstrap-server kafka1:11091,kafka2:11092 \ --consumer-property security.protocol=SSL \ --consumer-property ssl.truststore.location=/etc/kafka/secrets/kafka.appSA.truststore.jks \ --consumer-property ssl.truststore.password=confluent \ --consumer-property ssl.keystore.location=/etc/kafka/secrets/kafka.appSA.keystore.jks \ --consumer-property ssl.keystore.password=confluent \ --consumer-property ssl.key.password=confluent \ --property schema.registry.url=https://schemaregistry:8085 \ --property schema.registry.ssl.truststore.location=/etc/kafka/secrets/kafka.appSA.truststore.jks \ --property schema.registry.ssl.truststore.password=confluent \ --property basic.auth.credentials.source=USER_INFO \ --property basic.auth.user.info=appSA:appSA \ --group wikipedia.test \ --topic wikipedia.parsed \ --max-messages 5 ``` 5. Verify that LDAP user `badapp` cannot consume messages from topic `wikipedia.parsed`. ```bash docker compose exec connect kafka-avro-console-consumer \ --bootstrap-server kafka1:11091,kafka2:11092 \ --consumer-property security.protocol=SSL \ --consumer-property ssl.truststore.location=/etc/kafka/secrets/kafka.badapp.truststore.jks \ --consumer-property ssl.truststore.password=confluent \ --consumer-property ssl.keystore.location=/etc/kafka/secrets/kafka.badapp.keystore.jks \ --consumer-property ssl.keystore.password=confluent \ --consumer-property ssl.key.password=confluent \ --property schema.registry.url=https://schemaregistry:8085 \ --property schema.registry.ssl.truststore.location=/etc/kafka/secrets/kafka.badapp.truststore.jks \ --property schema.registry.ssl.truststore.password=confluent \ --property basic.auth.credentials.source=USER_INFO \ --property basic.auth.user.info=badapp:badapp \ --group wikipedia.test \ --topic wikipedia.parsed \ --max-messages 5 ``` Your output should resemble: ```bash ERROR [Consumer clientId=consumer-wikipedia.test-1, groupId=wikipedia.test] Topic authorization failed for topics [wikipedia.parsed] org.apache.kafka.common.errors.TopicAuthorizationException: Not authorized to access topics: [wikipedia.parsed] ``` 6. Create role bindings to permit `badapp` client to consume from topic `wikipedia.parsed` and its related subject in Schema Registry. Get the Kafka cluster ID: ```none KAFKA_CLUSTER_ID=$(curl -s https://localhost:8091/v1/metadata/id --tlsv1.2 --cacert scripts/security/snakeoil-ca-1.crt | jq -r ".id") ``` Create the role bindings: ```text # Create the role binding for the topic ``wikipedia.parsed`` docker compose exec tools bash -c "confluent iam rbac role-binding create \ --principal User:badapp \ --role ResourceOwner \ --resource Topic:wikipedia.parsed \ --kafka-cluster-id $KAFKA_CLUSTER_ID" # Create the role binding for the group ``wikipedia.test`` docker compose exec tools bash -c "confluent iam rbac role-binding create \ --principal User:badapp \ --role ResourceOwner \ --resource Group:wikipedia.test \ --kafka-cluster-id $KAFKA_CLUSTER_ID" # Create the role binding for the subject ``wikipedia.parsed-value``, i.e., the topic-value (versus the topic-key) docker compose exec tools bash -c "confluent iam rbac role-binding create \ --principal User:badapp \ --role ResourceOwner \ --resource Subject:wikipedia.parsed-value \ --kafka-cluster-id $KAFKA_CLUSTER_ID \ --schema-registry-cluster schema-registry" ``` 7. Verify that LDAP user `badapp` now can consume messages from topic `wikipedia.parsed`. 
```bash docker compose exec connect kafka-avro-console-consumer \ --bootstrap-server kafka1:11091,kafka2:11092 \ --consumer-property security.protocol=SSL \ --consumer-property ssl.truststore.location=/etc/kafka/secrets/kafka.badapp.truststore.jks \ --consumer-property ssl.truststore.password=confluent \ --consumer-property ssl.keystore.location=/etc/kafka/secrets/kafka.badapp.keystore.jks \ --consumer-property ssl.keystore.password=confluent \ --consumer-property ssl.key.password=confluent \ --property schema.registry.url=https://schemaregistry:8085 \ --property schema.registry.ssl.truststore.location=/etc/kafka/secrets/kafka.badapp.truststore.jks \ --property schema.registry.ssl.truststore.password=confluent \ --property basic.auth.credentials.source=USER_INFO \ --property basic.auth.user.info=badapp:badapp \ --group wikipedia.test \ --topic wikipedia.parsed \ --max-messages 5 ``` 8. View all the role bindings that were configured for RBAC in this cluster. ```bash ./scripts/validate/validate_bindings.sh ``` 9. Because the Kafka cluster is configured for [SASL](../../security/authentication/sasl/plain/overview.md#kafka-sasl-auth-plain), any administrative commands must authenticate directly with the Kafka brokers. This authentication is provided via a client properties file specified with the `--command-config` flag on the command-line tool itself. For example, to run a command like the [consumer throttle script](https://github.com/confluentinc/cp-demo/tree/latest/scripts/app/throttle_consumer.sh), you must include this flag pointing to a file with the correct security credentials. This replaces the previous method of relying on a pre-configured `KAFKA_OPTS` environment variable on a broker container. Consequently, the command is no longer restricted to running on a specific container like `kafka1` or `kafka2` and can be executed from any machine that has the configuration file and network access to the brokers. 10. Next step: Learn more about security with the [Security Tutorial](../../security/security_tutorial.md#security-tutorial). ### Configure clients from the Confluent CLI For [Confluent CLI](https://docs.confluent.io/confluent-cli/current/overview.html) frequent users, once you have set up context in the CLI, you can use one-line command [confluent kafka client-config create](https://docs.confluent.io/confluent-cli/current/command-reference/kafka/client-config/index.html#confluent-kafka-client-config) to create a configuration file for connecting your client apps to Confluent Cloud. The following table lists client languages, corresponding language ID, and whether the language supports Confluent Cloud Schema Registry configuration. For languages that support Confluent Cloud Schema Registry configuration, you can optionally configure it for your client apps by passing Schema Registry information via the flags to the command. 
| Language | Language ID | Support for Confluent Cloud Schema Registry | Notes | |-------------|---------------|-----------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------| | Clojure | `clojure` | No | | | C/C++ | `cpp` | No | See examples: [C/C++ examples (librdkafka)](https://github.com/edenhill/librdkafka/tree/master/examples) | | C# | `csharp` | No | | | Go | `go` | Yes | See examples: [confluent-kafka-go/examples](https://github.com/confluentinc/confluent-kafka-go/tree/master/examples) | | Groovy | `groovy` | No | | | Java | `java` | Yes | | | Kotlin | `kotlin` | No | | | Ktor | `ktor` | Yes | | | JavaScript | `javascript` | Yes | See examples: [confluent-kafka-javascript/examples](https://github.com/confluentinc/confluent-kafka-javascript/tree/master/examples) | | Python | `python` | Yes | See examples: [confluent-kafka-python/examples](https://github.com/confluentinc/confluent-kafka-python/tree/master/examples) | | REST API | `restapi` | Yes | | | Ruby | `ruby` | No | | | Rust | `rust` | No | | | Scala | `scala` | No | | | Spring Boot | `springboot` | Yes | | Prerequisites: : - [Access to Confluent Cloud](https://www.confluent.io/confluent-cloud/) with an active cluster. - [Install the Confluent CLI](https://docs.confluent.io/confluent-cli/current/install.html). 1. Log in to your cluster using the [confluent login](https://docs.confluent.io/confluent-cli/current/command-reference/confluent_login.html) command with the cluster URL specified. ```none confluent login ``` ```none Enter your Confluent Cloud credentials: Email: susan@myemail.com Password: ``` 2. Set the Confluent Cloud [environment](../security/access-control/hierarchy/cloud-environments.md#cloud-environments). 1. Get the environment ID. ```none confluent environment list ``` Your output should resemble: ```none Id | Name +--------------+--------------------+ * t2703 | default env-abc123 | demo-env-102893 env-xyz123 | ccloud-demo env-wxy123 | data-lineage-demo env-abc12d | my-new-environment ``` 2. Set the environment using the ID (``). ```none confluent environment use ``` Your output should resemble: ```none Now using "env-xyz123" as the default (active) environment. ``` 3. Set the cluster to use. 1. Get the cluster ID. ```none confluent kafka cluster list ``` Your output should resemble: ```none Id | Name | Type | Cloud | Region | Availability | Status +-------------+-----------+-------+----------+----------+--------------+--------+ lkc-oymmj | cluster_1 | BASIC | gcp | us-east4 | single-zone | UP * lkc-7k6kj | cluster_0 | BASIC | gcp | us-east1 | single-zone | UP ``` 2. Set the cluster using the ID (``). This is the cluster where the commands are run. ```none confluent kafka cluster use ``` To verify the selected cluster after setting it, type `confluent kafka cluster list` again. The selected cluster will have an asterisk (`*`) next to it. 4. Create an API key and secret, and save them. You can generate the API key on the Confluent CLI or from the Confluent Cloud Console. Be sure to save the API key and secret. ### Confluent CLI 1. Run the following command to create the API key and secret, using the ID (``). > ```bash > confluent api-key create --resource > ``` > Your output should resemble: > ```none > It may take a couple of minutes for the API key to be ready. > Save the API key and secret. The secret is not retrievable later. 
> +---------+------------------------------------------------------------------+ > | API Key | ABC123xyz | > | Secret | 123xyzABC123xyzABC123xyzABC123xyzABC123xyzABC123xyzABC123xyzABCx | > +---------+------------------------------------------------------------------+ > ``` For more information, see [Use API Keys to Authenticate to Confluent Cloud](../security/authenticate/workload-identities/service-accounts/api-keys/overview.md#cloud-api-key-resource). ### Confluent Cloud Console > 1. In the console, click the **Kafka API keys** tab and click **Create key**. > Save the key and secret, then click the checkbox next to **I have saved my API key and secret > and am ready to continue.** > ![image](images/cloud-api-key-confirm.png) > 2. Add the API secret with `confluent api-key store `. When you create an API > key with the CLI, it is automatically stored locally. However, when you create an API key using > the console, API, or with the CLI on another machine, the secret is not available for CLI use until > you store it. This is required because secrets cannot be retrieved after creation. > ```bash > confluent api-key store --resource > ``` > For more information, see [Use API Keys to Authenticate to Confluent Cloud](../security/authenticate/workload-identities/service-accounts/api-keys/overview.md#cloud-api-key-resource). 5. Set the API key to use for Confluent CLI commands, using the ID (``). 1. Create a client configuration file for the language of your choice, using language ID (``). Then, copy and paste the displayed configuration into your client application source code. See [Client Language Table](#client-language-table) for a list of language IDs and whether the language supports Schema Registry configuration. - For languages that do NOT support Schema Registry configuration, run the following command: ```bash confluent kafka client-config create ``` - For languages that support Schema Registry configuration, run the following command: ```bash confluent kafka client-config create \ --schema-registry-api-key \ --schema-registry-api-secret ``` * For tips and recommendations for configuring resilient clients, see [Client Configuration Settings for Confluent Cloud](client-configs.md#client-producer-consumer-config-recs-cc). * For more information about using the CLI, see [confluent kafka client-config create](https://docs.confluent.io/confluent-cli/current/command-reference/kafka/client-config/index.html#confluent-kafka-client-config). ## Features The Amazon DynamoDB CDC Source connector includes the following features: * **IAM user authentication**: The connector supports authenticating to DynamoDB using IAM user access credentials. * **Provider integration support**: The connector supports IAM role-based authorization using Confluent Provider Integration. For more information about provider integration setup, see the [IAM roles authentication](#cc-amazon-dynamodb-cdc-source-setup-connection). * **Customizable API endpoints**: The connector allows you to specify an AWS DynamoDB API and Resource Group Tag API endpoint. * **Kafka cluster authentication customization**: The connector supports authenticating a Kafka cluster using API keys and/or service accounts. * **Snapshot mode customization**: The connector allows you to configure either of the following modes for snapshots: - **SNAPSHOT**: Only allows a one-time scan (Snapshot) of the existing data in the source tables simultaneously. 
- **CDC**: Only allows CDC with DynamoDB streams, without an initial snapshot, for all streams simultaneously. - **SNAPSHOT_CDC** (Default): Allows an initial snapshot of all configured tables and, once the snapshot is complete, starts CDC streaming using DynamoDB streams. * **Seamless table streaming**: The connector supports the following two modes to provide seamless table streaming: - **TAG_MODE**: Auto-discover multiple DynamoDB tables and stream them simultaneously (that is, `dynamodb.table.discovery.mode` is set to `TAG`). - **INCLUDELIST_MODE**: Explicitly specify multiple DynamoDB table names and stream them simultaneously (that is, `dynamodb.table.discovery.mode` is set to `INCLUDELIST`). * **Automatic topic creation**: The connector supports the auto-creation of topics with the name of the table, with a customer-provided prefix and suffix using [TopicRegexRouter Single Message Transformation (SMT)](/platform/current/connect/transforms/topicregexrouter.html). * **Supported data formats**: The connector supports Avro, Protobuf, and JSON Schema output formats. [Schema Registry](../get-started/schema-registry.md#cloud-sr-config) must be enabled to use a Schema Registry-based format (for example, Avro, JSON Schema, or Protobuf). For more information, see [Schema Registry Enabled Environments](limits.md#connect-ccloud-environment-limits). * **Schema management**: The connector supports Schema Registry, Schema Context and Reference Subject Naming Strategy. * **AWS DynamoDB scanning capabilities**: The connector includes the following scanning capabilities: - **Parallel Scans**: A DynamoDB table can be logically divided into multiple segments. The connector will divide the table into five logical segments and scan these segments in parallel. - **Pagination**: Tables are scanned sequentially, and a scan result’s response can fetch no more than 1 MB of data. Since tables can be large, scan request responses are paginated. With each response, a `LastEvaluatedKey` is returned. This `LastEvaluatedKey` from a scan response should be used as the `ExclusiveStartKey` for the next scan request. If no `LastEvaluatedKey` is returned, it indicates that the end of the result set has been reached. - **Non-isolated scans**: To scan an entire table, the task continues making multiple subsequent scan requests by submitting the appropriate `exclusiveStartKey` with each request. For a large table, this process may take hours, and the snapshot will capture items as they are at the time a scan request is made, and not from when the snapshot operation was started. - **Eventual Consistency**: By default, a scan uses eventually consistent reads when accessing items in a table. Therefore, the results of a scan may not reflect the latest item changes at the time the scan iterates through each item in the table. A snapshot, on the other hand, only captures data from committed transactions. As a result, a scan will not include data from ongoing uncommitted transactions. Additionally, a snapshot does not need to manage or track any ongoing transactions on a DynamoDB table. * **Custom offset support**: The connector allows you to configure [custom offsets](offsets.md#connect-custom-offsets) using the Confluent Cloud user interface to prevent data loss and data duplication. * **Tombstone event and record deletion management**: The connector allows you to manage tombstone events and deleted records.
Note that when the connector detects a delete event, it creates two event messages: - A delete event message with `op` type `d` and a `document` field containing the table primary key: ```json { "op": "d", "key": { "id": "5028" }, "value": { "document": "5028" } } ``` - A tombstone record with the Kafka record key set to the table primary key value and the Kafka record value set to `null`: ```json { "key": { "id": "5028" }, "value": null } ``` Note that Kafka log compaction uses this to know that it can delete all messages for this key. **Tombstone message sample** ```json { "topic": "table1", "key": { "id": "5028" }, "value": null, "partition": 0, "offset": 1 } ``` * **Lease table prefix customization**: The connector supports naming lease tables with a prefix. ### Using HTTPS Requests You can communicate with your hosted ksqlDB cluster by using the [ksqlDB REST API](/platform/current/ksqldb/developer-guide/ksqldb-rest-api/index.html). Run the following `curl` command to send a POST request to the `ksql` endpoint. In this example, the request runs the LIST STREAMS statement and the response contains details about the streams in the ksqlDB cluster. - Use the `--basic` flag to specify HTTP basic authentication, and set the **Accept** header of your request to `application/vnd.ksql.v1+json`. - Send your ksqlDB-specific API key and secret, separated by a colon, as the `--user` credentials. ```bash curl --http1.1 \ -X "POST" "https:///ksql" \ -H "Accept: application/vnd.ksql.v1+json" \ -H "Content-Type: application/json" \ --basic --user ":" \ -d $'{ "ksql": "LIST STREAMS;", "streamsProperties": {} }' ``` Your output should resemble: ```json [ { "@type": "streams", "statementText": "LIST STREAMS;", "streams": [ { "type": "STREAM", "name": "KSQL_PROCESSING_LOG", "topic": "pksqlc-zz321-processing-log", "keyFormat": "KAFKA", "valueFormat": "JSON", "isWindowed": false } ], "warnings": [] } ] ``` For more information, see [ksqlDB API](/platform/current/ksqldb/developer-guide/ksqldb-rest-api/index.html). For an example that shows fully-managed Confluent Cloud connectors in action with Confluent Cloud for Apache Flink, see the [Cloud ETL Demo](/platform/current/tutorials/examples/cloud-etl/docs/index.html). This example also shows how to use the Confluent CLI to manage your resources in Confluent Cloud. [![image](images/topology.png)](https://docs.confluent.io/platform/current/tutorials/examples/cloud-etl/docs/index.html) ## Flags ```none --bootstrap string Kafka cluster endpoint (Confluent Cloud) or a comma-separated list of broker hosts, each formatted as "host" or "host:port" (Confluent Platform). --group string Consumer group ID. (default "confluent_cli_consumer_") -b, --from-beginning Consume from beginning of the topic. --offset int The offset from the beginning to consume from. --partition int32 The partition to consume from. (default -1) --key-format string Format of message key as "string", "avro", "double", "integer", "jsonschema", or "protobuf". Note that schema references are not supported for Avro. (default "string") --value-format string Format message value as "string", "avro", "double", "integer", "jsonschema", or "protobuf". Note that schema references are not supported for Avro. (default "string") --print-key Print key of the message. --print-offset Print partition number and offset of the message. --full-header Print complete content of message headers. --delimiter string The delimiter separating each key and value. (default "\t") --timestamp Print message timestamp in milliseconds.
--config strings A comma-separated list of configuration overrides ("key=value") for the consumer client. For a full list, see https://docs.confluent.io/platform/current/clients/librdkafka/html/md_CONFIGURATION.html --config-file string The path to the configuration file for the consumer client, in JSON or Avro format. --schema-registry-endpoint string Endpoint for Schema Registry cluster. --api-key string API key. --api-secret string API secret. --schema-registry-context string The Schema Registry context under which to look up schema ID. --schema-registry-api-key string Schema registry API key. --schema-registry-api-secret string Schema registry API secret. --cluster string Kafka cluster ID. --context string CLI context name. --environment string Environment ID. --certificate-authority-path string File or directory path to one or more Certificate Authority certificates for verifying the broker's key with SSL. --username string SASL_SSL username for use with PLAIN mechanism. --password string SASL_SSL password for use with PLAIN mechanism. --cert-location string Path to client's public key (PEM) used for SSL authentication. --key-location string Path to client's private key (PEM) used for SSL authentication. --key-password string Private key passphrase for SSL authentication. --protocol string Specify the broker communication protocol as "PLAINTEXT", "SASL_SSL", or "SSL". (default "SSL") --sasl-mechanism string SASL_SSL mechanism used for authentication. (default "PLAIN") --client-cert-path string File or directory path to client certificate to authenticate the Schema Registry client. --client-key-path string File or directory path to client key to authenticate the Schema Registry client. ``` ## Control Center features Control Center includes the following pages where you can drill down to view data and configure features in your Kafka environment. The following table lists Control Center pages and what they display depending on the mode for Confluent Control Center. | Control Center feature | Normal mode | Reduced infrastructure mode | |--------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | [Clusters overview](clusters.md#controlcenter-userguide-clusters) | View healthy and unhealthy clusters at a glance and search for a cluster being managed by Control Center. Click on a cluster tile to drill into views of critical metrics and connected services for that cluster. | View healthy and unhealthy clusters, the number of topics, and connected services. | | [Brokers overview](brokers.md#controlcenter-userguide-brokers) | View broker partitioning and replication status, which broker is the active controller, and broker metrics like throughput and more. | Same as Normal mode. To access the Brokers page in Reduced infrastructure mode, Use the **Brokers** navigation menu entry. | | [Topics](topics/overview.md#c3-all-topics) | Add and edit topics, view production and consumption metrics for a topic. Browse, create, and download messages, and manage Schema Registry for topics. 
| Add and edit topics. Browse, create, download messages, and manage Schema Registry for topics. Note that internal topics are not created in Reduced infrastructure mode. | | [Connect](connect.md#controlcenter-userguide-connect) | Manage, monitor, and configure connectors with [Kafka Connect](/platform/current/connect/index.html#kafka-connect), the toolkit for connecting external systems to Kafka. | Same as Normal mode. | | [ksqlDB](ksql.md#controlcenter-userguide-ksql) | Develop applications against ksqlDB, the streaming SQL engine for Kafka. Use the ksqlDB page in Control Center to: run, view, and terminate SQL queries; browse and download messages from query results; add, describe, and drop streams and tables; and view schemas of available streams and tables in a cluster. | Same as Normal mode. | | [Consumers](clients/consumers.md#controlcenter-userguide-consumers) | View the consumer groups associated with a selected Kafka cluster, including the number of consumers per group, the number of topics being consumed, and consumer lag across all relevant topics. The Consumers feature also contains the redesigned streams monitoring page. | Same as Normal mode. | | [Replicators](replicators.md#controlcenter-userguide-replicators) | Monitor and configure replicated topics and create replica topics that preserve topic configuration in the source cluster. | Configure replicated topics and create replica topics that preserve topic configuration in the source cluster. | | [Cluster Settings](clusters.md#controlcenter-userguide-cluster-settings) | View and edit cluster properties and broker configurations. | Same as Normal mode. | | [Alerts](alerts/concepts.md#concepts-alerts) | Use Alerts to define the trigger criteria for anomalous events that occur during data monitoring and to trigger an alert when those events occur. Set triggers, actions, and view alert history across all of your Control Center clusters. | Use Alerts to define a limited set of triggers that do not rely on monitoring and/or metrics data (Cluster down, consumer lag and consumer lead). | ### Initialization The Consumer is configured using a dictionary in the examples below. If you are running Kafka locally, you can initialize the Consumer as shown below. ```python from confluent_kafka import Consumer conf = {'bootstrap.servers': 'host1:9092,host2:9092', 'group.id': 'foo', 'auto.offset.reset': 'smallest'} consumer = Consumer(conf) ``` If you are connecting to a Kafka cluster in Confluent Cloud, you need to provide credentials for access. The example below shows using a cluster API key and secret. ```python from confluent_kafka import Consumer conf = {'bootstrap.servers': 'pkc-abcd85.us-west-2.aws.confluent.cloud:9092', 'security.protocol': 'SASL_SSL', 'sasl.mechanism': 'PLAIN', 'sasl.username': '', 'sasl.password': '', 'group.id': 'foo', 'auto.offset.reset': 'smallest'} consumer = Consumer(conf) ``` The `group.id` property is mandatory and specifies which consumer group the consumer is a member of. The `auto.offset.reset` property specifies what offset the consumer should start reading from in the event there are no committed offsets for a partition, or the committed offset is invalid (perhaps due to log truncation).
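Once a consumer is initialized, it is typically driven by a poll loop. The following is a minimal sketch, not taken from the client reference, that reuses the local configuration from the examples above and a hypothetical topic name; it polls for records, prints them, and closes the consumer cleanly:

```python
from confluent_kafka import Consumer

# Same local configuration as the first example above.
conf = {'bootstrap.servers': 'host1:9092,host2:9092',
        'group.id': 'foo',
        'auto.offset.reset': 'smallest'}

consumer = Consumer(conf)
consumer.subscribe(["my_topic"])  # hypothetical topic name

try:
    while True:
        msg = consumer.poll(timeout=1.0)  # wait up to 1 second for a record
        if msg is None:
            continue
        if msg.error():
            # Report delivery/consumption errors; handle or log as appropriate.
            print("Consumer error:", msg.error())
            continue
        print(f"{msg.topic()}[{msg.partition()}]@{msg.offset()}: {msg.value()}")
finally:
    consumer.close()  # leave the consumer group and release resources
```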
The local example below shows `enable.auto.commit` configured to `false` in the consumer. The default value is `True`. ```python from confluent_kafka import Consumer conf = {'bootstrap.servers': 'host1:9092,host2:9092', 'group.id': 'foo', 'enable.auto.commit': 'false', 'auto.offset.reset': 'earliest'} consumer = Consumer(conf) ``` * For information on the available configuration properties, refer to the [API Documentation](/platform/current/clients/confluent-kafka-python/html/index.html). * For a step-by-step tutorial using the Python client, including code samples for the producer and consumer, see [this guide](https://developer.confluent.io/get-started/python/). #### OAuth2 authentication example It is important to note that the connector’s OAuth2 configuration only allows for use of the Client Credentials grant type. 1. Run the demo app with the `oauth2` Spring profile. ```bash mvn spring-boot:run -Dspring.profiles.active=oauth2 ``` 2. Create a `http-sink.properties` file with the following contents: ```text name=HttpSinkOAuth2 topics=http-messages tasks.max=1 connector.class=io.confluent.connect.http.HttpSinkConnector # key/val converters key.converter=org.apache.kafka.connect.storage.StringConverter value.converter=org.apache.kafka.connect.storage.StringConverter # licensing for local single-node Kafka cluster confluent.topic.bootstrap.servers=localhost:9092 confluent.topic.replication.factor=1 # connect reporter required bootstrap server reporter.bootstrap.servers=localhost:9092 reporter.result.topic.name=success-responses reporter.result.topic.replication.factor=1 reporter.error.topic.name=error-responses reporter.error.topic.replication.factor=1 # http sink connector configs http.api.url=http://localhost:8080/api/messages auth.type=OAUTH2 oauth2.token.url=http://localhost:8080/oauth/token oauth2.client.id=kc-client oauth2.client.secret=kc-secret ``` For details about using this connector with Kafka Connect Reporter, see [Connect Reporter](/kafka-connectors/self-managed/userguide.html#userguide-connect-reporter). 3. Run and validate the connector as described in the [Quick start](#http-connector-quickstart). ## Quick Start In this quick start guide, the Marketo Source connector is used to consume records from the Marketo entities `leads`, `campaigns`, and `activities` (activity types `activities_add_to_nurture` and `activities_add_to_opportunity`), and send the records to the respective Kafka topics named `marketo_leads`, `marketo_campaigns`, and `marketo_activities`. 1. Install the connector through the [Confluent Hub Client](https://docs.confluent.io/current/connect/managing/confluent-hub/client.html). ```bash # run from your confluent platform installation directory confluent connect plugin install confluentinc/kafka-connect-marketo:latest ``` 2. Start Confluent Platform. ```bash confluent local start ``` 3. Check the status of all services. ```bash confluent local status ``` 4. Configure your connector by first creating a JSON file named `marketo-configs.json` with the following properties. Find the REST API endpoint URL from the process described in [Marketo REST API Quickstart](https://developers.marketo.com/blog/quick-start-guide-for-marketo-rest-api/). This endpoint URL will be used in the `marketo.url` configuration key (as shown in the following example) of the connector, but ensure you remove the path `rest` from the endpoint URL before using it in connector configurations. To determine the OAuth client ID and OAuth client secret, see [Marketo REST API Quickstart](https://developers.marketo.com/blog/quick-start-guide-for-marketo-rest-api/).
`tasks.max` should be 3 here as there are three entity types: `leads`, `campaigns` and `activities`. ```bash // substitute <> with your config { "name": "marketo-connector", "config": { "connector.class": "io.confluent.connect.marketo.MarketoSourceConnector", "key.converter": "org.apache.kafka.connect.storage.StringConverter", "value.converter": "org.apache.kafka.connect.json.JsonConverter", "value.converter.schemas.enable": "false", "confluent.topic.bootstrap.servers": "127.0.0.1:9092", "confluent.topic.replication.factor": 1, "confluent.license": "", // leave it empty for evaluation license "tasks.max": 3, "poll.interval.ms": 1000, "topic.name.pattern": "marketo_${entityName}", "marketo.url": "https://.mktorest.com/", "marketo.since": "2020-07-01T00:00:00+00:00", "entity.names": "activities_add_to_nurture,activities_add_to_opportunity,campaigns,leads", "oauth2.client.id": "", "oauth2.client.secret": "" } } ``` 5. Start the Marketo Source connector by loading the connector’s configuration with the following command: ```bash confluent local load marketo-connector -- -d marketo-configs.json ``` 6. Confirm that the connector is in a `RUNNING` state. ```bash confluent local status marketo-connector ``` 7. Create some `leads`, `activities` and `campaigns` records using [Marketo APIs](https://developers.marketo.com/rest-api/endpoint-reference/). Use POST or Bulk Import APIs of appropriate entities to inject some sample records. 8. Confirm the messages from entities `leads`, `activities`, and `campaigns` were delivered to the `marketo_leads`, `marketo_activities` and `marketo_campaigns` topics respectively, in Kafka. Note, it may take about a minute for assets (`campaigns`) and about 5 minutes or more (depending upon the time Marketo server instance takes to prepare the export file) for export entities (`leads` and `activities`). ```bash confluent local consume marketo_leads -- --from-beginning ``` #### Connector configuration 1. 
Create your `oracle-cdc-confluent-cloud.json` file based on the following example: ```json { "name": "OracleCDC_Confluent_Cloud", "config":{ "connector.class": "io.confluent.connect.oracle.cdc.OracleCdcSourceConnector", "name": "OracleCDC_Confluent_Cloud", "tasks.max":3, "oracle.server": "", "oracle.sid":"", "oracle.pdb.name":"", "oracle.username": "", "oracle.password": "", "start.from":"snapshot", "redo.log.topic.name": "oracle-redo-log-topic", "redo.log.consumer.bootstrap.servers":"", "redo.log.consumer.sasl.jaas.config":"org.apache.kafka.common.security.plain.PlainLoginModule required username=\"\" password=\"\";", "redo.log.consumer.security.protocol":"SASL_SSL", "redo.log.consumer.sasl.mechanism":"PLAIN", "table.inclusion.regex":"", "_table.topic.name.template_":"Using template vars to set change event topic for each table", "table.topic.name.template": "${databaseName}.${schemaName}.${tableName}", "connection.pool.max.size": 20, "confluent.topic.replication.factor":3, "topic.creation.groups":"redo", "topic.creation.redo.include":"oracle-redo-log-topic", "topic.creation.redo.replication.factor":3, "topic.creation.redo.partitions":1, "topic.creation.redo.cleanup.policy":"delete", "topic.creation.redo.retention.ms":1209600000, "topic.creation.default.replication.factor":3, "topic.creation.default.partitions":5, "topic.creation.default.cleanup.policy":"compact", "confluent.topic.bootstrap.servers":"", "confluent.topic.sasl.jaas.config":"org.apache.kafka.common.security.plain.PlainLoginModule required username=\"\" password=\"\";", "confluent.topic.security.protocol":"SASL_SSL", "confluent.topic.sasl.mechanism":"PLAIN", "value.converter":"io.confluent.connect.avro.AvroConverter", "value.converter.basic.auth.credentials.source":"USER_INFO", "value.converter.schema.registry.basic.auth.user.info":":", "value.converter.schema.registry.url":"" } } ``` 2. Create `oracle-redo-log-topic`. Make sure the topic name matches the value you put for `"redo.log.topic.name"`. **Confluent Platform CLI** ```text bin/kafka-topics --create --topic oracle-redo-log-topic \ --bootstrap-server broker:9092 --replication-factor 1 \ --partitions 1 --config cleanup.policy=delete \ --config retention.ms=120960000 ``` **Confluent Cloud CLI** ```text confluent kafka topic create oracle-redo-log-topic \ --partitions 1 --config cleanup.policy=delete \ --config retention.ms=120960000 ``` 3. Start the Oracle CDC Source connector with the following command: ```text curl -s -H "Content-Type: application/json" -X POST -d @oracle-cdc-confluent-cloud.json http://localhost:8083/connectors/ | jq ``` ## Quick Start The quick start guide uses ServiceNow Sink connector to consume records from Kafka and send them to a ServiceNow table. This guide assumes multi-tenant environment is used. For local testing, refer to [Running Connect in standalone mode](/kafka-connectors/self-managed/userguide.html#configuring-and-running-workers). 1. Create a table called `test_table` in ServiceNow. ![image](images/servicenow_create_table.png) 2. Define three columns in the table. ![image](images/servicenow_define_columns.png) 3. Install the connector through the [Confluent Hub Client](https://docs.confluent.io/current/connect/managing/confluent-hub/client.html). ```bash # run from your confluent platform installation directory confluent-hub install confluentinc/kafka-connect-servicenow:latest ``` 4. Start the Confluent Platform. ```bash confluent local start ``` 5. Check the status of all services. ```bash confluent local services status ``` 6. 
Create a `servicenow-sink.json` file with the following contents: #### NOTE All user-defined tables in ServiceNow start with `u_`, ```bash // substitute <> with your config { "name": "ServiceNowSinkConnector", "config": { "connector.class": "io.confluent.connect.servicenow.ServiceNowSinkConnector", "topics": "test_table", "servicenow.url": "https://.service-now.com/", "tasks.max": "1", "servicenow.table": "u_test_table", "servicenow.user": "", "servicenow.password": "", "key.converter": "io.confluent.connect.avro.AvroConverter", "key.converter.schema.registry.url": "http://localhost:8081", "value.converter": "io.confluent.connect.avro.AvroConverter", "value.converter.schema.registry.url": "http://localhost:8081", "confluent.topic.bootstrap.servers": "localhost:9092", "confluent.license": "", // leave it empty for evaluation license "confluent.topic.replication.factor": "1", "reporter.bootstrap.servers": "localhost:9092", "reporter.error.topic.name": "test-error", "reporter.error.topic.replication.factor": 1, "reporter.error.topic.key.format": "string", "reporter.error.topic.value.format": "string", "reporter.result.topic.name": "test-result", "reporter.result.topic.key.format": "string", "reporter.result.topic.value.format": "string", "reporter.result.topic.replication.factor": 1 } } ``` #### NOTE For details about using this connector with Kafka Connect Reporter, see [Connect Reporter](/kafka-connectors/self-managed/userguide.html#userguide-connect-reporter). 7. Load the ServiceNow Sink connector by posting configuration to Connect REST server. ```bash confluent local load ServiceNowSinkConnector --config servicenow-sink.json ``` 8. Confirm that the connector is in a `RUNNING` state. ```bash confluent local status ServiceNowSinkConnector ``` 9. To produce some records into the `test_table` topic, first start a Kafka producer. #### NOTE All user-defined columns in ServiceNow start with `u_` ```bash kafka-avro-console-producer \ --broker-list localhost:9092 --topic test_table \ --property value.schema='{"type":"record","name":"myrecord","fields":[{"name":"u_name","type":"string"}, {"name":"u_price", "type": "float"}, {"name":"u_quantity", "type": "int"}]}' ``` 10. The console producer is now waiting for input, so you can go ahead and insert some records into the topic. ```json {"u_name": "scissors", "u_price": 2.75, "u_quantity": 3} {"u_name": "tape", "u_price": 0.99, "u_quantity": 10} {"u_name": "notebooks", "u_price": 1.99, "u_quantity": 5} ``` 11. Confirm the messages were delivered to the ServiceNow table by using the ServiceNow user interface. ![image](images/servicenow_result.png) ## Quick Start This quick start uses the Solace Source connector to consume records from a Solace PubSub+ Standard broker and send them to Kafka. 1. Install the connector through the [Confluent Hub Client](https://docs.confluent.io/current/connect/managing/confluent-hub/client.html). ```bash # run from your Confluent Platform installation directory confluent connect plugin install confluentinc/kafka-connect-solace-source:latest ``` 2. [Install the Solace JMS Client Library](#install-solace-client-jar). 3. Start the Confluent Platform. ```bash confluent local start ``` 4. Start a Solace PubSub+ Standard docker container. ```bash docker run -d --name "solace" \ -p 8080:8080 -p 55555:55555 -p 9000:9000 \ --shm-size=1000000000 \ --tmpfs /dev/shm \ --ulimit nofile=2448:38048 \ -e username_admin_globalaccesslevel=admin \ -e username_admin_password=admin \ solace/solace-pubsub-standard:9.1.0.77 ``` 5. 
Once the Solace Docker container has started, navigate to the [Solace UI](http://localhost:8080) and configure a `connector-quickstart` queue in the `Default` message VPN. 6. Publish messages to the Solace queue using the REST endpoint. ```bash curl -X POST -d "m1" http://localhost:9000/Queue/connector-quickstart -H "Content-Type: text/plain" -H "Solace-Message-ID: 1000" # repeat the above command to send additional messages (change the Solace-Message-ID header on each message) ``` 7. Create a `solace-source.json` file with the following contents: ```json { "name": "SolaceSourceConnector", "config": { "connector.class": "io.confluent.connect.solace.SolaceSourceConnector", "tasks.max": "1", "kafka.topic": "from-solace-messages", "solace.host": "smf://localhost:55555", "solace.username": "admin", "solace.password": "admin", "jms.destination.type": "queue", "jms.destination.name": "connector-quickstart", "key.converter": "org.apache.kafka.connect.storage.StringConverter", "value.converter": "io.confluent.connect.avro.AvroConverter", "value.converter.schema.registry.url": "http://localhost:8081", "confluent.topic.bootstrap.servers": "localhost:9092", "confluent.topic.replication.factor": "1" } } ``` 8. Load the Solace Source connector. ```bash confluent local load solace --config solace-source.json ``` 9. Confirm the connector is in a `RUNNING` state. ```bash confluent local status SolaceSourceConnector ``` 10. Confirm the messages were delivered to the `from-solace-messages` topic in Kafka. ```bash kafka-avro-console-consumer --bootstrap-server localhost:9092 --property schema.registry.url=http://localhost:8081 --topic from-solace-messages --from-beginning ``` ## Quick Start The following steps show the `SpoolDirCsvSourceConnector` loading a mock CSV file to a Kafka topic named `spooldir-testing-topic`. The other connectors are similar but load from different file types. Prerequisites : - [Confluent Platform](/platform/current/installation/index.html) - [Confluent CLI](https://docs.confluent.io/confluent-cli/current/installing.html) (requires separate installation) 1. Install the connector through the [Confluent Hub Client](/kafka-connectors/self-managed/confluent-hub/client.html). ```bash # run from your Confluent Platform installation directory confluent connect plugin install jcustenborder/kafka-connect-spooldir:latest ``` 2. Start Confluent Platform using the Confluent CLI [confluent local](https://docs.confluent.io/confluent-cli/current/command-reference/local/index.html) commands. ```bash confluent local start ``` 3. Create a data directory and generate test data. ```bash mkdir data && curl "https://api.mockaroo.com/api/58605010?count=1000&key=25fd9c80" > "data/csv-spooldir-source.csv" ``` 4. Set up directories for files with errors and files that finished successfully. ```bash mkdir error && mkdir finished ``` 5. Create a `spooldir.json` file with the following contents: ```json { "name": "CsvSpoolDir", "config": { "tasks.max": "1", "connector.class": "com.github.jcustenborder.kafka.connect.spooldir.SpoolDirCsvSourceConnector", "input.path": "/path/to/data", "input.file.pattern": "csv-spooldir-source.csv", "error.path": "/path/to/error", "finished.path": "/path/to/finished", "halt.on.error": "false", "topic": "spooldir-testing-topic", "csv.first.row.as.header": "true", "schema.generation.enabled": "true" } } ``` 6. Load the SpoolDir CSV Source connector.
```bash confluent local load spooldir --config spooldir.json ``` #### IMPORTANT Don’t use the [confluent local](https://docs.confluent.io/confluent-cli/current/command-reference/local/index.html) commands in production environments. 7. Confirm that the connector is in a `RUNNING` state. ```bash confluent local status spooldir ``` 8. Confirm that the messages are being sent to Kafka. ```bash kafka-avro-console-consumer \ --bootstrap-server localhost:9092 \ --property schema.registry.url=http://localhost:8081 \ --topic spooldir-testing-topic \ --from-beginning | jq '.' ``` 9. Confirm that the source CSV file has been moved to the `finished` directory. #### NOTE Mounting custom volumes does not support multiple PersistentVolumes for ZooKeeper and Kafka data. CFK configures and manages one PersistentVolume for ZooKeeper and Kafka data. The following are a few of the common use cases for custom volume mounts: * Third-party secrets providers As an alternative to using Kubernetes secrets to secure sensitive information, you can use a vault product like HashiCorp Vault, AWS Secrets Manager, and Azure KeyVault. You integrate a third-party secrets provider by configuring an ephemeral volume mount for the Confluent component pod that takes the credentials from the secrets provider. * Kafka connectors Some Kafka connectors require JARs, that are outside of the Connect plugin but need to be available to the Connect pods. You can create persistent volumes with the connector JARs and mount them on the Connect worker pods. * Multiple custom partitions For example, you could write logs to a separate persistent volume of your choice. In CFK, you mount custom volumes to Confluent component pods by defining custom volume mounts in the component custom resources (CRs), such as for Kafka, ZooKeeper, Control Center, Schema Registry, ksqlDB, Connect, and Kafka REST Proxy. The same volume will be mounted on all the pods in the component cluster in the specified paths. To mount custom volumes to a Confluent Platform component: 1. Configure the volumes according to the driver specification. 2. Add the following to the Confluent Platform component CR: ```yaml spec: mountedVolumes: --- [1] volumes: --- [2] volumeMounts: --- [3] ``` * [1] `mountedVolumes` is an array of the `volumes` and `volumeMounts` that are requested for this component. * [2] Required. `volumes` is an array of named volumes in a pod that may be accessed by any container in the pod. For the supported volume types and the specific configuration properties required for each volume type, see [Kubernetes Volume Types](https://kubernetes.io/docs/concepts/storage/volumes/#volume-types) for the supported volume types. * [3] Required. Describes mounting paths of the `volumes` within this container. For the configuration properties for volume mount, see [Kubernetes Pod volumeMounts](https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#volumes-1). 3. Apply the CR using the `kubectl apply` command. Before the volumes and volume mounts are added to the component pod template, CFK performs a validation to ensure that there is no conflict with internal volume mounts. Reconcile will fail, and the error will be added to the CFK logs in the following cases: * A custom volume’s mount path conflicts with an internal mount path. 
These are the internal mounts used by Confluent Platform components:

  * `/mnt/config`
  * `/mnt/config/init`
  * `/mnt/config/shared`
  * `/mnt/data/data0`
  * `/mnt/plugins`
  * `/opt/confluentinc`

* A custom volume’s mount path conflicts with a custom-mounted secret.
* There is a conflict between the custom volume names or custom volume mount paths.

The following example mounts an Azure file volume and a HashiCorp Vault volume using a SecretProviderClass and a CSI driver:

```yaml
apiVersion: platform.confluent.io/v1beta1
kind: Kafka
spec:
  mountedVolumes:
    volumes:
      - name: azure
        azureFile:
          secretName: azure-secret
          shareName: aksshare
          readOnly: true
      - name: secrets-store-inline
        csi:
          driver: secrets-store.csi.k8s.io
          readOnly: true
          volumeAttributes:
            secretProviderClass: "vault-database"
    volumeMounts:
      - name: azure
        mountPath: /mnt/azurePath1
      - name: azure
        mountPath: /mnt/azurePath2
      - name: secrets-store-inline
        mountPath: "/mnt/secrets-store"
        readOnly: true
```

## External access to other Confluent Platform components using load balancers

External clients can connect to other Confluent Platform components using load balancers. The access endpoint of each Confluent Platform component is: `.`

For example, in the `example.com` domain with TLS enabled, you access the Confluent Platform components at the following endpoints:

* `https://connect.example.com`
* `https://replicator.example.com`
* `https://schemaregistry.example.com`
* `https://ksql.example.com`
* `https://controlcenter.example.com`
* `https://kafkarestproxy.example.com`

**To allow external access to these Confluent Platform components using load balancers:**

1. Set the following in the component CR and apply the configuration:

   ```yaml
   spec:
     externalAccess:
       type: loadBalancer
       loadBalancer:
         domain:                 --- [1]
         prefix:                 --- [2]
         sessionAffinity:        --- [3]
         sessionAffinityConfig:  --- [4]
           clientIP:
             timeoutSeconds:     --- [5]
   ```

   * [1] Required. Set `domain` to the domain name of your Kubernetes cluster. If you change this value on a running cluster, you must roll the cluster.
   * [2] Optional. Set `prefix` to change the default load balancer prefixes. The default is the component name, such as `controlcenter`, `connect`, `replicator`, `schemaregistry`, `ksql`. The value is used for the DNS entry. The component DNS name becomes `.`. If not set, the default DNS name is `.`, for example, `controlcenter.example.com`. You may want to change the default prefixes for each component to avoid DNS conflicts when running multiple Kafka clusters. If you change this value on a running cluster, you must roll the cluster.
   * [3] Required for REST Proxy when it is used for Kafka consumers, to enable client IP-based session affinity. In that case, set it to `ClientIP`. See [Kubernetes Service](https://kubernetes.io/docs/concepts/services-networking/service/#virtual-ips-and-service-proxies) for more information about session affinity.
   * [4] Contains the session affinity configuration when `sessionAffinity: ClientIP` is set in [3].
   * [5] Specifies the session sticky time, in seconds, for the `ClientIP` type. The value must be greater than `0` and less than or equal to `86400` (1 day). The default value is `10800` (3 hours).

2. Add a DNS entry for each Confluent Platform component that you added a load balancer to. Once the external load balancers are created, you add a DNS entry associated with the component load balancers to your DNS table (or whatever method you use to get DNS entries recognized by your provider environment).
You need the following to derive Confluent Platform component DNS entries: * Domain name of your Kubernetes cluster as set in Step #1 * The external IP of the component load balancers You can retrieve the external IP using the following command: ```bash kubectl get services -n -ojson ``` * The component `prefix` if set in Step #1 above. Otherwise, the default component name. A DNS name is made up of the `prefix` and the `domain` name. For example, `controlcenter.example.com`. For a tutorial scenario on configuring external access using load balancers, see the [quickstart tutorial for using load balancer](https://github.com/confluentinc/confluent-kubernetes-examples/tree/master/networking/external-access-load-balancer-deploy). ## Create a rolebinding 1. Create a ConfluentRoleBinding CR. The following is the structure of the CR: ```yaml kind: ConfluentRolebinding metadata: name: namespace: spec: principal: --- [1] type: --- [2] name: --- [3] role: --- [4] resourcePatterns: --- [5] - name: --- [6] resourceType: --- [7] patternType: --- [8] clustersScopeByIds: --- [9] kafkaClusterId: --- [9a] schemaRegistryClusterId: --- [9b] connectClusterId: --- [9c] ksqlClusterId: --- [9d] clustersScopeByRegistryName:--- [10] kafkaRestClassRef: --- [11] name: --- [12] namespace: --- [13] ``` * [1] Required. The identity of a user or group this rolebinding is created for. * [2] Required. The type of the principal. Set it to `user` or `group`. * [3] Required. The name of the principal. * [4] Required. Predefined role name. For the predefined roles you can use, refer to [Confluent RBAC Predefined Roles](https://docs.confluent.io/platform/current/security/rbac/rbac-predefined-roles.html). * [5] Optional. Qualified resources associated with this rolebinding. * [6] Required. The name of the resource associated with this rolebinding. This setting cannot be updated. When you update this resource name, a new rolebinding is created. * [7] Required. The type of resource the role binding is applied to. Valid options are `Topic`, `Group`, `Subject`, `KsqlCluster`, `Cluster`, and `TransactionalId`. For more information about the RBAC resource, see [Authorization using Role-Based Access Control](https://docs.confluent.io/platform/current/security/rbac/index.html#terminology). * [8] Optional. Specify whether the pattern of resource is `PREFIXED` or `LITERAL`. The default is `LITERAL` if not set. * [9] Optional. The scope of the cluster id. You can specify a cluster name ([10]) or one scope id among `kafkaClusterId`, `schemaRegistryClusterId`, `ksqlClusterId`, and `connectClusterId`. * [9a] Get the Kafka cluster ID using one of the following commands: ```bash confluent cluster describe --url https:// ``` ```bash curl -ik https:///v1/metadata/id ``` * [9b] Schema Registry cluster id is in the following pattern: `id__` * [9c] Connect cluster id is in the following pattern: `.` * [9d] ksqlDB cluster id is in the following pattern: `.` * [10] Optional. The cluster name registered in the [cluster registry](https://docs.confluent.io/platform/current/security/cluster-registry.html#cluster-registry-and-mds), which uniquely identifies the cluster for this rolebinding. * [11] Optional. The KafkaRestClass CR that defines configurations for the Confluent REST Class. If it is not configured, the default KafkaRestClass is used. * [12] Required under `kafkaRestClassRef` ([11]). The name of the KafkaRestClass CR. * [13] Optional. If omitted, the same namespace of this ConfluentRoleBinding CR is used. 2. 
Apply the ConfluentRolebinding CR:

```bash
kubectl apply -f 
```

The following example shows how a Confluent CLI command to create a role binding is translated to a ConfluentRolebinding CR:

```bash
confluent iam rbac role-binding create --principal User: \
  --role DeveloperRead --resource Subject:* \
  --kafka-cluster-id \
  --schema-registry-cluster-id 
```

```yaml
apiVersion: platform.confluent.io/v1beta1
kind: ConfluentRolebinding
metadata:
  name: internal-schemaregistry-schema-validation
  namespace:
spec:
  principal:
    name:
    type: user
  clustersScopeByIds:
    schemaRegistryClusterId:
    kafkaClusterId:
  resourcePatterns:
    - name: "*"
      patternType: LITERAL
      resourceType: Subject
  role: DeveloperRead
```

### Migrate RBAC from using LDAP to using both LDAP and OAuth

This section describes the steps to upgrade a Confluent Platform deployment configured with LDAP-based RBAC to LDAP and OAuth-based RBAC.

To migrate your Confluent Platform deployment to use OAuth, the Confluent Platform version must be 7.7. Upgrading the Confluent Platform version and migrating to OAuth at the same time is not supported.

Even though this upgrade can be done in one step, as described in this section, we recommend the two-step migration (MDS first, then the rest of the components) to reduce failed restarts of components.

To migrate an existing Confluent Platform deployment from LDAP to LDAP and OAuth:

1. Upgrade the MDS with the required OAuth settings as described in [Enable RBAC for Kafka](#co-rbac-kafka) and apply the CR with the `kubectl apply` command.

   Following is a sample snippet of a Kafka CR with LDAP and OAuth:

   ```yaml
   kind: Kafka
   spec:
     services:
       mds:
         provider:
           ldap:
             address: ldaps://ldap.operator.svc.cluster.local:636
             authentication:
               type: simple
               simple:
                 secretRef: credential
             tls:
               enabled: true
             configurations:
           oauth:
             configurations:
     dependencies:
       kafkaRest:
         authentication:
           type: oauth
           jaasConfig:
             secretRef: oauth-secret
           oauthSettings:
   ```

2. After Kafka successfully restarts, upgrade the rest of the Confluent Platform components.

   1. Add the following annotation to the Schema Registry, Connect, and Control Center CRs:

      ```yaml
      kind:
      metadata:
        annotations:
          platform.confluent.io/disable-internal-rolebindings-creation: "true"
      ```

   2. Add the OAuth settings to the rest of the Confluent Platform components as described in [Enable RBAC for KRaft controller](#co-rbac-kraft) and [Enable RBAC for other Confluent Platform components](#co-rbac-cp) and apply the CRs with the `kubectl apply` command.

      The following are sample snippets of the relevant settings in the component CRs.

      ```yaml
      kind: KRaftController
      spec:
        dependencies:
          mdsKafkaCluster:
            bootstrapEndpoint:
            authentication:
              type: oauth
              jaasConfig:
                secretRef:
              oauthSettings:
                tokenEndpointUri:
      ```

      ```yaml
      kind: KafkaRestClass
      spec:
        kafkaRest:
          authentication:
            type: oauth
            oauth:
              secretRef:
              configuration:
      ```

      ```yaml
      kind: SchemaRegistry
      spec:
        dependencies:
          mds:
            authentication:
              type: oauth
              oauth:
                secretRef:
                configuration:
      ```

   3. If you have existing connectors, add the following to the Connect CR to avoid possible downtime.

      ```none
      kind: Connect
      spec:
        configOverrides:
          server:
            - producer.sasl.login.callback.handler.class=org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginCallbackHandler
            - consumer.sasl.login.callback.handler.class=org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginCallbackHandler
            - admin.sasl.login.callback.handler.class=org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginCallbackHandler
      ```

3.
Log into Control Center and check if you can see Kafka, Schema Registry, and Connect.

4. [Create internal role binding](co-manage-rbac.md#co-create-rolebinding).

## REST Proxy

The REST Proxy should be used for any language that does not have native clients with serializers compatible with Schema Registry. It is a convenient, language-agnostic method for interacting with Kafka. Almost all standard libraries have good support for HTTP and JSON, so even if a wrapper of the API does not exist for your language, it should still be easy to use the API. It also automatically translates between Avro and JSON. This simplifies writing applications in languages that do not have good Avro support.

The [REST Proxy API Reference](../kafka-rest/api.md#kafkarest-api) describes the complete API in detail, but we will highlight some key interactions here.

First, you will want to produce data to Kafka. To do so, construct a `POST` request to the `/topics/{topicName}` resource including the schema for the data (plain integers in this example) and a list of records, optionally including the partition for each record.

```http
POST /topics/test HTTP/1.1
Host: kafkaproxy.example.com
Content-Type: application/vnd.kafka.avro.v1+json
Accept: application/vnd.kafka.v1+json, application/vnd.kafka+json, application/json

{
  "value_schema": "{\"name\":\"int\",\"type\": \"int\"}",
  "records": [
    { "value": 12 },
    { "value": 24, "partition": 1 }
  ]
}
```

Note that REST Proxy relies on content type information to properly convert data to Avro, so you *must* specify the `Content-Type` header.

The response includes the same information you would receive from the Java clients API about the partition and offset of the published data (or errors in case of failure). Additionally, it includes the schema IDs it registered or looked up in Schema Registry.

```http
HTTP/1.1 200 OK
Content-Type: application/vnd.kafka.v1+json

{
  "key_schema_id": null,
  "value_schema_id": 32,
  "offsets": [
    { "partition": 2, "offset": 103 },
    { "partition": 1, "offset": 104 }
  ]
}
```

In future requests, you can use this schema ID instead of the full schema, reducing the overhead for each request. You can also produce data to specific partitions using a similar request format with the `/topics/{topicName}/partitions/{partition}` endpoint.

To achieve good throughput, it is important to batch your produce requests so that each HTTP request contains many records. Depending on durability and latency requirements, this can be as simple as maintaining a queue of records and only sending a request when the queue has reached a certain size or a timeout is triggered.

Consuming data is a bit more complex because consumers are stateful. However, it still only requires two API calls to get started. See the [API Reference](../kafka-rest/api.md#kafkarest-api) for complete details and examples.

Finally, the API also provides metadata about the cluster, such as the set of brokers, the list of topics, and per-partition information. However, most applications will not need to use these endpoints.

Note that it is also possible to use [non-Java clients](https://cwiki.apache.org/confluence/display/KAFKA/Clients) developed by the community and manage registration and schema validation manually using the [Schema Registry API](../schema-registry/develop/api.md#schemaregistry-api). However, as this is error-prone and must be duplicated across every application, we recommend using the REST Proxy unless you need features that are not exposed via the REST Proxy.
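As a rough sketch of that consumer flow, the following commands use the v2 consumer API against a REST Proxy assumed to be listening on `localhost:8082`; the group name `avro_group` and instance name `ci1` are arbitrary placeholders, and the v2 API adds an explicit subscription step compared with the v1 produce example above. Refer to the API Reference for the authoritative request and response formats.

```bash
# Create a consumer instance in group "avro_group" that reads Avro data from the beginning
curl -X POST -H "Content-Type: application/vnd.kafka.v2+json" \
     --data '{"name": "ci1", "format": "avro", "auto.offset.reset": "earliest"}' \
     http://localhost:8082/consumers/avro_group

# Subscribe the instance to the "test" topic used in the produce example
curl -X POST -H "Content-Type: application/vnd.kafka.v2+json" \
     --data '{"topics": ["test"]}' \
     http://localhost:8082/consumers/avro_group/instances/ci1/subscription

# Fetch records; repeat this call to poll for more data
curl -X GET -H "Accept: application/vnd.kafka.avro.v2+json" \
     http://localhost:8082/consumers/avro_group/instances/ci1/records

# Delete the consumer instance when finished to free its resources on the proxy
curl -X DELETE -H "Content-Type: application/vnd.kafka.v2+json" \
     http://localhost:8082/consumers/avro_group/instances/ci1
```

The create call returns a `base_uri` for the new instance, which you can use for the subsequent instance-specific requests instead of constructing the URLs by hand.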
#### IMPORTANT

* If you do not specify a license, CMF will generate a trial license.
* You can use some Confluent Platform components with Confluent Cloud brokers with a valid license, including CMF version 2.1 and later, Flink version 2.0 and later, and Kafka source and sink connectors for Flink version 4.0 and later. For support with these self-managed Confluent Platform components, you must have a valid Confluent Enterprise License for Customer-managed Confluent Platform for Confluent Cloud subscription license. For more details, see [Confluent Platform Enterprise Licensing Requirements](../../installation/license.md#customer-managed-cp-cc-license).

1. (Optional) Store your Confluent license in a Kubernetes secret.

   ```none
   kubectl create secret generic --from-file=license.txt
   ```

2. (Optional) Store a CMF database encryption key in a Kubernetes secret.

   CMF stores sensitive data, such as secrets, in its internal database. The following instructions set up the encryption key for the CMF database. CMF has a `cmf.sql.production` property. When the property is set to `false`, encryption is disabled. Otherwise, an encryption key is required.

   ```none
   # Generate a 256-bit key (recommended for production)
   openssl rand -out cmf.key 32

   # Create a Kubernetes secret with the encryption key
   kubectl create secret generic \
     --from-file==cmf.key -n
   ```

   During the CMF installation, pass the following Helm parameters to use the encryption key:

   ```none
   --set encryption.key.kubernetesSecretName= \
   --set encryption.key.kubernetesSecretProperty=
   ```

   **Example**

   ```none
   openssl rand -out cmf.key 32

   kubectl create secret generic cmf-encryption-key \
     --from-file=encryption-key=cmf.key \
     -n confluent

   helm upgrade --install cmf --version "~2.1.0" \
     confluentinc/confluent-manager-for-apache-flink \
     --namespace confluent \
     --set encryption.key.kubernetesSecretName=cmf-encryption-key \
     --set encryption.key.kubernetesSecretProperty=encryption-key
   ```

   #### WARNING

   You must back up the encryption key; CMF does not keep a backup of it. If the key is lost, you will no longer be able to access the encrypted data stored in the database.

3. Install CMF using the default configuration:

   For deployment on OpenShift, you must also pass `--set podSecurity.securityContext.fsGroup=null --set podSecurity.securityContext.runAsUser=null` to the Helm command below.

   ```none
   helm upgrade --install cmf confluentinc/confluent-manager-for-apache-flink \
     --version "~2.1.0" \
     --namespace \
     --set license.secretRef= \
     --set cmf.sql.production=false # or pass --set encryption.key.kubernetesSecretName ...
   ```

   #### NOTE

   CMF will create a `PersistentVolumeClaim` (PVC) in Kubernetes. If the PVC remains in status `Pending`, check your Kubernetes cluster configuration and make sure a Container Storage Interface (CSI) driver is installed and configured correctly. Alternatively, if you want to run CMF without persistent storage, you can disable the PVC by setting the `persistence.create` property to `false`. Note that in this case, a restart of the CMF pod will lead to data loss.

4. Configure the Chart.

   Helm provides [several options](https://helm.sh/docs/intro/using_helm/#customizing-the-chart-before-installing) for setting and overriding values in a chart. For CMF, you should customize the chart by passing a values file with the `--values` flag. First, use Helm to show the default `values.yaml` file for CMF.
```bash helm inspect values confluentinc/confluent-manager-for-apache-flink --version "~2.1.0" ``` You should see output similar to the following: ```bash ## Image pull secret imagePullSecretRef: ## confluent-manager-for-apache-flink image image: repository: confluentinc name: cp-cmf pullPolicy: IfNotPresent tag: 1.0.1 ## CMF Pod Resources resources: limits: cpu: 2 memory: 1024Mi requests: cpu: 1 memory: 1024Mi ## Load license either from K8s secret license: ## ## The license secret reference name is injected through ## CONFLUENT_LICENSE environment variable. ## The expected key: license.txt. license.txt contains raw license data. ## Example: ## secretRef: confluent-license-for-cmf secretRef: "" ## Pod Security Context podSecurity: enabled: true securityContext: fsGroup: 1001 runAsUser: 1001 runAsNonRoot: true ## Persistence for CMF persistence: # if set to false, the database will be on the pod ephemeral storage, e.g. gone when the pod stops create: true dataVolumeCapacity: 10Gi ## storageClassName: # Without the storage class, the default storage class is used. ## Volumes to mount for the CMF pod. ## ## Example with a PVC. ## mountedVolumes: ## volumes: ## - name: custom-volume ## persistentVolumeClaim: ## claimName: pvc-test ## volumeMounts: ## - name: custom-volume ## mountPath: /mnt/ ## mountedVolumes: volumes: volumeMounts: ## Configure the CMF service for example Authn/Authz cmf: # authentication: # type: mtls ## Enable Kubernetes RBAC # When set to true, it will create a proper role/rolebinding or cluster/clusterrolebinding based on namespaced field. # If a user doesn't have permission to create role/rolebinding then they can disable rbac field and # create required resources out of band to be used by the Operator. In this case, follow the # templates/clusterrole.yaml and templates/clusterrolebiding.yaml to create proper required resources. rbac: true ## Creates a default service account for the CMF pod if service.account.create is set to true. # In order to use a custom service account, set the name field to the desired service account name and set create to false. # Also note that the new service account must have the necessary permissions to run the CMF pod, i.e cluster wide permissions. # The custom service account must have: # # rules: # - apiGroups: ["flink.apache.org"] # resources: ["flinkdeployments", "flinkdeployments/status"] # Needed to manage FlinkDeployments CRs # verbs: ["*"] # - apiGroups: [""] # resources: ["services"] # Read-only permissions needed for the flink UI # verbs: ["get", "list", "watch"] serviceAccount: create: true name: "" # The jvmArgs parameter allows you to specify custom Java Virtual Machine (JVM) arguments that will be passed to the application container. # This can be useful for tuning memory settings, garbage collection, and other JVM-specific options. # Example : # jvmArgs: "-Xms512m -Xmx1024m -XX:+UseG1GC" ``` Note the following about CMF default values: - CMF uses SQLite to store metadata about your deployments. The data is persisted on a persistent volume that is created during the installation via a `PersistentVolumeClaim` created by Helm. - The persistent volume is created with your Kubernetes cluster’s default storage class. Depending on your storage class, your metadata might not be retained if you uninstall CMF. For example, if your reclaim policy is `Delete`, data is not retained. **Make sure to backup the data in the persistent volume regularly**. 
- If you want to set your storage class, you can overwrite `persistence.storageClassName` during the installation. - By default, the chart uses the image hosted by [Confluent on DockerHub](https://hub.docker.com/r/confluentinc/cp-cmf). To specify your own registry, set the following configuration values: ```none image: repository: name: cp-cmf pullPolicy: IfNotPresent tag: ``` - By default, the chart creates a cluster role and [service account](https://kubernetes.io/docs/concepts/security/service-accounts/) that CMF can use to create and monitor Flink applications in all namespaces. If you want to keep your service account, you set the `serviceAccount.name` property during installation to the preferred service account. - To change the log level, for example to show debug logs, set `cmf.logging.level.root=debug`. ## Confluent Platform how-tos You have several options to get started with Confluent Platform and Kafka, depending on your use cases and goals. - [Quick Start for Confluent Platform](platform-quickstart.md#quickstart) - Provides a simple example that shows you how to run Confluent Platform using Docker on a single broker, single cluster development environment with topic replication factors set to `1`. - [Tutorial: Set Up a Multi-Broker Kafka Cluster](tutorial-multi-broker.md#basics-multi-broker-setup) - Provides an example of how to run a single cluster with multiple brokers. Describes how to configure and start a single controller, and as many brokers as you want to run in the cluster. * [Run multiple clusters](tutorial-multi-broker.md#basics-multi-cluster-setup)- Describes a multi-cluster deployment where you have a dedicated controller for each cluster, and a Kafka server properties file for each broker. * [Scripted Confluent Platform Demo](../tutorials/cp-demo/overview.md#scripted-demo) - Provides a scripted demo to build a full Confluent Platform deployment with [ksqlDB](../ksqldb/overview.md#ksql-home) and [Kafka Streams](../streams/overview.md#kafka-streams) for stream processing, and security end-to-end. # Configure Metadata Service (MDS) in Confluent Platform The Confluent Platform Metadata Service (MDS) manages a variety of metadata about your Confluent Platform installation. Specifically, the MDS: - Hosts the [cluster registry](../../security/cluster-registry.md#cluster-registry) that enables you to keep track of which clusters you have installed. - Serves as the system of record for cross-cluster authorization data (including [RBAC](../../security/authorization/rbac/overview.md#rbac-overview), and [centralized ACLs](../../security/authorization/rbac/authorization-acl-with-mds.md#authorization-acl-with-mds)), and can be used for token-based authentication. - Provides a convenient way to manage [audit log configurations](../../security/compliance/audit-logs/audit-logs-cli-config.md#audit-log-cli-config) across multiple clusters. - Can be used to authenticate data (note that client authentication is not supported). You can set up the MDS internally within a Kafka cluster that serves other functions, and manage permissions in the same way that a database stores permissions for users logging into the database itself. You can also use the MDS to store user data. For the Kafka cluster hosting MDS, you must configure MDS on each Kafka broker, and you should synchronize these configurations across nodes. You can also set up MDS on a dedicated Kafka cluster, servicing multiple worker Kafka clusters in such a way that security information is isolated away from client data. 
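As a rough illustration of what configuring MDS on each broker involves, the following is a minimal sketch of the broker-side listener settings only; the host name is a placeholder, the file path assumes a package install, and a real deployment also needs the authentication, token, and RBAC settings covered in the configuration tasks referenced later in this topic.

```bash
# Minimal sketch only: MDS listener settings appended to each broker's server.properties.
# broker-1.example.com is a placeholder; authentication and token settings are omitted.
cat >> /etc/kafka/server.properties <<'EOF'
# Where MDS accepts HTTP(S) requests on this broker (8090 is the default MDS port)
confluent.metadata.server.listeners=http://0.0.0.0:8090
# The address other components and clusters use to reach this MDS instance
confluent.metadata.server.advertised.listeners=http://broker-1.example.com:8090
EOF
```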
In the case of [role-based access control (RBAC)](../../security/authorization/rbac/overview.md#rbac-overview), the MDS offers a single, centralized configuration context that, after it is set up for a cluster, saves administrators from the complex and time-consuming task of defining and assigning roles for each resource on an individual basis. The MDS can enforce the rules for RBAC, centralized audit logs, centralized ACLs, and the cluster registry on its host Kafka cluster and across multiple secondary clusters (such as Kafka, Connect, and Schema Registry). So you can use a single Kafka cluster hosting MDS to manage and secure multiple secondary Kafka, Connect, Schema Registry, and ksqlDB clusters.

The MDS listens for commands using HTTP on the default port 8090. MDS maintains a local cache of authorization data that is persisted to an internal Kafka topic named `_confluent-metadata-auth`. Because MDS runs on a Kafka broker, you can optionally integrate it with LDAP to provide authentication and refreshable bearer tokens for impersonation. Note that impersonation is restricted to Confluent components. For details about configuring LDAP integration with RBAC, see [Configure LDAP Authentication](ldap-auth-mds.md#ldap-auth-mds).

This topic includes the following configuration tasks:

- [Configure a primary Kafka cluster to host the MDS and role binding](#config-primary-kafka-cluster-mds)
- [Configure a secondary Kafka cluster managed by the MDS of the primary Kafka cluster](#config-secondary-kafka-cluster-managed-by-primary-mds-cluster)

# Configure Confluent Platform Components to Communicate with MDS over TLS

This topic describes the Kafka client configuration for Confluent Platform components to communicate with MDS over TLS. These files are found under your Confluent Platform installation directory in the following locations:

| Component                | Properties file to update                                  |
|--------------------------|------------------------------------------------------------|
| Schema Registry          | `/etc/schema-registry/schema-registry.properties`          |
| ksqlDB                   | `/etc/ksqldb/ksql-server.properties`                       |
| Connect                  | `/etc/kafka/connect-distributed.properties`                |
| Confluent Control Center | `/etc/confluent-control-center/control-center.properties`  |
| REST Proxy               | `/etc/kafka-rest/kafka-rest.properties`                    |

Specify the following Kafka client configuration for your component. Any content in brackets (`<>`) must be customized for your environment.

```text
confluent.metadata.bootstrap.server.urls=https://:8090,https://:8090,...
confluent.metadata.http.auth.credentials.provider=BASIC
confluent.metadata.basic.auth.user.info=:
confluent.metadata.ssl.truststore.location=
confluent.metadata.ssl.truststore.password=
confluent.metadata.ssl.keystore.location=
confluent.metadata.ssl.keystore.password=
confluent.metadata.ssl.key.password=
confluent.metadata.ssl.endpoint.identification.algorithm=HTTPS
```

See also:

- [HTTPS Configuration Options](mds-configuration.md#https-configs-for-ssl)
- [Metadata Service Configuration Settings](mds-configuration.md#mds-configuration-options)
- [Use TLS Authentication in Confluent Platform](../../security/authentication/mutual-tls/overview.md#kafka-ssl-authentication)

## General

`id` : Unique ID for the Confluent REST Proxy server instance. This is used in generating unique IDs for consumers that do not specify their ID. The ID is empty by default, which makes a single server setup easier to get up and running, but is not safe for multi-server deployments where automatic consumer IDs are used.
* Type: string * Default: “” * Importance: high `bootstrap.servers` : A list of Kafka brokers to connect to. For example, `PLAINTEXT://hostname:9092,SSL://hostname2:9092`. This configuration is particularly important when Kafka security is enabled, because Kafka may expose multiple endpoints that will be stored as metadata, but REST Proxy may need to be configured with just one of those endpoints. The client will make use of all servers irrespective of which servers are specified here for bootstrapping—this list only impacts the initial hosts used to discover the full set of servers. Because these servers are used only for the initial connection to discover the full cluster membership (which may change dynamically), this list need not contain the full set of servers (you may want more than one, though, in case a server is down). `listeners` : Comma-separated list of listeners that listen for API requests over either HTTP or HTTPS. If a listener uses HTTPS, the appropriate TLS configuration parameters need to be set as well. * Type: list * Default: `http://0.0.0.0:8082` * Importance: high `schema.registry.url` : The base URL for Schema Registry that should be used by the serializer. * Type: string * Default: `http://localhost:8081` * Importance: high #### NOTE The configuration property `auto.register.schemas` is not supported for Kafka REST Proxy. `consumer.request.max.bytes` : Maximum number of bytes in unencoded message keys and values returned by a single request. This can be used by administrators to limit the memory used by a single consumer and to control the memory usage required to decode responses on clients that cannot perform a streaming decode. Note that the actual payload will be larger due to overhead from base64 encoding the response data and from JSON encoding the entire response. * Type: long * Default: 67108864 * Importance: medium `consumer.threads` : The maximum number of threads to run consumer requests on. Note that this must be greater than the maximum number of consumers in a single consumer group. The sentinel value of -1 allows the number of threads to grow as needed to fulfill active consumer requests. Inactive threads will ultimately be stopped and cleaned up. * Type: int * Default: 50 * Importance: medium `consumer.request.timeout.ms` : The maximum total time to wait for messages for a request if the maximum number of messages has not yet been reached. * Type: int * Default: 1000 * Importance: medium `host.name` : The host name used to generate absolute URLs in responses. If empty, the default canonical hostname is used. * Type: string * Default: “” * Importance: medium `access.control.allow.methods` : Set value to Jetty Access-Control-Allow-Origin header for specified methods. * Type: string * Default: “” * Importance: low `access.control.allow.origin` : Set value for Jetty Access-Control-Allow-Origin header. You may use `*` for any origin, or you can specify multiple origins separated by commas. * Type: string * Default: “” * Importance: low `response.http.headers.config` : Use to select which HTTP headers are returned in the HTTP response for Confluent Platform components. Specify multiple values in a comma-separated string using the format `[action][header name]:[header value]` where `[action]` is one of the following: `set`, `add`, `setDate`, or `addDate`. You must use quotation marks around the header value when the header value contains commas. 
For example: ```none response.http.headers.config="add Cache-Control: no-cache, no-store, must-revalidate", add X-XSS-Protection: 1; mode=block, add Strict-Transport-Security: max-age=31536000; includeSubDomains, add X-Content-Type-Options: nosniff ``` * Type: string * Default: “” * Importance: low `reject.options.request` : Boolean indicating whether or not to reject the OPTIONS method request to REST services. By default, sending a request with the OPTIONS method to all REST services from Confluent Platform REST Proxy, Confluent Control Center REST endpoint, and so on, returns the list of available methods on the specified endpoint. For example: `curl -X OPTIONS http://localhost:8083`. When `reject.options.request` is set to `true`, requests with `-X OPTIONS` are rejected and available methods are not returned. Setting `reject.options.request` to `true` protects API endpoints that are not specifically used by applications, which reduces the attack surface. * Type: boolean * Default: false * Importance: low `consumer.instance.timeout.ms` : Amount of idle time before a consumer instance is automatically destroyed. * Type: int * Default: 300000 * Importance: low `consumer.iterator.backoff.ms` : Amount of time to backoff when an iterator runs out of data. If a consumer has a dedicated worker thread, this is effectively the maximum error value for the entire request timeout. It should be small enough to closely target the timeout, but large enough to avoid busy waiting. * Type: int * Default: 50 * Importance: low `fetch.min.bytes` : Minimum number of bytes in message keys and values returned by a single request before the timeout of `consumer.request.timeout.ms` passes. The special sentinel value of -1 disables this functionality. * Type: int * Default: -1 * Importance: medium `consumer.iterator.timeout.ms` : Timeout for blocking consumer iterator operations. This should be set to a small enough value that it is possible to effectively peek() on the iterator. * Type: int * Default: 1 * Importance: low `debug` : Boolean indicating whether extra debugging information is generated in some error response entities. * Type: boolean * Default: false * Importance: low `idle.timeout.ms` : The number of milliseconds before an idle connection times out. * Type: long * Default: 30000 * Importance: low `metric.reporters` : A list of classes to use as metrics reporters. Implementing the `MetricReporter` interface allows plugging in classes that will be notified of new metric creation. The JmxReporter is always included to register JMX statistics. * Type: list * Default: [] * Importance: low `metrics.jmx.prefix` : Prefix to apply to metric names for the default JMX reporter. * Type: string * Default: `kafka.rest` * Importance: low `metrics.num.samples` : The number of samples maintained to compute metrics. * Type: int * Default: 2 * Importance: low `metrics.sample.window.ms` : The metrics system maintains a configurable number of samples over a fixed window size. This configuration controls the size of the window. For example, you might maintain two samples each measured over a 30 second period. When a window expires, you erase and overwrite the oldest window. * Type: long * Default: 30000 * Importance: low `port` : DEPRECATED: port to listen on for new connections. Use `listeners` instead. * Type: int * Default: 8082 * Importance: low `request.logger.name` : Name of the SLF4J logger to write the NCSA Common Log Format request log. 
* Type: string
* Default: `io.confluent.rest-utils.request`
* Importance: low

`response.mediatype.default` : The default response media type that should be used if no specific types are requested in an Accept header.

* Type: string
* Default: `application/json`
* Importance: low

`response.mediatype.preferred` : An ordered list of the server’s preferred media types used for responses, from most preferred to least.

* Type: list
* Default: [application/json, application/vnd.kafka.v2+json]
* Importance: low

`shutdown.graceful.ms` : Amount of time to wait after a shutdown request for outstanding requests to complete.

* Type: int
* Default: 1000
* Importance: low

`kafka.rest.resource.extension.class` : A list of classes to use as RestResourceExtension. Implementing the `RestResourceExtension` interface allows you to inject user-defined resources, such as filters, into REST Proxy. Typically used to add custom capabilities like logging, security, and so on.

* Type: list
* Default: “”
* Importance: low

`advertised.listeners` : List of advertised listeners. This configuration is used to generate absolute URLs in V3 responses. The HTTP and HTTPS protocols are supported. Each listener must include the protocol, hostname, and port. For example: `http://myhost:8080` and `https://0.0.0.0:8081`.

* Type: list
* Default: “”
* Importance: low

`confluent.resource.name.authority` : The authority to which governance of the name space defined by the remainder of the CRN is delegated. This is used when generating Confluent resource names. Examples: `confluent.cloud` and `mds-01.example.com`.

* Type: string
* Default: “”
* Importance: low

## Configure ksqlDB for Secured Confluent Schema Registry

You can configure ksqlDB to connect to Schema Registry over HTTPS by setting the `ksql.schema.registry.url` to the HTTPS endpoint of Schema Registry. Depending on your security setup, you might also need to supply additional SSL configuration. For example, a trustStore is required if the Schema Registry SSL certificates aren’t trusted by the JVM by default. A keyStore is required if Schema Registry requires mutual authentication.

You can configure SSL for communication with Schema Registry by using non-prefixed names, like `ssl.truststore.location`, or prefixed names like `ksql.schema.registry.ssl.truststore.location`. Non-prefixed names are used for settings that are shared with other communication channels, where the same settings are required to configure SSL communication with both Kafka and Schema Registry. Prefixed names affect communication with Schema Registry only and override any non-prefixed settings of the same name.
Use the following to configure ksqlDB for communication with Schema Registry over HTTPS, where mutual authentication isn’t required and Schema Registry SSL certificates are trusted by the JVM:

```properties
ksql.schema.registry.url=https://:
```

Use the following settings to configure ksqlDB for communication with Schema Registry over HTTPS, with mutual authentication, with an explicit trustStore, and where the SSL configuration is shared between Kafka and Schema Registry:

```properties
ksql.schema.registry.url=https://:
ksql.schema.registry.ssl.truststore.location=/etc/kafka/secrets/ksql.truststore.jks
ksql.schema.registry.ssl.truststore.password=
ksql.schema.registry.ssl.keystore.location=/etc/kafka/secrets/ksql.keystore.jks
ksql.schema.registry.ssl.keystore.password=
ksql.schema.registry.ssl.key.password=
```

Use the following settings to configure ksqlDB for communication with Schema Registry over HTTPS, without mutual authentication and with an explicit trustStore. These settings explicitly configure only ksqlDB to Schema Registry SSL communication.

```properties
ksql.schema.registry.url=https://:
ksql.schema.registry.ssl.truststore.location=/etc/kafka/secrets/sr.truststore.jks
ksql.schema.registry.ssl.truststore.password=
```

The exact settings will vary depending on the encryption and authentication mechanisms Schema Registry is using, and how your SSL certificates are signed.

You can pass authentication settings to the Schema Registry client used by ksqlDB by adding the following to your ksqlDB Server config.

```properties
ksql.schema.registry.basic.auth.credentials.source=USER_INFO
ksql.schema.registry.basic.auth.user.info=username:password
```

For more information, see [Schema Registry Security Overview](../../../schema-registry/security/index.md#schemaregistry-security).

### Create a Confluent Platform to Confluent Cloud link

Set up the cluster link that mirrors data from Confluent Platform to Confluent Cloud. This is a **source initiated link**, meaning that its connection will come from Confluent Platform and go to Confluent Cloud. As such, you won’t have to open your on-premises firewall. To create this source initiated link, you must create both halves of the cluster link: the first half on Confluent Cloud, the second half on Confluent Platform.

1. Create a cluster link on the Confluent Cloud cluster.

   1. Create a link configuration file `$CONFLUENT_CONFIG/clusterlink-hybrid-dst.config` with the following entries:

      ```bash
      link.mode=DESTINATION
      connection.mode=INBOUND
      ```

      The combination of the configurations `link.mode=DESTINATION` and `connection.mode=INBOUND` tells the cluster link that it is the Destination half of a source initiated cluster link. These two configurations must be used together.

      #### NOTE

      - This tutorial example is based on the assumption that there is only one listener. If you configure multiple listeners (for example, INTERNAL, REPLICATION, and EXTERNAL) and want to switch to a different listener than the default, you must add one more parameter to the configuration: `local.listener.name=EXTERNAL`.
To learn more, see the Confluent Platform documentation on [Configuration Options](/platform/current/multi-dc-deployments/cluster-linking/configs.html#configuration-options) and [Understanding Listeners in Cluster Linking](/platform/current/multi-dc-deployments/cluster-linking/configs.html#understanding-listeners-in-cluster-linking) - If you want to add any configurations to your cluster link (such as consumer offset sync or auto-create mirror topics) `clusterlink-hybrid-dst.config` is the file where you would add them. Cluster link configurations are always set on the Destination cluster link (not the Source cluster link). 2. Create the destination cluster link on Confluent Cloud. ```bash confluent kafka link create from-on-prem-link --cluster $CC_CLUSTER_ID \ --source-cluster $CP_CLUSTER_ID \ --config-file $CONFLUENT_CONFIG/clusterlink-hybrid-dst.config ``` The output from this command should indicate that the link was created. ```bash Created cluster link "from-on-prem-link". ``` 2. Create security credentials for the cluster link on Confluent Platform. This security credential will be used to read topic data and metadata from the source cluster. ```bash kafka-configs --bootstrap-server localhost:9092 --alter --add-config \ 'SCRAM-SHA-512=[iterations=8192,password=1LINK2RUL3TH3MALL]' \ --entity-type users --entity-name cp-to-cloud-link \ --command-config $CONFLUENT_CONFIG/CP-command.config ``` Your output should resemble: ```bash Completed updating config for user cp-to-cloud-link. ``` 3. Create a link configuration file `$CONFLUENT_CONFIG/clusterlink-CP-src.config` for the source cluster link on Confluent Platform with the following entries: ```bash link.mode=SOURCE connection.mode=OUTBOUND bootstrap.servers= ssl.endpoint.identification.algorithm=https security.protocol=SASL_SSL sasl.mechanism=PLAIN sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username='' password=''; local.listener.name=SASL_PLAINTEXT local.security.protocol=SASL_PLAINTEXT local.sasl.mechanism=SCRAM-SHA-512 local.sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="cp-to-cloud-link" password="1LINK2RUL3TH3MALL"; ``` - The combination of configurations `link.mode=SOURCE` and `connection.mode=OUTBOUND` tell the cluster link that it is the source-half of a source initiated cluster link. These configurations must be used together. - The middle section tells the cluster link the `bootstrap.servers` of the Confluent Cloud destination cluster for it to reach out to, and the authentication credentials to use. Cluster Linking to Confluent Cloud uses TLS and SASL_PLAIN. This is needed so that the Confluent Cloud cluster knows to accept the incoming request. The Confluent Cloud bootstrap server is shown as the Endpoint in the output for `confluent kafka cluster describe $CC_CLUSTER_ID` , or in cluster settings on the Confluent Cloud console. If you use the Endpoint from the CLI output, remove the protocol prefix. For example, if the endpoint shows as `SASL_SSL://pkc-r2ymk.us-east-1.aws.confluent.cloud:9092`, your entry in `$CONFLUENT_CONFIG/clusterlink-CP-src.config` should be `bootstrap.servers=pkc-r2ymk.us-east-1.aws.confluent.cloud:9092`. - The last section, where lines are prefixed with `local`, contains the security credentials to use with the source cluster (Confluent Platform) to read data. - Note that the authentication mechanisms and security protocols for Confluent Platform map to what is defined in the [broker](#cluster-link-hybrid-config). 
Those for Confluent Cloud map to what will be defined in a file called `clusterlink-cloud-to-CP.config` in a subsequent step. To learn more about the authentication and security protocols used, see [Configure SASL/SCRAM authentication for Confluent Platform](/platform/current/kafka/authentication_sasl/authentication_sasl_scram.html), and the [JAAS](/platform/current/security/authentication/sasl/scram/overview.html#jaas) section in particular.

4. Create the source cluster link on Confluent Platform, using the following command, specifying the configuration file from the previous step.

   ```bash
   kafka-cluster-links --bootstrap-server localhost:9092 \
     --create --link from-on-prem-link \
     --config-file $CONFLUENT_CONFIG/clusterlink-CP-src.config \
     --cluster-id $CC_CLUSTER_ID --command-config $CONFLUENT_CONFIG/CP-command.config
   ```

   Your output should resemble:

   ```bash
   Cluster link 'from-on-prem-link' creation successfully completed.
   ```

#### NOTE

- **When using Schema Linking:** To use a mirror topic that has a schema with Confluent Cloud ksqlDB, broker-side schema ID validation, or the topic viewer, make sure that [Schema Linking](https://docs.confluent.io/cloud/current/sr/schema-linking.html) puts the schema in the default context of the Confluent Cloud Schema Registry. To learn more, see [How Schemas work with Mirror Topics](https://docs.confluent.io/cloud/current/multi-cloud/cluster-linking/mirror-topics-cc.html#how-schemas-work-with-mirror-topics).
- Before running the first command in the steps below, make sure that you are still logged in to Confluent Cloud and have the appropriate environment and cluster selected. To list and select these resources, use the commands `confluent environment list`, `confluent environment use`, `confluent kafka cluster list`, and `confluent kafka cluster use`. A selected environment or cluster is indicated by an asterisk next to it in the output of list commands. The commands won’t work properly if no resources are selected (or if the wrong ones are selected).

Perform the following tasks logged in to Confluent Cloud.

1. Create a mirror topic. The following command establishes a mirror of the original `from-on-prem` topic, using the cluster link `from-on-prem-link`.

   ```bash
   confluent kafka mirror create from-on-prem --link from-on-prem-link
   ```

   The command output will be:

   ```bash
   Created mirror topic "from-on-prem".
   ```

   - The mirror topic name must match the original topic name. To learn more, see all [Known Limitations](https://docs.confluent.io/platform/current/multi-dc-deployments/cluster-linking/index.html#cluster-linking-limitations).
   - A mirror topic must specify the link to its source topic at creation time. This ensures that the mirror topic is a clean slate, with no conflicting data or metadata.

2. List the mirror topics on the link.

   ```bash
   confluent kafka mirror list --cluster $CC_CLUSTER_ID
   ```

   Your output will resemble:

   ```bash
       Link Name      | Mirror Topic Name | Num Partition | Max Per Partition Mirror Lag | Source Topic Name | Mirror Status | Status Time Ms
   +-------------------+-------------------+---------------+------------------------------+-------------------+---------------+----------------+
     from-on-prem-link | from-on-prem      |             1 |                            0 | from-on-prem      | ACTIVE        |  1633640214250
   ```

3. Consume from the mirror topic on the destination cluster to verify it. Still on Confluent Cloud, run a consumer against the mirror topic to consume the messages you originally produced to the Confluent Platform topic in previous steps.
```bash confluent kafka topic consume from-on-prem --from-beginning ``` Your output should be: ```bash 1 2 3 4 5 ``` #### NOTE If when you attempt to run the consumer you get an error indicating “no API key selected for resource”, run this command to specify the `` for the Confluent Cloud destination cluster, then re-run the consumer command: `confluent api-key use --resource $CC_CLUSTER_ID`, or follow the instructions on the CLI provided with the error messages. ## Demos and Examples After completing the [Replicator quick start](replicator-quickstart.md#replicator-quickstart), explore these hands-on working examples of Replicator in multi-datacenter deployments, for which you can download the demo from GitHub and run yourself. Refer to the diagram below to determine the Replicator examples that correspond to your deployment scenario. ![image](multi-dc-deployments/replicator/images/replicator-demos.png) 1. Kafka on-premises to Kafka on-premises - [Example: Replicate Data in an Active-Active Multi-DataCenter Deployment on Confluent Platform](replicator-docker-tutorial.md#replicator): fully-automated example of an active-active multi-datacenter design with two instances of Replicator copying data bidirectionally between the datacenters - [Schema translation](replicator-schema-translation.md#quickstart-demos-replicator-schema-translation): showcases the transfer of schemas stored in Schema Registry from one cluster to another using Replicator - [Confluent Platform demo](../../tutorials/cp-demo/index.md#cp-demo): deploy a Kafka streaming ETL, along with Replicator to replicate data 2. Kafka on-premises to Confluent Cloud - [Hybrid On-premises and Confluent Cloud](../../tutorials/cp-demo/index.md#cp-demo): on-premises Kafka cluster and Confluent Cloud cluster, and data copied between them with Replicator - [Connect Cluster Backed to Destination](/cloud/current/get-started/examples/ccloud/docs/replicator-to-cloud-configuration-types.html): Replicator configuration with Kafka Connect backed to destination cluster - [On-premises to Cloud with Connect Backed to Origin](/cloud/current/get-started/examples/ccloud/docs/replicator-to-cloud-configuration-types.html#onprem-cloud-origin): Replicator configuration with Kafka Connect backed to origin cluster 3. 
Confluent Cloud to Confluent Cloud - [Cloud to Cloud with Connect Backed to Destination](/cloud/current/get-started/examples/ccloud/docs/replicator-to-cloud-configuration-types.html#cloud-cloud-destination): Replicator configuration with Kafka Connect backed to destination cluster - [Cloud to Cloud with Connect Backed to Origin](/cloud/current/get-started/examples/ccloud/docs/replicator-to-cloud-configuration-types.html#cloud-cloud-origin): Replicator configuration with Kafka Connect backed to origin cluster - [Migrate Topics on Confluent Cloud Clusters](/cloud/current/clusters/migrate-topics-on-cloud-clusters.html): migrate topics from the origin Confluent Cloud cluster to the destination Confluent Cloud cluster ### Confluent Cloud To run the producer on Confluent Cloud: ```bash ./bin/kafka-avro-console-producer \ --topic test \ --bootstrap-server ${BOOTSTRAP_SERVER} \ --producer.config config.properties \ --property schema.registry.url=${SR_URL} \ --property basic.auth.credentials.source=USER_INFO \ --property basic.auth.user.info=${SR_API_KEY}:${SR_API_SECRET} \ --property value.schema='{"type":"record","name":"myrecord","fields": [{"name":"f1","type":"string"}]}' \ --property value.rule.set='{ "domainRules": [{ "name": "checkLen", "kind": "CONDITION", "type": "CEL", "mode": "WRITE", "expr": "size(message.f1) < 10", "onFailure": "ERROR"}]}' {"f1": "success"} {"f1": "this will fail"} ``` where `config.properties` contains: ```bash bootstrap.servers={{ BOOTSTRAP_SERVER }} security.protocol=SASL_SSL sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username='{{ CLUSTER_API_KEY }}' password='{{ CLUSTER_API_SECRET }}'; sasl.mechanism=PLAIN client.dns.lookup=use_all_dns_ips session.timeout.ms=45000 acks=all ``` In the above example for `config.properties`, the following best practices and requirements are implemented: - `bootstrap.servers`, security protocols, and credentials are required for the Apache Kafka® producer, consumer, and admin. - `client.dns.lookup` value is required for Kafka clients prior to 2.6. - `session.timeout.ms` being included is a best practice for higher availability in Kafka clients prior to 3.0. - `acks=all` specifies that the producer requires all in-sync replicas to acknowledge receipt of messages (records), and is a best practice configuration on the producer to prevent data loss. ### Confluent Cloud To run the consumer on Confluent Cloud: ```bash ./bin/kafka-avro-console-consumer \ --topic test \ --bootstrap-server ${BOOTSTRAP_SERVER} \ --consumer.config config.properties \ --property schema.registry.url=${SR_URL} \ --property basic.auth.credentials.source=USER_INFO \ --property basic.auth.user.info=${SR_API_KEY}:${SR_API_SECRET} ``` where `config.properties` contains: ```bash bootstrap.servers={{ BOOTSTRAP_SERVER }} security.protocol=SASL_SSL sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username='{{ CLUSTER_API_KEY }}' password='{{ CLUSTER_API_SECRET }}'; sasl.mechanism=PLAIN client.dns.lookup=use_all_dns_ips session.timeout.ms=45000 acks=all ``` In the above example for `config.properties`, the following best practices and requirements are implemented: - `bootstrap.servers`, security protocols, and credentials are required for the Kafka producer, consumer, and admin. - `client.dns.lookup` value is required for Kafka clients prior to 2.6. - `session.timeout.ms` being included is a best practice for higher availability in Kafka clients prior to 3.0. 
- `acks=all` specifies that the producer requires all in-sync replicas to acknowledge receipt of messages (records), and is a best practice configuration on the producer to prevent data loss. ## Quick Start This Quick Start describes how to configure Schema Registry for Role-Based Access Control to manage user and application authorization to topics and subjects (schemas), including how to: - Configure Schema Registry to start and connect to the RBAC-enabled Apache Kafka® cluster (edit `schema-registry.properties` [use the Confluent CLI to create roles](../../security/authorization/rbac/rbac-cli-quickstart.md#rbac-cli-quickstart)) - Use the Confluent CLI to grant a SecurityAdmin role to the Schema Registry service principal. - Use the Confluent CLI to grant a ResourceOwner role to the Schema Registry service principal on the internal topic and group (used to coordinate across the Schema Registry cluster). - Use the Confluent CLI to grant users access to topics (and associated subjects in Schema Registry). The examples assume a local install of Schema Registry and shared RBAC and MDS configuration. Your production environment may differ (for example, Confluent Cloud or remote Schema Registry). If you were to use a local Kafka, ZooKeeper, and bootstrap server as might be the case for testing, these would also need authorization through RBAC, requiring additional prerequisite setup and credentials. ### .NET Example for UAMI Configure your .NET client with the following UAMI-specific properties. ```c# private const string azureIMDSQueryParams = "api-version=&resource=&client_id="; private const string kafkaLogicalCluster = "your-logical-cluster"; private const string identityPoolId = "your-identity-pool-id"; public static async Task Main(string[] args) { if (args.Length != 3) { Console.WriteLine("Usage: .. brokerList schemaRegistryUrl"); return; } var bootstrapServers = args[1]; var schemaRegistryUrl = args[2]; var topicName = Guid.NewGuid().ToString(); var groupId = Guid.NewGuid().ToString(); var commonConfig = new ClientConfig { BootstrapServers = bootstrapServers, SecurityProtocol = SecurityProtocol.SaslPlaintext, SaslMechanism = SaslMechanism.OAuthBearer, SaslOauthbearerMethod = SaslOauthbearerMethod.Oidc, SaslOauthbearerMetadataAuthenticationType = SaslOauthbearerMetadataAuthenticationType.AzureIMDS, SaslOauthbearerConfig = $"query={azureIMDSQueryParams}", SaslOauthbearerExtensions = $"logicalCluster={kafkaLogicalCluster},identityPoolId={identityPoolId}" }; var consumerConfig = new ConsumerConfig { BootstrapServers = bootstrapServers, SecurityProtocol = SecurityProtocol.SaslPlaintext, SaslMechanism = SaslMechanism.OAuthBearer, SaslOauthbearerMethod = SaslOauthbearerMethod.Oidc, GroupId = groupId, AutoOffsetReset = AutoOffsetReset.Earliest, EnableAutoOffsetStore = false }; // pass the config values to the Producer's builder using (var producer = new ProducerBuilder(commonConfig) ``` ## Steps to migrate LDAP to OAuth in a Confluent Platform cluster To ensure a smooth transition, review and complete the following process. 1. Understand the current LDAP RBAC configuration. Review your existing LDAP RBAC configuration to understand how roles and permissions are configured. 2. Configure the Metadata Service (MDS) to support OAuth. To modify the Metadata Service (MDS) to enable OAuth support, you need to: * Update the `confluent.metadata.server.user.store` property to `LDAP_WITH_OAUTH` for a hybrid approach during the migration phase and with `OAUTH` once all clients are migrated to OAuth. 
   * Configure the necessary OAuth endpoints and ensure that the Metadata Service (MDS) can validate OAuth tokens.

   **Example configuration of Metadata Service (MDS) for OAuth support**

   Similar to your Confluent Server broker configurations, the following settings are required to enable identity provider (IdP)-issued OAuth token validation in MDS. For details on these configurations, see configurations for supporting identity provider tokens in MDS.

   ```none
   confluent.metadata.server.user.store=LDAP_WITH_OAUTH
   confluent.metadata.server.oauthbearer.jwks.endpoint.url=
   confluent.metadata.server.oauthbearer.expected.issuer=
   confluent.metadata.server.oauthbearer.expected.audience=
   confluent.metadata.server.oauthbearer.sub.claim.name=sub # optional
   confluent.metadata.server.oauthbearer.groups.claim.name=groups # optional
   ```

   For Kafka Java clients supporting SASL OAUTHBEARER, allow specific IdP endpoints by setting the following configuration property:

   ```properties
   org.apache.kafka.sasl.oauthbearer.allowed.urls=,,...
   ```

   This property specifies a comma-separated list of allowed IdP JWKS (JSON Web Key Set) and token endpoint URLs. Use \* (asterisk) as the value to allow any endpoint.

   ```properties
   org.apache.kafka.sasl.oauthbearer.allowed.urls=*
   ```

   You should consult the specific Kafka client and IdP documentation for the exact interpretation and security implications of such a broad setting.

   Java applications should set this property as a JVM system property when launching the application:

   ```bash
   -Dorg.apache.kafka.sasl.oauthbearer.allowed.urls=,,...
   ```

   Other clients built on librdkafka (for example, Python, Go, and .NET) use different property names and configuration mechanisms, so refer to the specific client library documentation for the equivalent OAUTHBEARER configuration properties.

3. Configure your OIDC identity provider to issue OAuth tokens.

   - Set up an OIDC-compliant identity provider (IdP), such as Okta, Keycloak, or another provider.
   - Ensure that your identity provider is configured to issue tokens that Metadata Service (MDS) can validate.

4. Update your client configurations. Update the configurations of your clients (for example, producers and consumers) to use OAuth for authentication; if you use OAuth for platform service-to-service authentication, also update the client configurations for Confluent Server brokers, Schema Registry, REST Proxy, and Connect. This involves setting the appropriate OAuth properties in the client configuration files. For details, see [Configure Clients for SASL/OAUTHBEARER authentication in Confluent Platform](../../authentication/sasl/oauthbearer/configure-clients.md#configure-sasl-oauthbearer-clients).

   **Example of client configuration for OAuth authentication**

   To use OAuth authentication with a Confluent Platform cluster, you must configure Kafka clients with the following properties, replacing the placeholders with your actual values:

   ```none
   sasl.mechanism=OAUTHBEARER
   security.protocol=SASL_SSL
   sasl.login.callback.handler.class=org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginCallbackHandler
   sasl.login.connect.timeout.ms=15000 # optional
   sasl.oauthbearer.token.endpoint.url=
   sasl.jaas.config=org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required \
     clientId="" \
     clientSecret="" \
     scope=""; # optional
   ```

5. Test the configuration. Thoroughly test the new OAuth configuration in a staging environment to ensure that authentication and authorization work as expected.

6. Monitor and validate.
- Monitor your Confluent Platform cluster after migration to ensure that there are no issues with authentication or authorization. - Validate that all users, clients, and services have the correct permissions. # Security in Confluent Platform * [Overview](overview.md) * [Deployment Profiles](deployment-profiles.md) * [Compliance](compliance/index.md) * [Overview](compliance/overview.md) * [Audit Logs](compliance/audit-logs/index.md) * [Manage Secrets](compliance/secrets/index.md) * [Authenticate](authentication/index.md) * [Overview](authentication/overview.md) * [Mutual TLS](authentication/mutual-tls/index.md) * [OAuth/OIDC](authentication/oauth-oidc/index.md) * [Multi-Protocol Authentication](authentication/multi-protocol/index.md) * [REST Proxy](authentication/rest-proxy/index.md) * [SSO for Confluent Control Center](authentication/sso-for-c3/index.md) * [HTTP Basic Authentication](authentication/http-basic-auth/index.md) * [SASL](authentication/sasl/index.md) * [LDAP](authentication/ldap/index.md) * [Delegation Tokens](authentication/delegation-tokens/index.md) * [Authorize](authorization/index.md) * [Overview](authorization/overview.md) * [Access Control Lists](authorization/acls/index.md) * [Role-Based Access Control](authorization/rbac/index.md) * [LDAP Group-Based Authorization](authorization/ldap/index.md) * [Protect Data](protect-data/index.md) * [Overview](protect-data/overview.md) * [Protect Data in Motion with TLS Encryption](protect-data/encrypt-tls.md) * [Protect Sensitive Data Using Client-side Field Level Encryption](protect-data/csfle/index.md) * [Redact Confluent Logs](protect-data/log-redaction.md) * [Configure Security Properties using Prefixes](../kafka/security_prefixes.md) * [Secure Components](component/index.md) * [Overview](component/overview.md) * [Schema Registry](component/nav-sr-security.md) * [Kafka Connect](component/connect-redirect.md) * [KRaft Security](component/kraft-security.md) * [ksqlDB RBAC](authorization/rbac/ksql-rbac.md) * [REST Proxy](component/nav-rest-proxy-security.md) * [Enable Security for a Cluster](security_tutorial.md) * [Add Security to Running Clusters](incremental-security-upgrade.md) * [Configure Confluent Server Authorizer](csa-introduction.md) * [Security Management Tools](sec-manage-tools.md) * [Ansible Playbooks for Confluent Platform](https://docs.confluent.io/ansible/current/overview.html) * [Deploy Secure Confluent Platform Docker Images](../installation/docker/security.md) * [Cluster Registry](cluster-registry.md) * [Encrypt using Client-Side Payload Encryption](encrypt/cspe.md) ## Configure TLS encryption for Kafka clients The new Producer and Consumer clients support security for Kafka versions 0.9.0 and higher. If you are using the Kafka Streams API, you can read on how to configure equivalent [SSL](/platform/current/clients/javadocs/javadoc/org/apache/kafka/common/config/SslConfigs.html) and [SASL](/platform/current/clients/javadocs/javadoc/org/apache/kafka/common/config/SaslConfigs.html) parameters. If client authentication is not required by the Confluent Server broker, the following is a minimal configuration example that you can store in a client properties file `client-ssl.properties`. Because this configuration stores passwords directly in the Kafka client configuration file, it is important to restrict access to these files via file system permissions. 
```bash
bootstrap.servers=kafka1:9093
security.protocol=SSL
ssl.truststore.location=/var/ssl/private/kafka.client.truststore.jks
ssl.truststore.password=test1234
```

If the broker requires TLS client authentication, the client must provide a keystore as well. You can read about the additional configurations required in [mTLS authentication](../authentication/mutual-tls/overview.md#kafka-ssl-authentication).

Here are examples using the Kafka tools `kafka-console-producer` and `kafka-console-consumer` to pass in the `client-ssl.properties` file with the properties specified above:

```bash
kafka-console-producer --bootstrap-server kafka1:9093 \
  --topic test \
  --producer.config client-ssl.properties

kafka-console-consumer --bootstrap-server kafka1:9093 \
  --topic test \
  --consumer.config client-ssl.properties \
  --from-beginning
```

## Configure TLS encryption for Connect workers

This section describes how to enable security for Kafka Connect. Securing Kafka Connect requires that you configure security for:

1. Kafka Connect workers: part of the Kafka Connect API, a worker is essentially an advanced client under the covers
2. Kafka Connect connectors: connectors may have embedded producers or consumers, so you must override the default configurations for Connect producers used with source connectors and Connect consumers used with sink connectors
3. Kafka Connect REST: Kafka Connect exposes a REST API that can be configured to use TLS/SSL using [additional properties](#encryption-ssl-rest)

Configure security for Kafka Connect as described in the section below. Additionally, if you are using Confluent Control Center streams monitoring for Kafka Connect, configure security for the monitoring interceptors as well.

Configure the top-level settings in the Connect workers to use TLS by adding these properties in `connect-distributed.properties`. These top-level settings are used by the Connect worker for group coordination and to read and write to the internal topics that are used to track the cluster’s state (for example, configurations and offsets).

```bash
bootstrap.servers=kafka1:9093
security.protocol=SSL
ssl.truststore.location=/var/ssl/private/kafka.client.truststore.jks
ssl.truststore.password=test1234
```

Connect workers manage the producers used by source connectors and the consumers used by sink connectors. So, for the connectors to leverage security, you also have to override the default producer or consumer configuration that the worker uses. Depending on whether the connector is a source or sink connector:

* For source connectors, configure the same properties, but add the `producer` prefix.

  ```bash
  producer.bootstrap.servers=kafka1:9093
  producer.security.protocol=SSL
  producer.ssl.truststore.location=/var/ssl/private/kafka.client.truststore.jks
  producer.ssl.truststore.password=test1234
  ```

* For sink connectors, configure the same properties, but add the `consumer` prefix.

  ```bash
  consumer.bootstrap.servers=kafka1:9093
  consumer.security.protocol=SSL
  consumer.ssl.truststore.location=/var/ssl/private/kafka.client.truststore.jks
  consumer.ssl.truststore.password=test1234
  ```

## Mirror Data to Confluent Cloud with Cluster Linking

In this section, you will create a source-initiated cluster link to mirror the topic `wikipedia.parsed` from Confluent Platform to Confluent Cloud. For security reasons, most on-premises datacenters don’t allow inbound connections, so Confluent recommends source-initiated cluster linking to easily and securely mirror Kafka topics from your on-premises cluster to Confluent Cloud.

1.
Verify that you’re still using the `ccloud` CLI context. ```none confluent context list ``` 2. Give the cp-demo service account the `CloudClusterAdmin` role in Confluent Cloud to authorize it to create cluster links and mirror topics in Confluent Cloud. ```shell confluent iam rbac role-binding create \ --principal User:$SERVICE_ACCOUNT_ID \ --role CloudClusterAdmin \ --cloud-cluster $CCLOUD_CLUSTER_ID --environment $CC_ENV ``` Verify that the role-binding was created. The output should show the role has been created. ```shell confluent iam rbac role-binding list \ --principal User:$SERVICE_ACCOUNT_ID \ -o json | jq ``` 3. Inspect the file `scripts/ccloud/cluster-link-ccloud.properties` ```none # This is the Confluent Cloud half of the cluster link # Confluent Cloud dedicated cluster is the destination link.mode=DESTINATION # Link connection comes in from Confluent Platform so you don't have to open your on-prem firewall connection.mode=INBOUND ``` 4. Create the Confluent Cloud half of the cluster link with the name **cp-cc-cluster-link**. ```shell confluent kafka link create cp-cc-cluster-link \ --cluster $CCLOUD_CLUSTER_ID \ --source-cluster $CP_CLUSTER_ID \ --config-file ./scripts/ccloud/cluster-link-ccloud.properties ``` 5. Inspect the file `scripts/ccloud/cluster-link-cp-example.properties` and read the comments to understand what each property does. ```none # Configuration for the Confluent Platform half of the cluster link # Copy the contents of this file to cluster-link-cp.properties and # add your Confluent Cloud credentials # *****DO NOT***** add cluster-link-cp.properties to version control # with your Confluent Cloud credentials # Confluent Platform is the source cluster link.mode=SOURCE # The link is initiated at the source so you don't have to open your firewall connection.mode=OUTBOUND # Authenticate to Confluent Cloud bootstrap.servers= ssl.endpoint.identification.algorithm=https security.protocol=SASL_SSL sasl.mechanism=PLAIN sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \ username='' \ password=''; # We are using the CP's SASL OAUTHBEARER token listener local.listener.name=TOKEN local.sasl.mechanism=OAUTHBEARER local.sasl.login.callback.handler.class=io.confluent.kafka.clients.plugins.auth.token.TokenUserLoginCallbackHandler local.sasl.jaas.config=org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required \ username="connectorSA" \ password="connectorSA" \ metadataServerUrls="https://kafka1:8091,https://kafka2:8092"; ``` 6. Run the following command to copy the file to `scripts/ccloud/cluster-link-cp.properties` with credentials and bootstrap endpoint for your own Confluent Cloud cluster. ```shell sed -e "s||${CCLOUD_CLUSTER_API_KEY}|g" \ -e "s||${CCLOUD_CLUSTER_API_SECRET}|g" \ -e "s||${CC_BOOTSTRAP_ENDPOINT}|g" \ scripts/ccloud/cluster-link-cp-example.properties > scripts/ccloud/cluster-link-cp.properties ``` 7. Next, use the `cp` CLI context to log into Confluent Platform. To create a cluster link, the CLI user must have `ClusterAdmin` privileges. For simplicity, we are continuing to use a super user instead of a `ClusterAdmin`. ```shell confluent context use cp ``` 8. The cluster link itself needs the `DeveloperRead` and `DeveloperManage` roles for any topics it plans to mirror, as well as the `ClusterAdmin` role for the Kafka cluster. Our cluster link uses the `connectorSA` principal, which already has `ResourceOwner` permissions on the `wikipedia.parsed` topic, so we just need to add the `ClusterAdmin` role. 
```shell confluent iam rbac role-binding create \ --principal User:connectorSA \ --role ClusterAdmin \ --kafka-cluster $CP_CLUSTER_ID ``` 9. Create the Confluent Platform half of the cluster link, still called **cp-cc-cluster-link**. ```shell confluent kafka link create cp-cc-cluster-link \ --destination-bootstrap-server $CC_BOOTSTRAP_ENDPOINT \ --destination-cluster $CCLOUD_CLUSTER_ID \ --config ./scripts/ccloud/cluster-link-cp.properties \ --url https://localhost:8091/kafka \ --certificate-authority-path scripts/security/snakeoil-ca-1.crt ``` 10. Switch contexts back to “ccloud” and create the mirror topic for `wikipedia.parsed` in Confluent Cloud. ```shell confluent context use ccloud \ && confluent kafka mirror create wikipedia.parsed --link cp-cc-cluster-link ``` 11. Consume records from the mirror topic using the schema context “cp-demo”. Press `Ctrl+C` to stop the consumer when you are ready. ```shell confluent kafka topic consume \ --api-key $CCLOUD_CLUSTER_API_KEY \ --api-secret $CCLOUD_CLUSTER_API_SECRET \ --schema-registry-endpoint $CC_SR_ENDPOINT/contexts/:.cp-demo: \ --schema-registry-api-key $SR_API_KEY \ --schema-registry-api-secret $SR_API_SECRET \ --value-format avro \ wikipedia.parsed | jq ``` You successfully created a source-initiated cluster link to seamlessly move data from on-premises to cloud in real time. Cluster linking opens up real-time hybrid cloud, multi-cloud, and disaster recovery use cases. See the [Cluster Linking documentation](https://docs.confluent.io/cloud/current/multi-cloud/overview.html) for more information. ### Breaking Changes - Remove `confluent schema-registry cluster [delete | enable | upgrade]` and `confluent schema-registry region list` commands - Remove `confluent context create` command - Remove the configuration and partition-replica lists from `confluent kafka topic describe` for on-premises; these lists are now available through new on-premises `confluent kafka topic configuration list` and `confluent kafka replica list` commands - Remove the configuration and partition-replica lists from `confluent local kafka topic describe`; topic configurations are available through a new `confluent local kafka topic configuration list` command - Rename `confluent schema-registry exporter get-config` to `confluent schema-registry exporter configuration describe` - Rename `confluent schema-registry exporter get-status` to `confluent schema-registry exporter status describe` - Rename `confluent schema-registry compatibility validate` to `confluent schema-registry schema compatibility validate` - Rename `confluent schema-registry config` to `confluent schema-registry configuration` - Rename `confluent kafka topic describe` to `confluent kafka topic configuration list` for Confluent Cloud - Rename `confluent kafka replica list` to `confluent kafka replica status list` - Rename `confluent kafka broker describe` to `confluent kafka broker configuration list` - Rename `confluent kafka broker update` to `confluent kafka broker configuration update` - Rename `confluent local kafka broker describe` to `confluent local kafka broker configuration list` - Rename `confluent local kafka broker update` to `confluent local kafka broker configuration update` - Rename `confluent price list` to `confluent billing price list` - Rename `confluent admin [payment | promo]` subcommands to `confluent billing [payment | promo]` subcommands - Rename `confluent kafka broker get-tasks` to `confluent kafka broker task list` and remove the `--all` flag; this functionality is now 
implicit when no broker ID is provided - Remove the `--all` flag from `confluent kafka broker describe`; this functionality has been moved to a new on-premises `confluent kafka cluster configuration list` command - Remove the `--all` flag from `confluent kafka broker update`; this functionality has been moved to a new on-premises `confluent kafka cluster configuration update` command - Remove deprecated `--api-key` and `--api-secret` flags from all `confluent schema-registry` commands - Remove the `--context` flag from `confluent environment use`, `confluent flink region use`, `confluent service-account use`, and `confluent kafka cluster use` - Remove the `--environment` from `confluent flink region use` and `confluent kafka cluster use` - Replace the `--schema` flag for `confluent schema-registry schema compatibility validate` with a required argument - Replace the `--name` flag for `confluent kafka quota create` with a required argument - Replace the `--name` flag for `confluent schema-registry kek create` with a required argument - Rename `--organization-id` to `--organization` for `confluent login` - Rename `--group-id` to `--group` for `confluent asyncapi export` - Rename `--kms-key-id` to `--kms-key` for `confluent schema-registry kek create` - Rename `--deleted` to `--all` for `confluent schema-registry subject describe` and `confluent schema-registry subject list` - Rename `--aws-account-id` to `--aws-account` for `confluent stream-share consumer redeem` - Rename `--azure-subscription-id` to `--azure-subscription` for `confluent stream-share consumer redeem` - Rename `--gcp-project-id` to `--gcp-project` for `confluent stream-share consumer redeem` - Rename `--config-name` to `--config` for `confluent kafka broker describe` and `confluent local kafka broker describe` - Rename `--provider` to `--cloud` for `confluent byok` commands - Rename `--ca-location` and `--ca-cert-path` to `--certificate-authority-path` for all commands which use these flags - The `--subject` flag is now required for `confluent schema-registry schema compatibility validate` - The `--type` flag is now required for `confluent schema-registry schema compatibility validate` for Confluent Cloud - The `--config` flag is now required for `confluent kafka topic update` - The `--passphrase` and `--passphrase-new` flags are now required for `confluent secret file rotate` and no longer accept pipes or files - The `--passphrase` flag is now required for `confluent secret master-key generate` and no longer accepts pipes or files - The `--config` flag for `confluent secret file add`, `confluent secret file remove`, and `confluent secret file update` no longer accepts pipes or files - The broker ID is now a required argument for `confluent kafka broker list` and `confluent kafka broker update` - The API key and secret are now required arguments for `confluent api-key store` - Remove “Cloud Name” (human) and “cloud_name” (serialized) from the output of `confluent kafka region list` - Remove “Read-Only” (human) and “read_only” (serialized) from the output of `confluent configuration` commands - Rename “Name” to “ID” (human) and “name” to “id” (serialized) in the output of `confluent plugin search`; a new “Name” (human) and “name” (serialized) field has been added in its place - Rename “Kafka” to “Kafka Cluster” (human) and “kafka” to “kafka_cluster” (serialized) in the output of `confluent ksql cluster` commands - Rename “Schema Registry Secret” to “Schema Registry API Secret” (human) and “schema_registry_secret” to 
“schema_registry_api_secret” (serialized) in the output of `confluent stream-share consumer redeem` - Rename “Resource Display Name” to “Resource Name” (human) and “resource_display_name” to “resource_name” (serialized) in the output of `confluent billing cost list` - Rename “Provider” to “Cloud” (human) and “provider” to “cloud” (serialized) in the output of `confluent kafka cluster describe` - Rename “Service Provider” to “Cloud” (human) and “service_provider” to “cloud” (serialized) in the output of `confluent kafka cluster list` - Rename “Service Provider Region” to “Region” (human) and “service_provider_region” to “region” (serialized) in the output of `confluent kafka cluster list` - Rename “Schema ID” to “ID” (human) and “schema_id” to “id” (serialized) in the output of `schema-registry schema list` - Rename “Region Name” to “Name” (human) and “region_name” to “name” (serialized) in the output of “confluent kafka region list” - Rename “Region ID” to “Region” (human) and “region_id” to “region” (serialized) in the output of “confluent kafka region list” - Rename “Cloud ID” to “Cloud” (human) and “cloud_id” to “cloud” (serialized) in the output of “confluent kafka region list” - Rename “Resource ID” and “Environment ID” to “Resource” and “Environment” (human) and “resource_id” and “environment_id” to “resource” and “environment” (serialized) in the output of `confluent billing cost list` - Rename “Broker ID” to “Broker” (human) and “broker_id” to “broker” (serialized) in the output of `confluent broker task list` - Rename “Partition ID”, “Cluster ID” and “Leader ID” to “ID”, “Cluster” and “Leader” (human) and “partition_id”, “cluster_id” and “leader_id” to “id”, “cluster” and “leader” (serialized) in the output of `confluent kafka partition [describe | list]` - Rename “Private Link Attachment ID” to “Private Link Attachment” (human) and “private_link_attachment_id” to “private_link_attachment” (serialized) in the output of `confluent network private-link attachment connection` commands - Rename “Task ID” to “Task” (human) and “task_id” to “task” (serialized) in the output of `confluent connect cluster describe` - Rename “Plugin ID” and “Version ID” to “ID” and “Version” (human) and “plugin_id” and “version_id” to “plugin” and “version” (serialized) in the output of `confluent flink artifact` commands - Rename “Partition ID” to “Partition” (human) and “partition_id” to “partition” (serialized) in the output of `confluent kafka partition reassignment list` - Rename “ingress” and “egress” to “ingress_limit” and “egress_limit” in the serialized output of `confluent kafka cluster` commands - Rename “kafka_cluster_id” to “kafka_cluster” in the serialized output of `confluent iam acl` commands - Rename “cluster_id” to “cluster” in the serialized output of `confluent broker task list` - Rename “cluster_id” and “consumer_group_id” to “cluster” and “consumer_group” in the serialized output of `confluent kafka consumer group [describe | list]` - Rename “cluster_id”, “consumer_group_id”, “consumer_id”, “instance_id”, “client_id”, and “partition_id” to “cluster”, “consumer_group”, “consumer”, “instance”, “client”, and “partition” in the serialized output of `confluent kafka consumer group lag [describe | list]` - Rename “owner_id” and “resource_id” to “owner” and “resource” in the serialized output of `confluent api-key [describe | list]` - Rename “cluster_id”, “environment_id”, and “service_account_id” to “cluster”, “environment”, and “service_account” in the serialized output of `confluent 
audit-log describe` - Rename “cluster_id”, “environment_id”, and “service_account_id” to “cluster”, “environment”, and “service_account” in the serialized output of `confluent connect event describe` - Rename “source_cluster_id”, “destination_cluster_id”, and “remote_cluster_id” to “source_cluster”, “destination_cluster”, and “remote_cluster” in the serialized output of `confluent kafka link [describe | list]` - Rename “cluster_id”, “consumer_group_id”, “max_lag_consumer_id”, “max_lag_instance_id”, “max_lag_client_id”, and “max_lag_partition_id” to “cluster”, “consumer_group”, “max_lag_consumer”, “max_lag_instance”, “max_lag_client”, and “max_lag_partition” in the serialized output of `confluent kafka consumer group lag summarize` - Rename “cluster_id” to “cluster” in the serialized output of `confluent kafka partition [describe | list]` - Rename “cluster_id”, “partition_id”, and “broker_id” to “cluster”, “partition”, and “broker” in the serialized output of `confluent kafka replica list` - Rename “cluster_id” to “cluster” in the serialized output of `confluent schema-registry cluster describe` - Rename “cluster_id” to “cluster” in the serialized output of `confluent kafka partition reassignment list` - Rename “environment_id” to “environment” in the serialized output of `confluent network` commands - Rename “plugin_name” and “plugin_id” to “name” and “id” in the serialized output of `confluent plugin list` - Rename “consumer_group_id”, “consumer_id”, “instance_id”, and “client_id” to “consumer_group”, “consumer”, “instance”, and “client” in the serialized output of `confluent kafka consumer list` - The field “Network Zonal Subdomains” (human) and “network_zonal_subdomains” (serialized) in the output of `confluent stream-share consumer redeem` and `confluent stream-share consumer share describe` is now a map - The field “subtask_statuses” in the serialized output of `confluent kafka broker task list` is now a map - The field “config” in the serialized output of `confluent schema-registry exporter describe` is now a map - The field “kms_properties” in the serialized output of `confluent schema-registry kek` commands is now a map - The field `principals` in the serialized output of `confluent kafka quota` commands is now an array - The field “network_zones” in the serialized output of `confluent stream-share consumer redeem` and `confluent stream-share consumer share describe` is now an array - The field “Error Trace” (human) and “error_trace” (serialized) in the output of `confluent schema-registry exporter status describe` is now omitted when it is empty - The field “topic_count” in the serialized output of `confluent kafka cluster describe` is now omitted when it is empty - Remove unused “disable_updates”, “anonymous_id”, “no_browser”, and “ver” configuration fields - Rename the Windows-only configuration field “update_plugins_once” to “update_plugins_once_windows” - Legacy on-premises contexts are no longer supported; the Certificate Authority path must now be provided by flag or environment variable - The following deprecated environment variables are no longer supported: “CCLOUD_EMAIL”, “CCLOUD_PASSWORD”, “CONFLUENT_USERNAME”, “CONFLUENT_PASSWORD”, “CONFLUENT_MDS_URL”, and “CONFLUENT_CA_CERT_PATH” - Rename the `CONFLUENT_PLATFORM_CA_CERT_PATH` environment variable to `CONFLUENT_PLATFORM_CERTIFICATE_AUTHORITY_PATH` - `confluent logout` now revokes the refresh token when logging out of Confluent Cloud - Saved credentials will no longer be read from the `.netrc` file - CLI text 
highlighting is now enabled by default for new users - All confirmation prompts for resource `delete` and `undelete` commands are now yes/no prompts - The `confluent login` command will no longer automatically log in using saved credentials in the keychain or configuration file - On-premises login with `confluent login` will now print the confirmation code to the terminal and ask the user to confirm before opening a browser ## Quick Start In this quick start guide, the AMPS Source connector is used to consume messages from an SOW topic called `Orders` on AMPS that has Kerberos authentication enabled. It then sends these messages as records to a Kafka topic named `AMPS_Orders` with headers being forwarded from the AMPS messages. For an example of how to get Kafka Connect connected to [Confluent Cloud](/cloud/current/index.html), see [Connect Self-Managed Kafka Connect to Confluent Cloud](/cloud/current/cp-component/connect-cloud-config.html#distributed-cluster). **Prerequisites:** - [Confluent Platform](/platform/current/installation/installing_cp/index.html) is installed and services are running by using the [Confluent CLI](https://docs.confluent.io/confluent-cli/current/index.html) commands. #### NOTE This quick start assumes that you are using the [Confluent CLI](https://docs.confluent.io/confluent-cli/current/index.html) commands, but [standalone installations](/platform/current/installation/installing_cp/index.html) are also supported. By default ZooKeeper, Apache Kafka®, Schema Registry, Kafka Connect REST API, and Kafka Connect are started with the `confluent local start` command. Note that as of Confluent Platform 7.5, ZooKeeper is deprecated for new deployments. Confluent recommends KRaft mode for new deployments. - Kafka and Schema Registry are running locally on the default ports. #### SEE ALSO For a more detailed Docker-based example of the Confluent Elasticsearch Connector, refer to [Confluent Platform Demo (cp-demo)](/platform/current/tutorials/cp-demo/docs/index.html#cp-demo). You can deploy a Kafka streaming ETL, including Elasticsearch, using ksqlDB for stream processing. ### Requirements and considerations * Confluent Platform 8.0 or later with the CSFLE Add-On enabled. * CSFLE in Confluent Platform is supported only in non-shared mode. * Schema Registry supports CSFLE mainly for DEK permissions checks. * Client performs all key management service (KMS) interactions, including encryption and decryption. * CSFLE supports Java clients for producing/consuming encrypted messages. * CSFLE is not available in Confluent CLI, Connect, ksqlDB, Flink, or non-Java clients at this time. * CSFLE is not integrated with Control Center. * Only `string` and `byte` type Avro fields are supported for CSFLE tagging and encryption. * The *kafka-avro-console-producer* and *kafka-avro-console-consumer* tools work with CSFLE. * Supported KMS types include local KEK, AWS KMS (Amazon Web Services), and HashiCorp Vault.
For the full list, see [Supported KMS types](https://docs.confluent.io/platform/current/security/protect-data/csfle/overview.html#supported-kms-types). * The CSFLE API is protected using Confluent Platform Role-Based Access Control (RBAC). ## Features The following are summaries of the main, notable features of CFK. Cloud Native Declarative API : * Declarative Kubernetes-native API approach to configure, deploy, and manage Confluent Platform components (namely Apache Kafka®, Connect workers, ksqlDB, Schema Registry, Confluent Control Center, Confluent REST Proxy) and application resources (such as topics, rolebindings) through Infrastructure as Code (IaC). * Provides built-in automation for cloud-native security best practices: * Complete granular RBAC, authentication and TLS network encryption * Auto-generated certificates * Support for credential management systems, such as HashiCorp Vault, to inject sensitive configurations in memory to Confluent deployments * Provides server properties, JVM, Log4j, and Log4j 2 configuration overrides for customization of all Confluent Platform components. Upgrades : * Provides automated rolling updates for configuration changes. * Provides automated rolling upgrades with no impact to Kafka availability. Scaling : * Provides single command, automated scaling and reliability checks of Confluent Platform. Resiliency : * Restores a Kafka pod with the same Kafka broker ID, configuration, and persistent storage volumes if a failure occurs. * Provides automated rack awareness to spread replicas of a partition across different racks (or zones), improving availability of Kafka brokers and limiting the risk of data loss. Scheduling : * Supports Kubernetes labels and annotations to provide useful context to DevOps teams and ecosystem tooling. * Supports Kubernetes tolerations and pod/node affinity for efficient resource utilization and pod placement. Monitoring : * Supports metrics aggregation using JMX/Jolokia. * Supports aggregated metrics export to Prometheus. # Build Streaming Applications on Confluent Platform You can use Apache Kafka® clients to write distributed applications and microservices that read, write, and process streams of events in parallel, at scale, and in a fault-tolerant manner, even those related to network problems or machine failures. The Kafka client library provides functions, classes, and utilities that you can use to create Kafka [producer](../_glossary.md#term-producer) clients and [consumer](../_glossary.md#term-consumer) clients using your choice of programming languages. The primary way to build production-ready producers and consumers is by using a programming language and a Kafka client library. The official Confluent supported clients are: * Java: The official Java client library supports the producer, consumer, Streams, and Connect APIs. * [librdkafka](https://docs.confluent.io/platform/current/clients/librdkafka/html/md_INTRODUCTION.html): The librdkafka and the following derived clients libraries only support the admin, producer, and consumer APIs. * C/C++ * Python * Go * .NET * JavaScript When you use the official Confluent-supported clients, you get the same enterprise-level support that you get with the rest of Confluent Platform: * The release cycle for Confluent-provided clients follow the Confluent release cycle, as opposed to the Kafka release cycle. 
* Confluent Platform maintenance fixes are provided for the 2-3 years (2 years with the Standard Support and 3 years with the Platinum Support) after the initial release of a minor version. Additional open-source and community-developed Kafka client libraries are available for other programming languages. Some of these include Scala, Ruby, Rust, PHP, and Elixir. The core APIs in the Kafka client library are: * Producer API: This API provides classes and methods for creating and sending messages to Kafka topics. It allows developers to specify message payloads, keys, and metadata and to control message delivery and acknowledgment. * Consumer API: This API provides classes and methods for consuming messages from Kafka topics. It allows developers to subscribe to one or more topics, receive messages in batches or individually, and process messages using custom logic. * Streams API: This API provides a high-level abstraction for building real-time data processing applications that consume, transform, and produce data streams from Kafka topics. * Connector API: This API provides a framework for building connectors that can transfer data between Kafka topics and external data systems, such as databases, message queues, and cloud storage services. * Admin API: This API provides functions for managing Kafka topics, partitions, and configurations. It allows developers to create, delete, and update topics and retrieve metadata about Kafka clusters and brokers. In addition to these core APIs, the Kafka client library includes various tools and utilities for configuring and monitoring Kafka clients and clusters, handling errors and exceptions and optimizing client performance and scalability. ## Related content After getting started with your deployment, you may want check out the following Kafka Connect documentation: * Course: [Kafka Connect 101](https://developer.confluent.io/learn-kafka/kafka-connect/) * Course: [Building data pipelines with Apache Kafka](https://developer.confluent.io/learn-kafka/data-pipelines/intro/) * Tutorial: [Moving Data In and Out of Kafka](/platform/current/connect/quickstart.html) * [Kafka Connect Logging](/platform/current/connect/logging.html) * [Upgrade Kafka Connect](/platform/current/installation/upgrade.html) * [Kafka Connect Security](/platform/current/connect/security.html) * [Kafka Connect REST Interface](/platform/current/connect/references/restapi.html) * [Using Kafka Connect with Schema Registry](/platform/current/schema-registry/connect.html) * [Upgrading a Connector Plugin](upgrade.md#connect-upgrading-plugin) * [Override the Worker Configuration](/platform/current/connect/references/allconfigs.html#override-the-worker-configuration) * [Adding Connectors or Software (Docker)](extending.md#connect-adding-connectors-to-images) Also, check out Confluent’s [end-to-end demos](https://github.com/confluentinc/examples/) for Kafka Connect on-premises, Confluent Cloud, and Confluent for Kubernetes. ### Kafka capabilities Confluent Platform provides all of Kafka’s open-source features plus additional proprietary components. Following is a summary of Kafka features. For an overview of Kafka use cases, features and terminology, see [Kafka Introduction](/kafka/introduction.html). - At the core of Kafka is the [Kafka broker](../_glossary.md#term-Kafka-broker). A broker stores data in a durable way from clients in one or more topics that can be consumed by one or more clients. 
Kafka also provides several [command-line tools](../tools/cli-reference.md#cp-all-cli) that enable you to start and stop Kafka, create topics and more. - Kafka provides security features such as [data encryption](../security/protect-data/encrypt-tls.md#kafka-ssl-encryption) between producers and consumers and brokers using SSL / TLS. [Authentication](../security/authentication/overview.md#authentication-overview) using SSL or SASL and authorization using ACLs. These security features are disabled by default. - Additionally, Kafka provides the following [Java APIs](/kafka/kafka-apis.html). - The Producer API that enables an application to send messages to Kafka. To learn more, see [Producer](../clients/producer.md#kafka-producer). - The Consumer API that enables an application to subscribe to one or more topics and process the stream of records produced to them. To learn more, see [Consumer](../clients/consumer.md#kafka-consumer). - [Kafka Connect](../connect/index.md#kafka-connect), a component that you can use to stream data between Kafka and other data systems in a scalable and reliable way. It makes it simple to configure connectors to move data into and out of Kafka. Kafka Connect can ingest entire databases or collect metrics from all your application servers into Kafka topics, making the data available for stream processing. Connectors can also deliver data from Kafka topics into secondary indexes like Elasticsearch or into batch systems such as Hadoop for offline analysis. - The [Streams API](../streams/introduction.md#streams-intro) that enables applications to act as a stream processor, consuming an input stream from one or more topics and producing an output stream to one or more output topics, effectively transforming the input streams to output streams. It has a very low barrier to entry, easy operationalization, and a high-level DSL for writing stream processing applications. As such it is the most convenient yet scalable option to process and analyze data that is backed by Kafka. - The [Admin API](/kafka/kafka-apis.html#admin-client-api) that provides the capability to create, inspect, delete, and manage topics, brokers, ACLs, and other Kafka objects. To learn more, see [Confluent REST Proxy for Apache Kafka](../kafka-rest/index.md#kafkarest-intro), which leverages the Admin API. ### Ansible Playbooks for Confluent Platform Ansible Playbooks for Confluent Platform (Confluent Ansible) provides you a simple way to configure and deploy Confluent Platform on a traditional VM or bare metal infrastructure. For more information, see [Ansible documentation](https://docs.confluent.io/ansible/current/overview.html) . For version compatibility among Confluent Ansible, Confluent Platform, Ansible, and Python, see [Ansible Requirements](https://docs.confluent.io/ansible/current/ansible-requirements.html). The following table summarizes the Confluent Platform features supported with Ansible Playbooks for Confluent Platform. 
| Confluent Platform 8.1 Feature | Availability in Ansible Playbooks for Confluent Platform 8.1 |
|--------------------------------------------|----------------------------------------------------------------|
| Kafka Broker | Available |
| Schema Registry | Available |
| REST Proxy | Available |
| ksqlDB | Available |
| Connect | Available |
| Control Center | Available [1] |
| Replicator | Available [2] |
| Security: Authentication | Available |
| Security: Role-based Access Control (RBAC) | Available |
| Security: Network Encryption | Available |
| Structured Audit Logs | Available |
| MDS-based Access Control Lists (ACLs) | Available [3] |
| Secrets Protection | Available |
| Schema Validation | Available |
| FIPS | Available |
| Multi-region Clusters | Available |
| Tiered Storage | Available |
| Self-Balancing Clusters | Available |
| Auto Data Balancer | Not available |
| Confluent REST API | Available |
| Health+ | Available |
| Cluster Registry | Available |
| Cluster Linking | Available |

- [1] Confluent Control Center requires a separate installation. See [Installation](/control-center/current/installation/overview.html).
- [2] Cannot have RBAC enabled on the source or target cluster.
- [3] Only available for new installations. Does not support centrally managing ACLs across multiple Kafka clusters.

You can manually configure the features marked as *Not available* outside of the scope of Ansible Playbooks for Confluent Platform. If you take the hybrid installation approach, refer to the appropriate installation document in [Confluent documentation](index.md#installation-overview) to ensure your install path of mixing Ansible installation and manual installation is supported.

## Task status metrics

`kafka.server:type=cluster-link-metrics,name=link-task-count,link-name={linkName},task-name={taskName},state={state},reason={reason},mode={mode},connection-mode={connection_mode}`
: Monitor the state of link level tasks. For example, monitor if consumer offset syncing is working. If the task is in error, a reason code is provided. You can set up alerts to trigger if errors occur.

**Available tags:**

- `task-name`: The specific task being monitored. Possible values:
  - `consumer-offset-sync`: Consumer offset synchronization task
  - `acl-sync`: ACL synchronization task
  - `auto-create-mirror`: Automatic mirror topic creation task
  - `topic-configs-sync`: Topic configuration synchronization task
  - `clear-mirror-start-offsets`: Clear mirror start offsets task
  - `pause-mirror-topics`: Pause mirror topics task
  - `check-availability`: Availability check task
  - `state-aggregator`: State aggregation task
  - `retry-task`: Retry task for failed operations
  - `periodic-partition-scheduler`: Periodic partition scheduler task
  - `degraded-partition-monitor`: Degraded partition monitor task
- `state`: The current state of the task. Possible values:
  - `active`: Task is currently running
  - `in-error`: Task has encountered an error
- `reason`: Error code when the task is in error state.
  Common values include:

  - `no-error`: No error (when state is active)
  - `authentication`: Authentication errors with link credentials [Link cannot authenticate to remote cluster]
  - `broker-authentication`: Authentication errors with broker credentials [Authentication issues between link and destination broker]
  - `authorization`: Authorization errors with link credentials [Link lacks permissions on remote cluster]
  - `broker-authorization`: Authorization errors with broker credentials [Authorization issues between link and destination broker]
  - `misconfiguration`: Configuration errors [Invalid or missing link configuration]
  - `internal`: Internal/unexpected errors [Unexpected system errors]
  - `suppressed-errors`: Errors that are being suppressed [Errors being handled gracefully]
  - `consumer-group-in-use`: Consumer group is active on destination [Cannot sync offsets while consumers are active]
  - `remote-link-not-found`: Remote link not found (bidirectional links) [Remote side of bidirectional link missing]
  - `security-disabled`: Remote cluster has no authorizer configured [Remote cluster lacks security configuration]
  - `acl-limit-exceeded`: ACL limit reached on destination cluster [ACL quota exceeded on destination]
  - `invalid-request`: Invalid request error [Malformed or invalid API request]
  - `topic-exists`: Topic already exists on destination [Cannot create mirror topic - already exists]
  - `policy-violation`: Policy violation error [Mirror topic creation violates policies]
  - `invalid-topic`: Invalid topic error [Topic name or configuration invalid]
  - `unknown-topic-or-partition`: Topic or partition not found [Source topic/partition doesn’t exist]

`kafka.server:type=cluster-link-metrics,name=mirror-transition-in-error,link-name={linkName},state={state},reason={reason},mode={mode},connection-mode={connection_mode}`
: Monitor mirror topic state transition errors. For example, if a mirror topic encounters errors during the promotion process; that is, while its state is `pending_stopped` and it is being transitioned to stopped.

**Available tags:**

- `state`: The mirror topic transition state. Common values include:
  - `pending_stopped`: Mirror topic is being stopped
  - `pending_mirror`: Topic is being converted to a mirror
  - `pending_synchronization`: Mirror topic is being reversed/swapped
  - `pending_restore`: Mirror topic is being prepared for restore
  - `pending_setup_for_restore`: Mirror topic is being prepared for restore setup
  - `failed`: Mirror topic repair operations
- `reason`: Error code for the transition failure. Uses the same error codes as the task metrics (see above).

You can spot-check either metric over JMX, as shown in the example that follows.
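For example, the following is a minimal sketch that dumps the `link-task-count` MBean with the JmxTool utility that ships with Kafka. The JMX port (9999), the localhost address, and the `org.apache.kafka.tools.JmxTool` class name are assumptions; older Kafka versions expose the tool as `kafka.tools.JmxTool`, and your brokers may expose JMX on a different host and port.

```bash
# Minimal sketch: dump the link-task-count gauge for all cluster links on one broker.
# Assumes JMX is enabled on localhost:9999 (for example, via JMX_PORT=9999).
# The tool polls periodically by default; press Ctrl+C to stop it.
kafka-run-class org.apache.kafka.tools.JmxTool \
  --jmx-url service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi \
  --object-name 'kafka.server:type=cluster-link-metrics,name=link-task-count,*'
```

In production, the same MBeans are usually scraped by your monitoring stack (for example, through a JMX exporter) rather than polled ad hoc.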
## Related Content * Blog post: [Why Avro For Kafka Data](https://www.confluent.io/blog/avro-kafka-data/) * Blog post: [Yes, Virginia, You Really Do Need a Schema Registry](https://www.confluent.io/blog/schema-registry-kafka-stream-processing-yes-virginia-you-really-need-one/) * Apache Avro® official site: [How to get started with Apache Avro using Java Clients](https://avro.apache.org/docs/current/gettingstartedjava.html) * [How to produce and consume (Avro) messages via console tools with Confluent Cloud](https://support.confluent.io/hc/en-us/articles/360044952772) (Confluent Support) * Confluent supported schema formats, and how to configure clients using Avro, Protobuf, or JSON Schema: [Formats, Serializers, and Deserializers](/platform/current/schema-registry/fundamentals/serdes-develop/index.html) * Try it out: [Schema Registry API Usage Examples](/platform/current/schema-registry/develop/using.html), showing more curl commands over HTTP and HTTPS * User guide for managing schemas on Confluent Control Center: [Manage Schemas in Confluent Platform and Control Center](schema.md#topicschema) * Production deployments of Schema Registry: [Deploy Schema Registry in Production on Confluent Platform](installation/deployment.md#schema-registry-prod) * Big picture: [Scripted Confluent Platform Demo](../tutorials/cp-demo/index.md#cp-demo) shows Schema Registry in the context of a full Confluent Platform deployment, with various types of security enabled #### IMPORTANT Confluent Platform components that have a REST endpoint (such as Schema Registry and Confluent Control Center ), don’t support using a principal derived from mTLS authentication when using RBAC. So if you relied on TLS/SSL certificate authentication across Confluent Platform before configuring RBAC, when using RBAC you must also provide HTTP Basic Auth credentials (such as LDAP user) to authenticate against other components or REST API endpoints. HTTP Basic Auth presents login credentials to other Confluent Platform components and the component uses those credentials to get an OAuth token for the user with MDS (which validates the credentials against LDAP) and then the component uses the OAuth token to make authorization requests to the MDS. You must specify the bearer token for [Use HTTP Basic Authentication in Confluent Platform](../../authentication/http-basic-auth/overview.md#http-basic-auth) and more specifically, must specify `basic.auth.user.info` and `basic.auth.credentials.source`. When configuring Confluent Platform components (for example, Confluent Control Center , ksqlDB, and REST Proxy) for RBAC, use OAuth for authentication with MDS and Kafka clusters. For authentication with other Confluent Platform components such as Confluent Platform for Apache Flink, see [Use HTTP Basic Authentication in Confluent Platform](../../authentication/http-basic-auth/overview.md#http-basic-auth). For Confluent Platform components with REST endpoints (such as Schema Registry and Confluent Control Center ), you must use HTTP Basic Authentication to authenticate with MDS. For details, refer to [Configure RBAC using the REST API in Confluent Platform](rbac-config-using-rest-api.md#rbac-config-using-rest-api). You cannot use [principal propagation](../../../kafka-rest/production-deployment/rest-proxy/security.md#kafka-rest-security-propagation) with Confluent Platform components (for example, REST Proxy) that have a REST endpoint that requires RBAC. 
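To illustrate the HTTP Basic Auth settings mentioned above, the following is a minimal sketch of the properties a REST client (for example, a console producer talking to Schema Registry) supplies when the endpoint is protected by RBAC. The URL and the LDAP username and password are placeholders.

```properties
# Minimal sketch: HTTP Basic Auth credentials for an RBAC-protected Schema Registry endpoint.
# Replace the URL and the user:password pair with values valid in your environment.
schema.registry.url=https://schemaregistry:8081
basic.auth.credentials.source=USER_INFO
basic.auth.user.info=<ldap-user>:<ldap-password>
```

The component exchanges these credentials with MDS for a token, as described above.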
When using RBAC with Schema Registry and Connect you can use any of the [authentication methods](../../authentication/overview.md#authentication-overview) supported by Confluent Platform to communicate with Kafka clusters and MDS. For authentication with other Confluent Platform components, see [Use HTTP Basic Authentication in Confluent Platform](../../authentication/http-basic-auth/overview.md#http-basic-auth). When using RBAC with Kafka clients, you can use any of the [authentication methods](../../authentication/overview.md#authentication-overview) supported by Confluent Platform *except OAUTHBEARER*. For details, refer to [Configure Clients for SASL/OAUTHBEARER authentication in Confluent Platform](../../authentication/sasl/oauthbearer/configure-clients.md#security-sasl-rbac-oauthbearer-clientconfig). ![Diagram that shows authentication methods available when using RBAC](images/rbac-authentication-overview.png) ### Kafka Connect This example runs two connectors: - SSE source connector - Elasticsearch sink connector They are running on a Connect worker that is configured with Confluent Platform security features. The Connect worker’s embedded producer is configured to be idempotent, exactly-once in order semantics per partition (in the event of an error that causes a producer retry, the same message—which is still sent by the producer multiple times—will only be written to the Kafka log on the broker once). The Kafka Connect Docker container is running a custom image which has a specific set of connectors and transformations needed by `cp-demo`. See [this Dockerfile](https://github.com/confluentinc/cp-demo/tree/latest/Dockerfile) for more details. Confluent Control Center uses the Kafka Connect API to manage multiple [connect clusters](../../connect/index.md#kafka-connect). 1. In the navigation bar, click **Connect**. 2. Select **connect1**, the name of the cluster of Connect workers. ![image](tutorials/cp-demo/images/connect_default.png) 3. Verify the connectors running in this example: - source connector `wikipedia-sse`: view the example’s SSE source connector [configuration file](https://github.com/confluentinc/cp-demo/tree/latest/scripts/connectors/submit_wikipedia_sse_config.sh). - sink connector `elasticsearch-ksqldb` consuming from the Kafka topic `WIKIPEDIABOT`: view the example’s Elasticsearch sink connector [configuration file](https://github.com/confluentinc/cp-demo/tree/latest/scripts/connectors/submit_elastic_sink_config.sh). ![image](tutorials/cp-demo/images/connector_list.png) 4. Click any connector name to view or modify any details of the connector configuration and custom transforms. ## Set custom component properties When a configuration setting is not directly supported by Ansible Playbooks for Confluent Platform, you can use the custom property feature to configure Confluent Platform components. Before you set a custom property variable, first check the Ansible variable file at the following location for an existing variable: ```bash https://github.com/confluentinc/cp-ansible/blob/8.1.0-post/docs/VARIABLES.md ``` If you find an existing variable that directly supports the setting, use the variable in the inventory file instead of using a config override. 
Configure the custom properties in the Ansible inventory file, `hosts.yml`, using the following dictionaries: * `kafka_controller_custom_properties` * `kafka_broker_custom_properties` * `schema_registry_custom_properties` * `kafka_rest_custom_properties` * `kafka_connect_custom_properties` * `ksql_custom_properties` * `control_center_next_gen_custom_properties` * `kafka_connect_replicator_custom_properties` * `kafka_connect_replicator_consumer_custom_properties` * `kafka_connect_replicator_producer_custom_properties` * `kafka_connect_replicator_monitoring_interceptor_custom_properties` In the example below: * The `num.io.threads` property gets set in the Kafka [properties file](/platform/current/installation/configuration/broker-configs.html#cp-config-brokers). * The `confluent.controlcenter.ksql.default.advertised.url` property gets set in the Control Center [properties file](/platform/current/control-center/installation/configuration.html). Note that the default in the `confluent.controlcenter.ksql.default.advertised.url` property value is the name Control Center should use to identify the ksqlDB cluster. ```none all: vars: kafka_broker_custom_properties: num.io.threads: 15 control_center_next_gen_custom_properties: confluent.controlcenter.ksql.url: http://ksql-external-dns:1234,http://ksql-external-dns:2345 ``` ### Property-based example 1. Create a `quickstart-azureblobstoragesource.properties` file with the following contents. This file should be placed under Confluent Platform installation directory. This configuration is used typically along with [standalone workers](/platform/current/connect/concepts.html#standalone-workers). ```properties name=azure-blob-storage-source tasks.max=1 connector.class=io.confluent.connect.azure.blob.storage.AzureBlobStorageSourceConnector # enter your Azure blob account, key and container name here azblob.account.name= azblob.account.key= azblob.container.name= format.class=io.confluent.connect.azure.blob.storage.format.avro.AvroFormat confluent.topic.bootstrap.servers=localhost:9092 confluent.topic.replication.factor=1 ``` 2. Edit the `quickstart-azureblobstoragesource.properties` to add the following properties: ```properties transforms=AddPrefix transforms.AddPrefix.type=org.apache.kafka.connect.transforms.RegexRouter transforms.AddPrefix.regex=.* transforms.AddPrefix.replacement=copy_of_$0 ``` #### IMPORTANT Adding this renames the output of topic of the messages to `copy_of_blob_topic`. This prevents a continuous feedback loop of messages. 3. Load the Backup and Restore Azure Blob Storage Source connector. ```bash confluent local load azblobstorage-source --config quickstart-azureblobstoragesource.properties ``` #### IMPORTANT Don’t use the [Confluent CLI](https://docs.confluent.io/confluent-cli/current/index.html) in production environments. 4. Confirm that the connector is in a `RUNNING` state. ```bash confluent local status azureblobstorage-source ``` 5. Confirm that the messages are being sent to Kafka. ```bash kafka-avro-console-consumer \ --bootstrap-server localhost:9092 \ --property schema.registry.url=http://localhost:8081 \ --topic copy_of_blob_topic \ --from-beginning | jq '.' ``` 6. The response should be 9 records as follows. 
   ```bash
   {"f1": "value1"}
   {"f1": "value2"}
   {"f1": "value3"}
   {"f1": "value4"}
   {"f1": "value5"}
   {"f1": "value6"}
   {"f1": "value7"}
   {"f1": "value8"}
   {"f1": "value9"}
   ```

## (Optional) Running the other components

You can configure and run additional components as a part of the Self-Balancing tests, if desired, but these are not integral to this tutorial. If you want to run Connect, ksqlDB, or Schema Registry with Confluent Platform, do the following:

1. Edit the properties files for Connect, ksqlDB, or Schema Registry, and search and replace any `replication.factor` values with either 2 or 3 (to work with your five-broker cluster). If `replication.factor` values are set to less than 2 or greater than 4, this will result in system topics with replication factors that prevent graceful broker removal with Self-Balancing.

   For example, if you want to run Connect, you could set replication factors in `$CONFLUENT_HOME/etc/kafka/connect-distributed.properties` to a value of “2”:

   - `offset.storage.replication.factor=2`
   - `config.storage.replication.factor=2`
   - `status.storage.replication.factor=2`

   You could run this command to update replication configurations for Connect:

   ```bash
   sed -i '' -e "s/replication.factor=1/replication.factor=2/g" $CONFLUENT_HOME/etc/kafka/connect-distributed.properties
   ```

2. In `$CONTROL_CENTER_HOME/etc/confluent-control-center/control-center-dev.properties`, verify that the configurations for Kafka Connect, ksqlDB, and Schema Registry match the following settings to provide Control Center with the default advertised URLs for the component clusters:

   ```bash
   # A comma separated list of Connect host names
   confluent.controlcenter.connect.cluster=http://localhost:8083
   # KSQL cluster URL
   confluent.controlcenter.ksql.ksqlDB.url=http://localhost:8088
   # Schema Registry cluster URL
   confluent.controlcenter.schema.registry.url=http://localhost:8081
   ```

3. Start Prometheus, Control Center, and Confluent Platform as described in previous sections.

4. Start the optional components in separate windows.

   - (Optional) [Kafka Connect](../../connect/index.md#kafka-connect)

     ```bash
     connect-distributed $CONFLUENT_HOME/etc/kafka/connect-distributed.properties
     ```

   - (Optional) [ksqlDB](../../ksqldb/overview.md#ksql-home)

     ```bash
     ksql-server-start $CONFLUENT_HOME/etc/ksqldb/ksql-server.properties
     ```

   - (Optional) [Schema Registry overview](/platform/current/schema-registry/index.html)

     ```bash
     schema-registry-start $CONFLUENT_HOME/etc/schema-registry/schema-registry.properties
     ```

# Configure RBAC for a Connect Worker

In an RBAC-enabled environment, several RBAC configuration lines must be added to each Connect worker file. The following steps describe what to add to each Connect worker file.

1. Add the following parameter to enable per-connector principals.

   ```none
   connector.client.config.override.policy=All
   ```

2. Add the following parameters to enable the Connect framework to authenticate with Kafka using a [service principal](connect-rbac-connect-cluster.md#connect-rbac-service-account). The service principal is used by Connect to read from and write to internal configuration topics. Note that `<username>` and `<password>` are the service principal username and password granted permissions when setting up the [service principal](connect-rbac-connect-cluster.md#connect-rbac-service-account).
   ```none
   # Or SASL_SSL if using TLS/SSL
   security.protocol=SASL_PLAINTEXT
   sasl.mechanism=OAUTHBEARER
   sasl.login.callback.handler.class=io.confluent.kafka.clients.plugins.auth.token.TokenUserLoginCallbackHandler
   sasl.jaas.config=org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required \
      username="<username>" \
      password="<password>" \
      metadataServerUrls="http(s)://<mds-host>:<mds-port>";
   ```

3. Add the following parameters to establish **worker-wide default properties** for each type of Kafka client used by connectors in the cluster.

   ```none
   producer.security.protocol=SASL_PLAINTEXT
   producer.sasl.mechanism=OAUTHBEARER
   producer.sasl.login.callback.handler.class=io.confluent.kafka.clients.plugins.auth.token.TokenUserLoginCallbackHandler
   ```

   #### NOTE

   Any principal used by idempotent producers must be granted IdempotentWrite on the cluster, or Write permission on any topic, to initialize the producer client. Binding either the DeveloperWrite or ResourceOwner RBAC role on the Kafka cluster grants Write permission. Note that DeveloperWrite is the less permissive of the two roles, and is the first recommendation. Consumers do not require additional Kafka permissions to be idempotent. The following role binding grants Write access on the cluster:

   ```none
   confluent iam rbac role-binding create \
      --principal $PRINCIPAL \
      --role DeveloperWrite \
      --resource Cluster:kafka-cluster \
      --kafka-cluster $KAFKA_CLUSTER_ID
   ```

   ```none
   consumer.security.protocol=SASL_PLAINTEXT
   consumer.sasl.mechanism=OAUTHBEARER
   consumer.sasl.login.callback.handler.class=io.confluent.kafka.clients.plugins.auth.token.TokenUserLoginCallbackHandler
   ```

   ```none
   admin.security.protocol=SASL_PLAINTEXT
   admin.sasl.mechanism=OAUTHBEARER
   admin.sasl.login.callback.handler.class=io.confluent.kafka.clients.plugins.auth.token.TokenUserLoginCallbackHandler
   ```

4. Add the following Metadata Service (MDS) parameters to require user RBAC authentication for Connect. RBAC authentication is required to allow users to create connectors, read connector configurations, and delete connectors.

   ```none
   # Adds the RBAC REST extension to the Connect worker
   rest.extension.classes=io.confluent.connect.security.ConnectSecurityExtension
   # The location of a running metadata service
   confluent.metadata.bootstrap.server.urls=http(s)://<mds-host>:<mds-port>
   # Credentials to use when communicating with the MDS
   confluent.metadata.basic.auth.user.info=<username>:<password>
   confluent.metadata.http.auth.credentials.provider=BASIC
   ```

   #### NOTE

   For additional configurations available to any client communicating with MDS, see [REST client configurations](../../kafka/configure-mds/mds-configuration.md#rest-client-mds-config) in the Confluent Platform Security documentation.

5. Add the following parameters to have Connect use basic authentication for user requests and token authentication for impersonated requests (for example, from REST proxy).

   ```none
   rest.servlet.initializor.classes=io.confluent.common.security.jetty.initializer.InstallBearerOrBasicSecurityHandler
   # The path to a directory containing public keys that should be used to verify json web tokens
   # during authentication
   public.key.path=<path-to-public-keys-directory>
   ```

See [Secret Registry](connect-rbac-secret-registry.md#connect-rbac-secret-registry) if you are using a Secret Registry for connector credentials.

### Connect to a secure Kafka cluster, like Confluent Cloud

Run a ksqlDB Server that uses a secure connection to a Kafka cluster. Learn about [Configure Security for ksqlDB](../../ksqldb/operate-and-deploy/installation/security.md#ksqldb-installation-security).

`KSQL_BOOTSTRAP_SERVERS`
: A host:port pair for establishing the initial connection to the Kafka cluster. Multiple bootstrap servers can be used in the form `host1:port1,host2:port2,host3:port3...`.

`KSQL_KSQL_SERVICE_ID`
: The service ID of the ksqlDB server, which is used as the prefix for the internal topics created by ksqlDB.

`KSQL_LISTENERS`
: A list of URIs, including the protocol, that the ksqlDB Server listens on.

`KSQL_KSQL_SINK_REPLICAS`
: The default number of replicas for the topics created by ksqlDB. The default is one.

`KSQL_KSQL_STREAMS_REPLICATION_FACTOR`
: The replication factor for internal topics, the command topic, and output topics.

`KSQL_SECURITY_PROTOCOL`
: The protocol that your Kafka cluster uses for security.

`KSQL_SASL_MECHANISM`
: The SASL mechanism that your Kafka cluster uses for security.

`KSQL_SASL_JAAS_CONFIG`
: The Java Authentication and Authorization Service (JAAS) configuration.

```bash
docker run -d \
  -p 127.0.0.1:8088:8088 \
  -e KSQL_BOOTSTRAP_SERVERS=REMOTE_SERVER1:9092,REMOTE_SERVER2:9093,REMOTE_SERVER3:9094 \
  -e KSQL_LISTENERS=http://0.0.0.0:8088/ \
  -e KSQL_KSQL_SERVICE_ID=default_ \
  -e KSQL_KSQL_SINK_REPLICAS=3 \
  -e KSQL_KSQL_STREAMS_REPLICATION_FACTOR=3 \
  -e KSQL_SECURITY_PROTOCOL=SASL_SSL \
  -e KSQL_SASL_MECHANISM=PLAIN \
  -e KSQL_SASL_JAAS_CONFIG="org.apache.kafka.common.security.plain.PlainLoginModule required username=\"<username>\" password=\"<password>\";" \
  confluentinc/cp-ksqldb-server:8.1.0
```

### Connect ksqlDB Server to a secure Kafka Cluster, like Confluent Cloud

ksqlDB Server runs outside of your Kafka clusters, so you need to specify in the container environment how ksqlDB Server connects with a Kafka cluster.

Run a ksqlDB Server that uses a secure connection to a Kafka cluster:

```bash
docker run -d \
  -p 127.0.0.1:8088:8088 \
  -e KSQL_BOOTSTRAP_SERVERS=REMOTE_SERVER1:9092,REMOTE_SERVER2:9093,REMOTE_SERVER3:9094 \
  -e KSQL_LISTENERS=http://0.0.0.0:8088/ \
  -e KSQL_KSQL_SERVICE_ID=default_ \
  -e KSQL_KSQL_SINK_REPLICAS=3 \
  -e KSQL_KSQL_STREAMS_REPLICATION_FACTOR=3 \
  -e KSQL_KSQL_INTERNAL_TOPIC_REPLICAS=3 \
  -e KSQL_SECURITY_PROTOCOL=SASL_SSL \
  -e KSQL_SASL_MECHANISM=PLAIN \
  -e KSQL_SASL_JAAS_CONFIG="org.apache.kafka.common.security.plain.PlainLoginModule required username=\"<username>\" password=\"<password>\";" \
  confluentinc/cp-ksqldb-server:8.1.0
```

`KSQL_BOOTSTRAP_SERVERS`
: A list of hosts for establishing the initial connection to the Kafka cluster.

`KSQL_KSQL_SERVICE_ID`
: The service ID of the ksqlDB Server, which is used as the prefix for the internal topics created by ksqlDB.

`KSQL_LISTENERS`
: A list of URIs, including the protocol, that the ksqlDB Server listens on. If you are using IPv6, set it to `http://[::]:8088`.

`KSQL_KSQL_SINK_REPLICAS`
: The default number of replicas for the topics created by ksqlDB. The default is one.

`KSQL_KSQL_STREAMS_REPLICATION_FACTOR`
: The replication factor for internal topics, the command topic, and output topics.

`KSQL_KSQL_INTERNAL_TOPIC_REPLICAS`
: The number of replicas for the internal topics created by ksqlDB Server. The default is 1.

`KSQL_SECURITY_PROTOCOL`
: The protocol that your Kafka cluster uses for security.

`KSQL_SASL_MECHANISM`
: The SASL mechanism that your Kafka cluster uses for security.

`KSQL_SASL_JAAS_CONFIG`
: The Java Authentication and Authorization Service (JAAS) configuration.

Learn how to [Configure Security for ksqlDB](security.md#ksqldb-installation-security).

### Create a Kafka client project

Notice that so far, all the heavy lifting happens inside of ksqlDB. ksqlDB takes care of the stateful stream processing.
Triggering side-effects will be delegated to a lightweight service that consumes from a Kafka topic. You want to send an email each time an anomaly is found. To do that, you’ll implement a simple, scalable microservice. In practice, you might use [Kafka Streams](../../streams/overview.md#kafka-streams) to handle this piece, but to keep things simple, just use a Kafka consumer client.

Start by creating a `pom.xml` file for your microservice. This simple microservice will run a loop, reading from the `possible_anomalies` Kafka topic and sending an email for each event it receives. Dependencies are declared on Kafka, Avro, SendGrid, and a few other things:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>io.ksqldb</groupId>
    <artifactId>email-sender</artifactId>
    <version>0.0.1</version>

    <properties>
        <java.version>8</java.version>
        <confluent.version>|release|</confluent.version>
        <kafka.version>2.5.0</kafka.version>
        <avro.version>1.9.1</avro.version>
        <slf4j.version>1.7.30</slf4j.version>
        <sendgrid.version>4.4.8</sendgrid.version>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
    </properties>

    <repositories>
        <repository>
            <id>confluent</id>
            <name>Confluent</name>
            <url>https://packages.confluent.io/maven/</url>
        </repository>
    </repositories>

    <pluginRepositories>
        <pluginRepository>
            <id>confluent</id>
            <url>https://packages.confluent.io/maven/</url>
        </pluginRepository>
    </pluginRepositories>

    <dependencies>
        <dependency>
            <groupId>io.confluent</groupId>
            <artifactId>kafka-avro-serializer</artifactId>
            <version>${confluent.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka-clients</artifactId>
            <version>${kafka.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.avro</groupId>
            <artifactId>avro</artifactId>
            <version>${avro.version}</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
            <version>${slf4j.version}</version>
        </dependency>
        <dependency>
            <groupId>com.sendgrid</groupId>
            <artifactId>sendgrid-java</artifactId>
            <version>${sendgrid.version}</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.8.1</version>
                <configuration>
                    <source>${java.version}</source>
                    <target>${java.version}</target>
                    <compilerArgs>
                        <arg>-Xlint:all</arg>
                    </compilerArgs>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.avro</groupId>
                <artifactId>avro-maven-plugin</artifactId>
                <version>${avro.version}</version>
                <executions>
                    <execution>
                        <phase>generate-sources</phase>
                        <goals>
                            <goal>schema</goal>
                        </goals>
                        <configuration>
                            <sourceDirectory>${project.basedir}/src/main/avro</sourceDirectory>
                            <outputDirectory>${project.build.directory}/generated-sources</outputDirectory>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>io.confluent</groupId>
                <artifactId>kafka-schema-registry-maven-plugin</artifactId>
                <version>${confluent.version}</version>
                <configuration>
                    <schemaRegistryUrls>
                        <param>http://localhost:8081</param>
                    </schemaRegistryUrls>
                    <outputDirectory>src/main/avro</outputDirectory>
                    <subjectPatterns>
                        <param>possible_anomalies-value</param>
                    </subjectPatterns>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>
```

Create the directory structure for the rest of the project:

```none
mkdir -p src/main/java/io/ksqldb/tutorial src/main/resources src/main/avro
```

To ensure that your microservice logs output to the console, create a Log4j configuration file at `src/main/resources/log4j.properties` that sends log output to the console.

### Start the stack

To set up and launch the services in the stack, a few files need to be created first. MySQL requires some custom configuration to play well with Debezium, so take care of this first. Debezium has dedicated [documentation](https://debezium.io/documentation/reference/1.1/connectors/mysql.html) if you’re interested, but this guide covers just the essentials.

Create a new file at `mysql/custom-config.cnf` with the following content:

```none
[mysqld]
server-id = 223344
log_bin = mysql-bin
binlog_format = ROW
binlog_row_image = FULL
expire_logs_days = 10
gtid_mode = ON
enforce_gtid_consistency = ON
```

This sets up MySQL’s transaction log so that Debezium can watch for changes as they occur.
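
Once the `mysql` container defined in the next step is running, you can optionally confirm that these settings took effect; a minimal check, assuming the container name `mysql` and the root password `mysql-pw` used in the Compose file below:

```bash
# After `docker-compose up`, verify that the Debezium-related MySQL settings are active
docker exec mysql mysql -uroot -pmysql-pw \
  -e "SHOW VARIABLES WHERE Variable_name IN ('log_bin', 'binlog_format', 'gtid_mode');"
```

The output should report `log_bin` as `ON`, `binlog_format` as `ROW`, and `gtid_mode` as `ON`.
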
With this file in place, create a `docker-compose.yml` file that defines the services to launch: ```yaml version: '2' services: mysql: image: mysql:8.0.19 hostname: mysql container_name: mysql ports: - "3306:3306" environment: MYSQL_ROOT_PASSWORD: mysql-pw MYSQL_DATABASE: call-center MYSQL_USER: example-user MYSQL_PASSWORD: example-pw volumes: - "./mysql/custom-config.cnf:/etc/mysql/conf.d/custom-config.cnf" broker: image: confluentinc/cp-kafka:8.1.0 hostname: broker container_name: broker ports: - "29092:29092" environment: KAFKA_BROKER_ID: 1 KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:9092,PLAINTEXT_HOST://localhost:29092 KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1 KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0 KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1 KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1 schema-registry: image: confluentinc/cp-schema-registry:8.1.0 hostname: schema-registry container_name: schema-registry depends_on: - broker ports: - "8081:8081" environment: SCHEMA_REGISTRY_HOST_NAME: schema-registry SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS: "PLAINTEXT://broker:9092" ksqldb-server: image: confluentinc/cp-ksqldb-server:8.1.0 hostname: ksqldb-server container_name: ksqldb-server depends_on: - broker - schema-registry ports: - "8088:8088" volumes: - "./confluent-hub-components/:/usr/share/kafka/plugins/" environment: KSQL_LISTENERS: "http://0.0.0.0:8088" KSQL_BOOTSTRAP_SERVERS: "broker:9092" KSQL_KSQL_SCHEMA_REGISTRY_URL: "http://schema-registry:8081" KSQL_KSQL_LOGGING_PROCESSING_STREAM_AUTO_CREATE: "true" KSQL_KSQL_LOGGING_PROCESSING_TOPIC_AUTO_CREATE: "true" # Configuration to embed Kafka Connect support. KSQL_CONNECT_GROUP_ID: "ksql-connect-cluster" KSQL_CONNECT_BOOTSTRAP_SERVERS: "broker:9092" KSQL_CONNECT_KEY_CONVERTER: "org.apache.kafka.connect.storage.StringConverter" KSQL_CONNECT_VALUE_CONVERTER: "io.confluent.connect.avro.AvroConverter" KSQL_CONNECT_VALUE_CONVERTER_SCHEMA_REGISTRY_URL: "http://schema-registry:8081" KSQL_CONNECT_CONFIG_STORAGE_TOPIC: "_ksql-connect-configs" KSQL_CONNECT_OFFSET_STORAGE_TOPIC: "_ksql-connect-offsets" KSQL_CONNECT_STATUS_STORAGE_TOPIC: "_ksql-connect-statuses" KSQL_CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR: 1 KSQL_CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR: 1 KSQL_CONNECT_STATUS_STORAGE_REPLICATION_FACTOR: 1 KSQL_CONNECT_PLUGIN_PATH: "/usr/share/kafka/plugins" ksqldb-cli: image: confluentinc/cp-ksqldb-cli:8.1.0 container_name: ksqldb-cli depends_on: - broker - ksqldb-server entrypoint: /bin/sh tty: true ``` There are a few things to notice here. The MySQL image mounts the custom configuration file that you wrote. MySQL merges these configuration settings into its system-wide configuration. The environment variables you gave it also set up a blank database called `call-center` along with a user named `example-user` that can access it. Also note that the ksqlDB server image mounts the `confluent-hub-components` directory, too. The jar files that you downloaded need to be on the classpath of ksqlDB when the server starts up. Bring up the entire stack by running: ```bash docker-compose up ``` ## Example 1: Same number of partitions in DC1 and DC2 In this example, you migrate from MirrorMaker to Replicator and keep the same number of partitions for `inventory` in DC1 and DC2. Prerequisites: : - Confluent Platform 5.0.0 or later is [installed](../../installation/overview.md#installation). 
- You must have the same number of partitions for `inventory` in DC1 and DC2 to use this method.
- The `src.consumer.group.id` in Replicator must match `group.id` in MirrorMaker.

1. Stop the running MirrorMaker instance in DC1, where `<pid>` is the MirrorMaker process ID:

   ```none
   kill <pid>
   ```

2. Configure and start Replicator. In this example, Replicator is run as an executable from the command line or from [a Docker image](../../installation/docker/config-reference.md#config-reference).

   1. Add these values to `CONFLUENT_HOME/etc/kafka-connect-replicator/replicator_consumer.properties`. Replace `localhost:9082` with the `bootstrap.servers` of DC1, the source cluster:

      ```bash
      bootstrap.servers=localhost:9082
      topic.preserve.partitions=true
      ```

   2. Add this value to `CONFLUENT_HOME/etc/kafka-connect-replicator/replicator_producer.properties`. Replace `localhost:9092` with the `bootstrap.servers` of DC2, the destination cluster:

      ```bash
      bootstrap.servers=localhost:9092
      ```

   3. Ensure the replication factors are set to `2` or `3` for production, if they are not already:

      ```bash
      echo "confluent.topic.replication.factor=3" >> ./etc/kafka-connect-replicator/quickstart-replicator.properties
      echo "offset.storage.replication.factor=3" >> ./etc/kafka-connect-replicator/quickstart-replicator.properties
      echo "config.storage.replication.factor=3" >> ./etc/kafka-connect-replicator/quickstart-replicator.properties
      echo "status.storage.replication.factor=3" >> ./etc/kafka-connect-replicator/quickstart-replicator.properties
      ```

   4. Start Replicator:

      ```bash
      replicator --cluster.id <cluster-id> \
         --producer.config replicator_producer.properties \
         --consumer.config replicator_consumer.properties \
         --replication.config ./etc/kafka-connect-replicator/quickstart-replicator.properties
      ```

   Replicator will use the offsets committed by MirrorMaker in DC1 and start replicating messages from DC1 to DC2 based on these offsets.

## Example 2: Different number of partitions in DC1 and DC2

In this example, you migrate from MirrorMaker to Replicator and have a different number of partitions for `inventory` in DC1 and DC2.

Prerequisites:

- Confluent Platform 5.0.0 or later is [installed](../../installation/overview.md#installation).
- The `src.consumer.group.id` in Replicator must match `group.id` in MirrorMaker.

1. Stop the running MirrorMaker instance from DC1.

2. Configure and start Replicator. In this example, Replicator is run as an executable from the command line or from [a Docker image](../../installation/docker/config-reference.md#config-reference).

   1. Add these values to `CONFLUENT_HOME/etc/kafka-connect-replicator/replicator_consumer.properties`. Replace `localhost:9082` with the `bootstrap.servers` of DC1, the source cluster:

      ```bash
      bootstrap.servers=localhost:9082
      topic.preserve.partitions=false
      ```

   2. Add this value to `CONFLUENT_HOME/etc/kafka-connect-replicator/replicator_producer.properties`. Replace `localhost:9092` with the `bootstrap.servers` of DC2, the destination cluster:

      ```bash
      bootstrap.servers=localhost:9092
      ```

   3.
Ensure the replication factors are set to `2` or `3` for production, if they are not already: ```bash echo "confluent.topic.replication.factor=3" >> ./etc/kafka-connect-replicator/quickstart-replicator.properties echo "offset.storage.replication.factor=3" >> ./etc/kafka-connect-replicator/quickstart-replicator.properties echo "config.storage.replication.factor=3" >> ./etc/kafka-connect-replicator/quickstart-replicator.properties echo "status.storage.replication.factor=3" >> ./etc/kafka-connect-replicator/quickstart-replicator.properties ``` 4. Start Replicator: ```bash replicator --cluster.id \ --producer.config replicator_producer.properties \ --consumer.config replicator_consumer.properties \ --replication.config ./etc/kafka-connect-replicator/quickstart-replicator.properties ``` Replicator will use the committed offsets by MirrorMaker from DC1 and start replicating messages from DC1 to DC2 based on these offsets. ## Demo: Enabling Schema ID Validation on a Topic at the Command Line This short demo shows the effect of enabling or disabling schema validation on a topic. If you are just getting started with Confluent Platform and Schema Registry, you might want to first work through the [Tutorial: Use Schema Registry on Confluent Platform to Implement Schemas for a Client Application](schema_registry_onprem_tutorial.md#schema-registry-onprem-tutorial), then return to this demo. The examples make use of the `kafka-console-producer` and `kafka-console-consumer`, which are located in `$CONFLUENT_HOME/bin`. 1. On a local install of Confluent Platform version 5.4.0 or later, modify `$CONFLUENT_HOME/etc/kafka/server.properties` to include the following configuration for the Schema Registry URL: ```bash ############################## My Schema Validation Demo Settings ################ # Schema Registry URL confluent.schema.registry.url=http://localhost:8081 ``` The example above includes two lines of comments, which are optional, to keep track of the configurations in the file. 2. Start Confluent Platform using the following command: ```bash confluent local start ``` 3. Create a test topic called `test-schemas` without specifying the Schema ID Validation setting so that it defaults to `false`. ```bash kafka-topics --bootstrap-server localhost:9092 --create --partitions 1 --replication-factor 1 --topic test-schemas ``` This creates a topic with no broker validation on records produced to the test topic, which is what you want for the first part of the demo. You can verify that the topic was created with `kafka-topics --bootstrap-server localhost:9092 --list`. 4. In a new command window for the producer, run this command to produce a serialized record (using the default string serializer) to the topic `test-schemas`. ```bash kafka-console-producer --bootstrap-server localhost:9092 --topic test-schemas --property parse.key=true --property key.separator=, ``` The command is successful because you currently have Schema ID Validation disabled for this topic. If broker Schema ID Validation had been enabled for this topic, the above command to produce to it would not be permitted. The output of this command is a producer command prompt (`>`), where you can type the messages you want to produce. Type your first message at the `>` prompt as follows: ```bash 1,my first record ``` Keep this session of the producer running. 5. 
Open a new command window for the consumer, and enter this command to read the messages: ```bash kafka-console-consumer --bootstrap-server localhost:9092 --from-beginning --topic test-schemas --property print.key=true ``` The output of this command is `my first record`. Keep this session of the consumer running. 6. Now, set Schema ID Validation for the topic `test-schemas` to `true`. ```bash kafka-configs --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name test-schemas --add-config confluent.value.schema.validation=true ``` You should get a confirmation: `Completed updating config for topic test-schemas.` 7. Return to the producer session, and type a second message at the `>` prompt. ```bash 2,my second record ``` You will get an error because Schema ID Validation is enabled and the messages we are sending do not contain schema IDs: `This record has failed the validation on broker` If you subsequently disable Schema ID Validation (use the same command to set it to `false`), restart the producer, then type and resend the same or another similarly formatted message, the message will go through. (For example, produce `3,my third record`.) The messages that were successfully produced also show on Control Center ([http://localhost:9021/](http://localhost:9021/) in your web browser) in **Topics > test-schemas > messages**. You may have to select a partition or jump to a timestamp to see messages sent earlier. ![image](images/sv-topics.png) 8. Run shutdown and cleanup tasks. - You can stop the consumer and producer with Ctl-C in their respective command windows. - To stop Confluent Platform, type `confluent local services stop`. - If you would like to clear out existing data (topics, schemas, and messages) before starting again with another test, type `confluent local destroy`. #### IMPORTANT - If you use the legacy method of defining TLS/SSL values in system environment variables, TLS/SSL settings will apply to every Java component running on this JVM. For example on Connect, every [connector](/kafka-connectors/self-managed/overview.html) will use the given truststore. Consider a scenario where you are using an Amazon Web Services (AWS) connector such as S3 or Kinesis, and do not have the AWS certificate chain in the given truststore. The connector will fail with the following error: ```bash com.amazonaws.SdkClientException: Unable to execute HTTP request: sun.security.validator.ValidatorException: PKIX path building failed ``` This does not apply if you use the dedicated Schema Registry client configurations. - For the `kafka-avro-console-producer` and `kafka-avro-console-consumer`, you must pass the Schema Registry properties on the command line. 
Here is an example for the producer:

```bash
./kafka-avro-console-producer --broker-list localhost:9093 --topic myTopic \
  --producer.config ~/etc/kafka/producer.properties \
  --property value.schema='{"type":"record","name":"myrecord","fields":[{"name":"f1","type":"string"}]}' \
  --property schema.registry.url=https://localhost:8081 \
  --property schema.registry.ssl.truststore.location=/etc/kafka/security/schema.registry.client.truststore.jks \
  --property schema.registry.ssl.truststore.password=myTrustStorePassword
```

For more examples of using the producer and consumer command line utilities, see [Test drive Avro schema](../fundamentals/serdes-develop/serdes-avro.md#sr-test-drive-avro), [Test drive JSON Schema](../fundamentals/serdes-develop/serdes-json.md#sr-test-drive-json-schema), [Test drive Protobuf schema](../fundamentals/serdes-develop/serdes-protobuf.md#sr-test-drive-protobuf), and the demo in [Validate Broker-side Schema IDs in Confluent Platform](../schema-validation.md#schema-validation).

#### Connect

- Additional RBAC configurations required for [connect-avro-distributed.properties](https://github.com/confluentinc/examples/tree/latest/security/rbac/delta_configs/connect-avro-distributed.properties.delta) ```none bootstrap.servers=localhost:9092 security.protocol=SASL_PLAINTEXT sasl.mechanism=OAUTHBEARER sasl.login.callback.handler.class=io.confluent.kafka.clients.plugins.auth.token.TokenUserLoginCallbackHandler sasl.jaas.config=org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required username="connect" password="connect1" metadataServerUrls="http://localhost:8090"; ## Connector client (producer, consumer, admin client) properties ## key.converter=org.apache.kafka.connect.storage.StringConverter value.converter=io.confluent.connect.avro.AvroConverter group.id=connect-cluster offset.storage.topic=connect-offsets offset.storage.replication.factor=1 config.storage.topic=connect-configs config.storage.replication.factor=1 status.storage.topic=connect-statuses status.storage.replication.factor=1 # Allow producer/consumer/admin client overrides (this enables per-connector principals) connector.client.config.override.policy=All producer.security.protocol=SASL_PLAINTEXT producer.sasl.mechanism=OAUTHBEARER producer.sasl.login.callback.handler.class=io.confluent.kafka.clients.plugins.auth.token.TokenUserLoginCallbackHandler # Intentionally omitting `producer.sasl.jaas.config` to force connectors to use their own consumer.security.protocol=SASL_PLAINTEXT consumer.sasl.mechanism=OAUTHBEARER consumer.sasl.login.callback.handler.class=io.confluent.kafka.clients.plugins.auth.token.TokenUserLoginCallbackHandler # Intentionally omitting `consumer.sasl.jaas.config` to force connectors to use their own admin.security.protocol=SASL_PLAINTEXT admin.sasl.mechanism=OAUTHBEARER admin.sasl.login.callback.handler.class=io.confluent.kafka.clients.plugins.auth.token.TokenUserLoginCallbackHandler # Intentionally omitting `admin.sasl.jaas.config` to force connectors to use their own ## REST extensions: RBAC and Secret Registry ## # Installs the RBAC and Secret Registry REST extensions rest.extension.classes=io.confluent.connect.security.ConnectSecurityExtension,io.confluent.connect.secretregistry.ConnectSecretRegistryExtension ## RBAC Authentication ## # Enables basic and bearer authentication for requests made to the worker rest.servlet.initializor.classes=io.confluent.common.security.jetty.initializer.InstallBearerOrBasicSecurityHandler # The path to a directory containing public keys that
should be used to verify json web tokens during authentication public.key.path=/tmp/tokenPublicKey.pem ## RBAC Authorization ## # The location of a running metadata service; used to verify that requests are authorized by the users that make them confluent.metadata.bootstrap.server.urls=http://localhost:8090 # Credentials to use when communicating with the MDS; these should usually match the ones used for communicating with Kafka confluent.metadata.basic.auth.user.info=connect:connect1 confluent.metadata.http.auth.credentials.provider=BASIC ## Secret Registry Secret Provider ## config.providers=secret config.providers.secret.class=io.confluent.connect.secretregistry.rbac.config.provider.InternalSecretConfigProvider config.providers.secret.param.master.encryption.key=password1234 config.providers.secret.param.kafkastore.bootstrap.servers=localhost:9092 config.providers.secret.param.kafkastore.security.protocol=SASL_PLAINTEXT config.providers.secret.param.kafkastore.sasl.mechanism=OAUTHBEARER config.providers.secret.param.kafkastore.sasl.login.callback.handler.class=io.confluent.kafka.clients.plugins.auth.token.TokenUserLoginCallbackHandler config.providers.secret.param.kafkastore.sasl.jaas.config=org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required username="connect" password="connect1" metadataServerUrls="http://localhost:8090"; ``` - Additional RBAC configurations required for a [source connector](https://github.com/confluentinc/examples/tree/latest/security/rbac/delta_configs/connector-source.properties.delta) ```none producer.override.sasl.jaas.config=org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required username="connector" password="connector1" metadataServerUrls="http://localhost:8090"; ``` - Additional RBAC configurations required for a [sink connector](https://github.com/confluentinc/examples/tree/latest/security/rbac/delta_configs/connector-sink.properties.delta) ```none consumer.override.sasl.jaas.config=org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required username="connector" password="connector1" metadataServerUrls="http://localhost:8090"; ``` - Role bindings: ```bash # Connect Admin confluent iam rbac role-binding create --principal User:$USER_ADMIN_CONNECT --role ResourceOwner --resource Topic:connect-configs --kafka-cluster $KAFKA_CLUSTER_ID confluent iam rbac role-binding create --principal User:$USER_ADMIN_CONNECT --role ResourceOwner --resource Topic:connect-offsets --kafka-cluster $KAFKA_CLUSTER_ID confluent iam rbac role-binding create --principal User:$USER_ADMIN_CONNECT --role ResourceOwner --resource Topic:connect-statuses --kafka-cluster $KAFKA_CLUSTER_ID confluent iam rbac role-binding create --principal User:$USER_ADMIN_CONNECT --role ResourceOwner --resource Group:connect-cluster --kafka-cluster $KAFKA_CLUSTER_ID confluent iam rbac role-binding create --principal User:$USER_ADMIN_CONNECT --role ResourceOwner --resource Topic:_confluent-secrets --kafka-cluster $KAFKA_CLUSTER_ID confluent iam rbac role-binding create --principal User:$USER_ADMIN_CONNECT --role ResourceOwner --resource Group:secret-registry --kafka-cluster $KAFKA_CLUSTER_ID confluent iam rbac role-binding create --principal User:$USER_ADMIN_CONNECT --role SecurityAdmin --kafka-cluster $KAFKA_CLUSTER_ID --connect-cluster $CONNECT_CLUSTER_ID # Connector Submitter confluent iam rbac role-binding create --principal User:$USER_CONNECTOR_SUBMITTER --role ResourceOwner --resource Connector:$CONNECTOR_NAME --kafka-cluster 
$KAFKA_CLUSTER_ID --connect-cluster $CONNECT_CLUSTER_ID # Connector confluent iam rbac role-binding create --principal User:$USER_CONNECTOR --role ResourceOwner --resource Topic:$TOPIC2_AVRO --kafka-cluster $KAFKA_CLUSTER_ID confluent iam rbac role-binding create --principal User:$USER_CONNECTOR --role ResourceOwner --resource Subject:${TOPIC2_AVRO}-value --kafka-cluster $KAFKA_CLUSTER_ID --schema-registry-cluster $SCHEMA_REGISTRY_CLUSTER_ID # Sink Connector confluent iam rbac role-binding create --principal User:$USER_CONNECTOR --role DeveloperRead --resource Group:$CONNECTOR_CONSUMER_GROUP_ID --prefix --kafka-cluster $KAFKA_CLUSTER_ID ``` #### REST Proxy - Additional RBAC configurations required for [kafka-rest.properties](https://github.com/confluentinc/examples/tree/latest/security/rbac/delta_configs/kafka-rest.properties.delta) ```none # Configure connections to other Confluent Platform services bootstrap.servers=localhost:9092 schema.registry.url=http://localhost:8081 client.security.protocol=SASL_PLAINTEXT client.sasl.mechanism=OAUTHBEARER client.security.protocol=SASL_PLAINTEXT client.sasl.login.callback.handler.class=io.confluent.kafka.clients.plugins.auth.token.TokenUserLoginCallbackHandler client.sasl.jaas.config=org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required username="clientrp" password="clientrp1" metadataServerUrls="http://localhost:8090"; kafka.rest.resource.extension.class=io.confluent.kafkarest.security.KafkaRestSecurityResourceExtension rest.servlet.initializor.classes=io.confluent.common.security.jetty.initializer.InstallBearerOrBasicSecurityHandler public.key.path=/tmp/tokenPublicKey.pem # Credentials to use with the MDS confluent.metadata.bootstrap.server.urls=http://localhost:8090 confluent.metadata.basic.auth.user.info=rp:rp1 confluent.metadata.http.auth.credentials.provider=BASIC ``` - Role bindings: ```bash # REST Proxy Admin: role bindings for license management, no additional administrative rolebindings required because REST Proxy just does impersonation confluent iam rbac role-binding create --principal User:$USER_CLIENT_RP --role DeveloperRead --resource Topic:$LICENSE_TOPIC --kafka-cluster $KAFKA_CLUSTER_ID confluent iam rbac role-binding create --principal User:$USER_CLIENT_RP --role DeveloperWrite --resource Topic:$LICENSE_TOPIC --kafka-cluster $KAFKA_CLUSTER_ID # Producer/Consumer confluent iam rbac role-binding create --principal User:$USER_CLIENT_RP --role ResourceOwner --resource Topic:$TOPIC3 --kafka-cluster $KAFKA_CLUSTER_ID confluent iam rbac role-binding create --principal User:$USER_CLIENT_RP --role DeveloperRead --resource Group:$CONSUMER_GROUP --kafka-cluster $KAFKA_CLUSTER_ID ``` #### ksqlDB - Additional RBAC configurations required for [ksql-server.properties](https://github.com/confluentinc/examples/tree/latest/security/rbac/delta_configs/ksql-server.properties.delta) ```none bootstrap.servers=localhost:9092 security.protocol=SASL_PLAINTEXT sasl.mechanism=OAUTHBEARER sasl.login.callback.handler.class=io.confluent.kafka.clients.plugins.auth.token.TokenUserLoginCallbackHandler sasl.jaas.config=org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required username="ksqlDBserver" password="ksqlDBserver1" metadataServerUrls="http://localhost:8090"; # Specify KSQL service id used to bind user/roles to this cluster ksql.service.id=rbac-ksql # Enable KSQL authorization and impersonation ksql.security.extension.class=io.confluent.ksql.security.KsqlConfluentSecurityExtension # Enable KSQL Basic+Bearer 
authentication ksql.authentication.plugin.class=io.confluent.ksql.security.VertxBearerOrBasicAuthenticationPlugin public.key.path=/tmp/tokenPublicKey.pem # Metadata URL and access credentials confluent.metadata.bootstrap.server.urls=http://localhost:8090 confluent.metadata.http.auth.credentials.provider=BASIC confluent.metadata.basic.auth.user.info=ksqlDBserver:ksqlDBserver1 # Credentials for Schema Registry access ksql.schema.registry.url=http://localhost:8081 ksql.schema.registry.basic.auth.user.info=ksqlDBserver:ksqlDBserver1 ``` - Role bindings: ```bash # ksqlDB Server Admin confluent iam rbac role-binding create --principal User:$USER_ADMIN_KSQLDB --role ResourceOwner --resource Topic:_confluent-ksql-${KSQL_SERVICE_ID}_command_topic --kafka-cluster $KAFKA_CLUSTER_ID confluent iam rbac role-binding create --principal User:$USER_ADMIN_KSQLDB --role ResourceOwner --resource Topic:${KSQL_SERVICE_ID}ksql_processing_log --kafka-cluster $KAFKA_CLUSTER_ID confluent iam rbac role-binding create --principal User:$USER_ADMIN_KSQLDB --role SecurityAdmin --kafka-cluster $KAFKA_CLUSTER_ID --ksql-cluster $KSQL_SERVICE_ID confluent iam rbac role-binding create --principal User:$USER_ADMIN_KSQLDB --role ResourceOwner --resource KsqlCluster:ksql-cluster --kafka-cluster $KAFKA_CLUSTER_ID --ksql-cluster $KSQL_SERVICE_ID # ksqlDB CLI queries confluent iam rbac role-binding create --principal User:${USER_KSQLDB} --role DeveloperWrite --resource KsqlCluster:ksql-cluster --kafka-cluster $KAFKA_CLUSTER_ID --ksql-cluster $KSQL_SERVICE_ID confluent iam rbac role-binding create --principal User:${USER_KSQLDB} --role DeveloperRead --resource Topic:$TOPIC1 --kafka-cluster $KAFKA_CLUSTER_ID confluent iam rbac role-binding create --principal User:${USER_KSQLDB} --role DeveloperRead --resource Group:_confluent-ksql-${KSQL_SERVICE_ID} --prefix --kafka-cluster $KAFKA_CLUSTER_ID confluent iam rbac role-binding create --principal User:${USER_KSQLDB} --role DeveloperRead --resource Topic:${KSQL_SERVICE_ID}ksql_processing_log --kafka-cluster $KAFKA_CLUSTER_ID confluent iam rbac role-binding create --principal User:${USER_ADMIN_KSQLDB} --role DeveloperRead --resource Group:_confluent-ksql-${KSQL_SERVICE_ID} --prefix --kafka-cluster $KAFKA_CLUSTER_ID confluent iam rbac role-binding create --principal User:${USER_ADMIN_KSQLDB} --role DeveloperRead --resource Topic:$TOPIC1 --kafka-cluster $KAFKA_CLUSTER_ID confluent iam rbac role-binding create --principal User:${USER_ADMIN_KSQLDB} --role ResourceOwner --resource TransactionalId:${KSQL_SERVICE_ID} --kafka-cluster $KAFKA_CLUSTER_ID confluent iam rbac role-binding create --principal User:${USER_KSQLDB} --role ResourceOwner --resource Topic:_confluent-ksql-${KSQL_SERVICE_ID}transient --prefix --kafka-cluster $KAFKA_CLUSTER_ID confluent iam rbac role-binding create --principal User:${USER_ADMIN_KSQLDB} --role ResourceOwner --resource Topic:_confluent-ksql-${KSQL_SERVICE_ID}transient --prefix --kafka-cluster $KAFKA_CLUSTER_ID confluent iam rbac role-binding create --principal User:${USER_KSQLDB} --role ResourceOwner --resource Topic:${CSAS_STREAM1} --kafka-cluster $KAFKA_CLUSTER_ID confluent iam rbac role-binding create --principal User:${USER_ADMIN_KSQLDB} --role ResourceOwner --resource Topic:${CSAS_STREAM1} --kafka-cluster $KAFKA_CLUSTER_ID confluent iam rbac role-binding create --principal User:${USER_KSQLDB} --role ResourceOwner --resource Topic:${CTAS_TABLE1} --kafka-cluster $KAFKA_CLUSTER_ID confluent iam rbac role-binding create --principal User:${USER_ADMIN_KSQLDB} --role 
ResourceOwner --resource Topic:${CTAS_TABLE1} --kafka-cluster $KAFKA_CLUSTER_ID confluent iam rbac role-binding create --principal User:${USER_ADMIN_KSQLDB} --role ResourceOwner --resource Topic:_confluent-ksql-${KSQL_SERVICE_ID} --prefix --kafka-cluster $KAFKA_CLUSTER_ID ``` #### Control Center - Additional RBAC configurations required for [control-center-dev.properties](https://github.com/confluentinc/examples/tree/latest/security/rbac/delta_configs/control-center-dev.properties.delta) ```none confluent.controlcenter.rest.authentication.method=BEARER confluent.controlcenter.streams.security.protocol=SASL_PLAINTEXT public.key.path=/tmp/tokenPublicKey.pem confluent.metadata.basic.auth.user.info=c3:c31 confluent.metadata.bootstrap.server.urls=http://localhost:8090 ``` - Role bindings: ```bash # Control Center Admin confluent iam rbac role-binding create --principal User:$USER_ADMIN_C3 --role SystemAdmin --kafka-cluster $KAFKA_CLUSTER_ID # Control Center user confluent iam rbac role-binding create --principal User:$USER_CLIENT_C --role DeveloperRead --resource Topic:$TOPIC1 --kafka-cluster $KAFKA_CLUSTER_ID confluent iam rbac role-binding create --principal User:$USER_CLIENT_C --role DeveloperRead --resource Topic:$TOPIC2_AVRO --kafka-cluster $KAFKA_CLUSTER_ID confluent iam rbac role-binding create --principal User:$USER_CLIENT_C --role DeveloperRead --resource Subject:${TOPIC2_AVRO}-value --kafka-cluster $KAFKA_CLUSTER_ID --schema-registry-cluster $SCHEMA_REGISTRY_CLUSTER_ID confluent iam rbac role-binding create --principal User:$USER_CLIENT_C --role DeveloperRead --resource Connector:$CONNECTOR_NAME --kafka-cluster $KAFKA_CLUSTER_ID --connect-cluster $CONNECT_CLUSTER_ID ``` ### Common configuration Any component that interacts with secured Confluent Server brokers is a *client* and must be configured for security as well. These clients include Kafka Connect workers and certain connectors such as Replicator, ksqlDB clients, non-Java clients, Confluent Control Center , Confluent Schema Registry, REST Proxy, etc. All Kafka clients share a general set of security configuration parameters required to interact with a secured Confluent Platform cluster: 1. To encrypt data using TLS/SSL and authenticate using SASL, configure the security protocol to use `SASL_SSL`. (If you want TLS/SSL for both encryption and authentication without SASL, the security protocol would be `SSL`). ```bash security.protocol=SASL_SSL ``` 2. To configure TLS encryption truststore settings, set the truststore configuration parameters. In this tutorial, the Kafka client does not need the keystore because authentication is done using SASL/PLAIN instead of mutual TLS (mTLS). ```bash ssl.truststore.location=/var/ssl/private/kafka.client.truststore.jks ssl.truststore.password=test1234 ``` 3. To configure SASL authentication, set the SASL mechanism, which in this tutorial is `PLAIN`. Then configure the JAAS configuration property to describe to connect to the Confluent Server brokers. The properties `username` and `password` are used to configure the user for connections. ```bash sasl.mechanism=PLAIN sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \ username="client" \ password="client-secret"; ``` Combining the configuration steps above, the Kafka client’s general pattern for enabling TLS/SSL encryption and SASL/PLAIN authentication is to add the following to the Kafka client’s properties file. 
```bash security.protocol=SASL_SSL ssl.truststore.location=/var/ssl/private/kafka.client.truststore.jks ssl.truststore.password=test1234 sasl.mechanism=PLAIN sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \ username="client" \ password="client-secret"; ``` What differs between Kafka clients is the specific [configuration prefix](../kafka/security_prefixes.md#security-prefixes) that precedes each configuration parameter, as described in the sections below. ### Step 1: Configure the inventory file For the full configuration examples, see the [Schema Registry Switchover using Confluent Ansible](https://github.com/confluentinc/cp-ansible/blob/8.1.x/docs/sample_inventories/sr-automation-workflow/sr_switchover_cp_to_cc.yml) sample inventory. ```yaml all: vars: password_encoder_secret: --- [1] schema_registry: --- [2] vars: unified_stream_manager: --- [3] schema_registry_endpoint: --- [4] authentication_type: --- [5] basic_username: --- [6] basic_password: --- [7] schema_exporters: --- [8] - name: --- [9] subjects: --- [10] context_type: --- [11] context: --- [12] config: --- [13] schema_registry_endpoint: authentication_type: basic_username: basic_password: sr_switch_over_exporter_name: --- [14] schema_importers: --- [15] - name: --- [16] subjects: --- [17] config: --- [18] schema_registry_endpoint: authentication_type: basic_username: basic_password: ``` * [1] Required. The secret for enabling schema exporter and importer. For more information, see [password.encoder.secret](https://docs.confluent.io/platform/current/schema-registry/installation/config.html#password-encoder-secret). * [2] The below variables can be specified under the `schema_registry` role or under `all` role. * [3] Required to enable forward sync from Confluent Platform to Confluent Cloud. Contains Confluent Cloud Schema Registry connection details. * [4] Required. The endpoint of the remote Confluent Cloud Schema Registry. * [5] Required. The authentication type of the remote Confluent Cloud Schema Registry. The supported type is `basic`. * [6] Required. The API key of the Confluent Cloud Schema Registry. * [7] Required. The API secret of the Confluent Cloud Schema Registry. * [8] Required. The schema exporter configurations. Only one exporter is allowed. * [9] Required. The name of the schema exporter. Must match `sr_switch_over_exporter_name` ([14]). * [10] Required. The subjects of the schema exporter. To export all subjects, use `:*:` for all subjects in all contexts, or specify patterns: `[":.context:*"]`. * [11] Required. The context type of the schema exporter. Specify how to handle contexts. Supported types are `AUTO`, `CUSTOM`, `NONE`, and `DEFAULT`. The default value is `AUTO`, whereby the exporter will use an auto-generated context in the destination cluster. The auto-generated context name will be reported in the status. If set to `NONE`, the exporter copies the source schemas as-is. * [12] Required if `context_type` is `CUSTOM`. The context of the schema exporter. * [13] If omitted, Confluent Ansible will use the default values specified in [4], [5], [6], and [7]. * [14] The name of the exporter to use for the switchover workflow. If not specified, the workflow will be an import-only workflow. If specified, the value must match one of `schema_exporters[0].name`. * [15] Required. The schema importers configuration. Only one importer is allowed. * [16] Required. The name of the schema importer. * [17] Required. The subjects of the schema importer. 
To import all subjects, use `:*:` for the default context, or specify patterns: `[":.context:*"]`. * [18] If omitted, Confluent Ansible will use the default values specified in [4], [5], [6], and [7]. ## Setup 1. Clone the [confluentinc/examples](https://github.com/confluentinc/examples) GitHub repository and check out the `latest` branch. ```bash git clone https://github.com/confluentinc/examples cd examples git checkout latest ``` 2. Change directory to the example for Clojure. ```bash cd clients/cloud/clojure/ ``` 3. Create a local file (for example, at `$HOME/.confluent/java.config`) with configuration parameters to connect to your Kafka cluster. Starting with one of the templates below, customize the file with connection information to your cluster. Substitute your values for `{{ BROKER_ENDPOINT }}`, `{{CLUSTER_API_KEY }}`, and `{{ CLUSTER_API_SECRET }}` (see [Configure Confluent Cloud Clients](https://docs.confluent.io/cloud/current/client-apps/config-client.html) for instructions on how to manually find these values, or use the [ccloud-stack utility for Confluent Cloud](/cloud/current/examples/ccloud/docs/ccloud-stack.html) to automatically create them). - Template configuration file for Confluent Cloud ```none # Required connection configs for Kafka producer, consumer, and admin bootstrap.servers={{ BROKER_ENDPOINT }} security.protocol=SASL_SSL sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username='{{ CLUSTER_API_KEY }}' password='{{ CLUSTER_API_SECRET }}'; sasl.mechanism=PLAIN # Required for correctness in Apache Kafka clients prior to 2.6 client.dns.lookup=use_all_dns_ips # Best practice for higher availability in Apache Kafka clients prior to 3.0 session.timeout.ms=45000 # Best practice for Kafka producer to prevent data loss acks=all ``` - Template configuration file for local host ```none # Kafka bootstrap.servers=localhost:9092 ``` #### Step 3: Create the connector configuration file Create a JSON file that contains the connector configuration properties. The following example shows the required connector properties. ```none { "connector.class": "ActiveMQSource", "name": "ActiveMQSource_0", "kafka.auth.mode": "KAFKA_API_KEY", "kafka.api.key": "", "kafka.api.secret": "", "kafka.topic" : "topic_0", "output.data.format" : "AVRO", "activemq.url" : "tcp://:61616", "activemq.username" : "", "activemq.password" : "", "jms.destination.name" : "", "tasks.max" : "1" } ``` Note the following property definitions: * `"name"`: Sets a name for your new connector. * `"connector.class"`: Identifies the connector plugin name. * `"kafka.auth.mode"`: Identifies the connector authentication mode you want to use. There are two options: `SERVICE_ACCOUNT` or `KAFKA_API_KEY` (the default). To use an API key and secret, specify the configuration properties `kafka.api.key` and `kafka.api.secret`, as shown in the example configuration (above). To use a [service account](service-account.md#s3-cloud-service-account), specify the **Resource ID** in the property `kafka.service.account.id=`. To list the available service account resource IDs, use the following command: ```bash confluent iam service-account list ``` For example: ```bash confluent iam service-account list Id | Resource ID | Name | Description +---------+-------------+-------------------+------------------- 123456 | sa-l1r23m | sa-1 | Service account 1 789101 | sa-l4d56p | sa-2 | Service account 2 ``` * `"kafka.topic"`: The Kafka topic name where you want data sent. 
* `"output.data.format"`: Options are AVRO, JSON, JSON_SR, and PROTOBUF. [Schema Registry](../get-started/schema-registry.md#cloud-sr-config) must be enabled to use a Schema Registry-based format (for example, Avro, JSON_SR (JSON Schema), or Protobuf). See [Schema Registry Enabled Environments](limits.md#connect-ccloud-environment-limits) for additional information. * `"activemq.url"`: The URL of the ActiveMQ broker. An ActiveMQ broker URL is similar to `tcp://:61616`. * `"jms.destination.name"`: The name of the JMS destination `queue` or `topic` name to read from. * `"tasks.max"`: Enter the number of [tasks](/platform/current/connect/concepts.html#tasks) in use by the connector. The connector supports multiple tasks. More tasks may improve performance. **Single Message Transforms**: See the [Single Message Transforms (SMT)](single-message-transforms.md#cc-single-message-transforms) documentation for details about adding SMTs using the CLI. See [Configuration Properties](#cc-activemq-source-config-properties) for all property values and definitions. #### Step 3: Create the connector configuration file Create a JSON file that contains the connector configuration properties. The following entry shows a typical connector configuration. When launched, the connector consumes data from streams `stream-1` and `stream-2` of log group `cloudwatch-group`. It produces the data to Kafka topic `logs.cloudwatch-group.stream-1` and topic `logs.cloudwatch-group.stream-2`. ```json { "name": "CloudWatchLogsSourceConnector_0", "config": { "connector.class": "CloudWatchLogsSource", "name": "CloudWatchLogsSourceConnector_0", "kafka.auth.mode": "KAFKA_API_KEY", "kafka.api.key": "", "kafka.api.secret": "", "kafka.topic.format": "logs.${log-group}.${log-stream}", "output.data.format": "STRING", "aws.access.key.id": "", "aws.secret.access.key": "", "aws.cloudwatch.logs.url": "https://logs.us-east-1.amazonaws.com", "aws.cloudwatch.log.group": "cloudwatch-group", "aws.cloudwatch.log.streams": "stream-1, stream-2", "aws.poll.interval.ms": "1500", "log.message.format": "STRING", "behavior.on.error": "FAIL", "tasks.max": "1" } } ``` Note the following property definitions: * `"connector.class"`: Identifies the connector plugin name. * `"name"`: Sets a name for your new connector. * `"kafka.auth.mode"`: Identifies the connector authentication mode you want to use. There are two options: `SERVICE_ACCOUNT` or `KAFKA_API_KEY` (the default). To use an API key and secret, specify the configuration properties `kafka.api.key` and `kafka.api.secret`, as shown in the example configuration (above). To use a [service account](service-account.md#s3-cloud-service-account), specify the **Resource ID** in the property `kafka.service.account.id=`. To list the available service account resource IDs, use the following command: ```bash confluent iam service-account list ``` For example: ```bash confluent iam service-account list Id | Resource ID | Name | Description +---------+-------------+-------------------+------------------- 123456 | sa-l1r23m | sa-1 | Service account 1 789101 | sa-l4d56p | sa-2 | Service account 2 ``` * `"kafka.topic.format"`: Topic format to use for generating the names of the Kafka topics. This format string can contain `${log-group}` and `${log-stream}` as a placeholder for the original log group and log stream names. For example, `confluent.${log-group}.${log-stream}` for the log group `log-group-1` and log stream `log-stream-1` maps to the topic name `confluent.log-group-1.log-stream-1`. 
* `"output.data.format"`: Enter an output data format (data going to the Kafka topic): AVRO, STRING, or JSON (schemaless). [Schema Registry](../get-started/schema-registry.md#cloud-sr-config) must be enabled to use a Schema Registry-based format (for example, Avro). See [Schema Registry Enabled Environments](limits.md#connect-ccloud-environment-limits) for additional information. * `"aws.access.key.id"` and `"aws.secret.access.key"`: Enter the AWS Access Key ID and Secret. For information about how to set these up, see [Access Keys](https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys). * `"aws.cloudwatch.logs.url"`: For example, `https://logs.us-east-1.amazonaws.com`. For additional information, see [Amazon CloudWatch Logs endpoints and quotas](https://docs.aws.amazon.com/general/latest/gr/cwl_region.html). * `"aws.cloudwatch.log.group"`: Name of the log group on Amazon CloudWatch where the log streams are contained. * `"aws.cloudwatch.log.streams"`: List of the log streams on Amazon CloudWatch where you want to track log records. If the property is not used, all log streams under the log group are tracked. * `"aws.poll.interval.ms"`: Time in milliseconds (ms) the connector waits between polling the endpoint for updates. The default value is `1000` ms (1 second). * `"log.message.format"`: Specifies the format for log messages received from CloudWatch Log Streams. Valid values for this configuration are `JSON` and `STRING`. The default value is `STRING` * `"behavior.on.error"`: Determines how errors are managed by the connector. It must be set to one of the following: `IGNORE` or `FAIL`. When set to `FAIL`, the connector halts upon encountering an error while processing records. When set to `IGNORE`, the connector continues processing subsequent sets of records despite encountering errors. If a record is malformed, it is directed to the error topic associated with the connector. The default value is `FAIL`. Note: This configuration does not affect the connector’s behavior when log.message.format is set to `STRING`. * `"tasks.max"`: Enter the number of [tasks](/platform/current/connect/concepts.html#tasks) to use with the connector. The connector supports running one or more tasks. The connector can start at one task to support all import data and can scale up to one task per log stream. One task per log stream can raise the performance, up to the greatest number of log streams that Amazon supports (100,000 logs per second or 10 MB per second). **Single Message Transforms**: See the [Single Message Transforms (SMT)](single-message-transforms.md#cc-single-message-transforms) documentation for details about adding SMTs using the CLI. See [Configuration Properties](#cc-amazon-cloudwatch-logs-source-config-properties) for all property values and descriptions. #### Step 3: Create the connector configuration file Create a JSON file that contains the connector configuration properties. The following example shows required and optional connector properties. 
```none { "name": "DynamoDbSinkConnector_0", "config": { "topics": "pageviews", "input.data.format": "AVRO", "connector.class": "DynamoDbSink", "name": "DynamoDbSinkConnector_0", "kafka.auth.mode": "KAFKA_API_KEY", "kafka.api.key": "", "kafka.api.secret": "", "aws.access.key.id": "********************", "aws.secret.access.key": "****************************************", "aws.dynamodb.pk.hash": "value.userid", "aws.dynamodb.pk.sort": "value.pageid", "table.name.format": "kafka-${topic}", "tasks.max": "1" } } ``` Note the following property definitions: * `"name"`: Sets a name for your new connector. * `"connector.class"`: Identifies the connector plugin name. * `"topics"`: Identifies the topic name or a comma-separated list of topic names. * `"kafka.auth.mode"`: Identifies the connector authentication mode you want to use. There are two options: `SERVICE_ACCOUNT` or `KAFKA_API_KEY` (the default). To use an API key and secret, specify the configuration properties `kafka.api.key` and `kafka.api.secret`, as shown in the example configuration (above). To use a [service account](service-account.md#s3-cloud-service-account), specify the **Resource ID** in the property `kafka.service.account.id=`. To list the available service account resource IDs, use the following command: ```bash confluent iam service-account list ``` For example: ```bash confluent iam service-account list Id | Resource ID | Name | Description +---------+-------------+-------------------+------------------- 123456 | sa-l1r23m | sa-1 | Service account 1 789101 | sa-l4d56p | sa-2 | Service account 2 ``` * `"input.data.format"`: Sets the input Kafka record value format (data coming from the Kafka topic). Valid entries are **AVRO**, **JSON_SR**, **PROTOBUF**, or **JSON**. You must have Confluent Cloud Schema Registry configured if using a schema-based message format (for example, Avro, JSON_SR (JSON Schema), or Protobuf). * `"aws.dynamodb.pk.hash"`: Defines how the DynamoDB table hash key is extracted from the records. By default, the Kafka partition number where the record is generated is used as the hash key. The hash key can be created from other record references. See [DynamoDB hash keys and sort keys](#cc-amazon-dynamodb-sink-hash-sort) for examples. Note that the maximum size of a partition using the default configuration is limited to 10 GB (defined by Amazon DynamoDB). * `"aws.dynamodb.pk.sort"`: Defines how the DynamoDB table sort key is extracted from the records. By default, the record offset is used as the sort key. If no sort key is required, use an empty string for this property `""`. The sort key can be created from other record references. See [DynamoDB hash keys and sort keys](#cc-amazon-dynamodb-sink-hash-sort) for examples. * `"table.name.format"`: The property is optional and defaults to the name of the Kafka topic. To create a table name format use the syntax `${topic}`. For example, `kafka_${topic}` for the topic `orders` maps to the table name `kafka_orders`. * `"tasks.max"`: Maximum number of tasks the connector can run. See Confluent Cloud [connector limitations](limits.md#cc-amazon-redshift-sink-limits) for additional task information. **Single Message Transforms**: See the [Single Message Transforms (SMT)](single-message-transforms.md#cc-single-message-transforms) documentation for details about adding SMTs using the CLI. See [Unsupported transformations](single-message-transforms.md#cc-single-message-transforms-unsupported-transforms) for a list of SMTs that are not supported with this connector. 
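
With the configuration saved to a file (for example, `dynamodb-sink-config.json`, an illustrative file name), you would typically create the connector from that file with the Confluent CLI; a minimal sketch, assuming a current CLI version that supports `confluent connect cluster create`:

```bash
# Create the connector from the configuration file (file name is illustrative)
confluent connect cluster create --config-file dynamodb-sink-config.json

# List connectors to confirm the new connector reaches the RUNNING state
confluent connect cluster list
```
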
See [Configuration Properties](#cc-amazon-dynamodb-sink-config-properties) for all property values and definitions. #### Step 3: Create the connector configuration file Create a JSON file that contains the connector configuration properties. The following entry shows the required configuration properties. ```json { "name": "SqsSource_0", "config": { "connector.class": "SqsSource", "name": "SqsSource_0", "kafka.auth.mode": "KAFKA_API_KEY", "kafka.api.key": "", "kafka.api.secret": "", "sqs.url": "https://sqs.us-east-2.amazonaws.com/123456789012/MyQueue", "kafka.topic": "stocks", "aws.access.key.id": "", "aws.secret.key.id": "", "output.data.format": "JSON", "tasks.max": "1" } } ``` Note the following property definitions: * `"connector.class"`: Identifies the connector plugin name. * `"name"`: Sets a name for your new connector. * `"kafka.auth.mode"`: Identifies the connector authentication mode you want to use. There are two options: `SERVICE_ACCOUNT` or `KAFKA_API_KEY` (the default). To use an API key and secret, specify the configuration properties `kafka.api.key` and `kafka.api.secret`, as shown in the example configuration (above). To use a [service account](service-account.md#s3-cloud-service-account), specify the **Resource ID** in the property `kafka.service.account.id=`. To list the available service account resource IDs, use the following command: ```bash confluent iam service-account list ``` For example: ```bash confluent iam service-account list Id | Resource ID | Name | Description +---------+-------------+-------------------+------------------- 123456 | sa-l1r23m | sa-1 | Service account 1 789101 | sa-l4d56p | sa-2 | Service account 2 ``` * `"sqs.url"`: For example, `https://sqs.us-east-2.amazonaws.com/123456789012/MyQueue`. For details, see [Amazon SQS queue and message identifiers](https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-queue-message-identifiers.html). * `"sqs.region"`: The AWS region that the SQS queue belongs to. If this property is not used, the connector attempts to infer the region from the SQS URL. * `"aws.access.key.id"` and `"aws.secret.key.id"`: Enter the AWS Access Key ID and Secret Key ID. For information about how to set these up, see [Access Keys](https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys). * `"output.data.format"`: Enter an output data format (data going to the Kafka topic): AVRO, JSON_SR (JSON Schema), PROTOBUF, or JSON (schemaless). [Schema Registry](../get-started/schema-registry.md#cloud-sr-config) must be enabled to use a Schema Registry-based format (for example, Avro, JSON_SR (JSON Schema), or Protobuf). See [Schema Registry Enabled Environments](limits.md#connect-ccloud-environment-limits) for additional information. * `"tasks.max"`: Enter the number of [tasks](/platform/current/connect/concepts.html#tasks) to use with the connector. More tasks may improve performance. **Single Message Transforms**: See the [Single Message Transforms (SMT)](single-message-transforms.md#cc-single-message-transforms) documentation for details about adding SMTs using the CLI. See [Configuration Properties](#cc-amazon-sqs-source-config-properties) for all property values and descriptions. ## Quick Start Use this quick start to get up and running with the Confluent Cloud AWS Lambda Sink connector. The quick start provides the basics of selecting the connector and configuring it to send records to AWS Lambda. 
Prerequisites : * Authorized access to a [Confluent Cloud](https://www.confluent.io/confluent-cloud/) cluster on AWS. Confluent Cloud is available through the [AWS Marketplace](https://aws.amazon.com/marketplace/pp/prodview-g5ujul6iovvcy?trk=14575e70-1766-4f20-8083-0c2757a1ec75&sc_channel=el) or [directly from Confluent](https://www.confluent.io/get-started/). * The Confluent CLI installed and configured for the cluster. See [Install the Confluent CLI](https://docs.confluent.io/confluent-cli/current/install.html). * [Schema Registry](../get-started/schema-registry.md#cloud-sr-config) must be enabled to use a Schema Registry-based format (for example, Avro, JSON_SR (JSON Schema), or Protobuf). See [Schema Registry Enabled Environments](limits.md#connect-ccloud-environment-limits) for additional information. #### NOTE If no schema is defined, values are encoded as plain strings. For example, `"name": "Kimberley Human"` is encoded as `name=Kimberley Human`. * For networking considerations, see [Networking and DNS](overview.md#connect-internet-access-resources). To use a set of public egress IP addresses, see [Public Egress IP Addresses for Confluent Cloud Connectors](static-egress-ip.md#cc-static-egress-ips). * Your AWS Lambda project should be in the same region as the Confluent Cloud cluster where you are running the connector. * An AWS account configured with [Access Keys](https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys). * You need to configure a Lambda IAM policy for the account to allow the following: * `lambda:InvokeFunction` and `lambda:GetFunction`. * Add a resource to allow invoking all aliases and versions of the function, including `$LATEST`. When you specify a function name without a version or alias suffix, all underlying versions, aliases, and `$LATEST` are implicitly included and accessible. The following shows a JSON example for setting this policy: ```json { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "lambda:InvokeFunction", "lambda:GetFunction" ], "Resource": [ "arn:aws:lambda:*:*:function:" ] } ] } ``` #### NOTE If you want to restrict the connector to a particular alias or version, update the permission policy with the alias or version appended at the end, as shown below: ```none arn:aws:lambda:*:*:function:functionName:alias OR arn:aws:lambda:*:*:function:functionName:1 ``` - Kafka cluster credentials. The following lists the different ways you can provide credentials. - Enter an existing [service account](service-account.md#s3-cloud-service-account) resource ID. - Create a Confluent Cloud [service account](service-account.md#s3-cloud-service-account) for the connector. Make sure to review the ACL entries required in the [service account documentation](service-account.md#s3-cloud-service-account). Some connectors have specific ACL requirements. - Create a Confluent Cloud API key and secret. To create a key and secret, you can use [confluent api-key create](https://docs.confluent.io/confluent-cli/current/command-reference/api-key/confluent_api-key_create.html) *or* you can autogenerate the API key and secret directly in the Cloud Console when setting up the connector. #### NOTE The following steps show basic ACL entries for sink connector service accounts. 
Be sure to review the [Sink connector SUCCESS and ERROR topics](#cloud-service-account-sink-additional-acls) and [Sink connector offset management](#cloud-service-account-sink-offset-management-acls) sections for additional ACL entries that may be required for certain connectors or tasks. 1. Create a service account named `myserviceaccount`: ```none confluent iam service-account create myserviceaccount --description "test service account" ``` 2. Find the service account ID for `myserviceaccount`: ```none confluent iam service-account list ``` 3. Set a DESCRIBE ACL to the cluster. ```none confluent kafka acl create --allow --service-account "" --operations describe --cluster-scope ``` 4. Set a READ ACL to `pageviews`: ```none confluent kafka acl create --allow --service-account "" --operations read --topic pageviews ``` 5. Set a CREATE ACL to the following topic prefix: ```none confluent kafka acl create --allow --service-account "" --operations create --prefix --topic "dlq-lcc-" ``` 6. Set a WRITE ACL to the following topic prefix: ```none confluent kafka acl create --allow --service-account "" --operations write --prefix --topic "dlq-lcc-" ``` 7. Set a READ ACL to a consumer group with the following prefix: ```none confluent kafka acl create --allow --service-account "" --operations read --prefix --consumer-group "connect-lcc-" ``` 8. Create a Kafka API key and secret for ``: ```none confluent api-key create --resource "lkc-abcd123" --service-account "" ``` 9. Save the API key and secret. The connector configuration must include either an API key and secret or a service account ID. For additional service account information, see [Service Accounts on Confluent Cloud](../security/authenticate/workload-identities/service-accounts/overview.md#service-accounts). ## Create a Confluent Cloud configuration file 1. Create a customized Confluent Cloud configuration file with key=value pairs of connection details for the Confluent Cloud cluster using the format shown in this example, and save as `/tmp/myconfig.properties`. Note: you cannot use the `~/.ccloud/config.json` generated by Confluent Cloud CLI for other Confluent Platform components or clients, which is why you need to manually create your own key=value properties file. ```bash bootstrap.servers= ssl.endpoint.identification.algorithm=https security.protocol=SASL_SSL sasl.mechanism=PLAIN sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username='' password=''; ``` 2. Substitute ``, ``, and `` in the file above, to point to your Confluent Cloud cluster using the desired service account’s cluster API key and secret. 3. If you are using Confluent Cloud Schema Registry, add the following configuration parameters to the same file above. Substitute ``, ``, and `` to point to your Confluent Cloud Schema Registry using the desired service account’s Schema Registry API key and secret (which are different from the cluster API key and secret used earlier). ```bash basic.auth.credentials.source=USER_INFO schema.registry.basic.auth.user.info=: schema.registry.url=https:// ``` 4. If you are using Confluent Cloud ksqlDB, add the following configuration parameters to same file above. Substitute ``, ``, and `` to point to your Confluent Cloud ksqlDB using the desired service account’s ksqlDB API key and secret (which are different from the cluster API key and secret used earlier). ```bash ksql.endpoint= ksql.basic.auth.user.info=: ``` 5. 
Review the `/tmp/myconfig.properties` file, which may resemble the following (with required substitutions): ```bash bootstrap.servers= ssl.endpoint.identification.algorithm=https security.protocol=SASL_SSL sasl.mechanism=PLAIN sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username='' password=''; basic.auth.credentials.source=USER_INFO schema.registry.basic.auth.user.info=: schema.registry.url=https:// ksql.endpoint= ksql.basic.auth.user.info=: ``` ### Deploy resources In this section, you add resources to your Terraform configuration file and provision them when the GitHub Action runs. 1. In your repository, create a new file named “variables.tf” with the following code. ```terraform variable "confluent_cloud_api_key" { description = "Confluent Cloud API Key" type = string } variable "confluent_cloud_api_secret" { description = "Confluent Cloud API Secret" type = string sensitive = true } ``` 2. In the “main.tf” file, add the following code. This code references the Cloud API key and secret you added in the previous steps and creates a new environment and Kafka cluster for your organization. Optionally, you can choose to use an existing environment. ```terraform locals { cloud = "AWS" region = "us-east-2" } provider "confluent" { cloud_api_key = var.confluent_cloud_api_key cloud_api_secret = var.confluent_cloud_api_secret } # Create a new environment. resource "confluent_environment" "my_env" { display_name = "my_env" stream_governance { package = "ESSENTIALS" } } # Create a new Kafka cluster. resource "confluent_kafka_cluster" "my_kafka_cluster" { display_name = "my_kafka_cluster" availability = "SINGLE_ZONE" cloud = local.cloud region = local.region basic {} environment { id = confluent_environment.my_env.id } depends_on = [ confluent_environment.my_env ] } # Access the Stream Governance Essentials package attached to the environment. data "confluent_schema_registry_cluster" "my_sr_cluster" { environment { id = confluent_environment.my_env.id } } ``` 3. Create a Service Account and provide a role binding by adding the following code to “main.tf”. The role binding gives the Service Account the necessary permissions to create topics, Flink statements, and other resources. In production, you may want to assign a less privileged role than OrganizationAdmin. ```terraform # Create a new Service Account. This will be used during Kafka API key creation and Flink SQL statement submission. resource "confluent_service_account" "my_service_account" { display_name = "my_service_account" } data "confluent_organization" "my_org" {} # Assign the OrganizationAdmin role binding to the above Service Account. # This will give the Service Account the necessary permissions to create topics, Flink statements, etc. # In production, you may want to assign a less privileged role. resource "confluent_role_binding" "my_org_admin_role_binding" { principal = "User:${confluent_service_account.my_service_account.id}" role_name = "OrganizationAdmin" crn_pattern = data.confluent_organization.my_org.resource_name depends_on = [ confluent_service_account.my_service_account ] } ``` 4. Push all changes to your repository and check the **Actions** page to ensure the workflow runs successfully. At this point, you should have a new environment, an Apache Kafka® cluster, and a Stream Governance package provisioned in your Confluent Cloud organization. 
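Before pushing, you can optionally validate the same Terraform configuration locally. This is a sketch under the assumption that the Terraform CLI is installed on your machine; the GitHub Action remains responsible for provisioning. Terraform reads the two variables from environment variables with the `TF_VAR_` prefix, matching the names declared in “variables.tf”. 
```bash
# Optional local check of the Terraform configuration before pushing.
# TF_VAR_<name> maps to variable "<name>" in variables.tf.
export TF_VAR_confluent_cloud_api_key="<cloud-api-key>"
export TF_VAR_confluent_cloud_api_secret="<cloud-api-secret>"

terraform init       # downloads the Confluent Terraform provider
terraform validate   # checks syntax and references in main.tf and variables.tf
terraform plan       # previews the environment, cluster, and role binding changes
```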
## Confluent Platform properties files The following list includes the default Confluent Platform services configuration properties files, where `$CONFLUENT_HOME` is the directory where you installed Confluent Platform. You reference or modify the appropriate file when you work with a Confluent Platform service. - Connect: `$CONFLUENT_HOME/etc/schema-registry/connect-avro-distributed.properties` - Control Center: `$C3_HOME/etc/confluent-control-center/control-center-dev.properties` [1](#f1) - KRaft Controller: `$CONFLUENT_HOME/etc/kafka/controller.properties` - Kafka (KRaft mode): `$CONFLUENT_HOME/etc/kafka/broker.properties` - Kafka (ZooKeeper mode, Legacy): `$CONFLUENT_HOME/etc/kafka/server.properties` - REST Proxy: `$CONFLUENT_HOME/etc/kafka-rest/kafka-rest.properties` - ksqlDB: `$CONFLUENT_HOME/etc/ksqldb/ksql-server.properties` - Schema Registry: `$CONFLUENT_HOME/etc/schema-registry/schema-registry.properties` - ZooKeeper: `$CONFLUENT_HOME/etc/kafka/zookeeper.properties` * **[1]** Starting with Confluent Platform 8.0, Control Center is provided in an independent release, as described in [Control Center single-node manual installation](/control-center/current/installation/overview.html#single-node-manual-installation), and the Control Center examples in this tutorial: [Getting Started with a multi-broker cluster](/platform/current/get-started/tutorial-multi-broker.html#optional-install-and-configure-c3). In previous versions of Confluent Platform, this path was `$CONFLUENT_HOME/etc/confluent-control-center/control-center-dev.properties`. ### confluent.controlcenter.kafka..cprest.url Defines the REST endpoints for any additional Kafka clusters being monitored by Control Center to enable HTTP servers on the broker(s). Replace `` with the name that identifies this cluster. This name should be consistent with the Kafka cluster name used for other Control Center configurations. A comma-separated list with multiple values can be provided for a multi-broker cluster. Note that if the REST API endpoints are secured with TLS, you must include additional properties in the Confluent Control Center properties file that provide the security information. For more information, see [Configure TLS for Control Center as a server](../security/ssl.md#controlcenter-ui-https) and [TLS settings for web access](#https-settings). The following example shows REST endpoint settings for three clusters or data centers (dc1, dc2, and dc3): ```bash confluent.controlcenter.streams.cprest.url=https://dc1:8090 confluent.controlcenter.kafka.dc2.cprest.url=https://dc2:8090 confluent.controlcenter.kafka.dc3.cprest.url=https://dc3:8090 ``` * Type: list * Default: “” * Importance: high For an example of configuring the Control Center `cprest.url` specifically for multiple clusters, see [Enabling Multi-Cluster Schema Registry](/platform/current/control-center/topics/schema.html#multi-cluster-sr). ## Multi-node manual installation Use these steps for multi-node manual installation of Control Center and Confluent Platform. 1. Provision a new node using any of the Confluent Platform supported operating systems. For more information, see [Supported operating systems](/platform/current/installation/versions-interoperability.html#operating-systems). Login to the VM on which you will install Confluent Platform. Install Control Center on a new node/VM. To ensure a smooth transition, allow Control Center (Legacy) users to continue using Control Center (Legacy) until the Control Center has gathered 7-15 days of historical metrics. 
For more information, see [Migration](#install-c3-migration). 2. Log in to the VM and install Control Center. For more information, see [Compatibility with Confluent Platform](system-requirements.md#install-c3-supported-cp). Use the instructions for installing Confluent Platform but make sure to use the base URL and properties from these instructions to install Control Center. For more information, see [Confluent Platform System Requirements](/platform/current/installation/system-requirements.html#system-requirements), [Install Confluent Platform using Systemd on Ubuntu and Debian](/platform/current/installation/installing_cp/deb-ubuntu.html#systemd-ubuntu-debian-install), and [Install Confluent Platform using Systemd on RHEL, CentOS, and Fedora-based Linux](/platform/current/installation/installing_cp/rhel-centos.html#systemd-rhel-centos-install). Ubuntu and Debian ```bash export BASE_URL=https://packages.confluent.io/confluent-control-center-next-gen/deb/ sudo apt-get update wget ${BASE_URL}archive.key sudo apt-key add archive.key sudo add-apt-repository -y "deb ${BASE_URL} stable main" sudo apt update ``` ```bash sudo apt install -y confluent-control-center-next-gen ``` RHEL, CentOS, and Fedora-based Linux ```bash export base_url=https://packages.confluent.io/confluent-control-center-next-gen/rpm/ cat <<EOF | sudo tee /etc/yum.repos.d/confluent.repo > /dev/null [Confluent] name=Confluent repository baseurl=${base_url} gpgcheck=1 gpgkey=${base_url}archive.key enabled=1 EOF ``` ```bash sudo yum install -y confluent-control-center-next-gen cyrus-sasl openssl-devel ``` 3. Install Java for your operating system (if not installed). ```bash sudo yum install java-17-openjdk -y # RHEL/CentOS/Fedora ``` ```bash sudo apt install openjdk-17-jdk -y # Ubuntu/Debian ``` 4. Copy `/etc/confluent-control-center/control-center-production.properties` from your current Control Center (Legacy) into the Control Center node on the VM and add these properties: ```bash confluent.controlcenter.id=10 confluent.controlcenter.prometheus.enable=true confluent.controlcenter.prometheus.url=http://localhost:9090 confluent.controlcenter.prometheus.rules.file=/etc/confluent-control-center/trigger_rules-generated.yml confluent.controlcenter.alertmanager.config.file=/etc/confluent-control-center/alertmanager-generated.yml ``` 5. If you are using SSL, copy the certs at `/var/ssl/private` from your current Control Center (Legacy) into the Control Center node on the VM. If you are not using SSL, skip this step. 6. Change ownership of the configuration files. Give the Control Center process write permissions to the alert manager, so that the process can properly manage alert triggers. Use the `chown` command to set the Control Center process as the owner of the `trigger_rules-generated.yml` and `alertmanager-generated.yml` files. ```bash chown -c cp-control-center /etc/confluent-control-center/trigger_rules-generated.yml chown -c cp-control-center /etc/confluent-control-center/alertmanager-generated.yml ``` 7. Start the following services on the Control Center node: ```bash systemctl enable prometheus systemctl start prometheus systemctl enable alertmanager systemctl start alertmanager systemctl enable confluent-control-center systemctl start confluent-control-center ``` 8. Log in to each broker you intend to monitor and verify brokers can reach the Control Center node on port 9090. ```bash curl http://:9090/-/healthy ``` All brokers must have access to the Control Center node on port 9090, but port 9090 does not require public access. Restrict access as you prefer. 9. 
Update the following properties for every Kafka broker and KRaft controller. Pay attention to the notes on the highlighted lines that follow the code example. KRaft controller properties are located here: `/etc/controller/server.properties` ```bash metric.reporters=io.confluent.telemetry.reporter.TelemetryReporter,io.confluent.metrics.reporter.ConfluentMetricsReporter --- [1] confluent.telemetry.exporter._c3.type=http confluent.telemetry.exporter._c3.enabled=true confluent.telemetry.exporter._c3.metrics.include=io.confluent.kafka.server.request.(?!.*delta).*|io.confluent.kafka.server.server.broker.state|io.confluent.kafka.server.replica.manager.leader.count|io.confluent.kafka.server.request.queue.size|io.confluent.kafka.server.broker.topic.failed.produce.requests.rate.1.min|io.confluent.kafka.server.tier.archiver.total.lag|io.confluent.kafka.server.request.total.time.ms.p99|io.confluent.kafka.server.broker.topic.failed.fetch.requests.rate.1.min|io.confluent.kafka.server.broker.topic.total.fetch.requests.rate.1.min|io.confluent.kafka.server.partition.caught.up.replicas.count|io.confluent.kafka.server.partition.observer.replicas.count|io.confluent.kafka.server.tier.tasks.num.partitions.in.error|io.confluent.kafka.server.broker.topic.bytes.out.rate.1.min|io.confluent.kafka.server.request.total.time.ms.p95|io.confluent.kafka.server.controller.active.controller.count|io.confluent.kafka.server.session.expire.listener.zookeeper.disconnects.total|io.confluent.kafka.server.request.total.time.ms.p999|io.confluent.kafka.server.controller.active.broker.count|io.confluent.kafka.server.request.handler.pool.request.handler.avg.idle.percent.rate.1.min|io.confluent.kafka.server.session.expire.listener.zookeeper.disconnects.rate.1.min|io.confluent.kafka.server.controller.unclean.leader.elections.rate.1.min|io.confluent.kafka.server.replica.manager.partition.count|io.confluent.kafka.server.controller.unclean.leader.elections.total|io.confluent.kafka.server.partition.replicas.count|io.confluent.kafka.server.broker.topic.total.produce.requests.rate.1.min|io.confluent.kafka.server.controller.offline.partitions.count|io.confluent.kafka.server.socket.server.network.processor.avg.idle.percent|io.confluent.kafka.server.partition.under.replicated|io.confluent.kafka.server.log.log.start.offset|io.confluent.kafka.server.log.tier.size|io.confluent.kafka.server.log.size|io.confluent.kafka.server.tier.fetcher.bytes.fetched.total|io.confluent.kafka.server.request.total.time.ms.p50|io.confluent.kafka.server.tenant.consumer.lag.offsets|io.confluent.kafka.server.session.expire.listener.zookeeper.expires.rate.1.min|io.confluent.kafka.server.log.log.end.offset|io.confluent.kafka.server.broker.topic.bytes.in.rate.1.min|io.confluent.kafka.server.partition.under.min.isr|io.confluent.kafka.server.partition.in.sync.replicas.count|io.confluent.telemetry.http.exporter.batches.dropped|io.confluent.telemetry.http.exporter.items.total|io.confluent.telemetry.http.exporter.items.succeeded|io.confluent.telemetry.http.exporter.send.time.total.millis|io.confluent.kafka.server.controller.leader.election.rate.(?!.*delta).*|io.confluent.telemetry.http.exporter.batches.failed confluent.telemetry.exporter._c3.client.base.url=http://c3-internal-dns-hostname:9090/api/v1/otlp --- [2] confluent.telemetry.exporter._c3.client.compression=gzip confluent.telemetry.exporter._c3.api.key=dummy confluent.telemetry.exporter._c3.api.secret=dummy confluent.telemetry.exporter._c3.buffer.pending.batches.max=80 --- [3] 
confluent.telemetry.exporter._c3.buffer.batch.items.max=4000 --- [4] confluent.telemetry.exporter._c3.buffer.inflight.submissions.max=10 --- [5] confluent.telemetry.metrics.collector.interval.ms=60000 --- [6] confluent.telemetry.remoteconfig._confluent.enabled=false confluent.consumer.lag.emitter.enabled=true ``` - [1] To enable metrics for both Control Center (Legacy) and Control Center, update your existing Control Center (Legacy) property `metric.reporters` to use the following values: ```bash metric.reporters=io.confluent.telemetry.reporter.TelemetryReporter,io.confluent.metrics.reporter.ConfluentMetricsReporter ``` If you decommission Control Center (Legacy), enable only the TelemetryReporter plugin with the following value: ```bash metric.reporters=io.confluent.telemetry.reporter.TelemetryReporter ``` - [2] Ensure the URL in `confluent.telemetry.exporter._c3.client.base.url` is the actual Control Center URL, reachable from the broker host. ```bash confluent.telemetry.exporter._c3.client.base.url=http://c3-internal-dns-hostname:9090/api/v1/otlp ``` - [3] [4] [5] [6] Use the following configurations for clusters with 100,000 or fewer replicas. To get an accurate count of replicas, use the sum of all replicas across all clusters monitored in Control Center (Legacy) (including the Control Center (Legacy) bootstrap cluster). ```bash confluent.telemetry.exporter._c3.buffer.pending.batches.max=80 confluent.telemetry.exporter._c3.buffer.batch.items.max=4000 confluent.telemetry.exporter._c3.buffer.inflight.submissions.max=10 confluent.telemetry.metrics.collector.interval.ms=60000 ```
Configurations for clusters with 100,000 to 400,000 replicas Clusters with a replica count of 100,000 - 200,000: ```bash confluent.telemetry.exporter._c3.buffer.pending.batches.max=80 confluent.telemetry.exporter._c3.buffer.batch.items.max=4000 confluent.telemetry.exporter._c3.buffer.inflight.submissions.max=20 confluent.telemetry.metrics.collector.interval.ms=60000 ``` Clusters with a replica count of 200,000 - 400,000: ```bash confluent.telemetry.exporter._c3.buffer.pending.batches.max=80 confluent.telemetry.exporter._c3.buffer.batch.items.max=4000 confluent.telemetry.exporter._c3.buffer.inflight.submissions.max=20 confluent.telemetry.metrics.collector.interval.ms=120000 ``` For clusters with a replica count of 200,000 - 400,000, also update the following Control Center (Legacy) configuration: ```bash confluent.controlcenter.prometheus.trigger.threshold.time=2m ```
10. Perform a rolling restart for the brokers (zero downtime). For more information, see [Rolling restart](/platform/current/kafka/post-deployment.html#rolling-restart). ```bash systemctl restart confluent-server ``` 11. (Optional) Set up log rotation for Prometheus and Alertmanager. ### Prometheus 1. Create a new configuration file at `/etc/logrotate.d/prometheus` with the following content: ```bash /var/log/confluent/control-center/prometheus.log { size 10MB rotate 5 compress delaycompress missingok notifempty copytruncate } ``` 2. Create a script at `/usr/local/bin/logrotate-prometheus.sh`: ```bash #!/bin/bash /usr/sbin/logrotate -s /var/lib/logrotate/status-prometheus /etc/logrotate.d/prometheus ``` 3. Make the script executable: ```bash chmod +x /usr/local/bin/logrotate-prometheus.sh ``` 4. To schedule with Cron, add the following line to your crontab (crontab -e): ```bash */10 * * * * /usr/local/bin/logrotate-prometheus.sh >> /tmp/prometheus-rotate.log 2>&1 ``` 5. Restart Prometheus: ```bash systemctl restart prometheus ``` 6. Perform similar steps for Alertmanager logs. ### Alertmanager 1. Create a new configuration file at `/etc/logrotate.d/alertmanager` with the following content: ```bash /var/log/confluent/control-center/alertmanager.log { size 10MB rotate 5 compress delaycompress missingok notifempty copytruncate } ``` 2. Create a script at `/usr/local/bin/logrotate-alertmanager.sh`: ```bash #!/bin/bash /usr/sbin/logrotate -s /var/lib/logrotate/status-alertmanager /etc/logrotate.d/alertmanager ``` 3. Make the script executable: ```bash chmod +x /usr/local/bin/logrotate-alertmanager.sh ``` 4. To schedule with Cron, add the following line to your crontab (crontab -e): ```bash */10 * * * * /usr/local/bin/logrotate-alertmanager.sh >> /tmp/alertmanager-rotate.log 2>&1 ``` 5. Restart Alertmanager: ```bash systemctl restart alertmanager ``` # Configure RBAC for Control Center on Confluent Platform Control Center supports [Use Role-Based Access Control (RBAC) for Authorization in Confluent Platform](/platform/current/security/authorization/rbac/overview.html#rbac-overview). In Confluent Platform version 5.3 and later, RBAC provides a fine-grained security model across the platform in a development environment. Prior versions of Control Center only provided coarse-grained access control of either read-only or full access. If RBAC is not enabled, or Control Center is running against Confluent Platform versions prior to 5.3, Control Center functions as it has before without restricted access (unless access control feature flags have been turned off in the `control-center-properties` files). When RBAC is not enabled, Access Control settings (referred to as feature flags) in Control Center configuration options can remove access for the features that have those flags, such as ksqlDB, License Manager, Schema Registry, topic inspections, broker configurations, and more. For more information on those available settings, see [Access control settings](../installation/configuration.md#controlcenter-access-control-settings). If RBAC is enabled, [Features](../installation/configuration.md#controlcenter-access-control-settings) are superseded by RBAC role permissions. RBAC works in conjunction with [ACLs](/platform/current/security/authorization/acls/overview.html#kafka-authorization) and [LDAP](c3-auth-ldap.md#controlcenter-security-ldap) security. 
In general, RBAC in Control Center enforces access for only a few resources that it manages; typically those for which it keeps internal state (license management, broker metrics, and alerts). See [Control Center resource access by role](#c3-feature-access-by-role) for more details. The remainder of RBAC-enforced operations on resources managed by Control Center are delegated downstream to Apache Kafka®, Schema Registry, Connect, and ksqlDB. ### Recommended Confluent Platform RBAC reading Review the following documentation to gain a thorough understanding of the RBAC feature in Confluent Platform: - [Use Role-Based Access Control (RBAC) for Authorization in Confluent Platform](/platform/current/security/authorization/rbac/overview.html#rbac-overview) - [Configure Metadata Service (MDS) in Confluent Platform](/platform/current/kafka/configure-mds/index.html#rbac-mds-config) - [Use Predefined RBAC Roles in Confluent Platform](/platform/current/security/authorization/rbac/rbac-predefined-roles.html#rbac-predefined-roles) - [RBAC role use cases](/platform/current/security/authorization/rbac/rbac-predefined-roles.html#rbac-roles-use-cases) - [Role-Based Access Control for Confluent Platform Quick Start](/platform/current/security/authorization/rbac/rbac-cli-quickstart.html#rbac-cli-quickstart) - [Configure Role-Based Access Control for Schema Registry in Confluent Platform](/platform/current/schema-registry/security/rbac-schema-registry.html#schemaregistry-rbac) - [Deploy Secure ksqlDB with RBAC in Confluent Platform](/platform/current/security/authorization/rbac/ksql-rbac.html#ksql-rbac) - [Configure RBAC for a Connect Cluster](/platform/current/connect/rbac/connect-rbac-connect-cluster.html#connect-rbac-connect-cluster) ### Initialization The Producer is configured using a dictionary in the examples below. If you are running Kafka locally, you can initialize the Producer as shown below. ```python from confluent_kafka import Producer import socket conf = {'bootstrap.servers': 'host1:9092,host2:9092', 'client.id': socket.gethostname()} producer = Producer(conf) ``` If you are connecting to a Kafka cluster in Confluent Cloud, you need to provide credentials for access. The example below shows using a cluster API key and secret. ```python from confluent_kafka import Producer import socket conf = {'bootstrap.servers': 'pkc-abcd85.us-west-2.aws.confluent.cloud:9092', 'security.protocol': 'SASL_SSL', 'sasl.mechanism': 'PLAIN', 'sasl.username': '', 'sasl.password': '', 'client.id': socket.gethostname()} producer = Producer(conf) ``` * For information on the available configuration properties, refer to the [API Documentation](/platform/current/clients/confluent-kafka-python/html/index.html). * For a step-by-step tutorial using the Python client, including code samples for the producer and consumer, see [this guide](https://developer.confluent.io/get-started/python/). ### REST-based example 1. Use this setting with [distributed workers](/platform/current/connect/concepts.html#distributed-workers). 
Write the following JSON to config.json, configure all of the required values, and use the following command to post the configuration to one of the distributed connect workers. For more information, see the [Kafka Connect REST Interface](/platform/current/connect/references/restapi.html). ```json { "name" : "AzureBlobStorageSourceConnector", "config" : { "connector.class" : "io.confluent.connect.azure.blob.storage.AzureBlobStorageSourceConnector", "tasks.max" : "1", "azblob.account.name" : "your-account", "azblob.account.key" : "your-key", "azblob.container.name" : "confluent-kafka-connect-azBlobStorage-testing", "format.class" : "io.confluent.connect.azure.blob.storage.format.avro.AvroFormat", "confluent.topic.bootstrap.servers" : "localhost:9092", "confluent.topic.replication.factor" : "1", "transforms" : "AddPrefix", "transforms.AddPrefix.type" : "org.apache.kafka.connect.transforms.RegexRouter", "transforms.AddPrefix.regex" : ".*", "transforms.AddPrefix.replacement" : "copy_of_$0" } } ``` #### NOTE Change the `confluent.topic.bootstrap.servers` property to include your broker address(es), and change the `confluent.topic.replication.factor` to 3 for staging or production use. 2. Use curl to post a configuration to one of the Kafka Connect Workers. Change `http://localhost:8083/` to the endpoint of one of your Kafka Connect worker(s). ```bash curl -s -X POST -H 'Content-Type: application/json' --data @config.json http://localhost:8083/connectors ``` 3. Use the following command to update the configuration of an existing connector. ```bash curl -s -X PUT -H 'Content-Type: application/json' --data @config.json http://localhost:8083/connectors/AzureBlobStorageSourceConnector/config ``` 4. To consume records written by the connector to the configured Kafka topic, run the following command: ```bash kafka-avro-console-consumer --bootstrap-server localhost:9092 --property schema.registry.url=http://localhost:8081 --topic copy_of_blob_topic --from-beginning ``` ## Quick Start This quick start uses the Azure Cognitive Search Sink connector to consume records and write them as documents to an Azure Cognitive Search service. Prerequisites : - [Confluent Platform](/platform/current/installation/index.html) - [Confluent CLI](https://docs.confluent.io/confluent-cli/current/installing.html) (requires separate installation) 1. Before starting the connector, create and deploy an Azure Cognitive Search service. * Navigate to the Microsoft [Azure Portal](https://portal.azure.com/). * Create a Search service following this [Azure Cognitive Search quick start guide](https://docs.microsoft.com/en-us/azure/search/search-create-service-portal). * Create an index in the service following this [index quick start guide](https://docs.microsoft.com/en-us/azure/search/search-get-started-portal). * Copy the admin key and the Search service name from the portal and save them for later. Azure Cognitive Search should now be set up for the connector. #### NOTE Ensure the index has the default name `hotels-sample-index` and only has the fields `HotelId`, `HotelName`, and `Description`. All other fields should be deleted. 2. Install the connector through the [Confluent Hub Client](/kafka-connectors/self-managed/confluent-hub/client.html). ```bash # run from your CP installation directory confluent connect plugin install confluentinc/kafka-connect-azure-search:latest ``` 3. Start Confluent Platform using the [Confluent CLI](https://docs.confluent.io/confluent-cli/current/index.html) commands. 
```bash confluent local start ``` 4. Produce test data to the `hotels-sample` topic in Kafka. Start the Avro console producer to import a few records to Kafka: ```bash ${CONFLUENT_HOME}/bin/kafka-avro-console-producer --broker-list localhost:9092 --topic hotels-sample \ --property value.schema='{"type":"record","name":"myrecord","fields":[{"name":"HotelName","type":"string"},{"name":"Description","type":"string"}]}' \ --property key.schema='{"type":"string"}' \ --property "parse.key=true" \ --property "key.separator=," ``` Then in the console producer, enter: ```bash "marriotId",{"HotelName": "Marriot", "Description": "Marriot description"} "holidayinnId",{"HotelName": "HolidayInn", "Description": "HolidayInn description"} "motel8Id",{"HotelName": "Motel8", "Description": "motel8 description"} ``` The three records entered are published to the Kafka topic `hotels-sample` in Avro format. 5. Create an `azure-search.json` file with the following contents: ```json { "name": "azure-search", "config": { "topics": "hotels-sample", "tasks.max": "1", "connector.class": "io.confluent.connect.azure.search.AzureSearchSinkConnector", "key.converter": "io.confluent.connect.avro.AvroConverter", "key.converter.schema.registry.url": "http://localhost:8081", "value.converter": "io.confluent.connect.avro.AvroConverter", "value.converter.schema.registry.url": "http://localhost:8081", "confluent.topic.bootstrap.servers": "localhost:9092", "confluent.topic.replication.factor": "1", "azure.search.service.name": "", "azure.search.api.key": "", "index.name": "${topic}-index", "reporter.bootstrap.servers": "localhost:9092", "reporter.error.topic.name": "test-error", "reporter.error.topic.replication.factor": 1, "reporter.error.topic.key.format": "string", "reporter.error.topic.value.format": "string", "reporter.result.topic.name": "test-result", "reporter.result.topic.key.format": "string", "reporter.result.topic.value.format": "string", "reporter.result.topic.replication.factor": 1 } } ``` #### NOTE For details about using this connector with Kafka Connect Reporter, see [Connect Reporter](/kafka-connectors/self-managed/userguide.html#userguide-connect-reporter). 6. Load the Azure Cognitive Search Sink connector. ```bash confluent local load azure-search --config path/to/azure-search.json ``` #### IMPORTANT Don’t use the [Confluent CLI](https://docs.confluent.io/confluent-cli/current/index.html) commands in production environments. 7. Confirm that the connector is in a `RUNNING` state. ```bash confluent local status azure-search ``` 8. Confirm that the messages were delivered to the result topic in Kafka: ```bash confluent local consume test-result --from-beginning ``` 9. Confirm that the messages were delivered to Azure Cognitive Search. 10. Log in to the service and check that the index `hotels-sample-index` contains the three written records from before. 11. Clean up resources: 1. Delete the connector: ```bash confluent local unload azure-search ``` 2. Stop Confluent Platform: ```bash confluent local stop ``` 3. Delete the created Azure Cognitive Search service and its resource group in the Azure portal. ### REST-based example Use this setting with [distributed workers](/platform/current/connect/concepts.html#distributed-workers). Write the following JSON to `pubsub-source-source.json`, configure all of the required values, and use the following command to post the configuration to one of the distributed connect workers. 
Check here for more information about the Kafka Connect [REST API](/platform/current/connect/references/restapi.html) ```json { "name" : "pubsub-source", "config" : { "connector.class" : "io.confluent.connect.gcp.pubsub.PubSubSourceConnector", "tasks.max" : "1", "kafka.topic" : "pubsub-topic", "gcp.pubsub.project.id" : "project-1", "gcp.pubsub.topic.id" : "topic-1", "gcp.pubsub.subscription.id" : "subscription-1", "gcp.pubsub.credentials.path" : "/home/some_directory/credentials.json", "confluent.topic.bootstrap.servers" : "localhost:9092", "confluent.topic.replication.factor" : "1" } } ``` Use `curl` to post the configuration to one of the Kafka Connect Workers. Change `http://localhost:8083/` to the endpoint of one of your Kafka Connect worker(s). ```bash curl -s -X POST -H 'Content-Type: application/json' --data @pubsub-source.json http://localhost:8083/connectors ``` Use the following command to update the configuration of existing connector. ```bash curl -s -X PUT -H 'Content-Type: application/json' --data @pubsub-source.json http://localhost:8083/connectors/pubsub-source/config ``` To consume records written by connector to the configured Kafka topic, run the following command: ```bash kafka-avro-console-consumer --bootstrap-server localhost:9092 --property schema.registry.url=http://localhost:8081 --topic pubsub-topic --from-beginning ``` ### REST-based example 1. Use this setting with [distributed workers](/platform/current/connect/concepts.html#distributed-workers). Write the following JSON, which can be used to read all the data list directly under a GCS bucket, to `config.json`, configure all of the required values, and use the following command to post the configuration to one of the distributed connect workers. Check here for more information about the Kafka Connect [REST API](/platform/current/connect/references/restapi.html). ```json { "name": "gcs-source-generalized", "config": { "connector.class": "io.confluent.connect.gcs.GcsSourceConnector", "tasks.max": "1", "value.converter": "org.apache.kafka.connect.json.JsonConverter", "mode": "GENERIC", "topics.dir": " ", "topic.regex.list": "mytopic:.", "format.class": "io.confluent.connect.gcs.format.json.JsonFormat", "gcs.bucket.name": "", "gcs.credentials.path": "", "value.converter.schemas.enable": "false", "confluent.topic.bootstrap.servers" : "localhost:9092", "confluent.topic.replication.factor" : "1", "confluent.license" : " Omit to enable trial mode ", "transforms" : "AddPrefix", "transforms.AddPrefix.type" : "org.apache.kafka.connect.transforms.RegexRouter", "transforms.AddPrefix.regex" : ".", "transforms.AddPrefix.replacement" : "copy_of_$0" } } ``` #### NOTE Change the `confluent.topic.bootstrap.servers` property to include your broker address(es), and change the `confluent.topic.replication.factor` to 3 for staging or production use. 2. Use curl to post a configuration to one of the Kafka Connect Workers. Change `http://localhost:8083/` to the endpoint of one of your Kafka Connect worker(s). ```bash curl -s -X POST -H 'Content-Type: application/json' --data @config.json http://localhost:8083/connectors ``` 3. Use the following command to update the configuration for an existing connector. ```bash curl -s -X PUT -H 'Content-Type: application/json' --data @config.json http://localhost:8083/connectors/GCSSourceConnector/config ``` 4. 
To consume records written by the connector to the configured Kafka topic, run the following command: ```bash kafka-avro-console-consumer --bootstrap-server localhost:9092 --property schema.registry.url=http://localhost:8081 --topic copy_of_gcs_topic --from-beginning ``` #### Basic authentication example 1. Run the demo app with the `basic-auth` Spring profile. ```bash mvn spring-boot:run -Dspring.profiles.active=basic-auth ``` If the demo app is already running, you will need to kill that instance (`CTRL + C`) before running a new instance to avoid port conflicts. 2. Create a `http-sink.properties` file with the following contents: ```text name=HttpSinkBasicAuth topics=http-messages tasks.max=1 connector.class=io.confluent.connect.http.HttpSinkConnector # key/val converters key.converter=org.apache.kafka.connect.storage.StringConverter value.converter=org.apache.kafka.connect.storage.StringConverter # licensing for local single-node Kafka cluster confluent.topic.bootstrap.servers=localhost:9092 confluent.topic.replication.factor=1 # connect reporter required bootstrap server reporter.bootstrap.servers=localhost:9092 reporter.result.topic.name=success-responses reporter.result.topic.replication.factor=1 reporter.error.topic.name=error-responses reporter.error.topic.replication.factor=1 # http sink connector configs http.api.url=http://localhost:8080/api/messages auth.type=BASIC connection.user=admin connection.password=password ``` For details about using this connector with Kafka Connect Reporter, see [Connect Reporter](/kafka-connectors/self-managed/userguide.html#userguide-connect-reporter). 3. Run and validate the connector as described in the [Quick start](#http-connector-quickstart). #### SSL with basic authentication example 1. Run the demo app with the `ssl-auth` Spring profile. ```bash mvn spring-boot:run -Dspring.profiles.active=ssl-auth ``` 2. Create a `http-sink.properties` file with the following contents: ```text name=SSLHttpSink topics=string-topic tasks.max=1 connector.class=io.confluent.connect.http.HttpSinkConnector # key/val converters key.converter=org.apache.kafka.connect.storage.StringConverter value.converter=org.apache.kafka.connect.storage.StringConverter # licensing for local single-node Kafka cluster confluent.topic.bootstrap.servers=localhost:9092 confluent.topic.replication.factor=1 # connect reporter required bootstrap server reporter.bootstrap.servers=localhost:9092 reporter.result.topic.name=success-responses reporter.result.topic.replication.factor=1 reporter.error.topic.name=error-responses reporter.error.topic.replication.factor=1 # http sink connector configs http.api.url=https://localhost:8443/api/messages # http sink connector SSL config ssl.enabled=true https.ssl.truststore.location=/path/to/http-sink-demo/src/main/resources/localhost-keystore.jks https.ssl.truststore.type=JKS https.ssl.truststore.password=changeit https.ssl.keystore.location=/path/to/http-sink-demo/src/main/resources/localhost-keystore.jks https.ssl.keystore.type=JKS https.ssl.keystore.password=changeit https.ssl.key.password=changeit https.ssl.protocol=TLSv1.2 auth.type=BASIC connection.user=admin connection.password=password ``` 3. Run and validate the connector as described in the [Quick start](#http-connector-quickstart). #### Proxy authentication example This proxy authentication example is dependent on MacOS X 10.6.8 or higher due to the proxy that is utilized. 1. Run the demo app with the `simple-auth` Spring profile. 
```bash mvn spring-boot:run -Dspring.profiles.active=simple-auth ``` 2. Install [Squidman Proxy](https://squidman.net/squidman). 3. In SquidMan, navigate to the **Preferences > General** tab, and set the HTTP port to `3128`. 4. In SquidMan, navigate to the **Preferences > Template** tab, and add the following criteria: ```text auth_param basic program /usr/local/squid/libexec/basic_ncsa_auth /etc/squid/passwords auth_param basic realm proxy acl authenticated proxy_auth REQUIRED http_access allow authenticated ``` 5. Create a credentials file for the proxy. ```bash sudo mkdir /etc/squid sudo htpasswd -c /etc/squid/passwords proxyuser # set password to proxypassword ``` 6. Open the SquidMan application and select `Start Squid`. 7. Create a `http-sink.properties` file with the following contents: ```text name=HttpSinkProxyAuth topics=http-messages tasks.max=1 connector.class=io.confluent.connect.http.HttpSinkConnector # key/val converters key.converter=org.apache.kafka.connect.storage.StringConverter value.converter=org.apache.kafka.connect.storage.StringConverter # licensing for local single-node Kafka cluster confluent.topic.bootstrap.servers=localhost:9092 confluent.topic.replication.factor=1 # connect reporter required bootstrap server reporter.bootstrap.servers=localhost:9092 reporter.result.topic.name=success-responses reporter.result.topic.replication.factor=1 reporter.error.topic.name=error-responses reporter.error.topic.replication.factor=1 # http sink connector configs http.api.url=http://localhost:8080/api/messages http.proxy.host=localhost http.proxy.port=3128 http.proxy.user=proxyuser http.proxy.password=proxypassword ``` 8. Run and validate the connector as described in the [Quick start](#http-connector-quickstart). #### JSON converter example 1. Run the demo app with the `basic-auth` Spring profile. ```bash mvn spring-boot:run -Dspring.profiles.active=basic-auth ``` 2. Create a `http-sink.properties` file with the following contents: ```text name=JsonHttpSink topics=json-topic tasks.max=1 connector.class=io.confluent.connect.http.HttpSinkConnector # key/val converters key.converter=org.apache.kafka.connect.storage.StringConverter value.converter=org.apache.kafka.connect.json.JsonConverter value.converter.schemas.enable=false # licensing for local single-node Kafka cluster confluent.topic.bootstrap.servers=localhost:9092 confluent.topic.replication.factor=1 # http sink connector configs http.api.url=http://localhost:8080/api/messages auth.type=BASIC connection.user=admin connection.password=password ``` Note that you should publish JSON messages to the `json-topic` instead of to the String messages shown in the [Quick start](#http-connector-quickstart). 3. Run and validate the connector as described in the [Quick start](#http-connector-quickstart). #### Regex replacement example 1. Run the demo app with the `basic-auth` Spring profile. ```bash mvn spring-boot:run -Dspring.profiles.active=basic-auth ``` 2. 
Create a `http-sink.properties` file with the following contents: ```text name=RegexHttpSink topics=email-topic,non-email-topic tasks.max=1 connector.class=io.confluent.connect.http.HttpSinkConnector # key/val converters key.converter=org.apache.kafka.connect.storage.StringConverter value.converter=org.apache.kafka.connect.storage.StringConverter # licensing for local single-node Kafka cluster confluent.topic.bootstrap.servers=localhost:9092 confluent.topic.replication.factor=1 # connect reporter required bootstrap server reporter.bootstrap.servers=localhost:9092 reporter.result.topic.name=success-responses reporter.result.topic.replication.factor=1 reporter.error.topic.name=error-responses reporter.error.topic.replication.factor=1 # http sink connector configs http.api.url=http://localhost:8080/api/messages auth.type=BASIC connection.user=admin connection.password=password # regex to mask emails regex.patterns=^.+@.+$ regex.replacements=******** ``` 3. Publish messages to the topics that are configured. Emails should be redacted with `********` before being sent to the demo app. ```bash confluent local produce email-topic > example@domain.com > another@email.com confluent local produce non-email-topic > not an email > another normal string ``` 4. Run and validate the connector as described in the [Quick start](#http-connector-quickstart). Note that regex replacement is not supported when the `request.body.format` configuration is set to `JSON`. #### Retries example 1. Run the demo app with the `basic-auth` Spring profile. ```bash mvn spring-boot:run -Dspring.profiles.active=basic-auth ``` 2. Create a `http-sink.properties` file with the following contents: ```text name=RetriesExample topics=http-messages tasks.max=1 connector.class=io.confluent.connect.http.HttpSinkConnector # key/val converters key.converter=org.apache.kafka.connect.storage.StringConverter value.converter=org.apache.kafka.connect.storage.StringConverter # licensing for local single-node Kafka cluster confluent.topic.bootstrap.servers=localhost:9092 confluent.topic.replication.factor=1 # connect reporter required bootstrap server reporter.bootstrap.servers=localhost:9092 reporter.result.topic.name=success-responses reporter.result.topic.replication.factor=1 reporter.error.topic.name=error-responses reporter.error.topic.replication.factor=1 # http sink connector configs http.api.url=http://localhost:8080/api/messages auth.type=BASIC connection.user=admin connection.password=password behavior.on.null.values=delete # retry configurations max.retries=20 retry.backoff.ms=5000 ``` 3. Publish messages that have keys and values to the topic. ```bash confluent local produce http-messages --property parse.key=true --property key.separator=, > 1,message-value > 2,another-message ``` 4. Stop the demo app. 5. Run and validate the connector as described in the [Quick start](#http-connector-quickstart). 6. The connector retries up to 20 times, with an initial backoff of 5000 ms. If an HTTP operation succeeds, retrying stops. Because the demo app is stopped, the connector exhausts all 20 retries and the connector task fails. 7. The default value for `max.retries` is 10 and for `retry.backoff.ms` is 3000 ms. ### REST-based example In this section, you will complete the steps in a REST-based example. 1. Copy the following JSON object to `influxdb-sink-connector.json` and configure all of the required values. 
This configuration is typically used along with [distributed workers](/platform/current/connect/concepts.html#distributed-workers). ```json { "name" : "InfluxDBSinkConnector", "config" : { "connector.class" : "io.confluent.influxdb.InfluxDBSinkConnector", "tasks.max" : "1", "topics" : "orders", "influxdb.url" : "http://localhost:8086", "influxdb.db" : "influxTestDB", "measurement.name.format" : "${topic}", "value.converter": "io.confluent.connect.avro.AvroConverter", "value.converter.schema.registry.url": "http://localhost:8081" } } ``` 2. Use the following `curl` command to post the configuration to one of the Kafka Connect workers while changing `http://localhost:8083/` to the endpoint of one of your Kafka Connect workers (for more information, see the Kafka Connect [REST API](/platform/current/connect/references/restapi.html)): ```bash curl -X POST -d @influxdb-sink-connector.json http://localhost:8083/connectors -H "Content-Type: application/json" ``` 3. Create a record in the `orders` topic: ```bash bin/kafka-avro-console-producer \ --broker-list localhost:9092 --topic orders \ --property value.schema='{"type":"record","name":"myrecord","fields":[{"name":"id","type":"int"},{"name":"product", "type": "string"}, {"name":"quantity", "type": "int"}, {"name":"price", "type": "float"}]}' ``` The console producer waits for input. 4. Copy and paste the following record into the terminal: ```bash {"id": 999, "product": "foo", "quantity": 100, "price": 50} ``` 5. Log in to the Docker container using the following command: ```bash docker exec -it bash ``` To find the container ID, use the `docker ps` command. 6. Once you are in the Docker container, log in to InfluxDB shell: ```bash influx ``` Your output should resemble: ```bash Connected to http://localhost:8086 version 1.7.7 InfluxDB shell version: 1.7.7 ``` 7. Run the following query to verify the records: ```bash > USE influxTestDB; Using database influxTestDB > SELECT * FROM orders; name: orders time id price product quantity ---- -- ----- ------- -------- 1567164248415000000 999 50 foo 100 ``` ### Use the JMS Source connector with TIBCO EMS You can use the JMS Source connector with TIBCO EMS and its support for JMS. Note that this is a specialization of the connector that avoids JNDI and instead uses system-specific APIs to establish connections. This is often easier to configure and use in most cases. To get started, you must install the latest TIBCO EMS JMS client libraries into the same directory where this connector is installed. For more details, see the [TIBCO EMS product documentation](https://docs.tibco.com/products/tibco-enterprise-message-service?_ga=2.158352819.868240457.1698346221-814484783.1670445207) Next, you must create a connector configuration for your environment, using the appropriate configuration properties. The following example shows a typical configuration of the connector for use with [distributed mode](/platform/current/connect/concepts.html#distributed-workers). 
```json { "name": "connector1", "config": { "connector.class": "io.confluent.connect.jms.JmsSourceConnector", "kafka.topic": "MyKafkaTopicName", "jms.destination.name": "MyQueueName", "jms.destination.type": "queue", "java.naming.factory.initial": "com.tibco.tibjms.naming.TibjmsInitialContextFactory", "java.naming.provider.url": "tibjmsnaming://:", "confluent.license": "", "confluent.topic.bootstrap.servers": "localhost:9092", "confluent.topic.ssl.truststore.location": "omitted", "confluent.topic.ssl.truststore.password": "", "confluent.topic.ssl.keystore.location": "omitted", "confluent.topic.ssl.keystore.password": "", "confluent.topic.ssl.key.password": "", "confluent.topic.security.protocol": "SSL" } } ``` Note that any extra properties defined on the connector will be passed into the JNDI InitialContext. This makes it easy to use any TIBCO EMS-specific settings. Finally, deploy your connector by posting it to a Kafka Connect distributed worker. #### Connector configuration 1. Create your `oracle-cdc-confluent-cloud-json.json` file based on the following example: ```json { "name": "OracleCDC_Confluent_Cloud", "config":{ "connector.class": "io.confluent.connect.oracle.cdc.OracleCdcSourceConnector", "name": "OracleCDC_Confluent_Cloud", "tasks.max":3, "oracle.server": "", "oracle.sid":"", "oracle.pdb.name":"", "oracle.username": "", "oracle.password": "", "start.from":"snapshot", "redo.log.topic.name": "oracle-redo-log-topic", "redo.log.consumer.bootstrap.servers":"", "redo.log.consumer.sasl.jaas.config":"org.apache.kafka.common.security.plain.PlainLoginModule required username=\"\" password=\"\";", "redo.log.consumer.security.protocol":"SASL_SSL", "redo.log.consumer.sasl.mechanism":"PLAIN", "table.inclusion.regex":"", "_table.topic.name.template_":"Using template vars to set change event topic for each table", "table.topic.name.template": "${databaseName}.${schemaName}.${tableName}", "connection.pool.max.size": 20, "confluent.topic.replication.factor":3, "topic.creation.groups":"redo", "topic.creation.redo.include":"oracle-redo-log-topic", "topic.creation.redo.replication.factor":3, "topic.creation.redo.partitions":1, "topic.creation.redo.cleanup.policy":"delete", "topic.creation.redo.retention.ms":1209600000, "topic.creation.default.replication.factor":3, "topic.creation.default.partitions":5, "topic.creation.default.cleanup.policy":"compact", "confluent.topic.bootstrap.servers":"", "confluent.topic.sasl.jaas.config":"org.apache.kafka.common.security.plain.PlainLoginModule required username=\"\" password=\"\";", "confluent.topic.security.protocol":"SASL_SSL", "confluent.topic.sasl.mechanism":"PLAIN", "value.converter": "org.apache.kafka.connect.json.JsonConverter", "value.converter.schemas.enable": "false" } } ``` 2. Create `oracle-redo-log-topic`. Make sure the topic name matches the value you put for `"redo.log.topic.name"`. **Confluent Platform CLI** ```text bin/kafka-topics --create --topic oracle-redo-log-topic \ --bootstrap-server broker:9092 --replication-factor 1 \ --partitions 1 --config cleanup.policy=delete \ --config retention.ms=120960000 ``` **Confluent Cloud CLI** ```text confluent kafka topic create oracle-redo-log-topic \ --partitions 1 --config cleanup.policy=delete \ --config retention.ms=120960000 ``` 3. 
3. Start the Oracle CDC Source connector using the following command:

```text
curl -s -H "Content-Type: application/json" -X POST -d @oracle-cdc-confluent-cloud-json.json http://localhost:8083/connectors/ | jq
```

### Property-based example

Create a configuration file `pagerduty-sink.properties` with the following content. This file should be placed inside the Confluent Platform installation directory. This configuration is typically used along with [standalone workers](/platform/current/connect/concepts.html#standalone-workers).

```text
name=pagerduty-sink-connector
topics=incidents
connector.class=io.confluent.connect.pagerduty.PagerDutySinkConnector
tasks.max=1
pagerduty.api.key=****
behavior.on.error=fail
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081
confluent.topic.bootstrap.servers=localhost:9092
confluent.topic.replication.factor=1
confluent.license=
reporter.bootstrap.servers=localhost:9092
reporter.result.topic.replication.factor=1
reporter.error.topic.replication.factor=1
```

### Property-based example

The following steps provide a property-based example.

1. Create a configuration file, `salesforce-bulk-api.properties`. This configuration is typically used along with [standalone workers](/platform/current/connect/concepts.html#standalone-workers).

```properties
name=SalesforceBulkApiSourceConnector
tasks.max=1
connector.class=io.confluent.connect.salesforce.SalesforceBulkApiSourceConnector
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081
salesforce.username=< Required Configuration >
salesforce.password=< Required Configuration >
salesforce.password.token=< Required Configuration >
salesforce.object=< Required Configuration >
salesforce.since=< Required Configuration >
kafka.topic=< Required Configuration >
salesforce.instance=< Required Configuration >
confluent.topic.bootstrap.servers=localhost:9092
confluent.topic.replication.factor=1
confluent.license=Omit to enable trial mode
```

2. Ensure the configurations in `salesforce-bulk-api.properties` are properly set.

3. Start the Salesforce Bulk API source connector by loading its configuration with the following command:

```bash
confluent local load salesforce-bulk-api-source -- -d salesforce-bulk-api.properties
{
  "name" : "SalesforceBulkApiSourceConnector",
  "config" : {
    "connector.class" : "io.confluent.connect.salesforce.SalesforceBulkApiSourceConnector",
    "tasks.max" : "1",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "http://localhost:8081",
    "kafka.topic" : "< Required Configuration >",
    "salesforce.password" : "< Required Configuration >",
    "salesforce.password.token" : "< Required Configuration >",
    "salesforce.object" : "< Required Configuration >",
    "salesforce.username" : "< Required Configuration >",
    "salesforce.since" : "< Required Configuration >",
    "confluent.topic.bootstrap.servers": "localhost:9092",
    "confluent.topic.replication.factor": "1",
    "confluent.license": ""
  },
  "tasks": []
}
```

4. Verify the connector starts successfully and review the Connect worker's log by entering the following:

```bash
confluent local log connect
```

5. Confirm the connector is in a `RUNNING` state.
```bash
confluent local status SalesforceBulkApiSourceConnector
```

6. Confirm messages are being sent to Kafka.

```bash
kafka-avro-console-consumer \
  --bootstrap-server localhost:9092 \
  --property schema.registry.url=http://localhost:8081 \
  --topic <kafka.topic> \
  --from-beginning | jq '.'
```

## Upsert with SObject Sink Connector

The `upsert` operation can be used when you want to update existing records in Salesforce (located by `external_id`) or, if no matching record exists, insert new records. The following example shows how to `upsert` records for an `Orders` object in Salesforce.

1. Create an `external_id` field in Salesforce.
   1. Click your user name and then click **Setup**.
   2. Under **Build**, click **Customize**, and then select **Orders**.
   3. Click the **Add a custom field to orders** link.
   4. In the **Order Custom Fields and Relationships** section, click **New**.
   5. In the **Data Type** list, select a data type, `Text`, then click **Next**.
   6. Enter the details for the field. For example, Field Label (`extid`), Length, Field Name (`extid`), and Description.
   7. Check the External ID box, then click **Next**.
   8. The external ID (`extid`) is created and appears in the list under **Order Custom Fields and Relationships**.

2. Create a configuration file named `salesforce-sobject-orders-sink-config.json` with the following contents. Make sure to enter a real username, password, security token, consumer key, and consumer secret. Additionally, make sure you put the API name (`extid__c`) for the external ID (`extid`). See [Salesforce SObject Sink Connector Configuration Properties](salesforce_sobject_sink_connector_config.md#salesforce-sobject-sink-connector-config) for more information about these and the other configuration properties.

```none
{
  "name": "upsert-orders",
  "config": {
    "connector.class" : "io.confluent.salesforce.SalesforceSObjectSinkConnector",
    "tasks.max" : "1",
    "topics" : "orders",
    "salesforce.object" : "Order",
    "salesforce.username" : "",
    "salesforce.password" : "",
    "salesforce.password.token" : "",
    "salesforce.consumer.key" : "",
    "salesforce.consumer.secret" : "",
    "confluent.topic.bootstrap.servers": "localhost:9092",
    "confluent.topic.replication.factor": "1",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": true,
    "behavior.on.api.errors": "fail",
    "reporter.bootstrap.servers": "localhost:9092",
    "reporter.error.topic.name": "error-responses",
    "reporter.error.topic.replication.factor": 1,
    "reporter.result.topic.name": "success-responses",
    "reporter.result.topic.replication.factor": 1,
    "salesforce.sink.object.operation": "upsert",
    "override.event.type": "true",
    "request.max.retries.time.ms": 60000,
    "salesforce.custom.id.field.name": "extid__c",
    "salesforce.use.custom.id.field": true
  }
}
```

3. Enter the Confluent CLI [confluent local services connect connector load](https://docs.confluent.io/confluent-cli/current/command-reference/local/services/connect/connector/confluent_local_services_connect_connector_load.html) command to start the Salesforce SObject sink connector.
```bash
confluent local load upsert-orders --config salesforce-sobject-orders-sink-config.json
```

Your output should resemble:

```none
{
  "name": "upsert-orders",
  "config": {
    "connector.class" : "io.confluent.salesforce.SalesforceSObjectSinkConnector",
    "tasks.max" : "1",
    "topics" : "orders",
    "salesforce.object" : "Order",
    "salesforce.username" : "",
    "salesforce.password" : "",
    "salesforce.password.token" : "",
    "salesforce.consumer.key" : "",
    "salesforce.consumer.secret" : "",
    "confluent.topic.bootstrap.servers": "localhost:9092",
    "confluent.topic.replication.factor": "1",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": true,
    "behavior.on.api.errors": "fail",
    "reporter.bootstrap.servers": "localhost:9092",
    "reporter.error.topic.name": "error-responses",
    "reporter.error.topic.replication.factor": 1,
    "reporter.result.topic.name": "success-responses",
    "reporter.result.topic.replication.factor": 1,
    "salesforce.sink.object.operation": "upsert",
    "override.event.type": "true",
    "request.max.retries.time.ms": 60000,
    "salesforce.custom.id.field.name": "extid__c",
    "salesforce.use.custom.id.field": true
  },
  "tasks": [ ... ],
  "type": null
}
```

4. To insert an order into Salesforce with a Kafka record, the record must have a valid `AccountId`, `ContractId`, `EffectiveDate`, and `Status`. Create an Account record and a Contract record in Salesforce. The values used in this example are: `"AccountId": "0012L0000176cdVQAQ"`, `"ContractId": "8002L000000ANqwQAG"` and `"EffectiveDate": 1608922098000` (the Epoch timestamp for 12/25/2020 in milliseconds).

```none
kafka-console-producer \
  --broker-list localhost:9092 \
  --topic orders

{"schema":{"type":"struct","fields":[{"type":"string","optional":false,"field":"Id"},{"type":"string","optional":false,"field":"AccountId"},{"type":"string","optional":false,"field":"ContractId"},{"type":"string","optional":false,"field":"Description"},{"type":"string","optional":false,"field":"Status"},{"type": "int64","optional": false,"field": "EffectiveDate"},{"type":"string","optional":false,"field":"_ObjectType"}, {"type":"string","optional":false,"field":"_EventType"}],"optional":false,"name":"myOrder","version":1},"payload": {"Id": "200", "AccountId": "0012L0000176cdVQAQ", "ContractId": "8002L000000ANqwQAG", "Status": "Draft", "EffectiveDate": 1608922098000, "Description":"Order record has been upserted.", "_ObjectType":"Order", "_EventType":"updated"}}
```

5. Log in to Salesforce and verify that the `Order` object exists with the external ID.

![Salesforce screen 1](images/salesforce-sobject-sink-upsert-1.png)

6. Update the description of the order object the connector just created.
```none
kafka-console-producer \
  --broker-list localhost:9092 \
  --topic orders

{"schema":{"type":"struct","fields":[{"type":"string","optional":false,"field":"Id"},{"type":"string","optional":false,"field":"AccountId"},{"type":"string","optional":false,"field":"ContractId"},{"type":"string","optional":false,"field":"Description"},{"type":"string","optional":false,"field":"Status"},{"type": "int64","optional": false,"field": "EffectiveDate"},{"type":"string","optional":false,"field":"_ObjectType"}, {"type":"string","optional":false,"field":"_EventType"}],"optional":false,"name":"myOrder","version":1},"payload": {"Id": "200", "AccountId": "0012L0000176cdVQAQ", "ContractId": "8002L000000ANqwQAG", "Status": "Draft", "EffectiveDate": 1608922098000, "Description":"Order record has been updated.", "_ObjectType":"Order", "_EventType":"updated"}}
```

7. Log in to Salesforce and verify that the `Order` object has been updated with the external ID.

![Salesforce screen 2](images/salesforce-sobject-sink-upsert-2.png)

## Quick Start

In this quick start guide, the Zendesk connector is used to consume records from a Zendesk resource called `tickets` and send the records to a Kafka topic named `ZD_tickets`. To run this quick start, ensure you have a [Zendesk Developer Account](https://developer.zendesk.com/).

1. Install the connector through the [Confluent Hub Client](https://docs.confluent.io/current/connect/managing/confluent-hub/client.html).

```bash
# run from your confluent platform installation directory
confluent connect plugin install confluentinc/kafka-connect-zendesk:latest
```

2. Start the Confluent Platform.

```bash
confluent local start
```

3. Check the status of all services.

```bash
confluent local services status
```

4. Configure your connector by first creating a JSON file named `zendesk.json` with the following properties.

```bash
// substitute <> with your config
{
  "name": "ZendeskConnector",
  "config": {
    "connector.class": "io.confluent.connect.zendesk.ZendeskSourceConnector",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "false",
    "confluent.topic.bootstrap.servers": "127.0.0.1:9092",
    "confluent.topic.replication.factor": 1,
    "confluent.license": "", // leave it empty for evaluation license
    "tasks.max": 1,
    "poll.interval.ms": 1000,
    "topic.name.pattern": "ZD_${entityName}",
    "zendesk.auth.type": "basic",
    "zendesk.url": "https://<subdomain>.zendesk.com",
    "zendesk.user": "",
    "zendesk.password": "",
    "zendesk.tables": "tickets",
    "zendesk.since": "2019-08-01"
  }
}
```

5. Start the Zendesk Source connector by loading the connector's configuration with the following command:

```bash
confluent local load zendesk --config zendesk.json
```

6. Confirm that the connector is in a `RUNNING` state.

```bash
confluent local status ZendeskConnector
```

7. Create one ticket record using the Zendesk API as follows.

```bash
curl https://{subdomain}.zendesk.com/api/v2/tickets.json \
  -d '{"ticket": {"subject": "My printer is on fire!", "comment": { "body": "The smoke is very colorful." }}}' \
  -H "Content-Type: application/json" -v -u {email_address}:{password} -X POST
```

8. Confirm the messages were delivered to the `ZD_tickets` topic in Kafka. Note that it may take a minute before the record populates the topic.
```bash
confluent local consume ZD_tickets --from-beginning
```

## Annotate Confluent custom resources

Confluent for Kubernetes (CFK) provides a set of public annotations that you can use to modify certain workflows or the state of Confluent Platform components. The annotations are applied to Confluent Platform custom resources (CRs).

platform.confluent.io/force-reconcile
: Triggers a reconcile cycle of the cluster. Once the reconcile cycle is complete, the annotation value gets reset to `false`.
  * Supported values: `true`, `false`
  * Default value: `false`
  * CR types applied to: All CRs

platform.confluent.io/block-reconcile
: Blocks reconcile even when internal resources or the CR spec are changed. This is used primarily to allow users to perform manual workflows. When this is enabled, CFK discards any changes done out of band to the CR.
  * Supported values: `true`, `false`
  * Default value: `false`
  * CR types applied to: All CRs

platform.confluent.io/roll-precheck
: When set to `disable`, CFK does not perform the pre-check for under-replicated partitions.
  * Supported values: `disable`, `enable`
  * Default value: `enable`
  * CR types applied to: Kafka

platform.confluent.io/roll-pause
: When set to `true`, the current pod roll will be paused.
  * Supported values: `false`, `true`
  * Default value: `false`
  * CR types applied to: Kafka

platform.confluent.io/disable-garbage-collection
: Disables CFK from garbage collecting Kubernetes resources that CFK internally manages.
  * Supported values: `false`, `true`
  * Default value: `true`
  * CR types applied to: Control Center, Connect, Kafka, REST Proxy, ksqlDB, Schema Registry, ZooKeeper

platform.confluent.io/enable-shrink
: Enables the shrink workflow for the Kafka CR. This should only be enabled when the Kafka image is version 7.0 or higher.
  * Supported values: `true`, `false`
  * Default value: `true`
  * CR types applied to: Kafka

platform.confluent.io/disable-internal-rolebindings-creation
: Defines whether to disable internal rolebinding creation in RBAC security settings.
  * Supported values: `true`, `false`
  * Default value: `false`
  * CR types applied to: Control Center, Connect, REST Proxy, ksqlDB, Schema Registry

platform.confluent.io/soft-delete-versions
: A list of versions to trigger a soft delete workflow for the Schema CR.
  * Supported values: A JSON formatted array, for example, `[1,2,3]`
  * Default value: None
  * CR types applied to: Schema

platform.confluent.io/delete-versions
: A list of versions to trigger a hard delete workflow for the Schema CR.
  * Supported values: A JSON formatted array, for example, `[1,2,3]`
  * Default value: None
  * CR types applied to: Schema

platform.confluent.io/restart-connector
: Triggers a restart of the Connector.
  * Supported values: `true`, `false`
  * Default value: `false`
  * CR types applied to: Connector

platform.confluent.io/pause-connector
: Pauses the connector.
  * Supported values: `true`, `false`
  * Default value: `false`
  * CR types applied to: Connector

platform.confluent.io/resume-connector
: Resumes the connector.
  * Supported values: `true`, `false`
  * Default value: `false`
  * CR types applied to: Connector

platform.confluent.io/restart-task
: Triggers a restart of the specified Connector task.
  * Supported values: An `int32` type number
  * Default value: None
  * CR types applied to: Connector

platform.confluent.io/http-timeout-in-seconds
: Specifies the HTTP client timeout in seconds for the CR workflows.
  * Supported values: An `int32` type number
  * Default value: None
  * CR types applied to: Control Center, Connect, Kafka, KafkaTopic, ClusterLink, Schema

platform.confluent.io/confluent-hub-install-extra-args
: Additional arguments for the Connect CR. The extra arguments are used when Connect starts up and downloads plugins from Confluent Hub.
  * Supported values: A string of flags, for example, `--worker-configs /dev/null --component-dir /mnt/plugins`
  * Default value: None
  * CR types applied to: Connect

platform.confluent.io/pod-overlay-configmap-name
: Configures additional Kubernetes features that are not supported in the CFK API.
  * Supported values: A ConfigMap name. For details on the Pod Overlay feature and the associated ConfigMap, see [Customize Confluent Platform pods with Pod Overlay](#co-pod-overlay).
  * Default value: None
  * CR types applied to: Control Center, Connect, Kafka, REST Proxy, ksqlDB, Schema Registry, ZooKeeper, KRaft

platform.confluent.io/enable-dynamic-configs
: Enables dynamic TLS certificate rotation for Kafka listeners and the Kafka REST class service so that the Kafka cluster does not roll when certificates change.
  * Supported values: `true`, `false`
  * Default value: `false`
  * CR types applied to: Kafka

platform.confluent.io/pvc-access-mode
: Sets the Persistent Volume Claim (PVC) access mode, which specifies how CFK pods can interact with the underlying storage provided by a Persistent Volume.
  * Supported values: `ReadWriteOnce`, `ReadWriteMany`
  * Default value: `ReadWriteOnce`
  * CR types applied to: Kafka, ZooKeeper (Confluent Platform 7.9 or earlier only), KRaft, ksqlDB, Control Center, REST Proxy

  For details, see [Configure Storage for Confluent Platform Using Confluent for Kubernetes](co-storage.md#co-storage).

platform.confluent.io/disable-hard-delete-schema
: Disables hard delete of a schema. A hard delete removes all metadata, including schema IDs.
  * Supported values: `true`, `false`
  * Default value: `false`
  * CR types applied to: Schema

To add an annotation, run the following command:

```bash
kubectl annotate <CR-kind> <CR-name> -n <namespace> <annotation>="<value>"
```

To delete an annotation, run the following command:

```bash
kubectl annotate <CR-kind> <CR-name> -n <namespace> <annotation>-
```

### Configure CSFLE in CFK

To deploy CSFLE with CFK:

1. Deploy Confluent Platform components. Specifically, configure Schema Registry with the required configuration for CSFLE using the `configOverrides.server` property in the SchemaRegistry custom resource (CR) YAML. For example:

```yaml
kind: SchemaRegistry
spec:
  replicas: 1
  image:
    application: confluentinc/cp-schema-registry
    init: confluentinc/confluent-init-container
  configOverrides:
    server:
      - resource.extension.class=io.confluent.kafka.schemaregistry.rulehandler.RuleSetResourceExtension,io.confluent.dekregistry.DekRegistryResourceExtension
      - confluent.license.addon.csfle=<license-key>
```

The value for `confluent.license.addon.csfle` is the same as your main Confluent Platform Enterprise license key.

2. Grant the necessary RBAC permissions for users and key resources (topics, subjects, and KEKs). For example:
   * Give the `ResourceOwner` role to Schema Registry's internal Kafka client for the `_dek_registry_keys` topic.
   * Grant user roles for topics, subjects, and KEK resources. You can use the [ConfluentRolebinding CR](co-manage-rbac.md#co-create-rolebinding) or the [confluent iam rbac role-binding create](https://docs.confluent.io/confluent-cli/current/command-reference/iam/rbac/role-binding/confluent_iam_rbac_role-binding_create.html) command.
For example:

```bash
confluent iam rbac role-binding create \
  --principal User:sr \
  --role ResourceOwner \
  --resource Topic:_dek_registry_keys \
  --kafka-cluster <kafka-cluster-id>
```

   * For enabling RBAC for the DEK Registry, see [Access control (RBAC) for CSFLE](https://docs.confluent.io/platform/current/security/protect-data/csfle/client-side.html#access-control-rbac-for-csfle).

3. Register schemas and rulesets using the REST API, tagging fields for encryption.
   * Define the [schema for the topic](https://docs.confluent.io/platform/current/schema-registry/schema.html#create-a-topic-schema-in-c3-short) and add [tags to the schema fields](https://docs.confluent.io/platform/current/security/protect-data/csfle/client-side.html#step-3-add-tags-to-the-schema-fields) in the schema that you want to encrypt.
   * Define an [encryption policy](https://docs.confluent.io/platform/current/security/protect-data/csfle/client-side.html#step-3-define-an-encryption-policy) that specifies rules to use to encrypt the tags.

4. Register the KEK resource using the [Schema Registry REST API](https://docs.confluent.io/cloud/current/api.html#tag/Key-Encryption-Keys-(v1)/operation/createKek) or the [register-deks](https://docs.confluent.io/platform/current/security/protect-data/csfle/manage-keys.html#register-a-dek) command. An example JSON payload for the REST API:

```json
{
  "name": "my-kek",
  "kmsType": "local-kms",
  "kmsKeyId": "mykey",
  "shared": false
}
```

For details on managing CSFLE keys, see [Manage CSFLE keys](https://docs.confluent.io/platform/current/security/protect-data/csfle/manage-keys.html).

5. Configure the Java client with KMS credentials or a local secret, and produce messages with encrypted fields.
   * Clients must be configured to provide the secret or KMS credentials at runtime. For a local KEK, this is usually a base64 string; for AWS KMS, environment variables for credentials must be set. Example Java client properties:

```properties
props.put("rule.executors._default_.param.secret", "pgbju8SjcaWJOtTSgeBckA==");
props.put("schema.registry.basic.auth.user.info", "testadmin:testadmin");
```

   * Only fields of type `string` or `bytes` with the correct tag are supported for encryption.
   * When a message is produced with CSFLE, the tagged field is encrypted using the configured KEK.
   * Consumers must provide the correct key/credential to decrypt and read the tagged field.

### Requirements and considerations for RBAC with LDAP

The following are the requirements and considerations for enabling and using RBAC with LDAP:

* You must have an LDAP server that Confluent Platform can use for authentication. Currently, CFK only supports the `GROUPS` LDAP search mode. The search mode indicates whether the user-to-group mapping is retrieved by searching for group or user entries. If you need to use the `USERS` search mode, specify it using the `configOverrides` setting in the Kafka CR as shown below:

```yaml
spec:
  configOverrides:
    server:
      - ldap.search.mode=USERS
```

See [Sample Configuration for User-Based Search](https://docs.confluent.io/platform/current/security/authorization/ldap/configure.html#sample-configuration-for-user-based-search) for more information.

* You must create the user principals in LDAP that will be used by Confluent Platform components.
These are the default user principals: * Kafka: `kafka`/`kafka-secret` * Confluent REST API: `erp`/`erp-secret` * Control Center: `c3`/`c3-secret` * ksqlDB: `ksql`/`ksql-secret` * Schema Registry: `sr`/`sr-secret` * Replicator: `replicator`/`replicator-secret` * Connect: `connect`/`connect-secret` * Create the LDAP user/password for a user who has a minimum of LDAP read-only permissions to allow Metadata Service (MDS) to query LDAP about other users. For example, you’d create a user `mds` with password `Developer!` * Create a user for the Admin REST service in LDAP and provide the username and password. ## Validate connections The following are example steps to validate external accesses to Confluent Platform components, using the `example.com` domain and default component prefixes. Control Center (Legacy) UI : In your browser, navigate to [https://controlcenter.example.com:443](https://controlcenter.example.com:443). Kafka : 1. Get the external endpoints of Kafka. * To get the broker endpoints: ```bash oc get kafka kafka -ojsonpath='{.status.listeners.external.advertisedExternalEndpoints}' ``` * To get the Kafka bootstrap server endpoint: ```bash oc get kafka kafka -ojsonpath='{.status.listeners.external.externalEndpoint}' ``` 2. Create a topic. For this step, you need the Confluent CLI tool on your local system. [Install Confluent CLI](https://docs.confluent.io/confluent-cli/current/install.html) on your local system to get access to the tool. For example: ```bash confluent kafka topic create mytest \ --partitions 3 --replication-factor 2 \ --url kafka.example.com:443 ``` 3. In Control Center (Legacy), validate that the `mytest` topic was created. Connect : 1. Get the external endpoint of the component: ```bash oc get connect connect -ojsonpath='{.status.restConfig.externalEndpoint}' ``` 2. Verify that you can reach the component endpoint. For example: ```bash curl https://connect.example.com:443 -ik -s -H "Content-Type: application/json" ``` ksqlDB : 1. Get the external endpoint of the component: ```bash oc get ksqldb ksqldb -ojsonpath='{.status.restConfig.externalEndpoint}' ``` 2. Verify that you can reach the component endpoint. For example: ```bash curl https://ksqldb.example.com:443/ksql -ik -s -H "Content-Type: application/vnd.ksql.v1+json; charset=utf-8" -X POST --data '{"ksql": "LIST ALL TOPICS;", "streamsProperties": {}}' ``` Schema Registry : 1. Get the external endpoint of the component: ```bash oc get schemaregistry schemaregistry -ojsonpath='{.status.restConfig.externalEndpoint}' ``` 2. Verify that you can reach the component endpoint. For example: ```bash curl -ik https://schemaregistry.example.com:443/subjects ``` Control Center (Legacy) : 1. Get the external endpoint of the component: ```bash oc get controlcenter controlcenter -ojsonpath='{.status.restConfig.externalEndpoint}' ``` 2. Verify that you can reach the component endpoint. For example: ```bash curl https://controlcenter.example.com:443/2.0/health/status -ik -s -H "Content-Type: application/json" ``` REST Proxy : 1. Get the external endpoint of the component: ```bash oc get kafkarestproxy kafkarestproxy -ojsonpath='{.status.restConfig.externalEndpoint}' ``` 2. Verify that you can reach the component endpoint. For example: ```bash curl -ik https://kafkarestproxy.example.com:443/v3/clusters ``` ### Use statically provisioned persistent volumes By default, CFK automates disk management by leveraging Kubernetes dynamic storage provisioning. 
If your Kubernetes cluster does not support dynamic provisioning, you can follow the instructions in this section to use statically-provisioned disks for your Confluent Platform deployments. Connect and Schema Registry do not use persistent storage volumes, so you do not need to follow the steps in this section for those components.

To use statically-provisioned persistent volumes for a Confluent Platform component:

1. Create a StorageClass in Kubernetes for local provisioning. For example:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: my-storage-class
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
```

2. Create [PersistentVolumes](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#persistent-volumes) with the desired host path and the hostname label for each of the desired worker nodes. You need the following number of persistent volumes for Confluent Platform components:
   * 2 persistent volumes on each ZooKeeper host (Confluent Platform 7.9 or earlier only)
   * 1 persistent volume on each Kafka, ksqlDB, Control Center host

For example:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-1                                  --- [1]
spec:
  capacity:
    storage: 10Gi                             --- [2]
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain       --- [3]
  storageClassName: my-storage-class          --- [4]
  local:
    path: /mnt/data/broker-1-data             --- [5]
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - gke-myhost-cluster-default-pool-5cc13882-k0gb   --- [6]
```

* [1] Choose a name for the PersistentVolume.
* [2] Choose a storage size that is greater than or equal to the storage you're requesting for each Kafka broker instance. This corresponds to the `spec.dataVolumeCapacity` property of the component CR.
* [3] Choose `Retain` if you want the data to be retained after you delete the PersistentVolumeClaim that CFK will eventually create and which Kubernetes will eventually bind to this PersistentVolume. Choose `Delete` if you want this data to be garbage-collected when the PersistentVolumeClaim is deleted.

  #### WARNING
  With `persistentVolumeReclaimPolicy: Delete`, your data on the volume will be deleted when you delete the CFK component custom resource (CR), for example, when you delete the Kafka CR with `kubectl delete kafka <name>`.

* [4] The `storageClassName` must match the one created in Step 1.
* [5] This is the directory path you want to use on the worker node for the broker as its persistent data volume. The path must exist on the worker node.
* [6] This is the value of the `kubernetes.io/hostname` label of the worker node you want to host this broker instance. To find this hostname, run the following command:

```bash
kubectl get nodes \
  -o 'custom-columns=NAME:metadata.name,HOSTNAME:metadata.labels.kubernetes\.io/hostname'

NAME                                            HOSTNAME
gke-myhost-cluster-default-pool-5cc13882-k0gb   gke-myhost-cluster-default-pool-5cc13882-k0gb
gke-myhost-cluster-default-pool-5cc13882-n8vr   gke-myhost-cluster-default-pool-5cc13882-n8vr
gke-myhost-cluster-default-pool-5cc13882-tbbj   gke-myhost-cluster-default-pool-5cc13882-tbbj
```

3. Add the storageClass to the component CR, for example:

```yaml
spec:
  storageClass:
    name: my-storage-class
```
4. After deploying the new CR, validate that the PersistentVolumes are bound:

```bash
kubectl get pv

NAME   CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                    STORAGECLASS
pv-1   10Gi       RWO            Retain           Bound    operator/data0-kafka-0   my-storage-class
pv-2   10Gi       RWO            Retain           Bound    operator/data0-kafka-2   my-storage-class
pv-3   10Gi       RWO            Retain           Bound    operator/data0-kafka-1   my-storage-class
```

5. Validate that the Confluent Platform pods are healthy. For example:

```bash
kubectl get pods -l app=kafka

NAME      READY   STATUS    RESTARTS   AGE
kafka-0   1/1     Running   0          40m
kafka-1   1/1     Running   0          40m
kafka-2   1/1     Running   0          40m
```

#### Use a Dead Letter Queue with security

When you use Confluent Platform with security enabled, the Confluent Platform [Admin Client](../installation/configuration/admin-configs.md#cp-config-admin) creates the Dead Letter Queue (DLQ) topic. Invalid records are first passed to an internal producer constructed to send these records, and then the Admin Client creates the DLQ topic. For the DLQ to work in a secure Confluent Platform environment, you must add additional Admin Client configuration properties (prefixed with `admin.*`) to the Connect worker configuration. The following [SASL/PLAIN](../security/authentication/mutual-tls/overview.md#kafka-ssl-authentication) example shows additional Connect worker configuration properties:

```bash
admin.ssl.endpoint.identification.algorithm=https
admin.sasl.mechanism=PLAIN
admin.security.protocol=SASL_SSL
admin.request.timeout.ms=20000
admin.retry.backoff.ms=500
admin.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
  username="<username>" \
  password="<password>";
```

For details about configuring your Connect worker, sink connector, and dead letter queue topic in a Role-Based Access Control (RBAC) environment, see [Kafka Connect and RBAC](rbac-index.md#connect-rbac-index).

## ResourceOwner and UserAdmin

The `ResourceOwner` (the user creating a new connector) is responsible for submitting the request for connector credentials to the `UserAdmin` before creating the connector. Once the request is received, the `UserAdmin` creates the secrets for the connector with a path consisting of the connector name and the keys `username` and `password` for the service account that has permissions to access the topics that the connector will consume from or produce to. The secrets are created using a [POST API request](#connect-rbac-secret-registry-api). For example:

```text
POST /secret/paths/<connector-name>/keys/<key>/versions

{
  "secret": "<secret-value>"
}
```

The following properties are then included in the connector configuration:

**Sink connector properties:**

```properties
consumer.override.sasl.jaas.config=org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required \
  username="${secret:<connector-name>:username}" \
  password="${secret:<connector-name>:password}" \
  metadataServerUrls="http://<mds-host>:8090";
```

**Source connector properties:**

```properties
producer.override.sasl.jaas.config=org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required \
  username="${secret:<connector-name>:username}" \
  password="${secret:<connector-name>:password}" \
  metadataServerUrls="http://<mds-host>:8090";
```

When the user submits the connector configuration, Connect validates that all external variable references have a path that matches the connector ID. The connector configuration is rejected if it has variable references with a path that does not match the connector ID.
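As an illustration only (the connector name `my-sink` and the MDS host `mds.example.com` below are hypothetical, not values from this guide), a sketch of a sink connector whose secret references satisfy this validation might look like the following; a reference whose path names a different connector would be rejected:

```properties
# Hypothetical sink connector named "my-sink"; the secret path must match the connector name.
name=my-sink
consumer.override.sasl.jaas.config=org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required \
  username="${secret:my-sink:username}" \
  password="${secret:my-sink:password}" \
  metadataServerUrls="http://mds.example.com:8090";
# A reference such as ${secret:other-connector:username} would fail validation here,
# because its path ("other-connector") does not match this connector's name ("my-sink").
```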
# Configure Kerberos Authentication for Brokers Running MDS

This configuration describes how to combine LDAP authentication for MDS with Kerberos authentication for the brokers.

Prerequisites
: - The prerequisites for configuring Kerberos authentication for MDS are the same as the prerequisites for configuring MDS. See [Configure Metadata Service (MDS) in Confluent Platform](index.md#rbac-mds-config).
  - Create a user for the Kafka broker.
  - Generate the keytab. See [Configure GSSAPI in Confluent Platform clusters](../../security/authentication/sasl/gssapi/overview.md#kafka-sasl-auth-gssapi).
  - [Create a PEM key pair](index.md#create-pem-key-pair).

1. Add the following required configuration options to the `etc/kafka/server.properties` file. Any content in brackets (`<>`) must be customized for your environment.

#### NOTE
The LDAP configuration attributes in this example reflect a system using Active Directory (AD). If you use a different directory system, contact your LDAP administrator for details.

```properties
############################# Confluent Authorizer Settings #############################
authorizer.class.name=io.confluent.kafka.security.authorizer.ConfluentServerAuthorizer
confluent.authorizer.access.rule.providers=ZK_ACL,CONFLUENT
confluent.metadata.server.listeners=http://0.0.0.0:8090
confluent.metadata.server.advertised.listeners=http://localhost:8090

#### Semi-colon separated list of super users in the format <principalType>:<principalName> ####
#### For example: super.users=User:admin;User:mds ####
super.users=User:<kafka-broker-user>;User:<mds-user>

############################# Identity Provider Settings (LDAP) #############################
#### JNDI Connection Settings ####
ldap.java.naming.factory.initial=com.sun.jndi.ldap.LdapCtxFactory
ldap.java.naming.provider.url=ldap://<ldap-host>:389

#### MDS Authentication Settings ####
ldap.java.naming.security.principal=<ldap-principal>
ldap.java.naming.security.credentials=<ldap-credentials>
ldap.java.naming.security.authentication=simple

#### Client Authentication Settings ####
ldap.user.search.base=<user-search-base>
ldap.user.name.attribute=sAMAccountName
ldap.group.search.base=CN=Users,DC=rbac,DC=confluent,DC=io
ldap.group.object.class=group
ldap.group.member.attribute.pattern=UID=(.*),OU=Users,DC=EXAMPLE,DC=COM
ldap.user.object.class=account

############################# MDS Server Settings #############################
confluent.metadata.server.authentication.method=BEARER

############################# MDS Token Service Settings #############################
confluent.metadata.server.token.key.path=<path-to-token-key-pair.pem>

############################# Listener Settings #############################
listeners=INTERNAL_SASL_PLAINTEXT://:9093,EXTERNAL_RBAC_SASL_PLAINTEXT://:9092
advertised.listeners=INTERNAL_SASL_PLAINTEXT://localhost:9093,EXTERNAL_RBAC_SASL_PLAINTEXT://localhost:9092
inter.broker.listener.name=INTERNAL_SASL_PLAINTEXT

############################# Listener SASL Configuration Settings #############################
listener.security.protocol.map=INTERNAL_SASL_PLAINTEXT:SASL_PLAINTEXT,EXTERNAL_RBAC_SASL_PLAINTEXT:SASL_PLAINTEXT

############################# Broker Internal Listener SASL Configuration Settings #############################
sasl.mechanism.inter.broker.protocol=GSSAPI
listener.name.internal_sasl_plaintext.sasl.enabled.mechanisms=GSSAPI
listener.name.internal_sasl_plaintext.sasl.kerberos.service.name=kafka
listener.name.internal_sasl_plaintext.gssapi.sasl.jaas.config = \
  com.sun.security.auth.module.Krb5LoginModule required \
  debug=true \
  useKeyTab=true \
  storeKey=true \
  keyTab="<path-to-keytab>" \
  principal="<kerberos-principal>";
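# The INTERNAL listener uses GSSAPI (Kerberos) for inter-broker authentication, while the
# EXTERNAL listener configured below uses OAUTHBEARER, so clients authenticate with bearer
# tokens issued by MDS and validated against the PEM public key.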
# (for the principal above, for example: kafka/kafka1.hostname.com@EXAMPLE.COM)

############################# Broker External (Client) Listener SASL Configuration Settings #############################
listener.name.external_rbac_sasl_plaintext.sasl.enabled.mechanisms=OAUTHBEARER
listener.name.external_rbac_sasl_plaintext.oauthbearer.sasl.jaas.config= \
  org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required \
  publicKeyPath="<path-to-public-key.pem>";
```

* *429* – Too Many Requests. The response body contains the HTML error page returned by the server. For example:

```http
HTTP/1.1 429 Too Many Requests

{
  "message": "Error 429 Too Many Requests HTTP ERROR 429 Too Many Requests URI: /v3/clusters/my-cluster STATUS: 429 MESSAGE: Too Many Requests SERVLET: default"
}
```

* *5XX* – A server-side problem that might not be addressable from the client side. Retriable Kafka errors will contain error code 50003 in the response body.

**generic_internal_server_error:**

```http
HTTP/1.1 5XX -
Content-Type: application/json

{
  "error_code": 500,
  "message": "Internal Server Error"
}
```

**produce_v3_missing_schema:**

```http
HTTP/1.1 5XX -
Content-Type: application/json

{
  "error_code": 50002,
  "message": "Error when fetching latest schema version. subject = my-topic"
}
```

## Starting the ksqlDB Server

The ksqlDB servers are run separately from the ksqlDB CLI client and Kafka brokers. You can deploy servers on remote machines, VMs, or containers, and the CLI then connects to these remote servers. You can add or remove servers from the same resource pool during live operations, to elastically scale query processing. You can use different resource pools to support workload isolation. For example, you could deploy separate pools for production and for testing.

You can only connect to one ksqlDB server at a time. The ksqlDB CLI does not support automatic failover to another ksqlDB server.

![image](ksqldb/images/client-server.png)

Follow these instructions to start ksqlDB Server using the `ksql-server-start` script.

1. Specify your ksqlDB server configuration parameters. You can also set any property for the Kafka Streams API, the Kafka producer, or the Kafka consumer. The required parameters are `bootstrap.servers` and `listeners`. You can specify the parameters in the ksqlDB properties file or the `KSQL_OPTS` environment variable. Properties set with `KSQL_OPTS` take precedence over those specified in the properties file. A recommended approach is to configure a common set of properties using the ksqlDB configuration file and override specific properties as needed, using the `KSQL_OPTS` environment variable. Here are the default settings:

```none
bootstrap.servers=localhost:9092
listeners=http://0.0.0.0:8088
```

For more information, see [Configure ksqlDB Server](operate-and-deploy/installation/server-config.md#ksqldb-install-configure-server).

2. Start a server node with this command:

```bash
ksql-server-start ${CONFLUENT_HOME}/etc/ksqldb/ksql-server.properties
```

3. See [this page](operate-and-deploy/installation/server-config.md#ksqldb-install-configure-server-non-interactive-usage) for instructions on running ksqlDB in non-interactive (headless) mode.

## Specify ksqlDB Server configuration parameters

You can specify the configuration for your ksqlDB Server instances by using these approaches:

- **The `environment` key:** In the stack file, populate the `environment` key with your settings. By convention, the ksqlDB setting names are prepended with `KSQL_`.
- **`--env` option:** On the [docker run](https://docs.docker.com/engine/reference/commandline/run/) command line, specify your settings by using the `--env` option once for each parameter. For more information, see [Configure ksqlDB with Docker](install-ksqldb-with-docker.md#ksqldb-install-configure-with-docker).
- **ksqlDB Server config file:** Add settings to the `ksql-server.properties` file. This requires building your own Docker image for ksqlDB Server. For more information, see [Configuring ksqlDB Server](server-config.md#ksqldb-install-configure-server).

For a complete list of ksqlDB parameters, see the [Configuration Parameter Reference](../../reference/server-configuration.md#ksqldb-reference-server-configuration).
You can also set any property for the Kafka Streams API, the Kafka producer, or the Kafka consumer. A recommended approach is to configure a common set of properties using the ksqlDB Server configuration file and override specific properties as needed, using the environment variables.

ksqlDB must have access to a running Kafka cluster, which can be on your local machine, in a data center, a public cloud, or Confluent Cloud. For ksqlDB Server to connect to a Kafka cluster, the required parameters are `KSQL_LISTENERS` and `KSQL_BOOTSTRAP_SERVERS`, which have the following default values:

```yaml
environment:
  KSQL_LISTENERS: http://0.0.0.0:8088
  KSQL_BOOTSTRAP_SERVERS: localhost:9092
```

ksqlDB runs separately from your Kafka cluster, so you specify the IP addresses of the cluster's bootstrap servers when you start a container for ksqlDB Server. For more information, see [Configuring ksqlDB Server](server-config.md#ksqldb-install-configure-server). To start ksqlDB containers in different configurations, see [Configure ksqlDB with Docker](install-ksqldb-with-docker.md#ksqldb-install-configure-with-docker).

## Connecting to Confluent Cloud ksqlDB

To use the `ksql-migrations` tool with your [Confluent Cloud ksqlDB cluster](/cloud/current/ksqldb/overview.html), set the following configurations in your `ksql-migrations.properties` file, which is created as part of [setting up your migrations project](#ksqldb-manage-metadata-schemas-initial-setup).

```properties
ksql.auth.basic.username=<username>
ksql.auth.basic.password=<password>
ksql.migrations.topic.replicas=3
ssl.alpn=true
```

#### Commands and flags

To create a cluster link, use `kafka-cluster-links` along with [bootstrap-server](#bootstrap-cluster-links) and the following flags.

`--link`
: (Required) The name of the cluster link to create. Must be a unique cluster link name within the cluster.
  * Type: string

`--cluster-id`
: (Required) The ID of the source cluster to link to. You can find a cluster's ID with the CLI command `kafka-cluster cluster-id`.
  * Type: string

(Required) One of the following parameters must be provided (not both) to specify how the destination cluster should communicate with the source. The available configurations are those that would be used to configure a client, including the required `bootstrap.servers` and other necessary security and authorization properties.

`--config`
: Comma-separated configurations to be applied to the cluster link on creation, in the form `key=value`. When you use this flag, the configurations are specified directly on the command line (as opposed to in a file, as described for the next flag). You can use square brackets to group values that contain commas. For a full list of available configurations, see [Link Properties](configs.md#cluster-link-specific-configs).
  * Type: string

`--config-file`
: Property file containing [configurations](configs.md#cluster-link-specific-configs) for the cluster link. This is the recommended way to specify cluster link configurations.
* Type: string For example, if you specify the following configuration for a secure cluster link in a file named `link-config.properties`: ```bash bootstrap.servers=example-1:9092,example-2:9092,example-3:9092 sasl.mechanism=PLAIN sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="example-user" password="example-password" security.protocol=SASL_SSL ssl.endpoint.identification.algorithm=https ``` Then, you can create the cluster link `example-link` with the following command: ```bash kafka-cluster-links --bootstrap-server localhost:9093 --create --link example-link --config-file link-config.properties --cluster-id pz-s7W72Sdm7A11wzku9gA ``` Optional configurations: `--command-config` : Property file containing configurations to be passed to the [AdminClient](../../installation/configuration/admin-configs.md#cp-config-admin). For example, with security credentials for authorization and authentication. `--consumer-group-filters-json` : JSON string to use for configuration of `consumer.offset.group.filters`. To learn more, see [Migrating consumer groups from source to destination cluster](#cluster-link-migrate-consumer-groups). * Type: string `--consumer-group-filters-json-file` : Path to JSON file to use for configuration of `consumer.offset.group.filters`. To learn more, see [Migrating consumer groups from source to destination cluster](#cluster-link-migrate-consumer-groups). * Type: string `--acl-filters-json-file` : Path to the ACL filters JSON file to use for configuration of `acl.filters`. To learn more, see [Migrating ACLs from Source to Destination Cluster](security.md#cluster-link-acls-migrate). * Type: string `--validate-only` : If provided, validates that the cluster link can be created as specified, but does not create it. `--exclude-validate-link` : If provided, creates the link without validating that the source cluster can be reached. This is helpful only if the source cluster is not yet running or reachable. If the source cluster is running and available, using this option is not recommended, as it skips helpful validations. `--topic-filters-json` : JSON string to use for configuration of `auto.create.mirror.topics.filters`. To learn more, see [Mirror Topics](mirror-topics-cp.md#mirror-topics-concepts). `--topic-filters-json-file` : Path to JSON file to use for configuration of `auto.create.mirror.topics.filters`. To learn more, see [Mirror Topics](mirror-topics-cp.md#mirror-topics-concepts). ### Confluent Cloud Confluent Cloud prerequisites are: - A Confluent Cloud account - Permission to create a topic and schema in a cluster in Confluent Cloud - Stream Governance Package enabled - API key and secret for Confluent Cloud cluster (`$APIKEY`, `$APISECRET`) - API key and secret for Schema Registry (`$SR_APIKEY`, `$SR_APISECRET`) - Schema Registry endpoint URL (`$SCHEMA_REGISTRY_URL`) - Cluster ID (`$CLUSTER_ID`) - Schema registry cluster ID (`$SR_CLUSTER_ID`) The examples assume that API keys, secrets, cluster IDs, and API endpoints are stored in persistent environment variables wherever possible, and refer to them as such. You can store these in shell variables if your setup is temporary. If you want to return to this environment and cluster for future work, consider storing them in a profile (such as `.zsh`, `.bashrc`, or `powershell.exe` profiles). The following steps provide guidelines on these prerequisites specific to these examples. 
For more general information, see [Manage Clusters](/cloud/current/clusters/create-cluster.html#manage-clusters-in-ccloud).

1. Log in to Confluent Cloud:

```bash
confluent login
```

2. Create a Kafka cluster in Confluent Cloud:

```bash
confluent kafka cluster create [flags]
```

For example:

```bash
confluent kafka cluster create quickstart_cluster --cloud "aws" --region "us-west-2"
```

Your output will include a cluster ID (in the form of `lkc-xxxxxx`), show the cluster name and [cluster type](/cloud/current/clusters/cluster-types.html#ccloud-features-and-limits-by-cluster-type) (in this case, "Basic"), and endpoints. Take note of the cluster ID, and store it in an environment variable such as `$CLUSTER_ID`.

3. Get an API key and secret for the cluster:

```bash
confluent api-key create --resource $CLUSTER_ID
```

Store the API key and secret for your cluster in a safe place, such as shell environment variables: `$APIKEY`, `$APISECRET`.

4. View Stream Governance packages and the Schema Registry endpoint URL. A [Stream Governance package](/cloud/current/stream-governance/packages.html#how-to-enable-sr-or-upgrade-to-a-stream-governance-package) was enabled as a part of creating the environment.
   - To view governance packages, use the Confluent CLI command [confluent environment list](https://docs.confluent.io/confluent-cli/current/command-reference/environment/confluent_environment_list.html):

```bash
confluent environment list
```

   Your output will show the environment ID, name, and associated Stream Governance packages.
   - To view the Stream Governance API endpoint URL, use the command [confluent schema-registry cluster describe](https://docs.confluent.io/confluent-cli/current/command-reference/schema-registry/cluster/confluent_schema-registry_cluster_describe.html):

```bash
confluent schema-registry cluster describe
```

   Your output will show the Schema Registry cluster ID (in the form of `lsrc-xxxxxx`) and the endpoint URL, which is also available to you in Cloud Console on the right side panel under "Stream Governance API" in the environment. Store these in environment variables: `$SR_CLUSTER_ID` and `$SCHEMA_REGISTRY_URL`.

5. Create a Schema Registry API key, using the Schema Registry cluster ID (`$SR_CLUSTER_ID`) from the previous step as the resource ID.

```bash
confluent api-key create --resource $SR_CLUSTER_ID
```

Store the API key and secret for your Schema Registry in a safe place, such as shell environment variables: `$SR_APIKEY` and `$SR_APISECRET`.

### Confluent Cloud

1. Create a Kafka topic:

```bash
confluent kafka topic create transactions-avro --cluster $CLUSTER_ID
```

2. Copy the following schema and store it in a file called `schema.txt`:

```none
{
  "type": "record",
  "name": "Transaction",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "amount", "type": "double"}
  ]
}
```

3. Run the following command to create a producer with the schema created in the previous step:

```bash
confluent kafka topic produce transactions-avro \
  --cluster $CLUSTER_ID \
  --schema "/schema.txt" --schema-registry-endpoint $SCHEMA_REGISTRY_URL \
  --schema-registry-api-key $SR_APIKEY \
  --schema-registry-api-secret $SR_APISECRET \
  --api-key $APIKEY --api-secret $APISECRET \
  --value-format "avro"
```

Your output should resemble:

```bash
Successfully registered schema with ID 100001
Starting Kafka Producer. Use Ctrl-C or Ctrl-D to exit.
```

4. Type the following record in the shell, and hit return.

```none
{ "id":"1000", "amount":500 }
```
5. Open another terminal and run a consumer to read from topic `transactions-avro` and get the value of the message in JSON:

```bash
confluent kafka topic consume transactions-avro \
  --cluster $CLUSTER_ID \
  --from-beginning \
  --value-format "avro" \
  --schema-registry-endpoint $SCHEMA_REGISTRY_URL \
  --schema-registry-api-key $SR_APIKEY \
  --schema-registry-api-secret $SR_APISECRET \
  --api-key $APIKEY --api-secret $APISECRET
```

Your output should be:

```bash
{"id":"1000","amount":500}
```

6. Register a new schema version under the same subject by adding a new field, `customer_id`. Since the default subject level compatibility is BACKWARD, you must add the new field as "optional" in order for it to be compatible with the previous version. Create a new file as `schema2.txt` and copy the following schema in it:

```bash
{
  "type": "record",
  "name": "Transaction",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "amount", "type": "double"},
    {"name": "customer_id", "type": "string", "default":"null"}
  ]
}
```

Open another terminal, and run the following command:

```bash
confluent kafka topic produce transactions-avro \
  --cluster $CLUSTER_ID \
  --schema "/schema2.txt" \
  --schema-registry-endpoint $SCHEMA_REGISTRY_URL \
  --schema-registry-api-key $SR_APIKEY \
  --schema-registry-api-secret $SR_APISECRET \
  --api-key $APIKEY \
  --api-secret "$APISECRET" \
  --value-format "avro"
```

Your output should resemble:

```bash
Successfully registered schema with ID 100002
Starting Kafka Producer. Use Ctrl-C or Ctrl-D to exit.
```

7. Type the following into your producer, and hit return:

```none
{ "id":"1001", "amount":500, "customer_id":"1221" }
```

8. Switch to the terminal with your running consumer to read from topic `transactions-avro` and get the new message. You should see the new output added to the original:

```none
{"id":"1000","amount":500.0}
{"id":"1001","amount":500.0,"customer_id":"1221"}
```

(If by chance you closed the original consumer, just restart it using the same command shown in step 5.)

9. View the schemas that were registered with Schema Registry as versions 1 and 2.

```none
confluent schema-registry schema describe --subject transactions-avro-value --version 1
```

Your output should be similar to the following, showing the `id` and `amount` fields added in version 1 of the schema:

```none
Schema ID: 100001
Schema: {"type":"record","name":"Transaction","fields":[{"name":"id","type":"string"},{"name":"amount","type":"double"}]}
```

To view version 2:

```none
confluent schema-registry schema describe --subject transactions-avro-value --version 2
```

Output for version 2 will include the `customer_id` field:

```none
Schema ID: 100002
Schema: {"type":"record","name":"Transaction","fields":[{"name":"id","type":"string"},{"name":"amount","type":"double"},{"name":"customer_id","type":"string","default":"null"}]}
```

10. Use the Confluent Cloud Console to examine schemas and messages. Messages that were successfully produced also show on the Confluent Cloud Console ([https://confluent.cloud/](https://confluent.cloud/)) in **Topics > (topic name) > Messages**. You may have to select a partition or jump to a timestamp to see messages sent earlier. (For timestamp, type in a number, which will default to partition `1/Partition: 0`, and press return. To get the message view shown here, select the **cards** icon on the upper right.)

![image](images/serdes-avro-cloud-ui-messages.png)

Schemas you create are available on the **Schemas** tab for the selected topic.

![image](images/serdes-avro-cloud-ui-schema.png)
11. Run shutdown and cleanup tasks.
    - You can stop the consumer and producer with Ctrl-C in their respective command windows.
    - If you were using shell environment variables and want to keep them for later, remember to store them in a safe, persistent location.
    - You can remove topics, clusters, and environments from the [command line](https://docs.confluent.io/confluent-cli/current/command-reference/overview.html) or from the [Confluent Cloud Console](https://confluent.cloud/).

### Confluent Cloud

1. Create a Kafka topic:

```bash
confluent kafka topic create transactions-json --cluster $CLUSTER_ID
```

2. Copy the following schema and store it in a file called `schema.txt`:

```bash
{
  "type":"object",
  "properties":{
    "id":{"type":"string"},
    "amount":{"type":"number"}
  }
}
```

3. Run the following command to create a producer with the schema created in the previous step:

```bash
confluent kafka topic produce transactions-json \
  --cluster $CLUSTER_ID \
  --schema "/schema.txt" --schema-registry-endpoint $SCHEMA_REGISTRY_URL \
  --schema-registry-api-key $SR_APIKEY \
  --schema-registry-api-secret $SR_APISECRET \
  --api-key $APIKEY --api-secret $APISECRET \
  --value-format "jsonschema"
```

Your output should resemble:

```bash
Successfully registered schema with ID 100001
Starting Kafka Producer. Use Ctrl-C or Ctrl-D to exit.
```

4. Type the following record in the shell, and hit return.

```none
{ "id":"1000", "amount":500 }
```

5. Open another terminal and run a consumer to read from topic `transactions-json` and get the value of the message in JSON:

```bash
confluent kafka topic consume transactions-json \
  --cluster $CLUSTER_ID \
  --from-beginning \
  --value-format "jsonschema" \
  --schema-registry-endpoint $SCHEMA_REGISTRY_URL \
  --schema-registry-api-key $SR_APIKEY \
  --schema-registry-api-secret $SR_APISECRET \
  --api-key $APIKEY --api-secret $APISECRET
```

Your output should be:

```bash
{"id":"1000","amount":500}
```

6. Use the producer to send another record as the message value, which includes a new property not explicitly declared in the schema. JSON Schema has an open content model, which allows any number of additional properties to appear in a JSON document without being specified in the JSON schema. This is achieved with `additionalProperties` set to `true`, which is the default. If you do not explicitly disable `additionalProperties` (by setting it to `false`), undeclared properties are allowed in records. These next few steps demonstrate this unique aspect of JSON Schema. Return to the producer session that is already running and send the following message, which includes a new property `"customer_id"` that is not declared in the schema with which we started this producer. (Hit return to send the message.)

```none
{"id":"1000","amount":500,"customer_id":"1221"}
```

7. Return to your running consumer to read from topic `transactions-json` and get the new message. You should see the new output added to the original.

```none
{"id":"1000","amount":500}
{"id":"1000","amount":500,"customer_id":"1221"}
```

The message with the new property (`customer_id`) is successfully produced and read. If you try this with the other schema formats (Avro, Protobuf), it will fail at the producer command because those specifications require that all properties be explicitly declared in the schemas. Keep this consumer running.

8. Update the compatibility requirement for the subject `transactions-json-value`.
```none confluent schema-registry subject update transactions-json-value --compatibility "none" ``` The output message is ```none Successfully updated Subject Level compatibility to "none" for subject "transactions-json-value" ``` 9. Store the following schema in a file called `schema2.txt`: ```bash { "type":"object", "properties":{ "id":{"type":"string"}, "amount":{"type":"number"} }, "additionalProperties": false } ``` Note that this schema is almost the same as the original in `schema.txt`, except that in this schema `additionalProperties` is explicitly set to false. 10. Run another producer to register the new schema. Use Ctl-C to shut down the running producer, and start a new one to register the new schema. ```bash confluent kafka topic produce transactions-json \ --cluster $CLUSTER_ID \ --schema "/schema2.txt" --schema-registry-endpoint $SCHEMA_REGISTRY_URL \ --schema-registry-api-key $SR_APIKEY \ --schema-registry-api-secret $SR_APISECRET \ --api-key $APIKEY --api-secret $APISECRET \ --value-format "jsonschema" ``` 11. Attempt to use this producer to register a new schema, and send another record as the message value, which includes a new property not explicitly declared in the schema. ```none { "id":"1001","amount":500,"customer_id":"this-will-break"} ``` This will break. You will get the following error: ```none Error: the JSON document is invalid ``` The consumer will continue running, but no new messages will be displayed. This is the same behavior you would see by default if using Avro or Protobuf in this scenario. 12. Rerun the producer in default mode as before (by using `schema.txt`) and send a follow-on message with an undeclared property. In the producer command window, stop the producer with Ctl+C. Run the original producer command. Note that there is no need to explicitly declare `additionalProperties` as `true` in the schema (although you could), as this is the default. ```bash confluent kafka topic produce transactions-json \ --cluster $CLUSTER_ID \ --schema "/schema.txt" --schema-registry-endpoint $SCHEMA_REGISTRY_URL \ --schema-registry-api-key $SR_APIKEY \ --schema-registry-api-secret $SR_APISECRET \ --api-key $APIKEY --api-secret $APISECRET \ --value-format "jsonschema" ``` 13. Use the producer to send another record as the message value, which again includes a new property not explicitly declared in the schema. ```none {"id":"1001","amount":500,"customer_id":"this-will-work-again"} ``` 14. Return to the consumer session to read the new message. The consumer should still be running and reading from topic `transactions-json`. You will see following new message in the console. ```none {"id":"1001","amount":500,"customer_id":"this-will-work-again"} ``` More specifically, if you followed all steps in order and started the consumer with the `--from-beginning` flag as mentioned earlier, the consumer shows a history of all messages sent: ```none {"id":"1000","amount":500} {"id":"1000","amount":500,"customer_id":"1221"} {"id":"1001","amount":500,"customer_id":"this-will-work-again"} ``` 15. View the schemas that were registered with Schema Registry as versions 1 and 2. 
   ```none
   confluent schema-registry schema describe --subject transactions-json-value --version 1
   ```

   Your output should be similar to the following, showing the `id` and `amount` fields added in version 1 of the schema:

   ```none
   Schema ID: 100001
   Type: JSON
   Schema: {"type":"object","properties":{"id":{"type":"string"},"amount":{"type":"number"}}}
   ```

   To view version 2:

   ```none
   confluent schema-registry schema describe --subject transactions-json-value --version 2
   ```

   Output for version 2 will include the same fields, but with the `additionalProperties` flag set to `false`:

   ```none
   Schema ID: 100002
   Type: JSON
   Schema: {"type":"object","properties":{"id":{"type":"string"},"amount":{"type":"number"}},"additionalProperties":false}
   ```

16. Use the Confluent Cloud Console to examine schemas and messages. Messages that were successfully produced also show on the Confluent Cloud Console ([https://confluent.cloud/](https://confluent.cloud/)) in **Topics > transactions-json > Messages**. You may have to select a partition or jump to a timestamp to see messages sent earlier. (For timestamp, type in a number, which will default to partition `1/Partition: 0`, and press return. To get the message view shown here, select the **cards** icon on the upper right.)

    ![image](images/serdes-json-cloud-ui-messages.png)

    Schemas you create are available on the **Schemas** tab for the selected topic.

    ![image](images/serdes-json-cloud-ui-schema.png)

17. Run shutdown and cleanup tasks.

    - You can stop the consumer and producer with Ctl-C in their respective command windows.
    - If you were using shell environment variables and want to keep them for later, remember to store them in a safe, persistent location.
    - You can remove topics, clusters, and environments from the [command line](https://docs.confluent.io/confluent-cli/current/command-reference/overview.html) or from the [Confluent Cloud Console](https://confluent.cloud/).

### Confluent Cloud

1. Create a Kafka topic:

   ```bash
   confluent kafka topic create transactions-protobuf --cluster $CLUSTER_ID
   ```

2. Copy the following schema and store it in a file called `schema.txt`:

   ```bash
   syntax = "proto3";
   message MyRecord {
     string id = 1;
     float amount = 2;
   }
   ```

3. Run the following command to create a producer with the schema created in the previous step:

   ```bash
   confluent kafka topic produce transactions-protobuf \
     --cluster $CLUSTER_ID \
     --schema "/schema.txt" \
     --schema-registry-endpoint $SCHEMA_REGISTRY_URL \
     --schema-registry-api-key $SR_APIKEY \
     --schema-registry-api-secret $SR_APISECRET \
     --api-key $APIKEY --api-secret $APISECRET \
     --value-format "protobuf"
   ```

   Your output should resemble:

   ```bash
   Successfully registered schema with ID 100001
   Starting Kafka Producer. Use Ctrl-C or Ctrl-D to exit.
   ```

4. Type the following message value into the producer, and hit return:

   ```none
   { "id":"1000", "amount":500 }
   ```

5. Open another terminal and run a consumer to read from topic `transactions-protobuf` and get the value of the message in JSON:

   ```bash
   confluent kafka topic consume transactions-protobuf \
     --cluster $CLUSTER_ID \
     --from-beginning \
     --value-format "protobuf" \
     --schema-registry-endpoint $SCHEMA_REGISTRY_URL \
     --schema-registry-api-key $SR_APIKEY \
     --schema-registry-api-secret $SR_APISECRET \
     --api-key $APIKEY --api-secret $APISECRET
   ```

   Your output should be:

   ```bash
   {"id":"1000","amount":500}
   ```

6. Register a new schema version under the same subject by adding a new field, `customer_id`. Update the schema in `schema.txt` to include the new field:
   ```bash
   syntax = "proto3";
   message MyRecord {
     string id = 1;
     float amount = 2;
     string customer_id = 3;
   }
   ```

   Open another terminal, and run the following command:

   ```bash
   confluent kafka topic produce transactions-protobuf \
     --cluster $CLUSTER_ID \
     --schema "/schema.txt" \
     --schema-registry-endpoint $SCHEMA_REGISTRY_URL \
     --schema-registry-api-key $SR_APIKEY \
     --schema-registry-api-secret $SR_APISECRET \
     --api-key $APIKEY --api-secret $APISECRET \
     --value-format "protobuf"
   ```

   Your output should resemble:

   ```bash
   Successfully registered schema with ID 100002
   Starting Kafka Producer. Use Ctrl-C or Ctrl-D to exit.
   ```

7. Type the following into your producer, and hit return:

   ```none
   { "id":"1001", "amount":700, "customer_id":"1221"}
   ```

8. Switch to the terminal with your running consumer to read from topic `transactions-protobuf` and get the new message. You should see the new output added to the original:

   ```none
   {"id":"1000","amount":500}
   {"id":"1001","amount":700,"customerId":"1221"}
   ```

   (If by chance you closed the original consumer, just restart it using the same command shown in step 5.)

9. View the schemas that were registered with Schema Registry as versions 1 and 2.

   ```none
   confluent schema-registry schema describe --subject transactions-protobuf-value --version 1
   ```

   Your output should be similar to the following, showing the `id` and `amount` fields added in version 1 of the schema:

   ```none
   Schema ID: 100001
   Type: PROTOBUF
   Schema: syntax = "proto3";
   message MyRecord {
     string id = 1;
     float amount = 2;
   }
   ```

   To view version 2:

   ```none
   confluent schema-registry schema describe --subject transactions-protobuf-value --version 2
   ```

   Output for version 2 will include the `customer_id` field:

   ```none
   Schema ID: 100002
   Type: PROTOBUF
   Schema: syntax = "proto3";
   message MyRecord {
     string id = 1;
     float amount = 2;
     string customer_id = 3;
   }
   ```

10. Run shutdown and cleanup tasks.

    - You can stop the consumer and producer with Ctl-C in their respective command windows.
    - If you were using shell environment variables and want to keep them for later, remember to store them in a safe, persistent location.
    - You can remove topics, clusters, and environments from the [command line](https://docs.confluent.io/confluent-cli/current/command-reference/overview.html) or from the [Confluent Cloud Console](https://confluent.cloud/).

### Confluent Replicator

Confluent Replicator is a type of Kafka source connector that replicates data from a source to a destination Kafka cluster. An embedded consumer inside Replicator consumes data from the source cluster, and an embedded producer inside the Kafka Connect worker produces data to the destination cluster.

Replicator version 4.0 and earlier requires a connection to ZooKeeper in the origin and destination Kafka clusters. If ZooKeeper is configured for authentication, the client configures the ZooKeeper security credentials via the global JAAS configuration setting `-Djava.security.auth.login.config` on the Connect workers, and the ZooKeeper security credentials in the origin and destination clusters must be the same.
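For Replicator 4.0 and earlier, those ZooKeeper credentials are typically supplied through a JAAS file that the Connect worker JVM loads via `-Djava.security.auth.login.config`. The following is a minimal sketch, assuming Kerberos with a keytab; the JAAS file path is a placeholder, and the keytab and principal reuse the example values from this page:

```text
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  storeKey=true
  keyTab="/etc/security/keytabs/kafka_client.keytab"
  principal="replicator@EXAMPLE.COM";
};
```

```bash
# Point the Connect worker JVM at the JAAS file (placeholder path) before starting the worker
export KAFKA_OPTS="-Djava.security.auth.login.config=/etc/kafka/connect_jaas.conf"
```

The login context is named `Client` because that is the section the ZooKeeper client reads by default, as opposed to the `KafkaClient` section used for broker connections.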
```bash { "name":"replicator", "config":{ .... "src.kafka.security.protocol" : "SASL_SSL", "src.kafka.sasl.mechanism" : "GSSAPI", "src.kafka.sasl.kerberos.service.name" : "kafka", "src.kafka.sasl.jaas.config" : "com.sun.security.auth.module.Krb5LoginModule required useKeyTab=true storeKey=true keyTab=\"/etc/security/keytabs/kafka_client.keytab\" principal=\"replicator@EXAMPLE.COM\";", .... } } } ``` #### SEE ALSO To see an example Confluent Replicator configuration, see the [SASL source authentication demo script](https://github.com/confluentinc/examples/tree/latest//replicator-security/scripts/submit_replicator_source_sasl_plain_auth.sh). For demos of common security configurations see: [Replicator security demos](https://github.com/confluentinc/examples/tree/latest//replicator-security) To configure Confluent Replicator for a destination cluster with SASL/GSSAPI authentication, modify the Replicator JSON configuration to include the following: ```bash { "name":"replicator", "config":{ .... "dest.kafka.security.protocol" : "SASL_SSL", "dest.kafka.sasl.mechanism" : "GSSAPI", "dest.kafka.sasl.kerberos.service.name" : "kafka", "dest.kafka.sasl.jaas.config" : "com.sun.security.auth.module.Krb5LoginModule required useKeyTab=true storeKey=true keyTab=\"/etc/security/keytabs/kafka_client.keytab\" principal=\"replicator@EXAMPLE.COM\";", .... } } } ``` Additionally, you can configure the following properties on the Connect worker: ```bash sasl.mechanism=GSSAPI security.protocol=SASL_SSL sasl.jaas.config=com.sun.security.auth.module.Krb5LoginModule required useKeyTab=true storeKey=true keyTab="/etc/security/keytabs/kafka_client.keytab" principal="replicator@EXAMPLE.COM"; sasl.kerberos.service.name=kafka producer.sasl.mechanism=GSSAPI producer.security.protocol=SASL_SSL producer.sasl.jaas.config=com.sun.security.auth.module.Krb5LoginModule required useKeyTab=true storeKey=true keyTab="/etc/security/keytabs/kafka_client.keytab" principal="replicator@EXAMPLE.COM"; producer.sasl.kerberos.service.name=kafka ``` For more information, see the general security configuration for Connect workers in [Kafka Connect Security Basics](../../../../connect/security.md#connect-security), and [Replicator Security Overview](../../../../multi-dc-deployments/replicator/index.md#replicator-security-overview). ### Schema Registry Schema Registry uses Kafka to persist schemas, and so it acts as a client to write data to the Kafka cluster. Therefore, if the Kafka brokers are configured for security, you should also configure Schema Registry to use security. You may also refer to the complete list of [Schema Registry configuration options](../../../../schema-registry/installation/config.md#schemaregistry-config). 1. Here is an example subset of `schema-registry.properties` configuration parameters to add for SASL authentication: ```bash kafkastore.bootstrap.servers=kafka1:9093 # Configure SASL_SSL if TLS/SSL encryption is enabled, otherwise configure SASL_PLAINTEXT kafkastore.security.protocol=SASL_SSL kafkastore.sasl.mechanism=GSSAPI ``` 2. Since you are using GSSAPI, configure a service name that matches the primary name of the Kafka server configured in the broker JAAS file. ```bash kafkastore.sasl.kerberos.service.name=kafka ``` 3. Configure the JAAS configuration property with a unique principal, i.e., usually the same name as the user running Schema Registry, and keytab, i.e., secret key. 
   ```bash
   kafkastore.sasl.jaas.config=com.sun.security.auth.module.Krb5LoginModule required \
      useKeyTab=true \
      storeKey=true \
      keyTab="/etc/security/keytabs/kafka_client.keytab" \
      principal="schemaregistry@EXAMPLE.COM";
   ```

If the validation is successful, brokers authenticate the incoming client.

1. After the client authentication is successful, the client principal is extracted from the token and used for authorization using RBAC or ACLs specified on the Confluent Platform cluster.
2. If the client principal extracted from the OAuth token has access policies specified for requested topic A, then the client request for access to topic A is successfully authorized.
3. After successful authentication and authorization, the Kafka client can proceed with Kafka operations, such as producing or consuming messages.

The same flow as described above (named the client credentials grant flow in the OpenID Connect specification and RFC 6749) is also triggered when OAuth/OIDC is enabled for authentication between clients (producers or consumers) and other Confluent Platform services, such as Schema Registry or REST Proxy, and when OAuth/OIDC is used to secure service-to-service communication between Confluent Platform services, for example, between Schema Registry and Confluent Server brokers.

The SASL/OAUTHBEARER authentication flow provides the following benefits:

* The client identities are hosted on an OIDC-compliant identity provider, enabling centralized identity management and streamlined authentication.
* The use of short-lived tokens enhances security.
* [Fine-grained](../../../../_glossary.md#term-granularity) access control, using the tokens to carry specific permissions.

#### Use the configuration properties file

If you have used the producer API, consumer API, or Streams API with Kafka clusters before, you know that the connectivity details for a cluster are specified with configuration properties. The administration tools that come with Kafka work the same way: after you define the configuration properties (often in a `config.properties` file), both applications and tools can use them to connect to clusters. When you create a configuration properties file in your user home directory, any subsequent command that you issue (be sure to include the path for the configuration file) reads that file and uses it to establish connectivity to the Kafka cluster.

The first thing you must do to interact with your Kafka clusters using the native Kafka tools is to generate a configuration properties file. The `--command-config` argument supplies these tools with the configuration properties they require to connect to the Kafka cluster, in the `.properties` file format. Typically, this includes the `security.protocol` that the cluster uses and any information necessary to authenticate to the cluster. For example:

```text
security.protocol=SASL_PLAINTEXT
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
  username="alice" \
  password="s3cr3t";
```
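To use the file, pass it to one of the native Kafka tools with `--command-config`. A minimal sketch, with the broker address and file path as placeholders:

```bash
# Uses the security settings from config.properties to connect and list topics
kafka-topics --bootstrap-server broker1:9092 \
  --command-config ~/config.properties \
  --list
```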
#### IMPORTANT

There is no guarantee that this naming pattern will continue in future releases, because it’s not part of the public API.

For best security, set only the minimum ACL operations. Allow only the following operations for the Kafka Streams principal:

- Topic resource (for internal topics): READ, DELETE, WRITE, CREATE
- Consumer Group resource: READ, DESCRIBE
- Topic resource (for input topics): READ
- Topic resource (for output topics): WRITE

For example, given the following setup of your Kafka Streams application:

- The `application.id` configuration value is `team1-streams-app1`.
- The application authenticates with the Kafka cluster as the `team1` user.
- The application’s coded topology reads from input topics `input-topic1` and `input-topic2`.
- The application’s topology writes to output topics `output-topic1` and `output-topic2`.
- The application has the exactly-once processing guarantee enabled (`processing.guarantee=exactly_once`).

The following commands create the necessary ACLs in the Kafka cluster to allow your application to operate.
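Here is a hedged sketch of such commands using the standard `kafka-acls` tool; the broker address and admin client configuration file are placeholders, and the prefixed patterns cover the internal topics and consumer groups that Kafka Streams derives from the `application.id`:

```bash
# Internal topics (prefixed with the application.id): READ, DELETE, WRITE, CREATE
kafka-acls --bootstrap-server broker1:9092 --command-config admin.properties \
  --add --allow-principal User:team1 \
  --operation Read --operation Delete --operation Write --operation Create \
  --resource-pattern-type prefixed --topic team1-streams-app1

# Consumer groups (also prefixed with the application.id): READ, DESCRIBE
kafka-acls --bootstrap-server broker1:9092 --command-config admin.properties \
  --add --allow-principal User:team1 \
  --operation Read --operation Describe \
  --resource-pattern-type prefixed --group team1-streams-app1

# Input topics: READ only
kafka-acls --bootstrap-server broker1:9092 --command-config admin.properties \
  --add --allow-principal User:team1 \
  --operation Read --topic input-topic1 --topic input-topic2

# Output topics: WRITE only
kafka-acls --bootstrap-server broker1:9092 --command-config admin.properties \
  --add --allow-principal User:team1 \
  --operation Write --topic output-topic1 --topic output-topic2

# With processing.guarantee=exactly_once, WRITE and DESCRIBE on a TransactionalId
# prefixed with the application.id are typically required as well.
```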
#### Steps

1. Log in to Confluent Cloud with the command `confluent login`, and use your Confluent Cloud username and password. To prevent being logged out, use the `--save` argument to save your Confluent Cloud user login credentials or refresh token (in the case of SSO) to your home profile.

   ```bash
   confluent login --save
   ```

2. Start the end-to-end example by running the provided script. This example uses the [ccloud-stack utility for Confluent Cloud](/cloud/current/examples/ccloud/docs/ccloud-stack.html) to automatically create a stack of fully managed services in Confluent Cloud. By default, the `ccloud-stack` utility creates resources in a new Confluent Cloud environment in cloud provider `aws` in region `us-west-2`. If you want to reuse an existing Confluent Cloud environment, or if `aws` and `us-west-2` are not the target provider and region, you may configure [other ccloud-stack options](/cloud/current/examples/ccloud/docs/ccloud-stack.html#ccloud-stack-options) before you run this example.

   ```bash
   ./start-ccloud.sh
   ```

3. After starting the example, the microservices applications will be running locally and your Confluent Cloud instance will have Kafka topics with data in them.

   ![image](images/microservices-exercises-combined.png)

4. Sample topic data by running the following command, substituting the name of your configuration file in the `stack-configs` folder (for example, `java-service-account-12345.config`).

   ```bash
   source delta_configs/env.delta; CONFIG_FILE=/opt/docker/stack-configs/java-service-account-.config ./read-topics-ccloud.sh
   ```

5. Explore the data with Elasticsearch and Kibana.

   ![image](images/elastic-search-kafka.png)

   Full-text search is added via an Elasticsearch database connected through Kafka’s Connect API ([source](https://www.confluent.io/designing-event-driven-systems)). View the Kibana dashboard at [http://localhost:5601/app/kibana#/dashboard/Microservices](http://localhost:5601/app/kibana#/dashboard/Microservices)

   ![image](images/kibana_microservices.png)

6. View and monitor the streaming applications. Use the [Confluent Cloud Console](http://confluent.cloud) to explore topics, consumers, Stream Lineage, and the ksqlDB application.

   ![image](images/stream-lineage.png)

7. View the ksqlDB flow screen for the `ORDERS_ENRICHED` stream to observe events occurring and examine the stream’s schema.

   ![image](images/ksqldb-orders-flow.png)

8. When you are done, make sure to stop the example before proceeding to the exercises. Run the command below, where the `java-service-account-.config` file matches the file in your `stack-configs` folder.

   ```bash
   ./stop-ccloud.sh stack-configs/java-service-account-sa-123456.config
   ```

## Change cluster settings for Dedicated clusters

The following table lists editable cluster settings for Dedicated clusters and their default parameter values.

| Parameter Name | Default | Editable | More Info |
|----------------|---------|----------|-----------|
| [auto.create.topics.enable](#topic-creation) | false | Yes | |
| [ssl.enabled.protocols](#manage-tls-protocols) | TLSv1.2 | Yes | Options: `TLSv1.2`, `TLSv1.3`, or both. |
| [ssl.cipher.suites](#restrict-ciphers) | “” | Yes | |
| [num.partitions](#default-partitions) | 6 | Yes | Limits vary, see: [Kafka Cluster Types in Confluent Cloud](cluster-types.md#cloud-cluster-types) |
| [log.cleaner.max.compaction.lag.ms](#lag-compaction) | 9223372036854775807 | Yes | Min: `21600000` ms |
| [log.retention.ms](#log-retention) | 604800000 | Yes | Set to -1 for Infinite Storage |

To modify these settings, use the [CLI](https://docs.confluent.io/confluent-cli/current/command-reference/kafka/cluster/configuration/index.html#confluent-kafka-cluster-configuration) or the [Kafka REST APIs](https://docs.confluent.io/cloud/current/api.html#operation/updateKafkaClusterConfig). For more information, see [Get Started with Confluent CLI](https://docs.confluent.io/confluent-cli/current/overview.html) or [Kafka REST API Quick Start for Confluent Cloud](../kafka-rest/krest-qs.md#cloud-rest-api-quickstart). Changes to the settings are applied to your Confluent Cloud cluster without additional action on your part and are persistent until the setting is explicitly changed again.
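With the Confluent CLI, an update might look like the following sketch; the flag names are assumed from the `confluent kafka cluster configuration` command group linked above, and the cluster ID and retention value are only illustrative:

```bash
# Target the cluster, then change an editable setting (cluster ID and value are placeholders)
confluent kafka cluster use $CLUSTER_ID
confluent kafka cluster configuration update --config "log.retention.ms=259200000"

# Confirm the new value
confluent kafka cluster configuration list
```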
## Related content

- Learn how to use Confluent Cloud to create topics and produce and consume messages on a Kafka cluster with the [Quick Start for Confluent Cloud](../get-started/index.md#cloud-quickstart)
- For more information about Confluent CLI, see [confluent kafka cluster](https://docs.confluent.io/confluent-cli/current/command-reference/kafka/cluster/index.html) in the Confluent CLI Command Reference
- For more information about Confluent Cloud APIs, see [Cluster API reference](https://docs.confluent.io/cloud/current/api.html#tag/Clusters-(cmkv2))
- For more information about creating a network using an API, see the [Networking API reference](https://docs.confluent.io/cloud/current/api.html#tag/Networks-(networkingv1))
- For more information about supported providers and regions, see [Cloud Providers and Regions for Confluent Cloud](regions.md#providers-regions)
- For more information about BYOK encrypted clusters, see [Protect Data at Rest Using Self-Managed Encryption Keys on Confluent Cloud](../security/encrypt/byok/overview.md#byok-encrypted-clusters)
- [How to use kafka-consumer-groups command with Confluent Cloud](https://support.confluent.io/hc/en-us/articles/360022562212) (Confluent Support)
- For cost estimates, see [Confluent Cost Estimator](https://www.confluent.io/pricing/cost-estimator/)
- For information about migrating from open source Kafka to Confluent Cloud, see the [Migrating from Kafka services to Confluent](https://assets.confluent.io/m/2745775bbd1fa224/original/20240425-EB-Migrating_From_Kafka_To_Confluent.pdf) PDF

## Quick Start

Use this quick start to get up and running with the Confluent Cloud AlloyDB sink connector. The quick start provides the basics of selecting the connector and configuring it to stream events to an AlloyDB database.

Prerequisites:

- Authorized access to a [Confluent Cloud](https://www.confluent.io/confluent-cloud/) cluster on Google Cloud.
- Authorized access to an AlloyDB database via [AlloyDB Auth Proxy](https://cloud.google.com/alloydb/docs/auth-proxy/connect) running on an intermediary VM accessible over a public IP.
- The database and Kafka cluster should be in the same region.
- For networking considerations, see [Networking and DNS](overview.md#connect-internet-access-resources). To use a set of public egress IP addresses, see [Public Egress IP Addresses for Confluent Cloud Connectors](static-egress-ip.md#cc-static-egress-ips).
- The Confluent CLI installed and configured for the cluster. See [Install the Confluent CLI](https://docs.confluent.io/confluent-cli/current/install.html).
- [Schema Registry](../get-started/schema-registry.md#cloud-sr-config) must be enabled to use a Schema Registry-based format (for example, Avro, JSON_SR (JSON Schema), or Protobuf). See [Schema Registry Enabled Environments](limits.md#connect-ccloud-environment-limits) for additional information.
- Kafka cluster credentials. The following lists the different ways you can provide credentials.
  - Enter an existing [service account](service-account.md#s3-cloud-service-account) resource ID.
  - Create a Confluent Cloud [service account](service-account.md#s3-cloud-service-account) for the connector. Make sure to review the ACL entries required in the [service account documentation](service-account.md#s3-cloud-service-account). Some connectors have specific ACL requirements.
  - Create a Confluent Cloud API key and secret. To create a key and secret, you can use [confluent api-key create](https://docs.confluent.io/confluent-cli/current/command-reference/api-key/confluent_api-key_create.html) *or* you can autogenerate the API key and secret directly in the Cloud Console when setting up the connector.

## Quick Start

Use this quick start to get up and running with the Confluent Cloud Amazon DynamoDB Sink connector. The quick start provides the basics of selecting the connector and configuring it to stream events to an Amazon DynamoDB database.

Prerequisites:

- Authorized access to a [Confluent Cloud](https://www.confluent.io/confluent-cloud/) cluster on Amazon Web Services.
- The Confluent CLI installed and configured for the cluster. See [Install the Confluent CLI](https://docs.confluent.io/confluent-cli/current/install.html).
- [Schema Registry](../get-started/schema-registry.md#cloud-sr-config) must be enabled to use a Schema Registry-based format (for example, Avro, JSON_SR (JSON Schema), or Protobuf). See [Schema Registry Enabled Environments](limits.md#connect-ccloud-environment-limits) for additional information.
- Authorized access to AWS and the Amazon DynamoDB database. For more information, see [DynamoDB IAM policy](#cc-dynamodb-policy).
- The database must be in the same region as your Confluent Cloud cluster.
- For networking considerations, see [Networking and DNS](overview.md#connect-internet-access-resources). To use a set of public egress IP addresses, see [Public Egress IP Addresses for Confluent Cloud Connectors](static-egress-ip.md#cc-static-egress-ips).
- Kafka cluster credentials. The following lists the different ways you can provide credentials.
  - Enter an existing [service account](service-account.md#s3-cloud-service-account) resource ID.
  - Create a Confluent Cloud [service account](service-account.md#s3-cloud-service-account) for the connector.
Make sure to review the ACL entries required in the [service account documentation](service-account.md#s3-cloud-service-account). Some connectors have specific ACL requirements. - Create a Confluent Cloud API key and secret. To create a key and secret, you can use [confluent api-key create](https://docs.confluent.io/confluent-cli/current/command-reference/api-key/confluent_api-key_create.html) *or* you can autogenerate the API key and secret directly in the Cloud Console when setting up the connector. # Process Data with Confluent Cloud for Apache Flink * [Overview](overview.md) * [Get Started](get-started/index.md) * [Overview](get-started/overview.md) * [Quick Start with Cloud Console](get-started/quick-start-cloud-console.md) * [Quick Start with SQL Shell in Confluent CLI](get-started/quick-start-shell.md) * [Quick Start with Java Table API](get-started/quick-start-java-table-api.md) * [Quick Start with Python Table API](get-started/quick-start-python-table-api.md) * [Concepts](concepts/index.md) * [Overview](concepts/overview.md) * [Autopilot](concepts/autopilot.md) * [Batch and Stream Processing](concepts/batch-and-stream-processing.md) * [Billing](concepts/flink-billing.md) * [Comparison with Apache Flink](concepts/comparison-with-apache-flink.md) * [Compute Pools](concepts/compute-pools.md) * [Delivery Guarantees and Latency](concepts/delivery-guarantees.md) * [Determinism](concepts/determinism.md) * [Private Networking](concepts/flink-private-networking.md) * [Schema and Statement Evolution](concepts/schema-statement-evolution.md) * [Snapshot Queries](concepts/snapshot-queries.md) * [Statements](concepts/statements.md) * [Statement CFU Metrics](concepts/statement-cfu-metrics.md) * [Tables and Topics](concepts/dynamic-tables.md) * [Time and Watermarks](concepts/timely-stream-processing.md) * [User-defined Functions](concepts/user-defined-functions.md) * [How-To Guides](how-to-guides/index.md) * [Overview](how-to-guides/overview.md) * [Aggregate a Stream in a Tumbling Window](how-to-guides/aggregate-tumbling-window.md) * [Combine Streams and Track Most Recent Records](how-to-guides/combine-and-track-most-recent-records.md) * [Compare Current and Previous Values in a Stream](how-to-guides/compare-current-and-previous-values.md) * [Convert the Serialization Format of a Topic](how-to-guides/convert-serialization-format.md) * [Create a UDF](how-to-guides/create-udf.md) * [Deduplicate Rows in a Table](how-to-guides/deduplicate-rows.md) * [Generate Custom Sample Data](how-to-guides/custom-sample-data.md) * [Handle Multiple Event Types](how-to-guides/multiple-event-types.md) * [Log Debug Messages in UDFs](how-to-guides/enable-udf-logging.md) * [Mask Fields in a Table](how-to-guides/mask-fields.md) * [Process Schemaless Events](how-to-guides/process-schemaless-events.md) * [Profile a Query](how-to-guides/profile-query.md) * [Resolve Statement Issues](how-to-guides/resolve-common-query-problems.md) * [Scan and Summarize Tables](how-to-guides/scan-and-summarize-tables.md) * [Run a Snapshot Query](how-to-guides/run-snapshot-query.md) * [Transform a Topic](how-to-guides/transform-topic.md) * [View Time Series Data](how-to-guides/view-time-series-data.md) * [Operate and Deploy](operate-and-deploy/index.md) * [Overview](operate-and-deploy/overview.md) * [Carry-over Offsets](operate-and-deploy/carry-over-offsets.md) * [Deploy a Statement with CI/CD](operate-and-deploy/deploy-flink-sql-statement.md) * [Enable Private Networking](operate-and-deploy/private-networking.md) * [Generate a Flink API 
Key](operate-and-deploy/generate-api-key-for-flink.md) * [Grant Role-Based Access](operate-and-deploy/flink-rbac.md) * [Manage Compute Pools](operate-and-deploy/create-compute-pool.md) * [Manage Connections](operate-and-deploy/manage-connections.md) * [Monitor and Manage Statements](operate-and-deploy/monitor-statements.md) * [Move SQL Statements to Production](operate-and-deploy/best-practices.md) * [Profile Queries](operate-and-deploy/query-profiler.md) * [REST API](operate-and-deploy/flink-rest-api.md) * [Flink Reference](reference/index.md) * [Overview](reference/overview.md) * [SQL Syntax](reference/sql-syntax.md) * [DDL Statements](reference/statements/index.md) * [DML Statements](reference/queries/index.md) * [Functions](reference/functions/index.md) * [Data Types](reference/datatypes.md) * [Data Type Mappings](reference/serialization.md) * [Time Zone](reference/timezone.md) * [Keywords](reference/keywords.md) * [Information Schema](reference/flink-sql-information-schema.md) * [Example Streams](reference/example-data.md) * [Supported Cloud Regions](reference/cloud-regions.md) * [SQL Examples](reference/sql-examples.md) * [Table API](reference/table-api.md) * [CLI Reference](reference/flink-sql-cli.md) * [Get Help](get-help.md) * [FAQ](flink-faq.md) ## [1/25/2023] Confluent CLI v3.0.0 Release Notes Confluent CLI is now [source-available](https://github.com/confluentinc/cli) under the Confluent Community License. For more details, check out the [Announcing the Source Available Confluent CLI](https://www.confluent.io/blog/announcing-the-source-available-confluent-cli/) blog post. **Breaking Changes** : - Default to HTTPS for on-prem login with `confluent login` - Require acknowledgment before deleting any resource, and use `--force` flag to skip acknowledgment - Use correct and consistent JSON and YAML formatting across all commands - Remove leading “v” from archive names, so the format matches binary names - Place asterisk in “Current” column in `confluent api-key list`, `confluent environment list`, and `confluent kafka list` - Delete `confluent ksql app` commands in favor of `confluent ksql cluster` commands - In `confluent iam rbac role-binding list`, combine “Service Name” and “Pool Name” into “Name” - In `confluent iam rbac role-binding list`, require the `--inclusive` flag to list role bindings in nested scopes - Move Connect cluster management commands under `confluent connect cluster` - Prevent using numeric IDs for `confluent kafka acl` commands - Print a table instead of a list in `confluent schema-registry compatibility validate` and `confluent schema-registry config describe` - Remove the “KAPI” field, “API Endpoint” field, and the corresponding `--all` flag from `confluent kafka cluster describe` - Remove shorthand flags: `-D`, `-P`, `-r`, `-S`, and `-V` - Rename “Exporter” to “exporter” in serialized output for `confluent schema-registry exporter list` - Rename “task” to “tasks” in `confluent connect cluster describe` - Rename `--current-env` to `--current-environment` - Rename `--no-auth` to `--no-authentication` - Rename `--operation` to `--operations` where appropriate - Rename `--refs` to `--references` - Rename `--show-refs` to `--show-references` - Rename `--sr-apikey` and `--sr-api-key` to `--schema-registry-api-key` - Rename `--sr-apisecret` and `--sr-api-secret` to `--schema-registry-api-secret` - Rename `--sr-endpoint` to `--schema-registry-endpoint` - Rename `confluent audit-log migrate config` to `confluent audit-log config migrate` - Rename `confluent kafka 
link describe` to `confluent kafka link configuration list` - Rename `confluent kafka partition get-reassignments` to `confluent kafka partition reassigments list` - Rename values for `confluent price list --cluster-type` - Replace all instances of “First Name” and “Last Name” with “Full Name” - Require `--environment` in `confluent schema-registry cluster delete` **New Features** : - Use Kafka REST for all `confluent kafka` commands - Remove login requirement for `confluent secret` commands - Add “Read Only” column to `confluent kafka topic describe` - Add detailed Kafka REST examples for `confluent kafka acl` commands # Get Started with Confluent Cloud for Government Confluent Cloud for Government is a data streaming service based on Apache Kafka® and delivered as a fully-managed, cloud-native service. Use Confluent Cloud for Government to collect real-time data from multiple sources and put it in motion. Confluent Cloud for Government provides multiple interfaces to manage your data streams, including a web interface, a command-line interface, and APIs. Use this quick start to create a Kafka [cluster](../_glossary.md#term-Kafka-cluster) and a [topic](../_glossary.md#term-Kafka-topic). Connect a self-managed data source to the cluster and [produce](../_glossary.md#term-producer) data to the topic. Then use the Cloud Console to review the incoming data, similar to how a self-managed client would [consume](../_glossary.md#term-consumer) the data you’re producing. This quick start lists all the interfaces available to complete a step, but you should only use the interface that best fits your needs. ## Control Center resource access by role All of the other components besides Control Center (Kafka, Schema Registry, Connect, ksqlDB) are being enforced with RBAC by those components themselves. The only resources that Control Center directly enforces with RBAC are: - [License management](../installation/license.md#controlcenter-licenses) (global operation on Confluent Platform). - [Broker metrics](../brokers.md#controlcenter-userguide-brokers) (system health per cluster). - [Alerts](#c3-rbac-alerts-access) (global view for all clusters in a Control Center instance). | Role Scope | License management | Broker metrics | Alerts | |---------------|--------------------------|------------------|--------------------------| | SystemAdmin | Yes [1](#id3) | Yes | Yes | | ClusterAdmin | No | Yes | Yes | | Operator | No | Yes | Yes | | ResourceOwner | No | No | Yes [2](#id4) | * **[1]** To access license management, the `SystemAdmin` role must be granted on the Kafka cluster running MDS. * **[2]** Resource owners on a topic or consumer group can create a trigger or an action for that resource. They can also view the fired alert in the Alerts History and the Alerts REST API pages. #### IMPORTANT Starting with Confluent Platform version 8.0, ZooKeeper is no longer part of Confluent Platform. 
| Confluent Component | Management Aspect | Declarative API (CRD) | Confluent CLI | Confluent REST API | |---------------------------------------------------------------------------|-------------------------------|------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------| | Kafka | Create, update, delete topics | kafkatopic CRD | [kafka topic](https://docs.confluent.io/confluent-cli/current/command-reference/kafka/topic/index.html), kafka-topics.sh, kafka-configs.sh | [Topic](https://docs.confluent.io/platform/current/kafka-rest/api.html#topic-v3) | | Kafka | Simple ACLs | N/A | [kafka acl](https://docs.confluent.io/confluent-cli/current/command-reference/kafka/acl/index.html) | [ACL](https://docs.confluent.io/platform/current/kafka-rest/api.html#acl-v3) | | Kafka | Delete, config update brokers | kafka CRD | [kafka broker](https://docs.confluent.io/confluent-cli/current/command-reference/kafka/broker/index.html) | [Configs](https://docs.confluent.io/platform/current/kafka-rest/api.html#configs-v3) | | Kafka | Cluster Linking | clusterlink CRD | [kafka link](https://docs.confluent.io/confluent-cli/current/command-reference/kafka/broker/index.html) | [Cluster Linking](https://docs.confluent.io/platform/current/kafka-rest/api.html#cluster-linking-v3) | | Kafka | Mirror topics | clusterlink CRD | [kafka mirror](https://docs.confluent.io/confluent-cli/current/command-reference/kafka/mirror/index.html) | [Create a cluster link](https://docs.confluent.io/cloud/current/multi-cloud/cluster-linking/cluster-links-cc.html#creating-a-cluster-link-through-the-rest-api) | | Kafka | View partitions | N/A | [kafka partition](https://docs.confluent.io/confluent-cli/current/command-reference/kafka/partition/index.html) | [Partition](https://docs.confluent.io/platform/current/kafka-rest/api.html#partition-v3) | | Kafka | Manage partition assignment | Enable Self-Balancing Cluster capability | [kafka-reassign-partitions.sh](https://cwiki.apache.org/confluence/display/kafka/replication+tools#Replicationtools-4.ReassignPartitionsTool) | [Partition](https://docs.confluent.io/platform/current/kafka-rest/api.html#partition-v3) | | Kafka | Manage consumer groups | N/A | [kafka-consumer-groups.sh](https://docs.confluent.io/platform/current/clients/consumer.html#ak-consumer-group-command-tool) | [Consumer Group](https://docs.confluent.io/platform/current/kafka-rest/api.html#consumer-group-v3) | | Schema Registry | Schemas | Schema CRD | [schema-registry schema](https://docs.confluent.io/confluent-cli/current/command-reference/schema-registry/schema/) | [Schema Registry](https://docs.confluent.io/platform/current/schema-registry/develop/api.html) | | Schema Registry | Link schemas | SchemaExporter CRD | [schema-registry exporter](https://docs.confluent.io/confluent-cli/current/command-reference/schema-registry/exporter/) | [Schema Registry](https://docs.confluent.io/platform/current/schema-registry/schema-linking-cp.html#rest-apis) | | Connect | Connectors | Connector CRD | [connect plugin](https://docs.confluent.io/confluent-cli/current/command-reference/connect/plugin/) | [Kafka Connect](https://docs.confluent.io/platform/current/connect/references/restapi.html) | | Kafka / MDS | Manage RBAC | ConfluentRoleBinding CRD | [iam 
rbac](https://docs.confluent.io/confluent-cli/current/command-reference/iam/rbac/) | [Confluent Metadata](https://docs.confluent.io/platform/current/security/rbac/mds-api.html) | | ZooKeeper, KRaft, Kafka, Schema Registry, ksqlDB, Connect, Control Center | Restart components | Component CRDs | N/A | N/A | | ZooKeeper, KRaft, Kafka, Schema Registry, ksqlDB, Connect, Control Center | Delete deployments | Component CRDs | N/A | N/A | # Confluent APIs for Confluent Platform The Confluent APIs listed in this topic are a set of software interfaces that enable you to interact with the enterprise features of Confluent Platform. You can use these APIs to build applications that can consume, produce, and process data in real-time. Following are the APIs for Confluent Platform: - [Confluent REST Proxy](../kafka-rest/index.md#kafkarest-intro) - [Connect REST API](../connect/references/restapi.md#connect-userguide-rest) - [Flink REST API](../flink/clients-api/rest.md#af-rest-api) - [Schema Registry API](../schema-registry/develop/api.md#schemaregistry-api) - [ksqlDB REST API](../ksqldb/developer-guide/ksqldb-rest-api/overview.md#ksqldb-rest-api) - [Metadata API](../security/authorization/rbac/mds-api.md#mds-api) Note that before using these APIs, you should review the [Confluent Public API Terms of Use](https://www.confluent.io/legal/confluent-public-api-terms-of-use/). # Manage Self-Balancing Clusters * [Overview](index.md) * [How Self-Balancing simplifies Kafka operations](index.md#how-sbc-simplifies-ak-operations) * [Self-Balancing vs. Auto Data Balancer](index.md#sbc-vs-adb) * [How it works](index.md#how-it-works) * [Architecture of a Self-Balancing cluster](index.md#architecture-of-a-sbc-cluster) * [Enabling Self-Balancing Clusters](index.md#enabling-sbc-long) * [What defines a “balanced” cluster and what triggers a rebalance?](index.md#what-defines-a-balanced-cluster-and-what-triggers-a-rebalance) * [What happens if the lead broker (controller) is removed or lost?](index.md#what-happens-if-the-lead-broker-controller-is-removed-or-lost) * [How do the brokers leverage Cruise Control?](index.md#how-do-the-brokers-leverage-cruise-control) * [What internal topics does the Self-Balancing Clusters feature create and use?](index.md#what-internal-topics-does-the-sbc-long-feature-create-and-use) * [Limitations](index.md#limitations) * [Configuration and monitoring](index.md#configuration-and-monitoring) * [Getting status on the balancer](index.md#getting-status-on-the-balancer) * [Using Control Center](index.md#using-c3-short) * [Kafka server properties and commands](index.md#ak-server-properties-and-commands) * [Metrics for monitoring a rebalance](index.md#metrics-for-monitoring-a-rebalance) * [Replica placement and rack configurations](index.md#replica-placement-and-rack-configurations) * [Racks](index.md#racks) * [Replica placement and multi-region clusters](index.md#replica-placement-and-multi-region-clusters) * [Capacity](index.md#capacity) * [Distribution](index.md#distribution) * [Debugging rebalance failures](index.md#debugging-rebalance-failures) * [Security considerations](index.md#security-considerations) * [Troubleshooting](index.md#troubleshooting) * [Self-Balancing options do not show up on Control Center](index.md#sbc-options-do-not-show-up-on-c3-short) * [Broker metrics are not displayed on Control Center](index.md#broker-metrics-are-not-displayed-on-c3-short) * [Consumer lag reflected on Control Center](index.md#consumer-lag-reflected-on-c3-short) * [Broker removal attempt fails during 
Self-Balancing initialization](index.md#broker-removal-attempt-fails-during-sbc-initialization) * [Broker removal cannot complete due to offline partitions](index.md#broker-removal-cannot-complete-due-to-offline-partitions) * [Too many excluded topics causes problems with Self-Balancing](index.md#too-many-excluded-topics-causes-problems-with-sbc) * [The balancer status for a KRaft controller hangs](index.md#the-balancer-status-for-a-kraft-controller-hangs) * [Related content](index.md#related-content) * [Tutorial: Adding and Remove Brokers](sbc-tutorial.md) * [Configuring and starting controllers and brokers in KRaft mode](sbc-tutorial.md#configuring-and-starting-controllers-and-brokers-in-kraft-mode) * [Prerequisites](sbc-tutorial.md#prerequisites) * [Environment variables](sbc-tutorial.md#environment-variables) * [Configure Kafka brokers](sbc-tutorial.md#configure-ak-brokers) * [Create broker-0 to use a template for the other brokers](sbc-tutorial.md#create-broker-0-to-use-a-template-for-the-other-brokers) * [Enable the Metrics Reporter for Control Center](sbc-tutorial.md#enable-the-cmetric-for-c3-short) * [Configure replication factors for Self-Balancing](sbc-tutorial.md#configure-replication-factors-for-sbc) * [Verify that Self-Balancing is enabled](sbc-tutorial.md#verify-that-sbc-is-enabled) * [Save the file](sbc-tutorial.md#save-the-file) * [Create a basic configuration for a five-broker cluster](sbc-tutorial.md#create-a-basic-configuration-for-a-five-broker-cluster) * [Start Confluent Platform, create topics, and generate test data](sbc-tutorial.md#start-cp-create-topics-and-generate-test-data) * [Start the controller and brokers](sbc-tutorial.md#start-the-controller-and-brokers) * [Create a topic and test the cluster](sbc-tutorial.md#create-a-topic-and-test-the-cluster) * [What’s next](sbc-tutorial.md#what-s-next) * [(Optional) Install and configure Confluent Control Center](sbc-tutorial.md#optional-install-and-configure-c3) * [1. Download, extract, and configure Control Center](sbc-tutorial.md#download-extract-and-configure-c3-short) * [2. Configure Control Center with REST endpoints and advertised listeners](sbc-tutorial.md#configure-c3-short-with-rest-endpoints-and-advertised-listeners) * [3. Start Prometheus and (Control Center)](sbc-tutorial.md#start-prometheus-and-c3-short) * [4. Configure the controller and brokers to send metrics to Control Center with Prometheus](sbc-tutorial.md#configure-the-controller-and-brokers-to-send-metrics-to-c3-short-with-prometheus) * [5. 
Restart the controller and brokers](sbc-tutorial.md#restart-the-controller-and-brokers) * [Use the command line to test rebalancing](sbc-tutorial.md#use-the-command-line-to-test-rebalancing) * [List topics and generate data to your test topic](sbc-tutorial.md#list-topics-and-generate-data-to-your-test-topic) * [Verify status of brokers and topic data](sbc-tutorial.md#verify-status-of-brokers-and-topic-data) * [Remove a broker](sbc-tutorial.md#remove-a-broker) * [Add a broker (restart)](sbc-tutorial.md#add-a-broker-restart) * [Use Control Center to test rebalancing](sbc-tutorial.md#use-c3-short-to-test-rebalancing) * [Verify status of brokers and topic data](sbc-tutorial.md#id1) * [Remove a broker](sbc-tutorial.md#sbc-tutorial-c3-remove-broker) * [Add a broker (restart)](sbc-tutorial.md#id3) * [Shutdown and cleanup tasks](sbc-tutorial.md#shutdown-and-cleanup-tasks) * [(Optional) Running the other components](sbc-tutorial.md#optional-running-the-other-components) * [Related content](sbc-tutorial.md#related-content) * [Configure](configuration-options.md) * [Self-Balancing configuration](configuration-options.md#sbc-configuration) * [confluent.balancer.enable](configuration-options.md#confluent-balancer-enable) * [confluent.balancer.heal.uneven.load.trigger](configuration-options.md#confluent-balancer-heal-uneven-load-trigger) * [confluent.balancer.heal.broker.failure.threshold.ms](configuration-options.md#confluent-balancer-heal-broker-failure-threshold-ms) * [confluent.balancer.throttle.bytes.per.second](configuration-options.md#confluent-balancer-throttle-bytes-per-second) * [confluent.balancer.disk.max.load](configuration-options.md#confluent-balancer-disk-max-load) * [confluent.balancer.max.replicas](configuration-options.md#confluent-balancer-max-replicas) * [confluent.balancer.exclude.topic.names](configuration-options.md#confluent-balancer-exclude-topic-names) * [confluent.balancer.exclude.topic.prefixes](configuration-options.md#confluent-balancer-exclude-topic-prefixes) * [confluent.balancer.topic.replication.factor](configuration-options.md#confluent-balancer-topic-replication-factor) * [Self-Balancing internal topics](configuration-options.md#sbc-internal-topics) * [Required Configurations for Control Center](configuration-options.md#required-configurations-for-c3-short) * [Configure REST Endpoints in the Control Center properties file](configuration-options.md#configure-rest-endpoints-in-the-c3-short-properties-file) * [Configure authentication for REST endpoints on Kafka brokers (Secure Setup)](configuration-options.md#configure-authentication-for-rest-endpoints-on-ak-brokers-secure-setup) * [Examples: Update broker configurations on the fly](configuration-options.md#examples-update-broker-configurations-on-the-fly) * [Enable or disable Self-Balancing](configuration-options.md#enable-or-disable-sbc) * [Set trigger condition for rebalance](configuration-options.md#set-trigger-condition-for-rebalance) * [Set or remove a custom throttle](configuration-options.md#set-or-remove-a-custom-throttle) * [Monitoring the balancer with kafka-rebalance-cluster](configuration-options.md#monitoring-the-balancer-with-kafka-rebalance-cluster) * [Get the balancer status](configuration-options.md#get-the-balancer-status) * [Get the workload optimization status (AnyUnevenLoad)](configuration-options.md#get-the-workload-optimization-status-anyunevenload) * [kafka-remove-brokers](configuration-options.md#kafka-remove-brokers) * [Flags](configuration-options.md#flags) * 
[Examples](configuration-options.md#examples) * [Broker removal phases](configuration-options.md#broker-removal-phases) * [Self-Balancing initialization](configuration-options.md#sbc-initialization) * [Broker removal task priority](configuration-options.md#broker-removal-task-priority) * [Related content](configuration-options.md#related-content) * [Performance and Resource Usage](performance.md) * [Add brokers to expand a small cluster with a high partition count](performance.md#add-brokers-to-expand-a-small-cluster-with-a-high-partition-count) * [Test Description](performance.md#test-description) * [Cluster Configurations](performance.md#cluster-configurations) * [Performance Results](performance.md#performance-results) * [Test scalability of a large cluster with many partitions](performance.md#test-scalability-of-a-large-cluster-with-many-partitions) * [Test Description](performance.md#id1) * [Cluster Configurations](performance.md#id2) * [Performance Results](performance.md#id3) * [Repeatedly bounce the controller](performance.md#repeatedly-bounce-the-controller) * [Test Description](performance.md#id4) * [Cluster Configurations](performance.md#id5) * [Performance Results](performance.md#id6) * [Related content](performance.md#related-content) ## License types The following table lists the Kafka and Confluent features and whether they are covered under the [Enterprise license](#cp-enterprise-subs-license) , a [Community license](https://www.confluent.io/confluent-community-license/) or an [Apache Kafka 2.0 license](https://github.com/apache/kafka/blob/trunk/LICENSE). For more information, see the [Community license FAQ](https://www.confluent.io/confluent-community-license-faq/).
| License | Features |
|---------|----------|
| Confluent Enterprise License for Confluent Platform subscription | Auto Data Balancer, Confluent for Kubernetes, Confluent Replicator, Confluent Server Cluster Linking, Multi-Region Clusters, Role-based Access Control, Schema Registry Security Plug-in, Schema Validation, Secrets Protection, Self-Balancing Clusters, Structured Audit Logs, Tiered Storage, Control Center, Health+, Kafka Connect Commercial Connectors, Premium Connectors, MQTT Proxy, Schema Linking, Confluent Platform for Apache Flink |
| Confluent Enterprise License for Customer-managed Confluent Platform for Confluent Cloud subscription | Kafka Connect Worker Commercial Connectors, Control Center, Confluent for Kubernetes, Replicator |
| Confluent Community License | Admin REST API, Confluent CLI, Kafka Connect Community-licensed Connectors, ksqlDB, REST Proxy, Schema Registry |
| Apache 2.0 License | Apache Kafka (with Connect and Streams), Ansible Playbooks, Apache ZooKeeper, Confluent Clients, Open Source Connectors |
## Stream `bootstrap.servers` : A list of host/port pairs to use for establishing the initial connection to the Apache Kafka® cluster. The client will make use of all servers irrespective of which servers are specified here for bootstrapping—this list only impacts the initial hosts used to discover the full set of servers. This list should be in the form host1:port1,host2:port2,…. Since these servers are just used for the initial connection to discover the full cluster membership (which may change dynamically), this list need not contain the full set of servers (you may want more than one, though, in case a server is down). * Type: string * Importance: high `topic.regex.list` : A comma-separated list of pairs of type `:` that is used to map MQTT topics to Kafka topics. * Type: list * Valid Values: A list of pairs in the form `:, :, ...` * Importance: high `stream.threads.num` : Number of threads publishing records to Kafka * Type: int * Default: 1 * Valid Values: [1,…] * Importance: high `producer.buffer.memory` : The total bytes of memory the producer can use to buffer records waiting to be sent to the server. If records are sent faster than they can be delivered to the server the producer will block for max.block.ms after which it will throw an exception.This setting should correspond roughly to the total memory the producer will use, but is not a hard bound since not all memory the producer uses is used for buffering. Some additional memory will be used for compression (if compression is enabled) as well as for maintaining in-flight requests. * Type: long * Default: 33554432 * Valid Values: [0,…] * Importance: high `producer.compression.type` : The compression type for all data generated by the producer. The default is none (i.e. no compression). Valid values are none, gzip, snappy, or lz4. Compression is of full batches of data, so the efficacy of batching will also impact the compression ratio (more batching means better compression). * Type: string * Default: none * Importance: high `producer.batch.size` : The producer will attempt to batch records together into fewer requests whenever multiple records are being sent to the same partition. This helps performance on both the client and the server. This configuration controls the default batch size in bytes. No attempt will be made to batch records larger than this size. Requests sent to brokers will contain multiple batches, one for each partition with data available to be sent. A small batch size will make batching less common and may reduce throughput (a batch size of zero will disable batching entirely). A very large batch size may use memory a bit more wastefully as we will always allocate a buffer of the specified batch size in anticipation of additional records. * Type: int * Default: 16384 * Valid Values: [0,…] * Importance: medium `producer.linger.ms` : The producer groups together any records that arrive in between request transmissions into a single batched request. Normally this occurs only under load when records arrive faster than they can be sent out. However in some circumstances the client may want to reduce the number of requests even under moderate load. This setting accomplishes this by adding a small amount of artificial delay—that is, rather than immediately sending out a record the producer will wait for up to the given delay to allow other records to be sent so that the sends can be batched together. This can be thought of as analogous to Nagle’s algorithm in TCP. 
This setting gives the upper bound on the delay for batching: once we get `batch.size` worth of records for a partition it will be sent immediately regardless of this setting, however if we have fewer than this many bytes accumulated for this partition we will ‘linger’ for the specified time waiting for more records to show up. This setting defaults to 0 (i.e. no delay). Setting linger.ms=5, for example, would have the effect of reducing the number of requests sent but would add up to 5ms of latency to records sent in the absence of load. * Type: long * Default: 0 * Valid Values: [0,…] * Importance: medium `producer.client.id` : An id string to pass to the server when making requests. The purpose of this is to be able to track the source of requests beyond just ip/port by allowing a logical application name to be included in server-side request logging. * Type: string * Default: “” * Importance: medium `producer.send.buffer.bytes` : The size of the TCP send buffer (SO_SNDBUF) to use when sending data. If the value is -1, the OS default will be used. * Type: int * Default: 131072 * Valid Values: [-1,…] * Importance: medium `producer.receive.buffer.bytes` : The size of the TCP receive buffer (SO_RCVBUF) to use when reading data. If the value is -1, the OS default will be used. * Type: int * Default: 32768 * Valid Values: [-1,…] * Importance: medium `producer.max.request.size` : The maximum size of a request in bytes. This setting will limit the number of record batches the producer will send in a single request to avoid sending huge requests. This is also effectively a cap on the maximum record batch size. Note that the server has its own cap on record batch size which may be different from this. * Type: int * Default: 1048576 * Valid Values: [0,…] * Importance: medium `producer.reconnect.backoff.ms` : The base amount of time to wait before attempting to reconnect to a given host. This avoids repeatedly connecting to a host in a tight loop. This backoff applies to all connection attempts by the client to a broker. * Type: long * Default: 50 * Valid Values: [0,…] * Importance: low `producer.reconnect.backoff.max.ms` : The maximum amount of time in milliseconds to wait when reconnecting to a broker that has repeatedly failed to connect. If provided, the backoff per host will increase exponentially for each consecutive connection failure, up to this maximum. After calculating the backoff increase, 20% random jitter is added to avoid connection storms. * Type: long * Default: 1000 * Valid Values: [0,…] * Importance: low `producer.max.block.ms` : The configuration controls how long KafkaProducer.send() and KafkaProducer.partitionsFor() will block. These methods can be blocked either because the buffer is full or metadata unavailable. Blocking in the user-supplied serializers or partitioner will not be counted against this timeout. * Type: long * Default: 60000 * Valid Values: [0,…] * Importance: medium `producer.request.timeout.ms` : The configuration controls the maximum amount of time the client will wait for the response of a request. If the response is not received before the timeout elapses the client will resend the request if necessary or fail the request if retries are exhausted. This should be larger than replica.lag.time.max.ms (a broker configuration) to reduce the possibility of message duplication due to unnecessary producer retries.
* Type: int * Default: 30000 * Valid Values: [0,…] * Importance: medium `producer.metadata.max.age.ms` : The period of time in milliseconds after which we force a refresh of metadata even if we haven’t seen any partition leadership changes to proactively discover any new brokers or partitions. * Type: long * Default: 300000 * Valid Values: [0,…] * Importance: low `producer.metrics.sample.window.ms` : The window of time a metrics sample is computed over. * Type: long * Default: 30000 * Valid Values: [0,…] * Importance: low `producer.metrics.num.samples` : The number of samples maintained to compute metrics. * Type: int * Default: 2 * Valid Values: [1,…] * Importance: low `producer.metrics.recording.level` : The highest recording level for metrics. * Type: string * Default: INFO * Valid Values: [INFO, DEBUG] * Importance: low `producer.metric.reporters` : A list of classes to use as metrics reporters. Implementing the org.apache.kafka.common.metrics.MetricsReporter interface allows plugging in classes that will be notified of new metric creation. The JmxReporter is always included to register JMX statistics. * Type: list * Default: “” * Valid Values: non-null list * Importance: low `producer.max.in.flight.requests.per.connection` : The maximum number of unacknowledged requests the client will send on a single connection before blocking. Note that if this setting is set to be greater than 1 and there are failed sends, there is a risk of message re-ordering due to retries (i.e., if retries are enabled). * Type: int * Default: 5 * Valid Values: [1,…] * Importance: low `producer.connections.max.idle.ms` : Close idle connections after the number of milliseconds specified by this config. * Type: long * Default: 540000 * Importance: medium `producer.partitioner.class` : Partitioner class that implements the org.apache.kafka.clients.producer.Partitioner interface. * Type: class * Default: org.apache.kafka.clients.producer.internals.DefaultPartitioner * Importance: medium `producer.interceptor.classes` : A list of classes to use as interceptors. Implementing the org.apache.kafka.clients.producer.ProducerInterceptor interface allows you to intercept (and possibly mutate) the records received by the producer before they are published to the Kafka cluster. By default, there are no interceptors. * Type: list * Default: “” * Valid Values: non-null list * Importance: low `producer.security.protocol` : Protocol used to communicate with brokers. Valid values are: PLAINTEXT, SSL, SASL_PLAINTEXT, SASL_SSL. * Type: string * Default: PLAINTEXT * Importance: medium `producer.ssl.protocol` : The TLS protocol used to generate the SSLContext. The default is `TLSv1.3` when running with Java 11 or newer, `TLSv1.2` otherwise. This value should be fine for most use cases. Allowed values in recent JVMs are `TLSv1.2` and `TLSv1.3`. `TLS`, `TLSv1.1`, `SSL`, `SSLv2` and `SSLv3` might be supported in older JVMs, but their usage is discouraged due to known security vulnerabilities. With the default value for this configuration and `ssl.enabled.protocols`, clients downgrade to `TLSv1.2` if the server does not support `TLSv1.3`.
If this configuration is set to `TLSv1.2`, clients do not use `TLSv1.3`, even if it is one of the values in `ssl.enabled.protocols` and the server only supports `TLSv1.3`. * Type: string * Default: `TLSv1.3` * Importance: medium `producer.ssl.provider` : The name of the security provider used for TLS connections. Default value is the default security provider of the JVM. * Type: string * Default: null * Importance: medium `producer.ssl.cipher.suites` : A list of cipher suites. This is a named combination of authentication, encryption, MAC, and key exchange algorithms used to negotiate the security settings for a network connection using TLS. By default, all the available cipher suites are supported. * Type: list * Default: null * Importance: low `producer.ssl.enabled.protocols` : The comma-separated list of protocols enabled for TLS connections. The default value is `TLSv1.2,TLSv1.3` when running with Java 11 or later, `TLSv1.2` otherwise. With the default value for Java 11 (`TLSv1.2,TLSv1.3`), Kafka clients and brokers prefer TLSv1.3 if both support it, and falls back to TLSv1.2 otherwise (assuming both support at least TLSv1.2). * Type: list * Default: `TLSv1.2,TLSv1.3` * Importance: medium `producer.ssl.keystore.type` : The file format of the key store file. This is optional for client. * Type: string * Default: JKS * Importance: medium `producer.ssl.keystore.location` : The location of the key store file. This is optional for client and can be used for two-way client authentication. * Type: string * Default: null * Importance: high `producer.ssl.keystore.password` : The store password for the key store file. This is optional for client and only needed if ssl.keystore.location is configured. * Type: password * Default: null * Importance: high `producer.ssl.key.password` : The password of the private key in the key store file. This is optional for client. * Type: password * Default: null * Importance: high `producer.ssl.truststore.type` : The file format of the trust store file. * Type: string * Default: JKS * Importance: medium `producer.ssl.truststore.location` : The location of the trust store file. * Type: string * Default: null * Importance: high `producer.ssl.truststore.password` : The password for the trust store file. If a password is not set access to the truststore is still available, but integrity checking is disabled. * Type: password * Default: null * Importance: high `producer.ssl.keymanager.algorithm` : The algorithm used by key manager factory for TLS connections. Default value is the key manager factory algorithm configured for the Java Virtual Machine. * Type: string * Default: SunX509 * Importance: low `producer.ssl.trustmanager.algorithm` : The algorithm used by trust manager factory for TLS connections. Default value is the trust manager factory algorithm configured for the Java Virtual Machine. * Type: string * Default: PKIX * Importance: low `producer.ssl.endpoint.identification.algorithm` : The endpoint identification algorithm to validate server hostname using server certificate. * Type: string * Default: https * Importance: low `producer.ssl.secure.random.implementation` : The SecureRandom PRNG implementation to use for TLS cryptography operations. * Type: string * Default: null * Importance: low `producer.sasl.kerberos.service.name` : The Kerberos principal name that Kafka runs as. This can be defined either in Kafka’s JAAS config or in Kafka’s config. 
* Type: string * Default: null * Importance: medium `producer.sasl.kerberos.kinit.cmd` : Kerberos kinit command path. * Type: string * Default: /usr/bin/kinit * Importance: low `producer.sasl.kerberos.ticket.renew.window.factor` : Login thread will sleep until the specified window factor of time from last refresh to ticket’s expiry has been reached, at which time it will try to renew the ticket. * Type: double * Default: 0.8 * Importance: low `producer.sasl.kerberos.ticket.renew.jitter` : Percentage of random jitter added to the renewal time. * Type: double * Default: 0.05 * Importance: low `producer.sasl.kerberos.min.time.before.relogin` : Login thread sleep time between refresh attempts. * Type: long * Default: 60000 * Importance: low `producer.sasl.login.refresh.window.factor` : Login refresh thread will sleep until the specified window factor relative to the credential’s lifetime has been reached, at which time it will try to refresh the credential. Legal values are between 0.5 (50%) and 1.0 (100%) inclusive; a default value of 0.8 (80%) is used if no value is specified. Currently applies only to OAUTHBEARER. * Type: double * Default: 0.8 * Valid Values: [0.5,…,1.0] * Importance: low `producer.sasl.login.refresh.window.jitter` : The maximum amount of random jitter relative to the credential’s lifetime that is added to the login refresh thread’s sleep time. Legal values are between 0 and 0.25 (25%) inclusive; a default value of 0.05 (5%) is used if no value is specified. Currently applies only to OAUTHBEARER. * Type: double * Default: 0.05 * Valid Values: [0.0,…,0.25] * Importance: low `producer.sasl.login.refresh.min.period.seconds` : The desired minimum time for the login refresh thread to wait before refreshing a credential, in seconds. Legal values are between 0 and 900 (15 minutes); a default value of 60 (1 minute) is used if no value is specified. This value and sasl.login.refresh.buffer.seconds are both ignored if their sum exceeds the remaining lifetime of a credential. Currently applies only to OAUTHBEARER. * Type: short * Default: 60 * Valid Values: [0,…,900] * Importance: low `producer.sasl.login.refresh.buffer.seconds` : The amount of buffer time before credential expiration to maintain when refreshing a credential, in seconds. If a refresh would otherwise occur closer to expiration than the number of buffer seconds then the refresh will be moved up to maintain as much of the buffer time as possible. Legal values are between 0 and 3600 (1 hour); a default value of 300 (5 minutes) is used if no value is specified. This value and sasl.login.refresh.min.period.seconds are both ignored if their sum exceeds the remaining lifetime of a credential. Currently applies only to OAUTHBEARER. * Type: short * Default: 300 * Valid Values: [0,…,3600] * Importance: low `producer.sasl.mechanism` : SASL mechanism used for client connections. This may be any mechanism for which a security provider is available. GSSAPI is the default mechanism. * Type: string * Default: GSSAPI * Importance: medium `producer.sasl.jaas.config` : JAAS login context parameters for SASL connections in the format used by JAAS configuration files. JAAS configuration file format is described here. The format for the value is: ‘loginModuleClass controlFlag (optionName=optionValue)\*;’. For brokers, the config must be prefixed with listener prefix and SASL mechanism name in lower-case. 
For example, listener.name.sasl_ssl.scram-sha-256.sasl.jaas.config=com.example.ScramLoginModule required; * Type: password * Default: null * Importance: medium `producer.sasl.client.callback.handler.class` : The fully qualified name of a SASL client callback handler class that implements the AuthenticateCallbackHandler interface. * Type: class * Default: null * Importance: medium `producer.sasl.login.callback.handler.class` : The fully qualified name of a SASL login callback handler class that implements the AuthenticateCallbackHandler interface. For brokers, login callback handler config must be prefixed with listener prefix and SASL mechanism name in lower-case. For example, listener.name.sasl_ssl.scram-sha-256.sasl.login.callback.handler.class=com.example.CustomScramLoginCallbackHandler * Type: class * Default: null * Importance: medium `producer.sasl.login.class` : The fully qualified name of a class that implements the Login interface. For brokers, login config must be prefixed with listener prefix and SASL mechanism name in lower-case. For example, listener.name.sasl_ssl.scram-sha-256.sasl.login.class=com.example.CustomScramLogin * Type: class * Default: null * Importance: medium

## Features

The following functionality is currently exposed and available through Confluent REST APIs. * **Metadata** - Most metadata about the cluster – brokers, topics, partitions, and configs – can be read using `GET` requests for the corresponding URLs. * **Producers** - Instead of exposing producer objects, the API accepts produce requests targeted at specific topics or partitions and routes them all through a small pool of producers. * Producer configuration - Producer instances are shared, so configs cannot be set on a per-request basis. However, you can adjust settings globally by passing new producer settings in the REST Proxy configuration. For example, you might pass in the `compression.type` option to enable site-wide compression to reduce storage and network overhead. * **Consumers** - Consumers are stateful and therefore tied to specific REST Proxy instances. Offset commit can be either automatic or explicitly requested by the user. Currently limited to one thread per consumer; use multiple consumers for higher throughput. The REST Proxy uses either the high level consumer (v1 api) or the new 0.9 consumer (v2 api) to implement consumer-groups that can read from topics. Note: the v1 API has been marked for deprecation. * Consumer configuration - Although consumer instances are not shared, they do share the underlying server resources. Therefore, limited configuration options are exposed via the API. However, you can adjust settings globally by passing consumer settings in the REST Proxy configuration. * **Data Formats** - The REST Proxy can read and write data using JSON, raw bytes encoded with base64 or using JSON-encoded Avro, Protobuf, or JSON Schema. With Avro, Protobuf, or JSON Schema, schemas are registered and validated against Schema Registry. * **REST Proxy Clusters and Load Balancing** - The REST Proxy is designed to support multiple instances running together to spread load and can safely be run behind various load balancing mechanisms (e.g. round robin DNS, discovery services, load balancers) as long as instances are [configured correctly](production-deployment/rest-proxy/index.md#kafkarest-deployment). * **Simple Consumer** - The high-level consumer should generally be preferred.
However, it is occasionally useful to use low-level read operations, for example to retrieve messages at specific offsets.

# ksqlDB for Confluent Platform Java Client

ksqlDB ships with a lightweight Java client that enables sending requests easily to a ksqlDB server from within your Java application, as an alternative to using the [REST API](../ksqldb-rest-api/rest-api-reference.md#ksqldb-rest-api-reference). The client supports pull and push queries; inserting new rows of data into existing ksqlDB streams; creation and management of new streams, tables, and persistent queries; and also admin operations such as listing streams, tables, and topics. The client sends requests to the HTTP2 server endpoints. Pull and push queries are served by the [/query-stream endpoint](../ksqldb-rest-api/streaming-endpoint.md#ksqldb-rest-api-query-stream-endpoint), and inserts are served by the [/inserts-stream endpoint](../ksqldb-rest-api/streaming-endpoint.md#ksqldb-rest-api-query-stream-endpoint-inserting-rows). All other requests are served by the [/ksql endpoint](../ksqldb-rest-api/ksql-endpoint.md#ksqldb-rest-api-ksql-endpoint). The client is compatible only with ksqlDB deployments that are on version 0.10.0 or later. Use the Java client to: - [Receive query results one row at a time (streamQuery())](#ksqldb-developer-guide-java-client-streamquery) - [Receive query results in a single batch (executeQuery())](#ksqldb-developer-guide-java-client-executequery) - [Terminate a push query (terminatePushQuery())](#ksqldb-developer-guide-java-client-terminatepushquery) - [Insert a new row into a stream (insertInto())](#ksqldb-developer-guide-java-client-insertinto) - [Insert new rows in a streaming fashion (streamInserts())](#ksqldb-developer-guide-java-client-streaminserts) - [Create and manage new streams, tables, and persistent queries (executeStatement())](#ksqldb-developer-guide-java-client-executestatement) - [List streams, tables, topics, and queries](#ksqldb-developer-guide-java-client-admin-operations) - [Describe specific streams and tables](#ksqldb-developer-guide-java-client-describe-source) - [Get metadata about the ksqlDB cluster](#ksqldb-developer-guide-java-client-server-info) - [Manage, list and describe connectors](#ksqldb-developer-guide-java-client-connector-operations) - [Define variables for substitution](#ksqldb-developer-guide-java-client-variable-substitution) - [Execute Direct HTTP Requests](#ksqldb-developer-guide-java-client-direct-http-requests) - [Assert the existence of a topic or schema](#ksqldb-developer-guide-java-client-assert-topics-schemas) Get started below or skip to the end for full [examples](#ksqldb-developer-guide-java-client-tutorials).

#### IMPORTANT

- You can revert a promoted or failed-over topic back into a mirror topic by using `truncate-and-restore`. This command is available only on [“bidirectional” links](#bidirectional-linking-cp), and only in KRaft mode. To learn more about running Kafka in KRaft mode, see [KRaft Overview for Confluent Platform](../../kafka-metadata/kraft.md#kraft-overview), [KRaft Configuration for Confluent Platform](../../kafka-metadata/config-kraft.md#configure-kraft), and the [Platform Quick Start](../../get-started/platform-quickstart.md#cp-quickstart-step-1). Also, the [basic Cluster Linking tutorial](topic-data-sharing.md#tutorial-topic-data-sharing) includes a full walkthrough of how to run Cluster Linking in KRaft mode.
- You can run `mirror describe` (`confluent kafka mirror describe --link `) on a promoted or failed-over mirror topic, if you do not delete the cluster link. If you delete the cluster link, you will lose the history and, therefore, `mirror describe` will not find data on promoted or failed-over topics. (See the command sketch after the related content below.) - There is no way to change a mirror topic to use a different cluster link or make changes to the link itself, other than to recreate the mirror topic on a different link. - You cannot delete a cluster link that still has mirror topics on it (the delete operation will fail). - If you are using Confluent for Kubernetes (CFK), and you delete your cluster link resource, any mirror topics still attached to that cluster link will be forcibly converted to regular topics by use of the `failover` API. To learn more, see [Modify a mirror topic](https://docs.confluent.io/operator/current/co-link-clusters.html#modify-a-mirror-topic) in [Cluster Linking using Confluent for Kubernetes](https://docs.confluent.io/operator/current/co-link-clusters.html).

## Related Content

Schema Linking is the recommended way to migrate schemas on Confluent Platform 7.0.0 or newer releases. [Schema Linking on Confluent Platform](../schema-linking-cp.md#schema-linking-cp-overview) These more general topics are helpful for understanding how Schema Registry and schemas are managed on multi data center deployments. - [Confluent Cloud](/cloud/current/index.html) - [Replicator Schema Translation Example for Confluent Platform](../../multi-dc-deployments/replicator/replicator-schema-translation.md#quickstart-demos-replicator-schema-translation) - [Quick Start for Schema Management on Confluent Cloud](/cloud/current/get-started/schema-registry.html) - [Schema Registry Configuration Reference for Confluent Platform](config.md#schemaregistry-config) - [Overview of Multi-Datacenter Deployment Solutions on Confluent Platform](../../multi-dc-deployments/index.md#multi-dc) - [Schema Registry API Reference](../develop/api.md#schemaregistry-api) To learn more, see these sections in [Replicator Configuration Reference for Confluent Platform](../../multi-dc-deployments/replicator/configuration_options.md#replicator-config-options): - [Source Topics](../../multi-dc-deployments/replicator/configuration_options.md#rep-source-topics) - [Destination Topics](../../multi-dc-deployments/replicator/configuration_options.md#rep-destination-topics) - [Schema Translation](../../multi-dc-deployments/replicator/configuration_options.md#schema-translation) Finally, [Replicator Schema Translation Example for Confluent Platform](../../multi-dc-deployments/replicator/replicator-schema-translation.md#quickstart-demos-replicator-schema-translation) shows a demo of migrating schemas across self-managed, on-premises clusters, using the legacy Replicator methods.
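To make the mirror topic notes above concrete, the following is a minimal command sketch using the Confluent CLI. The cluster link name `my-link` and mirror topic name `orders` are placeholders, and the exact connection flags depend on how your CLI context is configured for the Confluent Platform cluster.

```bash
# List the mirror topics attached to a cluster link (placeholder link name).
confluent kafka mirror list --link my-link

# Inspect a single mirror topic; this continues to return history after a
# promote or failover, as long as the cluster link itself is not deleted.
confluent kafka mirror describe orders --link my-link
```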
# Authorization in Confluent Platform

* [Overview](overview.md) * [Access Control Lists](acls/index.md) * [Overview](acls/overview.md) * [Manage ACLs](acls/manage-acls.md) * [Role-Based Access Control](rbac/index.md) * [Overview](rbac/overview.md) * [Quick Start](rbac/rbac-cli-quickstart.md) * [Predefined RBAC Roles](rbac/rbac-predefined-roles.md) * [Cluster Identifiers](rbac/rbac-get-cluster-ids.md) * [Example of Enabling RBAC](rbac/cp-rbac-example.md) * [Enable RBAC on Running Cluster](rbac/enable-rbac-running-cluster.md) * [Use mTLS with RBAC](rbac/mtls-rbac.md) * [Configure mTLS with RBAC](rbac/configure-mtls-rbac.md) * [Deployment Patterns for mTLS with RBAC](rbac/mtls-rbac-options.md) * [Client Flow for OAuth-OIDC using RBAC](rbac/client-flow-oauth-oidc-and-rbac.md) * [Migrate LDAP to OAuth for RBAC](rbac/migrate-ldap-to-oauth-for-rbac.md) * [Migrate LDAP to mTLS for RBAC](rbac/migrate-ldap-to-mtls.md) * [RBAC using REST API](rbac/rbac-config-using-rest-api.md) * [Use Centralized ACLs with MDS for Authorization](rbac/authorization-acl-with-mds.md) * [Request Forwarding with mTLS RBAC](rbac/request-forwarding-mtls-rbac.md) * [Deploy Secure ksqlDB with RBAC](rbac/ksql-rbac.md) * [Metadata API](rbac/mds-api.md) * [LDAP Group-Based Authorization](ldap/index.md) * [Configure LDAP Group-Based Authorization](ldap/configure.md) * [LDAP Configuration Reference](ldap/ldap-config-ref.md) * [Tutorial: Group-Based Authorization Using LDAP](ldap/quickstart.md) * [Configure Confluent Server Authorizer in Confluent Platform](../csa-introduction.md)

## Kafka 101

Kafka Streams is, by deliberate design, tightly integrated with Apache Kafka®: many capabilities of Kafka Streams such as its [stateful processing features](architecture.md#streams-architecture-state), its [fault tolerance](architecture.md#streams-architecture-fault-tolerance), and its [processing guarantees](#streams-concepts-processing-guarantees) are built on top of functionality provided by Apache Kafka®’s storage and messaging layer. It is therefore important to familiarize yourself with the key concepts of Kafka, notably the sections [Getting Started](/kafka/get-started.html) and [Design](/kafka/design/index.html). In particular you should understand: * **The who’s who:** Kafka distinguishes **producers**, **consumers**, and **brokers**. In short, producers publish data to Kafka brokers, and consumers read published data from Kafka brokers. Producers and consumers are totally decoupled, and both run outside the Kafka brokers in the perimeter of a Kafka cluster. A Kafka **cluster** consists of one or more brokers. An application that uses the Kafka Streams API acts as both a producer and a consumer. * **The data:** Data is stored in **topics**. The topic is the most important abstraction provided by Kafka: it is a category or feed name to which data is published by producers. Every topic in Kafka is split into one or more **partitions**. Kafka partitions data for storing, transporting, and replicating it. Kafka Streams partitions data for processing it. In both cases, this partitioning enables elasticity, scalability, high performance, and fault tolerance. * **Parallelism:** Partitions of Kafka topics, and especially their number for a given topic, are also the main factor that determines the parallelism of Kafka with regards to reading and writing data. Because of the tight integration with Kafka, the parallelism of an application that uses the Kafka Streams API depends primarily on Kafka’s parallelism.
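As a small illustration of the parallelism point above, the partition count you choose when creating an input topic effectively caps how many stream tasks can process that topic in parallel. The following sketch uses the standard `kafka-topics` tool with placeholder values (a local broker on `localhost:9092` and a topic named `orders`); adjust both for your environment.

```bash
# Create an input topic with 4 partitions; a Kafka Streams application reading
# this topic can run at most 4 stream tasks for it in parallel.
kafka-topics --bootstrap-server localhost:9092 \
  --create --topic orders \
  --partitions 4 \
  --replication-factor 1

# Confirm the partition count.
kafka-topics --bootstrap-server localhost:9092 --describe --topic orders
```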
## A Closer Look

Before you dive into the [Concepts](concepts.md#streams-concepts) and [Architecture](architecture.md#streams-architecture), or get your feet wet by walking through [your first Kafka Streams application](https://developer.confluent.io/tutorials/creating-first-apache-kafka-streams-application/confluent.html), let’s take a closer look. A key motivation of the Kafka Streams API is to bring stream processing out of the Big Data niche into the world of mainstream application development, and to radically improve the developer and operations experience by [making stream processing simple and easy](http://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple). Using the Kafka Streams API you can implement standard Java applications to solve your stream processing needs – whether at small or at large scale – and then run these applications on client machines at the perimeter of your Kafka cluster. Your applications are fully elastic: you can run one or more instances of your application, and they will automatically discover each other and collaboratively process the data. Your applications are also fault-tolerant: if one of the instances dies, then the remaining instances will automatically take over its work – without any data loss! Deployment-wise, you are free to choose from any technology that can deploy Java applications, including but not limited to Puppet, Chef, Ansible, Docker, Mesos, YARN, Kubernetes, and so on. This lightweight and integrative approach of the Kafka Streams API – “Build applications, not infrastructure!” – is in stark contrast to other stream processing tools that require you to install and operate separate processing clusters and similar heavy-weight infrastructure that come with their own special set of rules on how to use and interact with them. The following list highlights [several key capabilities and aspects](architecture.md#streams-architecture) of the Kafka Streams API that make it a compelling choice for use cases such as microservices, event-driven systems, reactive applications, and continuous queries and transformations. **Powerful** : * Makes your applications highly scalable, elastic, distributed, fault-tolerant * Supports exactly-once processing semantics * Stateful and stateless processing * Event-time processing with windowing, joins, aggregations * Supports [Kafka Streams Interactive Queries for Confluent Platform](developer-guide/interactive-queries.md#streams-developer-guide-interactive-queries) to unify the worlds of streams and databases * Choose between a [declarative, functional API](developer-guide/dsl-api.md#streams-developer-guide-dsl) and a lower-level [imperative API](developer-guide/processor-api.md#streams-developer-guide-processor-api) for maximum control and flexibility **Lightweight** : * Low barrier to entry * Equally viable for small, medium, large, and very large use cases * Smooth path from local development to large-scale production * No processing cluster required * No external dependencies other than Kafka **Fully integrated** : * 100% compatible with Kafka 0.11.0 and 1.0.0 * Easy to integrate into existing applications and microservices * No artificial rules for packaging, deploying, and monitoring your applications * Runs everywhere: on-premises, public clouds, private clouds, containers, etc.
* Integrates with databases through continuous change data capture (CDC) performed by [Kafka Connect](../connect/index.md#kafka-connect) **Real-time** : * Millisecond processing latency * Record-at-a-time processing (no micro-batching) * Seamlessly handles out-of-order data * High throughput **Secure** : * Supports [encryption of data-in-transit](developer-guide/security.md#streams-developer-guide-security) * Supports [authentication and authorization](developer-guide/security.md#streams-developer-guide-security) In summary, the Kafka Streams API is a compelling choice for building mission-critical stream processing applications and microservices. Give it a try with this step-by-step tutorial to build your first [Kafka Streams application](https://developer.confluent.io/tutorials/creating-first-apache-kafka-streams-application/confluent.html)! The next sections, [Concepts](concepts.md#streams-concepts), [Architecture](architecture.md#streams-architecture), and the [Developer Guide](developer-guide/overview.md#streams-developer-guide) will help to get you started.

### Configure passwordless OAuth authentication

Starting with version 8.0, Confluent Ansible supports client assertion for Confluent Platform, which provides secure credential management with passwordless authentication. It uses asymmetric encryption-based authentication, extending Confluent Platform OAuth, and allows you to: * Avoid deploying usernames and passwords while securing Confluent Platform. * Streamline and automate periodic client credential rotation for client applications without manual intervention. In Confluent Ansible 8.0, OAuth client assertion is not supported for Confluent Control Center. To configure client assertion on Confluent Platform components: 1. Enable client assertion for Confluent Platform components using the following variables: Kafka broker (`kafka_broker_`) and KRaft controller (`kafka_controller_`) inherit the superuser properties (`oauth_superuser_`) if not set. ```yaml
oauth_superuser_oauth_client_assertion_enabled: true
kafka_broker_oauth_client_assertion_enabled: true
kafka_controller_oauth_client_assertion_enabled: true
schema_registry_oauth_client_assertion_enabled: true
kafka_connect_oauth_client_assertion_enabled: true
ksql_oauth_client_assertion_enable: true
kafka_rest_oauth_client_assertion_enable: true
kafka_connect_replicator_oauth_client_assertion_enable: true
kafka_connect_replicator_producer_oauth_client_assertion_enable: true
kafka_connect_replicator_erp_oauth_client_assertion_enable: true
kafka_connect_replicator_consumer_erp_oauth_client_assertion_enable: true
``` 2. Set other dependent variables listed below. Refer to the previous step for ``. ```yaml
_oauth_user:                                # client ID, currently in use
_oauth_client_assertion_issuer:
_oauth_client_assertion_sub:
_oauth_client_assertion_audience:
_oauth_client_assertion_private_key_file:
_oauth_client_assertion_template_file:      # optional
_client_assertion_private_key_passphrase:   # optional
_oauth_client_assertion_jti_include:        # optional
_oauth_client_assertion_nbf_include:        # optional
``` Example configurations: ```yaml
ksql_oauth_client_assertion_enabled: true
ksql_oauth_client_assertion_issuer: ksql
ksql_oauth_client_assertion_audience: https://oauth1:8443/realms/cp-ansible-realm
ksql_oauth_client_assertion_private_key_file: "my-tokenKeypair.pem"
``` Currently, there is no first-class support for the properties listed below, which are optional fields in OAuth and also in client assertion.
You can set them using custom properties, `_custom_properties`. ```yaml
kafka_broker_custom_properties:
  *.login.connect.timeout.ms
  *.login.read.timeout.ms
  *.login.retry.backoff.max.ms
  *.login.retry.backoff.ms
```

## Cluster upgrades and error handling

Confluent Cloud regularly updates clusters to perform [upgrades and maintenance](../release-notes/upgrade-policy.md#minor-ccloud-upgrade). During this process, Confluent performs [rolling restarts](../_glossary.md#term-rolling-restart) of all the brokers in a cluster. The Kafka protocol and architecture are designed for this type of highly-available, fault-tolerant operation. To ensure seamless client handling of cluster updates, you must configure your clients using [current client libraries](overview.md#client-support-matrix). Confluent recommends you use the strategies for error handling outlined below. During normal cluster operations that use a rolling restart, clients may encounter the following warning exceptions: ```none UNKNOWN_TOPIC_OR_PARTITION: "This server does not host this topic-partition." ``` ```none LEADER_NOT_AVAILABLE: "There is no leader for this topic-partition as we are in the middle of a leadership election." ``` ```none NOT_COORDINATOR: "This is not the correct coordinator." ``` ```none NOT_ENOUGH_REPLICAS: "Messages are rejected since there are fewer in-sync replicas than required." ``` ```none NOT_ENOUGH_REPLICAS_AFTER_APPEND: "Messages are written to the log, but to fewer in-sync replicas than required." ``` ```none NOT_LEADER_OR_FOLLOWER: "This server is not the leader for the given partition." ``` The following message is what a client would log at `WARN` level should it attempt to connect to a broker that has just been restarted (in the context of maintenance): ```none "Connection to node {} ({}) terminated during authentication. This may happen due to any of the following reasons: (1) Authentication failed due to invalid credentials with brokers older than 1.0.0, (2) Firewall blocking Kafka TLS traffic (eg it may only allow HTTPS traffic), (3) Transient network issue." ``` Configure clients with a sufficient number of retries or retry time to prevent these warning exceptions from getting logged as errors. * By default, Kafka producer clients retry for two minutes, print these warnings to logs, and recover without any intervention. * By default, Kafka consumer and admin clients retry for one minute. Timeout exceptions will occur if clients run out of memory buffer space while retrying or if clients run out of time while waiting for memory. In general, planning for volatility is a basic tenet of building cloud-native client applications. In addition to normal cluster operations, brokers may disappear for a variety of reasons, such as issues with the underlying infrastructure at the cloud-provider layer. For more information, see [Cloud-native applications](architecture.md#ccloud-architecture-cloud-native-apps).

## Flags

```none
--hosts strings                    REQUIRED: A comma-separated list of hosts.
--protocol string                  REQUIRED: Security protocol.
--cluster-name string              REQUIRED: Cluster name.
--kafka-cluster string             Kafka cluster ID.
--schema-registry-cluster string   Schema Registry cluster ID.
--ksql-cluster string              ksqlDB cluster ID.
--connect-cluster string           Kafka Connect cluster ID.
--cmf string                       Confluent Managed Flink (CMF) ID.
--flink-environment string         Flink environment ID.
--client-cert-path string          Path to client cert to be verified by MDS. Include for mTLS authentication.
--client-key-path string           Path to client private key, include for mTLS authentication.
--context string                   CLI context name.
```

### Kafka Broker

Create a new file and put the `KafkaServer` configuration into it. The `KafkaServer` section is for the authentication on brokers. For this example, create it at `/tmp/kafka_server_jaas.conf`. ```bash
KafkaServer {
   org.apache.kafka.common.security.plain.PlainLoginModule required
   username="admin"
   password="admin-secret"
   user_admin="admin-secret"
   user_confluent="confluent-secret"
   user_metricsreporter="metricsreporter-secret";
};

KafkaClient {
   org.apache.kafka.common.security.plain.PlainLoginModule required
   username="metricsreporter"
   password="metricsreporter-secret";
};
``` This configures several users on the server: - an `admin` user for internal interbroker traffic - a `confluent` user, for Confluent Control Center, Kafka Connect, and Schema Registry - a `metricsreporter` user for Metrics Reporter to publish Apache Kafka® metrics In this example, Metrics Reporter publishes metrics to the same cluster it is configured on, so we also need to include the corresponding `KafkaClient` client configuration in the same file. It is possible to pass the JAAS configuration file location as a JVM parameter to each client JVM: ```bash -Djava.security.auth.login.config=/tmp/kafka_server_jaas.conf ``` Next, secure the Kafka broker, the monitoring interceptor and the metrics reporter. There are [more options for security](/platform/current/security/overview.html#security), but this broker will be secured using `SASL_PLAINTEXT`.

### Schema Registry Configuration

If you followed the quick start, Connect relies on Schema Registry, so we first need to update Schema Registry to use SASL authentication. Edit the Schema Registry configuration (`CONFLUENT_HOME/etc/schema-registry/schema-registry.properties`) and add the following settings. ```bash
kafkastore.security.protocol=SASL_PLAINTEXT
kafkastore.sasl.mechanism=PLAIN
``` Start Schema Registry with the additional `SCHEMA_REGISTRY_OPTS` parameter pointing to the JAAS file [created earlier](#controlcenter-security-kafkaclient). ```bash
SCHEMA_REGISTRY_OPTS=-Djava.security.auth.login.config=/tmp/kafka_client_jaas.conf \
  confluent local services schema-registry start
```

### Transactional producer and exactly-once semantics

The JavaScript Client library supports idempotent producers, transactional producers, and exactly-once semantics (EOS). To use an idempotent producer: ```js
const producer = new Kafka().producer({
  'bootstrap.servers': '',
  'enable.idempotence': true,
});
``` More details about the guarantees provided by an idempotent producer can be found [here](https://github.com/confluentinc/librdkafka/blob/master/INTRODUCTION.md#idempotent-producer), as well as the limitations and other configuration changes that an idempotent producer brings. To use a transactional producer: ```js
const producer = new Kafka().producer({
  'bootstrap.servers': '',
  'transactional.id': 'my-transactional-id', // Must be unique for each producer instance.
});

await producer.connect();

// Start transaction.
const transaction = await producer.transaction();
await transaction.send({ topic: 'topic', messages: [{ value: 'message' }] });
// Commit transaction.
await transaction.commit();
``` Specifying a `transactional.id` makes the producer transactional. The `transactional.id` must be unique for each producer instance. The producer must be connected before starting a transaction with `transaction()`.
A transactional producer cannot be used as a non-transactional producer, and every message must be within a transaction. More details about the guarantees provided by a transactional producer can be found [here](https://github.com/confluentinc/librdkafka/blob/master/INTRODUCTION.md#transactional-producer). Using a transactional producer also allows for exactly-once semantics (EOS) in the specific case of consuming from a Kafka cluster, processing the message, and producing to another topic on the same cluster. ```js
consumer.run({
  eachMessage: async ({ topic, partition, message }) => {
    let transaction;
    try {
      transaction = await producer.transaction();
      await transaction.send({
        topic: 'produceTopic',
        messages: [
          { value: 'consumed a message: ' + message.value.toString() },
        ]
      });
      await transaction.sendOffsets({
        consumer,
        topics: [
          {
            topic,
            partitions: [
              { partition, offset: String(Number(message.offset) + 1) },
            ],
          }
        ],
      });
      // The transaction assures that the message sent and the offset committed
      // are transactional, only reflecting on the broker on commit.
      await transaction.commit();
    } catch (e) {
      console.error(e);
      if (transaction) {
        await transaction.abort();
      }
    }
  },
});
```

## Quick Start

This quick start uses the ActiveMQ Sink Connector to consume records from Kafka and send them to an ActiveMQ broker. 1. [Install ActiveMQ](https://activemq.apache.org/getting-started#installation-procedure-for-unix) 2. [Start ActiveMQ](https://activemq.apache.org/getting-started#starting-activemq) 3. Install the connector through the [Confluent Hub Client](https://docs.confluent.io/current/connect/managing/confluent-hub/client.html). ```bash
# run from your Confluent Platform installation directory
confluent connect plugin install confluentinc/kafka-connect-activemq-sink:latest
``` 4. Start Confluent Platform. ```bash confluent local start ``` 5. [Produce](https://docs.confluent.io/current/cli/command-reference/confluent-produce.html) test data to the `sink-messages` topic in Kafka. ```bash seq 10 | confluent local produce sink-messages ``` 6. Create an `activemq-sink.json` file with the following contents: ```json { "name": "AMQSinkConnector", "config": { "connector.class": "io.confluent.connect.jms.ActiveMqSinkConnector", "tasks.max": "1", "topics": "sink-messages", "activemq.url": "tcp://localhost:61616", "activemq.username": "connectuser", "activemq.password": "connectuser", "jms.destination.type": "queue", "jms.destination.name": "connector-quickstart", "key.converter": "org.apache.kafka.connect.storage.StringConverter", "value.converter": "org.apache.kafka.connect.storage.StringConverter", "confluent.topic.bootstrap.servers": "localhost:9092", "confluent.topic.replication.factor": "1" } } ``` 7. Load the ActiveMQ Sink Connector. ```bash confluent local load jms --config activemq-sink.json ```

#### IMPORTANT

Don’t use the [Confluent CLI](https://docs.confluent.io/confluent-cli/current/index.html) in production environments. 8. Confirm that the connector is in a `RUNNING` state (a REST API alternative is sketched after this quick start). ```bash confluent local status AMQSinkConnector ``` 9. Navigate to the [ActiveMQ Admin UI](http://localhost:8161/admin) or use the following ActiveMQ CLI command to confirm the messages were delivered to the `connector-quickstart` queue. ```bash ./bin/activemq consumer --destination connector-quickstart --messageCount 10 ``` For an example of how to get Kafka Connect connected to [Confluent Cloud](/cloud/current/index.html), see [Connect Self-Managed Kafka Connect to Confluent Cloud](/cloud/current/cp-component/connect-cloud-config.html#distributed-cluster).
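As an alternative to the CLI status check in the quick start above, the Kafka Connect REST API exposes the same information. This is a minimal sketch assuming a Connect worker listening on the default `localhost:8083` and the connector name `AMQSinkConnector` from the example configuration.

```bash
# Query connector and task state directly from the Connect worker's REST API.
curl -s http://localhost:8083/connectors/AMQSinkConnector/status

# List all connectors registered on this worker.
curl -s http://localhost:8083/connectors
```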
### Start the BigQuery Sink connector

To start the BigQuery Sink Connector, complete the following steps: 1. Create the file `register-kcbd-connect-bigquery.json` to store the connector configuration. **Connect Distributed REST quick start connector properties:** ```json { "name": "kcbq-connect1", "config": { "connector.class": "com.wepay.kafka.connect.bigquery.BigQuerySinkConnector", "tasks.max" : "1", "topics" : "kcbq-quickstart1", "sanitizeTopics" : "true", "autoCreateTables" : "true", "allowNewBigQueryFields" : "true", "allowBigQueryRequiredFieldRelaxation" : "true", "schemaRetriever" : "com.wepay.kafka.connect.bigquery.retrieve.IdentitySchemaRetriever", "project" : "confluent-243016", "defaultDataset" : "ConfluentDataSet", "keyfile" : " /Users/titomccutcheon/dev/confluent_fork/kafka-connect-bigquery/kcbq-connector/quickstart/properties/confluent-243016-384a24e2de1a.json", "transforms" : "RegexTransformation", "transforms.RegexTransformation.type" : "org.apache.kafka.connect.transforms.RegexRouter", "transforms.RegexTransformation.regex" : "(kcbq_)(.*)", "transforms.RegexTransformation.replacement" : "$2" } } ``` Note that the `project` key is the `id` value of the BigQuery project in Google Cloud. For `datasets`, the value `ConfluentDataSet` is the ID of the dataset entered by the user during Google Cloud dataset creation. `keyfile` is the service account key JSON file location. If you don’t want this connector to create a BigQuery table automatically, create a BigQuery table with `Partitioning: Partition by ingestion time` and a proper schema. Also, note that the properties prefixed with `transforms` are used to set up SMTs. The following is an example regex router SMT that strips `kcbq_` from the topic name. Replace it with a relevant regex to replace the topic of each sink record with the destination dataset and table name in the format `:` or only the destination table name in the format ``. 2. Start the connector. ```text curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" http://localhost:8083/connectors/ -d @register-kcbd-connect-bigquery.json ```

### Install and load the connector

1. Install the connector through the [Confluent Hub Client](/kafka-connectors/self-managed/confluent-hub/client.html). ```bash
# run from your CP installation directory
confluent connect plugin install confluentinc/kafka-connect-gcp-bigtable:latest
``` Note that by default, it will install the plugin into `share/confluent-hub-components` and add the directory to the plugin path. 2. Adding a new connector plugin requires restarting Connect. Use the Confluent CLI to restart Kafka Connect. ```bash confluent local services connect stop && confluent local services connect start ``` 3. Configure your connector by adding the file `etc/kafka-connect-gcp-bigtable/sink-quickstart-bigtable.properties`, with the following properties: ```none
name=BigTableSinkConnector
topics=stats
tasks.max=1
connector.class=io.confluent.connect.gcp.bigtable.BigtableSinkConnector
gcp.bigtable.credentials.path=$home/bigtable-test-credentials.json
gcp.bigtable.project.id=YOUR-PROJECT-ID
gcp.bigtable.instance.id=test-instance
auto.create.tables=true
auto.create.column.families=true
table.name.format=example_table
# The following define the Confluent license stored in Kafka, so we need the Kafka bootstrap addresses.
# `replication.factor` may not be larger than the number of Kafka brokers in the destination cluster,
# so here we set this to '1' for demonstration purposes.
# Always use at least '3' in production configurations.
confluent.license=
confluent.topic.bootstrap.servers=localhost:9092
confluent.topic.replication.factor=1
``` Ensure you replace `YOUR-PROJECT-ID` with the project ID you created in the prerequisite portion of this quick start. You should also replace `$home` with your home directory path, or any other path where the credentials file was saved. 4. Start the BigTable Sink connector by loading the connector’s configuration using the following command: ```bash confluent local load bigtable --config etc/kafka-connect-gcp-bigtable/sink-quickstart-bigtable.properties ``` Your output should resemble the following: ```json { "name": "bigtable", "config": { "topics": "stats", "tasks.max": "1", "connector.class": "io.confluent.connect.gcp.bigtable.BigtableSinkConnector", "gcp.bigtable.credentials.path": "$home/bigtable-test-credentials.json", "gcp.bigtable.instance.id": "test-instance", "gcp.bigtable.project.id": "YOUR-PROJECT-ID", "auto.create.tables": "true", "auto.create.column.families": "true", "table.name.format": "example_table", "confluent.license": "", "confluent.topic.bootstrap.servers": "localhost:9092", "confluent.topic.replication.factor": "1", "name": "bigtable" }, "tasks": [ { "connector": "bigtable", "task": 0 } ], "type": "sink" } ``` 5. Check the status of the connector to confirm it’s in a `RUNNING` state. ```bash confluent local status bigtable ``` Your output should resemble the following: ```bash { "name": "bigtable", "connector": { "state": "RUNNING", "worker_id": "10.200.7.192:8083" }, "tasks": [ { "id": 0, "state": "RUNNING", "worker_id": "10.200.7.192:8083" } ], "type": "sink" } ```

## Quick Start

In the following example, the GCS Source connector reads all data listed under a specific GCS bucket and then loads them into a Kafka topic. It does not matter what file naming convention you use for writing data to the GCS bucket. 1. Upload the following data under a folder named `quickstart` within the targeted GCS bucket. In this example, JSON format is used, which supports line-delimited JSON, concatenated JSON, and a JSON array of records. ```json {"f1": "value1"} {"f1": "value2"} {"f1": "value3"} {"f1": "value4"} {"f1": "value5"} {"f1": "value6"} {"f1": "value7"} {"f1": "value8"} {"f1": "value9"} ``` 2. Install the connector through the [Confluent Hub Client](https://docs.confluent.io/current/connect/managing/confluent-hub/client.html) by running the following command from your Confluent Platform installation directory: ```bash confluent connect plugin install confluentinc/kafka-connect-gcs-source:latest ``` 3. Create a `quickstart-gcssource-generalized.properties` file with the following contents: ```json { "name": "quickstart-gcs-source", "config": { "connector.class": "io.confluent.connect.gcs.GcsSourceConnector", "tasks.max": "1", "value.converter": "org.apache.kafka.connect.json.JsonConverter", "mode": "GENERIC", "topics.dir": "quickstart", "topic.regex.list": "quick-start-topic:.*", "format.class": "io.confluent.connect.gcs.format.json.JsonFormat", "gcs.bucket.name": "", "gcs.credentials.path": "
", "value.converter.schemas.enable": "false" } } ``` 4. Load the Generalized GCS Source connector by running the following command: ```bash confluent local services connect connector load quickstart-gcs-source --config quickstart-gcssource-generalized.properties ``` 5. Verify the connector is in a `RUNNING` state: ```bash confluent local services connect connector status quickstart-gcs-source ``` 6. Verify the messages are being sent to Kafka: ```bash kafka-console-consumer \ --bootstrap-server localhost:9092 \ --topic quick-start-topic \ --from-beginning ``` 7. You should see output similar to the following: ```json {"f1": "value1"} {"f1": "value2"} {"f1": "value3"} {"f1": "value4"} {"f1": "value5"} {"f1": "value6"} {"f1": "value7"} {"f1": "value8"} {"f1": "value9"} ``` ## ActveMQ Quick Start For an example of how to get Kafka Connect connected to [Confluent Cloud](/cloud/current/index.html), see [Connect Self-Managed Kafka Connect to Confluent Cloud](/cloud/current/cp-component/connect-cloud-config.html#distributed-cluster). This quick start uses the JMS Sink connector to consume records from Kafka and send them to an ActiveMQ broker. Prerequisites : - [Confluent Platform](/platform/2.1/installation/index.html) - [Confluent CLI](/confluent-cli/current/installing.html) (requires separate installation) 1. [Install ActiveMQ](https://activemq.apache.org/getting-started#installation-procedure-for-unix) 2. [Start ActiveMQ](https://activemq.apache.org/getting-started#starting-activemq) 3. Install the connector by using the following [CLI command](/confluent-cli/current/command-reference/connect/plugin/confluent_connect_plugin_install.html): ```bash # run from your Confluent Platform installation directory confluent connect plugin install confluentinc/kafka-connect-jms-sink:latest ``` 4. [Download the activemq-all JAR](https://repo1.maven.org/maven2/org/apache/activemq/activemq-all/5.15.4/activemq-all-5.15.4.jar) and copy it into the JMS Sink connector’s plugin folder. This needs to be done on every Connect worker node and you must restart the workers pick up the client JAR. 5. Start Confluent Platform using the [confluent local](/confluent-cli/current/command-reference/local/index.html) command. ```bash confluent local start ``` 6. [Produce](https://docs.confluent.io/current/cli/command-reference/confluent-produce.html) test data to the `jms-messages` topic in Kafka. ```bash seq 10 | confluent local produce jms-messages ``` 7. Create a `jms-sink.json` file with the following contents: ```json { "name": "JmsSinkConnector", "config": { "connector.class": "io.confluent.connect.jms.JmsSinkConnector", "tasks.max": "1", "topics": "jms-messages", "java.naming.factory.initial": "org.apache.activemq.jndi.ActiveMQInitialContextFactory", "java.naming.provider.url": "tcp://localhost:61616", "java.naming.security.principal": "connectuser", "java.naming.security.credentials": "connectpassword", "connection.factory.name": "connectionFactory", "jms.destination.type": "queue", "jms.destination.name": "connector-quickstart", "key.converter": "org.apache.kafka.connect.storage.StringConverter", "value.converter": "org.apache.kafka.connect.storage.StringConverter", "confluent.topic.bootstrap.servers": "localhost:9092", "confluent.topic.replication.factor": "1" } } ``` 8. Load the JMS Sink connector. ```bash confluent local load jms --config jms-sink.json ``` #### IMPORTANT Don’t use the [Confluent CLI](/confluent-cli/current/index.html) in production environments. 9. 
Confirm that the connector is in a `RUNNING` state. ```bash confluent local status jms ``` 10. Navigate to the [ActiveMQ Admin UI](http://localhost:8161/admin) to confirm the messages were delivered to the `connector-quickstart` queue.

## TIBCO EMS Quick Start

This quick start uses the JMS Sink connector to consume records from Kafka and send them to TIBCO Enterprise Message Service - Community Edition. 1. Download and unzip [TIBCO EMS Community Edition](https://www.tibco.com/resources/product-download/tibco-enterprise-message-service-community-edition-free-download-mac). 2. Run the `TIBCOUniversalInstaller-mac.command` and step through the TIBCO Universal Installer. 3. Start TIBCO EMS with default configurations. ```bash ~/TIBCO_HOME/ems/8.4/bin/tibemsd ``` 4. Install the connector through the [Confluent Hub Client](/kafka-connectors/self-managed/confluent-hub/client.html). ```bash
# run from your Confluent Platform installation directory
confluent connect plugin install confluentinc/kafka-connect-jms-sink:latest
``` 5. Copy `~/TIBCO_HOME/ems/8.4/lib/tibjms.jar` into the JMS Sink connector’s plugin folder. This needs to be done on every Connect worker node and the workers must be restarted to pick up the client jar. 6. Start Confluent Platform. ```bash confluent local start ``` 7. [Produce](https://docs.confluent.io/current/cli/command-reference/confluent-produce.html) test data to the `jms-messages` topic in Kafka. ```bash seq 10 | confluent local produce jms-messages ``` 8. Create a `jms-sink.json` file with the following contents: ```json { "name": "JmsSinkConnector", "config": { "connector.class": "io.confluent.connect.jms.JmsSinkConnector", "tasks.max": "1", "topics": "jms-messages", "java.naming.provider.url": "tibjmsnaming://localhost:7222", "java.naming.factory.initial": "com.tibco.tibjms.naming.TibjmsInitialContextFactory", "connection.factory.name": "QueueConnectionFactory", "java.naming.security.principal": "admin", "java.naming.security.credentials": "", "jms.destination.type": "queue", "jms.destination.name": "connector-quickstart", "key.converter": "org.apache.kafka.connect.storage.StringConverter", "value.converter": "org.apache.kafka.connect.storage.StringConverter", "confluent.topic.bootstrap.servers": "localhost:9092", "confluent.topic.replication.factor": "1" } } ``` 9. Load the JMS Sink connector. ```bash confluent local load jms --config jms-sink.json ```

#### IMPORTANT

Don’t use the [Confluent CLI](/confluent-cli/current/index.html) in production environments. 10. Confirm that the connector is in a `RUNNING` state. ```bash confluent local status jms ``` 11. Validate that there are messages on the queue using the `tibemsadmin` tool. ```bash
~/TIBCO_HOME/ems/8.4/bin/tibemsadmin -server "tcp://localhost:7222" -user admin
# admin password is blank by default
tcp://localhost:7222> show queue connector-quickstart
```

### Connect to IBM MQ using LDAP

The [IBM MQ Source connector](https://docs.confluent.io/kafka-connect-ibmmq-source/current/) is available for download from Confluent Hub. If possible, you should use the IBM MQ Source connector instead of the general JMS connector. However, you may want to use the more general connector if you are required to connect to IBM MQ using LDAP, or any other JNDI mechanism. To get started, you must install the latest IBM MQ JMS client libraries into the same directory where this connector is installed.
For more details, see the [IBM MQ installation](https://www.ibm.com/docs/en/ibm-mq/8.0?topic=mq-installing-uninstalling) documentation. Next, create a connector configuration for your environment using the appropriate configuration properties. The following example shows a typical but incomplete configuration of the connector for use with [distributed mode](/platform/current/connect/concepts.html#distributed-workers). ```json { "name": "connector1", "config": { "connector.class": "io.confluent.connect.jms.JmsSourceConnector", "kafka.topic":"MyKafkaTopicName", "jms.destination.name":"MyQueueName", "jms.destination.type":"queue", "java.naming.factory.initial":"com.sun.jndi.ldap.LdapCtxFactory", "java.naming.provider.url":"ldap://", "java.naming.security.principal":"MyUserName", "java.naming.security.credentials":"MyPassword", "confluent.license":"", "confluent.topic.bootstrap.servers":"localhost:9092" } } ``` Note that any extra properties defined on the connector will be passed into the JNDI InitialContext. This makes it easy to pass any IBM MQ-specific settings used for connecting to the IBM MQ broker. Finally, deploy your connector by posting it to a Kafka Connect distributed worker. ## Quick Start In the following example, the Generalized S3 Source connector reads all data listed under a specific S3 bucket and then loads it into a Kafka topic. You may use any file naming convention when writing data to the S3 bucket. 1. Upload the following data under a folder named `quickstart` within the targeted S3 bucket. In this example, JSON format is used, which supports the following: line-delimited JSON, concatenated JSON, and a JSON array of records. ```json {"f1": "value1"} {"f1": "value2"} {"f1": "value3"} {"f1": "value4"} {"f1": "value5"} {"f1": "value6"} {"f1": "value7"} {"f1": "value8"} {"f1": "value9"} ``` 2. Install the connector by running the following command from your Confluent Platform installation directory: ```bash confluent connect plugin install confluentinc/kafka-connect-s3-source:latest ``` 3. Create a `quickstart-s3source-generalized.properties` file with the following contents: ```properties name=quick-start-s3-source connector.class=io.confluent.connect.s3.source.S3SourceConnector tasks.max=1 value.converter=org.apache.kafka.connect.json.JsonConverter mode=GENERIC topics.dir=quickstart format.class=io.confluent.connect.s3.format.json.JsonFormat topic.regex.list=quick-start-topic:.* s3.bucket.name=healthcorporation value.converter.schemas.enable=false ``` #### NOTE For more information about accepted regular expressions, see [Google RE2 syntax](https://github.com/google/re2/wiki/Syntax/). 4. Load the Generalized S3 Source connector. ```bash confluent local services connect connector load quick-start-s3-source --config quickstart-s3source-generalized.properties ``` 5. Confirm the connector is in a `RUNNING` state: ```bash confluent local services connect connector status quick-start-s3-source ``` 6. Confirm that the messages are being sent to Kafka. ```bash kafka-console-consumer \ --bootstrap-server localhost:9092 \ --topic quick-start-topic \ --from-beginning ``` 7. 
The response should be 9 records as shown in the following example: ```bash {"f1": "value1"} {"f1": "value2"} {"f1": "value3"} {"f1": "value4"} {"f1": "value5"} {"f1": "value6"} {"f1": "value7"} {"f1": "value8"} {"f1": "value9"} ``` ## Quick Start Prerequisites : - [Confluent Platform](/platform/current/installation/index.html) - [Confluent CLI](https://docs.confluent.io/confluent-cli/current/installing.html) (requires separate installation) 1. Install the connector: ```none confluent connect plugin install confluentinc/kafka-connect-syslog:latest ``` 2. Start Confluent Platform using the Confluent CLI [confluent local](https://docs.confluent.io/confluent-cli/current/command-reference/local/index.html) commands. ```bash confluent local services connect start ``` 3. Create a config file with the following contents: ```none name=syslog-tcp tasks.max=1 connector.class=io.confluent.connect.syslog.SyslogSourceConnector syslog.port=5454 syslog.listener=TCP confluent.license= confluent.topic.bootstrap.servers=localhost:9092 confluent.topic.replication.factor=1 ``` 4. Load the Syslog Connector. ```bash confluent local load syslog-tcp --config path/to/config.properties ``` #### IMPORTANT Don’t use the [confluent local](https://docs.confluent.io/confluent-cli/current/command-reference/local/index.html) commands in production environments. Always run the Syslog connector in standalone mode, for example, with `bin/connect-standalone`. 5. Test with the sample syslog-formatted message sent using `netcat`: ```none echo "<34>1 2003-10-11T22:14:15.003Z mymachine.example.com su - ID47 - Your refrigerator is running" | nc -v -w 0 localhost 5454 ``` 6. Confirm that the message is logged to Apache Kafka®: ```none kafka-avro-console-consumer \ --bootstrap-server localhost:9092 \ --property schema.registry.url=http://localhost:8081 \ --topic syslog --from-beginning | jq '.' ``` ## Quick start This quick start uses the TIBCO Source connector to consume records from TIBCO Enterprise Message Service™ - Community Edition and sends them to Kafka. 1. Download TIBCO Enterprise Message Service™ - Community Edition ([Mac](https://www.tibco.com/resources/product-download/tibco-enterprise-message-service-community-edition-free-download-mac) or [Linux](https://www.tibco.com/resources/product-download/tibco-enterprise-message-service-community-edition-free-download-linux)) and run the appropriate installer. For more details, see the [TIBCO Enterprise Message Service™ Installation Guide](https://docs.tibco.com/pub/ems-zlinux/8.5.0/doc/pdf/TIB_ems_8.5_installation.pdf). Similar documentation is available for each version of TIBCO EMS. 2. Install the connector through the [Confluent Hub Client](https://docs.confluent.io/current/connect/managing/confluent-hub/client.html). ```bash # run from your Confluent Platform installation directory confluent connect plugin install confluentinc/kafka-connect-tibco-source:latest ``` 3. [Install the TIBCO JMS Client Library](#installing-tibco-client-lib). 4. Start Confluent Platform. ```bash confluent local start ``` 5. Create a `connector-quickstart` queue with the TIBCO Admin Tool. ```bash # connect to TIBCO with the Admin Tool (PASSWORD IS EMPTY) tibco/ems/8.4/bin/tibemsadmin -server "tcp://localhost:7222" -user admin > create queue connector-quickstart ``` 6. Compile the TIBCO Java samples so that they can be run in the following step. 
```bash # setup Java's classpath so that the Java compiler can find the imports of the samples cd tibco/ems/8.4/samples/java export TIBEMS_JAVA=tibco/ems/8.4/lib CLASSPATH=${TIBEMS_JAVA}/jms-2.0.jar:${CLASSPATH} CLASSPATH=.:${TIBEMS_JAVA}/tibjms.jar:${TIBEMS_JAVA}/tibjmsadmin.jar:${CLASSPATH} export CLASSPATH # compile the java classes (run from the tibco/ems/8.4/samples/java directory) javac *.java ``` 7. Produce a set of messages to the `connector-quickstart` queue. ```bash cd tibco/ems/8.4/samples/java # produce 5 test messages java tibjmsMsgProducer -user admin -queue connector-quickstart m1 m2 m3 m4 m5 tibjmsMsgProducer SAMPLE Server....................... localhost User......................... admin Destination.................. connector-quickstart Send Asynchronously.......... false Message Text................. m1 m2 m3 m4 m5 Publishing to destination 'connector-quickstart' Published message: m1 Published message: m2 Published message: m3 Published message: m4 Published message: m5 ``` 8. Create a `tibco-source.json` file with the following contents: ```json { "name": "TibcoSourceConnector", "config": { "connector.class": "io.confluent.connect.tibco.TibcoSourceConnector", "tasks.max": "1", "kafka.topic": "from-tibco-messages", "tibco.url": "tcp://localhost:7222", "tibco.username": "admin", "tibco.password": "", "jms.destination.type": "queue", "jms.destination.name": "connector-quickstart", "key.converter": "org.apache.kafka.connect.storage.StringConverter", "value.converter": "org.apache.kafka.connect.storage.StringConverter", "confluent.topic.bootstrap.servers": "localhost:9092", "confluent.topic.replication.factor": "1" } } ``` 9. Load the TIBCO Source connector. ```bash confluent local load tibco --config tibco-source.json ``` 10. Confirm that the connector is in a `RUNNING` state. ```bash confluent local status TibcoSourceConnector ``` 11. Confirm the messages were delivered to the `from-tibco-messages` topic in Kafka. ```bash confluent local consume from-tibco-messages --from-beginning ``` # The output topic in Kafka topic=connect-test ``` If choosing to use this tutorial without Schema Registry, you must also specify the `key.converter` and `value.converter` properties to use `org.apache.kafka.connect.json.JsonConverter`. This will override the converters’ settings for this connector only. You are now ready to load the connector, but before you do that, update the file with some sample data. Note that the connector configuration specifies a relative path for the file, so you should create the file in the same directory that you will run the Kafka Connect worker from. ```bash for i in {1..3}; do echo "log line $i"; done > test.txt ``` Next, start an instance of the FileStreamSourceConnector using the configuration file you defined previously. You can easily do this from the command line using the following commands: ```bash confluent local load file-source { "name": "file-source", "config": { "connector.class": "FileStreamSource", "tasks.max": "1", "file": "test.txt", "topics": "connect-test", "name": "file-source" }, "tasks": [] } ``` Upon success it will print a snapshot of the connector’s configuration. To confirm which connectors are loaded any time, run: ```bash confluent local status [ "file-source" ] ``` You will get a list of all the loaded connectors in this worker. The same command supplied with the connector name will give you the status of this connector, including an indication of whether the connector has started successfully or has encountered a failure. 
For instance, running this command on the connector you just loaded would give you the following: ```bash confluent local status file-source { "name": "file-source", "connector": { "state": "RUNNING", "worker_id": "192.168.10.1:8083" }, "tasks": [ { "state": "RUNNING", "id": 0, "worker_id": "192.168.10.1:8083" } ] } ``` Soon after the connector starts, each of the three lines in our log file should be delivered to Kafka, having registered a schema with Schema Registry. One way to validate that the data is there is to use the console consumer in another console to inspect the contents of the topic: ```bash kafka-avro-console-consumer --bootstrap-server localhost:9092 --topic connect-test --from-beginning "log line 1" "log line 2" "log line 3" ``` Note that `kafka-avro-console-consumer` is used because the data has been stored in Kafka using Avro format. This consumer uses the Avro converter that is bundled with Schema Registry in order to properly look up the schema for the Avro data. ### Examples The following example shows a line added that overrides the default worker `compression.type` property. After the connector configuration is updated, the [Replicator](../../multi-dc-deployments/replicator/index.md#replicator-detail) connector will use gzip compression. ```json { "name": "Replicator", "config": { "connector.class": "io.confluent.connect.replicator.ReplicatorSourceConnector", "topic.whitelist": "_schemas", "topic.rename.format": "${topic}.replica", "key.converter": "io.confluent.connect.replicator.util.ByteArrayConverter", "value.converter": "io.confluent.connect.replicator.util.ByteArrayConverter", "src.kafka.bootstrap.servers": "srcKafka1:10091", "dest.kafka.bootstrap.servers": "destKafka1:11091", "tasks.max": "1", "producer.override.compression.type": "gzip", "confluent.topic.replication.factor": "1", "schema.subject.translator.class": "io.confluent.connect.replicator.schemas.DefaultSubjectTranslator", "schema.registry.topic": "_schemas", "schema.registry.url": "http://destSchemaregistry:8086" } } ``` The following example shows a line added that overrides the default worker `auto.offset.reset` property. After the connector configuration is updated, the [Elasticsearch](https://docs.confluent.io/kafka-connectors/elasticsearch/current/) connector will use `latest` instead of the default connect worker property value `earliest`. ```json { "name": "Elasticsearch", "config": { "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector", "topics": "orders", "consumer.override.auto.offset.reset": "latest", "tasks.max": 1, "connection.url": "http://elasticsearch:9200", "type.name": "kafkaconnect", "key.ignore": "true", "schema.ignore": "false", "transforms": "renameTopic", "transforms.renameTopic.type": "org.apache.kafka.connect.transforms.RegexRouter", "transforms.renameTopic.regex": "orders", "transforms.renameTopic.replacement": "orders-latest" } } ``` When the worker override configuration property is set to `connector.client.config.override.policy=Principal`, each of the connectors can use a different service principal.
The following example shows a sink connector service principal override when implementing [Role-Based Access Control (RBAC)](../rbac/connect-rbac-connectors.md#connect-rbac-connectors): ```none consumer.override.sasl.jaas.config=org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required \ username="" \ password="" \ metadataServerUrls=""; ``` ### Bouncy Castle FIPS provider To generate a PKCS#8-format encrypted keypair that works with the Bouncy Castle FIPS provider, run the following `openssl` commands. The first command generates an RSA private key in PKCS#8 format. The second command converts it to an encrypted PKCS#8 key and prompts for a password. ```shell $ openssl genpkey -algorithm RSA -out private_key.pem $ openssl pkcs8 -topk8 -in private_key.pem \ -out pvtkey-pkcs8-aes256.pem -v2 aes256 ``` To see a sample configuration for the Bouncy Castle FIPS provider, expand the following section:
Example configuration for Bouncy Castle FIPS provider:

      advertised.listeners=INTERNAL://REDACTED.us-west-2.compute.internal:9092,BROKER://REDACTED.us-west-2.compute.internal:9091,CUSTOM://REDACTED.us-west-2.compute.amazonaws.com:9093,TOKEN://REDACTED.us-west-2.compute.amazonaws.com:9094
      authorizer.class.name=io.confluent.kafka.security.authorizer.ConfluentServerAuthorizer
      broker.id=1
      confluent.ansible.managed=true
      confluent.authorizer.access.rule.providers=CONFLUENT
      confluent.balancer.topic.replication.factor=3
      confluent.basic.auth.credentials.source=USER_INFO
      confluent.basic.auth.user.info=schema-registry:password
      confluent.license.topic=_confluent-command
      confluent.license.topic.replication.factor=3
      confluent.metadata.server.advertised.listeners=https://REDACTED.us-west-2.compute.internal:8090
      confluent.metadata.server.authentication.method=BEARER
      confluent.metadata.server.listeners=https://0.0.0.0:8090
      confluent.metadata.server.sni.host.check.enabled=false
      confluent.metadata.server.ssl.key.password=REDACTED
      confluent.metadata.server.ssl.keystore.location=/var/ssl/private/kafka_broker.keystore_BCFKS.bcfks
      confluent.metadata.server.ssl.keystore.password=REDACTED
      confluent.metadata.server.ssl.truststore.location=/var/ssl/private/kafka_broker.truststore_BCFKS.bcfks
      confluent.metadata.server.ssl.truststore.password=REDACTED
      confluent.metadata.server.ssl.truststore.type=BCFKS
      confluent.metadata.server.ssl.keystore.type=BCFKS
      confluent.metadata.server.token.key.path=/var/ssl/private/encrypted_aes256_tokenKeypair.pem
      confluent.metadata.server.token.key.passphrase=REDACTED
      security.providers=io.confluent.kafka.security.fips.provider.BcFipsProviderCreator
      #confluent.metadata.server.security.providers=io.confluent.kafka.security.fips.provider.BcFipsProviderCreator
      confluent.metadata.server.token.max.lifetime.ms=3600000
      confluent.metadata.server.token.signature.algorithm=RS256
      confluent.metadata.topic.replication.factor=3
      confluent.metrics.reporter.bootstrap.servers=REDACTED.us-west-2.compute.internal:9091,REDACTED.us-west-2.compute.internal:9091,REDACTED.us-west-2.compute.internal:9091
      confluent.metrics.reporter.security.protocol=SSL
      confluent.metrics.reporter.ssl.key.password=REDACTED
      confluent.metrics.reporter.ssl.keystore.location=/var/ssl/private/kafka_broker.keystore_BCFKS.bcfks
      confluent.metrics.reporter.ssl.keystore.password=REDACTED
      confluent.metrics.reporter.ssl.truststore.location=/var/ssl/private/kafka_broker.truststore_BCFKS.bcfks
      confluent.metrics.reporter.ssl.truststore.password=REDACTED
      confluent.metrics.reporter.ssl.keystore.type=BCFKS
      confluent.metrics.reporter.ssl.truststore.type=BCFKS
      confluent.metrics.reporter.topic.replicas=3
      confluent.schema.registry.url=https://REDACTED.us-west-2.compute.internal:8081
      confluent.security.event.logger.exporter.kafka.topic.replicas=3
      confluent.ssl.key.password=REDACTED
      confluent.ssl.keystore.location=/var/ssl/private/kafka_broker.keystore_BCFKS.bcfks
      confluent.ssl.keystore.password=REDACTED
      confluent.ssl.truststore.location=/var/ssl/private/kafka_broker.truststore_BCFKS.bcfks
      confluent.ssl.truststore.password=REDACTED
      confluent.support.customer.id=anonymous
      confluent.support.metrics.enable=true
      group.initial.rebalance.delay.ms=3000
      inter.broker.listener.name=BROKER
      kafka.rest.bootstrap.servers=REDACTED.us-west-2.compute.internal:9092,REDACTED.us-west-2.compute.internal:9092,REDACTED.us-west-2.compute.internal:9092
      kafka.rest.client.security.protocol=SASL_SSL
      kafka.rest.client.ssl.key.password=REDACTED
      kafka.rest.client.ssl.keystore.location=/var/ssl/private/kafka_broker.keystore_BCFKS.bcfks
      kafka.rest.client.ssl.keystore.password=REDACTED
      kafka.rest.client.ssl.truststore.location=/var/ssl/private/kafka_broker.truststore_BCFKS.bcfks
      kafka.rest.client.ssl.truststore.password=REDACTED
      kafka.rest.client.ssl.keystore.type=BCFKS
      kafka.rest.client.ssl.truststore.type=BCFKS
      kafka.rest.confluent.metadata.basic.auth.user.info=pkcs8-7-5-x-74-test-cluster-main.kafka_erp:Confluent1!
      kafka.rest.confluent.metadata.bootstrap.server.urls=https://REDACTED.us-west-2.compute.internal:8090,https://REDACTED.us-west-2.compute.internal:8090,https://REDACTED.us-west-2.compute.internal:8090
      kafka.rest.confluent.metadata.http.auth.credentials.provider=BASIC
      kafka.rest.confluent.metadata.ssl.truststore.location=/var/ssl/private/kafka_broker.truststore_BCFKS.bcfks
      kafka.rest.confluent.metadata.ssl.truststore.password=REDACTED
      kafka.rest.enable=true
      kafka.rest.kafka.rest.resource.extension.class=io.confluent.kafkarest.security.KafkaRestSecurityResourceExtension
      kafka.rest.public.key.path=/var/ssl/private/public.pem
      kafka.rest.rest.servlet.initializor.classes=io.confluent.common.security.jetty.initializer.InstallBearerOrBasicSecurityHandler
      ldap.com.sun.jndi.ldap.read.timeout=3000
      ldap.group.member.attribute.pattern=uid=(.*),OU=rbac,DC=confluent,DC=io
      ldap.group.name.attribute=cn
      ldap.group.search.base=OU=rbac,DC=confluent,DC=io
      ldap.java.naming.factory.initial=com.sun.jndi.ldap.LdapCtxFactory
      ldap.java.naming.provider.url=ldap://ip-10-0-242-18.us-west-2.compute.internal:389
      ldap.java.naming.security.authentication=simple
      ldap.java.naming.security.credentials=Confluent1!
      ldap.java.naming.security.principal=uid=mds,OU=rbac,DC=confluent,DC=io
      ldap.user.memberof.attribute.pattern=cn=(.*),OU=rbac,DC=confluent,DC=io
      ldap.user.name.attribute=uid
      ldap.user.object.class=account
      ldap.user.search.base=OU=rbac,DC=confluent,DC=io
      listener.name.broker.ssl.client.auth=required
      listener.name.broker.ssl.key.password=REDACTED
      listener.name.broker.ssl.keystore.location=/var/ssl/private/kafka_broker.keystore_BCFKS.bcfks
      listener.name.broker.ssl.keystore.password=REDACTED
      listener.name.broker.ssl.truststore.location=/var/ssl/private/kafka_broker.truststore_BCFKS.bcfks
      listener.name.broker.ssl.truststore.password=REDACTED
      listener.name.custom.ssl.client.auth=required
      listener.name.custom.ssl.key.password=REDACTED
      listener.name.custom.ssl.keystore.location=/var/ssl/private/kafka_broker.keystore_BCFKS.bcfks
      listener.name.custom.ssl.keystore.password=REDACTED
      listener.name.custom.ssl.truststore.location=/var/ssl/private/kafka_broker.truststore_BCFKS.bcfks
      listener.name.custom.ssl.truststore.password=REDACTED
      listener.name.internal.oauthbearer.sasl.jaas.config=org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required publicKeyPath="/var/ssl/private/public.pem";
      listener.name.internal.oauthbearer.sasl.login.callback.handler.class=io.confluent.kafka.server.plugins.auth.token.TokenBearerServerLoginCallbackHandler
      listener.name.internal.oauthbearer.sasl.server.callback.handler.class=io.confluent.kafka.server.plugins.auth.token.TokenBearerValidatorCallbackHandler
      listener.name.internal.principal.builder.class=io.confluent.kafka.security.authenticator.OAuthKafkaPrincipalBuilder
      listener.name.internal.sasl.enabled.mechanisms=OAUTHBEARER
      listener.name.internal.ssl.client.auth=required
      listener.name.internal.ssl.key.password=REDACTED
      listener.name.internal.ssl.keystore.location=/var/ssl/private/kafka_broker.keystore.pk12
      listener.name.internal.ssl.keystore.password=REDACTED
      listener.name.internal.ssl.truststore.location=/var/ssl/private/kafka_broker.truststore.pk12
      listener.name.internal.ssl.truststore.password=REDACTED
      listener.name.token.oauthbearer.sasl.jaas.config=org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required publicKeyPath="/var/ssl/private/public.pem";
      listener.name.token.oauthbearer.sasl.login.callback.handler.class=io.confluent.kafka.server.plugins.auth.token.TokenBearerServerLoginCallbackHandler
      listener.name.token.oauthbearer.sasl.server.callback.handler.class=io.confluent.kafka.server.plugins.auth.token.TokenBearerValidatorCallbackHandler
      listener.name.token.principal.builder.class=io.confluent.kafka.security.authenticator.OAuthKafkaPrincipalBuilder
      listener.name.token.sasl.enabled.mechanisms=OAUTHBEARER
      listener.name.token.ssl.client.auth=required
      listener.name.token.ssl.key.password=REDACTED
      listener.name.token.ssl.keystore.location=/var/ssl/private/kafka_broker.keystore_BCFKS.bcfks
      listener.name.token.ssl.keystore.password=REDACTED
      listener.name.token.ssl.truststore.location=/var/ssl/private/kafka_broker.truststore_BCFKS.bcfks
      listener.name.token.ssl.truststore.password=REDACTED
      listener.security.protocol.map=INTERNAL:SASL_SSL,BROKER:SSL,CUSTOM:SSL,TOKEN:SASL_SSL
      listeners=INTERNAL://:9092,BROKER://:9091,CUSTOM://:9093,TOKEN://:9094
      log.dirs=/var/lib/kafka/data
      log.retention.check.interval.ms=300000
      log.retention.hours=168
      log.segment.bytes=1073741824
      metric.reporters=io.confluent.metrics.reporter.ConfluentMetricsReporter
      num.io.threads=16
      num.network.threads=8
      num.partitions=1
      num.recovery.threads.per.data.dir=2
      offsets.topic.replication.factor=3
      sasl.enabled.mechanisms=OAUTHBEARER
      socket.receive.buffer.bytes=102400
      socket.request.max.bytes=104857600
      socket.send.buffer.bytes=102400
      ssl.key.password=REDACTED
      ssl.keystore.location=/var/ssl/private/kafka_broker.keystore_BCFKS.bcfks
      ssl.keystore.password=REDACTED
      ssl.truststore.location=/var/ssl/private/kafka_broker.truststore_BCFKS.bcfks
      ssl.truststore.password=REDACTED
      super.users=User:mds;User:C=US,ST=Ca,L=PaloAlto,O=CONFLUENT,OU=TEST,CN=kafka_broker
      transaction.state.log.min.isr=2
      transaction.state.log.replication.factor=3
      zookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty
      zookeeper.connect=REDACTED.us-west-2.compute.internal:2182,REDACTED.us-west-2.compute.internal:2182,REDACTED.us-west-2.compute.internal:2182
      zookeeper.connection.timeout.ms=18000
      zookeeper.ssl.client.enable=true
      zookeeper.ssl.keystore.location=/var/ssl/private/kafka_broker.keystore.pk12
      zookeeper.ssl.keystore.password=REDACTED
      zookeeper.ssl.truststore.location=/var/ssl/private/kafka_broker.truststore.pk12
      zookeeper.ssl.truststore.password=REDACTED
      ssl.keystore.type=BCFKS
      ssl.truststore.type=BCFKS
      listener.name.internal.ssl.keystore.type=PKCS12
      listener.name.internal.ssl.truststore.type=PKCS12
      listener.name.broker.ssl.keystore.type=BCFKS
      listener.name.broker.ssl.truststore.type=BCFKS
      listener.name.custom.ssl.keystore.type=BCFKS
      listener.name.custom.ssl.truststore.type=BCFKS
      listener.name.token.ssl.keystore.type=BCFKS
      listener.name.token.ssl.truststore.type=BCFKS
      zookeeper.ssl.keystore.type=PKCS12
      zookeeper.ssl.truststore.type=PKCS12
      confluent.ssl.keystore.type=BCFKS
      confluent.ssl.truststore.type=BCFKS
   
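The sample above references BCFKS keystores and an encrypted PKCS#8 token key pair like the one generated with the `openssl` commands shown earlier. As a rough sanity check (the keystore path and the location of the Bouncy Castle FIPS JAR below are assumptions about your layout, not part of the sample), you can list a BCFKS keystore with `keytool` by pointing it at the Bouncy Castle FIPS provider:

```bash
# Hypothetical check that a BCFKS keystore is readable with the BC FIPS provider;
# adjust the keystore path and the bc-fips JAR location to your environment.
keytool -list \
  -keystore /var/ssl/private/kafka_broker.keystore_BCFKS.bcfks \
  -storetype BCFKS \
  -providerclass org.bouncycastle.jcajce.provider.BouncyCastleFipsProvider \
  -providerpath /usr/share/java/bc-fips.jar
```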
# -d Enable debug logging from confluent_kafka import KafkaError, KafkaException, version from confluent_kafka import Producer, Consumer import json import logging import argparse import uuid import sys import re class CommandRecord (object): def __init__(self, stmt): self.stmt = stmt def __str__(self): return "({})".format(self.stmt) @classmethod def deserialize(cls, binstr): d = json.loads(binstr) return CommandRecord(d['statement']) class CommandConsumer(object): def __init__(self, ksqlServiceId, conf): self.consumer = Consumer(conf) self.topic = '_confluent-ksql-{}_command_topic'.format(ksqlServiceId) def consumer_run(self): max_offset = -1001 def latest_offsets(consumer, partitions): nonlocal max_offset for p in partitions: high_water = consumer.get_watermark_offsets(p)[1] if high_water >= max_offset: max_offset = high_water logging.debug("Max offset in command topic = %d", max_offset) self.consumer.subscribe([self.topic], on_assign=latest_offsets) self.msg_cnt = 0 self.msg_err_cnt = 0 stmts = {} try: while True: msg = self.consumer.poll(0.2) if msg is None: continue if msg.error() is not None: print("consumer: error: {}".format(msg.error())) self.consumer_err_cnt += 1 continue try: #print("Read msg with offset ", msg.offset()) self.msg_cnt += 1 record = CommandRecord.deserialize(msg.value()) #print(record) # match statements CREATE/DROP STREAM, CREATE/DROP TABLE match = re.search(r'(?:create|drop) (?:stream|table) ([a-zA-z0-9-]+?)(:?\(|AS|\s|;)', record.stmt, re.I) if match: name = match.group(1).upper() if name == "KSQL_PROCESSING_LOG": continue if name not in stmts: stmts[name] = [] stmts[name].append(record.stmt) # match statements TERMINATE query match2 = re.search(r'(?:terminate) (?:ctas|csas)_(.+?)_', record.stmt, re.I) if match2: name = match2.group(1).upper() stmts[name].append(record.stmt) # match statements INSERT INTO stream or table match3 = re.search(r'(?:insert into) ([a-zA-z0-9-]+?)(:?\(|\s|\()', record.stmt, re.I) if match3: name = match3.group(1).upper() stmts[name].append(record.stmt) #match statements CREATE TYPE match4 = re.search(r'(?:create|drop) type ([a-zA-z0-9-]+?)(:?AS|\s|;)', record.stmt, re.I) if match4: name = match4.group(1).upper() if name not in stmts: stmts[name] = [] stmts[name].append(record.stmt) if match is None and match2 is None and match3 is None and match4 is None: if 'UNRECOGNIZED' not in stmts: stmts['UNRECOGNIZED'] = [] stmts['UNRECOGNIZED'].append(record.stmt) # High watermark is +1 from last offset if msg.offset() >= max_offset-1: break except ValueError as ex: print("consumer: Failed to deserialize message in " "{} [{}] at offset {} (headers {}): {}".format( msg.topic(), msg.partition(), msg.offset(), msg.headers(), ex)) self.msg_err_cnt += 1 except KeyboardInterrupt: pass finally: self.consumer.close() logging.debug("Consumed {} messages, erroneous message = {}.".format(self.msg_cnt, self.msg_err_cnt)) outer_json = [] for key, value in stmts.items(): inner_json = {} inner_json['subject'] = key inner_json['statements'] = value outer_json.append(inner_json) print(json.dumps(outer_json )) if __name__ == '__main__': parser = argparse.ArgumentParser(description="Command topic consumer that dumps CREATE, DROP and TERMINATE queries to "+ "stdout. If no arguments are provided, default values are used. Default broker is " "'localhost:9092'. Default ksqlServiceId is 'default_'. You may optionally provide a configuration file with "+ "broker specific configuration parameters. Every run of this script will consume the topic from the beginning. 
") parser.add_argument('-f', dest='confFile', type=argparse.FileType('r'), help='Configuration file (configProp=value format)') parser.add_argument('-b', dest='brokers', type=str, default=None, help='Bootstrap servers') parser.add_argument('-k', dest='ksqlServiceId', type=str, default=None, help='KsqlDB service ID') parser.add_argument("-d", dest='debug', action="store_true", default=False, help="Enable debug logging") args = parser.parse_args() if args.debug: logging.basicConfig(stream=sys.stderr, level=logging.DEBUG) conf = dict() if args.confFile is not None: # Parse client configuration file for line in args.confFile: line = line.strip() if len(line) == 0 or line[0] == '#': continue i = line.find('=') if i <= 0: raise ValueError("Configuration lines must be `name=value..`, not {}".format(line)) name = line[:i] value = line[i+1:] conf[name] = value if args.brokers is not None: # Overwrite any brokers specified in configuration file with # brokers from -b command line argument conf['bootstrap.servers'] = args.brokers elif 'bootstrap.servers' not in conf: conf['bootstrap.servers'] = 'localhost:9092' if args.ksqlServiceId is None: args.ksqlServiceId = 'default_' conf['auto.offset.reset'] = 'earliest' conf['enable.auto.commit']= 'False' conf['client.id'] = 'commandClient' # Generate a unique group.id conf['group.id'] = 'commandTopicConsumer.py-{}'.format(uuid.uuid4()) c = CommandConsumer(args.ksqlServiceId, conf) c.consumer_run() ``` If you prefer to recover the schema manually, use the following steps. 1. Capture streams SQL: 2. Run `list streams extended;` to list all of the streams. 3. Grab the SQL statement that created each stream from the output, ignoring `KSQL_PROCESSING_LOG`. 4. Capture tables SQL: 5. Run `list tables extended;` to list all of the tables. 6. Grab the SQL statement that created each table from the output. 7. Capture custom types SQL: 8. Run `list types;` to list all of the custom types. 9. Convert the output into `CREATE TYPE AS ` syntax by grabbing the name from the first column and the schema from the second column of the output. 10. Order by dependency: you’ll now have the list of SQL statements to rebuild the schema, but they are not yet ordered in terms of dependencies. You will need to reorder the statements to ensure each statement comes after any other statements it depends on. 11. Update the script to take into account any changes in syntax or functionality between the old and new clusters. The release notes can help here. It can also be useful to have a test ksqlDB cluster, pointing to a different test Kafka cluster, where you can try running the script to get feedback on any errors. Note: you may want to temporarily add `PARTITIONS=1` to the `WITH` clause of any `CREATE TABLE` or `CREATE STREAM` command, so that the command will run without requiring you to first create the necessary topics in the test Kafka cluster. 12. Stop the old cluster: if you do not do so then both the old and new cluster will be publishing to sink topics, resulting in undefined behavior. 13. Build the schema in the new instance. Now you have the SQL file you can run this against the new cluster to build a copy of the schema. This is best achieved with the [RUN SCRIPT](../../developer-guide/ksqldb-reference/run-script.md#ksqldb-reference-run-script) command, which takes a SQL file as an input. 
### Initial Setup To get started with the `ksql-migrations` tool, use the `ksql-migrations new-project` command to set up the required directory structure and create a config file for using the migrations tool. ```none ksql-migrations new-project [--] ``` The two required arguments are the path that will be used as the root directory for your new migrations project, and your ksqlDB server URL. ```bash ksql-migrations new-project /my/migrations/project/path http://localhost:8088 ``` Your output should resemble: ```none Creating new migrations project at /my/migrations/project/path Creating directory: /my/migrations/project/path Creating directory: /my/migrations/project/path/migrations Creating file: /my/migrations/project/path/ksql-migrations.properties Writing to config file: ksql.server.url=http://localhost:8088 ... Migrations project directory created successfully Execution time: 0.0080 seconds ``` This command creates a config file, named `ksql-migrations.properties`, in the specified directory, and also creates an empty `/migrations` subdirectory. The config file is initialized with the ksqlDB server URL passed as part of the command. As a convenience, the config file is also initialized with default values for other [migrations tool configurations](#ksqldb-manage-metadata-schemas-config-reference) commented out. These additional, optional configurations include configs required to access secure ksqlDB servers, such as credentials for HTTP basic authentication or TLS keystores and truststores, as well as optional configurations specific to the migrations tool. See the [config reference](#ksqldb-manage-metadata-schemas-config-reference) for details on individual configs. See [here](#ksqldb-manage-metadata-schemas-connect-to-cloud) for the configs required to connect to a Confluent Cloud ksqlDB cluster. ## Step 1: Create a docker-compose file The minimum set of services for running ksqlDB comprises a Kafka broker and ksqlDB Server. The ksqlDB CLI is required for developing applications with SQL code. The following `docker-compose` file specifies the Docker images that you need for a minimal local environment: - confluentinc/cp-kafka - confluentinc/cp-ksqldb-server - confluentinc/cp-ksqldb-cli 1. Run the following command to create a file named `docker-compose.yml`. ```bash touch docker-compose.yml ``` 2. Copy the following YAML into docker-compose.yml and save the file. 
```yaml version: '2' services: broker: image: confluentinc/cp-kafka:8.1.0 hostname: broker container_name: broker ports: - "9092:9092" - "9101:9101" environment: KAFKA_NODE_ID: 1 KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: 'CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT' KAFKA_ADVERTISED_LISTENERS: 'PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092' KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1 KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0 KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1 KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1 KAFKA_JMX_PORT: 9101 KAFKA_JMX_HOSTNAME: localhost KAFKA_PROCESS_ROLES: 'broker,controller' KAFKA_CONTROLLER_QUORUM_VOTERS: '1@broker:29093' KAFKA_LISTENERS: 'PLAINTEXT://broker:29092,CONTROLLER://broker:29093,PLAINTEXT_HOST://0.0.0.0:9092' KAFKA_INTER_BROKER_LISTENER_NAME: 'PLAINTEXT' KAFKA_CONTROLLER_LISTENER_NAMES: 'CONTROLLER' KAFKA_LOG_DIRS: '/tmp/kraft-combined-logs' # Replace CLUSTER_ID with a unique base64 UUID using "bin/kafka-storage.sh random-uuid" # See https://docs.confluent.io/kafka/operations-tools/kafka-tools.html#kafka-storage-sh CLUSTER_ID: 'MkU3OEVBNTcwNTJENDM2Qk' ksqldb-server: image: confluentinc/cp-ksqldb-server:8.1.0 hostname: ksqldb-server container_name: ksqldb-server depends_on: - broker ports: - "8088:8088" environment: KSQL_CONFIG_DIR: "/etc/ksql" KSQL_BOOTSTRAP_SERVERS: "broker:29092" KSQL_HOST_NAME: ksqldb-server KSQL_LISTENERS: "http://0.0.0.0:8088" KSQL_CACHE_MAX_BYTES_BUFFERING: 0 KSQL_KSQL_SCHEMA_REGISTRY_URL: "http://schema-registry:8081" KSQL_KSQL_CONNECT_URL: "http://connect:8083" KSQL_KSQL_LOGGING_PROCESSING_TOPIC_REPLICATION_FACTOR: 1 KSQL_KSQL_LOGGING_PROCESSING_TOPIC_AUTO_CREATE: 'true' KSQL_KSQL_LOGGING_PROCESSING_STREAM_AUTO_CREATE: 'true' ksqldb-cli: image: confluentinc/cp-ksqldb-cli:8.1.0 container_name: ksqldb-cli depends_on: - broker - ksqldb-server entrypoint: /bin/sh tty: true ``` #### IMPORTANT ksqlDB logs only error messages and doesn’t use the log level from the log4.properties file, which means that you can’t change the log level of the processing log. - For local deployments, edit the [log4j.properties](https://github.com/confluentinc/ksql/blob/master/config/log4j.properties) config file to assign Log4J properties. - For Docker deployments, set the corresponding environment variables. For more information, see [Configure ksqlDB with Docker](../operate-and-deploy/installation/install-ksqldb-with-docker.md#ksqldb-install-configure-with-docker) and [Configure Docker Logging](../../installation/docker/operations/logging.md#docker-operations-logging). All entries are written under the `processing` logger hierarchy. Restart the ksqlDB Server for your configuration changes to take effect. 
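If you are using the Docker-based setup from the previous step, restarting the server after a logging change is typically just a container restart; the service name below assumes the `ksqldb-server` service defined in the compose file above:

```bash
# Restart only the ksqlDB Server container so the new logging settings take effect.
docker-compose restart ksqldb-server
```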
The following example shows how to configure the processing log to emit all events at ERROR level or higher to an appender that writes to `stdout`: ```properties log4j.appender.stdout=org.apache.log4j.ConsoleAppender log4j.appender.stdout.layout=org.apache.log4j.PatternLayout log4j.appender.stdout.layout.ConversionPattern=[%d] %p %m (%c:%L)%n log4j.logger.processing=ERROR, stdout log4j.additivity.processing=false ``` If you’re using a Docker deployment, set the following environment variables in your docker-compose.yml: ```properties environment: # --- ksqlDB Server log config --- KSQL_LOG4J_ROOT_LOGLEVEL: "ERROR" KSQL_LOG4J_LOGGERS: "org.apache.kafka.connect.runtime.rest=WARN,org.reflections=ERROR" # --- ksqlDB processing log config --- KSQL_LOG4J_PROCESSING_LOG_BROKERLIST: kafka:29092 KSQL_LOG4J_PROCESSING_LOG_TOPIC: KSQL_KSQL_LOGGING_PROCESSING_TOPIC_NAME: KSQL_KSQL_LOGGING_PROCESSING_TOPIC_AUTO_CREATE: "true" KSQL_KSQL_LOGGING_PROCESSING_STREAM_AUTO_CREATE: "true" ``` For more information, see [Create a log4J configuration](https://developer.confluent.io/tutorials/handling-deserialization-errors/ksql.html#create-a-log4j-configuration) in the [How to handle deserialization errors](https://developer.confluent.io/tutorials/handling-deserialization-errors/ksql.html) tutorial. For the full Docker example configuration, see the [Multi-node ksqlDB and Kafka Connect clusters](https://github.com/confluentinc/demo-scene/blob/master/multi-cluster-connect-and-ksql/docker-compose.yml) demo. ### NONE | Feature | Supported | |---------------------------------------------------------------------------------------------|-------------| | As value format | No | | As key format | Yes | | Multi-Column Keys | N/A | | [Schema Registry required](../operate-and-deploy/installation/server-config/avro-schema.md) | No | | [Schema inference](/reference/server-configuration#ksqlpersistencedefaultformatkey) | No | | [Single field wrapping](#ksqldb-serialization-formats-single-field-unwrapping) | No | | [Single field unwrapping](#ksqldb-serialization-formats-single-field-unwrapping) | No | The `NONE` format is a special marker format that is used to indicate ksqlDB should not attempt to deserialize that part of the Kafka record. Its main use is as the `KEY_FORMAT` of key-less streams, especially where a default key format that supports schema inference has been set via [ksql.persistence.default.format.key](server-configuration.md#ksqldb-reference-server-configuration-persistence-default-format-key). If the key format was not overridden, the server would attempt to load the key schema from the Schema Registry. If the schema existed, the key columns would be inferred from the schema, which may not be the intent. If the schema did not exist, the statement would be rejected. In such situations, the key format can be set to `NONE`: ```sql CREATE STREAM KEYLESS_STREAM ( VAL STRING ) WITH ( KEY_FORMAT='NONE', VALUE_FORMAT='JSON', KAFKA_TOPIC='foo' ); ``` Any statement that sets the key format to `NONE` and has key columns defined will result in an error. If a `CREATE TABLE AS` or `CREATE STREAM AS` statement has a source with a key format of `NONE`, but the newly created table or stream has key columns, then you can explicitly define the key format in the `WITH` clause; otherwise, the default key format, as set in [ksql.persistence.default.format.key](server-configuration.md#ksqldb-reference-server-configuration-persistence-default-format-key), is used. 
Conversely, a `CREATE STREAM AS` statement that removes the key columns, i.e. via `PARTITION BY null` will automatically set the key format to `NONE`. ```sql -- keyless stream with NONE key format: CREATE STREAM KEYLESS_STREAM ( VAL STRING ) WITH ( KEY_FORMAT='NONE', VALUE_FORMAT='JSON', KAFKA_TOPIC='foo' ); -- Table created from stream with explicit key format declared in WITH clause: CREATE TABLE T WITH (KEY_FORMAT='KAFKA') AS SELECT VAL, COUNT(*) FROM KEYLESS_STREAM GROUP BY VAL; -- or, using the default key format set in the ksql.persistence.default.format.key config: CREATE TABLE T AS SELECT VAL, COUNT(*) FROM KEYLESS_STREAM GROUP BY VAL; ``` ### Start the stack Next, set up and launch the services in the stack. But before you bring it up, you need to make a few changes to the way that Postgres launches so that it works well with Debezium. Debezium has dedicated [documentation](https://debezium.io/documentation/reference/1.1/connectors/postgresql.html) on this if you’re interested, but this guide covers just the essentials. To simplify some of this, you launch a Postgres Docker container [extended by Debezium](https://hub.docker.com/r/debezium/postgres) to handle some of the customization. Also, you must create an additional configuration file at `postgres/custom-config.conf` with the following content: ```none listen_addresses = '*' wal_level = 'logical' max_wal_senders = 1 max_replication_slots = 1 ``` This sets up Postgres so that Debezium can watch for changes as they occur. With the Postgres configuration file in place, create a `docker-compose.yml` file that defines the services to launch. You may need to increase the amount of memory that you give to Docker when you launch it: ```yaml version: '2' services: mongo: image: mongo:4.2.5 hostname: mongo container_name: mongo ports: - "27017:27017" environment: MONGO_INITDB_ROOT_USERNAME: mongo-user MONGO_INITDB_ROOT_PASSWORD: mongo-pw MONGO_REPLICA_SET_NAME: my-replica-set command: --replSet my-replica-set --bind_ip_all postgres: image: debezium/postgres:12 hostname: postgres container_name: postgres ports: - "5432:5432" environment: POSTGRES_USER: postgres-user POSTGRES_PASSWORD: postgres-pw POSTGRES_DB: customers volumes: - ./postgres/custom-config.conf:/etc/postgresql/postgresql.conf command: postgres -c config_file=/etc/postgresql/postgresql.conf elastic: image: elasticsearch:7.6.2 hostname: elastic container_name: elastic ports: - "9200:9200" - "9300:9300" environment: discovery.type: single-node broker: image: confluentinc/cp-kafka:8.1.0 hostname: broker container_name: broker ports: - "29092:29092" environment: KAFKA_BROKER_ID: 1 KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:9092,PLAINTEXT_HOST://localhost:29092 KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1 KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0 KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1 KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1 schema-registry: image: confluentinc/cp-schema-registry:8.1.0 hostname: schema-registry container_name: schema-registry depends_on: - broker ports: - "8081:8081" environment: SCHEMA_REGISTRY_HOST_NAME: schema-registry SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS: "PLAINTEXT://broker:9092" ksqldb-server: image: confluentinc/cp-ksqldb-server:8.1.0 hostname: ksqldb-server container_name: ksqldb-server depends_on: - broker - schema-registry ports: - "8088:8088" volumes: - "./confluent-hub-components/:/usr/share/kafka/plugins/" environment: KSQL_LISTENERS: 
"http://0.0.0.0:8088" KSQL_BOOTSTRAP_SERVERS: "broker:9092" KSQL_KSQL_SCHEMA_REGISTRY_URL: "http://schema-registry:8081" KSQL_KSQL_LOGGING_PROCESSING_STREAM_AUTO_CREATE: "true" KSQL_KSQL_LOGGING_PROCESSING_TOPIC_AUTO_CREATE: "true" KSQL_CONNECT_GROUP_ID: "ksql-connect-cluster" KSQL_CONNECT_BOOTSTRAP_SERVERS: "broker:9092" KSQL_CONNECT_KEY_CONVERTER: "org.apache.kafka.connect.storage.StringConverter" KSQL_CONNECT_VALUE_CONVERTER: "io.confluent.connect.avro.AvroConverter" KSQL_CONNECT_VALUE_CONVERTER_SCHEMA_REGISTRY_URL: "http://schema-registry:8081" KSQL_CONNECT_CONFIG_STORAGE_TOPIC: "_ksql-connect-configs" KSQL_CONNECT_OFFSET_STORAGE_TOPIC: "_ksql-connect-offsets" KSQL_CONNECT_STATUS_STORAGE_TOPIC: "_ksql-connect-statuses" KSQL_CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR: 1 KSQL_CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR: 1 KSQL_CONNECT_STATUS_STORAGE_REPLICATION_FACTOR: 1 KSQL_CONNECT_PLUGIN_PATH: "/usr/share/kafka/plugins" ksqldb-cli: image: confluentinc/cp-ksqldb-cli:8.1.0 container_name: ksqldb-cli depends_on: - broker - ksqldb-server entrypoint: /bin/sh tty: true ``` There are a couple things to notice here. The Postgres image mounts the custom configuration file that you wrote. Postgres adds these configuration settings into its system-wide configuration. The environment variables you gave it also set up a blank database called `customers`, along with a user named `postgres-user` that can access it. The compose file also sets up MongoDB as a replica set named `my-replica-set`. Debezium requires that MongoDB runs in this configuration to pick up changes from its oplog (see Debezium’s [documentation](https://debezium.io/documentation/reference/1.1/connectors/mongodb.html) on MongoDB). In this case, you’re just running a single-node replica set. Finally, note that the ksqlDB server image mounts the `confluent-hub-components` directory, too. The jar files that you downloaded need to be on the classpath of ksqlDB when the server starts up. Bring up the entire stack by running: ```bash docker-compose up ``` ### Create the transactions stream Connect to ksqlDB’s server by using its interactive CLI. Run the following command from your host: ```bash docker exec -it ksqldb-cli ksql http://ksqldb-server:8088 ``` Before you issue more commands, tell ksqlDB to start all queries from earliest point in each topic: ```sql SET 'auto.offset.reset' = 'earliest'; ``` We want to model a stream of credit card transactions from which we’ll look for anomalous activity. To do that, create a ksqlDB stream to represent the transactions. Each transaction has a few key pieces of information, like the card number, amount, and email address that it’s associated with. Because the specified topic (`transactions`) does not exist yet, ksqlDB creates it on your behalf. ```sql CREATE STREAM transactions ( tx_id VARCHAR KEY, email_address VARCHAR, card_number VARCHAR, timestamp VARCHAR, amount DECIMAL(12, 2) ) WITH ( kafka_topic = 'transactions', partitions = 8, value_format = 'avro', timestamp = 'timestamp', timestamp_format = 'yyyy-MM-dd''T''HH:mm:ss' ); ``` Notice that this stream is configured with a custom `timestamp` to signal that [event-time](../concepts/time-and-windows-in-ksqldb-queries.md#ksqldb-time-and-windows-event-time) should be used instead of [processing-time](../concepts/time-and-windows-in-ksqldb-queries.md#ksqldb-time-and-windows-processing-time). 
What this means is that when ksqlDB does time-related operations over the stream, it uses the `timestamp` column to measure time, not the current time of the operating system. This makes it possible to handle out-of-order events. The stream is also configured to use the `Avro` format for the value part of the underlying Kafka records that it generates. Because ksqlDB has been configured with Schema Registry (as part of the Docker Compose file), the schemas of each stream and table are centrally tracked. We’ll make use of this in our microservice later. ### Create topics and mirror data to on-premises 1. In Confluent Cloud, use the unified Confluent CLI to create a topic with one partition called `cloud-topic`. ```bash confluent kafka topic create cloud-topic --partitions 1 ``` 2. In another command window on Confluent Cloud, start a producer to send some data into `cloud-topic`. ```bash confluent kafka topic produce cloud-topic --cluster $CC_CLUSTER_ID ``` - Verify that the producer has started. Your output will resemble the following to show that the producer is ready. ```bash $ confluent kafka topic produce cloud-topic --cluster lkc-1vgo6 Starting Kafka Producer. Use Ctrl-C or Ctrl-D to exit. ``` - Type some entries of your choice into the producer window, hitting return after each entry to send. ```bash Riesling Pinot Blanc Verdejo ``` 3. Mirror the `cloud-topic` on Confluent Platform, using the command `kafka-mirrors --create --mirror-topic `. The following command establishes a mirror of the original `cloud-topic`, using the cluster link `from-cloud-link`. ```bash kafka-mirrors --create --mirror-topic cloud-topic --link from-cloud-link --bootstrap-server localhost:9092 --command-config $CONFLUENT_CONFIG/CP-command.config ``` You should get this verification that the mirror topic was created. ```bash Created topic cloud-topic. ``` 4. On Confluent Platform, check the mirror topic status by running `kafka-mirrors --describe` on the `from-cloud-link`. ```bash kafka-mirrors --describe --link from-cloud-link --bootstrap-server localhost:9092 --command-config $CONFLUENT_CONFIG/CP-command.config ``` Your output will show the status of any mirror topics on the specified link. ```bash Topic: cloud-topic LinkName: from-cloud-link LinkId: b1a56076-4d6f-45e0-9013-ff305abd0e54 MirrorTopic: cloud-topic State: ACTIVE StateTime: 2021-10-07 16:36:20 Partition: 0 State: ACTIVE DestLogEndOffset: 2 LastFetchSourceHighWatermark: 2 Lag: 0 TimeSinceLastFetchMs: 384566 ``` 5. Consume the data from the on-prem mirror topic. ```bash kafka-console-consumer --topic cloud-topic --from-beginning --bootstrap-server localhost:9092 --consumer.config $CONFLUENT_CONFIG/CP-command.config ``` Your output should match the entries you typed into the Confluent Cloud producer in step 8. ![image](multi-dc-deployments/cluster-linking/images/cluster-link-hybrid-produce-consume.png) 6. View the configuration of your cluster link: ```bash kafka-configs --describe --cluster-link from-cloud-link --bootstrap-server localhost:9092 --command-config $CONFLUENT_CONFIG/CP-command.config ``` The output for this command is a list of configurations, partially shown in the following example. 
```bash Dynamic configs for cluster-link from-cloud-link are: metadata.max.age.ms=300000 sensitive=false synonyms={} reconnect.backoff.max.ms=1000 sensitive=false synonyms={} auto.create.mirror.topics.filters= sensitive=false synonyms={} ssl.engine.factory.class=null sensitive=false synonyms={} sasl.kerberos.ticket.renew.window.factor=0.8 sensitive=false synonyms={} reconnect.backoff.ms=50 sensitive=false synonyms={} consumer.offset.sync.ms=30000 sensitive=false synonyms={} ... link.mode=DESTINATION sensitive=false synonyms={} security.protocol=SASL_SSL sensitive=false synonyms={} acl.sync.ms=5000 sensitive=false synonyms={} ssl.keymanager.algorithm=SunX509 sensitive=false synonyms={} sasl.login.callback.handler.class=null sensitive=false synonyms={} replica.fetch.max.bytes=5242880 sensitive=false synonyms={} availability.check.consecutive.failure.threshold=5 sensitive=false synonyms={} sasl.login.refresh.window.jitter=0.05 sensitive=false synonyms={} ``` ## About prerequisites and command examples - These instructions assume you have a local installation of [Confluent Platform 7.0.0 or later](https://www.confluent.io/download/#confluent-platform), the [Confluent CLI](https://docs.confluent.io/confluent-cli/current/installing.html), and Java 8, 11, or 17 (recommended). For details on Java requirements, see [Java](../../installation/system-requirements.md#sys-req-java) in System Requirements for Confluent Platform. If you are new to Confluent Platform, you may want to work through the [Quick Start for Confluent Platform](../../get-started/platform-quickstart.md#quickstart) first, and then return to this tutorial. - The examples assume that your properties files are in the default locations on your Confluent Platform installation, except as otherwise noted. This should make it easier to copy/paste example commands directly into your terminal in most cases. - With Confluent Platform is installed, Confluent CLI commands themselves can be run from any directory (`kafka-topics`, `kafka-console-producer`, `kafka-console-consumer`), but for commands that access properties files in `$CONFLUENT_HOME` (`kafka-server-start`), the examples show running these from within that directory. A reference for these open source utilities is provided in [Kafka Command-Line Interface (CLI) Tools](/kafka/operations-tools/kafka-tools.html). A reference for Confluent premium command line tools and utilities is provided in [CLI Tools for Confluent Platform](/platform/current/installation/cli-reference.html). - Confluent CLI commands can specify the bootstrap server at the beginning or end of the command: `kafka-topics --list --bootstrap-server localhost:9092` is the same as `kafka-topics --bootstrap-server localhost:9092 --list`. In these tutorials, the target bootstrap server is specified at the end of commands. The rest of the tutorial refers to `$CONFLUENT_HOME` to indicate your Confluent Platform install directory. Set this as an environment variable, for example: ```bash export CONFLUENT_HOME=$HOME/confluent-8.1.0 PATH=$CONFLUENT_HOME/bin:$PATH ``` ### Create consumer, producer, and replicator configuration files The Replicator executable script expects three configuration files: - Configuration for the origin cluster - Configuration for the destination cluster - Replicator configuration Create the following files in `$CONFLUENT_HOME/my-examples/`: 1. Configure the origin cluster in a new file named `consumer.properties`. ```none cp etc/kafka/consumer.properties my-examples/. 
``` Edit the file and make sure it contains the addresses of brokers from the **origin** cluster. The default broker list will match the origin cluster you started earlier. ```bash # Origin cluster connection configuration bootstrap.servers=localhost:9082 ``` 2. Configure the destination cluster in a new file named `producer.properties`. ```none cp etc/kafka/producer.properties my-examples/. ``` Edit the file and make sure it contains the addresses of brokers from the **destination** cluster. The default broker list will match the destination cluster you started earlier. ```bash # Destination cluster connection configuration bootstrap.servers=localhost:9092 ``` 3. Define the Replicator configuration in a new file named `replication.properties` for the Connect worker. This quick start shows a configuration for `topic.rename.format` but any of the [Replicator Configuration Reference for Confluent Platform](configuration_options.md#replicator-config-options) that are not connection related can be supplied in this file. ```bash # Replication configuration topic.rename.format=${topic}.replica replication.factor=1 config.storage.replication.factor=1 offset.storage.replication.factor=1 status.storage.replication.factor=1 confluent.topic.replication.factor=1 ``` #### Example Consumer Code By default, each record is deserialized into an Avro `GenericRecord`, but in this tutorial the record should be deserialized using the application’s code-generated `Payment` class. Therefore, configure the deserializer to use Avro `SpecificRecord`, i.e., `SPECIFIC_AVRO_READER_CONFIG` should be set to `true`. For example: ```java ... import io.confluent.kafka.serializers.KafkaAvroDeserializer; ... props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class); props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, KafkaAvroDeserializer.class); props.put(KafkaAvroDeserializerConfig.SPECIFIC_AVRO_READER_CONFIG, true); ... KafkaConsumer consumer = new KafkaConsumer<>(props)); consumer.subscribe(Collections.singletonList(TOPIC)); while (true) { ConsumerRecords records = consumer.poll(100); for (ConsumerRecord record : records) { String key = record.key(); Payment value = record.value(); } } ... ``` Because the `pom.xml` includes `avro-maven-plugin`, the `Payment` class is automatically generated during compile. In this example, the connection information to the Kafka brokers and Schema Registry is provided by the configuration file that is passed into the code, but if you want to specify the connection information directly in the client application, see [this java template](https://github.com/confluentinc/examples/tree/latest/ccloud/template_delta_configs/java_producer_consumer.delta). For a full Java consumer example, refer to [the consumer example](https://github.com/confluentinc/examples/tree/latest/clients/avro/src/main/java/io/confluent/examples/clients/basicavro/ConsumerExample.java). ### Authorizing Access to the Schemas Topic If you enable [Kafka authorization](../../security/authorization/acls/overview.md#kafka-authorization), you must grant the Schema Registry service principal the ability to perform the following [operations on the specified resources](../../security/authorization/acls/overview.md#acl-format-operations-resources): - `Read` and `Write` access to the internal **\_schemas** topic. This ensures that only authorized users can make changes to the topic. 
- `DescribeConfigs` on the schemas topic to verify that the topic exists - `describe topic` on the schemas topic, giving the Schema Registry service principal the ability to list the schemas topic - `DescribeConfigs` on the internal consumer offsets topic - Access to the Schema Registry cluster (`group`) - `Create` permissions on the Kafka cluster ```bash export KAFKA_OPTS="-Djava.security.auth.login.config=" bin/kafka-acls --bootstrap-server localhost:9092 --command-config adminclient-configs.conf --add \ --allow-principal 'User:' --allow-host '*' \ --producer --consumer --topic _schemas --group schema-registry bin/kafka-acls --bootstrap-server localhost:9092 --command-config adminclient-configs.conf --add \ --allow-principal 'User:' --allow-host '*' \ --operation DescribeConfigs --topic _schemas bin/kafka-acls --bootstrap-server localhost:9092 --command-config adminclient-configs.conf --add \ --allow-principal 'User:' --allow-host '*' \ --operation Describe --topic _schemas bin/kafka-acls --bootstrap-server localhost:9092 --command-config adminclient-configs.conf --add \ --allow-principal 'User:' --allow-host '*' \ --operation Read --topic _schemas bin/kafka-acls --bootstrap-server localhost:9092 --command-config adminclient-configs.conf --add \ --allow-principal 'User:' --allow-host '*' \ --operation Write --topic _schemas bin/kafka-acls --bootstrap-server localhost:9092 --command-config adminclient-configs.conf --add \ --allow-principal 'User:' --allow-host '*' \ --operation Describe --topic __consumer_offsets bin/kafka-acls --bootstrap-server localhost:9092 --command-config adminclient-configs.conf --add \ --allow-principal 'User:' --allow-host '*' \ --operation Create --cluster kafka-cluster ``` If you are using the [Schema Registry ACL Authorizer for Confluent Platform](../../confluent-security-plugins/schema-registry/authorization/sracl_authorizer.md#confluentsecurityplugins-sracl-authorizer), you also need permissions to `Read`, `Write`, and `DescribeConfigs` on the internal **\_schemas_acl** topic: ```bash bin/kafka-acls --bootstrap-server localhost:9092 --command-config adminclient-configs.conf --add \ --allow-principal 'User:' --allow-host '*' \ --producer --consumer --topic _schemas_acl --group schema-registry bin/kafka-acls --bootstrap-server localhost:9092 --command-config adminclient-configs.conf --add \ --allow-principal 'User:' --allow-host '*' \ --operation Read --topic _schemas_acl bin/kafka-acls --bootstrap-server localhost:9092 --command-config adminclient-configs.conf --add \ --allow-principal 'User:' --allow-host '*' \ --operation Write --topic _schemas_acl bin/kafka-acls --bootstrap-server localhost:9092 --command-config adminclient-configs.conf --add \ --allow-principal 'User:' --allow-host '*' \ --operation DescribeConfigs --topic _schemas_acl ``` ### Grant roles for the Schema Registry service principal In these steps, you use the Confluent CLI to log on to MDS and create the Schema Registry service principal . After you have these roles set up, you can use the Confluent CLI to manage Schema Registry users. For this example, assume the commands use the MDS server credentials, URLs, and property values you set up on your local Schema Registry properties file. (Optionally, you can use a [registered cluster name](#sr-use-registred-cluster-name) in your role bindings.) 1. Log on to MDS. ```bash confluent login --url ://: ``` 2. 
As a prerequisite to granting additional access, grant permission to create the topic `_schema_encoders`, which serves as the `metadata.encoder.topic` as described in [Schema Registry Configuration Reference for Confluent Platform](../installation/config.md#schemaregistry-config).

   ```bash
   confluent iam rbac role-binding create \
     --principal User: \
     --role ResourceOwner \
     --resource Topic:<_schema_encoders> \
     --kafka-cluster
   ```

   For example:

   ```bash
   confluent iam rbac role-binding create \
     --principal User:jack-sr \
     --role ResourceOwner \
     --resource Topic:_schema_encoders \
     --kafka-cluster my-kafka-cluster-ID
   ```

3. Grant the user the role `SecurityAdmin` on the Schema Registry cluster.

   ```bash
   confluent iam rbac role-binding create \
     --role SecurityAdmin \
     --principal User: \
     --kafka-cluster \
     --schema-registry-cluster
   ```

4. Use the command `confluent iam rbac role-binding list` to view the role you just created.

   ```bash
   confluent iam rbac role-binding list \
     --principal User: \
     --kafka-cluster \
     --schema-registry-cluster
   ```

   For example, here is a listing for a user “jack-sr” granted the `SecurityAdmin` role on “schema-registry-cool-cluster”, connecting to MDS through a Kafka cluster `my-kafka-cluster-ID`:

   ```bash
   confluent iam rbac role-binding list \
     --principal User:jack-sr \
     --kafka-cluster my-kafka-cluster-ID \
     --schema-registry-cluster schema-registry-cool-cluster

        Role       | ResourceType | Name | PatternType
   +---------------+--------------+------+-------------+
     SecurityAdmin | Cluster      |      |
   ```

5. Grant the user the role `ResourceOwner` on the group that Schema Registry nodes use to coordinate across the cluster.

   ```bash
   confluent iam rbac role-binding create \
     --principal User: \
     --role ResourceOwner \
     --resource Group: \
     --kafka-cluster
   ```

   For example:

   ```bash
   confluent iam rbac role-binding create \
     --principal User:jack-sr \
     --role ResourceOwner \
     --resource Group:schema-registry-cool-cluster \
     --kafka-cluster my-kafka-cluster-ID
   ```

6. Grant the user the role `ResourceOwner` on the Kafka topic that Schema Registry uses to store its schemas.

   ```bash
   confluent iam rbac role-binding create \
     --principal User: \
     --role ResourceOwner \
     --resource Topic: \
     --kafka-cluster
   ```

   For example:

   ```bash
   confluent iam rbac role-binding create \
     --principal User:jack-sr \
     --role ResourceOwner \
     --resource Topic:_jax-schemas-topic \
     --kafka-cluster my-kafka-cluster-ID
   ```

7. Use the command `confluent iam rbac role-binding list` to view the role you just created.

   ```bash
   confluent iam rbac role-binding list \
     --principal User:jack-sr \
     --role ResourceOwner \
     --kafka-cluster my-kafka-cluster-ID
   ```

   For example:

   ```bash
   confluent iam rbac role-binding list \
     --principal User:jack-sr \
     --role ResourceOwner \
     --kafka-cluster my-kafka-cluster-ID

        Role       | ResourceType |             Name             | PatternType
   +---------------+--------------+------------------------------+-------------+
     ResourceOwner | Topic        | _jax-schemas-topic           | LITERAL
     ResourceOwner | Topic        | __schema_encoders            | LITERAL
     ResourceOwner | Group        | schema-registry-cool-cluster | LITERAL
     ResourceOwner | Topic        | _schemas                     | LITERAL
     ResourceOwner | Group        | schema-registry              | LITERAL
   ```

### Client authentication and authorization

Configure license client authentication: When using principal propagation and the following security types, you must configure client authentication for the license topic.
For more information, see the following documentation:

- [SASL OAUTHBEARER (RBAC) client authentication](../../security/authentication/sasl/oauthbearer/configure-clients.md#security-sasl-rbac-oauthbearer-clientconfig)
- [SASL PLAIN client authentication](../../security/authentication/sasl/plain/overview.md#sasl-plain-clients)
- [SASL SCRAM client authentication](../../security/authentication/sasl/scram/overview.md#sasl-scram-clients)
- [mTLS client authentication](../../security/authentication/mutual-tls/overview.md#authentication-ssl-clients)

Configure license client authorization: When using principal propagation and RBAC or ACLs, you must configure client authorization for the license topic.

#### NOTE

The `_confluent-command` internal topic is available as the preferred alternative to the `_confluent-license` topic for components such as Schema Registry, REST Proxy, and Confluent Server (which were previously using `_confluent-license`). Both topics will be supported going forward. Here are some guidelines:

- New deployments (Confluent Platform 6.2.1 and later) will default to using `_confluent-command`, as shown below.
- Existing clusters will continue using the `_confluent-license` topic unless manually changed.
- Newly created clusters on Confluent Platform 6.2.1 and later will default to creating the `_confluent-command` topic, and only existing clusters that already have a `_confluent-license` topic will continue to use it.

- **RBAC authorization**: Run this command to add `ResourceOwner` for the component user on the Confluent license topic resource (default name is `_confluent-command`).

  ```none
  confluent iam rbac role-binding create \
    --role ResourceOwner \
    --principal User: \
    --resource Topic:_confluent-command \
    --kafka-cluster
  ```

- **ACL authorization**: Run this command to configure Kafka authorization, where the bootstrap server, client configuration, and service account ID are specified. This grants create, read, and write on the `_confluent-command` topic.
```none kafka-acls --bootstrap-server --command-config \ --add --allow-principal User: --operation Create --operation Read --operation Write \ --topic _confluent-command ``` #### Schema Registry - Additional RBAC configurations required for [schema-registry.properties](https://github.com/confluentinc/examples/tree/latest/security/rbac/delta_configs/schema-registry.properties.delta) ```none kafkastore.bootstrap.servers=localhost:9092 kafkastore.security.protocol=SASL_PLAINTEXT kafkastore.sasl.mechanism=OAUTHBEARER kafkastore.sasl.login.callback.handler.class=io.confluent.kafka.clients.plugins.auth.token.TokenUserLoginCallbackHandler kafkastore.sasl.jaas.config=org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required username="sr" password="sr1" metadataServerUrls="http://localhost:8090"; # Schema Registry group id, which is the cluster id schema.registry.group.id=schema-registry-demo # These properties install the Schema Registry security plugin, and configure it to use RBAC for authorization and OAuth for authentication resource.extension.class=io.confluent.kafka.schemaregistry.security.SchemaRegistrySecurityResourceExtension confluent.schema.registry.authorizer.class=io.confluent.kafka.schemaregistry.security.authorizer.rbac.RbacAuthorizer rest.servlet.initializor.classes=io.confluent.common.security.jetty.initializer.InstallBearerOrBasicSecurityHandler # The location of a running metadata service; used to verify that requests are authorized by the users that make them confluent.metadata.bootstrap.server.urls=http://localhost:8090 # Credentials to use when communicating with the MDS; these should usually match the ones used for communicating with Kafka confluent.metadata.basic.auth.user.info=sr:sr1 confluent.metadata.http.auth.credentials.provider=BASIC # The path to public keys that should be used to verify json web tokens during authentication public.key.path=/tmp/tokenPublicKey.pem # This enables anonymous access with a principal of User:ANONYMOUS confluent.schema.registry.anonymous.principal=true authentication.skip.paths=/* ``` - Role bindings: ```bash # Schema Registry Admin confluent iam rbac role-binding create --principal User:$USER_ADMIN_SCHEMA_REGISTRY --role ResourceOwner --resource Topic:_schemas --kafka-cluster $KAFKA_CLUSTER_ID confluent iam rbac role-binding create --principal User:$USER_ADMIN_SCHEMA_REGISTRY --role SecurityAdmin --kafka-cluster $KAFKA_CLUSTER_ID --schema-registry-cluster $SCHEMA_REGISTRY_CLUSTER_ID confluent iam rbac role-binding create --principal User:$USER_ADMIN_SCHEMA_REGISTRY --role ResourceOwner --resource Group:$SCHEMA_REGISTRY_CLUSTER_ID --kafka-cluster $KAFKA_CLUSTER_ID confluent iam rbac role-binding create --principal User:$USER_ADMIN_SCHEMA_REGISTRY --role DeveloperRead --resource Topic:$LICENSE_TOPIC --kafka-cluster $KAFKA_CLUSTER_ID confluent iam rbac role-binding create --principal User:$USER_ADMIN_SCHEMA_REGISTRY --role DeveloperWrite --resource Topic:$LICENSE_TOPIC --kafka-cluster $KAFKA_CLUSTER_ID # Client connecting to Schema Registry confluent iam rbac role-binding create --principal User:$USER_CLIENT_A --role ResourceOwner --resource Subject:$SUBJECT --kafka-cluster $KAFKA_CLUSTER_ID --schema-registry-cluster $SCHEMA_REGISTRY_CLUSTER_ID ``` ### How the audit log migration tool works The audit log migration tool performs the following tasks: - Sets the output bootstrap servers to the value specified (when specified). Note that output bootstrap servers are empty by default. 
- Combines the input audit log destination topics. For topics that appear in more than one Kafka cluster configuration, the migration tool uses the maximum [retention time](audit-logs-concepts.md#audit-log-retention) specified in the configuration. - Sets the default audit log topic as `confluent-audit-log-events`. If necessary, the migration tool will add this topic to the set of destination topics (in which case, it specifies a retention period of 7776000000 milliseconds). - Combines the set of all [excluded principals](audit-logs-concepts.md#audit-logs-excluded-principals). - Replaces the `/kafka=*/` part of each [Confluent Resource Name (CRN)](audit-logs-concepts.md#confluent-resource-name) pattern using the cluster ID of the contributing Kafka cluster. For example, a route in the configuration from `cluster1` with a CRN like `crn:///kafka=*/topic=accounting-*` will be transformed to `crn:///kafka=cluster1/topic=accounting-*`. For routes that have a CRN that uses something other than `/kafka=*/`, the migration tool will not replace the Kafka cluster ID. For example, if a route specifies `kafka=pkc-123` and the cluster ID is `pkc-abc` then the tool will leave it untouched and return the warning: ```none Mismatched Kafka Cluster Warning: Routes from one Kafka cluster ID on a completely different cluster ID are unexpected, but not necessarily wrong. For example, this message might be returned if you attempt to reuse the same routing configuration on multiple clusters. ``` - For any incoming audit log router configurations that have default topics other than `confluent-audit-log-events`, the script will add extra routes for the following CRN patterns (if they do not already exist): | Topic Route | Event Category Type | |--------------------------------------------------------------------------|-----------------------| | `crn:///kafka=` | AUTHORIZE, MANAGEMENT | | `crn:///kafka=/topic=*` | AUTHORIZE, MANAGEMENT | | `crn:///kafka=/control-center-broker-metrics=*` | AUTHORIZE | | `crn:///kafka=/control-center-alerts=*` | AUTHORIZE | | `crn:///kafka=/delegation-token=*` | AUTHORIZE | | `crn:///kafka=/control-center-broker-metrics=*` | AUTHORIZE | | `crn:///kafka=/control-center-alerts=*` | AUTHORIZE | | `crn:///kafka=/cluster-registry=*` | AUTHORIZE | | `crn:///kafka=/security-metadata=*` | AUTHORIZE | | `crn:///kafka=/all=*` | AUTHORIZE | | `crn:///kafka=/connect=` | AUTHORIZE | | `crn:///kafka=/connect=/connector=*` | AUTHORIZE | | `crn:///kafka=/connect=/secret=*` | AUTHORIZE | | `crn:///kafka=/connect=/all=*` | AUTHORIZE | | `crn:///kafka=/schema-registry=` | AUTHORIZE | | `crn:///kafka=/schema-registry=/subject=*` | AUTHORIZE | | `crn:///kafka=/schema-registry=/all=*` | AUTHORIZE | | `crn:///kafka=/ksql=` | AUTHORIZE | | `crn:///kafka=/ksql=/ksql-cluster=*` | AUTHORIZE | | `crn:///kafka=/ksql=/all=*` | AUTHORIZE | #### NOTE If you do not want the routes listed above added in your newly-migrated audit log configuration, then edit your input `server.properties` files to only use `confluent-audit-log-events` in the `default_topics` before migrating. ### View audit logs on the fly During initial setup or troubleshooting, you can quickly and iteratively examine recent audit log entries using simple command line tools. This can aid in audit log troubleshooting, and also when troubleshooting role bindings for RBAC. First pipe your audit log topics into a local file. Grep works faster on your local file system. 
If you have the `kafka-console-consumer` installed locally and can directly consume from the audit log destination Kafka cluster, your command should look similar to the following: ```none ./kafka-console-consumer --bootstrap-server auditlog.example.com:9092 --consumer.config ~/auditlog-consumer.properties --whitelist '^confluent-audit-log-events.*' > /tmp/streaming.audit.log ``` If you don’t have direct access and must instead connect using a “jump box” (a machine or server on a network that you use to access and manage devices in a separate security zone), use a command similar to the following: ```none ssh -tt -i ~/.ssh/theuser.key theuser@jumpbox './kafka-console-consumer --bootstrap-server auditlog.example.com:9092 --consumer.config ~/auditlog-consumer.properties --whitelist '"'"'^confluent-audit-log-events.*'"'"' ' > /tmp/streaming.audit.log ``` Regardless of which method you use, at this point you can open another terminal and locally run `tail -f /tmp/streaming.audit.log` to view audit log messages on the fly. After you’ve gotten the audit logs, you can use [grep](https://man7.org/linux/man-pages/man1/grep.1.html) and [jq](https://stedolan.github.io/jq/) (or another utility) to examine them. For example: ```none tail -f /tmp/streaming.audit.log | grep 'connect' | jq -c '[.time, .data.authenticationInfo.principal, .data.authorizationInfo.operation, .data.resourceName]' ``` ### Configure Replicator configuration connection 1. Define the `kafka_connect_replicator` group and `hosts` to deploy to. For example: ```yaml kafka_connect_replicator: hosts: ip-172-31-34-246.us-east-2.compute.internal: ``` 2. Define the listener for Replicator configuration cluster. The following is an example of a listener with Kerberos authentication and TLS enabled: ```yaml kafka_connect_replicator_listener: ssl_enabled: true sasl_protocol: kerberos ``` 3. Define the basic configuration for Replicator connection: ```yaml kafka_connect_replicator_white_list: kafka_connect_replicator_bootstrap_servers: ``` 4. Define security configuration for the Replicator connection: ```yaml kafka_connect_replicator_kerberos_principal: kafka_connect_replicator_kerberos_keytab_path: kafka_connect_replicator_ssl_ca_cert_path: kafka_connect_replicator_ssl_cert_path: kafka_connect_replicator_ssl_key_path: kafka_connect_replicator_ssl_key_password: ``` 5. For RBAC-enabled deployment, define the additional security configuration. Specify either the Kafka cluster id (`kafka_connect_replicator_kafka_cluster_id`) or the cluster name (`kafka_connect_replicator_kafka_cluster_name`). ```yaml kafka_connect_replicator_rbac_enabled: true kafka_connect_replicator_erp_tls_enabled: kafka_connect_replicator_erp_host: kafka_connect_replicator_erp_admin_user: kafka_connect_replicator_erp_admin_password: kafka_connect_replicator_kafka_cluster_id: kafka_connect_replicator_kafka_cluster_name: kafka_connect_replicator_erp_pem_file: ``` 6. Set the `CLASSPATH` to the replicator installation directory in `kafka_connect_service_environment_overrides`: ```yaml kafka_connect_service_environment_overrides: CLASSPATH: /* ``` For more information about setting required Confluent Platform environment variables using Ansible, see [Set environment variables](ansible-configure.md#ansible-override-env-varabiles). ### Configure producer connection 1. Define the listener configuration for the producer connection to the destination cluster. 
The following is an example with TLS and Kerberos for authentication enabled: ```yaml kafka_connect_replicator_producer_listener: ssl_enabled: true sasl_protocol: kerberos ``` 2. Define the basic producer configuration: ```yaml kafka_connect_replicator_producer_bootstrap_servers: ``` 3. Define the security configuration for the producer connection: ```yaml kafka_connect_replicator_producer_kerberos_principal: kafka_connect_replicator_producer_kerberos_keytab_path: kafka_connect_replicator_producer_ssl_ca_cert_path: kafka_connect_replicator_producer_ssl_cert_path: kafka_connect_replicator_producer_ssl_key_path: kafka_connect_replicator_producer_ssl_key_password: ``` 4. Define custom properties for each client connection: ```yaml kafka_connect_replicator_producer_custom_properties: ``` 5. For RBAC-enabled deployment, define the additional producer custom properties. `kafka_connect_replicator_producer` configs default to match `kafka_connect_replicator` configs. The following are required only if you are producing to a different cluster than where you are storing your configs. Specify either the Kafka cluster id (`kafka_connect_replicator_producer_kafka_cluster_id`) or the cluster name (`kafka_connect_replicator_producer_kafka_cluster_name`). ```yaml kafka_connect_replicator_producer_rbac_enabled: true kafka_connect_replicator_producer_erp_tls_enabled: kafka_connect_replicator_producer_erp_host: kafka_connect_replicator_producer_erp_admin_user: kafka_connect_replicator_producer_erp_admin_password: kafka_connect_replicator_producer_kafka_cluster_id: kafka_connect_replicator_producer_kafka_cluster_name: kafka_connect_replicator_producer_erp_pem_file: ``` ### Run Replicator Docker Container with Kubernetes 1. Delete existing secret (if it exists). ```bash kubectl delete secret replicator-secret-props ``` 2. Regenerate configs, if changed. (See the [Quick Start](replicator-cloud-quickstart.md#cloud-replicator-quickstart).) 3. Upload the new secret. ```bash kubectl create secret generic replicator-secret-props --from-file=/tmp/replicator/ ``` 4. Reload pods. ```bash kubectl apply -f container/replicator-deployment.yaml ``` Here is an example `replicator-deployment.yaml`. ```yaml apiVersion: extensions/v1beta1 kind: Deployment metadata: name: repl-exec-connect-cluster spec: replicas: 1 template: metadata: labels: app: replicator-app spec: containers: - name: confluent-replicator image: confluentinc/cp-enterprise-replicator-executable env: - name: CLUSTER_ID value: "replicator-k8s" - name: CLUSTER_THREADS value: "1" - name: CONNECT_GROUP_ID value: "containerized-repl" # Note: This is to avoid _overlay errors_ . You could use /etc/replicator/ here instead. - name: REPLICATION_CONFIG value: "/etc/replicator-config/replication.properties" - name: PRODUCER_CONFIG value: "/etc/replicator-config/producer.properties" - name: CONSUMER_CONFIG value: "/etc/replicator-config/consumer.properties" volumeMounts: - name: replicator-properties mountPath: /etc/replicator-config/ volumes: - name: replicator-properties secret: secretName: "replicator-secret-props" defaultMode: 0666 ``` 5. Verify status. ```bash kubectl get pods kubectl logs -f ``` ### Describe a custom connector Use the following command to get connector details. 
Command syntax: ```bash confluent connect cluster describe [flags] ``` For example: ```bash confluent connect cluster describe clcc-wzxp69 --cluster lkc-abcd123 ``` Example output: ```bash Connector Details +--------+---------------------+ | ID | clcc-wzxp69 | | Name | my-custom-connector | | Status | RUNNING | | Type | source | +--------+---------------------+ Task Level Details Task ID | State ----------+---------- 0 | RUNNING Configuration Details Config | Value ---------------------------------+---------------------------------------------------------- cloud.environment | prod cloud.provider | aws confluent.custom.plugin.id | custom-plugin-epp0ye connector.class | io.confluent.kafka.connect.datagen.DatagenConnector iterations | 10000000 kafka.api.key | **************** kafka.api.secret | **************** kafka.auth.mode | KAFKA_API_KEY kafka.endpoint | SASL_SSL://pkc-abcd5.us-west-2.aws.confluent.cloud:9092 kafka.region | us-west-2 kafka.topic | pageviews key.converter | org.apache.kafka.connect.storage.StringConverter max.interval | 100 name | custom-datagen_0 quickstart | pageviews tasks.max | 1 value.converter | org.apache.kafka.connect.json.JsonConverter value.converter.schemas.enable | false ``` #### Step 3: Create the connector configuration file Create a JSON file that contains the connector configuration properties. The following example shows required and optional connector properties: ```none { "connector.class": "AlloyDbSink", "name": "AlloyDbSinkConnector_0", "input.data.format": "AVRO", "kafka.auth.mode": "KAFKA_API_KEY", "kafka.api.key": "****************", "kafka.api.secret": "****************************************************************", "connection.host": "34.27.121.137", "connection.port": "5432", "connection.user": "postgres", "connection.password": "**************", "db.name": "postgres", "topics": "postgresql_ratings", "insert.mode": "UPSERT", "db.timezone": "UTC", "auto.create": "true", "auto.evolve": "true", "pk.mode": "record_value", "pk.fields": "user_id", "tasks.max": "1" } ``` Note the following property definitions. See the [AlloyDB Sink configuration properties](#cc-alloydb-sink-config-properties) for additional property values and definitions. * `"connector.class"`: Identifies the connector plugin name. * `"name"`: Sets a name for your new connector. * `"kafka.auth.mode"`: Identifies the connector authentication mode you want to use. There are two options: `SERVICE_ACCOUNT` or `KAFKA_API_KEY` (the default). To use an API key and secret, specify the configuration properties `kafka.api.key` and `kafka.api.secret`, as shown in the example configuration (above). To use a [service account](service-account.md#s3-cloud-service-account), specify the **Resource ID** in the property `kafka.service.account.id=`. To list the available service account resource IDs, use the following command: ```bash confluent iam service-account list ``` For example: ```bash confluent iam service-account list Id | Resource ID | Name | Description +---------+-------------+-------------------+------------------- 123456 | sa-l1r23m | sa-1 | Service account 1 789101 | sa-l4d56p | sa-2 | Service account 2 ``` * `"connection.host"`: The hostname or the IP address of the VM running the AlloyDB Auth Proxy. * `"connection.port"`: The AlloyDB database connection port. Defaults to `5432`. * `"connection.user"`: The AlloyDB database user name. * `"connection.password"`: The AlloyDB database password. * `"db.name"`: The AlloyDB database name. 
* `"input.data.format"`: Sets the input Kafka record value format (data coming from the Kafka topic). Valid entries are **AVRO**, **JSON_SR** (JSON Schema), or **PROTOBUF**. You must have Confluent Cloud Schema Registry configured if using a schema-based message format.
* `"input.key.format"`: Sets the input record key format (data coming from the Kafka topic). Valid entries are **AVRO**, **JSON_SR** (JSON Schema), **PROTOBUF**, or **STRING**. You must have Confluent Cloud Schema Registry configured if using a schema-based message format.
* `"delete.on.null"`: Whether to treat null record values as deletes. Requires `pk.mode` to be `record_key`. Defaults to `false`.
* `"topics"`: Identifies the topic name or a comma-separated list of topic names.
* `"insert.mode"`: Enter one of the following modes:
  - `INSERT`: Use the standard `INSERT` row function. An error occurs if the row already exists in the table.
  - `UPSERT`: This mode is similar to `INSERT`. However, if the row already exists, the `UPSERT` function overwrites column values with the new values provided.
* `"db.timezone"`: Name of the time zone the connector uses when inserting time-based values. Defaults to UTC.
* `"auto.create"` (tables) and `"auto.evolve"` (columns): (Optional) Sets whether to automatically create tables or columns if they are missing relative to the input record schema. If not entered in the configuration, both default to `false`. When `auto.create` is set to `true`, the connector creates a table name using `${topic}` (that is, the Kafka topic name). For more information, see [Table names and Kafka topic names](#cc-alloydb-sink-truncation-behavior) and the [AlloyDB Sink configuration properties](#cc-alloydb-sink-config-properties).
* `"pk.mode"`: Supported modes are listed below:
  - `kafka`: Kafka coordinates are used as the primary key. Must be used with the `"pk.fields"` property.
  - `none`: No primary keys used.
  - `record_key`: Fields from the record key are used. May be a primitive or a struct.
  - `record_value`: Fields from the Kafka record value are used. Must be a struct type.
* `"pk.fields"`: A list of comma-separated primary key field names. The runtime interpretation of this property depends on the `pk.mode` selected. Options are listed below:
  - `kafka`: Must be three values representing the Kafka coordinates. If left empty, the coordinates default to `__connect_topic,__connect_partition,__connect_offset`.
  - `none`: PK fields not used.
  - `record_key`: If left empty, all fields from the key struct are used. Otherwise, used to extract the desired fields. A single field name must be configured for a primitive key.
  - `record_value`: Used to extract fields from the record value. If left empty, all fields from the value struct are used.
* `"tasks.max"`: Maximum number of tasks the connector can run. See Confluent Cloud [connector limitations](limits.md#cc-alloydb-sink-limits) for additional task information.

**Single Message Transforms**: See the [Single Message Transforms (SMT)](single-message-transforms.md#cc-single-message-transforms) documentation for details about adding SMTs using the CLI.

See [Configuration Properties](#cc-alloydb-sink-config-properties) for all property values and definitions.

#### Step 3: Create the connector configuration file

Create a JSON file that contains the connector configuration properties. The following shows an example configuration.
For two additional examples, see [Configuration JSON Examples](#cc-amazon-lambda-sink-config-examples). ```none { "connector.class": "LambdaSink", "name": "LambdaSinkConnector_0", "topics": "topic_aws_lambda_1", "input.data.format": "JSON", "kafka.auth.mode": "KAFKA_API_KEY", "kafka.api.key": "****************", "kafka.api.secret": "*************************************************", "aws.access.key.id": "****************", "aws.secret.access.key": "********************************************", "aws.lambda.configuration.mode": "single", "aws.lambda.function.name": "lambda-test", "aws.lambda.invocation.type": "sync", "behavior.on.error": "fail", "tasks.max": "1" } ``` Note the following required property definitions: * `"connector.class"`: Identifies the connector plugin name. * `"name"`: Sets a name for your new connector. * `"topics"`: Identifies the topic name or a comma-separated list of topic names. * `"kafka.auth.mode"`: Identifies the connector authentication mode you want to use. There are two options: `SERVICE_ACCOUNT` or `KAFKA_API_KEY` (the default). To use an API key and secret, specify the configuration properties `kafka.api.key` and `kafka.api.secret`, as shown in the example configuration (above). To use a [service account](service-account.md#s3-cloud-service-account), specify the **Resource ID** in the property `kafka.service.account.id=`. To list the available service account resource IDs, use the following command: ```bash confluent iam service-account list ``` For example: ```bash confluent iam service-account list Id | Resource ID | Name | Description +---------+-------------+-------------------+------------------- 123456 | sa-l1r23m | sa-1 | Service account 1 789101 | sa-l4d56p | sa-2 | Service account 2 ``` * `"input.data.format"`: Sets the input Kafka record value format (data coming from the Kafka topic). Valid entries are **AVRO**, **JSON_SR** (JSON Schema), **PROTOBUF**, **JSON** (Schemaless), or **BYTES**. You must have Confluent Cloud Schema Registry configured if using a schema-based message format. #### NOTE If no schema is defined, values are encoded as plain strings. For example, `"name": "Kimberley Human"` is encoded as `name=Kimberley Human`. * `"aws.access.key.id"` and `"aws.secret.access.key"`: Enter the AWS Access Key ID and Secret. For information about how to set these up, see [Access Keys](https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys). * `"aws.lambda.configuration.mode"`: The mode in which to run the connector. Options are `multiple` to invoke multiple AWS Lambda functions or `single` (the default) to invoke a single function. One connector instance can support a maximum of 10 functions. * `"aws.lambda.function.name"`: The AWS Lambda function to invoke for `single` configuration mode. * `"aws.lambda.topic2function.map"`: A map of Kafka topics to AWS Lambda functions for `multiple` configuration mode. Enter the map as comma- separated tuples. For example: `;,;,...`. You can map a maximum of three functions to a single topic. #### NOTE The following steps show basic ACL entries for source connector service accounts. Make sure to review [Debezium [Legacy] Source Connectors](#cloud-service-account-debezium-acls) and [JDBC-based Source Connectors and the MongoDB Atlas Source Connector](#cloud-service-account-jdbc-mongo-acls) for additional ACL entries that may be required for certain connectors. 1. 
Create a service account named `myserviceaccount`:

   ```none
   confluent iam service-account create myserviceaccount --description "test service account"
   ```

2. Find the service account ID for `myserviceaccount`:

   ```none
   confluent iam service-account list
   ```

3. Set a DESCRIBE ACL to the cluster.

   ```none
   confluent kafka acl create --allow --service-account "" --operations describe --cluster-scope
   ```

4. Set a WRITE ACL to `passengers`:

   ```none
   confluent kafka acl create --allow --service-account "" --operations write --topic "passengers"
   ```

5. Create a Kafka API key and secret for ``:

   ```none
   confluent api-key create --resource "lkc-abcd123" --service-account ""
   ```

6. Save the API key and secret. The connector configuration must include either an API key and secret or a service account ID.

For additional service account information, see [Service Accounts on Confluent Cloud](../security/authenticate/workload-identities/service-accounts/overview.md#service-accounts).

## Connect a Java application to Confluent Cloud

To configure Java clients for Kafka to connect to a Kafka cluster in Confluent Cloud:

1. Add the Kafka client dependency to your project. For Maven:

   ```xml
   <dependency>
     <groupId>org.apache.kafka</groupId>
     <artifactId>kafka-clients</artifactId>
     <version>${kafka.clients.version}</version>
   </dependency>
   ```

   For Gradle:

   ```groovy
   implementation "org.apache.kafka:kafka-clients:${kafkaClientsVersion}"
   ```

   Use a current, supported version per your build’s BOM or dependency management policy.

2. Configure your Java application with the connection properties. You can obtain these from the Confluent Cloud Console by selecting your cluster and clicking **Clients**.

3. Use the configuration in your producer or consumer code:

   ```java
   Properties props = new Properties();
   props.put("bootstrap.servers", "your-bootstrap-servers");
   props.put("security.protocol", "SASL_SSL");
   props.put("sasl.mechanism", "PLAIN");
   props.put("sasl.jaas.config",
       "org.apache.kafka.common.security.plain.PlainLoginModule required username='your-api-key' password='your-api-secret';");

   // Create producer or consumer
   KafkaProducer<String, String> producer = new KafkaProducer<>(props);
   ```

4. See the [Java client examples](https://github.com/confluentinc/examples/tree/latest/clients/cloud/java) for complete working examples.

5. Integrate with your environment.

## Connect a C/C++ application to Confluent Cloud

To configure a C/C++ application using the [librdkafka client](https://github.com/edenhill/librdkafka) to connect to a Kafka cluster in Confluent Cloud:

1. **Prerequisite**: Ensure you have installed the `librdkafka` library on your system or included it in your project’s build process.

2. In your application code, create a configuration object and set the properties for connecting to Confluent Cloud.

   ```c
   #include <librdkafka/rdkafka.h>
   // ...

   rd_kafka_conf_t *conf;
   char errstr[512];

   conf = rd_kafka_conf_new();

   // Confluent Cloud bootstrap servers
   if (rd_kafka_conf_set(conf, "bootstrap.servers", "", errstr, sizeof(errstr)) != RD_KAFKA_CONF_OK) {
       fprintf(stderr, "%s\n", errstr);
       // Handle error
   }

   // Security configuration
   rd_kafka_conf_set(conf, "security.protocol", "SASL_SSL", errstr, sizeof(errstr));
   rd_kafka_conf_set(conf, "sasl.mechanisms", "PLAIN", errstr, sizeof(errstr));
   rd_kafka_conf_set(conf, "sasl.username", "", errstr, sizeof(errstr));
   rd_kafka_conf_set(conf, "sasl.password", "", errstr, sizeof(errstr));

   // See the Client Prerequisites section for details on ssl.ca.location.
   // For librdkafka v2.11 or later, it is typically not required.

   // ... create producer or consumer instance with this conf ...
   ```

3.
For complete, working projects, refer to the official [librdkafka examples directory](https://github.com/edenhill/librdkafka/tree/master/examples). ### Implement a permanent and in-line UDF ```java package io.confluent.flink.examples.table; import io.confluent.flink.plugin.ConfluentSettings; import org.apache.flink.table.api.EnvironmentSettings; import org.apache.flink.table.api.TableEnvironment; import org.apache.flink.table.functions.ScalarFunction; import org.apache.flink.table.functions.TableFunction; import java.util.List; import static org.apache.flink.table.api.Expressions.$; import static org.apache.flink.table.api.Expressions.array; import static org.apache.flink.table.api.Expressions.call; import static org.apache.flink.table.api.Expressions.row; /** * A table program example showing how to use User-Defined Functions * (UDFs) in the Flink Table API. * *
The Flink Table API simplifies the process of creating and managing UDFs.
 *
 * <ul>
 *   <li>It helps creating a JAR file containing all required dependencies for a given UDF.
 *   <li>Uploads the JAR to Confluent artifact API.
 *   <li>Creates SQL functions for given artifacts.
 * </ul>
 *
*/ public class Example_09_Functions { // Fill this with an environment you have write access to static final String TARGET_CATALOG = ""; // Fill this with a Kafka cluster you have write access to static final String TARGET_DATABASE = ""; // All logic is defined in a main() method. It can run both in an IDE or CI/CD system. public static void main(String[] args) { // Setup connection properties to Confluent Cloud EnvironmentSettings settings = ConfluentSettings.fromResource("/cloud.properties"); // Initialize the session context to get started TableEnvironment env = TableEnvironment.create(settings); // Set default catalog and database env.useCatalog(TARGET_CATALOG); env.useDatabase(TARGET_DATABASE); System.out.println("Registering a scalar function..."); // The Table API underneath creates a temporary JAR file containing all transitive classes // required to run the function, uploads it to Confluent Cloud, and registers the function // using the previously uploaded artifact. env.createFunction("CustomTax", CustomTax.class, true); // As of now, Scalar and Table functions are supported. System.out.println("Registering a table function..."); env.createFunction("Explode", Explode.class, true); // Once registered, the functions can be used in Table API and SQL queries. System.out.println("Executing registered UDFs..."); env.fromValues(row("Apple", "USA", 2), row("Apple", "EU", 3)) .select( $("f0").as("product"), $("f1").as("location"), $("f2").times(call("CustomTax", $("f1"))).as("tax")) .execute() .print(); env.fromValues( row(1L, "Ann", array("Apples", "Bananas")), row(2L, "Peter", array("Apples", "Pears"))) .joinLateral(call("Explode", $("f2")).as("fruit")) .select($("f0").as("id"), $("f1").as("name"), $("fruit")) .execute() .print(); // Instead of registering functions permanently, you can embed UDFs directly into queries // without registering them first. This will upload all the functions of the query as a // single artifact to Confluent Cloud. Moreover, the functions lifecycle will be bound to // the lifecycle of the query. System.out.println("Executing inline UDFs..."); env.fromValues(row("Apple", "USA", 2), row("Apple", "EU", 3)) .select( $("f0").as("product"), $("f1").as("location"), $("f2").times(call(CustomTax.class, $("f1"))).as("tax")) .execute() .print(); env.fromValues( row(1L, "Ann", array("Apples", "Bananas")), row(2L, "Peter", array("Apples", "Pears"))) .joinLateral(call(Explode.class, $("f2")).as("fruit")) .select($("f0").as("id"), $("f1").as("name"), $("fruit")) .execute() .print(); } /** A scalar function that calculates a custom tax based on the provided location. */ public static class CustomTax extends ScalarFunction { public int eval(String location) { if (location.equals("USA")) { return 10; } if (location.equals("EU")) { return 5; } return 0; } } /** A table function that explodes an array of string into multiple rows. */ public static class Explode extends TableFunction { public void eval(List arr) { for (String i : arr) { collect(i); } } } } ``` # Carry-over Offsets in Confluent Cloud for Apache Flink Confluent Cloud for Apache Flink® supports carry-over offsets, which means that you can use the topic offsets from one statement to start a new statement. Carry-over offsets provide a streamlined way to update Flink statements without data loss. This feature eliminates the manual complexity of copying offsets between statements and reduces the need to monitor statement status when deploying CI/CD pipelines. Automatic orchestration handles the upgrade process. 
The system automatically waits for the old statement to stop before starting the new one, providing a seamless transition of processing between statements. Carry-over offsets are available only when replacing an existing statement.

This feature enables you to [evolve statements](../concepts/schema-statement-evolution.md#flink-sql-schema-and-statement-evolution) with exactly-once semantics across the update when the statement is “stateless”, as determined by the system. At a high level, “stateless” applies to statements that can process each event independently and in any order. For other scenarios, such as aggregates, lag, windows, pattern matching, or use of an upsert sink, this feature can’t be used, because the update may cause inconsistent results.

To use carry-over offsets, add the `sql.tables.initial-offset-from` property to the statement configuration when you create your new statement.

In the Confluent Cloud Console and the Flink SQL shell, you can set the property by using the [SET](../reference/statements/set.md#flink-sql-set-statement) statement, for example:

```sql
SET 'sql.tables.initial-offset-from' = ''
```

The `` is the name of the statement that you want to use as the reference for the carry-over offsets.

If you’re using the [Statements API](/cloud/current/api.html#tag/Statements-(sqlv1)/operation/createSqlv1Statement) or the Confluent Terraform provider, you can set the property by using the [properties field](https://registry.terraform.io/providers/confluentinc/confluent/latest/docs/resources/confluent_flink_statement), for example:

```json
{
  "properties": {
    "sql.tables.initial-offset-from": ""
  }
}
```

## num.partitions

You can change the number of partitions for an existing topic (`num.partitions`) for all cluster types on a per-topic basis. You can only increase (not decrease) the `num.partitions` value after you create a topic, and you must make the increase using the `kafka-topics` script or the API. Limits vary based on Kafka cluster type. For more information, see [Kafka Cluster Types in Confluent Cloud](../clusters/cluster-types.md#cloud-cluster-types).

- Default: 6
- Editable: Yes
- Kafka REST API and Terraform Provider Support: Yes

To change the number of partitions, you can use the `kafka-topics` script that is part of the [Kafka command line tools](https://www.confluent.io/blog/using-apache-kafka-command-line-tools-confluent-cloud/) (installed with Confluent Platform), with the following command.

```none
bin/kafka-topics --bootstrap-server --command-config --alter --topic --partitions ``
```

Alternatively, you can use the [Kafka REST APIs](https://docs.confluent.io/cloud/current/api.html#tag/Topic-(v3)/operation/updatePartitionCountKafkaTopic) to change the number of partitions for an existing topic (`num.partitions`). You need the REST endpoint and the cluster ID for your cluster to make Kafka REST calls. To find this information with Cloud Console, see [Find the REST endpoint address and cluster ID](../clusters/broker-config.md#cluster-settings-console). For more on how to use the REST APIs, see [Kafka REST API Quick Start for Confluent Cloud](../kafka-rest/krest-qs.md#cloud-rest-api-quickstart). You can also use the Terraform Provider for Confluent to edit this topic setting.
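For example, a hypothetical `kafka-topics --alter` invocation that raises a Confluent Cloud topic to 12 partitions might look like the following; the bootstrap endpoint, the `client.properties` file containing your API credentials, and the topic name are placeholders, not values from this document:

```bash
# Hypothetical values; substitute your own bootstrap endpoint, client properties file, and topic name.
kafka-topics --bootstrap-server pkc-12345.us-west-2.aws.confluent.cloud:9092 \
  --command-config client.properties \
  --alter --topic orders --partitions 12
```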
For more details, sign in to the [Confluent Support Portal](https://support.confluent.io/) and search for “How to increase the partition count for a Confluent Cloud hosted topic.”

## Flags

```none
--bootstrap string                    Kafka cluster endpoint (Confluent Cloud); or comma-separated list of broker hosts (Confluent Platform), each formatted as "host" or "host:port".
--key-schema string                   The ID or filepath of the message key schema.
--schema string                       The ID or filepath of the message value schema.
--key-format string                   Format of message key as "string", "avro", "double", "integer", "jsonschema", or "protobuf". Note that schema references are not supported for Avro. (default "string")
--value-format string                 Format message value as "string", "avro", "double", "integer", "jsonschema", or "protobuf". Note that schema references are not supported for Avro. (default "string")
--references string                   The path to the message value schema references file.
--parse-key                           Parse key from the message.
--delimiter string                    The delimiter separating each key and value. (default ":")
--config strings                      A comma-separated list of configuration overrides ("key=value") for the producer client. For a full list, see https://docs.confluent.io/platform/current/clients/librdkafka/html/md_CONFIGURATION.html
--config-file string                  The path to the configuration file for the producer client, in JSON or Avro format.
--schema-registry-endpoint string     Endpoint for Schema Registry cluster.
--headers strings                     A comma-separated list of headers formatted as "key:value".
--key-references string               The path to the message key schema references file.
--api-key string                      API key.
--api-secret string                   API secret.
--schema-registry-api-key string      Schema registry API key.
--schema-registry-api-secret string   Schema registry API secret.
--cluster string                      Kafka cluster ID.
--context string                      CLI context name.
--environment string                  Environment ID.
--certificate-authority-path string   File or directory path to one or more Certificate Authority certificates for verifying the broker's key with SSL.
--username string                     SASL_SSL username for use with PLAIN mechanism.
--password string                     SASL_SSL password for use with PLAIN mechanism.
--cert-location string                Path to client's public key (PEM) used for SSL authentication.
--key-location string                 Path to client's private key (PEM) used for SSL authentication.
--key-password string                 Private key passphrase for SSL authentication.
--protocol string                     Specify the broker communication protocol as "PLAINTEXT", "SASL_SSL", or "SSL". (default "SSL")
--sasl-mechanism string               SASL_SSL mechanism used for authentication. (default "PLAIN")
--client-cert-path string             File or directory path to client certificate to authenticate the Schema Registry client.
--client-key-path string              File or directory path to client key to authenticate the Schema Registry client.
```
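These flags belong to a producer-oriented CLI command; assuming they correspond to `confluent kafka topic produce` (an assumption based on the flag set above, with a hypothetical topic name `orders`), a minimal invocation that reads keyed records from stdin might look like this:

```bash
# Assumes the flags above are for "confluent kafka topic produce"; "orders" is a hypothetical topic name.
confluent kafka topic produce orders --parse-key --delimiter ":"
# Then type records such as:
#   key1:{"item":"book","qty":2}
```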
## High-availability setup

Use these steps to configure Control Center for an Active/Active high-availability deployment.

Considerations:

- You must manually duplicate alerts in one of your Control Center instances.
- For a Confluent Ansible example of a Control Center Active/Active high-availability setup, see: [GitHub repo](https://github.com/confluentinc/cp-ansible/tree/d31730fa1b14db2833c40ad7308e89de9f96b734/docs/sample_inventories/c3-next-gen-active-active-setup)
- For a Confluent for Kubernetes (CFK) example of a Control Center Active/Active high-availability setup, see: [GitHub repo](https://github.com/confluentinc/confluent-kubernetes-examples/tree/master/control-center-next-gen/plain-active-active-setup)

To configure Control Center Active/Active high availability, use the following steps:

1. Configure two instances of Control Center for your Kafka cluster.

2. For every Kafka broker and KRaft controller, you must add and configure two HttpExporters. Consider the following example HttpExporter configurations:

```none
confluent.telemetry.exporter._c3-1.client.base.url=http://{C3-1-internal-dns-hostname}:9090/api/v1/otlp
confluent.telemetry.exporter._c3-2.client.base.url=http://{C3-2-internal-dns-hostname}:9090/api/v1/otlp
```

- Replace `{C3-1-internal-dns-hostname}` and `{C3-2-internal-dns-hostname}` with the base URLs for the corresponding Prometheus instances in your cluster.

3. For every Kafka broker and KRaft controller, add the following configurations:

```none
#common configs
confluent.telemetry.metrics.collector.interval.ms=60000
confluent.telemetry.remoteconfig._confluent.enabled=false
confluent.consumer.lag.emitter.enabled=true
metric.reporters=io.confluent.telemetry.reporter.TelemetryReporter

# instance 1 configs
confluent.telemetry.exporter._c3.type=http
confluent.telemetry.exporter._c3.enabled=true
confluent.telemetry.exporter._c3.client.base.url=http://{C3-1-internal-dns-hostname}:9090/api/v1/otlp
confluent.telemetry.exporter._c3.client.compression=gzip
confluent.telemetry.exporter._c3.api.key=dummy
confluent.telemetry.exporter._c3.api.secret=dummy
confluent.telemetry.exporter._c3.buffer.pending.batches.max=80
confluent.telemetry.exporter._c3.buffer.batch.items.max=4000
confluent.telemetry.exporter._c3.buffer.inflight.submissions.max=10
confluent.telemetry.exporter._c3.metrics.include=io.confluent.kafka.server.request.(?!.*delta).*|io.confluent.kafka.server.server.broker.state|io.confluent.kafka.server.replica.manager.leader.count|io.confluent.kafka.server.request.queue.size|io.confluent.kafka.server.broker.topic.failed.produce.requests.rate.1.min|io.confluent.kafka.server.tier.archiver.total.lag|io.confluent.kafka.server.request.total.time.ms.p99|io.confluent.kafka.server.broker.topic.failed.fetch.requests.rate.1.min|io.confluent.kafka.server.broker.topic.total.fetch.requests.rate.1.min|io.confluent.kafka.server.partition.caught.up.replicas.count|io.confluent.kafka.server.partition.observer.replicas.count|io.confluent.kafka.server.tier.tasks.num.partitions.in.error|io.confluent.kafka.server.broker.topic.bytes.out.rate.1.min|io.confluent.kafka.server.request.total.time.ms.p95|io.confluent.kafka.server.controller.active.controller.count|io.confluent.kafka.server.session.expire.listener.zookeeper.disconnects.total|io.confluent.kafka.server.request.total.time.ms.p999|io.confluent.kafka.server.controller.active.broker.count|io.confluent.kafka.server.request.handler.pool.request.handler.avg.idle.percent.rate.1.min|io.confluent.kafka.server.session.expire.listener.zookeeper.disconnects.rate.1.min|io.confluent.kafka.server.controller.unclean.leader.elections.rate.1.min|io.confluent.kafka.server.replica.manager.partition.count|io.confluent.kafka.server.controller.unclean.leader.elections.total|io.confluent.kafka.server.partition.replicas.count|io.confluent.kafka.server.broker.topic.total.produce.requests.rate.1.min|io.confluent.kafka.server.controller.offline.partitions.count|io.confluent.kafka.server.socket.server.network.processor.avg.idle.percent|io.confluent.kafka.server.partition.under.replicated|io.confluent.kafka.server.log.log.start.offset|io.confluent.kafka.server.log.tier.size|io.confluent.kafka.server.log.size|io.confluent.kafka.server.tier.fetcher.bytes.fetched.total|io.confluent.kafka.server.request.total.time.ms.p50|io.confluent.kafka.server.tenant.consumer.lag.offsets|io.confluent.kafka.server.session.expire.listener.zookeeper.expires.rate.1.min|io.confluent.kafka.server.log.log.end.offset|io.confluent.kafka.server.broker.topic.bytes.in.rate.1.min|io.confluent.kafka.server.partition.under.min.isr|io.confluent.kafka.server.partition.in.sync.replicas.count|io.confluent.telemetry.http.exporter.batches.dropped|io.confluent.telemetry.http.exporter.items.total|io.confluent.telemetry.http.exporter.items.succeeded|io.confluent.telemetry.http.exporter.send.time.total.millis|io.confluent.kafka.server.controller.leader.election.rate.(?!.*delta).*|io.confluent.telemetry.http.exporter.batches.failed # instance 2 configs confluent.telemetry.exporter._c3-2.type=http confluent.telemetry.exporter._c3-2.enabled=true confluent.telemetry.exporter._c3-2.client.compression=gzip confluent.telemetry.exporter._c3-2.api.key=dummy confluent.telemetry.exporter._c3-2.api.secret=dummy confluent.telemetry.exporter._c3-2.buffer.pending.batches.max=80 confluent.telemetry.exporter._c3-2.buffer.batch.items.max=4000 confluent.telemetry.exporter._c3-2.buffer.inflight.submissions.max=10 confluent.telemetry.exporter._c3-2.client.base.url=http://{C3-2-internal-dns-hostname}:9090/api/v1/otlp 
confluent.telemetry.exporter._c3-2.metrics.include=io.confluent.kafka.server.request.(?!.*delta).*|io.confluent.kafka.server.server.broker.state|io.confluent.kafka.server.replica.manager.leader.count|io.confluent.kafka.server.request.queue.size|io.confluent.kafka.server.broker.topic.failed.produce.requests.rate.1.min|io.confluent.kafka.server.tier.archiver.total.lag|io.confluent.kafka.server.request.total.time.ms.p99|io.confluent.kafka.server.broker.topic.failed.fetch.requests.rate.1.min|io.confluent.kafka.server.broker.topic.total.fetch.requests.rate.1.min|io.confluent.kafka.server.partition.caught.up.replicas.count|io.confluent.kafka.server.partition.observer.replicas.count|io.confluent.kafka.server.tier.tasks.num.partitions.in.error|io.confluent.kafka.server.broker.topic.bytes.out.rate.1.min|io.confluent.kafka.server.request.total.time.ms.p95|io.confluent.kafka.server.controller.active.controller.count|io.confluent.kafka.server.session.expire.listener.zookeeper.disconnects.total|io.confluent.kafka.server.request.total.time.ms.p999|io.confluent.kafka.server.controller.active.broker.count|io.confluent.kafka.server.request.handler.pool.request.handler.avg.idle.percent.rate.1.min|io.confluent.kafka.server.session.expire.listener.zookeeper.disconnects.rate.1.min|io.confluent.kafka.server.controller.unclean.leader.elections.rate.1.min|io.confluent.kafka.server.replica.manager.partition.count|io.confluent.kafka.server.controller.unclean.leader.elections.total|io.confluent.kafka.server.partition.replicas.count|io.confluent.kafka.server.broker.topic.total.produce.requests.rate.1.min|io.confluent.kafka.server.controller.offline.partitions.count|io.confluent.kafka.server.socket.server.network.processor.avg.idle.percent|io.confluent.kafka.server.partition.under.replicated|io.confluent.kafka.server.log.log.start.offset|io.confluent.kafka.server.log.tier.size|io.confluent.kafka.server.log.size|io.confluent.kafka.server.tier.fetcher.bytes.fetched.total|io.confluent.kafka.server.request.total.time.ms.p50|io.confluent.kafka.server.tenant.consumer.lag.offsets|io.confluent.kafka.server.session.expire.listener.zookeeper.expires.rate.1.min|io.confluent.kafka.server.log.log.end.offset|io.confluent.kafka.server.broker.topic.bytes.in.rate.1.min|io.confluent.kafka.server.partition.under.min.isr|io.confluent.kafka.server.partition.in.sync.replicas.count|io.confluent.telemetry.http.exporter.batches.dropped|io.confluent.telemetry.http.exporter.items.total|io.confluent.telemetry.http.exporter.items.succeeded|io.confluent.telemetry.http.exporter.send.time.total.millis|io.confluent.kafka.server.controller.leader.election.rate.(?!.*delta).*|io.confluent.telemetry.http.exporter.batches.failed ``` # Configure SASL for Control Center on Confluent Platform Many of the concepts applied here [come from the Kafka Security documentation](/platform/current/security/authentication/overview.html#kafka-sasl-auth). Reading through and understanding that documentation will be useful in configuring Control Center for SASL. The following assumes that this is for a development setup only and generically followed the [Quick Start for Confluent Platform](/platform/current/get-started/platform-quickstart.html#quickstart). While the specifics are for development purposes only, securing a production cluster follows the same concepts. ### Admin client The JavaScript Client library includes an admin client to interact with the Kafka cluster. The admin client provides several methods to manage topics, groups, and other Kafka entities. 
```js // An admin client can be created from configuration. const admin = new Kafka().admin({ 'bootstrap.servers': '', }); // Or from a producer or consumer instance. const depAdmin = producer.dependentAdmin(); await admin.connect(); await depAdmin.connect(); ``` A complete list of methods available on the admin client can be found in the [JavaScript Client API reference documentation](/platform/current/clients/confluent-kafka-javascript/docs/index.html). #### Standard API The Standard API is more performant, particularly when handling high volumes of messages. However, it requires more manual setup to use. The following example illustrates its use: ```js const producer = new Kafka.Producer({ 'bootstrap.servers': 'localhost:9092', 'dr_cb': true }); // Connect to the broker manually producer.connect(); // Wait for the ready event before proceeding producer.on('ready', () => { try { producer.produce( // Topic to send the message to 'topic', // optionally we can manually specify a partition for the message // this defaults to -1 - which will use librdkafka's default partitioner (consistent random for keyed messages, random for unkeyed messages) null, // Message to send. Must be a buffer Buffer.from('Awesome message'), // for keyed messages, we also specify the key - note that this field is optional 'Stormwind', // you can send a timestamp here. If your broker version supports it, // it will get added. Otherwise, we default to 0 Date.now(), // you can send an opaque token here, which gets passed along // to your delivery reports ); } catch (err) { console.error('A problem occurred when sending our message'); console.error(err); } }); // Any errors we encounter, including connection errors producer.on('event.error', (err) => { console.error('Error from producer'); console.error(err); }) // We must either call .poll() manually after sending messages // or set the producer to poll on an interval (.setPollInterval). // Without this, we do not get delivery events and the queue // will eventually fill up. producer.setPollInterval(100); // You can also set up the producer to poll in the background thread which is // spawned by the C code. It is more efficient for high-throughput producers. // Calling this clears any interval set in setPollInterval. producer.setPollInBackground(true); ``` To see the configuration options available to you, see the [librdkafka Configuration options](https://github.com/confluentinc/librdkafka/blob/v2.3.0/CONFIGURATION.md). #### IMPORTANT Append EntityPath= at the end of the `azure.servicebus.connection.string` ```json { "name" : "ServiceBusSourceConnector", "config" : { "connector.class" : "io.confluent.connect.azure.servicebus.ServiceBusSourceConnector", "tasks.max" : "1", "kafka.topic" : "servicebus-topic", "azure.servicebus.sas.keyname":"sas-keyname", "azure.servicebus.sas.key":"sas-key", "azure.servicebus.namespace":"namespace", "azure.servicebus.entity.name":"queue-name", "azure.servicebus.subscription" : "", "azure.servicebus.max.message.count" : "10", "azure.servicebus.max.waiting.time.seconds" : "30", "confluent.license":"", "confluent.topic.bootstrap.servers":"localhost:9092", "confluent.topic.replication.factor":"1" } } ``` Use `curl` to post the configuration to one of the Kafka Connect Workers. Change `http://localhost:8083/` the endpoint of one of your Kafka Connect worker(s). 
```bash curl -s -X POST -H 'Content-Type: application/json' --data @config.json http://localhost:8083/connectors ``` Use the following command to update the configuration of an existing connector. ```bash curl -s -X PUT -H 'Content-Type: application/json' --data @config.json http://localhost:8083/connectors/ServiceBusSourceConnector/config ``` To publish messages to the Service Bus queue, follow [Send and receive messages](https://docs.microsoft.com/en-us/azure/service-bus-messaging/service-bus-quickstart-cli#send-and-receive-messages). ```bash java -jar ./target/queuesgettingstarted-1.0.0-jar-with-dependencies.jar -c "Endpoint=sb://.servicebus.windows.net/;SharedAccessKeyName=;SharedAccessKey=;" ``` To consume records written by the connector to the configured Kafka topic, run the following command: ```bash kafka-avro-console-consumer \ --bootstrap-server localhost:9092 \ --property schema.registry.url=http://localhost:8081 \ --topic servicebus-topic \ --from-beginning ``` ### Sink Connector Configuration Start the services using the Confluent CLI: ```bash confluent local start ``` Create a configuration file named `datadog-metrics-sink-config.json` with the following contents: ```json { "name": "datadog-metrics-sink", "config": { "topics": "datadog-metrics-topic", "connector.class": "io.confluent.connect.datadog.metrics.DatadogMetricsSinkConnector", "tasks.max": "1", "key.converter": "io.confluent.connect.string.StringConverter", "key.converter.schema.registry.url": "http://localhost:8081", "value.converter": "io.confluent.connect.json.JsonConverter", "value.converter.schema.registry.url": "http://localhost:8081", "datadog.api.key": "< your-api-key >", "datadog.domain": "COM", "behavior.on.error": "fail", "confluent.topic.bootstrap.servers": "localhost:9092", "confluent.topic.replication.factor": "1" } } ``` Run this command to start the Datadog Metrics sink connector. ```bash confluent local load datadog-metrics-sink --config datadog-metrics-sink-config.json ``` To check that the connector started successfully, view the Connect worker’s log by running: ```bash confluent local services connect log ``` Produce test data to the `datadog-metrics-topic` topic in Kafka using the `kafka-avro-console-producer` command. ```bash kafka-avro-console-producer \ --broker-list localhost:9092 --topic datadog-metrics-topic \ --property value.schema='{"name": "metric","type": "record","fields": [{"name": "name","type": "string"},{"name": "type","type": "string"},{"name": "timestamp","type": "long"}, {"name": "dimensions", "type": {"name": "dimensions", "type": "record", "fields": [{"name": "host", "type":"string"}, {"name":"interval", "type":"int"}, {"name": "tag1", "type":"string"}]}},{"name": "values","type": {"name": "values","type": "record","fields": [{"name":"doubleValue", "type": "double"}]}}]}' ``` ## REST-based example Use this setting with [distributed workers](/platform/current/connect/concepts.html#distributed-workers). Write the following JSON to `config.json`, configure all of the required values, and post the configuration to one of the distributed Connect workers. For more information about the Kafka Connect REST API, see [this documentation](/platform/current/connect/references/restapi.html).
```json { "name" : "FirebaseSinkConnector", "config" : { "topics":"artists,songs", "connector.class" : "io.confluent.connect.firebase.FirebaseSinkConnector", "tasks.max" : "1", "gcp.firebase.credentials.path" : "credential path", "gcp.firebase.database.reference": "database url", "insert.mode" : "set/update/push", "key.converter" : "io.confluent.connect.avro.AvroConverter", "key.converter.schema.registry.url":"http://localhost:8081", "value.converter" : "io.confluent.connect.avro.AvroConverter", "value.converter.schema.registry.url":"http://localhost:8081", "confluent.topic.bootstrap.servers": "localhost:9092", "confluent.topic.replication.factor": "1", "confluent.license": " Omit to enable trial mode " } } ``` ### REST-based example Use this setting with [distributed workers](/platform/current/connect/concepts.html#distributed-workers). Write the following JSON to `config.json`, configure all of the required values, and use the following command to post the configuration to one of the distributed Connect workers. Check here for more information about the Kafka Connect [REST API](/platform/current/connect/references/restapi.html). ```json { "name" : "MyGithubConnector", "config" : { "connector.class" : "io.confluent.connect.github.GithubSourceConnector", "confluent.topic.bootstrap.servers": "localhost:9092", "confluent.topic.replication.factor": "1", "tasks.max" : "1", "github.service.url":"https://api.github.com", "github.access.token":"< Github-Access-Token >", "github.repositories":"apache/kafka", "github.resources":"stargazers", "github.since":"2019-01-01", "topic.name.pattern":"github-${resourceName}", "key.converter":"io.confluent.connect.avro.AvroConverter", "key.converter.schema.registry.url":"http://localhost:8081", "value.converter":"io.confluent.connect.avro.AvroConverter", "value.converter.schema.registry.url":"http://localhost:8081" } } ``` ### REST-based example This configuration is used typically along with [distributed workers](/platform/current/connect/concepts.html#distributed-workers). Write the following json to `omnisci-sink-connector.json`, configure all of the required values, and use the command below to post the configuration to one the distributed connect worker(s). Check here for more information about the Kafka Connect [REST API](/platform/current/connect/references/restapi.html) ```bash { "name" : "OmnisciSinkConnector", "config" : { "connector.class" : "io.confluent.connect.omnisci.OmnisciSinkConnector", "tasks.max" : "1", "topics": "orders", "connection.database": "omnisci", "connection.port": "6274", "connection.host": "localhost", "connection.user": "admin", "connection.password": "HyperInteractive", "confluent.topic.bootstrap.servers": "localhost:9092", "confluent.topic.replication.factor": "1", "auto.create": "true" } } ``` Use curl to post the configuration to one of the Kafka Connect workers. Change http://localhost:8083/ the endpoint of one of your Kafka Connect worker(s). Run the connector with this configuration. ```bash curl -X POST -d @omnisci-sink-connector.json http://localhost:8083/connectors -H "Content-Type: application/json" ``` Next, create a record in the `orders` topic ```bash bin/kafka-avro-console-producer \ --broker-list localhost:9092 --topic orders \ --property value.schema='{"type":"record","name":"myrecord","fields":[{"name":"id","type":"int"},{"name":"product", "type": "string"}, {"name":"quantity", "type": "int"}, {"name":"price", "type": "float"}]}' ``` The console producer is waiting for input. 
Copy and paste the following record into the terminal: ```bash {"id": 999, "product": "foo", "quantity": 100, "price": 50} ``` To verify the data in HEAVY-AI, log in to the Docker container using the following command: ```bash docker exec -it bash ``` Once you are inside the Docker container, launch omnisql: ```bash bin/omnisql ``` When prompted for a password, enter `HyperInteractive`. Finally, run the following SQL query to verify the records: ```bash omnisql> select * from orders; foo|50.0|100|999 ``` #### Avro converter example 1. Run the demo app with the `basic-auth` Spring profile. ```bash mvn spring-boot:run -Dspring.profiles.active=basic-auth ``` 2. Create a `http-sink.properties` file with the following contents: ```text name=AvroHttpSink topics=avro-topic tasks.max=1 connector.class=io.confluent.connect.http.HttpSinkConnector # key/val converters key.converter=io.confluent.connect.avro.AvroConverter value.converter=io.confluent.connect.avro.AvroConverter # licensing for local single-node Kafka cluster confluent.topic.bootstrap.servers=localhost:9092 confluent.topic.replication.factor=1 # http sink connector configs http.api.url=http://localhost:8080/api/messages auth.type=BASIC connection.user=admin connection.password=password ``` Note that you should publish Avro messages to the `avro-topic` instead of to the string messages shown in the [Quick start](#http-connector-quickstart). 3. Run and validate the connector as described in the [Quick start](#http-connector-quickstart). #### Header forwarding example 1. Run the demo app with the `basic-auth` Spring profile. ```bash mvn spring-boot:run -Dspring.profiles.active=basic-auth ``` 2. Create a `http-sink.properties` file with the following contents: ```text name=HttpSinkBasicAuth topics=http-messages tasks.max=1 connector.class=io.confluent.connect.http.HttpSinkConnector # key/val converters key.converter=org.apache.kafka.connect.storage.StringConverter value.converter=org.apache.kafka.connect.storage.StringConverter # licensing for local single-node Kafka cluster confluent.topic.bootstrap.servers=localhost:9092 confluent.topic.replication.factor=1 # connect reporter required bootstrap server reporter.bootstrap.servers=localhost:9092 reporter.result.topic.name=success-responses reporter.result.topic.replication.factor=1 reporter.error.topic.name=error-responses reporter.error.topic.replication.factor=1 # http sink connector configs http.api.url=http://localhost:8080/api/messages auth.type=BASIC connection.user=admin connection.password=password headers=Forward-Me:header_value|Another-Header:another_value ``` 3. Run and validate the connector as described in the [Quick start](#http-connector-quickstart). ### Key and topic substitution example 1. Run the demo app with the `simple-auth` Spring profile. ```bash mvn spring-boot:run -Dspring.profiles.active=simple-auth ``` 2. 
Create a `http-sink.properties` file with the following contents: ```text name=KeyTopicSubstitution topics=key-val-messages tasks.max=1 connector.class=io.confluent.connect.http.HttpSinkConnector # key/val converters key.converter=org.apache.kafka.connect.storage.StringConverter value.converter=org.apache.kafka.connect.storage.StringConverter # licensing for local single-node Kafka cluster confluent.topic.bootstrap.servers=localhost:9092 confluent.topic.replication.factor=1 # connect reporter required bootstrap server reporter.bootstrap.servers=localhost:9092 reporter.result.topic.name=success-responses reporter.result.topic.replication.factor=1 reporter.error.topic.name=error-responses reporter.error.topic.replication.factor=1 # http sink connector configs auth.type=NONE confluent.topic.bootstrap.servers=localhost:9092 reporter.bootstrap.servers=localhost:9092 confluent.topic.replication.factor=1 http.api.url=http://localhost:8080/api/messages/${topic}/${key} ``` 3. Produce a set of messages with keys and values. ```bash confluent local produce key-val-messages --property parse.key=true --property key.separator=, > 1,value > 2,another-value ``` 4. Run and validate the connector as described in the [Quick start](#http-connector-quickstart). You can run `curl localhost:8080/api/messages | jq` to see that the message keys and topics were saved. ### Delete behavior on null values example 1. Run the demo app with the `basic-auth` Spring profile. ```bash mvn spring-boot:run -Dspring.profiles.active=basic-auth ``` 2. Create a `http-sink.properties` file with the following contents: ```text name=DeleteNullHttpSink topics=http-messages tasks.max=1 connector.class=io.confluent.connect.http.HttpSinkConnector # key/val converters key.converter=org.apache.kafka.connect.storage.StringConverter value.converter=org.apache.kafka.connect.storage.StringConverter # licensing for local single-node Kafka cluster confluent.topic.bootstrap.servers=localhost:9092 confluent.topic.replication.factor=1 # connect reporter required bootstrap server reporter.bootstrap.servers=localhost:9092 reporter.result.topic.name=success-responses reporter.result.topic.replication.factor=1 reporter.error.topic.name=error-responses reporter.error.topic.replication.factor=1 # http sink connector configs http.api.url=http://localhost:8080/api/messages auth.type=BASIC connection.user=admin connection.password=password behavior.on.null.values=delete ``` 3. Publish messages to the topic that have keys and values. ```bash confluent local produce http-messages --property parse.key=true --property key.separator=, > 1,message-value > 2,another-message ``` 4. Run and validate the connector as described in the [Quick start](#http-connector-quickstart). You can check for messages in the demo API with the following command: `curl http://localhost:8080/api/messages -H 'Authorization: Basic YWRtaW46cGFzc3dvcmQ=' | jq` 5. Publish messages to the topic that have keys with null values (tombstones). Note that this cannot be done with `confluent local produce`, but there is an API in the demo app to send tombstones. ```text curl -X POST \ 'localhost:8080/api/tombstone?topic=http-messages&key=1' \ -H 'Authorization: Basic YWRtaW46cGFzc3dvcmQ=' ``` 6. Validate that the demo app deleted the messages.
```text curl http://localhost:8080/api/messages \ -H 'Authorization: Basic YWRtaW46cGFzc3dvcmQ=' | jq ``` ## Low importance `redo.log.corruption.topic` : The name of the Kafka topic where the connector records events that describe errors and corrupted portions of the Oracle redo log (missed data). This can optionally use the [template variables](overview.md#connect-oracle-cdc-source-variables) `${connectorName}`, `${databaseName}`, and `${schemaName}`. A blank topic name (the default) designates that this information is not written to Kafka. * Type: string * Default: blank * Importance: low `redo.log.consumer.fetch.min.bytes` : The minimum amount of data the server should return for a fetch request. If insufficient data is available the request waits for the minimum bytes of data to accumulate before answering the request. The default setting of `1` byte means that fetch requests are answered as soon as a single byte of data is available (the fetch request times out waiting for data to arrive). Setting this to something greater than `1` causes the server to wait for a larger amount of data to accumulate, which can improve server throughput at the cost of additional latency. * Type: int * Default: `1` * Importance: low `redo.log.consumer.max.partition.fetch.bytes` : The maximum amount of per-partition data the server will return. Records are fetched in batches by the consumer. If the first record batch (in the first fetched non-empty partition) is larger than this limit, the batch will still be returned to ensure that the consumer can make progress. (This is not an absolute maximum.) The maximum record batch size accepted by the broker is defined using `message.max.bytes` in the Kafka broker configuration or `max.message.bytes` in the topic configuration. See `fetch.max.bytes` for limiting the consumer request size. * Type: int * Default: `1048576` * Importance: low `redo.log.consumer.fetch.max.bytes` : The maximum amount of data the server should return for a fetch request. Records are fetched in batches by the consumer. If the first record batch (in the first non-empty partition of the fetch) is larger than this value, the record batch will still be returned to ensure that the consumer can make progress. (This is not an absolute maximum.) The maximum record batch size accepted by the broker is defined using `message.max.bytes` in the Kafka broker configuration or `max.message.bytes` in the topic configuration. Note that the consumer performs multiple fetches in parallel. * Type: long * Default: `52428800` * Importance: low `redo.log.consumer.max.poll.records` : The maximum number of records returned in a single call to poll(). * Type: int * Default: `500` * Importance: low `redo.log.consumer.request.timeout.ms` : Controls the maximum amount of time the client waits for the request response. If the response is not received before the timeout elapses, the client resends the request (if necessary) or fails the request if retries are exhausted. * Type: int * Default: `30000` * Importance: low `redo.log.consumer.receive.buffer.bytes` : The size of the TCP receive buffer (SO_RCVBUF) to use when reading data. If the value is `-1`, the Operating System default buffer size is used. * Type: int * Default: `65536` * Importance: low `redo.log.consumer.send.buffer.bytes` : The size of the TCP send buffer (SO_SNDBUF) to use when sending data. 
If the value is `-1`, the OS default will be used. * Type: int * Default: `131072` * Importance: low `behavior.on.dictionary.mismatch` : Specifies the desired behavior when the connector is not able to parse the value of a column due to a dictionary mismatch caused by a DDL statement. This can happen if the `online` dictionary mode is specified but the connector is streaming historical data recorded before DDL changes occurred. The default option `fail` will cause the connector task to fail. The `log` option will log the unparsable statement and skip the problematic record without failing the connector task. * Type: string * Default: `fail` * Importance: low `behavior.on.unparsable.statement` : Specifies the desired behavior when the connector encounters a SQL statement that could not be parsed. The default option `fail` will cause the connector task to fail. The `log` option will log the unparsable statement and skip the problematic record without failing the connector task. * Type: string * Default: `fail` * Importance: low `oracle.dictionary.mode` : The dictionary handling mode used by the connector. See [Address DDL Changes in Oracle Database for Confluent Platform](ddl-changes.md#connect-oracle-ddl-changes) for more information. * Type: string * Default: `auto` * Valid Values: `auto`, `online`, or `redo_log` - `auto`: The connector uses the dictionary from the online catalog until a DDL statement to evolve the table schema is encountered, at which point the connector starts using the dictionary from archived redo logs. Once the DDL statement has been processed, the connector reverts to using the online catalog. Use this mode if DDL statements are expected. - `online`: The connector always uses the online dictionary catalog. Use `online` mode if no DDL statements are expected, as DDL statements aren’t supported in `online` mode. - `redo_log`: The connector always uses the dictionary catalog from archived redo logs. Use this mode if you cannot access the online redo log. * Importance: low `log.mining.archive.destination.name` : The name of the archive log destination to use when mining archived redo logs. You can configure the connector to use a specific destination using this property (for example, LOG_ARCHIVE_DEST_2). This is only applicable for Oracle database versions 19c and later. * Type: string * Default: “” * Importance: low `record.buffer.mode` : Where to buffer records that are part of the transaction but may not yet be committed. #### IMPORTANT Database record buffering mode is not supported in Oracle Database version 19c and later. Use connector record buffering mode instead. * Type: string * Default: `connector` * Valid Values: `connector`, or `database` - `connector`: buffer uncommitted transactions in connector memory. - `database`: buffer the records in the database. This option uses the `COMMITTED_DATA_ONLY` flag to LogMiner so that the connector only receives committed records. Transactions that are rolled back or in-progress are filtered out, as are internal redo records. Use this option if the worker where the redo log processing task (task 0) is running is memory constrained and you would rather do buffering in the database. Note, though, that this option increases the database memory usage to stage all redo records within a single transaction in memory until LogMiner finds the commit record for that transaction. Therefore, it is possible to exhaust memory. * Importance: low `max.batch.timeout.ms` : The maximum time to wait for a record before returning an empty batch.
Must be at least 1000 milliseconds (one second). The default is 60000 milliseconds (one minute). #### IMPORTANT - Not in active support. Use poll.linger.ms instead. * Type: long * Default: `60000` * Importance: low `max.buffer.size` : The maximum number of records from all snapshot threads and from the redo log that can be buffered into batches. The default `0` means a buffer size will be computed from the maximum batch size (`max.batch.size`) and number of threads (`snapshot.threads.per.task`). * Type: int * Default: `0` * Importance: low `lob.topic.name.template` : The template that defines the name of the Kafka topic where the connector writes LOB objects. The value can be a constant if the connector writes all LOB objects from all captured tables to one topic. Or, the value can include any supported [template variables](overview.md#connect-oracle-cdc-source-variables) (for example, `${columnName}`, `${databaseName}`, `${schemaName}`, `${tableName}`, and `${connectorName}`). The default is empty, which ignores all LOB type columns if any exist on captured tables. Special-meaning characters `\`, `$`, `{`, and `}` must be escaped with `\` when not intended to be part of a template variable. Any character that is not a valid character for topic name is replaced by an underscore in the topic name. * Type: string * Default: “” * Valid Values: [Template variables](overview.md#connect-oracle-cdc-source-variables) which resolve to a valid topic name or a blank string. Valid topic names consist of `1` to `249` alphanumeric, `+`, `.`, `_`, `\`, and `-` characters. * Importance: low `enable.large.lob.object.support` : If `true`, the connector will support large LOB objects that are split across multiple redo log records. The connector will emit commit messages to the redo log topic and use these commit messages to track when a large LOB object can be emitted to the LOB topic. * Type: boolean * Default: false * Importance: low `log.sensitive.data` : If `true`, logs sensitive data (such as customer records, SQL queries or exception traces containing sensitive data). Set this to true only in exceptional scenarios where logging sensitive data is acceptable and is necessary for troubleshooting. * Type: boolean * Default: false * Importance: low `numeric.mapping` : Map NUMERIC values by precision and optionally scale to primitive or decimal types. * Use `none` if all NUMERIC columns are to be represented by Connect’s DECIMAL logical type. * Use `best_fit_or_decimal` if NUMERIC columns should be cast to Connect’s primitive type based upon the column’s precision and scale. If the precision and scale exceed the bounds for any primitive type, Connect’s DECIMAL logical type will be used instead, and the values will be represented in binary form within the change events. * Use `best_fit_or_double` if NUMERIC columns should be cast to Connect’s primitive type based upon the column’s precision and scale. If the precision and scale exceed the bounds for any primitive type, Connect’s FLOAT64 type will be used instead. * Use `best_fit_or_string` if NUMERIC columns should be cast to Connect’s primitive type based upon the column’s precision and scale. If the precision and scale exceed the bounds for any primitive type, Connect’s STRING type will be used instead. * Use `precision_only` to map NUMERIC columns based only on the column’s precision assuming that column’s scale is 0. The `none` option is the default, but may lead to serialization issues since Connect’s DECIMAL type is mapped to its binary representation. 
One of the `best_fit_or` options will often be preferred. For backwards compatibility reasons, the `best_fit` option is also available. It behaves the same as `best_fit_or_decimal`. Updating this property requires deleting the table topic and the registered schemas if you are using a non-JSON `value.converter`. * Type: string * Default: `none` * Importance: low `numeric.default.scale` : The default scale to use for numeric types when the scale cannot be determined. * Type: int * Default: 127 * Importance: low `oracle.date.mapping` : Map Oracle DATE values to Connect types. * Use `date` if all the `DATE` columns are to be represented by Connect’s Date logical type. * Use `timestamp` if the `DATE` columns should be cast to Connect’s Timestamp. The `date` option is the default value for backward compatibility. Despite the name similarity, the Oracle `DATE` type has different semantics than Connect Date. `timestamp` will often be preferred for semantic similarity. * Type: string * Default: `date` * Importance: low `emit.tombstone.on.delete` : If true, delete operations emit a tombstone record with null value. * Type: boolean * Default: false * Importance: low `oracle.fan.events.enable` : Whether the connection should allow using Oracle RAC Fast Application Notification (FAN) events. This is disabled by default, meaning FAN events will not be used even if they are supported by the database. You should only enable this feature when using an Oracle RAC setup with FAN events. Enabling the feature may cause connection issues when the database is not set up to use FAN events. * Type: boolean * Default: false * Importance: low `table.task.reconfig.checking.interval.ms` : The interval for the background monitoring thread to examine changes to tables and reconfigure table placement if necessary. The default is 300000 milliseconds (5 minutes). * Type: long * Default: `300000` * Importance: low `table.rps.logging.interval.ms` : The interval for the background thread to log current requests per second (RPS) for each table. * Type: long * Default: `60000` * Importance: low `log.mining.end.scn.deviation.ms` : Calculates the end SCN of log mining sessions as the approximate SCN that corresponds to the point in time that is `log.mining.end.scn.deviation.ms` milliseconds before the current SCN obtained from the database. The default value is set to 3 seconds on RAC environments, and is not set otherwise. This configuration is applicable only for Oracle database versions 19c and later. Setting this configuration to a lower value on a RAC environment introduces the potential for data loss at high load. A higher value increases the end-to-end latency for change events. * Type: long * Default: `0` for single node and `3000` for RAC environments * Importance: low `output.before.state.field` : The name of the field in the change record written to Kafka that contains the before state of changed database rows for an update operation. A blank value signals that this field should not be included in the change records. For more details, see [Before state for update operation](overview.md#before-state-for-update-operation). * Type: string * Default: “” * Importance: low `output.table.name.field` : The name of the field in the change record written to Kafka that contains the fully-qualified name of the affected Oracle table. A blank value signals that this field should not be included in the change records.
Use unescaped `.` characters to designate nested fields within structs, or prefix with `header:` to write the fully-qualified name of the affected Oracle table as a header with the given name. * Type: string * Default: `table` * Importance: low `output.scn.field` : The name of the field in the change record written to Kafka that contains the Oracle System Change Number (SCN) where this change was made. A blank value indicates the field should not be included in the change records. Use unescaped `.` characters to designate nested fields within structs, or prefix with `header:` to write the SCN as a header with the given name. * Type: string * Default: `scn` * Importance: low `output.commit.scn.field` : The name of the field in the change record written to Kafka that contains the Oracle System Change Number (SCN) when this transaction was committed. An empty value indicates that the field should not be included in the change records. * Type: string * Default: “” * Importance: low `output.op.type.field` : The name of the field in the change record written to Kafka that contains the operation type for this change event. A blank value indicates the field should not be included in the change records. Use unescaped `.` characters to designate nested fields within structs, or prefix with `header:` to write the operation type as a header with the given name. * Type: string * Default: `op_type` * Importance: low `output.op.ts.field` : The name of the field in the change record written to Kafka that contains the operation timestamp for the change event. A blank value indicates the field should not be included in the change records. Use unescaped `.` characters to designate nested fields within structs, or prefix with `header:` to write the operation timestamp as a header with the given name. * Type: string * Default: `op_ts` * Importance: low `output.current.ts.field` : The name of the field in the change record written to Kafka that contains the current timestamp of the Kafka Connect worker when this change event was processed. A blank value indicates the field should not be included in the change records. Use unescaped `.` characters to designate nested fields within structs, or prefix with `header:` to write the current timestamp as a header with the given name. * Type: string * Default: `current_ts` * Importance: low `output.row.id.field` : The name of the field in the change record written to Kafka that contains the row ID of the changed row. A blank value indicates the field should not be included in the change records. Use unescaped `.` characters to designate nested fields within structs, or prefix with `header:` to write the row ID as a header with the given name. * Type: string * Default: `row_id` * Importance: low `output.username.field` : The name of the field in the change record written to Kafka that contains the name of the Oracle user that executed the transaction. A blank value indicates the field should not be included in the change records. Use unescaped `.` characters to designate nested fields within structs, or prefix with `header:` to write the username as a header with the given name. * Type: string * Default: `username` * Importance: low `output.redo.field` : The name of the field in the change record written to Kafka that contains the original redo data manipulation language (DML) statement from which this change record was created. A blank value indicates the field should not be included in the change records.
Use unescaped `.` characters to designate nested fields within structs, or prefix with `header:` to write the redo statement as a header with the given name. * Type: string * Default: “” * Importance: low `output.undo.field` : The name of the field in the change record written to Kafka that contains the original undo DML statement that effectively undoes this change and represents the “before” state of the row. A blank value indicates the field should not be included in the change records. Use unescaped `.` characters to designate nested fields within structs, or prefix with `header:` to write the undo statement as a header with the given name. * Type: string * Default: “” * Importance: low `output.op.type.read.value` : The value of the operation type for a read (snapshot) change event. By default this is `R` (read). * Type: string * Default: `R` * Importance: low `output.op.type.insert.value` : The value of the operation type for an insert change event. By default this is `I` (insert). * Type: string * Default: `I` * Importance: low `output.op.type.update.value` : The value of the operation type for an update change event. By default this is `U` (update). * Type: string * Default: `U` * Importance: low `output.op.type.delete.value` : The value of the operation type for a delete change event. By default this is `D` (delete). * Type: string * Default: `D` * Importance: low `output.op.type.truncate.value` : The value of the operation type for a truncate change event. By default this is `T` (truncate). * Type: string * Default: `T` * Importance: low `redo.log.startup.polling.limit.ms` : The amount of time to wait for the redo log to be present on connector startup. This is only relevant when the connector is configured to capture change events. When this wait time expires, the connector moves to a failed state. * Type: long * Default: 300000 * Importance: low `snapshot.by.table.partitions` : Whether the connector should perform snapshots on each table partition if the table is defined to use partitions. This is `false` by default, meaning that one snapshot is performed on each table in its entirety. * Type: boolean * Default: false * Importance: low `oracle.validation.result.fetch.size` : The fetch size to be used while querying the database for validations. This is used to query the list of tables and for supplemental logging level validation. * Type: int * Default: `5000` * Importance: low `redo.log.row.poll.fields.include` : A comma-separated list of fields from the V$LOGMNR_CONTENTS view to include in the redo log events. * Type: list * Importance: low `redo.log.row.poll.fields.exclude` : A comma-separated list of fields from the V$LOGMNR_CONTENTS view to exclude from the redo log events. * Type: list * Importance: low `redo.log.row.poll.username.include` : A comma-separated list of database usernames. When this property is set, the connector captures changes only from the specified set of database users. You cannot set this property along with the `redo.log.row.poll.username.exclude` property. * Type: list * Importance: low `redo.log.row.poll.username.exclude` : A comma-separated list of database usernames. When this property is set, the connector does not capture changes from the specified set of database users. You cannot set this property along with the `redo.log.row.poll.username.include` property. * Type: list * Importance: low `db.timezone` : Default timezone to assume when parsing Oracle `DATE` and `TIMESTAMP` types for which timezone information is not available.
For example, if `db.timezone=UTC`, the data in both `DATE` and `TIMESTAMP` will be parsed as if in UTC timezone. The value has to be a valid java.util.TimeZone ID. * Type: string * Default: UTC * Importance: low `db.timezone.date` : The default timezone to assume when parsing Oracle `DATE` type for which timezone information is not available. If `db.timezone.date` is set, the value of `db.timezone` for `DATE` type will be overwritten with the value in `db.timezone.date`. For example, if `db.timezone=UTC` and `db.timezone.date=America/Los_Angeles`, the data `TIMESTAMP` will be parsed as if it is in UTC timezone, and the data in `DATE` will be parsed as if in America/Los_Angeles timezone. The value has to be a valid `java.util.TimeZone` ID. * Type: string * Importance: low `oracle.supplemental.log.level` : Database supplemental logging level for connector operation. If set to `full`, the connector validates the supplemental logging level on the database is FULL and then captures snapshots and CDC events for the specified tables whenever `table.topic.name.template` is not set to `""`. When the level is set to `msl`, the connector doesn’t capture the CDC change events, rather it only captures snapshots if `table.topic.name.template` is not set to `""`. Note that this setting is ignored if the `table.topic.name.template` is set to `""` as the connector will only capture redo logs. This setting defaults to `full` supplemental logging level mode. * Type: string * Default: full * Valid Values: [msl, full] * Importance: low `ldap.url` : The connection URL of LDAP server if using OID based LDAP. * Type: string * Importance: low `ldap.security.principal` : The login principal or user if using SIMPLE Authentication for LDAP. * Type: string * Importance: low `ldap.security.credentials` : The login password for principal if using SIMPLE Authentication for LDAP. * Type: string * Importance: low `oracle.ssl.truststore.file` : If using SSL for encryption and server authentication, set this to the location of the trust store containing server certificates that should be trusted. * Type: string * Default: “” * Importance: low `oracle.ssl.truststore.password` : If using SSL for encryption and server authentication, the password of the trust store containing server certificates that should be trusted. * Type: string * Default: “” * Importance: low `oracle.kerberos.cache.file` : If using Kerberos 5 authentication, set this to the location of the Kerberos 5 ticket cache file on all the Connect workers. * Type: string * Default: “” * Importance: low `log.sensitive.data` : If set to `true`, the connector logs sensitive data–such as customer records, SQL queries, and exception traces containing sensitive data. Confluent recommends you set this parameter to `true` only in cases where logging sensitive data is acceptable and necessary for troubleshooting. * Type: boolean * Default: false * Importance: low `retry.error.codes` : A comma-separated list of Oracle error codes (for example, `12505, 12528,...`) that the connector retries up to the time defined by the `max.retry.time.ms` parameter. By default, the connector retries in case of a recoverable or transient SQL exception and on certain Oracle error codes. * Type: list * Importance: low `enable.metrics.collection` : If set to `true`, the connector records metrics that can be used to gain insight into the connector and troubleshoot issues. These metrics can be accessed using Java Management Extensions (JMX). 
* Type: boolean * Default: false * Importance: low #### NOTE Note the following: * `salesforce.consumer.key` and `salesforce.consumer.secret` are required properties used for OAuth2 secure authentication by Salesforce.com. Additional information and tutorials are available at [salesforce.com](https://developer.salesforce.com/docs/atlas.en-us.api_streaming.meta/api_streaming/code_sample_auth_oauth.htm). * Change the `confluent.topic.bootstrap.servers` property to include your broker address(es) and change the `confluent.topic.replication.factor` to `3` for staging or production use. * Set the following connector configuration properties to enable OAuth JWT bearer token support: - `salesforce.username` - `salesforce.consumer.key` - `salesforce.jwt.keystore.path` - `salesforce.jwt.keystore.password` Run the connector with this configuration. ```bash confluent local load SalesforceCdcSourceConnector --config salesforce-cdc-source.properties ``` Confirm that the connector is in a `RUNNING` state. ```bash confluent local status SalesforceCdcSourceConnector ``` ## Quick Start In this quick start, the Salesforce Bulk API Source connector is used to import data from Salesforce to Kafka. Use the following steps to get started: 1. Create a Salesforce developer account using this [link](https://developer.salesforce.com/signup) if you don’t already have one. 2. Add records to the objects by clicking App Launcher and selecting the required Salesforce object. 3. Install the connector by running the following command from your Confluent Platform installation directory: ```bash confluent connect plugin install confluentinc/kafka-connect-salesforce-bulk-api:latest ``` Note that by default, the command installs the plugin into the `share/confluent-hub-components` directory and adds the directory to the plugin path. For the plugin path change to take effect, you must restart the Connect worker. 4. Start the services using the Confluent CLI. ```bash confluent local start ``` Every service starts in order, printing a message with its status. Note also that the `SalesforceBulkApiSourceConnector` supports a single task only. ```bash Starting Zookeeper Zookeeper is [UP] Starting Kafka Kafka is [UP] Starting Schema Registry Schema Registry is [UP] Starting Kafka REST Kafka REST is [UP] Starting Connect Connect is [UP] Starting KSQL Server KSQL Server is [UP] Starting Control Center Control Center is [UP] ``` ## Quick start The quick start guide uses the ServiceNow Source connector to consume records from a ServiceNow table and send them to Kafka. This guide assumes a multi-tenant environment is used. For local testing, refer to [Running Connect in standalone mode](/kafka-connectors/self-managed/userguide.html#configuring-and-running-workers). 1. Install the connector through the [Confluent Hub Client](https://docs.confluent.io/current/connect/managing/confluent-hub/client.html). ```bash # run from your confluent platform installation directory confluent-hub install confluentinc/kafka-connect-servicenow:latest ``` 2. Start the Confluent Platform. ```bash confluent local start ``` 3. Check the status of all services. ```bash confluent local services status ``` 4.
Create a `servicenow-source.json` file with the following contents: ```bash // substitute <> with your config { "name": "ServiceNowSourceConnector", "config": { "connector.class": "io.confluent.connect.servicenow.ServiceNowSourceConnector", "kafka.topic": "topic-servicenow", "servicenow.url": "https://.service-now.com/", "tasks.max": "1", "servicenow.table": "", "servicenow.user": "", "servicenow.password": "", "servicenow.since": "2019-01-01", "key.converter": "org.apache.kafka.connect.json.JsonConverter", "value.converter": "org.apache.kafka.connect.json.JsonConverter", "confluent.topic.bootstrap.servers": ":9092", "confluent.license": "", // leave it empty for evaluation license "poll.interval.s": "10", "confluent.topic.replication.factor": "1" } } ``` 5. Load the ServiceNow Source connector by posting configuration to Connect REST server. ```bash confluent local load servicenow --config servicenow-source.json ``` 6. Confirm that the connector is in a `RUNNING` state. ```bash confluent local status ServiceNowSourceConnector ``` 7. Create one record to ServiceNow. ```bash curl -X POST \ https://.service-now.com/api/now/table/ \ -H 'Accept: application/json' \ -H 'Authorization: Basic ' -H 'Content-Type: application/json' \ -H 'cache-control: no-cache' \ -d '{"short_description": "This is test"}' ``` 8. Confirm the messages were delivered to the `topic-servicenow` topic in Kafka. ```bash confluent local consume topic-servicenow --from-beginning ``` ## Create Kafka topic You can create a topic using a KafkaTopic CR in an on-prem or Confluent Cloud Kafka cluster: ```yaml kind: KafkaTopic metadata: name: --- [1] namespace: --- [2] spec: replicas: partitionCount: kafkaClusterRef: --- [3] kafkaRestClassRef: --- [4] kafkaRest: endpoint: --- [5] kafkaClusterID: --- [6] authentication: type: --- [7] basic: --- [8] bearer: --- [9] oauth: --- [10] configs: --- [11] ``` * [1] The topic name. If both `metadata.name` and `spec.name` are specified, `spec.name` is used. * [2] The namespace for the topic. * Use `kafkaClusterRef` ([3]), `kafkaRestClassRef` ([4]), or `kafkaRest.endpoint` ([5]) to explicitly specify the Confluent REST Class. The order of precedence is [4], [5], and [3]. If none of the above is set, it performs an auto discovery of the Kafka in the same namespace. * [3] Name of the Kafka cluster. * [4] Name of the KafkaRestClass CR. * [5] Confluent REST Class endpoint. See [Manage Confluent Admin REST Class for Confluent Platform Using Confluent for Kubernetes](co-manage-rest-api.md#co-manage-rest-api) for more information. * [6] ID of the Kafka cluster. Required when creating a topic in Confluent Cloud. * [7] If authentication is required for the Confluent Admin REST Class, specify the authentication type. `basic`, `bearer`, `mtls`, and `oauth` are supported. If you specified the Confluent Admin REST Class using `kafkaRestClassRef`, you do not have to set the authentication in `kafkaRest`. Otherwise specify the authentication in `kafkaRest`. * [8] For information about the basic settings, see [Basic authentication](co-authenticate-cp.md#co-authenticate-cp-basic). * [9] For information about the bearer settings, see [Bearer authentication](co-authenticate-kafka.md#co-authenticate-mds-bearer), * [10] For information about the OAuth settings, see [OAuth/OIDC authentication](co-authenticate-cp.md#co-authenticate-cp-oauth). * [11] Specify additional topic configuration settings in key and value pairs, for example, `cleanup.policy: "compact"`. 
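As a concrete illustration of the KafkaTopic CR template above, the following sketch creates a compacted topic against an on-prem Kafka cluster. It is only an example under stated assumptions: the topic name `orders`, the namespace `confluent`, and the Kafka cluster CR name `kafka` are placeholders, and the `platform.confluent.io/v1beta1` API version is the one generally used by CFK custom resources.

```bash
# Hypothetical KafkaTopic CR; the names and namespace below are placeholders,
# not values required by CFK.
kubectl apply -f - <<'EOF'
apiVersion: platform.confluent.io/v1beta1
kind: KafkaTopic
metadata:
  name: orders                    # [1] topic name
  namespace: confluent            # [2] namespace
spec:
  replicas: 3
  partitionCount: 6
  kafkaClusterRef:
    name: kafka                   # [3] name of the Kafka cluster CR
  configs:
    cleanup.policy: "compact"     # [11] additional topic configuration
EOF

# Confirm the topic CR was created and reconciled.
kubectl get kafkatopic orders --namespace confluent
```

Because only `kafkaClusterRef` is set here, CFK resolves the target cluster by name rather than through a KafkaRestClass or an explicit REST endpoint.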
For the list of available topics configuration parameters, see [Kafka Topics Configurations](https://docs.confluent.io/platform/current/installation/configuration/topic-configs.html). ## Configure external access to other Confluent Platform components using node ports To configure other Confluent components with node ports: 1. Set the following in the component CRs and apply the configuration using the `kubectl apply -f` command: ```yaml spec: externalAccess: type: nodePort nodePort: nodePortOffset: --- [1] host: --- [2] sessionAffinity: --- [3] sessionAffinityConfig: --- [4] clientIP: timeoutSeconds: --- [5] configOverrides: server: - advertised.listeners= --- [6] ``` The access endpoint of each Confluent Platform component will be: `:` * [1] Required. The value should be in the range between 30000 and 32767, inclusive. If you change this value on a running cluster, you must roll the cluster. * [2] Required. Specify the FQDN that will be used to configure all advertised listeners. If you change this value on a running cluster, you must roll the cluster. * [3] Required for consumer REST Proxy to enable client IP-based session affinity. For REST Proxy to be used for Kafka consumers, set to `ClientIP`. See [Kubernetes Service](https://kubernetes.io/docs/concepts/services-networking/service/#virtual-ips-and-service-proxies) for more information about session affinity. * [4] Contains the configurations of session affinity if set `sessionAffinity: ClientIP` in [3]. * [5] Specifies the seconds of `ClientIP` type session sticky time. The value must be bigger than `0` and less than or equal to `86400` (1 day). Default value is `10800` (3 hours). * [6] Set to the external DNS name used for node port. This configuration is used to generate absolute URLs in V3 responses. The HTTP and HTTPS protocols are supported. 2. Create firewall rules to allow connections at the NodePort range that you plan to use. For the steps to create firewall rules, see [Using Google Cloud firewall rules](https://cloud.google.com/vpc/docs/using-firewalls). 3. Verify the NodePort services are correctly created by listing the services in the namespace using the following command: ```bash kubectl get services -n | grep NodePort ``` For a tutorial scenario on configuring external access using NodePort, see the [quickstart tutorial for using node port](https://github.com/confluentinc/confluent-kubernetes-examples/tree/master/networking/external-access-nodeport-deploy). ## Configure external access to other Confluent Platform components using routes The external clients can connect to other Confluent Platform components using routes. The access endpoint of each Confluent Platform component is: `.:443` For example, in the `example.com` domain with TLS enabled, you access the Confluent Platform components at the following endpoints: * `https://connect.example.com:443` * `https://replicator.example.com:443` * `https://schemaregistry.example.com:443` * `https://ksqldb.example.com:443` * `https://controlcenter.example.com:443` **To allow external access to Confluent components using routes:** 1. Enable TLS for the component as described [Configure Network Encryption for Confluent Platform Using Confluent for Kubernetes](co-network-encryption.md#co-network-encryption). 2. Set the following in the component custom resource (CR) and apply the configuration: ```yaml spec: externalAccess: type: route route: domain: --- [1] prefix: --- [2] wildcardPolicy: --- [3] annotations: --- [4] ``` * [1] Required. 
Set `domain` to the domain name of your Kubernetes cluster. If you change this value on a running cluster, you must roll the cluster. * [2] Optional. Set `prefix` to change the default route prefixes. The default is the component name, such as `controlcenter`, `connector`, `replicator`, `schemaregistry`, `ksql`. The value is used for the DNS entry. The component DNS name becomes `.`. If not set, the default DNS name is `.`, for example, `controlcenter.example.com`. You may want to change the default prefixes for each component to avoid DNS conflicts when running multiple Kafka clusters. If you change this value on a running cluster, you must roll the cluster. * [3] Optional. It defaults to `None` if not configured. Allowed values are `Subdomain` and `None`. * [4] Required for REST Proxy to be used for Kafka consumers. Otherwise, optional. OpenShift routes support cookie-based sticky sessions by default. To use the client IP-based session affinity that REST Proxy requires: * Disable cookies by setting the annotation `haproxy.router.openshift.io/disable_cookies: true` * Enable source IP-based load balancing by setting the annotation `haproxy.router.openshift.io/balance: source` 3. Apply the configuration: ```bash oc apply -f ``` 4. Add a DNS entry for each Confluent Platform component that you added a route to. Once the routes are created, you add a DNS entry associated with component routes to your DNS table (or whatever method you use to get DNS entries recognized by your provider environment). You need the following to derive Confluent Platform component DNS entries: * The domain name of your OpenShift cluster as set in Step #1. * External IP of the OpenShift router load balancer * The component `prefix` if set in Step #1 above. Otherwise, the default component name. A DNS name is made up of the `prefix` and the `domain` name. For example, `controlcenter.example.com`. To add DNS entries for Confluent components: 1. Get the IP address of the OpenShift router load balancer. The HAProxy load balancer serves as the router for route services, and generally, HAProxy runs in the `openshift-ingress` namespace. ```bash oc get svc --namespace openshift-ingress ``` An example output: ```text NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE router-default LoadBalancer 172.30.84.52 20.189.181.8 80:31294/TCP,443:32145/TCP 42h router-internal-default ClusterIP 172.30.184.233 80/TCP,443/TCP,1936/TCP 42h ``` 2. Get the component DNS names: ```bash oc get routes | awk '{print $2}' ``` 3. Use the external IP to point to all the DNS names for the Confluent components in your DNS service provider. The example below shows the DNS table entry, using: * Domain: `example.com` * Default component prefixes * Load balancer router external IP: `20.189.181.8` ```text 20.189.181.8 connect.example.com controlcenter.example.com ksqldb.example.com schemaregistry.example.com ``` 5. [Validate the connections](#co-routes-validate). ## Set up a schema exporter When the Schema Registry is running, set up a schema exporter on the Confluent Platform Schema Registry to sync the Confluent Platform Schema Registry to the Confluent Cloud Schema Registry. For details about the SchemaExporter CR, see [Create schema exporter](co-link-schemas.md#co-create-schema-exporter). 1. Set up a SchemaExporter CR and apply it on the Confluent Platform Schema Registry.
For more information about the SchemaExporter CR properties, see [Create schema exporter](co-link-schemas.md#co-create-schema-exporter). ```yaml kind: SchemaExporter metadata: name: --- [1] spec: sourceCluster: schemaRegistryClusterRef: name: --- [2] namespace: --- [3] destinationCluster: schemaRegistryRest: endpoint: --- [4] authentication: type: --- [5] basic: secretRef: --- [6] subjects: --- [7] contextName: --- [8] ``` * [1] Required. The name of the schema exporter. This name must match the `exporterName` value in the SchemaImporter CR (`SchemaImporter.spec.exporterName`) you set in [Set up a schema importer](#co-setup-schema-importer). * [2] The name of the source Schema Registry cluster. * [3] The namespace of the source Schema Registry cluster. * [4] The endpoint of the Confluent Cloud Schema Registry. * [5] The authentication type of the Confluent Cloud Schema Registry. The supported type is `basic`. * [6] The API key of the Confluent Cloud Schema Registry. * [7] The subjects to export. The default value of subjects is `"*"`, which denotes all subjects in the default context. To export all subjects from Confluent Platform to Confluent Cloud, you can set the subjects to `:*:` and contextType as `NONE`. You can choose to create multiple exporters and importers if using Scenario 2 listed in [Use Schema Registry in a hybrid setup](http://docs.confluent.io/platform/current/usm/usm-schema.html#configuration-scenarios). * [8] The context name of the Schema Registry cluster. Use an empty string for the default context. This value must match `SchemaRegistry.spec.unifiedStreamManager.context` for the Unified Stream Manager Schema Registry to work as expected. An example SchemaExporter CR snippet: ```yaml apiVersion: platform.confluent.io/v1beta1 kind: SchemaExporter metadata: name: schema-exporter namespace: my-namespace-1 spec: sourceCluster: schemaRegistryClusterRef: name: schemaregistry namespace: my-namespace-2 destinationCluster: schemaRegistryRest: endpoint: https:// authentication: type: basic basic: secretRef: cc-sr-credential subjects: [":*:"] contextName: my-context ``` 2. Wait until the exporter enters the RUNNING state, which confirms that schemas are synced; in-flight changes are handled automatically. 3. Ensure that the exporter destination URL and any forwarding settings match those used by the forwarder/Unified Stream Manager configuration. ## Upgrade CFK 1. Review [Upgrade considerations and troubleshooting](co-upgrade-overview.md#co-upgrade-considerations-cfk) and address any required steps. 2. If you are upgrading from CFK 2.x to 3.x to deploy and manage Confluent Platform 7.x: * For Log4J, set the annotation on the components that should use Log4J: ```bash kubectl annotate \ platform.confluent.io/use-log4j1=true \ --namespace ``` The `platform.confluent.io/use-log4j1=true` annotation is required to use Confluent Platform 7.x with CFK 3.0+. * To use the JAAS class path compatible with Confluent Platform 7.x in basic authentication, set the `platform.confluent.io/use-old-jetty9=true` annotation on your Confluent Platform component CRs, such as Control Center, Control Center (Legacy), Schema Registry, Connect, ksqlDB, and REST Proxy: ```bash kubectl annotate \ platform.confluent.io/use-old-jetty9=true \ --namespace ``` If you do not set the annotation properly, you will not be able to log into the Confluent Platform 7.x components using basic authentication, and you will get a login prompt loop when you try to log in to Control Center.
For more information, see [Issue: JAAS class path discrepancy between CFK 3.0 and Confluent Platform 7.x](co-troubleshooting.md#co-jaas-class-change). 3. Disable resource reconciliation. To prevent Confluent Platform components from rolling restarts, temporarily disable resource reconciliation of the components in each namespace where you have deployed Confluent Platform, specifying the CR kinds and CR names: ```bash kubectl annotate connect connect \ platform.confluent.io/block-reconcile=true \ --namespace ``` ```bash kubectl annotate controlcenter controlcenter \ platform.confluent.io/block-reconcile=true \ --namespace ``` ```bash kubectl annotate kafkarestproxy kafkarestproxy \ platform.confluent.io/block-reconcile=true \ --namespace ``` ```bash kubectl annotate kafka kafka \ platform.confluent.io/block-reconcile=true \ --namespace ``` ```bash kubectl annotate ksqldb ksqldb \ platform.confluent.io/block-reconcile=true \ --namespace ``` ```bash kubectl annotate schemaregistry schemaregistry \ platform.confluent.io/block-reconcile=true \ --namespace ``` For KRaft-based Confluent Platform: ```bash kubectl annotate kraftcontroller kraftcontroller \ platform.confluent.io/block-reconcile=true \ --namespace ``` For ZooKeeper-based Confluent Platform: ```bash kubectl annotate zookeeper zookeeper \ platform.confluent.io/block-reconcile=true \ --namespace ``` 4. Add the CFK Helm repo: ```bash helm repo add confluentinc https://packages.confluent.io/helm ``` ```bash helm repo update ``` 5. Get the CFK chart. * From the Helm repo: * To get the latest CFK chart: ```bash helm pull confluentinc/confluent-for-kubernetes --untar ``` * To get a specific version of the CFK chart, get the image tag of the CFK version from [Confluent for Kubernetes image tags](co-plan.md#co-operator-image-tags), and specify the version tag with the `--version` flag: ```bash helm pull confluentinc/confluent-for-kubernetes --version --untar ``` * From a download bundle as specified in [Deploy CFK using the download bundle](co-deploy-cfk.md#co-download-bundle). 6. **IMPORTANT.** Upgrade Confluent Platform custom resource definitions (CRDs). This step is required because Helm does not support upgrading or deleting CRDs using Helm. For more information, see the [Helm documentation](https://helm.sh/docs/chart_best_practices/custom_resource_definitions/#some-caveats-and-explanations). ```bash kubectl apply -f confluent-for-kubernetes/crds/ ``` 1. If the above `kubectl apply` command returns an error similar to the below: ```text The CustomResourceDefinition "kafkas.platform.confluent.io" is invalid: metadata.annotations: Too long: must have at most 262144 bytes make: *** [install-crds] Error 1 ``` Run the following commands: ```bash kubectl apply --server-side=true -f ``` 2. If running `kubectl apply` with the `--server-side=true` flag returns an error similar to the below: ```text Apply failed with 1 conflict: conflict with "helm" using apiextensions.k8s.io/v1: .spec.versions Please review the fields above--they currently have other managers. ``` Run `kubectl apply` with an additional flag, `--force-conflicts`: ```bash kubectl apply --server-side=true --force-conflicts -f ``` 7. Upgrade CFK to 3.1.0. * If you [deployed customized CFK using the values file](co-deploy-cfk.md#co-values-file): For debugging and validating upgrades, when you have a custom values file, it is recommended that you run the `helm upgrade` command with the `--dry-run` flag to preview. 
     The command with the `--dry-run` flag will not actually apply the changes to the cluster, but will print the Kubernetes manifests that would be applied.

     ```bash
     helm upgrade --install confluent-operator \
       confluentinc/confluent-for-kubernetes \
       --values <path-to-values-file> \
       --namespace <namespace> \
       --dry-run
     ```

     After reviewing the Kubernetes manifests, run the following command to upgrade CFK:

     ```bash
     helm upgrade --install confluent-operator \
       confluentinc/confluent-for-kubernetes \
       --values <path-to-values-file> \
       --namespace <namespace>
     ```

   * If you deployed CFK without customizing the values file, run the following command to upgrade CFK:

     ```bash
     helm upgrade --install confluent-operator \
       confluentinc/confluent-for-kubernetes \
       --namespace <namespace>
     ```

   * If you deployed CFK from a download bundle, upgrade CFK as specified in [Deploy CFK using the download bundle](co-deploy-cfk.md#co-download-bundle).

   Note that when using the CFK global license (`globalLicense: true` in the component CRs), you need to specify the license key in the `helm upgrade` command using the `--set licenseKey=` flag. For details, see [Update CFK global license](co-license.md#co-licence-global-level).

   ```bash
   helm upgrade --install confluent-operator \
     confluentinc/confluent-for-kubernetes \
     --values values.yaml \
     --set licenseKey=<license-key>
   ```

8. Alternatively, upgrade CFK to a specific version, such as a hotfix or a patch version.

   * If you [deployed CFK using the values file](co-deploy-cfk.md#co-values-file), in your `values.yaml`, update the CFK `image.tag` to the image tag of the CFK version specified in [Confluent for Kubernetes image tags](co-plan.md#co-operator-image-tags):

     ```yaml
     image:
       tag: ""
     ```

     And run the following command to upgrade CFK:

     ```bash
     helm upgrade --install confluent-operator \
       confluentinc/confluent-for-kubernetes \
       --values <path-to-values-file> \
       --namespace <namespace>
     ```

   * If you did not use a customized `values.yaml` for CFK deployment, run the following command to upgrade CFK to a specific version, using the image tag of the CFK version specified in [Confluent for Kubernetes image tags](co-plan.md#co-operator-image-tags):

     ```bash
     helm upgrade --install confluent-operator \
       confluentinc/confluent-for-kubernetes \
       --version <cfk-version> \
       --namespace <namespace>
     ```

9. Enable resource reconciliation for each Confluent Platform component for which you disabled reconciliation earlier in this procedure:

   ```bash
   kubectl annotate <CR-kind> <CR-name> \
     platform.confluent.io/block-reconcile- \
     --namespace <namespace>
   ```

10. [Upgrade the CFK init container](#co-upgrade-init-container).
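After the upgrade, you can optionally spot-check the result. The following is a minimal sketch, not part of the official procedure; it assumes the Helm release name `confluent-operator`, the namespace `confluent`, an operator pod label of `app=confluent-operator`, and a Kafka CR named `kafka`. Adjust these names to match your deployment.

```bash
# Report the chart and app versions that Helm now tracks for the CFK release.
helm list --namespace confluent --filter confluent-operator

# Show the image (including tag) that the operator pod is actually running.
kubectl get pods --namespace confluent \
  --selector app=confluent-operator \
  --output jsonpath='{.items[*].spec.containers[*].image}{"\n"}'

# Confirm the block-reconcile annotation was removed from a component CR,
# for example the Kafka CR named "kafka".
kubectl get kafka kafka --namespace confluent \
  --output jsonpath='{.metadata.annotations}{"\n"}'
```

If the reported chart version or operator image tag is not what you expect, repeat the `helm upgrade` step above.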
# Manage Kafka Clusters Using Confluent Platform * [Overview](overview.md) * [Related content](overview.md#related-content) * [Cluster Metadata Management](../kafka-metadata/index.md) * [Overview](../kafka-metadata/overview.md) * [KRaft Overview](../kafka-metadata/kraft.md) * [The controller quorum](../kafka-metadata/kraft.md#the-controller-quorum) * [Scaling Kafka with KRaft](../kafka-metadata/kraft.md#scaling-ak-with-kraft) * [Configure KRaft](../kafka-metadata/config-kraft.md) * [Hardware and JVM requirements](../kafka-metadata/config-kraft.md#hardware-and-jvm-requirements) * [Configuration options](../kafka-metadata/config-kraft.md#configuration-options) * [Settings for other Kafka and Confluent Platform components](../kafka-metadata/config-kraft.md#settings-for-other-ak-and-cp-components) * [Generate and format IDs](../kafka-metadata/config-kraft.md#generate-and-format-ids) * [Tools for debugging KRaft mode](../kafka-metadata/config-kraft.md#tools-for-debugging-kraft-mode) * [Monitor KRaft](../kafka-metadata/config-kraft.md#monitor-kraft) * [Related content](../kafka-metadata/config-kraft.md#related-content) * [Find ZooKeeper Resources](../kafka-metadata/zk-production.md) * [Related content](../kafka-metadata/zk-production.md#related-content) * [Manage Self-Balancing Clusters](sbc/overview.md) * [Overview](sbc/index.md) * [How Self-Balancing simplifies Kafka operations](sbc/index.md#how-sbc-simplifies-ak-operations) * [Self-Balancing vs. Auto Data Balancer](sbc/index.md#sbc-vs-adb) * [How it works](sbc/index.md#how-it-works) * [Configuration and monitoring](sbc/index.md#configuration-and-monitoring) * [Replica placement and rack configurations](sbc/index.md#replica-placement-and-rack-configurations) * [Security considerations](sbc/index.md#security-considerations) * [Troubleshooting](sbc/index.md#troubleshooting) * [Related content](sbc/index.md#related-content) * [Tutorial: Adding and Remove Brokers](sbc/sbc-tutorial.md) * [Configuring and starting controllers and brokers in KRaft mode](sbc/sbc-tutorial.md#configuring-and-starting-controllers-and-brokers-in-kraft-mode) * [Prerequisites](sbc/sbc-tutorial.md#prerequisites) * [Environment variables](sbc/sbc-tutorial.md#environment-variables) * [Configure Kafka brokers](sbc/sbc-tutorial.md#configure-ak-brokers) * [Start Confluent Platform, create topics, and generate test data](sbc/sbc-tutorial.md#start-cp-create-topics-and-generate-test-data) * [(Optional) Install and configure Confluent Control Center](sbc/sbc-tutorial.md#optional-install-and-configure-c3) * [Use the command line to test rebalancing](sbc/sbc-tutorial.md#use-the-command-line-to-test-rebalancing) * [Use Control Center to test rebalancing](sbc/sbc-tutorial.md#use-c3-short-to-test-rebalancing) * [Shutdown and cleanup tasks](sbc/sbc-tutorial.md#shutdown-and-cleanup-tasks) * [(Optional) Running the other components](sbc/sbc-tutorial.md#optional-running-the-other-components) * [Related content](sbc/sbc-tutorial.md#related-content) * [Configure](sbc/configuration-options.md) * [Self-Balancing configuration](sbc/configuration-options.md#sbc-configuration) * [Self-Balancing internal topics](sbc/configuration-options.md#sbc-internal-topics) * [Required Configurations for Control Center](sbc/configuration-options.md#required-configurations-for-c3-short) * [Examples: Update broker configurations on the fly](sbc/configuration-options.md#examples-update-broker-configurations-on-the-fly) * [Monitoring the balancer with 
kafka-rebalance-cluster](sbc/configuration-options.md#monitoring-the-balancer-with-kafka-rebalance-cluster) * [kafka-remove-brokers](sbc/configuration-options.md#kafka-remove-brokers) * [Related content](sbc/configuration-options.md#related-content) * [Performance and Resource Usage](sbc/performance.md) * [Add brokers to expand a small cluster with a high partition count](sbc/performance.md#add-brokers-to-expand-a-small-cluster-with-a-high-partition-count) * [Test scalability of a large cluster with many partitions](sbc/performance.md#test-scalability-of-a-large-cluster-with-many-partitions) * [Repeatedly bounce the controller](sbc/performance.md#repeatedly-bounce-the-controller) * [Auto Data Balancing](rebalancer/overview.md) * [Overview](rebalancer/index.md) * [Quick Start](rebalancer/quickstart.md) * [Requirements and Limitations](rebalancer/quickstart.md#requirements-and-limitations) * [Confluent Auto Data Balancer Quick Start](rebalancer/quickstart.md#adb-full-quick-start) * [Licensing](rebalancer/quickstart.md#licensing) * [Suggested Reading](rebalancer/quickstart.md#suggested-reading) * [Tutorial: Add and Remove Brokers](rebalancer/adb-docker-tutorial.md) * [Installing and running Docker](rebalancer/adb-docker-tutorial.md#installing-and-running-docker) * [Use Docker to set up a three Node Kafka cluster](rebalancer/adb-docker-tutorial.md#use-docker-to-set-up-a-three-node-ak-cluster) * [Create a topic and generate data](rebalancer/adb-docker-tutorial.md#create-a-topic-and-generate-data) * [Related content](rebalancer/adb-docker-tutorial.md#related-content) * [Configure](rebalancer/configuration-options.md) * [confluent-rebalancer command flags](rebalancer/configuration-options.md#confluent-rebalancer-command-flags) * [Using a command-config file to talk to Kafka](rebalancer/configuration-options.md#using-a-command-config-file-to-talk-to-ak) * [Specifying Auto Data Balancer properties in a config-file](rebalancer/configuration-options.md#specifying-adb-properties-in-a-config-file) * [Example: Run the rebalancer with Security, Metrics, and License Configurations](rebalancer/configuration-options.md#example-run-the-rebalancer-with-security-metrics-and-license-configurations) * [Example: Run the rebalancer with a Separate Metrics Cluster](rebalancer/configuration-options.md#example-run-the-rebalancer-with-a-separate-metrics-cluster) * [ACLs for Auto Data Balancing](rebalancer/configuration-options.md#acls-for-auto-data-balancing) * [Tiered Storage](tiered-storage.md) * [Known limitations](tiered-storage.md#known-limitations) * [Enabling Tiered Storage on a broker](tiered-storage.md#enabling-tiered-storage-on-a-broker) * [AWS](tiered-storage.md#aws) * [GCS](tiered-storage.md#gcs) * [Azure](tiered-storage.md#azure) * [Pure Storage FlashBlade](tiered-storage.md#pure-storage-flashblade) * [Nutanix Objects](tiered-storage.md#nutanix-objects) * [NetApp Object Storage](tiered-storage.md#netapp-object-storage) * [Dell EMC ECS](tiered-storage.md#dell-emc-ecs) * [MinIO](tiered-storage.md#minio) * [Cloudian HyperStore Object Storage](tiered-storage.md#cloudian-hyperstore-object-storage) * [CEPH](tiered-storage.md#ceph) * [Scality](tiered-storage.md#scality) * [Configuring Tiered Storage to support compacted topics](tiered-storage.md#configuring-tiered-storage-to-support-compacted-topics) * [Creating a topic with Tiered Storage](tiered-storage.md#creating-a-topic-with-tiered-storage) * [Sending test messages to experiment with data 
storage](tiered-storage.md#sending-test-messages-to-experiment-with-data-storage) * [Best practices and recommendations](tiered-storage.md#best-practices-and-recommendations) * [Tuning](tiered-storage.md#tuning) * [Time interval for topic deletes](tiered-storage.md#time-interval-for-topic-deletes) * [Log segment sizes](tiered-storage.md#log-segment-sizes) * [ACLs on Tiered Storage internal topics](tiered-storage.md#acls-on-tiered-storage-internal-topics) * [TLS settings and troubleshooting certificates](tiered-storage.md#tls-settings-and-troubleshooting-certificates) * [Sizing brokers with Tiered Storage](tiered-storage.md#sizing-brokers-with-tiered-storage) * [Tier archiver metrics](tiered-storage.md#tier-archiver-metrics) * [Tier fetcher metrics](tiered-storage.md#tier-fetcher-metrics) * [Kafka log of tier size per partition](tiered-storage.md#kafka-log-of-tier-size-per-partition) * [Example performance test](tiered-storage.md#example-performance-test) * [Supported platforms and features](tiered-storage.md#supported-platforms-and-features) * [Configuration options](tiered-storage.md#configuration-options) * [Disabling Tiered Storage](tiered-storage.md#disabling-tiered-storage) * [Related content](tiered-storage.md#related-content) ## Authentication Mechanisms The authentication mechanism for incoming requests to Schema Registry is determined by the `confluent.schema.registry.auth.mechanism` config. Both TLS and [Jetty](https://github.com/eclipse/jetty.project) authentication mechanisms are supported. When using [Role Based Access Control](../../schema-registry/security/rbac-schema-registry.md#schemaregistry-rbac) (RBAC), Schema Registry expects HTTP Basic Auth (or token) credentials provided by the Schema Registry client for RBAC authorization. If you relied on TLS certificate authentication across Confluent Platform before enabling and configuring RBAC, be aware that you must also provide Basic Auth credentials (such as LDAP user) for Confluent Platform components other than Kafka. More specifically, for Schema Registry, you must specify the bearer token for [Use HTTP Basic Authentication in Confluent Platform](../../security/authentication/http-basic-auth/overview.md#http-basic-auth) and must include `basic.auth.user.info` and `basic.auth.credentials.source`. For details about which authentication methods to use when using RBAC, refer to [RBAC Authentication Options](../../security/authorization/rbac/overview.md#rbac-authentication-options). Here is an example properties file for Schema Registry using mTLS authentication and RBAC. 
```properties
listeners=https://sr:8081
kafkastore.bootstrap.servers=SSL://node1:9095,SSL://node2:9095,SSL://node2:9095
kafkastore.topic=_schemas
debug=true
schema.registry.resource.extension.class=io.confluent.kafka.schemaregistry.security.SchemaRegistrySecurityResourceExtension
confluent.schema.registry.authorizer.class=io.confluent.kafka.schemaregistry.security.authorizer.rbac.RbacAuthorizer
confluent.schema.registry.auth.mechanism=SSL
kafkastore.bootstrap.servers=node1:9093,node2:9093,node3:9093
kafkastore.security.protocol=SASL_PLAINTEXT
kafkastore.topic=_schemas
kafkastore.sasl.mechanism=OAUTHBEARER
kafkastore.sasl.login.callback.handler.class=io.confluent.kafka.clients.plugins.auth.token.TokenUserLoginCallbackHandler
kafkastore.sasl.jaas.config=org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required \
  username="kafka" \
  password="secret" \
  metadataServerUrls="http://node1:8090";
confluent.metadata.basic.auth.user.info=kafka:secret
confluent.metadata.bootstrap.server.urls=http://node1:8090
confluent.metadata.http.auth.credentials.provider=BASIC
public.key.path=/opt/cp/current/certs/public.pem
confluent.schema.registry.auth.ssl.principal.mapping.rules=RULE:^CN=([a-zA-Z0-9.]*).*$/$1/L,DEFAULT
ssl.client.authentication=REQUIRED
ssl.client.auth=true
ssl.keystore.location=/opt/cp/current/certs/sr.jks
ssl.keystore.password=secret
ssl.key.password=secret
ssl.truststore.location=/opt/cp/current/certs/truststore.jks
ssl.truststore.password=secret
inter.instance.protocol=https
kafkastore.ssl.endpoint.identification.algorithm=
ssl.endpoint.identification.algorithm=
rest.servlet.initializor.classes=io.confluent.common.security.jetty.initializer.InstallBearerOrBasicSecurityHandler
```

If the authentication mechanism is not set, all requests are rejected with an HTTP error code of 403. See [Schema Registry Authorization](authorization/index.md#confluentsecurityplugins-schema-registry-authorization) for details on how this authorization happens and how to configure it.

Configure license client authentication
: When using principal propagation and the following security types, you must configure client authentication for the license topic. For more information, see the following documentation:

  - [SASL OAUTHBEARER (RBAC) client authentication](../../security/authentication/sasl/oauthbearer/configure-clients.md#security-sasl-rbac-oauthbearer-clientconfig)
  - [SASL PLAIN client authentication](../../security/authentication/sasl/plain/overview.md#sasl-plain-clients)
  - [SASL SCRAM client authentication](../../security/authentication/sasl/scram/overview.md#sasl-scram-clients)
  - [mTLS client authentication](../../security/authentication/mutual-tls/overview.md#authentication-ssl-clients)

Configure license client authorization
: When using principal propagation and RBAC or ACLs, you must configure client authorization for the license topic.

#### NOTE

The `_confluent-command` internal topic is available as the preferred alternative to the `_confluent-license` topic for components such as Schema Registry, REST Proxy, and Confluent Server (which were previously using `_confluent-license`). Both topics will be supported going forward. Here are some guidelines:

- New deployments (Confluent Platform 6.2.1 and later) will default to using `_confluent-command` as shown below.
- Existing clusters will continue using the `_confluent-license` topic unless manually changed.
- Newly created clusters on Confluent Platform 6.2.1 and later will default to creating the `_confluent-command` topic, and only existing clusters that already have a `_confluent-license` topic will continue to use it.

- **RBAC authorization** Run this command to add `ResourceOwner` for the component user for the Confluent license topic resource (default name is `_confluent-command`).

  ```none
  confluent iam rbac role-binding create \
  --role ResourceOwner \
  --principal User:<user> \
  --resource Topic:_confluent-command \
  --kafka-cluster <kafka-cluster-id>
  ```

- **ACL authorization** Run this command to configure Kafka authorization, where the bootstrap server, client configuration file, and service account ID are specified. This grants create, read, and write on the `_confluent-command` topic.

  ```none
  kafka-acls --bootstrap-server <broker-host:port> --command-config <client-config-file> \
  --add --allow-principal User:<service-account-id> --operation Create --operation Read --operation Write \
  --topic _confluent-command
  ```

## Manage Schemas

The FileStream connectors are good examples because they are simple, but they also have trivially structured data – each line is just a string. Almost all connectors will need schemas with more complex data formats. To create more complex data, you’ll need to work with the `org.apache.kafka.connect.data` API. Most structured records will need to interact with two classes in addition to primitive types: [Schema](/platform/current/connect/javadocs/javadoc/org/apache/kafka/connect/data/Schema.html) and [Struct](/platform/current/connect/javadocs/javadoc/org/apache/kafka/connect/data/Struct.html).

The API documentation provides a complete reference, but here is a simple example creating a `Schema` and `Struct`:

```java
// Define a schema for a structured record, with a default value for the "admin" field.
Schema schema = SchemaBuilder.struct().name(NAME)
    .field("name", Schema.STRING_SCHEMA)
    .field("age", Schema.INT32_SCHEMA)
    .field("admin", SchemaBuilder.bool().defaultValue(false).build())
    .build();

// Populate a Struct that conforms to the schema.
Struct struct = new Struct(schema)
    .put("name", "Barbara Liskov")
    .put("age", 75);
```

If you are implementing a source connector, you’ll need to decide when and how to create schemas. Where possible, you should avoid recomputing them. For example, if your connector is guaranteed to have a fixed schema, create it statically and reuse a single instance. However, many connectors will have dynamic schemas. One example of this is a database connector. Considering even just a single table, the schema will not be fixed over the lifetime of the connector since the user may execute an `ALTER TABLE` command. The connector must be able to detect these changes and react appropriately by creating an updated `Schema`.

Sink connectors are usually simpler because they are consuming data and therefore do not need to create schemas. However, they should take just as much care to validate that the schemas they receive have the expected format. When the schema does not match – usually indicating the upstream producer is generating invalid data that cannot be correctly translated to the destination system – sink connectors should throw an exception to indicate this error to the Kafka Connect framework. When using the `AvroConverter` included with Confluent Platform, schemas are registered under the hood with Confluent Schema Registry, so any new schemas must satisfy the compatibility requirements for the destination topic.

# Quick Start: Move Data In and Out of Kafka with Kafka Connect

This tutorial provides a hands-on look at how you can move data into and out of Apache Kafka® without writing a single line of code.
It is helpful to review the [concepts](index.md#connect-concepts) for Kafka Connect in tandem with running the steps in this guide to gain a deeper understanding. At the end of this tutorial you will be able to: * Use Confluent CLI to manage Confluent services, including starting a single connect worker in distributed mode and loading and unloading connectors. * Read data from a file and publish to a Kafka topic. * Read data from a Kafka topic and publish to file. * Integrate Schema Registry with a connector. To demonstrate the basic functionality of Kafka Connect and its integration with the Confluent Schema Registry, a few local standalone Kafka Connect processes with connectors are run. You can insert data written to a file into Kafka and write data from a Kafka topic to the console. If you are using JSON as the Connect data format, see the instructions [here](https://kafka.apache.org/documentation#quickstart_kafkaconnect) for a tutorial that does not include Schema Registry. # Kafka Connect Worker Configuration Properties for Confluent Platform The following lists many of the configuration properties related to Connect workers. The first section lists common properties that can be set in either standalone or distributed mode. These control basic functionality like which Apache Kafka® cluster to communicate with and what format data you’re working with. The next two sections list properties specific to standalone or distributed mode. For additional configuration properties see the following sections: * Connect and Schema Registry: See [Integrate Schemas from Kafka Connect in Confluent Platform](../../schema-registry/connect.md#schemaregistry-kafka-connect). * Producer configuration properties: See [Kafka Producer for Confluent Platform](../../clients/producer.md#kafka-producer). * Consumer configuration properties: See [Kafka Consumer for Confluent Platform](../../clients/consumer.md#kafka-consumer). * TLS/SSL encryption properties: See [Protect Data in Motion with TLS Encryption in Confluent Platform](../../security/protect-data/encrypt-tls.md#kafka-ssl-encryption). * All Kafka configuration properties: See [Kafka Configuration Reference for Confluent Platform](../../installation/configuration/index.md#cp-config-reference). For information about how the Connect worker functions, see [Configuring and Running Workers](/kafka-connectors/self-managed/userguide.html#configuring-and-running-workers). ## Distributed Worker Configuration In addition to the common worker configuration options, the following are available in distributed mode. For information about how the Connect worker functions, see [Configuring and Running Workers](/kafka-connectors/self-managed/userguide.html#configuring-and-running-workers). `group.id` : A unique string that identifies the Connect cluster group this Worker belongs to. #### IMPORTANT - For production environments, you must explicitly set this configuration. When using the [Confluent CLI](https://docs.confluent.io/confluent-cli/current/index.html), this configuration property is set to `connect-cluster` by default. All workers with the same `group.id` will be in the same Connect cluster. For example, if worker A has `group.id=connect-cluster-a` and worker B has the same `group.id`, worker A and worker B form a cluster called `connect-cluster-a`. - The `group.id` for sink connectors is derived from the `consumer.group.id` in the worker properties. The `group.id` is created using the prefix `connect-` and the connector name. 
To override this value for a sink connector, add the following line in the worker properties file.

```properties
connector.client.config.override.policy=All
```

* Type: string
* Default: “”
* Importance: high

`config.storage.topic`
: The name of the topic where connector and task configuration data are stored. This *must* be the same for all Workers with the same `group.id`. Kafka Connect will upon startup attempt to automatically create this topic with a single partition and a compacted cleanup policy to avoid losing data, but it will simply use the topic if it already exists. If you choose to create this topic manually, **always** create it as a compacted topic with a single partition and a high replication factor (3x or more).

* Type: string
* Default: “”
* Importance: high

`config.storage.replication.factor`
: The replication factor used when Kafka Connect creates the topic used to store connector and task configuration data. This should **always** be at least `3` for a production system, but cannot be larger than the number of Kafka brokers in the cluster. Enter `-1` to use the Kafka broker default replication factor.

* Type: short
* Default: 3
* Importance: low

`offset.storage.topic`
: The name of the topic where source connector offsets are stored. This *must* be the same for all Workers with the same `group.id`. Kafka Connect will upon startup attempt to automatically create this topic with multiple partitions and a compacted cleanup policy to avoid losing data, but it will simply use the topic if it already exists. If you choose to create this topic manually, **always** create it as a compacted, highly replicated (3x or more) topic with a large number of partitions (e.g., 25 or 50, just like Kafka’s built-in `__consumer_offsets` topic) to support large Kafka Connect clusters.

* Type: string
* Default: “”
* Importance: high

`offset.storage.replication.factor`
: The replication factor used when Connect creates the topic used to store connector offsets. This should **always** be at least `3` for a production system, but cannot be larger than the number of Kafka brokers in the cluster. Enter `-1` to use the Kafka broker default replication factor.

* Type: short
* Default: 3
* Importance: low

`offset.storage.partitions`
: The number of partitions used when Connect creates the topic used to store connector offsets. A large value (e.g., `25` or `50`, just like Kafka’s built-in `__consumer_offsets` topic) is necessary to support large Kafka Connect clusters. Enter `-1` to use the default number of partitions configured in the Kafka broker.

* Type: int
* Default: 25
* Importance: low

`status.storage.topic`
: The name of the topic where connector and task status updates are stored. This *must* be the same for all Workers with the same `group.id`. Kafka Connect will upon startup attempt to automatically create this topic with multiple partitions and a compacted cleanup policy to avoid losing data, but it will simply use the topic if it already exists. If you choose to create this topic manually, **always** create it as a compacted, highly replicated (3x or more) topic with multiple partitions.

* Type: string
* Default: “”
* Importance: high

`status.storage.replication.factor`
: The replication factor used when Connect creates the topic used to store connector and task status updates. This should **always** be at least `3` for a production system, but cannot be larger than the number of Kafka brokers in the cluster.
Enter `-1` to use the Kafka broker default replication factor.

* Type: short
* Default: 3
* Importance: low

`status.storage.partitions`
: The number of partitions used when Connect creates the topic used to store connector and task status updates. Enter `-1` to use the default number of partitions configured in the Kafka broker.

* Type: int
* Default: 5
* Importance: low

`heartbeat.interval.ms`
: The expected time between heartbeats to the group coordinator when using Kafka’s group management facilities. Heartbeats are used to ensure that the Worker’s session stays active and to facilitate rebalancing when new members join or leave the group. The value must be set lower than `session.timeout.ms`, but typically should be set no higher than 1/3 of that value. It can be adjusted even lower to control the expected time for normal rebalances.

* Type: int
* Default: 3000
* Importance: high

The `heartbeat.interval.ms` setting is ignored when `group.protocol = consumer`; instead, use the broker configuration `group.consumer.heartbeat.interval.ms` to control the heartbeat.

`session.timeout.ms`
: The timeout used to detect failures when using Kafka’s group management facilities.

* Type: int
* Default: 30000
* Importance: high

`ssl.key.password`
: The password of the private key in the key store file. This is optional for clients.

* Type: password
* Importance: high

`ssl.keystore.location`
: The location of the key store file. This is optional for clients and can be used for two-way client authentication.

* Type: string
* Importance: high

`ssl.keystore.password`
: The store password for the key store file. This is optional for clients and only needed if `ssl.keystore.location` is configured.

* Type: password
* Importance: high

`ssl.truststore.location`
: The location of the trust store file.

* Type: string
* Importance: high

`ssl.truststore.password`
: The password for the trust store file.

* Type: password
* Importance: high

`connections.max.idle.ms`
: Close idle connections after the number of milliseconds specified by this config.

* Type: long
* Default: 540000
* Importance: medium

`receive.buffer.bytes`
: The size of the TCP receive buffer (SO_RCVBUF) to use when reading data.

* Type: int
* Default: 32768
* Importance: medium

`request.timeout.ms`
: The configuration controls the maximum amount of time the client will wait for the response of a request. If the response is not received before the timeout elapses, the client will resend the request if necessary or fail the request if retries are exhausted.

* Type: int
* Default: 40000
* Importance: medium

`sasl.kerberos.service.name`
: The Kerberos principal name that Kafka runs as. This can be defined either in Kafka’s JAAS config or in Kafka’s config.

* Type: string
* Importance: medium

`security.protocol`
: Protocol used to communicate with brokers. Valid values are: PLAINTEXT, SSL, SASL_PLAINTEXT, SASL_SSL.

* Type: string
* Default: “PLAINTEXT”
* Importance: medium

`send.buffer.bytes`
: The size of the TCP send buffer (SO_SNDBUF) to use when sending data.

* Type: int
* Default: 131072
* Importance: medium

`ssl.enabled.protocols`
: The comma-separated list of protocols enabled for TLS connections. The default value is `TLSv1.2,TLSv1.3` when running with Java 11 or later, `TLSv1.2` otherwise. With the default value for Java 11 (`TLSv1.2,TLSv1.3`), Kafka clients and brokers prefer TLSv1.3 if both support it, and fall back to TLSv1.2 otherwise (assuming both support at least TLSv1.2).
* Type: list
* Default: `TLSv1.2,TLSv1.3`
* Importance: medium

`ssl.keystore.type`
: The file format of the key store file. This is optional for clients. Default value is JKS.

* Type: string
* Default: “JKS”
* Importance: medium

`ssl.protocol`
: The TLS protocol used to generate the SSLContext. The default is `TLSv1.3` when running with Java 11 or newer, `TLSv1.2` otherwise. This value should be fine for most use cases. Allowed values in recent JVMs are `TLSv1.2` and `TLSv1.3`. `TLS`, `TLSv1.1`, `SSL`, `SSLv2` and `SSLv3` might be supported in older JVMs, but their usage is discouraged due to known security vulnerabilities. With the default value for this configuration and `ssl.enabled.protocols`, clients downgrade to `TLSv1.2` if the server does not support `TLSv1.3`. If this configuration is set to `TLSv1.2`, clients do not use `TLSv1.3`, even if it is one of the values in `ssl.enabled.protocols` and the server only supports `TLSv1.3`.

* Type: string
* Default: `TLSv1.3`
* Importance: medium

`ssl.provider`
: The name of the security provider used for TLS/SSL connections. Default value is the default security provider of the JVM.

* Type: string
* Importance: medium

`ssl.truststore.type`
: The file format of the trust store file. Default value is JKS.

* Type: string
* Default: “JKS”
* Importance: medium

`worker.sync.timeout.ms`
: When the Worker is out of sync with other Workers and needs to resynchronize configurations, wait up to this amount of time before giving up, leaving the group, and waiting a backoff period before rejoining.

* Type: int
* Default: 3000
* Importance: medium

`worker.unsync.backoff.ms`
: When the Worker is out of sync with other Workers and fails to catch up within `worker.sync.timeout.ms`, leave the Connect cluster for this long before rejoining.

* Type: int
* Default: 300000
* Importance: medium

`client.id`
: An ID string to pass to the server when making requests. The purpose of this is to be able to track the source of requests beyond just ip/port by allowing a logical application name to be included in server-side request logging.

* Type: string
* Default: “”
* Importance: low

`metadata.max.age.ms`
: The period of time in milliseconds after which you force a refresh of metadata even if you haven’t seen any partition leadership changes to proactively discover any new brokers or partitions.

* Type: long
* Default: 300000
* Importance: low

`metric.reporters`
: A list of classes to use as metrics reporters. Implementing the `MetricReporter` interface allows plugging in classes that will be notified of new metric creation. The JmxReporter is always included to register JMX statistics.

* Type: list
* Default: []
* Importance: low

`metrics.num.samples`
: The number of samples maintained to compute metrics.

* Type: int
* Default: 2
* Importance: low

`metrics.sample.window.ms`
: The window of time a metrics sample is computed over.

* Type: long
* Default: 30000
* Importance: low

`reconnect.backoff.ms`
: The amount of time to wait before attempting to reconnect to a given host. This avoids repeatedly connecting to a host in a tight loop. This backoff applies to all requests sent by the consumer to the broker.

* Type: long
* Default: 50
* Importance: low

`retry.backoff.ms`
: The amount of time to wait before attempting to retry a failed fetch request to a given topic partition. This avoids repeated fetching-and-failing in a tight loop.

* Type: long
* Default: 100
* Importance: low

`sasl.kerberos.kinit.cmd`
: Kerberos kinit command path.
Default is /usr/bin/kinit * Type: string * Default: “/usr/bin/kinit” * Importance: low `sasl.kerberos.min.time.before.relogin` : Login thread sleep time between refresh attempts. * Type: long * Default: 60000 * Importance: low `sasl.kerberos.ticket.renew.jitter` : Percentage of random jitter added to the renewal time. * Type: double * Default: 0.05 * Importance: low `sasl.kerberos.ticket.renew.window.factor` : Login thread will sleep until the specified window factor of time from last refresh to ticket’s expiry has been reached, at which time it will try to renew the ticket. * Type: double * Default: 0.8 * Importance: low `ssl.cipher.suites` : A list of cipher suites. This is a named combination of authentication, encryption, MAC, and key exchange algorithms used to negotiate the security settings for a network connection using TLS. By default, all the available cipher suites are supported. * Type: list * Importance: low `ssl.endpoint.identification.algorithm` : The endpoint identification algorithm to validate server hostname using server certificate. * Type: string * Importance: low `ssl.keymanager.algorithm` : The algorithm used by key manager factory for TLS/SSL connections. Default value is the key manager factory algorithm configured for the Java Virtual Machine. * Type: string * Default: “SunX509” * Importance: low `ssl.trustmanager.algorithm` : The algorithm used by trust manager factory for TLS/SSL connections. Default value is the trust manager factory algorithm configured for the Java Virtual Machine. * Type: string * Default: “PKIX” * Importance: low # - Spring Petclinic: https://github.com/spring-petclinic/spring-petclinic-rest/blob/master/src/main/resources/openapi.yml openapi: 3.0.1 info: title: Confluent Manager for Apache Flink / CMF description: Apache Flink job lifecycle management component for Confluent Platform. version: '1.0' servers: - url: http://localhost:8080 tags: - name: Environments - name: FlinkApplications - name: Secrets - name: SQL - name: Savepoints paths: ## ---------------------------- Environments API ---------------------------- ## /cmf/api/v1/environments: post: tags: - Environments operationId: createOrUpdateEnvironment summary: Create or update an Environment requestBody: content: application/json: schema: $ref: '#/components/schemas/PostEnvironment' application/yaml: schema: $ref: '#/components/schemas/PostEnvironment' required: true responses: 201: description: The Environment was successfully created or updated. content: application/json: schema: $ref: '#/components/schemas/Environment' application/yaml: schema: $ref: '#/components/schemas/Environment' 400: description: Bad request. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 500: description: Server error. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' get: tags: - Environments operationId: getEnvironments summary: Retrieve a paginated list of all environments. x-spring-paginated: true responses: 200: description: List of environments found. If no environments are found, an empty list is returned. Note the information about secret is not included in the list call yet. In order to get the information about secret, make a getSecret call. 
content: application/json: schema: $ref: '#/components/schemas/EnvironmentsPage' application/yaml: schema: $ref: '#/components/schemas/EnvironmentsPage' 304: description: Not modified. headers: ETag: description: An ID for this version of the response. schema: type: string 500: description: Server error. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' /cmf/api/v1/environments/{envName}: get: operationId: getEnvironment tags: - Environments summary: Get/Describe an environment with the given name. parameters: - name: envName in: path description: Name of the Environment to be retrieved. required: true schema: type: string responses: 200: description: Environment found and returned. content: application/json: schema: $ref: '#/components/schemas/Environment' application/yaml: schema: $ref: '#/components/schemas/Environment' 404: description: Environment not found. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 500: description: Server error. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' delete: operationId: deleteEnvironment tags: - Environments parameters: - name: envName in: path description: Name of the Environment to be deleted. required: true schema: type: string responses: 200: description: Environment found and deleted. 304: description: Not modified. 500: description: Server error. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 404: description: Environment not found. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' ## ---------------------------- Applications API ---------------------------- ## /cmf/api/v1/environments/{envName}/applications: post: tags: - FlinkApplications operationId: createOrUpdateApplication summary: Creates a new Flink Application or updates an existing one in the given Environment. parameters: - name: envName in: path description: Name of the Environment required: true schema: type: string requestBody: content: application/json: schema: $ref: '#/components/schemas/FlinkApplication' application/yaml: schema: $ref: '#/components/schemas/FlinkApplication' required: true responses: 201: description: The Application was successfully created or updated. content: application/json: schema: $ref: '#/components/schemas/FlinkApplication' application/yaml: schema: $ref: '#/components/schemas/FlinkApplication' 400: description: Bad request. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 422: description: Request valid but invalid content. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 500: description: Server error. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' get: tags: - FlinkApplications operationId: getApplications summary: Retrieve a paginated list of all applications in the given Environment. 
x-spring-paginated: true parameters: - name: envName in: path description: Name of the Environment required: true schema: type: string responses: 200: description: Application found and returned. content: application/json: schema: $ref: '#/components/schemas/ApplicationsPage' application/yaml: schema: $ref: '#/components/schemas/ApplicationsPage' 304: description: Not modified. headers: ETag: description: An ID for this version of the response. schema: type: string 404: description: Environment not found. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 500: description: Server error. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' /cmf/api/v1/environments/{envName}/applications/{appName}: get: tags: - FlinkApplications operationId: getApplication summary: Retrieve an Application of the given name in the given Environment. parameters: - name: envName in: path description: Name of the Environment required: true schema: type: string - name: appName in: path description: Name of the Application required: true schema: type: string responses: 200: description: Application found and returned. content: application/json: schema: $ref: '#/components/schemas/FlinkApplication' application/yaml: schema: $ref: '#/components/schemas/FlinkApplication' 304: description: Not modified. headers: ETag: description: An ID for this version of the response. schema: type: string 500: description: Server error. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 404: description: Application not found. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' delete: tags: - FlinkApplications operationId: deleteApplication summary: Deletes an Application of the given name in the given Environment. parameters: - name: envName in: path description: Name of the Environment required: true schema: type: string - name: appName in: path description: Name of the Application required: true schema: type: string responses: 200: description: Application found and deleted. 304: description: Not modified. 500: description: Server error. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' /cmf/api/v1alpha1/environments/{envName}/applications/{appName}/events: get: tags: - FlinkApplications operationId: getApplicationEvents summary: Get a paginated list of events of the given Application x-spring-paginated: true parameters: - name: envName in: path description: Name of the Environment required: true schema: type: string - name: appName in: path description: Name of the Application required: true schema: type: string responses: 200: description: Events found and returned. content: application/json: schema: $ref: '#/components/schemas/EventsPage' application/yaml: schema: $ref: '#/components/schemas/EventsPage' 404: description: Environment or Application not found. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 500: description: Server error. 
content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' /cmf/api/v1/environments/{envName}/applications/{appName}/start: post: tags: - FlinkApplications operationId: startApplication summary: Starts an earlier submitted Flink Application parameters: - name: envName in: path description: Name of the Environment required: true schema: type: string - name: appName in: path description: Name of the Application required: true schema: type: string - name: startFromSavepointUid in: query description: UID of the Savepoint from which the application should be started. This savepoint could belong to the application or can be a detached savepoint. required: false schema: type: string responses: 200: description: Application started content: application/json: schema: $ref: '#/components/schemas/FlinkApplication' application/yaml: schema: $ref: '#/components/schemas/FlinkApplication' 304: description: Not modified. 404: description: Environment, Application or Savepoint not found. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 500: description: Server error. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' /cmf/api/v1/environments/{envName}/applications/{appName}/suspend: post: tags: - FlinkApplications operationId: suspendApplication summary: Suspends an earlier started Flink Application parameters: - name: envName in: path description: Name of the Environment required: true schema: type: string - name: appName in: path description: Name of the Application required: true schema: type: string responses: 200: description: Application suspended content: application/json: schema: $ref: '#/components/schemas/FlinkApplication' application/yaml: schema: $ref: '#/components/schemas/FlinkApplication' 304: description: Not modified. 500: description: Server error. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' /cmf/api/v1/environments/{envName}/applications/{appName}/instances: get: tags: - FlinkApplications operationId: getApplicationInstances summary: Get a paginated list of instances of the given Application x-spring-paginated: true parameters: - name: envName in: path description: Name of the Environment required: true schema: type: string - name: appName in: path description: Name of the Application required: true schema: type: string responses: 200: description: Instances found and returned. content: application/json: schema: $ref: '#/components/schemas/ApplicationInstancesPage' application/yaml: schema: $ref: '#/components/schemas/ApplicationInstancesPage' 404: description: Environment or Application not found. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 500: description: Server error. 
content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' /cmf/api/v1/environments/{envName}/applications/{appName}/instances/{instName}: get: tags: - FlinkApplications operationId: getApplicationInstance summary: Retrieve an Instance of an Application parameters: - name: envName in: path description: Name of the Environment required: true schema: type: string - name: appName in: path description: Name of the Application required: true schema: type: string - name: instName in: path description: Name of the ApplicationInstance required: true schema: type: string responses: 200: description: ApplicationInstance found and returned. content: application/json: schema: $ref: '#/components/schemas/FlinkApplicationInstance' application/yaml: schema: $ref: '#/components/schemas/FlinkApplicationInstance' 404: description: FlinkApplicationInstance or environment or application not found. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 500: description: Server error. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' ## ---------------------------- Environment Secret Mapping API ---------------------------- ## /cmf/api/v1/environments/{envName}/secret-mappings/{name}: delete: tags: - Environments operationId: deleteEnvironmentSecretMapping summary: Deletes the Environment Secret Mapping for the given Environment and Secret. parameters: - name: envName in: path description: Name of the Environment in which the mapping has to be deleted. required: true schema: type: string - name: name in: path description: Name of the environment secret mapping to be deleted in the given environment. required: true schema: type: string responses: 204: description: The Environment Secret Mapping was successfully deleted. 404: description: Environment or Secret not found. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 500: description: Server error. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' get: tags: - Environments operationId: getEnvironmentSecretMapping summary: Retrieve the Environment Secret Mapping for the given name in the given environment. parameters: - name: envName in: path description: Name of the Environment required: true schema: type: string - name: name in: path description: Name of the environment secret mapping to be retrieved. required: true schema: type: string responses: 200: description: Environment Secret Mapping found and returned. content: application/json: schema: $ref: '#/components/schemas/EnvironmentSecretMapping' application/yaml: schema: $ref: '#/components/schemas/EnvironmentSecretMapping' 404: description: Environment or Secret not found. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 500: description: Server error. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' put: tags: - Environments operationId: updateEnvironmentSecretMapping summary: Updates the Environment Secret Mapping for the given Environment. 
parameters: - name: envName in: path description: Name of the Environment required: true schema: type: string - name: name in: path description: Name of the environment secret mapping to be updated required: true schema: type: string requestBody: content: application/json: schema: $ref: '#/components/schemas/EnvironmentSecretMapping' application/yaml: schema: $ref: '#/components/schemas/EnvironmentSecretMapping' required: true responses: 200: description: The Environment Secret Mapping was successfully updated. content: application/json: schema: $ref: '#/components/schemas/EnvironmentSecretMapping' application/yaml: schema: $ref: '#/components/schemas/EnvironmentSecretMapping' 400: description: Bad request. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 404: description: Environment or Secret not found. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 500: description: Server error. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' /cmf/api/v1/environments/{envName}/secret-mappings: post: tags: - Environments operationId: createEnvironmentSecretMapping summary: Creates the Environment Secret Mapping for the given Environment. parameters: - name: envName in: path description: Name of the Environment required: true schema: type: string requestBody: content: application/json: schema: $ref: '#/components/schemas/EnvironmentSecretMapping' application/yaml: schema: $ref: '#/components/schemas/EnvironmentSecretMapping' required: true responses: 200: description: The Environment Secret Mapping was successfully created. content: application/json: schema: $ref: '#/components/schemas/EnvironmentSecretMapping' application/yaml: schema: $ref: '#/components/schemas/EnvironmentSecretMapping' 400: description: Bad request. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 404: description: Environment not found. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 409: description: Environment Secret Mapping already exists. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 500: description: Server error. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' get: tags: - Environments operationId: getEnvironmentSecretMappings summary: Retrieve a paginated list of all Environment Secret Mappings. x-spring-paginated: true parameters: - name: envName in: path description: Name of the Environment required: true schema: type: string responses: 200: description: Environment Secret Mappings found and returned. content: application/json: schema: $ref: '#/components/schemas/EnvironmentSecretMappingsPage' application/yaml: schema: $ref: '#/components/schemas/EnvironmentSecretMappingsPage' 404: description: Environment not found. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 500: description: Server error. 
content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' ## ---------------------------- Statement API ---------------------------- ## /cmf/api/v1/environments/{envName}/statements: post: tags: - SQL operationId: createStatement summary: Creates a new Flink SQL Statement in the given Environment. parameters: - name: envName in: path description: Name of the Environment required: true schema: type: string requestBody: content: application/json: schema: $ref: '#/components/schemas/Statement' application/yaml: schema: $ref: '#/components/schemas/Statement' required: true responses: 200: description: The Statement was successfully created. content: application/json: schema: $ref: '#/components/schemas/Statement' application/yaml: schema: $ref: '#/components/schemas/Statement' 400: description: Bad request. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 404: description: Environment not found. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 409: description: Statement already exists. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 422: description: Request valid but invalid content. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 500: description: Server error. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' get: tags: - SQL operationId: getStatements summary: Retrieve a paginated list of Statements in the given Environment. x-spring-paginated: true parameters: - $ref: '#/components/parameters/computePoolParam' - $ref: '#/components/parameters/phaseParam' - name: envName in: path description: Name of the Environment required: true schema: type: string responses: 200: description: Statements found and returned. content: application/json: schema: $ref: '#/components/schemas/StatementsPage' application/yaml: schema: $ref: '#/components/schemas/StatementsPage' 404: description: Environment not found. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 500: description: Server error. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' /cmf/api/v1/environments/{envName}/statements/{stmtName}: get: tags: - SQL operationId: getStatement summary: Retrieve the Statement of the given name in the given Environment. parameters: - name: envName in: path description: Name of the Environment required: true schema: type: string - name: stmtName in: path description: Name of the Statement required: true schema: type: string responses: 200: description: Statement found and returned. content: application/json: schema: $ref: '#/components/schemas/Statement' application/yaml: schema: $ref: '#/components/schemas/Statement' 404: description: Statement not found. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 500: description: Server error. 
content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' delete: tags: - SQL operationId: deleteStatement summary: Deletes the Statement of the given name in the given Environment. parameters: - name: envName in: path description: Name of the Environment required: true schema: type: string - name: stmtName in: path description: Name of the Statement required: true schema: type: string responses: 204: description: Statement was found and deleted. 202: description: Statement was found and deletion request received. 404: description: Statement not found. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 500: description: Server error. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' put: tags: - SQL operationId: updateStatement summary: Updates a Statement of the given name in the given Environment. parameters: - name: envName in: path description: Name of the Environment required: true schema: type: string - name: stmtName in: path description: Name of the Statement required: true schema: type: string requestBody: content: application/json: schema: $ref: '#/components/schemas/Statement' application/yaml: schema: $ref: '#/components/schemas/Statement' required: true responses: 200: description: Statement was found and updated. 400: description: Bad request. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 404: description: Statement not found. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 422: description: Request valid but invalid content. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 500: description: Server error. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' /cmf/api/v1/environments/{envName}/statements/{stmtName}/results: get: tags: - SQL operationId: getStatementResult summary: Retrieve the result of the interactive Statement with the given name in the given Environment. parameters: - $ref: '#/components/parameters/pageTokenParam' - name: envName in: path description: Name of the Environment required: true schema: type: string - name: stmtName in: path description: Name of the Statement required: true schema: type: string responses: 200: description: StatementResults found and returned. content: application/json: schema: $ref: '#/components/schemas/StatementResult' application/yaml: schema: $ref: '#/components/schemas/StatementResult' 400: description: Statement does not return results. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 404: description: Statement not found. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 410: description: Results are gone. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 500: description: Server error. 
content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' /cmf/api/v1/environments/{envName}/statements/{stmtName}/exceptions: get: tags: - SQL operationId: getStatementExceptions summary: Retrieves the last 10 exceptions of the Statement with the given name in the given Environment. parameters: - name: envName in: path description: Name of the Environment required: true schema: type: string - name: stmtName in: path description: Name of the Statement required: true schema: type: string responses: 200: description: StatementExceptions found and returned. content: application/json: schema: $ref: '#/components/schemas/StatementExceptionList' application/yaml: schema: $ref: '#/components/schemas/StatementExceptionList' 404: description: Statement not found. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 500: description: Server error. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' ## ---------------------------- Compute Pool API ---------------------------- ## /cmf/api/v1/environments/{envName}/compute-pools: post: tags: - SQL operationId: createComputePool summary: Creates a new Flink Compute Pool in the given Environment. parameters: - name: envName in: path description: Name of the Environment required: true schema: type: string requestBody: content: application/json: schema: $ref: '#/components/schemas/ComputePool' application/yaml: schema: $ref: '#/components/schemas/ComputePool' required: true responses: 200: description: The Compute Pool was successfully created. content: application/json: schema: $ref: '#/components/schemas/ComputePool' application/yaml: schema: $ref: '#/components/schemas/ComputePool' 400: description: Bad request. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 409: description: Compute Pool already exists. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 422: description: Request valid but invalid content. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 500: description: Server error. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' get: tags: - SQL operationId: getComputePools summary: Retrieve a paginated list of Compute Pools in the given Environment. x-spring-paginated: true parameters: - name: envName in: path description: Name of the Environment required: true schema: type: string responses: 200: description: Compute Pools found and returned. content: application/json: schema: $ref: '#/components/schemas/ComputePoolsPage' application/yaml: schema: $ref: '#/components/schemas/ComputePoolsPage' 404: description: Environment not found. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 500: description: Server error. 
content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' /cmf/api/v1/environments/{envName}/compute-pools/{computePoolName}: get: tags: - SQL operationId: getComputePool summary: Retrieve the Compute Pool of the given name in the given Environment. parameters: - name: envName in: path description: Name of the Environment required: true schema: type: string - name: computePoolName in: path description: Name of the Compute Pool required: true schema: type: string responses: 200: description: Compute Pool found and returned. content: application/json: schema: $ref: '#/components/schemas/ComputePool' application/yaml: schema: $ref: '#/components/schemas/ComputePool' 404: description: Compute Pool not found. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 500: description: Server error. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' delete: tags: - SQL operationId: deleteComputePool summary: Deletes the ComputePool of the given name in the given Environment. parameters: - name: envName in: path description: Name of the Environment required: true schema: type: string - name: computePoolName in: path description: Name of the ComputePool required: true schema: type: string responses: 204: description: Compute Pool was found and deleted. 404: description: Compute Pool not found. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 500: description: Server error. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 409: description: Compute Pool is in use and cannot be deleted. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' ## ---------------------------- Secrets API ---------------------------- ## /cmf/api/v1/secrets: post: tags: - Secrets operationId: createSecret summary: Create a Secret. description: Create a Secret. This secrets can be then used to specify sensitive information in the Flink SQL statements. Right now these secrets are only used for Kafka and Schema Registry credentials. requestBody: content: application/json: schema: $ref: '#/components/schemas/Secret' application/yaml: schema: $ref: '#/components/schemas/Secret' required: true responses: 200: description: The Secret was successfully created. Note that for security reasons, you can never view the contents of the secret itself once created. content: application/json: schema: $ref: '#/components/schemas/Secret' application/yaml: schema: $ref: '#/components/schemas/Secret' 400: description: Bad request. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 409: description: The Secret already exists. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 500: description: Server error. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' get: tags: - Secrets operationId: getSecrets summary: Retrieve a paginated list of all secrets. 
Note that the actual secret data is masked for security reasons. x-spring-paginated: true responses: 200: description: List of secrets found. If no secrets are found, an empty list is returned. content: application/json: schema: $ref: '#/components/schemas/SecretsPage' application/yaml: schema: $ref: '#/components/schemas/SecretsPage' 304: description: The list of secrets has not changed. headers: ETag: description: An ID for this version of the response. schema: type: string 500: description: Server error. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' /cmf/api/v1/secrets/{secretName}: get: tags: - Secrets operationId: getSecret summary: Retrieve the Secret of the given name. Note that the secret data is not returned for security reasons. parameters: - name: secretName in: path description: Name of the Secret required: true schema: type: string responses: 200: description: Secret found and returned, with security data masked. content: application/json: schema: $ref: '#/components/schemas/Secret' application/yaml: schema: $ref: '#/components/schemas/Secret' 404: description: Secret not found. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 500: description: Server error. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' put: tags: - Secrets operationId: updateSecret summary: Update the secret. parameters: - name: secretName in: path description: Name of the Secret required: true schema: type: string requestBody: required: true content: application/json: schema: $ref: '#/components/schemas/Secret' application/yaml: schema: $ref: '#/components/schemas/Secret' responses: 200: description: Returns the updated Secret content: application/json: schema: $ref: '#/components/schemas/Secret' application/yaml: schema: $ref: '#/components/schemas/Secret' 400: description: Bad request. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 404: description: Secret with the given name not found. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 422: description: Request valid but invalid content. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 500: description: Server error. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' delete: tags: - Secrets operationId: deleteSecret summary: Delete the secret with the given name. parameters: - name: secretName in: path description: Name of the Secret required: true schema: type: string responses: 204: description: Secret was successfully deleted. 404: description: Secret not found. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 500: description: Server error. 
content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' ## ---------------------------- Catalog API ---------------------------- ## /cmf/api/v1/catalogs/kafka: post: tags: - SQL operationId: createKafkaCatalog summary: Creates a new Kafka Catalog that can be referenced by Flink Statements requestBody: content: application/json: schema: $ref: '#/components/schemas/KafkaCatalog' application/yaml: schema: $ref: '#/components/schemas/KafkaCatalog' required: true responses: 200: description: The Kafka Catalog was successfully created. content: application/json: schema: $ref: '#/components/schemas/KafkaCatalog' application/yaml: schema: $ref: '#/components/schemas/KafkaCatalog' 400: description: Bad request. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 409: description: Kafka Catalog already exists. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 422: description: Request valid but invalid content. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 500: description: Server error. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' get: tags: - SQL operationId: getKafkaCatalogs summary: Retrieve a paginated list of Kafka Catalogs x-spring-paginated: true responses: 200: description: Kafka Catalogs found and returned. content: application/json: schema: $ref: '#/components/schemas/KafkaCatalogsPage' application/yaml: schema: $ref: '#/components/schemas/KafkaCatalogsPage' 500: description: Server error. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' /cmf/api/v1/catalogs/kafka/{catName}: get: tags: - SQL operationId: getKafkaCatalog summary: Retrieve the Kafka Catalog of the given name. parameters: - name: catName in: path description: Name of the Kafka Catalog required: true schema: type: string responses: 200: description: Kafka Catalog found and returned. content: application/json: schema: $ref: '#/components/schemas/KafkaCatalog' application/yaml: schema: $ref: '#/components/schemas/KafkaCatalog' 404: description: Kafka Catalog not found. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 500: description: Server error. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' delete: tags: - SQL operationId: deleteKafkaCatalog summary: Deletes the Kafka Catalog of the given name. parameters: - name: catName in: path description: Name of the Kafka Catalog required: true schema: type: string responses: 204: description: Kafka Catalog was found and deleted. 404: description: Kafka Catalog not found. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 409: description: Catalog contains databases and cannot be deleted. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 500: description: Server error. 
content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' put: tags: - SQL operationId: updateKafkaCatalog summary: Updates a KafkaCatalog of the given name. parameters: - name: catName in: path description: Name of the KafkaCatalog required: true schema: type: string requestBody: content: application/json: schema: $ref: '#/components/schemas/KafkaCatalog' application/yaml: schema: $ref: '#/components/schemas/KafkaCatalog' required: true responses: 200: description: KafkaCatalog was found and updated. 400: description: Bad request. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 404: description: KafkaCatalog not found. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 422: description: Request valid but invalid content. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 500: description: Server error. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' ## ---------------------------- Database API ---------------------------- ## /cmf/api/v1/catalogs/kafka/{catName}/databases: post: tags: - SQL operationId: createKafkaDatabase summary: Creates a new Kafka Database parameters: - name: catName in: path description: Name of the Catalog required: true schema: type: string requestBody: content: application/json: schema: $ref: '#/components/schemas/KafkaDatabase' application/yaml: schema: $ref: '#/components/schemas/KafkaDatabase' required: true responses: 200: description: The Kafka Database was successfully created. content: application/json: schema: $ref: '#/components/schemas/KafkaDatabase' application/yaml: schema: $ref: '#/components/schemas/KafkaDatabase' 400: description: Bad request. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 409: description: Kafka Database already exists. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 422: description: Request valid but invalid content. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 500: description: Server error. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' get: tags: - SQL operationId: getKafkaDatabases summary: Retrieve a paginated list of Kafka Databases parameters: - name: catName in: path description: Name of the Catalog required: true schema: type: string x-spring-paginated: true responses: 200: description: Kafka Databases found and returned. content: application/json: schema: $ref: '#/components/schemas/KafkaDatabasesPage' application/yaml: schema: $ref: '#/components/schemas/KafkaDatabasesPage' 500: description: Server error. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' /cmf/api/v1/catalogs/kafka/{catName}/databases/{dbName}: get: tags: - SQL operationId: getKafkaDatabase summary: Retrieve the Kafka Database of the given name in the given KafkaCatalog. 
parameters: - name: catName in: path description: Name of the Kafka Catalog required: true schema: type: string - name: dbName in: path description: Name of the Kafka Database required: true schema: type: string responses: 200: description: Kafka Database found and returned. content: application/json: schema: $ref: '#/components/schemas/KafkaDatabase' application/yaml: schema: $ref: '#/components/schemas/KafkaDatabase' 404: description: Kafka Database not found. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 500: description: Server error. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' delete: tags: - SQL operationId: deleteKafkaDatabase summary: Deletes the Kafka Database of the given name in the given KafkaCatalog. parameters: - name: catName in: path description: Name of the Kafka Catalog required: true schema: type: string - name: dbName in: path description: Name of the Kafka Database required: true schema: type: string responses: 204: description: Kafka Database was found and deleted. 404: description: Kafka Database not found. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 500: description: Server error. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' put: tags: - SQL operationId: updateKafkaDatabase summary: Updates a KafkaDatabase of the given name in the given KafkaCatalog. parameters: - name: catName in: path description: Name of the KafkaCatalog required: true schema: type: string - name: dbName in: path description: Name of the KafkaDatabase required: true schema: type: string requestBody: content: application/json: schema: $ref: '#/components/schemas/KafkaDatabase' application/yaml: schema: $ref: '#/components/schemas/KafkaDatabase' required: true responses: 200: description: KafkaDatabase was found and updated. 400: description: Bad request. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 404: description: KafkaDatabase not found. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 422: description: Request valid but invalid content. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 500: description: Server error. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' ## ---------------------------- Savepoint API ---------------------------- ## ### --------------------------- Savepoint for Applications API --------------------------- ### /cmf/api/v1/environments/{envName}/applications/{appName}/savepoints: post: tags: - Savepoints operationId: createSavepointForFlinkApplication summary: Creates a new Savepoint for the given Application in the given Environment. 
parameters: - name: envName in: path description: Name of the Environment required: true schema: type: string - name: appName in: path description: Name of the Application required: true schema: type: string requestBody: content: application/json: schema: $ref: '#/components/schemas/Savepoint' application/yaml: schema: $ref: '#/components/schemas/Savepoint' responses: 201: description: Savepoint was successfully created. content: application/json: schema: $ref: '#/components/schemas/Savepoint' application/yaml: schema: $ref: '#/components/schemas/Savepoint' 404: description: Environment or Application not found. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 400: description: Bad request. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 422: description: Request valid but invalid content. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 500: description: Server error. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' get: tags: - Savepoints operationId: getSavepointsForFlinkApplication summary: Retrieve a paginated list of all Savepoints for the given Application in the given Environment. x-spring-paginated: true parameters: - name: envName in: path description: Name of the Environment required: true schema: type: string - name: appName in: path description: Name of the Application required: true schema: type: string responses: 200: description: Savepoints found and returned. In case, there are no savepoints, an empty list is returned. content: application/json: schema: $ref: '#/components/schemas/SavepointsPage' application/yaml: schema: $ref: '#/components/schemas/SavepointsPage' 404: description: Environment or Application not found. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 500: description: Server error. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' /cmf/api/v1/environments/{envName}/applications/{appName}/savepoints/{savepointName}: get: tags: - Savepoints operationId: getSavepointForFlinkApplication summary: Retrieve the Savepoint of the given name for the given Application in the given Environment. parameters: - name: envName in: path description: Name of the Environment required: true schema: type: string - name: appName in: path description: Name of the Application required: true schema: type: string - name: savepointName in: path description: Name of the Savepoint required: true schema: type: string responses: 200: description: Savepoint found and returned. content: application/json: schema: $ref: '#/components/schemas/Savepoint' application/yaml: schema: $ref: '#/components/schemas/Savepoint' 404: description: Environment, Application or Savepoint not found. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 500: description: Server error. 
content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' delete: tags: - Savepoints operationId: deleteSavepointForFlinkApplication summary: Deletes the Savepoint of the given name for the given Application in the given Environment. parameters: - name: envName in: path description: Name of the Environment required: true schema: type: string - name: appName in: path description: Name of the Application required: true schema: type: string - name: savepointName in: path description: Name of the Savepoint to be deleted. required: true schema: type: string - name: force in: query description: If a Savepoint is marked for deletion, it can be force deleted. required: false schema: type: boolean default: false responses: 204: description: Savepoint was found and deleted. 404: description: Environment, Application or Savepoint not found. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 500: description: Server error. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' /cmf/api/v1/environments/{envName}/applications/{appName}/savepoints/{savepointName}/detach: post: tags: - Savepoints operationId: detachSavepointFromFlinkApplication summary: Detaches the Savepoint of the given name for the given Application in the given Environment. parameters: - name: envName in: path description: Name of the Environment required: true schema: type: string - name: appName in: path description: Name of the Application required: true schema: type: string - name: savepointName in: path description: Name of the Savepoint to be detached. required: true schema: type: string responses: 200: description: Savepoint was successfully detached and returned. content: application/json: schema: $ref: '#/components/schemas/Savepoint' application/yaml: schema: $ref: '#/components/schemas/Savepoint' 404: description: Environment, Application or Savepoint not found. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 500: description: Server error. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' ### --------------------------- Savepoint for Statements API --------------------------- ### /cmf/api/v1/environments/{envName}/statements/{stmtName}/savepoints: post: tags: - Savepoints operationId: createSavepointForFlinkStatement summary: Creates a new Savepoint for the given Statement in the given Environment. parameters: - name: envName in: path description: Name of the Environment required: true schema: type: string - name: stmtName in: path description: Name of the Statement required: true schema: type: string requestBody: content: application/json: schema: $ref: '#/components/schemas/Savepoint' application/yaml: schema: $ref: '#/components/schemas/Savepoint' responses: 201: description: Savepoint was successfully created. content: application/json: schema: $ref: '#/components/schemas/Savepoint' application/yaml: schema: $ref: '#/components/schemas/Savepoint' 400: description: Bad request. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 404: description: Environment or Statement not found. 
content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 422: description: Request valid but invalid content. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 500: description: Server error. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' get: tags: - Savepoints operationId: getSavepointsForFlinkStatement summary: Retrieve a paginated list of all Savepoints for the given Statement in the given Environment. x-spring-paginated: true parameters: - name: envName in: path description: Name of the Environment required: true schema: type: string - name: stmtName in: path description: Name of the Statement required: true schema: type: string responses: 200: description: Savepoints found and returned. In case, there are no savepoints, an empty list is returned. content: application/json: schema: $ref: '#/components/schemas/SavepointsPage' application/yaml: schema: $ref: '#/components/schemas/SavepointsPage' 404: description: Environment or Statement not found. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 500: description: Server error. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' /cmf/api/v1/environments/{envName}/statements/{stmtName}/savepoints/{savepointName}: get: tags: - Savepoints operationId: getSavepointForFlinkStatement summary: Retrieve the Savepoint of the given name for the given Statement in the given Environment. parameters: - name: envName in: path description: Name of the Environment required: true schema: type: string - name: stmtName in: path description: Name of the Statement required: true schema: type: string - name: savepointName in: path description: Name of the Savepoint required: true schema: type: string responses: 200: description: Savepoint found and returned. content: application/json: schema: $ref: '#/components/schemas/Savepoint' application/yaml: schema: $ref: '#/components/schemas/Savepoint' 404: description: Environment, Statement or Savepoint not found. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 500: description: Server error. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' delete: tags: - Savepoints operationId: deleteSavepointForFlinkStatement summary: Deletes the Savepoint of the given name for the given Statement in the given Environment. parameters: - name: envName in: path description: Name of the Environment required: true schema: type: string - name: stmtName in: path description: Name of the Statement required: true schema: type: string - name: savepointName in: path description: Name of the Savepoint to be deleted. required: true schema: type: string - name: force in: query description: If a Savepoint is marked for deletion, it can be force deleted. required: false schema: type: boolean default: false responses: 204: description: Savepoint was found and deleted. 404: description: Environment, Statement or Savepoint not found. 
content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 500: description: Server error. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' ### --------------------------- Detached Savepoint API --------------------------- ### /cmf/api/v1/detached-savepoints: post: tags: - DetachedSavepoints operationId: createDetachedSavepoint summary: Creates a new detached savepoint. requestBody: content: application/json: schema: $ref: '#/components/schemas/Savepoint' application/yaml: schema: $ref: '#/components/schemas/Savepoint' responses: 201: description: Detached Savepoint was successfully created and returned. content: application/json: schema: $ref: '#/components/schemas/Savepoint' application/yaml: schema: $ref: '#/components/schemas/Savepoint' 400: description: Bad request. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 422: description: Request valid but invalid content. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 500: description: Server error. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' get: tags: - DetachedSavepoints operationId: listDetachedSavepoints summary: Retrieve a paginated list of all Detached Savepoints. x-spring-paginated: true parameters: - name: name in: query description: Filter by detached savepoint name prefix (e.g. ?name=abc) required: false schema: type: string responses: 200: description: Detached Savepoints found and returned. In case, there are none, an empty list is returned. content: application/json: schema: $ref: '#/components/schemas/SavepointsPage' application/yaml: schema: $ref: '#/components/schemas/SavepointsPage' 500: description: Server error. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' /cmf/api/v1/detached-savepoints/{detachedSavepointName}: get: tags: - DetachedSavepoints operationId: getDetachedSavepoint summary: Retrieve the Detached Savepoint of the given name. parameters: - name: detachedSavepointName in: path description: Name of the Detached Savepoint required: true schema: type: string responses: 200: description: Detached Savepoint found and returned. content: application/json: schema: $ref: '#/components/schemas/Savepoint' application/yaml: schema: $ref: '#/components/schemas/Savepoint' 404: description: Detached Savepoint not found. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 500: description: Server error. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' delete: tags: - DetachedSavepoints operationId: deleteDetachedSavepoint summary: Deletes the Detached Savepoint of the given name. parameters: - name: detachedSavepointName in: path description: Name of the Detached Savepoint required: true schema: type: string responses: 204: description: Detached Savepoint was found and deleted. 404: description: Detached Savepoint not found. 
content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' 500: description: Server error. content: application/json: schema: $ref: '#/components/schemas/RestError' application/yaml: schema: $ref: '#/components/schemas/RestError' components: # https://github.com/daniel-shuy/swaggerhub-spring-pagination / Copyright (c) 2023 Daniel Shuy parameters: pageTokenParam: in: query name: page-token schema: type: string description: Token for the next page of results computePoolParam: in: query name: compute-pool schema: type: string description: Name of the ComputePool to filter on phaseParam: in: query name: phase schema: type: string enum: [pending, running, completed, deleting, failing, failed, stopped] description: Phase to filter on schemas: ## ---------------------------- Shared Utilities ---------------------------- ## RestError: title: REST Error description: The schema for all error responses. type: object properties: errors: title: errors description: List of all errors type: array items: title: error type: object description: An error properties: message: type: string description: An error message PaginationResponse: type: object properties: pageable: $ref: '#/components/schemas/Pageable' Sort: type: object format: sort properties: sorted: type: boolean description: Whether the results are sorted. example: true unsorted: type: boolean description: Whether the results are unsorted. example: false empty: type: boolean Pageable: type: object format: pageable properties: page: type: integer minimum: 0 size: type: integer description: The number of items in a page. minimum: 1 sort: $ref: '#/components/schemas/Sort' ## ---------------------------- Shared Bases ---------------------------- ## ResourceBaseV2: type: object properties: apiVersion: description: API version for spec type: string kind: description: Kind of resource - set to resource type type: string required: - apiVersion - kind PostResourceBase: type: object properties: name: title: Name description: A unique name for the resource. type: string # Validate for DNS subdomain name minLength: 4 maxLength: 253 pattern: '^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$' GetResourceBase: type: object properties: created_time: title: Time when the resource has been created type: string format: date-time updated_time: title: Time when the resource has been last updated type: string format: date-time # defines kubernetesNamespace KubernetesNamespace: type: object properties: kubernetesNamespace: type: string title: Kubernetes namespace name where resources referencing this environment are created in. 
minLength: 1 maxLength: 253 pattern: '^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$' # defines properties of fields with flinkApplicationDefaults ResourceWithFlinkApplicationDefaults: type: object properties: flinkApplicationDefaults: title: the defaults as YAML or JSON for FlinkApplications type: object format: yamlorjson # defines computePool defaults ComputePoolDefaults: type: object properties: computePoolDefaults: title: ComputePoolDefaults description: the defaults as YAML or JSON for ComputePools type: object format: yamlorjson # defines statement defaults StatementDefaults: type: object properties: flinkConfiguration: description: default Flink configuration for Statements type: object additionalProperties: type: string # defines defaults for detached and interactive statements AllStatementDefaults: type: object properties: statementDefaults: title: AllStatementDefaults description: the defaults for detached and interactive Statements type: object properties: detached: description: defaults for detached statements $ref: '#/components/schemas/StatementDefaults' interactive: description: defaults for interactive statements $ref: '#/components/schemas/StatementDefaults' ## ---------------------------- Request Schemas ---------------------------- ## PostEnvironment: title: Environment description: Environment type: object required: - name allOf: - $ref: '#/components/schemas/PostResourceBase' - $ref: '#/components/schemas/ResourceWithFlinkApplicationDefaults' - $ref: '#/components/schemas/KubernetesNamespace' - $ref: '#/components/schemas/ComputePoolDefaults' - $ref: '#/components/schemas/AllStatementDefaults' ## ---------------------------- Response Schemas ---------------------------- ## Environment: title: Environment description: Environment type: object allOf: - $ref: '#/components/schemas/PostResourceBase' - $ref: '#/components/schemas/GetResourceBase' - $ref: '#/components/schemas/ResourceWithFlinkApplicationDefaults' - $ref: '#/components/schemas/KubernetesNamespace' - $ref: '#/components/schemas/ComputePoolDefaults' - $ref: '#/components/schemas/AllStatementDefaults' properties: secrets: title: Secrets description: The secrets mapping for the environment. This is a mapping between connection_secret_id and the secret name. type: object additionalProperties: type: string default: { } required: - name - kubernetesNamespace EnvironmentSecretMapping: title: EnvironmentSecretMapping description: The secrets mapping for the environment. The name shows the name of the Connection Secret ID to be mapped. 
type: object properties: apiVersion: title: API version for EnvironmentSecretMapping spec type: string kind: title: Kind of resource - set to EnvironmentSecretMapping type: string metadata: title: EnvironmentSecretMappingMetadata description: Metadata about the environment secret mapping type: object properties: name: description: Name of the Connection Secret ID type: string uid: description: Unique identifier of the EnvironmentSecretMapping type: string creationTimestamp: description: Timestamp when the EnvironmentSecretMapping was created type: string updateTimestamp: description: Timestamp when the EnvironmentSecretMapping was last updated type: string labels: description: Labels of the EnvironmentSecretMapping type: object additionalProperties: type: string annotations: description: Annotations of the EnvironmentSecretMapping type: object additionalProperties: type: string spec: title: EnvironmentSecretMappingSpec description: Spec for environment secret mapping type: object writeOnly: true properties: secretName: description: Name of the secret to be mapped to the connection secret id of this mapping. type: string required: - secretName required: - apiVersion - kind EnvironmentSecretMappingsPage: type: object allOf: - $ref: '#/components/schemas/PaginationResponse' - type: object properties: metadata: type: object title: EnvironmentSecretMappingsPageMetadata properties: size: type: integer format: int64 default: 0 items: type: array items: $ref: '#/components/schemas/EnvironmentSecretMapping' default: [ ] Secret: title: Secret description: Represents a Secret that can be used to specify sensitive information in the Flink SQL statements. allOf: - $ref: '#/components/schemas/ResourceBaseV2' - type: object properties: metadata: title: SecretMetadata description: Metadata about the secret type: object properties: name: type: string description: Name of the Secret creationTimestamp: description: Timestamp when the Secret was created type: string readOnly: true updateTimestamp: description: Timestamp when the Secret was last updated type: string readOnly: true uid: description: Unique identifier of the Secret type: string readOnly: true labels: description: Labels of the Secret type: object additionalProperties: type: string annotations: description: Annotations of the Secret type: object additionalProperties: type: string required: - name spec: title: SecretSpec description: Spec for secret type: object writeOnly: true properties: data: title: SecretData description: Data of the secret type: object additionalProperties: type: string status: title: SecretStatus description: Status for the secret type: object readOnly: true properties: version: title: SecretVersion description: The version of the secret type: string environments: title: Environments description: The environments to which the secret is attached to. 
type: array items: type: string required: - metadata - spec SecretsPage: type: object allOf: - $ref: '#/components/schemas/PaginationResponse' - type: object properties: metadata: type: object title: SecretsPageMetadata properties: size: type: integer format: int64 default: 0 items: type: array items: $ref: '#/components/schemas/Secret' default: [ ] EnvironmentsPage: type: object allOf: - $ref: '#/components/schemas/PaginationResponse' - type: object properties: metadata: type: object title: EnvironmentsPageMetadata properties: size: type: integer format: int64 default: 0 items: type: array title: "Env" items: $ref: '#/components/schemas/Environment' default: [ ] FlinkApplication: title: FlinkApplication description: Represents a Flink Application submitted by the user type: object allOf: - $ref: '#/components/schemas/ResourceBaseV2' - type: object properties: metadata: title: Metadata about the application type: object format: yamlorjson spec: title: Spec for Flink Application type: object format: yamlorjson status: title: Status for Flink Application type: object format: yamlorjson required: # status is optional for application spec - metadata - spec ApplicationsPage: type: object allOf: - $ref: '#/components/schemas/PaginationResponse' - type: object properties: metadata: type: object title: ApplicationPageMetadata properties: size: type: integer format: int64 default: 0 items: type: array items: $ref: '#/components/schemas/FlinkApplication' default: [ ] FlinkApplicationEvent: title: FlinkApplicationEvent description: Events from the deployment of Flink clusters # TODO(CF-1159): Using the ResourceBaseV2 here leads to incorrectly generated EventType where the generated interface doesn't match the file name causing compilation errors. type: object properties: apiVersion: title: API version for Event spec - set to v1alpha1 type: string kind: title: Kind of resource - set to FlinkApplicationEvent type: string metadata: title: EventMetadata description: Metadata about the event type: object properties: name: description: Name of the Event type: string uid: description: Unique identifier of the Event. Identical to name. type: string creationTimestamp: description: Timestamp when the Event was created type: string flinkApplicationInstance: description: Name of the FlinkApplicationInstance which this event is related to type: string labels: description: Labels of the Event type: object additionalProperties: type: string annotations: description: Annotations of the Event type: object additionalProperties: type: string status: type: object title: EventStatus properties: message: description: Human readable status message. 
type: string type: title: EventType description: Type of the event type: string data: $ref: '#/components/schemas/EventData' required: - kind - apiVersion - metadata - status EventDataNewStatus: type: object properties: newStatus: description: "The new status" type: string EventDataJobException: type: object properties: exceptionString: description: "The full exception string from the Flink job" type: string EventData: oneOf: - $ref: '#/components/schemas/EventDataNewStatus' - $ref: '#/components/schemas/EventDataJobException' EventsPage: type: object allOf: - $ref: '#/components/schemas/PaginationResponse' - type: object properties: metadata: type: object title: EventsPageMetadata properties: size: type: integer format: int64 default: 0 items: type: array items: $ref: '#/components/schemas/FlinkApplicationEvent' default: [ ] FlinkApplicationInstance: title: ApplicationInstance description: An instance of a Flink Application type: object allOf: - $ref: '#/components/schemas/ResourceBaseV2' - type: object properties: metadata: title: ApplicationInstanceMetadata description: Metadata about the instance type: object properties: name: description: Name of the Instance - a uuid. type: string uid: description: Unique identifier of the instance. Identical to name. type: string creationTimestamp: description: Timestamp when the Instance was created type: string updateTimestamp: description: Timestamp when the Instance status was last updated type: string labels: description: Labels of the instance type: object additionalProperties: type: string annotations: description: Annotations of the instance type: object additionalProperties: type: string status: type: object title: ApplicationInstanceStatus properties: spec: description: The environment defaults merged with the FlinkApplication spec at instance creation time type: object format: yamlorjson jobStatus: type: object properties: jobId: description: Flink job id inside the Flink cluster type: string state: description: Tracks the final Flink JobStatus of the instance type: string ApplicationInstancesPage: type: object allOf: - $ref: '#/components/schemas/PaginationResponse' - type: object properties: metadata: type: object title: ApplicationInstancesPageMetadata properties: size: type: integer format: int64 default: 0 items: type: array items: $ref: '#/components/schemas/FlinkApplicationInstance' default: [ ] Statement: title: Statement description: Represents a SQL Statement submitted by the user allOf: - $ref: '#/components/schemas/ResourceBaseV2' - type: object properties: metadata: title: StatementMetadata description: Metadata about the statement type: object properties: name: description: Name of the Statement type: string creationTimestamp: description: Timestamp when the Statement was created type: string updateTimestamp: description: Timestamp when the Statement was updated last type: string uid: description: Unique identifier of the Statement type: string labels: description: Labels of the Statement type: object additionalProperties: type: string annotations: description: Annotations of the Statement type: object additionalProperties: type: string required: - name spec: title: StatementSpec description: Spec for statement type: object properties: statement: description: SQL statement type: string properties: title: SessionProperties description: Properties of the client session type: object additionalProperties: type: string flinkConfiguration: title: StatementFlinkConfiguration description: Flink configuration for the statement type: 
object additionalProperties: type: string computePoolName: description: Name of the ComputePool type: string parallelism: description: Parallelism of the statement type: integer format: int32 stopped: description: Whether the statement is stopped type: boolean startFromSavepoint: description: Configuration for starting/resuming the statement from a savepoint $ref: '#/components/schemas/StatementStartFromSavepoint' required: - statement - computePoolName status: title: StatementStatus description: Status for statement type: object properties: phase: description: The lifecycle phase of the statement type: string detail: description: Details about the execution status of the statement type: string traits: title: StatementTraits description: Detailed information about the properties of the statement type: object properties: sqlKind: description: The kind of SQL statement type: string isBounded: description: Whether the result of the statement is bounded type: boolean isAppendOnly: description: Whether the result of the statement is append only type: boolean upsertColumns: description: The column indexes that are updated by the statement type: array items: type: integer format: int32 schema: title: StatementResultSchema description: The schema of the statement result $ref: '#/components/schemas/ResultSchema' required: - phase result: title: StatementResult description: Result of the statement $ref: '#/components/schemas/StatementResult' required: # status and result are optional for Statement spec - metadata - spec StatementsPage: type: object allOf: - $ref: '#/components/schemas/PaginationResponse' - type: object properties: metadata: type: object title: StatementPageMetadata properties: size: type: integer format: int64 default: 0 items: type: array items: $ref: '#/components/schemas/Statement' default: [ ] StatementResult: title: StatementResult description: Represents the result of a SQL Statement allOf: - $ref: '#/components/schemas/ResourceBaseV2' - type: object properties: metadata: title: StatementResultMetadata description: Metadata about the StatementResult type: object properties: creationTimestamp: description: Timestamp when the StatementResult was created type: string annotations: description: Annotations of the StatementResult type: object additionalProperties: type: string results: title: StatementResults description: Results of the Statement type: object properties: data: title: Data type: array items: description: A result row type: object format: yamlorjson required: - metadata - results StatementException: title: StatementException description: Represents an exception that occurred while executing a SQL Statement type: object allOf: - $ref: '#/components/schemas/ResourceBaseV2' - type: object properties: name: description: Name of the StatementException type: string message: description: Message of the StatementException type: string timestamp: description: Timestamp when the StatementException was created type: string required: - name - message - timestamp StatementExceptionList: title: StatementExceptionList description: Represents a list of exceptions that occurred while executing a SQL Statement type: object allOf: - $ref: '#/components/schemas/ResourceBaseV2' - type: object properties: data: title: Exceptions description: List of exceptions type: array maxItems: 10 items: $ref: '#/components/schemas/StatementException' default: [ ] required: - data DataType: title: DataType description: Represents a SQL data type type: object properties: type: description: Name of the 
data type of the column type: string nullable: description: Whether the data type is nullable type: boolean length: description: Length of the data type type: integer format: int32 precision: description: Precision of the data type type: integer format: int32 scale: description: Scale of the data type type: integer format: int32 keyType: description: Type of the key in the data type (if applicable) $ref: '#/components/schemas/DataType' x-go-pointer: true valueType: description: Type of the value in the data type (if applicable) $ref: '#/components/schemas/DataType' x-go-pointer: true elementType: description: Type of the elements in the data type (if applicable) $ref: '#/components/schemas/DataType' x-go-pointer: true fields: description: Fields of the data type (if applicable) type: array items: type: object title: DataTypeField description: Field of the data type properties: name: description: Name of the field type: string fieldType: description: Type of the field $ref: '#/components/schemas/DataType' x-go-pointer: true description: description: Description of the field type: string required: - name - fieldType resolution: description: Resolution of the data type (if applicable) type: string fractionalPrecision: description: Fractional precision of the data type (if applicable) type: integer format: int32 required: - type - nullable ResultSchema: title: ResultSchema description: Represents the schema of the result of a SQL Statement type: object properties: columns: description: Properites of all columns in the schema type: array items: title: ResultSchemaColumn type: object properties: name: description: Name of the column type: string type: description: Type of the column $ref: '#/components/schemas/DataType' required: - name - type required: - columns ComputePool: title: ComputePool description: Represents the configuration of a Flink cluster type: object allOf: - $ref: '#/components/schemas/ResourceBaseV2' - type: object properties: metadata: title: ComputePoolMetadata description: Metadata about the ComputePool type: object properties: name: description: Name of the ComputePool type: string creationTimestamp: description: Timestamp when the ComputePool was created type: string uid: description: Unique identifier of the ComputePool type: string labels: description: Labels of the ComputePool type: object additionalProperties: type: string annotations: description: Annotations of the ComputePool type: object additionalProperties: type: string required: - name spec: title: ComputePoolSpec description: Spec for ComputePool type: object properties: type: description: Type of the ComputePool type: string clusterSpec: description: Cluster Spec type: object format: yamlorjson required: - type - clusterSpec status: title: ComputePoolStatus description: Status for ComputePool type: object properties: phase: description: Phase of the ComputePool type: string required: - phase required: # status is optional for ComputePool spec - metadata - spec ComputePoolsPage: type: object allOf: - $ref: '#/components/schemas/PaginationResponse' - type: object properties: metadata: type: object title: ComputePoolPageMetadata properties: size: type: integer format: int64 default: 0 items: type: array items: $ref: '#/components/schemas/ComputePool' default: [ ] CatalogMetadata: title: CatalogMetadata description: Metadata about the Catalog type: object properties: name: description: Name of the Catalog type: string creationTimestamp: description: Timestamp when the Catalog was created type: string 
updateTimestamp: description: Timestamp when the Catalog was updated the last time type: string uid: description: Unique identifier of the Catalog type: string labels: description: Labels of the Catalog type: object additionalProperties: type: string annotations: description: Annotations of the Catalog type: object additionalProperties: type: string required: - name KafkaCatalog: title: KafkaCatalog description: Represents a the configuration of a Kafka Catalog type: object allOf: - $ref: '#/components/schemas/ResourceBaseV2' - type: object properties: metadata: $ref: '#/components/schemas/CatalogMetadata' spec: title: KafkaCatalogSpec description: Spec of a Kafka Catalog type: object properties: srInstance: description: Details about the SchemaRegistry instance of the Catalog type: object properties: connectionConfig: description: connection options for the SR client type: object additionalProperties: type: string connectionSecretId: description: an identifier to look up a Kubernetes secret that contains the connection credentials type: string required: - connectionConfig kafkaClusters: type: array items: type: object properties: databaseName: description: the database name under which the Kafka cluster is listed in the Catalog type: string connectionConfig: description: connection options for the Kafka client type: object additionalProperties: type: string connectionSecretId: description: an identifier to look up a Kubernetes secret that contains the connection credentials type: string required: - databaseName - connectionConfig required: - srInstance required: - metadata - spec KafkaCatalogsPage: type: object allOf: - $ref: '#/components/schemas/PaginationResponse' - type: object properties: metadata: type: object title: CatalogPageMetadata properties: size: type: integer format: int64 default: 0 items: type: array items: $ref: '#/components/schemas/KafkaCatalog' default: [ ] DatabaseMetadata: title: DatabaseMetadata description: Metadata about the Database type: object properties: name: description: Name of the Database type: string creationTimestamp: description: Timestamp when the Database was created type: string updateTimestamp: description: Timestamp when the Database was updated the last time type: string uid: description: Unique identifier of the Database type: string labels: description: Labels of the Database type: object additionalProperties: type: string annotations: description: Annotations of the Database type: object additionalProperties: type: string required: - name KafkaDatabase: title: KafkaDatabase description: Represents a the configuration of a Kafka Database type: object allOf: - $ref: '#/components/schemas/ResourceBaseV2' - type: object properties: metadata: $ref: '#/components/schemas/DatabaseMetadata' spec: title: KafkaDatabaseSpec description: Spec of a Kafka Database type: object properties: kafkaCluster: description: Details about the Kafka cluster of the database type: object properties: connectionConfig: description: connection options for the Kafka client type: object additionalProperties: type: string connectionSecretId: description: an identifier to look up a secret that contains the connection credentials type: string required: - connectionConfig alterEnvironments: description: List of environments that have permission to alter the tables of this database type: array items: type: string required: - kafkaCluster required: - metadata - spec KafkaDatabasesPage: type: object allOf: - $ref: '#/components/schemas/PaginationResponse' - type: object 
properties: metadata: type: object title: DatabasePageMetadata properties: size: type: integer format: int64 default: 0 items: type: array items: $ref: '#/components/schemas/KafkaDatabase' default: [ ] Savepoint: title: Savepoint description: Represents a Savepoint for a Flink Application or Statement type: object allOf: - $ref: '#/components/schemas/ResourceBaseV2' - type: object properties: metadata: title: SavepointMetadata description: Metadata about the Savepoint type: object x-class-extra-annotation: '@com.fasterxml.jackson.annotation.JsonInclude(com.fasterxml.jackson.annotation.JsonInclude.Include.NON_NULL)' properties: name: description: Name of the Savepoint type: string creationTimestamp: description: Timestamp when the Savepoint was created type: string uid: description: Unique identifier of the Savepoint type: string labels: description: Labels of the Savepoint type: object additionalProperties: type: string annotations: description: Annotations of the Savepoint type: object additionalProperties: type: string spec: title: SavepointSpec description: Spec for Savepoint type: object x-class-extra-annotation: '@com.fasterxml.jackson.annotation.JsonInclude(com.fasterxml.jackson.annotation.JsonInclude.Include.NON_NULL)' properties: path: description: Path of the Savepoint type: string backoffLimit: description: Backoff limit for the Savepoint type: integer format: int32 default: -1 formatType: description: Format type of the Savepoint type: string enum: - CANONICAL - NATIVE - UNKNOWN default: CANONICAL status: title: SavepointStatus description: Status for Savepoint type: object x-class-extra-annotation: '@com.fasterxml.jackson.annotation.JsonInclude(com.fasterxml.jackson.annotation.JsonInclude.Include.NON_NULL)' properties: state: description: State of the Savepoint type: string path: description: Path of the Savepoint type: string triggerTimestamp: description: Timestamp when the Savepoint was triggered type: string resultTimestamp: description: Timestamp when the Savepoint result was received type: string failures: description: The number of failures of the Savepoint type: integer format: int32 error: description: The error message for the Savepoint type: string pendingDeletion: description: Whether the Savepoint is pending deletion type: boolean required: - metadata - spec SavepointsPage: type: object allOf: - $ref: '#/components/schemas/PaginationResponse' - type: object properties: metadata: type: object title: SavepointPageMetadata properties: size: type: integer format: int64 default: 0 items: type: array items: $ref: '#/components/schemas/Savepoint' default: [ ] StatementStartFromSavepoint: title: StartStatementFromSavepoint description: Configuration for resuming a Statement from a savepoint. Works only with update Statement. type: object properties: savepointName: description: The name of the Savepoint resource to start Statement from. The request will be rejected if savepoint that has not completed is referenced. type: string uid: description: The uuid of the Savepoint resource to start from. type: string initialSavepointPath: description: The path of the savepoint to start the Statement from. This could be an external path too. type: string allowNonRestoredState: description: A boolean flag to allow the job to start even if some state could not be restored. type: boolean savepointRedeployNonce: description: Nonce used to trigger a full redeployment of the job from the savepoint. In order to trigger redeployment, change the number to a different non-null value. 
Rollback is not possible after redeployment. type: integer format: int64 ```

# Configure Access Control for Confluent Manager for Apache Flink

Confluent Manager for Apache Flink® models its access control around seven resource types that different types of users can access. For a general description of role-based access control (RBAC), see [Use Role-Based Access Control (RBAC) for Authorization in Confluent Platform](../../security/authorization/rbac/overview.md#rbac-overview).

The following resources are available in Confluent Manager for Apache Flink® for Flink SQL:

* **Flink application**: Defines a Flink application, which starts the Flink cluster in Application mode. Depending on their assigned role, developers have access to their Flink environment to create, update, and view Flink applications.
* **Flink environment**: Defines where and how applications are deployed, such as the Kubernetes namespace and central configurations that cannot be overridden. You can use Flink environments to separate the privileges of different teams or organizations. System administrators are responsible for managing Flink environments and provisioning them correctly.
* **Flink statement**: The resource CMF uses to execute and maintain SQL queries.
* **Flink secret**: Manages confidential data that Flink statements can use. Currently, secrets can hold Kafka connection configuration or Schema Registry configuration.
* **Flink catalog**: Provides Kafka topics as tables with schemas derived from Schema Registry.
* **Flink compute pool**: In CMF, the compute resources that are used to execute a SQL statement.
* **Flink detached savepoint**: Standalone Flink savepoint resources that are not tied to a specific running job.

# Stream Processing with Confluent Platform for Apache Flink

Confluent Platform for Apache Flink® brings support for Apache Flink® to Confluent Platform. Apache Flink applications are composed of streaming dataflows that are transformed by one or more user-defined operators. These dataflows form directed acyclic graphs that start with one or more sources and end in one or more sinks. Sources and sinks can be Apache Kafka® topics, which means that Flink integrates naturally with Confluent Platform. To learn more about Confluent Platform for Apache Flink connector support, see [Connectors](jobs/applications/supported-features.md#af-cp-connectors).

Confluent Platform for Apache Flink is fully compatible with Apache Flink. However, not all Apache Flink features are supported in Confluent Platform for Apache Flink. To learn more about which features are supported, see [Confluent Platform for Apache Flink Features and Support](jobs/applications/supported-features.md#cpflink-vs-oss).

Flink applications are deployed in Kubernetes with Confluent Manager for Apache Flink, a central management component that enables users to securely manage a fleet of Flink applications across multiple environments.
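To make the resource model above concrete, the following is a minimal, illustrative sketch of a compute pool document shaped after the `ComputePool` schema in the OpenAPI excerpt above, where `metadata.name`, `spec.type`, and `spec.clusterSpec` are required. Every value is a placeholder, fields inherited from `ResourceBaseV2` are omitted because that schema is not reproduced here, and how you would submit such a document (REST API, CLI, or Kubernetes manifest) is outside the scope of this excerpt.

```python
import json

# Illustrative ComputePool document following the ComputePool schema shown in
# the OpenAPI excerpt above: metadata.name is required, and spec requires both
# "type" and "clusterSpec". Every value below is a placeholder.
compute_pool = {
    "metadata": {
        "name": "example-pool",            # required
        "labels": {"team": "streaming"},   # optional string-to-string labels
    },
    "spec": {
        # The excerpt does not list valid values for "type"; this is a placeholder.
        "type": "example-type",
        # clusterSpec is a free-form object (YAML or JSON) per the schema.
        "clusterSpec": {"taskManagerReplicas": 2},
    },
}

print(json.dumps(compute_pool, indent=2))
```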
See the following topics to learn more and get started:

- [Install and Configure](installation/overview.md#cpf-install)
- [Get Started](get-started/get-started-application.md#cpf-get-started)
- [Supported Features](jobs/applications/supported-features.md#cpflink-vs-oss)
- [Flink Concepts](concepts/flink.md#cp-flink-concepts)
- [Disaster Recovery](disaster-recovery.md#backup-restore-cmf)
- [Confluent Manager for Apache Flink](concepts/cmf.md#cmf)
- [Get Help](get-help.md#cpf-get-help)

## Confluent Cloud

Confluent Cloud provides Kafka as a cloud service, so you no longer need to install, upgrade, or patch Kafka server components. You also get access to a [cloud-native design](../_glossary.md#term-Kora), which offers Infinite Storage, elastic scaling, and an uptime guarantee. If you're coming to Confluent Cloud from open source Kafka, you can use data-streaming features only available from Confluent, including non-Java client libraries and proxies for Kafka producers and consumers, tools for monitoring and observability, an intuitive browser-based user interface, and enterprise-grade security and data governance features.

Confluent Cloud includes different types of server processes for streaming data in a production environment. In addition to brokers and topics, Confluent Cloud provides implementations of Kafka Connect, Schema Registry, and ksqlDB.

## Related content

- To download an automated version of this quick start, see [the Quick Start on GitHub](https://github.com/confluentinc/examples/tree/latest/cp-quickstart/README.md)
- To configure and run a multi-broker cluster without Docker, see [Tutorial: Set Up a Multi-Broker Kafka Cluster](tutorial-multi-broker.md#basics-multi-broker-setup)
- To learn how to develop with Confluent Platform, see [Confluent Developer](https://developer.confluent.io/learn-kafka)
- For training and certification guidance, including resources and access to hands-on training and certification exams, see [Confluent Education](https://www.confluent.io/training)
- To try out basic Kafka, Kafka Streams, and ksqlDB tutorials with step-by-step instructions, see [Kafka Tutorials](https://developer.confluent.io/tutorials/)
- To learn how to build stream processing applications in Java or Scala, see [Kafka Streams documentation](../streams/overview.md#kafka-streams)
- To learn how to read and write data to and from Kafka using programming languages such as Go, Python, .NET, and C/C++, see [Kafka Clients documentation](../clients/overview.md#kafka-clients)

## Run multiple clusters

Another option to experiment with is a multi-cluster deployment. This is relevant for trying out features like Replicator, Cluster Linking, and multi-cluster Schema Registry, where you want to share or replicate topic data across two clusters, often modeled as a source (origin) cluster and a destination cluster. These configurations are commonly used to share data across data centers and regions.

An example configuration for [cluster linking](../multi-dc-deployments/cluster-linking/index.md#cluster-linking) is shown in the diagram below. (A full guide to this setup is available in the [Tutorial: Share Data Across Topics Using Cluster Linking for Confluent Platform](../multi-dc-deployments/cluster-linking/topic-data-sharing.md#tutorial-topic-data-sharing).)

![image](images/kafka-basics-multi-cluster.png)

Multi-cluster configurations are described in context under the relevant use cases.
Since these configurations will vary depending on what you want to accomplish, the best way to test out multi-cluster is to choose a use case and follow the feature-specific tutorial.

- [Tutorial: Share Data Across Topics Using Cluster Linking for Confluent Platform](../multi-dc-deployments/cluster-linking/topic-data-sharing.md#tutorial-topic-data-sharing) (requires Confluent Platform 6.0.0 or newer; recommended as the best getting-started example)
- [Tutorial: Replicate Data Across Kafka Clusters in Confluent Platform](../multi-dc-deployments/replicator/replicator-quickstart.md#replicator-quickstart)
- [Enabling Multi-Cluster Schema Registry](../schema-registry/schema.md#multi-cluster-sr)

### Control Center

1. Open Control Center in a browser. The default URL is [http://localhost:9021/](http://localhost:9021/).
2. On the **Home** page, click your cluster.
3. In the navigation menu, click **Health+** to open the overview page.
4. Click **Get started** to set up Health+ for your cluster.
5. In the **Enable your cluster to communicate with Health+** section, enter your API key and secret.

   ![Enable Health+ page in Confluent Control Center](images/c3-health-plus-api-key-secret.png)

   - If you used the Confluent CLI to generate the key and secret, enter them in the **confluent.telemetry.api.key** and **confluent.telemetry.api.secret** text boxes.
   - If you used Confluent Cloud Console to generate the key and secret, click **Upload key and secret** and navigate to the file that you downloaded previously.
6. Click **Continue**.
7. (Optional) Add additional Confluent Platform services, such as ksqlDB or Connect.

   - For any Confluent Platform components other than Confluent Server, enable Telemetry Reporting by adding the following lines to the corresponding configuration file for the service (a scripted sketch of this edit follows these steps). The default location for a component's configuration file is `$CONFLUENT_HOME/etc//.properties`.

     ```properties
     metric.reporters=io.confluent.telemetry.reporter.TelemetryReporter
     confluent.telemetry.enabled=true
     confluent.telemetry.api.key=
     confluent.telemetry.api.secret=
     ```

     #### NOTE

     Confluent Server doesn't require the `metric.reporters` setting, but all other Confluent Platform components do require it.

   - Save the file and restart the service to deploy the new configuration. Use the `confluent local services stop` and `confluent local services start` commands to restart the service.
8. Navigate to the [Health+](https://confluent.cloud/health-plus) page in Confluent Cloud Console to verify that your data is being received. The tile for your Confluent Platform cluster should show **Running**.
9. Click **Finish**.
10. Click the tile for your cluster to see your telemetry data on the [Monitor Using Health+ Dashboard for Confluent Platform](health-plus-monitoring-dashboard.md#health-plus-monitoring-dashboard).
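If you need to apply the step 7 telemetry settings to several component configuration files, the edit can be scripted. The following is a minimal sketch, not an official tool: the file paths and the `<YOUR_API_KEY>`/`<YOUR_API_SECRET>` values are placeholders you must replace, and the script only appends the properties shown above; it does not restart any services.

```python
from pathlib import Path

# Hypothetical paths: replace with the properties files of the components
# (other than Confluent Server) that should report telemetry.
CONFIG_FILES = [
    Path("/path/to/ksqldb-server.properties"),
    Path("/path/to/connect-distributed.properties"),
]

# Telemetry settings from step 7; the key and secret are placeholders.
TELEMETRY_SETTINGS = {
    "metric.reporters": "io.confluent.telemetry.reporter.TelemetryReporter",
    "confluent.telemetry.enabled": "true",
    "confluent.telemetry.api.key": "<YOUR_API_KEY>",
    "confluent.telemetry.api.secret": "<YOUR_API_SECRET>",
}

for config in CONFIG_FILES:
    existing = config.read_text()
    with config.open("a") as f:
        for key, value in TELEMETRY_SETTINGS.items():
            # Append only settings that are not already present, to avoid
            # writing duplicate entries into the properties file.
            if f"{key}=" not in existing:
                f.write(f"\n{key}={value}")
```

After updating each file, restart the corresponding service as described in step 7, for example with `confluent local services stop` and `confluent local services start`.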
#### Enterprise license for Confluent Platform subscription The following Confluent Platform components are under the Confluent Enterprise license for Confluent Platform subscription: * [Confluent Server](available_packages.md#confluent-server-package) The following are a few key features included in Confluent Server: * [Cluster Linking](../multi-dc-deployments/cluster-linking/index.md#cluster-linking) * [Multi-Region Clusters](../multi-dc-deployments/multi-region.md#bmrr) * [Role-based Access Control (RBAC)](../security/authorization/rbac/overview.md#rbac-overview) * [Structured Audit Logs](../security/compliance/audit-logs/audit-logs-concepts.md#audit-logs-concepts) * [Schema Validation](../schema-registry/schema-validation.md#schema-validation) * [Schema Registry Security Plugin for Confluent Platform](../confluent-security-plugins/schema-registry/introduction.md#confluentsecurityplugins-schema-registry-security-plugin) * [Secrets Protection](../security/compliance/secrets/overview.md#secrets) * [Self-Balancing Clusters](../clusters/sbc/index.md#sbc) * [Tiered Storage](../clusters/tiered-storage.md#tiered-storage) * [Schema Linking](../schema-registry/schema-linking-cp.md#schema-linking-cp-overview) * [Data Contracts](/platform/current/schema-registry/fundamentals/data-contracts.html) * Pre-built Connectors In [Confluent Hub](https://www.confluent.io/hub), filter by the Premium and Commercial license types to see the Connectors under the Confluent Enterprise license. * [Control Center for Confluent Platform](https://docs.confluent.io/control-center/current/overview.html) * [Confluent for Kubernetes](https://docs.confluent.io/operator/current/overview.html) * [Confluent Replicator](../multi-dc-deployments/replicator/index.md#replicator-detail) * [MQTT Proxy](../kafka-mqtt/index.md#mqtt-proxy) * [Confluent Platform for Apache Flink](../flink/overview.md#cp-flink-overview) # Upgrade Confluent Platform * [Overview](upgrade-checklist.md) * [Step 0: Prepare for the upgrade](upgrade-checklist.md#step-0-prepare-for-the-upgrade) * [Step 1: Upgrade Kafka controllers and brokers](upgrade-checklist.md#step-1-upgrade-ak-controllers-and-brokers) * [Step 2: Upgrade Confluent Platform components](upgrade-checklist.md#step-2-upgrade-cp-components) * [Step 3: Update configuration files](upgrade-checklist.md#step-3-update-configuration-files) * [Connect Log Redactor configuration](upgrade-checklist.md#connect-log-redactor-configuration) * [Confluent license](upgrade-checklist.md#confluent-license) * [Replication factor for Self-Balancing Clusters](upgrade-checklist.md#replication-factor-for-sbc-long) * [Step 4: Enable Health+](upgrade-checklist.md#step-4-enable-health) * [Step 5: Rebuild applications](upgrade-checklist.md#step-5-rebuild-applications) * [Other Considerations](upgrade-checklist.md#other-considerations) * [Related content](upgrade-checklist.md#related-content) * [Upgrade the Operating System](upgrade-os.md) * [Upgrade order](upgrade-os.md#upgrade-order) * [Pre-upgrade checks](upgrade-os.md#pre-upgrade-checks) * [Post-upgrade checks](upgrade-os.md#post-upgrade-checks) * [Related content](upgrade-os.md#related-content) * [Confluent Platform Upgrade Procedure](upgrade.md) * [Upgrade prerequisite: client protocol deprecation](upgrade.md#upgrade-prerequisite-client-protocol-deprecation) * [Preparation](upgrade.md#preparation) * [Upgrade order](upgrade.md#upgrade-order) * [Stage 1: Preparation (If you are on Confluent Platform 7.7 or 
earlier)](upgrade.md#stage-1-preparation-if-you-are-on-cp-7-7-or-earlier) * [Stage 2: Upgrade to Confluent Platform 8.1](upgrade.md#stage-2-upgrade-to-cp-version) * [Upgrade Kafka](upgrade.md#upgrade-ak) * [Steps to upgrade for any fix pack release](upgrade.md#steps-to-upgrade-for-any-fix-pack-release) * [Steps for upgrading to 8.1.x](upgrade.md#steps-for-upgrading-to-version-x) * [Confluent license](upgrade.md#confluent-license) * [Advertised listeners](upgrade.md#advertised-listeners) * [Security](upgrade.md#security) * [Replication factor for Self-Balancing Clusters](upgrade.md#replication-factor-for-sbc-long) * [Upgrade DEB packages using APT](upgrade.md#upgrade-deb-packages-using-apt) * [Install a specific version on Debian or Ubuntu](upgrade.md#install-a-specific-version-on-debian-or-ubuntu) * [Method 1: Install the latest version (Recommended)](upgrade.md#method-1-install-the-latest-version-recommended) * [Method 2: Force a specific older version with pinning](upgrade.md#method-2-force-a-specific-older-version-with-pinning) * [Upgrade RPM packages by using YUM](upgrade.md#upgrade-rpm-packages-by-using-yum) * [Upgrade using TAR or ZIP archives](upgrade.md#upgrade-using-tar-or-zip-archives) * [Upgrade Confluent Control Center](upgrade.md#upgrade-c3) * [Upgrade Schema Registry](upgrade.md#upgrade-sr) * [Upgrade Confluent REST Proxy](upgrade.md#upgrade-crest-long) * [Upgrade Kafka Streams applications](upgrade.md#upgrade-kstreams-applications) * [Upgrade Kafka Connect](upgrade.md#upgrade-kconnect-long) * [Upgrade Kafka Connect standalone mode](upgrade.md#upgrade-kconnect-long-standalone-mode) * [Upgrade Kafka Connect distributed mode](upgrade.md#upgrade-kconnect-long-distributed-mode) * [Upgrade ksqlDB](upgrade.md#upgrade-ksqldb) * [Upgrade other client applications](upgrade.md#upgrade-other-client-applications) * [Related content](upgrade.md#related-content) ## Next steps **RBAC:** * [Role-Based Access Control for Confluent Platform Quick Start](../../security/authorization/rbac/rbac-cli-quickstart.md#rbac-cli-quickstart) * [Configure RBAC for Control Center on Confluent Platform](/control-center/current/security/c3-rbac.html) * [Deploy Secure ksqlDB with RBAC in Confluent Platform](../../security/authorization/rbac/ksql-rbac.md#ksql-rbac) * [Configure Role-Based Access Control for Schema Registry in Confluent Platform](../../schema-registry/security/rbac-schema-registry.md#schemaregistry-rbac) * [Kafka Connect and RBAC](../../connect/rbac-index.md#connect-rbac-index) * [Role-Based Access Control (RBAC)](../../kafka-rest/production-deployment/rest-proxy/security.md#rbac-rest-proxy-security) **Centralized ACLs:** * [Use Centralized ACLs with MDS for Authorization in Confluent Platform](../../security/authorization/rbac/authorization-acl-with-mds.md#authorization-acl-with-mds) **Centralized audit logs:** * [Configure Audit Logs in Confluent Platform Using Confluent CLI](../../security/compliance/audit-logs/audit-logs-cli-config.md#audit-log-cli-config) **Cluster registry:** * [Cluster Registry in Confluent Platform](../../security/cluster-registry.md#cluster-registry) #### NOTE Most configuration attributes show example values in `<>`, which can be helpful in terms of understanding the type of value expected. Users are expected to replace the example with values matching their own setup. Values displayed without `<>` can be used as recommended values. 
```RST ############################# Broker Settings ################################## zookeeper.connect=:2181,:2181,:2181 log.dirs=/var/lib/kafka/data broker.id=1 ############################# Log Retention Policy, Log Basics ################## log.retention.check.interval.ms=300000 log.retention.hours=168 log.segment.bytes=1073741824 num.io.threads=16 num.network.threads=8 num.partitions=1 num.recovery.threads.per.data.dir=2 ########################### Socket Server Settings ############################# socket.receive.buffer.bytes=102400 socket.request.max.bytes=104857600 socket.send.buffer.bytes=102400 ############################# Internal Topic Settings ######################### offsets.topic.replication.factor=3 transaction.state.log.min.isr=2 transaction.state.log.replication.factor=3 ######################## Metrics Reporting ######################################## metric.reporters=io.confluent.metrics.reporter.ConfluentMetricsReporter confluent.metrics.reporter.bootstrap.servers=
-west-2.compute.internal:9092 confluent.metrics.reporter.topic.replicas=3 confluent.support.customer.id=anonymous ######################## LISTENERS ###################################### listeners=INTERNAL://:9092,EXTERNAL://:9093,TOKEN://:9094 advertised.listeners=INTERNAL://:9092,\ EXTERNAL://:9093,\ TOKEN://:9094 listener.security.protocol.map=INTERNAL:SSL,EXTERNAL:SSL,TOKEN:SASL_SSL inter.broker.listener.name=INTERNAL ############################ TLS/SSL SETTINGS ##################################### ssl.truststore.location=/var/ssl/private/client.truststore.jks ssl.truststore.password= ssl.keystore.location=/var/ssl/private/kafka.keystore.jks ssl.keystore.password= ssl.key.password= ssl.client.auth=required ssl.endpoint.identification.algorithm=HTTPS ############## TLS/SSL settings for metrics reporting ############## confluent.metrics.reporter.security.protocol=SSL confluent.metrics.reporter.ssl.truststore.location=/var/ssl/private/client.truststore.jks confluent.metrics.reporter.ssl.truststore.password= confluent.metrics.reporter.ssl.keystore.location=/var/ssl/private/kafka.keystore.jks confluent.metrics.reporter.ssl.keystore.password= confluent.metrics.reporter.ssl.key.password= ############################# TLS/SSL LISTENERS ############################# listener.name.internal.ssl.principal.mapping.rules= \ RULE:^CN=([a-zA-Z0-9.]*).*$/$1/L ,\ DEFAULT listener.name.external.ssl.principal.mapping.rules= \ RULE:^CN=([a-zA-Z0-9.]*).*$/$1/L ,\ DEFAULT ############################# TOKEN LISTENER ############################# listener.name.token.sasl.enabled.mechanisms=OAUTHBEARER listener.name.token.oauthbearer.sasl.jaas.config= \ org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required \ publicKeyPath=""; listener.name.token.oauthbearer.sasl.server.callback.handler.class=io.confluent.kafka.server.plugins.auth.token.TokenBearerValidatorCallbackHandler listener.name.token.oauthbearer.sasl.login.callback.handler.class=io.confluent.kafka.server.plugins.auth.token.TokenBearerServerLoginCallbackHandler ############################# Authorization Settings ############################# authorizer.class.name=io.confluent.kafka.security.authorizer.ConfluentServerAuthorizer confluent.authorizer.access.rule.providers=ZK_ACL,CONFLUENT super.users=User:kafka ############################# MDS Listener - which port to listen on ############################# confluent.metadata.server.listeners=https://0.0.0.0:8090,http://0.0.0.0:8091 confluent.metadata.server.advertised.listeners=https://:8090,\ http://:8091 ############################# TLS/SSL Settings for MDS ############################# confluent.metadata.server.ssl.keystore.location= confluent.metadata.server.ssl.keystore.password= confluent.metadata.server.ssl.key.password= confluent.metadata.server.ssl.truststore.location= confluent.metadata.server.ssl.truststore.password= ############################# MDS Token Service Settings - enable token generation ############################# confluent.metadata.server.token.max.lifetime.ms=3600000 confluent.metadata.server.token.key.path= confluent.metadata.server.token.signature.algorithm=RS256 confluent.metadata.server.authentication.method=BEARER ############################# Identity Provider Settings(LDAP - local OpenLDAP) ############################# ldap.java.naming.factory.initial=com.sun.jndi.ldap.LdapCtxFactory ldap.com.sun.jndi.ldap.read.timeout=3000 ldap.java.naming.provider.url=ldap: # how mds authenticates to ldap server ldap.java.naming.security.principal= 
ldap.java.naming.security.credentials= ldap.java.naming.security.authentication=simple # ldap search mode (GROUPS is default) #ldap.search.mode=GROUPS #ldap.search.mode=USERS # how to search for users ldap.user.search.base= # how to search for groups ldap.group.search.base= # which attribute in ldap record corresponds to user name ldap.user.name.attribute=sAMAccountName ldap.user.memberof.attribute.pattern= ldap.group.object.class=group ldap.group.name.attribute=sAMAccountName ldap.group.member.attribute.pattern= ########################### Enable Swagger ############################# confluent.metadata.server.openapi.enable=true ``` ### GET /clusters **List Clusters** [![Generally Available](https://img.shields.io/badge/Lifecycle%20Stage-Generally%20Available-%2345c6e8)](#section/Versioning/API-Lifecycle-Policy) ‘Return a list of known Kafka clusters. Currently both Kafka and Kafka REST Proxy are only aware of the Kafka cluster pointed at by the `bootstrap.servers` configuration. Therefore only one Kafka cluster will be returned in the response.’ **Example request:** ```http GET /clusters HTTP/1.1 Host: example.com ``` * **Status Codes:** * [200 OK](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.1) – The list of Kafka clusters. **Example response:** ```http HTTP/1.1 200 OK Content-Type: application/json { "kind": "KafkaClusterList", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters", "next": null }, "data": [ { "kind": "KafkaCluster", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1", "resource_name": "crn:///kafka=cluster-1" }, "cluster_id": "cluster-1", "controller": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/brokers/1" }, "acls": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/acls" }, "brokers": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/brokers" }, "broker_configs": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/broker-configs" }, "consumer_groups": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/consumer-groups" }, "topics": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/topics" }, "partition_reassignments": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/topics/-/partitions/-/reassignment" } } ] } ``` * [400 Bad Request](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.1) – Indicates a bad request error. It could be caused by an unexpected request body format or other forms of request validation failure. **bad_request_cannot_deserialize:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 400, "message": "Cannot deserialize value of type `java.lang.Integer` from String \"A\": not a valid `java.lang.Integer` value" } ``` **unsupported_version_exception:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 40035, "message": "The version of this API is not supported in the underlying Kafka cluster." } ``` * [401 Unauthorized](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.2) – Indicates a client authentication error. Kafka authentication failures will contain error code 40101 in the response body. 
**kafka_authentication_failed:** ```http HTTP/1.1 401 Unauthorized Content-Type: application/json { "error_code": 40101, "message": "Authentication failed" } ``` * [403 Forbidden](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.4) – Indicates a client authorization error. Kafka authorization failures will contain error code 40301 in the response body. **kafka_authorization_failed:** ```http HTTP/1.1 403 Forbidden Content-Type: application/json { "error_code": 40301, "message": "Request is not authorized" } ``` * [429 Too Many Requests](https://www.rfc-editor.org/rfc/rfc6585#section-4) – Indicates that a rate limit threshold has been reached, and the client should retry again later. **Example response:** ```http HTTP/1.1 429 Too Many Requests Content-Type: text/html { "description": "A sample response from Jetty's DoSFilter.", "value": " Error 429 Too Many Requests

HTTP ERROR 429 Too Many Requests URI: /v3/clusters/my-cluster STATUS: 429 MESSAGE: Too Many Requests SERVLET: default
" } ``` * *5XX* – A server-side problem that might not be addressable from the client side. Retriable Kafka errors will contain error code 50003 in the response body. **generic_internal_server_error:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 500, "message": "Internal Server Error" } ``` **produce_v3_missing_schema:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 50002, "message": "Error when fetching latest schema version. subject = my-topic" } ``` ### GET /clusters/{cluster_id}/broker-configs **List Dynamic Broker Configs** [![Generally Available](https://img.shields.io/badge/Lifecycle%20Stage-Generally%20Available-%2345c6e8)](#section/Versioning/API-Lifecycle-Policy) Return a list of dynamic cluster-wide broker configuration parameters for the specified Kafka cluster. Returns an empty list if there are no dynamic cluster-wide broker configuration parameters. * **Parameters:** * **cluster_id** (*string*) – The Kafka cluster ID. **Example request:** ```http GET /clusters/{cluster_id}/broker-configs HTTP/1.1 Host: example.com ``` * **Status Codes:** * [200 OK](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.1) – The list of cluster configs. **Example response:** ```http HTTP/1.1 200 OK Content-Type: application/json { "kind": "KafkaClusterConfigList", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/broker-configs", "next": null }, "data": [ { "kind": "KafkaClusterConfig", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/broker-configs/max.connections", "resource_name": "crn:///kafka=cluster-1/broker-config=max.connections" }, "cluster_id": "cluster-1", "config_type": "BROKER", "name": "max.connections", "value": "1000", "is_default": false, "is_read_only": false, "is_sensitive": false, "source": "DYNAMIC_DEFAULT_BROKER_CONFIG", "synonyms": [ { "name": "max.connections", "value": "1000", "source": "DYNAMIC_DEFAULT_BROKER_CONFIG" }, { "name": "max.connections", "value": "2147483647", "source": "DEFAULT_CONFIG" } ] }, { "kind": "KafkaClusterConfig", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/broker-configs/compression.type", "resource_name": "crn:///kafka=cluster-1/broker-config=compression.type" }, "cluster_id": "cluster-1", "config_type": "BROKER", "name": "compression.type", "value": "gzip", "is_default": false, "is_read_only": false, "is_sensitive": false, "source": "DYNAMIC_DEFAULT_BROKER_CONFIG", "synonyms": [ { "name": "compression.type", "value": "gzip", "source": "DYNAMIC_DEFAULT_BROKER_CONFIG" }, { "name": "compression.type", "value": "producer", "source": "DEFAULT_CONFIG" } ] } ] } ``` * [400 Bad Request](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.1) – Indicates a bad request error. It could be caused by an unexpected request body format or other forms of request validation failure. **bad_request_cannot_deserialize:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 400, "message": "Cannot deserialize value of type `java.lang.Integer` from String \"A\": not a valid `java.lang.Integer` value" } ``` **unsupported_version_exception:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 40035, "message": "The version of this API is not supported in the underlying Kafka cluster." 
} ``` * [401 Unauthorized](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.2) – Indicates a client authentication error. Kafka authentication failures will contain error code 40101 in the response body. **kafka_authentication_failed:** ```http HTTP/1.1 401 Unauthorized Content-Type: application/json { "error_code": 40101, "message": "Authentication failed" } ``` * [403 Forbidden](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.4) – Indicates a client authorization error. Kafka authorization failures will contain error code 40301 in the response body. **kafka_authorization_failed:** ```http HTTP/1.1 403 Forbidden Content-Type: application/json { "error_code": 40301, "message": "Request is not authorized" } ``` * [429 Too Many Requests](https://www.rfc-editor.org/rfc/rfc6585#section-4) – Indicates that a rate limit threshold has been reached, and the client should retry again later. **Example response:** ```http HTTP/1.1 429 Too Many Requests Content-Type: text/html { "description": "A sample response from Jetty's DoSFilter.", "value": " Error 429 Too Many Requests

HTTP ERROR 429 Too Many Requests URI: /v3/clusters/my-cluster STATUS: 429 MESSAGE: Too Many Requests SERVLET: default
" } ``` * *5XX* – A server-side problem that might not be addressable from the client side. Retriable Kafka errors will contain error code 50003 in the response body. **generic_internal_server_error:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 500, "message": "Internal Server Error" } ``` **produce_v3_missing_schema:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 50002, "message": "Error when fetching latest schema version. subject = my-topic" } ``` ### POST /clusters/{cluster_id}/broker-configs:alter **Batch Alter Dynamic Broker Configs** [![Generally Available](https://img.shields.io/badge/Lifecycle%20Stage-Generally%20Available-%2345c6e8)](#section/Versioning/API-Lifecycle-Policy) Update or delete a set of dynamic cluster-wide broker configuration parameters. * **Parameters:** * **cluster_id** (*string*) – The Kafka cluster ID. **Example request:** ```http POST /clusters/{cluster_id}/broker-configs:alter HTTP/1.1 Host: example.com Content-Type: application/json { "data": [ { "name": "max.connections", "operation": "DELETE" }, { "name": "compression.type", "value": "gzip" } ] } ``` * **Status Codes:** * [204 No Content](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.5) – No Content * [400 Bad Request](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.1) – Indicates a bad request error. It could be caused by an unexpected request body format or other forms of request validation failure. **bad_request_cannot_deserialize:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 400, "message": "Cannot deserialize value of type `java.lang.Integer` from String \"A\": not a valid `java.lang.Integer` value" } ``` **unsupported_version_exception:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 40035, "message": "The version of this API is not supported in the underlying Kafka cluster." } ``` * [401 Unauthorized](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.2) – Indicates a client authentication error. Kafka authentication failures will contain error code 40101 in the response body. **kafka_authentication_failed:** ```http HTTP/1.1 401 Unauthorized Content-Type: application/json { "error_code": 40101, "message": "Authentication failed" } ``` * [403 Forbidden](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.4) – Indicates a client authorization error. Kafka authorization failures will contain error code 40301 in the response body. **kafka_authorization_failed:** ```http HTTP/1.1 403 Forbidden Content-Type: application/json { "error_code": 40301, "message": "Request is not authorized" } ``` * [429 Too Many Requests](https://www.rfc-editor.org/rfc/rfc6585#section-4) – Indicates that a rate limit threshold has been reached, and the client should retry again later. **Example response:** ```http HTTP/1.1 429 Too Many Requests Content-Type: text/html { "description": "A sample response from Jetty's DoSFilter.", "value": " Error 429 Too Many Requests

HTTP ERROR 429 Too Many Requests URI: /v3/clusters/my-cluster STATUS: 429 MESSAGE: Too Many Requests SERVLET: default
" } ``` * *5XX* – A server-side problem that might not be addressable from the client side. Retriable Kafka errors will contain error code 50003 in the response body. **generic_internal_server_error:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 500, "message": "Internal Server Error" } ``` **produce_v3_missing_schema:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 50002, "message": "Error when fetching latest schema version. subject = my-topic" } ``` ### GET /clusters/{cluster_id}/broker-configs/{name} **Get Dynamic Broker Config** [![Generally Available](https://img.shields.io/badge/Lifecycle%20Stage-Generally%20Available-%2345c6e8)](#section/Versioning/API-Lifecycle-Policy) Return the dynamic cluster-wide broker configuration parameter specified by `name`. * **Parameters:** * **cluster_id** (*string*) – The Kafka cluster ID. * **name** (*string*) – The configuration parameter name. **Example request:** ```http GET /clusters/{cluster_id}/broker-configs/{name} HTTP/1.1 Host: example.com ``` * **Status Codes:** * [200 OK](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.1) – The cluster configuration parameter. **Example response:** ```http HTTP/1.1 200 OK Content-Type: application/json { "kind": "KafkaClusterConfig", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/broker-configs/compression.type", "resource_name": "crn:///kafka=cluster-1/broker-config=compression.type" }, "cluster_id": "cluster-1", "config_type": "BROKER", "name": "compression.type", "value": "gzip", "is_default": false, "is_read_only": false, "is_sensitive": false, "source": "DYNAMIC_DEFAULT_BROKER_CONFIG", "synonyms": [ { "name": "compression.type", "value": "gzip", "source": "DYNAMIC_DEFAULT_BROKER_CONFIG" }, { "name": "compression.type", "value": "producer", "source": "DEFAULT_CONFIG" } ] } ``` * [400 Bad Request](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.1) – Indicates a bad request error. It could be caused by an unexpected request body format or other forms of request validation failure. **bad_request_cannot_deserialize:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 400, "message": "Cannot deserialize value of type `java.lang.Integer` from String \"A\": not a valid `java.lang.Integer` value" } ``` **unsupported_version_exception:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 40035, "message": "The version of this API is not supported in the underlying Kafka cluster." } ``` * [401 Unauthorized](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.2) – Indicates a client authentication error. Kafka authentication failures will contain error code 40101 in the response body. **kafka_authentication_failed:** ```http HTTP/1.1 401 Unauthorized Content-Type: application/json { "error_code": 40101, "message": "Authentication failed" } ``` * [403 Forbidden](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.4) – Indicates a client authorization error. Kafka authorization failures will contain error code 40301 in the response body. **kafka_authorization_failed:** ```http HTTP/1.1 403 Forbidden Content-Type: application/json { "error_code": 40301, "message": "Request is not authorized" } ``` * [429 Too Many Requests](https://www.rfc-editor.org/rfc/rfc6585#section-4) – Indicates that a rate limit threshold has been reached, and the client should retry again later. 
**Example response:** ```http HTTP/1.1 429 Too Many Requests Content-Type: text/html { "description": "A sample response from Jetty's DoSFilter.", "value": " Error 429 Too Many Requests

HTTP ERROR 429 Too Many Requests URI: /v3/clusters/my-cluster STATUS: 429 MESSAGE: Too Many Requests SERVLET: default
" } ``` * *5XX* – A server-side problem that might not be addressable from the client side. Retriable Kafka errors will contain error code 50003 in the response body. **generic_internal_server_error:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 500, "message": "Internal Server Error" } ``` **produce_v3_missing_schema:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 50002, "message": "Error when fetching latest schema version. subject = my-topic" } ``` ### GET /clusters/{cluster_id}/brokers/-/configs **List Dynamic Broker Configs** [![Generally Available](https://img.shields.io/badge/Lifecycle%20Stage-Generally%20Available-%2345c6e8)](#section/Versioning/API-Lifecycle-Policy) Return the list of dynamic configuration parameters for all the brokers in the given Kafka cluster. * **Parameters:** * **cluster_id** (*string*) – The Kafka cluster ID. **Example request:** ```http GET /clusters/{cluster_id}/brokers/-/configs HTTP/1.1 Host: example.com ``` * **Status Codes:** * [200 OK](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.1) – The list of broker configs. **Example response:** ```http HTTP/1.1 200 OK Content-Type: application/json { "kind": "KafkaBrokerConfigList", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/brokers/1/configs", "next": null }, "data": [ { "kind": "KafkaBrokerConfig", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/brokers/1/configs/max.connections", "resource_name": "crn:///kafka=cluster-1/broker=1/config=max.connections" }, "cluster_id": "cluster-1", "broker_id": 1, "name": "max.connections", "value": "1000", "is_default": false, "is_read_only": false, "is_sensitive": false, "source": "DYNAMIC_BROKER_CONFIG", "synonyms": [ { "name": "max.connections", "value": "1000", "source": "DYNAMIC_BROKER_CONFIG" }, { "name": "max.connections", "value": "2147483647", "source": "DEFAULT_CONFIG" } ] }, { "kind": "KafkaBrokerConfig", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/brokers/1/configs/compression.type", "resource_name": "crn:///kafka=cluster-1/broker=1/config=compression.type" }, "cluster_id": "cluster-1", "broker_id": 1, "name": "compression.type", "value": "gzip", "is_default": false, "is_read_only": false, "is_sensitive": false, "source": "DYNAMIC_BROKER_CONFIG", "synonyms": [ { "name": "compression.type", "value": "gzip", "source": "DYNAMIC_BROKER_CONFIG" }, { "name": "compression.type", "value": "producer", "source": "DEFAULT_CONFIG" } ] } ] } ``` * [400 Bad Request](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.1) – Indicates a bad request error. It could be caused by an unexpected request body format or other forms of request validation failure. **bad_request_cannot_deserialize:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 400, "message": "Cannot deserialize value of type `java.lang.Integer` from String \"A\": not a valid `java.lang.Integer` value" } ``` **unsupported_version_exception:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 40035, "message": "The version of this API is not supported in the underlying Kafka cluster." } ``` * [401 Unauthorized](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.2) – Indicates a client authentication error. Kafka authentication failures will contain error code 40101 in the response body. 
**kafka_authentication_failed:** ```http HTTP/1.1 401 Unauthorized Content-Type: application/json { "error_code": 40101, "message": "Authentication failed" } ``` * [403 Forbidden](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.4) – Indicates a client authorization error. Kafka authorization failures will contain error code 40301 in the response body. **kafka_authorization_failed:** ```http HTTP/1.1 403 Forbidden Content-Type: application/json { "error_code": 40301, "message": "Request is not authorized" } ``` * [429 Too Many Requests](https://www.rfc-editor.org/rfc/rfc6585#section-4) – Indicates that a rate limit threshold has been reached, and the client should retry again later. **Example response:** ```http HTTP/1.1 429 Too Many Requests Content-Type: text/html { "description": "A sample response from Jetty's DoSFilter.", "value": " Error 429 Too Many Requests

HTTP ERROR 429 Too Many Requests URI: /v3/clusters/my-cluster STATUS: 429 MESSAGE: Too Many Requests SERVLET: default
" } ``` * *5XX* – A server-side problem that might not be addressable from the client side. Retriable Kafka errors will contain error code 50003 in the response body. **generic_internal_server_error:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 500, "message": "Internal Server Error" } ``` **produce_v3_missing_schema:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 50002, "message": "Error when fetching latest schema version. subject = my-topic" } ``` ### GET /clusters/{cluster_id}/brokers/{broker_id}/configs **List Broker Configs** [![Generally Available](https://img.shields.io/badge/Lifecycle%20Stage-Generally%20Available-%2345c6e8)](#section/Versioning/API-Lifecycle-Policy) Return the list of configuration parameters that belong to the specified Kafka broker. * **Parameters:** * **cluster_id** (*string*) – The Kafka cluster ID. * **broker_id** (*integer*) – The Kafka broker ID. **Example request:** ```http GET /clusters/{cluster_id}/brokers/{broker_id}/configs HTTP/1.1 Host: example.com ``` * **Status Codes:** * [200 OK](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.1) – The list of broker configs. **Example response:** ```http HTTP/1.1 200 OK Content-Type: application/json { "kind": "KafkaBrokerConfigList", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/brokers/1/configs", "next": null }, "data": [ { "kind": "KafkaBrokerConfig", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/brokers/1/configs/max.connections", "resource_name": "crn:///kafka=cluster-1/broker=1/config=max.connections" }, "cluster_id": "cluster-1", "broker_id": 1, "name": "max.connections", "value": "1000", "is_default": false, "is_read_only": false, "is_sensitive": false, "source": "DYNAMIC_BROKER_CONFIG", "synonyms": [ { "name": "max.connections", "value": "1000", "source": "DYNAMIC_BROKER_CONFIG" }, { "name": "max.connections", "value": "2147483647", "source": "DEFAULT_CONFIG" } ] }, { "kind": "KafkaBrokerConfig", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/brokers/1/configs/compression.type", "resource_name": "crn:///kafka=cluster-1/broker=1/config=compression.type" }, "cluster_id": "cluster-1", "broker_id": 1, "name": "compression.type", "value": "gzip", "is_default": false, "is_read_only": false, "is_sensitive": false, "source": "DYNAMIC_BROKER_CONFIG", "synonyms": [ { "name": "compression.type", "value": "gzip", "source": "DYNAMIC_BROKER_CONFIG" }, { "name": "compression.type", "value": "producer", "source": "DEFAULT_CONFIG" } ] } ] } ``` * [400 Bad Request](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.1) – Indicates a bad request error. It could be caused by an unexpected request body format or other forms of request validation failure. **bad_request_cannot_deserialize:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 400, "message": "Cannot deserialize value of type `java.lang.Integer` from String \"A\": not a valid `java.lang.Integer` value" } ``` **unsupported_version_exception:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 40035, "message": "The version of this API is not supported in the underlying Kafka cluster." } ``` * [401 Unauthorized](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.2) – Indicates a client authentication error. 
Kafka authentication failures will contain error code 40101 in the response body. **kafka_authentication_failed:** ```http HTTP/1.1 401 Unauthorized Content-Type: application/json { "error_code": 40101, "message": "Authentication failed" } ``` * [403 Forbidden](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.4) – Indicates a client authorization error. Kafka authorization failures will contain error code 40301 in the response body. **kafka_authorization_failed:** ```http HTTP/1.1 403 Forbidden Content-Type: application/json { "error_code": 40301, "message": "Request is not authorized" } ``` * [429 Too Many Requests](https://www.rfc-editor.org/rfc/rfc6585#section-4) – Indicates that a rate limit threshold has been reached, and the client should retry again later. **Example response:** ```http HTTP/1.1 429 Too Many Requests Content-Type: text/html { "description": "A sample response from Jetty's DoSFilter.", "value": " Error 429 Too Many Requests

HTTP ERROR 429 Too Many Requests URI: /v3/clusters/my-cluster STATUS: 429 MESSAGE: Too Many Requests SERVLET: default
" } ``` * *5XX* – A server-side problem that might not be addressable from the client side. Retriable Kafka errors will contain error code 50003 in the response body. **generic_internal_server_error:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 500, "message": "Internal Server Error" } ``` **produce_v3_missing_schema:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 50002, "message": "Error when fetching latest schema version. subject = my-topic" } ``` ### POST /clusters/{cluster_id}/brokers/{broker_id}/configs:alter **Batch Alter Broker Configs** [![Generally Available](https://img.shields.io/badge/Lifecycle%20Stage-Generally%20Available-%2345c6e8)](#section/Versioning/API-Lifecycle-Policy) Update or delete a set of broker configuration parameters. * **Parameters:** * **cluster_id** (*string*) – The Kafka cluster ID. * **broker_id** (*integer*) – The Kafka broker ID. **Example request:** ```http POST /clusters/{cluster_id}/brokers/{broker_id}/configs:alter HTTP/1.1 Host: example.com Content-Type: application/json { "data": [ { "name": "max.connections", "operation": "DELETE" }, { "name": "compression.type", "value": "gzip" } ] } ``` * **Status Codes:** * [204 No Content](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.5) – No Content * [400 Bad Request](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.1) – Indicates a bad request error. It could be caused by an unexpected request body format or other forms of request validation failure. **bad_request_cannot_deserialize:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 400, "message": "Cannot deserialize value of type `java.lang.Integer` from String \"A\": not a valid `java.lang.Integer` value" } ``` **unsupported_version_exception:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 40035, "message": "The version of this API is not supported in the underlying Kafka cluster." } ``` * [401 Unauthorized](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.2) – Indicates a client authentication error. Kafka authentication failures will contain error code 40101 in the response body. **kafka_authentication_failed:** ```http HTTP/1.1 401 Unauthorized Content-Type: application/json { "error_code": 40101, "message": "Authentication failed" } ``` * [403 Forbidden](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.4) – Indicates a client authorization error. Kafka authorization failures will contain error code 40301 in the response body. **kafka_authorization_failed:** ```http HTTP/1.1 403 Forbidden Content-Type: application/json { "error_code": 40301, "message": "Request is not authorized" } ``` * [429 Too Many Requests](https://www.rfc-editor.org/rfc/rfc6585#section-4) – Indicates that a rate limit threshold has been reached, and the client should retry again later. **Example response:** ```http HTTP/1.1 429 Too Many Requests Content-Type: text/html { "description": "A sample response from Jetty's DoSFilter.", "value": " Error 429 Too Many Requests

HTTP ERROR 429 Too Many Requests URI: /v3/clusters/my-cluster STATUS: 429 MESSAGE: Too Many Requests SERVLET: default
" } ``` * *5XX* – A server-side problem that might not be addressable from the client side. Retriable Kafka errors will contain error code 50003 in the response body. **generic_internal_server_error:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 500, "message": "Internal Server Error" } ``` **produce_v3_missing_schema:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 50002, "message": "Error when fetching latest schema version. subject = my-topic" } ``` ### GET /clusters/{cluster_id}/brokers/{broker_id}/configs/{name} **Get Broker Config** [![Generally Available](https://img.shields.io/badge/Lifecycle%20Stage-Generally%20Available-%2345c6e8)](#section/Versioning/API-Lifecycle-Policy) Return the configuration parameter specified by `name`. * **Parameters:** * **cluster_id** (*string*) – The Kafka cluster ID. * **broker_id** (*integer*) – The Kafka broker ID. * **name** (*string*) – The configuration parameter name. **Example request:** ```http GET /clusters/{cluster_id}/brokers/{broker_id}/configs/{name} HTTP/1.1 Host: example.com ``` * **Status Codes:** * [200 OK](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.1) – The broker configuration parameter. **Example response:** ```http HTTP/1.1 200 OK Content-Type: application/json { "kind": "KafkaBrokerConfig", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/brokers/1/configs/compression.type", "resource_name": "crn:///kafka=cluster-1/broker=1/config=compression.type" }, "cluster_id": "cluster-1", "broker_id": 1, "name": "compression.type", "value": "gzip", "is_default": false, "is_read_only": false, "is_sensitive": false, "source": "DYNAMIC_BROKER_CONFIG", "synonyms": [ { "name": "compression.type", "value": "gzip", "source": "DYNAMIC_BROKER_CONFIG" }, { "name": "compression.type", "value": "producer", "source": "DEFAULT_CONFIG" } ] } ``` * [400 Bad Request](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.1) – Indicates a bad request error. It could be caused by an unexpected request body format or other forms of request validation failure. **bad_request_cannot_deserialize:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 400, "message": "Cannot deserialize value of type `java.lang.Integer` from String \"A\": not a valid `java.lang.Integer` value" } ``` **unsupported_version_exception:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 40035, "message": "The version of this API is not supported in the underlying Kafka cluster." } ``` * [401 Unauthorized](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.2) – Indicates a client authentication error. Kafka authentication failures will contain error code 40101 in the response body. **kafka_authentication_failed:** ```http HTTP/1.1 401 Unauthorized Content-Type: application/json { "error_code": 40101, "message": "Authentication failed" } ``` * [403 Forbidden](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.4) – Indicates a client authorization error. Kafka authorization failures will contain error code 40301 in the response body. **kafka_authorization_failed:** ```http HTTP/1.1 403 Forbidden Content-Type: application/json { "error_code": 40301, "message": "Request is not authorized" } ``` * [429 Too Many Requests](https://www.rfc-editor.org/rfc/rfc6585#section-4) – Indicates that a rate limit threshold has been reached, and the client should retry again later. 
**Example response:** ```http HTTP/1.1 429 Too Many Requests Content-Type: text/html { "description": "A sample response from Jetty's DoSFilter.", "value": " Error 429 Too Many Requests

HTTP ERROR 429 Too Many Requests URI: /v3/clusters/my-cluster STATUS: 429 MESSAGE: Too Many Requests SERVLET: default
" } ``` * *5XX* – A server-side problem that might not be addressable from the client side. Retriable Kafka errors will contain error code 50003 in the response body. **generic_internal_server_error:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 500, "message": "Internal Server Error" } ``` **produce_v3_missing_schema:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 50002, "message": "Error when fetching latest schema version. subject = my-topic" } ``` ### GET /clusters/{cluster_id}/topics/{topic_name}/configs **List Topic Configs** [![Generally Available](https://img.shields.io/badge/Lifecycle%20Stage-Generally%20Available-%2345c6e8)](#section/Versioning/API-Lifecycle-Policy) Return the list of configuration parameters that belong to the specified topic. * **Parameters:** * **cluster_id** (*string*) – The Kafka cluster ID. * **topic_name** (*string*) – The topic name. **Example request:** ```http GET /clusters/{cluster_id}/topics/{topic_name}/configs HTTP/1.1 Host: example.com ``` * **Status Codes:** * [200 OK](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.1) – The list of cluster configs. **Example response:** ```http HTTP/1.1 200 OK Content-Type: application/json { "kind": "KafkaTopicConfigList", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/topics/topic-1/configs", "next": null }, "data": [ { "kind": "KafkaTopicConfig", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/topics/topic-1/configs/cleanup.policy", "resource_name": "crn:///kafka=cluster-1/topic=topic-1/config=cleanup.policy" }, "cluster_id": "cluster-1", "topic_name": "topic-1", "name": "cleanup.policy", "value": "compact", "is_default": false, "is_read_only": false, "is_sensitive": false, "source": "DYNAMIC_TOPIC_CONFIG", "synonyms": [ { "name": "cleanup.policy", "value": "compact", "source": "DYNAMIC_TOPIC_CONFIG" }, { "name": "cleanup.policy", "value": "delete", "source": "DEFAULT_CONFIG" } ] }, { "kind": "KafkaTopicConfig", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/topics/topic-1/configs/compression.type", "resource_name": "crn:///kafka=cluster-1/topic=topic-1/config=compression.type" }, "cluster_id": "cluster-1", "topic_name": "topic-1", "name": "compression.type", "value": "gzip", "is_default": false, "is_read_only": false, "is_sensitive": false, "source": "DYNAMIC_TOPIC_CONFIG", "synonyms": [ { "name": "compression.type", "value": "gzip", "source": "DYNAMIC_TOPIC_CONFIG" }, { "name": "compression.type", "value": "producer", "source": "DEFAULT_CONFIG" } ] } ] } ``` * [400 Bad Request](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.1) – Indicates a bad request error. It could be caused by an unexpected request body format or other forms of request validation failure. **bad_request_cannot_deserialize:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 400, "message": "Cannot deserialize value of type `java.lang.Integer` from String \"A\": not a valid `java.lang.Integer` value" } ``` **unsupported_version_exception:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 40035, "message": "The version of this API is not supported in the underlying Kafka cluster." } ``` * [401 Unauthorized](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.2) – Indicates a client authentication error. 
Kafka authentication failures will contain error code 40101 in the response body. **kafka_authentication_failed:** ```http HTTP/1.1 401 Unauthorized Content-Type: application/json { "error_code": 40101, "message": "Authentication failed" } ``` * [403 Forbidden](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.4) – Indicates a client authorization error. Kafka authorization failures will contain error code 40301 in the response body. **kafka_authorization_failed:** ```http HTTP/1.1 403 Forbidden Content-Type: application/json { "error_code": 40301, "message": "Request is not authorized" } ``` * [404 Not Found](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.5) – Indicates attempted access to an unreachable or non-existing resource like e.g. an unknown topic or partition. GET requests to endpoints not allowed in the accesslists will also result in this response. **endpoint_not_found:** ```http HTTP/1.1 404 Not Found Content-Type: application/json { "error_code": 404, "message": "HTTP 404 Not Found" } ``` **cluster_not_found:** ```http HTTP/1.1 404 Not Found Content-Type: application/json { "error_code": 404, "message": "Cluster my-cluster cannot be found." } ``` **unknown_topic_or_partition:** ```http HTTP/1.1 404 Not Found Content-Type: application/json { "error_code": 40403, "message": "This server does not host this topic-partition." } ``` * [429 Too Many Requests](https://www.rfc-editor.org/rfc/rfc6585#section-4) – Indicates that a rate limit threshold has been reached, and the client should retry again later. **Example response:** ```http HTTP/1.1 429 Too Many Requests Content-Type: text/html { "description": "A sample response from Jetty's DoSFilter.", "value": " Error 429 Too Many Requests

HTTP ERROR 429 Too Many Requests. URI: /v3/clusters/my-cluster, STATUS: 429, MESSAGE: Too Many Requests, SERVLET: default
" } ``` * *5XX* – A server-side problem that might not be addressable from the client side. Retriable Kafka errors will contain error code 50003 in the response body. **generic_internal_server_error:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 500, "message": "Internal Server Error" } ``` **produce_v3_missing_schema:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 50002, "message": "Error when fetching latest schema version. subject = my-topic" } ``` ### GET /clusters/{cluster_id}/topics/{topic_name}/configs/{name} **Get Topic Config** [![Generally Available](https://img.shields.io/badge/Lifecycle%20Stage-Generally%20Available-%2345c6e8)](#section/Versioning/API-Lifecycle-Policy) Return the configuration parameter with the given `name`. * **Parameters:** * **cluster_id** (*string*) – The Kafka cluster ID. * **topic_name** (*string*) – The topic name. * **name** (*string*) – The configuration parameter name. **Example request:** ```http GET /clusters/{cluster_id}/topics/{topic_name}/configs/{name} HTTP/1.1 Host: example.com ``` * **Status Codes:** * [200 OK](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.1) – The topic configuration parameter. **Example response:** ```http HTTP/1.1 200 OK Content-Type: application/json { "kind": "KafkaTopicConfig", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/topics/topic-1/compression.type", "resource_name": "crn:///kafka=cluster-1/topic=topic-1/config=compression.type" }, "cluster_id": "cluster-1", "topic_name": "topic-1", "name": "compression.type", "value": "gzip", "is_default": false, "is_read_only": false, "is_sensitive": false, "source": "DYNAMIC_TOPIC_CONFIG", "synonyms": [ { "name": "compression.type", "value": "gzip", "source": "DYNAMIC_TOPIC_CONFIG" }, { "name": "compression.type", "value": "producer", "source": "DEFAULT_CONFIG" } ] } ``` * [400 Bad Request](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.1) – Indicates a bad request error. It could be caused by an unexpected request body format or other forms of request validation failure. **bad_request_cannot_deserialize:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 400, "message": "Cannot deserialize value of type `java.lang.Integer` from String \"A\": not a valid `java.lang.Integer` value" } ``` **unsupported_version_exception:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 40035, "message": "The version of this API is not supported in the underlying Kafka cluster." } ``` * [401 Unauthorized](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.2) – Indicates a client authentication error. Kafka authentication failures will contain error code 40101 in the response body. **kafka_authentication_failed:** ```http HTTP/1.1 401 Unauthorized Content-Type: application/json { "error_code": 40101, "message": "Authentication failed" } ``` * [403 Forbidden](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.4) – Indicates a client authorization error. Kafka authorization failures will contain error code 40301 in the response body. **kafka_authorization_failed:** ```http HTTP/1.1 403 Forbidden Content-Type: application/json { "error_code": 40301, "message": "Request is not authorized" } ``` * [404 Not Found](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.5) – Indicates attempted access to an unreachable or non-existing resource like e.g. 
an unknown topic or partition. GET requests to endpoints not allowed in the accesslists will also result in this response. **endpoint_not_found:** ```http HTTP/1.1 404 Not Found Content-Type: application/json { "error_code": 404, "message": "HTTP 404 Not Found" } ``` **cluster_not_found:** ```http HTTP/1.1 404 Not Found Content-Type: application/json { "error_code": 404, "message": "Cluster my-cluster cannot be found." } ``` **unknown_topic_or_partition:** ```http HTTP/1.1 404 Not Found Content-Type: application/json { "error_code": 40403, "message": "This server does not host this topic-partition." } ``` * [429 Too Many Requests](https://www.rfc-editor.org/rfc/rfc6585#section-4) – Indicates that a rate limit threshold has been reached, and the client should retry again later. **Example response:** ```http HTTP/1.1 429 Too Many Requests Content-Type: text/html { "description": "A sample response from Jetty's DoSFilter.", "value": " Error 429 Too Many Requests

HTTP ERROR 429 Too Many Requests. URI: /v3/clusters/my-cluster, STATUS: 429, MESSAGE: Too Many Requests, SERVLET: default
" } ``` * *5XX* – A server-side problem that might not be addressable from the client side. Retriable Kafka errors will contain error code 50003 in the response body. **generic_internal_server_error:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 500, "message": "Internal Server Error" } ``` **produce_v3_missing_schema:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 50002, "message": "Error when fetching latest schema version. subject = my-topic" } ``` ### PUT /clusters/{cluster_id}/topics/{topic_name}/configs/{name} **Update Topic Config** [![Generally Available](https://img.shields.io/badge/Lifecycle%20Stage-Generally%20Available-%2345c6e8)](#section/Versioning/API-Lifecycle-Policy) Update the configuration parameter with given `name`. * **Parameters:** * **cluster_id** (*string*) – The Kafka cluster ID. * **topic_name** (*string*) – The topic name. * **name** (*string*) – The configuration parameter name. **Example request:** ```http PUT /clusters/{cluster_id}/topics/{topic_name}/configs/{name} HTTP/1.1 Host: example.com Content-Type: application/json { "value": "gzip" } ``` * **Status Codes:** * [204 No Content](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.5) – No Content * [400 Bad Request](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.1) – Indicates a bad request error. It could be caused by an unexpected request body format or other forms of request validation failure. **bad_request_cannot_deserialize:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 400, "message": "Cannot deserialize value of type `java.lang.Integer` from String \"A\": not a valid `java.lang.Integer` value" } ``` **unsupported_version_exception:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 40035, "message": "The version of this API is not supported in the underlying Kafka cluster." } ``` * [401 Unauthorized](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.2) – Indicates a client authentication error. Kafka authentication failures will contain error code 40101 in the response body. **kafka_authentication_failed:** ```http HTTP/1.1 401 Unauthorized Content-Type: application/json { "error_code": 40101, "message": "Authentication failed" } ``` * [403 Forbidden](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.4) – Indicates a client authorization error. Kafka authorization failures will contain error code 40301 in the response body. **kafka_authorization_failed:** ```http HTTP/1.1 403 Forbidden Content-Type: application/json { "error_code": 40301, "message": "Request is not authorized" } ``` * [404 Not Found](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.5) – Indicates attempted access to an unreachable or non-existing resource like e.g. an unknown topic or partition. GET requests to endpoints not allowed in the accesslists will also result in this response. **endpoint_not_found:** ```http HTTP/1.1 404 Not Found Content-Type: application/json { "error_code": 404, "message": "HTTP 404 Not Found" } ``` **cluster_not_found:** ```http HTTP/1.1 404 Not Found Content-Type: application/json { "error_code": 404, "message": "Cluster my-cluster cannot be found." } ``` **unknown_topic_or_partition:** ```http HTTP/1.1 404 Not Found Content-Type: application/json { "error_code": 40403, "message": "This server does not host this topic-partition." 
} ``` * [429 Too Many Requests](https://www.rfc-editor.org/rfc/rfc6585#section-4) – Indicates that a rate limit threshold has been reached, and the client should retry again later. **Example response:** ```http HTTP/1.1 429 Too Many Requests Content-Type: text/html { "description": "A sample response from Jetty's DoSFilter.", "value": " Error 429 Too Many Requests

HTTP ERROR 429 Too Many Requests. URI: /v3/clusters/my-cluster, STATUS: 429, MESSAGE: Too Many Requests, SERVLET: default
" } ``` * *5XX* – A server-side problem that might not be addressable from the client side. Retriable Kafka errors will contain error code 50003 in the response body. **generic_internal_server_error:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 500, "message": "Internal Server Error" } ``` **produce_v3_missing_schema:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 50002, "message": "Error when fetching latest schema version. subject = my-topic" } ``` ### DELETE /clusters/{cluster_id}/topics/{topic_name}/configs/{name} **Reset Topic Config** [![Generally Available](https://img.shields.io/badge/Lifecycle%20Stage-Generally%20Available-%2345c6e8)](#section/Versioning/API-Lifecycle-Policy) Reset the configuration parameter with given `name` to its default value. * **Parameters:** * **cluster_id** (*string*) – The Kafka cluster ID. * **topic_name** (*string*) – The topic name. * **name** (*string*) – The configuration parameter name. * **Status Codes:** * [204 No Content](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.5) – No Content * [400 Bad Request](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.1) – Indicates a bad request error. It could be caused by an unexpected request body format or other forms of request validation failure. **bad_request_cannot_deserialize:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 400, "message": "Cannot deserialize value of type `java.lang.Integer` from String \"A\": not a valid `java.lang.Integer` value" } ``` **unsupported_version_exception:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 40035, "message": "The version of this API is not supported in the underlying Kafka cluster." } ``` * [401 Unauthorized](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.2) – Indicates a client authentication error. Kafka authentication failures will contain error code 40101 in the response body. **kafka_authentication_failed:** ```http HTTP/1.1 401 Unauthorized Content-Type: application/json { "error_code": 40101, "message": "Authentication failed" } ``` * [403 Forbidden](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.4) – Indicates a client authorization error. Kafka authorization failures will contain error code 40301 in the response body. **kafka_authorization_failed:** ```http HTTP/1.1 403 Forbidden Content-Type: application/json { "error_code": 40301, "message": "Request is not authorized" } ``` * [404 Not Found](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.5) – Indicates attempted access to an unreachable or non-existing resource like e.g. an unknown topic or partition. GET requests to endpoints not allowed in the accesslists will also result in this response. **endpoint_not_found:** ```http HTTP/1.1 404 Not Found Content-Type: application/json { "error_code": 404, "message": "HTTP 404 Not Found" } ``` **cluster_not_found:** ```http HTTP/1.1 404 Not Found Content-Type: application/json { "error_code": 404, "message": "Cluster my-cluster cannot be found." } ``` **unknown_topic_or_partition:** ```http HTTP/1.1 404 Not Found Content-Type: application/json { "error_code": 40403, "message": "This server does not host this topic-partition." } ``` * [429 Too Many Requests](https://www.rfc-editor.org/rfc/rfc6585#section-4) – Indicates that a rate limit threshold has been reached, and the client should retry again later. 
**Example response:** ```http HTTP/1.1 429 Too Many Requests Content-Type: text/html { "description": "A sample response from Jetty's DoSFilter.", "value": " Error 429 Too Many Requests

HTTP ERROR 429 Too Many Requests. URI: /v3/clusters/my-cluster, STATUS: 429, MESSAGE: Too Many Requests, SERVLET: default
" } ``` * *5XX* – A server-side problem that might not be addressable from the client side. Retriable Kafka errors will contain error code 50003 in the response body. **generic_internal_server_error:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 500, "message": "Internal Server Error" } ``` **produce_v3_missing_schema:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 50002, "message": "Error when fetching latest schema version. subject = my-topic" } ``` ### GET /clusters/{cluster_id}/topics/{topic_name}/default-configs **List New Topic Default Configs** [![Generally Available](https://img.shields.io/badge/Lifecycle%20Stage-Generally%20Available-%2345c6e8)](#section/Versioning/API-Lifecycle-Policy) List the default configuration parameters used if the topic were to be newly created. * **Parameters:** * **cluster_id** (*string*) – The Kafka cluster ID. * **topic_name** (*string*) – The topic name. **Example request:** ```http GET /clusters/{cluster_id}/topics/{topic_name}/default-configs HTTP/1.1 Host: example.com ``` * **Status Codes:** * [200 OK](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.1) – The list of cluster configs. **Example response:** ```http HTTP/1.1 200 OK Content-Type: application/json { "kind": "KafkaTopicConfigList", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/topics/topic-1/configs", "next": null }, "data": [ { "kind": "KafkaTopicConfig", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/topics/topic-1/configs/cleanup.policy", "resource_name": "crn:///kafka=cluster-1/topic=topic-1/config=cleanup.policy" }, "cluster_id": "cluster-1", "topic_name": "topic-1", "name": "cleanup.policy", "value": "compact", "is_default": false, "is_read_only": false, "is_sensitive": false, "source": "DYNAMIC_TOPIC_CONFIG", "synonyms": [ { "name": "cleanup.policy", "value": "compact", "source": "DYNAMIC_TOPIC_CONFIG" }, { "name": "cleanup.policy", "value": "delete", "source": "DEFAULT_CONFIG" } ] }, { "kind": "KafkaTopicConfig", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/topics/topic-1/configs/compression.type", "resource_name": "crn:///kafka=cluster-1/topic=topic-1/config=compression.type" }, "cluster_id": "cluster-1", "topic_name": "topic-1", "name": "compression.type", "value": "gzip", "is_default": false, "is_read_only": false, "is_sensitive": false, "source": "DYNAMIC_TOPIC_CONFIG", "synonyms": [ { "name": "compression.type", "value": "gzip", "source": "DYNAMIC_TOPIC_CONFIG" }, { "name": "compression.type", "value": "producer", "source": "DEFAULT_CONFIG" } ] } ] } ``` * [400 Bad Request](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.1) – Indicates a bad request error. It could be caused by an unexpected request body format or other forms of request validation failure. **bad_request_cannot_deserialize:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 400, "message": "Cannot deserialize value of type `java.lang.Integer` from String \"A\": not a valid `java.lang.Integer` value" } ``` **unsupported_version_exception:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 40035, "message": "The version of this API is not supported in the underlying Kafka cluster." 
} ``` * [401 Unauthorized](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.2) – Indicates a client authentication error. Kafka authentication failures will contain error code 40101 in the response body. **kafka_authentication_failed:** ```http HTTP/1.1 401 Unauthorized Content-Type: application/json { "error_code": 40101, "message": "Authentication failed" } ``` * [404 Not Found](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.5) – Indicates attempted access to an unreachable or non-existing resource like e.g. an unknown topic or partition. GET requests to endpoints not allowed in the accesslists will also result in this response. **endpoint_not_found:** ```http HTTP/1.1 404 Not Found Content-Type: application/json { "error_code": 404, "message": "HTTP 404 Not Found" } ``` **cluster_not_found:** ```http HTTP/1.1 404 Not Found Content-Type: application/json { "error_code": 404, "message": "Cluster my-cluster cannot be found." } ``` **unknown_topic_or_partition:** ```http HTTP/1.1 404 Not Found Content-Type: application/json { "error_code": 40403, "message": "This server does not host this topic-partition." } ``` * [429 Too Many Requests](https://www.rfc-editor.org/rfc/rfc6585#section-4) – Indicates that a rate limit threshold has been reached, and the client should retry again later. **Example response:** ```http HTTP/1.1 429 Too Many Requests Content-Type: text/html { "description": "A sample response from Jetty's DoSFilter.", "value": " Error 429 Too Many Requests

HTTP ERROR 429 Too Many Requests. URI: /v3/clusters/my-cluster, STATUS: 429, MESSAGE: Too Many Requests, SERVLET: default
" } ``` * *5XX* – A server-side problem that might not be addressable from the client side. Retriable Kafka errors will contain error code 50003 in the response body. **generic_internal_server_error:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 500, "message": "Internal Server Error" } ``` **produce_v3_missing_schema:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 50002, "message": "Error when fetching latest schema version. subject = my-topic" } ``` ### GET /clusters/{cluster_id}/consumer-groups/{consumer_group_id} **Get Consumer Group** [![Generally Available](https://img.shields.io/badge/Lifecycle%20Stage-Generally%20Available-%2345c6e8)](#section/Versioning/API-Lifecycle-Policy) Return the consumer group specified by the `consumer_group_id`. * **Parameters:** * **cluster_id** (*string*) – The Kafka cluster ID. * **consumer_group_id** (*string*) – The consumer group ID. **Example request:** ```http GET /clusters/{cluster_id}/consumer-groups/{consumer_group_id} HTTP/1.1 Host: example.com ``` * **Status Codes:** * [200 OK](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.1) – The consumer group. **Example response:** ```http HTTP/1.1 200 OK Content-Type: application/json { "kind": "KafkaConsumerGroup", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/consumer-groups/consumer-group-1", "resource_name": "crn:///kafka=cluster-1/consumer-group=consumer-group-1" }, "cluster_id": "cluster-1", "consumer_group_id": "consumer-group-1", "is_simple": false, "partition_assignor": "org.apache.kafka.clients.consumer.RoundRobinAssignor", "state": "STABLE", "coordinator": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/brokers/1" }, "consumers": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/consumer-groups/consumer-group-1/consumers" }, "lag_summary": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/consumer-groups/consumer-group-1/lag-summary" } } ``` * [400 Bad Request](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.1) – Indicates a bad request error. It could be caused by an unexpected request body format or other forms of request validation failure. **bad_request_cannot_deserialize:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 400, "message": "Cannot deserialize value of type `java.lang.Integer` from String \"A\": not a valid `java.lang.Integer` value" } ``` **unsupported_version_exception:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 40035, "message": "The version of this API is not supported in the underlying Kafka cluster." } ``` * [401 Unauthorized](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.2) – Indicates a client authentication error. Kafka authentication failures will contain error code 40101 in the response body. **kafka_authentication_failed:** ```http HTTP/1.1 401 Unauthorized Content-Type: application/json { "error_code": 40101, "message": "Authentication failed" } ``` * [403 Forbidden](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.4) – Indicates a client authorization error. Kafka authorization failures will contain error code 40301 in the response body. 
**kafka_authorization_failed:** ```http HTTP/1.1 403 Forbidden Content-Type: application/json { "error_code": 40301, "message": "Request is not authorized" } ``` * [429 Too Many Requests](https://www.rfc-editor.org/rfc/rfc6585#section-4) – Indicates that a rate limit threshold has been reached, and the client should retry again later. **Example response:** ```http HTTP/1.1 429 Too Many Requests Content-Type: text/html { "description": "A sample response from Jetty's DoSFilter.", "value": " Error 429 Too Many Requests

HTTP ERROR 429 Too Many Requests. URI: /v3/clusters/my-cluster, STATUS: 429, MESSAGE: Too Many Requests, SERVLET: default
" } ``` * *5XX* – A server-side problem that might not be addressable from the client side. Retriable Kafka errors will contain error code 50003 in the response body. **generic_internal_server_error:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 500, "message": "Internal Server Error" } ``` **produce_v3_missing_schema:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 50002, "message": "Error when fetching latest schema version. subject = my-topic" } ``` ### GET /clusters/{cluster_id}/consumer-groups/{consumer_group_id}/lag-summary **Get Consumer Group Lag Summary** [![Generally Available](https://img.shields.io/badge/Lifecycle%20Stage-Generally%20Available-%2345c6e8)](#section/Versioning/API-Lifecycle-Policy) Return the maximum and total lag of the consumers belonging to the specified consumer group. * **Parameters:** * **cluster_id** (*string*) – The Kafka cluster ID. * **consumer_group_id** (*string*) – The consumer group ID. **Example request:** ```http GET /clusters/{cluster_id}/consumer-groups/{consumer_group_id}/lag-summary HTTP/1.1 Host: example.com ``` * **Status Codes:** * [200 OK](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.1) – The max and total consumer lag in a consumer group. **Example response:** ```http HTTP/1.1 200 OK Content-Type: application/json { "kind": "KafkaConsumerGroupLagSummary", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/consumer-groups/consumer-group-1/lag-summary", "resource_name": "crn:///kafka=cluster-1/consumer-groups=consumer-group-1/lag-summary" }, "cluster_id": "cluster-1", "consumer_group_id": "consumer-group-1", "max_lag_consumer_id": "consumer-1", "max_lag_instance_id": "consumer-instance-1", "max_lag_client_id": "client-1", "max_lag_topic_name": "topic-1", "max_lag_partition_id": 1, "max_lag": 100, "total_lag": 110, "max_lag_consumer": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/consumer-groups/consumer-group-1/consumers/consumer-1" }, "max_lag_partition": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/topics/topic-1/partitions/1" } } ``` * [400 Bad Request](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.1) – Indicates a bad request error. It could be caused by an unexpected request body format or other forms of request validation failure. **bad_request_cannot_deserialize:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 400, "message": "Cannot deserialize value of type `java.lang.Integer` from String \"A\": not a valid `java.lang.Integer` value" } ``` **unsupported_version_exception:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 40035, "message": "The version of this API is not supported in the underlying Kafka cluster." } ``` * [401 Unauthorized](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.2) – Indicates a client authentication error. Kafka authentication failures will contain error code 40101 in the response body. **kafka_authentication_failed:** ```http HTTP/1.1 401 Unauthorized Content-Type: application/json { "error_code": 40101, "message": "Authentication failed" } ``` * [403 Forbidden](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.4) – Indicates a client authorization error. Kafka authorization failures will contain error code 40301 in the response body. 
**kafka_authorization_failed:** ```http HTTP/1.1 403 Forbidden Content-Type: application/json { "error_code": 40301, "message": "Request is not authorized" } ``` * [429 Too Many Requests](https://www.rfc-editor.org/rfc/rfc6585#section-4) – Indicates that a rate limit threshold has been reached, and the client should retry again later. **Example response:** ```http HTTP/1.1 429 Too Many Requests Content-Type: text/html { "description": "A sample response from Jetty's DoSFilter.", "value": " Error 429 Too Many Requests

HTTP ERROR 429 Too Many Requests. URI: /v3/clusters/my-cluster, STATUS: 429, MESSAGE: Too Many Requests, SERVLET: default
" } ``` * *5XX* – A server-side problem that might not be addressable from the client side. Retriable Kafka errors will contain error code 50003 in the response body. **generic_internal_server_error:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 500, "message": "Internal Server Error" } ``` **produce_v3_missing_schema:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 50002, "message": "Error when fetching latest schema version. subject = my-topic" } ``` ### GET /clusters/{cluster_id}/consumer-groups/{consumer_group_id}/lags/{topic_name}/partitions/{partition_id} **Get Consumer Lag** [![Generally Available](https://img.shields.io/badge/Lifecycle%20Stage-Generally%20Available-%2345c6e8)](#section/Versioning/API-Lifecycle-Policy)[![Available in dedicated clusters only](https://img.shields.io/badge/-Available%20in%20dedicated%20clusters%20only-%23bc8540)](https://docs.confluent.io/cloud/current/clusters/cluster-types.html#dedicated-cluster) Return the consumer lag on a partition with the given `partition_id`. * **Parameters:** * **cluster_id** (*string*) – The Kafka cluster ID. * **consumer_group_id** (*string*) – The consumer group ID. * **topic_name** (*string*) – The topic name. * **partition_id** (*integer*) – The partition ID. **Example request:** ```http GET /clusters/{cluster_id}/consumer-groups/{consumer_group_id}/lags/{topic_name}/partitions/{partition_id} HTTP/1.1 Host: example.com ``` * **Status Codes:** * [200 OK](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.1) – The consumer lag. **Example response:** ```http HTTP/1.1 200 OK Content-Type: application/json { "kind": "KafkaConsumerLag", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/consumer-groups/consumer-group-1/lags/topic-1/partitions/1", "resource_name": "crn:///kafka=cluster-1/consumer-group=consumer-group-1/lag=topic-1/partition=1" }, "cluster_id": "cluster-1", "consumer_group_id": "consumer-group-1", "topic_name": "topic-1", "partition_id": 1, "consumer_id": "consumer-1", "instance_id": "consumer-instance-1", "client_id": "client-1", "current_offset": 1, "log_end_offset": 101, "lag": 100 } ``` * [400 Bad Request](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.1) – Indicates a bad request error. It could be caused by an unexpected request body format or other forms of request validation failure. **bad_request_cannot_deserialize:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 400, "message": "Cannot deserialize value of type `java.lang.Integer` from String \"A\": not a valid `java.lang.Integer` value" } ``` **unsupported_version_exception:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 40035, "message": "The version of this API is not supported in the underlying Kafka cluster." } ``` * [401 Unauthorized](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.2) – Indicates a client authentication error. Kafka authentication failures will contain error code 40101 in the response body. **kafka_authentication_failed:** ```http HTTP/1.1 401 Unauthorized Content-Type: application/json { "error_code": 40101, "message": "Authentication failed" } ``` * [403 Forbidden](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.4) – Indicates a client authorization error. Kafka authorization failures will contain error code 40301 in the response body. 
**kafka_authorization_failed:** ```http HTTP/1.1 403 Forbidden Content-Type: application/json { "error_code": 40301, "message": "Request is not authorized" } ``` * [429 Too Many Requests](https://www.rfc-editor.org/rfc/rfc6585#section-4) – Indicates that a rate limit threshold has been reached, and the client should retry again later. **Example response:** ```http HTTP/1.1 429 Too Many Requests Content-Type: text/html { "description": "A sample response from Jetty's DoSFilter.", "value": " Error 429 Too Many Requests

HTTP ERROR 429 Too Many Requests. URI: /v3/clusters/my-cluster, STATUS: 429, MESSAGE: Too Many Requests, SERVLET: default
" } ``` * *5XX* – A server-side problem that might not be addressable from the client side. Retriable Kafka errors will contain error code 50003 in the response body. **generic_internal_server_error:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 500, "message": "Internal Server Error" } ``` **produce_v3_missing_schema:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 50002, "message": "Error when fetching latest schema version. subject = my-topic" } ``` ### POST /clusters/{cluster_id}/topics/{topic_name}/records **Produce Records** [![Generally Available](https://img.shields.io/badge/Lifecycle%20Stage-Generally%20Available-%2345c6e8)](#section/Versioning/API-Lifecycle-Policy) Produce records to the given topic, returning delivery reports for each record produced. This API can be used in streaming mode by setting “Transfer-Encoding: chunked” header. For as long as the connection is kept open, the server will keep accepting records. For each record sent to the server, the server will asynchronously send back a delivery report, in the same order. Records are streamed to and from the server as Concatenated JSON. Errors are reported per record. The HTTP status code will be HTTP 200 OK as long as the connection is successfully established. Note that the cluster_id is validated only when running in Confluent Cloud. * **Parameters:** * **cluster_id** (*string*) – The Kafka cluster ID. * **topic_name** (*string*) – The topic name. **binary_and_json:** ```http POST /clusters/{cluster_id}/topics/{topic_name}/records HTTP/1.1 Host: example.com Content-Type: application/json { "partition_id": 1, "headers": [ { "name": "Header-1", "value": "SGVhZGVyLTE=" }, { "name": "Header-2", "value": "SGVhZGVyLTI=" } ], "key": { "type": "BINARY", "data": "Zm9vYmFy" }, "value": { "type": "JSON", "data": { "foo": "bar" } }, "timestamp": "2021-02-05T19:14:42Z" } ``` **binary_and_avro_with_subject_and_raw_schema:** ```http POST /clusters/{cluster_id}/topics/{topic_name}/records HTTP/1.1 Host: example.com Content-Type: application/json { "partition_id": 1, "headers": [ { "name": "Header-1", "value": "SGVhZGVyLTE=" }, { "name": "Header-2", "value": "SGVhZGVyLTI=" } ], "key": { "type": "BINARY", "data": "Zm9vYmFy" }, "value": { "type": "AVRO", "subject": "topic-1-key", "schema": "{\\\"type\\\":\\\"string\\\"}", "data": "foobar" }, "timestamp": "2021-02-05T19:14:42Z" } ``` **string:** ```http POST /clusters/{cluster_id}/topics/{topic_name}/records HTTP/1.1 Host: example.com Content-Type: application/json { "value": { "type": "STRING", "data": "My message" } } ``` **schema_id_and_schema_version:** ```http POST /clusters/{cluster_id}/topics/{topic_name}/records HTTP/1.1 Host: example.com Content-Type: application/json { "key": { "subject_name_strategy": "TOPIC_NAME", "schema_id": 1, "data": 1000 }, "value": { "schema_version": 1, "data": { "foo": "bar" } } } ``` **latest_schema:** ```http POST /clusters/{cluster_id}/topics/{topic_name}/records HTTP/1.1 Host: example.com Content-Type: application/json { "key": { "data": 1000 }, "value": { "data": "foobar" } } ``` **null_and_empty_data:** ```http POST /clusters/{cluster_id}/topics/{topic_name}/records HTTP/1.1 Host: example.com Content-Type: application/json { "key": { "schema_id": 1 }, "value": { "schema_version": 1, "data": null } } ``` **empty_value:** ```http POST /clusters/{cluster_id}/topics/{topic_name}/records HTTP/1.1 Host: example.com Content-Type: application/json { "key": { "data": 1000 } } ``` * **Status Codes:** * 
[200 OK](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.1) – The response containing a delivery report for a record produced to a topic. In streaming mode, for each record sent, a separate delivery report will be returned, in the same order, each with its own error_code. **produce_record_success:** ```http HTTP/1.1 200 OK Content-Type: application/json { "error_code": 200, "cluster_id": "cluster-1", "topic_name": "topic-1", "partition_id": 1, "offset": 0, "timestamp": "2021-02-05T19:14:42Z", "key": { "type": "BINARY", "size": 7 }, "value": { "type": "JSON", "size": 15 } } ``` **produce_record_bad_binary_data:** ```http HTTP/1.1 200 OK Content-Type: application/json { "error_code": 400, "message": "Bad Request: data=1 is not a base64 string." } ``` * [400 Bad Request](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.1) – Indicates a bad request error. It could be caused by an unexpected request body format or other forms of request validation failure. **header_not_base64_encoded:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 400, "message": "Cannot deserialize value of type `byte[]` from String \"\": Unexpected end of base64-encoded String: base64 variant 'MIME-NO-LINEFEEDS' expects padding (one or more '=' characters) at the end. This Base64Variant might have been incorrectly configured" } ``` * [401 Unauthorized](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.2) – Indicates a client authentication error. Kafka authentication failures will contain error code 40101 in the response body. **kafka_authentication_failed:** ```http HTTP/1.1 401 Unauthorized Content-Type: application/json { "error_code": 40101, "message": "Authentication failed" } ``` * [403 Forbidden](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.4) – Indicates a client authorization error. Kafka authorization failures will contain error code 40301 in the response body. **kafka_authorization_failed:** ```http HTTP/1.1 403 Forbidden Content-Type: application/json { "error_code": 40301, "message": "Request is not authorized" } ``` * [404 Not Found](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.5) – Indicates attempted access to an unreachable or non-existing resource like e.g. an unknown topic or partition. GET requests to endpoints not allowed in the accesslists will also result in this response. **endpoint_not_found:** ```http HTTP/1.1 404 Not Found Content-Type: application/json { "error_code": 404, "message": "HTTP 404 Not Found" } ``` **cluster_not_found:** ```http HTTP/1.1 404 Not Found Content-Type: application/json { "error_code": 404, "message": "Cluster my-cluster cannot be found." } ``` **unknown_topic_or_partition:** ```http HTTP/1.1 404 Not Found Content-Type: application/json { "error_code": 40403, "message": "This server does not host this topic-partition." } ``` * [413 Request Entity Too Large](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.14) – This implies the client is sending a request payload that is larger than the maximum message size the server can accept. **produce_records_expects_json:** ```http HTTP/1.1 413 Request Entity Too Large Content-Type: application/json { "error_code": 413, "message": "The request included a message larger than the maximum message size the server can accept." 
} ``` * [415 Unsupported Media Type](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.16) – This implies the client is sending the request payload format in an unsupported format. **produce_records_expects_json:** ```http HTTP/1.1 415 Unsupported Media Type Content-Type: application/json { "error_code": 415, "message": "HTTP 415 Unsupported Media Type" } ``` * [422 Unprocessable Entity](https://www.rfc-editor.org/rfc/rfc4918#section-11.2) – Indicates a bad request error. It could be caused by an unexpected request body format or other forms of request validation failure. **produce_record_empty_request_body:** ```http HTTP/1.1 422 Unprocessable Entity Content-Type: application/json { "error_code": 422, "message": "Payload error. Request body is empty. Data is required." } ``` * [429 Too Many Requests](https://www.rfc-editor.org/rfc/rfc6585#section-4) – Indicates that a rate limit threshold has been reached, and the client should retry again later. **Example response:** ```http HTTP/1.1 429 Too Many Requests Content-Type: text/html { "description": "A sample response from Jetty's DoSFilter.", "value": " Error 429 Too Many Requests

HTTP ERROR 429 Too Many Requests. URI: /v3/clusters/my-cluster, STATUS: 429, MESSAGE: Too Many Requests, SERVLET: default
" } ``` * *5XX* – A server-side problem that might not be addressable from the client side. Retriable Kafka errors will contain error code 50003 in the response body. **generic_internal_server_error:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 500, "message": "Internal Server Error" } ``` **produce_v3_missing_schema:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 50002, "message": "Error when fetching latest schema version. subject = my-topic" } ```

## Role-Based Access Control (RBAC)

This is a commercial component of Confluent Platform.

**Prerequisites:**

* [HTTPS](config.md#confluent-server-rest-http-ssl-config) is recommended, but not required.
* RBAC-enabled Kafka and Schema Registry clusters. For details about RBAC, see [Use Role-Based Access Control (RBAC) for Authorization in Confluent Platform](../../../security/authorization/rbac/overview.md#rbac-overview).

To enable token authentication, set `kafka.rest.rest.servlet.initializor.classes` to `io.confluent.common.security.jetty.initializer.InstallBearerOrBasicSecurityHandler` and `kafka.rest.kafka.rest.resource.extension.class` to `io.confluent.kafkarest.security.KafkaRestSecurityResourceExtension` in the `kafka.properties` file:

```bash
kafka.rest.rest.servlet.initializor.classes=io.confluent.common.security.jetty.initializer.InstallBearerOrBasicSecurityHandler
kafka.rest.kafka.rest.resource.extension.class=io.confluent.kafkarest.security.KafkaRestSecurityResourceExtension
```

When token authentication is enabled, the generated token is used to impersonate the API requests. The Kafka clients used by the Admin REST APIs use the `SASL_PLAINTEXT` or `SASL_SSL` security protocol to authenticate with Kafka brokers.

## Role-Based Access Control (RBAC)

This is a commercial component of Confluent Platform.

**Prerequisites:**

* RBAC-enabled Kafka and Schema Registry clusters. For details about RBAC, see [Use Role-Based Access Control (RBAC) for Authorization in Confluent Platform](../../../security/authorization/rbac/overview.md#rbac-overview). [HTTPS](config.md#kafka-rest-http-ssl-config) is recommended, but not required.

Confluent REST Proxy supports the cross-component, proprietary role-based access control (RBAC) solution to enforce access controls across Confluent Platform. The REST Proxy security plugin supports a bearer token-based authentication mechanism. With token authentication, REST Proxy can impersonate user requests when communicating with Kafka brokers and Schema Registry clusters. RBAC REST Proxy security resolves a number of usability challenges, including:

* Local configuration of principals. With RBAC REST Proxy security, principals are no longer configured locally; instead, principals are handled by the Metadata Service (MDS).
* Existing REST Proxy security capabilities do not scale for very large deployments without significant manual operations. In RBAC REST Proxy security, the MDS binds and enforces a Kafka cluster configuration across different resources (topics, connectors, Schema Registry, and so on), thereby saving users the time and challenge associated with reconfiguring ACLs and roles separately for each Kafka cluster resource.

# Stream Processing Concepts in ksqlDB for Confluent Platform

ksqlDB enables stream processing, which is a way to compute over events as they arrive, rather than in batches at a later time. These events come from Apache Kafka® topics. In ksqlDB, events are stored in a stream, which is a Kafka topic with a defined schema.
When you create a stream in ksqlDB, if the backing Kafka topic doesn’t exist, ksqlDB creates it with the specified number of partitions (a minimal example follows the list below). The stream’s metadata (schema, serialization scheme, etc.) is stored in ksqlDB’s command topic, which is an internal communication channel. Each ksqlDB server keeps a local copy of this metadata. Events are added to a stream as rows, which are essentially Kafka records with extra metadata. ksqlDB uses a Kafka producer to insert these records into the backing Kafka topic. The data itself is persisted in Kafka, not on the ksqlDB servers.

ksqlDB offers a SQL-like interface for transforming streams. You can create new streams derived from existing ones by selecting and manipulating columns. This is done with persistent queries. For example, you can filter a stream or convert the case of a string field.

ksqlDB also supports stateful operations. This means that the processing of an event can depend on the accumulated effects of previous events. State can be used for simple aggregations, like counting events, or more complex operations, like feature engineering for machine learning. Each parallel instance of a ksqlDB application handles events for a specific group of keys, and the state for those keys is kept locally. This allows for high throughput and low latency.

For fault tolerance, ksqlDB uses state snapshots and stream replay. Snapshots capture the entire state of the pipeline, including offsets in the input queues and the state derived from processed data. In case of failure, the pipeline can be restored from the snapshot, and it can replay the stream from the saved offsets. Source tables are not kept entirely in state. Stateless operations, like filtering and projections, don’t require state.

- [Apache Kafka and ksqlDB](apache-kafka-primer.md#ksqldb-apache-kafka-primer): A quick overview of Kafka.
- [Connectors in ksqlDB](connectors.md#ksqldb-connectors): Connectors source and sink data from external systems.
- [Events in ksqlDB](events.md#ksqldb-events): An event is the fundamental unit of data in stream processing.
- [Joins](../developer-guide/joins/overview.md#ksqldb-joins): Joins are how to combine data from many streams and tables into one.
- [User-defined Functions](functions.md#ksqldb-concepts-udfs): Extend ksqlDB to invoke custom code written in Java.
- [Lambda Functions](lambda-functions.md#ksqldb-concepts-lambda-functions): Lambda functions enable you to apply in-line functions without creating a full UDF.
- [Materialized Views](materialized-views.md#ksqldb-concepts-materialized-views): Materialized views precompute the results of queries at write-time so reads become predictably fast.
- [Queries](queries.md#ksqldb-concepts-queries): Queries are how you process events and retrieve computed results.
- [Stream Processing](stream-processing.md#ksqldb-concepts-stream-processing): Stream processing is a way to write programs computing over unbounded streams of events.
- [Streams](streams.md#ksqldb-concepts-streams): A stream is an immutable, append-only collection of events that represents a series of historical facts.
- [Tables](tables.md#ksqldb-concepts-tables): A table is a mutable collection of events that models change over time.
- [Time and Windows](time-and-windows-in-ksqldb-queries.md#ksqldb-time-and-windows): Windows help you bound a continuous stream of events into distinct time intervals.
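As a concrete illustration of the stream concepts above, the sketch below submits a `CREATE STREAM` statement to ksqlDB's `/ksql` REST endpoint. The server address, stream name, columns, and topic name are assumptions for the example; if the named topic does not exist, ksqlDB creates it with the requested number of partitions.

```bash
# A minimal sketch: create a stream by sending a CREATE STREAM statement to the ksqlDB REST API.
# Assumes a ksqlDB server at localhost:8088; the stream, columns, and topic names are hypothetical.
curl -s -X POST http://localhost:8088/ksql \
  -H "Content-Type: application/vnd.ksql.v1+json; charset=utf-8" \
  -d @- <<'EOF'
{
  "ksql": "CREATE STREAM rider_locations (profile_id VARCHAR, latitude DOUBLE, longitude DOUBLE) WITH (kafka_topic='locations', value_format='JSON', partitions=1);",
  "streamsProperties": {}
}
EOF
```

The same statement can be run interactively in the ksqlDB CLI; the REST call only wraps it in a JSON envelope.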
## Get started

Start by creating a `pom.xml` for your Java application:

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>my.ksqldb.app</groupId>
    <artifactId>my-ksqldb-app</artifactId>
    <version>0.0.1</version>

    <properties>
        <java.version>8</java.version>
        <ksqldb.version>0.29.0</ksqldb.version>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
    </properties>

    <repositories>
        <repository>
            <id>ksqlDB</id>
            <name>ksqlDB</name>
            <url>https://ksqldb-mvns.s3.amazonaws.com/maven/</url>
        </repository>
    </repositories>

    <pluginRepositories>
        <pluginRepository>
            <id>ksqlDB</id>
            <url>https://ksqldb-mvns.s3.amazonaws.com/maven/</url>
        </pluginRepository>
    </pluginRepositories>

    <dependencies>
        <dependency>
            <groupId>io.confluent.ksql</groupId>
            <artifactId>ksqldb-api-client</artifactId>
            <version>${ksqldb.version}</version>
            <classifier>with-dependencies</classifier>
        </dependency>
        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka-streams</artifactId>
            <version>4.1.0</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.8.1</version>
                <configuration>
                    <source>${java.version}</source>
                    <target>${java.version}</target>
                    <compilerArgs>
                        <arg>-Xlint:all</arg>
                    </compilerArgs>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>
```

# Configure ksqlDB for Confluent Platform

- [Configure Security for ksqlDB](security.md#ksqldb-installation-security)
- [ksqlDB Configuration Parameter Reference](../../reference/server-configuration.md#ksqldb-reference-server-configuration)
- [Configure ksqlDB for Avro, Protobuf, and JSON schemas](avro-schema.md#ksqldb-installation-configure-serialization-formats)

ksqlDB configuration parameters can be set for ksqlDB Server and for queries, as well as for the underlying Kafka Streams and Kafka Clients (producer and consumer). These instructions assume you are installing Confluent Platform by using ZIP or TAR archives. For more information, see [On-Premises Deployments](../../../installation/overview.md#installation).

#### Step 2. Configure the link on the private cluster

The more privileged / private cluster (cluster A in the diagram) requires:

- Connectivity to its remote cluster (one-way connectivity, such as AWS PrivateLink, is acceptable)
- A user to create a cluster link object on it **second** (after the remote cluster) with the following configuration:

```properties
# bootstrap of the remote cluster
bootstrap.servers=localhost:9992
link.mode=BIDIRECTIONAL
# authentication for the link principal on the remote cluster
sasl.mechanism=SCRAM-SHA-512
security.protocol=SASL_PLAINTEXT
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required \
  username="link" \
  password="link-secret";
# authentication for the link principal on the local cluster
local.sasl.mechanism=SCRAM-SHA-512
local.security.protocol=SASL_PLAINTEXT
local.sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required \
  username="link" \
  password="link-secret";
```

- Additional configurations can be included in this file as needed, per [cluster link configurations](#cp-cluster-link-config-options).
- An authentication configuration (such as API key or OAuth) for a principal on its **remote cluster** with ACLs or RBAC role bindings giving permission to read topic data and metadata:
  - **Alter:Cluster ACL** – this is unique to the advanced mode
  - Describe:Cluster ACL
  - The required ACLs or RBAC role bindings for a cluster link, as described in [Manage Security for Cluster Linking on Confluent Platform](security.md#cluster-link-security) (for [a cluster link on a source cluster](security.md#cluster-link-acls-for-link-on-source) and for a [source-initiated link on the destination cluster](security.md#cluster-link-acls-for-source-initiated-links)).
- **Local authentication** (unique to the unidirectional mode): An authentication configuration (such as API key or OAuth) for a principal **on its own cluster** with ACLs or RBAC role bindings giving permission to read topic data and the Describe:Cluster ACL:
  - The required ACLs or RBAC role bindings giving permission to read topic data and the Describe:Cluster ACL, as described in [Manage Security for Cluster Linking on Confluent Platform](security.md#cluster-link-security). This authentication configuration never leaves this cluster.
The lines for these configurations must be prefixed with `local.` to indicate that they belong to the local cluster. - Link mode set to `link.mode=BIDIRECTIONAL` For example, run the following command to create a bidirectional INBOUND/OUTBOUND link on the private cluster (cluster A in the diagram), including the call to your cluster A config file: ```bash $CONFLUENT_HOME/bin/kafka-cluster-links --create --link bidirectional-link \ --config-file my-examples/a-link.config \ --bootstrap-server localhost:9092 --command-config my-examples/command.config ``` ## FAQ Quick List - [How do I know the data was successfully copied to the destination and can be safely deleted from the source topic?](#faq-verify-data-replication) - [How can I throttle a Confluent Platform to Confluent Cloud cluster link to make the best use of my bandwidth?](#faq-throttle) - [Will adding a cluster link result in throttling consumers on the source cluster?](#faq-throttle-consumers-effects) - [Will adding a cluster link cause throttling of existing producers on the destination cluster?](#faq-throttle-producers-effects) - [How is the consumer offset sync accomplished?](#faq-consumer-offset-sync) - [Can I modify consumer group filters on-the-fly?](#faq-consumer-groups-filters-modify-on-fly) - [How do I create a cluster link?](#faq-create-link) - [Which clusters can create cluster links?](#faq-cluster-details) - [Can I prioritize one link over another?](#faq-prioritize-links) - [How do I create a mirror topic?](#faq-mirror-topics) - [Can I prevent certain topics from being mirrored by a cluster link?](#faq-block-mirror-topics) - [Can I override a topic configuration when using auto-create mirror topics?](#faq-override-topic-config-with-auto-create-mirror-topics) - [Can I use Cluster Linking without the traffic going over the public internet?](#faq-no-public-internet) - [Does Schema Linking have the same limitations as Cluster Linking for private networking and cross-region?](#faq-schema-linking-private-net-rqmts) - [I need RPO==0 (guarantee of no data loss after a failover) in Confluent Cloud. What can I do?](#faq-rpo-zero) - [If I want to join two topics from different clusters in ksqlDB, how can Cluster Linking help me?](#faq-ksqldb-multi-cluster) - [Does Cluster Linking work with mTLS?](#faq-mtls) - [How does Schema Registry multi-region disaster recovery (DR) work in Confluent Cloud?](#faq-cloud-multi-region-dr) - [How can I automatically failover Kafka clients?](#faq-failover-ak-clients) - [How does Cluster Linking optimize network bandwidth and performance in Confluent Cloud?](#faq-optimize) - [How do I perform a failover on a cluster link used primarily for data sharing?](#faq-failover-with-data-sharing) - [Does Cluster Linking support compacted topics?](#faq-compacted-topics-support) - [Does Cluster Linking support bidirectional links between two clusters?](#faq-bidirectional-links) - [Does Cluster Linking support repartitioning or renaming of topics?](#faq-repartitioning-and-renaming-topics) - [Can Cluster Linking create circular dependencies? How can I prevent infinite loops?](#faq-circular-dependencies) #### NOTE As a general guideline (not just for this tutorial), any customer-owned firewall that allows the cluster link connection from source cluster brokers to destination cluster brokers must allow the TCP connection to persist in order for Cluster Linking to work. 
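Before moving on to the tutorial prerequisites below, it can be helpful to confirm that a newly created link (such as the one created with `kafka-cluster-links --create` earlier) is visible on the cluster. The following is a minimal sketch; it assumes the same bootstrap server and command config used above, and that your Confluent Platform version supports the `--list` option of the `kafka-cluster-links` tool.

```bash
# A minimal sketch: list the cluster links known to the local cluster to confirm the new link exists.
# Assumes the bootstrap server and command config from the create command shown earlier.
$CONFLUENT_HOME/bin/kafka-cluster-links --list \
  --bootstrap-server localhost:9092 \
  --command-config my-examples/command.config
```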
- These instructions assume you have a local installation of [Confluent Platform 7.1.0 or later](https://www.confluent.io/get-started/?product=software) and Java 8, 11, or 17 (17 is recommended), which Confluent Platform requires. [Install instructions for self-managed deployments](/platform/current/installation/overview.html) are available in the documentation. If you are new to Confluent Platform, you may want to first work through the [Quick Start for Apache Kafka using Confluent Platform](/platform/current/platform-quickstart.html) and/or the [basic Cluster Linking tutorial](/platform/current/multi-dc-deployments/cluster-linking/topic-data-sharing.html), and then return to this tutorial.
- This tutorial and the source-initiated link feature require Confluent Enterprise, and are not supported in Confluent Community or Apache Kafka®.
- With a default install of Confluent Platform, the [Confluent CLI](https://docs.confluent.io/confluent-cli/current/command-reference/overview.html) and [Cluster Linking commands](https://docs.confluent.io/platform/current/multi-dc-deployments/cluster-linking/commands.html) should be available in `$CONFLUENT_HOME/bin`, and properties files will be in the directory `CONFLUENT_CONFIG` (`$CONFLUENT_HOME/etc/kafka/`). You must have Confluent Platform running to access these commands. Once Confluent Platform is [configured](#cluster-link-hybrid-config) and [running](#cluster-link-hybrid-start-cp), you can type any command with no arguments to get help (for example, `kafka-cluster-links`).
- This tutorial requires a Confluent Cloud login and the [Confluent CLI](https://docs.confluent.io/confluent-cli/current/overview.html). To learn more, see [Get the latest version of Confluent Cloud](https://docs.confluent.io/cloud/current/multi-cloud/cluster-linking/quickstart.html#get-the-latest-version-of-ccloud) in the [Confluent Cloud Cluster Linking Quick Start](https://docs.confluent.io/cloud/current/multi-cloud/cluster-linking/quickstart.html#), as well as [Migrate Confluent CLI](https://docs.confluent.io/confluent-cli/current/migrate.html). If you are new to Confluent Cloud, you might want to walk through that Quick Start first, and then return to this tutorial.
- This tutorial requires that you run a [Dedicated cluster](https://docs.confluent.io/cloud/current/clusters/cluster-types.html#dedicated-clusters) in Confluent Cloud, which will incur Confluent Cloud charges.
- The parameter `password.encoder.secret` is used to encrypt the credentials that will be stored in the cluster link. This is required for ZooKeeper, which is supported on pre-8.0 versions of Confluent Platform, and when [migrating from ZooKeeper to KRaft](/platform/current/installation/migrate-zk-kraft.html), as described in [What’s supported](/platform/current/multi-dc-deployments/cluster-linking/index.html#what-s-supported). To learn more about this parameter, see [Multi-Region Clusters](/platform/current/kafka/dynamic-config.html#dynamic-config-passwords-upgrade).

## Authentication

The following example shows how to configure SASL_SSL with GSSAPI as the SASL mechanism for the cluster link to talk to the source cluster. You can set these configurations using a `config-file`, as described in the section on [how to set properties on a cluster link](configs.md#cluster-link-specific-configs).
```bash security.protocol=SASL_SSL ssl.truststore.location=/path/to/truststore.p12 ssl.truststore.password=truststore-password ssl.truststore.type=PKCS12 sasl.mechanism=GSSAPI sasl.kerberos.service.name=kafka sasl.jaas.config=com.sun.security.auth.module.Krb5LoginModule required \ useKeyTab=true \ storeKey=true \ keyTab="/path/to/link.keytab" \ principal="clusterlink1@EXAMPLE.COM"; ``` Cluster Linking configurations should include client-side TLS/SSL and SASL/GSSAPI configuration options for connections to the source cluster in this scenario. If you reference a `keystore`/`truststore` directly (for example, `keystore.jks`), the same files must be available in the same location on each of the brokers For details on creating TLS/SSL key and trust stores, see [Use TLS Authentication in Confluent Platform](../../security/authentication/mutual-tls/overview.md#kafka-ssl-authentication). For details on SASL/GSSAPI, see [Configure GSSAPI in Confluent Platform clusters](../../security/authentication/sasl/gssapi/overview.md#kafka-sasl-auth-gssapi). To configure cluster links to use other SASL mechanisms, include client-side security configurations for that mechanism. See [SASL](../../security/authentication/overview.md#kafka-sasl-auth) for other supported mechanisms. To use mutual TLS authentication as the security protocol, a key store should also be configured for the link. See [Use TLS Authentication in Confluent Platform](../../security/authentication/mutual-tls/overview.md#kafka-ssl-authentication) for details. # Deploy Confluent Platform in a Multi-Datacenter Environment Confluent Platform supports several types of multi-datacenter deployment solutions. * [Overview](index.md) * [Multi-Data Center Architectures on Confluent Platform](multi-region-architectures.md) * [Cluster Linking on Confluent Platform](cluster-linking/overview.md) * [Overview](cluster-linking/index.md) * [Tutorials](cluster-linking/overview-use-cases.md) * [Share Data Across Topics](cluster-linking/topic-data-sharing.md) * [Link Hybrid Cloud and Bridge-to-Cloud Clusters](cluster-linking/hybrid-cp.md) * [Migrate Data](cluster-linking/migrate-cp.md) * [Manage](cluster-linking/overview-config-and-manage.md) * [Manage Mirror Topics](cluster-linking/mirror-topics-cp.md) * [Configure](cluster-linking/configs.md) * [Command Reference](cluster-linking/commands.md) * [Monitor](cluster-linking/metrics.md) * [Security](cluster-linking/security.md) * [FAQ](cluster-linking/faqs-cp.md) * [Troubleshooting](cluster-linking/trouble-cp.md) * [Multi-Region Clusters on Confluent Platform](multi-region-overview.md) * [Overview](multi-region.md) * [Tutorial: Multi-Region Clusters](multi-region-tutorial.md) * [Tutorial: Move Active-Passive to Multi-Region](mrc-move-from-active-passive.md) * [Replicate Topics Across Kafka Clusters in Confluent Platform](replicator/overview.md) * [Overview](replicator/index.md) * [Example: Active-active Multi-Datacenter](replicator/replicator-docker-tutorial.md) * [Tutorial: Replicate Data Across Clusters](replicator/replicator-quickstart.md) * [Tutorial: Run as an Executable or Connector](replicator/replicator-run.md) * [Configure](replicator/configuration_options.md) * [Verify Configuration](replicator/replicator-verifier.md) * [Tune](replicator/replicator-tuning.md) * [Monitor](replicator/replicator-monitoring.md) * [Configure for Cross-Cluster Failover](replicator/replicator-failover.md) * [Migrate from MirrorMaker to Replicator](replicator/migrate-replicator.md) * [Replicator Schema Translation Example for 
Confluent Platform](replicator/replicator-schema-translation.md) ## Multi-Datacenter Use Cases Replicator can be deployed across clusters and in multiple datacenters. Multi-datacenter deployments enable use cases such as: * Active-active geo-localized deployments: allow users to access a nearby datacenter to optimize their architecture for low latency and high performance * Active-passive disaster recovery (DR) deployments: in the event of a partial or complete datacenter disaster, allow applications to fail over to Confluent Platform in a different datacenter * Centralized analytics: aggregate data from multiple Apache Kafka® clusters into one location for organization-wide analytics * Cloud migration: use Kafka to synchronize data between on-premises applications and cloud deployments Replication of events in Kafka topics from one cluster to another is the foundation of Confluent’s multi-datacenter architecture. Replication can be done with Confluent Replicator or using the open source [Kafka MirrorMaker](https://kafka.apache.org/documentation/#basic_ops_mirror_maker). Replicator can be used for replication of topic data as well as [migrating schemas](../../schema-registry/installation/migrate.md#schemaregistry-migrate) in Schema Registry. This documentation focuses on Replicator, including the [architecture](#replicator-architecture), a [quick start tutorial](replicator-quickstart.md#replicator-quickstart), how to [configure and run](replicator-run.md#replicator-run) Replicator in different contexts, [tuning and monitoring](replicator-tuning.md#replicator-tuning), [cross-cluster failover](replicator-failover.md#replicator-failover), and more. A section on how to [migrate from MirrorMaker to Replicator](migrate-replicator.md#migrate-replicator) is also included. Some of the general thinking on deployment strategies can also apply to MirrorMaker, but if you are primarily interested in MirrorMaker, see [Mirroring data between clusters](https://kafka.apache.org/documentation/#basic_ops_mirror_maker) in the Kafka documentation. ### Inspect topics 1. For each datacenter, inspect the data in various topics, provenance information, timestamp information, and cluster ID. ```bash ./read-topics.sh ``` 2. 
Verify the output resembles: ```text -----dc1----- list topics: __consumer_offsets __consumer_timestamps _confluent-command _confluent-license _confluent-telemetry-metrics _confluent_balancer_api_state _schemas connect-configs-dc1 connect-offsets-dc1 connect-status-dc1 topic1 topic2 topic1: {"userid":{"string":"User_7"},"dc":{"string":"dc1"}} {"userid":{"string":"User_7"},"dc":{"string":"dc2"}} {"userid":{"string":"User_9"},"dc":{"string":"dc2"}} {"userid":{"string":"User_2"},"dc":{"string":"dc1"}} {"userid":{"string":"User_5"},"dc":{"string":"dc2"}} {"userid":{"string":"User_1"},"dc":{"string":"dc1"}} {"userid":{"string":"User_3"},"dc":{"string":"dc2"}} {"userid":{"string":"User_7"},"dc":{"string":"dc1"}} {"userid":{"string":"User_1"},"dc":{"string":"dc2"}} {"userid":{"string":"User_8"},"dc":{"string":"dc1"}} Processed a total of 10 messages topic2: {"registertime":{"long":1513471082347},"userid":{"string":"User_2"},"regionid":{"string":"Region_7"},"gender":{"string":"OTHER"}} {"registertime":{"long":1496006007512},"userid":{"string":"User_5"},"regionid":{"string":"Region_6"},"gender":{"string":"OTHER"}} {"registertime":{"long":1494319368203},"userid":{"string":"User_7"},"regionid":{"string":"Region_2"},"gender":{"string":"FEMALE"}} {"registertime":{"long":1493150028737},"userid":{"string":"User_1"},"regionid":{"string":"Region_5"},"gender":{"string":"FEMALE"}} {"registertime":{"long":1517151907191},"userid":{"string":"User_5"},"regionid":{"string":"Region_3"},"gender":{"string":"OTHER"}} {"registertime":{"long":1489672305692},"userid":{"string":"User_2"},"regionid":{"string":"Region_6"},"gender":{"string":"OTHER"}} {"registertime":{"long":1511471447951},"userid":{"string":"User_2"},"regionid":{"string":"Region_5"},"gender":{"string":"MALE"}} {"registertime":{"long":1488018372941},"userid":{"string":"User_7"},"regionid":{"string":"Region_2"},"gender":{"string":"OTHER"}} {"registertime":{"long":1500952152251},"userid":{"string":"User_2"},"regionid":{"string":"Region_1"},"gender":{"string":"MALE"}} {"registertime":{"long":1493556444692},"userid":{"string":"User_1"},"regionid":{"string":"Region_8"},"gender":{"string":"FEMALE"}} Processed a total of 10 messages _schemas: null null null {"subject":"topic1-value","version":1,"id":1,"schema":"{\"type\":\"record\",\"name\":\"KsqlDataSourceSchema\",\"namespace\":\"io.confluent.ksql.avro_schemas\",\"fields\":[{\"name\":\"userid\",\"type\":[\"null\",\"string\"],\"default\":null},{\"name\":\"dc\",\"type\":[\"null\",\"string\"],\"default\":null}]}","deleted":false} {"subject":"topic2-value","version":1,"id":2,"schema":"{\"type\":\"record\",\"name\":\"KsqlDataSourceSchema\",\"namespace\":\"io.confluent.ksql.avro_schemas\",\"fields\":[{\"name\":\"registertime\",\"type\":[\"null\",\"long\"],\"default\":null},{\"name\":\"userid\",\"type\":[\"null\",\"string\"],\"default\":null},{\"name\":\"regionid\",\"type\":[\"null\",\"string\"],\"default\":null},{\"name\":\"gender\",\"type\":[\"null\",\"string\"],\"default\":null}]}","deleted":false} {"subject":"topic2.replica-value","version":1,"id":2,"schema":"{\"type\":\"record\",\"name\":\"KsqlDataSourceSchema\",\"namespace\":\"io.confluent.ksql.avro_schemas\",\"fields\":[{\"name\":\"registertime\",\"type\":[\"null\",\"long\"],\"default\":null},{\"name\":\"userid\",\"type\":[\"null\",\"string\"],\"default\":null},{\"name\":\"regionid\",\"type\":[\"null\",\"string\"],\"default\":null},{\"name\":\"gender\",\"type\":[\"null\",\"string\"],\"default\":null}]}","deleted":false} [2021-01-04 19:16:09,579] ERROR Error 
processing message, terminating consumer process: (kafka.tools.ConsoleConsumer$) org.apache.kafka.common.errors.TimeoutException Processed a total of 6 messages provenance info (cluster, topic, timestamp): 2qHo2TsdTIaTjvkCyf3qdw,topic1,1609787778125 2qHo2TsdTIaTjvkCyf3qdw,topic1,1609787779123 2qHo2TsdTIaTjvkCyf3qdw,topic1,1609787780125 2qHo2TsdTIaTjvkCyf3qdw,topic1,1609787781246 2qHo2TsdTIaTjvkCyf3qdw,topic1,1609787782125 Processed a total of 10 messages timestamp info (group: topic-partition): replicator-dc1-to-dc2-topic2: topic2-0 1609787797164 replicator-dc1-to-dc2-topic1: topic1-0 1609787797117 Processed a total of 2 messages cluster id: ZagAAEfORQG-lxwq6OsV5Q -----dc2----- list topics: __consumer_offsets __consumer_timestamps _confluent-command _confluent-controlcenter-6-1-0-1-AlertHistoryStore-changelog _confluent-controlcenter-6-1-0-1-AlertHistoryStore-repartition _confluent-controlcenter-6-1-0-1-Group-ONE_MINUTE-changelog _confluent-controlcenter-6-1-0-1-Group-ONE_MINUTE-repartition _confluent-controlcenter-6-1-0-1-Group-THREE_HOURS-changelog _confluent-controlcenter-6-1-0-1-Group-THREE_HOURS-repartition _confluent-controlcenter-6-1-0-1-KSTREAM-OUTEROTHER-0000000106-store-changelog _confluent-controlcenter-6-1-0-1-KSTREAM-OUTEROTHER-0000000106-store-repartition _confluent-controlcenter-6-1-0-1-KSTREAM-OUTERTHIS-0000000105-store-changelog _confluent-controlcenter-6-1-0-1-KSTREAM-OUTERTHIS-0000000105-store-repartition _confluent-controlcenter-6-1-0-1-MetricsAggregateStore-changelog _confluent-controlcenter-6-1-0-1-MetricsAggregateStore-repartition _confluent-controlcenter-6-1-0-1-MonitoringMessageAggregatorWindows-ONE_MINUTE-changelog _confluent-controlcenter-6-1-0-1-MonitoringMessageAggregatorWindows-ONE_MINUTE-repartition _confluent-controlcenter-6-1-0-1-MonitoringMessageAggregatorWindows-THREE_HOURS-changelog _confluent-controlcenter-6-1-0-1-MonitoringMessageAggregatorWindows-THREE_HOURS-repartition _confluent-controlcenter-6-1-0-1-MonitoringStream-ONE_MINUTE-changelog _confluent-controlcenter-6-1-0-1-MonitoringStream-ONE_MINUTE-repartition _confluent-controlcenter-6-1-0-1-MonitoringStream-THREE_HOURS-changelog _confluent-controlcenter-6-1-0-1-MonitoringStream-THREE_HOURS-repartition _confluent-controlcenter-6-1-0-1-MonitoringTriggerStore-changelog _confluent-controlcenter-6-1-0-1-MonitoringTriggerStore-repartition _confluent-controlcenter-6-1-0-1-MonitoringVerifierStore-changelog _confluent-controlcenter-6-1-0-1-MonitoringVerifierStore-repartition _confluent-controlcenter-6-1-0-1-TriggerActionsStore-changelog _confluent-controlcenter-6-1-0-1-TriggerActionsStore-repartition _confluent-controlcenter-6-1-0-1-TriggerEventsStore-changelog _confluent-controlcenter-6-1-0-1-TriggerEventsStore-repartition _confluent-controlcenter-6-1-0-1-actual-group-consumption-rekey _confluent-controlcenter-6-1-0-1-aggregate-topic-partition-store-changelog _confluent-controlcenter-6-1-0-1-aggregate-topic-partition-store-repartition _confluent-controlcenter-6-1-0-1-aggregatedTopicPartitionTableWindows-ONE_MINUTE-changelog _confluent-controlcenter-6-1-0-1-aggregatedTopicPartitionTableWindows-ONE_MINUTE-repartition _confluent-controlcenter-6-1-0-1-aggregatedTopicPartitionTableWindows-THREE_HOURS-changelog _confluent-controlcenter-6-1-0-1-aggregatedTopicPartitionTableWindows-THREE_HOURS-repartition _confluent-controlcenter-6-1-0-1-cluster-rekey _confluent-controlcenter-6-1-0-1-expected-group-consumption-rekey _confluent-controlcenter-6-1-0-1-group-aggregate-store-ONE_MINUTE-changelog 
_confluent-controlcenter-6-1-0-1-group-aggregate-store-ONE_MINUTE-repartition _confluent-controlcenter-6-1-0-1-group-aggregate-store-THREE_HOURS-changelog _confluent-controlcenter-6-1-0-1-group-aggregate-store-THREE_HOURS-repartition _confluent-controlcenter-6-1-0-1-group-stream-extension-rekey _confluent-controlcenter-6-1-0-1-metrics-trigger-measurement-rekey _confluent-controlcenter-6-1-0-1-monitoring-aggregate-rekey-store-changelog _confluent-controlcenter-6-1-0-1-monitoring-aggregate-rekey-store-repartition _confluent-controlcenter-6-1-0-1-monitoring-message-rekey-store _confluent-controlcenter-6-1-0-1-monitoring-trigger-event-rekey _confluent-license _confluent-metrics _confluent-monitoring _confluent-telemetry-metrics _confluent_balancer_api_state _schemas connect-configs-dc2 connect-offsets-dc2 connect-status-dc2 topic1 topic2.replica topic1: {"userid":{"string":"User_2"},"dc":{"string":"dc2"}} {"userid":{"string":"User_1"},"dc":{"string":"dc1"}} {"userid":{"string":"User_6"},"dc":{"string":"dc2"}} {"userid":{"string":"User_9"},"dc":{"string":"dc1"}} {"userid":{"string":"User_9"},"dc":{"string":"dc2"}} {"userid":{"string":"User_9"},"dc":{"string":"dc1"}} {"userid":{"string":"User_9"},"dc":{"string":"dc2"}} {"userid":{"string":"User_9"},"dc":{"string":"dc1"}} {"userid":{"string":"User_9"},"dc":{"string":"dc2"}} {"userid":{"string":"User_9"},"dc":{"string":"dc1"}} Processed a total of 10 messages topic2.replica: {"registertime":{"long":1488571887136},"userid":{"string":"User_2"},"regionid":{"string":"Region_4"},"gender":{"string":"FEMALE"}} {"registertime":{"long":1496554479008},"userid":{"string":"User_3"},"regionid":{"string":"Region_9"},"gender":{"string":"OTHER"}} {"registertime":{"long":1515819037639},"userid":{"string":"User_1"},"regionid":{"string":"Region_7"},"gender":{"string":"FEMALE"}} {"registertime":{"long":1498630829454},"userid":{"string":"User_9"},"regionid":{"string":"Region_5"},"gender":{"string":"FEMALE"}} {"registertime":{"long":1491954362758},"userid":{"string":"User_6"},"regionid":{"string":"Region_6"},"gender":{"string":"FEMALE"}} {"registertime":{"long":1498308706008},"userid":{"string":"User_2"},"regionid":{"string":"Region_2"},"gender":{"string":"OTHER"}} {"registertime":{"long":1509409463384},"userid":{"string":"User_5"},"regionid":{"string":"Region_8"},"gender":{"string":"OTHER"}} {"registertime":{"long":1494736574275},"userid":{"string":"User_4"},"regionid":{"string":"Region_4"},"gender":{"string":"OTHER"}} {"registertime":{"long":1513254638109},"userid":{"string":"User_3"},"regionid":{"string":"Region_5"},"gender":{"string":"FEMALE"}} {"registertime":{"long":1499607488391},"userid":{"string":"User_4"},"regionid":{"string":"Region_2"},"gender":{"string":"OTHER"}} Processed a total of 10 messages _schemas: null null null {"subject":"topic1-value","version":1,"id":1,"schema":"{\"type\":\"record\",\"name\":\"KsqlDataSourceSchema\",\"namespace\":\"io.confluent.ksql.avro_schemas\",\"fields\":[{\"name\":\"userid\",\"type\":[\"null\",\"string\"],\"default\":null},{\"name\":\"dc\",\"type\":[\"null\",\"string\"],\"default\":null}]}","deleted":false} 
{"subject":"topic2-value","version":1,"id":2,"schema":"{\"type\":\"record\",\"name\":\"KsqlDataSourceSchema\",\"namespace\":\"io.confluent.ksql.avro_schemas\",\"fields\":[{\"name\":\"registertime\",\"type\":[\"null\",\"long\"],\"default\":null},{\"name\":\"userid\",\"type\":[\"null\",\"string\"],\"default\":null},{\"name\":\"regionid\",\"type\":[\"null\",\"string\"],\"default\":null},{\"name\":\"gender\",\"type\":[\"null\",\"string\"],\"default\":null}]}","deleted":false} {"subject":"topic2.replica-value","version":1,"id":2,"schema":"{\"type\":\"record\",\"name\":\"KsqlDataSourceSchema\",\"namespace\":\"io.confluent.ksql.avro_schemas\",\"fields\":[{\"name\":\"registertime\",\"type\":[\"null\",\"long\"],\"default\":null},{\"name\":\"userid\",\"type\":[\"null\",\"string\"],\"default\":null},{\"name\":\"regionid\",\"type\":[\"null\",\"string\"],\"default\":null},{\"name\":\"gender\",\"type\":[\"null\",\"string\"],\"default\":null}]}","deleted":false} [2021-01-04 19:17:26,336] ERROR Error processing message, terminating consumer process: (kafka.tools.ConsoleConsumer$) org.apache.kafka.common.errors.TimeoutException Processed a total of 6 messages provenance info (cluster, topic, timestamp): ZagAAEfORQG-lxwq6OsV5Q,topic1,1609787854055 ZagAAEfORQG-lxwq6OsV5Q,topic1,1609787854057 ZagAAEfORQG-lxwq6OsV5Q,topic1,1609787856052 ZagAAEfORQG-lxwq6OsV5Q,topic1,1609787857052 ZagAAEfORQG-lxwq6OsV5Q,topic1,1609787857054 Processed a total of 10 messages timestamp info (group: topic-partition): replicator-dc2-to-dc1-topic1: topic1-0 1609787867007 replicator-dc2-to-dc1-topic1: topic1-0 1609787877008 Processed a total of 2 messages cluster id: 2qHo2TsdTIaTjvkCyf3qdw ``` #### Configure and run Replicator on the Connect cluster You should have at least one distributed mode Connect Worker already up and running. To learn more, review the [distributed mode documentation](/kafka-connectors/self-managed/userguide.html#distributed-mode) . You can check if the Connect Worker is up and running by checking its REST API: ```bash curl http://localhost:8083/ {"version":"8.1.0-ccs","commit":"078e7dc02a100018"} ``` If everything is fine, you will see a version number and commit hash for the version of the Connect Worker you are running. Run Replicator by sending the Connect REST API its configuration file in JSON format. Here’s an example configuration: ```none { "name":"replicator", "config":{ "connector.class":"io.confluent.connect.replicator.ReplicatorSourceConnector", "tasks.max":4, "key.converter":"io.confluent.connect.replicator.util.ByteArrayConverter", "value.converter":"io.confluent.connect.replicator.util.ByteArrayConverter", "src.kafka.bootstrap.servers":"localhost:9082", "topic.whitelist":"test-topic", "topic.rename.format":"${topic}.replica", "confluent.license":"XYZ" } } ``` You can send this to Replicator using `curl`. This assumes the above JSON is in a file called `example-replicator.json`: ```none curl -X POST -d @example-replicator.json http://localhost:8083/connectors --header "content-Type:application/json" ``` This example demonstrates use of some important configuration parameters. For an explanation of all configuration parameters, see [Replicator Configuration Reference for Confluent Platform](configuration_options.md#replicator-config-options). * `key.converter` and `value.converter` - Classes used to convert Kafka records to Connect’s internal format. The Connect Worker configuration specifies global converters and those will be used if you don’t specify anything in the Replicator configuration. 
For replication, however, no conversion is necessary. You just want to read bytes out of the origin cluster and write them to the destination with no changes. Therefore, you can override the global converters with the `ByteArrayConverter`, which leaves the records as-is. * `src.kafka.bootstrap.servers` - A list of brokers from the **origin** cluster * `topic.whitelist` - An explicit list of the topics that you want replicated. The quick start replicates a topic named `test-topic`. * `topic.rename.format` - A substitution string that is used to rename topics in the destination cluster. The snippet above uses `${topic}.replica`, where `${topic}` will be substituted with the topic name from the origin cluster. That means that the `test-topic` being replicated from the origin cluster will be renamed to `test-topic.replica` in the destination cluster. * `confluent.license` - You cannot use Confluent Replicator without a license on Confluent Platform versions 5.5.0 and later, as there is no trial period for Replicator on these newer Confluent Platform versions. Contact Confluent Support for more information. ## Suggested Reading * [Use Schema Registry to Migrate Schemas in Confluent Platform](../../schema-registry/installation/migrate.md#schemaregistry-migrate) * [Schemas, subjects, and topics](../../schema-registry/fundamentals/index.md#sr-subjects-topics-primer) * [Tutorial: Replicate Data Across Kafka Clusters in Confluent Platform](replicator-quickstart.md#replicator-quickstart) * [Configure Replicator for Cross-Cluster Failover in Confluent Platform](replicator-failover.md#replicator-failover) * These sections in [Replicator Configuration Properties](https://docs.confluent.io/kafka-connect-replicator/current/configuration_options.html): - [Source Topics](https://docs.confluent.io/kafka-connect-replicator/current/configuration_options.html#destination-data-conversion) - [Destination Topics](https://docs.confluent.io/kafka-connect-replicator/current/configuration_options.html#destination-topics) - [Schema Translation](https://docs.confluent.io/kafka-connect-replicator/current/configuration_options.html#schema-translation) ## Confluent Community software / Kafka New features in Confluent Platform 8.1 include the following: * [KIP-932 Queues for Kafka:](https://cwiki.apache.org/confluence/display/KAFKA/KIP-932%3A+Queues+for+Kafka) Queues for Kafka is now available as a preview. You can test this feature now, and clusters running the preview can be upgraded to the production-ready feature when Queues becomes generally available. Note that some features, such as the partition assignor, are still in development. For configuration and testing details, see the [Apache Queues for Kafka Preview](https://cwiki.apache.org/confluence/display/KAFKA/Queues+for+Kafka+%28KIP-932%29+-+Preview+Release+Notes) documentation. - The preview includes [KIP-1103](https://cwiki.apache.org/confluence/display/KAFKA/KIP-1103%3A+Additional+metrics+for+cooperative+consumption), which adds metrics for Share Groups. - To enable share groups, set `share.version=1` by using the `kafka-features.sh` tool (replace `localhost:9092` with your cluster's bootstrap address): ```shell bin/kafka-features.sh --bootstrap-server localhost:9092 upgrade --feature share.version=1 ``` - To provide feedback or to discuss the Queues for Kafka preview, contact Confluent. * [KIP-853 KRaft Controller Membership Changes:](https://cwiki.apache.org/confluence/display/KAFKA/KIP-853%3A+KRaft+Controller+Membership+Changes) You can now upgrade KRaft voters from a static to a dynamic configuration. 
* [KIP-1166 Improve high-watermark replication:](https://cwiki.apache.org/confluence/display/KAFKA/KIP-1166:+Improve+high-watermark+replication) This KIP fixes issues where pending fetch requests could fail to complete, which previously impacted high-watermark progression. * [KIP-890 Transactions Server-Side Defense:](https://cwiki.apache.org/confluence/display/KAFKA/KIP-890%3A+Transactions+Server-Side+Defense) To prevent an infinite “out-of-order sequence” error, idempotent producers now reject non-zero sequences when no producer ID state exists on the partition for the transaction. * [KIP-848 Consumer Group Protocol:](https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol) This KIP includes several enhancements, such as a new rack-aware assignor ([KIP-1101](https://cwiki.apache.org/confluence/display/KAFKA/KIP-1101%3A+Trigger+rebalance+on+rack+topology+changes)). The new assignor makes rack-aware partition assignment significantly more memory-efficient, which supports hundreds of members in a single consumer group. * [KIP-1131 Improved controller-side metrics for monitoring broker states:](https://cwiki.apache.org/confluence/display/KAFKA/KIP-1131%3A+Improved+controller-side+monitoring+of+broker+states) This KIP adds new controller-side metrics to improve monitoring of broker health and status. * [KIP-1109 Unifying Kafka consumer topic metrics:](https://cwiki.apache.org/confluence/display/KAFKA/KIP-1109%3A+Unifying+Kafka+Consumer+Topic+Metrics) Kafka consumer metrics now preserve periods (`.`) in topic names instead of replacing them with underscores (`_`). This change aligns their behavior with producer metrics. The old metrics that use underscores in topic names will be removed in a future release. * [KIP-1118 Add deadlock protection on the producer network thread:](https://cwiki.apache.org/confluence/display/KAFKA/KIP-1118%3A+Add+Deadlock+Protection+on+Producer+Network+Thread) Calling `KafkaProducer.flush()` from within the `KafkaProducer.send()` callback now raises an exception to prevent a potential deadlock in the producer. * [KIP-1143 Deprecate Optional and return String from public Endpoint:](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=345377327) The `Endpoint.listenerName()` method that returns `Optional` is now deprecated. You should update your code to use the new method that returns a `String`. * [KIP-1152 Add transactional ID pattern filter to ListTransactions API:](https://cwiki.apache.org/confluence/display/KAFKA/KIP-1152%3A+Add+transactional+ID+pattern+filter+to+ListTransactions+API) You can now filter transactions by a transactional ID pattern when using the `kafka-transactions.sh` tool or the Admin API. This feature avoids the need to retrieve all transactions and filter them on the client side. For a full list of KIPs, features, and bug fixes, see the [Apache Kafka 4.1 release notes](https://archive.apache.org/dist/kafka/4.1.0/RELEASE_NOTES.html). ### GET /subjects Get a list of registered subjects. (For API usage examples, see [List all subjects](using.md#kafka-key-listing-all-subjects).) * **Parameters:** * **subjectPrefix** (*string*) – Add `?subjectPrefix=` (as an empty string) at the end of this request to list subjects in the default context. If this flag is not included, `GET /subjects` returns all subjects across all contexts. 
To learn more about contexts, see the [exporters](#schemaregistry-api-exporters) API reference and the quick start and concepts guides for [Schema Linking on Confluent Platform](../schema-linking-cp.md#schema-linking-cp-overview) and [Schema Linking on Confluent Cloud](/cloud/current/sr/schema-linking.html). * **deleted** (*boolean*) – Add `?deleted=true` at the end of this request to list both current and soft-deleted subjects. The default is `false`. If this flag is not included, only current subjects are listed (not those that have been soft-deleted). Hard and soft delete are explained below in the description of the `delete` API. * **Response JSON Array of Objects:** * **name** (*string*) – Subject * **Status Codes:** * [500 Internal Server Error](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.5.1) – * Error code 50001 – Error in the backend datastore **Example request**: ```http GET /subjects HTTP/1.1 Host: schemaregistry.example.com Accept: application/vnd.schemaregistry.v1+json, application/vnd.schemaregistry+json, application/json ``` **Example response**: ```http HTTP/1.1 200 OK Content-Type: application/vnd.schemaregistry.v1+json ["subject1", "subject2"] ``` #### NOTE JSON and PROTOBUF_NOSR are not supported. For more details, see [What’s supported](../../index.md#sr-supported-features) in the Schema Registry overview. Use the serializer and deserializer for your schema format. Specify the serializer in the code for the Kafka producer to send messages, and specify the deserializer in the code for the Kafka consumer to read messages. The new Protobuf and JSON Schema serializers and deserializers support many of the same configuration properties as the Avro equivalents, including [subject name strategies](#sr-schemas-subject-name-strategy) for the key and value. In the case of the `RecordNameStrategy` (and `TopicRecordNameStrategy`), the subject name will be: - For Avro, the record fullname (namespace + record name). - For Protobuf, the message name. - For JSON Schema, the title. When using `RecordNameStrategy` with Protobuf and JSON Schema, there is additional configuration that is required. This, along with examples and command line testing utilities, is covered in the deep dive sections: - [Avro](serdes-avro.md#serdes-and-formatter-avro) - [Protobuf](serdes-protobuf.md#serdes-and-formatter-protobuf) - [JSON Schema](serdes-json.md#serdes-and-formatter-json) In addition to the detailed sections above, produce and consume examples are available in [confluentinc/confluent-kafka-go/examples](https://github.com/confluentinc/confluent-kafka-go/tree/master/examples) for each of the different Schema Registry SerDes. The serializers and [Kafka Connect converters](/platform/current/connect/concepts.html#converters) for all supported schema formats automatically register schemas by default. The Protobuf serializer recursively registers all referenced schemas separately. With Protobuf and JSON Schema support, the Schema Registry adds the ability to add new schema formats using schema plugins (the existing Avro support has been wrapped with an Avro schema plugin). 
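As a rough illustration under assumed names (a topic `orders`, an Avro record `example.Order`, and a local Schema Registry at `http://localhost:8081`), the subject name strategy can be set as a serializer property, `value.subject.name.strategy` for values or `key.subject.name.strategy` for keys; for example, with the Avro console producer:

```bash
./bin/kafka-avro-console-producer --bootstrap-server localhost:9092 \
  --topic orders \
  --property schema.registry.url=http://localhost:8081 \
  --property value.subject.name.strategy=io.confluent.kafka.serializers.subject.RecordNameStrategy \
  --property value.schema='{"type":"record","name":"Order","namespace":"example","fields":[{"name":"id","type":"string"}]}'
```

With `RecordNameStrategy`, the value schema in this sketch would be registered under the subject `example.Order` (the record's full name) rather than the default `orders-value`.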
#### Avro consumer For example, to consume messages from the beginning (`--from-beginning`) from the `stocks` topic on a Confluent Cloud cluster: ```bash ./bin/kafka-avro-console-consumer --bootstrap-server $BOOTSTRAP_SERVER \ --property basic.auth.credentials.source="USER_INFO" \ --property print.key=true --property print.schema.ids=true \ --property key.deserializer=org.apache.kafka.common.serialization.StringDeserializer \ --property schema.registry.url=$SCHEMA_REGISTRY_URL \ --consumer.config /Users/vicky/creds.config \ --topic stocks --from-beginning \ --property schema.registry.basic.auth.user.info=$SR_APIKEY:$SR_APISECRET ``` This results in output similar to the following, with the schema ID showing at the end of each message line: ```bash ... ZVZZT {"side":"SELL","quantity":1546,"symbol":"ZVZZT","price":629,"account":"ABC123","userid":"User_4"} 100008 ZJZZT {"side":"SELL","quantity":765,"symbol":"ZJZZT","price":140,"account":"ABC123","userid":"User_2"} 100008 ZJZZT {"side":"BUY","quantity":2977,"symbol":"ZJZZT","price":264,"account":"ABC123","userid":"User_9"} 100008 ... ``` To drill down on a particular subset of messages, determine the offset and partition you want to focus on. You can use the Confluent Cloud Console to navigate to a particular offset and partition. ![image](schema-registry/images/serdes-message-per-offset-partition.png) For example, to show messages from the `stocks` topic, starting at offset `15846316` on partition `0`, replace `--from-beginning` in the command with the `--offset` and `--partition` values you want to explore. To limit the number of messages, you can add a value for `--max-messages`, such as `5` in this example: ```bash ./bin/kafka-avro-console-consumer --bootstrap-server $BOOTSTRAP_SERVER \ --property basic.auth.credentials.source="USER_INFO" \ --property print.key=true --property print.schema.ids=true \ --offset 15846316 --partition 0 --max-messages 5 \ --property key.deserializer=org.apache.kafka.common.serialization.StringDeserializer \ --property schema.registry.url=$SCHEMA_REGISTRY_URL \ --consumer.config /Users/vicky/creds.config --topic stocks \ --property schema.registry.basic.auth.user.info=$SR_APIKEY:$SR_APISECRET ``` The output for this example is: ```bash .... ZWZZT {"side":"SELL","quantity":1905,"symbol":"ZWZZT","price":33,"account":"LMN456","userid":"User_9"} 100008 ZVV {"side":"BUY","quantity":4288,"symbol":"ZVV","price":795,"account":"XYZ789","userid":"User_9"} 100008 ZVV {"side":"BUY","quantity":235,"symbol":"ZVV","price":918,"account":"ABC123","userid":"User_7"} 100008 ZWZZT {"side":"BUY","quantity":3041,"symbol":"ZWZZT","price":759,"account":"LMN456","userid":"User_3"} 100008 ZVV {"side":"BUY","quantity":3080,"symbol":"ZVV","price":79,"account":"XYZ789","userid":"User_7"} 100008 Processed a total of 5 messages ``` ## Quick starts - The Schema Registry tutorials provide full walkthroughs on how to enable client applications to read and write Avro data, check schema version compatibility, and use the UIs to manage schemas. - **Schema Registry Tutorial on Confluent Cloud**: Sign up for [Confluent Cloud](https://www.confluent.io/confluent-cloud/) and use the [Confluent Cloud Schema Registry Tutorial](/cloud/current/sr/schema_registry_ccloud_tutorial.html) to get started. 
- **Schema Registry Tutorial on Confluent Platform**: Download [Confluent Platform](https://www.confluent.io/download/#confluent-platform) and use the [Confluent Platform Schema Registry Tutorial](/platform/current/schema-registry/schema_registry_tutorial.html) to get started. - For a quick hands on introduction, jump to the [Schema Registry module of the free Apache Kafka 101](https://developer.confluent.io/learn-kafka/apache-kafka/schema-registry/) course to learn why you would need a Schema Registry, what it is, and how to get started. Also see the free [Schema Registry 101](https://developer.confluent.io/learn-kafka/schema-registry/) course to learn about the schema formats and how to build, register, manage and evolve schemas. - On Confluent Cloud, try out the interactive tutorials embedded in the Cloud Console. [Take this link to sign up or sign in to Confluent Cloud](https://confluent.cloud/tutorials/schema-registry-getting-started), and try out the guided workflows directly in Confluent Cloud. - To learn about [schema formats](fundamentals/serdes-develop/index.md#serializer-and-formatter), create schemas, and use producers and consumers to send messages to topics, see [Test drive Avro schema](fundamentals/serdes-develop/serdes-avro.md#sr-test-drive-avro), [Test drive Protobuf schema](fundamentals/serdes-develop/serdes-protobuf.md#sr-test-drive-protobuf), and [Test drive JSON Schema](fundamentals/serdes-develop/serdes-json.md#sr-test-drive-json-schema). #### Before You Begin If you are new to Confluent Platform, consider first working through these quick starts and tutorials to get a baseline understanding of the platform (including the role of producers, consumers, and brokers), Confluent Cloud, and Schema Registry. Experience with these workflows will give you better context for schema migration. - [Quick Start for Confluent Platform](../../get-started/platform-quickstart.md#quickstart) - [Quick Start for Apache Kafka using Confluent Cloud](/cloud/current/get-started/index.html) - [Tutorial: Use Schema Registry on Confluent Platform to Implement Schemas for a Client Application](../schema_registry_onprem_tutorial.md#schema-registry-onprem-tutorial) Before you begin schema migration, verify that you have: - [Access to Confluent Cloud](https://www.confluent.io/confluent-cloud/) to serve as the destination Schema Registry - A local install of Confluent Platform; for example, from a [Quick Start for Confluent Platform](../../get-started/platform-quickstart.md#quickstart) download, or other cluster to serve as the origin Schema Registry. Schema migration requires that you configure and run Replicator. If you need more information than is included in the examples here, refer to the [replicator tutorial](../../multi-dc-deployments/replicator/replicator-quickstart.md#replicator-quickstart). ##### Recommended Deployment ![image](images/multi-dc-setup-kafka.png) The image above shows two datacenters - DC A, and DC B. Either could be on-premises, in [Confluent Cloud](/cloud/current/index.html), or part of a [bridge to cloud](installation/migrate.md#schemaregistry-migrate) solution. Each of the two datacenters has its own Apache Kafka® cluster, ZooKeeper cluster, and Schema Registry. The Schema Registry nodes in both datacenters link to the primary Kafka cluster in DC A, and the secondary datacenter (DC B) forwards Schema Registry writes to the primary (DC A). Note that Schema Registry nodes and hostnames must be addressable and routable across the two sites to support this configuration. 
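For reference, a minimal sketch of what a Schema Registry properties file in the secondary datacenter (DC B) might look like under this arrangement; hostnames are placeholders and security settings are omitted:

```bash
# Hypothetical schema-registry.properties for a DC B instance
listeners=http://0.0.0.0:8081
# Point the Schema Registry store at the primary (DC A) Kafka cluster
kafkastore.bootstrap.servers=PLAINTEXT://dc-a-broker1:9092,PLAINTEXT://dc-a-broker2:9092
kafkastore.topic=_schemas
# Secondary-datacenter instances are not eligible to become leader
leader.eligibility=false
```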
Schema Registry instances in DC B have `leader.eligibility` set to false, meaning that none can be elected leader during steady state operation with both datacenters online. To protect against complete loss of DC A, Kafka cluster A (the source) is replicated to Kafka cluster B (the target). This is achieved by running the [Replicator](../multi-dc-deployments/replicator/index.md#replicator-detail) local to the target cluster (DC B). In this active-passive setup, Replicator runs in one direction, copying Kafka data and configurations from the active DC A to the passive DC B. The Schema Registry instances in both data centers point to the internal `_schemas` topic in DC A. For the purposes of disaster recovery, you must replicate the [internal schemas topic](fundamentals/index.md#schemaregistry-design) itself. If DC A goes down, the system will failover to DC B. Therefore, DC B needs a copy of the `_schemas` topic for this purpose. Producers write data to just the active cluster. Depending on the overall design, consumers can read data from the active cluster only, leaving the passive cluster for disaster recovery, or from both clusters to optimize reads on a geo-local cache. In the event of a partial or complete disaster in one datacenter, applications can failover to the secondary datacenter. ### Kafka Connect This section describes how to enable security for Kafka Connect. Securing Kafka Connect requires that you configure security for: 1. Kafka Connect workers: part of the Kafka Connect API, a worker is really just an advanced client, underneath the covers 2. Kafka Connect connectors: connectors may have embedded producers or consumers, so you must override the default configurations for Connect producers used with source connectors and Connect consumers used with sink connectors 3. Kafka Connect REST: Kafka Connect exposes a REST API that can be configured to use TLS/SSL using [additional properties](../../../protect-data/encrypt-tls.md#encryption-ssl-rest) Configure security for Kafka Connect as described in the section below. Additionally, if you are using Confluent Control Center streams monitoring for Kafka Connect, configure security for: * [Confluent Metrics Reporter](#sasl-gssapi-metrics-reporter) Configure all the following properties in `connect-distributed.properties`. 1. Configure the Connect workers to use SASL/GSSAPI. ```bash sasl.mechanism=GSSAPI sasl.kerberos.service.name=kafka # Configure SASL_SSL if TLS/SSL encryption is enabled, otherwise configure SASL_PLAINTEXT security.protocol=SASL_SSL ``` 2. Configure the JAAS configuration property with a unique principal, i.e., usually the same name as the user running the worker, and keytab, i.e., secret key, for each worker. ```bash sasl.jaas.config=com.sun.security.auth.module.Krb5LoginModule required \ useKeyTab=true \ storeKey=true \ keyTab="/etc/security/keytabs/kafka_client.keytab" \ principal="connect@EXAMPLE.COM"; ``` 3. For the connectors to leverage security, you also have to override the default producer/consumer configuration that the worker uses. Depending on whether the connector is a source or sink connector: * Source connector: configure the same properties adding the `producer` prefix. 
```bash producer.sasl.mechanism=GSSAPI producer.sasl.kerberos.service.name=kafka # Configure SASL_SSL if TLS/SSL encryption is enabled, otherwise configure SASL_PLAINTEXT producer.security.protocol=SASL_SSL producer.sasl.jaas.config=com.sun.security.auth.module.Krb5LoginModule required \ useKeyTab=true \ storeKey=true \ keyTab="/etc/security/keytabs/kafka_client.keytab" \ principal="connect@EXAMPLE.COM"; ``` * Sink connector: configure the same properties adding the `consumer` prefix. ```bash consumer.sasl.mechanism=GSSAPI consumer.sasl.kerberos.service.name=kafka # Configure SASL_SSL if TLS/SSL encryption is enabled, otherwise configure SASL_PLAINTEXT consumer.security.protocol=SASL_SSL consumer.sasl.jaas.config=com.sun.security.auth.module.Krb5LoginModule required \ useKeyTab=true \ storeKey=true \ keyTab="/etc/security/keytabs/kafka_client.keytab" \ principal="connect@EXAMPLE.COM"; ``` ### Configure .NET clients Configure your .NET client with UAMI-specific properties:
```csharp
using Confluent.Kafka;
using System;
using System.Threading.Tasks;

public class UamiKafkaClient
{
    // Azure IMDS API version - use 2025-04-07 or later
    private const string azureIMDSApiVersion = "2025-04-07";
    private const string bootstrapEndpoint = "";
    private const string uamiClientId = "";
    private const string azureIMDSQueryParams = $"api-version={azureIMDSApiVersion}&resource={bootstrapEndpoint}&client_id={uamiClientId}";
    private const string kafkaLogicalCluster = "";
    private const string identityPoolId = "";

    public static async Task Main(string[] args)
    {
        var bootstrapServers = args[0];
        var topicName = "test-topic";
        var groupId = Guid.NewGuid().ToString();

        var commonConfig = new ClientConfig
        {
            BootstrapServers = bootstrapServers,
            SecurityProtocol = SecurityProtocol.SaslSsl,
            SaslMechanism = SaslMechanism.OAuthBearer,
            SaslOauthbearerMethod = SaslOauthbearerMethod.Oidc,
            SaslOauthbearerMetadataAuthenticationType = SaslOauthbearerMetadataAuthenticationType.AzureIMDS,
            SaslOauthbearerConfig = $"query={azureIMDSQueryParams}",
            SaslOauthbearerExtensions = $"logicalCluster={kafkaLogicalCluster},identityPoolId={identityPoolId}"
        };

        // Use commonConfig with ProducerBuilder or ConsumerBuilder
        using (var producer = new ProducerBuilder<Null, string>(commonConfig).Build())
        {
            // Producer code here
        }
    }
}
```
## Configure Kafka Connect This section describes how to enable security for Kafka Connect. Securing Kafka Connect requires that you configure security for: 1. Kafka Connect workers: part of the Kafka Connect API, a worker is really just an advanced client, underneath the covers 2. Kafka Connect connectors: connectors may have embedded producers or consumers, so you must override the default configurations for Connect producers used with source connectors and Connect consumers used with sink connectors 3. Kafka Connect REST: Kafka Connect exposes a REST API that can be configured to use TLS/SSL using [additional properties](../../../protect-data/encrypt-tls.md#encryption-ssl-rest) Configure security for Kafka Connect as described in the section below. Additionally, if you are using Confluent Control Center streams monitoring for Kafka Connect, configure security for: * [Confluent Metrics Reporter](#sasl-plain-metrics-reporter) Configure all the following properties in `connect-distributed.properties`. 1. Configure the Connect workers to use SASL/PLAIN. ```bash sasl.mechanism=PLAIN ``` ### Principal A principal is an entity that can be authenticated by the authorizer. 
Clients of a Confluent Server broker identify themselves as a particular principal using various security protocols. The way a principal is identified depends upon which security protocol it uses to connect to the Confluent Server broker (for example: [mTLS](../../../kafka/configure-mds/mutual-tls-auth-rbac.md#mutual-tls-auth-rbac), [SASL/GSSAPI](../../authentication/sasl/gssapi/overview.md#kafka-sasl-auth-gssapi), or [SASL/PLAIN](../../authentication/sasl/plain/overview.md#kafka-sasl-auth-plain)). Authentication depends on the security protocol in place (such as SASL or TLS) to recognize a principal within a Confluent Server broker. The following examples show the principal name format based on the security protocol being used: - When a client connects to a Confluent Server broker using the TLS security protocol, the principal name will be in the form of the TLS certificate subject name: `CN=quickstart.confluent.io,OU=TEST,O=Sales,L=PaloAlto,ST=Ca,C=US`. Note that there are no spaces after the comma between subject parts. - When a client connects to a Confluent Server broker using the SASL security protocol with GSSAPI (Kerberos) mechanism, the principal will be in the Kerberos principal format: `kafka-client@hostname.com`. For more detail, refer to [Kerberos Principal Names](https://docs.oracle.com/cd/E19253-01/816-4557/refer-31/index.html). - When a client connects to a Confluent Server broker using the SASL security protocol with a PLAIN or SCRAM mechanism, the principal is a simple text string, such as `alice`, `admin`, or `billing_etl_job_03`. In the following ACL, the plain text principals (`User:alice`, `User:fred`) are identified as Kafka users who are allowed to run specific operations (read and write) from either of the specified hosts (host-1, host-2) on a specific resource (topic): ```shell kafka-acls --bootstrap-server localhost:9092 \ --command-config adminclient-configs.conf \ --add \ --allow-principal User:alice \ --allow-principal User:fred \ --allow-host host-1 \ --allow-host host-2 \ --operation read \ --operation write \ --topic finance-topic ``` To follow best practices, create one principal per application and give each principal only the ACLs required and no more. For example, if Alice is writing three programs that access different topics to automate a billing workflow, she could create three principals: `billing_etl_job_01`, `billing_etl_job_02`, and `billing_etl_job_03`. She would then grant each principal permissions on only the required topics and run each program with its specific principal. Alternatively, she could take a middle-ground approach and create a single `billing_etl_jobs` principal with access to all topics that the billing programs require and run all three with that principal. Alice should not run these programs as her own principal because she would presumably have broader permissions than the jobs actually need. Running with one principal per application also helps significantly with debugging and auditing because it’s clearer which application is performing each operation. ## Adding security to brokers and clients running TLS or SASL authentication You can secure a running Confluent Platform cluster using one or more of the supported protocols. This is done in phases: 1. Incrementally restart the cluster nodes to open additional secured port(s). 2. Restart Kafka clients using the secured rather than `PLAINTEXT` port (assuming you are securing the client-broker connection). 3. 
Incrementally restart the cluster again to enable broker-to-broker security (if this is required). 4. A final incremental restart to close the `PLAINTEXT` port. The specific steps for configuring security protocols are described in the respective sections for [TLS](authentication/mutual-tls/overview.md#kafka-ssl-authentication) and [SASL](authentication/overview.md#kafka-sasl-auth). Follow these steps to enable security for your desired protocol(s). The security implementation lets you configure different protocols for both broker-client and broker-broker communication. These must be enabled in separate restarts. A `PLAINTEXT` port must be left open throughout so brokers and/or clients can continue to communicate. When performing an incremental restart, take into consideration the recommendations for doing [rolling restarts](../kafka/post-deployment.md#rolling-restart) to avoid downtime for end users. For example, if you want to encrypt both broker-client and broker-broker communication with TLS: 1. In the first incremental restart, open a TLS port on each node: ```bash listeners=PLAINTEXT://broker1:9091,SSL://broker1:9092 ``` #### NOTE In Confluent Platform clusters, you can update some Confluent Server broker configurations without restarting the broker by adding or removing listeners [dynamically](../kafka/dynamic-config.md#kafka-dynamic-configurations). When adding a new listener, provide the security configuration of the listener using the listener prefix `listener.name.{listenerName}`. If the new listener uses SASL, then provide the JAAS configuration property `sasl.jaas.config` with the listener and mechanism prefix. For more details, refer to [JAAS](authentication/sasl/gssapi/overview.md#jaas-config). 2. Then restart the Kafka clients, changing their configuration to point at the newly-opened, secured port: ```bash bootstrap.servers=[broker1:9092,...] security.protocol=SSL ...etc ``` For more details, refer to [Protect Data in Motion with TLS Encryption in Confluent Platform](protect-data/encrypt-tls.md#kafka-ssl-encryption). 3. In the second incremental server restart, instruct Confluent Platform to use TLS as the broker-broker protocol (which will use the same TLS port): ```bash listeners=PLAINTEXT://broker1:9091,SSL://broker1:9092 security.inter.broker.protocol=SSL ``` 4. In the final restart, secure the cluster by closing the `PLAINTEXT` port: ```bash listeners=SSL://broker1:9092 security.inter.broker.protocol=SSL ``` #### NOTE Use `GenericAvroSerde` to enable both forward and backward schema compatibility if your application requires forward schema checks on the producer and backward compatibility for Kafka Streams. 
Usage example for Confluent `GenericAvroSerde`:
```java
// Generic Avro serde example
import org.apache.avro.generic.GenericRecord;
import io.confluent.kafka.streams.serdes.avro.GenericAvroSerde;

// When configuring the default serdes of StreamsConfig
final Properties streamsConfiguration = new Properties();
streamsConfiguration.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, GenericAvroSerde.class);
streamsConfiguration.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, GenericAvroSerde.class);
streamsConfiguration.put("schema.registry.url", "http://my-schema-registry:8081");

// When you want to override serdes explicitly/selectively
final Map<String, String> serdeConfig = Collections.singletonMap("schema.registry.url", "http://my-schema-registry:8081");
// The generic serdes operate on Avro `GenericRecord` instances
final Serde<GenericRecord> keyGenericAvroSerde = new GenericAvroSerde();
keyGenericAvroSerde.configure(serdeConfig, true); // `true` for record keys
final Serde<GenericRecord> valueGenericAvroSerde = new GenericAvroSerde();
valueGenericAvroSerde.configure(serdeConfig, false); // `false` for record values

StreamsBuilder builder = new StreamsBuilder();
KStream<GenericRecord, GenericRecord> textLines = builder.stream("my-avro-topic", Consumed.with(keyGenericAvroSerde, valueGenericAvroSerde));
```
Usage example for Confluent `SpecificAvroSerde`:
```java
// Specific Avro serde example
import io.confluent.kafka.streams.serdes.avro.SpecificAvroSerde;

// When configuring the default serdes of StreamsConfig
final Properties streamsConfiguration = new Properties();
streamsConfiguration.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, SpecificAvroSerde.class);
streamsConfiguration.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, SpecificAvroSerde.class);
streamsConfiguration.put("schema.registry.url", "http://my-schema-registry:8081");

// When you want to override serdes explicitly/selectively
final Map<String, String> serdeConfig = Collections.singletonMap("schema.registry.url", "http://my-schema-registry:8081");
// `Foo` and `Bar` are Java classes generated from Avro schemas
final Serde<Foo> keySpecificAvroSerde = new SpecificAvroSerde<>();
keySpecificAvroSerde.configure(serdeConfig, true); // `true` for record keys
final Serde<Bar> valueSpecificAvroSerde = new SpecificAvroSerde<>();
valueSpecificAvroSerde.configure(serdeConfig, false); // `false` for record values

StreamsBuilder builder = new StreamsBuilder();
KStream<Foo, Bar> textLines = builder.stream("my-avro-topic", Consumed.with(keySpecificAvroSerde, valueSpecificAvroSerde));
```
When you create source streams, you specify input serdes by using the Streams DSL. When you construct the processor topology by using the lower-level [Processor API](processor-api.md#streams-developer-guide-processor-api), you can specify the serde class, like the Confluent `GenericAvroSerde` and `SpecificAvroSerde` classes.
```java
Topology topology = new Topology();
topology.addSource("Source", keyGenericAvroSerde.deserializer(), valueGenericAvroSerde.deserializer(), inputTopic);
```
## Using Kafka Streams within your application code You can call Kafka Streams from anywhere in your application code, but usually these calls are made within the `main()` method of your application, or some variant thereof. The basic elements of defining a processing topology within your application are described below. First, you must create an instance of `KafkaStreams`. 
* The first argument of the `KafkaStreams` constructor takes a topology (either `StreamsBuilder#build()` for the [DSL](dsl-api.md#streams-developer-guide-dsl) or `Topology` for the [Processor API](processor-api.md#streams-developer-guide-processor-api)) that is used to define a topology. * The second argument is an instance of `java.util.Properties`, which defines the configuration for this specific topology. Code example:
```java
import java.util.Properties;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;

// Use the builders to define the actual processing topology, e.g. to specify
// from which input topics to read, which stream operations (filter, map, etc.)
// should be called, and so on. We will cover this in detail in the subsequent
// sections of this Developer Guide.
StreamsBuilder builder = ...;  // when using the DSL
Topology topology = builder.build();
//
// OR
//
Topology topology = ...; // when using the Processor API

// Use the configuration properties to tell your application where the Kafka cluster is,
// which Serializers/Deserializers to use by default, to specify security settings,
// and so on.
Properties props = ...;

KafkaStreams streams = new KafkaStreams(topology, props);
```
At this point, internal structures are initialized, but the processing is not started yet. You have to explicitly start the Kafka Streams thread by calling the `KafkaStreams#start()` method:
```java
// Start the Kafka Streams threads
streams.start();
```
If there are other instances of this stream processing application running elsewhere (e.g., on another machine), Kafka Streams transparently re-assigns tasks from the existing instances to the new instance that you just started. For more information, see [Stream partitions and tasks](../architecture.md#streams-architecture-tasks) and [Threading model](../architecture.md#streams-architecture-threads). To catch any unexpected exceptions, you can set a `StreamsUncaughtExceptionHandler` before you start the application. This handler is called whenever a stream thread is terminated by an unexpected exception:
```java
streams.setUncaughtExceptionHandler((exception) -> StreamsUncaughtExceptionHandler.StreamThreadExceptionResponse.REPLACE_THREAD);
```
The `StreamsUncaughtExceptionHandler` interface enables responding to exceptions not handled by Kafka Streams. It has one method, `handle`, that returns an enum of type `StreamThreadExceptionResponse`. You have the opportunity to define how Kafka Streams responds to the exception, with three possible values: `REPLACE_THREAD`, `SHUTDOWN_CLIENT`, or `SHUTDOWN_APPLICATION`. #### Option 3: Quarantine corrupted records (dead letter queue) You can provide your own `DeserializationExceptionHandler` implementation. For example, you can choose to forward corrupt records into a quarantine topic (think: a “dead letter queue”) for further processing. To do this, use the [Producer API](../clients/overview.md#kafka-clients) to write a corrupted record directly to the quarantine topic. The drawback of this approach is that “manual” writes are side effects that are invisible to the Kafka Streams runtime library, so they do not benefit from the end-to-end processing guarantees of the Streams API. 
Code example:
```java
public class SendToDeadLetterQueueExceptionHandler implements DeserializationExceptionHandler {
    KafkaProducer<byte[], byte[]> dlqProducer;
    String dlqTopic;

    @Override
    public DeserializationHandlerResponse handle(final ProcessorContext context,
                                                 final ConsumerRecord<byte[], byte[]> record,
                                                 final Exception exception) {
        log.warn("Exception caught during Deserialization, sending to the dead queue topic; " +
                "taskId: {}, topic: {}, partition: {}, offset: {}",
                context.taskId(), record.topic(), record.partition(), record.offset(),
                exception);
        dlqProducer.send(new ProducerRecord<>(dlqTopic, null, record.timestamp(), record.key(), record.value()));
        return DeserializationHandlerResponse.CONTINUE;
    }

    @Override
    public void configure(final Map<String, ?> configs) {
        dlqProducer = .. // get a producer from the configs map
        dlqTopic = .. // get the topic name from the configs map
    }
}

Properties streamsSettings = new Properties();
streamsSettings.put(
    StreamsConfig.DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG,
    SendToDeadLetterQueueExceptionHandler.class.getName()
);
```
## Confluent provided tools The following CLI tools are officially provided or bundled with your Confluent Platform installation. They offer comprehensive functionality for managing your Kafka cluster and its components, providing seamless integration and dedicated support. - [Bundled CLI Tools](cli-reference.md#cp-all-cli) - These are the core CLI tools included directly with your Confluent Platform installation, located in the `CONFLUENT_HOME/bin` directory. They provide fundamental features for managing your Kafka cluster and its components, including Kafka tools and Confluent-specific utilities. - [Confluent CLI](diagnostics-tool.md#diagnostics-cp) - The Confluent CLI is a unified CLI designed for managing Confluent Platform deployments. It offers a comprehensive set of commands for interacting with Kafka topics, Kafka Connect, ksqlDB, Schema Registry, Role-Based Access Control (RBAC), and more. - [Check Clusters for KRaft Migration](kraft-migration-tool.md#kraft-migration-tool) - The `kraft-migration-tool` is a utility for evaluating your clusters before migration from legacy ZooKeeper-based Kafka clusters to the KRaft mode. This is an important step in the migration process. For more about the migration process, see [Migrate from ZooKeeper to KRaft on Confluent Platform](../installation/migrate-zk-kraft.md#migrate-zk-kraft). - [Confluent Diagnostics Tool](diagnostics-tool.md#diagnostics-cp) - The Confluent Platform Diagnostics Bundle Tool is a dedicated utility for collecting diagnostic information about your Confluent Platform installation. This tool gathers logs, configuration files, process information, and metrics from Kafka brokers and Kafka Connect, consolidating them into a `.tar.gz` file for analysis. This is particularly useful when engaging with [Confluent Support](https://support.confluent.io). - [kcat (formerly kafkacat)](kafkacat-usage.md#kafkacat-usage) - kcat is a command-line utility for testing and debugging Apache Kafka® deployments. You can use kcat to produce and consume messages, and to inspect topic and partition details. ## Set Up Confluent CLI and variables 1. [Install Confluent CLI](https://docs.confluent.io/confluent-cli/current/install.html) locally, v3.28.0 or later (if you already have it installed, update the CLI as described in [Upgrade](https://docs.confluent.io/confluent-cli/current/install.html#upgrade)). Verify the installation was successful. ```none confluent version ``` 2. 
Using the CLI, log in to Confluent Cloud with the command `confluent login`, and use your Confluent Cloud username and password. The `--save` argument saves your Confluent Cloud user login credentials for future use. ```shell confluent login --save ``` 3. Use the demo Confluent Cloud environment. ```shell CC_ENV=$(confluent environment list -o json \ | jq -r '.[] | select(.name | contains("cp-demo")) | .id') \ && echo "Your Confluent Cloud environment: $CC_ENV" \ && confluent environment use $CC_ENV ``` 4. Get the Confluent Cloud cluster ID and use the cluster. ```shell CCLOUD_CLUSTER_ID=$(confluent kafka cluster list -o json \ | jq -r '.[] | select(.name | contains("cp-demo")) | .id') \ && echo "Your Confluent Cloud cluster ID: $CCLOUD_CLUSTER_ID" \ && confluent kafka cluster use $CCLOUD_CLUSTER_ID ``` 5. Get the bootstrap endpoint for the Confluent Cloud cluster. ```shell CC_BOOTSTRAP_ENDPOINT=$(confluent kafka cluster describe -o json | jq -r .endpoint) \ && echo "Your Cluster's endpoint: $CC_BOOTSTRAP_ENDPOINT" ``` 6. Create a Confluent Cloud service account for CP Demo and get its ID. ```shell confluent iam service-account create cp-demo-sa --description "service account for cp-demo" \ && SERVICE_ACCOUNT_ID=$(confluent iam service-account list -o json \ | jq -r '.[] | select(.name | contains("cp-demo")) | .id') \ && echo "Your cp-demo service account ID: $SERVICE_ACCOUNT_ID" ``` 7. Get the ID and endpoint URL for your Schema Registry cluster. (**Note:** The Schema Registry cluster was created by default when you [added your cloud environment](/cloud/current/get-started/schema-registry.html#cloud-sr-enable-zones).) ```shell CC_SR_CLUSTER_ID=$(confluent schema-registry cluster describe -o json | jq -r .cluster_id) \ && CC_SR_ENDPOINT=$(confluent schema-registry cluster describe -o json | jq -r .endpoint_url) \ && echo "Schema Registry Cluster ID: $CC_SR_CLUSTER_ID" \ && echo "Schema Registry Endpoint: $CC_SR_ENDPOINT" ``` 8. Create a Schema Registry API key for the cp-demo service account. ```shell confluent api-key create \ --service-account $SERVICE_ACCOUNT_ID \ --resource $CC_SR_CLUSTER_ID \ --description "SR key for cp-demo schema link" ``` Verify your output resembles ```text It may take a couple of minutes for the API key to be ready. Save the API key and secret. The secret is not retrievable later. +---------+------------------------------------------------------------------+ | API Key | SZBKJLD67XK5NZNZ | | Secret | NTqs/A3Mt0Ohkk4fkaIsC0oLQ5Q/F0lLowYo/UrsTrEAM5ozxY7fjqxDdVwMJz99 | +---------+------------------------------------------------------------------+ ``` Set variables to reference the Schema Registry credentials returned in the previous step. ```shell SR_API_KEY=SZBKJLD67XK5NZNZ SR_API_SECRET=NTqs/A3Mt0Ohkk4fkaIsC0oLQ5Q/F0lLowYo/UrsTrEAM5ozxY7fjqxDdVwMJz99 ``` 9. Create a Kafka cluster API key for the cp-demo service account. ```shell confluent api-key create \ --service-account $SERVICE_ACCOUNT_ID \ --resource $CCLOUD_CLUSTER_ID \ --description "Kafka key for cp-demo cluster link" ``` Verify your output resembles ```text It may take a couple of minutes for the API key to be ready. Save the API key and secret. The secret is not retrievable later. 
+---------+-------------------------------------------------------------------+ | API Key | SZBKLMG61XK9NZAB | | Secret | QTpi/A3Mt0Ohkk4fkaIsGR3ATQ5Q/F0lLowYo/UrsTr3AMsozxY7fjqxDdVwMJz02 | +---------+-------------------------------------------------------------------+ ``` Set variables to reference the Kafka credentials returned in the previous step. ```shell CCLOUD_CLUSTER_API_KEY=SZBKLMG61XK9NZAB CCLOUD_CLUSTER_API_SECRET=QTpi/A3Mt0Ohkk4fkaIsGR3ATQ5Q/F0lLowYo/UrsTr3AMsozxY7fjqxDdVwMJz02 ``` 10. We will also need the cluster ID for the on-premises Confluent Platform cluster. ```shell CP_CLUSTER_ID=$(curl -s https://localhost:8091/v1/metadata/id \ --tlsv1.2 --cacert ./scripts/security/snakeoil-ca-1.crt \ | jq -r ".id") \ && echo "Your on-premises Confluent Platform cluster ID: $CP_CLUSTER_ID" ``` ### Topics Confluent Control Center can manage topics in a Kafka cluster. 1. Click **Topics**. 2. Scroll down and click the topic `wikipedia.parsed`. ![image](tutorials/cp-demo/images/topic_list_wikipedia.png) 3. View an overview of this topic: - Throughput - Partition replication status ![image](tutorials/cp-demo/images/topic_actions.png) 4. View which brokers are leaders for which partitions and where all partitions reside. 5. Inspect messages for this topic in real-time. ![image](tutorials/cp-demo/images/topic_inspect.png) 6. View the schema for this topic. For `wikipedia.parsed`, the topic value is using a Schema registered with Schema Registry (the topic key is just a string). ![image](tutorials/cp-demo/images/topic_schema.png) 7. View configuration settings for this topic. ![image](tutorials/cp-demo/images/topic_settings.png) 8. Return to **Topics**, click `wikipedia.parsed.count-by-domain` to view the output topic from the Kafka Streams application. ![image](tutorials/cp-demo/images/count-topic-view.png) 9. Return to **Topics** view and click the **+ Add a topic** button to create a new topic in your Kafka cluster. You can also view and edit settings of Kafka topics in the cluster. Read more on Confluent Control Center [topic management](https://docs.confluent.io/control-center/current/topics/overview.html). ![image](tutorials/cp-demo/images/create_topic.png) # Scripted Confluent Platform Demo The scripted Confluent Platform demo (`cp-demo`) example builds a full Confluent Platform deployment with an Apache Kafka® event streaming application that uses [ksqlDB](../../ksqldb/overview.md#ksql-home) and [Kafka Streams](../../streams/overview.md#kafka-streams) for stream processing, and secures all of the components end-to-end. The tutorial includes a module that makes it a hybrid deployment that runs Cluster Linking and Schema Linking to copy data and schemas from a local on-premises Kafka cluster to Confluent Cloud, a fully-managed service for Kafka. Follow the accompanying guided tutorial to learn how Kafka and Confluent Cloud work with Connect, Confluent Schema Registry, Confluent Control Center, and Cluster Linking with security enabled end-to-end. ### Dimension summary Clusters are billed based on the dimensions listed in the following tables. For every available dimension, the table below lists the [Costs API](https://docs.confluent.io/cloud/current/api.html#tag/Costs-(billingv1)) line item and the unit of measure for the dimension. 
| Dimension | Line Type | Unit of Measure |
|-----------|-----------|-----------------|
| Kafka Storage | `KAFKA_STORAGE` | Cost per GB stored per hour |
| Kafka Ingress | `KAFKA_NETWORK_WRITE` | Cost per GB written |
| Kafka Egress | `KAFKA_NETWORK_READ` | Cost per GB read |
| Confluent Unit for Kafka (CKU/eCKU) | `KAFKA_NUM_CKUS` | Cost per CKU/eCKU per hour |
| Kafka Ingress via Kafka REST APIs | `KAFKA_REST_PRODUCE` | Cost per GB written |
| KSQL Confluent Streaming Unit (CSU) | `KSQL_NUM_CSUS` | Cost per Confluent Streaming Unit (CSU) per hour |
| Connector Capacity for Dedicated Kafka cluster | `CONNECT_CAPACITY` | Cost per hour |
| Connect Task | `CONNECT_NUM_TASKS` | Cost per task per hour |
| Connect Data Transfer | `CONNECT_THROUGHPUT` | Cost per GB written or read |
| Confluent Support Plan | `SUPPORT` | Cost per hour (prorated based on monthly price) |
| Cluster Linking Links | `CLUSTER_LINKING_PER_LINK` | Cost per link per hour |
| Cluster Linking Ingress | `CLUSTER_LINKING_WRITE` | Cost per GB written |
| Cluster Linking Egress | `CLUSTER_LINKING_READ` | Cost per GB read |
| Audit Logs | `AUDIT_LOG_READ` | Cost per GB of data read from audit log topics |
| Stream Governance Base | `GOVERNANCE_BASE` | Cost per hour |
| Schema Registry Schema | `SCHEMA_REGISTRY` | Cost per schema per hour |
| Stream Governance Rule | `NUM_RULES` | Cost per rule per hour |
| Credit | `PROMO_CREDIT` | Credit issued by Confluent |
| Custom Connect Task | `CUSTOM_CONNECT_NUM_TASKS` | Cost per task per hour |
| Custom Connect Data Transfer | `CUSTOM_CONNECT_THROUGHPUT` | Cost per GB written or read per hour |
| Confluent Unit for Flink (CFU) | `FLINK_NUM_CFUS` | Cost per CFU per minute |

### Configure clients from the Confluent Cloud Console

The easiest way to get started connecting your client apps to Confluent Cloud is to copy and paste the configuration file from the Confluent Cloud Console.

1. Log in to Confluent Cloud.
2. Select an environment.
3. Select a cluster.
4. Select **Clients** from the navigation menu.
5. (Optional) Click the **+ New client** button.
6. Select the language you are using for your client application.

   ![image](images/cloud-client-languages.png)

7. Once you have selected a language, create or use existing API keys for your Kafka cluster and Schema Registry cluster as needed. Then, copy and paste the displayed configuration into your client application source code.

   ![image](images/cloud-client-configuration-example.png)

## Unit Testing

Unit tests run very quickly and verify that isolated functional blocks of code work as expected. They can test the logic of your application with minimal dependencies on other services.

ksqlDB exposes a test runner command line tool called [ksql-test-runner](/platform/current/ksqldb/how-to-guides/test-an-app.html) that can automatically test whether your ksqlDB statements behave correctly when given a set of inputs. It runs quickly and doesn’t require a running Kafka or ksqlDB cluster. For an example, see this [Kafka Tutorial](https://developer.confluent.io/tutorials/join-a-stream-to-a-stream/ksql.html). Note that the test runner can change and does not have backward compatibility guarantees.

With a Kafka Streams application, use [TopologyTestDriver](https://docs.confluent.io/platform/current/streams/developer-guide/test-streams.html), a test class that tests Kafka Streams logic.
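For illustration, here is a minimal, hedged sketch of a `TopologyTestDriver` test. The topology (a simple uppercase mapper) and the topic names `input` and `output` are assumptions made for this example, not part of any particular tutorial:

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.TestInputTopic;
import org.apache.kafka.streams.TestOutputTopic;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.TopologyTestDriver;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

public class UppercaseTopologyTest {

    public static void main(String[] args) {
        // Build a trivial topology: read from "input", uppercase the value, write to "output".
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input", Consumed.with(Serdes.String(), Serdes.String()))
               .mapValues(value -> value.toUpperCase())
               .to("output", Produced.with(Serdes.String(), Serdes.String()));
        Topology topology = builder.build();

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "topology-test");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:1234"); // never contacted by the test driver

        TopologyTestDriver driver = new TopologyTestDriver(topology, props);
        try {
            TestInputTopic<String, String> input =
                driver.createInputTopic("input", Serdes.String().serializer(), Serdes.String().serializer());
            TestOutputTopic<String, String> output =
                driver.createOutputTopic("output", Serdes.String().deserializer(), Serdes.String().deserializer());

            // Pipe a single record through the topology and inspect the result.
            input.pipeInput("key", "hello");
            KeyValue<String, String> result = output.readKeyValue();
            System.out.println(result); // KeyValue(key, HELLO)
        } finally {
            driver.close();
        }
    }
}
```

In a real project this would typically live in a JUnit test with assertions instead of a `main` method; the structure of driving records through the topology stays the same.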
The test driver’s start-up time is very fast, and you can test a single message at a time through a Kafka Streams topology, which allows easy debugging and stepping. Refer to the example in this [Kafka Tutorial](https://developer.confluent.io/tutorials/dynamic-output-topic/confluent.html).

If you developed your own Kafka Streams Processor, you may want to unit test it as well. Because the `Processor` forwards its results to the context rather than returning them, unit testing requires a mocked context capable of capturing forwarded data for inspection. For these purposes, use [MockProcessorContext](https://docs.confluent.io/platform/current/streams/developer-guide/test-streams.html), with an example in this [Kafka Streams test](https://github.com/apache/kafka/blob/trunk/streams/src/test/java/org/apache/kafka/streams/state/internals/TimestampedSegmentsTest.java).

For basic Producers and Consumers, there are mock implementations useful in unit tests. JVM Producer and Consumer unit tests can make use of [MockProducer](/platform/current/clients/javadocs/javadoc/org/apache/kafka/clients/producer/MockProducer.html) and [MockConsumer](/platform/current/clients/javadocs/javadoc/org/apache/kafka/clients/consumer/MockConsumer.html), which implement the same interfaces and mock all the I/O operations as implemented in the `KafkaProducer` and `KafkaConsumer`, respectively. You can refer to a `MockProducer` example in [Build your first Apache KafkaProducer application](https://developer.confluent.io/tutorials/creating-first-apache-kafka-producer-application/confluent.html) and a `MockConsumer` example in [Build your first Apache KafkaConsumer application](https://developer.confluent.io/tutorials/creating-first-apache-kafka-consumer-application/confluent.html).

For non-JVM Producers and Consumers based on librdkafka, the approach [varies by language](https://docs.confluent.io/platform/current/clients/index.html). You could also use [rdkafka_mock](https://github.com/edenhill/librdkafka/blob/master/src/rdkafka_mock.h), a minimal implementation of the Kafka protocol broker APIs with no other dependencies. Refer to an example in [librdkafka](https://github.com/edenhill/librdkafka/blob/master/tests/0105-transactions_mock.c).

#### IMPORTANT

The `ConsumerTimestampsInterceptor` acts as a producer to the `__consumer_timestamps` topic on the source cluster and, as such, requires appropriate security configurations. Provide these settings with the `timestamps.producer.` prefix, for example, `timestamps.producer.security.protocol=SSL` (a sketch of these prefixed settings follows the list below). For more information on security configurations, see:

- [SSL Encryption](/platform/current/kafka/encryption.html#encryption-ssl-clients)
- [SSL Authentication](/platform/current/kafka/authentication_ssl.html#authentication-ssl-clients)
- [SASL/SCRAM](/platform/current/kafka/authentication_sasl/authentication_sasl_scram.html#sasl-scram-clients)
- [SASL/GSSAPI](/platform/current/kafka/authentication_sasl/authentication_sasl_gssapi.html#sasl-gssapi-clients)
- [SASL/PLAIN](/platform/current/kafka/authentication_sasl/authentication_sasl_plain.html#sasl-plain-clients)

The interceptor also requires ACLs for the `__consumer_timestamps` topic: the consumer principal needs WRITE and DESCRIBE operations on that topic.
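As an illustration only, the following sketch shows consumer properties that supply security settings to the interceptor’s internal producer through the `timestamps.producer.` prefix. The interceptor class name, broker address, and file paths here are assumptions for this sketch; confirm the exact values against your Replicator installation and environment:

```java
import java.util.Properties;

// Consumer properties for an application whose offsets should be translated.
Properties props = new Properties();
props.put("bootstrap.servers", "source-broker:9092");   // placeholder broker address
props.put("group.id", "my-consumer-group");              // placeholder group ID

// Register the consumer timestamps interceptor (class name shown as an assumption;
// use the class documented for your Replicator version).
props.put("interceptor.classes",
    "io.confluent.connect.replicator.offsets.ConsumerTimestampsInterceptor");

// Security settings for the interceptor's internal producer to __consumer_timestamps,
// supplied with the timestamps.producer. prefix.
props.put("timestamps.producer.security.protocol", "SSL");
props.put("timestamps.producer.ssl.truststore.location", "/etc/kafka/secrets/client.truststore.jks"); // placeholder path
props.put("timestamps.producer.ssl.truststore.password", "changeit");                                 // placeholder secret
```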
To learn more, see: - [Understanding Consumer Offset Translation](/platform/current/multi-dc-deployments/replicator/replicator-failover.html#consumer-offset-translation-feature) - Discussion on consumer offsets and timestamp preservation in the whitepaper on [Disaster Recovery for Multi-Datacenter Apache Kafka Deployments](https://www.confluent.io/white-paper/disaster-recovery-for-multi-datacenter-apache-kafka-deployments/). # Install Custom Connectors for Confluent Cloud Learn how to install your custom connector in Confluent Cloud. * [Overview](overview.md) * [Quick Start](custom-connector-qs.md) * [Getting a connector](custom-connector-qs.md#getting-a-connector) * [Packaging a custom connector](custom-connector-qs.md#packaging-a-custom-connector) * [Uploading and launching the connector](custom-connector-qs.md#uploading-and-launching-the-connector) * [Manage Custom Connectors](custom-connector-manage.md) * [Search for a custom connector](custom-connector-manage.md#search-for-a-custom-connector) * [Get notifications](custom-connector-manage.md#get-notifications) * [Modify a custom connector configuration](custom-connector-manage.md#modify-a-custom-connector-configuration) * [Override configuration properties](custom-connector-manage.md#override-configuration-properties) * [Update networking endpoints](custom-connector-manage.md#update-networking-endpoints) * [Custom connector logs](custom-connector-manage.md#custom-connector-logs) * [View metrics](custom-connector-manage.md#view-metrics) * [Delete a custom connector](custom-connector-manage.md#delete-a-custom-connector) * [View a custom connector plugin ID](custom-connector-manage.md#view-a-custom-connector-plugin-id) * [Delete a custom connector plugin](custom-connector-manage.md#delete-a-custom-connector-plugin) * [Limitations and Support](custom-connector-fands.md) * [Limitations](custom-connector-fands.md#limitations) * [Shared responsibility](custom-connector-fands.md#shared-responsibility) * [Confluent and Partner support](custom-connector-fands.md#confluent-and-partner-support) * [Certified Partner-built connectors](custom-connector-fands.md#certified-partner-built-connectors) * [Supported AWS, Azure and GCP regions](custom-connector-fands.md#supported-aws-az-and-gcp-regions) * [Schema Registry integration](custom-connector-fands.md#sr-integration) * [App log topic](custom-connector-fands.md#app-log-topic) * [API and CLI](custom-connector-cli.md) * [Custom Connector API](custom-connector-cli.md#custom-connector-api) * [Custom Connector CLI](custom-connector-cli.md#custom-connector-cli) * [Custom Connector Plugin CLI](custom-connector-cli.md#custom-connector-plugin-cli) * [Custom Connector Plugin Version CLI](custom-connector-cli.md#custom-connector-plugin-version-cli) * [Unsupported connector CLI commands](custom-connector-cli.md#unsupported-connector-cli-commands) * [Command reference](custom-connector-cli.md#command-reference) ## Quick Start Use this quick start to get up and running with the Confluent Cloud ActiveMQ source connector. Prerequisites : - Authorized access to a [Confluent Cloud](https://www.confluent.io/confluent-cloud/) cluster on Amazon Web Services (AWS), Microsoft Azure (Azure), or Google Cloud. - Access to an ActiveMQ message broker. - The Confluent CLI installed and configured for the cluster. See [Install the Confluent CLI](https://docs.confluent.io/confluent-cli/current/install.html). 
- [Schema Registry](../get-started/schema-registry.md#cloud-sr-config) must be enabled to use a Schema Registry-based format (for example, Avro, JSON_SR (JSON Schema), or Protobuf). See [Schema Registry Enabled Environments](limits.md#connect-ccloud-environment-limits) for additional information.
- For networking considerations, see [Networking and DNS](overview.md#connect-internet-access-resources). To use a set of public egress IP addresses, see [Public Egress IP Addresses for Confluent Cloud Connectors](static-egress-ip.md#cc-static-egress-ips).
- Kafka cluster credentials. The following lists the different ways you can provide credentials.
  - Enter an existing [service account](service-account.md#s3-cloud-service-account) resource ID.
  - Create a Confluent Cloud [service account](service-account.md#s3-cloud-service-account) for the connector. Make sure to review the ACL entries required in the [service account documentation](service-account.md#s3-cloud-service-account). Some connectors have specific ACL requirements.
  - Create a Confluent Cloud API key and secret. To create a key and secret, you can use [confluent api-key create](https://docs.confluent.io/confluent-cli/current/command-reference/api-key/confluent_api-key_create.html) *or* you can autogenerate the API key and secret directly in the Cloud Console when setting up the connector.

## Features

The Amazon CloudWatch Logs Source connector provides the following features:

* **At least once delivery**: The connector guarantees that records are delivered at least once to the Kafka topic.
* **Supports multiple tasks**: The connector supports running one or more tasks. More tasks may improve performance. The connector can start with a single task that handles all imported data and can scale up to one task per log stream. Running one task per log stream raises performance up to the per-log-stream limits that Amazon supports (10,000 logs per second or 1 MB per second).
* **Customize topic format**: The connector sources data from a single log group and can write to one topic per log stream. There is a Kafka topic format property (CLI property `kafka.topic.format`) you can use to customize the topic names for each log stream.
* **Supported data formats**: The connector supports Avro, String, and JSON (schemaless) output formats. [Schema Registry](../get-started/schema-registry.md#cloud-sr-config) must be enabled to use a Schema Registry-based format (for example, Avro). See [Schema Registry Enabled Environments](limits.md#connect-ccloud-environment-limits) for additional information.
* **Provider integration support**: The connector supports IAM role-based authorization using the Confluent Provider Integration. For more information about provider integration setup, see [IAM roles authentication](#cc-cloudwatch-source-setup-connection).
* **Enhanced log stream capacity**: The connector now supports more than 50 log streams, removing the previous limitation and allowing for greater scalability in log ingestion scenarios.

For more information and examples to use with the Confluent Cloud API for Connect, see the [Confluent Cloud API for Connect Usage Examples](connect-api-section.md#ccloud-connect-api) section.

## Quick Start

Use this quick start to get up and running with the Confluent Cloud Amazon CloudWatch Metrics Sink connector. The quick start provides the basics of selecting the connector and configuring it to send records to Amazon CloudWatch.
Prerequisites:

* Authorized access to a [Confluent Cloud](https://www.confluent.io/confluent-cloud/) cluster on AWS.
* The Confluent CLI installed and configured for the cluster. See [Install the Confluent CLI](https://docs.confluent.io/confluent-cli/current/install.html).
* [Schema Registry](../get-started/schema-registry.md#cloud-sr-config) must be enabled to use a Schema Registry-based format (for example, Avro, JSON_SR (JSON Schema), or Protobuf). See [Schema Registry Enabled Environments](limits.md#connect-ccloud-environment-limits) for additional information.
* For networking considerations, see [Networking and DNS](overview.md#connect-internet-access-resources). To use a set of public egress IP addresses, see [Public Egress IP Addresses for Confluent Cloud Connectors](static-egress-ip.md#cc-static-egress-ips).
* An AWS account configured with [Access Keys](https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys).
* The Amazon CloudWatch Metrics region must be the same region where your Confluent Cloud cluster is located (where you are running the connector). Note that the hard-coded endpoint URL for the connector is set to `https://monitoring.{kafka-cluster-region}.amazonaws.com`. This sets the Amazon CloudWatch region to your Kafka cluster region.
* Kafka cluster credentials. The following lists the different ways you can provide credentials.
  - Enter an existing [service account](service-account.md#s3-cloud-service-account) resource ID.
  - Create a Confluent Cloud [service account](service-account.md#s3-cloud-service-account) for the connector. Make sure to review the ACL entries required in the [service account documentation](service-account.md#s3-cloud-service-account). Some connectors have specific ACL requirements.
  - Create a Confluent Cloud API key and secret. To create a key and secret, you can use [confluent api-key create](https://docs.confluent.io/confluent-cli/current/command-reference/api-key/confluent_api-key_create.html) *or* you can autogenerate the API key and secret directly in the Cloud Console when setting up the connector.

## Features

* **Auto-created tables**: Tables can be auto-created based on topic names and auto-evolved based on the record schema.
* **Select configuration properties**:
  - `aws.dynamodb.pk.hash`: Defines how the DynamoDB table hash key is extracted from the records. By default, the Kafka partition number where the record is generated is used as the hash key. Other record references can be used to create the hash key. See [DynamoDB hash keys and sort keys](#cc-amazon-dynamodb-sink-hash-sort) for examples.
  - `aws.dynamodb.pk.sort`: Defines how the DynamoDB table sort key is extracted from the records. By default, the record offset is used as the sort key. The sort key can be created from other references. See [DynamoDB hash keys and sort keys](#cc-amazon-dynamodb-sink-hash-sort) for examples.
* **Provider integration support**: The connector supports IAM role-based authorization using Confluent Provider Integration. For more information about provider integration setup, see [IAM roles authentication](#cc-amazon-dynamodb-sink-setup-connection).

For more information and examples to use with the Confluent Cloud API for Connect, see the [Confluent Cloud API for Connect Usage Examples](connect-api-section.md#ccloud-connect-api) section.

### Configuration

Note that configuration properties that are not shown in the Cloud Console use the default values.
For all property values and definitions, see [Configuration properties](#cc-amazon-dynamodb-cdc-source-config-properties).

1. Select the output record value format: Avro, JSON Schema, or Protobuf. A valid schema must be available in [Schema Registry](../get-started/schema-registry.md#cloud-sr-config) to use a schema-based message format (for example, Avro, JSON Schema, or Protobuf). For additional information, see [Schema Registry Enabled Environments](limits.md#connect-ccloud-environment-limits).
2. In the **AWS DynamoDB API Endpoint** field, enter the AWS DynamoDB API endpoint.
3. Select a table sync mode from the **DynamoDB Table Sync Mode** dropdown list. Valid values are:
   - `CDC`: Perform CDC only.
   - `SNAPSHOT`: Perform a snapshot only.
   - `SNAPSHOT_CDC` (Default): The connector starts with a snapshot and then switches to CDC mode upon completion.
4. Select a table discovery mode from the **Table Discovery Mode** dropdown list. Valid values are:
   - `INCLUDELIST`: Capture the DynamoDB tables named in a comma-separated include list. The **Tables Include List** is required if `dynamodb.table.discovery.mode` is set to `INCLUDELIST`.
   - `TAG`: Use a semicolon-separated list of `key:value1,value2` pairs to create tag filters. For example, `key1:v1,v2;key2:v3,v4` includes all tags that match the `key1` key with a value of either `v1` or `v2`, and match `key2` with a value of either `v3` or `v4`. Any keys not specified are excluded.
5. In the **Tables Include List** field, enter a comma-separated list of DynamoDB table names to be captured. Note that this is required if `dynamodb.table.discovery.mode` is set to `INCLUDELIST`.

### **Show advanced configurations**

- **Schema context**: Select a schema context to use for this connector, if using a schema-based data format. This property defaults to the **Default** context, which configures the connector to use the default schema set up for Schema Registry in your Confluent Cloud environment. A schema context allows you to use separate schemas (like schema sub-registries) tied to topics in different Kafka clusters that share the same Schema Registry environment. For example, if you select a non-default context, a **Source** connector uses only that schema context to register a schema and a **Sink** connector uses only that schema context to read from. For more information about setting up a schema context, see [What are schema contexts and when should you use them?](../sr/faqs-cc.md#faq-schema-contexts).

**CDC Details**

- **Prefix for CDC Checkpointing table**: Prefix for CDC checkpointing tables; must be unique per connector. The checkpointing table stores the last processed record for each shard and is used to resume from that record if the connector restarts. This is applicable only in CDC mode.
- **CDC Checkpointing Table Billing Mode**: Define the billing mode for the internal checkpoint table created with CDC. Valid values are `PROVISIONED` and `PAY_PER_REQUEST`. Default is `PROVISIONED`. Use `PAY_PER_REQUEST` for unpredictable application traffic and on-demand billing mode. Use `PROVISIONED` for predictable application traffic and provisioned billing mode.
- **Max number of records per DynamoDB Streams poll**: The maximum number of records that can be returned in a single DynamoDB Streams `getRecords` operation. Only applicable in the CDC phase. Default value is `5000`.

**Snapshot Details**

- **Max records per Table Scan**: The maximum number of records that can be returned in a single DynamoDB read operation. Only applicable to the `SNAPSHOT` phase.
  Note that there is a 1 MB size limit as well.
- **Snapshot Table RCU consumption percentage**: Configure the percentage of table read capacity that is used as the maximum limit of the RCU consumption rate.

**DynamoDB Details**

- **Maximum batch size**: The maximum number of records the connector will wait for before publishing the data on the topic. The connector may still return fewer records if no additional records are available.
- **Poll linger milliseconds**: The maximum time to wait for a record before returning an empty batch. The default is 5 seconds.

**Processing position**

Define a specific offset position from which this connector begins processing data by clicking **Set offsets**. For more information on managing offsets, see [Manage Offsets for Fully-Managed Connectors in Confluent Cloud](offsets.md#connect-custom-offsets).

**Auto-restart policy**

- **Enable Connector Auto-restart**: Control the auto-restart behavior of the connector and its tasks in the event of user-actionable errors. Defaults to `true`, enabling the connector to automatically restart in case of user-actionable errors. Set this property to `false` to disable auto-restart for failed connectors. In such cases, you would need to manually restart the connector.

**Consumer configuration**

- **Max poll interval (ms)**: Set the maximum delay between subsequent consume requests to Kafka. Use this property to improve connector performance in cases when the connector cannot send records to the sink system. The default is 300,000 milliseconds (5 minutes).
- **Max poll records**: Set the maximum number of records to consume from Kafka in a single request. Use this property to improve connector performance in cases when the connector cannot send records to the sink system. The default is 500 records.

**Transforms**

- **Single Message Transforms**: To add a new SMT, see [Add transforms](single-message-transforms.md#cc-single-message-transforms-ui). For more information about unsupported SMTs, see [Unsupported transformations](single-message-transforms.md#cc-single-message-transforms-unsupported-transforms).

For all property values and definitions, see [Configuration properties](#cc-amazon-dynamodb-cdc-source-config-properties).

6. Click **Continue**.

## Features

The AWS Lambda Sink connector provides the following features:

* **Supports multiple Lambda functions**: The connector supports a single AWS Lambda function or multiple Lambda functions.
* **Provider integration support**: The connector supports IAM role-based authorization using Confluent Provider Integration. For more information about provider integration setup, see [IAM roles authentication](#cc-aws-lambda-sink-setup-connection).
* **Synchronous and asynchronous Lambda function invocation**: The AWS Lambda function can be invoked by this connector either synchronously or asynchronously.
* **At-least-once delivery**: The connector guarantees at-least-once processing semantics in synchronous mode. In asynchronous mode, at-least-once delivery is guaranteed, but it does not guarantee at-least-once processing by the AWS Lambda function. This is because AWS Lambda may drop async events if it cannot process them after a few retries. Under certain circumstances, a record may be processed more than once. You should design your AWS Lambda function to be [idempotent](https://aws.amazon.com/premiumsupport/knowledge-center/lambda-function-idempotent/).
If you have configured the connector to log the response from the Lambda function to a Kafka topic, the topic can contain duplicate records. You can enable Kafka log compaction on the topic to remove duplicate records. Alternatively, you can write a ksqlDB query to detect duplicate records in a time window. * **Supports multiple tasks**: The connector supports running one or more tasks. More tasks may improve performance. * **Results topics**: In synchronous mode, AWS Lambda results are stored in the `success-` and `error-` topics. * **Input Data Format with or without a Schema**: The connector supports input data from Kafka topics in Avro, JSON Schema (JSON_SR), Protobuf, JSON (schemaless), or Bytes format. [Schema Registry](../get-started/schema-registry.md#cloud-sr-config) must be enabled to use a Schema Registry-based format. #### NOTE If no schema is defined, values are encoded as plain strings. For example, `"name": "Kimberley Human"` is encoded as `name=Kimberley Human`. * **Backward compatibility**: The API for this connector is compatible with earlier versions. * **Supports AWS Lambda function versions and aliases**: The connector supports invoking specific AWS Lambda function versions or aliases by appending a colon and the desired version or alias to the function name (for example, `function:1` for a version or `function:alias` for an alias). For more information and examples to use with the Confluent Cloud API for Connect, see the [Confluent Cloud API for Connect Usage Examples](connect-api-section.md#ccloud-connect-api) section. # Tables and Topics in Confluent Cloud for Apache Flink Apache Flink® and the Table API use the concept of dynamic tables to facilitate the manipulation and processing of streaming data. Dynamic tables represent an abstraction for working with both batch and streaming data in a unified manner, offering a flexible and expressive way to define, modify, and query structured data. In contrast to the static tables that represent batch data, dynamic tables change over time. But like static batch tables, systems can execute queries over dynamic tables. Confluent Cloud for Apache Flink® implements ANSI-Standard SQL and has the familiar concepts of catalogs, databases, and tables. Confluent Cloud maps a Flink catalog to an environment and *vice-versa*. Similarly, Flink databases and tables are mapped to Apache Kafka® clusters and topics. For more information, see [Metadata mapping between Kafka cluster, topics, schemas, and Flink](../overview.md#ccloud-flink-overview-metadata-mapping). # Get Started with Confluent Cloud for Apache Flink Welcome to Confluent Cloud for Apache Flink®. This section guides you through the steps to get your queries running using the Confluent Cloud Console (browser-based) and the Flink SQL shell (CLI-based). If you’re currently using Confluent Cloud in a region that doesn’t yet support Flink, so you can’t use your data in existing Apache Kafka® topics, you can still try out Flink SQL by using sample data generators or the [Example catalog](../reference/example-data.md#flink-sql-example-data), which are used in the quick starts and [How-to Guides for Confluent Cloud for Apache Flink](../how-to-guides/overview.md#flink-sql-how-to-guides). 
Choose one of the following quick starts to get started with Flink SQL on Confluent Cloud:

- [Flink SQL Quick Start with Confluent Cloud Console](quick-start-cloud-console.md#flink-sql-quick-start-cloud-console)
- [Flink SQL Shell Quick Start](quick-start-shell.md#flink-sql-quick-start-shell)

Also, you can access Flink by using the [REST API](../operate-and-deploy/flink-rest-api.md#flink-rest-api) and the [Confluent Terraform Provider](../operate-and-deploy/deploy-flink-sql-statement.md#flink-deploy-sql-statement).

- [REST API-based data streams](https://github.com/confluentinc/demo-scene/tree/master/http-streaming)
- [Sample Project for Confluent Terraform Provider](https://registry.terraform.io/providers/confluentinc/confluent/latest/docs/guides/sample-project)

If you get stuck, have a question, or want to provide feedback or feature requests, don’t hesitate to reach out. Check out [Get Help with Confluent Cloud for Apache Flink](../get-help.md#ccloud-flink-help) for our support channels.

### Cloud

| Command | Description |
|---------|-------------|
| [confluent ai](confluent_ai.md#confluent-ai) | Start an interactive AI shell. |
| [confluent api-key](api-key/index.md#confluent-api-key) | Manage API keys. |
| [confluent asyncapi](asyncapi/index.md#confluent-asyncapi) | Manage AsyncAPI document tooling. |
| [confluent audit-log](audit-log/index.md#confluent-audit-log) | Manage audit log configuration. |
| [confluent billing](billing/index.md#confluent-billing) | Manage Confluent Cloud billing. |
| [confluent byok](byok/index.md#confluent-byok) | Manage your keys in Confluent Cloud. |
| [confluent ccpm](ccpm/index.md#confluent-ccpm) | Manage custom Connect plugin management (CCPM). |
| [confluent cloud-signup](confluent_cloud-signup.md#confluent-cloud-signup) | Sign up for Confluent Cloud. |
| [confluent completion](confluent_completion.md#confluent-completion) | Print shell completion code. |
| [confluent configuration](configuration/index.md#confluent-configuration) | Configure the Confluent CLI. |
| [confluent connect](connect/index.md#confluent-connect) | Manage Kafka Connect. |
| [confluent context](context/index.md#confluent-context) | Manage CLI configuration contexts. |
| [confluent environment](environment/index.md#confluent-environment) | Manage and select Confluent Cloud environments. |
| [confluent feedback](confluent_feedback.md#confluent-feedback) | Submit feedback for the Confluent CLI. |
| [confluent flink](flink/index.md#confluent-flink) | Manage Apache Flink. |
| [confluent iam](iam/index.md#confluent-iam) | Manage RBAC and IAM permissions. |
| [confluent kafka](kafka/index.md#confluent-kafka) | Manage Apache Kafka. |
| [confluent ksql](ksql/index.md#confluent-ksql) | Manage ksqlDB. |
| [confluent local](local/index.md#confluent-local) | Manage a local Confluent Platform development environment. |
| [confluent login](confluent_login.md#confluent-login) | Log in to Confluent Cloud or Confluent Platform. |
| [confluent logout](confluent_logout.md#confluent-logout) | Log out of Confluent Cloud. |
| [confluent network](network/index.md#confluent-network) | Manage Confluent Cloud networks. |
| [confluent organization](organization/index.md#confluent-organization) | Manage your Confluent Cloud organizations. |
| [confluent plugin](plugin/index.md#confluent-plugin) | Manage Confluent plugins. |
| [confluent prompt](confluent_prompt.md#confluent-prompt) | Add Confluent CLI context to your terminal prompt. |
| [confluent provider-integration](provider-integration/index.md#confluent-provider-integration) | Manage Confluent Cloud provider integrations. |
| [confluent schema-registry](schema-registry/index.md#confluent-schema-registry) | Manage Schema Registry. |
| [confluent service-quota](service-quota/index.md#confluent-service-quota) | Look up Confluent Cloud service quota limits. |
| [confluent shell](confluent_shell.md#confluent-shell) | Start an interactive shell. |
| [confluent stream-share](stream-share/index.md#confluent-stream-share) | Manage stream shares. |
| [confluent tableflow](tableflow/index.md#confluent-tableflow) | Manage Tableflow. |
| [confluent unified-stream-manager](unified-stream-manager/index.md#confluent-unified-stream-manager) | Manage Unified Stream Manager clusters. |
| [confluent update](confluent_update.md#confluent-update) | Update the Confluent CLI. |
| [confluent version](confluent_version.md#confluent-version) | Show version of the Confluent CLI. |

## Add a connector (non-RBAC environment)

Follow these steps to configure a source or sink connector by completing the applicable UI fields. You can also add a connector by [uploading a connector configuration file](#c3-upload-connector-config). For details about connector settings common and unique to source and sink connectors, see [Configuring Connectors](/kafka-connectors/self-managed/configuring.html).

This procedure is applicable to a non-RBAC workflow. For a role-based access control (RBAC) workflow, see [Add a connector (RBAC environment)](#c3-add-connector-rbac-workflow).

There are two steps (tabs) to complete in this workflow:

- Set up the connection.
- Test and verify.

**To add a connector in a non-RBAC environment**

1. Select a cluster from the navigation bar and click the **Connect** menu. The [All Connect Clusters page](#c3-all-connect-clusters-page) opens.

   ![image](images/c3-all-connect-clusters-page.png)

2. In the **Cluster name** column, click the **connect-default** link (or the link for your Connect cluster). The [Connectors page](#c3-connectors-page) opens.

   ![image](images/c3-no-connectors.png)

3. Click **Add connector**. The Browse page for selecting connectors opens. The connectors that initially appear in this page are [bundled](/kafka-connectors/self-managed/supported.html) with Confluent Platform. To narrow the available selections, select either **Sources** or **Sinks** from the **Filter by type** menu.

   ![image](images/c3-connect-select-connector.png)

4. Click the tile for the connector you want to configure. The **Add Connector** page opens to the **01 Setup Connection** tab. Use the shortcut panel to the right to navigate the list of configurations.

   ![image](images/c3-add-connector.png)

5. Complete the fields as appropriate for the connector. Required fields are indicated with an asterisk.

### Generalized GCS Source connector configuration

When configuring the [Generalized Google Cloud Storage Source connector](https://docs.confluent.io/kafka-connectors/gcs-source/current/generalized/overview.html), you won’t be able to add the `topic.regex.list` configuration parameter if the mode for the connector is set to `RESTORE_BACKUP`, which is the default mode. If you set the mode to `GENERIC`, you will see `topic.regex.list` listed as an option under **Kafka Topic Regex** in the **Topic** section.
For more details about each of these parameters, see the [Generalized GCS Source Connector Configuration Properties](https://docs.confluent.io/kafka-connectors/gcs-source/current/configuration_options.html#generalized-connector-parameters) page. 6. (Optional) If there are additional properties you need to add, click **Add a property**. The **Additional Properties** dialog opens for you to enter the property name. After entering the property name, enter the value for the property. ![image](images/c3-connector-add-property.png)![image](images/c3-connector-add-prop-modal.png)![image](images/c3-connector-addl-props.png) To delete a property, click the trash icon. You can undo the operation. 7. Click **Continue**. The **02 Test and verify** page opens. (If the 02 Security page opens, see [RBAC workflow](#c3-add-connector-rbac-workflow).) ![image](images/c3-connect-download-config-link.png) 8. (Optional) Click **Download connector config file**. See [Download a connector configuration file](#c3-download-connector-config) for details. 9. Review the information and click **Launch**. The information displayed is sent to the [Connect REST API](/platform/current/connect/references/restapi.html#connect-userguide-rest). - If the configuration was successful, the connector appears in the connectors table within the [Connectors page](#c3-connectors-page). Green bars indicate the connector is running. - If the configuration was unsuccessful, the **Status** column indicates Failed. Red bars indicate the connector is not running. In the **Name** column, click the link for the connector and edit the configuration fields. Repeat the process as necessary. #### NOTE - These are properties for the self-managed connector. If you are using Confluent Cloud, see [Google BigQuery Sink Connector for Confluent Cloud](/cloud/current/connectors/cc-gcp-bigquery-sink.html). - New tables and updated schemas take a few minutes to be detected by the Google Client Library. For more information see the Google Cloud [BigQuery API guide](https://cloud.google.com/bigquery/docs/error-messages#metadata-errors-for-streaming-inserts). `defaultDataset` : The default dataset to be used * Type: string * Importance: high #### NOTE `defaultDataset` replaced the `datasets` parameter of older versions of this connector. `project` : The BigQuery project to write to. * Type: string * Importance: high `topics` : A list of Kafka topics to read from. * Type: list * Importance: high `autoCreateTables` : Create BigQuery tables if they don’t already exist. This property should only be enabled for Schema Registry-based inputs: Avro, Protobuf, or JSON Schema (JSON_SR). Table creation is not supported for JSON input. * Type: boolean * Default: false * Importance: high `gcsBucketName` : The name of the bucket where Google Cloud Storage (GCS) blobs are located. These blobs are used to batch-load to BigQuery. This is applicable only if `enableBatchLoad` is configured. * Type: string * Default: “” * Importance: high `queueSize` : The maximum size (or -1 for no maximum size) of the worker queue for BigQuery write requests before all topics are paused. This is a soft limit; the size of the queue can go over this before topics are paused. All topics resume once a flush is triggered or the size of the queue drops under half of the maximum size. * Type: long * Default: -1 * Valid Values: [-1,…] * Importance: high `bigQueryRetry` : The number of retry attempts made for a BigQuery request that fails with a backend error or a quota exceeded error. 
* Type: int * Default: 0 * Valid Values: [0,…] * Importance: medium `bigQueryRetryWait` : The minimum amount of time, in milliseconds, to wait between retry attempts for a BigQuery backend or quota exceeded error. * Type: long * Default: 1000 * Valid Values: [0,…] * Importance: medium `bigQueryMessageTimePartitioning` : Whether or not to use the message time when inserting records. Default uses the connector processing time. * Type: boolean * Default: false * Importance: high `bigQueryPartitionDecorator` : Whether or not to append partition decorator to BigQuery table name when inserting records. Default is true. Setting this to true appends partition decorator to table name (e.g. table$yyyyMMdd depending on the configuration set for bigQueryPartitionDecorator). Setting this to false bypasses the logic to append the partition decorator and uses raw table name for inserts. * Type: boolean * Default: true * Importance: high `timestampPartitionFieldName` : The name of the field in the value that contains the timestamp to partition by in BigQuery and enable timestamp partitioning for each table. Leave this configuration blank, to enable ingestion time partitioning for each table. * Type: string * Default: null * Importance: low `clusteringPartitionFieldNames` : Comma-separated list of fields where data is clustered in BigQuery. * Type: list * Default: null * Importance: low `timePartitioningType` : The time partitioning type to use when creating tables. Existing tables will not be altered to use this partitioning type. * Type: string * Default: DAY * Valid Values: (case insensitive) [MONTH, YEAR, HOUR, DAY] * Importance: low `keySource` : Determines whether the keyfile configuration is the path to the credentials JSON file or to the JSON itself. Available values are `FILE` and `JSON`. This property is available in BigQuery sink connector version 1.3 (and later). * Type: string * Default: FILE * Importance: medium `keyfile` : `keyfile` can be either a string representation of the Google credentials file or the path to the Google credentials file itself. The string representation of the Google credentials file is supported in BigQuery sink connector version 1.3 (and later). * Type: string * Default: null * Importance: medium `sanitizeTopics` : Designates whether to automatically sanitize topic names before using them as table names. If not enabled, topic names are used as table names. * Type: boolean * Default: false * Importance: medium `schemaRetriever` : A class that can be used for automatically creating tables and/or updating schemas. Note that in version 2.0.0, SchemaRetriever API changed to retrieve the schema from each SinkRecord, which will help support multiple schemas per topic. `SchemaRegistrySchemaRetriever` has been removed as it retrieves schema based on the topic. * Type: class * Default: `com.wepay.kafka.connect.bigquery.retrieve.IdentitySchemaRetriever` * Importance: medium `threadPoolSize` : The size of the BigQuery write thread pool. This establishes the maximum number of concurrent writes to BigQuery. * Type: int * Default: 10 * Valid Values: [1,…] * Importance: medium `allBQFieldsNullable` : If true, no fields in any produced BigQuery schema are REQUIRED. All non-nullable Avro fields are translated as `NULLABLE` (or `REPEATED`, if arrays). * Type: boolean * Default: false * Importance: low `avroDataCacheSize` : The size of the cache to use when converting schemas from Avro to Kafka Connect. 
* Type: int * Default: 100 * Valid Values: [0,…] * Importance: low `batchLoadIntervalSec` : The interval, in seconds, in which to attempt to run GCS to BigQuery load jobs. Only relevant if `enableBatchLoad` is configured. * Type: int * Default: 120 * Importance: low `convertDoubleSpecialValues` : Designates whether +Infinity is converted to Double.MAX_VALUE and whether -Infinity and NaN are converted to Double.MIN_VALUE to ensure successful delivery to BigQuery. * Type: boolean * Default: false * Importance: low `enableBatchLoad` : **Beta Feature** Use with caution. The sublist of topics to be batch loaded through GCS. * Type: list * Default: “” * Importance: low `includeKafkaData` : Whether to include an extra block containing the Kafka source topic, offset, and partition information in the resulting BigQuery rows. * Type: boolean * Default: false * Importance: low `upsertEnabled` : Enable upsert functionality on the connector through the use of record keys, intermediate tables, and periodic merge flushes. Row-matching will be performed based on the contents of record keys. This feature won’t work with SMTs that change the name of the topic and doesn’t support JSON input. * Type: boolean * Default: false * Importance: low `deleteEnabled` : Enable delete functionality on the connector through the use of record keys, intermediate tables, and periodic merge flushes. A delete will be performed when a record with a null value (that is–a tombstone record) is read. This feature will not work with SMTs that change the name of the topic and doesn’t support JSON input. * Type: boolean * Default: false * Importance: low `intermediateTableSuffix` : A suffix that will be appended to the names of destination tables to create the names for the corresponding intermediate tables. Multiple intermediate tables may be created for a single destination table, but their names will always start with the name of the destination table, followed by this suffix, and possibly followed by an additional suffix. * Type: string * Default: “tmp” * Importance: low `mergeIntervalMs` : How often (in milliseconds) to perform a merge flush, if upsert/delete is enabled. Can be set to `-1` to disable periodic flushing. * Type: long * Default: 60_000L * Importance: low `mergeRecordsThreshold` : How many records to write to an intermediate table before performing a merge flush, if upsert/delete is enabled. Can be set to `-1` to disable record count-based flushing. * Type: long * Default: -1 * Importance: low `autoCreateBucket` : Whether to automatically create the given bucket, if it does not exist. * Type: boolean * Default: true * Importance: medium `allowNewBigQueryFields` : If true, new fields can be added to BigQuery tables during subsequent schema updates. * Type: boolean * Default: false * Importance: medium `allowBigQueryRequiredFieldRelaxation` : If true, fields in BigQuery Schema can be changed from `REQUIRED` to `NULLABLE`. Note that `allowNewBigQueryFields` and `allowBigQueryRequiredFieldRelaxation` replaced the `autoUpdateSchemas` parameter of older versions of this connector. * Type: boolean * Default: false * Importance: medium `allowSchemaUnionization` : If true, the existing table schema (if one is present) will be unionized with new record schemas during schema updates. If false, the record of the last schema in a batch will be used for any necessary table creation and schema update attempts. 
Note that setting `allowSchemaUnionization` to `false` and `allowNewBigQueryFields` and `allowBigQueryRequiredFieldRelaxation` to `true` is equivalent to setting `autoUpdateSchemas` to `true` in older (pre-2.0.0) versions of this connector. This should only be enabled for Schema Registry-based inputs: Avro, Protobuf, or JSON Schema (JSON_SR). Table schema updates are not supported for JSON input.

If you set `allowSchemaUnionization` to `false`, and `allowNewBigQueryFields` and `allowBigQueryRequiredFieldRelaxation` to `true`, then when BigQuery raises a schema validation exception or a table doesn’t exist while writing a batch, the connector attempts to remediate by relaxing required fields and/or adding new fields. If `allowSchemaUnionization`, `allowNewBigQueryFields`, and `allowBigQueryRequiredFieldRelaxation` are `true`, the connector will create or update tables with a schema whose fields are a union of the existing table schema’s fields and the ones present in all of the records of the current batch. The key difference is that with unionization disabled, new record schemas have to be a superset of the table schema in BigQuery.

In general, enabling `allowSchemaUnionization` is useful for keeping pipelines running across schema changes. For instance, if you’d like to remove fields from data upstream, the updated schemas still work in the connector. Similarly, it is useful when different tasks see records whose schemas contain different fields that are not in the table. However, note with caution that if `allowSchemaUnionization` is set and some bad records are in the topic, the BigQuery schema may be permanently changed. This presents two issues: first, since BigQuery doesn’t allow columns to be dropped from tables, the accidentally added columns add unnecessary noise to the schema. Second, since BigQuery doesn’t allow column types to be modified, those columns could completely break pipelines down the road, where well-behaved records have schemas whose field names overlap with the accidentally added columns in the table, but use a different type.

* Type: boolean
* Default: false
* Importance: medium

`kafkaDataFieldName` : The Kafka data field name. The default value is null, which means the Kafka Data field will not be included.

* Type: string
* Default: null
* Importance: low

`kafkaKeyFieldName` : The Kafka key field name. The default value is null, which means the Kafka Key field will not be included.

* Type: string
* Default: null
* Importance: low

`topic2TableMap` : Map of topics to tables (optional). Format: comma-separated tuples of the form `<topic>:<table>`, for example, `topic1:table1,topic2:table2,...`. Note that the topic name should not be modified using a regex SMT while using this option. Also note that `SANITIZE_TOPICS_CONFIG` would be ignored if this config is set. Lastly, if the topic2table map doesn’t contain the topic for a record, a table with the same name as the topic name is created.

* Type: string
* Default: “”
* Importance: low

# Hybrid Deployment of Confluent Platform and Confluent Cloud using Confluent for Kubernetes

Confluent for Kubernetes (CFK) provides cloud-native automation for deploying and managing Confluent in many hybrid scenarios. When you are deploying Confluent Platform components to be connected to Confluent Cloud, provide the basic configuration required in the Confluent Platform component CRs:

* The Confluent Cloud component endpoints
* The Confluent Cloud key and password in the format that each respective Confluent Cloud component requires for authentication credentials

See the example GitHub scenarios listed below for details.
There might be additional information required, such as TLS certificates, depending on your deployment settings. Refer to the following configuration examples of the hybrid deployment of Confluent Platform connecting to Confluent Cloud: * [CFK managed Connectors, ksqlDB, and REST Proxy against a Kafka and a Schema Registry in the Confluent Cloud](https://github.com/confluentinc/confluent-kubernetes-examples/tree/master/hybrid/ccloud-integration) * [CFK managed Connect cluster connected to Confluent Cloud, installing and managing the JDBC source connector plugin through the declarative Connector CR](https://github.com/confluentinc/confluent-kubernetes-examples/tree/master/hybrid/ccloud-connect-confluent-hub) * [CFK managed Replicator cloning topics from a source Confluent Cloud cluster to a destination Confluent Cloud cluster and CFK managed Control Center monitoring the end to end flow](https://github.com/confluentinc/confluent-kubernetes-examples/tree/master/hybrid/replicator-cloud2cloud) * [CFK managed Replicator cloning topics from a source Confluent Cloud cluster to a CFK managed destination cluster](https://github.com/confluentinc/confluent-kubernetes-examples/tree/master/hybrid/replicator-source-ccloud-destCFK-tls) * [Cluster Linked Confluent Platform to Confluent Cloud](https://github.com/confluentinc/confluent-kubernetes-examples/tree/master/hybrid/clusterlink) # Manage Confluent Platform with Confluent for Kubernetes * [Overview](co-manage-overview.md) * [Manage Flink](co-manage-flink.md) * [Manage Kafka Admin REST Class](co-manage-rest-api.md) * [Manage Kafka Topics](co-manage-topics.md) * [Manage Schemas](co-manage-schemas-index.md) * [Manage Connectors](co-manage-connectors.md) * [Scale Clusters](co-scale-cluster.md) * [Scale Storage](co-scale-storage.md) * [Link Kafka Clusters](co-link-clusters.md) * [Manage Security](co-manage-security.md) * [Restart Confluent Components](co-roll-cluster.md) * [Delete Confluent Deployment](co-delete-deployment.md) * [Manage Confluent Cloud](co-manage-ccloud.md) ## Confluent APIs The following links open the associated Confluent API docs. - [Confluent REST Proxy](../kafka-rest/index.md#kafkarest-intro) - [Connect REST API](../connect/references/restapi.md#connect-userguide-rest) - [Flink REST API](../flink/clients-api/rest.md#af-rest-api) - [Schema Registry API](../schema-registry/develop/api.md#schemaregistry-api) - [ksqlDB REST API](../ksqldb/developer-guide/ksqldb-rest-api/overview.md#ksqldb-rest-api) - [Metadata API](../security/authorization/rbac/mds-api.md#mds-api) ## Offset management After the consumer receives its assignment from the coordinator, it must determine the initial position for each assigned partition. When the group is first created, before any messages have been consumed, the position is set according to a configurable offset reset policy (`auto.offset.reset`). Typically, consumption starts either at the earliest offset or the latest offset. As a consumer in the group reads messages from the partitions assigned by the coordinator, it must commit the offsets corresponding to the messages it has read. If the consumer crashes or is shut down, its partitions will be re-assigned to another member, which will begin consumption from the last committed offset of each partition. If the consumer crashes before any offset has been committed, then the consumer which takes over its partitions will use the reset policy. The offset commit policy is crucial to providing the message delivery guarantees needed by your application. 
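To ground the discussion that follows, here is a minimal, hedged sketch of a consumer that subscribes to a topic, sets the offset reset policy, and commits the offsets of the records it reads. The broker address, group ID, and topic name are placeholder assumptions, and auto-commit is disabled so the commit is explicit:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class CommittedOffsetsExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("group.id", "offset-example");           // placeholder group ID
        props.put("enable.auto.commit", "false");
        // When the group has no committed offset for a partition, start from the
        // earliest available offset instead of the latest.
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic")); // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                }
                // Commit the offsets of the records returned above; after a crash or
                // rebalance, another group member resumes from the last committed offset.
                consumer.commitSync();
            }
        }
    }
}
```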
By default, the consumer is configured to use an automatic commit policy, which triggers a commit on a periodic interval. The consumer also supports a commit API which can be used for manual offset management. Correct offset management is crucial because it affects [delivery semantics](/platform/current/streams/concepts.html#streams-concepts-processing-guarantees). By default, the consumer is configured to auto-commit offsets. The `auto.commit.interval.ms` property sets the upper time bound of the commit interval. Using auto-commit offsets can give you “at-least-once” delivery, but you must consume all data returned from a `poll(Duration timeout)` call before any subsequent `poll` calls, or before closing the consumer. To explain further: when auto-commit is enabled, every time the `poll` method is called and data is fetched, the consumer is ready to automatically commit the offsets of messages that have been returned by the poll. If the processing of these messages is not completed before the next auto-commit interval, there’s a risk of losing progress on those messages if the consumer crashes or is otherwise restarted. In this case, when the consumer restarts, it will begin consuming from the last committed offset. When this happens, the last committed position can be as old as the auto-commit interval. Any messages that have arrived since the last commit are read again. If you want to reduce the window for duplicates, you can reduce the auto-commit interval, but some users may want even finer control over offsets. The consumer therefore supports a commit API which gives you full control over offsets. Note that when you use the commit API directly, you should first disable auto-commit in the configuration by setting the `enable.auto.commit` property to `false`. Each call to the commit API results in an offset commit request being sent to the broker. Using the synchronous API, the consumer is blocked until that request returns successfully. This may reduce overall throughput since the consumer might otherwise be able to process records while that commit is pending. One way to deal with this is to increase the amount of data that is returned when polling. The consumer has a configuration setting `fetch.min.bytes` which controls how much data is returned in each fetch. The broker will hold on to the fetch until enough data is available (or `fetch.max.wait.ms` expires). The tradeoff, however, is that this also increases the amount of duplicates that have to be dealt with in a worst-case failure. A second option is to use asynchronous commits. With asynchronous commits, instead of waiting for the request to complete, the consumer sends the request and returns immediately. So if it helps performance, why not always use asynchronous commits? The main reason is that the consumer does not retry the request if the commit fails. This is something that committing synchronously gives you for free; it will retry indefinitely until the commit succeeds or an unrecoverable error is encountered. The problem with asynchronous commits is dealing with commit ordering. By the time the consumer finds out that a commit has failed, you may already have processed the next batch of messages and even sent the next commit. In this case, a retry of the old commit could cause duplicate consumption. Instead of complicating the consumer internals to try and handle this problem in a sane way, the API gives you a callback which is invoked when the commit either succeeds or fails. 
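As a minimal sketch of that callback-based asynchronous commit (the broker address, group ID, and topic name are again illustrative, and `process` stands in for your own handling logic):

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class AsyncCommitExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "example-group");
        props.put("enable.auto.commit", "false");   // disable auto-commit before using the commit API
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("example-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    process(record);
                }
                // Asynchronous commit: the poll loop is not blocked. Failed commits are
                // not retried; the callback only reports the outcome.
                consumer.commitAsync((offsets, exception) -> {
                    if (exception != null) {
                        System.err.println("Commit failed for " + offsets + ": " + exception);
                    }
                });
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        // Placeholder for application-specific processing.
        System.out.printf("processed %s-%d@%d%n", record.topic(), record.partition(), record.offset());
    }
}
```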
If you like, you can use this callback to retry the commit, but you will have to deal with the same reordering problem. Offset commit failures are merely annoying if the following commits succeed since they won’t actually result in duplicate reads. However, if the last commit fails before a rebalance occurs or before the consumer is shut down, then offsets will be reset to the last commit and you will likely see duplicates. A common pattern is therefore to combine async commits in the poll loop with sync commits on rebalances or shutdown. Committing on close is straightforward, but you need a way to hook into rebalances. Each rebalance has two phases: partition revocation and partition assignment. The revocation method is always called before a rebalance and is the last chance to commit offsets before the partitions are re-assigned. The assignment method is always called after the rebalance and can be used to set the initial position of the assigned partitions. In this case, the revocation hook is used to commit the current offsets synchronously. In general, asynchronous commits should be considered less safe than synchronous commits. Consecutive commit failures before a crash will result in increased duplicate processing. You can mitigate this danger by adding logic to handle commit failures in the callback or by mixing occasional synchronous commits, but you shouldn’t add too much complexity unless testing shows it is necessary. If you need more reliability, synchronous commits are there for you, and you can still scale up by increasing the number of topic partitions and the number of consumers in the group. But if you just want to maximize throughput and you’re willing to accept some increase in the number of duplicates, then asynchronous commits may be a good option. A somewhat obvious point, but one that’s worth making, is that asynchronous commits only make sense for “at least once” message delivery. To get “at most once,” you need to know if the commit succeeded before consuming the message. This implies a synchronous commit unless you have the ability to “unread” a message after you find that the commit failed. The examples show several detailed uses of the commit API and discuss the tradeoffs in terms of performance and reliability. When writing to an external system, the consumer’s position must be coordinated with what is stored as output. That is why the consumer stores its offset in the same place as its output. For example, a connector populates data in HDFS along with the offsets of the data it reads so that it is guaranteed that either data and offsets are both updated, or neither is. A similar pattern is followed for many other data systems that require these stronger semantics, and for which the messages do not have a primary key to allow for deduplication. This is how Kafka supports [exactly-once processing](/platform/current/streams/concepts.html#streams-concepts-processing-guarantees) in Kafka Streams, and the transactional producer or consumer can be used generally to provide exactly-once delivery when transferring and processing data between Kafka topics. Otherwise, Kafka guarantees at-least-once delivery by default, and you can implement at-most-once delivery by disabling retries on the producer and committing offsets in the consumer prior to processing a batch of messages. ## Supported Confluent Platform features for Kafka clients The following table describes the client support for various Confluent Platform features. 
| Feature | C/C++ | Go | Java | .NET | Python |
|-------------------------------------|-------|-----|------|------|--------|
| Admin API | Yes | Yes | Yes | Yes | Yes |
| Control Center metrics integration | Yes | Yes | Yes | Yes | Yes |
| Custom partitioner | Yes | No | Yes | No | No |
| Exactly Once Semantics | Yes | Yes | Yes | Yes | Yes |
| Idempotent Producer | Yes | Yes | Yes | Yes | Yes |
| Kafka Streams | No | No | Yes | No | No |
| Record Headers | Yes | Yes | Yes | Yes | Yes |
| SASL Kerberos/GSSAPI | Yes | Yes | Yes | Yes | Yes |
| SASL PLAIN | Yes | Yes | Yes | Yes | Yes |
| SASL SCRAM | Yes | Yes | Yes | Yes | Yes |
| SASL OAUTHBEARER | Yes | Yes | Yes | Yes | Yes |
| Simplified installation | Yes | Yes | Yes | Yes | Yes |
| Schema Registry | Yes | Yes | Yes | Yes | Yes |
| Topic Metadata API | Yes | Yes | Yes | Yes | Yes |

### Enter connection details In the **Create a new connection** page, enter the details for your Kafka cluster. 1. In the **General** section, enter the name of the connection and select the type of connection. - **Connection Name:** An easy-to-remember name to display in the **Resources** view. - **Connection Type:** In the dropdown, choose one of the following values: - Apache Kafka® - Confluent Cloud - Confluent Platform - WarpStream - Other 2. In the **Kafka Cluster** section, enter the bootstrap server and authentication details. Confluent for VS Code supports authenticating to Kafka with most of the commonly used SASL authentication mechanisms. - **Bootstrap Server(s):** One or more `host:port` pairs to use for establishing the initial connection. For more than one server, use a comma-separated list. - **Authentication Type:** In the dropdown, choose one of the following values: - Username & Password (SASL/PLAIN) - API Credentials (SASL/PLAIN) - SASL/SCRAM (supports both `SCRAM-SHA-256` and `SCRAM-SHA-512`) - SASL/OAUTHBEARER - Kerberos (SASL/GSSAPI) #### NOTE To use Mutual TLS (mTLS) authentication, expand the **TLS Configuration** section and enter **Key Store** and **Trust Store** details. - **Verify Server Hostname:** Enable verification that the Kafka/Schema Registry host name matches the Distinguished Name (DN) in the broker’s certificate. - **Key Store Configuration:** Certificate used by Kafka/Schema Registry to authenticate the client. This is used to configure mutual TLS (mTLS) authentication. - **Path:** The path of the Key Store file. - **Password:** The store password for the Key Store file. Key Store password is not supported for PEM format. - **Key Password:** The password of the private key in the Key Store file. - **Type:** The file format of the Key Store file. Choose from PEM, PKCS12, or JKS. - **Trust Store Configuration:** Certificates for verifying SSL/TLS connections to Kafka/Schema Registry. This is required if Kafka/Schema Registry uses a self-signed or a non-public Certificate Authority (CA). - **Path:** The path of the Trust Store file. - **Password:** The password for the Trust Store file. If a password is not set, the configured Trust Store file is used, but integrity checking of the Trust Store file is disabled. Trust Store password is not supported for PEM format. - **Key Password:** The password of the private key in the Trust Store file. - **Type:** The file format of the Trust Store file. Choose from PEM, PKCS12, or JKS. Confluent Cloud uses TLS certificates from [Let’s Encrypt](https://letsencrypt.org/), a trusted Certificate Authority (CA). 
Confluent Cloud does not support self-managed certificates for TLS encryption. For more information, see [Manage TLS Certificates](/cloud/current/cp-component/clients-cloud-config.html#manage-tls-certificates). 3. In the **Schema Registry** section, enter the URL of the Schema Registry to use for serialization. 4. Click the **Authentication Type** dropdown and choose one of the following values: - Username & Password - API Credentials - OAuth To use mutual TLS (mTLS) authentication, expand the **TLS Configuration** section and enter **Key Store** and **Trust Store** details, as shown in the previous step. #### IMPORTANT **Breaking Change:** The Catalogs API has changed from CMF 2.0 to 2.1, since the SQL support is still in Open Preview. In the new version, catalogs and databases are separate resources. A catalog references a Schema Registry instance, and databases (which reference Kafka clusters) are created as separate resources within a catalog. If you’re migrating from CMF 2.0, see the [previous documentation](https://docs.confluent.io/platform/8.0/flink/overview.html) for reference. Note that when you upgrade to CMF 2.1, CMF will automatically migrate your existing catalog objects to the new catalogs and databases format. A core concept of SQL is the table. Tables store data, represented as rows. Users can query and modify the rows of a table by running SQL queries and Data Definition Language (DDL) statements. Most database systems store, manage, and process table data internally. In contrast, Flink SQL is solely a processing engine and not a data store. Flink accesses external data storage systems to read and write data. Catalogs and databases bridge the gap between the SQL engine and external data storage systems, enabling users to access and manipulate data stored in various formats and locations. Confluent Manager for Apache Flink® features built-in Kafka catalogs to connect to Kafka and Schema Registry. A Kafka Database exposes Kafka topics as tables and derives their schema from Schema Registry. **Catalogs and Databases:** A *catalog* contains one or more *databases*. A catalog references a Schema Registry instance, which is used to derive table schemas from topic schemas. Each catalog can have multiple databases. A *database* references a Kafka cluster and contains tables that correspond to the topics in that cluster. Each topic of a Kafka cluster is represented as a TABLE in the database. **Hierarchy:** - CATALOG → references a Schema Registry instance - DATABASE → references a Kafka cluster (contained within a catalog) - TABLE → corresponds to a Kafka topic (contained within a database) Catalogs are accessible from all CMF environments, but there are ways to restrict access to specific catalogs or databases. ### Management and monitoring features Confluent Platform provides several features to supplement Kafka’s Admin API and built-in JMX monitoring. - [Confluent Control Center](/control-center/current/overview.html), which is a web-based system for managing and monitoring Kafka. It allows you to easily manage Kafka Connect and to create, edit, and manage connections to other systems. Control Center also enables you to monitor data streams from producer to consumer, assuring that every message is delivered, and measuring how long it takes to deliver messages. Using Control Center, you can build a production data pipeline based on Kafka without writing a line of code. 
- [Health+](../health-plus/index.md#health-plus), also a web-based tool to help ensure the health of your clusters and minimize business disruption with intelligent alerts, monitoring, and proactive support. - [Metrics reporter](../monitor/metrics-reporter.md#metrics-reporter) for collecting various metrics from a Kafka cluster. The metrics are produced to a topic in a Kafka cluster. ### ksqlDB Creates the Physical Plan From the logical plan, the ksqlDB engine creates the physical plan, which is a Kafka Streams DSL application with a schema. The generated code is based on the ksqlDB classes, `SchemaKStream` and `SchemaKTable`: - A ksqlDB stream is rendered as a [SchemaKStream](https://github.com/confluentinc/ksql/blob/master/ksqldb-engine/src/main/java/io/confluent/ksql/structured/SchemaKStream.java) instance, which is a [KStream](https://docs.confluent.io/current/streams/javadocs/org/apache/kafka/streams/kstream/KStream.html) with a [Schema](/platform/current/connect/javadocs/javadoc/org/apache/kafka/connect/data/Schema.html). - A ksqlDB table is rendered as a [SchemaKTable](https://github.com/confluentinc/ksql/blob/master/ksqldb-engine/src/main/java/io/confluent/ksql/structured/SchemaKTable.java) instance, which is a [KTable](https://docs.confluent.io/current/streams/javadocs/org/apache/kafka/streams/kstream/KTable.html) with a [Schema](/platform/current/connect/javadocs/javadoc/org/apache/kafka/connect/data/Schema.html). - Schema awareness is provided by the [SchemaRegistryClient](https://github.com/confluentinc/schema-registry/blob/master/client/src/main/java/io/confluent/kafka/schemaregistry/client/SchemaRegistryClient.java) class. The ksqlDB engine traverses the nodes of the logical plan and emits corresponding Kafka Streams API calls: 1. Define the source – a `SchemaKStream` or `SchemaKTable` with info from the ksqlDB metastore 2. Filter – produces another `SchemaKStream` 3. Project – `select()` method 4. Apply aggregation – Multiple steps: `rekey()`, `groupby()`, and `aggregate()` methods. ksqlDB may re-partition data if it’s not keyed with a GROUP BY phrase. 5. Filter – `filter()` method 6. Project – `select()` method for the result ![Diagram showing how the ksqlDB engine creates a physical plan for a SQL statement](ksqldb/images/ksql-statement-physical-plan.gif) If the DML statement is CREATE STREAM AS SELECT or CREATE TABLE AS SELECT, the result from the generated Kafka Streams application is a persistent query that writes continuously to its output topic until the query is terminated. ## Clients This release updates client libraries with new authentication methods, improved error handling, and more flexible APIs. * [KIP-1139 OAuth support enhancements:](https://cwiki.apache.org/confluence/display/KAFKA/KIP-1139%3A+Add+support+for+OAuth+jwt-bearer+grant+type) Kafka clients now support the `jwt-bearer` grant type for OAuth, in addition to `client_credentials`. This grant type is supported by many identity providers and avoids the need to store secrets in client configuration files. * [KIP-877 Register metrics for plugins and connectors:](https://cwiki.apache.org/confluence/display/KAFKA/KIP-877%3A+Mechanism+for+plugins+and+connectors+to+register+metrics) With KIP-877, your client-side plugins can implement the `Monitorable` interface to register their own metrics. Tags that identify the plugin are automatically injected and the metrics use the `kafka.CLIENT:type=plugins` naming convention, where `CLIENT` is either `producer`, `consumer`, or `admin`. 
* [KIP-1050 Consistent error handling for transactions:](https://cwiki.apache.org/confluence/display/KAFKA/KIP-1050%3A+Consistent+error+handling+for+Transactions) KIP-1050 groups all transactional errors into five distinct types to ensure consistent error handling across all client SDKs and Producer APIs. The five types are as follows: - Retriable: retry only - Retriable: refresh metadata and retry - Abortable - Application-Recoverable - Invalid-Configuration * [KIP-1092 Extend Consumer#close for Kafka Streams:](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=321719077) KIP-1092 adds a new `Consumer.close(CloseOptions)` method. This new method lets Kafka Streams control whether a consumer explicitly leaves its group on shutdown, which gives you finer control over rebalances. The `Consumer.close(Duration)` method is now deprecated. * [KIP-1142 List non-existent groups with dynamic configurations:](https://cwiki.apache.org/confluence/display/KAFKA/KIP-1142%3A+Allow+to+list+non-existent+group+which+has+dynamic+config) KIP-1142 enables the `ListConfigResources` API to retrieve configurations for non-existent consumer groups that have dynamic configurations defined. * **UAMI support**: You can now configure your client to use an Azure User-Assigned Managed Identity (UAMI) to authenticate with an IdP like Microsoft Entra ID. This feature uses Azure’s native identity management to fetch tokens automatically, which eliminates the need to manage static client IDs and secrets. If you are a Java client user, import the latest `kafka-client-plugin` Maven artifact. If you use Confluent-provided non-Java clients, you can use this feature with the latest version of your client. For more information, see [Configure Azure User Assigned Managed Identity OAuth for Confluent Platform](../security/authentication/sasl/oauthbearer/uami.md#oauth-uami-share). #### Next Steps - If you are extending to cloud hybrid with continuous migration from a primary self-managed cluster to the cloud, the migration is complete. - If this is a one-time migration to Confluent Cloud, the next step is to use [Replicator](../../multi-dc-deployments/replicator/replicator-quickstart.md#replicator-quickstart) to migrate your topics to the cloud cluster. - For information on how to manage schemas and storage space in Confluent Cloud through the REST API, see [Manage Schemas in Confluent Cloud](/cloud/current/sr/index.html). - Looking for a guide on how to configure and use Schema Registry in Confluent Cloud? See [Quick Start for Schema Management on Confluent Cloud](/cloud/current/get-started/schema-registry.html) and [Quick Start for Apache Kafka using Confluent Cloud](/cloud/current/get-started/index.html). ### How it Works When a client communicates to the Schema Registry HTTPS endpoint, Schema Registry passes the client credentials to Metadata Service (MDS) for authentication. MDS is a REST layer on the Kafka broker within Confluent Server, and it integrates with LDAP to authenticate end users on behalf of Schema Registry and other Confluent Platform services such as Connect, Confluent Control Center, and ksqlDB. As shown in [Scripted Confluent Platform Demo](../../tutorials/cp-demo/index.md#cp-demo), clients must have predefined LDAP entries. Once a client is authenticated, you must enforce that only authorized entities have access to the permitted resources. 
You can use [ACLs](../../confluent-security-plugins/schema-registry/authorization/sracl_authorizer.md#confluentsecurityplugins-sracl-authorizer), RBAC, or both to do so. While ACLs and RBAC can be used together or independently, RBAC is the preferred solution as it provides finer-grained authorization and a unified method for managing access across Confluent Platform. The combined authentication and authorization workflow for a Kafka client connecting to Schema Registry is shown in the diagram below. ![image](images/sr-rbac-rest-api-request.png) # Use Multi-Protocol Authentication in Confluent Platform Confluent Platform clusters support multi-protocol authentication, allowing two or more authentication protocols to be configured and used simultaneously for Confluent Platform services in your cluster. Kafka supports SASL/PLAIN, SASL/PLAIN with LDAP server, SASL/OAUTHBEARER (OAuth 2.0), SASL/GSSAPI (Kerberos), SASL/SCRAM, and mutual TLS (mTLS). All other Confluent Platform services support only OAuth 2.0, mutual TLS, and HTTP Basic Authentication. Typically, one authentication protocol is used for a given Confluent Platform service in a deployment. However, there are scenarios where supporting multiple protocols simultaneously can be beneficial. You can use Confluent Platform services to support multiple protocols for authenticating incoming client requests in certain scenarios. With multi-protocol authentication, a Confluent Platform service can be configured to authenticate clients that use OAuth 2.0, mTLS certificates, or HTTP Basic Authentication. You are not constrained to use only one protocol. * **Transitioning from HTTP Basic Authentication to OAuth 2.0**: If an application is migrating from HTTP Basic Authentication to OAuth 2.0, supporting both temporarily allows a smooth transition for clients. Existing clients can continue using HTTP Basic Authentication while new clients adopt OAuth 2.0. * **Supporting diverse clients**: Some legacy clients may only support HTTP Basic Authentication, while newer clients can leverage OAuth 2.0. Supporting both ensures the application is accessible to a wider range of clients. * **Providing options**: Offering a choice between simpler HTTP Basic Authentication for internal or trusted clients and more secure OAuth 2.0 for external clients can be appropriate in some architectures. * **Different internal and external communication**: If a Confluent Platform cluster needs to authenticate with mutual TLS for all internal platform communications (for example, Schema Registry to Confluent Server, Connect to Schema Registry, Confluent Server to Schema Registry, or Connect to Confluent Server) and use OAuth 2.0 to authenticate client applications to CP services, you can configure multi-protocol authentication to support your requirements. However, there are important considerations and best practices to keep in mind: * HTTP Basic Authentication should only be used over HTTPS to protect credentials. Because credentials are sent with every request, it is less secure than OAuth 2.0. * OAuth 2.0 is the recommended modern standard for authorization. It provides better security through tokens and allows for [granular access control](../../../_glossary.md#term-granularity). * The implementation must be carefully designed to avoid conflicts between the two authentication methods. The order of authentication filters and proper configuration is critical. 
* Long-term, fully migrating to OAuth 2.0 and phasing out HTTP Basic Authentication lets you reduce complexity and improve overall security. In summary, while supporting multiple authentication methods simultaneously can be justified in some cases, it adds complexity. The recommendation is to prefer OAuth 2.0 as the more modern, secure protocol and only introduce HTTP Basic Authentication support thoughtfully for legacy compatibility when necessary. A clear roadmap to eventually standardize on OAuth 2.0 should be part of the plan. ## Registering clusters You can use either [curl commands](#register-clusters-curl) or the Confluent Platform [CLI](https://docs.confluent.io/confluent-cli/current/command-reference/cluster/confluent_cluster_register.html) to register clusters. When registering a Confluent Platform cluster in the cluster registry, you must specify the following information: Cluster name : The new name of the Confluent Platform cluster to be used in RBAC role bindings and centralized audit logs. Cluster ID : Refer to [View a cluster ID](authorization/rbac/rbac-get-cluster-ids.md#view-cluster-ids) if you need to locate the cluster ID. Host name and port number : The host and ports defined for a cluster should only include ports that support [RBAC token authentication](authentication/sasl/oauthbearer/overview.md#rbac-token-auth). For example, in Confluent Platform clusters, do not use the interbroker port or external Kerberos or mTLS ports. This is most important when using the [Confluent Metadata API Reference for Confluent Platform](authorization/rbac/mds-api.md#mds-api) because it leverages port information when pushing configuration updates out to known Confluent Platform clusters. Protocol used by the hosts and ports : The protocol should be SASL_SSL for Confluent Platform clusters (or SASL_PLAINTEXT for non-production Confluent Platform clusters), and HTTP or HTTPS for Connect, ksqlDB, and Schema Registry clusters. Be sure to grant the appropriate [RBAC roles](authorization/rbac/rbac-predefined-roles.md#rbac-predefined-roles) (ClusterAdmin and SystemAdmin) on newly registered clusters so that users can access and use them in other configurations. Also be sure to grant the AuditAdmin role to principals who will be administering the centralized audit log configuration. For details about granting roles on registered clusters, see [Configuring role bindings for registered clusters](#cluster-registry-rolebinding). # Configure MDS to Manage Centralized Audit Logs You can use [Centralized audit logging](audit-logs-cli-config.md#audit-log-cli-config) to dynamically update an audit log configuration. Changes made through the Confluent CLI are pushed from the MDS (metadata service) out to all [registered clusters](../../cluster-registry.md#cluster-registry), allowing for centralized management of the audit log configuration and assurance that all registered clusters publish their audit log events to the same destination Kafka cluster. The MDS uses an admin client to connect to the destination Kafka cluster and inspect, create, and alter destination topics in response to certain API requests. Before you can use centralized audit logging, you must configure one of your Kafka clusters to run the metadata service (MDS), which provides API endpoints to register a list of the Kafka clusters in your organization and to centrally manage the audit log configurations of those clusters. 
This audit log configuration API pushes out to all registered clusters the rules governing which events are captured and where they are sent. It also creates missing destination topics, and keeps the retention time policies of the destination topics in sync with the audit log configuration policy. Until configured otherwise, the MDS operates on the assumption that it is a lone cluster, using its own internal admin client to configure destination topics on itself, and leaving the bootstrap servers unspecified so that audit log destination topics are on the same cluster. Because the default behavior requires less configuration, it is useful for the initial setup in a development environment. In a production setting, you should have all of the Kafka clusters publish their audit logs to a single, central destination cluster. The configuration file for each managed cluster must include the destination cluster’s connection and credential information. However, it should disable auto-creation of destination topics, and leave `confluent.security.event.router.config` unspecified. The following sections explain how to configure Kafka clusters and the MDS to manage centralized audit logs. ## Security deployment profiles The following table defines the different security options that make up a security deployment profile for Confluent Platform clusters. The security options that make up a deployment profile are: - **Authentication** - **Kafka client**: Options for Kafka clients authenticating to Confluent Server brokers - **Kafka client to non-Kafka component**: Options for Kafka clients authenticating to Schema Registry, REST Proxy, and ksqlDB - **Service-to-service**: Options for authentication between any two Confluent Platform services, for example Schema Registry to Confluent Server, Connect to Schema Registry, and so forth - **User**: Options for users authenticating using Confluent Control Center or Confluent CLI - **Authorization**: Options for controlling access to resources in your Confluent Platform cluster - **Encryption**: Options for encrypting data in motion (or data in transit) - **Identity provider protocols**: Options for integrating with external identity providers
| Profile | Kafka client authentication | Client to non-Kafka component authentication | Service-to-service authentication | User authentication | Authorization | Encryption | Identity provider protocols |
|---|---|---|---|---|---|---|---|
| 1 | mTLS or SASL with one of: PLAIN, GSSAPI, SCRAM | mTLS | mTLS | HTTP Basic Authentication | ACLs | TLS | |
| 2 | mTLS or SASL with one of: GSSAPI, SCRAM | GSSAPI or SCRAM | mTLS | HTTP Basic Authentication | ACLs | TLS | |
| 3 | mTLS or SASL with one of: PLAIN, PLAIN with LDAP server, GSSAPI, SCRAM | HTTP Basic Authentication | OAuthBearer (powered by LDAP and MDS-issued tokens) | HTTP Basic Authentication | RBAC | TLS | LDAP |
| 4 | mTLS or SASL with one of: PLAIN, PLAIN with LDAP server, GSSAPI, SCRAM | HTTP Basic Authentication | OAuthBearer (powered by LDAP and MDS-issued tokens) | OIDC (SSO) | RBAC | TLS | Both OIDC and LDAP |
| 5 | mTLS or SASL with one of: OAUTHBEARER with IdP-issued tokens, PLAIN, PLAIN with LDAP server, GSSAPI, SCRAM | HTTP Basic Authentication | OAuthBearer with IdP-issued tokens | OIDC (SSO) | RBAC | TLS | Both OIDC and LDAP |
| 6 | mTLS or SASL with one of: OAUTHBEARER with IdP-issued tokens, PLAIN, GSSAPI, SCRAM | mTLS | mTLS | HTTP Basic Authentication | ACLs | TLS | OIDC |
| 7 | mTLS or SASL with one of: OAUTHBEARER with IdP-issued tokens, PLAIN, PLAIN with LDAP server, GSSAPI, SCRAM | OAuthBearer with IdP-issued tokens or HTTP Basic Authentication | OAuthBearer with IdP-issued tokens | OIDC (SSO) | RBAC | TLS | Both OIDC and LDAP |
| 8 | mTLS or SASL with one of: OAUTHBEARER with IdP-issued tokens, PLAIN, GSSAPI, SCRAM | mTLS or OAuthBearer with IdP-issued tokens | mTLS | HTTP Basic Authentication | ACLs | TLS | OIDC |
| 9 | mTLS or SASL with one of: OAUTHBEARER with IdP-issued tokens, PLAIN, GSSAPI, SCRAM | OAuthBearer with IdP-issued tokens | OAuth | OIDC (SSO) | RBAC | TLS | OIDC |
| 10 | mTLS or SASL with one of: OAUTHBEARER with IdP-issued tokens, PLAIN, GSSAPI, SCRAM | OAuthBearer with IdP-issued tokens | mTLS | OIDC (SSO) | RBAC | TLS | OIDC |
| 11 | mTLS or SASL with one of: OAUTHBEARER with IdP-issued tokens, PLAIN, GSSAPI, SCRAM | mTLS | mTLS | OIDC (SSO) | RBAC | TLS | OIDC |
| 12 | mTLS or SASL with one of: OAUTHBEARER with IdP-issued tokens, PLAIN, GSSAPI, SCRAM | mTLS | mTLS | Basic username, password (file-based user identity management) | RBAC | TLS | Not applicable |
You can also deploy Apache Flink® within Confluent Platform. Deployments that include Flink use mTLS authentication, with one of the following for client to non-Kafka component authentication: PLAIN, GSSAPI, or SCRAM. Service-to-service and user authentication also use mTLS in Apache Flink® deployments. Authorization uses ACLs together with HTTP Basic Authentication, and TLS is the supported encryption protocol for deployments that use Flink. ## Background and context Here we recap some fundamental building blocks that will be useful for the rest of this section. This is mostly a summary of aspects of the Kafka Streams [architecture](architecture.md#streams-architecture) that impact performance. You might want to refresh your understanding of sizing just for Kafka first, by revisiting notes on [Production Deployment](../kafka/deployment.md#cp-production-recommendations) and how to [choose the number of topics/partitions](https://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/). **Kafka Streams uses Kafka’s producer and consumer APIs**: under the hood a Kafka Streams application has Kafka producers and consumers, just like a typical Kafka client. So when it comes time to add more Kafka Streams instances, think of that as adding more producers and consumers to your app. **Unit of parallelism is a task**: In Kafka Streams the basic unit of parallelism is a stream task. Think of a task as consuming from a single Kafka partition per topic and then processing those records through a graph of processor nodes. If the processing is stateful, then the task writes to state stores and produces back to one or more Kafka partitions. To improve the potential parallelism, there is just one tuning knob: choose a higher number of [partitions](https://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/) for your topics. That will automatically lead to a proportional increase in the number of tasks. **Task placement matters**: Increasing the number of partitions/tasks increases the potential for parallelism, but we must still decide where to place those tasks physically. There are two options. The first is to scale up by putting all the tasks on a single server. This is useful when the app is CPU bound and one server has a lot of CPUs. You can do this by having an app with lots of threads (the `num.stream.threads` config option, with a default of 1) or, equivalently, by running clones of the app on the same machine, each with one thread. There should not be any performance difference between the two. The second option is to scale out by spreading the tasks across more than one machine. This is useful when the app is network, memory, or disk bound, or if a single server has a limited number of CPU cores. **Load balancing is automatic**: Once you decide how many partitions you need and where to start Kafka Streams instances, the rest is automatic. The load is balanced across the tasks with no user involvement because of the consumer group management [feature](https://www.confluent.io/blog/tutorial-getting-started-with-the-new-apache-kafka-0-9-consumer-client/) that is part of Kafka. Kafka Streams benefits from it because, as mentioned earlier, it is a client of Kafka too in this context. Armed with this information, let’s look at a couple of key scenarios. 
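As a minimal sketch of the scale-up option, the following Kafka Streams snippet raises `num.stream.threads` so that more tasks run in parallel on one machine; the application ID, broker address, and topic names are illustrative, and the topology is deliberately trivial:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class ScaleUpExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "scale-up-example");   // illustrative application ID
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // illustrative broker address
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        // Scale up: run more stream threads in this single instance so that more
        // tasks (one per input partition) can be processed in parallel.
        props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 4);

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> input = builder.stream("input-topic");        // illustrative topics
        input.to("output-topic");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

Scaling out is the same code deployed on additional machines with the same application ID; the consumer group management described above redistributes the tasks automatically.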
## Configuration prerequisites Before you configure a hybrid setup, ensure the following: * Your Confluent Cloud organization has the [Advanced Governance](https://docs.confluent.io/cloud/current/stream-governance/packages.html#stream-gov-packages) package. * You have [registered at least one Confluent Platform cluster with Confluent Cloud](https://docs.confluent.io/cloud/current/usm/register/overview.html) so it is visible in the Confluent Cloud UI. * Your Schema Registry on Confluent Platform has its mode set to `READWRITE` at the global-level `:__GLOBAL:` context. Ensure no mode overrides are set at the subject or custom context level. You can remove any specific overrides by using the `DELETE` mode API against all relevant subjects and custom contexts. * You can perform this configuration using Confluent for Kubernetes, Ansible Playbooks for Confluent Platform, or by making calls directly to the Confluent REST API. The subsequent workflows provide separate instructions for each approach. * You have the necessary credentials and endpoints for both your Confluent Platform and Confluent Cloud Schema Registry instances. You can obtain the private endpoints by logging in to Confluent Cloud Console and navigating to **Environment** > **Schema Registry** > **Endpoints**. * Create an API key that uses a service account assigned the `DataSteward` role. This ensures the service account has the permissions that it needs to manage schemas and enforce governance policies across both Confluent Platform and Confluent Cloud environments. For instructions, see [Add an API key](https://docs.confluent.io/cloud/current/security/authenticate/workload-identities/service-accounts/api-keys/manage-api-keys.html#add-an-api-key). #### NOTE Before exporting schemas from Confluent Platform to Confluent Cloud, you must run the [schema-compatibility-check script](https://github.com/confluentinc/schema-registry/blob/master/bin/schema-compatibility-check) if the target Confluent Cloud environment already contains existing schemas. This script verifies that the schemas in both environments are compatible. This export process is only recommended when your Confluent Cloud and Confluent Platform environments are replicas, or when your Confluent Cloud environment is a direct subset of your Confluent Platform environment. ## Enable RBAC with parallel restart of Confluent Platform 1. Set the following and provide the required properties for RBAC in your hosts inventory file: ```yaml rbac_enabled: true ``` For a list of all the RBAC-related properties, refer to [Role-based access control](ansible-authorize.md#ansible-authz-rbac). 
Below is an example snippet:

```yaml
all:
  vars:
    ssl_enabled: true
    rbac_enabled: true
    mds_ssl_client_authentication: required

    # super user credentials for bootstrapping RBAC within Confluent Platform
    mds_super_user: mds
    mds_super_user_password: password

    # LDAP users for Confluent Platform components
    kafka_broker_ldap_user: kafka_broker
    kafka_broker_ldap_password: password
    schema_registry_ldap_user: schema_registry
    schema_registry_ldap_password: password
    kafka_connect_ldap_user: connect_worker
    kafka_connect_ldap_password: password
    ksql_ldap_user: ksql
    ksql_ldap_password: password
    kafka_rest_ldap_user: rest_proxy
    kafka_rest_ldap_password: password
    control_center_next_gen_ldap_user: control_center
    control_center_next_gen_ldap_password: password

kafka_broker:
  vars:
    kafka_broker_custom_properties:
      ldap.java.naming.factory.initial: com.sun.jndi.ldap.LdapCtxFactory
      ldap.com.sun.jndi.ldap.read.timeout: 3000
      ldap.java.naming.provider.url: ldap://ldap1:389
      ldap.java.naming.security.principal: uid=mds,OU=rbac,DC=example,DC=com
      ldap.java.naming.security.credentials: password
      ldap.java.naming.security.authentication: simple
      ldap.user.search.base: OU=rbac,DC=example,DC=com
      ldap.group.search.base: OU=rbac,DC=example,DC=com
      ldap.user.name.attribute: uid
      ldap.user.memberof.attribute.pattern: CN=(.*),OU=rbac,DC=example,DC=com
      ldap.group.name.attribute: cn
      ldap.group.member.attribute.pattern: CN=(.*),OU=rbac,DC=example,DC=com
      ldap.user.object.class: account
```

2. Run the `confluent.platform.all` playbook against your hosts inventory file (shown here as `hosts.yml`):

```bash
ansible-playbook -i hosts.yml confluent.platform.all \
  --skip-tags package \
  -e deployment_strategy=parallel
```

Include the `--skip-tags package` option to skip the package installation tasks and to ensure no upgrade happens. The option also speeds up the reconfiguration process.

### Produce Records

1. Build the producer and consumer binaries:

```bash
cargo build
```

You should see:

```text
Compiling rust_kafka_client_example v0.1.0 (/path/to/repo/examples/clients/cloud/rust)
Finished dev [unoptimized + debuginfo] target(s) in 2.85s
```

2. Run the producer, passing in arguments for: the local file with configuration parameters to connect to your Kafka cluster, and the topic name.

```bash
./target/debug/producer --config $HOME/.confluent/librdkafka.config --topic test1
```

3. Verify the producer sent all the messages. You should see:

```text
Preparing to produce record: alice 0
Preparing to produce record: alice 1
Preparing to produce record: alice 2
Preparing to produce record: alice 3
Preparing to produce record: alice 4
Preparing to produce record: alice 5
Preparing to produce record: alice 6
Preparing to produce record: alice 7
Preparing to produce record: alice 8
Successfully produced record to topic test1 partition [5] @ offset 117
Successfully produced record to topic test1 partition [5] @ offset 118
Successfully produced record to topic test1 partition [5] @ offset 119
Successfully produced record to topic test1 partition [5] @ offset 120
Successfully produced record to topic test1 partition [5] @ offset 121
Successfully produced record to topic test1 partition [5] @ offset 122
Successfully produced record to topic test1 partition [5] @ offset 123
Successfully produced record to topic test1 partition [5] @ offset 124
Successfully produced record to topic test1 partition [5] @ offset 125
```

4. View the [producer code](https://github.com/confluentinc/examples/tree/latest/clients/cloud/rust/src/producer.rs). 
#### Oracle XStream CDC Source connector The [Source connector service account](#cloud-service-account-source-connectors) section provides basic ACL entries for source connector service accounts. The Oracle XStream CDC Source connector requires additional ACL entries. Add the following ACL entries for the Oracle XStream CDC Source connector: * ACLs to create and write to change event topics prefixed with ``. Use the following commands to set these ACLs: ```none confluent kafka acl create --allow --service-account "" \ --operations create --prefix --topic "" ``` ```none confluent kafka acl create --allow --service-account "" \ --operations write --prefix --topic "" ``` * ACLs to describe configurations at the cluster scope level. Use the following commands to set these ACLs: ```none confluent kafka acl create --allow --service-account "" \ --cluster-scope --operations describe ``` ```none confluent kafka acl create --allow --service-account "" \ --cluster-scope --operations describe_configs ``` * ACLs to read, create, and write to schema history topics prefixed with `__orcl-schema-changes..lcc-`. Use the following commands to set these ACLs: ```none confluent kafka acl create --allow --service-account "" \ --operations read --prefix --topic "__orcl-schema-changes..lcc-" ``` ```none confluent kafka acl create --allow --service-account "" \ --operations create --prefix --topic "__orcl-schema-changes..lcc-" ``` ```none confluent kafka acl create --allow --service-account "" \ --operations write --prefix --topic "__orcl-schema-changes..lcc-" ``` * ACLs to read the schema history consumer group named `-schemahistory`. Use the following command to set this ACL: ```none confluent kafka acl create --allow --service-account "" \ --operations "read" --consumer-group "-schemahistory" ``` The following additional ACL entries are required if heartbeats are enabled for the connector using the `heartbeat.interval.ms` configuration property. * ACLs to read, create, and write to heartbeat topics prefixed with `__orcl-heartbeat.lcc-`. Use the following commands to set these ACLs: ```none confluent kafka acl create --allow --service-account "" \ --operations read --prefix --topic "__orcl-heartbeat.lcc-" ``` ```none confluent kafka acl create --allow --service-account "" \ --operations create --prefix --topic "__orcl-heartbeat.lcc-" ``` ```none confluent kafka acl create --allow --service-account "" \ --operations write --prefix --topic "__orcl-heartbeat.lcc-" ``` The following additional ACL entries are required if signaling using a Kafka topic is enabled and configured for the connector using the `signal.enabled.channels` and `signal.kafka.topic` configuration properties. * ACLs to read from the signaling topic. Use the following command to set this ACL: ```none confluent kafka acl create --allow --service-account "" \ --operations read --topic "" ``` * ACLs to read the Kafka signaling consumer group named `kafka-signal`. Use the following command to set this ACL: ```none confluent kafka acl create --allow --service-account "" \ --operations "read" --consumer-group "kafka-signal" ``` ## Generate the Delta Configurations 1. Run the script, passing in the configuration file `/tmp/myconfig.properties` you defined above. Reminder: you cannot use the `~/.ccloud/config.json` generated by the Confluent Cloud CLI for other Confluent Platform components or clients, which is why you need to manually create your own key=value properties file in the previous section. 
```bash ./ccloud-generate-cp-configs.sh /tmp/myconfig.properties ``` 2. Verify that your output resembles: ```bash Confluent Platform Components: delta_configs/schema-registry-ccloud.delta delta_configs/replicator-to-ccloud-producer.delta delta_configs/ksql-server-ccloud.delta delta_configs/ksql-datagen.delta delta_configs/control-center-ccloud.delta delta_configs/connect-ccloud.delta delta_configs/connector-ccloud.delta delta_configs/ak-tools-ccloud.delta Kafka Clients: delta_configs/java_producer_consumer.delta delta_configs/java_streams.delta delta_configs/python.delta delta_configs/dotnet.delta delta_configs/go.delta delta_configs/node.delta delta_configs/cpp.delta delta_configs/env.delta ``` 3. Add the delta configuration output to the respective component’s properties file. Remember that these are the *delta* configurations, not the complete configurations. ## Step 2: Apply the Deduplicate Topic action In the previous step, you created a Flink table that had duplicate rows. In this step, you apply the Deduplicate Topic action to create an output table that has only unique rows. 1. In the navigation menu, click **Data portal**. 2. In the **Data portal** page, click the **Environment** dropdown menu and select the environment for your workspace. 3. In the **Recently created** section, find your **users** topic and click it to open the details pane. 4. Click **Actions**, and in the Actions list, click **Deduplicate topic** to open the **Deduplicate topic** dialog. 5. In the **Fields to deduplicate** dropdown, select **user_id**. Flink uses the deduplication field as the output message key. This means that the output topic’s row key may be different from the input topic’s row key, because the deduplication statement’s DISTRIBUTED BY clause determines the output topic’s key. For this example, the output message key is the `user_id` field. 6. In the **Compute pool** dropdown, select the compute pool you want to use. 7. (Optional) In the **Runtime configuration** section, select **Run with a service account** to run the deduplicate query with a service account principal. Use this option for production queries. #### NOTE The service account you select must have the DeveloperManage and DeveloperWrite roles to create topics, schemas, and run Flink statements. For more information, see [Grant Role-Based Access](../operate-and-deploy/flink-rbac.md#flink-rbac). 8. Click the **Show SQL** toggle to view the statement that the action will run. For this example, the deduplication query depends on the `registertime` field, so you must modify the generated statement to use the `registertime` field as the field to sort on. 9. Click **Open SQL editor** to modify the statement. A Flink workspace opens with the generated statement in the cell. 10. In the cell, replace `$rowtime` with `registertime` in the `ORDER BY` clause. ```sql CREATE TABLE ``.``.`users_deduplicate` ( PRIMARY KEY (`user_id`) NOT ENFORCED ) DISTRIBUTED BY HASH( `user_id` ) WITH ( 'changelog.mode' = 'upsert', 'value.format'='avro-registry', 'key.format'='avro-registry' ) AS SELECT `user_id`, `registertime`, `gender`, `regionid` FROM ( SELECT *, ROW_NUMBER() OVER (PARTITION BY `user_id` ORDER BY registertime ASC) AS row_num FROM ``.``.`users`) WHERE row_num = 1; ``` 11. Click **Run** to execute the deduplication query. The CREATE TABLE AS SELECT statement creates the `users_deduplicate` table and populates it with rows from the `users` table using a [deduplication query](../reference/queries/deduplication.md#flink-sql-deduplication). 12. 
When the **Statement status** changes to **Running**, you can query the `users_deduplicate` table. ### On-Premises ```none --role string REQUIRED: Role name of the new role binding. --principal string REQUIRED: Principal type and identifier using "Prefix:ID" format. --kafka-cluster string Kafka cluster ID for the role binding. --schema-registry-cluster string Schema Registry cluster ID for the role binding. --ksql-cluster string ksqlDB cluster ID for the role binding. --connect-cluster string Kafka Connect cluster ID for the role binding. --cmf string Confluent Managed Flink (CMF) ID, which specifies the CMF scope. --flink-environment string Flink environment ID, which specifies the Flink environment scope. --cluster-name string Cluster name to uniquely identify the cluster for role binding listings. --context string CLI context name. --resource string Resource type and identifier using "Prefix:ID" format. --prefix Whether the provided resource name is treated as a prefix pattern. --client-cert-path string Path to client cert to be verified by MDS. Include for mTLS authentication. --client-key-path string Path to client private key, include for mTLS authentication. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") ``` ### On-Premises ```none --role string REQUIRED: Role name of the existing role binding. --principal string REQUIRED: Principal type and identifier using "Prefix:ID" format. --force Skip the deletion confirmation prompt. --kafka-cluster string Kafka cluster ID for the role binding. --schema-registry-cluster string Schema Registry cluster ID for the role binding. --ksql-cluster string ksqlDB cluster ID for the role binding. --connect-cluster string Kafka Connect cluster ID for the role binding. --cmf string Confluent Managed Flink (CMF) ID, which specifies the CMF scope. --flink-environment string Flink environment ID, which specifies the Flink environment scope. --cluster-name string Cluster name to uniquely identify the cluster for role binding listings. --context string CLI context name. --resource string Resource type and identifier using "Prefix:ID" format. --prefix Whether the provided resource name is treated as a prefix pattern. --client-cert-path string Path to client cert to be verified by MDS. Include for mTLS authentication. --client-key-path string Path to client private key, include for mTLS authentication. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") ``` ### On-Premises ```none --principal string Principal ID, which limits role bindings to this principal. If unspecified, list all principals and role bindings. --current-user List role bindings assigned to the current user. --role string Predefined role assigned to "--principal". If "--principal" is unspecified, list all principals assigned the role. --kafka-cluster string Kafka cluster ID, which specifies the Kafka cluster scope. --schema-registry-cluster string Schema Registry cluster ID, which specifies the Schema Registry cluster scope. --ksql-cluster string ksqlDB cluster ID, which specifies the ksqlDB cluster scope. --connect-cluster string Kafka Connect cluster ID, which specifies the Connect cluster scope. --cmf string Confluent Managed Flink (CMF) ID, which specifies the CMF scope. --flink-environment string Flink environment ID, which specifies the Flink environment scope. --client-cert-path string Path to client cert to be verified by MDS. Include for mTLS authentication. 
--client-key-path string Path to client private key, include for mTLS authentication. --context string CLI context name. --cluster-name string Cluster name, which specifies the cluster scope. --resource string Resource type and identifier using "Prefix:ID" format. If specified with "--role" and no principals, list all principals and role bindings. --inclusive List role bindings for specified scopes and nested scopes. Otherwise, list role bindings for the specified scopes. If scopes are unspecified, list only organization-scoped role bindings. -o, --output string Specify the output format as "human", "json", or "yaml". (default "human") ``` ### Step 2: Produce and consume with Confluent CLI The following is an example CLI command to produce to `test-topic`: ```text confluent kafka topic produce test-topic \ --protocol SASL_SSL \ --sasl-mechanism PLAIN \ --bootstrap ":19091,:19092" \ --username admin --password secret \ --ca-location scripts/security/snakeoil-ca-1.crt ``` - Specify `--protocol SASL_SSL` for the SASL_SSL/PLAIN authentication. - Specify `--sasl-mechanism PLAIN`, which is the mechanism used with the SASL_SSL protocol. The default is `PLAIN`, so it can be omitted in this scenario. - `--bootstrap` is the list of hosts that the producer/consumer talks to. The list should be the same as what you configured in Step 1. Hosts should be separated by commas. - `--username` and `--password` are the credentials you have set up in the JAAS configuration. They can be passed as flags, or you can wait for the CLI to prompt for them. The second option is more secure. - `--ca-location` is the path to the CA certificate verifying the broker’s key, and it’s required for SSL verification. For more information about setting up this flag, refer to [this document](https://github.com/edenhill/librdkafka/blob/master/INTRODUCTION.md#ssl). ## Configure cluster for client monitoring Use the following configurations to add and update the properties file of every Kafka broker in the cluster. Considerations: - KRaft properties file: `server.properties` - File system location: `/kafka_2.13-3.8.0/config/kraft/server.properties` 1. Add the following configurations to the properties file of every Kafka broker. Telemetry Reporter configurations to add: ```none confluent.telemetry.external.client.metrics.push.enabled=true confluent.telemetry.external.client.metrics.delta.temporality=false confluent.telemetry.external.client.metrics.subscription.interval.ms.list=60000 confluent.telemetry.external.client.metrics.subscription.metrics.list=org.apache.kafka.consumer.fetch.manager.fetch.latency.avg,org.apache.kafka.consumer.connection.creation.total,org.apache.kafka.consumer.fetch.manager.fetch.total,org.apache.kafka.consumer.fetch.manager.bytes.consumed.rate,org.apache.kafka.producer.bufferpool.wait.ratio,org.apache.kafka.producer.record.queue.time.avg,org.apache.kafka.producer.request.latency.avg,org.apache.kafka.producer.produce.throttle.time.avg,org.apache.kafka.producer.connection.creation.total,org.apache.kafka.producer.request.total,org.apache.kafka.producer.topic.byte.rate ``` 2. Update the following configuration in the properties file of every Kafka broker. 
Telemetry Reporter configurations to update: ```none confluent.telemetry.exporter._c3.metrics.include=io.confluent.kafka.server.request.(?!.*delta).*|io.confluent.kafka.server.socket_server.connections|io.confluent.kafka.server.server.broker.state|io.confluent.kafka.server.replica.manager.leader.count|io.confluent.kafka.server.request.queue.size|io.confluent.kafka.server.broker.topic.failed.produce.requests.rate.1.min|io.confluent.kafka.server.tier.archiver.total.lag|io.confluent.kafka.server.request.total.time.ms.p99|io.confluent.kafka.server.broker.topic.failed.fetch.requests.rate.1.min|io.confluent.kafka.server.broker.topic.total.fetch.requests.rate.1.min|io.confluent.kafka.server.partition.caught.up.replicas.count|io.confluent.kafka.server.partition.observer.replicas.count|io.confluent.kafka.server.tier.tasks.num.partitions.in.error|io.confluent.kafka.server.broker.topic.bytes.out.rate.1.min|io.confluent.kafka.server.request.total.time.ms.p95|io.confluent.kafka.server.controller.active.controller.count|io.confluent.kafka.server.session.expire.listener.zookeeper.disconnects.total|io.confluent.kafka.server.request.total.time.ms.p999|io.confluent.kafka.server.controller.active.broker.count|io.confluent.kafka.server.request.handler.pool.request.handler.avg.idle.percent.rate.1.min|io.confluent.kafka.server.session.expire.listener.zookeeper.disconnects.rate.1.min|io.confluent.kafka.server.controller.unclean.leader.elections.rate.1.min|io.confluent.kafka.server.replica.manager.partition.count|io.confluent.kafka.server.controller.unclean.leader.elections.total|io.confluent.kafka.server.partition.replicas.count|io.confluent.kafka.server.broker.topic.total.produce.requests.rate.1.min|io.confluent.kafka.server.controller.offline.partitions.count|io.confluent.kafka.server.socket.server.network.processor.avg.idle.percent|io.confluent.kafka.server.partition.under.replicated|io.confluent.kafka.server.log.log.start.offset|io.confluent.kafka.server.log.tier.size|io.confluent.kafka.server.log.size|io.confluent.kafka.server.tier.fetcher.bytes.fetched.total|io.confluent.kafka.server.request.total.time.ms.p50|io.confluent.kafka.server.tenant.consumer.lag.offsets|io.confluent.kafka.server.session.expire.listener.zookeeper.expires.rate.1.min|io.confluent.kafka.server.log.log.end.offset|io.confluent.kafka.server.broker.topic.bytes.in.rate.1.min|io.confluent.kafka.server.partition.under.min.isr|io.confluent.kafka.server.partition.in.sync.replicas.count|io.confluent.telemetry.http.exporter.batches.dropped|io.confluent.telemetry.http.exporter.items.total|io.confluent.telemetry.http.exporter.items.succeeded|io.confluent.telemetry.http.exporter.send.time.total.millis|io.confluent.kafka.server.controller.leader.election.rate.(?!.*delta).*|io.confluent.telemetry.http.exporter.batches.failed|org.apache.kafka.consumer.(fetch.manager.fetch.latency.avg|connection.creation.total|fetch.manager.fetch.total|fetch.manager.bytes.consumed.rate)|org.apache.kafka.producer.(bufferpool.wait.ratio|record.queue.time.avg|request.latency.avg|produce.throttle.time.avg|connection.creation.total|request.total|topic.byte.rate) ``` 3. Restart every Kafka broker. ### Connect Configuration The Connect properties file (`/CONFLUENT_HOME/etc/schema-registry/connect-avro-distributed.properties`) must be configured to use the same security protocol as the Kafka broker. For this example, `SASL_PLAINTEXT` is used for the producer, consumer, the producer monitoring interceptor, and the consumer monitoring interceptor. 
## Quick Start In this quick start, you will configure the Data Diode Connector to replicate records in the topic `diode` to the topic `dest_diode`. Start the services with one command using Confluent CLI. ```bash |confluent_start| ``` Next, create two topics - `diode` and `dest_diode`. ```bash ./bin/kafka-topics --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic diode ./bin/kafka-topics --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic dest_diode ``` Next, start the console producer and import a few records to the `diode` topic. ```bash ./bin/kafka-console-producer --broker-list localhost:9092 --topic diode ``` Then, add records (one per line) in the console producer. ```bash silicon resistor transistor capacitor amplifier ``` This publishes five records to the Kafka topic `diode`. Keep the window open. Next, load the Source connector (a sketch of the two properties files used here appears at the end of this quick start). ```bash ./bin/confluent local load datadiode-source-connector --config ./etc/kafka-connect-datadiode/DataDiodeSourceConnector.properties ``` Your output should resemble the following: ```bash { "name": "datadiode-source-connector", "config": { "connector.class": "io.confluent.connect.diode.source.DataDiodeSourceConnector", "tasks.max": "1", "kafka.topic.prefix": "dest_", "key.converter": "org.apache.kafka.connect.converters.ByteArrayConverter", "value.converter": "org.apache.kafka.connect.converters.ByteArrayConverter", "header.converter": "org.apache.kafka.connect.converters.ByteArrayConverter", "diode.port": "3456", "diode.encryption.password": "supersecretpassword", "diode.encryption.salt": "secretsalt" }, "tasks": [], "type": null } ``` Next, load the Sink connector. ```bash ./bin/confluent local load datadiode-sink-connector --config ./etc/kafka-connect-datadiode/DataDiodeSinkConnector.properties ``` Your output should resemble the following: ```bash { "name": "datadiode-sink-connector", "config": { "connector.class": "io.confluent.connect.diode.sink.DataDiodeSinkConnector", "tasks.max": "1", "topics": "diode", "key.converter": "org.apache.kafka.connect.converters.ByteArrayConverter", "value.converter": "org.apache.kafka.connect.converters.ByteArrayConverter", "header.converter": "org.apache.kafka.connect.converters.ByteArrayConverter", "diode.host": "10.12.13.15", "diode.port": "3456", "diode.encryption.password": "supersecretpassword", "diode.encryption.salt": "secretsalt" }, "tasks": [], "type": null } ``` View the Connect worker log and verify that the connectors started successfully. ```bash confluent local services connect log ``` Finally, check that records are now available in the `dest_diode` topic. ```bash ./bin/kafka-console-consumer --bootstrap-server localhost:9092 --topic dest_diode --from-beginning ``` You should see five records in the consumer. If you have the console producer running, you can create additional records. These additional records should be immediately visible in the consumer.
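If you need to inspect or adjust the connector configuration before loading it (for example, to change the diode host, port, or encryption settings), the two properties files referenced above correspond to the configuration echoed in the output. The following is a rough sketch of equivalent files using the placeholder values from that output; it is illustrative only and not the packaged files themselves.

```bash
# Sketch only: recreate the two Data Diode config files with the values shown
# in the example output above. All values are placeholders.
cat > ./etc/kafka-connect-datadiode/DataDiodeSourceConnector.properties <<'EOF'
name=datadiode-source-connector
connector.class=io.confluent.connect.diode.source.DataDiodeSourceConnector
tasks.max=1
kafka.topic.prefix=dest_
key.converter=org.apache.kafka.connect.converters.ByteArrayConverter
value.converter=org.apache.kafka.connect.converters.ByteArrayConverter
header.converter=org.apache.kafka.connect.converters.ByteArrayConverter
diode.port=3456
diode.encryption.password=supersecretpassword
diode.encryption.salt=secretsalt
EOF

cat > ./etc/kafka-connect-datadiode/DataDiodeSinkConnector.properties <<'EOF'
name=datadiode-sink-connector
connector.class=io.confluent.connect.diode.sink.DataDiodeSinkConnector
tasks.max=1
topics=diode
key.converter=org.apache.kafka.connect.converters.ByteArrayConverter
value.converter=org.apache.kafka.connect.converters.ByteArrayConverter
header.converter=org.apache.kafka.connect.converters.ByteArrayConverter
diode.host=10.12.13.15
diode.port=3456
diode.encryption.password=supersecretpassword
diode.encryption.salt=secretsalt
EOF
```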
### Property-based example 1. Create a `gcs-source-connector.properties` file with the following contents. This file is included with the connector in `etc/kafka-connect-gcs/gcs-source-connector.properties`. This configuration is typically used with [standalone workers](/platform/current/connect/concepts.html#standalone-workers): ```properties name=gcs-source tasks.max=1 connector.class=io.confluent.connect.gcs.GcsSourceConnector # enter the bucket name and GCS credentials here gcs.bucket.name= gcs.credentials.path= format.class=io.confluent.connect.gcs.format.avro.AvroFormat confluent.topic.bootstrap.servers=localhost:9092 confluent.topic.replication.factor=1 # for production environments, enter the Confluent license here # confluent.license= ``` 2.
Edit the `gcs-source-connector.properties` file to add the following properties: ```properties transforms=AddPrefix transforms.AddPrefix.type=org.apache.kafka.connect.transforms.RegexRouter transforms.AddPrefix.regex=.* transforms.AddPrefix.replacement=copy_of_$0 ``` #### IMPORTANT Adding this renames the output topic of the messages to `copy_of_gcs_topic`. This prevents a continuous feedback loop of messages. 3. Load the Backup and Restore GCS Source connector. ```bash confluent local load gcs-source --config gcs-source-connector.properties ``` #### IMPORTANT Don’t use the [Confluent CLI](https://docs.confluent.io/confluent-cli/current/index.html) in production environments. 4. Confirm that the connector is in a `RUNNING` state. ```bash confluent local status gcs-source ``` 5. Confirm that the messages are being sent to Kafka. ```bash kafka-avro-console-consumer \ --bootstrap-server localhost:9092 \ --property schema.registry.url=http://localhost:8081 \ --topic copy_of_gcs_topic \ --from-beginning | jq '.' ``` 6. The response should be 9 records as follows. ```bash {"f1": "value1"} {"f1": "value2"} {"f1": "value3"} {"f1": "value4"} {"f1": "value5"} {"f1": "value6"} {"f1": "value7"} {"f1": "value8"} {"f1": "value9"} ``` ### Property-based example 1. Create a `gcs-source-connector.properties` file with the following contents. This file is included with the connector in `etc/kafka-connect-gcs/gcs-source-connector.properties`. This configuration is typically used with [standalone workers](/platform/current/connect/concepts.html#standalone-workers): ```json { "name" : "GCSSourceConnector", "config" : { "format.class": "io.confluent.connect.gcs.format.avro.AvroFormat", "connector.class" : "io.confluent.connect.gcs.GcsSourceConnector", "gcs.bucket.name" : "", "gcs.credentials.path" : "", "tasks.max" : "1", "confluent.topic.bootstrap.servers" : "localhost:9092", "confluent.topic.replication.factor" : "1", "confluent.license" : "" } } ``` 2. Edit the `gcs-source-connector.properties` file to add the following properties: ```json { "transforms" : "AddPrefix", "transforms.AddPrefix.type" : "org.apache.kafka.connect.transforms.RegexRouter", "transforms.AddPrefix.regex" : ".*", "transforms.AddPrefix.replacement" : "copy_of_$0" } ``` Adding the previous properties renames the output topic of the messages to `copy_of_gcs_topic`, which prevents a continuous feedback loop of messages. 3. Load the Generalized GCS Source connector. ```bash confluent local load gcs-source --config gcs-source-connector.properties ``` #### IMPORTANT Don’t use the local CLI commands in a production environment. The [Confluent CLI](https://docs.confluent.io/confluent-cli/current/index.html) is intended for production environments. See [confluent local](https://docs.confluent.io/confluent-cli/current/command-reference/local/confluent_local_current.html#confluent-local-current) for more information about local CLI commands. 4. Verify the connector is in a `RUNNING` state. ```bash confluent local status gcs-source ``` 5. Verify messages are being sent to Kafka. ```bash kafka-avro-console-consumer \ --bootstrap-server localhost:9092 \ --property schema.registry.url=http://localhost:8081 \ --topic copy_of_gcs_topic \ --from-beginning | jq '.' ``` 6.
The response should be 9 records as shown in the following example: ```bash {"f1": "value1"} {"f1": "value2"} {"f1": "value3"} {"f1": "value4"} {"f1": "value5"} {"f1": "value6"} {"f1": "value7"} {"f1": "value8"} {"f1": "value9"} ``` ## Solace Quick Start This quick start uses the JMS Sink connector to consume records from Kafka and send them to a Solace PubSub+ broker. 1. Start the [Solace PubSub+ Standard](https://solace.com/software/) broker. ```bash docker run -d --name "solace" \ -p 8080:8080 -p 55555:55555 \ --shm-size=1000000000 \ --ulimit nofile=2448:38048 \ -e username_admin_globalaccesslevel=admin \ -e username_admin_password=admin \ -e system_scaling_maxconnectioncount=100 \ solace/solace-pubsub-standard:9.1.0.77 ``` 2. Create a Solace Queue in the `default` Message VPN. 1. Once the solace docker container has started, navigate to [http://localhost:8080](http://localhost:8080) in your browser and login with `admin`/`admin`. 2. Select the `default` Message VPN on the home screen. 3. Select “Queues” in the left menu to navigate to the Queues page. 4. On the Queues page, select the “+ Queue” button in the upper right and name the Queue `connector-quickstart`. 3. Install the connector through the [Confluent Hub Client](/kafka-connectors/self-managed/confluent-hub/client.html). ```bash # run from your Confluent Platform installation directory confluent connect plugin install confluentinc/kafka-connect-jms-sink:latest ``` 4. [Download the sol-jms jar](https://mvnrepository.com/artifact/com.solacesystems/sol-jms) and copy it into the JMS Sink connector’s plugin folder. This needs to be done on every Connect worker node and the workers must be restarted to pick up the client jar. 5. Start Confluent Platform. ```bash confluent local start ``` 6. [Produce](https://docs.confluent.io/current/cli/command-reference/confluent-produce.html) test data to the `jms-messages` topic in Kafka. ```bash seq 10 | confluent local produce jms-messages ``` 7. Create a `jms-sink.json` file with the following contents: ```json { "name": "JmsSinkConnector", "config": { "connector.class": "io.confluent.connect.jms.JmsSinkConnector", "tasks.max": "1", "topics": "jms-messages", "java.naming.factory.initial": "com.solacesystems.jndi.SolJNDIInitialContextFactory", "java.naming.provider.url": "smf://localhost:55555", "java.naming.security.principal": "admin", "java.naming.security.credentials": "admin", "connection.factory.name": "/jms/cf/default", "Solace_JMS_VPN": "default", "jms.destination.type": "queue", "jms.destination.name": "connector-quickstart", "key.converter": "org.apache.kafka.connect.storage.StringConverter", "value.converter": "org.apache.kafka.connect.storage.StringConverter", "confluent.topic.bootstrap.servers": "localhost:9092", "confluent.topic.replication.factor": "1" } } ``` 8. Load the JMS Sink connector. ```bash confluent local load jms --config jms-sink.json ``` #### IMPORTANT Don’t use the [Confluent CLI](/confluent-cli/current/index.html) in production environments. 9. Confirm that the connector is in a `RUNNING` state. ```bash confluent local status jms ``` 10. Navigate to the [Solace UI](http://localhost:8080) to confirm the messages were delivered to the `connector-quickstart` queue. ### Capturing Redo logs only 1. If not running, start Confluent Platform. ```text confluent local start ``` 2. Create the following connector configuration JSON file and save the file as `config1.json`. 
Note the following [configuration property](configuration-properties.md#connect-oracle-cdc-source-config) entries: * Configure the connector with a new `name`. * Set `table.topic.name.template` to an empty string. * Set `table.inclusion.regex` to capture several tables. * (Optional) Use `redo.log.topic.name` to rename the redo log. * (Optional) Set `redo.log.corruption.topic` to specify the topic where you want to record corrupted records. ```json { "name": "SimpleOracleCDC_1", "config":{ "connector.class": "io.confluent.connect.oracle.cdc.OracleCdcSourceConnector", "name": "SimpleOracleCDC_1", "tasks.max":1, "key.converter": "io.confluent.connect.avro.AvroConverter", "key.converter.schema.registry.url": "http://localhost:8081", "value.converter": "io.confluent.connect.avro.AvroConverter", "value.converter.schema.registry.url": "http://localhost:8081", "confluent.topic.bootstrap.servers":"localhost:9092", "oracle.server": "", "oracle.port": 1521, "oracle.sid":"", "oracle.pdb.name":"", "oracle.username": "", "oracle.password": "", "start.from":"snapshot", "redo.log.topic.name": "redo-log-topic-1", "table.inclusion.regex":"", "_table.topic.name.template_":"Set to an empty string to disable generating change event records", "table.topic.name.template": "", "connection.pool.max.size": 20, "confluent.topic.replication.factor":1, "topic.creation.groups": "redo", "topic.creation.redo.include": "redo-log-topic", "topic.creation.redo.replication.factor": 3, "topic.creation.redo.partitions": 1, "topic.creation.redo.cleanup.policy": "delete", "topic.creation.redo.retention.ms": 1209600000, "topic.creation.default.replication.factor": 3, "topic.creation.default.partitions": 5, "topic.creation.default.cleanup.policy": "compact" } } ``` 3. Enter the following command to start the connector: ```text curl -s -X POST -H 'Content-Type: application/json' --data @config1.json http://localhost:8083/connectors | jq ``` 4. Enter the following command to get the connector status: ```text curl -s -X GET -H 'Content-Type: application/json' http://localhost:8083/connectors/SimpleOracleCDC_1/status | jq ``` 5. Verify the following connector operations are successful: - The connector is started with one running task (see the following note). - The connector produces records whenever DML events (`INSERT`, `UPDATE`, AND `DELETE`) occur for captured tables. - The connector does not produce records for tables that were not included in regex or were explicitly excluded with `table.exclusion.regex`. - If the `redo.log.corruption.topic` is configured, the connector sends corrupted records to the specified corruption topic. #### NOTE If using the property `"start.from":"snapshot"`, the redo log topic contains only database operations completed after the connector starts. 6. Enter the following command to check Kafka topics: ```text kafka-topics --list --zookeeper localhost:2181 ``` If there are operations on the tables after the connector starts, you should see the topic configured by the `redo-log-topic` property. If no operations have occurred, there should be nothing displayed other than internal topics. 7. Consume records using the Avro console consumer. ```text kafka-avro-console-consumer --topic redo-log-topic-1 \ --partition 0 --offset earliest --bootstrap-server localhost:9092 \ --property schema.registry.url=http://localhost:8081 ``` If there are operations on the tables after the connector starts, you should see records displayed. If no operations have occurred, there should be no records. 8. 
Check for errors in the log: ```text confluent local services connect log | grep "ERROR" ``` 9. After you finish testing, enter the following command to clean up the running configuration: ```text confluent local services destroy ``` ### Capturing Redo logs and Change Event logs 1. If not running, start Confluent Platform. ```text confluent local start ``` 2. Create the following connector configuration JSON file and save the file as `config2.json`. ```json { "name": "SimpleOracleCDC_2", "config":{ "connector.class": "io.confluent.connect.oracle.cdc.OracleCdcSourceConnector", "name": "SimpleOracleCDC_2", "tasks.max":3, "key.converter": "io.confluent.connect.avro.AvroConverter", "key.converter.schema.registry.url": "http://localhost:8081", "value.converter": "io.confluent.connect.avro.AvroConverter", "value.converter.schema.registry.url": "http://localhost:8081", "confluent.topic.bootstrap.servers":"localhost:9092", "oracle.server": "", "oracle.port": 1521, "oracle.sid":"", "oracle.pdb.name":"", "oracle.username": "", "oracle.password": "", "start.from":"snapshot", "redo.log.topic.name": "redo-log-topic-2", "redo.log.consumer.bootstrap.servers":"localhost:9092", "table.inclusion.regex":"", "_table.topic.name.template_":"Using template vars to set change event topic for each table", "table.topic.name.template": "${databaseName}.${schemaName}.${tableName}", "connection.pool.max.size": 20, "confluent.topic.replication.factor":1, "topic.creation.groups": "redo", "topic.creation.redo.include": "redo-log-topic-2", "topic.creation.redo.replication.factor": 3, "topic.creation.redo.partitions": 1, "topic.creation.redo.cleanup.policy": "delete", "topic.creation.redo.retention.ms": 1209600000, "topic.creation.default.replication.factor": 3, "topic.creation.default.partitions": 5, "topic.creation.default.cleanup.policy": "compact" } } ``` 3. Create `redo-log-topic-2`. Make sure the topic name matches the value you put for `"redo.log.topic.name"`. ```text bin/kafka-topics --create --topic redo-log-topic-2 \ --bootstrap-server broker:9092 --replication-factor 1 \ --partitions 1 --config cleanup.policy=delete \ --config retention.ms=120960000 ``` 4. Enter the following command to start the connector: ```text curl -s -X POST -H 'Content-Type: application/json' --data @config2.json http://localhost:8083/connectors | jq ``` 5. Enter the following command to get the connector status: ```text curl -s -X GET -H 'Content-Type: application/json' http://localhost:8083/connectors/SimpleOracleCDC_2/status | jq ``` 6. Verify the connector starts with three running tasks. 7. Perform `INSERT`, `UPDATE`, and `DELETE` row operations for each table and verify the following expected results: - The connector creates a redo log topic and change event log topics for each captured table. - Redo log events are generated starting from current time. - The change event log for each table contains snapshot events (`op_type=R`) followed by other types of events. 8. Enter the following command to check Kafka topics: ```text kafka-topics --list --zookeeper localhost:2181 ``` You should see the topic configured by the `redo-log-topic` property and topics in the form of `${databaseName}.${schemaName}.${tableName}`. 9. Consume records using the Avro console consumer. 
```text kafka-avro-console-consumer --topic redo-log-topic-2 \ --partition 0 --offset earliest --bootstrap-server localhost:9092 \ --property schema.registry.url=http://localhost:8081 ``` You should see change event records with `op_type=I` (insert), `op_type=U` (update), or `op_type=D` (delete). 10. Check for errors in the log. ```text confluent local services connect log | grep "ERROR" ``` 11. After you finish testing, enter the following command to clean up the running configuration: ```text confluent local services destroy ``` ### Starting from a specific SCN (without snapshot) 1. If not running, start Confluent Platform. ```text confluent local start ``` 2. Create the following connector configuration JSON file. Save the JSON file using the name `config3.json`. You have to choose an Oracle System Change Number (SCN) that exists with (at minimum) a redo log with the SCN or timestamp. The log has to be applicable for one of the included tables. You can use `SELECT CURRENT_SCN FROM v$database;` to query the current SCN of the database. ```json { "name": "SimpleOracleCDC_3", "config":{ "connector.class": "io.confluent.connect.oracle.cdc.OracleCdcSourceConnector", "name": "SimpleOracleCDC_3", "tasks.max":3, "key.converter": "io.confluent.connect.avro.AvroConverter", "key.converter.schema.registry.url": "http://localhost:8081", "value.converter": "io.confluent.connect.avro.AvroConverter", "value.converter.schema.registry.url": "http://localhost:8081", "confluent.topic.bootstrap.servers":"localhost:9092", "oracle.server": "", "oracle.port": 1521, "oracle.sid":"", "oracle.pdb.name":"", "oracle.username": "", "oracle.password": "", "_start.from_":"Set to a proper scn or timestamp to start without snapshotting tables", "start.from":"", "redo.log.topic.name": "redo-log-topic-3", "redo.log.consumer.bootstrap.servers":"localhost:9092", "table.inclusion.regex":"", "table.topic.name.template": "${databaseName}.${schemaName}.${tableName}", "connection.pool.max.size": 20, "confluent.topic.replication.factor":1, "topic.creation.groups": "redo", "topic.creation.redo.include": "redo-log-topic-3", "topic.creation.redo.replication.factor": 3, "topic.creation.redo.partitions": 1, "topic.creation.redo.cleanup.policy": "delete", "topic.creation.redo.retention.ms": 1209600000, "topic.creation.default.replication.factor": 3, "topic.creation.default.partitions": 5, "topic.creation.default.cleanup.policy": "compact" } } ``` 3. Create `redo-log-topic-3`. Make sure the topic name matches the value you put for `"redo.log.topic.name"`. ```text bin/kafka-topics --create --topic redo-log-topic-3 \ --bootstrap-server broker:9092 --replication-factor 1 \ --partitions 1 --config cleanup.policy=delete \ --config retention.ms=120960000 ``` 4. Enter the following command to start the connector: ```text curl -s -X POST -H 'Content-Type: application/json' --data @config3.json http://localhost:8083/connectors | jq ``` 5. Enter the following command to get the connector status: ```text curl -s -X GET -H 'Content-Type: application/json' http://localhost:8083/connectors/SimpleOracleCDC_3/status | jq ``` Verify that the connector is started with three running tasks. 6. Perform `INSERT`, `UPDATE`, and `DELETE` row operations for each table and verify the following expected results: - The connector creates a redo log topic and change event log topics for each captured table. - Redo log events are generated starting from the current time.
- The change event log for each table starts from the specified `start.from` value and does not contain snapshot events (`op_type=R`). 7. Enter the following command to check Kafka topics: ```text kafka-topics --list --zookeeper localhost:2181 ``` You should see the topic configured by the `redo-log-topic` property and topics in the form of `${databaseName}.${schemaName}.${tableName}`. 8. Consume records using the Avro console consumer. ```text kafka-avro-console-consumer --topic redo-log-topic-3 \ --partition 0 --offset earliest --bootstrap-server localhost:9092 \ --property schema.registry.url=http://localhost:8081 ``` You should see redo log records. Because the connector starts from the specified SCN, the table-specific topics do not contain snapshot records (`op_type=R`). 9. Check for errors in the log. ```text confluent local services connect log | grep "ERROR" ``` 10. After you finish testing, enter the following command to clean up the running configuration: ```text confluent local services destroy ``` #### Procedure 1. If not running, start Confluent Platform. ```text confluent local start ``` 2. Create the following connector configuration JSON file and save it as `config4.json`. You have to choose an Oracle System Change Number (SCN) that exists with (at minimum) a redo log with the SCN or timestamp. The log has to be applicable for one of the included tables. You can use `SELECT CURRENT_SCN FROM v$database;` to query the current SCN of the database. ```json { "name": "SimpleOracleCDC_4", "config":{ "connector.class": "io.confluent.connect.oracle.cdc.OracleCdcSourceConnector", "name": "SimpleOracleCDC_4", "tasks.max":2, "key.converter": "io.confluent.connect.avro.AvroConverter", "key.converter.schema.registry.url": "http://localhost:8081", "value.converter": "io.confluent.connect.avro.AvroConverter", "value.converter.schema.registry.url": "http://localhost:8081", "confluent.topic.bootstrap.servers":"localhost:9092", "oracle.server": "", "oracle.port": 1521, "oracle.sid":"", "oracle.pdb.name":"", "oracle.username": "", "oracle.password": "", "start.from":"snapshot", "redo.log.topic.name": "redo-log-topic-4", "redo.log.consumer.bootstrap.servers":"localhost:9092", "table.inclusion.regex":"", "table.topic.name.template": "${databaseName}.${schemaName}.${tableName}", "_lob.topic.name.template_": "Using template vars to set lob topic for each table", "lob.topic.name.template": "${tableName}.${columnName}_topic", "connection.pool.max.size": 20, "confluent.topic.replication.factor":1, "topic.creation.groups": "redo", "topic.creation.redo.include": "redo-log-topic-4", "topic.creation.redo.replication.factor": 3, "topic.creation.redo.partitions": 1, "topic.creation.redo.cleanup.policy": "delete", "topic.creation.redo.retention.ms": 1209600000, "topic.creation.default.replication.factor": 3, "topic.creation.default.partitions": 5, "topic.creation.default.cleanup.policy": "compact" } } ``` 3. Create `redo-log-topic-4`. Make sure the topic name matches the value you put for `"redo.log.topic.name"`. ```text bin/kafka-topics --create --topic redo-log-topic-4 \ --bootstrap-server broker:9092 --replication-factor 1 \ --partitions 1 --config cleanup.policy=delete \ --config retention.ms=120960000 ``` 4. Enter the following command to start the connector: ```text curl -s -X POST -H 'Content-Type: application/json' --data @config4.json http://localhost:8083/connectors | jq ``` 5. Enter the following command and verify that the connector is started with two running tasks.
```text curl -s -X GET -H 'Content-Type: application/json' http://localhost:8083/connectors/SimpleOracleCDC_4/status | jq ``` 6. Perform `INSERT`, `UPDATE`, and `DELETE` row operations for each table and verify the following expected results: - The redo log topic is created. - Change event topics are created for each captured table. - LOB topics are created for each LOB column. - The key of the LOB topic records contain the following information: ```text { "table", "dot.separated.fully.qualified.table.name", "column", "column.name.of.LOB.column", "primary_key", "primary.key.of.change.event.topic.after.applying.${key.template}" } ``` - The value of the LOB topic is the LOB value. - When a row is deleted from the table, the corresponding LOB is deleted from the LOB topic. The connector writes a tombstone record (null value) to the LOB topic. 7. After you finish testing, enter the following command to clean up the running configuration: ```text confluent local services destroy ``` ### Capturing Redo logs and Snapshot with Supplemental logging only 1. If not running, start Confluent Platform. ```text confluent local start ``` 2. Create the following connector configuration JSON file and save the file as `config1.json`. Note the following [configuration property](configuration-properties.md#connect-oracle-cdc-source-config) entries: * Configure the connector with a new `name`. * Set `table.inclusion.regex` to capture several tables. * (Optional) Set `redo.log.corruption.topic` to specify the topic where you want to record corrupted records. ```json { "name": "SimpleOracleCDC_8", "config":{ "connector.class": "io.confluent.connect.oracle.cdc.OracleCdcSourceConnector", "name": "SimpleOracleCDC_8", "tasks.max": 3, "key.converter": "io.confluent.connect.avro.AvroConverter", "key.converter.schema.registry.url": "http://localhost:8081", "value.converter": "io.confluent.connect.avro.AvroConverter", "value.converter.schema.registry.url": "http://localhost:8081", "confluent.topic.bootstrap.servers": "localhost:9092", "oracle.server": "", "oracle.port": 1521, "oracle.sid": "", "oracle.pdb.name": "", "oracle.username": "", "oracle.password": "", "oracle.supplemental.log.level": "msl", "start.from": "snapshot", "redo.log.topic.name": "redo-log-topic-2", "redo.log.consumer.bootstrap.servers": "localhost:9092", "table.inclusion.regex": "", "_table.topic.name.template_": "Using template vars to set change event topic for each table", "table.topic.name.template": "${databaseName}.${schemaName}.${tableName}", "connection.pool.max.size": 20, "confluent.topic.replication.factor": 1, "topic.creation.groups": "redo", "topic.creation.redo.include": "redo-log-topic-8", "topic.creation.redo.replication.factor": 3, "topic.creation.redo.partitions": 1, "topic.creation.redo.cleanup.policy": "delete", "topic.creation.redo.retention.ms": 1209600000, "topic.creation.default.replication.factor": 3, "topic.creation.default.partitions": 5, "topic.creation.default.cleanup.policy": "compact" } } ``` 3. Enter the following command to start the connector: ```text curl -s -X POST -H 'Content-Type: application/json' --data @config1.json http://localhost:8083/connectors | jq ``` 4. Enter the following command to get the connector status: ```text curl -s -X GET -H 'Content-Type: application/json' http://localhost:8083/connectors/SimpleOracleCDC_8/status | jq ``` 5. Verify the following connector operations are successful: - The connector is started with three running tasks (see the following note). 
- The connector produces snapshot records for captured tables. #### NOTE If using the property `"start.from":"snapshot"`, the redo log topic contains only database operations completed after the connector starts. 6. Enter the following command to check Kafka topics: ```text kafka-topics --list --zookeeper localhost:2181 ``` You should see the topic configured by the `redo-log-topic` property and topics in the form of `${databaseName}.${schemaName}.${tableName}`. 7. Consume records using the Avro console consumer. ```text kafka-avro-console-consumer --topic redo-log-topic-8 \ --partition 0 --offset earliest --bootstrap-server localhost:9092 \ --property schema.registry.url=http://localhost:8081 ``` If there are operations on the tables after the connector starts, you should see records displayed. If no operations have occurred, there should be no records. 8. Check for errors in the log: ```text confluent local services connect log | grep "ERROR" ``` 9. After you finish testing, enter the following command to clean up the running configuration: ```text confluent local services destroy ``` ## PostgreSQL Example This section includes an example of how to move records from Oracle Database to PostgreSQL using the Oracle CDC Source and the JDBC Sink connectors. 1. Create an Oracle CDC Source connector. The following configuration will create a snapshot and store new changes (inserts) to a table-specific topic called `ORCLCDB.C__MYUSER.USERS` ```json { "name": "SimpleOracleCDC_DEMO", "config":{ "connector.class": "io.confluent.connect.oracle.cdc.OracleCdcSourceConnector", "name": "SimpleOracleCDC_DEMO", "tasks.max":3, "key.converter":"org.apache.kafka.connect.storage.StringConverter", "value.converter": "io.confluent.connect.avro.AvroConverter", "value.converter.schema.registry.url": "http://localhost:8081", "confluent.topic.bootstrap.servers":"localhost:9092", "oracle.server": "localhost", "oracle.port": 1521, "oracle.sid":"ORCLCDB", "oracle.username": "C##MYUSER", "oracle.password": "mypassword", "start.from":"snapshot", "redo.log.topic.name": "redo-log-topic", "redo.log.consumer.bootstrap.servers":"localhost:9092", "table.inclusion.regex": ".*USERS.*", "table.topic.name.template": "${databaseName}.${schemaName}.${tableName}", "connection.pool.max.size": 20, "confluent.topic.replication.factor":1, "lob.topic.name.template":"${databaseName}.${schemaName}.${tableName}.${columnName}", "redo.log.row.fetch.size":1, "numeric.mapping": "best_fit" } } ``` A sample record in `ORCLCDB.C__MYUSER.USERS`: ```json {"ID":241,"FIRST_NAME":{"string":"Lettie"},"LAST_NAME":{"string":"Kaplan"},"EMAIL":{"string":"Lettie.Kaplan@utvel.us"},"GENDER":{"string":"male"},"CLUB_STATUS":{"string":"active"},"COMMENTS":{"string":"Confluent"},"UPDATE_TS":{"long":1623831883974},"table":{"string":"ORCLCDB.C##MYUSER.USERS"},"scn":{"string":"1450183"},"op_type":{"string":"I"},"op_ts":{"string":"1623857084000"},"current_ts":{"string":"1623831886610"},"row_id":{"string":"AAAR9JAAHAAAACFAAg"},"username":{"string":"C##MYUSER"}} ``` 2. Create a JDBC Sink connector. 
You can use a [Single Message Transform (SMT)](/platform/current/connect/concepts.html#transforms) to drop the prefix `ORCLCDB.C__MYUSER.`, enabling the connector to upsert records to a `USERS` table in PostgreSQL as shown in the following example: ```json { "name": "jdbc_sink_postgres_demo", "config": { "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector", "connection.url": "jdbc:postgresql://:5432/", "connection.user": "", "connection.password": "", "tasks.max": "2", "topics": "ORCLCDB.C__MYUSER.USERS", "auto.create": "true", "auto.evolve": "true", "dialect.name": "PostgreSqlDatabaseDialect", "insert.mode": "upsert", "pk.mode": "record_value", "pk.fields":"ID", "batch.size": 1, "key.converter":"org.apache.kafka.connect.storage.StringConverter", "value.converter": "io.confluent.connect.avro.AvroConverter", "value.converter.schema.registry.url": "http://localhost:8081", "transforms":"dropPrefix", "transforms.dropPrefix.type":"org.apache.kafka.connect.transforms.RegexRouter", "transforms.dropPrefix.regex":"ORCLCDB.C__MYUSER.(.*)", "transforms.dropPrefix.replacement":"$1", "errors.tolerance":"all", "errors.deadletterqueue.topic.name":"dlq-jdbc-sink", "errors.deadletterqueue.context.headers.enable": "true", "errors.deadletterqueue.topic.replication.factor":"1" } } ``` 3. Check PostgreSQL and verify the data looks similar to the following: ![Oracle CDC connector PostgreSQL example](images/PostgreSQL_example.png) #### NOTE If the redo log topic updates are not propagated to the table topic, check the following: - If security is enabled on the Kafka cluster, you must configure `redo.log.consumer.*` accordingly. - If the topic has at least one record produced by the connector that is currently running, make sure that the database rows have changed on the corresponding table since the initial snapshot for the table was taken. - Ensure the `table.inclusion.regex` configuration property matches the fully qualified table name (for example, `dbo.Users`) and the regular expression in the `table.exclusion.regex` configuration property does not match the fully qualified table name. - When `Supplemental Log` is turned on for a database or multiple tables, it might take time for a connector to catch up on reading a redo log and to find relevant records. Check the current SCN in the database with `SELECT CURRENT_SCN FROM V$DATABASE` and compare it with the last SCN the connector processed or saw in a connect-offsets topic (the topic name could be different depending on the setup) or in TRACE logs. If there is a huge gap, consider increasing `redo.log.row.fetch.size` to 100, 1000, or even a larger number. To enable TRACE logging for the connector, use: ```text curl -s -X PUT -H "Content-Type:application/json" \ http://localhost:8083/admin/loggers/io.confluent.connect.oracle \ -d '{"level": "TRACE"}' \ | jq '.' ``` ## Quick Start The RabbitMQ Sink connector streams records from Kafka topics to a RabbitMQ exchange with high throughput. This quick start shows example data production and consumption setups in detail. 1. Start the [RabbitMQ Server](https://www.rabbitmq.com/download.html) broker, specifying the Docker image based on the required RabbitMQ version. ```bash docker run -it --rm --name rabbitmq \ -p 5672:5672 \ -p 15672:15672 \ rabbitmq:3.8.4-management ``` 2. Create a RabbitMQ exchange. To produce messages from Kafka to RabbitMQ, you also create a queue and binding. 1. Once the RabbitMQ Docker container has started, navigate to [http://localhost:15672](http://localhost:15672) in your browser and log in with `guest`/`guest`. 2.
In the `Exchanges` tab click on `Add a new exchange`. Name it `exchange1` and leave other options as the default settings. 3. In the `Queues` tab click on `Add a new queue`. Name it `queue1` and leave other options as the default settings. 4. In the `Exchanges` tab click on the exchange created `exchange1`. In the `Bindings` section add a binding in the field `To queue` to `queue1` with routing key `rkey1`. 3. Install the connector through the [Confluent Hub Client](/kafka-connectors/self-managed/confluent-hub/client.html). ```bash # run from your Confluent Platform installation directory confluent connect plugin install confluentinc/kafka-connect-rabbitmq-sink:latest ``` 4. Start Confluent Platform. ```bash confluent local start ``` 5. [Produce](https://docs.confluent.io/current/cli/command-reference/confluent-produce.html) test data to a pre-created `rabbitmq-messages` topic in Kafka. ```bash seq 10 | confluent local produce rabbitmq-messages ``` 6. Create a `rabbitmq-sink.json` file with the following contents: ```json { "name": "RabbitMQSinkConnector", "config": { "connector.class": "io.confluent.connect.rabbitmq.sink.RabbitMQSinkConnector", "tasks.max": "1", "topics": "rabbitmq-messages", "key.converter": "org.apache.kafka.connect.storage.StringConverter", "value.converter": "org.apache.kafka.connect.converters.ByteArrayConverter", "confluent.topic.bootstrap.servers": "localhost:9092", "confluent.topic.replication.factor": "1", "rabbitmq.host": "localhost", "rabbitmq.port": "5672", "rabbitmq.username": "guest", "rabbitmq.password": "guest", "rabbitmq.exchange": "exchange1", "rabbitmq.routing.key": "rkey1", "rabbitmq.delivery.mode": "PERSISTENT" } } ``` 7. Load the RabbitMQ Sink connector. ```bash confluent local load RabbitMQSinkConnector --config rabbitmq-sink.json ``` #### IMPORTANT Don’t use the [Confluent CLI](https://docs.confluent.io/confluent-cli/current/index.html) in production environments. 8. Confirm that the connector is in a `RUNNING` state. ```bash confluent local status RabbitMQSinkConnector ``` 9. Navigate to the [RabbitMQ UI](http://localhost:15672) to confirm the messages were delivered to the `queue1` queue. ## Quick Start 1. Install [Redis](https://redis.io/topics/quickstart). 2. Start the Redis server so it can start listening for Redis connections. This starts Redis using the default port 6379 and no password (for testing purposes only). > ```bash > redis-server > ``` 3. Use the Redis CLI (`redis-cli`) to view any insertions being made. You can use the `MONITOR` command if the instance is being used only for this quick start test (see the note below). ```text redis-cli MONITOR ``` #### IMPORTANT The `MONITOR` CLI command is a debugging command that streams back every command processed by the Redis server. It assists you in understanding what is happening to the database. However, using it comes at a performance cost. **Do not use this in production environments.** 4. Install the connector. See [installation instructions](#redis-sink-connector-install) for details. 5. Start the Confluent Platform. ```bash confluent local start ``` #### IMPORTANT Ensure your start the Confluent Platform after installing the connector. If not, you must restart the Connect workers to register the installation and to add the new connector location to the path. 6. Ensure the installed connector has been identified by the Confluent Platform. ```bash confluent local services connect plugin list ``` 7. 
[Produce](https://docs.confluent.io/current/cli/command-reference/confluent-produce.html) test data to the `users` topic in Kafka. ```bash echo key1,value1 | confluent local produce users --property parse.key=true --property key.separator=, echo key2,value2 | confluent local produce users --property parse.key=true --property key.separator=, echo key3,value3 | confluent local produce users --property parse.key=true --property key.separator=, ``` #### IMPORTANT This connector expects non-null keys. The `parse.key` and `key.separator` properties ensure the exported records have explicit keys and values 8. Create a `redis-sink.properties` file with the following properties: ```text name=kafka-connect-redis topics=users tasks.max=1 connector.class=com.github.jcustenborder.kafka.connect.redis.RedisSinkConnector key.converter=org.apache.kafka.connect.storage.StringConverter value.converter=org.apache.kafka.connect.storage.StringConverter ``` 9. Start the connector. ```bash confluent local load kafka-connect-redis --config redis-sink.properties ``` 10. Ensure the connector status is `RUNNING`. ```bash confluent local status kafka-connect-redis ``` 11. Observe that data is flowing and the keys and values being inserted into Kafka are going to the desired Redis instance. 12. Shut down Confluent Platform. ```bash confluent local destroy ``` 13. Stop the `redis-server` and `redis-cli` (use Ctrl+C). ## Quick start This Quick start uses the Splunk S2S Source connector to receive data from the Splunk UF and ingests it into Kafka. 1. Install the connector using the [Confluent Hub Client](https://docs.confluent.io/current/connect/managing/confluent-hub/client.html). ```text # run from your CP installation directory confluent connect plugin install confluentinc/kafka-connect-splunk-s2s:latest ``` 2. Start the Confluent Platform. ```bash confluent local start ``` 3. Create a `splunk-s2s-source.properties` file with the following contents: ```text name=splunk-s2s-source tasks.max=1 connector.class=io.confluent.connect.splunk.s2s.SplunkS2SSourceConnector splunk.s2s.port=9997 kafka.topic=splunk-s2s-events key.converter=org.apache.kafka.connect.storage.StringConverter value.converter=org.apache.kafka.connect.json.JsonConverter key.converter.schemas.enable=false value.converter.schemas.enable=false confluent.topic.bootstrap.servers=localhost:9092 confluent.topic.replication.factor=1 ``` 4. Load the Splunk S2S Source connector. ```bash confluent local load splunk-s2s-source --config splunk-s2s-source.properties ``` Don’t use the [Confluent CLI](https://docs.confluent.io/confluent-cli/current/index.html) in production environments. 5. Confirm the connector is in a `RUNNING` state. ```bash confluent local status splunk-s2s-source ``` 6. Start a Splunk UF by running the Splunk UF Docker container. ```bash docker run -d -p 9998:9997 -e "SPLUNK_START_ARGS=--accept-license" -e "SPLUNK_PASSWORD=password" --name splunk-uf splunk/universalforwarder:9.0.0 ``` 7. Create a `splunk-s2s-test.log` file with the following sample log events: ```text log event 1 log event 2 log event 3 ``` 8. Copy the `splunk-s2s-test.log` file to the Splunk UF Docker container using the following command: ```bash docker cp splunk-s2s-test.log splunk-uf:/opt/splunkforwarder/splunk-s2s-test.log ``` 9. Configure the UF to monitor the `splunk-s2s-test.log` file: ```bash docker exec -it splunk-uf sudo ./bin/splunk add monitor -source /opt/splunkforwarder/splunk-s2s-test.log -auth admin:password ``` 10. 
Configure the UF to connect to the Splunk S2S Source connector: - **For Mac/Windows systems**: ```bash docker exec -it splunk-uf sudo ./bin/splunk add forward-server host.docker.internal:9997 ``` - **For Linux systems**: ```bash docker exec -it splunk-uf sudo ./bin/splunk add forward-server 172.17.0.1:9997 ``` 11. Verify the data was ingested into the Kafka topic. To look for events from a monitored file (`splunk-s2s-test.log`) in the Kafka topic, run the following command: ```text kafka-console-consumer --bootstrap-server localhost:9092 --topic splunk-s2s-events --from-beginning | grep 'log event' ``` #### NOTE When you use the previous command without `grep`, you will see many Splunk internal events ingested into the Kafka topic because the Splunk UF sends internal Splunk log events to the connector by default. 12. Shut down Confluent Platform. ```bash confluent local destroy ``` 13. Shut down the Docker container. ```bash docker stop splunk-uf docker rm splunk-uf ``` #### IMPORTANT The default port used by a Splunk HEC is `8088`. However, the ksqlDB component of Confluent Platform also uses that port. For this quick start, since both Splunk and Confluent Platform will be running, we configure the HEC to use port `8889`. If that port is in use by another process, change `8889` to a different, open port. 1. Start a Splunk Enterprise instance by running the Splunk Docker container. ```bash docker run -d -p 8000:8000 -p 8889:8889 -e "SPLUNK_START_ARGS=--accept-license" -e "SPLUNK_PASSWORD=password" --name splunk splunk/splunk:7.3.0 ``` 2. Open [http://localhost:8000](http://localhost:8000) to access Splunk Web. Log in with username `admin` and password `password`. 3. Configure a Splunk HEC using Splunk Web. - Click **Settings** > **Data Inputs**. - Click **HTTP Event Collector**. - Click **Global Settings**. - In the All Tokens toggle button, select **Enabled**. - Ensure **SSL disabled** is checked. - Change the HTTP Port Number to **8889**. - Click **Save**. - Click **New Token**. - In the **Name** field, enter a name for the token: `kafka` - Click **Next**. - Click **Review**. - Click **Submit**. #### IMPORTANT Note the token value on the **Token has been created successfully** page. This token value is needed for the connector configuration later. 4. Install the connector through the [Confluent Hub Client](https://docs.confluent.io/current/connect/managing/confluent-hub/client.html). ```bash # run from your Confluent Platform installation directory confluent connect plugin install splunk/kafka-connect-splunk:latest ``` 5. Start Confluent Platform. ```bash confluent local start ``` 6. [Produce](https://docs.confluent.io/current/cli/command-reference/confluent-produce.html) test data to the `splunk-qs` topic in Kafka. ```bash echo event 1 | confluent local produce splunk-qs echo event 2 | confluent local produce splunk-qs ``` 7. Create a `splunk-sink.properties` file with the properties below. Substitute `` with the Splunk HEC token created earlier. ```properties name=SplunkSink topics=splunk-qs tasks.max=1 connector.class=com.splunk.kafka.connect.SplunkSinkConnector splunk.indexes=main splunk.hec.uri=http://localhost:8889 splunk.hec.token= splunk.sourcetypes=my_sourcetype confluent.topic.bootstrap.servers=localhost:9092 confluent.topic.replication.factor=1 value.converter=org.apache.kafka.connect.storage.StringConverter ``` 8. Start the connector. ```bash confluent local load splunk --config splunk-sink.properties ``` 9.
In the Splunk user interface, verify that data is flowing into your Splunk platform instance by searching using the search parameter `source="http:kafka"`. 10. Shut down Confluent Platform. ```bash confluent local destroy ``` 11. Shut down the Docker container. ```bash docker stop splunk docker rm splunk ``` ## Configure KRaft controllers To create a KRaft controller, create and configure a KRaftController CR. The following shows the key CR settings: ```yaml kind: KRaftController metadata: name: --- [1] namespace: --- [2] spec: replicas: --- [3] listeners: controller: --- [4] authentication: --- [5] tls: enabled: --- [6] externalAccess: --- [7] type: --- [8] controllerQuorumVoters: --- [9] configOverrides: --- [10] server: - default.replication.factor= --- [11] - listener --- [12] dependencies: schemaRegistry: --- [13] ``` * [1] Required. The name of this KRaft controller. * [2] The namespace of this KRaft controller. * [3] The desired number of replicas. Must be an odd number that is 3 or higher. A change to this setting will roll the cluster. * [4] Required. Communication to and among the KRaft controller nodes happens over this controller listener. * [5] See [Authentication for Kafka and KRaft](co-authenticate-kafka.md#co-authenticate-kafka) for configuring authentication. * [6] Set to `true` to enable TLS. See [Network encryption](co-network-encryption.md#co-network-encryption) for configuring TLS certificates. * [7] defines the external access configuration for the Kafka cluster. * [8] Required if `externalAccess` ([7]) is specified. Set to the Kubernetes service for external access. Valid options are `loadBalancer`, `nodePort`, `route`, `staticForPortBasedRouting`, and `staticForHostBasedRouting`. For details on external access configuration, see [Network encryption](co-networking-overview.md#co-networking-overview). * [9] Required for multi-region deployment. Follow the further configuration steps in [Set up MRC with KRaft](co-multi-region.md#co-mrc-kraft). * [10] Required. Use the `configOverrides` to set the matching properties as set in the Kafka CR. The following properties are required: [11], [12] * [11] Required. The default replication factor is the number of Kafka replicas. This parameter is needed for the KRaft controller to interact with the Kafka brokers for some features, such as Self-Balancing and metrics reporter. You must explicitly set it for KRaft to match the number of Kafka replicas (`spec.replicas` in the Kafka CR). Use `configOverrides` to set the property because the property is not directly supported in the KRaft CR. For example: ```yaml spec: configOverrides: server: - default.replication.factor=3 ``` * [12] Required for when authentication is enabled for Kafka replication listeners. Set the same properties set in the Kafka CR, under `spec.listeners.replication.authentication`. 
The following are sample configurations for mTLS authentication among replication listeners: ```yaml spec: configOverrides: server: - listener.name.replication.ssl.client.auth=required - listener.name.replication.ssl.key.password=${file:/vault/secrets/kafka-tls/jksPassword.txt:jksPassword} - listener.name.replication.ssl.keystore.location=/vault/secrets/kafka-tls/keystore.jks - listener.name.replication.ssl.keystore.password=${file:/vault/secrets/kafka-tls/jksPassword.txt:jksPassword} - listener.name.replication.ssl.principal.mapping.rules=RULE:.*CN[\s]?=[\s]?([a-zA-Z0-9._]*)?.*/$1/ - listener.name.replication.ssl.truststore.location=/vault/secrets/kafka-tls/truststore.jks - listener.name.replication.ssl.truststore.password=${file:/vault/secrets/kafka-tls/jksPassword.txt:jksPassword} - listener.security.protocol.map=CONTROLLER:SSL,REPLICATION:SSL ``` * [13] Required when the Kafka CR has a dependency on Schema Registry in the `spec.dependencies.schemaRegistry` section. Set the same Schema Registry dependency settings you set in the Kafka CR here. An example KRaftController CR: ```yaml kind: KRaftController metadata: name: kcontroller namespace: operator spec: replicas: 3 listeners: controller: authentication: type: plain jaasConfig: secretRef: kraft-secret tls: enabled: true dependencies: schemaRegistry: authentication: basic: secretRef: kafka-sr-credential type: basic tls: enabled: true configOverrides: server: - default.replication.factor=3 ``` ## Options ```none --all gather confluent-platform information (default true) --exclude-kubectl-misc exclude kubectl misc information. --exclude-logs exclude all pod logs. --exclude-pdb exclude pdb information. --exclude-pv-pvc exclude pv and pvc information. --follow-logs-duration int Follow pod logs similar to kubectl logs -f for a given time in second -h, --help help for support-bundle --include-kernel-params gather information about the kernel params. --include-namespace gather information about the namespace. --include-nodes include node information. --only-application-resources gather confluent-platform application resources information --only-cluster-resources gather confluent-platform cluster resources information --only-clusterlink gather only cluster link information. --only-confluentrolebinding gather only confluent role binding information. --only-connect gather only connect clusters information. --only-connector gather only connector information. --only-controlcenter gather only controlcenter clusters information. --only-flink gather only flink information. --only-gateway gather only gateway information. --only-kafka gather only kafka clusters information. --only-kafkarestclass gather only kafka rest class information. --only-kafkarestproxy gather only kafka rest proxy cluster's information. --only-kafkatopic gather only kafka topic information. --only-kraftcontroller gather only kraft controller cluster's information. --only-kraftmigrationjob gather only kraft migration job information. --only-ksqldb gather only ksqldb clusters information. --only-schema gather only schema information. --only-schemaexporter gather only schema exporter information. --only-schemaregistry gather only schemaregistry clusters information. --only-usmagent gather only USM agent information. --only-zk gather only zookeeper clusters information. --out-dir string directory where the support-bundle will be created; defaults to user's current directory if not configured. ``` ### Step 4: Install Confluent Platform 1. 
Deploy the KRaft controller and the Kafka brokers: ```bash kubectl apply -f $TUTORIAL_HOME/confluent-platform-c3++.yaml ``` 2. Install the sample producer app and topic: ```bash kubectl apply -f $TUTORIAL_HOME/producer-app-data.yaml ``` 3. Wait until all the Confluent Platform components are deployed and running: ```bash kubectl get pods ``` In this tutorial, the following components are being deployed: KRaft controller, Kafka, Connect, Schema Registry, ksqlDB, REST Proxy, Control Center. ### Step 4: Install Confluent Platform 1. Install all Confluent Platform components: ```bash kubectl apply -f $TUTORIAL_HOME/confluent-platform.yaml ``` 2. Install the sample producer app and topic: ```bash kubectl apply -f $TUTORIAL_HOME/producer-app-data.yaml ``` 3. Wait until all the Confluent Platform pods are deployed and running: ```bash kubectl get pods ``` In this tutorial, the following components are being deployed: ZooKeeper, Kafka, Connect, Schema Registry, ksqlDB, REST Proxy, Confluent Control Center (Legacy). ### Required Configuration Properties * `bootstrap.servers` - A list of host/port pairs to use for establishing the initial connection to your Apache Kafka® cluster (of the form: `host1:port1,host2:port2,....`). Note that the client will make use of all servers in your cluster irrespective of which servers are specified via this property for bootstrapping. You may want to specify more than one in case one of the servers in your list is down at the time of initialization. * `client.id` - Under the hood, the Confluent JMS Client makes use of one or more Kafka clients for communication with your Kafka cluster. The `client.id` of these clients is set to the value of this configuration property appended with a globally unique id (guid). The `client.id` string is passed to the server when making requests and is useful for debugging purposes. * `confluent.license` - A license key string provided to you by Confluent under the terms of a Confluent Enterprise subscription agreement. If not specified, you may use the client for a trial period of 30 days after which it will stop working. * `confluent.topic` - Name of the Kafka topic used for Confluent configuration, including licensing information. The default name for this topic is `_confluent-command`. To learn more, see [License topic configuration](/platform/current/connect/license.html#license-topic-configuration) and [License topic ACLs](/platform/current/connect/license.html#license-topic-acls). * `confluent.topic.replication.factor` - The replication factor for the Kafka topic used for Confluent configuration, including licensing information. This is used only if the topic does not already exist, and the default of three is appropriate for production use. If you are using a development environment with less than three brokers, you must set this to the number of brokers (e.g. 1). Configuration properties are set in the same way as any other Kafka client: ```java Properties props = new Properties(); props.put("bootstrap.servers", "localhost:9092"); props.put("confluent.topic", "foo_confluent-command"); props.put("confluent.topic.replication.factor", "3"); props.put("client.id", "my-jms-client"); ``` ### Optional Configuration Properties * `allow.out.of.order.acknowledge` - If true, does not throw an exception if a message is acknowledged out of order (which implicitly acknowledges any messages before it). Default value is `false`.
* `jms.fallback.message.type` - If the JMS Message type header is not associated with a message, fall back to this message type. * `consumer.group.id` - A string that uniquely identifies the group of consumer processes to which this client belongs. If not specified, this defaults to `confluent-jms` in the case of queues and `confluent-jms-{uuid}` in the case of topics, where {uuid} is a unique value for each consumer. This naming strategy provides load balancer semantics in the case of queues and publish-subscribe semantics in the case of topics, as required by the JMS Specification. * `jms.consumer.poll.timeout.ms` - The maximum length of time Kafka consumers should block when retrieving records from Kafka. You should not need to adjust this value. * `jms.consumer.close.timeout.ms` - The maximum number of milliseconds to wait for a clean shutdown when closing a `MessageConsumer`. * `message.listener.null.wait.ms` - The number of milliseconds to wait before polling Kafka for new messages if no messages were retrieved in a message listener poll loop. Reducing this value will improve consume latency in low throughput scenarios at the expense of higher network/CPU overhead. * `connection.stop.timeout.ms` - The maximum number of milliseconds to wait for the message listener threads to shut down cleanly when `connection.stop()` has been called. * `jms.create.connection.ignore.authenticate` - If true, connection creation methods on `ConnectionFactory` that have username and password parameters will fall through to the corresponding methods that do not have these parameters (the parameters will be ignored). If false, use of these methods will result in a JMSException being thrown. * `message.listener.max.redeliveries` - The maximum number of times a message will be redelivered to a `MessageConsumer` listener when the session is in AUTO_ACKNOWLEDGE mode. Default value is 10. ### Standard Kafka Configuration Properties (Optional) All of the configuration properties of the underlying Java Kafka client library may be specified. Simply prefix the desired property with `producer.` or `consumer.` as appropriate. For example: ```java props.put("producer.linger.ms", "1"); props.put("consumer.heartbeat.interval.ms", "1000"); ``` ### Enabling TLS Encryption (Optional) Security settings match those of the native Kafka Java producer and consumer. Security settings are applied to both production and consumption of messages (you do not need to prefix security settings with `consumer.` or `producer.`). If client authentication is not required in the broker, then the following is a minimal configuration example: ```java props.put("security.protocol", "SSL"); props.put("ssl.truststore.location", "/var/private/ssl/kafka.client.truststore.jks"); props.put("ssl.truststore.password", "test1234"); ``` If client authentication is required, then a keystore must be created as in step 1, and the following must also be configured: ```java props.put("ssl.keystore.location", "/var/private/ssl/kafka.client.keystore.jks"); props.put("ssl.keystore.password", "test1234"); props.put("ssl.key.password", "test1234"); ``` ### Broker removal cannot complete due to offline partitions Broker removal can also fail in cases where taking a broker down will result in having fewer online brokers than the number of replicas required in your configurations.
The broker status (available with [kafka-remove-brokers](configuration-options.md#sbc-command-remove-brokers) `--describe`) will remain as follows, until you restart one or more of the offline brokers: ```bash [2020-09-17 23:40:53,743] WARN [AdminClient clientId=adminclient-1] Connection to node -5 (localhost/127.0.0.1:9096) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient) Broker 1 removal status: Partition Reassignment: IN_PROGRESS Broker Shutdown: COMPLETE ``` A short-hand way of troubleshooting this is to ask “how many brokers are down?” and “how many replicas/replication factors must the cluster support?” Partition reassignment (the [last phase in broker removal](configuration-options.md#sbc-broker-removal-phases)) will fail to complete in any case where you have `n` brokers down, and your configuration requires `n + 1` or more replicas. Alternatively, you can consider how many online brokers you need to support the required number of replicas. If you have `n` brokers online, these can support at most a total of `n` replicas. **Solution:** The solution is to restart the down brokers, and perhaps modify the cluster configuration as a whole. This might include both adding brokers and modifying replicas/replication factors (see example below). Scenarios that lead to this problem can be a combination of under-replicated topics and topics with too many replicas for the number of online brokers. Having a topic with a replication factor of 1 does not necessarily lead to a problem in and of itself. A quick way to get an overview of configured replicas on a running cluster is to use `kafka-topics --describe` on a specified topic, or on the whole cluster (with no topic specified). For system topics, you can scan the replication factors and replicas configured by the system properties (which generate the system topics). The [Tutorial: Add and Remove Brokers with Self-Balancing in Confluent Platform](sbc-tutorial.md#sbc-tutorial) covers these commands, replicas/replication factors, and the impact of these configurations. ### Sink tasks The previous section described how to implement a simple `SourceTask`. Unlike `SourceConnector` and `SinkConnector`, `SourceTask` and `SinkTask` have very different interfaces because `SourceTask` uses a pull interface and `SinkTask` uses a push interface. Both share the common lifecycle methods, but the `SinkTask` interface is quite different: ```java public abstract class SinkTask implements Task { ... [ lifecycle methods omitted ] ... public void initialize(SinkTaskContext context) { this.context = context; } public abstract void put(Collection<SinkRecord> records); public abstract void flush(Map<TopicPartition, OffsetAndMetadata> offsets); public void open(Collection<TopicPartition> partitions) {} public void close(Collection<TopicPartition> partitions) {} } ``` The [SinkTask documentation](/platform/current/connect/javadocs/javadoc/org/apache/kafka/connect/sink/SinkTask.html) contains full details, but this interface is nearly as simple as the `SourceTask`. The `put()` method should contain most of the implementation, accepting sets of `SinkRecords`, performing any required translation, and storing them in the destination system. This process does not need to ensure the data has been fully written to the destination system before returning. In fact, in many cases some internal buffering will be useful so an entire batch of records can be sent at once (much like Kafka’s producer), reducing the overhead of inserting events into the downstream data store.
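To make this concrete, the following is a minimal sketch (not taken from the Connect Javadoc) of a `SinkTask` whose `put()` simply buffers incoming records and whose `flush()` is where the buffered batch would be written out, as described next. The class name and comments are illustrative only; the no-op bodies stand in for whatever client library your destination system requires.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

public class BufferingSinkTask extends SinkTask {   // illustrative class name

  private final List<SinkRecord> buffer = new ArrayList<>();

  @Override
  public void start(Map<String, String> props) {
    // Open a connection to the destination system here, using the task configuration.
  }

  @Override
  public void put(Collection<SinkRecord> records) {
    // Buffer records; the data does not need to be fully written before returning.
    buffer.addAll(records);
  }

  @Override
  public void flush(Map<TopicPartition, OffsetAndMetadata> offsets) {
    // Push the buffered batch to the destination system and block until it is
    // acknowledged, so the framework can safely commit the corresponding offsets.
    buffer.clear();
  }

  @Override
  public void stop() {
    // Close the connection to the destination system.
  }

  @Override
  public String version() {
    return "0.0.1";
  }
}
```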
The `SinkRecords` contain essentially the same information as `SourceRecords`: Kafka topic, partition, and offset, plus the event key and value. The `flush()` method is used during the offset commit process, which allows tasks to recover from failures and resume from a safe point such that no events will be missed. The method should push any outstanding data to the destination system and then block until the write has been acknowledged. The `offsets` parameter can often be ignored, but is useful in some cases where implementations want to store offset information in the destination store to provide exactly-once delivery. For example, an HDFS connector could do this and use atomic move operations to make sure the `flush()` operation atomically commits the data and offsets to a final location in HDFS. Internally, `SinkTask` uses a Kafka consumer to poll data. The consumer instances used in tasks for a connector belong to the same consumer group. Task reconfiguration or failures will trigger a rebalance of the consumer group. During a rebalance, the topic partitions will be reassigned to the new set of tasks. For more detail on Kafka consumer rebalancing, see the [Consumer](../clients/consumer.md#kafka-consumer) section. Note that the consumer is single threaded, so you should make sure that `put()` or `flush()` does not take longer than the consumer session timeout. Otherwise, the consumer will be kicked out of the group, which triggers a rebalancing of partitions that stops all other tasks from making progress until the rebalance completes. To ensure that the resources are properly released and allocated during rebalance, `SinkTask` provides two additional methods: `close()` and `open()`, which are tied to the underlying rebalance callbacks of the `KafkaConsumer` that is driving the `SinkTask`. The `close()` method is used to close writers for partitions assigned to the `SinkTask`. This method will be called before a consumer rebalance operation starts and after the `SinkTask` stops fetching data. After `close()` is called, Connect will not write any records to the task until a new set of partitions has been opened. The `close()` method has access to all topic partitions assigned to the `SinkTask` before the rebalance starts. In general, Confluent recommends closing writers for all topic partitions and ensuring that the state for all topic partitions is properly maintained. However, you can choose to close writers for a subset of topic partitions in your implementation. In this case, you need to carefully reason about the state before and after the rebalance in order to achieve the desired delivery guarantee. The `open()` method is used to create writers for newly assigned partitions in case of a consumer rebalance. This method will be called after partition re-assignment completes and before the `SinkTask` starts fetching data. Note that any errors raised from `close()` or `open()` will cause the task to stop and report a failure status, and the corresponding consumer instance to close. This consumer shutdown triggers a rebalance, and topic partitions for this task will be reassigned to other tasks of this connector. ## Separate principals Within the Connect worker configuration, all properties having a prefix of `producer.` and `consumer.` are applied to all source and sink connectors created in the worker. The `admin.` prefix is used for error reporting in sink connectors. The following describes how these prefixes are used: * The `consumer.` prefix controls consumer behavior for sink connectors.
* The `producer.` prefix controls producer behavior for source connectors. * Both the `producer.` and `admin.` prefixes control producer and client behavior for sink connector error reporting. You can override these properties for individual connectors using the `producer.override.`, `consumer.override.`, and `admin.override.` prefixes. This includes overriding the worker service principal configuration to create separate service principals for each connector. Overrides are disabled by default. They are enabled using the `connector.client.config.override.policy` worker property. This property sets the per-connector overrides the worker permits. The out-of-the-box (OOTB) options for the override policy are: * `connector.client.config.override.policy=None` : Default. Does not allow any configuration overrides. * `connector.client.config.override.policy=Principal` : Allows overrides for the `security.protocol`, `sasl.jaas.config`, and `sasl.mechanism` configuration properties, using the `producer.override.`, `consumer.override.`, and `admin.override.` prefixes. * `connector.client.config.override.policy=All` : Allows overrides for all configuration properties using the `producer.override.`, `consumer.override.`, and `admin.override.` prefixes. If your Kafka broker supports client authentication over SSL, you can configure a separate principal for the worker and the connectors. In this case, you need to [generate a separate certificate](../security/security_tutorial.md#generating-keys-certs) for each of them and install them in separate keystores. The key Connect configuration differences are the keystore location, keystore password, and key password, which are unique to each principal; the following sketch shows the relevant properties (paths and passwords are illustrative placeholders):
```properties
# Worker principal, set in the worker configuration file
security.protocol=SSL
ssl.keystore.location=/var/private/ssl/kafka.worker.keystore.jks
ssl.keystore.password=worker-keystore-password
ssl.key.password=worker-key-password

# Connector principal, set in each connector configuration
# (requires connector.client.config.override.policy=All on the worker)
producer.override.ssl.keystore.location=/var/private/ssl/kafka.connector.keystore.jks
producer.override.ssl.keystore.password=connector-keystore-password
producer.override.ssl.key.password=connector-key-password
```
### Worker configuration properties file Regardless of the mode used, Kafka Connect workers are configured by passing a worker configuration properties file as the first parameter. For example: ```bash bin/connect-distributed worker.properties ``` Sample worker configuration properties files are included with Confluent Platform to help you get started. The following list shows the location for Avro sample files: * `etc/schema-registry/connect-avro-distributed.properties` * `etc/schema-registry/connect-avro-standalone.properties` Use one of these files as a starting point. These files contain the necessary configuration properties to use the Avro converters that integrate with Schema Registry. They are configured to work well with Kafka and Schema Registry services running locally. They do not require running more than a single broker, making it easy for you to test Kafka Connect locally. The example configuration files can also be modified for production deployments by using the correct hostnames for Kafka and Schema Registry and acceptable (or default) values for the internal topic replication factor. For a list of worker configuration properties, see [Kafka Connect Worker Configuration Properties](/platform/current/connect/references/allconfigs.html). ### Producer and consumer overrides You may need to override default settings, other than those described in the previous section. The following two examples show when this might be required. **Worker override example** Consider a standalone process that runs a log file connector. For the logs being collected, you might prefer low-latency, best-effort delivery. That is, when there are connectivity issues, minimal data loss may be acceptable for your application in order to avoid data buffering on the client. This keeps log collection as lightweight as possible.
To override [producer configuration properties](/platform/current/installation/configuration/producer-configs.html) and [consumer configuration properties](/platform/current/installation/configuration/consumer-configs.html) for all connectors controlled by the worker, you prefix worker configuration properties with `producer.` or `consumer.` as shown in the following example: ```properties producer.retries=1 consumer.max.partition.fetch.bytes=10485760 ``` The previous example overrides the default producer `retries` property to retry sending messages only one time. The consumer override increases the default amount of data fetched from a partition per request to 10 MB. These configuration changes are applied to all connectors controlled by the worker. Be careful making any changes to these settings when running distributed mode workers. **Per-connector override example** By default, the producers and consumers used for connectors are created using the same properties that Connect uses for its own internal topics. This means that the same Kafka principal must be able to read and write to all the internal topics and all of the topics used by the connectors. You may want the producers and consumers used for connectors to use a different Kafka principal. It is possible for connector configurations to override worker properties used to create producers and consumers. These are prefixed with `producer.override.` and `consumer.override.`. For more information about per-connector overrides, see [Override the Worker Configuration](/platform/current/connect/references/allconfigs.html#override-the-worker-configuration). For detailed information about producers and consumers, see [Kafka Producer](/platform/current/clients/producer.html) and [Kafka Consumer](/platform/current/clients/consumer.html). For a list of configuration properties, see [producer configuration properties](/platform/current/installation/configuration/producer-configs.html) and [consumer configuration properties](/platform/current/installation/configuration/consumer-configs.html). ### How to run it 1. Start ZooKeeper. ```none sudo zookeeper-server-start ${CONFLUENT_HOME}/etc/kafka/zookeeper.properties ``` 2. Start Kafka with the `etc/kafka/server.properties` you just configured. ```none kafka-server-start ${CONFLUENT_HOME}/etc/kafka/server.properties ``` To learn more, see [Start Confluent Platform](../../installation/installing_cp/zip-tar.md#start-cp-command-line) and [how to install and run Confluent Platform](../../installation/overview.md#installation). You can configure clients like Schema Registry, Control Center, ksqlDB, and Connect to talk to Kafka and MDS over HTTPS in their respective properties files. ![image](images/security-rbac-mtls.png) ### Rolling restart If you need to do software upgrades, broker configuration updates, or cluster maintenance, then you will need to restart all the brokers in your Kafka cluster. To do this, you can do a rolling restart by restarting one broker at a time. Restarting the brokers one at a time provides high availability by avoiding downtime for end users. Some considerations to avoid downtime include: * Use [Confluent Control Center](https://docs.confluent.io/control-center/current/overview.html) to monitor broker status during the rolling restart. * Because one replica is unavailable while a broker is restarting, clients will not experience downtime if the number of remaining in sync replicas is greater than the configured `min.insync.replicas`.
* Run brokers with `controlled.shutdown.enable=true` to migrate topic partition leadership before the broker is stopped. * The active controller should be the last broker you restart. This is to ensure that the active controller is not moved on each broker restart, which would slow down the restart. Before starting a rolling restart: 1. Verify your cluster is healthy and there are no under replicated partitions. In Control Center, navigate to **Overview** of the cluster, and observe the **Under replicated partitions** value. If there are under replicated partitions, investigate why before doing a rolling restart. 2. Identify which Kafka broker in the cluster is the active controller. The active controller will report `1` for the metric `kafka.controller:type=KafkaController,name=ActiveControllerCount` and the remaining brokers will report `0`. Use the following workflow for the rolling restart: 1. Connect to one broker, being sure to leave the active controller for last, and stop the broker process gracefully. Do not send a `kill -9` command. Wait until the broker has completely shut down. ```none bin/kafka-server-stop ``` 2. If you are performing a [software upgrade](../installation/upgrade.md#upgrade) or making any system configuration changes, follow those steps on this broker. (If you are just changing broker properties, you could optionally do this before you stop the broker.) 3. Start the broker back up, passing in the broker properties file. ```none bin/kafka-server-start etc/kafka/broker.properties ``` 4. Wait until that broker completely restarts and is caught up before proceeding to restart the next broker in your cluster. Waiting is important to ensure that leader failover happens as cleanly as possible. To know when the broker is caught up, in Control Center, navigate to **Overview** of the cluster, and observe the **Under replicated partitions** value. During broker restart, this number increases because data will not be replicated to topic partitions that reside on the restarting broker. ![image](kafka/underreplicated-down.png) After a broker restarts and is caught up, this number goes back to its original value before restart, which should be `0` in a healthy cluster. ![image](kafka/underreplicated-recovered.png) 5. Repeat the above steps on each broker until you have restarted all brokers but the active controller. Now you can restart the active controller. ### Limiting bandwidth usage during data migration Kafka lets you apply a throttle to replication traffic, setting an upper bound on the bandwidth used to move replicas from machine to machine. This is useful when rebalancing a cluster, bootstrapping a new broker, or adding or removing brokers, as it limits the impact these data-intensive operations will have on users. There are three interfaces that can be used to engage a throttle. The simplest, and safest, is to apply a throttle when invoking [confluent-rebalancer](../clusters/rebalancer/quickstart.md#rebalancer) or `kafka-reassign-partitions`, but `kafka-configs` can also be used to view and alter the throttle values directly. So, for example, if you were to execute a rebalance with the below command, it would move partitions at no more than 50 MBps. ```none bin/kafka-reassign-partitions --bootstrap-server myhost:9092 --execute --reassignment-json-file bigger-cluster.json --throttle 50000000 ``` When you execute this script you will see the throttle engage: ```none … The throttle limit was set to 50000000 B/s Successfully started reassignment of partitions.
``` Should you wish to alter the throttle during a rebalance, say to increase the throughput so that it completes more quickly, you can do this by re-running the execute command, passing the same `reassignment-json-file`: ```none bin/kafka-reassign-partitions --bootstrap-server localhost:9092 --execute --reassignment-json-file bigger-cluster.json --throttle 700000000 There is an existing assignment running. The throttle limit was set to 700000000 B/s ``` After the rebalance completes the administrator can check the status of the rebalance using the `--verify` option. If the rebalance has completed, and `--verify` is run, the throttle will be removed. It is important that administrators remove the throttle in a timely manner after rebalancing completes by running the command with the `--verify` option. Failure to do so could cause regular replication traffic to be throttled. When the `--verify` option is executed, and the reassignment has completed, the script will confirm that the throttle was removed: ```none bin/kafka-reassign-partitions --bootstrap-server localhost:9092 --verify --reassignment-json-file bigger-cluster.json Status of partition reassignment: Reassignment of partition [my-topic,1] completed successfully Reassignment of partition [my-topic,0] completed successfully Throttle was removed. ``` The administrator can also validate the assigned configs using `kafka-configs`. There are two pairs of throttle configurations used to manage the throttling process. The first pair is the throttle value itself, which is configured at the broker level using the dynamic properties: ```none leader.replication.throttled.rate follower.replication.throttled.rate ``` The second pair is an enumerated set of throttled replicas: ```none leader.replication.throttled.replicas follower.replication.throttled.replicas ``` which are configured per topic. All four config values are automatically assigned by `kafka-reassign-partitions` (discussed below). The throttle mechanism works by measuring the received and transmitted rates, for partitions in the `replication.throttled.replicas` lists, on each broker. These rates are compared to the `replication.throttled.rate` config to determine if a throttle should be applied. The rate of throttled replication (used by the throttle mechanism) is recorded in the below JMX metrics, so they can be externally monitored. ```none MBean:kafka.server:type=LeaderReplication,name=byte-rate MBean:kafka.server:type=FollowerReplication,name=byte-rate ``` To view the throttle limit configuration: ```none bin/kafka-configs --describe --bootstrap-server localhost:9092 --entity-type brokers Configs for brokers '2' are leader.replication.throttled.rate=1000000,follower.replication.throttled.rate=1000000 Configs for brokers '1' are leader.replication.throttled.rate=1000000,follower.replication.throttled.rate=1000000 ``` This shows the throttle applied to both the leader and follower side of the replication protocol. By default both sides are assigned the same throttled throughput value. To view the list of throttled replicas: ```none bin/kafka-configs --describe --bootstrap-server localhost:9092 --entity-type topics Configs for topic 'my-topic' are leader.replication.throttled.replicas=1:102,0:101,follower.replication.throttled.replicas=1:101,0:102 ``` Here we see the leader throttle is applied to partition 1 on broker 102 and partition 0 on broker 101. Likewise the follower throttle is applied to partition 1 on broker 101 and partition 0 on broker 102.
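If a throttle ever needs to be set or adjusted outside of a reassignment, the broker-level rate can also be changed directly with `kafka-configs --alter`, as noted below; a minimal sketch, in which the broker ID `101` and the 10 MB/s rate are illustrative values:

```none
bin/kafka-configs --bootstrap-server localhost:9092 --alter \
  --entity-type brokers --entity-name 101 \
  --add-config leader.replication.throttled.rate=10000000,follower.replication.throttled.rate=10000000
```

The same tool's `--delete-config` option can remove these overrides if a throttle is ever left in place after a rebalance.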
By default `kafka-reassign-partitions` will apply the leader throttle to all replicas that exist before the rebalance, any one of which might be leader. It will apply the follower throttle to all move destinations. So if there is a partition with replicas on brokers `101,102`, being reassigned to `102,103`, a leader throttle, for that partition, would be applied to `101,102` (possible leaders during rebalance) and a follower throttle would be applied to `103` only (the move destination). If required, you can also use the `--alter` switch on `kafka-configs` to alter the throttle configurations manually. Some care should be taken when using throttled replication. In particular: 1. Throttle Removal: The throttle should be removed in a timely manner after reassignment completes (by running `confluent-rebalancer finish` or `kafka-reassign-partitions --verify`). 1. Ensuring Progress: If the throttle is set too low, in comparison to the incoming write rate, it is possible for replication to not make progress. This occurs when: ```none max(BytesInPerSec) > throttle ``` Where BytesInPerSec is the metric that monitors the write throughput of producers into each broker. The administrator can monitor whether replication is making progress, during the rebalance, using the metric: ```none kafka.server:type=FetcherLagMetrics,name=ConsumerLag,clientId=([-.\w]+),topic=([-.\w]+),partition=([0-9]+) ``` The lag should constantly decrease during replication. If the metric does not decrease, the administrator should increase the throttle throughput as described above. 1. Avoiding long delays during replication: The throttled throughput should be large enough that replicas cannot be starved for extended periods. A good, conservative rule of thumb is to keep the throttle above `#brokers MB/s`, where `#brokers` is the number of brokers in your cluster. Administrators wishing to use lower throttle values can tune the response size used for replication based on the relation: ```none Worst-Case-Delay = replica.fetch.response.max.bytes x #brokers / throttle ``` Here, the admin should tune the throttle and/or `replica.fetch.response.max.bytes` appropriately to ensure the delay is never larger than `replica.lag.time.max.ms` (as it is possible for some partitions, particularly smaller ones, to enter the ISR before the rebalance completes) or the outer throttle window: `(replication.quota.window.size.seconds x replication.quota.window.num)` or the connection timeout `replica.socket.timeout.ms`. As the default for `replica.fetch.response.max.bytes` is 10MB and the delay should be less than 10s (`replica.lag.time.max.ms`), this leads to the rule of thumb that throttles should never be less than `#brokers` MBps. To better understand the relation, let’s consider an example. Say we have a 5-node cluster, with default settings. We set a throttle of 10 MBps, cluster-wide, and add a new broker. The bootstrapping broker would replicate from the other 5 brokers with requests of size 10MB (default `replica.fetch.response.max.bytes`). The worst case payload, arriving at the same time on the bootstrapping broker, is 50MB. In this case the follower throttle, on the bootstrapping broker, would delay subsequent replication requests for (50MB / 10 MBps) = 5s, which is acceptable. However, if we set the throttle to 1 MBps, the worst-case delay would be 50s, which is not acceptable. # Quick Start for Confluent REST Proxy for Kafka Use the following Quick Start instructions to get up and running with Confluent REST Proxy for Apache Kafka®.
Prerequisites : - [Confluent Platform](../installation/index.md#installation-overview) You should configure and start a KRaft controller and a Kafka broker before you start REST Proxy. For detailed instructions on how to configure and run Confluent Platform, see [Tutorial: Set Up a Multi-Broker Kafka Cluster](../get-started/tutorial-multi-broker.md#basics-multi-broker-setup). You will only need to run one Kafka broker and one KRaft controller for this quick start. To start REST Proxy with the Confluent CLI, run: ```bash confluent local services kafka-rest start ``` To manually start each service in its own terminal, run instead: ```bash bin/kafka-server-start ./etc/kafka/controller.properties bin/kafka-server-start ./etc/kafka/broker.properties bin/kafka-rest-start ./etc/kafka-rest/kafka-rest.properties ``` ## Add the uberjar to ksqlDB server In order for ksqlDB to be able to load your UDFs, they need to be compiled from classes into an uberjar. Run the following command to build an uberjar: ```bash gradle shadowJar ``` You should now have a directory, `extensions`, with a file named `example-udfs-0.0.1.jar` in it. In order to use the uberjar, you need to make it available to the ksqlDB server. Create the following `docker-compose.yml` file: ```yaml version: '2' services: broker: image: confluentinc/cp-enterprise-kafka:8.1.0 hostname: broker container_name: broker ports: - "29092:29092" environment: KAFKA_BROKER_ID: 1 KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:9092,PLAINTEXT_HOST://localhost:29092 KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1 KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0 KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1 KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1 schema-registry: image: confluentinc/cp-schema-registry:8.1.0 hostname: schema-registry container_name: schema-registry depends_on: - broker ports: - "8081:8081" environment: SCHEMA_REGISTRY_HOST_NAME: schema-registry ksqldb-server: image: confluentinc/ksqldb-server:8.1.0 hostname: ksqldb-server container_name: ksqldb-server depends_on: - broker - schema-registry ports: - "8088:8088" volumes: - "./extensions/:/opt/ksqldb-udfs" environment: KSQL_LISTENERS: "http://0.0.0.0:8088" KSQL_BOOTSTRAP_SERVERS: "broker:9092" KSQL_KSQL_SCHEMA_REGISTRY_URL: "http://schema-registry:8081" KSQL_KSQL_LOGGING_PROCESSING_STREAM_AUTO_CREATE: "true" KSQL_KSQL_LOGGING_PROCESSING_TOPIC_AUTO_CREATE: "true" # Configuration for UDFs KSQL_KSQL_EXTENSION_DIR: "/opt/ksqldb-udfs" KSQL_KSQL_FUNCTIONS_FORMULA_BASE_VALUE: 5 ksqldb-cli: image: confluentinc/ksqldb-cli:8.1.0 container_name: ksqldb-cli depends_on: - broker - ksqldb-server entrypoint: /bin/sh tty: true ``` Notice that: - A volume is mounted from the local `extensions` directory (containing your uberjar) to the container `/opt/ksqldb-udfs` directory. The latter can be any directory that you like. This effectively puts the uberjar on the ksqlDB server’s file system. - The environment variable `KSQL_KSQL_EXTENSION_DIR` is configured to the same path that was set for the container in the volume mount. This is the path where ksqlDB looks for UDFs. - The environment variable `KSQL_KSQL_FUNCTIONS_FORMULA_BASE_VALUE` is set to `5`. Recall that in the UDF example, the function loads an external parameter named `ksql.functions.formula.base.value`. All `KSQL_` environment variables are converted automatically to server configuration properties, which is where UDF parameters are looked up.
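With the uberjar mounted, you can bring the stack up and confirm that the ksqlDB server picked the function up from the extension directory; a quick check, assuming the `docker-compose.yml` above and that the example UDF is registered under the name `FORMULA` (as the `ksql.functions.formula.base.value` parameter suggests):

```bash
# Start the containers defined in docker-compose.yml
docker-compose up -d

# Open the ksqlDB CLI against the server container
docker exec -it ksqldb-cli ksql http://ksqldb-server:8088
```

From the CLI prompt, `SHOW FUNCTIONS;` should list the UDF, and `DESCRIBE FUNCTION FORMULA;` displays its signature and parameters.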
### Configuring Kafka Encrypted Communication This configuration enables ksqlDB to connect to a Kafka cluster over SSL, with a user supplied trust store: ```properties security.protocol=SSL ssl.truststore.location=/etc/kafka/secrets/kafka.client.truststore.jks ssl.truststore.password=confluent ``` The exact settings will vary depending on the security settings of the Kafka brokers, and how your SSL certificates are signed. For full details, and instructions on how to create suitable trust stores, please refer to the [Security Guide](../../../security/overview.md#security). To use separate trust stores for encrypted communication with Kafka and external communication with ksqlDB clients, prefix the SSL truststore configs with `ksql.streams.`: ```properties security.protocol=SSL ksql.streams.ssl.truststore.location=/etc/kafka/secrets/kafka.client.truststore.jks ksql.streams.ssl.truststore.password=confluent ``` ### Configure Kafka Authentication This configuration enables ksqlDB to connect to a secure Kafka cluster using PLAIN SASL, where the SSL certificates have been signed by a CA trusted by the default JVM trust store. ```properties security.protocol=SASL_SSL sasl.mechanism=PLAIN sasl.jaas.config=\ org.apache.kafka.common.security.plain.PlainLoginModule required \ username="" \ password=""; ``` The exact settings will vary depending on what SASL mechanism your Kafka cluster is using and how your SSL certificates are signed. For more information, see the [Security Guide](../../../security/overview.md#security). ### Replicated topic with Avro schema causes errors? Confluent Replicator renames topics during replication, and if there are associated Avro schemas, they aren’t automatically matched with the renamed topics. In the ksqlDB CLI, the `PRINT` statement for a replicated topic works, which shows that the Avro schema ID exists in Schema Registry, and ksqlDB can deserialize the Avro message. But `CREATE STREAM` fails with a deserialization error: ```bash CREATE STREAM pageviews_original (viewtime bigint, userid varchar, pageid varchar) WITH (kafka_topic='pageviews.replica', value_format='AVRO'); [2018-06-21 19:12:08,135] WARN task [1_6] Skipping record due to deserialization error. topic=[pageviews.replica] partition=[6] offset=[1663] (org.apache.kafka.streams.processor.internals.RecordDeserializer:86) org.apache.kafka.connect.errors.DataException: pageviews.replica at io.confluent.connect.avro.AvroConverter.toConnectData(AvroConverter.java:97) at io.confluent.ksql.serde.connect.KsqlConnectDeserializer.deserialize(KsqlConnectDeserializer.java:48) at io.confluent.ksql.serde.connect.KsqlConnectDeserializer.deserialize(KsqlConnectDeserializer.java:27) ``` The solution is to register schemas manually against the replicated subject name for the topic. ### DELIMITED | Feature | Supported | |---------------------------------------------------------------------------------------------|-------------| | As value format | Yes | | As key format | Yes | | Multi-Column Keys | Yes | | [Schema Registry required](../operate-and-deploy/installation/server-config/avro-schema.md) | No | | [Schema inference](/reference/server-configuration#ksqlpersistencedefaultformatkey) | No | | [Single field wrapping](#ksqldb-serialization-formats-single-field-unwrapping) | No | | [Single field unwrapping](#ksqldb-serialization-formats-single-field-unwrapping) | Yes | The `DELIMITED` format supports comma-separated values.
You can use other delimiter characters by specifying the `KEY_DELIMITER` and/or `VALUE_DELIMITER` properties when you use `FORMAT='DELIMITED'` in a WITH clause. Only a single character is valid as a delimiter. The default is the comma character. For space- and tab-delimited values, use the special values `SPACE` or `TAB`, not an actual space or tab character. The delimiter is a Unicode character, as defined in `java.lang.Character`. For example, the smiley-face character works: ```sql CREATE STREAM delim_stream (f1 STRING, f2 STRING) WITH (KAFKA_TOPIC='delim', FORMAT='DELIMITED', VALUE_DELIMITER='☺', ...); ``` The serialized object should be a Kafka-serialized string, which will be split into columns. For example, given a SQL statement such as: ```sql CREATE STREAM x (ORGID BIGINT KEY, ID BIGINT KEY, NAME STRING, AGE INT) WITH (FORMAT='DELIMITED', ...); ``` ksqlDB splits a key of `120,21` and a value of `bob,49` into the four fields (two keys and two values) with `ORGID KEY` of `120`, `ID KEY` of `21`, `NAME` of `bob` and `AGE` of `49`. This data format supports all SQL [data types](sql/data-types.md#ksqldb-reference-data-types) except `ARRAY`, `MAP` and `STRUCT`. - `TIMESTAMP` typed data is serialized as a `long` value indicating the Unix epoch time in milliseconds. - `TIME` typed data is serialized as an `int` value indicating the number of milliseconds since the beginning of the day. - `DATE` typed data is serialized as an `int` value indicating the number of days since the Unix epoch. - `BYTES` typed data is serialized as a Base64-encoded string value. ### KAFKA | Feature | Supported | |---------------------------------------------------------------------------------------------|-------------| | As value format | Yes | | As key format | Yes | | Multi-Column Keys | No | | [Schema Registry required](../operate-and-deploy/installation/server-config/avro-schema.md) | No | | [Schema inference](/reference/server-configuration#ksqlpersistencedefaultformatkey) | No | | [Single field wrapping](#ksqldb-serialization-formats-single-field-unwrapping) | No | | [Single field unwrapping](#ksqldb-serialization-formats-single-field-unwrapping) | Yes | The `KAFKA` format supports `INT`, `BIGINT`, `DOUBLE` and `STRING` primitives that have been serialized using Kafka’s standard set of serializers. The format is designed primarily to support primitive message keys. It can be used as a value format, though certain operations aren’t supported when this is the case. Unlike some other formats, the `KAFKA` format does not perform any type coercion, so it’s important to correctly match the field type to the underlying serialized form to avoid deserialization errors. The table below details the SQL types the format supports, including details of the associated Kafka Java Serializer, Deserializer, and Connect Converter classes you would need to use to write the key to Kafka, read the key from Kafka, or configure Kafka Connect to work with the `KAFKA` format, respectively.
| SQL field type | Kafka type | Kafka serializer | Kafka deserializer | Connect converter | |------------------|--------------------------------|-----------------------------------------------------------|-------------------------------------------------------------|-------------------------------------------------------| | INT / INTEGER | A 32-bit signed integer | `org.apache.kafka.common.serialization.IntegerSerializer` | `org.apache.kafka.common.serialization.IntegerDeserializer` | `org.apache.kafka.connect.converters.IntegerConverter` | | BIGINT | A 64-bit signed integer | `org.apache.kafka.common.serialization.LongSerializer` | `org.apache.kafka.common.serialization.LongDeserializer` | `org.apache.kafka.connect.converters.LongConverter` | | DOUBLE | A 64-bit floating point number | `org.apache.kafka.common.serialization.DoubleSerializer` | `org.apache.kafka.common.serialization.DoubleDeserializer` | `org.apache.kafka.connect.converters.DoubleConverter` | | STRING / VARCHAR | A UTF-8 encoded text string | `org.apache.kafka.common.serialization.StringSerializer` | `org.apache.kafka.common.serialization.StringDeserializer` | `org.apache.kafka.connect.storage.StringConverter` | Because the format supports only primitive types, you can only use it when the schema contains a single field. For example, if your Kafka messages have a `long` key, you can make them available to ksqlDB by using a statement like: ```sql CREATE STREAM USERS (ID BIGINT KEY, NAME STRING) WITH (VALUE_FORMAT='JSON', ...); ``` If you integrate ksqlDB with [Confluent Schema Registry](../../schema-registry/index.md#schemaregistry-intro), and your ksqlDB application uses a compatible value format (Avro, JSON_SR, or Protobuf), you can just supply the key column, and ksqlDB loads the value columns from Schema Registry: ```sql CREATE STREAM USERS (ID BIGINT KEY) WITH (VALUE_FORMAT='JSON_SR', ...); ``` The key column must be supplied, because ksqlDB supports only keys in `KAFKA` format. ### Protobuf | Feature | Supported | |-----------------------------------------------------------------------------------------------------------------------------------|--------------------------------------| | As value format | Yes | | As key format | Yes | | Multi-Column Keys | Yes | | [Schema Registry required](../operate-and-deploy/installation/avro-schema.md#ksqldb-installation-configure-serialization-formats) | `PROTOBUF`: Yes, `PROTOBUF_NOSR`: No | | [Schema inference](server-configuration.md#ksqldb-reference-server-configuration-persistence-default-format-key) | `PROTOBUF`: Yes, `PROTOBUF_NOSR`: No | | [Single field wrapping](#ksqldb-serialization-formats-single-field-unwrapping) | Yes | | [Single field unwrapping](#ksqldb-serialization-formats-single-field-unwrapping) | No | Protobuf handles `null` values differently than AVRO and JSON. Protobuf doesn’t have the concept of a `null` value, so the conversion between PROTOBUF and Java (Kafka Connect) objects is undefined. Usually, Protobuf resolves a “missing field” to the default value of its type. - **String:** the default value is the empty string. - **Byte:** the default value is empty bytes. - **Bool:** the default value is `false`. - **Numeric type:** the default value is zero. - **Enum:** the default value is the first defined enum value, which must be zero. - **Message field:** the field is not set. Its exact value is language-dependent. See the generated code guide for details.
To enable alternative representations for `null` values in protobuf, protobuf-specific properties can be passed to `CREATE` statements. For example, the following `CREATE` statement will create a protobuf schema that wraps all primitive types into the corresponding standard wrappers (e.g. `google.protobuf.StringValue` for `string`). ```sql CREATE STREAM USERS (ID STRING KEY, i INTEGER, s STRING) WITH (VALUE_FORMAT='PROTOBUF', VALUE_PROTOBUF_NULLABLE_REPRESENTATION='WRAPPER'); ``` This way, `null` can be distinguished from default values. Similarly, when `VALUE_PROTOBUF_NULLABLE_REPRESENTATION` is set to `OPTIONAL`, all fields in protobuf will be declared optional, also allowing `null` primitive fields to be distinguished from default values. The same property values can be used with the `KEY_PROTOBUF_NULLABLE_REPRESENTATION` property to customize the protobuf serialization of the key. ## Replicated topic with Avro schema causes errors The Confluent Replicator renames topics during replication. If there are associated Avro schemas, they are not automatically matched with the renamed topics after replication completes. Using the `PRINT` statement for a replicated topic shows that the Avro schema ID exists in the Schema Registry. ksqlDB can deserialize the Avro message, but the `CREATE STREAM` statement fails with a deserialization error. For example: ```sql CREATE STREAM pageviews_original (viewtime bigint, userid varchar, pageid varchar) WITH (kafka_topic='pageviews.replica', value_format='AVRO'); ``` Example output with a deserialization error: ```none [2018-06-21 19:12:08,135] WARN task [1_6] Skipping record due to deserialization error. topic=[pageviews.replica] partition=[6] offset=[1663] (org.apache.kafka.streams.processor.internals.RecordDeserializer:86) org.apache.kafka.connect.errors.DataException: pageviews.replica at io.confluent.connect.avro.AvroConverter.toConnectData(AvroConverter.java:97) at io.confluent.ksql.serde.connect.KsqlConnectDeserializer.deserialize(KsqlConnectDeserializer.java:48) at io.confluent.ksql.serde.connect.KsqlConnectDeserializer.deserialize(KsqlConnectDeserializer.java:27) ``` The solution is to register Avro schemas manually against the replicated subject name for the topic. ### Creating a mirror topic A mirror topic is a read-only topic that reflects all the data and metadata in another topic. Creating a mirror topic with the CLI uses the `kafka-mirrors` tool. Once a mirror topic is created, the mirror automatically begins fetching data from the source topic. For more information, see [Mirror Topics](https://docs.confluent.io/platform/current/multi-dc-deployments/cluster-linking/mirror-topics-cp.html). **Example Command** ```bash kafka-mirrors --create --mirror-topic example-topic \ --link demo-link \ --bootstrap-server localhost:9093 ``` **Example Output** ```bash Created topic example-topic. ``` To create a mirror topic, use `kafka-mirrors --create` along with [bootstrap-server](#bootstrap-cluster-links) and the following flags. `--mirror-topic` : (Required) The name of the mirror topic to create. This must match exactly the name of the source topic to mirror over the cluster link. * Type: string `--link` : (Required) The name of the cluster link used to pull data from the source topic. * Type: string `--command-config` : Property file containing configurations to be passed to the [AdminClient](../../installation/configuration/admin-configs.md#cp-config-admin).
For example, with security credentials for authorization and authentication. The following are optional configurations when creating a mirror topic: `--config` : A comma-separated list of configs to override when creating the mirror topic. Each config to override should be specified as `name=value`. For more information about which configurations can be set on a mirror topic, see [Configurations](https://docs.confluent.io/platform/current/multi-dc-deployments/cluster-linking/mirror-topics-cp.html#configurations) in Mirror Topics. * Type: string `--replication-factor` : The replication factor of the mirror topic being created. If not supplied, *defaults to the destination cluster’s default*, not the source topic’s replication factor. * Type: string `--source-topic` : The name of the source topic to mirror. Required if the cluster link has a prefix configured. To learn more, see [Prefixing Mirror Topics and Consumer Group Names](https://docs.confluent.io/platform/current/multi-dc-deployments/cluster-linking/mirror-topics-cp.html#prefixing-mirror-topics-and-consumer-group-names). * Type: string You must have `ALTER CLUSTER` authorization to create a mirror topic. #### IMPORTANT Changing the configuration of a topic does not change the replica assignment for the topic partition. Changing the replica placement of a topic configuration must be followed by a partition reassignment. The [confluent-rebalancer](../clusters/rebalancer/configuration-options.md#rebalancer-config-options) command line tool supports reassignment that also accounts for replica placement constraints. To learn more, see [Quick Start for Auto Data Balancing in Confluent Platform](../clusters/rebalancer/quickstart.md#rebalancer). For example, run the commands below to start a reassignment that matches the topic’s replica placement constraints. Note you should use Confluent Platform 5.5 or newer, which now includes `--topics` and `--exclude-internal-topics` flags to limit the set of topics that are eligible for reassignment. This will decrease the overall rebalance scope and therefore time. `--replica-placement-only` can be used to perform reassignment only on partitions that do not satisfy the replica placement constraints. ```none confluent-rebalancer execute --bootstrap-server kafka-west-1:9092 --replica-placement-only --throttle 10000000 --verbose ``` Run this command to monitor the status for the reassignment: ```none confluent-rebalancer status --bootstrap-server kafka-west-1:9092 ``` Run this command to finish the reassignment: ```none confluent-rebalancer finish --bootstrap-server kafka-west-1:9092 ``` For more information and examples, see [Quick Start for Auto Data Balancing in Confluent Platform](../clusters/rebalancer/quickstart.md#rebalancer). ### Consumer lag Replicator has an embedded consumer that reads data from the origin cluster, and it commits its offsets only after the connect worker’s producer has committed the data to the destination cluster (configure the frequency of commits with the parameter `offset.flush.interval.ms`). You can monitor the consumer lag of Replicator’s embedded consumer in the origin cluster (for Replicator instances that copy data from `dc1` to `dc2`, the origin cluster is `dc1`). The ability to monitor Replicator’s consumer lag is enabled when it is configured with `offset.topic.commit=true` (`true` by default), which allows Replicator to commit its own consumer offsets to the origin cluster `dc1` after the messages have been written to the destination cluster. 1. 
For Replicator copying from `dc1` to `dc2`: Select `dc1` (origin cluster) from the menu on the left and then select **Consumers**. Verify that there are two consumer groups, one for each Replicator instance running from `dc1` to `dc2`: `replicator-dc1-to-dc2-topic1` and `replicator-dc1-to-dc2-topic2`. Replicator’s consumer lag information is available in Control Center and `kafka-consumer-groups`, but it is not available via JMX. 1. Click on `replicator-dc1-to-dc2-topic1` to view Replicator’s consumer lag in reading `topic1` and `_schemas`. This view is equivalent to: ```text docker-compose exec broker-dc1 kafka-consumer-groups --bootstrap-server broker-dc1:29091 --describe --group replicator-dc1-to-dc2-topic1 ``` ![image](images/c3-consumer-lag-dc1-topic1.png) 2. Click on `replicator-dc1-to-dc2-topic2` to view Replicator’s consumer lag in reading `topic2` (equivalent to `docker-compose exec broker-dc1 kafka-consumer-groups --bootstrap-server broker-dc1:29091 --describe --group replicator-dc1-to-dc2-topic2`) ![image](images/c3-consumer-lag-dc1-topic2.png) 2. For Replicator copying from `dc1` to `dc2`: do not mistakenly try to monitor Replicator consumer lag in the destination cluster `dc2`. Control Center also shows the Replicator consumer lag for topics in `dc2` (i.e., `topic1`, `_schemas`, `topic2.replica`) but this does not mean that Replicator is consuming from them. The reason you see this consumer lag in `dc2` is that, by default, Replicator is configured with `offset.timestamps.commit=true`, which means Replicator commits the offset timestamps of its consumer group to the `__consumer_offsets` topic in the destination cluster `dc2`. In case of disaster recovery, this enables Replicator to resume where it left off when switching to the secondary cluster. 3. Do not confuse consumer lag with an MBean attribute called `records-lag` associated with Replicator’s embedded consumer. That attribute reflects whether Replicator’s embedded consumer can keep up with the original data production rate, but it does not take into account the replication lag incurred when producing the messages to the destination cluster. `records-lag` is measured in real time, and it is normal for this value to be `0.0`. ```text docker-compose exec connect-dc2 \ kafka-run-class kafka.tools.JmxTool \ --object-name "kafka.consumer:type=consumer-fetch-manager-metrics,partition=0,topic=topic1,client-id=replicator-dc1-to-dc2-topic1-0" \ --attributes "records-lag" \ --jmx-url service:jmx:rmi:///jndi/rmi://connect-dc2:9892/jmxrmi ``` ## Use Control Center to monitor replicators You can use Control Center to monitor the replicators in your current deployment: 1. Stop Replicator and brokers on both the origin and destination clusters. Press `Ctl-C` in each command window to stop the processes, but keep the windows open to make it easy to restart each one. 2. Activate the monitoring extension for Replicator by doing the following, as fully described in [Replicator monitoring extension](replicator-monitoring.md#replicator-monitoring-extension). - Add the full path to `replicator-rest-extension-.jar` to your CLASSPATH. - Add `rest.extension.classes=io.confluent.connect.replicator.monitoring.ReplicatorMonitoringExtension` to `my-examples/replication.properties`. 3. Uncomment or add the following lines to the Kafka configuration files for both the destination and origin, `my-examples/server_destination.properties` and `my-examples/server_origin.properties`, respectively.
The configuration for `confluent.metrics.reporter.bootstrap.servers` must point to `localhost` on port `9092` in both files, so you may need to edit one or both of these port numbers. (Searching on `confluent.metrics` will take you to these lines in the files.) ```none confluent.metrics.reporter.topic.replicas=1 metric.reporters=io.confluent.metrics.reporter.ConfluentMetricsReporter confluent.metrics.reporter.bootstrap.servers=localhost:9092 ``` - The first line indicates to Control Center that your deployment is in development mode, using a replication factor of `1`. - The other two lines enable metrics reporting on Control Center, and provide access to the Confluent internal topic that collects and stores the monitoring data. 4. Edit `etc/confluent-control-center/control-center-dev.properties` to add the following two lines that specify origin and destination bootstrap servers for Control Center, as is required for monitoring multiple clusters. (A convenient place to add these lines is near the top of the file under “Control Center Settings”, immediately after the line that specifies `confluent.controlcenter.id`.) ```bash # multi-cluster monitoring confluent.controlcenter.kafka.origin.bootstrap.servers=localhost:9082 confluent.controlcenter.kafka.destination.bootstrap.servers=localhost:9092 ``` 5. Restart the brokers on the destination and origin clusters with the same commands used above, for example: ```none ./bin/kafka-server-start my-examples/server_destination.properties ``` ```none ./bin/kafka-server-start my-examples/server_origin.properties ``` 6. Restart Replicator and the Connect worker with the same command as above. For example: ```none ./bin/replicator --cluster.id replicator --consumer.config my-examples/consumer.properties --producer.config my-examples/producer.properties --replication.config my-examples/replication.properties --whitelist 'test-topic' ``` 7. Launch Control Center with the following command. ```none ./bin/control-center-start etc/confluent-control-center/control-center-dev.properties ``` If no port is defined in `control-center-dev.properties`, Control Center runs by default on port `9021`, as described in [Control Center for Confluent Platform](https://docs.confluent.io/control-center/current/overview.html). This is the desired config for this deployment. 8. Open Control Center at [http://localhost:9021/](http://localhost:9021/) in your web browser. The clusters are rendered on Control Center with auto-generated names, based on your configuration. ![image](images/c3-replicators-multi-cluster.png) 9. (Optional) On Control Center, edit the cluster names to suit your use case, as described in [Origin and Destination clusters](https://docs.confluent.io/control-center/current/replicators.html#origin-and-destination-clusters) in “Replicators” in the Control Center User Guide. 10. On Control Center, select the destination cluster, click **Replicators** on the navigation panel, and use Control Center to monitor replication performance and drill down on source and replicated topics. ![image](images/c3-replicators-all.png) To see messages produced to both the original and replicated topic on Control Center, try out `kafka-producer-perf-test` in its own command window to auto-generate test data to `test-topic`.
```none kafka-producer-perf-test \ --producer-props bootstrap.servers=localhost:9082 \ --topic test-topic \ --record-size 1000 \ --throughput 1000 \ --num-records 3600000 ``` The command provides status output on messages sent, as shown: ```none 4999 records sent, 999.8 records/sec (0.95 MB/sec), 1.1 ms avg latency, 240.0 ms max latency. 5003 records sent, 1000.2 records/sec (0.95 MB/sec), 0.5 ms avg latency, 4.0 ms max latency. 5003 records sent, 1000.2 records/sec (0.95 MB/sec), 0.6 ms avg latency, 5.0 ms max latency. 5001 records sent, 1000.2 records/sec (0.95 MB/sec), 0.3 ms avg latency, 3.0 ms max latency. 5001 records sent, 1000.0 records/sec (0.95 MB/sec), 0.3 ms avg latency, 4.0 ms max latency. 5000 records sent, 1000.0 records/sec (0.95 MB/sec), 0.8 ms avg latency, 24.0 ms max latency. 5001 records sent, 1000.2 records/sec (0.95 MB/sec), 0.6 ms avg latency, 3.0 ms max latency. ... ``` Like before, you can consume these messages from the command line, using `kafka-console-consumer` to verify that the replica topic is receiving them: ```none ./bin/kafka-console-consumer --from-beginning --topic test-topic.replica --bootstrap-server localhost:9092 ``` You can also verify this on Control Center. Navigate to `test-topic` on the origin cluster to view messages on the original topic, and to `test-topic.replica` on the destination to view messages on the replicated topic. ![image](images/c3-replicator-topic-drilldown-messages.png) 11. To learn more about monitoring Replicators in Control Center, see [“Replicators” in Control Center User Guide](https://docs.confluent.io/control-center/current/replicators.html). 12. When you have completed your experiments with the tutorial, be sure to perform clean up as follows: - Stop any producers and consumers using `Ctl-C` in each command window. - Use `Ctl-C` in each command window to stop each service in the reverse order from which you started them (stop Control Center first, then Replicator, and finally the Kafka brokers). #### Test your Replicator Following is a generic Replicator testing scenario. A similar testing strategy is covered with more context as a part of the Replicator tutorial in the section [Configure and run Replicator](replicator-quickstart.md#config-and-run-replicator). 1. Create a test topic. If you haven’t already, create a topic named `test-topic` in the source cluster with the following command. ```none ./bin/kafka-topics --create --topic test-topic --replication-factor \ 1 --partitions 4 --bootstrap-server localhost:9082 ./bin/kafka-topics --describe --topic test-topic.replica --bootstrap-server localhost:9092 ``` The `kafka-topics --describe --topic` step in the above command checks whether `test-topic.replica` exists. After verifying that the topic exists, confirm that four partitions were created. In general, the Replicator makes sure that the destination topic has at least as many partitions as the source topic. It is fine if it has more, but because the Replicator preserves the partition assignment of the source data, any additional partitions will not be utilized. 2. Send data to the source cluster. At any time after you’ve created the topic in the source cluster, you can begin sending data to it using a Kafka producer to write to `test-topic` in the source cluster. You can then confirm that the data has been replicated by consuming from `test-topic.replica` in the destination cluster. For example, to send a sequence of numbers using Kafka’s console producer, you can use the following command.
```none seq 10000 | ./bin/kafka-console-producer --topic test-topic --broker-list localhost:9082 ``` 3. Run a consumer to confirm that the destination cluster got the data. You can then confirm delivery in the destination cluster using the console consumer. ```none ./bin/kafka-console-consumer --from-beginning --topic test-topic.replica \ --bootstrap-server localhost:9092 ``` ### Handling differences between preregistered and client-derived schemas The following properties can be configured in any client using a Schema Registry serializer (producers, streams, Connect). These are described specifically for connectors in [Kafka Connect converters](/platform/current/connect/concepts.html#converters), including full reference documentation in the section, [Configuration Options](/platform/current/schema-registry/connect.html#configuration-options). - `auto.register.schemas` - Specify if the serializer should attempt to register the schema with Schema Registry. - `use.latest.version` - Only applies when `auto.register.schemas` is set to `false`. If `auto.register.schemas` is set to `false` and `use.latest.version` is set to `true`, then instead of deriving a schema for the object passed to the client for serialization, Schema Registry will use the latest version of the schema in the subject for serialization. - `latest.compatibility.strict` - The default is `true`, but this only applies when `use.latest.version=true`. If both properties are `true`, a check is performed during serialization to verify that the latest subject version is backward compatible with the schema of the object being serialized. If the check fails, an error is thrown. If `latest.compatibility.strict` is `false`, then the latest subject version is used for serialization, without any compatibility check. Relaxing the compatibility requirement (by setting `latest.compatibility.strict` to `false`) may be useful, for example, when using [schema references](#referenced-schemas). The following table summarizes serializer behaviors based on the configurations of these three properties. | auto.register.schemas | use.latest.version | latest.compatibility.strict | Behavior | |-------------------------|----------------------|-------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | **true** | **(true or false)** | **(true or false)** | The serializer will attempt to register the schema with Schema Registry by deriving a schema for the object passed to the client for serialization. When `auto.register.schemas` is set to `true`, `use.latest.version` and `latest.compatibility.strict` are ignored, so it doesn’t matter how those are set; `auto.register.schemas` overrides them. | | **false** | **true** | **false** | Schema Registry will use the latest version of the schema in the subject for serialization. | | **false** | **true** | **true** | The serializer performs a check to verify that the latest subject version is backward compatible with the schema of the object being serialized. If the check fails, the serializer throws an error. 
| Here are two scenarios where you may want to disable schema auto-registration, and enable `use.latest.version`: - **Using schema references to combine multiple events in the same topic** - You can use [Schema references](#referenced-schemas) as a way to combine multiple events in the same topic. Disabling schema auto-registration is integral to this configuration for Avro and JSON Schema serializers. Examples of configuring serializers to use the latest schema version instead of auto-registering schemas are provided in the sections on [combining multiple event types in the same topic (Avro)](serdes-avro.md#multiple-event-types-same-topic-avro) and [combining multiple event types in the same topic (JSON)](serdes-json.md#multiple-event-types-same-topic-json). - **Ramping up production efficiency by disabling schema auto-registration and avoiding “Schema not found” exceptions** - Sometimes subtle (but not semantically significant) differences can exist between a pre-registered schema and the schema used by the client when using code-generated classes from the pre-registered schema with a Schema Registry aware serializer. An example of this is with Protobuf, where a fully-qualified type name such as `google.protobuf.Timestamp` may code-generate a descriptor with the type name `.google.protobuf.Timestamp`. Schema Registry considers these two variations of the same type name to be different. With auto-registration enabled, this would result in auto-registering two essentially identical schemas. With auto-registration disabled, this can cause a “Schema not found”. To configure the serializer to not register new schemas and ignore minor differences between client and registered schemas which could cause unexpected “Schema not found” exceptions, set these properties in your serializer configuration: ```properties auto.register.schemas=false use.latest.version=true latest.compatibility.strict=false ``` The `use.latest.version` sets the serializer to retrieve the latest schema version for the subject, and use that for validation and serialization, ignoring the client’s schema. The assumption is that if there are any differences between client and latest registered schema, they are minor and backward compatible. ### Adding security credentials The [test drive](#sr-test-drive-avro) examples show how to use the producer and consumer console clients as serializers and deserializers by passing Schema Registry properties on the command line and in config files. 
In addition to examples given in the “Test Drives”, you can pass truststore and keystore credentials for the Schema Registry, as described in [Additional configurations for HTTPS](/platform/current/schema-registry/security/index.html#additional-configurations-for-https). Here is an example for the producer on Confluent Platform: ```bash kafka-avro-console-producer --bootstrap-server localhost:9092 \ --property schema.registry.url=http://localhost:8081 --topic transactions-avro \ --property value.schema='{"type":"record","name":"Transaction","fields":[{"name":"id","type":"string"},{"name": "amount", "type": "double"}]}' \ --property schema.registry.ssl.truststore.location=/etc/kafka/security/schema.registry.client.truststore.jks \ --property schema.registry.ssl.truststore.password=myTrustStorePassword ``` ### Adding security credentials The [test drive](#sr-test-drive-json-schema) examples show how to use the producer and consumer console clients as serializers and deserializers by passing Schema Registry properties on the command line and in config files. In addition to examples given in the “Test Drives”, you can pass truststore and keystore credentials for the Schema Registry, as described in [Additional configurations for HTTPS](/platform/current/schema-registry/security/index.html#additional-configurations-for-https). Here is an example for the producer on Confluent Platform: ```bash kafka-json-schema-console-producer --bootstrap-server localhost:9092 \ --property schema.registry.url=http://localhost:8081 --topic transactions-json \ --property value.schema='{"type":"object", "properties":{"id":{"type":"string"}, "amount":{"type":"number"} }, "additionalProperties": false}' \ --property schema.registry.ssl.truststore.location=/etc/kafka/security/schema.registry.client.truststore.jks \ --property schema.registry.ssl.truststore.password=myTrustStorePassword ``` ### Adding security credentials The [test drive](#sr-test-drive-protobuf) examples show how to use the producer and consumer console clients as serializers and deserializers by passing Schema Registry properties on the command line and in config files. In addition to examples given in the “Test Drives”, you can pass truststore and keystore credentials for the Schema Registry, as described in [Additional configurations for HTTPS](/platform/current/schema-registry/security/index.html#additional-configurations-for-https). Here is an example for the producer on Confluent Platform: ```bash kafka-protobuf-console-producer --broker-list localhost:9093 --topic myTopic \ --producer.config ~/etc/kafka/producer.properties \ --property value.schema='syntax = "proto3"; message MyRecord {string id = 1; float amount = 2; string customer_id=3;}' \ --property schema.registry.url=https://localhost:8081 \ --property schema.registry.ssl.truststore.location=/etc/kafka/security/schema.registry.client.truststore.jks \ --property schema.registry.ssl.truststore.password=myTrustStorePassword ``` #### Configure the Confluent Control Center properties files In the Control Center properties file, you will use the default ports for `bootstrap.servers` and `zookeeper.connect`, but modify and add several other configurations. 1. Copy the default Control Center properties file to use as a basis for a specialized Control Center properties file for this tutorial: ```bash cp $CONFLUENT_HOME/etc/confluent-control-center/control-center-dev.properties $CONFLUENT_HOME/etc/confluent-control-center/control-center-multi-sr.properties ``` 2. 
Append the following lines to the end of the file. These update some defaults and add new configurations to match the server and Schema Registry setups in previous steps: ```bash echo "confluent.controlcenter.kafka.AK1.bootstrap.servers=localhost:9093" >> $CONFLUENT_HOME/etc/confluent-control-center/control-center-multi-sr.properties ``` ```bash echo "confluent.controlcenter.streams.cprest.url=http://0.0.0.0:8090" >> $CONFLUENT_HOME/etc/confluent-control-center/control-center-multi-sr.properties ``` ```bash echo "confluent.controlcenter.kafka.AK1.cprest.url=http://0.0.0.0:8091" >> $CONFLUENT_HOME/etc/confluent-control-center/control-center-multi-sr.properties ``` ```bash echo "confluent.controlcenter.schema.registry.SR-AK1.url=http://localhost:8082" >> $CONFLUENT_HOME/etc/confluent-control-center/control-center-multi-sr.properties ``` #### Example Producer Code When constructing the producer, configure the message value class to use the application’s code-generated `Payment` class. For example: ```java ... import io.confluent.kafka.serializers.KafkaAvroSerializer; ... props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class); props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, KafkaAvroSerializer.class); ... KafkaProducer<String, Payment> producer = new KafkaProducer<>(props); final Payment payment = new Payment(orderId, 1000.00d); final ProducerRecord<String, Payment> record = new ProducerRecord<>(TOPIC, payment.getId().toString(), payment); producer.send(record); ... ``` Because the `pom.xml` includes `avro-maven-plugin`, the `Payment` class is automatically generated during compilation. In this example, the connection information to the Kafka brokers and Schema Registry is provided by the configuration file that is passed into the code, but if you want to specify the connection information directly in the client application, see [this java template](https://github.com/confluentinc/examples/tree/latest/ccloud/template_delta_configs/java_producer_consumer.delta). For a full Java producer example, refer to [the producer example](https://github.com/confluentinc/examples/tree/latest/clients/avro/src/main/java/io/confluent/examples/clients/basicavro/ProducerExample.java). ## Configure the Kafka broker to connect to Schema Registry In cases where [broker-side schema validation](../schema-validation.md#schema-validation) is enabled on topics, the Kafka Broker attempts to connect to Schema Registry. Provide the following configurations in the broker properties file to allow the broker to connect to Schema Registry for validation. For example, if using [KRaft](../../kafka-metadata/kraft.md#kraft-overview), you would configure this in one of `$CONFLUENT_HOME/etc/kafka/broker.properties`, `controller.properties`, or `server.properties`, depending on your [KRaft setup](../../kafka-metadata/config-kraft.md#kraft-config-options). If role-based access control (RBAC) is enabled, the principal defined here should have appropriate permissions. ```bash confluent.schema.registry.url=http://localhost:8081 confluent.basic.auth.credentials.source=USER_INFO confluent.basic.auth.user.info=<sr-username>:<sr-password> ``` ### Configure Schema Registry to communicate with RBAC services The next set of examples shows how to connect a local Schema Registry to a remote Metadata Service (MDS) running RBAC. The `schema-registry.properties` file configurations reflect a remote Metadata Service (MDS) URL, location, and Kafka cluster ID. Also, the examples assume you are using credentials you got from your Security administrator for a pre-configured schema registry principal user (“service principal”), as mentioned in the prerequisites. 
Define these settings in `CONFLUENT_HOME/etc/schema-registry/schema-registry.properties`: 1. Configure Schema Registry authorization for communicating with the RBAC Kafka cluster. The `username` and `password` are RBAC credentials for the Schema Registry service principal, and `metadataServerUrls` is the location of your RBAC Kafka cluster (for example, a URL to an EC2 server). ```bash # Authorize Schema Registry to talk to Kafka (security protocol may also be SASL_SSL if using TLS/SSL) kafkastore.security.protocol=SASL_PLAINTEXT kafkastore.sasl.mechanism=OAUTHBEARER kafkastore.sasl.login.callback.handler.class=io.confluent.kafka.clients.plugins.auth.token.TokenUserLoginCallbackHandler kafkastore.sasl.jaas.config=org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required \ username="<username>" \ password="<password>" \ metadataServerUrls="<protocol>://<mds-host>:<mds-port>"; ``` 2. Configure RBAC authorization, and bearer/basic authentication, for the Schema Registry resource. These settings can be used as-is; JETTY_AUTH is the recommended authentication mechanism. ```bash # These properties install the Schema Registry security plugin, and configure it to use RBAC for # authorization and OAuth for authentication resource.extension.class=io.confluent.kafka.schemaregistry.security.SchemaRegistrySecurityResourceExtension confluent.schema.registry.authorizer.class=io.confluent.kafka.schemaregistry.security.authorizer.rbac.RbacAuthorizer rest.servlet.initializor.classes=io.confluent.common.security.jetty.initializer.InstallBearerOrBasicSecurityHandler confluent.schema.registry.auth.mechanism=JETTY_AUTH ``` 3. Tell Schema Registry how to communicate with the Kafka cluster running the Metadata Service (MDS) and how to authenticate requests using a public key. - The value for `confluent.metadata.bootstrap.server.urls` can be the same as `metadataServerUrls`, depending on your environment. - In this step, you need a public key file to use to verify requests with token-based authorization, as mentioned in the prerequisites. ```bash # The location of the metadata service confluent.metadata.bootstrap.server.urls=<protocol>://<mds-host>:<mds-port> # Credentials to use with the MDS, these should usually match those used for talking to Kafka confluent.metadata.basic.auth.user.info=<username>:<password> confluent.metadata.http.auth.credentials.provider=BASIC # The path to public keys that should be used to verify json web tokens during authentication public.key.path=<path-to-public-key.pem> ``` For additional configurations available to any client communicating with MDS, see also [REST client configurations](../../kafka/configure-mds/mds-configuration.md#rest-client-mds-config) in the Confluent Platform Security documentation. 4. Specify the `kafkastore.bootstrap.servers` you want to use. The default is a commented-out line for a local server. If you do not change this or uncomment it, the default will be used. ```bash #kafkastore.bootstrap.servers=PLAINTEXT://localhost:9092 ``` Uncomment this line and set it to the address of your bootstrap server. This may be different from the MDS server URL. The standard port for the Kafka bootstrap server is `9092`. ```bash kafkastore.bootstrap.servers=<host>:9092 ``` 5. (Optional) Specify a custom `schema.registry.group.id` (to serve as Schema Registry cluster ID) which is different from the default, **schema-registry**. In the example, `schema.registry.group.id` is set to “schema-registry-cool-cluster”. 
```bash # Schema Registry group id, which is the cluster id # The default for the Schema Registry cluster ID is **schema-registry** schema.registry.group.id=schema-registry-cool-cluster ``` 6. (Optional) Specify a custom name for the Schema Registry default topic. (The default is **\_schemas**.) In the example, `kafkastore.topic` is set to `_jax-schemas-topic`. ```bash # The name of the topic to store schemas in # The default schemas topic is **_schemas** kafkastore.topic=_jax-schemas-topic ``` 7. (Optional) Enable anonymous access to requests that occur without authentication. Any requests that occur without authentication are automatically granted the principal `User:ANONYMOUS`. ```bash # This enables anonymous access with a principal of User:ANONYMOUS confluent.schema.registry.anonymous.principal=true authentication.skip.paths=/* ``` If you get the following error about not having authorization when you run the `curl` command to list subjects as described in [Start Schema Registry and test it](#rbac-start-and-test-sr), you can enable anonymous requests to bypass the authentication temporarily while you troubleshoot credentials. ```bash
curl localhost:8081/subjects
Error 401 Unauthorized
HTTP ERROR 401
Problem accessing /subjects. Reason:
    Unauthorized
Powered by Jetty:// 9.4.18.v20190429
``` ## Confluent Replicator Confluent Replicator is a type of Kafka source connector that replicates data from a source to destination Kafka cluster. An embedded consumer inside Replicator consumes data from the source cluster, and an embedded producer inside the Kafka Connect worker produces data to the destination cluster. Replicator version 4.0 and earlier requires a connection to ZooKeeper in the origin and destination Kafka clusters. If ZooKeeper is configured for authentication, the client configures the ZooKeeper security credentials via the global JAAS configuration setting `-Djava.security.auth.login.config` on the Connect workers, and the ZooKeeper security credentials in the origin and destination clusters must be the same. To configure Confluent Replicator security, you must configure the Replicator connector as shown below and additionally you must configure: * [Kafka Connect](#authentication-ssl-connect) To add TLS to the Confluent Replicator embedded consumer, modify the Replicator JSON properties file. This example is a subset of configuration properties to add for TLS encryption and authentication. The assumption here is that client authentication is required by the brokers. ```bash { "name":"replicator", "config":{ .... "src.kafka.ssl.truststore.location":"/etc/kafka/secrets/kafka.connect.truststore.jks", "src.kafka.ssl.truststore.password":"confluent", "src.kafka.ssl.keystore.location":"/etc/kafka/secrets/kafka.connect.keystore.jks", "src.kafka.ssl.keystore.password":"confluent", "src.kafka.ssl.key.password":"confluent", "src.kafka.security.protocol":"SSL" .... } } } ``` ## Schema Registry Schema Registry uses Kafka to persist schemas, and so it acts as a client to write data to the Kafka cluster. Therefore, if the Kafka brokers are configured for security, you should also configure Schema Registry to use security. You may also refer to the complete list of [Schema Registry configuration options](../../../schema-registry/installation/config.md#schemaregistry-config). The following is an example subset of `schema-registry.properties` configuration parameters to add for TLS encryption and authentication. The assumption here is that client authentication is required by the brokers. ```bash kafkastore.bootstrap.servers=SSL://kafka1:9093 kafkastore.security.protocol=SSL kafkastore.ssl.truststore.location=/var/private/ssl/kafka.client.truststore.jks kafkastore.ssl.truststore.password=test1234 kafkastore.ssl.keystore.location=/var/private/ssl/kafka.client.keystore.jks kafkastore.ssl.keystore.password=test1234 kafkastore.ssl.key.password=test1234 ``` ## Configure Confluent Replicator Confluent Replicator is a type of Kafka source connector that replicates data from a source to destination Kafka cluster. An embedded consumer inside Replicator consumes data from the source cluster, and an embedded producer inside the Kafka Connect worker produces data to the destination cluster. Replicator version 4.0 and earlier requires a connection to ZooKeeper in the origin and destination Kafka clusters. If ZooKeeper is configured for authentication, the client configures the ZooKeeper security credentials via the global JAAS configuration setting `-Djava.security.auth.login.config` on the Connect workers, and the ZooKeeper security credentials in the origin and destination clusters must be the same. 
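For example, assuming the ZooKeeper client credentials live in a JAAS file at a hypothetical path such as `/etc/kafka/zookeeper_client_jaas.conf`, a minimal sketch of passing that global JAAS setting to the Connect worker JVM is to export it through `KAFKA_OPTS` before starting the worker:

```bash
# Hypothetical JAAS file containing the ZooKeeper client credentials;
# the same credentials must be valid for the origin and destination clusters.
export KAFKA_OPTS="-Djava.security.auth.login.config=/etc/kafka/zookeeper_client_jaas.conf"

# Start the Connect worker that runs Replicator with this JVM option applied.
./bin/connect-distributed ./etc/kafka/connect-distributed.properties
```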
To configure Confluent Replicator security, you must configure the Replicator connector as shown below and additionally you must configure: * [Kafka Connect](#sasl-plain-connect-workers) Configure Confluent Replicator to use SASL/PLAIN by adding these properties in the Replicator’s JSON configuration file. The JAAS configuration property defines `username` and `password` used by Replicator to configure the user for connections. In this example, Replicator connects to the broker as user `replicator`. ```bash { "name":"replicator", "config":{ .... "src.kafka.security.protocol" : "SASL_SSL", "src.kafka.sasl.mechanism" : "PLAIN", "src.kafka.sasl.jaas.config" : "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"replicator\" password=\"replicator-secret\";", .... } } ``` ## Configure Schema Registry Schema Registry uses Kafka to persist schemas, and so it acts as a client to write data to the Kafka cluster. Therefore, if the Kafka brokers are configured for security, you should also configure Schema Registry to use security. You may also refer to the complete list of [Schema Registry configuration options](../../../../schema-registry/installation/config.md#schemaregistry-config). 1. Here is an example subset of `schema-registry.properties` configuration parameters to add for SASL authentication: ```bash kafkastore.bootstrap.servers=kafka1:9093 kafkastore.security.protocol=SASL_SSL kafkastore.sasl.mechanism=PLAIN kafkastore.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \ username="<username>" \ password="<password>"; ``` ## Configure Kafka Connect The following configurations are required for Kafka Connect worker operations, like group coordination and internal topic management, when Confluent Server brokers require client authentication. Replace the placeholders with your actual values. ```properties bootstrap.servers=<host>:9096 security.protocol=SSL ssl.truststore.location=/path/to/truststore.jks ssl.truststore.password=<truststore-password> ssl.keystore.location=/path/to/keystore.jks ssl.keystore.password=<keystore-password> ssl.key.password=<key-password> producer.security.protocol=SSL producer.ssl.truststore.location=/path/to/truststore.jks producer.ssl.truststore.password=<truststore-password> producer.ssl.keystore.location=/path/to/keystore.jks producer.ssl.keystore.password=<keystore-password> producer.ssl.key.password=<key-password> consumer.security.protocol=SSL consumer.ssl.truststore.location=/path/to/truststore.jks consumer.ssl.truststore.password=<truststore-password> consumer.ssl.keystore.location=/path/to/keystore.jks consumer.ssl.keystore.password=<keystore-password> consumer.ssl.key.password=<key-password> listeners.https.ssl.client.auth=required listeners.https.ssl.truststore.location=/path/to/truststore.jks listeners.https.ssl.truststore.password=<truststore-password> listeners.https.ssl.keystore.location=/path/to/keystore.jks listeners.https.ssl.keystore.password=<keystore-password> listeners.https.ssl.key.password=<key-password> ``` To allow for request forwarding from follower to leader on mTLS, Kafka Connect workers need to be configured on the secure impersonation super user list on MDS. ### Run example 1. Clone the [confluentinc/examples](https://github.com/confluentinc/examples) GitHub repository, and check out the `8.1.0-post` branch. ```bash git clone https://github.com/confluentinc/examples.git cd examples git checkout 8.1.0-post ``` 2. Navigate to the `security/rbac/scripts` directory. ```bash cd security/rbac/scripts ``` 3. You have two options to run the example. - Option 1: run the example end-to-end for all services ```bash ./run.sh ``` - Option 2: step through it one service at a time ```bash ./init.sh ./enable-rbac-broker.sh ./enable-rbac-schema-registry.sh ./enable-rbac-connect.sh ./enable-rbac-rest-proxy.sh ./enable-rbac-ksqldb-server.sh ./enable-rbac-control-center.sh ``` 4. 
After you run the example, view the configuration files: ```bash # The original configuration bundled with Confluent Platform ls /tmp/original_configs/ ``` ```bash # Configurations added to each service's properties file ls ../delta_configs/ ``` ```bash # The modified configuration = original + delta ls /tmp/rbac_configs/ ``` 5. After you run the example, view the log files for each of the services. All logs are saved in the temporary directory `/tmp/rbac_logs/`. In that directory, you can step through the configuration properties for each of the services: ```bash connect control-center kafka kafka-rest ksql-server schema-registry ``` 6. In this example, the metadata service (MDS) logs are saved under your Confluent Platform installation directory. ```bash cat $CONFLUENT_HOME/logs/metadata-service.log ``` #### Broker - Additional RBAC configurations required for [server.properties](https://github.com/confluentinc/examples/tree/latest/security/rbac/delta_configs/server.properties.delta) ```none # Confluent Authorizer Settings # Semi-colon separated list of super users in the format : # For example super.users=User:admin;User:mds super.users=User:ANONYMOUS;User:mds # MDS Server Settings confluent.metadata.topic.replication.factor=1 # MDS Token Service Settings confluent.metadata.server.token.key.path=/tmp/tokenKeypair.pem # Configure the RBAC Metadata Service authorizer authorizer.class.name=io.confluent.kafka.security.authorizer.ConfluentServerAuthorizer confluent.authorizer.access.rule.providers=CONFLUENT,ZK_ACL # Bind Metadata Service HTTP service to port 8090 confluent.metadata.server.listeners=http://0.0.0.0:8090 # Configure HTTP service advertised hostname. Set this to http://127.0.0.1:8090 if running locally. confluent.metadata.server.advertised.listeners=http://127.0.0.1:8090 # HashLoginService Initializer confluent.metadata.server.authentication.method=BEARER confluent.metadata.server.user.store=FILE confluent.metadata.server.user.store.file.path=/tmp/login.properties # Add named listener TOKEN to existing listeners and advertised.listeners listeners=TOKEN://:9092,PLAINTEXT://:9093 advertised.listeners=TOKEN://localhost:9092,PLAINTEXT://localhost:9093 # Add protocol mapping for newly added named listener TOKEN listener.security.protocol.map=PLAINTEXT:PLAINTEXT,TOKEN:SASL_PLAINTEXT listener.name.token.sasl.enabled.mechanisms=OAUTHBEARER # Configure the public key used to verify tokens # Note: username, password and metadataServerUrls must be set if used for inter-broker communication listener.name.token.oauthbearer.sasl.jaas.config= \ org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required \ publicKeyPath="/tmp/tokenPublicKey.pem"; # Set SASL callback handler for verifying authentication token signatures listener.name.token.oauthbearer.sasl.server.callback.handler.class=io.confluent.kafka.server.plugins.auth.token.TokenBearerValidatorCallbackHandler # Set SASL callback handler for handling tokens on login. This is essentially a noop if not used for inter-broker communication. 
listener.name.token.oauthbearer.sasl.login.callback.handler.class=io.confluent.kafka.server.plugins.auth.token.TokenBearerServerLoginCallbackHandler # Settings for Self-Balancing Clusters confluent.balancer.topic.replication.factor=1 # Settings for Audit Logging confluent.security.event.logger.exporter.kafka.topic.replicas=1 ``` - Role bindings: ```bash # Broker Admin confluent iam rbac role-binding create --principal User:$USER_ADMIN_SYSTEM --role SystemAdmin --kafka-cluster $KAFKA_CLUSTER_ID # Producer/Consumer confluent iam rbac role-binding create --principal User:$USER_CLIENT_A --role ResourceOwner --resource Topic:$TOPIC1 --kafka-cluster $KAFKA_CLUSTER_ID confluent iam rbac role-binding create --principal User:$USER_CLIENT_A --role DeveloperRead --resource Group:console-consumer- --prefix --kafka-cluster $KAFKA_CLUSTER_ID ``` ```bash # These credentials authorize ksqlDB Server to access the Kafka cluster. sasl.jaas.config=org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required \ metadataServerUrls="http://<mds-host>:<mds-port>" \ username="<ksqldb-user>" \ password="<password>"; ``` Save the file and restart ksqlDB Server. Log in to the server by using the ksqlDB CLI. ```bash ksql --config-file <path-to-config-file> https://<ksqldb-server-host>:<port> --user <username> --password <password> ``` RBAC for ksqlDB depends on the Confluent Platform [Metadata Service (MDS)](overview.md#metadata-service) and the Confluent Server Authorizer. The `confluent.metadata` settings configure the Metadata Service. The `ksql.security.extension.class` setting configures ksqlDB for the Confluent Server Authorizer. For more information, see [Configure Confluent Server Authorizer in Confluent Platform](../../csa-introduction.md#confluent-server-authorizer). Use the ksqlDB service principal credentials for the following settings. - `sasl.jaas.config` for authorizing to the Kafka cluster with Confluent Server Authorizer - `confluent.metadata.basic.auth.user.info` for authorizing to MDS - `ksql.schema.registry.basic.auth.user.info` for authorizing to Schema Registry ### POST /security/1.0/principals/{principal}/roles/{roleName}/resources **Look up the rolebindings for the principal at the given scope/cluster using the given role.** Callable by Admins. * **Parameters:** * **principal** (*string*) – Fully-qualified KafkaPrincipal string for a user or group. * **roleName** (*string*) – The name of the role. **Example request:** ```http POST /security/1.0/principals/{principal}/roles/{roleName}/resources HTTP/1.1 Host: example.com Content-Type: application/json { "clusterName": "string", "clusters": { "kafka-cluster": "string", "connect-cluster": "string", "ksql-cluster": "string", "schema-registry-cluster": "string", "cmf": "string", "flink-environment": "string" } } ``` * **Status Codes:** * [200 OK](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.1) – Granted **Example response:** ```http HTTP/1.1 200 OK Content-Type: application/json [ { "resourceType": "Topic", "name": "clicksTopic1", "patternType": "LITERAL" }, { "resourceType": "Topic", "name": "orders-2019", "patternType": "PREFIXED" } ] ``` * *default* – Error Response **Example response:** ```http HTTP/1.1 default - Content-Type: application/json { "status_code": 1, "error_code": 1, "type": "string", "message": "string", "errors": [ { "error_type": "string", "message": "string" } ] } ``` ### POST /security/1.0/lookup/principal/{principal}/resources **Look up the resource bindings for the principal at the given scope/cluster.** Includes bindings from groups that the user belongs to. Callable by Admins+User. 
* **Parameters:** * **principal** (*string*) – Fully-qualified KafkaPrincipal string for a user or group. **Example request:** ```http POST /security/1.0/lookup/principal/{principal}/resources HTTP/1.1 Host: example.com Content-Type: application/json { "clusterName": "string", "clusters": { "kafka-cluster": "string", "connect-cluster": "string", "ksql-cluster": "string", "schema-registry-cluster": "string", "cmf": "string", "flink-environment": "string" } } ``` * **Status Codes:** * [200 OK](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.1) – Nested map of principal-to-role-to-resources. **Example response:** ```http HTTP/1.1 200 OK Content-Type: application/json { "User:alice": { "DeveloperRead": [ { "resourceType": "Topic", "name": "billing-invoices", "patternType": "LITERAL" } ] }, "Group:Investors": { "DeveloperRead": [ { "resourceType": "Topic", "name": "investing-", "patternType": "PREFIXED" } ] } } ``` * *default* – Error Response **Example response:** ```http HTTP/1.1 default - Content-Type: application/json { "status_code": 1, "error_code": 1, "type": "string", "message": "string", "errors": [ { "error_type": "string", "message": "string" } ] } ``` ### POST /security/1.0/lookup/principal/{principal}/resource/{resourceType}/operation/{operation} **Summarizes what resources and rolebindings this principal is allowed to create.** Callable by Admins+User. * **Parameters:** * **principal** (*string*) – Fully-qualified KafkaPrincipal string for a user or group. * **resourceType** (*string*) – The type of resource to create or the type of resource to specify when creating a new rolebinding. * **operation** (*string*) – “Create” for creating an actual resource, “AlterAccess” for creating a rolebinding for a user. **Example request:** ```http POST /security/1.0/lookup/principal/{principal}/resource/{resourceType}/operation/{operation} HTTP/1.1 Host: example.com Content-Type: application/json { "clusters": { "kafka-cluster": "string", "connect-cluster": "string", "ksql-cluster": "string", "schema-registry-cluster": "string", "cmf": "string", "flink-environment": "string" } } ``` * **Status Codes:** * [200 OK](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.1) – A deduped and squashed view of the user’s rolebindings for creating resources or rolebindings. **Example response:** ```http HTTP/1.1 200 OK Content-Type: application/json { "result": "SOME", "resourcePatterns": [ { "resourceType": "Topic", "name": "billing-invoices", "patternType": "LITERAL" }, { "resourceType": "Topic", "name": "investing-", "patternType": "PREFIXED" } ] } ``` * *default* – Error Response **Example response:** ```http HTTP/1.1 default - Content-Type: application/json { "status_code": 1, "error_code": 1, "type": "string", "message": "string", "errors": [ { "error_type": "string", "message": "string" } ] } ``` #### Configure audit log destination The `destinations` option identifies the audit log cluster, which is provided by the bootstrap server. Use this setting to identify the communication channel between your audit log cluster and Kafka. You can use the `bootstrap_server` setting to deliver audit log messages to a specific cluster set aside for the sole purpose of retaining them. This ensures that no one can access or tamper with your organization’s audit logs, and enables you to selectively conduct more in-depth auditing of sensitive data, while keeping log volumes down for less sensitive data. 
If you deliver audit logs to another cluster, you must configure the connection to that cluster. Configure this connection as you would any producer writing to the cluster, using the prefix `confluent.security.event.logger.exporter.kafka` for the producer configuration keys, including the appropriate authentication information. For example, if you have a Kafka cluster listening on port 9092 of the host `audit.example.com`, and that cluster accepts SCRAM-SHA-256 authentication and has a principal named `confluent-audit` that is allowed to connect and produce to the audit log topics, the configuration would look like the following: ```json confluent.security.event.router.config=\ { \ "destinations": { \ "bootstrap_servers": ["audit.example.com:9092"], \ "topics": { \ "confluent-audit-log-events": { \ "retention_ms": 7776000000 \ } \ } \ }, \ "default_topics": { \ "allowed": "confluent-audit-log-events", \ "denied": "confluent-audit-log-events" \ } \ } confluent.security.event.logger.exporter.kafka.sasl.mechanism=SCRAM-SHA-256 confluent.security.event.logger.exporter.kafka.sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required \ username="confluent-audit" \ password="secretP@ssword123"; ``` Bootstrap servers may be provided in either the router configuration JSON or a producer configuration property; if they appear in both places, the router configuration takes precedence. #### Configure audit log topic management on the destination cluster MDS manages the audit log topics on the destination cluster, creating missing topics, and keeping the retention time policies of those topics in sync with the audit log configuration policy. For MDS to do this, you must configure the admin client used by MDS to connect to the destination cluster. Use the `confluent.security.event.logger.destination.admin.` prefix when configuring the admin client in the MDS cluster’s `server.properties` file. Other than the prefix requirement, this configuration is similar to other admin client configurations. This connection must be consistent with the producer configuration on this and all of the managed clusters. For details about the properties specified here, refer to [Kafka AdminClient Configurations for Confluent Platform](../../../installation/configuration/admin-configs.md#cp-config-admin) and Kafka [AdminClient](/platform/current/clients/javadocs/javadoc/org/apache/kafka/clients/admin/AdminClient.html). **SASL_SSL Configuration** ```none confluent.security.event.logger.destination.admin.bootstrap.servers= confluent.security.event.logger.destination.admin.security.protocol=SASL_SSL confluent.security.event.logger.destination.admin.sasl.mechanism=PLAIN confluent.security.event.logger.destination.admin.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \ username="" \ password=""; confluent.security.event.logger.destination.admin.ssl.truststore.location= confluent.security.event.logger.destination.admin.ssl.truststore.password= ``` ## Quick start Prerequisite : * The [Confluent Platform must be installed](../../../installation/overview.md#installation). * The [Confluent CLI](https://docs.confluent.io/confluent-cli/current/installing.html) must be installed. 1. Create a directory for storing the `security.properties` file. For example: ```text mkdir /usr/secrets/ ``` 2. Generate the master encryption key based on a passphrase. 
Typically, a passphrase is much longer than a password and is easily remembered as a string of words (for example,\`\`Data in motion\`\`). You can specify the passphrase either in clear text on the command line, or store it in a file. A best practice is to enter this passphrase into a file and then pass it to the CLI (specified as `--passphrase @`). By using a file, you can avoid the logging history, which shows the passphrase in plain text. Choose a location for the secrets file on your local host (not a location where Confluent Platform services run). The secrets file contains encrypted secrets for the master encryption key, data encryption key, and configuration parameters, along with metadata, such as which cipher was used for encryption. ```text confluent secret master-key generate \ --local-secrets-file /usr/secrets/security.properties \ --passphrase @ ``` Your output should resemble: ```text Save the master key. It cannot be retrieved later. +------------+----------------------------------------------+ | Master Key | abC12DE+3fG45Hi67J8KlmnOpQr9s0Tuv+w1x2y3zab= | +------------+----------------------------------------------+ ``` 3. Save the master key because *it cannot be retrieved later*. 4. Export the master key in the environment variable, or add the master key to a bash script. #### IMPORTANT The subsequent [confluent secret](https://docs.confluent.io/confluent-cli/current/command-reference/secret/index.html) commands will fail if the environment variable is not set. ```text export CONFLUENT_SECURITY_MASTER_KEY=abC12DE+3fG45Hi67J8KlmnOpQr9s0Tuv+w1x2y3zab= ``` 5. Encrypt the specified configuration parameters. This step encrypts the properties specified by `--config` in the configuration file specified by `--config-file`. The property values are read from the configuration file, encrypted, and written to the local secrets file specified by `--local-secrets-file`. In place of the property values, instructions that are written into the configuration file allow the configuration resolution system to retrieve the secret values at runtime. The file path you specify in `--remote-secrets-file` is written into the configuration instructions and identifies where the resolution system can locate the secrets file at runtime. If you are running the secrets command centrally and distributing the secrets file to each node, then specify the eventual path of the secrets file in `--remote-secrets-file`. If you plan to run the secrets command on each node, then the `remote-secrets-file` should match the location specified by `--local-secrets-file`. #### NOTE Updates specified with `--local-secrets-file` flag modify the `security.properties` file. For every broker where you specify `--local-secrets-file`, you can store the `security.properties` file in a different location, which you specify using the `--remote-secrets-file`. For example, when encrypting a broker: - In `--local-secrets-file`, specify the file where the Confluent CLI will add and/or modify encrypted parameters. This modifies the `security.properties` file. - In `--remote-secrets-file`, specify the location of `security.properties` file that the broker will reference. If the `--config` flag is not specified, any property that contains the string `password` is encrypted in the configuration key. When running `encrypt` use a comma to specify multiple keys, for example: `--config "config.storage.replication.factor,config.storage.topic"`. This option is not available when using the `add` or `update` commands. 
Use the following example command to encrypt the `config.storage.replication.factor` and `config.storage.topic` parameters: ```text confluent secret file encrypt --config-file /etc/kafka/connect-distributed.properties \ --local-secrets-file /usr/secrets/security.properties \ --remote-secrets-file /usr/secrets/security.properties \ --config "config.storage.replication.factor,config.storage.topic" ``` You should see a similar entry in your `security.properties` file. This example shows the encrypted `config.storage.replication.factor` parameter. ```text config.storage.replication.factor = ${securepass:/usr/secrets/security.properties:connect-distributed.properties/config.storage.replication.factor} ``` 6. Decrypt the encrypted configuration parameter. ```text confluent secret file decrypt \ --local-secrets-file /usr/secrets/security.properties \ --config-file /etc/kafka/connect-distributed.properties \ --output-file decrypt.txt ``` You should see the decrypted parameter. This example shows the decrypted `config.storage.replication.factor` parameter. ```text config.storage.replication.factor=1 ``` ## Configure TLS encryption for Replicator Confluent Replicator is a type of Kafka source connector that replicates data from a source to destination Kafka cluster. An embedded consumer inside Replicator consumes data from the source cluster, and an embedded producer inside the Kafka Connect worker produces data to the destination cluster. Replicator version 4.0 and earlier requires a connection to ZooKeeper in the origin and destination Kafka clusters. If ZooKeeper is configured for authentication, the client configures the ZooKeeper security credentials via the global JAAS configuration setting `-Djava.security.auth.login.config` on the Connect workers, and the ZooKeeper security credentials in the origin and destination clusters must be the same. To configure Confluent Replicator security, you must configure the Replicator connector as shown below and additionally you must configure: * [Kafka Connect](#encryption-ssl-connect) To add TLS encryption to the Confluent Replicator embedded consumer, modify the Replicator JSON properties file. Here is an example subset of configuration properties to add for TLS encryption: ```bash { "name":"replicator", "config":{ .... "src.kafka.ssl.truststore.location":"/etc/kafka/secrets/kafka.connect.truststore.jks", "src.kafka.ssl.truststore.password":"confluent", "src.kafka.security.protocol":"SSL" .... } } } ``` ## Configure Kafka Connect From the perspective of the Confluent Server brokers, Kafka Connect is another Kafka client, and this tutorial configures Kafka Connect for TLS/SSL encryption and SASL/PLAIN authentication. Enabling Connect for security is simply a matter of passing the security configurations to the Connect workers, the producers used by source connectors, and the consumers used by sink connectors. 
Take the basic client security configuration: ```bash security.protocol=SASL_SSL ssl.truststore.location=/var/ssl/private/kafka.client.truststore.jks ssl.truststore.password=test1234 sasl.mechanism=PLAIN sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \ username="client" \ password="client-secret"; ``` And configure Kafka Connect for the following: * Top-level for Connect workers, with no additional configuration prefix * Embedded producer for source connectors, with an additional configuration prefix `producer.` * Embedded consumers for sink connectors, with an additional configuration prefix `consumer.` Combining these configurations, a Kafka Connect worker configuration for TLS/SSL encryption and SASL/PLAIN authentication is the following. You may configure these settings in the `connect-distributed.properties` file. ```bash security.protocol=SASL_SSL ssl.truststore.location=/var/ssl/private/kafka.client.truststore.jks ssl.truststore.password=test1234 sasl.mechanism=PLAIN sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \ username="client" \ password="client-secret"; producer.security.protocol=SASL_SSL producer.ssl.truststore.location=/var/ssl/private/kafka.client.truststore.jks producer.ssl.truststore.password=test1234 producer.sasl.mechanism=PLAIN producer.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \ username="client" \ password="client-secret"; consumer.security.protocol=SASL_SSL consumer.ssl.truststore.location=/var/ssl/private/kafka.client.truststore.jks consumer.ssl.truststore.password=test1234 consumer.sasl.mechanism=PLAIN consumer.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \ username="client" \ password="client-secret"; ``` ## Example 6: JDBC source connector with Avro to ksqlDB -> Key:Long and Value:Avro - [Kafka Connect JDBC source connector](https://github.com/confluentinc/examples/tree/latest/connect-streams-pipeline/jdbcavroksql-connector.json) produces Avro values, and null keys, to a Kafka topic. ```none { "name": "test-source-sqlite-jdbc-autoincrement-jdbcavroksql", "config": { "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector", "tasks.max": "1", "key.converter": "org.apache.kafka.connect.json.JsonConverter", "key.converter.schemas.enable": "false", "value.converter": "io.confluent.connect.avro.AvroConverter", "value.converter.schema.registry.url": "http://schema-registry:8081", "value.converter.schemas.enable": "true", "connection.url": "jdbc:sqlite:/usr/local/lib/retail.db", "mode": "incrementing", "incrementing.column.name": "id", "topic.prefix": "jdbcavroksql-", "table.whitelist": "locations" } } ``` - [ksqlDB](https://github.com/confluentinc/examples/tree/latest/connect-streams-pipeline/jdbcavro_statements.sql) reads from the Kafka topic and then uses `PARTITION BY` to create a new stream of messages with `BIGINT` keys. ![image](streams/images/example_6.jpg) ### Configure consumer connection 1. Define the configuration for the consumer listener on the source cluster. The following is an example with TLS and SASL/PLAIN enabled: ```yaml kafka_connect_replicator_consumer_listener: ssl_enabled: true sasl_protocol: plain ``` 2. Define the basic configuration for the consumer client connection: ```yaml kafka_connect_replicator_consumer_bootstrap_servers: ``` 3. Define the security configuration for the consumer client connection: ```yaml kafka_connect_replicator_consumer_ssl_ca_cert_path: kafka_connect_replicator_consumer_ssl_cert_path: kafka_connect_replicator_consumer_ssl_key_path: kafka_connect_replicator_consumer_ssl_key_password: ``` 4. Define custom properties for each client connection: ```yaml kafka_connect_replicator_consumer_custom_properties: ``` 5. For RBAC-enabled deployment, define the additional client custom properties. Specify either the Kafka cluster id (`kafka_connect_replicator_consumer_kafka_cluster_id`) or the cluster name (`kafka_connect_replicator_consumer_kafka_cluster_name`). 
```yaml kafka_connect_replicator_consumer_erp_tls_enabled: kafka_connect_replicator_consumer_erp_host: kafka_connect_replicator_consumer_erp_admin_user: kafka_connect_replicator_consumer_erp_admin_password: kafka_connect_replicator_consumer_kafka_cluster_id: kafka_connect_replicator_consumer_kafka_cluster_name: kafka_connect_replicator_consumer_erp_pem_file: ``` ### Configure OAuth authentication using client credentials To enable credential-based OAuth on all Confluent Platform components, where clients authenticate with the server using a client ID and a password, set the following variables: ```yaml all: vars: auth_mode: oauth oauth_superuser_client_id: oauth_superuser_client_password: oauth_sub_claim: client_id oauth_groups_claim: groups oauth_token_uri: oauth_issuer_url: oauth_jwks_uri: oauth_expected_audience: Confluent,account,api://default schema_registry_oauth_user: schema_registry_oauth_password: kafka_rest_oauth_user: kafka_rest_oauth_password: kafka_connect_oauth_user: kafka_connect_oauth_password: ksql_oauth_user: ksql_oauth_password: control_center_next_gen_oauth_user: control_center_next_gen_oauth_password: # Only needed when OAuth IdP server has TLS enabled with custom certificate. oauth_idp_cert_path: ``` For an example inventory file for a greenfield credential-based OAuth configuration, see the sample inventory file at: ```html https://github.com/confluentinc/cp-ansible/blob/8.1.0-post/docs/sample_inventories/oauth_greenfield.yml ``` ### Required settings for RBAC with centralized MDS To enable and configure RBAC with the centralized MDS, add the following mandatory variables to your inventory file. **Enable RBAC centralized MDS with Ansible** ```none all: vars: external_mds_enabled: true ``` **Provide the centralized MDS bootstrap URLs** Specify the URL for the MDS REST API on the Kafka cluster hosting MDS: ```none all: vars: mds_bootstrap_server_urls: ``` For example: ```none all: vars: mds_bootstrap_server_urls: https://ip-172-31-34-246.us-east-1.compute.internal:8090,https://ip-172-31-34-246.us-east-2.compute.internal:8090 ``` **Provide the centralized MDS bootstrap servers** Specify a list of the hostnames and ports for the listeners hosting the MDS that you wish to connect to: `<host1>:<port>,<host2>:<port>` ```none all: vars: mds_broker_bootstrap_servers: ``` For example: ```none all: vars: mds_broker_bootstrap_servers: ip-172-31-43-14.us-west-1.compute.internal:9093,ip-172-31-43-14.us-west-2.compute.internal:9093 ``` **Provide the centralized MDS broker listener security configuration** Specify the security settings of the remote Kafka broker that the centralized MDS runs on (`mds_broker_bootstrap_servers`): ```none all: vars: mds_broker_listener: ssl_enabled: --- [1] ssl_client_authentication: --- [2] ssl_mutual_auth_enabled: --- [3] sasl_protocol: --- [4] ``` * [1] Set `ssl_enabled` to `true` if the remote MDS uses TLS. * [2] Set `ssl_client_authentication` to `required` if the remote MDS uses mTLS. * [3] Set `ssl_mutual_auth_enabled` to `true` if the remote MDS uses mTLS. * [4] Set `sasl_protocol` to the SASL protocol for the remote MDS. Options are: `none`, `kerberos`, `sasl_plain`, `sasl_scram` The MDS listener must have an authentication mode: mTLS, Kerberos, SASL/PLAIN, or SASL/SCRAM. You can set `sasl_protocol` to `none` only if `ssl_enabled` ([1]) is set to `true` and `ssl_client_authentication` ([2]) is set to `required`, thereby specifying mTLS authentication mode for the listener. 
The following example is for mTLS on the centralized MDS brokers: ```none all: vars: mds_broker_listener: ssl_enabled: true ssl_mutual_auth_enabled: true sasl_protocol: none ``` **Provide the paths to the centralized MDS server certificates and key pair for OAuth** ```none all: vars: create_mds_certs: false token_services_public_pem_file: token_services_private_pem_file: ``` ## Enable RBAC with ACL authorizer This section describes the workflow to enable RBAC in non-RBAC or ACL-based Confluent Platform deployments using an ACL authorizer. You can reference the [sample inventory files](https://github.com/confluentinc/cp-ansible/tree/master/docs/sample_inventories) for example non-RBAC to RBAC migration setups. If your clusters have an authorizer and all required ACLs, you can start at Step 4. For clusters without an authorizer, Steps 1-3 are needed to first enable authorization. Then you can migrate to RBAC. 1. Configure ACL authorizer and add super users for broker principals. If an ACL authorizer was already configured, you do not need to do a rolling restart. For KRaft-based clusters, an authorizer must be added in both the KRaft controller and broker. ```yaml kafka_broker_custom_properties: authorizer.class.name: org.apache.kafka.metadata.authorizer.StandardAuthorizer allow.everyone.if.no.acl.found: "true" super.users: "User:admin" kafka_controller_custom_properties: authorizer.class.name: org.apache.kafka.metadata.authorizer.StandardAuthorizer allow.everyone.if.no.acl.found: "true" super.users: "User:admin" ``` * `allow.everyone.if.no.acl.found=true` is set for zero downtime after enabling the authorizer. * You need to add broker’s principal in `super.users`. The `admin` user is used as a broker principal example in the above snippet. 2. Perform a rolling restart of the KRaft controllers and Kafka brokers. ```bash ansible-playbook -i confluent.platform.all \ --skip-tags package \ -e deployment_strategy=rolling \ --tags kafka_controller,kafka_broker ``` You can skip this step if ACLs are already enabled in the cluster. 3. Create ACLs for broker principals and user principals of all applications, including Confluent Platform components. When a new ACL is added, all the users who previously had access will lose access to that resource since it was previously set to allow all before the new ACL is added. There might be downtime for clients here between adding an authorizer and adding ACLs. 4. Add the custom broker listener and update all Confluent Platform components to communicate on that listener. An example snippet: ```yaml kafka_broker_custom_listeners: internal_client_listener: name: CUSTOM_LISTENER port: 9095 ssl_enabled: false sasl_protocol: plain schema_registry_kafka_listener_name: internal_client_listener kafka_connect_kafka_listener_name: internal_client_listener kafka_rest_kafka_listener_name: internal_client_listener ksql_kafka_listener_name: internal_client_listener control_center_next_gen_kafka_listener_name: internal_client_listener ``` 5. Run the following command to update the listener used for Kafka to Confluent Platform communication: ```bash ansible-playbook -i confluent.platform.all \ --skip-tags package \ -e deployment_strategy=rolling ``` 6. Enable RBAC and Metadata Service (MDS) on Kafka brokers. 1. Remove the simple authorizer properties added in Step 1. 2. Comment out the `*_kafka_listener_name` variables set in step 4. This will ensure Kafka to Confluent Platform communication via OAuthbearer on the internal listener once RBAC is enabled. 3. 
Add the variables to enable RBAC. Example snippet of RBAC with OAuth: ```yaml rbac_enabled: true auth_mode: oauth oauth_superuser_client_id: superuser oauth_superuser_client_password: my-secret oauth_sub_claim: client_id oauth_groups_claim: groups oauth_token_uri: https://oauth1:8443/realms/cp-ansible-realm/protocol/openid-connect/token oauth_issuer_url: https://oauth1:8443/realms/cp-ansible-realm oauth_jwks_uri: https://oauth1:8443/realms/cp-ansible-realm/protocol/openid-connect/certs oauth_expected_audience: Confluent,account,api://default schema_registry_oauth_user: schema_registry schema_registry_oauth_password: my-secret kafka_rest_oauth_user: kafka_rest kafka_rest_oauth_password: my-secret ``` 4. Run the command to enable RBAC in all Confluent Platform components. ```bash ansible-playbook -i confluent.platform.all \ --skip-tags package \ -e deployment_strategy=rolling ``` 7. Configure RBAC role bindings for resources of other components. This includes all the external Kafka clients and the clients of Confluent Platform components. # Configure and Deploy Unified Stream Manager Using Ansible Playbooks for Confluent Platform Confluent Unified Stream Manager connects customer-managed on-premises clusters with Confluent Cloud to enable Confluent Cloud features for Confluent Platform clusters. The Unified Stream Manager Agent acts as a centralized proxy/gateway for Kafka, and Ansible Playbooks for Confluent Platform (Confluent Ansible) acts as a tool to deploy the Unified Stream Manager Agent in a virtual environment. The Ansible roles and playbooks automate Unified Stream Manager Agent deployment, configuration, TLS setup, authentication, credential handling, health checks, and integration with other Confluent Platform components (Kafka, KRaft, and Connect). This topic presents the steps and guidance for deploying Unified Stream Manager with Confluent Ansible. It is part of the [Registering your Confluent Platform Kafka cluster in Confluent Cloud](http://docs.confluent.io/platform/current/usm/get-started.html#registration-process-overview) process. Review the steps described in the registration topic before proceeding with Unified Stream Manager deployment. The high-level workflow to deploy Unified Stream Manager with Confluent Ansible is as follows: 1. [Review the prerequisites and considerations](#ansible-usm-requirements). 2. [Register your Confluent Platform cluster in Confluent Cloud](http://docs.confluent.io/platform/current/usm/get-started.html#registration-process-overview). 3. [Configure and deploy Unified Stream Manager Agent](#ansible-usm-configure). 4. [Complete the registration process for the Unified Stream Manager Agent](#ansible-usm-registration). ## Configure and deploy Unified Stream Manager Agent You can use the sample inventory files for the Unified Stream Manager Agent in the following GitHub repository as references. ```bash https://github.com/confluentinc/cp-ansible/blob/8.1.0-post/docs/sample_inventory/usm/ ``` To configure and deploy Unified Stream Manager Agent: 1. Update your existing inventory file to add the Unified Stream Manager Agent host. For example: ```yaml usm_agent: hosts: usm-agent1.confluent.io: ``` 2. Configure the Unified Stream Manager Agent Confluent Cloud settings in the inventory file. ```yaml all: vars: ccloud_endpoint: --- [1] ccloud_environment_id: --- [2] ccloud_credential: --- [3] username: password: ``` The values are available in the output file generated when you perform the first step in the registration process. 
See [Generate the configuration file](https://docs.confluent.io/cloud/current/usm/register/deploy-agent.html#generate-and-download-the-configuration-file). * [1] Specify the Confluent Cloud endpoint. Use the `FRONTDOOR_URL` value in the generated output file, and append the port, for example, `https://api.us-west-2.aws.confluent.cloud:443`. * [2] Specify the Confluent Cloud Environment ID. * [3] Specify the Confluent Cloud credentials. 3. [Configure the Unified Stream Manager Agent authentication settings](#ansible-usm-authentication). 4. [Configure other settings for the Unified Stream Manager Agent in the inventory file](#ansible-usm-configurations). 5. Validate Unified Stream Manager Agent configuration. ```bash ansible-playbook -i hosts.yml confluent.platform.all --tags=validate_usm_agent_configs ``` 6. Deploy the Unified Stream Manager Agent. Once the inventory file is updated with the Unified Stream Manager Agent host and the required configurations, use the Confluent Ansible playbook to deploy the Unified Stream Manager Agent. For metadata and metrics to flow from Kafka, KRaft, and Connect clusters to the Unified Stream Manager Agent, the following Confluent Platform components also need to be redeployed: - KRaft controller - Kafka brokers - Connect Deploy the Unified Stream Manager Agent and the above-mentioned Confluent Platform components together using one of the two methods below. The playbooks ensure that the Unified Stream Manager Agent is deployed first, followed by the other components in the correct order, to prevent “connection refused” errors. To install all components together: ```bash ansible-playbook -i hosts.yml confluent.platform.all ``` To install only Unified Stream Manager Agent and related Confluent Platform components together: ```bash ansible-playbook -i hosts.yml confluent.platform.all --tags=usm_agent,kafka_controller,kafka_broker,kafka_connect ``` ## Annual commitments Confluent Cloud offers the ability to make a commitment to a minimum amount of spend over a specified time period. This commitment gives you access to discounts and provides the flexibility to use this commitment across the entire Confluent Cloud stack, including any [Kafka cluster type](../clusters/cluster-types.md#cloud-cluster-types), [ksqlDB on Confluent Cloud](../ksqldb/overview.md#cloud-ksqldb-create-stream-processing-apps), [Connectors](../connectors/overview.md#kafka-connect-cloud), and [Support](https://www.confluent.io/confluent-cloud/support/). With annual commitments, you can view the total amount of accrued usage during the commitment term and the amount of time left on your commitment. If you use more than your committed amount, you can continue using Confluent Cloud without interruption. You will be charged at your discounted rate for usage beyond the committed amount until the end of your commitment term. Commitments are minimums, and there is no negative impact to exceeding your committed usage. If you exceed this minimum, overage charges will be billed to the payment method set for your organization. 
[Contact Confluent](https://confluent.io/contact) to learn more about annual commitments, or review these topics: - [Get Started with Confluent Cloud on the AWS Marketplace with Commitments](ccloud-aws-ubb.md#ccloud-aws-market-ubb) - [Get Started with Confluent Cloud on the Azure Marketplace with Commitments](ccloud-azure-ubb.md#ccloud-azure-market-ubb) - [Get Started with Confluent Cloud on the Google Cloud Marketplace with Commitments](ccloud-gcp-ubb.md#ccloud-gcp-market-ubb) # Architectural considerations for streaming applications on Confluent Cloud This guide covers key architectural considerations when building streaming applications on Confluent Cloud, including cluster planning, event-driven patterns, and real-time integration strategies. Understanding these concepts helps you design scalable, resilient streaming applications that leverage Confluent Cloud’s comprehensive event streaming platform capabilities. Key topics covered: - **Cluster configuration planning** - Critical decisions that impact your streaming application. - **Data schema architecture and governance** - Schema Registry, data contracts, and governance patterns for data quality and compatibility. - **Stream processing integration** - Real-time data processing with Apache Flink®. - **Serverless architectures** - Event-driven, elastic streaming patterns. - **Stateless microservices** - Distributed, event-driven service architectures. - **Cloud-native streaming** - Applications designed for cloud-native event streaming. - **Network and security architecture** - Networking patterns and access control strategies. - **Multi-tenancy and resource management** - Shared cluster patterns and quota management. - **Observability and monitoring patterns** - Operational monitoring and compliance strategies. ## Benchmarking Benchmark testing is important because there is no one-size-fits-all recommendation for the configuration parameters you need to develop Kafka applications to Confluent Cloud. Proper configuration always depends on the use case, other features you have enabled, the data profile, and more. You should run benchmark tests if you plan to tune Kafka clients beyond the defaults. Regardless of your service goals, you should understand what the performance profile of your application is—it is especially important when you want to optimize for throughput or latency. Your benchmark tests can also feed into the calculations for determining the correct number of partitions and the number of producer and consumer processes. First, measure your bandwidth using the Kafka tools `kafka-producer-perf-test` and `kafka-consumer-perf-test`. For non-JVM clients that wrap [librdkafka](https://github.com/edenhill/librdkafka), you can use the [rdkafka_performance](https://github.com/edenhill/librdkafka/blob/master/examples/rdkafka_performance.c) interface. This first round of results provides a baseline performance to your Confluent Cloud instance, taking application logic out of the equation. Note that these perf tools do not support Schema Registry. Then test your application, starting with the default Kafka configuration parameters, and familiarize yourself with the default values. Determine the baseline input performance profile for a given producer by removing dependencies on anything upstream from the producer. Rather than receiving data from upstream sources, modify your producer to generate its own mock data at high output rates, such that the data generation is not a bottleneck. 
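For illustration only, the following sketch shows one way to build such a mock-data producer with the Confluent JavaScript client described elsewhere in this documentation; the topic name `perf-test`, the 1 KiB payload, the record count, and the connection settings are placeholder assumptions, and the official perf tools above remain the recommended starting point for bandwidth measurements.

```javascript
const { Kafka } = require('@confluentinc/kafka-javascript');

// Placeholder connection settings for the cluster under test.
const kafka = new Kafka({
  kafkaJS: {
    brokers: ['your-bootstrap-server:9092'],
    ssl: true,
    sasl: { mechanism: 'plain', username: 'your-api-key', password: 'your-api-secret' },
  },
});

// Generate the payload locally so that data generation is never the bottleneck.
const payload = 'x'.repeat(1024); // 1 KiB value; adjust to match your production record size.

async function runBaseline() {
  const producer = kafka.producer();
  await producer.connect();

  const total = 100000;   // number of mock records to send
  const batchSize = 500;  // records per send() call
  const start = Date.now();

  for (let sent = 0; sent < total; sent += batchSize) {
    const messages = Array.from({ length: batchSize }, (_, i) => ({
      key: String(sent + i), // vary the key so records spread across partitions
      value: payload,
    }));
    await producer.send({ topic: 'perf-test', messages });
  }

  const seconds = (Date.now() - start) / 1000;
  console.log(`Produced ${total} records in ${seconds.toFixed(1)}s (${Math.round(total / seconds)} records/s)`);

  await producer.disconnect();
}

runBaseline().catch(console.error);
```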
Ensure the mock data reflects the type of data used in production to produce results that more accurately reflect performance in production. Or, instead of using mock data, consider using copies of production data or cleansed production data in your benchmarking. If you test with compression, be aware of how the [mock data](https://www.confluent.io/blog/easy-ways-generate-test-data-kafka/) is generated. Sometimes mock data is unrealistic, containing repeated substrings or being padded with zeros, which may result in a better compression performance than what would be seen in production. 1. Run a single producer client on a single server and measure the resulting throughput using the available JMX metrics for the Kafka producer. Repeat the producer benchmarking test, increasing the number of producer processes on the server in each iteration to determine the number of producer processes per server to achieve the highest throughput. 2. Determine the baseline output performance profile for a given consumer in a similar way. Run a single consumer client on a single server and repeat this test, increasing the number of consumer processes on the server in each iteration to determine the number of consumer processes per server to achieve the highest throughput. 3. Run benchmark tests for different permutations of configuration parameters that reflect your service goals. Focus on a subset of configuration parameters, and avoid the temptation to discover and change other parameters from their default values without understanding exactly how they impact the entire system. Tune the settings on each iteration, run a test, observe the results, tune again, and so on, until you identify settings that work for your throughput and latency requirements. [Refer to this blog post](https://www.confluent.io/blog/apache-kafka-supports-200k-partitions-per-cluster) when considering partition count in your benchmark tests. # Kafka Producer for Confluent Cloud An Apache Kafka® Producer is a client application that publishes (writes) events to a Kafka cluster. This section gives an overview of the Kafka producer and an introduction to the configuration settings for tuning. The Kafka producer is conceptually much simpler than the consumer since it does not need group coordination. A producer **partitioner** maps each message to a topic partition, and the producer sends a produce request to the leader of that partition. The partitioners shipped with Kafka guarantee that all messages with the same non-empty key will be sent to the same partition. 
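As a minimal illustration of that guarantee, the following sketch uses the Confluent JavaScript client shown elsewhere in this documentation to send several records that share a key; the topic name `orders`, the key, and the connection settings are placeholder assumptions.

```javascript
const { Kafka } = require('@confluentinc/kafka-javascript');

// Placeholder connection settings: replace with your cluster endpoint and API key/secret.
const kafka = new Kafka({
  kafkaJS: {
    brokers: ['your-bootstrap-server:9092'],
    ssl: true,
    sasl: { mechanism: 'plain', username: 'your-api-key', password: 'your-api-secret' },
  },
});

async function produceKeyed() {
  const producer = kafka.producer();
  await producer.connect();

  // All three records share the non-empty key "customer-42", so the default
  // partitioner maps them to the same partition of the "orders" topic.
  const metadata = await producer.send({
    topic: 'orders',
    messages: [
      { key: 'customer-42', value: 'order created' },
      { key: 'customer-42', value: 'order paid' },
      { key: 'customer-42', value: 'order shipped' },
    ],
  });

  // The delivery metadata reports which partition each record landed on;
  // for records that share a key, it is the same partition every time.
  console.log(JSON.stringify(metadata, null, 2));

  await producer.disconnect();
}

produceKeyed().catch(console.error);
```

Because the records share a key, they land on one partition and keep their relative order for downstream consumers.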
### Run Replicator as a connector The connector JSON should look like this: ```json { "name": "replicate-topic", "config": { "connector.class": "io.confluent.connect.replicator.ReplicatorSourceConnector", "key.converter": "io.confluent.connect.replicator.util.ByteArrayConverter", "value.converter": "io.confluent.connect.replicator.util.ByteArrayConverter", "src.kafka.ssl.endpoint.identification.algorithm":"https", "src.kafka.sasl.mechanism":"PLAIN", "src.kafka.request.timeout.ms":"20000", "src.kafka.bootstrap.servers":"", "src.kafka.retry.backoff.ms":"500", "src.kafka.sasl.jaas.config":"org.apache.kafka.common.security.plain.PlainLoginModule required username=\"\" password=\"\";", "src.kafka.security.protocol":"SASL_SSL", "dest.kafka.ssl.endpoint.identification.algorithm":"https", "dest.kafka.sasl.mechanism":"PLAIN", "dest.kafka.request.timeout.ms":"20000", "dest.kafka.bootstrap.servers":"", "dest.kafka.retry.backoff.ms":"500", "dest.kafka.sasl.jaas.config":"org.apache.kafka.common.security.plain.PlainLoginModule required username=\"\" password=\"\";", "dest.kafka.security.protocol":"SASL_SSL", "dest.topic.replication.factor":"3", "topic.regex":".*" } } ``` If you have not already done so, configure the distributed Connect cluster correctly as shown here. ```none ssl.endpoint.identification.algorithm=https sasl.mechanism=PLAIN bootstrap.servers= sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="" password=""; security.protocol=SASL_SSL producer.ssl.endpoint.identification.algorithm=https producer.sasl.mechanism=PLAIN producer.bootstrap.servers= producer.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="" password=""; producer.security.protocol=SASL_SSL key.converter=org.apache.kafka.connect.storage.StringConverter value.converter=org.apache.kafka.connect.json.JsonConverter group.id=connect-replicator config.storage.topic=connect-configs1 offset.storage.topic=connect-offsets1 status.storage.topic=connect-statuses1 plugin.path=/share/java ``` To learn more, see [Run Replicator as a Connector](/platform/current/multi-dc-deployments/replicator/replicator-run.html#run-crep-as-a-connector) in the [Replicator documentation](/platform/current/multi-dc-deployments/index.html). ## Configure properties There are three config files for consumer, producer, and replication. The minimal configuration changes for these are shown below. * `consumer.properties` ```none bootstrap.servers= ssl.endpoint.identification.algorithm=https sasl.mechanism=PLAIN sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="" password=""; security.protocol=SASL_SSL ``` * `producer.properties` ```none bootstrap.servers= ssl.endpoint.identification.algorithm=https sasl.mechanism=PLAIN sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="" password=""; security.protocol=SASL_SSL ``` * `replication.properties` Replace “Movies” for the `topic.whitelist` with the topics you want to replicate from the source cluster. ```none topic.whitelist=Movies topic.rename.format=${topic}-replica topic.auto.create=true topic.timestamp.type=CreateTime dest.topic.replication.factor=3 ``` #### NOTE Confluent does not support this script. If you encounter any problems, you can [file an issue](https://github.com/confluentinc/examples/issues) in GitHub. 
Running this script will generate delta configurations for: * Confluent Platform Components: * Schema Registry * ksqlDB Data Generator * ksqlDB * Confluent Replicator * Confluent Control Center (Legacy) * Kafka Connect * Kafka connector * Kafka command line tools * Kafka Clients: * Java (Producer/Consumer) * Java (Streams) * Python * .NET * Go * Node.js * C++ * OS: * ENV file ## Connect a JavaScript application to Confluent Cloud To configure JavaScript clients for Kafka to connect to a Kafka cluster in Confluent Cloud: 1. Install the Confluent JavaScript client for Kafka: ```bash npm install @confluentinc/kafka-javascript ``` 2. Configure your JavaScript application with the connection properties. You can obtain these from the Confluent Cloud Console by selecting your cluster and clicking **Clients**. 3. Use the configuration in your producer or consumer code: ```javascript const { Kafka } = require('@confluentinc/kafka-javascript'); const kafka = new Kafka({ kafkaJS: { brokers: ['your-bootstrap-servers'], ssl: true, sasl: { mechanism: 'plain', username: 'your-api-key', password: 'your-api-secret' } } }); // Create producer or consumer const producer = kafka.producer(); ``` 4. See the [JavaScript client examples](https://github.com/confluentinc/confluent-kafka-javascript/tree/master/examples) for complete working examples. 5. Integrate with your environment. #### **Sample configurations for authentication swapping** *SASL/PLAIN to SASL/OAUTHBEARER example:* ```yaml gateway: routes: - name: gateway security: auth: swap ssl: ignoreTrust: false truststore: type: PKCS12 location: /opt/ssl/client-truststore.p12 password: file: /opt/secrets/client-truststore.password keystore: type: PKCS12 location: /opt/ssl/gw-keystore.p12 password: file: /opt/secrets/gw-keystore.password keyPassword: value: inline-password clientAuth: required swapConfig: clientAuth: sasl: mechanism: PLAIN callbackHandlerClass: "org.apache.kafka.common.security.plain.PlainServerCallbackHandler" # (Optional if mechanism=SSL) jaasConfig: # (required if SASL/PLAIN) file: /opt/gateway/gw-users.conf connectionsMaxReauthMs: 0 # optional. 
link: https://docs.confluent.io/platform/current/installation/configuration/broker-configs.html#connections-max-reauth-ms secretstore: s1 clusterAuth: sasl: mechanism: OAUTHBEARER jaasConfig: file: /opt/gateway/cluster-login.tmpl.conf oauth: # required only if clusterAuth.sasl.mechanism=oauth tokenEndpointUri: "https://idp.mycompany.io:8080/realms/cp/protocol/openid-connect/token" ``` *mTLS to SASL/OAUTHBEARER example:* ```yaml gateway: routes: - name: gateway security: auth: swap ssl: ignoreTrust: false truststore: type: PKCS12 location: /opt/ssl/client-truststore.p12 password: file: /opt/secrets/client-truststore.password keystore: type: PKCS12 location: /opt/ssl/gw-keystore.p12 password: file: /opt/secrets/gw-keystore.password keyPassword: value: inline-password clientAuth: required swapConfig: clientAuth: ssl: principalMappingRules: "RULE:^CN=([a-zA-Z0-9._-]+),OU=.*$/$1/,RULE:^UID=([a-zA-Z0-9._-]+),.*$/$1/,DEFAULT" secretStore: "oauth-secrets" clusterAuth: sasl: mechanism: OAUTHBEARER callbackHandlerClass: "org.apache.kafka.common.security.oauthbearer.OAuthBearerValidatorCallbackHandler" jaasConfig: file: "/etc/gateway/cluster-jaas.tmpl.conf" oauth: tokenEndpointUri: "https://idp.mycompany.io:8080/realms/cp/protocol/openid-connect/token" ``` ## Get started To provision and configure Confluent Gateway, refer to the detailed guides available for both Docker and Confluent for Kubernetes (CFK) deployments. The documentation includes step-by-step installation, configuration for streaming domains and routes, and security recommendations. * [Configure and Deploy](gateway-deploy-overview.md#gateway-deploy-overview) * [Migrate Kafka Clusters](gateway-migrate.md#gateway-client-switchover) * [Set up Network Isolation and Custom Domains](gateway-custom-domains.md#gateway-custom-domains) ### Local install 1. Install [Confluent Platform](/platform/current/installation/index.html). 2. Customize the `/etc/confluent-kafka-mqtt/kafka-mqtt-dev.properties` properties file, specifying: - The Confluent Cloud Endpoint that you saved earlier for the bootstrap server. - Security information including the Confluent Cloud API key and secret you created in the previous section. - Topic information. ```text # add bootstrap server bootstrap.servers=pkc-12345.us-west2.gcp.confluent.cloud:9092 #configure connection to Confluent Cloud producer.security.protocol=SASL_SSL producer.sasl.mechanism=PLAIN producer.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="" password=""; # configure topic settings topic.regex.list=temperature:.*temperature confluent.topic.replication.factor=3 ``` 3. Start the MQTT proxy, specifying the properties file: ```bash bin/kafka-mqtt-start etc/confluent-kafka-mqtt/kafka-mqtt-dev.properties ``` ## Suggested reading - To learn how to migrate schemas from an on-premises (self-managed) Schema Registry to Confluent Cloud, see [Migrate Schemas](/platform/current/schema-registry/installation/migrate.html). - To configure and run native Confluent Cloud Schema Registry, see [Quick Start for Schema Management on Confluent Cloud](../get-started/schema-registry.md#cloud-sr-config). - For more information about running a cluster, see the [Schema Registry](/platform/current/schema-registry/installation/deployment.html) documentation. - To view a working example of hybrid Apache Kafka® clusters from self-hosted to Confluent Cloud, see [cp-demo](/platform/current/tutorials/cp-demo/docs/index.html). 
- For example configs for all Confluent Platform components and clients connecting to Confluent Cloud, see [template examples for components](https://github.com/confluentinc/examples/tree/latest/ccloud/template_delta_configs). - To look at all the code used in the Confluent Cloud demo, see the [Confluent Cloud demo examples](https://github.com/confluentinc/examples/tree/latest/ccloud). ## Compile and run a Table API program The following code example shows how to run a “Hello World” statement and how to query an example data stream. 1. Copy the following project object model (POM) into a file named pom.xml. ### pom.xml ```xml 4.0.0 example flink-table-api-java-hello-world 1.0 jar Apache Flink® Table API Java Hello World Example on Confluent Cloud 2.1.0 2.1-8 11 UTF-8 ${target.java.version} ${target.java.version} 2.17.1 confluent https://packages.confluent.io/maven/ apache.snapshots Apache Development Snapshot Repository https://repository.apache.org/content/repositories/snapshots/ false true org.apache.flink flink-table-api-java ${flink.version} io.confluent.flink confluent-flink-table-api-java-plugin ${confluent-plugin.version} org.apache.logging.log4j log4j-slf4j-impl ${log4j.version} runtime org.apache.logging.log4j log4j-api ${log4j.version} runtime org.apache.logging.log4j log4j-core ${log4j.version} runtime ./example org.apache.maven.plugins maven-compiler-plugin 3.10.1 ${target.java.version} ${target.java.version} org.apache.maven.plugins maven-shade-plugin 3.4.1 package shade org.apache.flink:flink-shaded-force-shading com.google.code.findbugs:jsr305 *:* META-INF/*.SF META-INF/*.DSA META-INF/*.RSA example.hello_table_api org.eclipse.m2e lifecycle-mapping 1.0.0 org.apache.maven.plugins maven-shade-plugin [3.1.1,) shade org.apache.maven.plugins maven-compiler-plugin [3.1,) testCompile compile ``` 2. Create a directory named “example”. ```bash mkdir example ``` 3. Create a file named `hello_table_api.java` in the `example` directory. ```bash touch example/hello_table_api.java ``` 4. Copy the following code into `hello_table_api.java`. ```java package example; import io.confluent.flink.plugin.ConfluentSettings; import io.confluent.flink.plugin.ConfluentTools; import org.apache.flink.table.api.EnvironmentSettings; import org.apache.flink.table.api.Table; import org.apache.flink.table.api.TableEnvironment; import org.apache.flink.types.Row; import java.util.List; /** * A table program example to get started with the Apache Flink® Table API. * *

It executes two foreground statements in Confluent Cloud. The results of both statements are * printed to the console. */ public class hello_table_api { // All logic is defined in a main() method. It can run both in an IDE or CI/CD system. public static void main(String[] args) { // Set up connection properties to Confluent Cloud. // Use the fromGlobalVariables() method if you assigned environment variables. // EnvironmentSettings settings = ConfluentSettings.fromGlobalVariables(); // Use the fromArgs(args) method if you want to run with command-line arguments. EnvironmentSettings settings = ConfluentSettings.fromArgs(args); // Initialize the session context to get started. TableEnvironment env = TableEnvironment.create(settings); System.out.println("Running with printing..."); // The Table API centers on 'Table' objects, which help in defining data pipelines // fluently. You can define pipelines fully programmatically. Table table = env.fromValues("Hello world!"); // Also, You can define pipelines with embedded Flink SQL. // Table table = env.sqlQuery("SELECT 'Hello world!'"); // Once the pipeline is defined, execute it on Confluent Cloud. // If no target table has been defined, results are streamed back and can be printed // locally. This can be useful for development and debugging. table.execute().print(); System.out.println("Running with collecting..."); // Results can be collected locally and accessed individually. // This can be useful for testing. Table moreHellos = env.fromValues("Hello Bob", "Hello Alice", "Hello Peter").as("greeting"); List rows = ConfluentTools.collectChangelog(moreHellos, 10); rows.forEach( r -> { String column = r.getFieldAs("greeting"); System.out.println("Greeting: " + column); }); } } ``` 5. Run the following command to build the jar file. ```bash mvn clean package ``` 6. Run the jar. If you assigned your cloud configuration to the environment variables specified in the [Prerequisites](#flink-java-table-api-quick-start-prerequisites) section, and you used the `fromGlobalVariables` method in the `hello_table_api` code, you don’t need to provide the command-line options. ```bash java -jar target/flink-table-api-java-hello-world-1.0.jar \ --cloud aws \ --region us-east-1 \ --flink-api-key key \ --flink-api-secret secret \ --organization-id b0b21724-4586-4a07-b787-d0bb5aacbf87 \ --environment-id env-z3y2x1 \ --compute-pool-id lfcp-8m03rm ``` Your output should resemble: ```none Running with printing... +----+--------------------------------+ | op | f0 | +----+--------------------------------+ | +I | Hello world! | +----+--------------------------------+ 1 row in set Running with collecting... Greeting: Hello Bob Greeting: Hello Alice Greeting: Hello Peter ``` ## Step 3. Create a CI/CD workflow in GitHub Actions The following steps show how to create an Action Workflow for automating the deployment of a Flink SQL statement on Confluent Cloud using Terraform. 1. In the toolbar at the top of the screen, click **Actions**. The **Get started with GitHub Actions** page opens. 2. Click **set up a workflow yourself ->**. If you already have a workflow defined, click **new workflow**, and then click **set up a workflow yourself ->**. 3. Copy the following YAML into the editor. This YAML file defines a workflow that runs when changes are pushed to the main branch of your repository. It includes a job named “terraform_flink_ccloud_tutorial” that runs on the latest version of Ubuntu. 
The job includes these steps: - Check out the code - Set up Terraform - Log in to Terraform Cloud using the API token stored in the Action Secret - Initialize Terraform - Apply the Terraform configuration to deploy changes to your Confluent Cloud account ```yaml on: push: branches: - main jobs: terraform_flink_ccloud_tutorial: name: "terraform_flink_ccloud_tutorial" runs-on: ubuntu-latest steps: - name: Checkout uses: actions/checkout@v4 - name: Setup Terraform uses: hashicorp/setup-terraform@v3 with: cli_config_credentials_token: ${{ secrets.TF_API_TOKEN }} - name: Terraform Init id: init run: terraform init - name: Terraform Validate id: validate run: terraform validate -no-color - name: Terraform Plan id: plan run: terraform plan env: TF_VAR_confluent_cloud_api_key: ${{ secrets.CONFLUENT_CLOUD_API_KEY }} TF_VAR_confluent_cloud_api_secret: ${{ secrets.CONFLUENT_CLOUD_API_SECRET }} - name: Terraform Apply id: apply run: terraform apply -auto-approve env: TF_VAR_confluent_cloud_api_key: ${{ secrets.CONFLUENT_CLOUD_API_KEY }} TF_VAR_confluent_cloud_api_secret: ${{ secrets.CONFLUENT_CLOUD_API_SECRET }} ``` 4. Click **Commit changes**, and in the dialog, enter a description in the **Extended description** textbox, for example, “CI/CD workflow to automate deployment on Confluent Cloud”. 5. Click **Commit changes**. The file `main.yml` is created in the `.github/workflows` directory in your repository. With this Action Workflow, your deployment of Flink SQL statements on Confluent Cloud is now automatic. ## Key features Tableflow offers the following capabilities: - [Materialize](get-started/overview.md#cloud-tableflow-get-started) Kafka topics or Flink tables as Iceberg or Delta Lake tables - Use [your storage](concepts/tableflow-storage.md#tableflow-storage-amazon-s3) (“Bring Your Own Storage”) or [Confluent Managed Storage](concepts/tableflow-storage.md#tableflow-storage-confluent-managed-storage) for materialized tables - Built-in [Iceberg REST Catalog (IRC)](get-started/quick-start-managed-storage.md#cloud-tableflow-quick-start-managed-storage-credentials) - [Catalog integration](how-to-guides/catalog-integration/overview.md#cloud-tableflow-how-to-guides-catalog-integration) with AWS Glue, Apache Polaris, and Snowflake Open Catalog - Use [Avro, Protobuf, and JSON Schema](concepts/tableflow-schemas.md#cloud-tableflow-schemas) as the input data format and support for schematization using Confluent Cloud Schema Registry. - [Self-managed encryption key (BYOK) support](../../security/encrypt/byok/tableflow-byok.md#tableflow-byok-integration) for enhanced security and compliance requirements. - [Automatic table maintenance](operate/monitor-tableflow.md#tableflow-monitor) Tableflow enhances data quality and structure by managing data preprocessing and preparation automatically before materializing streaming data into Iceberg or Delta Lake tables. Below are the key automated data processing and preparation capabilities supported in Tableflow. ## Flags ```none --file string Output file name. (default "asyncapi-spec.yaml") --group string Consumer Group ID for getting messages. (default "consumerApplication") --consume-examples Consume messages from topics for populating examples. --spec-version string Version number of the output file. (default "1.0.0") --kafka-api-key string Kafka cluster API key. --schema-context string Use a specific schema context. (default "default") --topics strings A comma-separated list of topics to export. Supports prefixes ending with a wildcard (*). 
--schema-registry-endpoint string The URL of the Schema Registry cluster. --value-format string Format message value as "string", "avro", "double", "integer", "jsonschema", or "protobuf". Note that schema references are not supported for Avro. (default "string") --kafka-endpoint string Endpoint to be used for this Kafka cluster. --cluster string Kafka cluster ID. --environment string Environment ID. ``` ## Examples Create a configuration file with connector configs and offsets. ```none { "name": "MyGcsLogsBucketConnector", "config": { "connector.class": "GcsSink", "data.format": "BYTES", "flush.size": "1000", "gcs.bucket.name": "APILogsBucket", "gcs.credentials.config": "****************", "kafka.api.key": "****************", "kafka.api.secret": "****************", "name": "MyGcsLogsBucketConnector", "tasks.max": "2", "time.interval": "DAILY", "topics": "APILogsTopic" }, "offsets": [ { "partition": { "kafka_partition": 0, "kafka_topic": "topic_A" }, "offset": { "kafka_offset": 1000 } } ] } ``` Create a connector in the current or specified Kafka cluster context. ```none confluent connect cluster create --config-file config.json confluent connect cluster create --config-file config.json --cluster lkc-123456 ``` ### Cloud ```none --source-cluster string Source cluster ID. --source-bootstrap-server string Bootstrap server address of the source cluster. Can alternatively be set in the configuration file using key "bootstrap.servers". --destination-cluster string Destination cluster ID for source initiated cluster links. --destination-bootstrap-server string Bootstrap server address of the destination cluster for source initiated cluster links. Can alternatively be set in the configuration file using key "bootstrap.servers". --remote-cluster string Remote cluster ID for bidirectional cluster links. --remote-bootstrap-server string Bootstrap server address of the remote cluster for bidirectional links. Can alternatively be set in the configuration file using key "bootstrap.servers". --source-api-key string An API key for the source cluster. For links at destination cluster this is used for remote cluster authentication. For links at source cluster this is used for local cluster authentication. If specified, the cluster will use SASL_SSL with PLAIN SASL as its mechanism for authentication. If you wish to use another authentication mechanism, do not specify this flag, and add the security configurations in the configuration file. --source-api-secret string An API secret for the source cluster. For links at destination cluster this is used for remote cluster authentication. For links at source cluster this is used for local cluster authentication. If specified, the cluster will use SASL_SSL with PLAIN SASL as its mechanism for authentication. If you wish to use another authentication mechanism, do not specify this flag, and add the security configurations in the configuration file. --destination-api-key string An API key for the destination cluster. This is used for remote cluster authentication links at the source cluster. If specified, the cluster will use SASL_SSL with PLAIN SASL as its mechanism for authentication. If you wish to use another authentication mechanism, do not specify this flag, and add the security configurations in the configuration file. --destination-api-secret string An API secret for the destination cluster. This is used for remote cluster authentication for links at the source cluster. 
If specified, the cluster will use SASL_SSL with PLAIN SASL as its mechanism for authentication. If you wish to use another authentication mechanism, do not specify this flag, and add the security configurations in the configuration file. --remote-api-key string An API key for the remote cluster for bidirectional links. This is used for remote cluster authentication. If specified, the cluster will use SASL_SSL with PLAIN SASL as its mechanism for authentication. If you wish to use another authentication mechanism, do not specify this flag, and add the security configurations in the configuration file. --remote-api-secret string An API secret for the remote cluster for bidirectional links. This is used for remote cluster authentication. If specified, the cluster will use SASL_SSL with PLAIN SASL as its mechanism for authentication. If you wish to use another authentication mechanism, do not specify this flag, and add the security configurations in the configuration file. --local-api-key string An API key for the local cluster for bidirectional links. This is used for local cluster authentication if remote link's connection mode is Inbound. If specified, the cluster will use SASL_SSL with PLAIN SASL as its mechanism for authentication. If you wish to use another authentication mechanism, do not specify this flag, and add the security configurations in the configuration file. --local-api-secret string An API secret for the local cluster for bidirectional links. This is used for local cluster authentication if remote link's connection mode is Inbound. If specified, the cluster will use SASL_SSL with PLAIN SASL as its mechanism for authentication. If you wish to use another authentication mechanism, do not specify this flag, and add the security configurations in the configuration file. --config strings A comma-separated list of "key=value" pairs, or path to a configuration file containing a newline-separated list of "key=value" pairs. --dry-run Validate a link, but do not create it. --no-validate Create a link even if the source cluster cannot be reached. --kafka-endpoint string Endpoint to be used for this Kafka cluster. --cluster string Kafka cluster ID. --environment string Environment ID. --context string CLI context name. ``` ### Why should I upgrade my Confluent CLI to the latest version, v4? As detailed in the Release Notes, several commands and flags have been renamed or modified for v4 to provide better functionality and map to feature updates. In particular, the Schema Registry commands are now aligned with [Always On Stream Governance](/cloud/current/stream-governance/packages.html#getting-started-enable-or-upgrade). To learn more, see [Deprecation of SRCM v2 clusters and regions APIs and upgrade guide](/cloud/current/stream-governance/packages.html#deprecation-of-srcm-v2-clusters-and-regions-apis-and-upgrade-guide). In practice, this means that users no longer explicitly create and secure Schema Registry clusters; in fact, these clusters cannot be created manually with the new CLI commands and backing APIs. The Schema Registry cluster is now auto-created in the environment when the first Kafka cluster is created, and in the same region as the Kafka cluster. Stream Governance and Schema Registry is always enabled in Confluent Cloud environments; you have the choice of keeping with the default “Essentials” package or upgrading to “Advanced”. 
Therefore, the set of `confluent schema-registry cluster` commands have been streamlined to describe existing clusters, while package upgrades are available on `confluent environment` commands: ```bash confluent environment update --governance-package advanced ``` Keep in mind that once you upgrade a package associated with an environment, you cannot “downgrade” back to “Essentials”: ```bash Downgrading the package from "advanced" to "essentials" is not allowed once the Schema Registry cluster is provisioned. ``` Several new commands have been added to support working with Kafka topics and plugins. ## Quick start To get started, install the latest version of the Confluent CLI, create a Kafka cluster and topic, and produce and consume messages as described below. 1. [Install the latest version of the CLI](install.md#cli-install) per the instructions for your operating system. 2. Sign up for a free Confluent Cloud account by entering the following command in your terminal: ```text confluent cloud-signup ``` You should be redirected to the [free Confluent Cloud account](https://www.confluent.io/get-started/) sign up page. 3. After you have signed up for a free account, start autocomplete by entering the following command in your terminal: ```text confluent shell ``` 4. Using the `confluent` interactive shell, enter the following command to log in to your Confluent Cloud account: ```text login ``` If your credentials are not saved locally, you must enter your credentials as shown in the following output: ```text Enter your Confluent Cloud credentials: Email: Password: ``` #### NOTE - If you signed up for a free Confluent Cloud account using your GitHub or Google credentials, you must provide your GitHub or Google username and password to sign in. - You add the `--save` flag if you want to save your credentials locally. This prevents you from having to enter them again in the future. 5. Create your first Kafka cluster: ```text kafka cluster create --cloud --region ``` For example: ```text kafka cluster create dev0 --cloud aws --region us-east-1 ``` You should see output similar to following: ```text It may take up to 5 minutes for the Kafka cluster to be ready. +-----------------------+----------------------------------------------------------+ | Current || false | | ID || lkc-dfgrt7 | | Name || dev0 | | Type || BASIC | | Ingress Limit (MB/s) || 250 | | Egress Limit (MB/s) || 750 | | Storage || 5 TB | | Provider || aws | | Region || us-east-1 | | Availability || single-zone | | Status || PROVISIONING | | Endpoint || SASL_SSL://xxx-xxxx.us-east-1.aws.confluent.cloud:1234 | | REST Endpoint || https://yyy-y11yy.us-east-1.aws.confluent.cloud:345 | +-----------------------+----------------------------------------------------------+ ``` 6. Create a topic in the cluster using the cluster ID from the output of the previous step: ```text kafka topic create --cluster ``` For example: ```text kafka topic create test_topic --cluster lkc-dfgrt7 ``` You should see output confirming that the topic was created: ```text Created topic "test_topic". ``` 7. Create an API key for the cluster: ```text api-key create --resource lkc-dfgrt7 ``` You should see output similar to the following: ```text It may take a couple of minutes for the API key to be ready. Save the API key and secret. The secret is not retrievable later. 
+-------------+-------------------------------------------------------------------+
| API Key     |                                                                   |
| API Secret  |                                                                   |
+-------------+-------------------------------------------------------------------+
```

8. Produce messages to your topic:

   ```text
   kafka topic produce --api-key --api-secret
   ```

   For example:

   ```text
   kafka topic produce test_topic --api-key --api-secret
   ```

   You should see output similar to:

   ```text
   Starting Kafka Producer. Use Ctrl-C or Ctrl-D to exit.
   ```

9. Once the producer is active, type messages, delimiting them with return. For example:

   ```text
   today
   then
   now
   forever
   ```

10. When you’re finished producing, exit with `Ctrl-C` or `Ctrl-D`.

11. Read back your produced messages, from the beginning:

    ```text
    kafka topic consume --api-key --api-secret --from-beginning
    ```

    For example:

    ```text
    kafka topic consume test_topic --api-key --api-secret --from-beginning
    ```

    Based on the previous messages entered, you should see output similar to:

    ```text
    Starting Kafka Consumer. Use Ctrl-C to exit.
    forever
    now
    today
    then
    ```

## Working with Confluent Cloud for Government

Use the links in this section to set up and manage your environment.

Install the Confluent CLI:

- [Install the CLI](https://docs.confluent.io/confluent-cli/current/install.html)

Invite users and assign role-based access:

- [Single Sign-on (SSO) Overview](/cloud/current/access-management/authenticate/sso/overview.html)
- [Add an SSO user](/cloud/current/access-management/identity/user-accounts.html#add-an-sso-user)
- [Restrict user access](/cloud/current/access-management/access-control/cloud-rbac.html)

Kafka cluster management:

- [CRUD operations for Kafka clusters](/cloud/current/clusters/create-cluster.html#how-to-work-with-clusters)
- [Resize clusters](/cloud/current/clusters/resize.html)
- [Self-Managed Encryption Keys and AWS](/cloud/current/clusters/byok/byok-aws.html)

Set up network security:

Setting up a private network on Confluent Cloud for Government is a two-step process. First, you create the Confluent Cloud for Government network, then you add the private networking option. AWS includes multiple private networking options, including AWS PrivateLink, VPC Peering on AWS, and AWS Transit Gateway.

- [Confluent Cloud Network on AWS](/cloud/current/networking/ccloud-network/aws.html#create-ccloud-network-aws)
- [AWS PrivateLink](/cloud/current/networking/private-links/aws-privatelink.html)

Monitoring and logging:

- [Confluent Cloud Audit Log Overview](/cloud/current/monitoring/audit-logging/cloud-audit-log-concepts.html)
- [Audit Log Reference](/cloud/current/monitoring/audit-logging/audit-log-records.html)
- [Audit Log Event Schema](/cloud/current/monitoring/audit-logging/audit-log-schema.html)
- [Access and Consume Audit Logs](/cloud/current/monitoring/audit-logging/configure.html)

Backups and contingency plans:

- [Confluent Replicator to Confluent Cloud Configurations](/cloud/current/get-started/examples/ccloud/docs/replicator-to-cloud-configuration-types.html#ccloud-to-ccloud-with-connect-backed-to-origin)
- [Confluent for Kubernetes and Replicator GitHub Example](https://github.com/confluentinc/confluent-kubernetes-examples/tree/master/hybrid/replicator-cloud2cloud)
- [Confluent for Kubernetes](https://docs.confluent.io/operator/current/overview.html)

## Configure Tiered Storage

For a complete guide to setting up and working with Tiered Storage, see [Tiered Storage in Confluent Platform](/platform/current/clusters/tiered-storage.html#tiered-storage).
To configure and work with Tiered Storage starting from Control Center:

1. Click the cluster in the cluster navigation bar.
2. Click the **Cluster settings** menu.
3. Click the **Tiered storage** tab.

   ![image](images/c3-storage.png)

   You can hide or show the on-screen setup instructions, which walk through cloud provider setup as fully described in [Tiered Storage in Confluent Platform](/platform/current/clusters/tiered-storage.html#tiered-storage).

4. To view and edit dynamic settings, click **Edit settings**.

   ![image](images/c3-storage-dynamic-configs.png)

   View or change settings and click **Cancel** or **Save changes** as appropriate.

5. To set up storage, choose a cloud provider (click the **GCS** or **S3** tab). The S3 configuration options are shown here as an example.

   ![image](images/c3-storage-setup-s3.png)

6. Specify property values and paths to your credentials, then click **Generate configurations**.

   ![image](images/c3-storage-setup-example.png)

7. Copy the generated configurations block and paste it into the properties files for your brokers (for example, `$CONFLUENT_HOME/etc/kafka/server.properties`).

   ![image](images/c3-storage-gen-configs-output.png)

#### IMPORTANT

- The same bucket must be used across all brokers within a Tiered Storage-enabled cluster. This applies to all supported platforms.
- The Tiered Storage internal topic defaults to a replication factor of `3`. If you use `confluent local services start` to run a single-broker cluster such as that described in the [Quick Start guides](/platform/current/get-started/platform-quickstart.html#quickstart), you must add an additional line to the broker file, `$CONFLUENT_HOME/etc/kafka/server.properties`: `confluent.tier.metadata.replication.factor=1`
- As a recommended best practice, do not set a retention policy on the cloud storage (such as an AWS S3 bucket) because this may conflict with the Kafka topic retention policy.

8. After you update these configurations to enable Tiered Storage, restart the brokers. This can be done in a [rolling](/platform/current/kafka/post-deployment.html#rolling-restart) fashion.

9. View the cluster-wide **Tiered Storage** metrics shown on the **Tiered Storage** card on the **Brokers** overview page for the cluster.

   ![Tiered Storage panel enabled](images/c3-tiered-storage-metrics-overview.png)

   Click into these initial stats to view a metrics chart for Tiered Storage.

   ![Tiered Storage metrics chart](images/c3-tiered-storage-metrics.png)

   Hover and slide the cursor over a chart to get details on data at any particular point in time.

   ![Tiered Storage metrics detail on hover](images/c3-tiered-storage-metrics-details.png)

10. To get storage metrics on a specific topic, navigate to the topic (choose **Cluster > Topics**, select a topic from the list). The **Storage** card is shown on the Overview page for the topic.

    ![Tiered Storage metrics on a single topic](images/c3-tiered-storage-metrics-on-topic.png)

## Security for Confluent Platform components settings

The following optional settings control TLS encryption between Control Center and Confluent Platform components or features. You can also configure Basic authentication for Schema Registry. You should configure these settings if you have configured your Kafka cluster with these security features. For TLS, you can choose to configure each component separately, or set a single store.
- [Streams](#controlcenter-monitoring) - [Schema Registry](#controlcenter-sr) - [Connect](#controlcenter-connect) - [ksqlDB](#controlcenter-ksql) - [Single Proxy Server Store](#single-store) ### Confluent Platform 7.7 - 8.0 Considerations: : - You must use a special command to start Prometheus on MacOS. - By default Alertmanager and controllers in KRaft mode use port 9093. To run Prometheus and Alertmanager and KRaft mode controllers on the same host, you must manually edit the provided Control Center scripts. 1. Download the Confluent Platform archive (7.7 to 8.0 supported) and run these commands: ```bash wget https://packages.confluent.io/archive/8.0/confluent-8.0.0.tar.gz ``` ```bash tar -xvf confluent-8.0.0.tar.gz ``` ```bash cd confluent-8.0.0 ``` ```bash export CONFLUENT_HOME=`pwd` ``` 2. Update the broker and controller configurations to emit metrics to Prometheus by adding the following configurations to: `etc/kafka/controller.properties` and `etc/kafka/broker.properties` The fifth line (`confluent.telemetry.exporter._c3.metrics.include=`) is very long. Simply copy the code block as provided and append it to the end of the properties files. Pasting the fifth line results in a single line, even though it shows as wrapped in the documentation. ```bash metric.reporters=io.confluent.telemetry.reporter.TelemetryReporter confluent.telemetry.exporter._c3.type=http confluent.telemetry.exporter._c3.enabled=true confluent.telemetry.exporter._c3.metrics.include=io.confluent.kafka.server.request.(?!.*delta).*|io.confluent.kafka.server.server.broker.state|io.confluent.kafka.server.replica.manager.leader.count|io.confluent.kafka.server.request.queue.size|io.confluent.kafka.server.broker.topic.failed.produce.requests.rate.1.min|io.confluent.kafka.server.tier.archiver.total.lag|io.confluent.kafka.server.request.total.time.ms.p99|io.confluent.kafka.server.broker.topic.failed.fetch.requests.rate.1.min|io.confluent.kafka.server.broker.topic.total.fetch.requests.rate.1.min|io.confluent.kafka.server.partition.caught.up.replicas.count|io.confluent.kafka.server.partition.observer.replicas.count|io.confluent.kafka.server.tier.tasks.num.partitions.in.error|io.confluent.kafka.server.broker.topic.bytes.out.rate.1.min|io.confluent.kafka.server.request.total.time.ms.p95|io.confluent.kafka.server.controller.active.controller.count|io.confluent.kafka.server.session.expire.listener.zookeeper.disconnects.total|io.confluent.kafka.server.request.total.time.ms.p999|io.confluent.kafka.server.controller.active.broker.count|io.confluent.kafka.server.request.handler.pool.request.handler.avg.idle.percent.rate.1.min|io.confluent.kafka.server.session.expire.listener.zookeeper.disconnects.rate.1.min|io.confluent.kafka.server.controller.unclean.leader.elections.rate.1.min|io.confluent.kafka.server.replica.manager.partition.count|io.confluent.kafka.server.controller.unclean.leader.elections.total|io.confluent.kafka.server.partition.replicas.count|io.confluent.kafka.server.broker.topic.total.produce.requests.rate.1.min|io.confluent.kafka.server.controller.offline.partitions.count|io.confluent.kafka.server.socket.server.network.processor.avg.idle.percent|io.confluent.kafka.server.partition.under.replicated|io.confluent.kafka.server.log.log.start.offset|io.confluent.kafka.server.log.tier.size|io.confluent.kafka.server.log.size|io.confluent.kafka.server.tier.fetcher.bytes.fetched.total|io.confluent.kafka.server.request.total.time.ms.p50|io.confluent.kafka.server.tenant.consumer.lag.offsets|io.confluent.kafka.server.session.expire.l
istener.zookeeper.expires.rate.1.min|io.confluent.kafka.server.log.log.end.offset|io.confluent.kafka.server.broker.topic.bytes.in.rate.1.min|io.confluent.kafka.server.partition.under.min.isr|io.confluent.kafka.server.partition.in.sync.replicas.count|io.confluent.telemetry.http.exporter.batches.dropped|io.confluent.telemetry.http.exporter.items.total|io.confluent.telemetry.http.exporter.items.succeeded|io.confluent.telemetry.http.exporter.send.time.total.millis|io.confluent.kafka.server.controller.leader.election.rate.(?!.*delta).*|io.confluent.telemetry.http.exporter.batches.failed confluent.telemetry.exporter._c3.client.base.url=http://localhost:9090/api/v1/otlp confluent.telemetry.exporter._c3.client.compression=gzip confluent.telemetry.exporter._c3.api.key=dummy confluent.telemetry.exporter._c3.api.secret=dummy confluent.telemetry.exporter._c3.buffer.pending.batches.max=80 confluent.telemetry.exporter._c3.buffer.batch.items.max=4000 confluent.telemetry.exporter._c3.buffer.inflight.submissions.max=10 confluent.telemetry.metrics.collector.interval.ms=60000 confluent.telemetry.remoteconfig._confluent.enabled=false confluent.consumer.lag.emitter.enabled=true ``` 3. Download the Control Center archive and run these commands: ```bash wget https://packages.confluent.io/confluent-control-center-next-gen/archive/confluent-control-center-next-gen-2.3.0.tar.gz ``` ```bash tar -xvf confluent-control-center-next-gen-2.3.0.tar.gz ``` ```bash cd confluent-control-center-next-gen-2.3.0 ``` ```bash export C3_HOME=`pwd` ``` 4. Start Prometheus and Alertmanager To start Control Center, you must have three dedicated command windows: one for Prometheus, another for the Control Center process, and a third dedicated command window for Alertmanager. Run the following commands from `$C3_HOME` in all command windows. 1. Open `etc/confluent-control-center/prometheus-generated.yml` and change `localhost:9093` to `localhost:9098` ```bash alerting: alertmanagers: - static_configs: - targets: - localhost:9098 ``` 2. Start Prometheus. All operating systems except MacOS: ```bash bin/prometheus-start ``` MacOS: ```bash bash bin/prometheus-start ``` #### NOTE Prometheus runs but does not output any information to the screen. 3. Start Alertmanager. 1. Run this command: ```bash export ALERTMANAGER_PORT=9098 ``` 2. All operating systems except MacOS: ```bash bin/alertmanager-start ``` MacOS ```bash bash bin/alertmanager-start ``` 5. Start Control Center. 1. Open `etc/confluent-control-center/control-center-dev.properties` and update port `9093` to `9098`: ```bash confluent.controlcenter.alertmanager.url=http://localhost:9098 ``` 2. Run this command: ```bash bin/control-center-start etc/confluent-control-center/control-center-dev.properties ``` 6. Start Confluent Platform. To start Confluent Platform, you must have two dedicated command windows, one for the controller and another for the broker process. All the following commands are meant to be run from `CONFLUENT_HOME` in both command windows. The Confluent Platform start sequence requires you to generate a single random ID and use that *same* ID for both the controller and the broker process. 1. In the command window dedicated to running the controller, change directories into `CONFLUENT_HOME`. ```bash cd CONFLUENT_HOME ``` 2. Generate a random value for `KAFKA_CLUSTER_ID`. ```bash KAFKA_CLUSTER_ID="$(bin/kafka-storage random-uuid)" ``` 3. Use the following command to get the random ID and save the output. You need this value to start the controller *and* the broker. 
```bash echo $KAFKA_CLUSTER_ID ``` 4. Format the log directories for the controller: ```bash bin/kafka-storage format --cluster-id $KAFKA_CLUSTER_ID -c etc/kafka/kraft/controller.properties --standalone ``` 5. Start the controller: ```bash bin/kafka-server-start etc/kafka/kraft/controller.properties ``` 6. Open a command window for the broker and navigate to `CONFLUENT_HOME`. ```bash cd CONFLUENT_HOME ``` 7. Set the `KAFKA_CLUSTER_ID` variable to the random ID you generated earlier with `kafka-storage random-uuid`. ```bash export KAFKA_CLUSTER_ID= ``` 8. Format the log directories for this broker: ```bash bin/kafka-storage format --cluster-id $KAFKA_CLUSTER_ID -c etc/kafka/kraft/broker.properties ``` 9. Start the broker: ```bash bin/kafka-server-start etc/kafka/kraft/broker.properties ``` #### IMPORTANT If you configured Control Center for RBAC in the 5.3 preview release, the configuration options have changed in Confluent Platform version 5.4 and later. You must update your configuration. 1. Uncomment the following lines for each configuration option in the appropriate Control Center properties file for your environment (`CONFLUENT_HOME/etc/confluent-control-center/control-center.properties`). Replace the placeholder values with your actual values. ```RST ############################# Control Center RBAC Settings ############################# # Enable RBAC authorization in Control Center by providing a comma-separated list of Metadata Service (MDS) URLs #confluent.metadata.bootstrap.server.urls=http://localhost:8090 # MDS credentials of an RBAC user for Control Center to act on behalf of # NOTE: This user must be a SystemAdmin on each Apache Kafka cluster #confluent.metadata.basic.auth.user.info=username:password # Enable SASL-based authentication for each Apache Kafka cluster (SASL_PLAINTEXT or SASL_SSL required) #confluent.controlcenter.streams.security.protocol=SASL_PLAINTEXT #confluent.controlcenter.kafka..security.protocol=SASL_PLAINTEXT # Enable authentication using a bearer token for Control Center's REST endpoints #confluent.controlcenter.rest.authentication.method=BEARER # NOTE: Must match the MDS public key #public.key.path=/path/to/publickey.pem ``` **Line descriptions:** - **Line 4:** MDS URL for authorizing resources. In a multiple MDS environment, separate the URLs with a comma. The presence of the MDS URL is what indicates to Control Center that RBAC is enabled. - **Line 8:** Metadata Service (MDS) credentials of an RBAC user for Control Center to act on behalf of. - **Line 11-12:** The confluent.controlcenter.streams prefix represents the Kafka streams application (You can use the option in line 12 for another Kafka cluster) and all configurations you need to add for setting up a Kafka cluster. - **Line 15:** The authentication method required to talk to the Control Center backend through the REST layer. The OAuth-style `BEARER` method is required. The Control Center frontend acquires an access token on your behalf and keeps it refreshed. HTTP Basic authentication headers are not accepted. - **Line 18:** The path to the public key required for REST authentication. Must be the same public key that resides on MDS. The public key checks the token and makes sure that the user requesting access is a valid user in the system. #### IMPORTANT Additional clusters in a multi-cluster environment require connections to Kafka with RBAC enabled due to a [known issue](#c3-ki-cluster-connections). 
You can no longer send only metrics to Control Center in an RBAC-enabled environment; you must fully enable management. For more information, see [Monitor Kafka with Metrics Reporter in Confluent Platform](/platform/current/monitor/metrics-reporter.html#metrics-reporter).

2. Restart Confluent Platform for the properties file configuration to take effect. If you are using a Confluent Platform development environment with a [Confluent CLI](https://docs.confluent.io/confluent-cli/current/index.html), stop and start as follows:

   ```bash
   confluent local stop
   confluent local start
   ```

   The `control-center-dev.properties` file is passed in automatically.

### Configure TLS proxy server access to Schema Registry

When Confluent Control Center connects to Schema Registry and Schema Registry has TLS enabled:

- Schema Registry communicates with Kafka over the Kafka protocol, which is secured with TLS.
- Control Center communicates with Kafka over the Kafka protocol, which is secured with TLS.
- Control Center communicates with Schema Registry over the HTTPS protocol, which is secured with TLS.

Essentially, Control Center functions as a proxy server to Schema Registry. To secure Control Center with HTTPS, configure Schema Registry to allow HTTPS as described in [Configuring the REST API for HTTP or HTTPS](/platform/current/schema-registry/security/index.html#schema-registry-http-https). In addition, Control Center should add a trusted certificate to its truststore to connect to Schema Registry over HTTPS as described in [Additional configurations for HTTPS](/platform/current/schema-registry/security/index.html#sr-https-additional).

Be sure to prefix the Control Center configuration attributes in `control-center.properties` with `confluent.controlcenter.` For example:

```bash
confluent.controlcenter.schema.registry.schema.registry.ssl.truststore.location=
confluent.controlcenter.schema.registry.schema.registry.ssl.truststore.password=
confluent.controlcenter.schema.registry.schema.registry.ssl.keystore.location=
confluent.controlcenter.schema.registry.schema.registry.ssl.keystore.password=
confluent.controlcenter.schema.registry.schema.registry.ssl.key.password=
```

## Topic details

Select a topic to display overview details for that topic and navigate to other features for topics:

- [Schema](/platform/current/control-center/topics/schema.html#topicschema).
- Inspect ([Message browser](messages.md#c3-topic-message-browser)).
- [Settings](edit.md#c3-edit-topic) for topic configuration.
- Query in ksqlDB - Registers a stream or table for a topic.
- Consumer lag - [View consumer lag](view.md#c3-view-topic-consume-metrics) at the topic level. You can view consumer lag for a consumer group from the [Consumers](../clients/consumers.md#controlcenter-userguide-consumers) menu.

To access the overview page for a topic:

1. Select a cluster from the navigation bar and click the **Topics** menu item.
2. In the **Topics** table, click the topic name. The topic overview page automatically opens for that topic.

In Normal mode, use the **Topic** page to:

* View a topic overview with a health roll-up.
* Drill into topic metrics by clicking the **Production**, **Consumption**, or **Availability** panels.
* Search for partitions by partition ID.
* View partition, replica placement, offset, and partition size details.
![Topics Overview page (Normal mode)](images/c3-topics-overview-page.png)

#### Semantic and per-method changes

- `subscribe`:
  - Regex flags are ignored when passing a topic subscription (like `i` or `g`). Regexes must start with `^`; otherwise, an error is thrown.
  - Subscribe must be called only after `connect`.
  - An optional parameter, `replace`, is provided. If set to `true`, the current subscription is replaced with the new one, for example, `consumer.subscribe({ topics: ['topic1'], replace: true });`. If set to `false`, the new subscription is added to the current one. The default value is `false` to retain existing behaviour.
  - When passing a list of topics to `subscribe()`, `fromBeginning` is not set per `subscribe` call. It must be configured in the top-level configuration.

    Before:

    ```javascript
    const consumer = kafka.consumer({
      groupId: 'test-group',
    });
    await consumer.connect();
    await consumer.subscribe({ topics: ["topic"], fromBeginning: true });
    ```

    After:

    ```javascript
    const consumer = kafka.consumer({
      kafkaJS: {
        groupId: 'test-group',
        fromBeginning: true,
      }
    });
    await consumer.connect();
    await consumer.subscribe({ topics: ["topic"] });
    ```

- `run`:
  - For auto-committing using a consumer, the properties `autoCommit` and `autoCommitInterval` are no longer set on `run`. They must be configured in the top-level configuration. `autoCommitThreshold` is not supported. If `autoCommit` is set to `true`, messages are *not* committed per-message, but periodically at the interval specified by `autoCommitInterval` (default 5 seconds).

    Before:

    ```javascript
    const kafka = new Kafka({ /* ... */ });
    const consumer = kafka.consumer({ /* ... */ });
    await consumer.connect();
    await consumer.subscribe({ topics: ["topic"] });
    consumer.run({
      eachMessage: someFunc,
      autoCommit: true,
      autoCommitInterval: 5000,
    });
    ```

    After:

    ```javascript
    const kafka = new Kafka({ kafkaJS: { /* ... */ } });
    const consumer = kafka.consumer({
      kafkaJS: {
        /* ... */,
        autoCommit: true,
        autoCommitInterval: 5000,
      },
    });
    await consumer.connect();
    await consumer.subscribe({ topics: ["topic"] });
    consumer.run({
      eachMessage: someFunc,
    });
    ```

  - `heartbeat()` no longer needs to be called by the user in the `eachMessage`/`eachBatch` callback. Heartbeats are automatically managed by librdkafka.
  - `partitionsConsumedConcurrently` is supported by both `eachMessage` and `eachBatch`.
  - An API-compatible version of `eachBatch` is available, but the batch size is not calculated from the configured parameters; the batch has a constant maximum size that is configured internally. This is subject to change. The property `eachBatchAutoResolve` is supported. Within the `eachBatch` callback, use of `uncommittedOffsets` is unsupported, and within the returned batch, `offsetLag` and `offsetLagLow` are unsupported.
- `commitOffsets`:
  - Does not yet support sending metadata for topic partitions being committed.
  - If called with no arguments, it commits all offsets passed to the user (or the stored offsets, if manually handling offset storage using `consumer.storeOffsets`).
- `seek`:
  - The restriction to call seek only after `run` is removed. It can be called at any time.
- `pause` and `resume`:
  - These methods MUST be called after the consumer group is joined. In practice, this means they can be called whenever `consumer.assignment()` has a non-zero size, or within the `eachMessage`/`eachBatch` callback.
- `stop` is not yet supported; the user must disconnect the consumer.
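The before/after snippets in this section assume the Confluent JavaScript client is already installed. As a minimal sketch of how you might try them locally (the script name `consumer.js` is just a placeholder for whichever example you save), you could install the package and run it with Node.js:

```bash
# Install the Confluent JavaScript client used by the examples in this section
npm install @confluentinc/kafka-javascript

# Run a saved example; consumer.js is a hypothetical filename
node consumer.js
```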
### Admin client

The library provides an admin client to interact with the Kafka cluster. The admin client provides several methods to manage topics, groups, and other Kafka entities. The following code snippet instantiates the `AdminClient`:

```js
const Kafka = require('@confluentinc/kafka-javascript');
const client = Kafka.AdminClient.create({
  'client.id': 'kafka-admin',
  'bootstrap.servers': 'broker01'
});

// From an existing producer or consumer
const depClient = Kafka.AdminClient.createFrom(producer);
```

These calls instantiate and connect the `AdminClient`, which allows you to call the admin methods. A complete list of methods available on the admin client can be found in the API reference documentation.

## OAuthbearer callback authentication

The JavaScript client library supports OAuthBearer token authentication for both the promisified and the callback-based API. The token is fetched using a callback provided by the user. The callback is called at 80% of the token expiry time, and the library uses the new token for the next login attempt.

```js
async function token_refresh(oauthbearer_config /* string - passed from config */, cb /* can be used if function is not async */) {
  // Some logic to fetch the token, before returning it.
  return { tokenValue, lifetime, principal, extensions };
}

const producer = new Kafka().producer({
  'bootstrap.servers': '',
  'security.protocol': 'sasl_ssl', // or sasl_plaintext
  'sasl.mechanisms': 'OAUTHBEARER',
  'sasl.oauthbearer.config': 'someConfigPropertiesKey=value', // Just passed straight to token_refresh as a string, carries no other significance.
  'oauthbearer_token_refresh_cb': token_refresh,
});
```

For a special case of OAuthBearer token authentication, where the token is fetched from an OIDC provider using the `client_credentials` grant type, the library provides a built-in callback, which can be set through the configuration alone, without any custom function required:

```js
const producer = new Kafka().producer({
  'bootstrap.servers': '',
  'security.protocol': 'sasl_ssl', // or sasl_plaintext
  'sasl.mechanisms': 'OAUTHBEARER',
  'sasl.oauthbearer.method': 'oidc',
  'sasl.oauthbearer.token.endpoint.url': issuerEndpointUrl,
  'sasl.oauthbearer.scope': scope,
  'sasl.oauthbearer.client.id': oauthClientId,
  'sasl.oauthbearer.client.secret': oauthClientSecret,
  'sasl.oauthbearer.extensions': `logicalCluster=${kafkaLogicalCluster},identityPoolId=${identityPoolId}`
});
```

These examples are for the promisified API, but the callback-based API can be used with the same configuration settings.

### Sink Connector Configuration

Start the services using the Confluent CLI:

```bash
confluent local start
```

Create a configuration file named `aws-cloudwatch-metrics-sink-config.json` with the following contents.
```text { "name": "aws-cloudwatch-metrics-sink", "config": { "name": "aws-cloudwatch-metrics-sink", "topics": "cloudwatch-metrics-topic", "connector.class": "io.confluent.connect.aws.cloudwatch.metrics.AwsCloudWatchMetricsSinkConnector", "tasks.max": "1", "key.converter": "io.confluent.connect.avro.AvroConverter", "key.converter.schema.registry.url": "http://localhost:8081", "value.converter": "io.confluent.connect.avro.AvroConverter", "value.converter.schema.registry.url": "http://localhost:8081", "aws.cloudwatch.metrics.url": "https://monitoring.us-east-2.amazonaws.com", "aws.cloudwatch.metrics.namespace": "service-namespace", "behavior.on.malformed.metric": "fail", "confluent.topic.bootstrap.servers": "localhost:9092", "confluent.topic.replication.factor": "1" } } ``` The important configuration parameters used here are: - **aws.cloudwatch.metrics.url**: The endpoint URL that the sink connector uses to push the given metrics. - **aws.cloudwatch.metrics.namespace**: The Amazon CloudWatch Metrics namespace associated with the desired metrics. - **tasks.max**: The maximum number of tasks that should be created for this connector. Run this command to start the Amazon CloudWatch Metrics sink connector. ```bash confluent local load aws-cloudwatch-metrics-sink --config aws-cloudwatch-metrics-sink-config.json ``` To check that the connector started successfully view the Connect worker’s log by running: ```bash confluent local services connect log ``` Produce test data to the `cloudwatch-metrics-topic` topic in Kafka using the [Confluent CLI](https://docs.confluent.io/confluent-cli/current/index.html) confluent local produce command. ```bash kafka-avro-console-producer \ --broker-list localhost:9092 --topic cloudwatch-metrics-topic \ --property parse.key=true \ --property key.separator=, \ --property key.schema='{"type":"string"}' \ --property value.schema='{"name": "myMetric","type": "record","fields": [{"name": "name","type": "string"},{"name": "type","type": "string"},{"name": "timestamp","type": "long"},{"name": "dimensions","type": {"name": "dimensions","type": "record","fields": [{"name": "dimensions1","type": "string"},{"name": "dimensions2","type": "string"}]}},{"name": "values","type": {"name": "values","type": "record","fields": [{"name":"count", "type": "double"},{"name":"oneMinuteRate", "type": "double"},{"name":"fiveMinuteRate", "type": "double"},{"name":"fifteenMinuteRate", "type": "double"},{"name":"meanRate", "type": "double"}]}}]}' ``` #### NOTE For details about using this connector with Kafka Connect Reporter, see [Connect Reporter](/kafka-connectors/self-managed/userguide.html#userguide-connect-reporter). ```properties name=datadog-metrics-sink topics=datadog-metrics-topic connector.class=io.confluent.connect.datadog.metrics.DatadogMetricsSinkConnector tasks.max=1 datadog.api.key=< Your Datadog Api key > datadog.domain=< anyone of COM/EU > behavior.on.error=< Optional Configuration > reporter.bootstrap.servers=localhost:9092 key.converter=io.confluent.connect.avro.AvroConverter key.converter.schema.registry.url=http://localhost:8081 value.converter=io.confluent.connect.avro.AvroConverter value.converter.schema.registry.url=http://localhost:8081 confluent.topic.bootstrap.servers=localhost:9092 confluent.topic.replication.factor=1 confluent.license= ``` Before starting the connector, make sure that the configurations in `datadog properties` are properly set. 
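Assuming the Datadog properties shown above are saved to a file named `datadog-metrics-sink.properties` (the filename here is illustrative, not prescribed), you could load the connector and confirm its state with the same Confluent CLI commands used in the other quick starts in this section:

```bash
# Load the Datadog Metrics sink connector from the properties file (filename is an assumption)
confluent local load datadog-metrics-sink --config datadog-metrics-sink.properties

# Confirm that the connector and its task report a RUNNING state
confluent local status datadog-metrics-sink
```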
#### NOTE Change the `confluent.topic.bootstrap.servers` property to include your broker address(es), and change the `confluent.topic.replication.factor` to `3` for staging or production use. Use curl to post a configuration to one of the Connect workers. Change `http://localhost:8083/` to the endpoint of one of your Connect worker(s). ```bash curl -sS -X POST -H 'Content-Type: application/json' --data @config.json http://localhost:8083/connectors ``` Use the following command to update the configuration of existing connector. ```bash curl -s -X PUT -H 'Content-Type: application/json' --data @config.json http://localhost:8083/connectors/FirebaseSourceConnector/config ``` Confirm that the connector is in a `RUNNING` state by running the following command: ```bash curl http://localhost:8083/connectors/FirebaseSourceConnector/status ``` The output should resemble: ```bash { "name":"FirebaseSourceConnector", "connector":{ "state":"RUNNING", "worker_id":"127.0.1.1:8083" }, "tasks":[ { "id":0, "state":"RUNNING", "worker_id":"127.0.1.1:8083" } ], "type":"source" } ``` To publish records into Firebase, follow the [Firebase documentation](https://firebase.google.com/docs/database/admin/save-data). The data produced to firebase should adhere to the following [data format](#firebase-source-data-format). You can also use the JSON example mentioned in the [data format section](#firebase-source-data-format), save it into a `data.json` file and finally import it into a Firebase database reference using the import feature in the Firebase console. To consume records written by the connector to the Kafka topic, run the following command: ```bash kafka-avro-console-consumer --bootstrap-server localhost:9092 --property schema.registry.url=http://localhost:8081 --topic artists --from-beginning ``` ```bash kafka-avro-console-consumer --bootstrap-server localhost:9092 --property schema.registry.url=http://localhost:8081 --topic songs --from-beginning ``` ### Properties-based example Create a file called github-source-quickstart.properties file with following properties: ```bash name=MyGithubConnector confluent.topic.bootstrap.servers=localhost:9092 confluent.topic.replication.factor=1 tasks.max=1 connector.class=io.confluent.connect.github.GithubSourceConnector github.service.url=https://api.github.com github.access.token= github.repositories=apache/kafka github.resources=stargazers github.since=2019-01-01 topic.name.pattern=github-${resourceName} key.converter=io.confluent.connect.avro.AvroConverter key.converter.schema.registry.url=http://localhost:8081 value.converter=io.confluent.connect.avro.AvroConverter value.converter.schema.registry.url=http://localhost:8081 ``` Next, load the Source connector. 
```bash .confluent local load MyGithubConnector --config github-source-quickstart.properties ``` Your output should resemble the following: ```bash { "name": "MyGithubConnector", "config": { "connector.class": "io.confluent.connect.github.GithubSourceConnector", "tasks.max": "1", "confluent.topic.bootstrap.servers":"localhost:9092", "confluent.topic.replication.factor":"1", "github.service.url":"https://api.github.com", "github.repositories":"apache/kafka", "github.resources":"stargazers", "github.since":"2019-01-01", "github.access.token":"", "topic.name.pattern":"github-${resourceName}", "key.converter":"io.confluent.connect.avro.AvroConverter", "key.converter.schema.registry.url":"http://localhost:8081", "value.converter":"io.confluent.connect.avro.AvroConverter", "value.converter.schema.registry.url":"http://localhost:8081" }, "tasks": [], "type": null } ``` Enter the following command to confirm that the connector is in a `RUNNING` state: ```bash confluent local status MyGithubConnector ``` The output should resemble: ```bash { "name":"MyGithubConnector", "connector": { "state":"RUNNING", "worker_id":"127.0.1.1:8083" }, "tasks": [ { "id":0, "state":"RUNNING", "worker_id":"127.0.1.1:8083" } ], "type":"source" } ``` #### NOTE Change the `confluent.topic.bootstrap.servers` property to include your broker address(es), and change the `confluent.topic.replication.factor` to 3 for staging or production use. Use curl to post a configuration to one of the Kafka Connect Workers. Change `http://localhost:8083/` to the endpoint of one of your Kafka Connect worker(s). ```bash curl -s -X POST -H 'Content-Type: application/json' --data @config.json http://localhost:8083/connectors ``` Use the following command to update the configuration of existing connector. ```bash curl -s -X PUT -H 'Content-Type: application/json' --data @config.json http://localhost:8083/connectors/HDFS2SourceConnector/config ``` To consume records written by the connector to the configured Kafka topic, run the following command: ```bash kafka-avro-console-consumer --bootstrap-server localhost:9092 --property schema.registry.url=http://localhost:8081 --topic copy_of_test_hdfs --from-beginning ``` #### NOTE Change the `confluent.topic.bootstrap.servers` property to include your broker address(es), and change the `confluent.topic.replication.factor` to 3 for staging or production use. Use curl to post a configuration to one of the Kafka Connect Workers. Change `http://localhost:8083/` to the endpoint of one of your Kafka Connect worker(s). ```bash curl -s -X POST -H 'Content-Type: application/json' --data @config.json http://localhost:8083/connectors ``` Use the following command to update the configuration of existing connector. ```bash curl -s -X PUT -H 'Content-Type: application/json' --data @config.json http://localhost:8083/connectors/HDFS3SourceConnector/config ``` To consume records written by the connector to the configured Kafka topic, run the following command: ```bash kafka-avro-console-consumer --bootstrap-server localhost:9092 --property schema.registry.url=http://localhost:8081 --topic copy_of_test_hdfs --from-beginning ``` ## Quick start This quick start uses the HTTP Sink connector to consume records and send HTTP requests to a demo HTTP service running locally that is running without any authentication. 1. Before starting the connector, clone and run the [kafka-connect-http-demo](https://github.com/confluentinc/kafka-connect-http-demo) app on your machine. 
```bash git clone https://github.com/confluentinc/kafka-connect-http-demo.git cd kafka-connect-http-demo mvn spring-boot:run -Dspring.profiles.active=simple-auth ``` 2. Install the connector through the [Confluent Hub Client](https://docs.confluent.io/current/connect/managing/confluent-hub/client.html) ```bash confluent local start ``` 3. Produce test data to the `http-messages` topic in Kafka using the Confluent CLI [confluent local services kafka produce](https://docs.confluent.io/confluent-cli/current/command-reference/local/services/kafka/confluent_local_services_kafka_produce.html) command. ```bash seq 10 | confluent local produce http-messages ``` 4. Create a `http-sink.json` file with the following contents: ```json { "name": "HttpSink", "config": { "topics": "http-messages", "tasks.max": "1", "connector.class": "io.confluent.connect.http.HttpSinkConnector", "http.api.url": "http://localhost:8080/api/messages", "value.converter": "org.apache.kafka.connect.storage.StringConverter", "confluent.topic.bootstrap.servers": "localhost:9092", "confluent.topic.replication.factor": "1", "reporter.bootstrap.servers": "localhost:9092", "reporter.result.topic.name": "success-responses", "reporter.result.topic.replication.factor": "1", "reporter.error.topic.name":"error-responses", "reporter.error.topic.replication.factor":"1" } } ``` 5. Load the HTTP Sink connector. ```bash confluent local load HttpSink --config http-sink.json ``` 6. Verify the connector is in a `RUNNING` state. ```bash confluent local status HttpSink ``` 7. Verify the data was sent to the HTTP endpoint. ```bash curl localhost:8080/api/messages ``` Note that before running other examples, you should kill the demo app (`CTRL + C`) to avoid port conflicts. ### Property-based example Configure the `jira-source-quickstart.properties` file with following properties: ```bash name=MyJiraConnector confluent.topic.bootstrap.servers=localhost:9092 confluent.topic.replication.factor=1 tasks.max=1 connector.class=io.confluent.connect.jira.JiraSourceConnector jira.url= jira.since=2019-10-17 23:50 jira.username= jira.api.token= jira.tables=roles topic.name.pattern=jira-topic-${resourceName} key.converter=io.confluent.connect.avro.AvroConverter key.converter.schema.registry.url=http://localhost:8081 value.converter=io.confluent.connect.avro.AvroConverter value.converter.schema.registry.url=http://localhost:8081 ``` Next, load the Source connector. 
```bash
./bin/confluent local load MyJiraConnector --config ./etc/kafka-connect-jira/jira-source-quickstart.properties
```

Your output should resemble the following:

```bash
{
  "name": "MyJiraConnector",
  "config": {
    "confluent.topic.bootstrap.servers": "localhost:9092",
    "confluent.topic.replication.factor": "1",
    "tasks.max": "1",
    "connector.class": "io.confluent.connect.jira.JiraSourceConnector",
    "jira.url": "",
    "jira.since": "2019-10-17 23:50",
    "jira.username": "< Your-Jira-Username >",
    "jira.api.token": "< Your-Jira-Access-Token >",
    "jira.tables": "roles",
    "topic.name.pattern":"jira-topic-${resourceName}",
    "key.converter":"io.confluent.connect.avro.AvroConverter",
    "key.converter.schema.registry.url":"http://localhost:8081",
    "value.converter":"io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url":"http://localhost:8081",
    "name": "MyJiraConnector"
  },
  "tasks": [],
  "type": "source"
}
```

Enter the following command to confirm that the connector is in a `RUNNING` state:

```bash
confluent local status MyJiraConnector
```

The output should resemble the example below:

```bash
{
  "name":"MyJiraConnector",
  "connector":{
    "state":"RUNNING",
    "worker_id":"127.0.1.1:8083"
  },
  "tasks":[
    {
      "id":0,
      "state":"RUNNING",
      "worker_id":"127.0.1.1:8083"
    }
  ],
  "type":"source"
}
```

## Distributed

This configuration is typically used with [distributed mode](/platform/current/connect/concepts.html#distributed-workers). Write the following JSON to `connector.json`, configure all of the required values, and use the command below to post the configuration to one of the distributed Connect workers.

```bash
{
  "name": "connector1",
  "config": {
    "connector.class": "io.confluent.connect.jms.JmsSourceConnector",
    "kafka.topic":"MyKafkaTopicName",
    "jms.destination.name":"MyQueueName",
    "java.naming.factory.initial":"",
    "java.naming.provider.url":"",
    "confluent.license":"",
    "confluent.topic.bootstrap.servers":"localhost:9092"
  }
}
```

Change the `confluent.topic.*` properties as required to suit your environment. If you are running on a single-node Kafka cluster, you must include `confluent.topic.replication.factor=1`. Leave the `confluent.license` property blank for a 30-day trial. See the [configuration options](source_connector_config.md#jms-source-connector-license-config) for more details.

For example, the following specifies looking up the IBM MQ connection information in LDAP (check the documentation for your JMS broker for more details):

```bash
{
  "name": "connector1",
  "config": {
    "connector.class": "io.confluent.connect.jms.JmsSourceConnector",
    "kafka.topic":"MyKafkaTopicName",
    "jms.destination.name":"MyQueueName",
    "jms.destination.type":"queue",
    "java.naming.factory.initial":"com.sun.jndi.ldap.LdapCtxFactory",
    "java.naming.provider.url":"ldap://",
    "java.naming.security.principal":"MyUserName",
    "java.naming.security.credentials":"MyPassword",
    "confluent.license":"",
    "confluent.topic.bootstrap.servers":"localhost:9092"
  }
}
```

Change the `confluent.topic.*` properties as required to suit your environment. If you are running on a single-node Kafka cluster, you must include `"confluent.topic.replication.factor":"1"`. Leave the `confluent.license` property blank for a 30-day trial. See the [configuration options](source_connector_config.md#jms-source-connector-license-config) for more details.

Use curl to post the configuration to one of the Kafka Connect workers. Change `http://localhost:8083/` to the endpoint of one of your Kafka Connect worker(s).
```bash curl -s -X POST -H 'Content-Type: application/json' --data @connector.json http://localhost:8083/connectors ``` #### NOTE For details about using this connector with Kafka Connect Reporter, see [Connect Reporter](/kafka-connectors/self-managed/userguide.html#userguide-connect-reporter). Run the connector with this configuration. ```bash confluent local load pagerduty-sink-connector --config pagerduty-sink.properties ``` The output should resemble: ```json { "name":"pagerduty-sink-connector", "config":{ "topics":"incidents", "tasks.max":"1", "connector.class":"io.confluent.connect.pagerduty.PagerDutySinkConnector", "pagerduty.api.key":"****", "behavior.on.error":"fail", "key.converter": "org.apache.kafka.connect.storage.StringConverter", "value.converter":"io.confluent.connect.avro.AvroConverter", "value.converter.schema.registry.url":"http://localhost:8081", "confluent.topic.bootstrap.servers":"localhost:9092", "confluent.topic.replication.factor":"1", "reporter.bootstrap.servers": "localhost:9092", "reporter.result.topic.replication.factor":"1", "reporter.error.topic.replication.factor":"1" "name":"pagerduty-sink-connector" }, "tasks":[ { "connector":"pagerduty-sink-connector", "task":0 } ], "type":"sink" } ``` #### NOTE For details about using this connector with Kafka Connect Reporter, see [Connect Reporter](/kafka-connectors/self-managed/userguide.html#userguide-connect-reporter). 1. Write the following JSON to `config.json` and configure all of the required values. ```json { "name" : "prometheus-connector", "config" : { "topics":"test-topic", "connector.class" : "io.confluent.connect.prometheus.PrometheusMetricsSinkConnector", "tasks.max" : "1", "confluent.topic.bootstrap.servers":"localhost:9092", "prometheus.listener.url": "http://localhost:8889/metrics", "key.converter": "io.confluent.connect.avro.AvroConverter", "key.converter.schema.registry.url": "http://localhost:8081", "value.converter": "io.confluent.connect.avro.AvroConverter", "value.converter.schema.registry.url": "http://localhost:8081", "reporter.result.topic.replication.factor": "1", "reporter.error.topic.replication.factor": "1", "behavior.on.error": "log" } } ``` #### NOTE Change the `confluent.topic.bootstrap.servers` property to include your broker address(es) and change the `confluent.topic.replication.factor` to `3` for production use. 2. Enter the following curl command to post the configuration to one of the Kafka Connect workers. Change `http://localhost:8083/` to the endpoint of one of your Kafka Connect worker(s). ```bash curl -sS -X POST -H 'Content-Type: application/json' --data @config.json http://localhost:8083/connectors ``` 3. Enter the following curl command to update the configuration of the connector: ```bash curl -s -X PUT -H 'Content-Type: application/json' --data @config.json http://localhost:8083/connectors/prometheus-connector/config ``` 4. Enter the following curl command to confirm that the connector is in a `RUNNING` state: ```bash curl http://localhost:8083/connectors/prometheus-connector/status | jq ``` The output should resemble: ```bash { "name": "prometheus-connector", "connector": { "state": "RUNNING", "worker_id": "127.0.1.1:8083" }, "tasks": [ { "id": 0, "state": "RUNNING", "worker_id": "127.0.1.1:8083", } ], "type": "sink" } ``` Search for the endpoint `/connectors/prometheus-connector/status`, the state of the connector and tasks should have status as `RUNNING`. 5. 
Use the following command to produce Avro data to the Kafka topic `test-topic`:

```bash
./bin/kafka-avro-console-producer \
--broker-list localhost:9092 --topic test-topic \
--property value.schema='{"name": "metric","type": "record","fields": [{"name": "name","type": "string"},{"name": "type","type": "string"},{"name": "timestamp","type": "long"},{"name": "values","type": {"name": "values","type": "record","fields": [{"name":"doubleValue", "type": "double"}]}}]}'
```

While the console is waiting for input, paste each of the following three records into the console.

```bash
{"name":"kafka_gaugeMetric1", "type":"gauge","timestamp": 1576236481,"values": {"doubleValue": 5.639623848362502}}
{"name":"kafka_gaugeMetric2", "type":"gauge","timestamp": 1576236481,"values": {"doubleValue": 5.639623848362502}}
{"name":"kafka_gaugeMetric3", "type":"gauge","timestamp": 1576236481,"values": {"doubleValue": 5.639623848362502}}
```

6. Check the Prometheus portal on `localhost:9090` and verify that metrics were created.

### REST-based example

This configuration is typically used with [distributed workers](/platform/current/connect/concepts.html#distributed-workers). Write the following JSON to `connector.json`, configure all of the required values, and use the command below to post the configuration to one of the distributed Connect worker(s). See the Kafka Connect [REST API](/platform/current/connect/references/restapi.html) for more information.

**Connect Distributed REST example with Platform Event:**

```json
{
  "name" : "SFDCPlatformEvents1",
  "config" : {
    "connector.class" : "io.confluent.salesforce.SalesforcePlatformEventSourceConnector",
    "tasks.max" : "1",
    "kafka.topic" : "< Required Configuration >",
    "salesforce.consumer.key" : "< Required Configuration >",
    "salesforce.consumer.secret" : "< Required Configuration >",
    "salesforce.password" : "< Required Configuration >",
    "salesforce.password.token" : "< Required Configuration >",
    "salesforce.platform.event.name" : "< Required Configuration >",
    "salesforce.username" : "< Required Configuration >",
    "salesforce.initial.start" : "all",
    "confluent.topic.bootstrap.servers": "localhost:9092",
    "confluent.topic.replication.factor": "1",
    "confluent.license": " Omit to enable trial mode "
  }
}
```

### REST-based example

This configuration is typically used with [distributed workers](/platform/current/connect/concepts.html#distributed-workers).

1.
Write the following JSON to `connector.json` and configure all the required values: **Connect Distributed REST example with Platform Event** ```json { "name" : "SFDCPlatformEventsSink1", "config" : { "connector.class": "io.confluent.salesforce.SalesforcePlatformEventSinkConnector", "tasks.max" : "1", "topics" : "< Required Configuration >", "salesforce.consumer.key" : "< Required Configuration >", "salesforce.consumer.secret" : "< Required Configuration >", "salesforce.password" : "< Required Configuration >", "salesforce.password.token" : "< Required Configuration >", "salesforce.platform.event.name" : "< Required Configuration >", "salesforce.username" : "< Required Configuration >", "confluent.topic.bootstrap.servers": "localhost:9092", "confluent.topic.replication.factor": "1", "confluent.license": " Omit to enable trial mode " } } ``` #### NOTE - Change the `confluent.topic.bootstrap.servers` property to include your broker address(es), and change the `confluent.topic.replication.factor` to 3 for staging or production use. - For details about using this connector with Kafka Connect Reporter, see [Connect Reporter](/kafka-connectors/self-managed/userguide.html#userguide-connect-reporter). 2. Use curl to post a configuration to one of the Kafka Connect Workers. Change `http://localhost:8083/` to the endpoint of one of your Kafka Connect worker(s). For more information, see Kafka Connect [REST API](/platform/current/connect/references/restapi.html). **Create a new connector:** ```bash curl -s -X POST -H 'Content-Type: application/json' --data @connector.json http://localhost:8083/connectors ``` **Update an existing connector:** ```bash curl -s -X PUT -H 'Content-Type: application/json' --data @connector.json http://localhost:8083/connectors/SFDCPlatformEventsSink1/config ``` ### REST-based example This configuration is used typically along with [distributed workers](/platform/current/connect/concepts.html#distributed-workers). 1. Create a file named `connector.json` using the following JSON configuration example: **Connect Distributed REST example with Push Topic**: ```json { "name" : "SalesforcePushTopicSourceConnector1", "config" : { "connector.class" : "io.confluent.salesforce.SalesforcePushTopicSourceConnector", "tasks.max" : "1", "kafka.topic" : "< Required Configuration >", "salesforce.consumer.key" : "< Required Configuration >", "salesforce.consumer.secret" : "< Required Configuration >", "salesforce.object" : "< Required Configuration >", "salesforce.password" : "< Required Configuration >", "salesforce.password.token" : "< Required Configuration >", "salesforce.push.topic.name" : "< Required Configuration >", "salesforce.username" : "< Required Configuration >", "salesforce.initial.start" : "all", "confluent.topic.bootstrap.servers": "localhost:9092", "confluent.topic.replication.factor": "1", "confluent.license": " Omit to enable trial mode " } } ``` To include your broker address(es), change the `confluent.topic.bootstrap.servers` property. You can change the `confluent.topic.replication.factor` to 3 for staging or production use. 2. Use `curl` to post a configuration to one of the Connect workers. Change `http://localhost:8083/` to the endpoint of one of your Connect worker(s). For more information, see Connect [REST API](/platform/current/connect/references/restapi.html) . 
**Create a new connector:** ```bash curl -s -X POST -H 'Content-Type: application/json' --data @connectorPushTopic.json http://localhost:8083/connectors ``` **Update an existing connector:** ```bash curl -s -X PUT -H 'Content-Type: application/json' --data @connector.json http://localhost:8083/connectors/SalesforcePushTopicSourceConnector1/config ``` #### NOTE You can add the following Single Message Transform (SMT) to the connector configuration to process records generated by the Salesforce Bulk API Source connector. ```text "transforms" : "InsertField", "transforms.InsertField.type" : "org.apache.kafka.connect.transforms.InsertField$Value", "transforms.InsertField.static.field" : "_EventType", "transforms.InsertField.static.value" : "created" ``` 1. Create a configuration file named `salesforce-bulk-api-leads-sink-config.json` with the following contents. Ensure you enter a real username, password, security token, consumer key, and consumer secret. For details about configuration properties, see [Configuration Properties](configuration_options.md#salesforce-bulk-api-sink-connector-config). ```text { "name" : "SalesforceBulkApiSinkConnector", "config" : { "connector.class" : "io.confluent.connect.salesforce.SalesforceBulkApiSinkConnector", "tasks.max" : "1", "topics" : "sfdc-pushtopic-lead", "salesforce.object" : "Lead", "salesforce.password" : "< Required Configuration >", "salesforce.password.token" : "< Required Configuration >", "salesforce.username" : "< Required Configuration: secondary organization username >", "reporter.result.topic.replication.factor" : "1", "reporter.error.topic.replication.factor" : "1", "reporter.bootstrap.servers" : "localhost:9092", "confluent.topic.bootstrap.servers": "localhost:9092", "confluent.topic.replication.factor": "1", "confluent.license": " Omit to enable trial mode " } } ``` For details about using this connector with Kafka Connect Reporter, see [Connect Reporter](/kafka-connectors/self-managed/userguide.html#userguide-connect-reporter). 2. Enter the Confluent CLI command to start the Salesforce Sink connector. ```bash confluent local load SalesforceBulkApiSinkConnector -- -d salesforce-bulk-api-leads-sink-config.json ``` Your output should resemble: ```none { "name": "SalesforceBulkApiSinkConnector", "config": { "connector.class" : "io.confluent.connect.salesforce.SalesforceBulkApiSinkConnector", "tasks.max" : "1", "topics" : "sfdc-pushtopic-leads", "salesforce.object" : "Lead", "salesforce.username" : "" "salesforce.password" : "", "salesforce.password.token" : "", "reporter.result.topic.replication.factor" : "1", "reporter.error.topic.replication.factor" : "1", "reporter.bootstrap.servers" : "localhost:9092", "confluent.topic.bootstrap.servers": "localhost:9092", "confluent.topic.replication.factor": "1", "confluent.license": " Omit to enable trial mode " }, "tasks": [ ... ], "type": null } ``` ### REST-based example This configuration typically is used with [distributed workers](/platform/current/connect/concepts.html#distributed-workers). Write the following JSON to `connector.json`, configure all of the required values, and use the command below to post the configuration to one the distributed connect worker(s). For more information, see Kafka Connect [REST API](/platform/current/connect/references/restapi.html) . 
```text
{
  "name" : "SalesforceSObjectSinkConnector1",
  "config" : {
    "connector.class" : "io.confluent.salesforce.SalesforceSObjectSinkConnector",
    "tasks.max" : "1",
    "topics" : "< Required Configuration >",
    "salesforce.consumer.key" : "< Required Configuration >",
    "salesforce.consumer.secret" : "< Required Configuration >",
    "salesforce.object" : "< Required Configuration >",
    "salesforce.password" : "< Required Configuration >",
    "salesforce.password.token" : "< Required Configuration >",
    "salesforce.username" : "< Required Configuration >",
    "confluent.topic.bootstrap.servers": "localhost:9092",
    "confluent.topic.replication.factor": "1",
    "salesforce.sink.object.operation": "delete",
    "override.event.type": "true",
    "confluent.license": " Omit to enable trial mode "
  }
}
```

To include your broker address(es), change the `confluent.topic.bootstrap.servers` property. For staging or production use, change the `confluent.topic.replication.factor` to 3.

For details about using this connector with Kafka Connect Reporter, see [Connect Reporter](/kafka-connectors/self-managed/userguide.html#userguide-connect-reporter).

Use curl to post a configuration to one of the Kafka Connect workers. Change `http://localhost:8083/` to the endpoint of one of your Kafka Connect worker(s).

```none
curl -s -X POST -H 'Content-Type: application/json' --data @connector.json http://localhost:8083/connectors
```

```none
curl -s -X PUT -H 'Content-Type: application/json' --data @connector.json http://localhost:8083/connectors/SalesforceSObjectSinkConnector1/config
```

### REST-based example

Use this setting with [distributed workers](/platform/current/connect/concepts.html#distributed-workers). Write the following JSON to `snmp-trap-source-config.json`, configure all of the required values, and use the following command to post the configuration to one of the distributed Connect workers. For more information, see the [Kafka Connect REST Interface](/platform/current/connect/references/restapi.html).

```json
{
  "name": "SnmpTrapSourceConnector",
  "config": {
    "name":"SnmpTrapSourceConnector",
    "connector.class":"io.confluent.connect.snmp.SnmpTrapSourceConnector",
    "tasks.max":"1",
    "kafka.topic":"snmp-kafka-topic",
    "snmp.v3.enabled":"true",
    "snmp.batch.size":"50",
    "snmp.listen.address":"",
    "snmp.listen.port":"",
    "auth.password":"",
    "privacy.password":"",
    "security.name":"",
    "confluent.topic.bootstrap.servers":"localhost:9092",
    "confluent.topic.replication.factor":"1"
  }
}
```

Use `curl` to post the configuration to one of the Kafka Connect workers. Change `http://localhost:8083/` to the endpoint of one of your Kafka Connect worker(s).

```bash
curl -sS -X POST -H 'Content-Type: application/json' --data @snmp-trap-source-config.json http://localhost:8083/connectors
```

Use the following command to update the configuration of an existing connector:

```bash
curl -s -X PUT -H 'Content-Type: application/json' --data @snmp-trap-source-config.json http://localhost:8083/connectors/SnmpTrapSourceConnector/config
```

Check that the connector started successfully. Review the Connect worker’s log by entering the following:

```bash
confluent local services connect log
```

The SNMP device should be running and generating PDUs. The connector listens for PDUs of type trap and pushes them to the Kafka topic.

## JSON Schemaless Source Connector Example

This example follows the same steps as the Quick Start. Review the Quick Start for help running the Confluent Platform and installing the Spool Dir connectors.

1.
Generate a JSON dataset using the command below: ```bash curl "https://api.mockaroo.com/api/17c84440?count=500&key=25fd9c80" > "json-spooldir-source.json" ``` 2. Create a `spooldir.properties` file with the following contents: ```properties name=SchemaLessJsonSpoolDir tasks.max=1 connector.class=com.github.jcustenborder.kafka.connect.spooldir.SpoolDirSchemaLessJsonSourceConnector input.path=/path/to/data input.file.pattern=json-spooldir-source.json error.path=/path/to/error finished.path=/path/to/finished halt.on.error=false topic=spooldir-schemaless-json-topic value.converter=org.apache.kafka.connect.storage.StringConverter ``` 3. Load the SpoolDir Schemaless JSON Source connector. ```bash confluent local load spooldir --config spooldir.properties ``` #### IMPORTANT Don’t use the [Confluent CLI](https://docs.confluent.io/confluent-cli/current/index.html) in production environments. #### Client-side OAuth client assertion for Kafka and KRaft For each of the Confluent component (currently in Confluent Platform 8.0, Schema Registry only) to authenticate with Kafka or KRaft using OAuth client assertion, configure the client-side OAuth client assertion in the component CR as below. For KRaft, the `authentication` object is under `dependencies.kRaftController.controllerListener.authentication`. To set up client assertion, first, you must complete the [client-side OAuth configuration](#co-authenticate-kafka-client-oauth). For client assertion, configure the additional properties on top of the existing OAuth configurations: ```yaml kind: spec: dependencies: : authentication: type: oauth --- [1] oauthSettings: clientAssertion: --- [2] ``` * [1] Required. * [2] See [the client assertion properties](#co-authenticate-client-assertion-settings) for a list of properties you can use. The following is a sample snippet of Schema Registry to authenticate with Kafka using local client assertion: ```yaml kind: SchemaRegistry spec: dependencies: kafka: bootstrapEndpoint: kafka.operator.svc.cluster.local:9071 authentication: type: oauth oauthSettings: tokenEndpointUri: http://keycloak:8080/realms/sso_test/protocol/openid-connect/token clientAssertion: clientId: private-key-client ## Configure for Kubernetes Horizontal Pod Autoscaler In Kubernetes, the [Horizontal Pod Autoscaler (HPA)](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/) feature automatically scales the number of pod replicas. Starting in Confluent for Kubernetes (CFK) 2.1.0, you can configure Confluent Platform components to use HPA based on CPU and memory utilization of Confluent Platform pods. HPA is not supported for ZooKeeper and Control Center. To use HPA with a Confluent Platform component, create an HPA resource for the component custom resource (CR) out of band to integrate with CFK. The following example is to create an HPA resource for Connect based on CPU utilization and memory usage: ```yaml apiVersion: autoscaling/v2beta2 kind: HorizontalPodAutoscaler metadata: name: connect-cluster-hpa namespace: confluent spec: scaleTargetRef: --- [1] apiVersion: platform.confluent.io/v1beta1 --- [2] kind: Connect --- [3] name: connect --- [4] minReplicas: 2 --- [5] maxReplicas: 4 --- [6] metrics: - type: Resource resource: name: cpu target: type: Utilization averageUtilization: 50 --- [7] - type: Resource resource: name: memory targetAverageValue: 1000Mi --- [8] ``` * [1] Required. Specify the Confluent component-specific information in this section. * [2] Required. CFK API version. * [3] Required. 
The CR kind of the object to scale. * [4] Required. The CR name of the object to scale. * [5] The minimum number of replicas when scaling down. If your Kafka default replication factor is N, the `minReplicas` on your HPA for your Kafka cluster must be >= N. If you want Schema Registry, Connect, ksqlDB to be HA, set `minReplicas` >= 2 * [6] The maximum number of replicas when scaling up. * [7] The target average CPU utilization of 50%. * [8] The target average memory usage value of 1000 Mi. Take the following into further consideration when setting up HPA for Confluent Platform: * If you have `oneReplicaPerNode` set to `true` for Kafka (which is the default), your upper bound for Kafka brokers is the number of available Kubernetes worker nodes you have. * If you have affinity or taint/toleration rules set for Kafka, that further constrains the available nodes. * If your underlying Kubernetes cluster doesn’t itself support autoscaling of the Kubernetes worker nodes, make sure there is enough Kubernetes worker nodes to allow HPA is successful. You can check the current status of HPA by running: ```bash kubectl get hpa ``` # Configure Replicator for Confluent Platform Using Confluent for Kubernetes Confluent Replicator allows you to replicate topics from one Apache Kafka® cluster to another. In addition to copying the messages, Replicator will create topics as needed, preserving the topic configuration in the source cluster. This includes preserving the number of partitions, the replication factor, and any configuration overrides specified for individual topics. Confluent Replicator is built as a connector. So, when you deploy Replicator in Confluent for Kubernetes, you use the Connect CRD to define a custom resource (CR) for Replicator and specify to use the `cp-enterprise-replicator` Docker image that contains the Replicator JARs. For example: ```yaml apiVersion: platform.confluent.io/v1beta1 kind: Connect metadata: name: replicator namespace: destination spec: replicas: 2 image: application: confluentinc/cp-enterprise-replicator:8.1.0 init: confluentinc/confluent-init-container:3.1.0 ``` This is a change from Confluent Operator 1.x, where Replicator had a Helm sub-Chart and a section in the `values.yaml` for configuration. See the [comprehensive example for configuring and deploying Confluent Replicator](https://github.com/confluentinc/confluent-kubernetes-examples/tree/master/hybrid/replicator) for the detailed steps and an example CR. ## Configure and deploy Unified Stream Manager 1. Configure **Unified Stream Manager Agent** using the USMAgent custom resource (CR), and then apply the CR using the `kubectl apply` command. ```yaml kind: USMAgent spec: replicas: image: application: --- [1] init: --- [2] authentication: type: --- [3] basic: secretRef: --- [4] tls: secretRef: --- [5] confluentCloudClient: endpoint: --- [6] environmentId: --- [7] authentication: type: --- [8] basic: secretRef: --- [9] externalAccess: --- [10] type: --- [11] loadBalancer: --- [12] nodePort: --- [13] ``` * [1] Set to the Unified Stream Manager application image. * [2] Set to the Unified Stream Manager CFK init container image. * [3] Set to `basic` or `mtls`. * [4] Required for basic authentication. Specify the secret containing the basic authentication credentials. * [5] For TLS between Unified Stream Manager Agent and Confluent Platform components, specify the secret containing the TLS certificate, the key, and the root certificate authority (CA) files. * [6] Specify the Confluent Cloud endpoint. 
The Confluent Cloud endpoint is available in the output file generated when you perform the first step in the registration process. See [Generate the configuration file](https://docs.confluent.io/cloud/current/usm/register/deploy-agent.html#generate-and-download-the-configuration-file). This step has to be completed before you deploy Unified Stream Manager Agent.
* [7] Specify the Confluent Cloud Environment ID. The Environment ID is available in the output file generated when you perform the first step in the registration process. See [Generate the configuration file](https://docs.confluent.io/cloud/current/usm/register/deploy-agent.html#generate-and-download-the-configuration-file). This step has to be completed before you deploy Unified Stream Manager Agent.
* [8] Set to `basic` for basic authentication.
* [9] Required for basic authentication. Specify the secret containing the Cloud API key and secret. The values are available in the output file generated when you perform the first step in the registration process. See [Generate the configuration file](https://docs.confluent.io/cloud/current/usm/register/deploy-agent.html#generate-and-download-the-configuration-file). This step has to be completed before you deploy Unified Stream Manager Agent.
* [10] Optional. External access is optional for Unified Stream Manager Agent.
* [11] Set to `loadBalancer` or `nodePort` to specify the external access type.
* [12] Required when the externalAccess type ([11]) is set to `loadBalancer`. For configuring load balancers, see [Configure Load Balancers for Confluent Platform in Confluent for Kubernetes](co-loadbalancers.md#co-loadbalancers).
* [13] Required when the externalAccess type ([11]) is set to `nodePort`. For configuring node ports, see [Configure Node Ports to Access Confluent Platform Components Using Confluent for Kubernetes](co-nodeports.md#co-nodeports).

2. Configure the **client-side properties** in Kafka, KRaft, and Connect for communication with Unified Stream Manager Agent, and then apply the changes to the CRs with the `kubectl apply` command.

   ```yaml
   spec:
     dependencies:
       usmAgentClient:
         url:                --- [1]
         authentication:
           type:             --- [2]
           basic:
             secretRef:      --- [3]
             dpic:           --- [4]
         tls:
           enabled:          --- [5]
           secretRef:        --- [6]
           dpic:             --- [7]
   ```

   * [1] Specify the Unified Stream Manager Agent URL.
   * [2] Set to `basic` or `mtls` to specify the authentication type. See [Basic authentication credentials](co-authenticate-cp.md#co-basic-server-creds) for the required format.
   * [3] Specify the secret containing the basic authentication credentials.
   * [4] Specify the basic authentication credential secret path in the container. For details, see [Provide secrets in HashiCorp Vault](co-credentials.md#co-directory-path-in-container).
   * [5] Set to `true` or `false` to enable or disable TLS.
   * [6] Specify the secret containing the TLS certificate.
   * [7] Specify the TLS certificate secret path in the container. For details, see [Provide secrets in HashiCorp Vault](co-credentials.md#co-directory-path-in-container).

3. [Register your Confluent Platform Connect cluster in Confluent Cloud](http://docs.confluent.io/cloud/current/usm/register-connect.html). You can use the following options to retrieve the Connect cluster ID (also known as the group ID) that is required to register the Connect cluster in Confluent Cloud:

   * Use the `kubectl describe connect` command, and fetch the Group ID under the `Status` section.
* If you have the Confluent CLI installed, you can use the command `confluent connect cluster list` as described in the above registration topic. ### Configure the source-initiated cluster link on the source cluster For a source-initiated cluster link, configure the cluster information in the Source mode ClusterLink CR: ```yaml spec: sourceInitiatedLink: linkMode: Source --- [1] destinationKafkaCluster: bootstrapEndpoint: --- [2] clusterID: --- [3] kafkaRestClassRef: --- [4] name: --- [5] namespace: --- [6] sourceKafkaCluster: kafkaRestClassRef: --- [7] name: --- [8] namespace: --- [9] configs: --- [10] local.security.protocol: --- [11] local.listener.name: --- [12] ``` * [1] Required. * [2] Required. The bootstrap endpoint where the destination Kafka is running. * [3] The cluster ID of the destination Kafka cluster. If both `clusterID` and Kafka REST class name ([5]) are specified, this `clusterID` value takes precedence over the Kafka REST class name ([5]). You can get the cluster ID using the `curl` or `kafka-cluster` command with the proper flags. For example: ```bash curl https://:8090/kafka/v3/clusters/ -kv ``` ```bash kafka-cluster cluster-id --bootstrap-server kafka.operator.svc.cluster.local:9092 \ --config /tmp/kafka.properties ``` * [4] Optional. The reference to the KafkaRestClass application custom resource (CR) which defines the Kafka REST Class connection information. * [5] Required under [4]. The name of the [KafkaRestClass CR](co-manage-rest-api.md#co-manage-rest-api) on the destination cluster. * [6] Optional. The namespace of the KafkaRestClass CR. If omitted, the same namespace as this CR is assumed. * [7] Required. The reference to the KafkaRestClass application custom resource (CR) which defines the Kafka REST Class connection information. * [8] Required. The name of the [KafkaRestClass CR](co-manage-rest-api.md#co-manage-rest-api) on the source Kafka cluster. * [9] Optional. The namespace of the KafkaRestClass CR. If omitted, the same namespace as this CR is assumed. * [10] Use to specify additional configurations for the cluster link. * [11] SSL is required when using mTLS or SASL authentication in an RBAC-enabled cluster. : In all other cases, it is optional. Set to `SSL` for mTLS and `SASL_SSL` for SASL authentication. * [12] An SSL listener name is required when using mTLS authentication in an RBAC-enabled cluster. In all other cases it is optional. #### NOTE When RBAC is enabled in this Confluent Platform environment, the super user you configured for Kafka (`kafka.spec.authorization.superUsers`) does not have access to resources in the Schema Registry cluster. If you want the super user to be able to create schema exporters, grant the super user the permission on the Schema Registry cluster. In the source Schema Registry clusters, create a schema exporter CR and apply the configuration with the `kubectl apply -f ` command: ```yaml apiVersion: platform.confluent.io/v1beta1 kind: SchemaExporter metadata: name: --- [1] namespace: --- [2] spec: sourceCluster: --- [3] destinationCluster: --- [4] subjects: --- [5] subjectRenameFormat: --- [6] contextType: --- [7] contextName: --- [8] configs: --- [9] ``` * [1] Required. The name of the schema exporter. The name must be unique in a source Schema Registry cluster. * [2] The namespace for the schema exporter. * [3] The source Schema Registry cluster. You can either specify the cluster name or the endpoint. If not given, CFK will auto discover the source Schema Registry in the namespace of this schema exporter. 
The discovery process errors out if more than one Schema Registry cluster is discovered in the namespace. See [Specify the source and destination Schema Registry clusters](#co-schema-exporter-discover) for configuration details.
* [4] The destination Schema Registry cluster where the schemas will be exported. If not defined, the source cluster is used as the destination, and the schema exporter exports schemas across contexts within the source cluster. See [Specify the source and destination Schema Registry clusters](#co-schema-exporter-discover) for configuration details.
* [5] The subjects to export to the destination. The default value is `["*"]`, which denotes all subjects in the default context.
* [6] The rename format that defines how to rename the subject at the destination. For example, if the value is `my-${subject}`, subjects at the destination become `my-XXX`, where `XXX` is the original subject.
* [7] Specify how to create the context for the exported subjects at the destination. The default value is `AUTO`, in which case the exporter uses an auto-generated context in the destination cluster. The auto-generated context name is reported in the status. If set to `NONE`, the exporter copies the source schemas as-is.
* [8] The name of the schema context on the destination to export the subjects to. If this is defined, `spec.contextType` is ignored.
* [9] Additional configs not supported by the SchemaExporter CRD properties.

An example SchemaExporter CR:

```yaml
apiVersion: platform.confluent.io/v1beta1
kind: SchemaExporter
metadata:
  name: schema-exporter
  namespace: confluent
spec:
  sourceCluster:
    schemaRegistryClusterRef:
      name: sr
      namespace: operator
  destinationCluster:
    schemaRegistryRest:
      endpoint: https://schemaregistry.operator-dest.svc.cluster.local:8081
      authentication:
        type: basic
        secretRef: sr-basic
  subjects:
    - subject1
    - subject2
  contextName: link-source
```

#### Discover Schema Registry using Schema Registry endpoint

To specify how to connect to the Schema Registry endpoint, specify the connection information in the Schema CR.

**Schema Registry endpoint**

```yaml
spec:
  schemaRegistryRest:
    endpoint:          --- [1]
    authentication:
      type:            --- [2]
```

* [1] The endpoint where Schema Registry is running.
* [2] The authentication method to use for the Schema Registry cluster. Supported types are `basic`, `mtls`, `bearer`, and `oauth`. You can use `bearer` when RBAC is enabled for Schema Registry.

**Basic authentication to Schema Registry**

```yaml
spec:
  schemaRegistryRest:
    authentication:
      type: basic                    --- [1]
      basic:
        secretRef:                   --- [2]
        directoryPathInContainer:    --- [3]
```

* [1] Required for the basic authentication type.
* [2] or [3] is required.
* [2] The name of the secret that contains the credentials. See [Basic authentication](co-authenticate-cp.md#co-authenticate-cp-basic) for the required format.
* [3] The directory path in the container where the required credentials are injected by Vault. See [Basic authentication](co-authenticate-cp.md#co-authenticate-cp-basic) for the required format.

  See [Provide secrets for Confluent Platform application CR](co-credentials.md#co-vault-category-2) for providing the credentials and required annotations when using Vault.

**mTLS authentication to Schema Registry**

```yaml
spec:
  schemaRegistryRest:
    authentication:
      type: mtls                     --- [1]
      tls:
        secretRef:                   --- [2]
        directoryPathInContainer:    --- [3]
```

* [1] Required for the mTLS authentication type.
* [2] The name of the secret that contains the TLS certificates.
See [Provide TLS keys and certificates in PEM format](co-network-encryption.md#co-certs-pem) for the expected keys in the TLS secret. Only the PEM format is supported for Schema CRs. * [3] The directory path in the container where the expected keys and certificates are mounted. See [Provide TLS keys and certificates in PEM format](co-network-encryption.md#co-certs-pem) for the expected keys in the TLS secret. Only the PEM format is supported for Schema CRs. See [Provide secrets for Confluent Platform application CR](co-credentials.md#co-vault-category-2) for providing the keys and certificates using the Directory Path in Container feature. **Bearer authentication to Schema Registry (for RBAC)** When RBAC is enabled for Schema Registry, you can configure bearer authentication as below: ```yaml spec: schemaRegistryRest: authentication: type: bearer --- [1] bearer: secretRef: --- [2] directoryPathInContainer: --- [3] ``` * [1] Required for the bearer authentication type. * [2] or [3] is required. * [2] Required. The name of the secret that contains the bearer credentials. See [Bearer authentication](co-authenticate-kafka.md#co-authenticate-mds-bearer) for the required format. * [3] The directory path in the container where the required the bearer credentials are mounted. See [Bearer authentication](co-authenticate-kafka.md#co-authenticate-mds-bearer) for the required format. See [Provide secrets for Confluent Platform application CR](co-credentials.md#co-vault-category-2) for providing the credential using the Directory Path in Container feature. **OAuth authorization and authentication to Schema Registry** ```yaml schemaRegistryRest: authentication: type: oauth --- [1] oauth: secretRef: --- [2] directoryPathInContainer: --- [3] configuration: --- [4] ``` * [1] Required for OAuth. * [2] or [3] is required. * [2] The name of the secret that contains the bearer credentials. See [Bearer authentication](co-authenticate-kafka.md#co-authenticate-mds-bearer) for the required format. * [3] Set to the directory path in the container where required authentication credentials are injected by Vault. See [Bearer authentication](co-authenticate-kafka.md#co-authenticate-mds-bearer) for the required format. See [Provide secrets for Confluent Platform application CR](co-credentials.md#co-vault-category-2) for providing the credential and required annotations when using Vault. * [4] The client-side OAuth configuration. For details, see [Client-side OAuth/OIDC authentication for Confluent components](co-authenticate-cp.md#co-authenticate-cp-client-oauth). # Configure Network Encryption for Confluent Platform Using Confluent for Kubernetes This document describes how to configure network encryption with Confluent for Kubernetes (CFK). For security concepts in Confluent Platform, see [Security](/platform/current/security/index.html). To secure network communications of Confluent components, CFK supports Transport Layer Security (TLS), an industry-standard encryption protocol. TLS relies on keys and certificates to establish trusted connections. This section describes how to manage keys and certificates when you configure TLS encryption for Confluent Platform. CFK supports the following mechanisms to enable TLS encryption: [Auto-generated certificates](#co-configure-auto-certificates) : CFK auto-generates the server certificates, using the certificate authority (CA) that you provide. If all access and communication to Confluent services is within the Kubernetes network, auto-generated certificates are recommended. 
[User-provided certificates](#co-configure-user-provided-certificates) : You provide the private key, public key, and CA. If you need to enable access to Confluent services from an external-to-Kubernetes domain, user-provided certificates are recommended.

[Separate certificates for internal and external communications](#co-configure-separate-certificates) : You provide separate TLS certificates for the internal and external communications so that you do not mix external and internal domains in the certificate SAN. This feature is supported for ksqlDB, Schema Registry, MDS, and Kafka REST services, starting in the CFK 2.6.0 and Confluent Platform 7.4.0 releases.

[Dynamic Kafka certificate updates](#co-dynamic-certificates-update) : When you rotate certificates by providing new server certificates, CFK automatically updates the configurations to use those new certificates. By default, this update triggers a rolling restart of the affected Confluent Platform pod. To minimize disruption during rolling restarts of Kafka brokers, you can enable dynamic certificate loading for the Kafka and Kafka REST services. CFK then updates TLS private keys and certificates without rolling the Kafka cluster. This feature is only supported at the individual listener level.

### Define SAN

The certificate must have a Subject Alternative Name (SAN) list, and the SAN list must be properly defined and cover all hostnames that the Confluent component will be accessed on:

* If TLS for internal communication network encryption is enabled, include the internal network, `<component>.<namespace>.svc.cluster.local`, in the SAN list.
* If TLS for external network communication is enabled, include the external domain name in the SAN list.

The following are the internal and external SANs of each Confluent component that need to be included in the component certificate SAN. The examples use the default component prefixes.

Kafka :
* Internal bootstrap access SAN: `<component>.<namespace>.svc.cluster.local`
  * Example: `kafka.confluent.svc.cluster.local`
* Internal access SAN: `<component>-<podId>.<component>.<namespace>.svc.cluster.local`, where `<podId>` is the ordinal number of brokers, 0 to (number of brokers - 1).
  * Example: `kafka-0.kafka.confluent.svc.cluster.local`
  * The range can be handled through a wildcard domain, for example, `*.kafka.confluent.svc.cluster.local`.
* External bootstrap domain SAN: `<bootstrap-prefix>.<my-external-domain>`
  * Example: `kafka-bootstrap.acme.com`
* External broker SAN: `<broker-prefix><podId>.<my-external-domain>`
  * Example: `b0.acme.com`
  * The range can be handled through a wildcard domain, for example, `*.acme.com`

MDS :
* Internal access SAN: `<component>-<podId>.<component>.<namespace>.svc.cluster.local`, where `<podId>` is the ordinal number of brokers, 0 to (number of brokers - 1).
  * Example: `kafka-0.kafka.confluent.svc.cluster.local`
* External domain SAN: `<prefix>.<my-external-domain>`
  * Example: `mds.my-external-domain`

KRaft :
* Internal bootstrap access SAN: `<component>.<namespace>.svc.cluster.local`
* Internal access SAN: `<component>-<podId>.<component>.<namespace>.svc.cluster.local`, where `<podId>` is the ordinal number of the KRaft controller, 0 to (number of servers - 1).
  * Example: `kraftcontroller-0.kraftcontroller.confluent.svc.cluster.local`
* External domain SAN: `<prefix>.<my-external-domain>`

ZooKeeper :
* Internal bootstrap access SAN: `<component>.<namespace>.svc.cluster.local`
* Internal access SAN: `<component>-<podId>.<component>.<namespace>.svc.cluster.local`, where `<podId>` is the ordinal number of ZooKeeper servers, 0 to (number of servers - 1).
  * Example: `zookeeper-0.zookeeper.confluent.svc.cluster.local`
* No external access domain

#### IMPORTANT

Starting with Confluent Platform version 8.0, ZooKeeper is no longer part of Confluent Platform.
Schema Registry :
* Internal bootstrap access SAN: `<component>.<namespace>.svc.cluster.local`
* Internal access SAN: `<component>-<podId>.<component>.<namespace>.svc.cluster.local`, where `<podId>` is the ordinal number of Schema Registry servers, 0 to (number of servers - 1).
  * Example: `schemaregistry-0.schemaregistry.confluent.svc.cluster.local`
* External domain SAN: `<prefix>.<my-external-domain>`

REST Proxy :
* Internal access SAN: `<component>-<podId>.<component>.<namespace>.svc.cluster.local`, where `<podId>` is the ordinal number of REST Proxy servers, 0 to (number of servers - 1).
  * Example: `kafkarestproxy-0.kafkarestproxy.confluent.svc.cluster.local`
* External domain SAN: `<prefix>.<my-external-domain>`

Connect :
* Internal bootstrap access SAN: `<component>.<namespace>.svc.cluster.local`
* Internal SAN: `<component>-<podId>.<component>.<namespace>.svc.cluster.local`, where `<podId>` is the ordinal number of Connect servers, 0 to (number of servers - 1).
  * Example: `connect-0.connect.confluent.svc.cluster.local`
* External domain SAN: `<prefix>.<my-external-domain>`

ksqlDB :
* Internal bootstrap access SAN: `<component>.<namespace>.svc.cluster.local`
* Internal access SAN: `<component>-<podId>.<component>.<namespace>.svc.cluster.local`, where `<podId>` is the ordinal number of ksqlDB servers, 0 to (number of servers - 1).
  * Example: `ksqldb-0.ksqldb.confluent.svc.cluster.local`
* External domain SAN: `<prefix>.<my-external-domain>`

Control Center (Legacy) :
* Internal bootstrap access SAN: `<component>.<namespace>.svc.cluster.local`
* Internal access SAN: `<component>-0.<component>.<namespace>.svc.cluster.local`
  * Example: `controlcenter-0.controlcenter.confluent.svc.cluster.local`
* External domain SAN: `<prefix>.<my-external-domain>`

For an example of how to create certificates with appropriate SAN configurations, see the [Create your own certificates tutorial](https://github.com/confluentinc/confluent-kubernetes-examples/tree/master/security/production-secure-deploy#appendix-create-your-own-certificates).

### Migrate RBAC from OAuth or LDAP-based to dual authentication with mTLS

This section describes how to migrate an OAuth or LDAP-based RBAC deployment to use mTLS as one of the dual authentication methods.

1. Add a custom listener with dual OAuth or LDAP and mTLS authentication in Kafka. This listener will be used for Confluent Platform components to Kafka communications while the internal listener gets updated during migration.

   If migrating from LDAP to dual LDAP and mTLS, set the listener authentication type to `bearer`. If migrating from OAuth to dual OAuth and mTLS, set the listener authentication type to `oauth`.

   ```yaml
   kind: Kafka
   spec:
     listeners:
       custom:
       - name: customoauth
         port: 9093
         authentication:
           type: oauth
           oauthSettings: # If type: oauth above
             tokenEndpointUri:
             expectedIssuer:
             jwksEndpointUri:
             subClaimName: client_id
           mtls:
             sslClientAuthentication: "required"
             principalMappingRules:
             - "RULE:.*CN=([a-zA-Z0-9.-]*).*$/$1/"
             - "DEFAULT"
         tls:
           enabled: true
   ```

2. Update all the Confluent Platform component CRs (Schema Registry, Connect, REST Proxy, Control Center) and the admin KafkaRestClass CR to enable `sslClientAuthentication` from the client side, and to update their Kafka dependency endpoint to communicate on the custom listener created in Step 1. All of these changes must be made in the same step.

   Note that the MDS authentication type and the Kafka authentication type should be the same since MDS exists on the Kafka cluster.
   * For OAuth-based RBAC, `oauth` for Kafka and MDS
   * For LDAP-based RBAC, `oauthbearer` for Kafka and `bearer` for MDS

   * Update the Confluent Platform components:

     ```yaml
     kind:
     spec:
       dependencies:
         kafka:
           bootstrapEndpoint: kafka.confluent.svc.cluster.local:9093
           authentication:
             type: <oauth or oauthbearer>
             sslClientAuthentication: true
           tls:
             enabled: true
         mds:
           endpoint: https://kafka.confluent.svc.cluster.local:8090
           tokenKeyPair:
             secretRef: mds-token
           authentication:
             type: <oauth or bearer>
             sslClientAuthentication: true
           tls:
             enabled: true
     ```

   * Update the `kafkaRest` dependency in the Kafka CR to enable the client-side mTLS in the embedded REST Proxy.

     ```yaml
     kind: Kafka
     spec:
       dependencies:
         kafkaRest:
           authentication:
             type: <oauth or bearer>
             sslClientAuthentication: true
           tls:
             enabled: true
             secretRef: tls-kafka
     ```

3. Update the KafkaRestClass CR, which is required to create role bindings.

   ```yaml
   kind: KafkaRestClass
   spec:
     kafkaRest:
       endpoint: https://kafka.confluent.svc.cluster.local:8090
       authentication:
         type: <oauth or bearer>
         sslClientAuthentication: true
       tls:
         secretRef: tls-kafka
   ```

4. Add the mTLS provider in the MDS service in parallel to the already existing OAuth or LDAP provider. This enables dual authentication with OAuth or LDAP and mTLS.

   ```yaml
   kind: Kafka
   spec:
     services:
       mds:
         provider:
           mtls:
             sslClientAuthentication: <"required" or "requested">
             principalMappingRules:
             configurations:
   ```

5. Add the mTLS authentication section in the Schema Registry, Connect, and REST Proxy CRs to support dual authentication.

   ```yaml
   kind:
   spec:
     authentication:
       mtls:
         sslClientAuthentication: "required"
         principalMappingRules:
       oauth: # If migrating to OAuth+mTLS
   ```

# Configure Host-Based Static Access to Confluent Platform Components Using Confluent for Kubernetes

When you configure Kafka for host-based static access, the Kafka advertised listeners are set up with the broker prefix and the domain name. This method does not create any Kubernetes resources, and you need to explicitly configure external access to Kafka, for example, using the NGINX Ingress controller.

This method requires:
* Kafka is configured with TLS.
* An Ingress controller that supports SSL passthrough is used.

**To configure external access to Kafka using static host-based routing:**

1. Configure and deploy Kafka with the `staticForHostBasedRouting` access type.

   ```yaml
   listeners:
     external:
       externalAccess:
         type: staticForHostBasedRouting
         staticForHostBasedRouting:
           port: --- [1]
           domain: --- [2]
           brokerPrefix: --- [3]
   ```

   * [1] Required. The `port` to be used in the advertised listener for a broker. Set it to `443` to support SNI capabilities. If you change this value on a running cluster, you must roll the cluster.
   * [2] Required. `domain` will be configured as part of the Kafka advertised listener. If you change this value on a running cluster, you must roll the cluster.
   * [3] Optional. Use `brokerPrefix` to change the default Kafka broker prefix. The default Kafka broker prefix is `b`. These are used for DNS entries. The broker DNS names become `<brokerPrefix>0.<domain>`, `<brokerPrefix>1.<domain>`, and so on. If not set, the default broker DNS names are `b0.<domain>`, `b1.<domain>`, and so on.

     For example, the following are Kafka advertised listeners for three Kafka brokers with `port: 443` and `domain: example.com`:
     * `b0.example.com:443`
     * `b1.example.com:443`
     * `b2.example.com:443`

     If you change this value on a running cluster, you must roll the cluster.

2. Deploy an Ingress controller, such as [ingress-nginx](https://kubernetes.github.io/ingress-nginx/deploy).
   For a list of available controllers, see [Ingress controllers](https://kubernetes.io/docs/concepts/services-networking/ingress-controllers). Your Ingress controller must support SSL passthrough that intercepts all traffic on the configured HTTPS port (default is 443) and hands it over to the Kafka TCP proxy.

   The following example shows a Helm command that installs the NGINX Ingress controller with SSL passthrough enabled:

   ```bash
   helm upgrade --install ingress-nginx ingress-nginx \
     --repo https://kubernetes.github.io/ingress-nginx \
     --set controller.publishService.enabled=true \
     --set controller.extraArgs.enable-ssl-passthrough="true"
   ```

3. Configure the DNS addresses for Kafka brokers to point to the Ingress controller.

   You need the following to derive Kafka DNS entries:
   * The `domain` name you provided in the configuration file in Step #1
   * The external IP of the Ingress controller load balancer

     You can retrieve the external IP using the following command:

     ```bash
     kubectl get services -n <namespace>
     ```
   * The Kafka `brokerPrefix` you provided in the configuration file in Step #1

   The following example shows the DNS table entries using:
   * Domain: `example.com`
   * Three broker replicas with the default prefix/replica numbers: `b`

   ```none
   DNS name           ExternalIP
   b0.example.com     34.71.198.214
   b1.example.com     34.71.198.214
   b2.example.com     34.71.198.214
   ```

4. Create an [Ingress resource](https://kubernetes.io/docs/concepts/services-networking/ingress/#the-ingress-resource) that includes a collection of rules the Ingress controller uses to route the inbound traffic to Kafka.

   Ingress uses annotations to configure some options depending on the Ingress controller, an example of which is the [rewrite-target annotation](https://github.com/kubernetes/ingress-nginx/blob/master/docs/examples/rewrite/README.md). Review the documentation for your Ingress controller to learn which annotations are supported. For details on deploying the NGINX controller and configuring an Ingress resource, refer to [this tutorial](https://cloud.google.com/community/tutorials/nginx-ingress-gke).

   The following example creates an Ingress resource for the NGINX Ingress controller. The resource exposes three Kafka brokers:

   ```yaml
   apiVersion: networking.k8s.io/v1
   kind: Ingress
   metadata:
     name: ingress-with-sni
     namespace: confluent
     annotations:
       nginx.ingress.kubernetes.io/ssl-passthrough: "true" ---[1]
       nginx.ingress.kubernetes.io/ssl-redirect: "false" ---[2]
       nginx.ingress.kubernetes.io/backend-protocol: HTTPS ---[3]
       ingress.kubernetes.io/ssl-passthrough: "true" ---[4]
       kubernetes.io/ingress.class: nginx ---[5]
   spec:
     tls:
     - hosts:
       - demo0.example.com
       - demo1.example.com
       - demo2.example.com
       - demo.example.com
     rules:
     - host: demo0.example.com
       http:
         paths:
         - path: /
           pathType: Prefix
           backend:
             service:
               name: kafka-0-internal
               port:
                 number: 9092
     - host: demo1.example.com
       http:
         paths:
         - path: /
           pathType: Prefix
           backend:
             service:
               name: kafka-1-internal
               port:
                 number: 9092
     - host: demo2.example.com
       http:
         paths:
         - path: /
           pathType: Prefix
           backend:
             service:
               name: kafka-2-internal
               port:
                 number: 9092
   ```

   * Annotation [1] instructs the controller to send TLS connections directly to the backend instead of letting NGINX decrypt the communication.
   * Annotation [2] disables the default value.
   * Annotation [3] indicates how NGINX should communicate with the backend service.
   * Annotation [4] `ssl-passthrough` is required.
   * Annotation [5] uses the NGINX controller.
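Once the DNS entries resolve to the Ingress controller's external IP, you can spot-check that SSL passthrough and SNI-based routing are working before pointing Kafka clients at the brokers. The following is a quick sanity check only, using the example host `b0.example.com` from the table above; substitute your own broker DNS names.

```bash
# Confirm the broker DNS name resolves to the Ingress controller's external IP.
nslookup b0.example.com

# Open a TLS connection through the Ingress controller using SNI.
# With SSL passthrough working, the certificate returned is the Kafka broker's
# certificate, so its SAN list should include the external broker names.
openssl s_client -connect b0.example.com:443 -servername b0.example.com </dev/null 2>/dev/null \
  | openssl x509 -noout -text | grep -A 1 "Subject Alternative Name"
```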
For a tutorial scenario on configuring external access using host-based static access, see the [quickstart tutorial for host-based static access](https://github.com/confluentinc/confluent-kubernetes-examples/tree/master/networking/external-access-static-host-based).

### Issue: JAAS class path discrepancy between CFK 3.0 and Confluent Platform 7.x

CFK 3.0 defaults to Confluent Platform 8.x behavior, including Jetty 12 support. Specifically, CFK 3.0.0 and higher uses the Confluent Platform 8.0 JAAS class path (`org.eclipse.jetty.security.jaas.spi.PropertyFileLoginModule`) instead of the Confluent Platform 7.x class path (`org.eclipse.jetty.jaas.spi.PropertyFileLoginModule`).

**Solution:** To use the JAAS class path compatible with Confluent Platform 7.x, add the annotation `platform.confluent.io/use-old-jetty9: "true"` to the Confluent Platform component CRs that expose REST API endpoints and have authentication enabled on those endpoints, such as Control Center, Control Center (Legacy), Schema Registry, Connect, ksqlDB, and REST Proxy.

```yaml
apiVersion: platform.confluent.io/v1beta1
kind:
metadata:
  name: controlcenter
  annotations:
    platform.confluent.io/use-old-jetty9: "true"
```

When you upgrade to Confluent Platform 8.0 or higher, remove the above annotation.

# Kafka Producer for Confluent Platform

An Apache Kafka® Producer is a client application that publishes (writes) events to a Kafka cluster. This section gives an overview of the Kafka producer and an introduction to the configuration settings for tuning. The Kafka producer is conceptually much simpler than the consumer since it does not need group coordination. A producer **partitioner** maps each message to a topic partition, and the producer sends a produce request to the leader of that partition. The partitioners shipped with Kafka guarantee that all messages with the same non-empty key will be sent to the same partition.

# Manage Clusters in Confluent Platform

Confluent Platform provides features to help you manage Apache Kafka® cluster rebalancing and cost. The following topics are included in this section:

- [Metadata Management of Kafka in Confluent Platform](../kafka-metadata/overview.md#zk-or-kraft) - Discusses the options for metadata storage and leader elections for a cluster.
- [Manage Self-Balancing Kafka Clusters in Confluent Platform](sbc/index.md#sbc) - With this feature enabled, a cluster automatically rebalances partitions across brokers when new brokers are added or existing brokers are removed.
- [Quick Start for Auto Data Balancing in Confluent Platform](rebalancer/quickstart.md#rebalancer) - A tool that balances data so that the number of leaders and disk usage are even across brokers and racks on a per topic and cluster level while minimizing data movement. [Manage Self-Balancing Kafka Clusters in Confluent Platform](sbc/index.md#sbc) is the preferred alternative to [Quick Start for Auto Data Balancing in Confluent Platform](rebalancer/quickstart.md#rebalancer).
- [Tiered Storage in Confluent Platform](tiered-storage.md#tiered-storage) - A feature that helps make storing huge volumes of data in Kafka manageable by reducing operational burden and cost.

If you’re just getting started with Confluent Platform and Kafka, see also [Learn More About Confluent Products and Kafka](../get-started/kafka-basics.md#ak-basics) and [Quick Start for Confluent Platform](../get-started/platform-quickstart.md#quickstart).
## Configuring and starting controllers and brokers in KRaft mode

As of Confluent Platform 8.0, ZooKeeper is no longer available. Confluent recommends migrating to [KRaft mode](https://docs.confluent.io/platform/current/installation/installing_cp/zip-tar.html#kraft-mode). To learn more about running Kafka in KRaft mode, see [KRaft Overview for Confluent Platform](../../kafka-metadata/kraft.md#kraft-overview), [KRaft Configuration for Confluent Platform](../../kafka-metadata/config-kraft.md#configure-kraft), and the [Platform Quick Start](../../get-started/platform-quickstart.md#cp-quickstart-step-1). To learn about migrating from older versions, see [Migrate from ZooKeeper to KRaft on Confluent Platform](../../installation/migrate-zk-kraft.md#migrate-zk-kraft).

This tutorial provides examples for KRaft mode only. Earlier versions of this documentation (such as [version 7.9](https://docs.confluent.io/platform/7.9/clusters/sbc/sbc-tutorial.html)) provide examples for both KRaft and ZooKeeper.

The examples show an *isolated mode* configuration for a multi-broker cluster managed by a single controller. As shown in the steps below, you will use `$CONFLUENT_HOME/etc/kafka/broker.properties` and `$CONFLUENT_HOME/etc/kafka/controller.properties` as the basis to create a controller (`$CONFLUENT_HOME/etc/kafka/controller-sbc.properties`) and multiple brokers to test Self-Balancing.

### 4. Configure the controller and brokers to send metrics to Control Center with Prometheus

In the next steps, you will configure your Kafka brokers and controller to export their metrics, using the `confluent.telemetry.exporter._c3.client.base.url` setting to push OTLP (OpenTelemetry Protocol) metrics. Control Center will act as an OTLP receiver, listening on `localhost:9090` for the incoming metrics.

1. If you have the controller and brokers running (per the previous steps), **stop these components in the reverse order** from which you started them.
   1. Stop each broker by using Ctrl-C in each window.
   2. Finally, stop the controller with Ctrl-C in its window.

   Leave the windows open so that you can quickly re-start the controller and brokers after you’ve added the additional required configurations.

2. Add the following lines to the end of the properties files for the controller and each one of the brokers to emit metrics to Prometheus, the OTLP endpoint. (The fourth line, which sets `confluent.telemetry.exporter._c3.metrics.include`, is very long. Simply copy the code block as provided and paste it in at the end of the properties files. This line will paste in as a single line, even though it shows as wrapped in the documentation.)
```bash metric.reporters=io.confluent.telemetry.reporter.TelemetryReporter confluent.telemetry.exporter._c3.type=http confluent.telemetry.exporter._c3.enabled=true confluent.telemetry.exporter._c3.metrics.include=io.confluent.kafka.server.request.(?!.*delta).*|io.confluent.kafka.server.server.broker.state|io.confluent.kafka.server.replica.manager.leader.count|io.confluent.kafka.server.request.queue.size|io.confluent.kafka.server.broker.topic.failed.produce.requests.rate.1.min|io.confluent.kafka.server.tier.archiver.total.lag|io.confluent.kafka.server.request.total.time.ms.p99|io.confluent.kafka.server.broker.topic.failed.fetch.requests.rate.1.min|io.confluent.kafka.server.broker.topic.total.fetch.requests.rate.1.min|io.confluent.kafka.server.partition.caught.up.replicas.count|io.confluent.kafka.server.partition.observer.replicas.count|io.confluent.kafka.server.tier.tasks.num.partitions.in.error|io.confluent.kafka.server.broker.topic.bytes.out.rate.1.min|io.confluent.kafka.server.request.total.time.ms.p95|io.confluent.kafka.server.controller.active.controller.count|io.confluent.kafka.server.session.expire.listener.zookeeper.disconnects.total|io.confluent.kafka.server.request.total.time.ms.p999|io.confluent.kafka.server.controller.active.broker.count|io.confluent.kafka.server.request.handler.pool.request.handler.avg.idle.percent.rate.1.min|io.confluent.kafka.server.session.expire.listener.zookeeper.disconnects.rate.1.min|io.confluent.kafka.server.controller.unclean.leader.elections.rate.1.min|io.confluent.kafka.server.replica.manager.partition.count|io.confluent.kafka.server.controller.unclean.leader.elections.total|io.confluent.kafka.server.partition.replicas.count|io.confluent.kafka.server.broker.topic.total.produce.requests.rate.1.min|io.confluent.kafka.server.controller.offline.partitions.count|io.confluent.kafka.server.socket.server.network.processor.avg.idle.percent|io.confluent.kafka.server.partition.under.replicated|io.confluent.kafka.server.log.log.start.offset|io.confluent.kafka.server.log.tier.size|io.confluent.kafka.server.log.size|io.confluent.kafka.server.tier.fetcher.bytes.fetched.total|io.confluent.kafka.server.request.total.time.ms.p50|io.confluent.kafka.server.tenant.consumer.lag.offsets|io.confluent.kafka.server.session.expire.listener.zookeeper.expires.rate.1.min|io.confluent.kafka.server.log.log.end.offset|io.confluent.kafka.server.broker.topic.bytes.in.rate.1.min|io.confluent.kafka.server.partition.under.min.isr|io.confluent.kafka.server.partition.in.sync.replicas.count|io.confluent.telemetry.http.exporter.batches.dropped|io.confluent.telemetry.http.exporter.items.total|io.confluent.telemetry.http.exporter.items.succeeded|io.confluent.telemetry.http.exporter.send.time.total.millis|io.confluent.kafka.server.controller.leader.election.rate.(?!.*delta).*|io.confluent.telemetry.http.exporter.batches.failed confluent.telemetry.exporter._c3.client.base.url=http://localhost:9090/api/v1/otlp confluent.telemetry.exporter._c3.client.compression=gzip confluent.telemetry.exporter._c3.api.key=dummy confluent.telemetry.exporter._c3.api.secret=dummy confluent.telemetry.exporter._c3.buffer.pending.batches.max=80 confluent.telemetry.exporter._c3.buffer.batch.items.max=4000 confluent.telemetry.exporter._c3.buffer.inflight.submissions.max=10 confluent.telemetry.metrics.collector.interval.ms=60000 confluent.telemetry.remoteconfig._confluent.enabled=false confluent.consumer.lag.emitter.enabled=true ``` 3. Save the updated files. 
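After saving the files, you can restart the components in the windows you left open, starting the controller first and then each broker. The following is a minimal sketch only: `controller-sbc.properties` is the controller file created earlier in this tutorial, while the broker file names shown are placeholders for whatever broker property files you created, so substitute your own names.

```bash
# Start the controller first, in its own terminal window.
$CONFLUENT_HOME/bin/kafka-server-start $CONFLUENT_HOME/etc/kafka/controller-sbc.properties

# Then start each broker in its own terminal window.
# The broker file names below are placeholders; use the files you created for this tutorial.
$CONFLUENT_HOME/bin/kafka-server-start $CONFLUENT_HOME/etc/kafka/broker-sbc-0.properties
$CONFLUENT_HOME/bin/kafka-server-start $CONFLUENT_HOME/etc/kafka/broker-sbc-1.properties
```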
# Configure and Manage Confluent Platform

This section provides topics on resources for managing Confluent Platform, including tools, configuration reference, and Apache Kafka® deployment and post-deployment guidance.

- [Kafka Configuration Reference for Confluent Platform](../installation/configuration/index.md#cp-config-reference) - Contains a comprehensive reference guide of broker, topic, consumer, producer, and connect configuration properties.
- [CLI Tools Shipped With Confluent Platform](../tools/cli-reference.md#cp-all-cli) - Contains a list of CLI tools for use with Kafka and Confluent Platform.
- [Change Kafka Configurations Without Restart for Confluent Platform](../kafka/dynamic-config.md#kafka-dynamic-configurations) - Shows how you can change configuration properties using the `kafka-configs` tool without stopping a broker.
- [Manage Clusters in Confluent Platform](../clusters/overview.md#manage-clusters) - Contains topics that describe how metadata is managed for a cluster, and how cost-saving features like Tiered Storage and the Confluent Rebalancer work.
- [Configure Metadata Service (MDS) in Confluent Platform](../kafka/configure-mds/index.md#rbac-mds-config) - Describes the Confluent Metadata Service (MDS).
- [Docker Operations for Confluent Platform](../installation/docker/operations/index.md#operations-overview) - Provides a guide to configuring Confluent Platform running on Docker.
- [Running Kafka in Production with Confluent Platform](../kafka/deployment.md#cp-production-recommendations) - Covers how much memory, how many disks, CPU and more you should use for a Confluent Platform production deployment. In addition, provides configuration guidelines for a production cluster.
- [Best Practices for Kafka Production Deployments in Confluent Platform](../kafka/post-deployment.md#kafka-post-deployment) - Describes tasks that you might complete after moving to production: changing the log level, adding or modifying topics, changing the replication factor, and more.

### Task example - source task

Next, you’ll look at the implementation of the corresponding `SourceTask`. The class is small, but too long to cover completely in this guide. Most of the implementation is described using helper methods whose details aren’t provided, but you can refer to the source code for the full example.

Just as with the connector, you must create a class inheriting from the appropriate base `Task` class. It also has some standard lifecycle methods:

```java
public class FileStreamSourceTask extends SourceTask {
    private String filename;
    private InputStream stream;
    private String topic;
    private Long streamOffset;

    public void start(Map<String, String> props) {
        filename = props.get(FileStreamSourceConnector.FILE_CONFIG);
        stream = openOrThrowError(filename);
        topic = props.get(FileStreamSourceConnector.TOPIC_CONFIG);
    }

    @Override
    public synchronized void stop() {
        try {
            stream.close();
        } catch (IOException e) {
            // Nothing else to do; the task is shutting down.
        }
    }
```

These are slightly simplified versions, but show that these methods should be relatively simple and the only work they perform is allocating or freeing resources. There are two points to note about this implementation. First, the `start()` method does not yet handle resuming from a previous offset, which will be addressed in a later section. Second, the `stop()` method is synchronized. This will be necessary because `SourceTasks` are given a dedicated thread which they can block indefinitely, so they need to be stopped with a call from a different thread in the Worker.
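As a preview of that offset handling, when a task starts it can ask the Kafka Connect framework for the last committed offset of its source partition through `context.offsetStorageReader()`. The following is a minimal sketch only; `openAndSeek()` is a hypothetical helper (standing in for `openOrThrowError()`), and the keys mirror the ones used by `poll()` below.

```java
// Illustrative sketch: recover the last committed position for this file, if any,
// and seek past it so previously published lines are not re-read.
Map<String, Object> lastOffset =
        context.offsetStorageReader().offset(Collections.singletonMap("filename", filename));
if (lastOffset != null && lastOffset.get("position") != null) {
    streamOffset = (Long) lastOffset.get("position");
    stream = openAndSeek(filename, streamOffset);   // hypothetical helper
}
```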
Next, implement the main functionality of the task: the `poll()` method that gets records from the input system and returns a `List<SourceRecord>`:

```java
@Override
public List<SourceRecord> poll() throws InterruptedException {
    try {
        ArrayList<SourceRecord> records = new ArrayList<>();
        while (streamValid(stream) && records.isEmpty()) {
            LineAndOffset line = readToNextLine(stream);
            if (line != null) {
                Map<String, String> sourcePartition = Collections.singletonMap("filename", filename);
                Map<String, Long> sourceOffset = Collections.singletonMap("position", streamOffset);
                records.add(new SourceRecord(sourcePartition, sourceOffset, topic, Schema.STRING_SCHEMA, line));
            } else {
                Thread.sleep(1);
            }
        }
        return records;
    } catch (IOException e) {
        // Underlying stream was killed, probably as a result of calling stop. Allow to return
        // null, and driving thread will handle any shutdown if necessary.
    }
    return null;
}
```

Again, some details are omitted, but you can see the important steps: the `poll()` method is going to be called repeatedly, and for each call it will loop trying to read records from the file. For each line it reads, it also tracks the file offset. It uses this information to create an output [SourceRecord](/platform/current/connect/javadocs/javadoc/org/apache/kafka/connect/source/SourceRecord.html) with four pieces of information: the source partition (there is only one, the single file being read), source offset (position in the file), output topic name, and output value (the line, including a schema indicating this value will always be a string). Other variants of the `SourceRecord` constructor can also include a specific output partition and a key.

Note that this implementation uses the normal Java `InputStream` interface and may sleep if data is not available. This is acceptable because Kafka Connect provides each task with a dedicated thread. While task implementations have to conform to the basic `poll()` interface, they have a lot of flexibility in how they are implemented. In this case, an NIO-based implementation would be more efficient, but this simple approach works, is quick to implement, and is compatible with older versions of Java.

Although not used in the example, `SourceTask` also provides two APIs to commit offsets in the source system: `commit()` and `commitRecord()`. These APIs are provided for source systems which have an acknowledgement mechanism for messages. Overriding these methods allows the source connector to acknowledge messages in the source system, either in bulk or individually, once they have been written to Kafka. The `commit()` API stores the offsets in the source system, up to the offsets that have been returned by `poll()`. The implementation of this API should block until the commit is complete. The `commitRecord()` API saves the offset in the source system for each `SourceRecord` after it is written to Kafka. Because Kafka Connect records offsets automatically, `SourceTask` is not required to implement these APIs. In cases where a connector does need to acknowledge messages in the source system, only one of the APIs is typically required.
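For instance, a source task that reads from a system with explicit acknowledgements might override `commit()` along the following lines. This is only a sketch: `sourceClient`, its `acknowledgeUpTo()` method, and the `lastReturnedOffset` field are hypothetical stand-ins for whatever client library and bookkeeping the connector actually uses.

```java
// Illustrative sketch: acknowledge records in the source system up to the last
// offset that poll() has handed to Kafka Connect.
@Override
public void commit() throws InterruptedException {
    Long offsetToAck;
    synchronized (this) {
        offsetToAck = lastReturnedOffset;           // hypothetical field maintained by poll()
    }
    if (offsetToAck != null) {
        sourceClient.acknowledgeUpTo(offsetToAck);  // hypothetical source-system call
    }
}
```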
## Connect worker role bindings

Use the following steps to configure role bindings for the Connect worker principal, `User:$CONNECT_USER`.

1. Grant principal `User:$CONNECT_USER` the `ResourceOwner` role for `Topic:connect-configs`.

   ```none
   confluent iam rbac role-binding create \
     --principal User:$CONNECT_USER \
     --role ResourceOwner \
     --resource Topic:connect-configs \
     --kafka-cluster $KAFKA_CLUSTER_ID
   ```

2. Grant principal `User:$CONNECT_USER` the `ResourceOwner` role for `Topic:connect-offsets`.

   ```none
   confluent iam rbac role-binding create \
     --principal User:$CONNECT_USER \
     --role ResourceOwner \
     --resource Topic:connect-offsets \
     --kafka-cluster $KAFKA_CLUSTER_ID
   ```

3. Grant principal `User:$CONNECT_USER` the `ResourceOwner` role for `Topic:connect-statuses`.

   ```none
   confluent iam rbac role-binding create \
     --principal User:$CONNECT_USER \
     --role ResourceOwner \
     --resource Topic:connect-statuses \
     --kafka-cluster $KAFKA_CLUSTER_ID
   ```

4. Grant principal `User:$CONNECT_USER` the `ResourceOwner` role for `Group:connect-cluster`.

   ```none
   confluent iam rbac role-binding create \
     --principal User:$CONNECT_USER \
     --role ResourceOwner \
     --resource Group:connect-cluster \
     --kafka-cluster $KAFKA_CLUSTER_ID
   ```

5. Grant principal `User:$CONNECT_USER` the `SecurityAdmin` role. This allows `User:$CONNECT_USER` permission to make requests to the Metadata Service (MDS) to find out if a user making calls to the Connect REST API is authorized to perform required operations. Note that `$CONNECT_USER` does this by making an authorized request to MDS to check `$CLIENT` permissions.

   ```none
   confluent iam rbac role-binding create \
     --principal User:$CONNECT_USER \
     --role SecurityAdmin \
     --kafka-cluster $KAFKA_CLUSTER_ID \
     --connect-cluster-id $CONNECT_CLUSTER_ID
   ```

6. List the role bindings for the principal `User:$CONNECT_USER`. Verify that all the role bindings are properly configured.

   ```none
   confluent iam rbac role-binding list \
     --principal User:$CONNECT_USER \
     --kafka-cluster $KAFKA_CLUSTER_ID \
     --connect-cluster-id $CONNECT_CLUSTER_ID
   ```

   The following two steps are required if using a Connect [Secret Registry](connect-rbac-secret-registry.md#connect-rbac-secret-registry).

7. Grant principal `User:$CONNECT_USER` the `ResourceOwner` role to `Topic:_confluent-secrets`.

   ```none
   confluent iam rbac role-binding create \
     --principal User:$CONNECT_USER \
     --role ResourceOwner \
     --resource Topic:_confluent-secrets \
     --kafka-cluster $KAFKA_CLUSTER_ID
   ```

8.
Grant principal `User:$CONNECT_USER` the `ResourceOwner` role to `Group:secret-registry`. ```none confluent iam rbac role-binding create \ --principal User:$CONNECT_USER \ --role ResourceOwner \ --resource Group:secret-registry \ --kafka-cluster $KAFKA_CLUSTER_ID ``` # Connect Secret Registry Kafka Connect provides a secret serving layer called the Secret Registry. The Secret Registry enables Connect to store encrypted Connect credentials in a topic exposed through a REST API. This eliminates any unencrypted credentials being located in the actual connector configuration. Two additional Connect REST API extensions support the Connect Secret Registry. The first extension enables RBAC. The second extension instantiates the Secret Registry node in Connect. Note that the property takes a comma-separated list of class names. ```properties rest.extension.classes=io.confluent.connect.security.ConnectSecurityExtension,io.confluent.connect.secretregistry.ConnectSecretRegistryExtension ``` The Connect Secret Registry provides the following: * **Persistence:** Secrets are stored in a compacted topic. * **Key grouping:** Secrets are associated with both a *key* and a *path*. This allows multiple keys to be grouped together. Authorization is typically performed at the path level. * **Versioning:** Multiple versions of a secret can be stored. * **Encryption:** Keys are stored in encrypted format. * **Master key rotation:** The master key for encryption can be changed. This allows all secrets to be re-encrypted if necessary. * **Auditing:** All requests to save or retrieve secrets are logged. The first character of the Connect Secret Registry key must be an alphabetic letter (a–z or A–Z). The following sections define the roles used to configure and interact with the Secret Registry and show a worker configuration example. # Kafka Connect and RBAC [Role-Based Access Control (RBAC)](../security/authorization/rbac/overview.md#rbac-overview) can be enabled for your Confluent Platform environment. If RBAC is enabled, there are role bindings that you may need to configure (or have set up for you) before you work with Connect and Connect resources. There are also RBAC configuration parameters that you need to add to your Connect worker configuration and connectors. Connect roles are managed by the RBAC system administrator for your environment. Make sure to review your user principal, RBAC role, and permissions with your RBAC system administrator before creating a Connect cluster or connectors. The following sections provide information about how to configure RBAC access as it applies to Kafka Connect. For information about how to configure RBAC for the overall Confluent Platform environment and other components, see [Role-Based Access Control (RBAC)](../security/authorization/rbac/overview.md#rbac-overview). * [Get Started With RBAC and Kafka Connect](rbac/connect-rbac-getting-started.md) * [Configure RBAC for a Connect Cluster](rbac/connect-rbac-connect-cluster.md) * [Configure RBAC for a Connect Worker](rbac/connect-rbac-worker.md) * [RBAC for self-managed connectors](rbac/connect-rbac-connectors.md) * [Connect Secret Registry](rbac/connect-rbac-secret-registry.md) * [Example Connect role-binding sequence](rbac/connect-rbac-example.md) ## Common Worker Configuration `bootstrap.servers` : A list of host/port pairs to use for establishing the initial connection to the Kafka cluster. 
The client will make use of all servers irrespective of which servers are specified here for bootstrapping - this list only impacts the initial hosts used to discover the full set of servers. This list should be in the form `host1:port1,host2:port2,...`. Since these servers are just used for the initial connection to discover the full cluster membership (which may change dynamically), this list need not contain the full set of servers (you may want more than one, though, in case a server is down). * Type: list * Default: [localhost:9092] * Importance: high `key.converter` : Converter class for key Connect data. This controls the format of the data that will be written to Kafka for source connectors or read from Kafka for sink connectors. Popular formats include Avro and JSON. * Type: class * Default: * Importance: high `value.converter` : Converter class for value Connect data. This controls the format of the data that will be written to Kafka for source connectors or read from Kafka for sink connectors. Popular formats include Avro and JSON. * Type: class * Default: * Importance: high `internal.key.converter` : Converter class for internal key Connect data that implements the `Converter` interface. Used for converting data like offsets and configs. * Type: class * Default: * Importance: low `internal.value.converter` : Converter class for offset value Connect data that implements the `Converter` interface. Used for converting data like offsets and configs. * Type: class * Default: * Importance: low `offset.flush.interval.ms` : Interval at which to try committing offsets for tasks. * Type: long * Default: 60000 * Importance: low `offset.flush.timeout.ms` : Maximum number of milliseconds to wait for records to flush and partition offset data to be committed to offset storage before cancelling the process and restoring the offset data to be committed in a future attempt. * Type: long * Default: 5000 * Importance: low `plugin.path` : The comma-separated list of paths to directories that contain [Kafka Connect plugins](/kafka-connectors/self-managed/userguide.html#installing-kconnect-plugins). * Type: string * Default: * Importance: low `rest.advertised.host.name` : If this is set, this is the hostname that will be given out to other Workers to connect to. * Type: string * Importance: low `rest.advertised.listener` : Configures the listener used for communication between Workers. Valid values are either `http` or `https`. If the listeners property is not defined or if it contains an HTTP listener, the default value for this field is `http`. When the listeners property is defined and contains only HTTPS listeners, the default value is `https`. * Type: string * Importance: low `rest.advertised.port` : If this is set, this is the port that will be given out to other Workers to connect to. * Type: int * Importance: low `listeners` : A list of REST listeners in the format `protocol://host:port,protocol2://host2:port2` that determines the protocol used by Kafka Connect, where the protocol is either HTTP or HTTPS. For example: ```bash listeners=http://localhost:8080,https://localhost:8443 ``` By default, if no listeners are specified, the REST server runs on port 8083 using the HTTP protocol. When using HTTPS, the configuration must include the TLS/SSL configuration. For more details, see [Configuring the Connect REST API for HTTP or HTTPS](../security.md#connect-rest-api-http). 
* Type: list * Importance: low `response.http.headers.config` : Used to select which HTTP headers are returned in the HTTP response for Confluent Platform components. Specify multiple values in a comma-separated string using the format `[action][header name]:[header value]` where `[action]` is one of the following: `set`, `add`, `setDate`, or `addDate`. You must use quotation marks around the header value when the header value contains commas. For example: ```none response.http.headers.config="add Cache-Control: no-cache, no-store, must-revalidate", add X-XSS-Protection: 1; mode=block, add Strict-Transport-Security: max-age=31536000; includeSubDomains, add X-Content-Type-Options: nosniff ``` * Type: string * Default: “” * Importance: low `task.shutdown.graceful.timeout.ms` : Amount of time to wait for tasks to shutdown gracefully. This is the total amount of time, not per task. All task have shutdown triggered, then they are waited on sequentially. * Type: long * Default: 5000 * Importance: low ### GET /connectors Get a list of active connectors * **Response JSON Object:** * **connectors** (*array*) – List of connector names **Example request**: ```http GET /connectors HTTP/1.1 Host: connect.example.com Accept: application/json ``` **Example response**: ```http HTTP/1.1 200 OK Content-Type: application/json ["my-jdbc-source", "my-hdfs-sink"] ``` **Query parameters**: | Name | Data type | Required / Optional | Description | |------------------|-------------|-----------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | `?expand=status` | Map | Optional | Retrieves additional state information for each of the connectors returned in the API call. The endpoint also returns the status of each of the connectors and its tasks as shown in the [?expand=status example](#expand-status) below. | | `?expand=info` | Map | Optional | Returns metadata of each of the connectors such as the configuration, task information, and type of connector as in [?expand=info example](#expand-info) below. 
**?expand=status example**

```json
{ "FileStreamSinkConnectorConnector_0": { "status": { "name": "FileStreamSinkConnectorConnector_0", "connector": { "state": "RUNNING", "worker_id": "10.0.0.162:8083" }, "tasks": [ { "id": 0, "state": "RUNNING", "worker_id": "10.0.0.162:8083" } ], "type": "sink" } }, "DatagenConnectorConnector_0": { "status": { "name": "DatagenConnectorConnector_0", "connector": { "state": "RUNNING", "worker_id": "10.0.0.162:8083" }, "tasks": [ { "id": 0, "state": "RUNNING", "worker_id": "10.0.0.162:8083" } ], "type": "source" } } }
```

**?expand=info example**

```json
{ "FileStreamSinkConnectorConnector_0": { "info": { "name": "FileStreamSinkConnectorConnector_0", "config": { "connector.class": "org.apache.kafka.connect.file.FileStreamSinkConnector", "file": "/Users/smogili/file.txt", "tasks.max": "1", "topics": "datagen", "name": "FileStreamSinkConnectorConnector_0" }, "tasks": [ { "connector": "FileStreamSinkConnectorConnector_0", "task": 0 } ], "type": "sink" } }, "DatagenConnectorConnector_0": { "info": { "name": "DatagenConnectorConnector_0", "config": { "connector.class": "io.confluent.kafka.connect.datagen.DatagenConnector", "quickstart": "clickstream", "tasks.max": "1", "name": "DatagenConnectorConnector_0", "kafka.topic": "datagen" }, "tasks": [ { "connector": "DatagenConnectorConnector_0", "task": 0 } ], "type": "source" } } }
```

Users can also combine the status and info expands by appending both to the endpoint (for example, `http://localhost:8083/connectors?expand=status&expand=info`). This returns the metadata for the connectors and the current status of each connector and its tasks.

### InternalSecretConfigProvider

Confluent Platform provides another implementation of `ConfigProvider` named `InternalSecretConfigProvider`, which is used with the Connect [Secret Registry](/platform/current/connect/rbac/connect-rbac-secret-registry.html). The `InternalSecretConfigProvider` requires [Role-based access control (RBAC)](../security/authorization/rbac/overview.md#rbac-overview) with Secret Registry. The Secret Registry is a secret serving layer that enables Connect to store encrypted Connect credentials in a topic exposed through a REST API. This eliminates any unencrypted credentials being located in the actual connector configuration. You enable `InternalSecretConfigProvider` in the worker configuration file.

### Standalone mode

Standalone mode is typically used for development and testing, or for lightweight, single-agent environments, for example, sending web server logs to Kafka. The following example shows a command that launches a worker in standalone mode:

```bash
bin/connect-standalone worker.properties connector1.properties [connector2.properties connector3.properties ...]
```

The first parameter (`worker.properties`) is the [worker configuration properties file](#connect-configuring-workers). Note that `worker.properties` is an example file name. You can use any valid file name for your worker configuration file. This file gives you control over settings such as the Kafka cluster to use and serialization format. For an example configuration file that uses [Avro](http://avro.apache.org/docs/current/) and [Schema Registry](/platform/current/schema-registry/connect.html) in a standalone mode, open the file located at `etc/schema-registry/connect-avro-standalone.properties`. You can copy and modify this file for use as your standalone worker properties file.
The second parameter (`connector1.properties`) is the connector configuration properties file. All connectors have configuration properties that are loaded with the worker. As shown in the example, you can launch multiple connectors using this command. If you run multiple standalone workers on the same host machine, the following two configuration properties must be unique for each worker: * `offset.storage.file.filename`: The storage file name for connector offsets. This file is stored on the local filesystem in standalone mode. Using the same file name for two workers will cause offset data to be deleted or overwritten with different values. * `listeners`: A list of URIs the REST API will listen on in the format `protocol://host:port,protocol2://host2:port`–the protocol is either HTTP or HTTPS. You can specify hostname as `0.0.0.0` to bind to all interfaces or leave hostname empty to bind to the default interface. #### NOTE You update the `etc/schema-registry/connect-avro-standalone.properties` file if you need to apply a change to Connect when starting Confluent Platform services using the [Confluent CLI](https://docs.confluent.io/confluent-cli/current/command-reference-index.html). ### Distributed mode Distributed mode does not have any additional command-line parameters other than loading the worker configuration file. New workers will either start a new group or join an existing one with a matching `group.id`. Workers then coordinate with the consumer groups to distribute the work to be done. The following shows an example command that launches a worker in distributed mode: ```bash bin/connect-distributed worker.properties ``` For an example distributed mode configuration file that uses Avro and [Schema Registry](/platform/current/schema-registry/connect.html), open `etc/schema-registry/connect-avro-distributed.properties`. You can make a copy of this file, modify it, use it as the new `worker.properties` file. Note that `worker.properties` is an example file name. You can use any valid file name for your properties file. In standalone mode, connector configuration property files are added as command-line parameters. However, in distributed mode, connectors are deployed and managed using a REST API request. To create connectors, you start the worker and then make a REST request to create the connector. REST request examples are provided in many [supported connector](https://docs.confluent.io/kafka-connectors/self-managed/supported.html) documents. For instance, see the [Azure Blob Storage Source connector REST-based example](https://docs.confluent.io/kafka-connectors/azure-blob-storage-source/current/index.html#rest-based-example) for one example. Note that if you run many distributed workers on one host machine for development and testing, the `listeners` configuration property must be unique for each worker. This is the port the REST interface listens on for HTTP requests. ### YAML ```yaml apiVersion: cmf.confluent.io/v1 kind: FlinkApplication metadata: name: app-1 spec: image: confluentinc/cp-flink:1.19.3-cp1 flinkVersion: v1_19 flinkConfiguration: taskmanager.numberOfTaskSlots: "1" serviceAccount: flink jobManager: resource: memory: 1024m cpu: 1 taskManager: resource: memory: 1024m cpu: 1 job: jarURI: local:///opt/flink/examples/streaming/StateMachineExample.jar state: running parallelism: 1 upgradeMode: stateless ``` The resource spec includes the following fields: * `image`: The name of the Docker image that is used to start the Flink cluster. 
CMF expects this image to be a [Confluent Platform Flink image](https://hub.docker.com/r/confluentinc/cp-flink) or to be derived from a Confluent Platform Flink image. * `flinkVersion`: The Flink version corresponding to the Flink version of the Docker image. * `flinkConfiguration`: A map of Flink configuration parameters. Before the configuration is passed to the Flink cluster, it is merged with the Environment’s default configuration for applications. The Flink configuration is used to configure cluster and job behavior, such as checkpointing, security, logging, and more. For more on Flink job configuration, see [Configure Flink Jobs in Confluent Manager for Apache Flink](../configure/overview.md#cmf-configure). * `serviceAccount`: The name of the Kubernetes service account that is used to start and run the application’s Flink cluster. * `jobManager` & `taskManager`: TheKubernetes specification of the Flink Job Manager and Task Manager pods. * `job.jarURI`: The path to the Flink job JAR file. To learn how to package Flink jobs and make the job JAR available to the cluster, see [Package Flink Jobs](packaging.md#cmf-package). * `job.state`: The desired state of the application. Can be `running` or `suspended`. * `job.parallelism`: The desired execution parallelism of the application. Can be adapted to rescale the application. ## Step 3: Generate mock data In Confluent Platform, you get [events](../_glossary.md#term-event) from an external source by using [Kafka Connect](../connect/index.md#connect-concepts). Connectors enable you to stream large volumes of data to and from your [Kafka cluster](../_glossary.md#term-Kafka-cluster). Confluent publishes many connectors for integrating with external systems, like MongoDB and Elasticsearch. For more information, see the [Kafka Connect Overview](../connect/index.md#kafka-connect) page. In this step, you run the [Datagen Source Connector](https://www.confluent.io/hub/confluentinc/kafka-connect-datagen/) to generate mock data. The mock data is stored in the `pageviews` and `users` topics that you created previously. To learn more about installing connectors, see [Install Self-Managed Connectors](../connect/install.md#connect-install-connectors). 1. In the navigation menu, click **Connect**. 2. Click the `connect-default` cluster in the **Connect clusters** list. 3. Click **Add connector** to start creating a connector for pageviews data. 4. Select the `DatagenConnector` tile. 5. In the **Name** field, enter `datagen-pageviews` as the name of the connector. 6. Enter the following configuration values in the following sections: **Common** section: - **Key converter class:** `org.apache.kafka.connect.storage.StringConverter` **General** section: - **kafka.topic:** Choose `pageviews` from the dropdown menu - **max.interval:** `100` - **quickstart:** `pageviews` 7. Click **Next** to review the connector configuration. When you’re satisfied with the settings, click **Launch**. ![Reviewing connector configuration in Confluent Control Center](images/connect-review-pageviews.png) Run a second instance of the [Datagen Source connector](https://www.confluent.io/hub/confluentinc/kafka-connect-datagen/) connector to produce mock data to the `users` topic. 1. In the navigation menu, click **Connect**. 2. In the **Connect clusters** list, click `connect-default`. 3. Click **Add connector**. 4. Select the `DatagenConnector` tile. 5. In the **Name** field, enter `datagen-users` as the name of the connector. 6. 
Enter the following configuration values: **Common** section: - **Key converter class:** `org.apache.kafka.connect.storage.StringConverter` **General** section: - **kafka.topic:** Choose `users` from the dropdown menu - **max.interval:** `1000` - **quickstart:** `users` 7. Click **Next** to review the connector configuration. When you’re satisfied with the settings, click **Launch**. 8. In the navigation menu, click **Topics** and in the list, click **users**. 9. Click **Messages** to confirm that the `datagen-users` connector is producing data to the `users` topic. ![Incoming messages displayed in the Topics page in Confluent Control Center](images/c3-topics-messages-users.gif) ## Confluent Platform features At the core of Confluent Platform is Kafka, the most popular open source distributed streaming platform. Kafka enables you to: - Publish and subscribe to streams of records - Store streams of records in a fault tolerant way - Process streams of records Each Confluent Platform release includes the latest release of Kafka and additional tools and services that make it easier to build and manage an event streaming platform. Confluent Platform provides community and commercially licensed features such as [Schema Registry](/platform/current/schema-registry/index.html), [Cluster Linking](../multi-dc-deployments/cluster-linking/index.md#cluster-linking), a [REST Proxy](../kafka-rest/index.md#kafkarest-intro), [100+ pre-built Kafka connectors](../connect/kafka_connectors.md#connectors-self-managed-cp), and [ksqlDB](../ksqldb/overview.md#ksql-home). For more information about Confluent components and the license that applies to them, see [Confluent Licenses](../installation/license.md#cp-license-overview). ![image](images/confluentPlatform.png) ## (Optional) Explore Control Center Confluent Control Center is a web-based tool for managing and monitoring Kafka in Confluent Platform. If you opted to install it as described in [(Optional) Install and configure Confluent Control Center](#get-started-multi-broker-install-config-c3), you can use if for monitoring, to create topics, and other actions. To view your cluster running locally in Control Center, open a browser and navigate to [http://localhost:9021/](http://localhost:9021). - To learn about managing clusters with Confluent Control Center, see [Manage Kafka Clusters Using Control Center for Confluent Platform](https://docs.confluent.io/control-center/current/clusters.html#controlcenter-userguide-clusters) - To view brokers in Confluent Control Center, see [Manage Kafka Brokers Using Control Center for Confluent Platform](https://docs.confluent.io/control-center/current/brokers.html#controlcenter-userguide-brokers) - To manage topics in Confluent Control Center, see [Manage Topics Using Control Center for Confluent Platform](https://docs.confluent.io/control-center/current/topics/overview.html#c3-all-topics) 1. Click either the Brokers card or **Brokers** on the menu to view broker metrics. From the brokers list at the bottom of the page, you can view detailed metrics and drill down on each broker. ![image](images/basics-c3-brokers-list.png) 2. Click **Topics** on the navigation menu. Note that only your test topic and the system (internal) topics are available at this point. The `default_ksql_processing_log` will show up as a topic if you configured and started ksqlDB. There is a lot more to Confluent Control Center, but it is not the focus of this tutorial. 
To complete similar steps using Confluent Control Center, see the [Quick Start for Confluent Platform](platform-quickstart.md#quickstart). # Kafka Configuration Reference for Confluent Platform Apache Kafka® configuration refers to the various settings and parameters that can be adjusted to optimize the performance, reliability, and security of a Kafka cluster and its clients. Kafka uses key-value pairs in a property file format for configuration. These values can be supplied either from a file or programmatically. The following configuration reference topics include settings for Kafka brokers, producers, and consumers, topics, and Kafka Connect. - [Configure Brokers and Controllers](broker-configs.md#cp-config-brokers) - [Configure Topics](topic-configs.md#cp-config-topics) - [Configure Producers](producer-configs.md#cp-config-producer) - [Configure Consumers](consumer-configs.md#cp-config-consumer) - [Configure Kafka Streams](streams-configs.md#cp-config-streams) - [Configure the AdminClient](admin-configs.md#cp-config-admin) - [Configure Kafka Connect](connect/index.md#cp-config-connect) - [Configure Kafka Source Connectors](connect/source-connect-configs.md#cp-config-source-connect) - [Configure Kafka Sink Connectors](connect/sink-connect-configs.md#cp-config-sink-connect) ### Optional Confluent Replicator Executable configurations Additional configurations that are optional and maybe passed to Replicator Executable via environment variable instead of files are: `REPLICATION_CONFIG` : A file that contains the configuration settings for the replication from the origin cluster. Default location is `/etc/replicator/replication.properties` in the Docker image. `CONSUMER_MONITORING_CONFIG` : A file that contains the configuration settings of the producer writing monitoring information related to Replicator’s consumer. Default location is `/etc/replicator/consumer-monitoring.properties` in the Docker image. `PRODUCER_MONITORING_CONFIG` : A file that contains the configuration settings of the producer writing monitoring information related to Replicator’s producer. Default location is `/etc/replicator/producer-monitoring.properties` in the Docker image. `BLACKLIST` : A comma-separated list of topics that should not be replicated, even if they are included in the whitelist or matched by the regular expression. `WHITELIST` : A comma-separated list of the names of topics that should be replicated. Any topic that is in this list and not in the blacklist will be replicated. `CLUSTER_THREADS` : The total number of threads across all workers in the Replicator cluster. `CONFLUENT_LICENSE` : The Confluent license key. Without the license key, Replicator can be used for a 30-day trial period. `TOPIC_AUTO_CREATE` : Whether to automatically create topics in the destination cluster if required. If you disable automatic topic creation, Kafka Streams and ksqlDB applications continue to work. Kafka Streams and ksqlDB applications use the Admin Client, so topics are still created. `TOPIC_CONFIG_SYNC` : Whether to periodically sync topic configuration to the destination cluster. `TOPIC_CONFIG_SYNC_INTERVAL_MS` : Specifies how frequently to check for configuration changes when `topic.config.sync` is enabled. `TOPIC_CREATE_BACKOFF_MS` : Time to wait before retrying auto topic creation or expansion. `TOPIC_POLL_INTERVAL_MS` : Specifies how frequently to poll the source cluster for new topics matching the whitelist or regular expression. 
`TOPIC_PRESERVE_PARTITIONS` : Whether to automatically increase the number of partitions in the destination cluster to match the source cluster and ensure that messages replicated from the source cluster use the same partition in the destination cluster. `TOPIC_REGEX` : A regular expression that matches the names of the topics to be replicated. Any topic that matches this expression (or is listed in the whitelist) and not in the blacklist will be replicated. `TOPIC_RENAME_FORMAT` : A format string for the topic name in the destination cluster, which may contain `${topic}` as a placeholder for the originating topic name. `TOPIC_TIMESTAMP_TYPE` : The timestamp type for the topics in the destination cluster. ## Overview The systemd service unit files are included in the [RPM](rhel-centos.md#systemd-rhel-centos-install) and [Debian packages](deb-ubuntu.md#systemd-ubuntu-debian-install) for the following Confluent Platform components: - Apache Kafka® (`kafka`) - Kafka Connect (`kafka-connect`) - Confluent REST Proxy (`kafka-rest`) - ksqlDB (`ksql`) - Schema Registry (`schema-registry`) Each component runs under its own user and a common `confluent` group, which are set up during package installation. This configuration ensures proper security separation between components that are running on the same system. The usernames are prefixed with `cp-` followed by the component name. For example, `cp-kafka` and `cp-schema-registry`. For components with persistent storage, such as Kafka, the default component configuration file points to a component-specific data directory under `/var/lib/`. For example, Kafka points to `/var/lib/kafka`. ## Hardware The following table lists machine recommendations for installing individual Confluent Platform components. Confluent Platform supports both ARM64 and X86 hardware architectures. ARM64 is supported in Confluent Platform 7.6.0 and later. For consistent and optimal performance in your Confluent Platform cluster, ensure all cluster nodes have identical hardware specifications. This includes CPU type and core count, RAM capacity and speed, and storage type with matching performance characteristics like throughput and IOPS. Varying hardware among nodes can cause performance bottlenecks, uneven workload distribution, and overall cluster instability. Maintaining identical hardware across all nodes is essential for high availability and reliable performance in your Confluent Platform deployment. Note that the recommended CPU resource is the same for all platforms. For example, if 12 CPUs are recommended for a non-Kubernetes environment, the recommendation for a Kubernetes environment would also be 12 CPU units. The following table lists hardware recommendations. Confluent Platform is used for a wide range of use cases and on many different machines. These recommendations provide a good starting point based on the experiences of Confluent with production clusters, but actual requirements depend on your specific workload. 
| Component | Nodes | Storage | Memory | CPU | |---------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------|-------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------| | Control Center-Normal mode, see [System Requirements](https://docs.confluent.io/control-center/current/installation/system-requirements.html) | 1 | 200 GB, preferably SSDs | Minimum 8 GB RAM | 4 cores or more | | Control Center-Reduced infrastructure mode, see [System Requirements](https://docs.confluent.io/control-center/current/installation/system-requirements.html) | 1 | 128 GB, preferably SSDs | 8 GB RAM | 4 cores or more | | Control Center (Legacy)-Normal mode | 1 | 300 GB, preferably SSDs | 32 GB RAM (JVM default 6 GB) | 12 cores or more | | Control Center (Legacy)-Reduced infrastructure mode | 1 | 128 GB, preferably SSDs | 8 GB RAM (JVM default 4 GB) | 4 cores or more | | Broker | 3 | - 12 X 1 TB disk. RAID 10 is optional - Separate OS disks from Apache Kafka® storage | 64 GB RAM | 24 cores | | KRaft controller | 3-5 | 64 GB SSD | 4 GB RAM | 4 cores | | Confluent Manager for Apache Flink | 1 Kubernetes pod | 10 GB (Kubernetes persistent volume) | 4 GB RAM (For managing 150 Flink applications) | 3 cores (For managing 150 Flink applications) | | Connect | 2 | Storage is only required at installation time. | 0.5 - 4 GB heap size depending on connectors | Typically not CPU-bound. More cores is better than faster cores. | | Replicator- Same as Connect for nodes, storage, memory, and CPU. (See note that follows about AWS.) | 2 | Storage is only required at installation time. | 0.5 - 4 GB heap size | More cores is better | | ksqlDB - See [Capacity planning](../ksqldb/operate-and-deploy/capacity-planning.md#ksqldb-operate-capacity-planning-ksqldb-resources) | 2 | Use SSD. Sizing depends on the number of concurrent queries and the aggregation performed. Minimum 100 GB for a basic server. | 20 GB RAM | 4 cores | | REST Proxy | 2 | Storage is only required at installation time. | 1 GB overhead plus 64 MB per producer and 16 MB per consumer | 16 cores to handle HTTP requests in parallel and background threads for consumers and producers. | | Schema Registry | 2 | Storage is only required at installation time. | 1 GB heap size | Typically not CPU-bound. More cores is better than faster cores. | * If you want to use RAID disks, the recommendation is: * RAID 1 and RAID 10: Preferred * RAID 0: 2nd preferred * RAID 5: Not recommended ## Step 2: Upgrade Confluent Platform components In this step, you will upgrade the Confluent Platform components. For a [rolling upgrade](upgrade.md#rolling-upgrade), you can do this on one server at a time while the cluster continues to run. The details depend on your environment, but the steps to upgrade components are the same. You should always upgrade Confluent Control Center as the final Confluent Platform component. Upgrade steps: 1. Stop the Confluent Platform components. 2. Back up configuration files, for example in `./etc/kafka`. 3. Remove existing packages and their dependencies. 4. Install new packages. 5. Restart the Confluent Platform components. 
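As an illustration only, the steps above might look like the following on a single RPM-based broker host. The service and package name `confluent-kafka` is an assumption; confirm the systemd units and packages that are actually installed on your hosts before running anything similar.

```bash
# Hedged sketch of a rolling upgrade on one RPM-based host (names are assumptions).
sudo systemctl stop confluent-kafka                 # 1. Stop the component
sudo cp -r /etc/kafka /root/kafka-config-backup     # 2. Back up configuration files
sudo yum remove -y confluent-kafka                  # 3. Remove the existing package
sudo yum install -y confluent-kafka                 # 4. Install the new package from the updated repository
sudo systemctl start confluent-kafka                # 5. Restart the component
```

Repeat the same sequence on the next host only after the upgraded component has rejoined the cluster and is healthy.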
For details on how to upgrade different package types, see the following sections: - [Upgrade DEB packages using APT](upgrade.md#upgrade-deb-packages) - [Upgrade RPM packages by using YUM](upgrade.md#upgrade-rpm-packages) - [Upgrade using TAR or ZIP archives](upgrade.md#upgrade-tar-zip-archives) For details on how to upgrade individual Confluent Platform components, see the following sections: - [Upgrade Schema Registry](upgrade.md#upgrade-sr) - [Upgrade Confluent REST Proxy](upgrade.md#upgrade-rest-proxy) - [Upgrade Kafka Streams applications](upgrade.md#upgrade-kafka-streams) - [Upgrade Kafka Connect](upgrade.md#upgrade-connect) The [Confluent Replicator](../multi-dc-deployments/replicator/index.md#replicator-detail) version must match the Connect version it is deployed on. For example, Replicator 8.1 should only be deployed to Connect 8.1, so if you upgrade Connect, you must upgrade Replicator. - [Upgrade ksqlDB](upgrade.md#upgrade-ksqldb) - [Upgrade Control Center](https://docs.confluent.io/control-center/current/installation/upgrade.html) ## Hardware If you have followed the normal development path, you have tried Apache Kafka® on your laptop or on a small cluster of machines. But when it comes time to deploying Kafka to production, there are a few recommendations that you should consider. The following table lists hardware recommendations. Nothing is a hard-and-fast rule; Kafka is used for a wide range of use cases and on a lot of different machines. These recommendations provide a good starting point based on the experiences of Confluent with production clusters. | Component | Nodes | Storage | Memory | CPU | |---------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------|-------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------| | Control Center-Normal mode, see [System Requirements](https://docs.confluent.io/control-center/current/installation/system-requirements.html) | 1 | 200 GB, preferably SSDs | Minimum 8 GB RAM | 4 cores or more | | Control Center-Reduced infrastructure mode, see [System Requirements](https://docs.confluent.io/control-center/current/installation/system-requirements.html) | 1 | 128 GB, preferably SSDs | 8 GB RAM | 4 cores or more | | Control Center (Legacy)-Normal mode | 1 | 300 GB, preferably SSDs | 32 GB RAM (JVM default 6 GB) | 12 cores or more | | Control Center (Legacy)-Reduced infrastructure mode | 1 | 128 GB, preferably SSDs | 8 GB RAM (JVM default 4 GB) | 4 cores or more | | Broker | 3 | - 12 X 1 TB disk. RAID 10 is optional - Separate OS disks from Apache Kafka® storage | 64 GB RAM | 24 cores | | KRaft controller | 3-5 | 64 GB SSD | 4 GB RAM | 4 cores | | Confluent Manager for Apache Flink | 1 Kubernetes pod | 10 GB (Kubernetes persistent volume) | 4 GB RAM (For managing 150 Flink applications) | 3 cores (For managing 150 Flink applications) | | Connect | 2 | Storage is only required at installation time. | 0.5 - 4 GB heap size depending on connectors | Typically not CPU-bound. More cores is better than faster cores. | | Replicator- Same as Connect for nodes, storage, memory, and CPU. (See note that follows about AWS.) | 2 | Storage is only required at installation time. 
| 0.5 - 4 GB heap size | More cores is better | | ksqlDB - See [Capacity planning](../ksqldb/operate-and-deploy/capacity-planning.md#ksqldb-operate-capacity-planning-ksqldb-resources) | 2 | Use SSD. Sizing depends on the number of concurrent queries and the aggregation performed. Minimum 100 GB for a basic server. | 20 GB RAM | 4 cores | | REST Proxy | 2 | Storage is only required at installation time. | 1 GB overhead plus 64 MB per producer and 16 MB per consumer | 16 cores to handle HTTP requests in parallel and background threads for consumers and producers. | | Schema Registry | 2 | Storage is only required at installation time. | 1 GB heap size | Typically not CPU-bound. More cores is better than faster cores. | * If you want to use RAID disks, the recommendation is: * RAID 1 and RAID 10: Preferred * RAID 0: 2nd preferred * RAID 5: Not recommended # Monitoring Kafka with JMX in Confluent Platform Confluent Platform is a data-streaming platform that completes Kafka with advanced capabilities designed to help accelerate application development and connectivity for enterprise use cases. This topic describes the Java Management Extensions (JMX) and Managed Beans (MBeans) that are enabled by default for Kafka and Confluent Platform to enable monitoring of your Kafka applications. The next several sections describe how to configure JMX, how to verify that you have configured it correctly, and list MBeans by Confluent Platform component. Note that features that are not enabled in your deployment will not generate MBeans. You can [search for a metric by name](#search-for-metric). You can also browse metrics by category: - [Broker metrics](#kafka-monitoring-metrics-broker) - [KRaft broker metrics](#kraft-broker-metrics) - [KRaft Quorum metrics](#kraft-quorum-metrics) - [Controller metrics](#controller-metrics) - [Log metrics](#log-metrics) - [Network metrics](#network-metrics) - [Producer metrics](#kafka-monitoring-metrics-producer) - [Consumer metrics](#kafka-monitoring-metrics-consumer) - [Consumer group metrics](#kafka-monitoring-metrics-consumer-group) - [Audit metrics](#audit-metrics) - [Authorizer metrics](#authorizer-metrics) - [RBAC and LDAP metrics](#rbac-and-ldap-health-metrics) To monitor these metrics with Docker, see [Monitoring with Docker Deployments](../installation/docker/operations/monitoring.md#use-jmx-monitor-docker-deployments). Find metrics for specific Confluent Platform and Kafka features in the following topics: - [Cluster linking metrics](../multi-dc-deployments/cluster-linking/metrics.md#cluster-linking-metrics) - [Connect metrics](../connect/monitoring.md#connect-monitoring-config-connectors) - [Kafka Streams metrics](../streams/monitoring.md#streams-monitoring) Confluent offers some alternatives to using JMX monitoring. - **Confluent Control Center**: You can deploy [Control Center](/control-center/current/overview.html) for out-of-the-box Kafka cluster monitoring so you don’t have to build your own monitoring system. - **Health+**: Consider monitoring and managing your environment with [Monitor Confluent Platform with Health+](../health-plus/index.md#health-plus). Ensure the health of your clusters and minimize business disruption with intelligent alerts, monitoring, and proactive support based on best practices created by the inventors of Kafka. #### IMPORTANT Secrets `config.providers` do not propagate to prefixes such as `client.*`. Thus, when using prefixes with secrets, you must specify `config.providers` and `config.providers.securepass.class`. 
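For example, a REST Proxy properties file that uses the `client.` prefix with secrets might repeat the provider settings under the prefix, as in the following sketch. The provider class name, file path, and property names shown here are illustrative assumptions; verify them against the secrets documentation for your release.

```properties
# Hedged sketch: repeat the secrets provider settings under the prefix so that
# prefixed client configurations can also resolve ${securepass:...} references.
# Class name and paths are assumptions; check the secrets docs for your release.
config.providers=securepass
config.providers.securepass.class=io.confluent.kafka.security.config.provider.SecurePassConfigProvider
client.config.providers=securepass
client.config.providers.securepass.class=io.confluent.kafka.security.config.provider.SecurePassConfigProvider
client.ssl.key.password=${securepass:/path/to/security.properties:kafka-rest.properties/client.ssl.key.password}
```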
Refer to [Using prefixes in secrets configurations](../security/compliance/secrets/overview.md#secrets-prefixes) for details. | Security Configuration | Prefix | Where to Configure | |-------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------| | Audit logging | `confluent.security.event.` | `etc/kafka/server.properties` | | Broker | none | `etc/kafka/server.properties` | | Broker LDAP configurations | `ldap.` | `etc/kafka/server.properties` | | Broker Metadata Service (MDS) back-end configurations | `confluent.metadata.` | `etc/kafka/server.properties` | | Metadata Service (MDS) configurations | `confluent.metadata.server.` | `etc/kafka/server.properties` | | Console Clients | none | `client properties` (for example, `producer.config` or `consumer.config`) | | Connect workers | none, `producer.`, `consumer.`, or `admin.` | `etc/kafka/connect-distributed.properties` | | Control Center | `confluent.controlcenter.streams.` `confluent.controlcenter.connect.` `confluent.controlcenter.ksql.` | `etc/confluent-control-center/control-center.properties` | | Java Clients | Java clients use static parameters defined in the Javadoc: - [SSL](/platform/current/clients/javadocs/javadoc/org/apache/kafka/common/config/SslConfigs.html) - [SASL](/platform/current/clients/javadocs/javadoc/org/apache/kafka/common/config/SaslConfigs.html) | SslConfigs or SaslConfigs in Properties class | | Metrics Reporter | `confluent.metrics.reporter.` | `etc/kafka/server.properties` | | Rebalancer | `confluent.rebalancer.metrics.` | Pass configuration (e.g. `rebalance-metrics-client.properties`) using `--config-file` | | Replicator | - `dest.kafka.` - `src.kafka.` | connector JSON file (not the worker properties file) | | REST Proxy | `client.` | `etc/kafka/kafka-rest.properties` | | Schema Registry | `kafkastore.` | `etc/schema-registry/schema-registry.properties` | ### **ReplicaStatus** ```bash /clusters/{cluster_id}/topics/-/partitions/-/replica-status /clusters/{cluster_id}/topics/{topic_name}/partitions/-/replica-status /clusters/{cluster_id}/topics/{topic_name}/partitions/{partition_id}/replica-status ``` REST that runs with a Confluent Server deployment provides the full set of REST APIs. REST that runs in a Standalone deployment consists of the open-source Kafka REST APIs only. For more information about the open-source Kafka REST APIs available, see [Kafka REST Proxy](https://github.com/confluentinc/kafka-rest#kafka-rest-proxy) and the [openapi yaml](https://github.com/confluentinc/kafka-rest/blob/master/api/v3/openapi.yaml). When using the API in Confluent Server, all paths should be prefixed with `/kafka` as opposed to Standalone REST Proxy. For example, the path to list clusters is: * Confluent Server: `/kafka/v3/clusters` * Standalone REST Proxy: `/v3/clusters` Confluent Server provides an embedded instance of these APIs on the Kafka brokers for the v3 Admin API. The embedded APIs run on the Confluent HTTP service, `confluent.http.server.listeners`. Therefore, if you have the HTTP server running, the REST Proxy v3 API is automatically available to you through the brokers. 
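As a concrete illustration of the path difference, listing clusters might look like the following. The ports are assumptions (`8090` for the Confluent HTTP service and `8082` for a standalone REST Proxy); use whatever listeners your deployment configures.

```bash
# Embedded v3 API on Confluent Server (Confluent HTTP service): note the /kafka prefix.
curl http://localhost:8090/kafka/v3/clusters

# Standalone REST Proxy: no /kafka prefix.
curl http://localhost:8082/v3/clusters
```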
Note that the [Metadata Server (MDS)](../security/authorization/rbac/mds-api.md#mds-api) is also running on the Confluent HTTP service, as another endpoint available to you with additional configurations. ### GET /clusters/{cluster_id} **Get Cluster** [![Generally Available](https://img.shields.io/badge/Lifecycle%20Stage-Generally%20Available-%2345c6e8)](#section/Versioning/API-Lifecycle-Policy) Return the Kafka cluster with the specified `cluster_id`. * **Parameters:** * **cluster_id** (*string*) – The Kafka cluster ID. **Example request:** ```http GET /clusters/{cluster_id} HTTP/1.1 Host: example.com ``` * **Status Codes:** * [200 OK](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.1) – The Kafka cluster. **Example response:** ```http HTTP/1.1 200 OK Content-Type: application/json { "kind": "KafkaCluster", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1", "resource_name": "crn:///kafka=cluster-1" }, "cluster_id": "cluster-1", "controller": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/brokers/1" }, "acls": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/acls" }, "brokers": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/brokers" }, "broker_configs": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/broker-configs" }, "consumer_groups": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/consumer-groups" }, "topics": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/topics" }, "partition_reassignments": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/topics/-/partitions/-/reassignment" } } ``` * [400 Bad Request](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.1) – Indicates a bad request error. It could be caused by an unexpected request body format or other forms of request validation failure. **bad_request_cannot_deserialize:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 400, "message": "Cannot deserialize value of type `java.lang.Integer` from String \"A\": not a valid `java.lang.Integer` value" } ``` **unsupported_version_exception:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 40035, "message": "The version of this API is not supported in the underlying Kafka cluster." } ``` * [401 Unauthorized](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.2) – Indicates a client authentication error. Kafka authentication failures will contain error code 40101 in the response body. **kafka_authentication_failed:** ```http HTTP/1.1 401 Unauthorized Content-Type: application/json { "error_code": 40101, "message": "Authentication failed" } ``` * [403 Forbidden](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.4) – Indicates a client authorization error. Kafka authorization failures will contain error code 40301 in the response body. **kafka_authorization_failed:** ```http HTTP/1.1 403 Forbidden Content-Type: application/json { "error_code": 40301, "message": "Request is not authorized" } ``` * [429 Too Many Requests](https://www.rfc-editor.org/rfc/rfc6585#section-4) – Indicates that a rate limit threshold has been reached, and the client should retry again later. 
**Example response:** ```http HTTP/1.1 429 Too Many Requests Content-Type: text/html { "description": "A sample response from Jetty's DoSFilter.", "value": " Error 429 Too Many Requests

HTTP ERROR 429 Too Many Requests URI: /v3/clusters/my-cluster STATUS: 429 MESSAGE: Too Many Requests SERVLET: default
" } ``` * *5XX* – A server-side problem that might not be addressable from the client side. Retriable Kafka errors will contain error code 50003 in the response body. **generic_internal_server_error:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 500, "message": "Internal Server Error" } ``` **produce_v3_missing_schema:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 50002, "message": "Error when fetching latest schema version. subject = my-topic" } ``` ### PUT /clusters/{cluster_id}/broker-configs/{name} **Update Dynamic Broker Config** [![Generally Available](https://img.shields.io/badge/Lifecycle%20Stage-Generally%20Available-%2345c6e8)](#section/Versioning/API-Lifecycle-Policy) Update the dynamic cluster-wide broker configuration parameter specified by `name`. * **Parameters:** * **cluster_id** (*string*) – The Kafka cluster ID. * **name** (*string*) – The configuration parameter name. **Example request:** ```http PUT /clusters/{cluster_id}/broker-configs/{name} HTTP/1.1 Host: example.com Content-Type: application/json { "value": "gzip" } ``` * **Status Codes:** * [204 No Content](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.5) – No Content * [400 Bad Request](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.1) – Indicates a bad request error. It could be caused by an unexpected request body format or other forms of request validation failure. **bad_request_cannot_deserialize:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 400, "message": "Cannot deserialize value of type `java.lang.Integer` from String \"A\": not a valid `java.lang.Integer` value" } ``` **unsupported_version_exception:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 40035, "message": "The version of this API is not supported in the underlying Kafka cluster." } ``` * [401 Unauthorized](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.2) – Indicates a client authentication error. Kafka authentication failures will contain error code 40101 in the response body. **kafka_authentication_failed:** ```http HTTP/1.1 401 Unauthorized Content-Type: application/json { "error_code": 40101, "message": "Authentication failed" } ``` * [403 Forbidden](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.4) – Indicates a client authorization error. Kafka authorization failures will contain error code 40301 in the response body. **kafka_authorization_failed:** ```http HTTP/1.1 403 Forbidden Content-Type: application/json { "error_code": 40301, "message": "Request is not authorized" } ``` * [429 Too Many Requests](https://www.rfc-editor.org/rfc/rfc6585#section-4) – Indicates that a rate limit threshold has been reached, and the client should retry again later. **Example response:** ```http HTTP/1.1 429 Too Many Requests Content-Type: text/html { "description": "A sample response from Jetty's DoSFilter.", "value": " Error 429 Too Many Requests

HTTP ERROR 429 Too Many Requests URI: /v3/clusters/my-cluster STATUS: 429 MESSAGE: Too Many Requests SERVLET: default
" } ``` * *5XX* – A server-side problem that might not be addressable from the client side. Retriable Kafka errors will contain error code 50003 in the response body. **generic_internal_server_error:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 500, "message": "Internal Server Error" } ``` **produce_v3_missing_schema:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 50002, "message": "Error when fetching latest schema version. subject = my-topic" } ``` ### DELETE /clusters/{cluster_id}/broker-configs/{name} **Reset Dynamic Broker Config** [![Generally Available](https://img.shields.io/badge/Lifecycle%20Stage-Generally%20Available-%2345c6e8)](#section/Versioning/API-Lifecycle-Policy) Reset the configuration parameter specified by `name` to its default value by deleting a dynamic cluster-wide configuration. * **Parameters:** * **cluster_id** (*string*) – The Kafka cluster ID. * **name** (*string*) – The configuration parameter name. * **Status Codes:** * [204 No Content](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.5) – No Content * [400 Bad Request](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.1) – Indicates a bad request error. It could be caused by an unexpected request body format or other forms of request validation failure. **bad_request_cannot_deserialize:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 400, "message": "Cannot deserialize value of type `java.lang.Integer` from String \"A\": not a valid `java.lang.Integer` value" } ``` **unsupported_version_exception:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 40035, "message": "The version of this API is not supported in the underlying Kafka cluster." } ``` * [401 Unauthorized](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.2) – Indicates a client authentication error. Kafka authentication failures will contain error code 40101 in the response body. **kafka_authentication_failed:** ```http HTTP/1.1 401 Unauthorized Content-Type: application/json { "error_code": 40101, "message": "Authentication failed" } ``` * [403 Forbidden](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.4) – Indicates a client authorization error. Kafka authorization failures will contain error code 40301 in the response body. **kafka_authorization_failed:** ```http HTTP/1.1 403 Forbidden Content-Type: application/json { "error_code": 40301, "message": "Request is not authorized" } ``` * [429 Too Many Requests](https://www.rfc-editor.org/rfc/rfc6585#section-4) – Indicates that a rate limit threshold has been reached, and the client should retry again later. **Example response:** ```http HTTP/1.1 429 Too Many Requests Content-Type: text/html { "description": "A sample response from Jetty's DoSFilter.", "value": " Error 429 Too Many Requests

HTTP ERROR 429 Too Many Requests URI: /v3/clusters/my-cluster STATUS: 429 MESSAGE: Too Many Requests SERVLET: default
" } ``` * *5XX* – A server-side problem that might not be addressable from the client side. Retriable Kafka errors will contain error code 50003 in the response body. **generic_internal_server_error:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 500, "message": "Internal Server Error" } ``` **produce_v3_missing_schema:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 50002, "message": "Error when fetching latest schema version. subject = my-topic" } ``` ### PUT /clusters/{cluster_id}/brokers/{broker_id}/configs/{name} **Update Broker Config** [![Generally Available](https://img.shields.io/badge/Lifecycle%20Stage-Generally%20Available-%2345c6e8)](#section/Versioning/API-Lifecycle-Policy) Update the configuration parameter specified by `name`. * **Parameters:** * **cluster_id** (*string*) – The Kafka cluster ID. * **broker_id** (*integer*) – The Kafka broker ID. * **name** (*string*) – The configuration parameter name. **Example request:** ```http PUT /clusters/{cluster_id}/brokers/{broker_id}/configs/{name} HTTP/1.1 Host: example.com Content-Type: application/json { "value": "gzip" } ``` * **Status Codes:** * [204 No Content](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.5) – No Content * [400 Bad Request](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.1) – Indicates a bad request error. It could be caused by an unexpected request body format or other forms of request validation failure. **bad_request_cannot_deserialize:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 400, "message": "Cannot deserialize value of type `java.lang.Integer` from String \"A\": not a valid `java.lang.Integer` value" } ``` **unsupported_version_exception:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 40035, "message": "The version of this API is not supported in the underlying Kafka cluster." } ``` * [401 Unauthorized](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.2) – Indicates a client authentication error. Kafka authentication failures will contain error code 40101 in the response body. **kafka_authentication_failed:** ```http HTTP/1.1 401 Unauthorized Content-Type: application/json { "error_code": 40101, "message": "Authentication failed" } ``` * [403 Forbidden](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.4) – Indicates a client authorization error. Kafka authorization failures will contain error code 40301 in the response body. **kafka_authorization_failed:** ```http HTTP/1.1 403 Forbidden Content-Type: application/json { "error_code": 40301, "message": "Request is not authorized" } ``` * [429 Too Many Requests](https://www.rfc-editor.org/rfc/rfc6585#section-4) – Indicates that a rate limit threshold has been reached, and the client should retry again later. **Example response:** ```http HTTP/1.1 429 Too Many Requests Content-Type: text/html { "description": "A sample response from Jetty's DoSFilter.", "value": " Error 429 Too Many Requests

HTTP ERROR 429 Too Many Requests URI: /v3/clusters/my-cluster STATUS: 429 MESSAGE: Too Many Requests SERVLET: default
" } ``` * *5XX* – A server-side problem that might not be addressable from the client side. Retriable Kafka errors will contain error code 50003 in the response body. **generic_internal_server_error:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 500, "message": "Internal Server Error" } ``` **produce_v3_missing_schema:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 50002, "message": "Error when fetching latest schema version. subject = my-topic" } ``` ### DELETE /clusters/{cluster_id}/brokers/{broker_id}/configs/{name} **Reset Broker Config** [![Generally Available](https://img.shields.io/badge/Lifecycle%20Stage-Generally%20Available-%2345c6e8)](#section/Versioning/API-Lifecycle-Policy) Reset the configuration parameter specified by `name` to its default value. * **Parameters:** * **cluster_id** (*string*) – The Kafka cluster ID. * **broker_id** (*integer*) – The Kafka broker ID. * **name** (*string*) – The configuration parameter name. * **Status Codes:** * [204 No Content](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.5) – No Content * [400 Bad Request](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.1) – Indicates a bad request error. It could be caused by an unexpected request body format or other forms of request validation failure. **bad_request_cannot_deserialize:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 400, "message": "Cannot deserialize value of type `java.lang.Integer` from String \"A\": not a valid `java.lang.Integer` value" } ``` **unsupported_version_exception:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 40035, "message": "The version of this API is not supported in the underlying Kafka cluster." } ``` * [401 Unauthorized](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.2) – Indicates a client authentication error. Kafka authentication failures will contain error code 40101 in the response body. **kafka_authentication_failed:** ```http HTTP/1.1 401 Unauthorized Content-Type: application/json { "error_code": 40101, "message": "Authentication failed" } ``` * [403 Forbidden](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.4) – Indicates a client authorization error. Kafka authorization failures will contain error code 40301 in the response body. **kafka_authorization_failed:** ```http HTTP/1.1 403 Forbidden Content-Type: application/json { "error_code": 40301, "message": "Request is not authorized" } ``` * [429 Too Many Requests](https://www.rfc-editor.org/rfc/rfc6585#section-4) – Indicates that a rate limit threshold has been reached, and the client should retry again later. **Example response:** ```http HTTP/1.1 429 Too Many Requests Content-Type: text/html { "description": "A sample response from Jetty's DoSFilter.", "value": " Error 429 Too Many Requests

HTTP ERROR 429 Too Many Requests URI: /v3/clusters/my-cluster STATUS: 429 MESSAGE: Too Many Requests SERVLET: default
" } ``` * *5XX* – A server-side problem that might not be addressable from the client side. Retriable Kafka errors will contain error code 50003 in the response body. **generic_internal_server_error:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 500, "message": "Internal Server Error" } ``` **produce_v3_missing_schema:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 50002, "message": "Error when fetching latest schema version. subject = my-topic" } ``` ### POST /clusters/{cluster_id}/topics/{topic_name}/configs:alter **Batch Alter Topic Configs** [![Generally Available](https://img.shields.io/badge/Lifecycle%20Stage-Generally%20Available-%2345c6e8)](#section/Versioning/API-Lifecycle-Policy) Update or delete a set of topic configuration parameters. Also supports a dry-run mode that only validates whether the operation would succeed if the `validate_only` request property is explicitly specified and set to true. * **Parameters:** * **cluster_id** (*string*) – The Kafka cluster ID. * **topic_name** (*string*) – The topic name. **batch_alter_topic_configs:** ```http POST /clusters/{cluster_id}/topics/{topic_name}/configs:alter HTTP/1.1 Host: example.com Content-Type: application/json { "data": [ { "name": "cleanup.policy", "operation": "DELETE" }, { "name": "compression.type", "value": "gzip" } ] } ``` **validate_only_batch_alter_topic_configs:** ```http POST /clusters/{cluster_id}/topics/{topic_name}/configs:alter HTTP/1.1 Host: example.com Content-Type: application/json { "data": [ { "name": "cleanup.policy", "operation": "DELETE" }, { "name": "compression.type", "value": "gzip" } ], "validate_only": true } ``` * **Status Codes:** * [204 No Content](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.5) – No Content * [400 Bad Request](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.1) – Indicates a bad request error. It could be caused by an unexpected request body format or other forms of request validation failure. **bad_request_cannot_deserialize:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 400, "message": "Cannot deserialize value of type `java.lang.Integer` from String \"A\": not a valid `java.lang.Integer` value" } ``` **unsupported_version_exception:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 40035, "message": "The version of this API is not supported in the underlying Kafka cluster." } ``` * [401 Unauthorized](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.2) – Indicates a client authentication error. Kafka authentication failures will contain error code 40101 in the response body. **kafka_authentication_failed:** ```http HTTP/1.1 401 Unauthorized Content-Type: application/json { "error_code": 40101, "message": "Authentication failed" } ``` * [403 Forbidden](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.4) – Indicates a client authorization error. Kafka authorization failures will contain error code 40301 in the response body. **kafka_authorization_failed:** ```http HTTP/1.1 403 Forbidden Content-Type: application/json { "error_code": 40301, "message": "Request is not authorized" } ``` * [404 Not Found](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.5) – Indicates attempted access to an unreachable or non-existing resource like e.g. an unknown topic or partition. GET requests to endpoints not allowed in the accesslists will also result in this response. 
**endpoint_not_found:** ```http HTTP/1.1 404 Not Found Content-Type: application/json { "error_code": 404, "message": "HTTP 404 Not Found" } ``` **cluster_not_found:** ```http HTTP/1.1 404 Not Found Content-Type: application/json { "error_code": 404, "message": "Cluster my-cluster cannot be found." } ``` **unknown_topic_or_partition:** ```http HTTP/1.1 404 Not Found Content-Type: application/json { "error_code": 40403, "message": "This server does not host this topic-partition." } ``` * [429 Too Many Requests](https://www.rfc-editor.org/rfc/rfc6585#section-4) – Indicates that a rate limit threshold has been reached, and the client should retry again later. **Example response:** ```http HTTP/1.1 429 Too Many Requests Content-Type: text/html { "description": "A sample response from Jetty's DoSFilter.", "value": " Error 429 Too Many Requests

HTTP ERROR 429 Too Many Requests URI: /v3/clusters/my-cluster STATUS: 429 MESSAGE: Too Many Requests SERVLET: default
" } ``` * *5XX* – A server-side problem that might not be addressable from the client side. Retriable Kafka errors will contain error code 50003 in the response body. **generic_internal_server_error:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 500, "message": "Internal Server Error" } ``` **produce_v3_missing_schema:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 50002, "message": "Error when fetching latest schema version. subject = my-topic" } ``` ### GET /clusters/{cluster_id}/topics/-/configs **List All Topic Configs** [![Generally Available](https://img.shields.io/badge/Lifecycle%20Stage-Generally%20Available-%2345c6e8)](#section/Versioning/API-Lifecycle-Policy) Return the list of configuration parameters for all topics hosted by the specified cluster. * **Parameters:** * **cluster_id** (*string*) – The Kafka cluster ID. **Example request:** ```http GET /clusters/{cluster_id}/topics/-/configs HTTP/1.1 Host: example.com ``` * **Status Codes:** * [200 OK](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.1) – The list of cluster configs. **Example response:** ```http HTTP/1.1 200 OK Content-Type: application/json { "kind": "KafkaTopicConfigList", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/topics/topic-1/configs", "next": null }, "data": [ { "kind": "KafkaTopicConfig", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/topics/topic-1/configs/cleanup.policy", "resource_name": "crn:///kafka=cluster-1/topic=topic-1/config=cleanup.policy" }, "cluster_id": "cluster-1", "topic_name": "topic-1", "name": "cleanup.policy", "value": "compact", "is_default": false, "is_read_only": false, "is_sensitive": false, "source": "DYNAMIC_TOPIC_CONFIG", "synonyms": [ { "name": "cleanup.policy", "value": "compact", "source": "DYNAMIC_TOPIC_CONFIG" }, { "name": "cleanup.policy", "value": "delete", "source": "DEFAULT_CONFIG" } ] }, { "kind": "KafkaTopicConfig", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/topics/topic-1/configs/compression.type", "resource_name": "crn:///kafka=cluster-1/topic=topic-1/config=compression.type" }, "cluster_id": "cluster-1", "topic_name": "topic-1", "name": "compression.type", "value": "gzip", "is_default": false, "is_read_only": false, "is_sensitive": false, "source": "DYNAMIC_TOPIC_CONFIG", "synonyms": [ { "name": "compression.type", "value": "gzip", "source": "DYNAMIC_TOPIC_CONFIG" }, { "name": "compression.type", "value": "producer", "source": "DEFAULT_CONFIG" } ] } ] } ``` * [400 Bad Request](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.1) – Indicates a bad request error. It could be caused by an unexpected request body format or other forms of request validation failure. **bad_request_cannot_deserialize:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 400, "message": "Cannot deserialize value of type `java.lang.Integer` from String \"A\": not a valid `java.lang.Integer` value" } ``` **unsupported_version_exception:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 40035, "message": "The version of this API is not supported in the underlying Kafka cluster." } ``` * [401 Unauthorized](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.2) – Indicates a client authentication error. 
Kafka authentication failures will contain error code 40101 in the response body. **kafka_authentication_failed:** ```http HTTP/1.1 401 Unauthorized Content-Type: application/json { "error_code": 40101, "message": "Authentication failed" } ``` * [403 Forbidden](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.4) – Indicates a client authorization error. Kafka authorization failures will contain error code 40301 in the response body. **kafka_authorization_failed:** ```http HTTP/1.1 403 Forbidden Content-Type: application/json { "error_code": 40301, "message": "Request is not authorized" } ``` * [429 Too Many Requests](https://www.rfc-editor.org/rfc/rfc6585#section-4) – Indicates that a rate limit threshold has been reached, and the client should retry again later. **Example response:** ```http HTTP/1.1 429 Too Many Requests Content-Type: text/html { "description": "A sample response from Jetty's DoSFilter.", "value": " Error 429 Too Many Requests

HTTP ERROR 429 Too Many Requests URI: /v3/clusters/my-cluster STATUS: 429 MESSAGE: Too Many Requests SERVLET: default
" } ``` * *5XX* – A server-side problem that might not be addressable from the client side. Retriable Kafka errors will contain error code 50003 in the response body. **generic_internal_server_error:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 500, "message": "Internal Server Error" } ``` **produce_v3_missing_schema:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 50002, "message": "Error when fetching latest schema version. subject = my-topic" } ``` ### GET /clusters/{cluster_id}/brokers/{broker_id} **Get Broker** [![Generally Available](https://img.shields.io/badge/Lifecycle%20Stage-Generally%20Available-%2345c6e8)](#section/Versioning/API-Lifecycle-Policy) Return the broker specified by `broker_id`. * **Parameters:** * **cluster_id** (*string*) – The Kafka cluster ID. * **broker_id** (*integer*) – The Kafka broker ID. **Example request:** ```http GET /clusters/{cluster_id}/brokers/{broker_id} HTTP/1.1 Host: example.com ``` * **Status Codes:** * [200 OK](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.1) – The broker. **Example response:** ```http HTTP/1.1 200 OK Content-Type: application/json { "kind": "KafkaBroker", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/brokers/1", "resource_name": "crn:///kafka=cluster-1/broker=1" }, "cluster_id": "cluster-1", "broker_id": 1, "host": "localhost", "port": 9291, "configs": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/brokers/1/configs" }, "partition_replicas": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/brokers/1/partition-replicas" } } ``` * [400 Bad Request](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.1) – Indicates a bad request error. It could be caused by an unexpected request body format or other forms of request validation failure. **bad_request_cannot_deserialize:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 400, "message": "Cannot deserialize value of type `java.lang.Integer` from String \"A\": not a valid `java.lang.Integer` value" } ``` **unsupported_version_exception:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 40035, "message": "The version of this API is not supported in the underlying Kafka cluster." } ``` * [401 Unauthorized](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.2) – Indicates a client authentication error. Kafka authentication failures will contain error code 40101 in the response body. **kafka_authentication_failed:** ```http HTTP/1.1 401 Unauthorized Content-Type: application/json { "error_code": 40101, "message": "Authentication failed" } ``` * [403 Forbidden](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.4) – Indicates a client authorization error. Kafka authorization failures will contain error code 40301 in the response body. **kafka_authorization_failed:** ```http HTTP/1.1 403 Forbidden Content-Type: application/json { "error_code": 40301, "message": "Request is not authorized" } ``` * [429 Too Many Requests](https://www.rfc-editor.org/rfc/rfc6585#section-4) – Indicates that a rate limit threshold has been reached, and the client should retry again later. **Example response:** ```http HTTP/1.1 429 Too Many Requests Content-Type: text/html { "description": "A sample response from Jetty's DoSFilter.", "value": " Error 429 Too Many Requests

HTTP ERROR 429 Too Many Requests URI: /v3/clusters/my-cluster STATUS: 429 MESSAGE: Too Many Requests SERVLET: default
" } ``` * *5XX* – A server-side problem that might not be addressable from the client side. Retriable Kafka errors will contain error code 50003 in the response body. **generic_internal_server_error:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 500, "message": "Internal Server Error" } ``` **produce_v3_missing_schema:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 50002, "message": "Error when fetching latest schema version. subject = my-topic" } ``` ### DELETE /clusters/{cluster_id}/brokers/{broker_id} **Delete Broker** [![Generally Available](https://img.shields.io/badge/Lifecycle%20Stage-Generally%20Available-%2345c6e8)](#section/Versioning/API-Lifecycle-Policy) Delete the broker that is specified by `broker_id`. * **Parameters:** * **cluster_id** (*string*) – The Kafka cluster ID. * **broker_id** (*integer*) – The Kafka broker ID. * **Query Parameters:** * **should_shutdown** (*boolean*) – To shutdown the broker or not, Default: true * **Status Codes:** * [202 Accepted](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.3) – The single broker removal response **Example response:** ```http HTTP/1.1 202 Accepted Content-Type: application/json { "kind": "KafkaBrokerRemoval", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/brokers/1", "resource_name": "crn:///kafka=cluster-1/broker=1/" }, "cluster_id": "cluster-1", "broker_id": 1, "broker_task": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/brokers/1" }, "broker": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/brokers/1" } } ``` * [400 Bad Request](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.1) – Bad broker or balancer request **IllegalBrokerRemoval:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 400, "message": "Cannot remove broker 1 as there are partitions with replication factor equal to 1 on the broker. One such partition: test_topic_partition_0." } ``` **BalancerOffline:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 400, "message": "The Confluent Balancer component is disabled or not started yet." } ``` * [401 Unauthorized](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.2) – Indicates a client authentication error. Kafka authentication failures will contain error code 40101 in the response body. **kafka_authentication_failed:** ```http HTTP/1.1 401 Unauthorized Content-Type: application/json { "error_code": 40101, "message": "Authentication failed" } ``` * [404 Not Found](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.5) – Broker not found. **Example response:** ```http HTTP/1.1 404 Not Found Content-Type: application/json { "error_code": 404, "message": "Broker not found. Broker: 1 not found in the cluster: cluster-1" } ``` * [429 Too Many Requests](https://www.rfc-editor.org/rfc/rfc6585#section-4) – Indicates that a rate limit threshold has been reached, and the client should retry again later. **Example response:** ```http HTTP/1.1 429 Too Many Requests Content-Type: text/html { "description": "A sample response from Jetty's DoSFilter.", "value": " Error 429 Too Many Requests

HTTP ERROR 429 Too Many Requests URI: /v3/clusters/my-cluster STATUS: 429 MESSAGE: Too Many Requests SERVLET: default
" } ``` * *5XX* – A server-side problem that might not be addressable from the client side. Retriable Kafka errors will contain error code 50003 in the response body. **generic_internal_server_error:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 500, "message": "Internal Server Error" } ``` **produce_v3_missing_schema:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 50002, "message": "Error when fetching latest schema version. subject = my-topic" } ``` ### GET /clusters/{cluster_id}/consumer-groups **List Consumer Groups** [![Generally Available](https://img.shields.io/badge/Lifecycle%20Stage-Generally%20Available-%2345c6e8)](#section/Versioning/API-Lifecycle-Policy) Return the list of consumer groups that belong to the specified Kafka cluster. * **Parameters:** * **cluster_id** (*string*) – The Kafka cluster ID. **Example request:** ```http GET /clusters/{cluster_id}/consumer-groups HTTP/1.1 Host: example.com ``` * **Status Codes:** * [200 OK](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.1) – The list of consumer groups. **Example response:** ```http HTTP/1.1 200 OK Content-Type: application/json { "kind": "KafkaConsumerGroupList", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/consumer-groups", "next": null }, "data": [ { "kind": "KafkaConsumerGroup", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/consumer-groups/consumer-group-1", "resource_name": "crn:///kafka=cluster-1/consumer-group=consumer-group-1" }, "cluster_id": "cluster-1", "consumer_group_id": "consumer-group-1", "is_simple": false, "partition_assignor": "org.apache.kafka.clients.consumer.RoundRobinAssignor", "state": "STABLE", "coordinator": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/brokers/1" }, "consumers": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/consumer-groups/consumer-group-1/consumers" }, "lag_summary": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/consumer-groups/consumer-group-1/lag-summary" } }, { "kind": "KafkaConsumerGroup", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/consumer-groups/consumer-group-2", "resource_name": "crn:///kafka=cluster-1/consumer-group=consumer-group-2" }, "cluster_id": "cluster-1", "consumer_group_id": "consumer-group-2", "is_simple": false, "partition_assignor": "org.apache.kafka.clients.consumer.StickyAssignor", "state": "PREPARING_REBALANCE", "coordinator": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/brokers/2" }, "consumers": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/consumer-groups/consumer-group-2/consumers" }, "lag_summary": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/consumer-groups/consumer-group-2/lag-summary" } }, { "kind": "KafkaConsumerGroup", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/consumer-groups/consumer-group-3", "resource_name": "crn:///kafka=cluster-1/consumer-group=consumer-group-3" }, "cluster_id": "cluster-1", "consumer_group_id": "consumer-group-3", "is_simple": false, "partition_assignor": "org.apache.kafka.clients.consumer.RangeAssignor", "state": "DEAD", "coordinator": { "related": 
"https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/brokers/3" }, "consumers": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/consumer-groups/consumer-group-3/consumers" }, "lag_summary": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/consumer-groups/consumer-group-3/lag-summary" } } ] } ``` * [400 Bad Request](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.1) – Indicates a bad request error. It could be caused by an unexpected request body format or other forms of request validation failure. **bad_request_cannot_deserialize:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 400, "message": "Cannot deserialize value of type `java.lang.Integer` from String \"A\": not a valid `java.lang.Integer` value" } ``` **unsupported_version_exception:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 40035, "message": "The version of this API is not supported in the underlying Kafka cluster." } ``` * [401 Unauthorized](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.2) – Indicates a client authentication error. Kafka authentication failures will contain error code 40101 in the response body. **kafka_authentication_failed:** ```http HTTP/1.1 401 Unauthorized Content-Type: application/json { "error_code": 40101, "message": "Authentication failed" } ``` * [403 Forbidden](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.4) – Indicates a client authorization error. Kafka authorization failures will contain error code 40301 in the response body. **kafka_authorization_failed:** ```http HTTP/1.1 403 Forbidden Content-Type: application/json { "error_code": 40301, "message": "Request is not authorized" } ``` * [429 Too Many Requests](https://www.rfc-editor.org/rfc/rfc6585#section-4) – Indicates that a rate limit threshold has been reached, and the client should retry again later. **Example response:** ```http HTTP/1.1 429 Too Many Requests Content-Type: text/html { "description": "A sample response from Jetty's DoSFilter.", "value": " Error 429 Too Many Requests

HTTP ERROR 429 Too Many Requests URI: /v3/clusters/my-cluster STATUS: 429 MESSAGE: Too Many Requests SERVLET: default
" } ``` * *5XX* – A server-side problem that might not be addressable from the client side. Retriable Kafka errors will contain error code 50003 in the response body. **generic_internal_server_error:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 500, "message": "Internal Server Error" } ``` **produce_v3_missing_schema:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 50002, "message": "Error when fetching latest schema version. subject = my-topic" } ``` ### GET /clusters/{cluster_id}/consumer-groups/{consumer_group_id}/lags **List Consumer Lags** [![Generally Available](https://img.shields.io/badge/Lifecycle%20Stage-Generally%20Available-%2345c6e8)](#section/Versioning/API-Lifecycle-Policy)[![Available in dedicated clusters only](https://img.shields.io/badge/-Available%20in%20dedicated%20clusters%20only-%23bc8540)](https://docs.confluent.io/cloud/current/clusters/cluster-types.html#dedicated-cluster) Return a list of consumer lags of the consumers belonging to the specified consumer group. * **Parameters:** * **cluster_id** (*string*) – The Kafka cluster ID. * **consumer_group_id** (*string*) – The consumer group ID. **Example request:** ```http GET /clusters/{cluster_id}/consumer-groups/{consumer_group_id}/lags HTTP/1.1 Host: example.com ``` * **Status Codes:** * [200 OK](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.1) – The list of consumer lags. **Example response:** ```http HTTP/1.1 200 OK Content-Type: application/json { "kind": "KafkaConsumerLagList", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/consumer-groups/consumer-group-1/lags", "next": null }, "data": [ { "kind": "KafkaConsumerLag", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/consumer-groups/consumer-group-1/lags/topic-1/partitions/1", "resource_name": "crn:///kafka=cluster-1/consumer-group=consumer-group-1/lag=topic-1/partition=1" }, "cluster_id": "cluster-1", "consumer_group_id": "consumer-group-1", "topic_name": "topic-1", "partition_id": 1, "consumer_id": "consumer-1", "instance_id": "consumer-instance-1", "client_id": "client-1", "current_offset": 1, "log_end_offset": 101, "lag": 100 }, { "kind": "KafkaConsumerLag", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/consumer-groups/consumer-group-1/lags/topic-1/partitions/2", "resource_name": "crn:///kafka=cluster-1/consumer-group=consumer-group-1/lag=topic-1/partition=2" }, "cluster_id": "cluster-1", "consumer_group_id": "consumer-group-1", "topic_name": "topic-1", "partition_id": 2, "consumer_id": "consumer-2", "instance_id": "consumer-instance-2", "client_id": "client-2", "current_offset": 1, "log_end_offset": 11, "lag": 10 }, { "kind": "KafkaConsumerLag", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/consumer-groups/consumer-group-1/lags/topic-1/partitions/3", "resource_name": "crn:///kafka=cluster-1/consumer-group=consumer-group-1/lag=topic-1/partition=3" }, "cluster_id": "cluster-1", "consumer_group_id": "consumer-group-1", "topic_name": "topic-1", "partition_id": 3, "consumer_id": "consumer-3", "instance_id": "consumer-instance-3", "client_id": "client-3", "current_offset": 1, "log_end_offset": 1, "lag": 0 } ] } ``` * [400 Bad Request](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.1) – Indicates a bad request error. 
It could be caused by an unexpected request body format or other forms of request validation failure. **bad_request_cannot_deserialize:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 400, "message": "Cannot deserialize value of type `java.lang.Integer` from String \"A\": not a valid `java.lang.Integer` value" } ``` **unsupported_version_exception:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 40035, "message": "The version of this API is not supported in the underlying Kafka cluster." } ``` * [401 Unauthorized](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.2) – Indicates a client authentication error. Kafka authentication failures will contain error code 40101 in the response body. **kafka_authentication_failed:** ```http HTTP/1.1 401 Unauthorized Content-Type: application/json { "error_code": 40101, "message": "Authentication failed" } ``` * [403 Forbidden](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.4) – Indicates a client authorization error. Kafka authorization failures will contain error code 40301 in the response body. **kafka_authorization_failed:** ```http HTTP/1.1 403 Forbidden Content-Type: application/json { "error_code": 40301, "message": "Request is not authorized" } ``` * [429 Too Many Requests](https://www.rfc-editor.org/rfc/rfc6585#section-4) – Indicates that a rate limit threshold has been reached, and the client should retry again later. **Example response:** ```http HTTP/1.1 429 Too Many Requests Content-Type: text/html { "description": "A sample response from Jetty's DoSFilter.", "value": " Error 429 Too Many Requests

HTTP ERROR 429 Too Many Requests

URI: /v3/clusters/my-cluster
STATUS: 429
MESSAGE: Too Many Requests
SERVLET: default
" } ``` * *5XX* – A server-side problem that might not be addressable from the client side. Retriable Kafka errors will contain error code 50003 in the response body. **generic_internal_server_error:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 500, "message": "Internal Server Error" } ``` **produce_v3_missing_schema:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 50002, "message": "Error when fetching latest schema version. subject = my-topic" } ``` ### GET /clusters/{cluster_id}/consumer-groups/{consumer_group_id}/consumers/{consumer_id} **Get Consumer** [![Generally Available](https://img.shields.io/badge/Lifecycle%20Stage-Generally%20Available-%2345c6e8)](#section/Versioning/API-Lifecycle-Policy) Return the consumer specified by the `consumer_id`. * **Parameters:** * **cluster_id** (*string*) – The Kafka cluster ID. * **consumer_group_id** (*string*) – The consumer group ID. * **consumer_id** (*string*) – The consumer ID. **Example request:** ```http GET /clusters/{cluster_id}/consumer-groups/{consumer_group_id}/consumers/{consumer_id} HTTP/1.1 Host: example.com ``` * **Status Codes:** * [200 OK](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.1) – The consumer. **Example response:** ```http HTTP/1.1 200 OK Content-Type: application/json { "kind": "KafkaConsumer", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/consumer-groups/consumer-group-1/consumers/consumer-1", "resource_name": "crn:///kafka=cluster-1/consumer-group=consumer-group-1/consumer=consumer-1" }, "cluster_id": "cluster-1", "consumer_group_id": "consumer-group-1", "consumer_id": "consumer-1", "instance_id": "consumer-instance-1", "client_id": "client-1", "assignments": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/consumer-groups/consumer-group-1/consumers/consumer-1/assignments" } } ``` * [400 Bad Request](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.1) – Indicates a bad request error. It could be caused by an unexpected request body format or other forms of request validation failure. **bad_request_cannot_deserialize:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 400, "message": "Cannot deserialize value of type `java.lang.Integer` from String \"A\": not a valid `java.lang.Integer` value" } ``` **unsupported_version_exception:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 40035, "message": "The version of this API is not supported in the underlying Kafka cluster." } ``` * [401 Unauthorized](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.2) – Indicates a client authentication error. Kafka authentication failures will contain error code 40101 in the response body. **kafka_authentication_failed:** ```http HTTP/1.1 401 Unauthorized Content-Type: application/json { "error_code": 40101, "message": "Authentication failed" } ``` * [403 Forbidden](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.4) – Indicates a client authorization error. Kafka authorization failures will contain error code 40301 in the response body. 
**kafka_authorization_failed:** ```http HTTP/1.1 403 Forbidden Content-Type: application/json { "error_code": 40301, "message": "Request is not authorized" } ``` * [429 Too Many Requests](https://www.rfc-editor.org/rfc/rfc6585#section-4) – Indicates that a rate limit threshold has been reached, and the client should retry again later. **Example response:** ```http HTTP/1.1 429 Too Many Requests Content-Type: text/html { "description": "A sample response from Jetty's DoSFilter.", "value": " Error 429 Too Many Requests

HTTP ERROR 429 Too Many Requests

URI: /v3/clusters/my-cluster
STATUS: 429
MESSAGE: Too Many Requests
SERVLET: default
" } ``` * *5XX* – A server-side problem that might not be addressable from the client side. Retriable Kafka errors will contain error code 50003 in the response body. **generic_internal_server_error:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 500, "message": "Internal Server Error" } ``` **produce_v3_missing_schema:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 50002, "message": "Error when fetching latest schema version. subject = my-topic" } ``` ### GET /clusters/{cluster_id}/consumer-groups/{consumer_group_id}/consumers/{consumer_id}/assignments **List Consumer Assignments** [![Generally Available](https://img.shields.io/badge/Lifecycle%20Stage-Generally%20Available-%2345c6e8)](#section/Versioning/API-Lifecycle-Policy) Return a list of partition assignments for the specified consumer. * **Parameters:** * **cluster_id** (*string*) – The Kafka cluster ID. * **consumer_group_id** (*string*) – The consumer group ID. * **consumer_id** (*string*) – The consumer ID. **Example request:** ```http GET /clusters/{cluster_id}/consumer-groups/{consumer_group_id}/consumers/{consumer_id}/assignments HTTP/1.1 Host: example.com ``` * **Status Codes:** * [200 OK](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.1) – The list of consumer group assignments. **Example response:** ```http HTTP/1.1 200 OK Content-Type: application/json { "kind": "KafkaConsumerAssignmentList", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/consumer-groups/consumer-group-1/consumers/consumer-1/assignments", "next": null }, "data": [ { "kind": "KafkaConsumerAssignment", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/consumer-groups/consumer-group-1/consumers/consumer-1/assignments/topic-1/partitions/1", "resource_name": "crn:///kafka=cluster-1/consumer-group=consumer-group-1/consumer=consumer-1/assignment=topic=1/partition=1" }, "cluster_id": "cluster-1", "consumer_group_id": "consumer-group-1", "consumer_id": "consumer-1", "topic_name": "topic-1", "partition_id": 1, "partition": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/topics/topic-1/partitions/1" }, "lag": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/consumer-groups/consumer-group-1/lags/topic-1/partitions/1" } }, { "kind": "KafkaConsumerAssignment", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/consumer-groups/consumer-group-1/consumers/consumer-1/assignments/topic-2/partitions/2", "resource_name": "crn:///kafka=cluster-1/consumer-group=consumer-group-1/consumer=consumer-1/assignment=topic=2/partition=2" }, "cluster_id": "cluster-1", "consumer_group_id": "consumer-group-1", "consumer_id": "consumer-1", "topic_name": "topic-2", "partition_id": 2, "partition": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/topics/topic-2/partitions/2" }, "lag": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/consumer-groups/consumer-group-1/lags/topic-2/partitions/2" } }, { "kind": "KafkaConsumerAssignment", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/consumer-groups/consumer-group-1/consumers/consumer-1/assignments/topic-3/partitions/3", "resource_name": 
"crn:///kafka=cluster-1/consumer-group=consumer-group-1/consumer=consumer-1/assignment=topic=3/partition=3" }, "cluster_id": "cluster-1", "consumer_group_id": "consumer-group-1", "consumer_id": "consumer-1", "topic_name": "topic-3", "partition_id": 3, "partition": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/topics/topic-3/partitions/3" }, "lag": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/consumer-groups/consumer-group-1/lags/topic-3/partitions/3" } } ] } ``` * [400 Bad Request](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.1) – Indicates a bad request error. It could be caused by an unexpected request body format or other forms of request validation failure. **bad_request_cannot_deserialize:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 400, "message": "Cannot deserialize value of type `java.lang.Integer` from String \"A\": not a valid `java.lang.Integer` value" } ``` **unsupported_version_exception:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 40035, "message": "The version of this API is not supported in the underlying Kafka cluster." } ``` * [401 Unauthorized](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.2) – Indicates a client authentication error. Kafka authentication failures will contain error code 40101 in the response body. **kafka_authentication_failed:** ```http HTTP/1.1 401 Unauthorized Content-Type: application/json { "error_code": 40101, "message": "Authentication failed" } ``` * [403 Forbidden](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.4) – Indicates a client authorization error. Kafka authorization failures will contain error code 40301 in the response body. **kafka_authorization_failed:** ```http HTTP/1.1 403 Forbidden Content-Type: application/json { "error_code": 40301, "message": "Request is not authorized" } ``` * [429 Too Many Requests](https://www.rfc-editor.org/rfc/rfc6585#section-4) – Indicates that a rate limit threshold has been reached, and the client should retry again later. **Example response:** ```http HTTP/1.1 429 Too Many Requests Content-Type: text/html { "description": "A sample response from Jetty's DoSFilter.", "value": " Error 429 Too Many Requests

HTTP ERROR 429 Too Many Requests

URI: /v3/clusters/my-cluster
STATUS: 429
MESSAGE: Too Many Requests
SERVLET: default
" } ``` * *5XX* – A server-side problem that might not be addressable from the client side. Retriable Kafka errors will contain error code 50003 in the response body. **generic_internal_server_error:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 500, "message": "Internal Server Error" } ``` **produce_v3_missing_schema:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 50002, "message": "Error when fetching latest schema version. subject = my-topic" } ``` ### GET /clusters/{cluster_id}/consumer-groups/{consumer_group_id}/consumers/{consumer_id}/assignments/{topic_name}/partitions/{partition_id} **Get Consumer Assignment** [![Generally Available](https://img.shields.io/badge/Lifecycle%20Stage-Generally%20Available-%2345c6e8)](#section/Versioning/API-Lifecycle-Policy) Return information about the assignment for the specified consumer to the specified partition. * **Parameters:** * **cluster_id** (*string*) – The Kafka cluster ID. * **consumer_group_id** (*string*) – The consumer group ID. * **consumer_id** (*string*) – The consumer ID. * **topic_name** (*string*) – The topic name. * **partition_id** (*integer*) – The partition ID. **Example request:** ```http GET /clusters/{cluster_id}/consumer-groups/{consumer_group_id}/consumers/{consumer_id}/assignments/{topic_name}/partitions/{partition_id} HTTP/1.1 Host: example.com ``` * **Status Codes:** * [200 OK](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.1) – The consumer group assignment. **Example response:** ```http HTTP/1.1 200 OK Content-Type: application/json { "kind": "KafkaConsumerAssignment", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/consumer-groups/consumer-group-1/consumers/consumer-1/assignments/topic-1/partitions/1", "resource_name": "crn:///kafka=cluster-1/consumer-group=consumer-group-1/consumer=consumer-1/assignment=topic=1/partition=1" }, "cluster_id": "cluster-1", "consumer_group_id": "consumer-group-1", "consumer_id": "consumer-1", "topic_name": "topic-1", "partition_id": 1, "partition": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/topics/topic-1/partitions/1" }, "lag": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/consumer-groups/consumer-group-1/lags/topic-1/partitions/1" } } ``` * [400 Bad Request](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.1) – Indicates a bad request error. It could be caused by an unexpected request body format or other forms of request validation failure. **bad_request_cannot_deserialize:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 400, "message": "Cannot deserialize value of type `java.lang.Integer` from String \"A\": not a valid `java.lang.Integer` value" } ``` **unsupported_version_exception:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 40035, "message": "The version of this API is not supported in the underlying Kafka cluster." } ``` * [401 Unauthorized](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.2) – Indicates a client authentication error. Kafka authentication failures will contain error code 40101 in the response body. 
**kafka_authentication_failed:** ```http HTTP/1.1 401 Unauthorized Content-Type: application/json { "error_code": 40101, "message": "Authentication failed" } ``` * [403 Forbidden](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.4) – Indicates a client authorization error. Kafka authorization failures will contain error code 40301 in the response body. **kafka_authorization_failed:** ```http HTTP/1.1 403 Forbidden Content-Type: application/json { "error_code": 40301, "message": "Request is not authorized" } ``` * [429 Too Many Requests](https://www.rfc-editor.org/rfc/rfc6585#section-4) – Indicates that a rate limit threshold has been reached, and the client should retry again later. **Example response:** ```http HTTP/1.1 429 Too Many Requests Content-Type: text/html { "description": "A sample response from Jetty's DoSFilter.", "value": " Error 429 Too Many Requests

HTTP ERROR 429 Too Many Requests

URI: /v3/clusters/my-cluster
STATUS: 429
MESSAGE: Too Many Requests
SERVLET: default
" } ``` * *5XX* – A server-side problem that might not be addressable from the client side. Retriable Kafka errors will contain error code 50003 in the response body. **generic_internal_server_error:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 500, "message": "Internal Server Error" } ``` **produce_v3_missing_schema:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 50002, "message": "Error when fetching latest schema version. subject = my-topic" } ``` ### PATCH /clusters/{cluster_id}/topics/{topic_name} **Update Partition Count** [![Generally Available](https://img.shields.io/badge/Lifecycle%20Stage-Generally%20Available-%2345c6e8)](#section/Versioning/API-Lifecycle-Policy) Increase the number of partitions for a topic. * **Parameters:** * **cluster_id** (*string*) – The Kafka cluster ID. * **topic_name** (*string*) – The topic name. **Example request:** ```http PATCH /clusters/{cluster_id}/topics/{topic_name} HTTP/1.1 Host: example.com Content-Type: application/json { "partitions_count": 10 } ``` * **Status Codes:** * [200 OK](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.1) – The topic. **Example response:** ```http HTTP/1.1 200 OK Content-Type: application/json { "kind": "KafkaTopic", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/topics/topic-1", "resource_name": "crn:///kafka=cluster-1/topic=topic-1" }, "cluster_id": "cluster-1", "topic_name": "topic-1", "is_internal": false, "replication_factor": 3, "partitions_count": 1, "partitions": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/topics/topic-1/partitions" }, "configs": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/topics/topic-1/configs" }, "partition_reassignments": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/topics/topic-1/partitions/-/reassignments" } } ``` * [400 Bad Request](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.1) – Indicates a bad request error. It could be caused by an unexpected request body format or other forms of request validation failure. **topic_update_partitions_invalid:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 40002, "message": "Topic already has 1 partitions." } ``` * [401 Unauthorized](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.2) – Indicates a client authentication error. Kafka authentication failures will contain error code 40101 in the response body. **kafka_authentication_failed:** ```http HTTP/1.1 401 Unauthorized Content-Type: application/json { "error_code": 40101, "message": "Authentication failed" } ``` * [403 Forbidden](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.4) – Indicates a client authorization error. Kafka authorization failures will contain error code 40301 in the response body. **kafka_authorization_failed:** ```http HTTP/1.1 403 Forbidden Content-Type: application/json { "error_code": 40301, "message": "Request is not authorized" } ``` * [429 Too Many Requests](https://www.rfc-editor.org/rfc/rfc6585#section-4) – Indicates that a rate limit threshold has been reached, and the client should retry again later. **Example response:** ```http HTTP/1.1 429 Too Many Requests Content-Type: text/html { "description": "A sample response from Jetty's DoSFilter.", "value": " Error 429 Too Many Requests

HTTP ERROR 429 Too Many Requests

URI: /v3/clusters/my-cluster
STATUS: 429
MESSAGE: Too Many Requests
SERVLET: default
" } ``` * *5XX* – A server-side problem that might not be addressable from the client side. Retriable Kafka errors will contain error code 50003 in the response body. **generic_internal_server_error:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 500, "message": "Internal Server Error" } ``` **produce_v3_missing_schema:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 50002, "message": "Error when fetching latest schema version. subject = my-topic" } ``` ### DELETE /clusters/{cluster_id}/topics/{topic_name} **Delete Topic** [![Generally Available](https://img.shields.io/badge/Lifecycle%20Stage-Generally%20Available-%2345c6e8)](#section/Versioning/API-Lifecycle-Policy) Delete the topic with the given `topic_name`. * **Parameters:** * **cluster_id** (*string*) – The Kafka cluster ID. * **topic_name** (*string*) – The topic name. * **Status Codes:** * [204 No Content](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.5) – No Content * [400 Bad Request](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.1) – Indicates a bad request error. It could be caused by an unexpected request body format or other forms of request validation failure. **bad_request_cannot_deserialize:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 400, "message": "Cannot deserialize value of type `java.lang.Integer` from String \"A\": not a valid `java.lang.Integer` value" } ``` **unsupported_version_exception:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 40035, "message": "The version of this API is not supported in the underlying Kafka cluster." } ``` * [401 Unauthorized](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.2) – Indicates a client authentication error. Kafka authentication failures will contain error code 40101 in the response body. **kafka_authentication_failed:** ```http HTTP/1.1 401 Unauthorized Content-Type: application/json { "error_code": 40101, "message": "Authentication failed" } ``` * [403 Forbidden](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.4) – Indicates a client authorization error. Kafka authorization failures will contain error code 40301 in the response body. **kafka_authorization_failed:** ```http HTTP/1.1 403 Forbidden Content-Type: application/json { "error_code": 40301, "message": "Request is not authorized" } ``` * [404 Not Found](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.5) – Indicates attempted access to an unreachable or non-existing resource like e.g. an unknown topic or partition. GET requests to endpoints not allowed in the accesslists will also result in this response. **endpoint_not_found:** ```http HTTP/1.1 404 Not Found Content-Type: application/json { "error_code": 404, "message": "HTTP 404 Not Found" } ``` **cluster_not_found:** ```http HTTP/1.1 404 Not Found Content-Type: application/json { "error_code": 404, "message": "Cluster my-cluster cannot be found." } ``` **unknown_topic_or_partition:** ```http HTTP/1.1 404 Not Found Content-Type: application/json { "error_code": 40403, "message": "This server does not host this topic-partition." } ``` * [429 Too Many Requests](https://www.rfc-editor.org/rfc/rfc6585#section-4) – Indicates that a rate limit threshold has been reached, and the client should retry again later. 
**Example response:** ```http HTTP/1.1 429 Too Many Requests Content-Type: text/html { "description": "A sample response from Jetty's DoSFilter.", "value": " Error 429 Too Many Requests

HTTP ERROR 429 Too Many Requests

URI: /v3/clusters/my-cluster
STATUS: 429
MESSAGE: Too Many Requests
SERVLET: default
" } ``` * *5XX* – A server-side problem that might not be addressable from the client side. Retriable Kafka errors will contain error code 50003 in the response body. **generic_internal_server_error:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 500, "message": "Internal Server Error" } ``` **produce_v3_missing_schema:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 50002, "message": "Error when fetching latest schema version. subject = my-topic" } ``` ### GET /clusters/{cluster_id}/links/{link_name}/configs/{config_name} **Describe the config under the cluster link** [![Generally Available](https://img.shields.io/badge/Lifecycle%20Stage-Generally%20Available-%2345c6e8)](#section/Versioning/API-Lifecycle-Policy) * **Parameters:** * **cluster_id** (*string*) – The Kafka cluster ID. * **link_name** (*string*) – The link name * **config_name** (*string*) – The link config name **Example request:** ```http GET /clusters/{cluster_id}/links/{link_name}/configs/{config_name} HTTP/1.1 Host: example.com ``` * **Status Codes:** * [200 OK](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.1) – Config name and value **Example response:** ```http HTTP/1.1 200 OK Content-Type: application/json { "kind": "KafkaLinkConfigData", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/1Rh_4htxSuen7RYGvGmgNw/links/my-new-link-1", "resource_name": null }, "cluster_id": "1Rh_4htxSuen7RYGvGmgNw", "name": "consumer.offset.sync.ms", "value": "3825940", "is_default": false, "is_read_only": false, "is_sensitive": false, "source": "DYNAMIC_CLUSTER_LINK_CONFIG", "synonyms": [ "cosm" ], "link_name": "link-db-1" } ``` * [400 Bad Request](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.1) – Indicates a bad request error. It could be caused by an unexpected request body format or other forms of request validation failure. **bad_request_cannot_deserialize:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 400, "message": "Cannot deserialize value of type `java.lang.Integer` from String \"A\": not a valid `java.lang.Integer` value" } ``` **unsupported_version_exception:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 40035, "message": "The version of this API is not supported in the underlying Kafka cluster." } ``` * [401 Unauthorized](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.2) – Indicates a client authentication error. Kafka authentication failures will contain error code 40101 in the response body. **kafka_authentication_failed:** ```http HTTP/1.1 401 Unauthorized Content-Type: application/json { "error_code": 40101, "message": "Authentication failed" } ``` * [429 Too Many Requests](https://www.rfc-editor.org/rfc/rfc6585#section-4) – Indicates that a rate limit threshold has been reached, and the client should retry again later. **Example response:** ```http HTTP/1.1 429 Too Many Requests Content-Type: text/html { "description": "A sample response from Jetty's DoSFilter.", "value": " Error 429 Too Many Requests

HTTP ERROR 429 Too Many Requests

URI: /v3/clusters/my-cluster
STATUS: 429
MESSAGE: Too Many Requests
SERVLET: default
" } ``` * *5XX* – A server-side problem that might not be addressable from the client side. Retriable Kafka errors will contain error code 50003 in the response body. **generic_internal_server_error:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 500, "message": "Internal Server Error" } ``` **produce_v3_missing_schema:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 50002, "message": "Error when fetching latest schema version. subject = my-topic" } ``` ### GET /clusters/{cluster_id}/links/{link_name}/mirrors/{mirror_topic_name} **Describe the mirror topic** [![Generally Available](https://img.shields.io/badge/Lifecycle%20Stage-Generally%20Available-%2345c6e8)](#section/Versioning/API-Lifecycle-Policy) * **Parameters:** * **cluster_id** (*string*) – The Kafka cluster ID. * **link_name** (*string*) – The link name * **mirror_topic_name** (*string*) – Cluster Linking mirror topic name * **Query Parameters:** * **include_state_transition_errors** (*boolean*) – Whether to include mirror state transition errors in the response. Default: false **Example request:** ```http GET /clusters/{cluster_id}/links/{link_name}/mirrors/{mirror_topic_name} HTTP/1.1 Host: example.com ``` * **Status Codes:** * [200 OK](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.1) – Metadata of the mirror topic **Example response:** ```http HTTP/1.1 200 OK Content-Type: application/json { "kind": "KafkaMirrorData", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/link/link-1/mirrors/topic-1", "resource_name": "crn:///kafka=cluster-1" }, "link_name": "link-sb-1", "mirror_topic_name": "topic-1", "source_topic_name": "topic-1", "num_partitions": 3, "mirror_lags": [ { "partition": 0, "lag": 0, "last_source_fetch_offset": 0 }, { "partition": 1, "lag": 10000, "last_source_fetch_offset": 1000 }, { "partition": 2, "lag": 40000, "last_source_fetch_offset": 12030 } ], "mirror_status": "ACTIVE", "mirror_topic_error": "NO_ERROR", "state_time_ms": 1612550939300 } ``` * [400 Bad Request](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.1) – Indicates a bad request error. It could be caused by an unexpected request body format or other forms of request validation failure. **bad_request_cannot_deserialize:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 400, "message": "Cannot deserialize value of type `java.lang.Integer` from String \"A\": not a valid `java.lang.Integer` value" } ``` **unsupported_version_exception:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 40035, "message": "The version of this API is not supported in the underlying Kafka cluster." } ``` * [401 Unauthorized](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.2) – Indicates a client authentication error. Kafka authentication failures will contain error code 40101 in the response body. **kafka_authentication_failed:** ```http HTTP/1.1 401 Unauthorized Content-Type: application/json { "error_code": 40101, "message": "Authentication failed" } ``` * [429 Too Many Requests](https://www.rfc-editor.org/rfc/rfc6585#section-4) – Indicates that a rate limit threshold has been reached, and the client should retry again later. **Example response:** ```http HTTP/1.1 429 Too Many Requests Content-Type: text/html { "description": "A sample response from Jetty's DoSFilter.", "value": " Error 429 Too Many Requests

HTTP ERROR 429 Too Many Requests

URI: /v3/clusters/my-cluster
STATUS: 429
MESSAGE: Too Many Requests
SERVLET: default
" } ``` * *5XX* – A server-side problem that might not be addressable from the client side. Retriable Kafka errors will contain error code 50003 in the response body. **generic_internal_server_error:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 500, "message": "Internal Server Error" } ``` **produce_v3_missing_schema:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 50002, "message": "Error when fetching latest schema version. subject = my-topic" } ``` ### GET /kafka/v3/clusters/{cluster_id}/share-groups **List Share Groups** [![Early Access](https://img.shields.io/badge/Lifecycle%20Stage-Early%20Access-%2345c6e8)](#section/Versioning/API-Lifecycle-Policy) Return the list of share groups that belong to the specified Kafka cluster. * **Parameters:** * **cluster_id** (*string*) – The Kafka cluster ID. **Example request:** ```http GET /kafka/v3/clusters/{cluster_id}/share-groups HTTP/1.1 Host: example.com ``` * **Status Codes:** * [200 OK](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.1) – The list of share groups. **Example response:** ```http HTTP/1.1 200 OK Content-Type: application/json { "kind": "KafkaShareGroupList", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/share-groups", "next": null }, "data": [ { "kind": "KafkaShareGroup", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/share-groups/share-group-1", "resource_name": "crn:///kafka=cluster-1/share-group=share-group-1" }, "cluster_id": "cluster-1", "share_group_id": "share-group-1", "state": "STABLE", "coordinator": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/brokers/1" }, "consumers": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/share-groups/share-group-1/consumers" }, "consumer_count": 2, "partition_count": 3, "assigned_topic_partitions": [ { "kind": "KafkaShareGroupTopicPartition", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/share-groups/share-group-1/assigned-topic-partitions/topic-1/0", "resource_name": "crn:///kafka=cluster-1/share-group=share-group-1/topic-partition=topic-1:0" }, "topic_name": "topic-1", "partition_id": 0, "partition": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/topics/topic-1/partitions/0" } } ] }, { "kind": "KafkaShareGroup", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/share-groups/share-group-2", "resource_name": "crn:///kafka=cluster-1/share-group=share-group-2" }, "cluster_id": "cluster-1", "share_group_id": "share-group-2", "state": "EMPTY", "coordinator": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/brokers/2" }, "consumers": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/share-groups/share-group-2/consumers" }, "consumer_count": 2, "partition_count": 3, "assigned_topic_partitions": [ { "kind": "KafkaShareGroupTopicPartition", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/share-groups/share-group-2/assigned-topic-partitions/topic-1/0", "resource_name": "crn:///kafka=cluster-1/share-group=share-group-2/topic-partition=topic-1:0" }, "topic_name": "topic-1", "partition_id": 0, "partition": { "related": 
"https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/topics/topic-1/partitions/0" } } ] } ] } ``` * [400 Bad Request](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.1) – Indicates a bad request error. It could be caused by an unexpected request body format or other forms of request validation failure. **bad_request_cannot_deserialize:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 400, "message": "Cannot deserialize value of type `java.lang.Integer` from String \"A\": not a valid `java.lang.Integer` value" } ``` **unsupported_version_exception:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 40035, "message": "The version of this API is not supported in the underlying Kafka cluster." } ``` * [401 Unauthorized](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.2) – Indicates a client authentication error. Kafka authentication failures will contain error code 40101 in the response body. **kafka_authentication_failed:** ```http HTTP/1.1 401 Unauthorized Content-Type: application/json { "error_code": 40101, "message": "Authentication failed" } ``` * [403 Forbidden](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.4) – Indicates a client authorization error. Kafka authorization failures will contain error code 40301 in the response body. **kafka_authorization_failed:** ```http HTTP/1.1 403 Forbidden Content-Type: application/json { "error_code": 40301, "message": "Request is not authorized" } ``` * [429 Too Many Requests](https://www.rfc-editor.org/rfc/rfc6585#section-4) – Indicates that a rate limit threshold has been reached, and the client should retry again later. **Example response:** ```http HTTP/1.1 429 Too Many Requests Content-Type: text/html { "description": "A sample response from Jetty's DoSFilter.", "value": " Error 429 Too Many Requests

HTTP ERROR 429 Too Many Requests

URI: /v3/clusters/my-cluster
STATUS: 429
MESSAGE: Too Many Requests
SERVLET: default
" } ``` * *5XX* – A server-side problem that might not be addressable from the client side. Retriable Kafka errors will contain error code 50003 in the response body. **generic_internal_server_error:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 500, "message": "Internal Server Error" } ``` **produce_v3_missing_schema:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 50002, "message": "Error when fetching latest schema version. subject = my-topic" } ``` ### GET /kafka/v3/clusters/{cluster_id}/share-groups/{group_id} **Get Share Group** [![Early Access](https://img.shields.io/badge/Lifecycle%20Stage-Early%20Access-%2345c6e8)](#section/Versioning/API-Lifecycle-Policy) Return the share group specified by the `group_id`. * **Parameters:** * **cluster_id** (*string*) – The Kafka cluster ID. * **group_id** (*string*) – The group ID. **Example request:** ```http GET /kafka/v3/clusters/{cluster_id}/share-groups/{group_id} HTTP/1.1 Host: example.com ``` * **Status Codes:** * [200 OK](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.1) – The share group. **Example response:** ```http HTTP/1.1 200 OK Content-Type: application/json { "kind": "KafkaShareGroup", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/share-groups/share-group-1", "resource_name": "crn:///kafka=cluster-1/share-group=share-group-1" }, "cluster_id": "cluster-1", "share_group_id": "share-group-1", "state": "STABLE", "coordinator": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/brokers/1" }, "consumers": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/share-groups/share-group-1/consumers" }, "consumer_count": 2, "partition_count": 3, "assigned_topic_partitions": [ { "kind": "KafkaShareGroupTopicPartition", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/share-groups/share-group-1/assigned-topic-partitions/topic-1/0", "resource_name": "crn:///kafka=cluster-1/share-group=share-group-1/topic-partition=topic-1:0" }, "topic_name": "topic-1", "partition_id": 0, "partition": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/topics/topic-1/partitions/0" } }, { "kind": "KafkaShareGroupTopicPartition", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/share-groups/share-group-1/assigned-topic-partitions/topic-1/1", "resource_name": "crn:///kafka=cluster-1/share-group=share-group-1/topic-partition=topic-1:1" }, "topic_name": "topic-1", "partition_id": 1, "partition": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/topics/topic-1/partitions/1" } } ] } ``` * [400 Bad Request](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.1) – Indicates a bad request error. It could be caused by an unexpected request body format or other forms of request validation failure. **bad_request_cannot_deserialize:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 400, "message": "Cannot deserialize value of type `java.lang.Integer` from String \"A\": not a valid `java.lang.Integer` value" } ``` **unsupported_version_exception:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 40035, "message": "The version of this API is not supported in the underlying Kafka cluster." 
} ``` * [401 Unauthorized](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.2) – Indicates a client authentication error. Kafka authentication failures will contain error code 40101 in the response body. **kafka_authentication_failed:** ```http HTTP/1.1 401 Unauthorized Content-Type: application/json { "error_code": 40101, "message": "Authentication failed" } ``` * [403 Forbidden](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.4) – Indicates a client authorization error. Kafka authorization failures will contain error code 40301 in the response body. **kafka_authorization_failed:** ```http HTTP/1.1 403 Forbidden Content-Type: application/json { "error_code": 40301, "message": "Request is not authorized" } ``` * [429 Too Many Requests](https://www.rfc-editor.org/rfc/rfc6585#section-4) – Indicates that a rate limit threshold has been reached, and the client should retry again later. **Example response:** ```http HTTP/1.1 429 Too Many Requests Content-Type: text/html { "description": "A sample response from Jetty's DoSFilter.", "value": " Error 429 Too Many Requests

HTTP ERROR 429 Too Many Requests

URI: /v3/clusters/my-cluster
STATUS: 429
MESSAGE: Too Many Requests
SERVLET: default
" } ``` * *5XX* – A server-side problem that might not be addressable from the client side. Retriable Kafka errors will contain error code 50003 in the response body. **generic_internal_server_error:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 500, "message": "Internal Server Error" } ``` **produce_v3_missing_schema:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 50002, "message": "Error when fetching latest schema version. subject = my-topic" } ``` ### GET /kafka/v3/clusters/{cluster_id}/share-groups/{group_id}/consumers/{consumer_id} **Get Share Group Consumer** [![Early Access](https://img.shields.io/badge/Lifecycle%20Stage-Early%20Access-%2345c6e8)](#section/Versioning/API-Lifecycle-Policy) Return the consumer specified by the `consumer_id`. * **Parameters:** * **cluster_id** (*string*) – The Kafka cluster ID. * **group_id** (*string*) – The group ID. * **consumer_id** (*string*) – The consumer ID. **Example request:** ```http GET /kafka/v3/clusters/{cluster_id}/share-groups/{group_id}/consumers/{consumer_id} HTTP/1.1 Host: example.com ``` * **Status Codes:** * [200 OK](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.1) – The consumer. **Example response:** ```http HTTP/1.1 200 OK Content-Type: application/json { "kind": "KafkaShareGroupConsumer", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/share-groups/share-group-1/consumers/consumer-1", "resource_name": "crn:///kafka=cluster-1/share-group=share-group-1/consumer=consumer-1" }, "cluster_id": "cluster-1", "group_id": "share-group-1", "consumer_id": "consumer-1", "client_id": "client-1", "assignments": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/share-groups/share-group-1/consumers/consumer-1/assignments" } } ``` * [400 Bad Request](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.1) – Indicates a bad request error. It could be caused by an unexpected request body format or other forms of request validation failure. **bad_request_cannot_deserialize:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 400, "message": "Cannot deserialize value of type `java.lang.Integer` from String \"A\": not a valid `java.lang.Integer` value" } ``` **unsupported_version_exception:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 40035, "message": "The version of this API is not supported in the underlying Kafka cluster." } ``` * [401 Unauthorized](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.2) – Indicates a client authentication error. Kafka authentication failures will contain error code 40101 in the response body. **kafka_authentication_failed:** ```http HTTP/1.1 401 Unauthorized Content-Type: application/json { "error_code": 40101, "message": "Authentication failed" } ``` * [403 Forbidden](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.4) – Indicates a client authorization error. Kafka authorization failures will contain error code 40301 in the response body. **kafka_authorization_failed:** ```http HTTP/1.1 403 Forbidden Content-Type: application/json { "error_code": 40301, "message": "Request is not authorized" } ``` * [429 Too Many Requests](https://www.rfc-editor.org/rfc/rfc6585#section-4) – Indicates that a rate limit threshold has been reached, and the client should retry again later. 
**Example response:** ```http HTTP/1.1 429 Too Many Requests Content-Type: text/html { "description": "A sample response from Jetty's DoSFilter.", "value": " Error 429 Too Many Requests

HTTP ERROR 429 Too Many Requests

URI: /v3/clusters/my-cluster
STATUS: 429
MESSAGE: Too Many Requests
SERVLET: default
" } ``` * *5XX* – A server-side problem that might not be addressable from the client side. Retriable Kafka errors will contain error code 50003 in the response body. **generic_internal_server_error:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 500, "message": "Internal Server Error" } ``` **produce_v3_missing_schema:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 50002, "message": "Error when fetching latest schema version. subject = my-topic" } ``` ### GET /kafka/v3/clusters/{cluster_id}/share-groups/{group_id}/consumers/{consumer_id}/assignments **List Share Group Consumer Assignments** [![Early Access](https://img.shields.io/badge/Lifecycle%20Stage-Early%20Access-%2345c6e8)](#section/Versioning/API-Lifecycle-Policy) Return the consumer assignments specified by the `consumer_id`. * **Parameters:** * **cluster_id** (*string*) – The Kafka cluster ID. * **group_id** (*string*) – The group ID. * **consumer_id** (*string*) – The consumer ID. **Example request:** ```http GET /kafka/v3/clusters/{cluster_id}/share-groups/{group_id}/consumers/{consumer_id}/assignments HTTP/1.1 Host: example.com ``` * **Status Codes:** * [200 OK](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.1) – The list of share group assignments. **Example response:** ```http HTTP/1.1 200 OK Content-Type: application/json { "kind": "KafkaConsumerAssignmentList", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/share-groups/share-group-1/consumers/consumer-1/assignments", "next": null }, "data": [ { "kind": "KafkaShareGroupConsumerAssignment", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/share-groups/share-group-1/consumers/consumer-1/assignments/topic-1/partitions/1", "resource_name": "crn:///kafka=cluster-1/share-group=share-group-1/consumer=consumer-1/assignment=topic=1/partition=1" }, "cluster_id": "cluster-1", "group_id": "share-group-1", "consumer_id": "consumer-1", "topic_name": "topic-1", "partition_id": 1, "partition": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/topics/topic-1/partitions/1" } }, { "kind": "KafkaConsumerAssignment", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/share-groups/share-group-1/consumers/consumer-1/assignments/topic-2/partitions/2", "resource_name": "crn:///kafka=cluster-1/share-group=share-group-1/consumer=consumer-1/assignment=topic=2/partition=2" }, "cluster_id": "cluster-1", "group_id": "share-group-1", "consumer_id": "consumer-1", "topic_name": "topic-2", "partition_id": 2, "partition": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/topics/topic-2/partitions/2" } }, { "kind": "KafkaConsumerAssignment", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/share-groups/share-group-1/consumers/consumer-1/assignments/topic-3/partitions/3", "resource_name": "crn:///kafka=cluster-1/share-group=share-group-1/consumer=consumer-1/assignment=topic=3/partition=3" }, "cluster_id": "cluster-1", "group_id": "share-group-1", "consumer_id": "consumer-1", "topic_name": "topic-3", "partition_id": 3, "partition": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/topics/topic-3/partitions/3" } } ] } ``` * [400 Bad Request](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.1) – Indicates 
a bad request error. It could be caused by an unexpected request body format or other forms of request validation failure. **bad_request_cannot_deserialize:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 400, "message": "Cannot deserialize value of type `java.lang.Integer` from String \"A\": not a valid `java.lang.Integer` value" } ``` **unsupported_version_exception:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 40035, "message": "The version of this API is not supported in the underlying Kafka cluster." } ``` * [401 Unauthorized](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.2) – Indicates a client authentication error. Kafka authentication failures will contain error code 40101 in the response body. **kafka_authentication_failed:** ```http HTTP/1.1 401 Unauthorized Content-Type: application/json { "error_code": 40101, "message": "Authentication failed" } ``` * [403 Forbidden](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.4) – Indicates a client authorization error. Kafka authorization failures will contain error code 40301 in the response body. **kafka_authorization_failed:** ```http HTTP/1.1 403 Forbidden Content-Type: application/json { "error_code": 40301, "message": "Request is not authorized" } ``` * [429 Too Many Requests](https://www.rfc-editor.org/rfc/rfc6585#section-4) – Indicates that a rate limit threshold has been reached, and the client should retry again later. **Example response:** ```http HTTP/1.1 429 Too Many Requests Content-Type: text/html { "description": "A sample response from Jetty's DoSFilter.", "value": " Error 429 Too Many Requests

HTTP ERROR 429 Too Many Requests

URI: /v3/clusters/my-cluster
STATUS: 429
MESSAGE: Too Many Requests
SERVLET: default
" } ``` * *5XX* – A server-side problem that might not be addressable from the client side. Retriable Kafka errors will contain error code 50003 in the response body. **generic_internal_server_error:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 500, "message": "Internal Server Error" } ``` **produce_v3_missing_schema:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 50002, "message": "Error when fetching latest schema version. subject = my-topic" } ``` ### GET /clusters/{cluster_id}/brokers/{broker_id}/tasks/{task_type} **Get single Broker Task.** [![Generally Available](https://img.shields.io/badge/Lifecycle%20Stage-Generally%20Available-%2345c6e8)](#section/Versioning/API-Lifecycle-Policy) Return a single Broker Task specified with `task_type` for broker specified with `broker_id` in the cluster specified with `cluster_id`. * **Parameters:** * **cluster_id** (*string*) – The Kafka cluster ID. * **broker_id** (*integer*) – The Kafka broker ID. * **task_type** (*string*) – The Kafka broker task type. **Example request:** ```http GET /clusters/{cluster_id}/brokers/{broker_id}/tasks/{task_type} HTTP/1.1 Host: example.com ``` * **Status Codes:** * [200 OK](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.1) – The broker task **Example response:** ```http HTTP/1.1 200 OK Content-Type: application/json { "kind": "KafkaBrokerTask", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/brokers/1/tasks/add-broker", "resource_name": "crn:///kafka=cluster-1/broker=1/task=1" }, "cluster_id": "cluster-1", "broker_id": 1, "task_type": "add-broker", "task_status": "FAILED", "sub_task_statuses": { "partition_reassignment_status": "ERROR" }, "created_at": "2019-10-12T07:20:50Z", "updated_at": "2019-10-12T07:20:55Z", "error_code": 10013, "error_message": "The Confluent Balancer operation was overridden by a higher priority operation", "broker": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/brokers/1" } } ``` * [400 Bad Request](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.1) – Indicates a bad request error. It could be caused by an unexpected request body format or other forms of request validation failure. **bad_request_cannot_deserialize:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 400, "message": "Cannot deserialize value of type `java.lang.Integer` from String \"A\": not a valid `java.lang.Integer` value" } ``` **unsupported_version_exception:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 40035, "message": "The version of this API is not supported in the underlying Kafka cluster." } ``` * [401 Unauthorized](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.2) – Indicates a client authentication error. Kafka authentication failures will contain error code 40101 in the response body. **kafka_authentication_failed:** ```http HTTP/1.1 401 Unauthorized Content-Type: application/json { "error_code": 40101, "message": "Authentication failed" } ``` * [429 Too Many Requests](https://www.rfc-editor.org/rfc/rfc6585#section-4) – Indicates that a rate limit threshold has been reached, and the client should retry again later. **Example response:** ```http HTTP/1.1 429 Too Many Requests Content-Type: text/html { "description": "A sample response from Jetty's DoSFilter.", "value": " Error 429 Too Many Requests

HTTP ERROR 429 Too Many Requests

URI: /v3/clusters/my-cluster
STATUS: 429
MESSAGE: Too Many Requests
SERVLET: default
" } ``` * *5XX* – A server-side problem that might not be addressable from the client side. Retriable Kafka errors will contain error code 50003 in the response body. **generic_internal_server_error:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 500, "message": "Internal Server Error" } ``` **produce_v3_missing_schema:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 50002, "message": "Error when fetching latest schema version. subject = my-topic" } ``` ### GET /clusters/{cluster_id}/remove-broker-tasks/{broker_id} **Get Remove Broker Task** [![Generally Available](https://img.shields.io/badge/Lifecycle%20Stage-Generally%20Available-%2345c6e8)](#section/Versioning/API-Lifecycle-Policy) Return the remove broker task for the specified `broker_id`. * **Parameters:** * **cluster_id** (*string*) – The Kafka cluster ID. * **broker_id** (*integer*) – The Kafka broker ID. **Example request:** ```http GET /clusters/{cluster_id}/remove-broker-tasks/{broker_id} HTTP/1.1 Host: example.com ``` * **Status Codes:** * [200 OK](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.1) – The remove broker task. **Example response:** ```http HTTP/1.1 200 OK Content-Type: application/json { "kind": "KafkaRemoveBrokerTask", "metadata": { "self": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/remove-broker-tasks/1", "resource_name": "crn:///kafka=cluster-1/remove-broker-task=1" }, "cluster_id": "cluster-1", "broker_id": 1, "shutdown_scheduled": false, "broker_replica_exclusion_status": "COMPLETED", "partition_reassignment_status": "FAILED", "broker_shutdown_status": "CANCELED", "error_code": 10006, "error_message": "Error while computing the initial remove broker plan for brokers [1] prior to shutdown.", "broker": { "related": "https://pkc-00000.region.provider.confluent.cloud/kafka/v3/clusters/cluster-1/brokers/1" } } ``` * [400 Bad Request](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.1) – Indicates a bad request error. It could be caused by an unexpected request body format or other forms of request validation failure. **bad_request_cannot_deserialize:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 400, "message": "Cannot deserialize value of type `java.lang.Integer` from String \"A\": not a valid `java.lang.Integer` value" } ``` **unsupported_version_exception:** ```http HTTP/1.1 400 Bad Request Content-Type: application/json { "error_code": 40035, "message": "The version of this API is not supported in the underlying Kafka cluster." } ``` * [401 Unauthorized](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.2) – Indicates a client authentication error. Kafka authentication failures will contain error code 40101 in the response body. **kafka_authentication_failed:** ```http HTTP/1.1 401 Unauthorized Content-Type: application/json { "error_code": 40101, "message": "Authentication failed" } ``` * [429 Too Many Requests](https://www.rfc-editor.org/rfc/rfc6585#section-4) – Indicates that a rate limit threshold has been reached, and the client should retry again later. **Example response:** ```http HTTP/1.1 429 Too Many Requests Content-Type: text/html { "description": "A sample response from Jetty's DoSFilter.", "value": " Error 429 Too Many Requests

HTTP ERROR 429 Too Many Requests

URI: /v3/clusters/my-cluster
STATUS: 429
MESSAGE: Too Many Requests
SERVLET: default
" } ``` * *5XX* – A server-side problem that might not be addressable from the client side. Retriable Kafka errors will contain error code 50003 in the response body. **generic_internal_server_error:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 500, "message": "Internal Server Error" } ``` **produce_v3_missing_schema:** ```http HTTP/1.1 5XX - Content-Type: application/json { "error_code": 50002, "message": "Error when fetching latest schema version. subject = my-topic" } ```

### SASL Authentication

Kafka SASL configurations are described [here](../../../security/authentication/overview.md#kafka-sasl-auth). Note that all of the SASL configurations (for communication between the Admin REST APIs and the broker) are prefixed with `client.`, or alternatively `admin.`.

To enable SASL authentication with the Kafka broker, set `kafka.rest.client.security.protocol` to either `SASL_PLAINTEXT` or `SASL_SSL`. Then set `kafka.rest.client.sasl.jaas.config` with the credentials to be used by the Admin REST APIs to authenticate with Kafka. For example:

```none
kafka.rest.client.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="kafkarest" password="kafkarest";
```

Alternatively, you can create a JAAS configuration file, for example `CONFLUENT_HOME/etc/kafka/server-jaas.properties`:

```bash
KafkaClient {
  org.apache.kafka.common.security.plain.PlainLoginModule required
  username="kafkarest"
  password="kafkarest";
};
```

The name of the section in the JAAS file must be `KafkaClient`. Then pass it as a JVM argument:

```bash
export KAFKA_OPTS="-Djava.security.auth.login.config=${CONFLUENT_HOME}/etc/kafka/server-jaas.properties"
```

For details about configuring Kerberos, see [JDK’s Kerberos Requirements](https://docs.oracle.com/javase/8/docs/technotes/guides/security/jgss/tutorials/KerberosReq.html).

## Important Configuration Options

The full set of configuration options is documented [here](config.md#kafkarest-config). However, some configurations should be changed for production. Some **must** be changed because they depend on your cluster layout:

`bootstrap.servers` : A list of host/port pairs to use for establishing the initial connection to the Kafka cluster. The client will make use of all servers irrespective of which servers are specified here for bootstrapping. This list only impacts the initial hosts used to discover the full set of servers. This list should be in the form `host1:port1,host2:port2,...`. Because these servers are only used for the initial connection to discover the full cluster membership (which may change dynamically), this list does not require the full set of servers. You might want to specify multiple servers in case one goes down. * Type: list * Default: * Valid Values: * Importance: high

`schema.registry.url` : The base URL for Schema Registry that should be used by the serializer. * Type: string * Default: “[http://localhost:8081](http://localhost:8081)” * Importance: high

#### NOTE

The configuration property `auto.register.schemas` is not supported for Kafka REST Proxy.

`id` : Unique ID for this REST server instance. This is used in generating unique IDs for consumers that do not specify their ID. The ID is empty by default, which makes a single server setup easier to get up and running, but is not safe for multi-server deployments where automatic consumer IDs are used. * Type: string * Default: “” * Importance: high
Other settings are important to the health and performance of the proxy; you should consider changing these based on your specific use case. `consumer.request.max.bytes` : Maximum number of bytes in message keys and values returned by a single request. Smaller values reduce the maximum memory used by a single consumer and may be helpful to clients that cannot perform a streaming decode of responses, limiting the maximum memory used to decode and process a single JSON payload. Conversely, larger values may be more efficient because many messages can be batched into a single request, reducing the number of HTTP requests (and network round trips) required to consume the same set of messages. Note that this can also be overridden by clients on a per-request basis using the `max_bytes` query parameter. However, this setting controls the absolute maximum; `max_bytes` settings exceeding this value will be ignored. * Type: long * Default: 67108864 * Importance: medium `fetch.min.bytes` : The minimum number of bytes in message keys and values returned by a single request before the timeout of `consumer.request.timeout.ms` passes. * Type: int * Default: -1 * Importance: medium `consumer.request.timeout.ms` : The maximum total time to wait for messages for a request if the maximum request size has not yet been reached. The consumer uses a timeout to enable batching. A larger value will allow the consumer to wait longer, possibly including more messages in the response. However, this value is also a lower bound on the latency of consuming a message from Kafka. If consumers need low latency message delivery, then specify a lower value. * Type: int * Default: 1000 * Importance: medium `consumer.threads` : The maximum number of threads to run consumer requests on. Consumer requests are run synchronously, one per thread. You must set this value higher than the maximum number of consumers in a single consumer group, otherwise rebalances will deadlock. * Type: int * Default: 50 * Importance: medium `host.name` : The host name used to generate absolute URLs for consumers. If empty, the default canonical hostname is used. You may need to set this value if the FQDN of your host cannot be automatically determined. * Type: string * Default: “” * Importance: medium ## Schemas Although the records serialized to Kafka are opaque bytes, they must have some rules about their structure to make it possible to process them. One aspect of this structure is the schema of the data, which defines its shape and fields. Is it an integer? Is it a map with keys `foo`, `bar`, and `baz`? Something else? Without any mechanism for enforcement, schemas are implicit. A consumer, somehow, needs to know the form of the produced data. Frequently this happens by getting a group of people to agree verbally on the schema. This approach, however, is error prone. It’s often better if the schema can be managed centrally, audited, and enforced programmatically. [Confluent Schema Registry](../../schema-registry/index.md#schemaregistry-intro), a project outside of Kafka, helps with schema management. Schema Registry enables producers to register a topic with a schema so that when any further data is produced, it is rejected if it doesn’t conform to the schema. Consumers can consult Schema Registry to find the schema for topics they don’t know about. Rather than having you glue together producers, consumers, and schema configuration, ksqlDB integrates transparently with Schema Registry.
By enabling a configuration option so that the two systems can talk to each other, ksqlDB stores all stream and table schemas in Schema Registry. These schemas can then be downloaded and used by any application working with ksqlDB data. Moreover, ksqlDB can infer the schemas of existing topics automatically, so that you don’t need to declare their structure when you define the stream or table over it. ## Content Types The ksqlDB HTTP API uses content types for requests and responses to indicate the serialization format of the data and the API version. Your request should specify this serialization format and version in the `Accept` header, for example: ```none Accept: application/vnd.ksql.v1+json ``` The less specific `application/json` content type is also permitted. However, this is only for compatibility and ease of use, and you should use the versioned value where possible. `application/json` maps to the latest versioned content type, meaning the response may change after upgrading the server to a later version. The server also supports content negotiation, so you may include multiple, weighted preferences: ```none Accept: application/vnd.ksql.v1+json; q=0.9, application/json; q=0.5 ``` For example, content negotiation is useful when a new version of the API is preferred, but you are not sure if it is available yet. Here’s an example request that returns the results from the `LIST STREAMS` command: ```bash curl -X "POST" "http://localhost:8088/ksql" \ -H "Accept: application/vnd.ksql.v1+json" \ -d $'{ "ksql": "LIST STREAMS;", "streamsProperties": {} }' ``` Here’s an example request that retrieves streaming data from `TEST_STREAM`: ```bash curl -X "POST" "http://localhost:8088/query" \ -H "Accept: application/vnd.ksql.v1+json" \ -d $'{ "ksql": "SELECT * FROM TEST_STREAM EMIT CHANGES;", "streamsProperties": {} }' ``` A `PROTOBUF` content type where the rows are serialized in the `PROTOBUF` format is also supported for querying the `/query` and `/query-stream` endpoints. You can specify this serialization format in the `Accept` header: ```none Accept: application/vnd.ksql.v1+protobuf ``` The following example shows a curl command that issues a Pull query on a table called `CURRENTLOCATION` with the `PROTOBUF` content type: ```bash curl -X "POST" "http://localhost:8088/query" \ -H "Accept: application/vnd.ksql.v1+protobuf" \ -d $'{ "ksql": "SELECT * FROM CURRENTLOCATION;", "streamsProperties": {} }' ``` Response: ```json [{"header":{"queryId":"query_1655152127973","schema":"`PROFILEID` STRING KEY, `LA` DOUBLE, `LO` DOUBLE","protoSchema":"syntax = \"proto3\";\n\nmessage ConnectDefault1 {\n string PROFILEID = 1;\n double LA = 2;\n double LO = 3;\n}\n"}}, {"row":{"protobufBytes":"CggxOGY0ZWE4NhF90LNZ9bFCQBmASL99HYRewA=="}}, {"row":{"protobufBytes":"Cgg0YTdjN2I0MRFAE2HD07NCQBnM7snDQoVewA=="}}, {"row":{"protobufBytes":"Cgg0YWI1Y2JhZBGKsOHplbJCQBmMSuoENIVewA=="}}, {"row":{"protobufBytes":"Cgg0ZGRhZDAwMBHNO07RkeRCQBk9m1Wfq5lewA=="}}, {"row":{"protobufBytes":"Cgg4YjZlYWU1ORFtxf6ye7JCQBmMSuoENIVewA=="}}, {"row":{"protobufBytes":"CghjMjMwOWVlYxGUh4Va0+RCQBn0/dR46ZpewA=="}}] ``` The `protoSchema` field in the `header` corresponds to the content of a `.proto` file that the proto compiler uses at build time. Use the `protoSchema` field to deserialize the `protobufBytes` into `PROTOBUF` messages. Provide the `--basic` and `--user` options if basic HTTPS authentication is enabled on the cluster, as shown in the following command. 
```bash curl -X "POST" "https://localhost:8088/ksql" \ -H "Accept: application/vnd.ksql.v1+json" \ --basic --user ":" \ -d $'{ "ksql": "LIST STREAMS;", "streamsProperties": {} }' ``` ## Can ksqlDB connect to an Apache Kafka cluster over TLS and authenticate using SASL? Yes. Internally, ksqlDB uses standard Kafka consumers and producers. The procedure to securely connect ksqlDB to Kafka is the same as connecting any app to Kafka. For more information, see [Configure Kafka Authentication](operate-and-deploy/installation/security.md#ksqldb-installation-security-configure-kafka-auth). ## Important Sizing Factors This section describes the important factors to consider when scoping out your ksqlDB deployment. **Throughput**: In general, higher throughput requires more resources. **Query Types**: Your realized throughput will largely be a function of the type of queries you run. You can think of ksqlDB queries as falling into these categories: - Project/Filter, e.g. `SELECT ... FROM ... WHERE ...` - Joins - Aggregations, e.g. `SUM, COUNT, TOPK, TOPKDISTINCT` A project/filter query reads records from an input stream or table, may filter the records according to some predicate, and performs stateless transformations on the columns before writing out records to a sink stream or table. Project/filter queries require the fewest resources. For a single project/filter query running on an instance provisioned as recommended above, you can expect to realize from ~40 MB/second up to the rate supported by your network. The throughput depends largely on the average message size and complexity. Processing small messages with many columns is CPU intensive and will saturate your CPU. Processing large messages with fewer columns requires less CPU, and ksqlDB will start saturating the network for such workloads. Stream-table joins read from and write to Kafka Streams state stores and require around twice the CPU of project/filter. Though Kafka Streams state stores are stored on disk, we recommend that you provision sufficient memory to keep the working set memory-resident to avoid expensive disk I/O. So expect around half the throughput and expect to provision higher-memory instances. Aggregations read from and may write to a state store for every record. They consume around twice the CPU of joins. The CPU required increases if the aggregation uses a window, as the state store must be updated for every window. **Number of Queries**: The available resources on a server are shared across all queries. So expect that the processing throughput per server will decrease proportionally with the number of queries it is executing (see the notes on vertically and horizontally scaling a ksqlDB cluster in this document to add more processing capacity in such situations). Furthermore, SQL queries run as Kafka Streams applications. Each query starts its own Kafka Streams worker threads, and uses its own consumers and producers. This adds a little bit of CPU overhead per query. You should avoid running a large number of queries on one ksqlDB cluster. Instead, use interactive mode to play with your data and develop sets of queries that function together. Then, run these in their own headless cluster. Check out the [Recommendations and Best Practices](#recommendations-and-best-practices) section for more details. **Data Schema**: ksqlDB handles mapping serialized Kafka records to columns in a stream or table’s schema. In general, more complex schemas with a higher ratio of columns to bytes of data require more CPU to process.
**Number of Partitions**: Kafka Streams creates one RocksDB state store instance for aggregations and joins for every topic partition processed by a given ksqlDB server. Each RocksDB state store instance has a memory overhead of 50 MB for its cache plus the data actually stored. **Key Space**: For aggregations and joins, Kafka Streams/RocksDB tries to keep the working set of a state store in memory to avoid I/O operations. If there are many keys, this requires more memory. It also makes reads and writes to the state store more expensive. Note that the size of the data in a state store is not limited by memory (RAM) but only by available disk space on a ksqlDB server. ## Next steps - See ksqlDB in action with the [ksqlDB Quick Start](../quickstart.md#ksqldb-quick-start). - Learn more with the [ksqlDB Tutorials and Examples](../tutorials/overview.md#ksql-tutorials). - Take the developer courses: [Introduction to ksqlDB](https://developer.confluent.io/learn-kafka/ksqldb/intro/) and [ksqlDB Architecture](https://developer.confluent.io/learn-kafka/inside-ksqldb/streaming-architecture/). ### Write the Kafka consumer code Now we can write the code that triggers side effects when anomalies are found. Add the following Java file at `src/main/java/io/ksqldb/tutorial/EmailSender.java`. This is a simple program that consumes events from Kafka and sends an email with SendGrid for each one it finds. There are a few constants to fill in, including a SendGrid API key. You can get one by signing up for SendGrid. ```java package io.ksqldb.tutorial; import org.apache.kafka.clients.consumer.ConsumerConfig; import org.apache.kafka.clients.consumer.ConsumerRecord; import org.apache.kafka.clients.consumer.ConsumerRecords; import org.apache.kafka.clients.consumer.KafkaConsumer; import org.apache.kafka.common.serialization.StringDeserializer; import io.confluent.kafka.serializers.KafkaAvroDeserializer; import io.confluent.kafka.serializers.KafkaAvroDeserializerConfig; import io.confluent.kafka.serializers.AbstractKafkaAvroSerDeConfig; import com.sendgrid.SendGrid; import com.sendgrid.Request; import com.sendgrid.Response; import com.sendgrid.Method; import com.sendgrid.helpers.mail.Mail; import com.sendgrid.helpers.mail.objects.Email; import com.sendgrid.helpers.mail.objects.Content; import java.time.Duration; import java.time.Instant; import java.time.ZoneId; import java.time.format.DateTimeFormatter; import java.time.format.FormatStyle; import java.util.Collections; import java.util.Properties; import java.util.Locale; import java.io.IOException; public class EmailSender { // Matches the broker port specified in the Docker Compose file. private final static String BOOTSTRAP_SERVERS = "localhost:29092"; // Matches the Schema Registry port specified in the Docker Compose file. private final static String SCHEMA_REGISTRY_URL = "http://localhost:8081"; // Matches the topic name specified in the ksqlDB CREATE TABLE statement. private final static String TOPIC = "possible_anomalies"; // For you to fill in: which address SendGrid should send from. private final static String FROM_EMAIL = "<< FILL ME IN >>"; // For you to fill in: the SendGrid API key to use their service. 
private final static String SENDGRID_API_KEY = "<< FILL ME IN >>"; private final static SendGrid sg = new SendGrid(SENDGRID_API_KEY); private final static DateTimeFormatter formatter = DateTimeFormatter.ofLocalizedDateTime(FormatStyle.SHORT) .withLocale(Locale.US) .withZone(ZoneId.systemDefault()); public static void main(final String[] args) throws IOException { final Properties props = new Properties(); props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, BOOTSTRAP_SERVERS); props.put(ConsumerConfig.GROUP_ID_CONFIG, "email-sender"); props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true"); props.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, "1000"); props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest"); props.put(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, SCHEMA_REGISTRY_URL); props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class); props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, KafkaAvroDeserializer.class); props.put(KafkaAvroDeserializerConfig.SPECIFIC_AVRO_READER_CONFIG, true); try (final KafkaConsumer<String, PossibleAnomaly> consumer = new KafkaConsumer<>(props)) { consumer.subscribe(Collections.singletonList(TOPIC)); while (true) { final ConsumerRecords<String, PossibleAnomaly> records = consumer.poll(Duration.ofMillis(100)); for (final ConsumerRecord<String, PossibleAnomaly> record : records) { final PossibleAnomaly value = record.value(); if (value != null) { sendEmail(value); } } } } } private static void sendEmail(PossibleAnomaly anomaly) throws IOException { Email from = new Email(FROM_EMAIL); Email to = new Email(anomaly.getEmailAddress().toString()); String subject = makeSubject(anomaly); Content content = new Content("text/plain", makeContent(anomaly)); Mail mail = new Mail(from, subject, to, content); Request request = new Request(); try { request.setMethod(Method.POST); request.setEndpoint("mail/send"); request.setBody(mail.build()); Response response = sg.api(request); System.out.println("Attempted to send email!\n"); System.out.println("Status code: " + response.getStatusCode()); System.out.println("Body: " + response.getBody()); System.out.println("Headers: " + response.getHeaders()); System.out.println("======================"); } catch (IOException ex) { throw ex; } } private static String makeSubject(PossibleAnomaly anomaly) { return "Suspicious activity detected for card " + anomaly.getCardNumber(); } private static String makeContent(PossibleAnomaly anomaly) { return String.format("Found suspicious activity for card number %s. %s transactions were made for a total of %s between %s and %s", anomaly.getCardNumber(), anomaly.getNAttempts(), anomaly.getTotalAmount(), formatter.format(Instant.ofEpochMilli(anomaly.getStartBoundary())), formatter.format(Instant.ofEpochMilli(anomaly.getEndBoundary()))); } } ``` #### Restore mirroring after a failover with truncate-and-restore If you want to restore mirroring after a `promote` or a `failover`, you can use the `truncate-and-restore` command. After failing over or promoting a mirror topic, you can run `truncate-and-restore` on the original primary topic, which makes it a mirror topic that fetches from the newly stopped mirror topic. This command also truncates and deletes any divergent records that were produced to the original primary cluster after the point of failover. This means that there could be some loss of data if your clients are not set up to reprocess data. To learn more, see [Convert a mirror topic to a normal topic](mirror-topics-cp.md#convert-mirror-topic-to-normal-topic).
`truncate-and-restore` is available only on [“bidirectional” links](mirror-topics-cp.md#bidirectional-linking-cp), and only in KRaft mode. To learn more about running Kafka in KRaft mode, see [KRaft Overview for Confluent Platform](../../kafka-metadata/kraft.md#kraft-overview), [KRaft Configuration for Confluent Platform](../../kafka-metadata/config-kraft.md#configure-kraft), and the [Platform Quick Start](../../get-started/platform-quickstart.md#cp-quickstart-step-1). Also, the [basic Cluster Linking tutorial](topic-data-sharing.md#tutorial-topic-data-sharing) includes a full walkthrough of how to run Cluster Linking in KRaft mode. #### IMPORTANT As of Confluent Platform 8.0, ZooKeeper is no longer available for new deployments. Confluent recommends KRaft mode for new deployments. To learn more about running Kafka in KRaft mode, see the [KRaft Overview](/platform/current/kafka-metadata/kraft.html#kraft-overview) and the KRaft steps in the [Platform Quick Start](/platform/current/get-started/platform-quickstart.html). To learn about migrating from older versions, see [Migrate from ZooKeeper to KRaft on Confluent Platform](/platform/current/installation/migrate-zk-kraft.html). This tutorial provides examples for KRaft mode only. Earlier versions of this documentation provide examples for both KRaft and ZooKeeper. For KRaft, the examples show a *combined mode* configuration, where for each cluster the broker and controller run on the same server. Currently, combined mode is not intended for production use but is shown here to simplify the tutorial. If you want to run controllers and brokers on separate servers, use KRaft in isolated mode. To learn more, see [KRaft Overview](/platform/current/kafka-metadata/kraft.html#kraft-overview) and [KRaft mode](https://docs.confluent.io/platform/current/installation/installing_cp/zip-tar.html#kraft-mode) under [Configure Confluent Platform for production](https://docs.confluent.io/platform/current/installation/installing_cp/zip-tar.html#configure-cp-for-production). ### Create the Confluent Cloud to Confluent Platform link 1. Create another user API key for this cluster link on your Confluent Cloud cluster. ```bash confluent api-key create --resource $CC_CLUSTER_ID ``` You use the same cluster that served as the destination in previous steps as the source cluster in the following steps; therefore, you create a different API key and secret for the same cluster to serve in this new role. 2. Keep the resulting API key and secret in a safe place. This tutorial refers to these as `<CC-API-KEY>` and `<CC-API-SECRET>`. You will add these to a configuration file in the next step. #### IMPORTANT If you are setting this up in production, you should use a service account API key instead of a user-associated key. To do this, you would create a service account for your cluster link, give the service account the requisite ACLs, then create an API key for the service account. It’s best practice for each cluster link to have its own API key and service account. A guide on [how to set up privileges to access Confluent Cloud clusters with a service account](https://docs.confluent.io/cloud/current/multi-cloud/cluster-linking/topic-data-sharing.html#set-up-privileges-for-the-cluster-link-to-access-topics-on-the-source-cluster) is provided in the topic data sharing tutorial. 3. Use `confluent kafka cluster describe` to get the Confluent Cloud cluster Endpoint URL. ```bash confluent kafka cluster describe $CC_CLUSTER_ID ``` This Endpoint URL will be referred to as `<CC-ENDPOINT>` in the following steps. 4.
Save your API key and secret along with the following configuration entries in a file called `$CONFLUENT_CONFIG/clusterlink-cloud-to-CP.config` that the Confluent Platform commands will use to authenticate into Confluent Cloud: ```bash $CONFLUENT_CONFIG/clusterlink-cloud-to-CP.config ``` The configuration entries you need in this file are as follows: ```bash bootstrap.servers=<CC-ENDPOINT> security.protocol=SASL_SSL sasl.mechanism=PLAIN sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username='<CC-API-KEY>' password='<CC-API-SECRET>'; ``` 5. Create the cluster link to Confluent Platform. If you want to follow this example exactly, name the cluster link `from-cloud-link`, but you can name it whatever you like. You will use the cluster link name to create and manipulate mirror topics. You cannot rename a cluster link once it’s created. The following command creates the cluster link on an unsecured Confluent Platform cluster. If you have security set up on your Confluent Platform cluster, you must pass security credentials to this command with `--command-config` as shown in [Setting Properties on a Cluster Link](https://docs.confluent.io/platform/current/multi-dc-deployments/cluster-linking/configs.html#setting-properties-on-a-cluster-link). ```bash kafka-cluster-links --bootstrap-server localhost:9092 \ --create --link from-cloud-link \ --config-file $CONFLUENT_CONFIG/clusterlink-cloud-to-CP.config \ --cluster-id $CC_CLUSTER_ID --command-config $CONFLUENT_CONFIG/CP-command.config ``` Your output should resemble the following: ```bash Cluster link 'from-cloud-link' creation successfully completed. ``` 6. Check that the link exists with the `kafka-cluster-links --list` command, as follows. ```bash kafka-cluster-links --list --bootstrap-server localhost:9092 --command-config $CONFLUENT_CONFIG/CP-command.config ``` Your output should resemble the following, showing the previous `from-on-prem-link` you created along with the new `from-cloud-link`: ```none Link name: 'from-on-prem-link', link ID: '7eb4304e-b513-41d2-903e-147dea62a01c', remote cluster ID: 'lkc-1vgo6', local cluster ID: 'G1pnOMOxSjWYIX8xuR2cfQ' Link name: 'from-cloud-link', link ID: 'b1a56076-4d6f-45e0-9013-ff305abd0e54', remote cluster ID: 'lkc-1vgo6', local cluster ID: 'G1pnOMOxSjWYIX8xuR2cfQ' ``` ### KRaft and ZooKeeper - As of Confluent Platform 8.0, ZooKeeper is no longer available for new deployments. Confluent recommends KRaft mode for new deployments. To learn more about running Kafka in KRaft mode, see [KRaft Overview for Confluent Platform](../../kafka-metadata/kraft.md#kraft-overview) and the KRaft steps in the [Quick Start for Confluent Platform](../../get-started/platform-quickstart.md#quickstart). To learn about migrating from older versions, see [Migrate from ZooKeeper to KRaft on Confluent Platform](../../installation/migrate-zk-kraft.md#migrate-zk-kraft). - Specifically, in relation to this migration to KRaft, `password.encoder.secret` is not required for KRaft mode, but is required when [migrating from ZooKeeper to KRaft](../../installation/migrate-zk-kraft.md#migrate-zk-kraft). Use of this parameter for Cluster Linking, when needed for older versions on ZooKeeper, is shown in [Tutorial: Link Confluent Platform and Confluent Cloud Clusters](hybrid-cp.md#cluster-link-hybrid-cp). To learn more about how this is handled in Confluent Platform 8.0 and later, see [Update password configurations dynamically](../../kafka/dynamic-config.md#dynamic-config-passwords-upgrade).
- This documentation provides examples for KRaft mode only. Earlier versions of this documentation provide examples for both KRaft and ZooKeeper. - Some examples in the various tutorials show a *combined mode* configuration, where for each cluster the broker and controller run on the same server. Currently, combined mode is not intended for production use but is shown here to simplify the tutorial. If you want to run controllers and brokers on separate servers, use KRaft in isolated mode. To learn more, see [KRaft Overview for Confluent Platform](../../kafka-metadata/kraft.md#kraft-overview) and [KRaft Configuration for Confluent Platform](../../kafka-metadata/config-kraft.md#configure-kraft). ## Known Issues, Limitations, and Best Practices * While the use of Single Message Transformations (SMTs) in Replicator is supported, it is not a best practice. The use of Apache Flink® or Kafka Streams is considered best practice because these are more scalable and easier to debug. * Replicator should not be used for serialization changes. In these cases, the recommended method is to use ksqlDB. To learn more, see the documentation on [ksqlDB](../../ksqldb/overview.md#ksql-home) and the tutorial on [How to convert a stream’s serialization format](https://developer.confluent.io/tutorials/changing-serialization-format/ksql.html) on the Confluent Developer site. * When running Replicator version 5.3.0 or later, set `connect.protocol=eager`, because there is a known issue where using the default of `connect.protocol=compatible` or `connect.protocol=sessioned` can cause task rebalancing issues and duplicate records. * If you encounter `RecordTooLargeException` when you use compressed records, set the record batch size for the Replicator producer to the highest possible value. When Replicator decompresses records while consuming from the source cluster, it checks the size of the uncompressed batch on the producer before recompressing them and may throw `RecordTooLargeException`. Setting the record batch size mitigates the exception, and compression proceeds as expected when the record is sent to the destination cluster. * The Replicator latency metric is calculated by subtracting the time the record was produced to the source from the time it was replicated on the destination. This works well in the real-time case, when records are actively being produced to the source cluster. However, if you are replicating old data, you will see very large latency values due to the old record timestamps. In the historical data case, the latency does not indicate how long Replicator is taking to replicate data; it indicates how much time has passed since the message that Replicator is currently replicating was originally produced. As Replicator proceeds over historical data, the latency metric should decrease quickly. * There’s an issue with the Replicator lag metric where the value `NaN` is reported if no lag sample was recorded in a given time window. This can happen if you have limited production in the source cluster, or if Replicator is not flushing data to the destination cluster fast enough to record enough samples in the given time window. In either case, the JMX metrics report `NaN` for the Replicator lag metric. `NaN` does not necessarily mean that the lag is 0; it means that there aren’t enough samples in the given time window to report lag.
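As a concrete illustration of the `connect.protocol` recommendation above, the Connect worker that runs Replicator can carry a fragment such as the following sketch; the broker address is a placeholder, and the override policy line simply mirrors the full worker example shown later in this section.

```properties
# Connect worker fragment for a Replicator deployment (placeholder broker address)
bootstrap.servers=destination-cluster:9092
# Work around the known rebalancing issue described above
connect.protocol=eager
# Needed only if the Replicator connector sets producer.override.* properties
connector.client.config.override.policy=All
```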
## MirrorMaker MirrorMaker is a stand-alone tool for copying data between two Kafka clusters. To learn more, see [Mirroring data between clusters](https://kafka.apache.org/documentation/#basic_ops_mirror_maker) in the Kafka documentation. MirrorMaker 2 is supported as a stand-alone executable, but is not supported as a connector. Confluent Replicator is a more complete solution that handles topic configuration and data, and integrates with Kafka Connect and Confluent Control Center to improve availability, scalability and ease of use. To learn more, try out the Quick Start [Tutorial: Replicate Data Across Kafka Clusters in Confluent Platform](replicator-quickstart.md#replicator-quickstart) and see [Migrate from Kafka MirrorMaker to Replicator in Confluent Platform](migrate-replicator.md#migrate-replicator). # Replicate Topics Across Kafka Clusters in Confluent Platform * [Overview](index.md) * [Example: Active-active Multi-Datacenter](replicator-docker-tutorial.md) * [Tutorial: Replicate Data Across Clusters](replicator-quickstart.md) * [Tutorial: Run as an Executable or Connector](replicator-run.md) * [Configure](configuration_options.md) * [Verify Configuration](replicator-verifier.md) * [Tune](replicator-tuning.md) * [Monitor](replicator-monitoring.md) * [Configure for Cross-Cluster Failover](replicator-failover.md) * [Migrate from MirrorMaker to Replicator](migrate-replicator.md) * [Replicator Schema Translation Example for Confluent Platform](replicator-schema-translation.md) ## Related content * For a practical guide to designing and configuring multiple Apache Kafka clusters to be resilient in case of a disaster scenario, see the [Disaster Recovery white paper](https://www.confluent.io/white-paper/disaster-recovery-for-multi-datacenter-apache-kafka-deployments/). This white paper provides a plan for failover, failback, and ultimately successful recovery. * For an overview of using Confluent Platform for data replication, see [Overview of Multi-Datacenter Deployment Solutions on Confluent Platform](../index.md#multi-dc). * For a quick start on how to configure Replicator and set up your own multi-cluster deployment, see [Tutorial: Replicate Data Across Kafka Clusters in Confluent Platform](replicator-quickstart.md#replicator-quickstart). * For an overview of Replicator see [Replicate Multi-Datacenter Topics Across Kafka Clusters in Confluent Platform](index.md#replicator-detail). * For an introduction to using Confluent Platform to create stretch clusters with followers, observers, and replica placement, see [Configure Multi-Region Clusters in Confluent Platform](../multi-region.md#bmrr). ### Convert Replicator connector configurations to Replicator executable configurations Replicator connect configuration can be converted to a Replicator executable configuration. One of the key differences between the two is that the Connect configuration has two configuration files (a worker properties file and a connector properties or JSON file) while Replicator executable has three configuration files (a consumer, a producer, and a replication properties file). It’s helpful to think about this in the following way: * The consumer configuration file contains all the properties you need to configure the consumer embedded within Replicator that consumes from the source cluster. This would include any special configurations you want to use to tune the source consumer, in addition to the necessary security and connection details needed for the consumer to connect to the source cluster. 
* The producer configuration file contains all the properties you need to configure the producer embedded within Replicator that produces to the destination cluster. This would include any special configurations you want to use to tune the destination producer, in addition to the necessary security and connection details needed for the producer to connect to the destination cluster. * The replication configuration file contains all the properties you need to configure the actual Replicator that does the work of taking the data from the source consumer and passing it to the destination producer. This would include all Connect-specific configurations needed for Replicator as well as any necessary Replicator configurations. If you have the following worker properties: ```none config.storage.replication.factor=3 offset.storage.replication.factor=3 status.storage.replication.factor=3 connect.protocol=eager connector.client.config.override.policy=All bootstrap.servers=destination-cluster:9092 ssl.endpoint.identification.algorithm=https sasl.mechanism=PLAIN security.protocol=SASL_SSL sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username=\"destUser\" password=\"destPassword\"; ``` And the following Replicator JSON: ```json { "connector.class":"io.confluent.connect.replicator.ReplicatorSourceConnector", "tasks.max":4, "topic.whitelist":"test-topic", "topic.rename.format":"${topic}.replica", "confluent.license":"XYZ", "name": "replicator", "header.converter": "io.confluent.connect.replicator.util.ByteArrayConverter", "key.converter": "io.confluent.connect.replicator.util.ByteArrayConverter", "value.converter": "io.confluent.connect.replicator.util.ByteArrayConverter", "src.consumer.max.poll.records":"10000", "producer.override.linger.ms":"10", "producer.override.compression.type":"lz4", "src.kafka.bootstrap.servers": "source-cluster:9092", "src.kafka.ssl.endpoint.identification.algorithm": "https", "src.kafka.security.protocol": "SASL_SSL", "src.kafka.sasl.mechanism": "PLAIN", "src.kafka.sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"sourceUser\" password=\"sourcePassword\";", "dest.kafka.bootstrap.servers": "destination-cluster:9092", "dest.kafka.ssl.endpoint.identification.algorithm": "https", "dest.kafka.security.protocol": "SASL_SSL", "dest.kafka.sasl.mechanism": "PLAIN", "dest.kafka.sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"destUser\" password=\"destPassword\";" } ``` You can convert the configuration shown above in these two configuration files to the following Consumer, Producer, and Replication configurations needed to use the Replicator executable: **Consumer Configurations**: ```none bootstrap.servers=source-cluster:9092 ssl.endpoint.identification.algorithm=https security.protocol=SASL_SSL sasl.mechanism=PLAIN sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username=\"sourceUser\" password=\"sourcePassword\"; max.poll.records=10000 ``` For the consumer configurations, strip the `src.kafka.` and `src.consumer.` prefixes and simply list the actual configuration you want for the source consumer. The Replicator executable will know that because this has been placed in the consumer configuration, it needs to apply these configurations to the source consumer that will poll the source cluster.
**Producer Configurations**: ```none bootstrap.servers=destination-cluster:9092 ssl.endpoint.identification.algorithm=https security.protocol=SASL_SSL sasl.mechanism=PLAIN sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username=\"destUser\" password=\"destPassword\"; linger.ms=10 compression.type=lz4 ``` For the producer configurations, strip the `dest.kafka.` and `producer.override.` prefixes and simply list the actual configuration you want for the destination producer. The Replicator executable will know that because this has been placed in the producer configuration, it needs to apply these configurations to the destination producer that will write to the destination cluster. **Replication Configurations**: ```none config.storage.replication.factor=3 offset.storage.replication.factor=3 status.storage.replication.factor=3 connect.protocol=eager tasks.max=4 topic.whitelist=test-topic topic.rename.format=${topic}.replica confluent.license=XYZ name=replicator header.converter=io.confluent.connect.replicator.util.ByteArrayConverter key.converter=io.confluent.connect.replicator.util.ByteArrayConverter value.converter=io.confluent.connect.replicator.util.ByteArrayConverter ``` For the replication configurations, only include the configurations that are important for Replicator or Connect. It’s important to note that you don’t need the `connector.client.config.override.policy` configuration anymore, as the Replicator executable directly passes in the producer configurations specified in the configuration file. This makes it easier to think about configuring the important consumers and producers for replication, rather than incorporating an extra Connect configuration. ### Run Replicator on the source cluster Replicator should be run on the destination cluster if possible. If this is not practical, it is possible to run Replicator on the source cluster from Confluent Platform 5.4.0 onwards. Make the following changes to run Replicator in this way: * `connector.client.config.override.policy` to be set to `All` in the Connect worker configuration or in `--replication.config` if using Replicator Executable. * `bootstrap.servers` in the Connect worker configuration should point to the source cluster (for Replicator Executable specify this in `--producer.config`) * any client configurations (security etc.) for the source cluster should be provided in the Connect worker configuration (for Replicator Executable specify these in `--producer.config`) * `producer.override.bootstrap.servers` in the connector configuration should point to the destination cluster (for Replicator Executable specify this in `--replication.config`) * any client configurations (security etc.)
for the destination cluster should be provided in the connector configuration with prefix `producer.override.` (for Replicator Executable specify these in `--replication.config`) * configurations with the prefix `src.kafka.` and `dest.kafka` should be provided as usual An example configuration for Replicator running as a connector on the source cluster can be seen below: ```bash { "connector.class": "io.confluent.connect.replicator.ReplicatorSourceConnector", "name": "replicator", "producer.override.ssl.endpoint.identification.algorithm": "https", "producer.override.sasl.mechanism": "PLAIN", "producer.override.request.timeout.ms": 20000, "producer.override.bootstrap.servers": "destination-cluster:9092", "producer.override.retry.backoff.ms": 500, "producer.override.sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"someUser\" password=\"somePassword\";", "producer.override.security.protocol": "SASL_SSL", "key.converter": "io.confluent.connect.replicator.util.ByteArrayConverter", "value.converter": "io.confluent.connect.replicator.util.ByteArrayConverter", "topic.whitelist": "someTopic", "src.kafka.bootstrap.servers": "source-cluster:9092", "dest.kafka.bootstrap.servers": "destination-cluster:9092", "dest.kafka.ssl.endpoint.identification.algorithm": "https", "dest.kafka.security.protocol": "SASL_SSL", "dest.kafka.sasl.mechanism": "PLAIN", "dest.kafka.sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"someUser\" password=\"somePassword\";" } ``` In this configuration Replicator is producing between clusters rather than consuming and the default producer configurations are not optimal for this. Consider adjusting the following configurations to increase the throughput of the producer flow: * `producer.override.linger.ms=500` * `producer.override.batch.size=600000` These values are provided as a starting point only and should be further tuned to your environment and use case. For more detail on running Replicator on the source cluster when the destination is Confluent Cloud, see [Confluent Replicator to Confluent Cloud Configurations](/cloud/current/get-started/examples/ccloud/docs/replicator-to-cloud-configuration-types.html). ## Configuration Options `schema.registry.url` : Comma-separated list of URLs for Schema Registry instances that can be used to register or look up schemas. * Type: list * Default: “” * Importance: high `auto.register.schemas` : Specify if the Serializer should attempt to register the Schema with Schema Registry. * Type: boolean * Default: true * Importance: medium `use.latest.version` : Only applies when `auto.register.schemas` is set to `false`. If `auto.register.schemas` is set to `false` and `use.latest.version` is set to `true`, then instead of deriving a schema for the object passed to the client for serialization, Schema Registry will use the latest version of the schema in the subject for serialization. The property `use.latest.version` can be set on producers or consumers to serialize or deserialize messages per the latest version. 
* Type: boolean * Default: false * Importance: medium #### NOTE To learn more, see how to use schema references to combine [multiple event types in the same topic](fundamentals/serdes-develop/index.md#multiple-event-types-same-topic-sr) with [Avro](fundamentals/serdes-develop/serdes-avro.md#multiple-event-types-same-topic-avro), [JSON Schema](fundamentals/serdes-develop/serdes-json.md#multiple-event-types-same-topic-json), or [Protobuf](fundamentals/serdes-develop/serdes-protobuf.md#multiple-event-types-same-topic-protobuf). `latest.compatibility.strict` : Only applies when `use.latest.version` is set to `true`. If `latest.compatibility.strict` is `true` (the default), then when using `use.latest.version=true` during serialization, a check is performed to verify that the latest subject version is backward compatible with the schema of the object being serialized. If the check fails, then an error results. If the check succeeds, then serialization is performed. If `latest.compatibility.strict` is `false`, then the latest subject version is used for serialization, without any compatibility check. Serialization may fail in this case. Relaxing the compatibility requirement (by setting `latest.compatibility.strict` to `false`) may be useful, for example, when implementing [Kafka Connect converters](../connect/index.md#connect-converters) and [schema references](fundamentals/serdes-develop/index.md#referenced-schemas). * Type: boolean * Default: true * Importance: medium #### NOTE To learn more about this setting, see [Schema Evolution and Compatibility for Schema Registry on Confluent Platform](fundamentals/schema-evolution.md#schema-evolution-and-compatibility). `max.schemas.per.subject` : Maximum number of schemas to create or cache locally. * Type: int * Default: 1000 * Importance: low `key.subject.name.strategy` : Determines how to construct the subject name under which the key schema is registered with Schema Registry. For additional information, see Schema Registry [Subject name strategy](fundamentals/serdes-develop/index.md#sr-schemas-subject-name-strategy). Any implementation of `io.confluent.kafka.serializers.subject.strategy.SubjectNameStrategy` can be specified. By default, `-key` is used as the subject. Specifying an implementation of `io.confluent.kafka.serializers.subject.SubjectNameStrategy` is deprecated as of `4.1.3` and if used may have some performance degradation. * Type: class * Default: class io.confluent.kafka.serializers.subject.TopicNameStrategy * Importance: medium `value.subject.name.strategy` : Determines how to construct the subject name under which the value schema is registered with Schema Registry. For additional information, see Schema Registry [Subject name strategy](fundamentals/serdes-develop/index.md#sr-schemas-subject-name-strategy). Any implementation of `io.confluent.kafka.serializers.subject.strategy.SubjectNameStrategy` can be specified. By default, `-value` is used as the subject. Specifying an implementation of `io.confluent.kafka.serializers.subject.SubjectNameStrategy` is deprecated as of `4.1.3` and if used may have some performance degradation. * Type: class * Default: class io.confluent.kafka.serializers.subject.TopicNameStrategy * Importance: medium `basic.auth.credentials.source` : Specify how to pick the credentials for the Basic authentication header. The supported values are URL, USER_INFO and SASL_INHERIT. 
* Type: string * Default: “URL” * Importance: medium `basic.auth.user.info` : Specify the user info for the Basic authentication in the form of {username}:{password}. schema.registry.basic.auth.user.info is a deprecated alias for this configuration. * Type: password * Default: “” * Importance: medium The following Schema Registry dedicated properties, configurable on the client, are available on Confluent Platform version 5.4.0 (and later). To learn more, see the information on configuring clients in [Additional configurations for HTTPS](security/index.md#sr-https-additional). `schema.registry.ssl.truststore.location` : The location of the trust store file. For example, `schema.registry.kafkastore.ssl.truststore.location=/etc/kafka/secrets/kafka.client.truststore.jks` * Type: string * Default: “” * Importance: medium `schema.registry.ssl.truststore.password` : The password for the trust store file. If a password is not set, access to the truststore is still available but integrity checking is disabled. * Type: password * Default: “” * Importance: medium `schema.registry.ssl.keystore.location` : The location of the key store file. This is optional for the client and can be used for two-way authentication for the client. For example, `schema.registry.kafkastore.ssl.keystore.location=/etc/kafka/secrets/kafka.schemaregistry.keystore.jks`. * Type: string * Default: “” * Importance: medium `schema.registry.ssl.keystore.password` : The store password for the key store file. This is optional for the client and only needed if `ssl.keystore.location` is configured. * Type: password * Default: “” * Importance: medium `schema.registry.ssl.key.password` : The password of the private key in the key store file. This is optional for the client. * Type: password * Default: “” * Importance: medium ### GET /schemas/ids/{int: id} Get the schema string identified by the input ID. * **Parameters:** * **id** (*int*) – the globally unique identifier of the schema * **format** (*string*) – Desired output format, dependent on schema type. For AVRO schemas, valid values are: `""` (default) or `resolved`. For PROTOBUF schemas, valid values are: `""` (default), `ignore_extensions`, or `serialized`. (The parameter does not apply to JSON schemas.) * **subject** (*string*) – Add `?subject=` at the end of this request to look for the subject in all contexts starting with the default context, and return the schema with the id from that context. To learn more about contexts, see the [exporters](#schemaregistry-api-exporters) API reference and the quick start and concepts guides for [Schema Linking on Confluent Platform](../schema-linking-cp.md#schema-linking-cp-overview) and [Schema Linking on Confluent Cloud](/cloud/current/sr/schema-linking.html). 
* **Response JSON Object:** * **schema** (*string*) – Schema string identified by the ID * **Status Codes:** * [404 Not Found](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.5) – * Error code 40403 – Schema not found * [500 Internal Server Error](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.5.1) – * Error code 50001 – Error in the backend datastore **Example request**: ```http GET /schemas/ids/1 HTTP/1.1 Host: schemaregistry.example.com Accept: application/vnd.schemaregistry.v1+json, application/vnd.schemaregistry+json, application/json ``` **Example response**: ```http HTTP/1.1 200 OK Content-Type: application/vnd.schemaregistry.v1+json { "schema": "{\"type\": \"string\"}" } ``` ### GET /schemas/ids/{int: id}/schema Retrieves only the schema identified by the input ID. * **Parameters:** * **id** (*int*) – the globally unique identifier of the schema * **format** (*string*) – Desired output format, dependent on schema type. For AVRO schemas, valid values are: `""` (default) or `resolved`. For PROTOBUF schemas, valid values are: `""` (default), `ignore_extensions`, or `serialized`. (The parameter does not apply to JSON schemas.) * **subject** (*string*) – Add `?subject=` at the end of this request to look for the subject in all contexts starting with the default context, and return the schema with the ID from that context. To learn more about contexts, see the [exporters](#schemaregistry-api-exporters) API reference and the quick start and concepts guides for [Schema Linking on Confluent Platform](../schema-linking-cp.md#schema-linking-cp-overview) and [Schema Linking on Confluent Cloud](/cloud/current/sr/schema-linking.html). * **Response JSON Object:** * **schema** (*string*) – Schema identified by the ID * **Status Codes:** * [404 Not Found](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.5) – * Error code 40403 – Schema not found * [500 Internal Server Error](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.5.1) – * Error code 50001 – Error in the backend datastore **Example request**: ```http GET /schemas/ids/1/schema HTTP/1.1 Host: schemaregistry.example.com Accept: application/vnd.schemaregistry.v1+json, application/vnd.schemaregistry+json, application/json ``` **Example response**: ```http HTTP/1.1 200 OK Content-Type: application/vnd.schemaregistry.v1+json "string" ``` ## Kafka producers and consumers for development and testing The Confluent and open source Apache Kafka® scripts for basic actions on Kafka clusters and topics live in `$CONFLUENT_HOME/bin`. A full reference for Confluent premium command line tools and utilities is provided in [CLI Tools for Confluent Platform](/platform/current/installation/cli-reference.html). These include Confluent-provided producers and consumers that you can run locally against a self-managed, locally installed Confluent Platform instance, against the [Confluent Platform demo](/platform/current/tutorials/cp-demo/docs/overview.html), or against Confluent Cloud clusters. In `$CONFLUENT_HOME/bin`, you will find: - `kafka-avro-console-consumer` - `kafka-avro-console-producer` - `kafka-protobuf-console-consumer` - `kafka-protobuf-console-producer` - `kafka-json-schema-console-consumer` - `kafka-json-schema-console-producer` These are provided in the same location as the original, generic `kafka-console-consumer` and `kafka-console-producer`; unlike the generic tools, the schema-aware producers expect a schema to be supplied for message values (the Avro console producer, for example, expects an Avro schema).
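To illustrate, a minimal invocation of the Avro console producer against a local broker and Schema Registry might look like the following sketch; the topic name and inline schema are illustrative only (the schema reuses the `Payment` example from this documentation), and on older releases you may need `--broker-list` instead of `--bootstrap-server`.

```bash
kafka-avro-console-producer \
  --bootstrap-server localhost:9092 \
  --topic transactions \
  --property schema.registry.url=http://localhost:8081 \
  --property value.schema='{"type":"record","name":"Payment","fields":[{"name":"id","type":"string"},{"name":"amount","type":"double"}]}'
# Then type one JSON record per line, for example:
# {"id": "1", "amount": 10.0}
```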
A reference for the open source utilities is provided in [Kafka Command-Line Interface (CLI) Tools](/kafka/operations-tools/kafka-tools.html). ### Topics and Schemas Schemas are associated with Kafka topics, organized under subjects in Schema Registry. (See [Terminology](../schema_registry_onprem_tutorial.md#schema-registry-terminology).) The quick start below describes how to migrate Schema Registry and the schemas it contains, but not Kafka topics. For a continuous migration (extend to cloud), you need only do a schema migration, since your topics continue to live in the primary, self-managed cluster. For a one-time migration (lift and shift), you must follow schema migration with topic migration, using [Replicator](../../multi-dc-deployments/replicator/index.md#replicator-detail) to migrate your topics to the Confluent Cloud cluster, as mentioned in [Related Content](#sr-next-steps-topics) after the quick start. The property `topic.rename.format` is described in [Destination Topics](../../multi-dc-deployments/replicator/configuration_options.md#rep-destination-topics) under [Replicator Configuration Reference for Confluent Platform](../../multi-dc-deployments/replicator/configuration_options.md#replicator-config-options). ### Single Datacenter Setup Within a single datacenter or location, a multi-node, multi-broker cluster provides Kafka data replication across the nodes. Producers write and consumers read data to/from topic partition leaders. Leaders replicate data to followers so that messages are copied to more than one broker. You can configure parameters on producers and consumers to optimize your single cluster deployment for various goals, including message durability and high availability. Kafka [producers can set the acks configuration parameter](../installation/configuration/producer-configs.md#cp-config-producer) to control when a write is considered successful. For example, setting producers to `acks=all` requires other brokers in the cluster acknowledge receiving the data before the leader broker responds to the producer. If a leader broker fails, the Kafka cluster recovers when a follower broker is elected leader and client applications can continue to write and read messages through the new leader. ##### ACLs and Security In a multi-DC setup with ACLs enabled, the schemas ACL topic must be replicated. In the case of an outage, the ACLs will be cached along with the schemas. Schema Registry will continue to run READs with ACLs if the primary Kafka cluster goes down. - For an overview of security strategies and protocols for Schema Registry, see [Secure Schema Registry for Confluent Platform](security/index.md#schemaregistry-security). - To learn how to configure ACLs on roles related to Schema Registry, see [Schema Registry ACL Authorizer for Confluent Platform](../confluent-security-plugins/schema-registry/authorization/sracl_authorizer.md#confluentsecurityplugins-sracl-authorizer). - To learn how to define Kafka topic based ACLs, see [Schema Registry Topic ACL Authorizer for Confluent Platform](../confluent-security-plugins/schema-registry/authorization/topicacl_authorizer.md#confluentsecurityplugins-topicacl-authorizer). - To learn about using role-based authorization with Schema Registry, see [Configure Role-Based Access Control for Schema Registry in Confluent Platform](security/rbac-schema-registry.md#schemaregistry-rbac). 
- To learn more about Replicator security, see [Security and ACL Configurations](../multi-dc-deployments/replicator/index.md#replicator-security-overview) in the Replicator documentation. ## Configuring Security for Schema ID Validation In general, Schema Registry initiates the connection to the brokers. Schema ID Validation is unique in that the broker(s) initiate the connection to Schema Registry. They do so in order to retrieve schemas from the registry, and verify that the messages they receive from producers match schemas associated with particular topics. With Schema ID Validation enabled, the sequence of tasks looks something like this: 1. A broker receives a message from a producer, and sees that it’s directed to a topic that has a schema associated. 2. The broker initiates a connection to Schema Registry. 3. The broker asks for the schema associated with the topic (by schema ID). 4. Schema Registry receives the request, finds the requested schema in its schema storage, and returns it to the broker. 5. The broker validates the schema ID. Therefore, to set up security on a cluster that has broker-side Schema ID Validation enabled on topics, you must configure settings on the Kafka broker to support this broker-initiated connection to Schema Registry. For multiple brokers, each broker must be configured. For example, for mTLS, ideally you would have a different certificate for each broker. Note that Schema Registry’s internal Kafka client to Kafka brokers is not relevant at all to the connection between broker-side Schema ID Validation and Schema Registry’s HTTP listeners. The security settings below do not reflect anything about the Schema Registry internal client-to-broker connection. The broker configurations below include `confluent.schema.registry.url`, which tells the broker how to connect to Schema Registry. You may already have configured this on your brokers, as a [prerequisite for using Schema Validation](#sv-set-sr-url-on-brokers). The rest of the settings shown are specific to security configurations. ## Troubleshoot error “Schema Registry is not set up” If you get an error message on Control Center when you try to access a topic schema (”Schema Registry is not set up”), first make sure that Schema Registry is running. Then verify that the Schema Registry `listeners` configuration matches the Control Center `confluent.controlcenter.schema.registry.url` configuration. Also check the HTTPS configuration parameters. ![image](images/c3-SR-not-set-up.png) For more information, see [A schema for message values has not been set for this topic](https://docs.confluent.io/control-center/current/installation/troubleshooting.html#c3-schema-registry-not-set-up), and start-up procedures for [Quick Start for Confluent Platform](../get-started/platform-quickstart.md#quickstart), or [Install Confluent Platform On-Premises](../installation/overview.md#installation), depending on which one of these you are using to run Confluent Platform. #### IMPORTANT As of Confluent Platform 7.5, ZooKeeper is deprecated for new deployments. Confluent recommends KRaft mode for new deployments. 
To learn more about running Kafka in KRaft mode, see [KRaft Overview for Confluent Platform](../kafka-metadata/kraft.md#kraft-overview), [KRaft Configuration for Confluent Platform](../kafka-metadata/config-kraft.md#configure-kraft), the [Platform Quick Start](../get-started/platform-quickstart.md#cp-quickstart-step-1), and [Settings for other Kafka and Confluent Platform components](../kafka-metadata/config-kraft.md#config-cp-components-kraft). The following example provides KRaft (*combined mode*) configurations. Another example of running multi-cluster Schema Registry in KRaft mode is shown in the [Schema Linking Quick Start for Confluent Platform](schema-linking-cp.md#schema-linking-cp-overview). Note that KRaft combined mode is for local experimentation only and is not supported by Confluent. ### Auto Schema Registration By default, client applications automatically register new schemas. If they produce new messages to a new topic, then they will automatically try to register new schemas. This is convenient in development environments, but in production environments it’s recommended that client applications do not automatically register new schemas. Best practice is to register schemas outside of the client application to control when schemas are registered with Schema Registry and how they evolve. Within the application, you can disable automatic schema registration by setting the configuration parameter `auto.register.schemas=false`, as shown in the following example. ```java props.put(AbstractKafkaAvroSerDeConfig.AUTO_REGISTER_SCHEMAS, false); ``` To manually register the schema outside of the application, you can use Control Center. First, create a new topic called `test` in the same way that you created a new topic called `transactions` earlier in the tutorial. Then from the **Schema** tab, click **Set a schema** to define the new schema. Specify values for: * `namespace`: a fully qualified name that avoids schema naming conflicts * `type`: [Avro data type](https://avro.apache.org/docs/1.8.1/spec.html#schemas), one of `record`, `enum`, `union`, `array`, `map`, `fixed` * `name`: unique schema name in this namespace * `fields`: one or more simple or complex data types for a `record`. The first field in this record is called `id`, and it is of type `string`. The second field in this record is called `amount`, and it is of type `double`. If you were to define the same schema as used earlier, you would enter the following in the schema editor: ```json { "type": "record", "name": "Payment", "namespace": "io.confluent.examples.clients.basicavro", "fields": [ { "name": "id", "type": "string" }, { "name": "amount", "type": "double" } ] } ``` If you prefer to connect directly to the REST endpoint in Schema Registry, then to define a schema for a new subject for the topic `test`, run the command below.
```bash curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" \ --data '{"schema": "{\"type\":\"record\",\"name\":\"Payment\",\"namespace\":\"io.confluent.examples.clients.basicavro\",\"fields\":[{\"name\":\"id\",\"type\":\"string\"},{\"name\":\"amount\",\"type\":\"double\"}]}"}' \ http://localhost:8081/subjects/test-value/versions ``` The command registers the schema and returns its ID; in this sample output, the new schema has the ID `1`: ```bash {"id":1} ``` ## Related content - Blog post: [Ensure Data Quality and Data Evolvability with a Secured Schema Registry](https://www.confluent.io/blog/ensure-data-quality-and-evolvability-with-secured-schema-registry/) - [Access Control (RBAC) for Schema Linking Exporters](../schema-linking-cp.md#cp-schema-linking-rbac) - [Configure Metadata Service (MDS) in Confluent Platform](../../kafka/configure-mds/index.md#rbac-mds-config) - [Use Role-Based Access Control (RBAC) for Authorization in Confluent Platform](../../security/authorization/rbac/overview.md#rbac-overview) - [Confluent CLI](https://docs.confluent.io/confluent-cli/current/installing.html) - [Role-Based Access Control for Confluent Platform Quick Start](../../security/authorization/rbac/rbac-cli-quickstart.md#rbac-cli-quickstart) - [Use Predefined RBAC Roles in Confluent Platform](../../security/authorization/rbac/rbac-predefined-roles.md#rbac-predefined-roles) - [Schema Registry Security Plugin for Confluent Platform](../../confluent-security-plugins/schema-registry/introduction.md#confluentsecurityplugins-schema-registry-security-plugin) - [Operation and Resource Support for Schema Registry in Confluent Platform](../../confluent-security-plugins/schema-registry/authorization/index.md#confluentsecurityplugins-schema-registry-authorization) ### Clients The new Producer and Consumer clients support security for Kafka versions 0.9.0 and higher. If you are using the Kafka Streams API, see how to configure the equivalent [SSL](/platform/current/clients/javadocs/javadoc/org/apache/kafka/common/config/SslConfigs.html) and [SASL](/platform/current/clients/javadocs/javadoc/org/apache/kafka/common/config/SaslConfigs.html) parameters. 1. Configure the following properties in a client properties file `client.properties`. ```bash sasl.mechanism=GSSAPI # Configure SASL_SSL if TLS/SSL encryption is enabled, otherwise configure SASL_PLAINTEXT security.protocol=SASL_SSL ``` 2. Configure a service name that matches the primary name of the Kafka server configured in the broker JAAS file. ```bash sasl.kerberos.service.name=kafka ``` 3. Configure the JAAS configuration property with a unique principal (typically the same name as the user running the client) and a keytab (secret key) for each client. ```bash sasl.jaas.config=com.sun.security.auth.module.Krb5LoginModule required \ useKeyTab=true \ storeKey=true \ keyTab="/etc/security/keytabs/kafka_client.keytab" \ principal="kafkaclient1@EXAMPLE.COM"; ``` 4. For command-line utilities like `kafka-console-consumer` or `kafka-console-producer`, `kinit` can be used along with `useTicketCache=true`.
```bash sasl.jaas.config=com.sun.security.auth.module.Krb5LoginModule required \ useTicketCache=true; ``` # Configure Clients for SASL/OAUTHBEARER authentication in Confluent Platform To configure Confluent Platform and Kafka clients to use SASL/OAUTHBEARER authentication with TLS encryption when connecting to Confluent Server brokers, add the following properties to your client’s `properties` file, replacing the placeholders with your actual values: ```none sasl.mechanism=OAUTHBEARER security.protocol=SASL_SSL ssl.truststore.location= ssl.truststore.password= sasl.login.callback.handler.class=org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginCallbackHandler sasl.login.connect.timeout.ms=15000 # optional sasl.oauthbearer.token.endpoint.url= sasl.jaas.config=org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required \ clientId="" \ clientSecret="" \ scope=""; # optional ``` The `scope` parameter is optional: it defines the level of access the client is requesting, but it is required if your identity provider does not have a default scope or your groups claim is linked to a scope. For Kafka Java clients that support SASL/OAUTHBEARER, allow specific IdP endpoints by setting the following configuration property: ```properties org.apache.kafka.sasl.oauthbearer.allowed.urls=,,... ``` This property specifies a comma-separated list of allowed IdP JWKS (JSON Web Key Set) and token endpoint URLs. Use \* (asterisk) as the value to allow any endpoint. ```properties org.apache.kafka.sasl.oauthbearer.allowed.urls=* ``` You should consult the specific Kafka client and IdP documentation for the exact interpretation and security implications of such a broad setting. Java applications should set this property as a JVM system property when launching the application: ```bash -Dorg.apache.kafka.sasl.oauthbearer.allowed.urls=,,... ``` Other clients (for example, Python, Go, and .NET) that are built on librdkafka use different property names and configuration mechanisms, so refer to the specific client library documentation for the equivalent OAUTHBEARER configuration properties. For details on the client configuration properties used in this example, see [Client Configuration Properties for Confluent Platform](../../../../clients/client-configs.md#client-producer-consumer-config-recs-cp).
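For Java applications, the same OAUTHBEARER settings can also be passed programmatically when constructing a client. The following is a minimal, illustrative sketch only (not taken from the Confluent documentation): the bootstrap server, truststore path and password, token endpoint, client credentials, scope, and topic name are placeholder assumptions that you would replace with your own values.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class OAuthBearerProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder connection details -- replace with your broker and IdP values.
        props.put("bootstrap.servers", "broker1.example.com:9093");
        props.put("security.protocol", "SASL_SSL");
        props.put("sasl.mechanism", "OAUTHBEARER");
        props.put("ssl.truststore.location", "/var/ssl/private/client.truststore.jks");
        props.put("ssl.truststore.password", "changeme");
        props.put("sasl.login.callback.handler.class",
            "org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginCallbackHandler");
        props.put("sasl.oauthbearer.token.endpoint.url", "https://idp.example.com/oauth2/token");
        props.put("sasl.jaas.config",
            "org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required "
                + "clientId=\"my-client-id\" clientSecret=\"my-client-secret\" scope=\"kafka\";");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        // Produce a single test record to verify that the OAuth login succeeds.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("test-topic", "key", "hello"));
            producer.flush();
        }
    }
}
```

As described above, launch the JVM with `-Dorg.apache.kafka.sasl.oauthbearer.allowed.urls=...` listing your IdP token and JWKS endpoints so that the login callback handler is allowed to contact them.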
### Configure JavaScript clients Configure your Node.js client with UAMI-specific properties using the confluent-kafka-javascript client: ```javascript const { Kafka } = require('@confluentinc/kafka-javascript').KafkaJS; const bootstrapServers = ''; // Azure IMDS API version - use 2025-04-07 or later const azureIMDSApiVersion = '2025-04-07'; const bootstrapEndpoint = ''; const uamiClientId = ''; const azureIMDSQueryParams = `api-version=${azureIMDSApiVersion}&resource=${bootstrapEndpoint}&client_id=${uamiClientId}`; const logicalCluster = ''; const identityPoolId = ''; const kafka = new Kafka({ 'bootstrap.servers': bootstrapServers, 'security.protocol': 'SASL_SSL', 'sasl.mechanisms': 'OAUTHBEARER', 'sasl.oauthbearer.method': 'oidc', 'sasl.oauthbearer.metadata.authentication.type': 'azure_imds', 'sasl.oauthbearer.config': `query=${azureIMDSQueryParams}`, 'sasl.oauthbearer.extensions': `logicalCluster=${logicalCluster},identityPoolId=${identityPoolId}` }); const producer = kafka.producer(); await producer.connect(); ``` ## Related content - [Use Centralized ACLs with MDS for Authorization in Confluent Platform](../rbac/authorization-acl-with-mds.md#authorization-acl-with-mds) - [Schema Registry ACL Authorizer for Confluent Platform](../../../confluent-security-plugins/schema-registry/authorization/sracl_authorizer.md#confluentsecurityplugins-sracl-authorizer) - [Confluent Replicator to Confluent Cloud ACL Configurations](/cloud/current/get-started/examples/ccloud/docs/replicator-to-cloud-configuration-types.html) - [Configure Authorization of ksqlDB with Kafka ACLs](../../../ksqldb/operate-and-deploy/installation/security.md#ksqldb-installation-security-auth-with-acls) - [Required ACL setting for secure Kafka clusters](../../../streams/developer-guide/security.md#streams-developer-guide-security-acls) - [Cluster Linking Authorization (ACLs)](../../../multi-dc-deployments/cluster-linking/security.md#cluster-link-acls) - [Configure Control Center to work with Kafka ACLs on Confluent Platform](/control-center/current/security/config-c3-for-kafka-acls.html) - [Confluent CLI confluent iam acl](https://docs.confluent.io/confluent-cli/current/command-reference/iam/acl/index.html#confluent-iam-acl) - [Access Control Lists (ACLs) for Confluent Cloud](/cloud/current/access-management/acl.html) ### Authentication and group-based authorization using an LDAP server A Kerberos-enabled LDAP server (for example, Active Directory or Apache Directory Server) may be used for authentication as well as group-based authorization if users and groups are managed by this server. The instructions below use `SASL/GSSAPI` for authentication using AD or DS and obtain group membership of the users from the same server. The example is based on the assumption that you have the following three user principals and keytabs for these principals: * `kafka/localhost@EXAMPLE.COM`: Service principal for brokers * `alice@EXAMPLE.COM`: Client principal, member of group `Kafka Developers` * `ldap@EXAMPLE.COM` : Principal used by LDAP Authorizer Note that the user principal used for authorization is the local name (for example, `kafka`, `alice`) by default and these short principals are used to determine group membership. Brokers may be configured with custom `principal.builder.class` or `sasl.kerberos.principal.to.local.rules` to override this behavior. The attributes used for mapping users to groups may also be customized to match your LDAP server. 
If you have already started the broker using `SASL/SCRAM-SHA-256` following the instructions above, stop the server first. The instructions below are based on the assumption that you have already updated configuration for brokers, producers, and consumers as described earlier. Configure listeners to use `GSSAPI` by updating the following properties in your broker configuration file (for example, `etc/kafka/server.properties`). ```bash sasl.enabled.mechanisms=GSSAPI sasl.mechanism.inter.broker.protocol=GSSAPI sasl.kerberos.service.name=kafka listener.name.sasl_plaintext.gssapi.sasl.jaas.config= \ com.sun.security.auth.module.Krb5LoginModule required \ keyTab="/tmp/keytabs/kafka.keytab" \ principal="kafka/localhost@EXAMPLE.COM" \ debug="true" \ storeKey="true" \ useKeyTab="true"; ``` Add or update the following properties in your producer and consumer configuration files (for example, `etc/kafka/producer.properties` and `etc/kafka/consumer.properties`). ```bash sasl.mechanism=GSSAPI sasl.kerberos.service.name=kafka sasl.jaas.config= com.sun.security.auth.module.Krb5LoginModule required \ keyTab="/tmp/keytabs/alice.keytab" \ principal="alice@EXAMPLE.COM" \ debug="true" \ storeKey="true" \ useKeyTab="true"; ``` Restart the broker, and run the producer and consumer as described earlier. Producers and consumers are now authenticated using your Kerberos server. Group information is also obtained from the same server using LDAP. ## RBAC benefits RBAC helps you: * Manage security access across Confluent Platform, including Kafka, ksqlDB, Connect, Schema Registry, Confluent Control Center, and Confluent Platform for Apache Flink®, by using granular permissions to control user and group access. For example, with RBAC you can specify permissions for each connector or Flink job in a cluster, making it easier to get multiple workloads up and running. * Manage authorization at scale. Administrators can centrally manage the assignment of predefined roles, and also delegate the responsibility of managing access and permissions to the different departments or business units who are the true owners and most familiar with those resources. * Centrally manage authentication and authorization for multiple clusters, which includes MDS, Kafka clusters, Connect, ksqlDB, Schema Registry clusters, Confluent Platform for Apache Flink applications, and a single Confluent Control Center. ## Connect To configure [Connect RBAC](../../../connect/rbac/connect-rbac-getting-started.md#connect-rbac-getting-started) role bindings using the REST API: 1. Get the MDS token: ```none curl --cacert --key --cert -u : -s https://:8090/security/1.0/authenticate ``` 2. Grant the Security Admin role to a Connect cluster: ```none curl --cacert --key --cert -X POST https://:8090/security/1.0/principals/User:/roles/SecurityAdmin -H "accept: application/json" -H "Authorization: Bearer " -H "Content-Type: application/json" -d '{"clusters":{"kafka-cluster":"","connect-cluster":""}}' ``` 3. Grant the Connect user the ResourceOwner role on the group that Connect nodes use to coordinate across the cluster: ```none curl --cacert --key --cert -X POST https://:8090/security/1.0/principals/User:/roles/ResourceOwner/bindings -H "accept: application/json" -H "Authorization: Bearer " -H "Content-Type: application/json" -d '{"scope":{"clusters":{"kafka-cluster":""}},"resourcePatterns":[{"resourceType":"Group","name":"connect-cluster","patternType":"LITERAL"}]}' ``` 4.
Grant the ResourceOwner role on the configuration storage topic: ```none curl --cacert --key --cert -X POST https://:8090/security/1.0/principals/User:/roles/ResourceOwner/bindings -H "accept: application/json" -H "Authorization: Bearer " -H "Content-Type: application/json" -d '{"scope":{"clusters":{"kafka-cluster":""}},"resourcePatterns":[{"resourceType":"Topic","name":"connect-configs","patternType":"LITERAL"}]}' ``` 5. Grant the ResourceOwner role on the offset storage topic: ```none curl --cacert --key --cert -X POST https://:8090/security/1.0/principals/User:/roles/ResourceOwner/bindings -H "accept: application/json" -H "Authorization: Bearer " -H "Content-Type: application/json" -d '{"scope":{"clusters":{"kafka-cluster":""}},"resourcePatterns":[{"resourceType":"Topic","name":"connect-offsets","patternType":"LITERAL"}]}' ``` 6. Grant the ResourceOwner role on the status storage topic: ```none curl --cacert --key --cert -X POST https://:8090/security/1.0/principals/User:/roles/ResourceOwner/bindings -H "accept: application/json" -H "Authorization: Bearer " -H "Content-Type: application/json" -d '{"scope":{"clusters":{"kafka-cluster":""}},"resourcePatterns":[{"resourceType":"Topic","name":"connect-status","patternType":"LITERAL"}]}' ``` ### Verify the audit log retention setting These procedures only affect your retention policy. It is recommended that you make minor changes only. 1. Use the Confluent CLI to modify the audit log configuration and update the `retention_ms` of one or more destination topics: ```bash # Capture the current configuration from MDS confluent audit-log config describe > /tmp/audit-log-config.json # View what was captured cat /tmp/audit-log-config.json { "destinations": { "bootstrap_servers": [ "logs1.example.com:9092", "logs2.example.com:9092" ], "topics": { "confluent-audit-log-events": { "retention_ms": 7776000000 } } }, "default_topics": { "allowed": "confluent-audit-log-events", "denied": "confluent-audit-log-events" } } # Make a small change vim /tmp/audit-log-config.json # e.g. - change 7776000000 to 7776000001 # Post the change back to MDS confluent audit-log config update < /tmp/audit-log-config.json ``` 2. Verify that the topic’s `retention.ms` setting reflects the new value on the destination cluster: ```bash cat /tmp/destination-cluster-admin-client.properties bootstrap.servers= security.protocol=SASL_SSL sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \ username="" \ password=""; ssl.endpoint.identification.algorithm=https sasl.mechanism=PLAIN ssl.truststore.location= ssl.truststore.password= kafka-topics --bootstrap-server \ --command-config /tmp/destination-cluster-admin-client.properties \ --describe --topic confluent-audit-log-events Topic: confluent-audit-log-events PartitionCount: 6 ReplicationFactor: 3 Configs: min.insync.replicas=2,cleanup.policy=delete,retention.ms=7776000001 Topic: confluent-audit-log-events Partition: 0 Leader: 2 Replicas: 2,1,0 Isr: 2,0,1 Topic: confluent-audit-log-events Partition: 1 Leader: 1 Replicas: 1,0,2 Isr: 2,0,1 Topic: confluent-audit-log-events Partition: 2 Leader: 0 Replicas: 0,2,1 Isr: 2,0,1 Topic: confluent-audit-log-events Partition: 3 Leader: 2 Replicas: 2,0,1 Isr: 2,0,1 Topic: confluent-audit-log-events Partition: 4 Leader: 1 Replicas: 1,2,0 Isr: 2,0,1 Topic: confluent-audit-log-events Partition: 5 Leader: 0 Replicas: 0,1,2 Isr: 2,0,1 ``` 3.
Alter the `retention.ms` value of one of the destination topics directly on the destination cluster: ```bash kafka-topics --bootstrap-server :9092 \ --command-config /tmp/destination-cluster-admin-client.properties \ --alter --topic confluent-audit-log-events \ --config retention.ms=7776000002 ``` 4. Verify that the audit log configuration shows the new `retention_ms` setting: ```bash confluent audit-log config describe { "destinations": { "bootstrap_servers": [ "logs1.example.com:9092", "logs2.example.com:9092" ], "topics": { "confluent-audit-log-events": { "retention_ms": 7776000002 } } }, "default_topics": { "allowed": "confluent-audit-log-events", "denied": "confluent-audit-log-events" } } ``` If this troubleshooting procedure doesn’t work (for example, if audit logging is not configured properly, you will get an error when you attempt to run the `describe` command), check to ensure that the connection and credentials in your MDS broker properties (the properties prefixed by `confluent.security.event.logger.destination.admin.`) are working. Also verify that you’ve granted sufficient permissions to the admin client principal on the destination cluster. Note that the minimum role binding should grant the ResourceOwner role on topics with the prefix `confluent-audit-log-events` on the destination cluster. Finally, confirm that the destination cluster is reachable and listening for connections from the MDS cluster’s network address. ### Verify the audit log configuration is synchronized to registered clusters Use the following command to verify that the audit log configuration is synchronized to registered clusters: ```none kafka-configs --bootstrap-server :9092 \ --command-config /tmp/managed-cluster-admin-client.properties \ --entity-type brokers \ --entity-default \ --describe \ | grep confluent.security.event.router.config ``` You should see the same JSON audit log configuration you get when you run `confluent audit-log config describe`. It is possible that `retention_ms` values may differ if the audit topics have been altered directly on the destination cluster, in which case the metadata in the JSON may also be different. Everything else should be the same. If this verification fails, check the following for the MDS cluster registry: - The clusters expose an auth token listener (`listener.name..sasl.enabled.mechanisms=OAUTHBEARER`) - The clusters’ TLS keys are verifiable by certificates in the MDS server’s trust store. Also look for error status messages when making an audit log API update request to MDS. ## Identity and access management Controlling who can access your Confluent cluster and what they can do is foundational to security. Confluent Platform offers several built-in features to help you enforce this: - The role-based access control (RBAC) system lets you assign roles like “ClusterAdmin” or “DeveloperRead” to users and service accounts. You can scope permissions to individual clusters, topics, or consumer groups. - For environments not using RBAC, you can use Apache Kafka® Access Control Lists (ACLs) to control producer and consumer access at the topic or group level. ACLs also provide compatibility with existing Kafka security setups. - Supported OAuth 2.0 integrations with identity providers like Okta, Keycloak, and Entra ID allow centralized user management as well as single sign-on (SSO) for Confluent Cloud. - Confluent supports TLS for secure communication and can enforce mutual authentication (mTLS) between clients and brokers.
By issuing client certificates, you can authenticate both ends of every connection. ### Step 2 - Start the producer To start the producer, run the `kafka-avro-console-producer` command for the KMS provider that you want to use, where `` is the bootstrap URL for your Confluent Platform cluster and `` is the URL for your Schema Registry instance. ```shell ./bin/kafka-avro-console-producer --bootstrap-server \ --property schema.registry.url= \ --topic test \ --producer.config config.properties \ --property basic.auth.credentials.source=USER_INFO \ --property basic.auth.user.info=${SR_API_KEY}:${SR_API_SECRET} \ --property value.schema='{"type":"record","name":"myrecord","fields":[{"name":"f1","type":"string","confluent:tags":["PII"]}]}' \ --property value.rule.set='{ "domainRules": [ { "name": "encryptPII", "type": "ENCRYPT", "tags":["PII"], "params": { "encrypt.kek.name": "aws-kek1", "encrypt.kms.key.id": "arn:aws:kms:us-east-1:xxxx:key/xxxx", "encrypt.kms.type": "aws-kms" }, "onFailure": "ERROR,NONE"}]}' ``` ### Overview All components and clients in `cp-demo` make full use of Confluent Platform’s extensive [security features](../../security/overview.md#security). - [Role-Based Access Control (RBAC)](../../security/authorization/rbac/overview.md#rbac-overview) for authorization. Give principals access to resources using role-bindings. #### NOTE RBAC is powered by the [Metadata Service (MDS)](../../kafka/configure-mds/index.md#rbac-mds-config) which uses Confluent Server Authorizer to connect to an OpenLDAP directory service. This enables group-based authorization for scalable access management. - [SSL](../../security/authentication/mutual-tls/overview.md#kafka-ssl-authentication) for encryption and mTLS for authentication. The example [automatically generates](https://github.com/confluentinc/cp-demo/tree/latest/scripts/security/certs-create.sh) SSL certificates and creates keystores, truststores, and secures them with a password. - [HTTPS for Control Center](https://docs.confluent.io/platform/current/control-center/installation/configuration.html#https-settings). - [HTTPS for Schema Registry](../../schema-registry/security/index.md#schemaregistry-security). - [HTTPS for Connect](../../connect/security.md#connect-security). You can see each component’s security configuration in the example’s [docker-compose.yml](https://github.com/confluentinc/cp-demo/tree/latest/docker-compose.yml) file. ### Embedded REST Proxy For the next few steps, use the REST Proxy that is embedded on the Kafka brokers. Only [REST Proxy API v3](../../kafka-rest/api.md#rest-proxy-v3) is supported. 1. Create a role binding for the client to be granted `ResourceOwner` role for the topic `dev_users`. Get the Kafka cluster ID: ```none KAFKA_CLUSTER_ID=$(curl -s https://localhost:8091/v1/metadata/id --tlsv1.2 --cacert scripts/security/snakeoil-ca-1.crt | jq -r ".id") ``` Create the role binding: ```text # Create the role binding for the topic ``dev_users`` docker compose exec tools bash -c "confluent iam rbac role-binding create \ --principal User:appSA \ --role ResourceOwner \ --resource Topic:dev_users \ --kafka-cluster-id $KAFKA_CLUSTER_ID" ``` 2. Create the topic `dev_users` with embedded REST Proxy. 
Get the Kafka cluster ID: ```none KAFKA_CLUSTER_ID=$(curl -s https://localhost:8091/v1/metadata/id --tlsv1.2 --cacert scripts/security/snakeoil-ca-1.crt | jq -r ".id") ``` Use `curl` to create the topic: ```text docker exec restproxy curl -s -X POST \ -H "Content-Type: application/json" \ -H "accept: application/json" \ -d "{\"topic_name\":\"dev_users\",\"partitions_count\":64,\"replication_factor\":2,\"configs\":[{\"name\":\"cleanup.policy\",\"value\":\"compact\"},{\"name\":\"compression.type\",\"value\":\"gzip\"}]}" \ --cert /etc/kafka/secrets/mds.certificate.pem \ --key /etc/kafka/secrets/mds.key \ --tlsv1.2 \ --cacert /etc/kafka/secrets/snakeoil-ca-1.crt \ -u appSA:appSA \ "https://kafka1:8091/kafka/v3/clusters/$KAFKA_CLUSTER_ID/topics" | jq ``` 3. List topics with embedded REST Proxy to find the newly created `dev_users`. Get the Kafka cluster ID: ```none KAFKA_CLUSTER_ID=$(curl -s https://localhost:8091/v1/metadata/id --tlsv1.2 --cacert scripts/security/snakeoil-ca-1.crt | jq -r ".id") ``` Use `curl` to list the topics: ```text docker exec restproxy curl -s -X GET \ -H "Content-Type: application/json" \ -H "accept: application/json" \ --cert /etc/kafka/secrets/mds.certificate.pem \ --key /etc/kafka/secrets/mds.key \ --tlsv1.2 \ --cacert /etc/kafka/secrets/snakeoil-ca-1.crt \ -u appSA:appSA \ https://kafka1:8091/kafka/v3/clusters/$KAFKA_CLUSTER_ID/topics | jq '.data[].topic_name' ``` Your output should resemble the following. Output may vary, depending on other topics you may have created, but at least you should see the topic `dev_users` created in the previous step. ```text "_confluent-monitoring" "dev_users" "users" "wikipedia-activity-monitor-KSTREAM-AGGREGATE-STATE-STORE-0000000003-changelog" "wikipedia-activity-monitor-KSTREAM-AGGREGATE-STATE-STORE-0000000003-repartition" "wikipedia.failed" "wikipedia.parsed" "wikipedia.parsed.count-by-domain" "wikipedia.parsed.replica" ``` ## Use case The use case for this application is a Kafka event streaming application that processes real-time edits to real Wikipedia pages. The following image shows the application topology: ![image](tutorials/cp-demo/images/cp-demo-overview-with-ccloud.svg) The full event streaming platform based on Confluent Platform is described as follows: 1. Wikimedia’s [EventStreams](https://wikitech.wikimedia.org/wiki/Event_Platform/EventStreams) publishes a continuous stream of real-time edits happening to real wiki pages. 2. A Kafka source connector [kafka-connect-sse](https://www.confluent.io/hub/cjmatta/kafka-connect-sse) streams the server-sent events (SSE) from [https://stream.wikimedia.org/v2/stream/recentchange](https://stream.wikimedia.org/v2/stream/recentchange), and a custom Connect transform [kafka-connect-json-schema](https://www.confluent.io/hub/jcustenborder/kafka-connect-json-schema) extracts the JSON from these messages, which are then written to a Kafka cluster. 3. Data processing is done with [ksqlDB](../../ksqldb/overview.md#ksql-home) and a [Kafka Streams](../../streams/overview.md#kafka-streams) application (a simplified sketch of such a Streams topology follows below). 4. A Kafka sink connector [kafka-connect-elasticsearch](https://www.confluent.io/hub/confluentinc/kafka-connect-elasticsearch) streams the data out of Kafka, where it is materialized in [Elasticsearch](https://www.elastic.co/products/elasticsearch) for analysis by [Kibana](https://www.elastic.co/products/kibana). All data is in Avro format and uses Confluent Schema Registry, and [Confluent Control Center](https://www.confluent.io/product/control-center/) manages and monitors the deployment.
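As an illustration of step 3 above, the following is a simplified Kafka Streams sketch of an activity monitor that counts edits per domain. It is not the actual cp-demo application: the real demo consumes Avro records and is configured with the security settings shown earlier, whereas this sketch assumes String-serialized values and a hypothetical `extractDomain` helper. Only the `wikipedia.parsed` and `wikipedia.parsed.count-by-domain` topic names come from the demo.

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

public class WikipediaActivityMonitorSketch {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "wikipedia-activity-monitor");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // plus the security settings used by the demo
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        // Read parsed edit events, re-key each record by its wiki domain, and count edits per domain.
        KStream<String, String> edits = builder.stream("wikipedia.parsed");
        KTable<String, Long> countsByDomain = edits
            .groupBy((key, value) -> extractDomain(value))
            .count();

        countsByDomain.toStream()
            .to("wikipedia.parsed.count-by-domain", Produced.with(Serdes.String(), Serdes.Long()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }

    // Hypothetical helper: the real demo records are Avro and carry the domain as a field.
    private static String extractDomain(String value) {
        return value; // placeholder for parsing the domain out of the edit event
    }
}
```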
#### Requirements * [An OIDC-compliant identity provider (IdP)](https://docs.confluent.io/platform/current/security/authentication/sso-for-c3/configure-sso-using-oidc.html#step-1-establish-a-trust-relationship-between-cp-and-identity-provider). * Port 8090 must be opened on the Kafka brokers and accessible by all hosts. * Set up one principal in OIDC for the MDS admin user to bootstrap roles and permissions for the Confluent Platform component principals. It is recommended that you create a user named `superuser`. * Set up one principal per Confluent Platform component in your OIDC server. These users are used by the Confluent Platform components to authenticate to MDS and access their respective resources. In the examples below, the following component users are used: * Confluent Server: `kafka_broker` * Schema Registry: `schema_registry` * Connect: `kafka_connect` * ksqlDB: `ksql` * REST Proxy: `kafka_rest` * Confluent Server REST API: `kafka_broker` * Control Center: `control_center` * Set up Confluent Platform with [OAuth/OIDC authentication](ansible-authenticate.md#ansible-oauth). ## Metrics API The [Confluent Cloud Metrics](../monitoring/metrics-api.md#metrics-api) API provides programmatic access to actionable metrics for your Confluent Cloud deployment, including server-side metrics for the Confluent-managed services. However, the Metrics API does not allow you to get client-side metrics. To retrieve client-side metrics, see [Producers](#ccloud-monitoring-producers) and [Consumers](#ccloud-monitoring-consumers). The Metrics API, enabled by default, aggregates metrics at the topic and cluster level. Any authorized user can gain access to the metrics that allow you to monitor overall usage and performance. To get started with the Metrics API, see the [Confluent Cloud Metrics](../monitoring/metrics-api.md#metrics-api) documentation. You can use the Metrics API to query metrics at the following granularities (other resolutions are available if needed): - Bytes produced per minute grouped by topic - Bytes consumed per minute grouped by topic - Max retained bytes per hour over two hours for a given topic - Max retained bytes per hour over two hours for a given cluster You can retrieve the metrics easily over the internet using HTTPS, capturing them at regular intervals to get a time series and an operational view of cluster performance. You can integrate the metrics into cloud provider monitoring tools like [Azure Monitor](https://azure.microsoft.com/en-us/services/monitor/#product-overview), [Google Cloud’s operations suite](https://cloud.google.com/products/operations) (formerly Stackdriver), or [Amazon CloudWatch](https://aws.amazon.com/cloudwatch/), or into existing monitoring systems like [Prometheus](https://prometheus.io/) and [Datadog](https://www.datadoghq.com/), and then plot them in a time series graph to see usage over time. When writing your own application to use the Metrics API, see the [full API specification](https://api.telemetry.confluent.cloud/docs) to use advanced features. ## Schema Management and Evolution There is an implicit contract that Kafka producers write data with a schema that can be read by Kafka consumers, even as producers and consumers evolve their schemas. Kafka applications depend on these schemas and expect that any changes made to schemas are compatible, so that the applications can continue to run. This is where Confluent Schema Registry helps: It provides centralized schema management and compatibility checks as schemas evolve.
If your application is using Schema Registry, you can simulate a Schema Registry instance in your unit testing. Use Confluent Schema Registry’s [MockSchemaRegistryClient](https://github.com/confluentinc/schema-registry/blob/master/client/src/main/java/io/confluent/kafka/schemaregistry/client/SchemaRegistryClient.java) to register and retrieve schemas that enable you to serialize and deserialize data. For a `MockSchemaRegistryClient` example, see this [Kafka Tutorial](https://developer.confluent.io/tutorials/create-stateful-aggregation-minmax/kstreams.html). As you start building examples of streaming applications and tests using the Kafka Streams API along with Schema Registry, for integration testing, use any of the tools described in [Integration Testing](#ccloud-testing-integration). After your applications are running in production, schemas may evolve but still need to be compatible for all applications that rely on both old and new versions of a schema. Confluent Schema Registry allows for [schema evolution and provides compatibility checks](/cloud/current/sr/fundamentals/schema-evolution.html) to ensure that the contract between producers and consumers is not broken. This allows producers and consumers to update independently as well as evolve their schemas independently, with assurances that they can read new and legacy data. Confluent provides a [Schema Registry Maven plugin](/cloud/current/sr/develop/maven-plugin.html), which checks the compatibility of a new schema against previously registered schemas. For an example of this plugin, see the [Java example client pom.xml](https://github.com/confluentinc/examples/blob/latest/clients/cloud/java/pom.xml). # Manage Kafka Cluster Configuration Settings in Confluent Cloud This topic describes the default Apache Kafka® cluster configuration settings in Confluent Cloud. For a complete description of all Kafka configurations, see [Confluent Platform Configuration Reference](/platform/current/installation/configuration/index.html). Considerations: - You cannot edit cluster settings on Basic, Standard, Enterprise, and Freight clusters in Confluent Cloud, but many configuration settings are available at the topic level instead (see the example after the comparison table below). For more information, see [Manage Topics in Confluent Cloud](../topics/overview.md#cloud-topics-manage). - You can change some configuration settings on Dedicated clusters using the Confluent CLI or REST API. See [Change cluster settings for Dedicated clusters](#custom-settings-dedicated). - The default maximum timeout for registered consumers is different for Confluent Cloud Kafka clusters than for Confluent Platform clusters and cannot be changed. The `group.max.session.timeout.ms` default is 1200000 ms (20 minutes). ## Cluster limit comparison Use the table below to compare cluster limits across cluster types. For Enterprise clusters, the following table shows the current maximum (10 eCKU). If you’re participating in the 32 eCKU Limited Availability for Enterprise clusters, your cluster limits are higher.
| Dimension | [Basic](#basic-cluster) | [Standard](#standard-cluster) | [Enterprise](#enterprise-cluster) | [Dedicated](#dedicated-cluster) | [Freight](#freight-cluster) | |---------------------------------------------------------------------------|---------------------------|---------------------------------|-----------------------------------------------------------------------|-----------------------------------|-------------------------------| | [Maximum eCKU/CKU](#min-max-ecku) | 50 | 10 | 10 (current maximum)/ 32 (Limited Availability) | 152 | 152 | | Ingress (MBps) \* | 250 | 250 | 600 | 9,120 | 9,120 | | Egress (MBps) \* | 750 | 750 | 1800 | 27,360 | 27,360 | | Partitions (pre-replication) \* | 1500 | 2500 | 30,000 | 100,000 | 50,000 | | Number of partitions you can compact \* | 1500 | 2500 | 3,600 | 100,000 | None | | Total client connections \* | 1000 | 10,000 | 180,000 | 2,736,000 | 2,736,000 | | Connection attempts (per second) \* | 80 | 800 | 5,000 | 76,000 | 76,000 | | Requests (per second) \* | 15,000 | 15,000 | 75,000 | 2,280,000 | 2,280,000 | | Message size (MB) | 8 | 8 | 20 | 20 | 20 | | Client version (minimum) | 0.11.0 | 0.11.0 | 0.11.0 | 0.11.0 | 0.11.0 | | Request size (MB) | 100 | 100 | 100 | 100 | 100 | | Fetch bytes (MB) | 55 | 55 | 55 | 55 | 55 | | API keys | 50 | 100 | 500 | 2,000 | 500 | | Partition creation and deletion (per five minute period) | 250 | 500 | 500 | 5,000 | 500 | | Connector tasks per Kafka cluster | 250 \*† | 250 | 250 | 250 | N/A | | ACLs | 1,000 | 1,000 | 4,000 | 10,000 | 10,000 | | Kafka REST Produce v3 - Max throughput (MBps): | 10 | 10 | 10 | 7,600 | 10 | | Kafka REST Produce v3 - Max connection requests (per second): | 25 | 25 | 25 | 45,600 | 25 | | Kafka REST Produce v3 - Max streamed requests (per second): | 1000 | 1000 | 1000 | 456,000 | 1000 | | Kafka REST Produce v3 - Max message size for Kafka REST Produce API (MB): | 8 | 8 | 8 | 20 | 8 | | Kafka REST Admin v3 - Max connection requests (per second): | 25 | 25 | 25 | 45,600 | 25 | \* Limit based on Elastic Confluent Unit for Kafka (eCKU). You only pay for the capacity you use up to the limit. For more information, see [Elastic Confluent Unit for Kafka](../billing/overview.md#e-cku-definition). † Limit based on a Dedicated Kafka cluster with 152 CKU. For more information, see [CKU purchase limits](#cku-limits-per-cluster) and [Confluent Unit for Kafka](../billing/overview.md#cku-definition). \*† Basic clusters are limited to one task per connector. You can deploy 250 connectors to a Basic cluster but each connector can only have one task. If you need more than one task, upgrade your cluster. The capabilities provided in this topic are for planning purposes, and are not a guarantee of performance, which varies depending on each unique configuration. 
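As noted in the considerations above, even on cluster types where cluster-level settings cannot be edited, you can still inspect a topic’s effective topic-level settings with a standard Kafka `AdminClient`. The following is an illustrative sketch only, not taken from the Confluent documentation: the bootstrap endpoint, the API key and secret placeholders, and the `orders` topic name are assumptions you would replace with your own values.

```java
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.common.config.ConfigResource;

public class DescribeTopicConfigsSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Placeholder Confluent Cloud connection settings; a cluster API key and secret are used as SASL/PLAIN credentials.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "pkc-xxxxx.us-east-1.aws.confluent.cloud:9092");
        props.put("security.protocol", "SASL_SSL");
        props.put("sasl.mechanism", "PLAIN");
        props.put("sasl.jaas.config",
            "org.apache.kafka.common.security.plain.PlainLoginModule required "
                + "username=\"<api-key>\" password=\"<api-secret>\";");

        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "orders");
            Config config = admin.describeConfigs(Collections.singleton(topic)).all().get().get(topic);
            // Print effective topic-level settings such as retention.ms, cleanup.policy, and max.message.bytes.
            config.entries().forEach(entry -> System.out.println(entry.name() + " = " + entry.value()));
        }
    }
}
```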
### Resources you can manage in code - [API keys](https://www.pulumi.com/registry/packages/confluentcloud/api-docs/apikey) - [Connectors](https://www.pulumi.com/registry/packages/confluentcloud/api-docs/connector) - [Confluent Cloud Environments](https://www.pulumi.com/registry/packages/confluentcloud/api-docs/environment/) - [Kafka ACLs](https://www.pulumi.com/registry/packages/confluentcloud/api-docs/kafkaacl/) - [Kafka clusters](https://www.pulumi.com/registry/packages/confluentcloud/api-docs/kafkacluster/) - [Kafka topics](https://www.pulumi.com/registry/packages/confluentcloud/api-docs/kafkatopic/) - [Networks](https://www.pulumi.com/registry/packages/confluentcloud/api-docs/network/) - [Peering networks](https://www.pulumi.com/registry/packages/confluentcloud/api-docs/peering/) - [Private Link Access](https://www.pulumi.com/registry/packages/confluentcloud/api-docs/privatelinkaccess/) - [Role bindings](https://www.pulumi.com/registry/packages/confluentcloud/api-docs/rolebinding/) - [Service accounts](https://www.pulumi.com/registry/packages/confluentcloud/api-docs/serviceaccount/) [Get started with Pulumi](https://www.pulumi.com/docs/get-started/) and install the [Confluent Cloud provider for Pulumi](https://www.pulumi.com/registry/packages/confluentcloud/). ### Configuration 1. Add the following details: - Select the output record value format (data going to the Kafka topic): AVRO, JSON, or JSON_SR (JSON Schema). [Schema Registry](../get-started/schema-registry.md#cloud-sr-config) must be enabled to use a Schema Registry-based format (for example, Avro or JSON Schema). For additional information, see [Schema Registry Enabled Environments](limits.md#connect-ccloud-environment-limits). - **Amazon CloudWatch Logs Endpoint URL**: The URL to use as the endpoint for connecting to Amazon CloudWatch for Logs. For example, `https://logs.us-east-1.amazonaws.com`. - **Amazon CloudWatch Logs Group Name**: The name of the log group on Amazon CloudWatch under which the desired log streams are contained. ### **Show advanced configurations** - **Schema context**: Select a schema context to use for this connector, if using a schema-based data format. This property defaults to the **Default** context, which configures the connector to use the default schema set up for Schema Registry in your Confluent Cloud environment. A schema context allows you to use separate schemas (like schema sub-registries) tied to topics in different Kafka clusters that share the same Schema Registry environment. For example, if you select a non-default context, a **Source** connector registers schemas only in that schema context, and a **Sink** connector reads only from that schema context. For more information about setting up a schema context, see [What are schema contexts and when should you use them?](../sr/faqs-cc.md#faq-schema-contexts). - **CloudWatch Log Stream Name(s)**: List of the log streams on Amazon CloudWatch where you want to track log records. If the field is left empty, all log streams under the log group are tracked. - **AWS Poll Interval in Milliseconds**: Time in milliseconds (ms) the connector waits between polling the endpoint for updates. The default value is `1000` ms (1 second). **Auto-restart policy** - **Enable Connector Auto-restart**: Control the auto-restart behavior of the connector and its task in the event of user-actionable errors. Defaults to `true`, enabling the connector to automatically restart in case of user-actionable errors.
Set this property to `false` to disable auto-restart for failed connectors. In such cases, you would need to manually restart the connector. **Transforms** - **Single Message Transforms**: To add a new SMT, see [Add transforms](single-message-transforms.md#cc-single-message-transforms-ui). For more information about unsupported SMTs, see [Unsupported transformations](single-message-transforms.md#cc-single-message-transforms-unsupported-transforms). For all property values and definitions, see [Configuration Properties](#cc-amazon-cloudwatch-logs-source-config-properties). 2. Click **Continue**. ## Features The Amazon SQS Source connector provides the following features: * **Topics created automatically**: The connector can automatically create Kafka topics. * **At least once delivery**: The connector guarantees that records are delivered at least once to the Kafka topic. * **Supports multiple tasks**: The connector supports running one or more tasks. More tasks may improve performance. * **Automatic retries**: The connector will retry all requests (that can be retried) when the Amazon SQS service is unavailable. This value defaults to three retries. * **Supported data formats**: The connector supports Avro, JSON Schema (JSON-SR), Protobuf, and JSON (schemaless) output formats. [Schema Registry](../get-started/schema-registry.md#cloud-sr-config) must be enabled to use a Schema Registry-based format (for example, Avro, JSON Schema, or Protobuf). See [Schema Registry Enabled Environments](limits.md#connect-ccloud-environment-limits) for additional information. * **Provider integration support**: The connector supports IAM role-based authorization using Confluent Provider Integration. For more information about provider integration setup, see the [IAM roles authentication](#cc-amazon-sqs-source-setup-connection). For more information and examples to use with the Confluent Cloud API for Connect, see the [Confluent Cloud API for Connect Usage Examples](connect-api-section.md#ccloud-connect-api) section. # Stream Processing with Confluent Cloud for Apache Flink Apache Flink® is a powerful, scalable stream processing framework for running complex, stateful, low-latency streaming applications on large volumes of data. Flink excels at complex, high-performance, mission-critical streaming workloads and is used by many companies for production stream processing applications. Flink is the de facto industry standard for stream processing. Confluent Cloud for Apache Flink provides a cloud-native, serverless service for Flink that enables simple, scalable, and secure stream processing that integrates seamlessly with Apache Kafka®. Your Kafka topics appear automatically as queryable Flink tables, with schemas and metadata attached by Confluent Cloud. Confluent Cloud for Apache Flink supports creating stream-processing applications by using Flink SQL, the [Flink Table API](reference/table-api.md#flink-table-api) (Java and Python), and custom [user-defined functions](concepts/user-defined-functions.md#flink-sql-udfs). To run Flink on-premises with Confluent Platform, see [Confluent Platform for Apache Flink](/platform/current/flink/overview.html). 
- [What is Confluent Cloud for Apache Flink?](#ccloud-flink-overview-what-is-flink) - [Cloud native](#ccloud-flink-overview-cloud-native) - [Complete](#ccloud-flink-overview-complete) - [Everywhere](#ccloud-flink-overview-everywhere) - [Program Flink with SQL, Java, and Python](#ccloud-flink-overview-program-flink) - [Confluent for VS Code](#ccloud-flink-overview-vs-code) # Manage Topics in Confluent Cloud An Apache Kafka® [topic](../_glossary.md#term-topic) is a category or feed that stores messages. [Producers](../_glossary.md#term-producer) send messages and write data to topics, and [consumers](../_glossary.md#term-consumer) read messages from topics. Topics are grouped by cluster within [environments](../security/access-control/hierarchy/cloud-environments.md#cloud-environments). You can apply [schemas](../_glossary.md#term-schema) to topics. This page provides steps to create, edit, and delete Kafka topics in Confluent Cloud using the Cloud Console or the [Confluent CLI](https://docs.confluent.io/confluent-cli/current/command-reference/kafka/topic/index.html). You can also list, create, or delete topics with [REST APIs](https://docs.confluent.io/cloud/current/api.html#tag/Topic-(v3)). If you have more than 1000 topics, Cloud Console may not display metrics for all the topics. For complete monitoring of all topics, use the Metrics API. For more information, see [Confluent Cloud Metrics](../monitoring/metrics-api.md#metrics-api). # Configure a service account The Unified Stream Manager (USM) Agent requires a service account to securely authenticate with Confluent Cloud and collect metadata from your Confluent Platform cluster. The service account must have both the `USMAgent` and `DataSteward` roles. Additionally, it requires the following API keys: * An API key with `Schema Registry` scope. * An API key with `Cloud resource management` scope. Use a separate service account for each logically separate Confluent Platform environment that connects to Confluent Cloud through USM. For example, if you have development and production environments, use a separate service account for each. You have two options for configuration: creating a new service account dedicated to this purpose or using an existing service account. * When you create a new account in the wizard, the `USMAgent` and `DataSteward` roles and the API keys with `Schema Registry` and `Cloud resource management` scopes are assigned automatically with the necessary permissions. * If you choose to use an existing account, you must manually verify that it has the `USMAgent` and `DataSteward` roles assigned. If these roles are not assigned, the registration fails. To add role bindings to a principal, see [Add role bindings to a principal](https://docs.confluent.io/cloud/current/security/access-control/rbac/manage-role-bindings.html#add-role-bindings-to-a-principal). Also, verify that the service account has the required `Schema Registry` and `Cloud resource management` API keys for the USM Agent to use. For details, see [Add an API key](../../security/authenticate/workload-identities/service-accounts/api-keys/manage-api-keys.md#create-api-key). ### Confluent Platform versions - For the compatible Confluent Platform versions for this version of the Confluent CLI, see the [compatibility table](https://docs.confluent.io/platform/current/installation/versions-interoperability.html#confluent-cli). - The Confluent CLI for Confluent Platform requires that you have the [Confluent REST Proxy server for Apache Kafka](/platform/current/kafka-rest/index.html) running.
The Confluent REST Proxy server uses the [APIs](/platform/current/kafka-rest/api.html) to mediate between the Confluent CLI and your clusters. This is not required for the Apache Kafka® tools or “scripts” that come with Kafka and ship with Confluent Platform. These alternatives to the Confluent CLI for Confluent Platform do not require the Confluent REST Proxy service to be running. Therefore, the Confluent Platform tutorials in the documentation sometimes feature Kafka scripts rather than the Confluent CLI commands in order to simplify setup for getting started tasks. For example, the basic Cluster Linking tutorial for Confluent Platform that describes how to [Share data across topics](/platform/current/multi-dc-deployments/cluster-linking/topic-data-sharing.html) uses the Kafka scripts throughout (such as `kafka-cluster-links --list` to [list mirror topics](/platform/current/multi-dc-deployments/cluster-linking/topic-data-sharing.html#list-mirror-topics) rather than [confluent kafka topic list](/platform/current/command-reference/kafka/topic/confluent_kafka_topic_list.html) or [confluent kafka link list](/platform/current/command-reference/kafka/link/confluent_kafka_link_list.html)). In such scenarios, running the Confluent CLI commands would fail if you did not have the REST Proxy server running. (This is also not an issue for the Confluent CLI on Confluent Cloud, which is fully managed and integrates with the [Confluent Cloud APIs](https://docs.confluent.io/cloud/current/api.html) under the hood.) ### Prerequisites - [Confluent Platform](/platform/current/installation/installing_cp/index.html) is installed and services are running by using the [Confluent CLI](https://docs.confluent.io/confluent-cli/current/index.html) commands. - Kafka and Schema Registry are running locally on the default ports. Note that this quick start assumes that you are using the [Confluent CLI](https://docs.confluent.io/confluent-cli/current/index.html) commands; however, [standalone installations](/platform/current/installation/installing_cp/index.html) are also supported. By default, ZooKeeper, Kafka, Schema Registry, Kafka Connect REST API, and Kafka Connect are started with the `confluent local start` command. Note that as of Confluent Platform 7.5, ZooKeeper is deprecated for new deployments. Confluent recommends KRaft mode for new deployments. ## Prerequisites - [Confluent Platform](/platform/current/installation/installing_cp/index.html) is installed and services are running by using the [Confluent CLI](https://docs.confluent.io/confluent-cli/current/index.html) commands. This quick start assumes that you are using the Confluent CLI. By default, ZooKeeper, Kafka, Schema Registry, Kafka Connect REST API, and Kafka Connect are started with the `confluent local start` command. For more information, see [Confluent Platform](/platform/current/installation/installing_cp/index.html). Note that as of Confluent Platform 7.5, ZooKeeper is deprecated for new deployments. Confluent recommends KRaft mode for new deployments. - [Kudu](https://kudu.apache.org/releases/) and [Impala](https://impala.apache.org/downloads.html) are installed and configured properly ([Using Kudu with Impala](https://kudu.apache.org/docs/kudu_impala_integration.html)). For DECIMAL type support, you need at least Kudu 1.7.0 and Impala 3.0. - Verify that the [Impala JDBC driver](https://www.cloudera.com/downloads/connectors/impala/jdbc/2-6-15.html) is available on the Kafka Connect process’s `CLASSPATH`.
- Kafka and Schema Registry are running locally on the default ports. ### Prerequisites - [Confluent Platform](/platform/current/installation/installing_cp/index.html) is installed and services are running by using the [Confluent CLI](https://docs.confluent.io/confluent-cli/current/index.html) commands. This quick start assumes that you are using the Confluent CLI. By default, ZooKeeper, Kafka, Schema Registry, Kafka Connect REST API, and Kafka Connect are started with the `confluent local start` command. For more information, see [Confluent Platform](/platform/current/installation/installing_cp/index.html). Note that as of Confluent Platform 7.5, ZooKeeper is deprecated for new deployments. Confluent recommends KRaft mode for new deployments. - [Kudu](https://kudu.apache.org/releases/) and [Impala](https://impala.apache.org/downloads.html) are installed and configured properly ([Using Kudu with Impala](https://kudu.apache.org/docs/kudu_impala_integration.html)). For DECIMAL type support, you need at least Kudu 1.7.0 and Impala 3.0. - Verify that the [Impala JDBC driver](https://www.cloudera.com/downloads/connectors/impala/jdbc/2-6-15.html) is available on the Kafka Connect process’s `CLASSPATH`. - Kafka and Schema Registry are running locally on the default ports. #### IMPORTANT Starting with Confluent Platform version 8.0, ZooKeeper is no longer part of Confluent Platform. * A multi-region cluster deployment across three Kubernetes regions, where each cluster hosts CFK, Kafka brokers, and ZooKeeper servers: ![image](images/co-mrc-3clusters.png) * A multi-region cluster deployment across three Kubernetes regions, where each cluster hosts CFK and ZooKeeper servers, and two of the clusters host Kafka brokers: ![image](images/co-mrc-2.5clusters.png) You can set up an MRC with the following communication methods among ZooKeeper, Kafka, Connect, and Schema Registry deployed across regions: * Use internal listeners among ZooKeeper, Kafka, Connect, and Schema Registry across regions. You set up DNS resolution that allows each region in the MRC configuration to resolve internal pods in other regions. Internal listeners are used among the MRC components (ZooKeeper, Kafka, Connect, and Schema Registry). * Use external access among ZooKeeper, Kafka, Connect, and Schema Registry across regions. Without the required networking configuration, CFK redirects internal communication among the MRC components (ZooKeeper, Kafka, Connect, and Schema Registry) to use endpoints that can be accessed externally by each region. For MRC, if there are other components that depend on Kafka, you need to configure an external listener for Kafka. If you want to reduce the number of load balancers, you can use an alternative method for external access, such as Ingress. The supported security features work in multi-region cluster deployments. For specific configurations, see [Configure Security for Confluent Platform with Confluent for Kubernetes](co-security-overview.md#co-security-overview). ## Prerequisites - Confluent for VS Code: Follow the steps in [Installation](overview.md#vscode-installation). - A running Kafka cluster. With Confluent for VS Code, you can connect to any Kafka API-compatible cluster and any Confluent Schema Registry API-compatible server. ## Security considerations If you are running Self-Balancing with security configured, you must [configure authentication for REST endpoints on the brokers](configuration-options.md#sbc-rest-endpoint-configs-secure-setup).
Without these configurations in the broker properties files, Control Center will not have access to Self-Balancing in a secure environment. If you are using role-based access control (RBAC), the user interacting with Self-Balancing on Control Center must have the [RBAC role](../../security/authorization/rbac/rbac-predefined-roles.md#rbac-predefined-roles) `SystemAdmin` on the Kafka cluster to be able to add or remove brokers, and to perform other Self-Balancing related tasks. For information on setting up security on Confluent Platform, see the sections on [authentication methods](../../security/authentication/overview.md#authentication-overview), [role-based access control](../../security/authorization/rbac/overview.md#rbac-overview), [RBAC and ACLs](../../security/authorization/rbac/overview.md#rbac-and-acls), and [Enable Security for a KRaft-Based Cluster in Confluent Platform](../../security/security_tutorial.md#security-tutorial). For information about setting up security on Confluent Platform, see [Security Overview](../../security/overview.md#security), [Enable Security for a KRaft-Based Cluster in Confluent Platform](../../security/security_tutorial.md#security-tutorial), and the overviews on [authentication methods](../../security/authentication/overview.md#authentication-overview). Also, the [Scripted Confluent Platform Demo](../../tutorials/cp-demo/index.md#cp-demo) shows various types of security enabled on an example deployment. ### Self-Balancing options do not show up on Control Center When Self-Balancing Clusters are enabled, status and configuration options are available on the Control Center **Cluster Settings** > **Self-balancing** tab. If, instead, this tab displays a message about Confluent Platform version requirements and configuring HTTP servers on the brokers, this indicates that something is missing from your configurations or that you are not running the required version of Confluent Platform. Also, if you are running Self-Balancing with security enabled, you may get an error message such as **Error 504 Gateway Timeout**, which indicates that you also must configure authentication for REST endpoints in your broker files, as described below. **Solution:** Verify that you have the following settings and update your configuration as needed. - In the Kafka broker files, [confluent.balancer.enable](configuration-options.md#sbc-config-enable) must be set to `true` to enable Self-Balancing. - In the Control Center properties file, `confluent.controlcenter.streams.cprest.url` must specify the associated URL for each broker in the cluster as REST endpoints for `controlcenter.cluster`, as described in [Required Configurations for Control Center](configuration-options.md#sbc-configs-c3). - Security is not a requirement for Self-Balancing, but if security is enabled, you must also [configure authentication for REST endpoints on the brokers](configuration-options.md#sbc-rest-endpoint-configs-secure-setup). In this case, you would use `confluent.metadata.server.listeners` (which enables the [Metadata Service](../../kafka/configure-mds/index.md#rbac-mds-config)) instead of `confluent.http.server.listeners` to listen for API requests. To learn more, see [Security considerations](#sbc-security-considerations). ## Architecture Kafka Connect has three major models in its design: * **Connector model**: A connector is defined by specifying a `Connector` class and configuration options to control what data is copied and how to format it.
Each `Connector` instance is responsible for defining and updating a set of `Tasks` that actually copy the data. Kafka Connect manages the `Tasks`; the `Connector` is only responsible for generating the set of `Tasks` and indicating to the framework when they need to be updated. `Source` and `Sink` `Connectors`/`Tasks` are distinguished in the API to ensure the simplest possible API for both. * **Worker model**: A Kafka Connect cluster consists of a set of `Worker` processes that are containers that execute `Connectors` and `Tasks`. `Workers` automatically coordinate with each other to distribute work and provide scalability and fault tolerance. The `Workers` will distribute work among any available processes, but are not responsible for management of the processes; any process management strategy can be used for `Workers` (for example, cluster management tools like YARN or Mesos, configuration management tools like Chef or Puppet, or direct management of process lifecycles). * **Data model**: Connectors copy streams of messages from a partitioned input stream to a partitioned output stream, where at least one of the input or output is *always* Kafka. Each of these streams is an ordered set of messages where each message has an associated offset. The format and semantics of these offsets are defined by the Connector to support integration with a wide variety of systems; however, achieving certain delivery semantics in the face of faults requires that offsets be unique within a stream and that streams can seek to arbitrary offsets. The message contents are represented by `Connectors` in a serialization-agnostic format, and Kafka Connect supports pluggable `Converters` for storing this data in a variety of serialization formats. Schemas are built-in, allowing important metadata about the format of messages to be propagated through complex data pipelines. However, schema-free data can also be used when a schema is simply unavailable. The connector model addresses three key user requirements. First, Kafka Connect performs **broad copying by default** by having users define jobs at the level of `Connectors`, which then break the job into smaller `Tasks`. This two-level scheme strongly encourages connectors to use configurations that copy broad swaths of data, since they should have enough inputs to break the job into smaller tasks. It also provides one point of **parallelism** by requiring `Connectors` to immediately consider how their job can be broken down into subtasks, and select an appropriate granularity to do so. Finally, by specializing source and sink interfaces, Kafka Connect provides an **accessible connector API** that makes it very easy to implement connectors for a variety of systems. The worker model allows Kafka Connect to **scale to the application**. It can run scaled down to a single worker process that also acts as its own coordinator, or in clustered mode where connectors and tasks are dynamically scheduled on workers. However, it assumes very little about the *process management* of the workers, so it can easily run on a variety of cluster managers or using traditional service supervision. This architecture allows scaling up and down, but Kafka Connect’s implementation also adds utilities to support both modes well. The REST interface for managing and monitoring jobs makes it easy to run Kafka Connect as an organization-wide service that runs jobs for many users.
Command line utilities specialized for ad hoc jobs make it easy to get up and running in a development environment, for testing, or in production environments where an agent-based approach is required. The data model addresses the remaining requirements. Many of the benefits come from coupling tightly with Kafka. Kafka serves as a natural buffer for both **streaming and batch** systems, removing much of the burden of managing data and ensuring delivery from connector developers. Additionally, by always requiring Kafka as one of the endpoints, the larger data pipeline can leverage the many tools that integrate well with Kafka. This allows Kafka Connect to **focus only on copying data** because a variety of stream processing tools are available to further process the data, which keeps Kafka Connect simple, both conceptually and in its implementation. This differs greatly from other systems where ETL must occur before hitting a sink. In contrast, Kafka Connect can bookend an ETL process, leaving any transformation to tools specifically designed for that purpose. Finally, Kafka includes partitions in its core abstraction, providing another point of **parallelism**. # Kafka Connectors Self-Managed Connectors for Confluent Platform You can use self-managed Apache Kafka® connectors to move data in and out of Kafka. The self-managed connectors are for use with Confluent Platform. For more information on fully-managed connectors, see [Confluent Cloud](https://docs.confluent.io/cloud/current/connectors/index.html). Popular connectors [![image](connect/images/logo/jdbc.png)](https://docs.confluent.io/kafka-connectors/jdbc/current/) **JDBC Source and Sink** The Kafka Connect JDBC Source connector imports data from any relational database with a JDBC driver into a Kafka topic. The Kafka Connect JDBC Sink connector exports data from Kafka topics to any relational database with a JDBC driver. [![image](connect/images/logo/jms.jpg)](https://docs.confluent.io/kafka-connectors/jms-source/current/overview.html) **JMS Source** The Kafka Connect JMS Source connector is used to move messages from any JMS-compliant broker into Kafka. [![image](connect/images/logo/connect-logo.svg)](https://docs.confluent.io/kafka-connectors/elasticsearch/current/overview.html) **Elasticsearch Service Sink** The Kafka Connect Elasticsearch Service Sink connector moves data from Kafka to Elasticsearch. It writes data from a topic in Kafka to an index in Elasticsearch. [![image](connect/images/logo/s3.png)](https://docs.confluent.io/kafka-connectors/s3-sink/current/overview.html) **Amazon S3 Sink** The Kafka Connect Amazon S3 Sink connector exports data from Kafka topics to S3 objects in either Avro, JSON, or Bytes formats. [![image](connect/images/logo/hdfs.png)](https://docs.confluent.io/kafka-connectors/hdfs/current/overview.html) **HDFS 2 Sink** The Kafka Connect HDFS 2 Sink connector allows you to export data from Apache Kafka topics to HDFS 2.x files in a variety of formats. The connector integrates with Hive to make data immediately available for querying with HiveQL. [![image](connect/images/logo/replicator.png)](https://docs.confluent.io/platform/current/multi-dc-deployments/replicator/) **Replicator** Replicator allows you to easily and reliably replicate topics from one Kafka cluster to another. Managing connectors Supported connectors Confluent supports many self-managed connectors that import and export data from some of the most commonly used data systems. Practically all connectors are available from Confluent Hub.
Preview connectors Confluent introduces preview connectors to gain early feedback from users. Preview connectors are only suitable for evaluation and non-production purposes. Installing connectors Install the connectors by using the Confluent Hub client (recommended) or install them manually by downloading the plugin file. Configuring connectors Connector configurations are key-value mappings. In distributed mode, they are included in the JSON payload sent over the REST API request that creates (or modifies) the connector. Licensing connectors With a Developer License, you can use Confluent Platform commercial connectors on an unlimited basis in Connect clusters that use a single-broker Apache Kafka cluster. A 30-day trial period is available when using a multi-broker cluster. Monitoring connectors You can manage and monitor Connect, connectors, and clients using JMX and the REST interface. Adding connectors or software The Kafka Connect Base image contains Kafka Connect and all of its dependencies. To add new connectors to this image, you need to build a new Docker image that has the new connectors installed. Upgrading a connector plugin Upgrading a connector is similar to upgrading any other Apache Kafka client application. Refer to the documentation for individual connector plugins if you have a need for rolling upgrades. Manually installing Community connectors If a connector is not available on Confluent Hub, you can use the JARs to directly install the connectors into your Apache Kafka installation. Kafka Connect Kafka Connect, an open source component of Apache Kafka, is a framework for connecting Kafka with external systems such as databases, key-value stores, search indexes, and file systems. All connectors ### Configure CSFLE without sharing KEK If you do not want to share your Key Encryption Key (KEK) with Confluent, follow the steps below: - Define the [schema for the topic](https://docs.confluent.io/cloud/current/sr/schemas-manage.html#cloud-schema-create) and add [tags](https://docs.confluent.io/cloud/current/sr/schemas-manage.html#cloud-schema-tagging) to the fields in the schema that you want to encrypt. - Create [encryption keys](https://docs.confluent.io/cloud/current/security/encrypt/csfle/manage-csfle.html#add-encryption-key-csfle) for each KMS. - Add [encryption rules](https://docs.confluent.io/cloud/current/security/encrypt/csfle/manage-csfle.html#add-encryption-rule-csfle) that specify the encryption key you want to use to encrypt the tags. - Grant DeveloperWrite permission for the encryption key and DeveloperRead permission for the Schema Registry API keys. - Add the following parameters in the connector configuration: ### AWS CSFLE Rule Executor For AWS, pass the following configuration parameters: | Parameter | Description | |------------------------------------------------------|--------------------------------| | `rule.executors._default_.param.access.key.id=?` | The AWS access key identifier. | | `rule.executors._default_.param.secret.access.key=?` | The AWS secret access key. | ### Azure CSFLE Rule Executor For Azure, pass the following configuration parameters: | Parameter | Description | |------------------------------------------------|------------------------------| | `rule.executors._default_.param.tenant.id` | The Azure tenant identifier. | | `rule.executors._default_.param.client.id` | The Azure client identifier. | | `rule.executors._default_.param.client.secret` | The Azure client secret. |
### Google Cloud CSFLE Rule Executor For Google Cloud, pass the following configuration parameters: | Parameter | Description | |-------------------------------------------------|--------------------------------------------------------| | `rule.executors._default_.param.account.type` | The Google Cloud account type. | | `rule.executors._default_.param.client.id` | The Google Cloud client identifier. | | `rule.executors._default_.param.client.email` | The Google Cloud client email address. | | `rule.executors._default_.param.private.key.id` | The Google Cloud private key identifier. | | `rule.executors._default_.param.private.key` | The Google Cloud private key. | ### HashiCorp Vault CSFLE Rule Executor For HashiCorp Vault, pass the following configuration parameters: | Parameter | Description | |--------------------------------------------|----------------------------------------------------------| | `rule.executors._default_.param.token.id` | The token identifier for HashiCorp Vault. | | `rule.executors._default_.param.namespace` | The namespace for HashiCorp Vault Enterprise (optional). | For more information, see [CSFLE without sharing access to your Key Encryption Keys (KEKs)](https://docs.confluent.io/cloud/current/security/encrypt/csfle/overview.html#csfle-with-shared-confluent-access-to-kek). ## Prerequisites - You must [download](https://www.confluent.io/download/#confluent-platform) self-managed Confluent Platform for your environment. - If your environment already includes, or will include, Active Directory (LDAP service), it must be configured as well. The configurations on this page are based on Microsoft Active Directory (AD). You must update these configurations to match your LDAP service. Nested LDAP groups are not supported. - Brokers running MDS must be configured with a separate listener for inter-broker communication. If required, you can configure the broker users as `super.users`, but they cannot rely on access to resources using role-based or group-based access. The broker user must be configured as a super user or granted access using [ACLs](../../security/authorization/acls/overview.md#kafka-authorization). - Brokers will accept requests on the inter-broker listener port before the metadata for RBAC authorization has been initialized. However, requests on other ports are only accepted after the required metadata has been initialized, including any available LDAP metadata. Broker initialization only completes after all relevant metadata has been obtained and cached. When starting multiple brokers in an MDS cluster with a replication factor of 3 (default) for a metadata topic, at least three brokers must be started simultaneously to enable initialization to complete on the brokers. Note that there is a timeout/retry limitation for this initialization, which you can specify in `confluent.authorizer.init.timeout.ms`. For details, refer to [Configure Confluent Server Authorizer in Confluent Platform](../../security/csa-introduction.md#confluent-server-authorizer). - REST Proxy services that integrate with AD/LDAP using MDS will use the user login name as the user principal for authorization decisions. By default, this is also the principal used by brokers for users authenticating using SASL/GSSAPI (Kerberos).
If your broker configuration overrides `principal.builder.class` or `sasl.kerberos.principal.to.local.rules` to create a different principal, the user principal used by brokers may be different from the principal used by other Confluent Platform components. In this case, you should configure ACLs and role bindings for your customized principal for broker resources. ## Configuration Options for TLS Encryption between REST Proxy and Apache Kafka Brokers Note that all the TLS configurations (for REST Proxy to Broker communication) are prefixed with `client.`. If you want the configuration to apply just to admins, consumers, or producers, you can replace the prefix with `admin.`, `consumer.`, or `producer.`, respectively. In addition to these configurations, make sure the `bootstrap.servers` configuration is set with SSL://host:port end-points, or you’ll accidentally open a TLS connection to a non-TLS port. Keep in mind that authenticated and encrypted connections to Kafka brokers will only work when Kafka is running with appropriate security configuration. For details, see [Kafka Security](../../../security/overview.md#security). `client.security.protocol` : Protocol used to communicate with brokers. Valid values are: PLAINTEXT, SSL, SASL_PLAINTEXT, SASL_SSL. * Type: string * Default: PLAINTEXT * Importance: high `client.ssl.key.password` : The password of the private key in the key store file. This is optional for client. * Type: password * Default: null * Importance: high `client.ssl.keystore.location` : The location of the key store file. This is optional for client and can be used for two-way client authentication. * Type: string * Default: null * Importance: high `client.ssl.keystore.password` : The store password for the key store file. This is optional for client and only needed if ssl.keystore.location is configured. * Type: password * Default: null * Importance: high `client.ssl.truststore.location` : The location of the trust store file. * Type: string * Default: null * Importance: high `client.ssl.truststore.password` : The password for the trust store file. * Type: string * Default: null * Importance: high `client.ssl.enabled.protocols` : The comma-separated list of protocols enabled for TLS connections. The default value is `TLSv1.2,TLSv1.3` when running with Java 11 or later, `TLSv1.2` otherwise. With the default value for Java 11 (`TLSv1.2,TLSv1.3`), Kafka clients and brokers prefer TLSv1.3 if both support it, and fall back to TLSv1.2 otherwise (assuming both support at least TLSv1.2). * Type: list * Default: `TLSv1.2,TLSv1.3` * Importance: medium `client.ssl.keystore.type` : The file format of the key store file. This is optional for client. * Type: string * Default: JKS * Importance: medium `client.ssl.protocol` : The TLS protocol used to generate the SSLContext. The default is `TLSv1.3` when running with Java 11 or newer, `TLSv1.2` otherwise. This value should be fine for most use cases. Allowed values in recent JVMs are `TLSv1.2` and `TLSv1.3`. `TLS`, `TLSv1.1`, `SSL`, `SSLv2` and `SSLv3` might be supported in older JVMs, but their usage is discouraged due to known security vulnerabilities. With the default value for this configuration and `ssl.enabled.protocols`, clients downgrade to `TLSv1.2` if the server does not support `TLSv1.3`. If this configuration is set to `TLSv1.2`, clients do not use `TLSv1.3`, even if it is one of the values in `ssl.enabled.protocols` and the server only supports `TLSv1.3`.
* Type: string * Default: `TLSv1.3` * Importance: medium `client.ssl.provider` : The name of the security provider used for TLS connections. Default value is the default security provider of the JVM. * Type: string * Default: null * Importance: medium `client.ssl.truststore.type` : The file format of the trust store file. * Type: string * Default: JKS * Importance: medium `client.ssl.cipher.suites` : A list of cipher suites. This is a named combination of authentication, encryption, MAC, and key exchange algorithms used to negotiate the security settings for a network connection using the TLS network protocol. By default, all the available cipher suites are supported. * Type: list * Default: null * Importance: low `client.ssl.endpoint.identification.algorithm` : The endpoint identification algorithm to validate the server hostname using the server certificate. * Type: string * Default: null * Importance: low `client.ssl.keymanager.algorithm` : The algorithm used by key manager factory for TLS connections. Default value is the key manager factory algorithm configured for the Java Virtual Machine. * Type: string * Default: SunX509 * Importance: low `client.ssl.secure.random.implementation` : The SecureRandom PRNG implementation to use for TLS cryptography operations. * Type: string * Default: null * Importance: low `client.ssl.trustmanager.algorithm` : The algorithm used by trust manager factory for TLS connections. Default value is the trust manager factory algorithm configured for the Java Virtual Machine. * Type: string * Default: PKIX * Importance: low ### RBAC REST Proxy workflow Here is a summary of the RBAC REST Proxy security workflow: 1. A user makes a REST API call to REST Proxy using LDAP credentials for HTTP Basic Authentication. 2. REST Proxy authenticates the user with the MDS by acquiring a token for the authenticated user. 3. The generated token is used to impersonate the user request and authenticate between Kafka clients and the Kafka cluster. For Kafka clients, the `SASL_PLAINTEXT`/`SASL_SSL` security protocol is used and the proprietary callback handler passes the token to the Kafka cluster. Similarly, when communicating with Schema Registry, the authentication token is passed to the Schema Registry client using a proprietary implementation of the `BearerAuthCredentialProvider` interface. 4. If the user does not have the requisite role or ACL permission for the requested resource (for example, topic, group, or cluster), then the REST API call fails and returns an error with the HTTP 403 status code. ![image](images/rbac-rest-proxy-security.png) ## Securing interactive deployments Securing the interactive ksqlDB installation involves securing the HTTP endpoints that the ksqlDB server is listening on. As well as accepting connections and requests from clients, a multi-node ksqlDB cluster also requires inter-node communication. You can choose to configure the external client and internal inter-node communication separately or over a single listener: - [Securing single listener setup](#ksqldb-installation-security-securing-single-listener): Ideal for single-node installations, or where the inter-node communication is over the same network interface as client communication (a minimal configuration sketch follows this list). - [Securing dual listener setup](#ksqldb-installation-security-securing-dual-listener): Useful where inter-node communication is over a different network interface or requires different authentication or encryption configuration.
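For reference only, the following is a minimal sketch of a single-listener, TLS-encrypted setup in the ksqlDB server properties file. The keystore and truststore locations and the passwords are placeholders, and you should confirm the exact property names against the ksqlDB security documentation for your version before using them.

```properties
# Single HTTPS listener shared by external clients and inter-node requests
listeners=https://0.0.0.0:8088

# Server key material presented on the listener (placeholder paths and passwords)
ssl.keystore.location=/var/private/ssl/ksql.server.keystore.jks
ssl.keystore.password=changeit
ssl.key.password=changeit

# Trust store used to verify peer certificates
ssl.truststore.location=/var/private/ssl/ksql.server.truststore.jks
ssl.truststore.password=changeit
```

A dual-listener deployment, described next, additionally dedicates a separate listener (typically with mutual TLS) to inter-node traffic.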
## Securing dual-listener setup Using dual listeners for ksqlDB is appropriate when the client and inter-node communication utilize different authentication and security configurations. This is most likely the case when ksqlDB is deployed as an IaaS service. The supported setups are SSL-mutual auth for the internal communication combined with SSL encryption and authentication for the external client: - [Configuring internal for SSL-mutual authentication](#ksqldb-installation-security-configuring-internal-for-ssl-mutual-authentication): Creates secure and authenticated connections for inter-node communication, but leaves the external client API unsecured. This is most appropriate when clients are trusted, but the internal APIs are protected from use. - [Configuring internal for SSL-mutual authentication and external for HTTP-BASIC authentication](#ksqldb-installation-security-configuring-internal-for-ssl-mutual-and-external-for-http-basic): Creates secure and authenticated connections for inter-node communication and uses basic authentication for the external client API. This is most likely to be used with SSL above. - [Configuring internal for SSL-mutual authentication and external for SSL encryption](#ksqldb-installation-security-configuring-internal-for-ssl-mutual-external-for-ssl-encryption): Creates secure and authenticated connections for inter-node communication and uses SSL for the external client API. This is most likely to be used with authentication below. ### Configuration Options These properties are available to specify for the cluster link. If you disable a feature that has filters (ACL sync, consumer offset sync, auto create mirror topics) after having it enabled initially, then any existing filters will be cleared (deleted) from the cluster link. `acl.filters` : JSON string that lists the ACLs to migrate. Define the ACLs in a file, `acl.filters.json`, and pass the file name as an argument to `--acl-filters-json-file`. See [Migrating ACLs from Source to Destination Cluster](security.md#cluster-link-acls-migrate) for examples of how to define the ACLs in the JSON file. * Type: string * Default: “” #### NOTE Populate `acl.filters` by passing a JSON file on the command line that specifies the ACLs as described in [Migrating ACLs from Source to Destination Cluster](security.md#cluster-link-acls-migrate). `acl.sync.enable` : Whether or not to migrate ACLs. To learn more, see [Migrating ACLs from Source to Destination Cluster](security.md#cluster-link-acls-migrate). * Type: boolean * Default: false `acl.sync.ms` : How often to refresh the ACLs, in milliseconds (if ACL migration is enabled). The default is 5000 milliseconds (5 seconds). * Type: int * Default: 5000 `auto.create.mirror.topics.enable` : Whether or not to auto-create mirror topics based on topics on the source cluster. When set to “true”, mirror topics will be auto-created. Setting this option to “false” disables mirror topic creation and clears any existing filters. For details on this option, see [auto-create mirror topics](/platform/current/multi-dc-deployments/cluster-linking/mirror-topics-cp.html#auto-create-mirror-topics). `auto.create.mirror.topics.filters` : A JSON object with one property, `topicFilters`, that contains an array of filters to apply to indicate which topics should be mirrored. For details on this option, see [auto-create mirror topics](/platform/current/multi-dc-deployments/cluster-linking/mirror-topics-cp.html#auto-create-mirror-topics). 
`cluster.link.prefix` : A prefix that is applied to the names of the mirror topics. The same prefix is applied to consumer groups when [consumer.group.prefix.enable](#consumer-group-prefix) is set to `true`. To learn more, see “Prefixing Mirror Topics and Consumer Group Names” in [Mirror Topics](mirror-topics-cp.md#mirror-topics-concepts). #### NOTE The prefix cannot be changed after the cluster link is created. * Type: string * Default: null `cluster.link.paused` : Whether or not the cluster link is paused. The default is false (not paused). * Type: boolean * Default: false `cluster.link.retry.timeout.ms` : The number of milliseconds after which failures are no longer retried and partitions are marked as failed. If the source topic is deleted and re-created within this timeout, the link may contain records from the old as well as the new topic. * Type: int * Default: 300000 (5 minutes) `availability.check.ms` : How often the cluster link checks to see if the source cluster is available. The frequency with which the cluster link checks is specified in milliseconds. * Type: int * Default: 60000 (1 minute) A cluster link regularly checks whether the source cluster is still available for mirroring data by performing a `DescribeCluster` operation (bounded by `default.api.timeout.ms`). If the source cluster becomes unavailable (for example, because of an outage or disaster), then the cluster link signals this by updating its status and the status of its mirror topics. `availability.check.ms` works in tandem with [availability.check.consecutive.failure.threshold](#cluster-link-availability-check-consecutive-failure-threshold). `availability.check.consecutive.failure.threshold` : The number of consecutive failed availability checks the source cluster is allowed before the cluster link status becomes `SOURCE_UNAVAILABLE`. * Type: int * Default: 5 If, for example, the default (5) is used, the source cluster is determined to be unavailable after 5 failed checks in a row. If [availability.check.ms](#cluster-link-availability-check-ms) and `default.api.timeout.ms` are also set to their defaults of 1 minute and there are 5 failed checks, then the cluster link will show as `SOURCE_UNAVAILABLE` after 5 \* (1+1) mins = 10 minutes. That is, source unavailability is detected after `availability.check.consecutive.failure.threshold` \* (`default.api.timeout.ms` + `availability.check.ms`), taking into account the `DescribeCluster` operation performed as part of [availability.check.ms](#cluster-link-availability-check-ms). `connections.max.idle.ms` : Idle connections timeout. The server socket processor threads close any connections that are idle longer than this. * Type: int * Default: 600000 `connection.mode` : Used only for source-initiated links. Set this to INBOUND on the destination cluster’s link (which you create first). Set this to OUTBOUND on the source cluster’s link (which you create second). You must use this in combination with `link.mode`. This property should only be set for source-initiated cluster links. * Type: string * Default: OUTBOUND `consumer.offset.group.filters` : JSON to denote the list of consumer groups to be migrated. To learn more, see [Migrating consumer groups from source to destination cluster](commands.md#cluster-link-migrate-consumer-groups). * Type: string * Default: “” #### NOTE Consumer group filters should only include groups that are not being used on the destination.
This will help ensure that the system does not overwrite offsets committed by other consumers on the destination. The system attempts to work around filters containing groups that are also used on the destination, but in these cases there are no guarantees; offsets may be overwritten. For mirror topic “promotion” to work, the system must be able to roll back offsets, which cannot be done if the group is being used by destination consumers. `consumer.offset.sync.enable` : Whether or not to migrate consumer offsets from the source cluster. If you set this up and run Cluster Linking, then later disable it, the filters will be cleared (deleted) from the cluster link. * Type: boolean * Default: false `consumer.offset.sync.ms` : How often to sync consumer offsets, in milliseconds, if enabled. * Type: int * Default: 30000 `consumer.group.prefix.enable` : When set to `true`, the prefix specified for the [cluster link prefix](#cluster-link-prefix) is also applied to the names of consumer groups. The cluster link prefix must be specified in order for the consumer group prefix to be applied. To learn more, see “Prefixing Mirror Topics and Consumer Group Names” in [Mirror Topics](mirror-topics-cp.md#mirror-topics-concepts). * Type: boolean * Default: false #### NOTE Consumer group prefixing cannot be enabled for bidirectional links. `num.cluster.link.fetchers` : Number of fetcher threads used to replicate messages from source brokers in cluster links. * Type: int * Default: 1 `topic.config.sync.ms` : How often to refresh the topic configs, in milliseconds. * Type: int * Default: 5000 `topic.config.sync.include` : The list of topic configs to sync from the source topic. By default, certain topic configurations are synced from the source topic to the mirror topic to ensure consistency. This parameter allows you to explicitly specify which topic configurations should be synced, giving you control over which properties are copied from source to destination. To learn more, see [Override default syncing to specify independent mirror topic behavior](mirror-topics-cp.md#override-default-mirror-topic-syncs-cp) in [Mirror Topics](mirror-topics-cp.md#mirror-topics-concepts). * Type: string * Default: (all default sync configs are included) `link.fetcher.flow.control` : Maximum lag between the high watermark and the log end offset after which Cluster Linking will stop fetching. This is to synchronize the Cluster Linking fetch rate and the in-sync replica (ISR) fetch rate to avoid being under the minimum ISR. Setting this value specifies the flow control approach. * Type: int * Default: 0 The following values select the flow control approach: - `>=0`: Lag approach. - `-1`: Under min ISR approach. `-1` means the maximum lag is not enforced. Cluster Linking fetch will stop when the partition is under the minimum ISR. - `-2`: Under-replicated partition approach. `-2` specifies that Cluster Linking fetch will stop when the partition is under-replicated. If a broker goes down on the destination cluster due to an outage or planned failover (for example, proactively shutting down a broker), mirror topics will lag source topics on under-replicated partitions at the destination. To minimize or resolve mirror topic lag in these scenarios, set `link.fetcher.flow.control=-1`. `local.listener.name` : For a source-initiated link, an alternative listener to be used by the cluster link on the source cluster.
For more, see [Understanding Listeners in Cluster Linking](#cluster-link-listeners). `link.mode` : Used only for source-initiated links. Set this to DESTINATION on the destination cluster’s link (which you create first). Set this to SOURCE on the source cluster’s link (which you create second). For [bidirectional mode](#bidirectional-cluster-linking), set this to BIDIRECTIONAL on both clusters. You must use this in combination with `connection.mode`. This property should only be set for source-initiated cluster links. * Type: string * Default: DESTINATION `mirror.start.offset.spec` : Whether to get the full history of a mirrored topic (`earliest`), exclude the history and get only the `latest` version, or to get the history of the topic starting at a given timestamp. * Type: string * Default: earliest - If set to a value of `earliest` (the default), new mirror topics get the full history of their associated topics. - If set to a value of `latest`, new mirror topics will exclude the history and only replicate messages sent after the mirror topic is created. - If set to a timestamp in ISO 8601 format (`YYYY-MM-DDTHH:mm:SS.sss`), new mirror topics get the history of the topics starting from the timestamp. When a mirror topic is created, it reads the value of this configuration and begins replication accordingly. If the setting is changed, it does not affect existing mirror topics; new mirror topics use the new value when they’re created. If some mirror topics need to start from earliest and some need to start from latest, there are two options: - Change the value of the cluster link’s `mirror.start.offset.spec` to the desired starting position before creating the mirror topic, or - Use two distinct cluster links, each with their own value for `mirror.start.offset.spec`, and create mirror topics on the appropriate cluster link as desired. #### Default security config for bidirectional connectivity By default, a cluster link in bidirectional mode is configured similarly to the default configuration for two cluster links. ![image](multi-dc-deployments/cluster-linking/images/cluster-link-bidirectional-security.png) Each cluster requires: - The ability to connect (outbound) to the other cluster. (If this is not possible, see [Advanced options for bidirectional Cluster Linking](#cluster-linking-bidirectional-advanced).) - A user to create a cluster link object on it with: - An authentication configuration (such as API key or OAuth) for a principal on its remote cluster with ACLs or RBAC role bindings giving permission to read topic data and metadata. - The `Describe:Cluster` ACL - The `DescribeConfigs:Cluster` ACL if consumer offset sync is enabled (which is recommended) - The required ACLs or RBAC role bindings for a cluster link, as described in [Authorization (ACLs)](security.md#cluster-link-acls) (the rows for a cluster link on a source cluster). - `link.mode=BIDIRECTIONAL` ### ACLs Overview Replicator supports communication with secure Kafka over TLS/SSL for both the source and destination clusters. Replicator also supports TLS/SSL or SASL for authentication. Differing security configurations can be used on the source and destination clusters. All properties documented here are additive (i.e. you can apply both TLS/SSL Encryption and SASL Plain authentication properties) except for `security.protocol`.
The following table can be used to determine the correct value for `security.protocol`: | Encryption | Authentication | security.protocol | |--------------|------------------|---------------------| | TLS/SSL | None | SSL | | TLS/SSL | TLS/SSL | SSL | | TLS/SSL | SASL | SASL_SSL | | Plaintext | SASL | SASL_PLAINTEXT | You can configure Replicator connections to source and destination Kafka with: - [TLS/SSL Encryption](../../security/protect-data/encrypt-tls.md#encryption-ssl-replicator). You can use different TLS/SSL configurations on the source and destination clusters. - [SSL Authentication](../../security/authentication/mutual-tls/overview.md#authentication-ssl-replicator) - [SASL/SCRAM](../../security/authentication/sasl/scram/overview.md#sasl-scram-replicator) - [SASL/GSSAPI](../../security/authentication/sasl/gssapi/overview.md#sasl-gssapi-replicator) - [SASL/PLAIN](../../security/authentication/sasl/plain/overview.md#sasl-plain-replicator) To configure security on the source cluster, see the connector configurations for [Source Kafka: Security](configuration_options.md#source-security-config). To configure security on the destination cluster, see the connector configurations [Destination Kafka: Security](configuration_options.md#destination-security-config) and the general security configuration for Connect workers [here](../../connect/security.md#connect-security). #### IMPORTANT - For current versions of Replicator, it is recommended that you use the previously mentioned JMX metrics to monitor Replicator lag, because they are more accurate than the consumer group lag tool. The following methodology to monitor Replicator lag is only recommended if you are using Replicator with a legacy version below 5.4.0. - Replicator latency is calculated by taking the timestamp of the record consumed on the source and subtracting that from the time at which the message offset is flushed to the destination. If old records are processed or if the time setting on the source records is not the same for producers and consumers, then the metric will spike. This is misleading and should not be construed as a latency issue, but rather is a limitation of this type of metrics calculation. You can monitor Replicator lag by using the [Consumer Group Command tool](https://kafka.apache.org/documentation/#basic_ops_consumer_lag) (`kafka-consumer-groups`). To use this functionality, you must set the Replicator `offset.topic.commit` config to `true` (the default value). Replicator does not consume using a consumer group; instead, it manually assigns partitions. When `offset.topic.commit` is true, Replicator commits consumer offsets (again manually), but these are for reference only and do not represent an active consumer group. Since Replicator only commits offsets and does not actually form a consumer group, the `kafka-consumer-groups` command output will (correctly) show no active members in the group, only the committed offsets. This is expected behavior for Replicator. To check membership information, use Connect status endpoints rather than `kafka-consumer-groups`. Replication lag is the number of messages that were produced to the origin cluster, but have not yet arrived at the destination cluster. It can also be measured as the amount of time it currently takes for a message to get replicated from origin to destination. Note that this can be higher than the latency between the two datacenters if Replicator is behind for some reason and needs time to catch up.
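For reference, the committed offsets described above can be inspected with the Consumer Group Command tool to estimate per-partition lag. The following is a minimal sketch, assuming the command is run against the cluster where Replicator commits its reference offsets and that the group name matches the one configured for your Replicator connector; the bootstrap address and group name are placeholders:

```bash
# Describe the offsets that Replicator commits for reference.
# "replicator-group" and the bootstrap address are placeholders; the group
# shows no active members, which is expected for Replicator.
kafka-consumer-groups --bootstrap-server localhost:9092 \
  --describe --group replicator-group
```

The LAG column in the output approximates the number of messages not yet replicated for each partition.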
The main reasons to monitor replication lag are: * If there is a need to fail over from origin to destination and *if the origin cannot be restored*, all events that were produced to origin and not replicated to the target will be lost. (If the origin *can* be restored, the events will not be lost.) * Any event processing that happens at the destination will be delayed by the lag. The lag is typically just a few hundred milliseconds (depending on the network latency between the two datacenters), but it can grow larger if network partitions or configuration changes temporarily pause replication and the replicator needs to catch up. If the replication lag keeps growing, it indicates that Replicator throughput is lower than what gets produced to the origin cluster and that additional Replicator tasks or Connect Workers are necessary. For example, producers might be writing 100 MBps to the origin cluster while Replicator replicates only 50 MBps. To increase the throughput, the TCP socket buffer should be increased on the Replicator and the brokers. When Replicator is running in the destination cluster (recommended), you must also increase the following: - The TCP send socket buffer (`socket.send.buffer.bytes`) on the source cluster brokers. - The TCP receive socket buffer (`socket.receive.buffer.bytes`) on the consumers. A value of 512 KB is reasonable, but you may want to experiment with values up to 12 MB. If you are using Linux, you might need to change the default socket buffer maximum for the Kafka settings to take effect. For more information about tuning your buffers, [see this article](https://www.cyberciti.biz/faq/linux-tcp-tuning/). ## Unified Stream Manager Unified Stream Manager (USM) is now generally available with Confluent Platform 8.1. USM registers your on-premises Confluent Platform cluster with Confluent Cloud to provide a single pane of glass for your data streams. With USM, you can do the following: - Use a global policy catalog to enforce data contracts and encryption rules. - View unified data lineage across all your clusters and infrastructures. - View and troubleshoot topics and connectors across Confluent Platform and Confluent Cloud from a single, centralized interface. This release introduces the USM agent component to enable a secure, private network connection to Confluent Cloud: - All communication occurs over private networking. - The agent initiates connections from your private environment over a limited set of endpoints. - Only telemetry and resource metadata are shared with Confluent Cloud. Kafka brokers and Connect workers include updated embedded reporters that send the necessary telemetry and resource metadata. To support the global policy catalog, Confluent Platform Schema Registry provides a read-only mode and a Schema Importer. With these features, Confluent Cloud Schema Registry can serve as the source of truth, while the on-premises Confluent Platform Schema Registry acts as a read-only cache and forwards all write requests to Confluent Cloud. USM is designed to share only telemetry and metadata between Confluent Platform and Confluent Cloud. This limited data sharing lets you adopt USM without needing to accept the full Confluent Cloud data processing addendum. For more information about USM, see [Unified Stream Manager in Confluent Platform](../usm/overview.md#usm-overview). ### What is the Schema Registry endpoint and how is it surfaced on Confluent Cloud Console?
The Schema Registry endpoint (also referred to as `schema-registry-url`) is the API endpoint URL for your Confluent Cloud Schema Registry cluster in a specific environment. It’s used to make REST API calls to the Schema Registry service for operations such as: - Creating, reading, updating, and deleting schemas - Managing schema subjects and versions - Configuring compatibility settings - Performing schema validation and evolution operations This is surfaced in the Confluent Cloud Console as the Schema Registry endpoint. To find it, navigate to your Confluent Cloud environment, select a cluster, click **Schema Registry** on the left menu, and click the **Endpoints** tab. To find the Schema Registry endpoints using the Confluent CLI (see the [Confluent CLI Command reference](https://docs.confluent.io/confluent-cli/current/command-reference/overview.html)), run `confluent schema-registry cluster describe` (after selecting the appropriate environment with `confluent environment use <environment-id>`). On the Confluent CLI, the flag for the Schema Registry URL is `--schema-registry-endpoint`, as described in [confluent schema-registry cluster describe](https://docs.confluent.io/confluent-cli/current/command-reference/schema-registry/cluster/confluent_schema-registry_cluster_describe.html). You can also list endpoints using [confluent schema-registry endpoint list](https://docs.confluent.io/confluent-cli/current/command-reference/schema-registry/endpoint/confluent_schema-registry_endpoint_list.html). Note that in current versions, `confluent schema-registry cluster describe` returns only the PrivateLink Attachment private endpoints, whereas `confluent schema-registry endpoint list` lists all endpoints including the Confluent Cloud network. To learn more about working with schemas on the Cloud Console, see the [Schema Management Quick Start](/cloud/current/get-started/schema-registry.html) and [Manage Schemas on Confluent Cloud](/cloud/current/sr/schemas-manage.html). To learn about working with Schema Registry endpoints using the APIs, see the [Stream Catalog REST API Usage and Examples Guide](/cloud/current/stream-governance/stream-catalog-rest-apis.html) and [Confluent Cloud Schema Registry REST API Usage Examples](/cloud/current/sr/sr-rest-apis.html). ## Compatibility and schema evolution Apache Kafka® producers write data to Kafka topics and Kafka consumers read data from Kafka topics. There is an implicit “contract” that producers write data with a schema that can be read by consumers, even as producers and consumers evolve their schemas. Schema Registry helps ensure that this contract is met with compatibility checks. It is useful to think about schemas as APIs. Applications depend on APIs and expect that any changes made to APIs remain compatible so that applications can still run. Similarly, streaming applications depend on schemas and expect that any changes made to schemas remain compatible so that they can still run. Schema evolution requires compatibility checks to ensure that the producer-consumer contract is not broken. This is where Schema Registry helps: it provides centralized schema management and compatibility checks as schemas evolve.
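As a point of reference, producers and consumers participate in this contract by pointing their serializers and deserializers at the Schema Registry endpoint discussed above. The following is a minimal, producer-side sketch of the relevant client properties; the endpoint URL and the API key and secret are placeholders:

```properties
# Serialize record values with Avro and register/look up schemas in Schema Registry
value.serializer=io.confluent.kafka.serializers.KafkaAvroSerializer

# Schema Registry endpoint (placeholder URL) and API key credentials
schema.registry.url=https://psrc-xxxxx.us-east-2.aws.confluent.cloud
basic.auth.credentials.source=USER_INFO
basic.auth.user.info=<SR_API_KEY>:<SR_API_SECRET>
```

With these settings, schema registration requests are checked against the subject’s compatibility mode on the Schema Registry side, so an incompatible schema change is rejected before a producer can write data with it.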
To learn more about how Schema Registry manages compatibility, see the following topic in either the Confluent Cloud or Confluent Platform documentation: - Confluent Cloud documentation: [Schema Evolution and Compatibility for Schema Registry](/cloud/current/sr/fundamentals/schema-evolution.html) - Confluent Platform documentation: [Schema Evolution and Compatibility for Schema Registry](/platform/current/schema-registry/fundamentals/schema-evolution.html) ## Limitations - Currently, when using Confluent Replicator to migrate schemas, Confluent Cloud is not supported as the source cluster. Confluent Cloud can only be the destination cluster. As an alternative, you can migrate schemas using the [REST API for Schema Registry](../develop/api.md#schemaregistry-api) to achieve the desired deployments. Specifics regarding Confluent Cloud limits on schemas and managing storage space are described in the API reference in [Manage Schemas in Confluent Cloud](/cloud/current/sr/index.html). - Replicator does not support an “active-active” Schema Registry setup. It only supports migration (either one-time or continuous) from an active Schema Registry to a passive Schema Registry. - Newer versions of Replicator cannot be used to replicate data from early-version Kafka clusters to [Confluent Cloud](/cloud/current/index.html). Specifically, Replicator version 5.4.0 or later cannot be used to replicate data from clusters running Apache Kafka® v0.10.2 or earlier, or from Confluent Platform v3.2.0 or earlier, to Confluent Cloud. If you have clusters on these earlier versions, use Replicator 5.0.x to replicate to Confluent Cloud until you can upgrade. Keep in mind the following, and plan your upgrades accordingly: - Kafka Connect workers included in Confluent Platform 3.2 and later are compatible with any Kafka broker that is included in Confluent Platform 3.0 and later, as documented in [Cross-component compatibility](../../installation/versions-interoperability.md#cross-component-compatibility). - Confluent Platform 5.0.x has an end-of-support date of July 31, 2020, as documented in [Supported Versions and Interoperability for Confluent Platform](../../installation/versions-interoperability.md#interoperability-versions). ### Contexts and exporters Schema Registry introduces two new concepts to support Schema Linking: - **Contexts** - A [context](#schema-contexts) represents an independent scope in Schema Registry, and can be used to create any number of separate “sub-registries” within one Schema Registry cluster. Each schema context is an independent grouping of schema IDs and subject names, allowing the same schema ID in different contexts to represent completely different schemas. Any schema ID or subject name without an explicit context lives in the default context, denoted by a single dot `.`. An explicit context starts with a dot and can contain any parts separated by additional dots, such as `.mycontext.subcontext`. Context names work similarly to absolute Unix paths, but with dots instead of forward slashes (the default context is like the root Unix path). However, there is no relationship between two contexts that share a prefix. - **Exporters** - A [schema exporter](#schema-exporters) is a component that resides in Schema Registry for exporting schemas from one Schema Registry cluster to another. The lifecycle of a schema exporter is managed through APIs, which are used to create, pause, resume, and destroy a schema exporter.
A schema exporter is like a “mini-connector” that can perform change data capture for schemas. The Quick Start below shows you how to get started using schema exporters and contexts for Schema Linking. For in-depth descriptions of these concepts, see [Contexts](#schema-contexts) and [Exporters](#schema-exporters) # Authentication in Confluent Platform * [Overview](overview.md) * [Mutual TLS](mutual-tls/index.md) * [Overview](mutual-tls/overview.md) * [Use Principal Mapping](mutual-tls/tls-principal-mapping.md) * [OAuth/OIDC](oauth-oidc/index.md) * [Overview](oauth-oidc/overview.md) * [Claim Validation for OAuth JWT tokens](oauth-oidc/configure-oauth-jwt.md) * [OAuth/OIDC Service-to-Service Authentication](oauth-oidc/service-to-service.md) * [Configure Confluent Server Brokers](oauth-oidc/configure-cs.md) * [Configure Confluent Schema Registry](oauth-oidc/configure-sr.md) * [Configure Metadata Service](oauth-oidc/configure-mds.md) * [Configure Kafka Connect](oauth-oidc/configure-connect.md) * [Configure Confluent Control Center](oauth-oidc/configure-c3.md) * [Configure REST Proxy](oauth-oidc/configure-rest-proxy.md) * [Configure Truststores for TLS Handshake with Identity Providers](oauth-oidc/configure-truststore.md) * [Migrate from mTLS to OAuth Authentication](oauth-oidc/migrate-from-mtls-to-oauth.md) * [Use OAuth with ksqlDB](oauth-oidc/ksql-integration.md) * [Multi-Protocol Authentication](multi-protocol/index.md) * [Overview](multi-protocol/overview.md) * [Use AuthenticationHandler Class](multi-protocol/authenticationhandler.md) * [REST Proxy](rest-proxy/index.md) * [Overview](rest-proxy/overview.md) * [Principal Propagation for mTLS](rest-proxy/principal-propagation.md) * [SSO for Confluent Control Center](sso-for-c3/index.md) * [Overview](sso-for-c3/overview.md) * [Configure OIDC SSO for Control Center](sso-for-c3/configure-sso-using-oidc.md) * [Configure OIDC SSO for Confluent CLI](sso-for-c3/configure-sso-for-cli.md) * [Troubleshoot](sso-for-c3/troubleshoot.md) * [HTTP Basic Authentication](http-basic-auth/index.md) * [Overview](http-basic-auth/overview.md) * [SASL](sasl/index.md) * [Overview](sasl/overview.md) * [SASL/GSSAPI (Kerberos)](sasl/gssapi/index.md) * [SASL/OAUTHBEARER](sasl/oauthbearer/index.md) * [SASL/PLAIN](sasl/plain/index.md) * [SASL/SCRAM](sasl/scram/index.md) * [LDAP](ldap/index.md) * [Overview](ldap/overview.md) * [Configure Kafka Clients](ldap/client-authentication-ldap.md) * [Delegation Tokens](delegation-tokens/index.md) * [Overview](delegation-tokens/overview.md) # Confluent Metadata API Reference for Confluent Platform The Confluent Metadata API has many endpoints, conceptually grouped as follows: **Authentication** Authenticates users against LDAP and returns user bearer tokens that can be used with the other MDS endpoints and components in Confluent Platform (when configured to do so). **Authorization** Authorizes users to perform specific actions. Clients are not expected to use these endpoints, which are used by Confluent Platform components (such as Connect and ksqlDB) to authorize user actions. **Role Based Access Control** * Role binding CRUD * Role binding summaries (used by Confluent CLI) * High-level role binding management and rollups (used by Confluent Control Center ) **Centralized ACL control** ACL CRUD for legacy Kafka-managed and centralized MDS-based ACLs **Audit log configuration** Configuration governing which events get logged, and where those audit log events are sent. 
Works in conjunction with the Cluster Registry to push configuration changes to Kafka clusters. **Cluster registry** Tracking and naming CP components and clusters. * Manually populated and updated by Admins. * Leveraged by the Audit Log configuration. * Leveraged by RBAC APIs to allow for RoleBinding calls to use “nice names” for clusters instead of cluster IDs. ## Use CSFLE for Confluent Enterprise Client-side field level encryption (CSFLE) is available in Confluent Enterprise to help you protect sensitive data in your Confluent Enterprise and perform stream processing on encrypted data. You can use CSFLE with Confluent Enterprise without sharing access to your [Key Encryption Keys (KEKs)](../../../_glossary.md#term-key-encryption-key-KEK). Here are some key points about using CSFLE: * You must use a key management service (KMS) to manage access to your Key Encryption Keys (KEKs). * Extensive security checks and balances provided by Confluent protect your sensitive data. * No user or application in Confluent Enterprise can access your encrypted fields in plaintext. * Stream processing in Confluent Enterprise using Flink and ksqlDB is not possible because the data is encrypted and cannot be decrypted to perform operations. * Your organization manages running producers and consumers with the proper configurations to access the KEKs and encrypt or decrypt data. * You own and manage your Key Encryption Keys (KEKs) and are responsible for overseeing the entire lifecycle of the KEKs. * Sharing access to KEKs is not supported. * Confluent never directly accesses your Key Encryption Keys (KEKs). Each KEK remains securely stored in your key management service (KMS) that is owned and managed by you. Confluent interacts with two APIs that use a KEK identifier and a payload to either encrypt or decrypt the payload with the specified KEK. Confluent can only see the KEK identifier and the payloads (encrypted or decrypted DEKs). * Use the logging and auditing capabilities provided by your KMS to monitor and trace all access to KEKs to address any compliance or regulatory requirements. The steps are summarized in the diagram below. ![Steps for client-side field level encryption and access control](images/csfle-no-cmk-access.png) ## Overview This tutorial provides a step-by-step example to enable [TLS/SSL encryption](protect-data/encrypt-tls.md#kafka-ssl-encryption), [SASL authentication](authentication/overview.md#kafka-sasl-auth), and [authorization](authorization/acls/overview.md#kafka-authorization) on Confluent Platform with monitoring using Confluent Control Center. Follow the steps to walk through configuration settings for securing Apache Kafka® brokers, Kafka Connect, and Confluent Replicator, plus all the components required for monitoring, including the Confluent Metrics Reporter. When working through the tutorial, be aware of the following: * For simplicity, this tutorial uses [SASL/PLAIN (or PLAIN)](authentication/sasl/plain/overview.md#kafka-sasl-auth-plain), a simple username/password authentication mechanism typically used with TLS encryption to implement secure authentication.
* For production deployments of Confluent Platform, [SASL/GSSAPI (Kerberos)](authentication/sasl/gssapi/overview.md#kafka-sasl-auth-gssapi) or [SASL/SCRAM](authentication/sasl/scram/overview.md#kafka-sasl-auth-scram) is recommended. * Confluent Cloud uses [SASL/PLAIN (or PLAIN)](authentication/sasl/plain/overview.md#kafka-sasl-auth-plain) over TLS v1.2 encryption for authentication because it offers broad client support while providing a good level of security. The usernames and passwords used in the SASL exchange are API keys and secrets that should be securely managed using a secrets store and rotated periodically. ## Next steps To see a fully secured multi-node cluster, check out the Docker-based [Confluent Platform demo](../tutorials/cp-demo/index.md#cp-demo). It shows entire configurations, including security-related and non security-related configuration parameters, on all components in Confluent Platform, and the demo’s playbook has a security section for further learning. Read the [documentation](overview.md#security) for more details about security design and configuration on all components in Confluent Platform. While this tutorial uses the PLAIN mechanism for the SASL examples, Confluent additionally supports [GSSAPI (Kerberos)](authentication/sasl/gssapi/overview.md#kafka-sasl-auth-gssapi) and [SCRAM](authentication/sasl/scram/overview.md#kafka-sasl-auth-scram), which are more suitable for production. We welcome feedback in the [Confluent community](https://launchpass.com/confluentcommunity) security channel in Slack! ## Overview This example shows users how to build pipelines with Apache Kafka® in Confluent Platform. ![image](streams/images/pipeline.jpg) It showcases different ways to produce data to Kafka topics, with and without Kafka Connect, and various ways to serialize it for the Kafka Streams API and ksqlDB. | Example | Produce to Kafka Topic | Key | Value | Stream Processing | |-----------------------------------------|--------------------------------|--------|--------------|---------------------| | Confluent CLI Producer with String | CLI | String | String | Kafka Streams | | JDBC source connector with JSON | JDBC with SMT to add key | Long | Json | Kafka Streams | | JDBC source connector with SpecificAvro | JDBC with SMT to set namespace | null | SpecificAvro | Kafka Streams | | JDBC source connector with GenericAvro | JDBC | null | GenericAvro | Kafka Streams | | Java producer with SpecificAvro | Producer | Long | SpecificAvro | Kafka Streams | | JDBC source connector with Avro | JDBC | Long | Avro | ksqlDB | Detailed walk-thru of this example is available in the whitepaper [Kafka Serialization and Deserialization (SerDes) Examples](https://www.confluent.io/resources/kafka-streams-serialization-deserialization-code-examples) and the blog post [Building a Real-Time Streaming ETL Pipeline in 20 Minutes](https://www.confluent.io/blog/building-real-time-streaming-etl-pipeline-20-minutes/) ### Optional configuration parameters Here are the optional [Streams configuration parameters](../javadocs.md#streams-javadocs), with the level of importance indicated for each: - High: These parameters can have a significant impact on performance. Take care when deciding the values of these parameters. - Medium: These parameters can have some impact on performance. Your specific environment will determine how much tuning effort should be focused on these parameters. - Low: These parameters have a less general or less significant impact on performance. 
| Parameter Name | Importance | Description | Default Value | |-----------------------------------------------|--------------|--------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------| | acceptable.recovery.lag | Medium | The maximum acceptable lag (number of offsets to catch up) for an instance to be considered caught-up and ready for the active task. | 10,000 | | application.server | Low | A host:port pair pointing to an embedded user-defined endpoint that can be used for discovering the locations of state stores within a single Kafka Streams application. The value of this must be different for each instance of the application. | the empty string | | buffered.records.per.partition | Low | The maximum number of records to buffer per partition. | 1000 | | cache.max.bytes.buffering | Medium | Deprecated in Confluent Platform 7.4. Use `statestore.cache.max.bytes` instead. | 10485760 bytes | | client.id | Medium | An ID string to pass to the server when making requests. (This setting is passed to the consumer/producer clients used internally by Kafka Streams.) | the empty string | | commit.interval.ms | Low | The frequency with which to save the position (offsets in source topics) of tasks. - For at-least-once processing, committing means saving the position (offsets) of the processor. - For exactly-once processing, it means to commit the transaction, which includes saving the position. | 30000 ms (`at_least_once`) / 100 ms (`exactly_once_v2`) | | connections.max.idle.ms | Low | The number of milliseconds to wait before closing idle connections. | 540000 ms (9 minutes) | | default.client.supplier | Low | Client supplier class that implements the `org.apache.kafka.streams.KafkaClientSupplier` interface. | | | default.deserialization.exception.handler | Medium | Deprecated. Use `deserialization.exception.handler` instead. | See [default.deserialization.exception.handler](#streams-developer-guide-deh) | | default.dsl.store | Low | Deprecated in Confluent Platform 7.7. The default state store type used by DSL operators. | “rocksDB” | | default.key.serde | Medium | Default serializer/deserializer class for record keys, implements the `Serde` interface (see also value.serde). | `null` | | default.production.exception.handler | Medium | Exception handling class that implements the `ProductionExceptionHandler` interface. | See [default.production.exception.handler](#streams-def-prod-exc-hand) | | default.timestamp.extractor | Medium | Default timestamp extractor class that implements the `TimestampExtractor` interface. | See [Timestamp Extractor](#streams-developer-guide-timestamp-extractor) | | default.value.serde | Medium | Default serializer/deserializer class for record values, implements the `Serde` interface (see also key.serde). | `null` | | default.windowed.key.serde.inner | Medium | Deprecated in Confluent Platform 7.9. Use `windowed.inner.class.serde` instead. | `Serdes.ByteArray().getClass().getName()` | | default.windowed.value.serde.inner | Medium | Deprecated in Confluent Platform 7.9. Use `windowed.inner.class.serde` instead. | `Serdes.ByteArray().getClass().getName()` | | dsl.store.suppliers.class | Low | Defines a default state store implementation. | `BuiltInDslStoreSuppliers` |
| `BuiltInDslStoreSuppliers` | | enable.metrics.push | Medium | Push client metrics to the cluster, if the cluster has a client metrics subscription that matches this client. | | | ensure.explicit.internal.resource.naming | Medium | Enables enforcement of explicit naming for all internal resources of the topology, including internal topics. | `false` | | group.protocol | Low | The protocol used for group coordination. | `classic` | | log.summary.interval.ms | Low | Added to a window’s `maintainMs` to ensure data is not deleted from the log prematurely. Allows for clock drift. | 120000 milliseconds (2 minutes) | | max.task.idle.ms | Medium | Maximum amount of time Kafka Streams waits to fetch data to ensure in-order processing semantics. | 0 milliseconds | | max.warmup.replicas | Medium | The maximum number of warmup replicas (extra standbys beyond the configured num.standbys) that can be assigned at once. | 2 | | metadata.max.age.ms | Low | The period of time in milliseconds after which a refresh of metadata is forced. | 300000 ms (5 minutes) | | metric.reporters | Low | A list of classes to use as metrics reporters. | the empty list | | metrics.num.samples | Low | The number of samples maintained to compute metrics. | 2 | | metrics.recording.level | Low | The highest recording level for metrics. | `INFO` | | metrics.sample.window.ms | Low | The window of time a metrics sample is computed over. | 30000 milliseconds | | num.standby.replicas | High | The number of standby replicas for each task. | 0 | | num.stream.threads | Medium | The number of threads to execute stream processing. | 1 | | poll.ms | Low | The amount of time in milliseconds to block waiting for input. | 100 milliseconds | | probing.rebalance.interval.ms | Low | The maximum time to wait before triggering a rebalance to probe for warmup replicas that have sufficiently caught up. | 600000 milliseconds (10 minutes) | | processing.exception.handler | Medium | Exception handling class that implements the `ProcessingExceptionHandler` interface. | `LogAndFailProcessingExceptionHandler` | | processing.guarantee | Medium | The processing mode. Can be either `at_least_once` (default), or `exactly_once_v2` (for EOS version 2, requires Confluent Platform version 5.5.x / Kafka version 2.5.x or higher). Deprecated config options are `exactly_once` (for EOS version 1) and `exactly_once_beta` (for EOS version 2). | See [Processing Guarantee](#streams-developer-guide-processing-guarantee) | | production.exception.handler | Medium | Exception handling class that implements the `ProductionExceptionHandler` interface. For more information, see [production.exception.handler](#streams-developer-guide-production-exception-handler). | `DefaultProductionExceptionHandler` | | rack.aware.assignment.non_overlap_cost | Low | Cost associated with moving tasks from existing assignment. For more information, see [rack.aware.assignment.non_overlap_cost](#streams-developer-guide-rack-aware-assignment-non-overlap-cost). | `null` | | rack.aware.assignment.strategy | Low | The strategy used for rack-aware assignment. Values are “none” (default), “min_traffic”, and “balance_subtopology”. For more information, see [rack.aware.assignment.strategy](#streams-developer-guide-rack-aware-assignment-strategy). | `none` | | rack.aware.assignment.tags | Low | List of tag keys used to distribute standby replicas across Kafka Streams clients. | the empty list | | rack.aware.assignment.traffic_cost | Low | Cost associated with cross-rack traffic. 
| replication.factor | High | The replication factor for changelog topics and repartition topics created by the application. If your broker cluster is on version Confluent Platform 5.4.x (Kafka 2.4.x) or newer, you can set -1 to use the broker default replication factor. | 1 |
| retries | Medium | The number of retries for broker requests that return a retryable error. | 0 |
| retry.backoff.ms | Medium | The amount of time in milliseconds before a request is retried. This applies if the `retries` parameter is configured to be greater than 0. | 100 |
| rocksdb.config.setter | Medium | The RocksDB configuration. | |
| state.cleanup.delay.ms | Low | The amount of time in milliseconds to wait before deleting state when a partition has migrated. | 600000 milliseconds |
| state.dir | High | Directory location for state stores. | `/${java.io.tmpdir}/kafka-streams` |
| statestore.cache.max.bytes | Medium | Maximum number of memory bytes to be used for record caches across all threads. | 10485760 bytes |
| task.assignor.class | Medium | A task assignor class or class name implementing the `TaskAssignor` interface. | The high-availability task assignor. |
| task.timeout.ms | Medium | The maximum amount of time in ms a task might stall due to internal errors and retries until an error is raised. | 300000 milliseconds (5 minutes) |
| topology.optimization | Low | Enables/Disables topology optimization. | `NO_OPTIMIZATION` |
| upgrade.from | Medium | The version you are upgrading from during a rolling upgrade. | See [Upgrade From](#streams-developer-guide-upgrade-from) |
| windowed.inner.class.serde | Medium | Serde for the inner class of a windowed record. | |
| windowstore.changelog.additional.retention.ms | Low | Added to a window's `maintainMs` to ensure data is not deleted from the log prematurely. Allows for clock drift. | 86400000 milliseconds = 1 day |
| window.size.ms | Low | Sets window size for the deserializer in order to calculate window end times. | `null` |

##### Join co-partitioning requirements

For equi-joins, input data must be co-partitioned when joining. This ensures that input records with the same key, from both sides of the join, are delivered to the same stream task during processing. **It is your responsibility to ensure data co-partitioning when joining**. Co-partitioning is not required when performing [KTable-KTable Foreign-Key](#streams-developer-guide-dsl-joins-ktable-ktable-foreign-key) joins and [GlobalKTable](../concepts.md#streams-concepts-globalktable) joins.

The requirements for data co-partitioning are:

* The input topics of the join (left side and right side) must have the **same number of partitions**.
* All applications that *write* to the input topics must have the **same partitioning strategy** so that records with the same key are delivered to the same partition number. In other words, the keyspace of the input data must be distributed across partitions in the same manner. This means that, for example, applications that use Kafka's [Java Producer API](../../clients/overview.md#kafka-clients) must use the same partitioner (cf. the producer setting `"partitioner.class"` aka `ProducerConfig.PARTITIONER_CLASS_CONFIG`), and applications that use the Kafka Streams API must use the same `StreamPartitioner` for operations such as `KStream#to()` (see the sketch after this list).
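To make the second requirement concrete, the following sketch (the topic names, partition count, and partitioning logic are illustrative, not part of the original example) shows a producer-side configuration and a Kafka Streams `StreamPartitioner` that share the same partitioning logic, so records with the same key land in the same partition number on both input topics:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.kafka.common.utils.Utils;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.processor.StreamPartitioner;

public class CoPartitioningSketch {

    // Illustrative partitioning logic shared by both sides:
    // hash the key bytes and map the hash onto the partition count.
    static int partitionFor(String key, int numPartitions) {
        return Utils.toPositive(Utils.murmur2(key.getBytes())) % numPartitions;
    }

    // Producer side: a custom partitioner configured via partitioner.class
    // would implement org.apache.kafka.clients.producer.Partitioner and
    // delegate to partitionFor(...). (MyPartitioner is hypothetical.)
    static Properties producerConfig() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        // props.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, MyPartitioner.class);
        return props;
    }

    // Streams side: use a StreamPartitioner with the same logic when writing
    // to the other input topic of the join.
    static void buildTopology(StreamsBuilder builder) {
        StreamPartitioner<String, String> partitioner =
            (topic, key, value, numPartitions) -> partitionFor(key, numPartitions);
        builder.stream("orders", Consumed.with(Serdes.String(), Serdes.String()))
               .to("orders-keyed", Produced.with(Serdes.String(), Serdes.String())
                                           .withStreamPartitioner(partitioner));
    }
}
```

If every writer keeps the default partitioner-related settings, none of this extra work is needed, as noted next.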
The good news is that, if you happen to use the default partitioner-related settings across all applications, you do not need to worry about the partitioning strategy.

Why is data co-partitioning required? Because [KStream-KStream](#streams-developer-guide-dsl-joins-kstream-kstream), [KTable-KTable](#streams-developer-guide-dsl-joins-ktable-ktable), and [KStream-KTable](#streams-developer-guide-dsl-joins-kstream-ktable) joins are performed based on the keys of records, for example, `leftRecord.key == rightRecord.key`. It is required that the input streams/tables of a join are co-partitioned by key.

There are two exceptions where co-partitioning is not required:

- For [KStream-GlobalKTable](#streams-developer-guide-dsl-joins-kstream-globalktable) joins, co-partitioning is not required because *all* partitions of the `GlobalKTable`'s underlying changelog stream are made available to each `KafkaStreams` instance, so each instance has a full copy of the changelog stream. Further, a `KeyValueMapper` allows for non-key based joins from the `KStream` to the `GlobalKTable`.
- [KTable-KTable Foreign-Key](#streams-developer-guide-dsl-joins-ktable-ktable-foreign-key) joins do not require co-partitioning. Kafka Streams internally ensures co-partitioning for Foreign-Key joins.

Kafka Streams partly verifies the co-partitioning requirement
: During the partition assignment step, that is, at runtime, Kafka Streams verifies whether the number of partitions for both sides of a join are the same. If they are not, a `TopologyBuilderException` (runtime exception) is thrown. Note that Kafka Streams can't verify whether the partitioning strategy matches between the input streams/tables of a join. You must ensure that this is the case.

Ensuring data co-partitioning
: If the inputs of a join are not co-partitioned yet, you must ensure this manually. You can follow a procedure such as the one outlined below. To avoid bottlenecks, we recommend repartitioning the topic with fewer partitions to match the larger partition number. It's also possible to repartition the topic with more partitions to match the smaller partition number. For stream-table joins, we recommend repartitioning the KStream, because repartitioning a KTable may result in a second state store. For table-table joins, consider the size of the KTables and repartition the smaller KTable.

1. Identify the input KStream/KTable in the join whose underlying Kafka topic has the smaller number of partitions. Let's call this stream/table "SMALLER", and the other side of the join "LARGER". To find the number of partitions of a Kafka topic, you can use, for example, the CLI tool `bin/kafka-topics` with the `--describe` option.
2. Within your application, re-partition the data of "SMALLER". You must ensure that, when repartitioning the data with `repartition`, the same partitioner is used as for "LARGER".
   - If "SMALLER" is a KStream: `KStream#repartition(Repartitioned.numberOfPartitions(...))`.
   - If "SMALLER" is a KTable: `KTable#toStream()#repartition(Repartitioned.numberOfPartitions(...))#toTable()`.
3. Within your application, perform the join between "LARGER" and the new stream/table (see the sketch after this procedure).
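The following is a minimal sketch of steps 2 and 3, assuming hypothetical topic names, that "SMALLER" is a KStream, and that the "LARGER" topic has 12 partitions:

```java
import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.JoinWindows;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.Repartitioned;
import org.apache.kafka.streams.kstream.StreamJoined;

public class RepartitionBeforeJoin {
    public static void buildTopology(StreamsBuilder builder) {
        // "LARGER": its underlying topic has 12 partitions (hypothetical).
        KStream<String, String> larger =
            builder.stream("larger-topic", Consumed.with(Serdes.String(), Serdes.String()));

        // Step 2: repartition "SMALLER" so that it also has 12 partitions.
        KStream<String, String> smaller =
            builder.stream("smaller-topic", Consumed.with(Serdes.String(), Serdes.String()))
                   .repartition(Repartitioned.<String, String>numberOfPartitions(12)
                                             .withKeySerde(Serdes.String())
                                             .withValueSerde(Serdes.String()));

        // Step 3: both sides are now co-partitioned and can be joined.
        larger.join(smaller,
                    (leftValue, rightValue) -> leftValue + "," + rightValue,
                    JoinWindows.ofTimeDifferenceWithNoGrace(Duration.ofMinutes(5)),
                    StreamJoined.with(Serdes.String(), Serdes.String(), Serdes.String()))
              .to("joined-topic", Produced.with(Serdes.String(), Serdes.String()));
    }
}
```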
# Integration with Confluent Control Center Since the 3.2 release, [Confluent Control Center](https://docs.confluent.io/control-center/current/overview.html) displays the underlying [producer metrics](../kafka/monitoring.md#kafka-monitoring-metrics-producer) and [consumer metrics](../kafka/monitoring.md#kafka-monitoring-metrics-consumer) of a Kafka Streams application, which the Kafka Streams API uses internally whenever data needs to be read from or written to Kafka topics. These metrics can be used, for example, to monitor the so-called “consumer lag” of an application, which indicates whether an application at its [current capacity and available computing resources](developer-guide/running-app.md#streams-developer-guide-execution-scaling) is able to keep up with the incoming data volume. In Control Center, all of the running instances of a Kafka Streams application appear as a single consumer group. Restore consumers of an application are displayed separately. Behind the scenes, the Streams API uses a dedicated “restore” consumer for the purposes of fault tolerance and state management. This restore consumer manually assigns and manages the topic partitions it consumes from and is not a member of the application’s consumer group. As a result, the restore consumers are displayed separately from their application. # Kafka Streams for Confluent Platform Kafka Streams is a client library for building applications and microservices, where the input and output data are stored in an Apache Kafka® cluster. It combines the simplicity of writing and deploying standard Java and Scala applications on the client side with the benefits of Kafka’s server-side [cluster](../_glossary.md#term-Kafka-cluster) technology. If your Kafka Streams applications use Confluent Cloud resources, you can monitor them with Confluent Cloud Console. For more information, see [Monitor Kafka Streams Applications in Confluent Cloud](/cloud/current/kafka-streams/monitor-kafka-streams-apps.html). Free Video Course : [The free Kafka Streams 101 course](https://developer.confluent.io/learn-kafka/kafka-streams/get-started/) shows what Kafka Streams is and how to get started with it. Quick Start Guide : [Build your first Kafka Streams application](https://developer.confluent.io/tutorials/creating-first-apache-kafka-streams-application/confluent.html) shows how to run a Java application that uses the Kafka Streams library by demonstrating a simple end-to-end data pipeline powered by Kafka. Streams Podcasts : [Streaming Audio](https://developer.confluent.io/podcast/) is a podcast from Confluent, the team that built Kafka. Confluent developer advocates and guests unpack a variety of topics surrounding Kafka, [event stream](../_glossary.md#term-event-stream) processing, and real-time data. - [Capacity Planning Your Apache Kafka Cluster](https://developer.confluent.io/podcast/capacity-planning-your-apache-kafka-cluster/) - [Real-Time Stream Processing with Kafka Streams ft. Bill Bejeck](https://developer.confluent.io/podcast/real-time-stream-processing-with-kafka-streams-ft-bill-bejeck) - [Running Hundreds of Stream Processing Applications with Apache Kafka at Wise](https://developer.confluent.io/podcast/running-hundreds-of-stream-processing-applications-with-apache-kafka-at-wise) - [Apache Kafka Fundamentals: The Concept of Streams and Tables ft. Michael Noll](https://confluent.buzzsprout.com/186154/3559354-apache-kafka-fundamentals-the-concept-of-streams-and-tables-ft-michael-noll) - [Introducing JSON and Protobuf Support ft. 
David Araujo and Tushar Thole](https://confluent.buzzsprout.com/186154/3970760-introducing-json-and-protobuf-support-ft-david-araujo-and-tushar-thole) Recommended Reading : - Blog post: [Introducing Apache Kafka 4.1](https://www.confluent.io/blog/introducing-apache-kafka-4-1/) - Blog post: [Streams and Tables in Apache Kafka: A Primer](https://www.confluent.io/blog/kafka-streams-tables-part-1-event-streaming/) - Blog post: [Introducing Kafka Streams: Stream Processing Made Simple](https://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple/) - Course: [Kafka Streams 101](https://developer.confluent.io/learn-kafka/kafka-streams/get-started/) - Course: [Kafka Storage and Processing Fundamentals](https://developer.confluent.io/learn/kafka-storage-and-processing/) Screencasts : Watch [Apache Kafka 4.1: Enhanced Stability, New OAuth Support, Scalable Queues, Broker-Side Rebalancing](https://www.youtube.com/watch?v=cr9cDJGjm2E) on YouTube. Watch the [Intro to Streams API](https://www.youtube.com/watch?v=Z3JKCLG3VP4) on YouTube. ## Deploy Confluent Replicator Starting in the 6.1.0 release, Ansible Playbooks for Confluent Platform supports deployment of [Confluent Replicator](/platform/current/multi-dc-deployments/replicator/replicator-run.html#replicator-executable). Using Ansible, you can deploy Replicator with the following security mechanisms: * SASL/PLAIN * SASL/SCRAM * Kerberos * mTLS * Plaintext (which is no auth no encryption) The general deployment model is to deploy Replicator after both the source and destination clusters have been deployed. We recommend creating an inventory file specifically for the Replicator deployment, excluding other cluster deployment-related configuration. In this section, an example file, `replicator-hosts.yml`, is used. There are two clusters in this example, the source cluster and the destination cluster. Replicator has four client connections split across the two clusters: * Replicator configuration connection to the cluster which is used for storing configuration information in topics. See [Configure Replicator configuration connection](#ansible-replicator-client-connection). * Replicator monitoring connection which is used to produce metrics to the metrics cluster. This is often the same cluster as the cluster used to store configuration information. See [Configure monitoring connection](#ansible-replicator-monitoring-connection). * Replicator consumer connection which is used to consume data from the source cluster. See [Configure consumer connection](#ansible-replicator-consumer-connection). * Replicator producer connection which is used to produce data to the destination cluster. See [Configure producer connection](#ansible-replicator-producer-connection). The following sections list the configuration properties required in the Replicator inventory file. The examples use: * SASL/PLAIN with TLS on the source cluster * Kerberos with TLS on the destination cluster After configuring the replicator, you deploy the replicator with the following command. The command uses the example inventory file, `replicator-hosts.yml`. ```bash ansible-playbook -i replicator-hosts.yml playbooks/all.yml ``` #### NOTE JWT assertion retrieval from file flow is not recommended for production environments. Use [local client assertion flow](#ansible-oauth-client-local-client-assertion) instead. To configure JWT assertion retrieval from file flow: 1. Set the [OAuth client assertion variables](#ansible-oauth-client-local-client-assertion) 2. 
Enable JWT assertion retrieval from file flow using the following variables for Confluent Platform components. Set the variable to the directory where client assertion files exist.

   ```yaml
   oauth_superuser_oauth_client_assertion_file_base_path:
   kafka_broker_oauth_client_assertion_file_base_path:
   kafka_controller_oauth_client_assertion_file_base_path:
   schema_registry_oauth_client_assertion_file_base_path:
   kafka_connect_oauth_client_assertion_file_base_path:
   ksql_oauth_client_assertion_file_base_path:
   kafka_rest_oauth_client_assertion_file_base_path:
   kafka_connect_replicator_oauth_client_assertion_file_base_path:
   kafka_connect_replicator_producer_oauth_client_assertion_file_base_path:
   kafka_connect_replicator_erp_oauth_client_assertion_file_base_path:
   kafka_connect_replicator_consumer_erp_oauth_client_assertion_file_base_path:
   ```

3. Each component acting as a client to the server component must have an individual assertion file at the base file path you set above (`_oauth_client_assertion_file_base_path:`) to prevent token reuse issues. The following is an example ksqlDB directory structure for JWT assertion retrieval from file flow:

   ```bash
   ksql_oauth_client_assertion_file_base_path/kafka_client.jwt
   ksql_oauth_client_assertion_file_base_path/schema_registry_client.jwt
   ksql_oauth_client_assertion_file_base_path/mds_client.jwt
   ksql_oauth_client_assertion_file_base_path/ksql_client.jwt
   ```

For a full list of client assertion files, see the Confluent Ansible variables file at:

```html
https://github.com/confluentinc/cp-ansible/blob/8.1.0-post/roles/variables/vars/main.yml
```

#### Settings for RBAC with mTLS

Sample inventory files for RBAC configurations are provided in the `sample_inventories` directory under the Confluent Ansible home directory:

```html
https://github.com/confluentinc/cp-ansible/blob/8.1.0-post/docs/sample_inventories/
```

Add the required variables in your inventory file to enable and configure RBAC with mTLS. The following are the most commonly used variables to enable RBAC with mTLS:

* `rbac_enabled` Set to `true` for RBAC.
* `auth_mode` Authorization mode on all Confluent Platform components. Set to `mtls` for RBAC with mTLS only.
* `mds_ssl_client_authentication` The configuration of the MDS server to enforce SSL client authentication on MDS. The MDS server uses mTLS certificates for authentication and the principal extracted from the certificates for authorization. Options are:
  * `none`: The client does not need to send a certificate. If the client sends a certificate, it is ignored.
  * `requested`: Clients may or may not send certificates. If a client does not send a certificate, LDAP or OAuth credentials/tokens must provide the principal. This option is used during upgrades.
  * `required`: The client must send certificates to the server.

  Default: `none`
* `ssl_client_authentication` Kafka broker listeners configuration to enforce SSL client authentication. Options are:
  * `none`: The client does not need to send a certificate. If the client sends a certificate, it is ignored.
  * `requested`: Clients may or may not send certificates. If a client does not send a certificate, LDAP or OAuth credentials/tokens must provide the principal. This option is used during upgrades.
  * `required`: The client must send certificates to the server.

  Default: `none`
* `_ssl_client_authentication` The component-level setting for Schema Registry, Connect, REST Proxy to enforce SSL client authentication.
  Options are:
  * `none`: The client does not need to send a certificate. If the client sends a certificate, it is ignored.
  * `requested`: Clients may or may not send certificates. If a client does not send a certificate, LDAP or OAuth credentials/tokens must provide the principal. This option is used during upgrades.
  * `required`: The client must send certificates to the server.

  Default: The value of `ssl_client_authentication`
* `erp_ssl_client_authentication` Embedded REST Proxy server's configuration to enforce SSL client authentication on Embedded REST Proxy. Options are:
  * `none`: The client does not need to send a certificate. If the client sends a certificate, it is ignored.
  * `requested`: Clients may or may not send certificates. If a client does not send a certificate, LDAP or OAuth credentials/tokens must provide the principal. This option is used during upgrades.
  * `required`: The client must send certificates to the server.

  Default: `mds_ssl_client_authentication` value
* `impersonation_super_users` Required for `auth_mode: mtls`. A list of principals allowed to get an impersonation token for other users except the impersonation-protected users (`impersonation_protected_users`). For more information, see [Enable Token-based Authentication for RBAC](https://docs.confluent.io/platform/current/security/authorization/rbac/configure-mtls-rbac.html). For example:

  ```yaml
  impersonation_super_users:
    - 'kafka_broker'
    - 'kafka_rest'
    - 'schema_registry'
    - 'kafka_connect'
  ```

  Default: None
* `impersonation_protected_users` Required for RBAC with mTLS only. A list of principals who cannot be impersonated by REST Proxy. Super users should be added here to disallow them from being impersonated. For example:

  ```yaml
  impersonation_protected_users:
    - 'super_user'
  ```

* `principal_mapping_rules` The rules to map a distinguished name from the certificates to a short principal name. Default: `DEFAULT` For example:

  ```yaml
  principal_mapping_rules:
    - "RULE:.*CN=([a-zA-Z0-9.-_]*).*$/$1/"
    - "DEFAULT"
  ```

  For details about principal mapping rules, see [Principal Mapping Rules for SSL Listeners](https://docs.confluent.io/platform/current/kafka/configure-mds/mutual-tls-auth-rbac.html#principal-mapping-rules-for-ssl-listeners-extract-a-principal-from-a-certificate).
* `rbac_super_users` Additional list of super user principals for RBAC-enabled Confluent Platform clusters. When mTLS is enabled on Kafka brokers or KRaft controllers, their certificate principals should be passed in this list. You can add certificate principals and any other super users you want in this variable, and the list is applied to both brokers and controllers. If you define this variable, Confluent Ansible does not automatically add the KRaft controller certificate principals for the Kafka brokers or the Kafka broker certificate principals for the KRaft controllers; you must explicitly add those principals to the `rbac_super_users` list. Default: None For example:

  ```yaml
  all:
    rbac_super_users:
      - User:C=US,ST=Ca,L=PaloAlto,O=CONFLUENT,OU=TEST,CN=kafka_broker
      - User:C=US,ST=Ca,L=PaloAlto,O=CONFLUENT,OU=TEST,CN=kafka_controller
      - User:CN=kafka_user1
  ```

## Cluster registry

You can use Ansible Playbooks for Confluent Platform to name your clusters within the [cluster registries](/platform/current/security/cluster-registry.html) in Confluent Platform.
Cluster registry provides a way to centrally register and identify Kafka clusters in the metadata service (MDS) to simplify the RBAC role binding process and to enable centralized audit logging. Register the Kafka clusters in the MDS cluster registry using the following variables in the inventory file of the cluster. * To register a Kafka cluster in the MDS: ```none kafka_broker_cluster_name: ``` * To register a Schema Registry cluster in the MDS: ```none schema_registry_cluster_name: ``` * To register a Kafka Connect cluster in the MDS: ```none kafka_connect_cluster_name: ``` * To register a ksqlDB cluster in the MDS: ```none ksql_cluster_name: ``` ## Add Confluent license To add a Confluent license key for Confluent Platform components, use a custom property for each Confluent Platform component in the `hosts.yml` file as following: ```yaml all: vars: kafka_broker_custom_properties: confluent.license: kafka.rest.confluent.license.topic: "_confluent-command" schema_registry_custom_properties: confluent.license: kafka_connect_custom_properties: confluent.license: control_center_next_gen_custom_properties: confluent.license: kafka_rest_custom_properties: confluent.license: ksql_custom_properties: confluent.license: ``` Note that Confluent Server (Kafka broker) contains Kafka REST Server, and this component also requires a valid license configuration. Set the `kafka.rest.confluent.license.topic` property to the `_confluent-command` topic that stores the Confluent license. To add license to a connector, use the following config in the `hosts.yaml` file: ```yaml all: vars: kafka_connect_connectors: - name: sample-connector config: confluent.license: ``` The following example adds a license key for Kafka and Schema Registry. The example creates a variable for the license key and uses the variable in the custom properties. ```yaml vars: confluent_license: asdfkjkadslkfjaslkdf kafka_broker_custom_properties: confluent.license: "{{ confluent_license }}" kafka.rest.confluent.license.topic: "_confluent-command" schema_registry_custom_properties: confluent.license: "{{ confluent_license }}" ``` For additional license configuration parameters you can set with the above custom properties, see [License Configurations for Confluent Platform](https://docs.confluent.io/platform/current/installation/configuration/license-configs.html#license-configurations-for-cp). #### Produce Records 1. 
Run the producer, passing in arguments for:

   - the local file with configuration parameters to connect to your Kafka cluster
   - the topic name

```bash
lein producer $HOME/.confluent/java.config test1
```

You should see:

```text
…
Producing record: alice {"count":0}
Producing record: alice {"count":1}
Producing record: alice {"count":2}
Producing record: alice {"count":3}
Producing record: alice {"count":4}
Produced record to topic test1 partition [0] @ offset 0
Produced record to topic test1 partition [0] @ offset 1
Produced record to topic test1 partition [0] @ offset 2
Produced record to topic test1 partition [0] @ offset 3
Produced record to topic test1 partition [0] @ offset 4
Producing record: alice {"count":5}
Producing record: alice {"count":6}
Producing record: alice {"count":7}
Producing record: alice {"count":8}
Producing record: alice {"count":9}
Produced record to topic test1 partition [0] @ offset 5
Produced record to topic test1 partition [0] @ offset 6
Produced record to topic test1 partition [0] @ offset 7
Produced record to topic test1 partition [0] @ offset 8
Produced record to topic test1 partition [0] @ offset 9
10 messages were produced to topic test1!
```

2. View the [producer code](https://github.com/confluentinc/examples/tree/latest/clients/cloud/clojure/src/io/confluent/examples/clients/clj/producer.clj)

### Produce Records

1. Build the client examples:

```text
./gradlew clean build
```

2. Run the producer, passing in arguments for:

   - the local file with configuration parameters to connect to your Kafka cluster
   - the topic name

```text
./gradlew runApp -PmainClass="io.confluent.examples.clients.cloud.ProducerExample" \
  -PconfigPath="$HOME/.confluent/java.config" \
  -Ptopic="test1"
```

3. Verify the producer sent all the messages. You should see:

```text
...
Producing record: alice {"count":0}
Producing record: alice {"count":1}
Producing record: alice {"count":2}
Producing record: alice {"count":3}
Producing record: alice {"count":4}
Producing record: alice {"count":5}
Producing record: alice {"count":6}
Producing record: alice {"count":7}
Producing record: alice {"count":8}
Producing record: alice {"count":9}
Produced record to topic test1 partition [0] @ offset 0
Produced record to topic test1 partition [0] @ offset 1
Produced record to topic test1 partition [0] @ offset 2
Produced record to topic test1 partition [0] @ offset 3
Produced record to topic test1 partition [0] @ offset 4
Produced record to topic test1 partition [0] @ offset 5
Produced record to topic test1 partition [0] @ offset 6
Produced record to topic test1 partition [0] @ offset 7
Produced record to topic test1 partition [0] @ offset 8
Produced record to topic test1 partition [0] @ offset 9
10 messages were produced to topic test1
...
```

4. View the [producer code](https://github.com/confluentinc/examples/tree/latest/clients/cloud/groovy/src/main/groovy/io/confluent/examples/clients/cloud/ProducerExample.groovy).

### Kafka Streams

1. Run the Kafka Streams application, passing in arguments for:

   - the local file with configuration parameters to connect to your Kafka cluster
   - the same topic name you used earlier

```bash
./gradlew runApp -PmainClass="io.confluent.examples.clients.cloud.StreamsExample" \
  -PconfigPath="$HOME/.confluent/java.config" \
  -Ptopic="test1"
```

2. Verify the consumer received all the messages. You should see:

```text
...
[Consumed record]: alice, 0 [Consumed record]: alice, 1 [Consumed record]: alice, 2 [Consumed record]: alice, 3 [Consumed record]: alice, 4 [Consumed record]: alice, 5 [Consumed record]: alice, 6 [Consumed record]: alice, 7 [Consumed record]: alice, 8 [Consumed record]: alice, 9 ... [Running count]: alice, 0 [Running count]: alice, 1 [Running count]: alice, 3 [Running count]: alice, 6 [Running count]: alice, 10 [Running count]: alice, 15 [Running count]: alice, 21 [Running count]: alice, 28 [Running count]: alice, 36 [Running count]: alice, 45 ... ``` 3. When you are done, press `CTRL-C`. 4. View the [Kafka Streams code](https://github.com/confluentinc/examples/tree/latest/clients/cloud/groovy/src/main/groovy/io/confluent/examples/clients/cloud/StreamsExample.groovy). ### Consume Avro Records 1. Consume from topic `test2` by doing the following: - Referencing a properties file ```bash docker-compose exec connect bash -c 'kafka-avro-console-consumer --topic test2 --bootstrap-server $CONNECT_BOOTSTRAP_SERVERS --consumer.config /tmp/ak-tools-ccloud.delta --property basic.auth.credentials.source=$CONNECT_VALUE_CONVERTER_BASIC_AUTH_CREDENTIALS_SOURCE --property schema.registry.basic.auth.user.info=$CONNECT_VALUE_CONVERTER_SCHEMA_REGISTRY_BASIC_AUTH_USER_INFO --property schema.registry.url=$CONNECT_VALUE_CONVERTER_SCHEMA_REGISTRY_URL --max-messages 5' ``` - Referencing individual properties ```bash docker-compose exec connect bash -c 'kafka-avro-console-consumer --topic test2 --bootstrap-server $CONNECT_BOOTSTRAP_SERVERS --consumer-property sasl.mechanism=PLAIN --consumer-property security.protocol=SASL_SSL --consumer-property sasl.jaas.config="$SASL_JAAS_CONFIG_PROPERTY_FORMAT" --property basic.auth.credentials.source=$CONNECT_VALUE_CONVERTER_BASIC_AUTH_CREDENTIALS_SOURCE --property schema.registry.basic.auth.user.info=$CONNECT_VALUE_CONVERTER_SCHEMA_REGISTRY_BASIC_AUTH_USER_INFO --property schema.registry.url=$CONNECT_VALUE_CONVERTER_SCHEMA_REGISTRY_URL --max-messages 5' ``` You should see the following messages: ```text {"ordertime":{"long":1494153923330},"orderid":{"int":25},"itemid":{"string":"Item_441"},"orderunits":{"double":0.9910185646928878},"address":{"io.confluent.ksql.avro_schemas.KsqlDataSourceSchema_address":{"city":{"string":"City_61"},"state":{"string":"State_41"},"zipcode":{"long":60468}}}} ``` 2. When you are done, press `CTRL-C`. 3. View the [consumer Avro code](https://github.com/confluentinc/examples/tree/latest/clients/cloud/kafka-connect-datagen/start-docker-avro.sh). ### Kafka Streams 1. Run the Kafka Streams application, passing in arguments for: - the local file with configuration parameters to connect to your Kafka cluster - the same topic name you used earlier ```bash ./gradlew runApp -PmainClass="io.confluent.examples.clients.cloud.StreamsExample" \ -PconfigPath="$HOME/.confluent/java.config" \ -Ptopic="test1" ``` 2. Verify the consumer received all the messages. You should see: ```text ... [Consumed record]: alice, 0 [Consumed record]: alice, 1 [Consumed record]: alice, 2 [Consumed record]: alice, 3 [Consumed record]: alice, 4 [Consumed record]: alice, 5 [Consumed record]: alice, 6 [Consumed record]: alice, 7 [Consumed record]: alice, 8 [Consumed record]: alice, 9 ... [Running count]: alice, 0 [Running count]: alice, 1 [Running count]: alice, 3 [Running count]: alice, 6 [Running count]: alice, 10 [Running count]: alice, 15 [Running count]: alice, 21 [Running count]: alice, 28 [Running count]: alice, 36 [Running count]: alice, 45 ... ``` 3. 
When you are done, press `CTRL-C`. 4. View the [Kafka Streams code](https://github.com/confluentinc/examples/tree/latest/clients/cloud/kotlin/src/main/kotlin/io/confluent/examples/clients/cloud/StreamsExample.kt). ## Chained Transformation You can use SMTs together to perform a more complex transformation. The following examples show how the `ValueToKey` and `ExtractField` SMTs are chained together to set the key for data coming from a [JDBC Connector](../../../../kafka-connect-jdbc/current/index.html). During the transform, `ValueToKey` copies the message `c1` field into the message key and then `ExtractField` extracts just the integer portion of that field. ```json "transforms": "createKey,extractInt", "transforms.createKey.type": "org.apache.kafka.connect.transforms.ValueToKey", "transforms.createKey.fields": "c1", "transforms.extractInt.type": "org.apache.kafka.connect.transforms.ExtractField$Key", "transforms.extractInt.field": "c1" ``` The following shows what the message looked like before the transform. ```none "./bin/kafka-avro-console-consumer \ --bootstrap-server localhost:9092 \ --property schema.registry.url=http://localhost:8081 \ --property print.key=true \ --from-beginning \ --topic mysql-foobar null {"c1":{"int":1},"c2":{"string":"foo"},"create_ts":1501796305000,"update_ts":1501796305000} null {"c1":{"int":2},"c2":{"string":"foo"},"create_ts":1501796665000,"update_ts":1501796665000} ``` After the connector configuration is applied, new rows are inserted (piped) into the MySQL table: ```none "echo "insert into foobar (c1,c2) values (100,'bar');"|mysql --user=username --password=pw demo ``` The following is displayed in the Avro console consumer. Note that the key (the first value on the line) matches the value of c1, which was defined with the transforms. ```none 100 {"c1":{"int":100},"c2":{"string":"bar"},"create_ts":1501799535000,"update_ts":1501799535000} ``` ## Step 3: Convert the serialization format to JSON 1. Run the following statement to confirm that the current format of this table is Avro Schema Registry. ```sql SHOW CREATE TABLE gaming_player_activity_source; ``` Your output should resemble: ```text +-------------------------------------------------------------+ | SHOW CREATE TABLE | +-------------------------------------------------------------+ | CREATE TABLE `env`.`clus`.`gaming_player_activity_source` ( | | `key` VARBINARY(2147483647), | | `player_id` INT NOT NULL, | | `game_room_id` INT NOT NULL, | | `points` INT NOT NULL, | | `coordinates` VARCHAR(2147483647) NOT NULL, | | ) DISTRIBUTED BY HASH(`key`) INTO 6 BUCKETS | | WITH ( | | 'changelog.mode' = 'append', | | 'connector' = 'confluent', | | 'kafka.cleanup-policy' = 'delete', | | 'kafka.max-message-size' = '2097164 bytes', | | 'kafka.partitions' = '6', | | 'kafka.retention.size' = '0 bytes', | | 'kafka.retention.time' = '604800000 ms', | | 'key.format' = 'raw', | | 'scan.bounded.mode' = 'unbounded', | | 'scan.startup.mode' = 'earliest-offset', | | 'value.format' = 'avro-registry' | | ) | | | +-------------------------------------------------------------+ ``` 2. Run the following statement to create a second table that has the same schema but is configured with the value format set to JSON with Schema Registry. The key format is unchanged. 
```sql CREATE TABLE gaming_player_activity_source_json ( `key` VARBINARY(2147483647), `player_id` INT NOT NULL, `game_room_id` INT NOT NULL, `points` INT NOT NULL, `coordinates` VARCHAR(2147483647) NOT NULL ) DISTRIBUTED BY HASH(`key`) INTO 6 BUCKETS WITH ( 'value.format' = 'json-registry', 'key.format' = 'raw' ); ``` This statement creates a corresponding Kafka topic and Schema Registry subject named `gaming_player_activity_source_json-value` for the value. 3. Run the following SQL to create a long-running statement that continuously transforms `gaming_player_activity_source` records into `gaming_player_activity_source_json` records. ```sql INSERT INTO gaming_player_activity_source_json SELECT * FROM gaming_player_activity_source; ``` 4. Run the following statement to confirm that records are continuously appended to the target table: ```sql SELECT * FROM gaming_player_activity_source_json; ``` Your output should resemble: ```none key player_id game_room_id points coordinates x'31303834' 1084 3583 211 [51,93] x'31303037' 1007 2268 55 [98,72] x'31303230' 1020 1625 431 [01,08] x'31303934' 1094 4760 43 [80,71] x'31303539' 1059 2822 390 [33,74] ... ``` 5. Run the following statement to confirm that the format of the `gaming_player_activity_source_json` table is JSON. ```sql SHOW CREATE TABLE gaming_player_activity_source_json; ``` Your output should resemble: ```text +--------------------------------------------------------------------------------------+ | SHOW CREATE TABLE | +--------------------------------------------------------------------------------------+ | CREATE TABLE `jim-flink-test-env`.`cluster_0`.`gaming_player_activity_source_json` ( | | `key` VARBINARY(2147483647), | | `player_id` INT NOT NULL, | | `game_room_id` INT NOT NULL, | | `points` INT NOT NULL, | | `coordinates` VARCHAR(2147483647) NOT NULL | | ) DISTRIBUTED BY HASH(`key`) INTO 6 BUCKETS | | WITH ( | | 'changelog.mode' = 'append', | | 'connector' = 'confluent', | | 'kafka.cleanup-policy' = 'delete', | | 'kafka.max-message-size' = '2097164 bytes', | | 'kafka.partitions' = '6', | | 'kafka.retention.size' = '0 bytes', | | 'kafka.retention.time' = '604800000 ms', | | 'key.format' = 'raw', | | 'scan.bounded.mode' = 'unbounded', | | 'scan.startup.mode' = 'earliest-offset', | | 'value.format' = 'json-registry' | | ) | | | +--------------------------------------------------------------------------------------+ ``` ## Step 2: Apply the Transform Topic action In the previous step, you created a Flink table and populated it with a few rows. In this step, you apply the Transform Topic action to create a transformed output table. 1. Navigate to the [Environments](https://confluent.cloud/environments) page, and in the navigation menu, click **Data portal**. 2. In the **Data portal** page, click the dropdown menu and select the environment for your workspace. 3. In the **Recently created** section, find your **users** topic and click it to open the details pane. 4. In the details pane, click **Actions**, and in the Actions list, click **Transform topic** to open the dialog. 5. In the **Action details** section, set up the transformation. - **user_id** field: select the **Key field** checkbox. - **registertime** field: enter *registration_time*. - **Partition count** property: enter *3*. - **Serialization format** property: select **JSON Schema**. By default, the name of the transformed topic is `users_transform`, and you can change this as desired. 6. 
In the **Runtime configuration** section, configure how the transformation statement will run.

   - (Optional) Select the Flink compute pool to run the transformation statement. The current compute pool is selected as the default.
   - (Optional) Select **Run with a service account** for production jobs. The service account you select must have the EnvironmentAdmin role to create topics, schemas, and run Flink statements.
   - (Optional) Select **Show SQL** to view the Flink statement that does the transformation work. Your Flink SQL should resemble:

```sql
CREATE TABLE `your-env`.`your-cluster`.`users_transform`
DISTRIBUTED BY HASH (`user_id`) INTO 3 BUCKETS
WITH (
  'value.format' = 'json-registry',
  'key.format' = 'json-registry'
)
AS SELECT
  `user_id`,
  `registertime` as `registration_time`,
  `gender`,
  `regionid`
FROM `your-env`.`your-cluster`.`users`;
```

7. Click **Confirm and run** to run the transformation statement. A **Summary** page displays the result of the job submission, showing the statement name and other details.

## Step 3: Inspect the transformed topic

1. In the **Summary** page, click the **Output topic** link for the **users_transform** topic, and in the topic's details pane, click **Query** to open a Flink workspace.
2. Run the following statement to view the rows in the **users_transform** table. Note the renamed **registration_time** column.

```sql
SELECT * FROM `users_transform`;
```

   Click **Stop** to end the statement.

3. Run the following command to confirm that the `user_id` field in the transformed table is a key field.

```sql
DESCRIBE `users_transform`;
```

   Your output should resemble:

```text
+-------------------+-----------+----------+------------+
| Column Name       | Data Type | Nullable | Extras     |
+-------------------+-----------+----------+------------+
| user_id           | STRING    | NULL     | BUCKET KEY |
| registration_time | BIGINT    | NULL     |            |
| gender            | STRING    | NULL     |            |
| regionid          | STRING    | NULL     |            |
+-------------------+-----------+----------+------------+
```

4. Run the following command to confirm the serialization format and partition count on the transformed topic.

```sql
SHOW CREATE TABLE `users_transform`;
```

   Your output should resemble:

```text
CREATE TABLE `your-env`.`your-cluster`.`users_transform` (
  `user_id` VARCHAR(2147483647),
  `registration_time` BIGINT,
  `gender` VARCHAR(2147483647),
  `regionid` VARCHAR(2147483647)
) DISTRIBUTED BY HASH(`user_id`) INTO 3 BUCKETS
WITH (
  'changelog.mode' = 'append',
  'connector' = 'confluent',
  'kafka.cleanup-policy' = 'delete',
  'kafka.max-message-size' = '2097164 bytes',
  'kafka.retention.size' = '0 bytes',
  'kafka.retention.time' = '7 d',
  'key.format' = 'json-registry',
  'scan.bounded.mode' = 'unbounded',
  'scan.startup.mode' = 'earliest-offset',
  'value.format' = 'json-registry'
)
```

#### Step 3: Produce and consume with Confluent CLI

The following is an example CLI command to produce to `test-topic`:

```text
confluent kafka topic produce test-topic \
  --protocol SASL_SSL \
  --sasl-mechanism OAUTHBEARER \
  --bootstrap ":19091,:19092" \
  --ca-location scripts/security/snakeoil-ca-1.crt
```

- Specify `--protocol SASL_SSL` to use SASL_SSL/OAUTHBEARER authentication.
- Specify `--sasl-mechanism OAUTHBEARER` to enable the OAUTHBEARER mechanism.
- `--bootstrap` is the list of hosts that the producer/consumer talks to. This list should be the same as what you configured in Step 1. Hosts should be separated by commas.
- `--ca-location` is the path to the CA certificate verifying the broker's key, and it's required for SSL verification.
For more details about setting up this flag, see [this document](https://github.com/edenhill/librdkafka/blob/master/INTRODUCTION.md#ssl). #### IMPORTANT The principal specified above is the Kafka user, the same as specified in [Kafka Broker](sasl.md#controlcenter-sasl-broker). For each Kafka topic that Confluent Control Center creates, ACLs are created to grant the specified principal the following privileges: - CREATE - WRITE - DESCRIBE - DESCRIBE_CONFIGS - READ The following ACLs are created to grant the specified principal privileges for the consumer group related to the Confluent Control Center Streams application: - READ ACLs granting the following privileges are also created for the cluster: - DESCRIBE - DESCRIBE_CONFIGS You must export a Control Center JAAS configuration before starting Control Center. ```bash export CONTROL_CENTER_OPTS='-Djava.security.auth.login.config=' control-center-start config/control-center.properties ``` ## Migrate to JavaScript Client from KafkaJS Below is a simple produce example for users migrating from KafkaJS. ```javascript // require('kafkajs') is replaced with require('@confluentinc/kafka-javascript').KafkaJS. const { Kafka } = require("@confluentinc/kafka-javascript").KafkaJS; async function producerStart() { const kafka = new Kafka({ kafkaJS: { brokers: [''], ssl: true, sasl: { mechanism: 'plain', username: '', password: '', }, } }); const producer = kafka.producer(); await producer.connect(); console.log("Connected successfully"); const res = [] for (let i = 0; i < 50; i++) { res.push(producer.send({ topic: 'test-topic', messages: [ { value: 'v222', partition: 0 }, { value: 'v11', partition: 0, key: 'x' }, ] })); } await Promise.all(res); await producer.disconnect(); console.log("Disconnected successfully"); } producerStart(); ``` To migrate to the JavaScript Client from the KafkaJS: 1. Change the import statement, and add a `kafkaJS` block around your configs. From: ```javascript const { Kafka } = require('kafkajs'); const kafka = new Kafka({ brokers: ['kafka1:9092', 'kafka2:9092'], /* ... */ }); const producer = kafka.producer({ /* ... */, }); ``` To: ```javascript const { Kafka } = require('@confluentinc/kafka-javascript').KafkaJS; const kafka = new Kafka({ kafkaJS: { brokers: ['kafka1:9092', 'kafka2:9092'], /* ... */ } }); const producer = kafka.producer({ kafkaJS: { /* ... */, } }); ``` 2. Try running your program. In case a migration is needed, an informative error will be thrown. If you’re using Typescript, some of these changes will be caught at compile time. 3. The most common expected changes to the code are: - For the **producer**: `acks`, `compression` and `timeout` are not set per `send()`. They must be configured in the top-level configuration while creating the producer. - For the **consumer**: - `fromBeginning` is not set per `subscribe()`. It must be configured in the top-level configuration while creating the consumer. - `autoCommit` and `autoCommitInterval` are not set per `run()`. They must be configured in the top-level configuration while creating the consumer. - `autoCommitThreshold` is not supported. - `eachBatch`’s batch size never exceeds 1. - For errors: Check the `error.code` rather than the error `name` or `type`. 4. A more exhaustive list of semantic and configuration differences is [presented below](#common). 
An example migration: ```diff -const { Kafka } = require('kafkajs'); +const { Kafka } = require('@confluentinc/kafka-javascript').KafkaJS; const kafka = new Kafka({ + kafkaJS: { clientId: 'my-app', brokers: ['kafka1:9092', 'kafka2:9092'] + } }) const producerRun = async () => { - const producer = kafka.producer(); + const producer = kafka.producer({ kafkaJS: { acks: 1 } }); await producer.connect(); await producer.send({ topic: 'test-topic', - acks: 1, messages: [ { value: 'Hello confluent-kafka-javascript user!' }, ], }); }; const consumerRun = async () => { // Consuming - const consumer = kafka.consumer({ groupId: 'test-group' }); + const consumer = kafka.consumer({ kafkaJS: { groupId: 'test-group', fromBeginning: true } }); await consumer.connect(); - await consumer.subscribe({ topic: 'test-topic', fromBeginning: true }); + await consumer.subscribe({ topic: 'test-topic' }); await consumer.run({ eachMessage: async ({ topic, partition, message }) => { console.log({ partition, offset: message.offset, value: message.value.toString(), }) }, }); }; producerRun().then(consumerRun).catch(console.error); ``` ### Consumer configuration changes ```javascript const consumer = kafka.consumer({ kafkaJS: { /* producer-specific configuration changes. */ } }); ``` Each allowed config property is discussed below. If there is any change in semantics or the default values, the property and the change is **highlighted in bold**. | Property | Default Value | Comment | |--------------------------|---------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | groupId | none | A mandatory string denoting consumer group name that this consumer is a part of. | | **partitionAssigners** | [PartitionAssigners.roundRobin] | Support for range, roundRobin, and cooperativeSticky assignors is provided. Custom assignors are not supported. | | **partitionAssignors** | [PartitionAssignors.roundRobin] | Alias for partitionAssigners | | **rebalanceTimeout** | **300000** | The maximum allowed time for each member to join the group once a rebalance has begun. Note, that setting this value also changes the max poll interval. Message processing in eachMessage/eachBatch must not take more than this time. | | heartbeatInterval | 3000 | The expected time in milliseconds between heartbeats to the consumer coordinator. | | metadataMaxAge | 5 minutes | Time in milliseconds after which to refresh metadata for known topics | | allowAutoTopicCreation | `true` | Determines if a topic should be created if it doesn’t exist while consuming. | | **maxBytesPerPartition** | 1048576 (1MB) | determines how many bytes can be fetched in one request from a single partition. There is a change in semantics, this size grows dynamically if a single message larger than this is encountered, and the client does not get stuck. | | minBytes | 1 | Minimum number of bytes the broker responds with (or wait until maxWaitTimeInMs) | | maxBytes | 10485760 (10MB) | Maximum number of bytes the broker responds with. | | **retry** | object | Identical to retry in the common configuration. This takes precedence over the common config retry. | | readUncommitted | false | If `true`, consumer will read transactional messages which have not been committed. 
| | **maxInFlightRequests** | null | Maximum number of in-flight requests **per broker connection.** If not set, it is practically unbounded (same as KafkaJS). | | rackId | null | Can be set to an arbitrary string which will be used for fetch-from-follower if set up on the cluster. | | **fromBeginning** | false | If there is initial offset in offset store or the desired offset is out of range, and this is true, we consume the earliest possible offset. **This is set on a per-consumer level, not on a per subscribe level.** | | **autoCommit** | `true` | Whether to periodically auto-commit offsets to the broker while consuming. **This is set on a per-consumer level, not on a per run level.** | | **autoCommitInterval** | 5000 | Offsets are committed periodically at this interval, if autoCommit is true. **This is set on a per-consumer level, not on a per run level. The default value is changed to 5 seconds.** | | outer config | {} | The configuration outside the kafkaJS block can contain any of the keys present in the librdkafka CONFIGURATION table. | ### Consume A `Consumer` receives messages from Kafka. The following example illustrates how to create an instance and start consuming messages: ```js const consumer = new Kafka().consumer({ 'bootstrap.servers': '', 'group.id': 'test-group', // Mandatory property for a consumer - the consumer group id. }); await consumer.connect(); await consumer.subscribe({ topics: ["test-topic"] }); consumer.run({ eachMessage: async ({ topic, partition, message }) => { console.log({ topic, partition, headers: message.headers, offset: message.offset, key: message.key?.toString(), value: message.value.toString(), }); } }); // Whenever we're done consuming, maybe after user input or a signal: await consumer.disconnect(); ``` The consumer must be connected before calling `run`. The `run` method starts the consumer loop, which takes care of polling the cluster, and will call the `eachMessage` callback for every message you get from the cluster. The message may contain several other fields besides the value. For example: ```js { // Key of the message - may not be set. key: Buffer.from('key'), // Value of the message - will be set. value: Buffer.from('value'), // The timestamp set by the producer or the broker in milliseconds since the Unix epoch. timestamp: '1734008723000', // The current epoch of the leader for this partition. leaderEpoch: 2, // Size of the message in bytes. size: 6, // Offset of the message on the partition. offset: '42', // Headers that were sent along with the message. headers: { 'header-key-0': ['header-value-0', 'header-value-1'], 'header-key-1': Buffer.from('header-value'), } } ``` A message is considered to be processed successfully when `eachMessage` for that message runs to completion without throwing an error. In case an error is thrown, the message is marked unprocessed, and `eachMessage` will be called with the same message again. #### Subscribe and rebalance To consume messages, the consumer must be a part of a [consumer group](https://github.com/confluentinc/librdkafka/blob/master/INTRODUCTION.md#consumer-groups), and it must subscribe to one or more topics. The group is specified with the `group.id` property, and the `subscribe` method should be called after connecting to the cluster. The consumer does not actually join the consumer group until `run` is called. Joining a consumer group causes a rebalance within all the members of that consumer group, where each consumer is assigned a set of partitions to consume from. 
Rebalances may also be caused when a consumer leaves a group by disconnecting, or when new partitions are added to a topic. It is possible to add a callback to track rebalances: ```js const rebalance_cb = (err, assignment) => { switch (err.code) { case ErrorCodes.ERR__ASSIGN_PARTITIONS: console.log(`Assigned partitions ${JSON.stringify(assignment)}`); break; case ErrorCodes.ERR__REVOKE_PARTITIONS: console.log(`Revoked partitions ${JSON.stringify(assignment)}`); break; default: console.error(err); } }; const consumer = new Kafka().consumer({ 'bootstrap.servers': '', 'group.id': 'test-group', 'rebalance_cb': rebalance_cb, }); ``` It’s also possible to modify the assignment of partitions, or pause consumption of newly assigned partitions just after a rebalance. ```js const rebalance_cb = (err, assignment, assignmentFns) => { switch (err.code) { case ErrorCodes.ERR__ASSIGN_PARTITIONS: // Change the assignment as needed - this mostly boils down to changing the offset to start consumption from, though // you are free to do anything. if (assignment.length > 0) { assignment[0].offset = 34; } assignmentFns.assign(assignment); // Can pause consumption of new partitions just after a rebalance. break; case ErrorCodes.ERR__REVOKE_PARTITIONS: break; default: console.error(err); } }; ``` Subscriptions can be changed anytime, and the running consumer triggers a rebalance whenever that happens. The current assignment of partitions to the consumer can be checked with the `assign` method. ### Metadata To retrieve metadata from Kafka, use the `getMetadata` method with `Kafka.Producer` or `Kafka.KafkaConsumer`. When fetching metadata for a specific topic, if a topic reference does not exist, one is created using the default configuration. See the documentation on `Client.getMetadata` if you want to set configuration parameters, for example, `acks` on a topic to produce messages to. The following example illustrates how to use the `getMetadata` method. ```js const opts = { topic: 'librdtesting-01', timeout: 10000 }; producer.getMetadata(opts, (err, metadata) => { if (err) { console.error('Error getting metadata'); console.error(err); } else { console.log('Got metadata'); console.log(metadata); } }); ``` Metadata on any connection is returned in the following data structure: ```js { orig_broker_id: 1, orig_broker_name: "broker_name", brokers: [ { id: 1, host: 'localhost', port: 40 } ], topics: [ { name: 'awesome-topic', partitions: [ { id: 1, leader: 20, replicas: [1, 2], isrs: [1, 2] } ] } ] } ``` ### Property-based example Create a configuration file for the connector. This file is included with the connector in `etc/kafka-connect-appdynamics-metrics/appdynamics-metrics-sink-connector.properties`. This configuration is typically used for [standalone workers](/platform/current/connect/concepts.html#standalone-workers). 
```properties name=appdynamics-metrics-sink topics=appdynamics-metrics-topic connector.class=io.confluent.connect.appdynamics.metrics.AppDynamicsMetricsSinkConnector tasks.max=1 machine.agent.host= machine.agent.port= behavior.on.error=fail confluent.topic.bootstrap.servers=localhost:9092 confluent.topic.replication.factor=1 reporter.bootstrap.servers=localhost:9092 reporter.result.topic.replication.factor=1 reporter.error.topic.replication.factor=1 key.converter=io.confluent.connect.avro.AvroConverter key.converter.schema.registry.url=http://localhost:8081 value.converter=io.confluent.connect.avro.AvroConverter value.converter.schema.registry.url=http://localhost:8081 ``` ### Load the Amazon Redshift Sink connector 1. Create a properties file for your Redshift Sink connector. ```text name=redshift-sink confluent.topic.bootstrap.servers=localhost:9092 confluent.topic.replication.factor=1 connector.class=io.confluent.connect.aws.redshift.RedshiftSinkConnector tasks.max=1 topics=orders aws.redshift.domain=< Required Configuration > aws.redshift.port=< Required Configuration > aws.redshift.database=< Required Configuration > aws.redshift.user=< Required Configuration > aws.redshift.password=< Required Configuration > pk.mode=kafka auto.create=true ``` Fill in the configuration parameters of your cluster as they appear in your [Cluster Details](https://console.aws.amazon.com/redshift/home#cluster-list:). 2. Load the `redshift-sink` connector: ```bash confluent local load redshift-sink --config redshift-sink.properties ``` Your output should resemble the following: ```text { "name": "redshift-sink", "config": { "confluent.topic.bootstrap.servers": "localhost:9092", "connector.class": "io.confluent.connect.aws.redshift.RedshiftSinkConnector", "tasks.max": "1", "topics": "orders", "aws.redshift.domain": "cluster-name.cluster-id.region.redshift.amazonaws.com", "aws.redshift.port": "5439", "aws.redshift.database": "dev", "aws.redshift.user": "awsuser", "aws.redshift.password": "your-password", "auto.create": "true", "pk.mode": "kafka", "name": "redshift-sink" }, "tasks": [], "type": "sink" } ``` Note that non-CLI users can load the Redshift Sink connector by using the following command: ```text ${CONFLUENT_HOME}/bin/connect-standalone \ ${CONFLUENT_HOME}/etc/schema-registry/connect-avro-standalone.properties \ redshift-sink.properties ``` ## Quick Start The following quick start uses the `AzureBlobStorageSinkConnector` to write an Avro file from the Kafka topic named `blob_topic` to Azure Blob Storage. Also, the `AzureBlobStorageSinkConnector` should be completely stopped before starting the `AzureBlobStorageSourceConnector` to avoid creating source/sink cycle. Then, the `AzureBlobStorageSourceConnector` loads that Avro file from Azure Blob Storage to the Kafka topic named `copy_of_blob_topic`. For an example of how to get Kafka Connect connected to [Confluent Cloud](/cloud/current/index.html), see [Connect Self-Managed Kafka Connect to Confluent Cloud](/cloud/current/cp-component/connect-cloud-config.html#distributed-cluster). 1. Follow the instructions from Connect Azure Blob Storage Sink connector to set up the data to use below. 2. Install the connector through the [Confluent Hub Client](https://docs.confluent.io/current/connect/managing/confluent-hub/client.html). 
```bash
# run from your Confluent Platform installation directory
confluent connect plugin install confluentinc/kafka-connect-azure-blob-storage-source:latest
```

### Using a bundled schema specification

There are a few quick start schema specifications bundled with the Datagen connector. These schemas are listed in [this directory](https://github.com/confluentinc/kafka-connect-datagen/tree/master/src/main/resources). To use one of these bundled schemas, refer to [this mapping](https://github.com/confluentinc/kafka-connect-datagen/blob/master/src/main/java/io/confluent/kafka/connect/datagen/DatagenTask.java#L66-L73). In the configuration file, set the `quickstart` property to the associated name, as shown in the following example:

```text
"quickstart": "users",
```

### Install the Connector

Refer to the [Debezium tutorial](https://github.com/debezium/debezium-examples/tree/master/tutorial#using-mysql) if you want to use Docker images for setting up Kafka, ZooKeeper, and Kafka Connect. Note that as of Confluent Platform 7.5, ZooKeeper is deprecated for new deployments. Confluent recommends KRaft mode for new deployments. For the following tutorial, you need a local setup of Confluent Platform.

Navigate to your Confluent Platform installation directory and run the following command to install the connector:

```bash
confluent connect plugin install debezium/debezium-connector-mysql:0.9.4
```

Adding a new connector plugin requires restarting Connect. Use the Confluent CLI to restart Connect.

```bash
confluent local services connect stop && confluent local services connect start
Using CONFLUENT_CURRENT: /Users/username/Sandbox/confluent-snapshots/var/confluent.NuZHxXfq
Starting Zookeeper
Zookeeper is [UP]
Starting Kafka
Kafka is [UP]
Starting Schema Registry
Schema Registry is [UP]
Starting Kafka REST
Kafka REST is [UP]
Starting Connect
Connect is [UP]
```

Check if the MySQL plugin has been installed correctly and picked up by the plugin loader:

```bash
curl -sS localhost:8083/connector-plugins | jq .[].class | grep mysql
"io.debezium.connector.mysql.MySqlConnector"
```

## Required properties

`name`
: Unique name for the connector. Trying to register again with the same name will fail.

`connector.class`
: The name of the Java class for the connector. You must use a value of `io.debezium.connector.postgresql.PostgresConnector` for the PostgreSQL connector.

`tasks.max`
: The maximum number of tasks that should be created for this connector. The connector always uses a single task, so it does not use this value; the default is always acceptable.
  * Type: int
  * Default: 1

`plugin.name`
: The name of the Postgres logical decoding plugin installed on the server. When processed transactions are very large, it is possible that the JSON batch event with all changes in the transaction will not fit into the hard-coded memory buffer of 1 GB. In such cases, it is possible to switch to streaming mode, where every change in a transaction is sent as a separate message from PostgreSQL into Debezium. You can configure the streaming mode by setting `plugin.name` to `pgoutput`. For more details, see [PostgreSQL 10+ logical decoding support (pgoutput)](https://debezium.io/documentation/reference/stable/connectors/postgresql.html#postgresql-pgoutput) in the Debezium documentation.
  * Type: string
  * Importance: medium
  * Default: `decoderbufs`
  * Valid values: `decoderbufs`, `wal2json`, and `wal2json_rds`. There are two additional options supported since 0.8.0.Beta1: `wal2json_streaming` and `wal2json_rds_streaming`.
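Taken together, the required properties above, plus the connection properties described later in this reference, are what you submit when registering the connector. The following is a hedged sketch (not part of the quick start) that registers a hypothetical `pg-inventory-connector` against a local Connect worker and opts into the `pgoutput` plugin; the connector name, database coordinates, and topic prefix are placeholders to adjust for your environment:

```bash
# Illustrative only: minimal registration of the Debezium PostgreSQL connector
# via the Connect REST API on a local worker.
curl -s -X POST -H 'Content-Type: application/json' http://localhost:8083/connectors -d '{
  "name": "pg-inventory-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "tasks.max": "1",
    "plugin.name": "pgoutput",
    "database.hostname": "localhost",
    "database.port": "5432",
    "database.user": "postgres",
    "database.password": "postgres-pw",
    "database.dbname": "inventory",
    "topic.prefix": "dbserver1"
  }
}' | jq
```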
`slot.name`
: The name of the Postgres logical decoding slot created for streaming changes from a plugin and database instance. Values must conform to Postgres replication slot naming rules, which state: "Each replication slot has a name, which can contain lower-case letters, numbers, and the underscore character."
  * Type: string
  * Importance: medium
  * Default: `debezium`

`slot.drop.on.stop`
: Indicates whether to drop the logical replication slot when the connector stops in an orderly fashion. Should only be set to `true` in testing or development environments. Dropping the slot allows WAL segments to be discarded by the database. If set to `true`, the connector may not be able to resume from the WAL position where it left off.
  * Type: string
  * Importance: low
  * Default: `false`

`publication.name`
: The name of the PostgreSQL publication created for streaming changes when using `pgoutput`. This publication is created at start-up if it does not already exist, and it includes all tables. If the publication already exists, either for all tables or configured with a subset of tables, the connector uses the publication as it is defined.
  * Type: string
  * Importance: low
  * Default: `dbz_publication`

`database.hostname`
: IP address or hostname of the PostgreSQL database server.
  * Type: string
  * Importance: high

`database.port`
: Integer port number of the PostgreSQL database server.
  * Type: int
  * Importance: low
  * Default: `5432`

`database.user`
: Username to use when connecting to the PostgreSQL database server.
  * Type: string
  * Importance: high

`database.password`
: Password to use when connecting to the PostgreSQL database server.
  * Type: password
  * Importance: high

`database.dbname`
: The name of the PostgreSQL database from which to stream the changes.
  * Type: string
  * Importance: high

`topic.prefix`
: Topic prefix that provides a namespace for the particular PostgreSQL database server or cluster in which Debezium is capturing changes. The prefix should be unique across all other connectors, since it is used as a topic name prefix for all Kafka topics that receive records from this connector. Only alphanumeric characters, hyphens, dots and underscores must be used in the database server logical name. Do not change the value of this property. If you change the name value, after a restart, instead of continuing to emit events to the original topics, the connector emits subsequent events to topics whose names are based on the new value.
  * Type: string
  * Default: No default

`schema.include.list`
: An optional comma-separated list of regular expressions that match schema names to be monitored. Any schema name not included in the include list is excluded from monitoring. By default, all non-system schemas are monitored. May not be used with `schema.exclude.list`.
  * Type: list of strings
  * Importance: low

`schema.exclude.list`
: An optional comma-separated list of regular expressions that match schema names to be excluded from monitoring. Any schema name not included in the exclude list is monitored, with the exception of system schemas. May not be used with `schema.include.list`.
  * Type: list of strings
  * Importance: low
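As an illustration of how the include and exclude list properties in this section compose with the connection properties, the following hedged sketch narrows the hypothetical connector from the earlier example to two schemas and a subset of tables by updating its configuration through the Connect REST API (`PUT /connectors/<name>/config` accepts the flat configuration without the `name`/`config` wrapper). All names and regular expressions are placeholders:

```bash
# Illustrative only: restrict capture to the public and inventory schemas,
# and within those, to public.customers plus every inventory.* table.
curl -s -X PUT -H 'Content-Type: application/json' \
  http://localhost:8083/connectors/pg-inventory-connector/config -d '{
  "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
  "tasks.max": "1",
  "plugin.name": "pgoutput",
  "database.hostname": "localhost",
  "database.port": "5432",
  "database.user": "postgres",
  "database.password": "postgres-pw",
  "database.dbname": "inventory",
  "topic.prefix": "dbserver1",
  "schema.include.list": "public,inventory",
  "table.include.list": "public\\.customers,inventory\\..*"
}' | jq
```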
`table.include.list`
: An optional comma-separated list of regular expressions that match fully-qualified table identifiers for tables to be monitored. Any table not included in the include list is excluded from monitoring. Each identifier is in the form `schemaName.tableName`. By default, the connector monitors every non-system table in each monitored schema. May not be used with `table.exclude.list`.
  * Type: list of strings
  * Importance: low

`table.exclude.list`
: An optional comma-separated list of regular expressions that match fully-qualified table identifiers for tables to be excluded from monitoring. Any table not included in the exclude list is monitored. Each identifier is in the form `schemaName.tableName`. May not be used with `table.include.list`.
  * Type: list of strings
  * Importance: low

`column.include.list`
: An optional, comma-separated list of regular expressions that match the fully-qualified names of columns that should be included in change event record values. Fully-qualified names for columns are of the form `schemaName.tableName.columnName`. Do not also set the `column.exclude.list` property.
  * Type: list of strings
  * Importance: low

`column.exclude.list`
: An optional comma-separated list of regular expressions that match the fully-qualified names of columns that should be excluded from change event message values. Fully-qualified names for columns are of the form `schemaName.tableName.columnName`.
  * Type: list of strings
  * Importance: low

`skip.messages.without.change`
: Specifies whether to skip publishing messages when there is no change in included columns. This essentially filters out messages when there is no change in the columns included as per the `column.include.list` or `column.exclude.list` properties.
  * Type: boolean
  * Default: false

`time.precision.mode`
: Time, date, and timestamp values can be represented with different kinds of precision, including:
  - `adaptive`: Captures the time and timestamp values exactly as they are in the database. `adaptive` uses either millisecond, microsecond, or nanosecond precision values based on the database column type.
  - `adaptive_time_microseconds`: Captures the date, datetime, and timestamp values exactly as they are in the database, using either millisecond, microsecond, or nanosecond precision values based on the database column type, with the exception of `TIME` type fields, which are always captured as microseconds.
  - `connect`: Always represents time and timestamp values using Kafka Connect's built-in representations for Time, Date, and Timestamp. `connect` uses millisecond precision regardless of database column precision.

  For more details, see [temporal values](https://debezium.io/docs/connectors/postgresql/#temporal-values).
  * Type: string
  * Importance: high
  * Default: `adaptive`

`decimal.handling.mode`
: Specifies how the connector should handle values for `DECIMAL` and `NUMERIC` columns:
  - `precise`: Represents values precisely using `java.math.BigDecimal`, which are represented in change events in binary form.
  - `double`: Represents them using double values. `double` may result in a loss of precision but is easier to use.
  - `string`: Encodes values as formatted strings. The `string` option is easy to consume, but semantic information about the real type is lost.

  See [Decimal Values](https://debezium.io/docs/connectors/postgresql/#decimal-values).
  * Type: string
  * Importance: high
  * Default: `precise`

`hstore.handling.mode`
: Specifies how the connector should handle values for hstore columns. `map` represents values using `MAP`. `json` represents them using JSON strings. The JSON option encodes values as formatted strings, such as `key`: `val`.
For more details, see [HStore Values](https://debezium.io/docs/connectors/postgresql/#hstore-values). * Type: list of strings * Importance: low * Default: `map` `interval.handling.mode` : Specifies how the connector should handle values for interval columns. * Type: string * Default: `numeric` * Valid values: [`numeric` or `string`] `database.sslmode` : Sets whether or not to use an encrypted connection to the PostgreSQL server. The option of `disable` uses an unencrypted connection. `require` uses a secure (encrypted) connection and fails if one cannot be established. `verify-ca` is similar to `require`, but additionally verify the server TLS certificate against the configured Certificate Authority (CA) certificates. Fails if no valid matching CA certificates are found. `verify-full` is similar to `verify-ca` but additionally verify that the server certificate matches the host to which the connection is attempted. For more information, see the [PostgreSQL documentation](https://www.postgresql.org/docs/9.6/libpq-connect.html). * Type: string * Importance: low * Default: `disable` `database.sslcert` : The path to the file containing the SSL certificate of the client. See the [PostgreSQL documentation](https://www.postgresql.org/docs/9.6/libpq-connect.html) for more information. * Type: string * Importance: high `database.sslkey` : The path to the file containing the SSL private key of the client. See the [PostgreSQL documentation](https://www.postgresql.org/docs/9.6/libpq-connect.html) for more information. * Type: string `database.sslpassword` : The password to access the client private key from the file specified by `database.sslkey`. See the [PostgreSQL documentation](https://www.postgresql.org/docs/9.6/libpq-connect.html) for more information. * Type: string * Importance: low `database.sslrootcert` : The path to the file containing the root certificate(s) against which the server is validated. See the [PostgreSQL documentation](https://www.postgresql.org/docs/9.6/libpq-connect.html) for more information. * Type: string * Importance: low `database.tcpKeepAlive` : Enable TCP keep-alive probe to verify that database connection is still alive. Enabled by default. See the [PostgreSQL documentation](https://www.postgresql.org/docs/9.6/libpq-connect.html) for more information. * Type: string * Importance: low * Default: `true` `tombstones.on.delete` : Controls whether a tombstone event should be generated after a delete event. When `true` the delete operations are represented by a delete event and a subsequent tombstone event. When `false` only a delete event is sent. Emitting a tombstone event (the default behavior) allows Kafka to completely delete all events pertaining to the given key once the source record got deleted. * Type: string * Importance: high * Default: `true` `column.truncate.to.length.chars` : An optional, comma-separated list of regular expressions that match the fully-qualified names of character-based columns. Fully-qualified names for columns are of the form schemaName.tableName.columnName. In change event records, values in these columns are truncated if they are longer than the number of characters specified by length in the property name. You can specify multiple properties with different lengths in a single configuration. 
Length must be a positive integer, for example, `column.truncate.to.20.chars`.
  * Type: list of strings
  * Default: No default

`column.mask.with.length.chars`
: An optional, comma-separated list of regular expressions that match the fully-qualified names of character-based columns. Fully-qualified names for columns are of the form `schemaName.tableName.columnName`. In change event values, the values in the specified table columns are replaced with `length` number of asterisk (`*`) characters. You can specify multiple properties with different lengths in a single configuration. Length must be a positive integer or zero. When you specify zero, the connector replaces a value with an empty string.
  * Type: list of strings
  * Default: No default

`column.mask.hash.hashAlgorithm.with.salt.salt`; `column.mask.hash.v2.hashAlgorithm.with.salt.salt`
: An optional, comma-separated list of regular expressions that match the fully-qualified names of character-based columns. Fully-qualified names for columns are of the form `schemaName.tableName.columnName`. In the resulting change event record, the values for the specified columns are replaced with pseudonyms, which consist of the hashed value that results from applying the specified `hashAlgorithm` and `salt`. Based on the hash function that is used, referential integrity is maintained, while column values are replaced with pseudonyms. Supported hash functions are described in the [MessageDigest](https://docs.oracle.com/javase/7/docs/technotes/guides/security/StandardNames.html#MessageDigest) documentation.
  * Type: list of strings
  * Default: No default

`column.propagate.source.type`
: An optional, comma-separated list of regular expressions that match the database-specific data type name for some columns. Fully-qualified data type names are of the form `databaseName.tableName.typeName`, or `databaseName.schemaName.tableName.typeName`. For these data types, the connector adds parameters to the corresponding field schemas in emitted change records. The added parameters specify the original type and length of the column: `__debezium.source.column.type`, `__debezium.source.column.length`, and `__debezium.source.column.scale`.
  * Type: list of strings
  * Default: No default

`datatype.propagate.source.type`
: An optional, comma-separated list of regular expressions that match the database-specific data type name for some columns. Fully-qualified data type names are of the form `databaseName.tableName.typeName`, or `databaseName.schemaName.tableName.typeName`. For more details, see the [Debezium documentation](https://debezium.io/documentation/reference/1.3/connectors/postgresql.html#postgresql-property-datatype-propagate-source-type).
  * Type: list of strings
  * Default: No default

`message.key.columns`
: A list of expressions that specify the columns that the connector uses to form custom message keys for change event records that it publishes to the Kafka topics for specified tables. By default, Debezium uses the primary key column of a table as the message key for records that it emits. In place of the default, or to specify a key for tables that lack a primary key, you can configure custom message keys based on one or more columns. To establish a custom message key for a table, list the table, followed by the columns to use as the message key. Each list entry takes the following format:

```text
<fullyQualifiedTableName>:<keyColumn>,<keyColumn>
```

To base a table key on multiple column names, insert commas between the column names. Each fully-qualified table name is a regular expression in the following format:

```text
<schemaName>.<tableName>
```
The property can include entries for multiple tables. Use a semicolon to separate table entries in the list. The following example sets the message key for the tables `inventory.customers` and `purchase.orders`:

```text
inventory.customers:pk1,pk2;(.*).purchaseorders:pk3,pk4
```

For the table `inventory.customers`, the columns `pk1` and `pk2` are specified as the message key. For the `purchaseorders` tables in any schema, the columns `pk3` and `pk4` serve as the message key. There is no limit to the number of columns that you use to create custom message keys. However, it's best to use the minimum number that are required to specify a unique key.
  * Type: list
  * Default: empty string

`publication.autocreate.mode`
: Applies only when streaming changes by using the [pgoutput plug-in](https://www.postgresql.org/docs/current/sql-createpublication.html). The setting determines how creation of a publication should work.
  * Default: `all_tables`
  * Valid values: [`all_tables`, `disabled`, `filtered`]

`replica.identity.autoset.values`
: A comma-separated list of regular expressions that match fully-qualified tables and the replica identity value to be used in a table. This property determines the value for replica identity at the table level and will overwrite the existing value in the database. For more details about this property, see the [Debezium documentation](https://debezium.io/documentation/reference/2.4/connectors/postgresql.html#postgresql-replica-autoset-type).
  * Type: list of strings
  * Default: empty string

`binary.handling.mode`
: Specifies how binary (bytea) columns should be represented in change events.
  * Type: bytes or string
  * Importance: low
  * Valid values: [`bytes`, `base64`, `hex`]

`schema.name.adjustment.mode`
: Specifies how schema names should be adjusted for compatibility with the message converter used by the connector. Possible settings are: `avro`, which replaces the characters that cannot be used in the Avro type name with an underscore, and `none`, which does not apply any adjustment.
  * Type: string
  * Default: `avro`

`field.name.adjustment.mode`
: Specifies how field names should be adjusted for compatibility with the message converter used by the connector. The following are possible settings:
  - `avro`: Replaces the characters that cannot be used in the Avro type name with an underscore.
  - `none`: Does not apply any adjustment.
  - `avro_unicode`: Replaces the underscore or characters that cannot be used in the Avro type name with corresponding unicode like `_uxxxx`. Note that `_` is an escape sequence like backslash in Java.

  For more details, see [Avro naming](https://debezium.io/documentation/reference/2.2/configuration/avro.html#avro-naming).
  * Type: string
  * Default: `none`

`money.fraction.digits`
: Specifies how many decimal digits should be used when converting the Postgres money type to `java.math.BigDecimal`, which represents the values in change events. Applicable only when `decimal.handling.mode` is set to `precise`.
  * Type: int
  * Default: 2

`message.prefix.include.list`
: An optional, comma-separated list of regular expressions that match the names of the logical decoding message prefixes that you want to capture. Any logical decoding message with a prefix not included in `message.prefix.include.list` is excluded. Do not also set the `message.prefix.exclude.list` parameter when setting this property. For information about the structure of message events and about their ordering semantics, see message events.
* Type: list of strings * Default: By default, all logical decoding messages are captured. `message.prefix.exclude.list` : An optional, comma-separated list of regular expressions that match names of logical decoding message prefixes for which you do not to capture. Any logical decoding message with a prefix that is not included in `message.prefix.exclude.list` is included. Do not also set the `message.prefix.include.list` parameter when setting this property. To exclude all logical decoding messages pass `.*` into this config. * Type: list of strings * Default: No default ### Install the Connector If you want to use Docker images for setting up Kafka, ZooKeeper and Connect, refer to the [Debezium tutorial](https://github.com/debezium/debezium-examples/tree/master/tutorial#using-sql-server/). For the following tutorial, it is required to have a local setup of the Confluent Platform. Note that as of Confluent Platform 7.5, ZooKeeper is deprecated for new deployments. Confluent recommends KRaft mode for new deployments. Navigate to your Confluent Platform installation directory and run the following command to install the connector: ```bash confluent connect plugin install debezium/debezium-connector-sqlserver:latest ``` Adding a new connector plugin requires restarting Connect. Use the Confluent CLI to restart Connect. ```bash confluent local services connect stop && confluent local services connect start Using CONFLUENT_CURRENT: /Users/username/Sandbox/confluent-snapshots/var/confluent.NuZHxXfq Starting Zookeeper Zookeeper is [UP] Starting Kafka Kafka is [UP] Starting Schema Registry Schema Registry is [UP] Starting Kafka REST Kafka REST is [UP] Starting Connect Connect is [UP] ``` Check if the SQL Server plugin has been installed correctly and picked up by the plugin loader. ```bash curl -sS localhost:8083/connector-plugins | jq '.[].class' | grep SqlServer "io.debezium.connector.sqlserver.SqlServerConnector" ``` #### NOTE Default connector properties are already set for this quick start. To view the connector properties, refer to `etc/kafka-connect-elasticsearch/quickstart-elasticsearch.properties`. 1. List the available predefined connectors using the following command: ```bash confluent local list ``` Example output: ```bash Bundled Predefined Connectors (edit configuration under etc/): elasticsearch-sink file-source file-sink jdbc-source jdbc-sink hdfs-sink s3-sink ``` 2. Load the `elasticsearch-sink` connector: ```bash confluent local load elasticsearch-sink ``` Example output: ```bash { "name": "elasticsearch-sink", "config": { "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector", "tasks.max": "1", "topics": "test-elasticsearch-sink", "key.ignore": "true", "connection.url": "http://localhost:9200", "type.name": "kafka-connect", "name": "elasticsearch-sink" }, "tasks": [], "type": null } ``` 3. 
After the connector finishes ingesting data to Elasticsearch, enter the following command to check that data is available in Elasticsearch: ```bash curl -XGET 'http://localhost:9200/test-elasticsearch-sink/_search?pretty' ``` Example output: ```bash { "took" : 39, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : 3, "max_score" : 1.0, "hits" : [ { "_index" : "test-elasticsearch-sink", "_type" : "kafka-connect", "_id" : "test-elasticsearch-sink+0+0", "_score" : 1.0, "_source" : { "f1" : "value1" } }, { "_index" : "test-elasticsearch-sink", "_type" : "kafka-connect", "_id" : "test-elasticsearch-sink+0+2", "_score" : 1.0, "_source" : { "f1" : "value3" } }, { "_index" : "test-elasticsearch-sink", "_type" : "kafka-connect", "_id" : "test-elasticsearch-sink+0+1", "_score" : 1.0, "_source" : { "f1" : "value2" } } ] } } ``` ## Property-based example Create a configuration file `firebase-sink.properties` with the following content. This file should be placed inside the Confluent Platform installation directory. This configuration is used typically along with [standalone workers](/platform/current/connect/concepts.html#standalone-workers). ```text name=FirebaseSinkConnector topics=artists,songs connector.class=io.confluent.connect.firebase.FirebaseSinkConnector tasks.max=1 gcp.firebase.credentials.path=file-path gcp.firebase.database.reference=database-url insert.mode=set/update/push key.converter=io.confluent.connect.avro.AvroConverter key.converter.schema.registry.url=http://localhost:8081 value.converter=io.confluent.connect.avro.AvroConverter value.converter.schema.registry.url":"http://localhost:8081 confluent.topic.bootstrap.servers=localhost:9092 confluent.topic.replication.factor=1 confluent.license= ``` Run the connector with this configuration. ```bash confluent local load FirebaseSinkConnector --config firebase-sink.properties ``` The output should resemble: ```json { "name":"FirebaseSinkConnector", "config":{ "topics":"artists,songs", "tasks.max":"1", "connector.class":"io.confluent.connect.firebase.FirebaseSinkConnector", "gcp.firebase.database.reference":"https://.firebaseio.com", "gcp.firebase.credentials.path":"file-path-to-your-gcp-service-account-json-file", "insert.mode":"update", "key.converter" : "io.confluent.connect.avro.AvroConverter", "key.converter.schema.registry.url":"http://localhost:8081", "value.converter" : "io.confluent.connect.avro.AvroConverter", "value.converter.schema.registry.url":"http://localhost:8081", "confluent.topic.bootstrap.servers":"localhost:9092", "confluent.topic.replication.factor":"1", "name":"FirebaseSinkConnector" }, "tasks":[ { "connector":"FirebaseSinkConnector", "task":0 } ], "type":"sink" } ``` Confirm that the connector is in a `RUNNING` state. ```bash confluent local status FirebaseSinkConnector ``` The output should resemble: ```bash { "name":"FirebaseSinkConnector", "connector":{ "state":"RUNNING", "worker_id":"127.0.1.1:8083" }, "tasks":[ { "id":0, "state":"RUNNING", "worker_id":"127.0.1.1:8083" } ], "type":"sink" } ``` ## Quick Start This quick start uses the Google Cloud Functions Sink connector to consume records and send them to a Google Cloud Functions function. Prerequisites : - [Confluent Platform](/platform/current/installation/index.html) - [Confluent CLI](https://docs.confluent.io/confluent-cli/current/installing.html) (requires separate installation) 1. Before starting the connector, create and deploy a basic Google Cloud Functions instance. 
- Navigate to the [Google Cloud Console](https://console.cloud.google.com). - Go to the [Cloud Functions](https://console.cloud.google.com/functions) tab. - Create a new function. - For creating an unauthenticated function select **Allow unauthenticated invocations** and go ahead. - For authenticated functions select **Require Authentication** and then click Variables, Networking and Advanced Settings to display additional settings. Click the Service account drop down and select the desired service account. - Note down the project id, the region, and the function name as they will be used later. - Further, to add an invoker account for an already deployed function, click **Add members** in the Permission tab of the functions home page. In the popup, select add member and select *Cloud Functions Invoker* Role. 2. Install the connector by running the following command from your Confluent Platform installation directory: ```bash confluent connect plugin install confluentinc/kafka-connect-gcp-functions:latest ``` 3. Start Confluent Platform. ```bash confluent local start ``` 4. Produce test data to the `functions-messages` topic in Kafka using the CLI command below. ```bash echo key1,value1 | confluent local produce functions-messages --property parse.key=true --property key.separator=, echo key2,value2 | confluent local produce functions-messages --property parse.key=true --property key.separator=, echo key3,value3 | confluent local produce functions-messages --property parse.key=true --property key.separator=, ``` 5. Create a `gcp-functions.json` file with the following contents: ```json { "name": "gcp-functions", "config": { "topics": "functions-messages", "tasks.max": "1", "connector.class": "io.confluent.connect.gcp.functions.GoogleCloudFunctionsSinkConnector", "key.converter":"org.apache.kafka.connect.storage.StringConverter", "value.converter":"org.apache.kafka.connect.storage.StringConverter", "confluent.topic.bootstrap.servers": "localhost:9092", "confluent.topic.replication.factor":1, "function.name": "", "project.id": "", "region": "", "gcf.credentials.path": "", "reporter.bootstrap.servers": "localhost:9092", "reporter.error.topic.name": "test-error", "reporter.error.topic.replication.factor": 1, "reporter.error.topic.key.format": "string", "reporter.error.topic.value.format": "string", "reporter.result.topic.name": "test-result", "reporter.result.topic.key.format": "string", "reporter.result.topic.value.format": "string", "reporter.result.topic.replication.factor": 1 } } ``` #### NOTE For details about using this connector with Kafka Connect Reporter, see [Connect Reporter](/kafka-connectors/self-managed/userguide.html#userguide-connect-reporter). 6. Load the Google Cloud Functions Sink connector. ```bash confluent local load gcp-functions --config gcp-functions.json ``` #### IMPORTANT Don’t use the CLI commands in production environments. 7. Confirm that the connector is in a `RUNNING` state. ```bash confluent local status gcp-functions ``` 8. Confirm that the messages were delivered to the result topic in Kafka ```bash confluent local consume test-result --from-beginning ``` 9. Cleanup resources * Delete the connector ```bash confluent local unload gcp-functions ``` * Stop Confluent Platform ```bash confluent local stop ``` * Delete the created Google Cloud Function in the Google Cloud Platform portal. #### NOTE Before you begin: [Start](https://gemfire82.docs.pivotal.io/docs-gemfire/getting_started/15_minute_quickstart_gfsh.html) the VMware Tanzu GemFire locator and server. 
Create a cache region to store the data. Start the services using the Confluent CLI. ```bash confluent local start ``` Every service starts in order, printing a message with its status. ```bash Starting Zookeeper Zookeeper is [UP] Starting Kafka Kafka is [UP] Starting Schema Registry Schema Registry is [UP] Starting Kafka REST Kafka REST is [UP] Starting Connect Connect is [UP] Starting KSQL Server KSQL Server is [UP] Starting Control Center Control Center is [UP] ``` To import a few records with a simple schema in Kafka, start the Avro console producer as follows: ```bash ./bin/kafka-avro-console-producer --broker-list localhost:9092 --topic input_topic \ --property value.schema='{"type":"record","name":"myrecord","fields":[{"name":"f1","type":"string"}]}' ``` Then, in the console producer, enter the following: ```bash {"f1": "value1"} {"f1": "value2"} {"f1": "value3"} ``` The three records entered are published to the Kafka topic `input_topic` in Avro format. # HDFS 3 Sink Connector for Confluent Platform The Kafka Connect HDFS 3 Sink connector allows you to export data from Kafka topics to HDFS 3.x files in a variety of formats and integrates with Hive to make data immediately available for querying with HiveQL. Note the following: - This connector is released separately from the HDFS 2.x connector. If you are targeting an HDFS 2.x distribution, see the [HDFS 2 Sink connector for Confluent Platform](https://docs.confluent.io/kafka-connect-hdfs/current/index.html) documentation for more details. If you are upgrading from the HDFS 2 Sink connector for Confluent Platform, update `connector.class` to `io.confluent.connect.hdfs3.Hdfs3SinkConnector` and `partitioner.class` to `io.confluent.connect.storage.partitioner.*` All HDFS 2.x configurations are applicable in this connector. - The HDFS 3 Sink connector in your Docker image can only run on a Connect pod where the template includes the `runAsUser` property as shown in the following example: ```text podTemplate: podSecurityContext: fsGroup: 1000 runAsUser: 1000 runAsNonRoot: true ``` The connector periodically polls data from Apache Kafka® and writes them to HDFS. The data from each Kafka topic is partitioned by the provided partitioner and divided into chunks. Each chunk of data is represented as an HDFS file with topic, Kafka partition, start and end offsets of this data chunk in the file name. If a partitioner is not specified in the configuration, the default partitioner which preserves the Kafka partitioning is used. The size of each data chunk is determined by the number of records written to HDFS, the time written to HDFS, and schema compatibility. The HDFS 3 Sink connector integrates with Hive and when it is enabled, the connector automatically creates an external Hive partitioned table for each Kafka topic and updates the table according to the available data in HDFS. ### Extensible data formats Out of the box, the connector supports writing data to HDFS in Avro and Parquet format. However, you can write other formats to HDFS by extending the `Format` class. You must configure the `format.class` and `partitioner.class` if you want to write other formats to HDFS or use other partitioners. 
The following example configurations show how to write Parquet format and use the field partitioner: ```properties format.class=io.confluent.connect.hdfs3.parquet.ParquetFormat partitioner.class=io.confluent.connect.storage.partitioner.FieldPartitioner ``` You must use the [AvroConverter](/kafka-connectors/self-managed/userguide.html#configuring-key-and-value-converters), `ProtobufConverter`, or `JsonSchemaConverter` with `ParquetFormat` for this connector. Attempting to use the `JsonConverter` (with or without schemas) results in a NullPointerException and a StackOverflowException. When using the field partitioner, you must specify the `partition.field.name` configuration to specify the field name of the record that is used for partitioning. Note that if the source Kafka topic is stored as plain JSON, you can’t use a formatter that requires a schema, you can only use the JSON formatter. The following example shows how to use Parquet format and the field partitioner. 1. [Produce](https://docs.confluent.io/confluent-cli/current/command-reference/local/services/kafka/confluent_local_services_kafka_produce.html) test Avro data to the `parquet_field_hdfs` topic in Kafka. ```bash ./bin/kafka-avro-console-producer --broker-list localhost:9092 --topic parquet_field_hdfs \ --property value.schema='{"type":"record","name":"myrecord","fields":[{"name":"name","type":"string"}, {"name":"address","type":"string"}, {"name" : "age", "type" : "int"}, {"name" : "is_customer", "type" : "boolean"}]}' # paste each of these messages {"name":"Peter", "address":"Mountain View", "age":27, "is_customer":true} {"name":"David", "address":"Mountain View", "age":37, "is_customer":false} {"name":"Kat", "address":"Palo Alto", "age":30, "is_customer":true} {"name":"David", "address":"San Francisco", "age":35, "is_customer":false} {"name":"Leslie", "address":"San Jose", "age":26, "is_customer":true} {"name":"Dani", "address":"Seatle", "age":32, "is_customer":false} {"name":"Kim", "address":"San Jose", "age":30, "is_customer":true} {"name":"Steph", "address":"Seatle", "age":31, "is_customer":false} ``` 2. Create a `hdfs3-parquet-field.json` file with the following contents: ```json { "name": "hdfs3-parquet-field", "config": { "connector.class": "io.confluent.connect.hdfs3.Hdfs3SinkConnector", "tasks.max": "1", "topics": "parquet_field_hdfs", "hdfs.url": "hdfs://localhost:9000", "flush.size": "3", "key.converter": "org.apache.kafka.connect.storage.StringConverter", "value.converter": "io.confluent.connect.avro.AvroConverter", "value.converter.schema.registry.url":"http://localhost:8081", "confluent.topic.bootstrap.servers": "localhost:9092", "confluent.topic.replication.factor": "1", "format.class":"io.confluent.connect.hdfs3.parquet.ParquetFormat", "partitioner.class":"io.confluent.connect.storage.partitioner.FieldPartitioner", "partition.field.name":"is_customer" } } ``` 3. Load the HDFS3 Sink connector. ```bash confluent local load hdfs3-parquet-field --config hdfs3-parquet-field.json ``` 4. Confirm that the connector is in a `RUNNING` state. ```bash confluent local status hdfs3-parquet-field ``` 5. Validate that the Parquet data is in HDFS. ```bash # list files in partition called is_customer=true hadoop fs -ls /topics/parquet_field_hdfs/is_customer=true # the following should appear in the list # /topics/parquet_field_hdfs/is_customer=true/parquet_field_hdfs+0+0000000000+0000000002.parquet # /topics/parquet_field_hdfs/is_customer=true/parquet_field_hdfs+0+0000000004+0000000004.parquet ``` 6. 
Extract the contents of the file using the [parquet-tools-1.9.0.jar](https://repo1.maven.org/maven2/org/apache/parquet/parquet-tools/1.9.0/parquet-tools-1.9.0.jar). ```bash # substitute "" for the HDFS name node hostname hadoop jar parquet-tools-1.9.0.jar cat --json / hdfs:///topics/parquet_field_hdfs/is_customer=true/parquet_field_hdfs+0+0000000000+0000000002.parquet ``` 7. If you experience issues with the previous step, first copy the Parquet file from HDFS to the local filesystem and try again with java. ```bash hadoop fs -copyToLocal /topics/parquet_field_hdfs/is_customer=true/parquet_field_hdfs+0+0000000000+0000000002.parquet / /tmp/parquet_field_hdfs+0+0000000000+0000000002.parquet java -jar parquet-tools-1.9.0.jar cat --json /tmp/parquet_field_hdfs+0+0000000000+0000000002.parquet # expected output {"name":"Peter","address":"Mountain View","age":27,"is_customer":true} {"name":"Kat","address":"Palo Alto","age":30,"is_customer":true} ``` ## Quick Start In this quickstart, you copy Avro data from a single topic to a local HEAVY-AI database running on Docker. This example assumes you are running Kafka and Schema Registry locally on the default ports. It also assumes your have Docker installed and running. First, bring up HEAVY-AI database by running the following Docker command: ```bash docker run -d -p 6274:6274 omnisci/core-os-cpu:v4.7.0 ``` This starts the CPU-based community version of HEAVY-AI, and maps it to port 6274 on localhost. By default, the user name is `admin` and the password is `HyperInteractive`. The default database is `omnisci`. Start the Confluent Platform using the Confluent CLI command below. ```bash confluent local start ``` ## Quick start In this quick start, you copy data from a single Kafka topic to a measurement on a local Influx database running on Docker. This example assumes you are running Kafka and Schema Registry locally on the default ports. It also assumes you have Docker installed and running. Note that InfluxDB Docker can be replaced with any installed InfluxDB server. To get started, complete the following steps: 1. Start the Influx database by running the following Docker command: ```bash docker run -d -p 8086:8086 --name influxdb-local influxdb:1.7.7 ``` This starts the Influx database and maps it to port 8086 on `localhost`. By default, the username and password are blank. The database connection URL is `http://localhost:8086`. 2. Start the Confluent Platform using the following Confluent CLI command: ```bash confluent local start ``` ### Property-based example In this section, you will complete the steps in a property-based example. 1. Create a configuration file for the connector. This configuration is typically used with [standalone workers](/platform/current/connect/concepts.html#standalone-workers). Note that this file is included with the connector in `./etc/kafka-connect-influxdb/influxdb-sink-connector.properties` and contains the following settings: ```bash name=InfluxDBSinkConnector connector.class=io.confluent.influxdb.InfluxDBSinkConnector tasks.max=1 topics=orders influxdb.url=http://localhost:8086 influxdb.db=influxTestDB measurement.name.format=${topic} value.converter=io.confluent.connect.avro.AvroConverter value.converter.schema.registry.url=http://localhost:8081 The first few settings are common settings you specify for all connectors, except for topics which are specific to sink connectors like this one. The ``influxdb.url`` specifies the connection URL of the influxDB server. ``influxdb.db`` specifies the database bame. 
``influxdb.username`` specifies the username, and ``influxdb.password`` specifies the password of the InfluxDB server, respectively. By default the username and password are blank for the previous InfluxDB server above, so it is not added in the configuration. ``` 2. Run the connector with the following configuration: ```bash confluent local load InfluxDBSinkConnector --config etc/kafka-connect-influxdb/influxdb-sink-connector.properties ``` ### Avro tags example In this section, you will complete the steps in an Avro tags example. 1. Configure your connector configuration with the values shown in the following example: ```text name=InfluxDBSinkConnector connector.class=io.confluent.influxdb.InfluxDBSinkConnector tasks.max=1 topics=products influxdb.url=http://localhost:8086 influxdb.db=influxTestDB measurement.name.format=${topic} value.converter=io.confluent.connect.avro.AvroConverter value.converter.schema.registry.url=http://localhost:8081 ``` 2. Create Avro tags for a topic named `products` using the following producer command: ```text kafka-avro-console-producer \ --broker-list localhost:9092 \ --topic products \ --property value.schema='{"name": "myrecord","type": "record","fields": [{"name":"id","type":"int"}, {"name": "product","type": "string"}, {"name": "quantity","type": "int"},{"name": "price","type": "float"}, {"name": "tags","type": {"name": "tags","type": "record","fields": [{"name": "DEVICE","type": "string"},{"name": "location","type": "string"}]}}]}' ``` The console producer waits for input. 3. Copy and paste the following records into the terminal: ```text {"id": 1, "product": "pencil", "quantity": 100, "price": 50, "tags" : {"DEVICE": "living", "location": "home"}} {"id": 2, "product": "pen", "quantity": 200, "price": 60, "tags" : {"DEVICE": "living", "location": "home"}} ``` 4. Verify the data is in InfluxDB. ### Topic to database example If `measurement.name.format` is not present in the configuration, the connector uses the Kafka topic name as the database name and takes the measurement name from a field in the message. 1. Configure your connector configuration with the values shown in the following example: ```text name=InfluxDBSinkConnector connector.class=io.confluent.influxdb.InfluxDBSinkConnector tasks.max=1 topics=products influxdb.url=http://localhost:8086 value.converter=io.confluent.connect.avro.AvroConverter value.converter.schema.registry.url=http://localhost:8081 ``` 2. Create an Avro record for a topic named `products` using the following producer command: ```text kafka-avro-console-producer \ --broker-list localhost:9092 \ --topic products \ --property value.schema='{"name": "myrecord","type": "record","fields": [{"name":"id","type":"int"}, {"name": "measurement","type":"string"}]}' ``` The console producer waits for input. 3. Copy and paste the following records into the terminal: ```text {"id": 1, "measurement": "test"} {"id": 2, "measurement": "test2"} ``` The following query shows the measurements and points written to InfluxDB. ```text > use products; > show measurements; name: measurements name test test2 > select * from test; name: test time id ---- -- 1601464614638 1 ``` ### Custom timestamp example In this section, you will complete the steps in a custom timestamp example. 1. 
Configure your connector configuration with the values shown in the following example: ```text name=InfluxDBSinkConnector connector.class=io.confluent.influxdb.InfluxDBSinkConnector tasks.max=1 topics=products influxdb.url=http://localhost:8086 influxdb.db=influxTestDB measurement.name.format=${topic} event.time.fieldname=time value.converter=io.confluent.connect.avro.AvroConverter value.converter.schema.registry.url=http://localhost:8081 ``` 2. Create an Avro record for a topic named `products` using the following producer command: ```text kafka-avro-console-producer \ --broker-list localhost:9092 \ --topic products \ --property value.schema='{"name": "myrecord","type": "record","fields": [{"name":"id","type":"int"}, {"name": "time","type":"long"}]}' ``` The console producer waits for input. Note that the timestamp needs to be in milliseconds since the Unix Epoch (Unix time). 3. Copy and paste the following record into the terminal: ```text {"id": 1, "time": 123412341234} ``` The following shows the custom timestamp written to InfluxDB. ```text > precision ms > select * from products; name: products time id ---- -- 123412341234 1 ``` ## Quick Start In this quick start, you copy data from a single measurement from a local Influx database running on Docker into a Kafka topic. This example assumes you are running Kafka and Schema Registry locally on the default ports. It also assumes you have Docker installed and running. First, bring up the Influx database by running the following Docker command: ```bash docker run -d -p 8086:8086 --name influxdb-local influxdb:1.7.7 ``` This starts the Influx database, and maps it to port 8086 on `localhost`. By default, the user name and password are blank. The database connection URL is `http://localhost:8086`. To create sample data in the Influx database, log in to the Docker container using the following command: ```bash docker exec -it bash ``` Once you are in the Docker container, log in to InfluxDB shell: ```bash influx ``` Your output should resemble: ```bash Connected to http://localhost:8086 version 1.7.7 InfluxDB shell version: 1.7.7 ``` ### Source connector configuration 1. Start the services using the Confluent CLI: ```bash confluent local start ``` 2. Create a configuration file named `kinesis-source-config.json` with the following contents. ```text { "name": "kinesis-source", "config": { "connector.class": "io.confluent.connect.kinesis.KinesisSourceConnector", "tasks.max": "1", "kafka.topic": "kinesis_topic", "kinesis.region": "US_WEST_1", "kinesis.stream": "my_kinesis_stream", "confluent.license": "", "name": "kinesis-source", "confluent.topic.bootstrap.servers": "localhost:9092", "confluent.topic.replication.factor": "1" } } ``` The important configuration parameters used here are: - **kinesis.stream.name**: The Kinesis Stream to subscribe to. - **kafka.topic**: The Kafka topic in which the messages received from Kinesis are produced. - **tasks.max**: The maximum number of tasks that should be created for this connector. Each Kinesis [shard](https://docs.aws.amazon.com/streams/latest/dev/key-concepts.html#shard) is allocated to a single task. If the number of shards specified exceeds the number of tasks, the connector throws an exception and fails. - **kinesis.region**: The region where the stream exists. Defaults to `US_EAST_1` if not specified. - You may pass your AWS credentials to the Kinesis connector through your source connector configuration. 
To pass AWS credentials in the source configuration set the **aws.access.key.id** and the **aws.secret.key.id** parameters. ```text "aws.acess.key.id": "aws.secret.key.id": ``` 3. Run the following command to start the Kinesis Source connector. ```bash confluent local load source-kinesis --config source-kinesis-config.json ``` 4. Run the following command to check that the connector started successfully by viewing the Connect worker’s log: ```bash confluent local services connect log ``` 5. Start a Kafka Consumer in a separate terminal session to view the data exported by the connector into the Kafka topic ```text bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic kinesis_topic --from-beginning ``` 6. Stop the Confluent services using the following command: ```bash confluent local stop ``` ### Load the Kudu Source Connector Load the predefined Kudu Source connector. 1. Optional: View the available predefined connectors with this command: ```bash confluent local list ``` Your output should resemble: ```bash Bundled Predefined Connectors (edit configuration under etc/): elasticsearch-sink file-source file-sink jdbc-source jdbc-sink kudu-source kudu-sink hdfs-sink s3-sink ``` 2. Create a `kudu-source.json` file for your Kudu Source connector. ```text { "name": "kudu-source", "config": { "connector.class": "io.confluent.connect.kudu.KuduSourceConnector", "tasks.max": "1", "impala.server": "127.0.0.1", "impala.port": "21050", "kudu.database": "test", "mode": "incrementing", "incrementing.column.name": "id", "topic.prefix": "test-kudu-", "table.whitelist": "accounts", "key.converter": "io.confluent.connect.avro.AvroConverter", "key.converter.schema.registry.url": "http://localhost:8081", "value.converter": "io.confluent.connect.avro.AvroConverter", "value.converter.schema.registry.url": "http://localhost:8081", "confluent.topic.bootstrap.servers": "localhost:9092", "confluent.topic.replication.factor": "1", "impala.ldap.password": "secret", "impala.ldap.user": "kudu", "name": "kudu-source" } } ``` 3. Load the `kudu-source` connector. The `test` file must be in the same directory where Connect is started. ```bash confluent local load kudu-source --config kudu-source.json ``` Your output should resemble: ```bash { "name": "kudu-source", "config": { "connector.class": "io.confluent.connect.kudu.KuduSourceConnector", "tasks.max": "1", "impala.server": "127.0.0.1", "impala.port": "21050", "kudu.database": "test", "mode": "incrementing", "incrementing.column.name": "id", "topic.prefix": "test-kudu-", "table.whitelist": "accounts", "key.converter": "io.confluent.connect.avro.AvroConverter", "key.converter.schema.registry.url": "http://localhost:8081", "value.converter": "io.confluent.connect.avro.AvroConverter", "value.converter.schema.registry.url": "http://localhost:8081", "confluent.topic.bootstrap.servers": "localhost:9092", "confluent.topic.replication.factor": "1", "impala.ldap.password": "", "impala.ldap.user": "", "name": "kudu-source" }, "tasks": [], "type": "source" } ``` To check that it has copied the data that was present when you started Kafka Connect, start a console consumer, reading from the beginning of the topic: ```bash ./bin/kafka-avro-console-consumer --bootstrap-server localhost:9092 --topic test-kudu-accounts --from-beginning {"id":1,"name":{"string":"alice"}} {"id":2,"name":{"string":"bob"}} ``` The output shows the two records as expected, one per line, in the JSON encoding of the Avro records. 
Each row is represented as an Avro record and each column is a field in the record. You can see both columns in the table, `id` and `name`. The IDs were auto-generated and the column is of type `INTEGER NOT NULL`, which can be encoded directly as an integer. The `name` column has type `STRING` and can be `NULL`. The JSON encoding of Avro encodes the strings in the format `{"type": value}`, so you can see that both rows have `string` values with the names specified when you inserted the data.

#### Procedure

1. In your Connect worker, run the following command:

```text
kinit -kt /path/to/the/keytab --renewable -f
```

The flags `--renewable` and `-f` are required when using kinit, since a long-running connector has to renew the ticket-granting ticket (TGT) and the tickets must be forwardable (`-f`).

2. If not running, start Confluent Platform.

```text
confluent local start
```

3. Create the following connector configuration JSON file and save it as `config5.json`.

#### NOTE

The Oracle CDC Source connector uses the `oracle.kerberos.cache.file` configuration property to specify the path to the cache file generated previously.

```json
{
  "name": "SimpleOracleCDC_5",
  "config":{
    "connector.class": "io.confluent.connect.oracle.cdc.OracleCdcSourceConnector",
    "name": "SimpleOracleCDC_5",
    "tasks.max": 1,
    "key.converter": "io.confluent.connect.avro.AvroConverter",
    "key.converter.schema.registry.url": "http://localhost:8081",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "http://localhost:8081",
    "confluent.topic.bootstrap.servers": "localhost:9092",
    "oracle.server": "",
    "oracle.port": 1521,
    "oracle.sid": "",
    "oracle.pdb.name": "",
    "oracle.username": "",
    "oracle.password": "",
    "start.from": "snapshot",
    "redo.log.topic.name": "redo-log-topic-5",
    "table.inclusion.regex": "",
    "table.topic.name.template": "",
    "connection.pool.max.size": 20,
    "confluent.topic.replication.factor": 1,
    "topic.creation.groups": "redo",
    "topic.creation.redo.include": "redo-log-topic-5",
    "topic.creation.redo.replication.factor": 3,
    "topic.creation.redo.partitions": 1,
    "topic.creation.redo.cleanup.policy": "delete",
    "topic.creation.redo.retention.ms": 1209600000,
    "topic.creation.default.replication.factor": 3,
    "topic.creation.default.partitions": 5,
    "topic.creation.default.cleanup.policy": "compact",
    "oracle.kerberos.cache.file": ""
  }
}
```

4. Create `redo-log-topic-5`. Make sure the topic name matches the value you put for `"redo.log.topic.name"`.

```text
bin/kafka-topics --create --topic redo-log-topic-5 \
--bootstrap-server broker:9092 --replication-factor 1 \
--partitions 1 --config cleanup.policy=delete \
--config retention.ms=120960000
```

5. Enter the following command to start the connector:

```text
curl -s -X POST -H 'Content-Type: application/json' --data @config5.json http://localhost:8083/connectors | jq
```

6. Enter the following command and verify that the connector has started and its tasks are running.

```text
curl -s -X GET -H 'Content-Type: application/json' http://localhost:8083/connectors/SimpleOracleCDC_5/status | jq
```

7. Perform `INSERT`, `UPDATE`, and `DELETE` row operations on each captured table and verify the following expected results:

- The connector starts and has one running task.
- Change event topics are created for each captured table.
- The connector does not produce records for tables that were not included in the inclusion regex or were explicitly excluded using the `table.exclusion.regex` property.
- If the `redo.log.corruption.topic` property was configured, the connector sends corrupted records to the specified corruption topic.
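If authentication fails when you start the connector, a quick sanity check is to confirm that the credential cache you point `oracle.kerberos.cache.file` at actually holds a valid, forwardable, and renewable ticket. The following is a minimal sketch using the standard MIT Kerberos `klist` tool; the cache path is a placeholder:

```bash
# -f prints ticket flags; look for F (forwardable) and R (renewable)
# on the TGT obtained with the kinit command from step 1.
klist -f -c /path/to/the/krb5_ccache
```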
- If the `redo.log.corruption.topic` property was configured, the connector sends corrupted records to the specified corruption topic.

#### Procedure

1. If not running, start Confluent Platform.

```text
confluent local start
```

2. Create the following connector configuration JSON file and save it as `config6.json`.

#### NOTE

Note the following properties (shown in the example):

* The Oracle CDC Source connector uses the `oracle.ssl.truststore.file` and `oracle.ssl.truststore.password` properties to specify the location of the truststore containing the trusted server certificate and the truststore password.
* The passthrough properties `oracle.connection.javax.net.ssl.keyStore` and `oracle.connection.javax.net.ssl.keyStorePassword` are also used to supply the keystore location and password.

```json
{
  "name": "SimpleOracleCDC_6",
  "config":{
    "connector.class": "io.confluent.connect.oracle.cdc.OracleCdcSourceConnector",
    "name": "SimpleOracleCDC_6",
    "tasks.max":1,
    "key.converter": "io.confluent.connect.avro.AvroConverter",
    "key.converter.schema.registry.url": "http://localhost:8081",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "http://localhost:8081",
    "confluent.topic.bootstrap.servers":"localhost:9092",
    "oracle.server": "",
    "oracle.port": 1521,
    "oracle.sid":"",
    "oracle.pdb.name":"",
    "oracle.username": "",
    "oracle.password": "",
    "start.from":"snapshot",
    "redo.log.topic.name": "redo-log-topic-6",
    "table.inclusion.regex":"",
    "table.topic.name.template": "",
    "connection.pool.max.size": 20,
    "confluent.topic.replication.factor":1,
    "topic.creation.groups": "redo",
    "topic.creation.redo.include": "redo-log-topic-6",
    "topic.creation.redo.replication.factor": 3,
    "topic.creation.redo.partitions": 1,
    "topic.creation.redo.cleanup.policy": "delete",
    "topic.creation.redo.retention.ms": 1209600000,
    "topic.creation.default.replication.factor": 3,
    "topic.creation.default.partitions": 5,
    "topic.creation.default.cleanup.policy": "compact",
    "oracle.ssl.truststore.file": "",
    "oracle.ssl.truststore.password": "",
    "oracle.connection.javax.net.ssl.keyStore": "",
    "oracle.connection.javax.net.ssl.keyStorePassword": ""
  }
}
```

3. Create `redo-log-topic-6`. Make sure the topic name matches the value you put for `"redo.log.topic.name"`.

```text
bin/kafka-topics --create --topic redo-log-topic-6 \
--bootstrap-server broker:9092 --replication-factor 1 \
--partitions 1 --config cleanup.policy=delete \
--config retention.ms=120960000
```

4. Enter the following command to start the connector:

```text
curl -s -X POST -H 'Content-Type: application/json' --data @config6.json http://localhost:8083/connectors | jq
```

5. Enter the following command and verify that the connector started with two running tasks.

```text
curl -s -X GET -H 'Content-Type: application/json' http://localhost:8083/connectors/SimpleOracleCDC_6/status | jq
```

6. Perform `INSERT`, `UPDATE`, and `DELETE` row operations for each captured table and verify the following expected results:

- The connector starts and has one running task.
- Change event topics are created for each captured table.
- The connector does not produce records for tables that were not included in the inclusion regex or were explicitly excluded using the `table.exclusion.regex` property.
- If the `redo.log.corruption.topic` property was configured, the connector sends corrupted records to the specified corruption topic.

#### Procedure

1. If not running, start Confluent Platform.

```text
confluent local start
```

2.
Create the following connector configuration JSON file and save it as `config7.json`.

#### NOTE

* The `oracle.service.name` property specifies the service name to use when connecting to RAC.
* An `oracle.sid` is still required. It can be the SID of any of the database instances.

```json
{
  "name": "SimpleOracleCDC_7",
  "config":{
    "connector.class": "io.confluent.connect.oracle.cdc.OracleCdcSourceConnector",
    "name": "SimpleOracleCDC_7",
    "tasks.max":1,
    "key.converter": "io.confluent.connect.avro.AvroConverter",
    "key.converter.schema.registry.url": "http://localhost:8081",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "http://localhost:8081",
    "confluent.topic.bootstrap.servers":"localhost:9092",
    "oracle.server": "",
    "oracle.port": 1521,
    "oracle.sid":"",
    "oracle.service.name":"",
    "oracle.pdb.name":"",
    "oracle.username": "",
    "oracle.password": "",
    "start.from":"snapshot",
    "redo.log.topic.name": "redo-log-topic-7",
    "table.inclusion.regex":"",
    "table.topic.name.template": "",
    "connection.pool.max.size": 20,
    "confluent.topic.replication.factor":1,
    "topic.creation.groups": "redo",
    "topic.creation.redo.include": "redo-log-topic-7",
    "topic.creation.redo.replication.factor": 3,
    "topic.creation.redo.partitions": 1,
    "topic.creation.redo.cleanup.policy": "delete",
    "topic.creation.redo.retention.ms": 1209600000,
    "topic.creation.default.replication.factor": 3,
    "topic.creation.default.partitions": 5,
    "topic.creation.default.cleanup.policy": "compact"
  }
}
```

3. Create `redo-log-topic-7`. Make sure the topic name matches the value you put for `"redo.log.topic.name"`.

```text
bin/kafka-topics --create --topic redo-log-topic-7 \
--bootstrap-server broker:9092 --replication-factor 1 \
--partitions 1 --config cleanup.policy=delete \
--config retention.ms=120960000
```

4. Enter the following command to start the connector:

```text
curl -s -X POST -H 'Content-Type: application/json' --data @config7.json http://localhost:8083/connectors | jq
```

5. Enter the following command and verify that the connector started with two running tasks.

```text
curl -s -X GET -H 'Content-Type: application/json' http://localhost:8083/connectors/SimpleOracleCDC_7/status | jq
```

6. Perform `INSERT`, `UPDATE`, and `DELETE` row operations for each captured table and verify the following expected results:

- The connector starts and has one running task.
- Change event topics are created for each captured table.
- The connector does not produce records for tables that were not included in the inclusion regex or were explicitly excluded using the `table.exclusion.regex` property.
- If the `redo.log.corruption.topic` property was configured, the connector sends corrupted records to the specified corruption topic.

### NUMERIC data type with no precision or scale results in unreadable output

The following Oracle database table includes `ORDER_NUMBER` and `CUSTOMER_NUMBER` NUMERIC data types without precision or scale.
```text
CREATE TABLE MARIPOSA_ORDERS (
  "ORDER_NUMBER" NUMBER PRIMARY KEY,
  "ORDER_DATE" TIMESTAMP(6) NOT NULL,
  "SHIPPED_DATE" TIMESTAMP(6) NOT NULL,
  "STATUS" VARCHAR2(50),
  "CUSTOMER_NUMBER" NUMBER)
```

The Oracle CDC Source connector generates the following schema in Schema Registry, when using the Avro converter and the connector property `"numeric.mapping": "best_fit_or_decimal"`:

```json
{
  "fields": [
    {
      "name": "ORDER_NUMBER",
      "type": {
        "connect.name": "org.apache.kafka.connect.data.Decimal",
        "connect.parameters": { "scale": "127" },
        "connect.version": 1,
        "logicalType": "decimal",
        "precision": 64,
        "scale": 127,
        "type": "bytes"
      }
    },
    {
      "name": "ORDER_DATE",
      "type": {
        "connect.name": "org.apache.kafka.connect.data.Timestamp",
        "connect.version": 1,
        "logicalType": "timestamp-millis",
        "type": "long"
      }
    },
    {
      "name": "SHIPPED_DATE",
      "type": {
        "connect.name": "org.apache.kafka.connect.data.Timestamp",
        "connect.version": 1,
        "logicalType": "timestamp-millis",
        "type": "long"
      }
    },
    {
      "default": null,
      "name": "STATUS",
      "type": [ "null", "string" ]
    },
    {
      "default": null,
      "name": "CUSTOMER_NUMBER",
      "type": [
        "null",
        {
          "connect.name": "org.apache.kafka.connect.data.Decimal",
          "connect.parameters": { "scale": "127" },
          "connect.version": 1,
          "logicalType": "decimal",
          "precision": 64,
          "scale": 127,
          "type": "bytes"
        }
      ]
    },
    ... omitted
    {
      "default": null,
      "name": "username",
      "type": [ "null", "string" ]
    }
  ],
  "name": "ConnectDefault",
  "namespace": "io.confluent.connect.avro",
  "type": "record"
}
```

In this scenario, the resulting values for `ORDER_NUMBER` or `CUSTOMER_NUMBER` are unreadable, as shown below:

```text
A\u0000\u000b\u001b8¸®æ«Îò,Rt]!\u0013_\u0018aVKæ,«1\u0010êo\u0017\u000bKðÀ\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000"
```

This is because the connector attempts to preserve accuracy (not lose any decimals) when precision and scale are not provided. As a workaround, you can set `"numeric.mapping": "best_fit_or_double"` or `"numeric.mapping": "best_fit_or_string"`, or use [ksqlDB to create a new stream](https://docs.ksqldb.io/en/latest/operate-and-deploy/schema-registry-integration/#create-a-new-stream) with explicit data types (based on a Schema Registry schema). For example:

```sql
CREATE STREAM ORDERS_RAW (
  ORDER_NUMBER DECIMAL(9,2),
  ORDER_DATE TIMESTAMP,
  SHIPPED_DATE TIMESTAMP,
  STATUS VARCHAR,
  CUSTOMER_NUMBER DECIMAL(9,2)
) WITH (
  KAFKA_TOPIC='ORCL.ADMIN.MARIPOSA_ORDERS',
  VALUE_FORMAT='AVRO'
);
```

When you check the stream using `SELECT * FROM ORDERS_RAW EMIT CHANGES`, you will see readable values for `ORDER_NUMBER` and `CUSTOMER_NUMBER`, as shown in the following example.

```json
{
  "ORDER_NUMBER": 5361,
  "ORDER_DATE": "2020-08-06T03:41:58.000",
  "SHIPPED_DATE": "2020-08-11T03:41:58.000",
  "STATUS": "Not shipped yet",
  "CUSTOMER_NUMBER": 9076
}
```

### Property-based example

Create a configuration file `salesforce-cdc-source.properties` with the following content. This file should be placed inside the Confluent Platform installation directory. This configuration is typically used with [standalone workers](/platform/current/connect/concepts.html#standalone-workers).
```none
name=SalesforceCdcSourceConnector
connector.class=io.confluent.salesforce.SalesforceCdcSourceConnector
tasks.max=1
kafka.topic=< Required Configuration >
salesforce.consumer.key=< Required Configuration >
salesforce.consumer.secret=< Required Configuration >
salesforce.username=< Required Configuration >
salesforce.password=< Required Configuration >
salesforce.password.token=< Required Configuration >
salesforce.cdc.name=< Required Configuration >
salesforce.initial.start=all
confluent.topic.bootstrap.servers=localhost:9092
confluent.topic.replication.factor=1
confluent.license=
```

### Property-based example

This configuration is typically used with [standalone workers](/platform/current/connect/concepts.html#standalone-workers). This configuration overrides the record `_EventType` to perform upsert operations using an external ID field named `CustomId__c`. The configuration ignores the field `CleanStatus` in the Kafka source record.

```text
name=SalesforceSObjectSinkConnector1
connector.class=io.confluent.salesforce.SalesforceSObjectSinkConnector
tasks.max=1
topics=LeadsTopic
salesforce.consumer.key=< Required Configuration >
salesforce.consumer.secret=< Required Configuration >
salesforce.object=< Required Configuration >
salesforce.password=< Required Configuration >
salesforce.password.token=< Required Configuration >
salesforce.push.topic.name=< Required Configuration >
salesforce.username=< Required Configuration >
salesforce.ignore.fields=CleanStatus
salesforce.ignore.reference.fields=true
salesforce.custom.id.field.name=CustomId__c
salesforce.use.custom.id.field=true
salesforce.sink.object.operation=upsert
override.event.type=true
confluent.topic.bootstrap.servers=localhost:9092
confluent.topic.replication.factor=1
confluent.license=
```

To include your broker, change the `confluent.topic.bootstrap.servers` property address(es), and for staging or production use, change the `confluent.topic.replication.factor` to 3. When working on a downloaded Confluent development cluster, or any single-broker cluster, use a `confluent.topic.replication.factor` of 1. For details about using this connector with Kafka Connect Reporter, see [Connect Reporter](/kafka-connectors/self-managed/userguide.html#userguide-connect-reporter).

## Quick Start

This quick start uses the Solace Sink connector to consume records from Kafka and send them to a Solace PubSub+ Standard broker.

1. Start a Solace PubSub+ Standard docker container.

```bash
docker run -d --name "solace" --hostname "solace" \
  -p 8080:8080 -p 55555:55555 -p 5550:5550 \
  --shm-size=1000000000 \
  --tmpfs /dev/shm \
  --ulimit nofile=2448:38048 \
  -e username_admin_globalaccesslevel=admin \
  -e username_admin_password=admin \
  -e system_scaling_maxconnectioncount=100 \
  solace/solace-pubsub-standard:9.1.0.77
```

2. Install the connector through the [Confluent Hub Client](https://docs.confluent.io/current/connect/managing/confluent-hub/client.html).

```bash
# run from your CP installation directory
confluent connect plugin install confluentinc/kafka-connect-solace-sink:latest
```

3. [Install the Solace JMS Client Library](#installing-solace-client-jar).

4. Start the Confluent Platform.

```bash
confluent local start
```

5. [Produce](https://docs.confluent.io/current/cli/command-reference/confluent-produce.html) test data to the `sink-messages` topic in Kafka.

```bash
seq 10 | confluent local produce sink-messages
```

6.
Create a `solace-sink.json` file with the following contents:

```json
{
  "name": "SolaceSinkConnector",
  "config": {
    "connector.class": "io.confluent.connect.jms.SolaceSinkConnector",
    "tasks.max": "1",
    "topics": "sink-messages",
    "solace.host": "smf://localhost:55555",
    "solace.username": "admin",
    "solace.password": "admin",
    "solace.dynamic.durables": "true",
    "jms.destination.type": "queue",
    "jms.destination.name": "connector-quickstart",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "org.apache.kafka.connect.storage.StringConverter",
    "confluent.topic.bootstrap.servers": "localhost:9092",
    "confluent.topic.replication.factor": "1"
  }
}
```

7. Load the Solace Sink connector.

```bash
confluent local load solace --config solace-sink.json
```

8. Confirm the connector is in a `RUNNING` state.

```bash
confluent local status solace
```

9. Navigate to the [Solace UI](http://localhost:8080) to confirm the messages were delivered to the `connector-quickstart` queue in the `default` Message VPN.

## Quick Start

This quick start uses the Splunk Source connector to receive application data and ingest it into Kafka.

1. Install the connector using the [Confluent Hub Client](https://docs.confluent.io/current/connect/managing/confluent-hub/client.html).

```text
# run from your CP installation directory
confluent connect plugin install confluentinc/kafka-connect-splunk-source:latest
```

2. Start the Confluent Platform.

```bash
confluent local start
```

3. Create a `splunk-source.properties` file with the following contents:

```text
name=splunk-source
kafka.topic=splunk-source
tasks.max=1
connector.class=io.confluent.connect.SplunkHttpSourceConnector
splunk.collector.index.default=default-index
splunk.port=8889
splunk.ssl.key.store.path=/path/to/your/keystore.jks
splunk.ssl.key.store.password=
confluent.topic.bootstrap.servers=localhost:9092
confluent.topic.replication.factor=1
```

4. Load the Splunk Source connector.

```bash
confluent local load splunk-source --config splunk-source.properties
```

#### IMPORTANT

Don’t use the [Confluent CLI](https://docs.confluent.io/confluent-cli/current/index.html) in production environments.

5. Confirm the connector is in a `RUNNING` state.

```bash
confluent local status splunk-source
```

6. Simulate an application sending data to the connector.

```bash
curl -k -X POST https://localhost:8889/services/collector/event -d '{"event":"from curl"}'
```

7. Verify the data was ingested into the Kafka topic.

```text
kafka-avro-console-consumer --bootstrap-server localhost:9092 --topic splunk-source --from-beginning
```

8. Shut down Confluent Platform.

```bash
confluent local destroy
```

## Quick Start

This quick start uses the TIBCO Sink connector to consume records from Kafka and send them to TIBCO Enterprise Message Service™ - Community Edition.

1. Download TIBCO Enterprise Message Service™ - Community Edition ([Mac](https://www.tibco.com/resources/product-download/tibco-enterprise-message-service-community-edition-free-download-mac) or [Linux](https://www.tibco.com/resources/product-download/tibco-enterprise-message-service-community-edition-free-download-linux)) and run the appropriate installer. See the [TIBCO Enterprise Message Service™ Installation Guide](https://docs.tibco.com/pub/ems-zlinux/8.5.0/doc/pdf/TIB_ems_8.5_installation.pdf) for more details. Similar documentation is available for each version of TIBCO EMS.

2.
Install the connector through the [Confluent Hub Client](https://docs.confluent.io/current/connect/managing/confluent-hub/client.html). ```bash # run from your CP installation directory confluent connect plugin install confluentinc/kafka-connect-tibco-sink:latest ``` 3. [Install the TIBCO JMS Client Library](#installing-tibco-client-jar). 4. Start Confluent Platform. ```bash confluent local start ``` 5. [Produce](https://docs.confluent.io/current/cli/command-reference/confluent-produce.html) test data to the `sink-messages` topic in Kafka. ```bash seq 10 | confluent local produce sink-messages ``` 6. Create a `tibco-sink.json` file with the following contents: ```json { "name": "TibcoSinkConnector", "config": { "connector.class": "io.confluent.connect.jms.TibcoSinkConnector", "tasks.max": "1", "topics": "sink-messages", "tibco.url": "tcp://localhost:7222", "tibco.username": "admin", "tibco.password": "", "jms.destination.type": "queue", "jms.destination.name": "connector-quickstart", "key.converter": "org.apache.kafka.connect.storage.StringConverter", "value.converter": "org.apache.kafka.connect.storage.StringConverter", "confluent.topic.bootstrap.servers": "localhost:9092", "confluent.topic.replication.factor": "1" } } ``` 7. Load the TIBCO Sink connector. ```bash confluent local load tibco --config tibco-sink.json ``` 8. Confirm that the connector is in a `RUNNING` state. ```bash confluent local status tibco ``` 9. Confirm the messages were delivered to the `connector-quickstart` queue in TIBCO. ```bash # open TIBCO admin tool (password is empty) tibco/ems/8.4/bin/tibemsadmin -server "tcp://localhost:7222" -user admin > show queue connector-quickstart ``` ### Deploy CFK with cluster object deletion protection Confluent for Kubernetes (CFK) provides validating admission webhooks for deletion events of the Confluent Platform clusters. CFK webhooks are disabled by default in this release of CFK. CFK provides the following webhooks: * **Webhook to prevent component deletion when its persistent volume (PV) reclaim policy is set to Delete** This webhook (`cfk-resources.webhooks.platform.confluent.io`) blocks deletion requests on CRs with PVs in `ReclaimPolicy: Delete`. Without this prevention, a CR deletion will result in the deletion of those PVs and data loss. This webhook only applies to the components that have persistent volumes, namely, ZooKeeper (Confluent Platform 7.9 or earlier only), Kafka, ksqlDB, and Control Center (Legacy). In addition to blocking deletion as described above, this webhook blocks updates to prevent manual modifications during ZooKeeper to KRaft migration when the CR has the `platform.confluent.io/kraft-migration-cr-lock: "true"` annotation set. This webhook does not block normal CR updates (outside of KRaft migration). * **Webhook to prevent CFK StatefulSet deletion** The proper way to delete Confluent Platform resources is to delete the component custom resource (CR) as CFK watches those deletion events and properly cleans everything up. Deletion of StatefulSets can result in unintended PV deletion and data loss. This webhook (`core-resources.webhooks.platform.confluent.io`) blocks delete requests on CFK StatefulSets. * **Webhook to prevent unsafe Kafka pod deletion** This webhook (`kafka-pods.webhooks.platform.confluent.io`) blocks Kafka pod deletion when the removal of a broker will result in fewer in-sync replicas than configured in the `min.insync.replicas` Kafka property. Dropping below that value can result in data loss. 
Pod deletion can happen during Kubernetes maintenance without warning, such as during node replacement, and this webhook is an additional safeguard for your Kafka data. Review the following when using this webhook: * This webhook is only supported on clusters with fewer than 140,000 partitions. * This webhook does not take the Kafka setting, [minimum in-sync replicas](https://docs.confluent.io/platform/current/installation/configuration/broker-configs.html#brokerconfigs_min.insync.replicas) (`min.insync.replicas`), into consideration. The minimum in-sync replicas setting on all topics is assumed to be `2` for Kafka with 3 or more replicas. Do not create topics with minimum in-sync replicas set to 1. * To avoid having an internal ksqlDB topic with min in-sync replicas set to 1, set the ksqlDB internal topic replicas setting to `3` using `configOverrides` in the ksqlDB CR: ```yaml spec: configOverrides: server: - ksql.internal.topic.replicas=3 ``` * **Webhook to prevent unsafe pod eviction** This webhook (`evictions.webhooks.platform.confluent.io`) follows the same logic as the pod deletion webhook described above and prevents the creation of the pod eviction object which results in the draining of the pod nodes. ## Create a cluster link Create a cluster link by using a new ClusterLink custom resource (CR) and apply the CR with the `kubectl apply -f ` command. To create a destination-initiated cluster link : Create a cluster link on the destination cluster and configure authentication and encryption for the source cluster. To create a source-initiated cluster link : Create two cluster links with the same cluster link names. `mirrorTopics`, `mirrorTopicOptions`, `aclFilter`, `consumerGroupFilters` and other cluster link configs must be defined in the ClusterLink CR on **the destination cluster**, for both destination-initiated and source-initiated cluster links. ```yaml kind: ClusterLink metadata: name: clusterlink --- [1] namespace: --- [2] spec: name: --- [3] sourceInitiatedLink: --- [4] linkMode: --- [5] destinationKafkaCluster: --- [6] sourceKafkaCluster: --- [7] consumerGroupFilters: --- [8] aclFilters: --- [9] configs: --- [10] mirrorTopics: --- [11] mirrorTopicOptions: --- [12] ``` * [1] Required. The name of the ClusterLink CR. * [2] Optional. The namespace of the ClusterLink CR. If omitted, the same namespace as this CR is assumed. * [3] Optional. The name of the cluster link. If not defined `metadata.name` ([1]) is used. * [4] Optional. Configure if this cluster link is a source-initiated cluster link. * [5] Required under `sourceInitiatedLink`. Specify whether this source-initiated cluster link CR is on the source cluster or on the destination cluster. Valid values are `Source` and `Destination`. * [6] Required. The information about the destination cluster. See [Configure the destination-initiated cluster link](#co-clusterlink-destination-initiated-connection) and [Configure the source-initiated cluster link on the destination cluster](#co-clusterlink-source-initiated-connection-destination-mode). * [7] Required. The information about the source cluster. See [Configure the destination-initiated cluster link](#co-clusterlink-destination-initiated-connection) and [Configure the source-initiated cluster link on the source cluster](#co-clusterlink-source-initiated-connection-source-mode). * [8] Optional. An array of consumer groups to be migrated from the source cluster to the destination cluster. See [Define consumer group filters](#co-clusterlink-consumer-group-filters). 
* [9] Optional. An array of Access Control Lists (ACLs) to be migrated from the source cluster to the destination cluster. See [Define ACL filters](#co-clusterlink-acl-filters).
* [10] Optional. A map of additional configurations for creating a cluster link. For example:

```yaml
spec:
  configs:
    connections.max.idle.ms: "620000"
    cluster.link.retry.timeout.ms: "200000"
```

This setting can be used in all types and modes of ClusterLink CRs. For the list of optional configurations, see [Cluster Linking config options](https://docs.confluent.io/platform/current/multi-dc-deployments/cluster-linking/configs.html#configuration-options).

* [11] Optional. An array of mirror topics. See [Create a mirror topic](#co-create-mirror-topic).
* [12] Optional. Configuration options for mirror topics. See [Configure mirror topic options](#co-clusterlink-mirror-topic-options).

# Manage Password Encoder Secrets for Confluent Platform Using Confluent for Kubernetes

To encrypt sensitive configuration information in Confluent for Kubernetes (CFK), such as passwords for SASL/PLAIN or TLS, you define a password encoder in your custom resource (CR). The use cases for the feature include the following:

* For destination-initiated (default) Kafka Cluster Linking, the destination Kafka cluster needs to set a password encoder secret and use it to encrypt the sensitive authentication and TLS information of the source cluster. For source-initiated Kafka Cluster Linking, the source Kafka cluster needs to set a password encoder secret and use it to encrypt the sensitive authentication and TLS information of the destination cluster.
* For Schema Linking, a password encoder secret needs to be configured in the source Schema Registry cluster.

For details about the password encoder secret, see [Kafka Broker Configuration](https://docs.confluent.io/platform/current/installation/configuration/broker-configs.html#brokerconfigs_password.encoder.secret).

To specify a password encoder secret:

1. Create the `password-encoder.txt` file with the following content:

```text
password=
oldPassword=
```

`oldPassword` is only required for password rotations.

2. Store the secret for the password encoder, using either a Kubernetes secret or the directory path in the container feature.

* To use a Kubernetes secret, create a Kubernetes secret using the file created in the previous step. The expected key (the file name) is `password-encoder.txt`. For example:

```bash
kubectl create secret generic myencodersecret \
  --from-file=password-encoder.txt=$MY_PATH/password-encoder.txt
```

* To use the directory path in the container feature, copy the `password-encoder.txt` file to the container path.

3. In the Kafka or Schema Registry CR, specify the secret created in the previous step:

```yaml
spec:
  passwordEncoder:
    secretRef:                 --- [1]
    directoryPathInContainer:  --- [2]
```

* If `spec.passwordEncoder` is defined, either [1] or [2] is required.
* [1] The secret for the password encoder.
* [2] The path in the container where the `password-encoder.txt` file exists.

See [Provide secrets for Confluent Platform component CR](co-credentials.md#co-vault-category-1) for providing the secret and required annotations when using Vault.

4. Apply the CR changes using the `kubectl apply` command. The cluster will automatically restart. The end-to-end flow is summarized in the sketch that follows.
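The following is a minimal sketch of that flow, assuming the secret name `myencodersecret`, the namespace `confluent`, and a CR file named `kafka.yaml`; all three are illustrative placeholders rather than required values.

```bash
# 1. Create the password encoder file (oldPassword is only needed for a rotation).
cat > password-encoder.txt <<'EOF'
password=<new-encoder-secret>
oldPassword=<previous-encoder-secret>
EOF

# 2. Store it as a Kubernetes secret; the key (file name) must be password-encoder.txt.
kubectl create secret generic myencodersecret \
  --from-file=password-encoder.txt=./password-encoder.txt \
  --namespace confluent

# 3. Reference the secret under spec.passwordEncoder.secretRef in the Kafka or
#    Schema Registry CR (kafka.yaml here), then apply the change.
kubectl apply -f kafka.yaml --namespace confluent
```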
### Issue: Use different authentication for internal and external listeners

When you set up SASL/PLAIN and SASL/PLAIN with LDAP listeners in the Kafka CR, you will get the following error:

```text
WARN [Producer clientId=producer-client] Connection to node -1 (/:9092) terminated during authentication. This may happen due to any of the following reasons: (1) Authentication failed due to invalid credentials with brokers older than 1.0.0, (2) Firewall blocking Kafka TLS traffic (eg it may only allow HTTPS traffic), (3) Transient network issue. (org.apache.kafka.clients.NetworkClient)
```

**Workaround:** To implement both a SASL/PLAIN listener and a SASL/PLAIN with LDAP listener in the Kafka cluster, the SASL/PLAIN listener must be configured with `authentication.jaasConfigPassThrough`. The following example configuration sets up an internal listener with SASL/PLAIN and an external listener with SASL/PLAIN with LDAP:

**Step 1:** Create a file, `creds-kafka-sasl-users.conf`, with the following content.

```bash
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \
  username="kafka" \
  password="kafka-secret" \
  user_kafka="kafka-secret";
```

**Step 2:** Create a secret credential.

```bash
kubectl create secret generic credential \
  --from-file=plain-jaas.conf=./creds-kafka-sasl-users.conf \
  --namespace confluent
```

**Step 3:** Specify the JAAS configuration passthrough in the Kafka CR and apply the change with the `kubectl apply -f` command.

```yaml
kind: Kafka
spec:
  listeners:
    internal:
      authentication:
        type: plain
        jaasConfigPassThrough:
          secretRef: credential
      tls:
        enabled: true
    external:
      authentication:
        type: ldap
      tls:
        enabled: true
```

### GCS

To enable Tiered Storage on Google Cloud Platform (GCP) with Google Cloud Storage (GCS):

1. To enable Tiered Storage, add the following properties in your `broker.properties` file:

```properties
confluent.tier.feature=true
confluent.tier.enable=true
confluent.tier.backend=GCS
confluent.tier.gcs.bucket=
confluent.tier.gcs.region=
# confluent.tier.metadata.replication.factor=1
```

Adding the above properties enables the Tiered Storage components on GCS with default values for the remaining configuration parameters.

- `confluent.tier.feature` enables Tiered Storage for a broker. Setting this to `true` allows a broker to utilize Tiered Storage.
- `confluent.tier.enable` sets the default value for created topics. Setting this to `true` causes all non-compacted topics to be tiered. When set to `true`, this causes all existing, non-compacted topics to have this configuration set to `true` as well. Only topics explicitly set to `false` do not use tiered storage. It is not required to set `confluent.tier.enable=true` to enable Tiered Storage.
- `confluent.tier.backend` refers to the cloud storage service a broker connects to. For Google Cloud Storage, set this to `GCS` as shown above.
- `BUCKET_NAME` and `REGION` are the GCS bucket name and its region, respectively. A broker interacts with this bucket for writing and reading tiered data. For example, a bucket named `tiered-storage-test-gcs` located in the `us-central1` region would have these properties:

```properties
confluent.tier.gcs.bucket=tiered-storage-test-gcs
confluent.tier.gcs.region=us-central1
```

2. Provide [GCS credentials](https://cloud.google.com/docs/authentication/getting-started) to connect to the GCS bucket. You can set these through `broker.properties` or through environment variables. Either method is sufficient.
The brokers prioritize using the credentials supplied through `broker.properties`. If the brokers do not find credentials in `broker.properties`, they use environment variables instead.

- **Broker Properties**: Add the following property to your `broker.properties` file:

```properties
confluent.tier.gcs.cred.file.path=
```

This field is hidden from the server log files.

- **Environment Variables**: Specify GCS credentials with this local environment variable:

```bash
export GOOGLE_APPLICATION_CREDENTIALS=
```

If `broker.properties` does not contain the property with the path to the credentials file, the broker will use the above environment variable to connect to the GCS bucket. See the [GCS documentation](https://cloud.google.com/docs/authentication) for more information.

3. The GCS bucket should allow the broker to perform the following actions. These operations are required by the broker to properly enable and use Tiered Storage.

```properties
storage.buckets.get
storage.objects.get
storage.objects.list
storage.objects.create
storage.objects.delete
storage.objects.update
```

Troubleshooting Certificates
: If the brokers fail to start due to Tiered Storage errors such as inability to access buckets and security certificate issues, make sure that you have the needed Google CA certificate(s). To troubleshoot:

1. Go to the [Google Trust Services repository](https://pki.goog/repository/), scroll down to the section **Download CA certificates**, and click **Expand all**.

2. Choose a certificate suitable for your cluster (for example, **GlobalSign R4**) that is currently valid (not yet expired), click the **Action** drop-down next to it, and download the Certificate (PEM) file to all the brokers in the cluster.

3. Import the certificate by running the following command:

```bash
keytool -import -trustcacerts -keystore /var/ssl/private/kafka_broker.truststore.jks -alias root -file
```

#### Create a Dead Letter Queue topic

To create a DLQ, add the following configuration properties to your sink connector configuration:

```bash
errors.tolerance = all
errors.deadletterqueue.topic.name =
```

The following example shows a GCS Sink connector configuration with DLQ enabled:

```bash
{
  "name": "gcs-sink-01",
  "config": {
    "connector.class": "io.confluent.connect.gcs.GcsSinkConnector",
    "tasks.max": "1",
    "topics": "gcs_topic",
    "gcs.bucket.name": "",
    "gcs.part.size": "5242880",
    "flush.size": "3",
    "storage.class": "io.confluent.connect.gcs.storage.GcsStorage",
    "format.class": "io.confluent.connect.gcs.format.avro.AvroFormat",
    "partitioner.class": "io.confluent.connect.storage.partitioner.DefaultPartitioner",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "http://localhost:8081",
    "schema.compatibility": "NONE",
    "confluent.topic.bootstrap.servers": "localhost:9092",
    "confluent.topic.replication.factor": "1",
    "errors.tolerance": "all",
    "errors.deadletterqueue.topic.name": "dlq-gcs-sink-01"
  }
}
```

Even if the DLQ topic contains the records that failed, it does not show why they failed. You can add the following configuration property to include failed record header information.

```bash
errors.deadletterqueue.context.headers.enable=true
```

Record headers are added to the DLQ when the `errors.deadletterqueue.context.headers.enable` parameter is set to `true` (the default is `false`). You can then use the [kafkacat](../tools/kafkacat-usage.md#kafkacat-usage) utility to view the record headers and determine why a record failed, as shown in the sketch that follows.
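For example, the following is a minimal sketch of inspecting those headers with kafkacat; the broker address and DLQ topic name come from the example configuration above, and the format flags assume a reasonably recent kafkacat (kcat) release.

```bash
# Read the DLQ from the beginning and print each record's headers next to its value.
kafkacat -b localhost:9092 -t dlq-gcs-sink-01 -C -o beginning \
  -f 'Headers: %h\nValue: %s\n\n'
```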
Errors are also sent to [Connect Reporter](/kafka-connectors/self-managed/userguide.html#userguide-connect-reporter). To avoid conflicts with the original record header, the DLQ context header keys start with `_connect.errors`. Here is the same example configuration with headers enabled: ```bash { "name": "gcs-sink-01", "config": { "connector.class": "io.confluent.connect.gcs.GcsSinkConnector", "tasks.max": "1", "topics": "gcs_topic", "gcs.bucket.name": "", "gcs.part.size": "5242880", "flush.size": "3", "storage.class": "io.confluent.connect.gcs.storage.GcsStorage", "format.class": "io.confluent.connect.gcs.format.avro.AvroFormat", "partitioner.class": "io.confluent.connect.storage.partitioner.DefaultPartitioner", "value.converter": "io.confluent.connect.avro.AvroConverter", "value.converter.schema.registry.url": "http://localhost:8081", "schema.compatibility": "NONE", "confluent.topic.bootstrap.servers": "localhost:9092", "confluent.topic.replication.factor": "1", "errors.tolerance": "all", "errors.deadletterqueue.topic.name": "dlq-gcs-sink-01", "errors.deadletterqueue.context.headers.enable":true } } ``` For more information about DLQs, see [Kafka Connect Deep Dive – Error Handling and Dead Letter Queues](https://www.confluent.io/blog/kafka-connect-deep-dive-error-handling-dead-letter-queues/). ## Requirements To use CSFLE in Confluent Platform with self-managed connectors, you must meet the following requirements: - An installation of Confluent Enterprise 8.0 and later with the CSFLE Add-On enabled. - Ensure Schema Registry is configured with the following properties before it starts: ```shell resource.extension.class=io.confluent.kafka.schemaregistry.rulehandler.RuleSetResourceExtension,io.confluent.dekregistry.DekRegistryResourceExtension confluent.license= confluent.license.addon.csfle= ``` #### NOTE The value for `confluent.license.addon.csfle` is the same as your main `confluent.license` key. - An external KMS to manage your Key Encryption Keys (KEKs). For more information, see [Manage KEKs](../security/protect-data/csfle/manage-keys.md#manage-keks-csfle). - The [KMS provider](/platform/current/security/protect-data/csfle/quick-start.html#step-1-configure-the-kms-provider) must be configured for the connector. - A Kafka topic to use as a data source or destination. #### IMPORTANT The Confluent CLI [confluent local](https://docs.confluent.io/confluent-cli/current/command-reference/local/index.html) commands are intended for a single-node development environment and are not suitable for a production environment. The data that are produced are transient and are intended to be temporary. For production-ready workflows, see [Install and Upgrade Confluent Platform](../installation/index.md#installation-overview). Every service will start in order, printing a message with its status: ```bash Starting KRaft Controller KRaft Controller is [UP] Starting Kafka Kafka is [UP] Starting Schema Registry Schema Registry is [UP] Starting Kafka REST Kafka REST is [UP] Starting Connect Connect is [UP] Starting ksqlDB Server ksqlDB Server is [UP] ``` #### NOTE For instructions on getting your actual cluster IDs, refer to [Cluster Identifiers in Confluent Platform](../../security/authorization/rbac/rbac-get-cluster-ids.md#rbac-get-cluster-ids). 1. Enter the following Confluent CLI command to give an example service principal named `$CONNECT_USER` the role `SecurityAdmin` on the Connect cluster. 
The example cluster IDs `$CONNECT_CLUSTER` for Connect and `$KAFKA_CLUSTER` for Kafka are used in the command and all subsequent command examples. ```none confluent iam rbac role-binding create \ --principal User:$CONNECT_USER \ --role SecurityAdmin \ --kafka-cluster $KAFKA_CLUSTER \ --connect-cluster-id $CONNECT_CLUSTER ``` 2. Enter the following command to give `$CONNECT_USER` the role `ResourceOwner` on the group that Connect workers use to coordinate with other workers. ```none confluent iam rbac role-binding create \ --principal User:$CONNECT_USER \ --role ResourceOwner \ --resource Group:$CONNECT_CLUSTER \ --kafka-cluster $KAFKA_CLUSTER ``` 3. Enter the following commands to give `$CONNECT_USER` the role `ResourceOwner` on the internal Kafka topics used by Connect to store configuration, status, and offset information. Internal configuration topic `$CONFIGS_TOPIC`, internal offsets topic `$OFFSETS_TOPIC`, and status topic `$STATUS_TOPIC` are used in the examples. #### NOTE The configuration topics `config.storage.topic`, `offset.storage.topic`, and `status.storage.topic` are where the internal configuration, offset configuration, and status configuration data are stored. These are set for Confluent Platform to `connect-configs`, `connect-offsets`, and `connect-status` by default. ```none confluent iam rbac role-binding create \ --principal User:$CONNECT_USER \ --role ResourceOwner \ --resource Topic:$CONFIGS_TOPIC \ --kafka-cluster $KAFKA_CLUSTER ``` ```none confluent iam rbac role-binding create \ --principal User:$CONNECT_USER \ --role ResourceOwner \ --resource Topic:$OFFSETS_TOPIC \ --kafka-cluster $KAFKA_CLUSTER ``` ```none confluent iam rbac role-binding create \ --principal User:$CONNECT_USER \ --role ResourceOwner \ --resource Topic:$STATUS_TOPIC \ --kafka-cluster $KAFKA_CLUSTER ``` #### IMPORTANT By default the Kafka worker uses the same settings and the same principal for reading and writing to the `_confluent-command` license topic that it uses to read and write to internal topics. For more information about Connect licensing, see [Licensing Connectors](/kafka-connectors/self-managed/license.html). If you have configured a Connect [Secret Registry](connect-rbac-secret-registry.md#connect-rbac-secret-registry), you must complete two additional steps. 1. Enter the following command to give `$CONNECT_USER` the role `ResourceOwner` on the group that secret registry nodes use to coordinate with other nodes. Secret Registry group ID `$SECRET_REGISTRY_GROUP` is used in the example. Note that the actual value of `$SECRET_REGISTRY_GROUP` needs to match the value of `config.providers.secret.param.secret.registry.group.id` in the Connect worker properties. This value defaults to `secret-registry` if not specified in the Connect worker properties. ```none confluent iam rbac role-binding create \ --principal User:$CONNECT_USER \ --role ResourceOwner \ --resource Group:$SECRET_REGISTRY_GROUP \ --kafka-cluster $KAFKA_CLUSTER ``` 2. Enter the following command to give `$CONNECT_USER` the role `ResourceOwner` on the Kafka topic used to store secrets. Kafka secrets topic `$SECRETS_TOPIC` is used in the example. Note that the actual value of `$SECRETS_TOPIC` needs to match the value of `config.providers.secret.param.kafkastore.topic` in the Connect worker properties. This value defaults to `_confluent-secrets` if not specified in the Connect worker properties. #### WARNING The default value for the secrets topic changed from `_secrets` to `_confluent-secrets` in version 5.4. 
If your Secret Registry cluster is not configured with a `kafkastore.topic` property, explicitly set it to `_secrets` before upgrading to 5.4, to avoid losing existing secrets.

```none
confluent iam rbac role-binding create \
  --principal User:$CONNECT_USER \
  --role ResourceOwner \
  --resource Topic:$SECRETS_TOPIC \
  --kafka-cluster $KAFKA_CLUSTER
```

## Connector role bindings

Use the following steps to configure role bindings for the connector: `User:$CONNECTOR_USER`.

1. Grant principal `User:$CONNECTOR_USER` the `ResourceOwner` role to `Topic:$DATA_TOPIC`.

```none
confluent iam rbac role-binding create \
  --principal User:$CONNECTOR_USER \
  --role ResourceOwner \
  --resource Topic:$DATA_TOPIC \
  --kafka-cluster $KAFKA_CLUSTER_ID
```

The following step is only required if using **Schema Registry**.

2. Grant principal `User:$CONNECTOR_USER` the `ResourceOwner` role to `Subject:$(DATA_TOPIC)-value`.

```none
confluent iam rbac role-binding create \
  --principal User:$CONNECTOR_USER \
  --role ResourceOwner \
  --resource Subject:$(DATA_TOPIC)-value \
  --kafka-cluster $KAFKA_CLUSTER_ID \
  --schema-registry-cluster $SCHEMA_REGISTRY_CLUSTER_ID
```

The following step is only required for **Sink** connectors.

3. Grant principal `User:$CONNECTOR_USER` the `DeveloperRead` role to the consumer group `Group:$connect-`.

```none
confluent iam rbac role-binding create \
  --principal User:$CONNECTOR_USER \
  --role DeveloperRead \
  --resource Group:$connect- \
  --prefix \
  --kafka-cluster $KAFKA_CLUSTER_ID
```

4. List the role bindings for the principal `User:$CONNECTOR_USER` to the Connect cluster.

```none
confluent iam rbac role-binding list \
  --principal User:$CONNECTOR_USER \
  --kafka-cluster $KAFKA_CLUSTER_ID \
  --connect-cluster-id $CONNECT_CLUSTER_ID
```

#### IMPORTANT

- The Kafka Connect framework does not allow you to unset or set `null` for producer or consumer configuration properties. Instead, try to set the default callback handler at the connector level using the following configuration property:

```properties
producer.override.sasl.login.callback.handler.class=org.apache.kafka.common.security.authenticator.AbstractLogin$DefaultLoginCallbackHandler
```

- For source connectors whose destination clusters use the SCRAM SASL mechanism, the default callback handler should not be set at the connector level. Instead, set the producer configurations on the Kafka Connect framework. In such cases, change the distributed worker to point to the appropriate producer settings, for example:

```bash
producer.bootstrap.servers=x:9096,y:9096,z:9096
producer.retry.backoff.ms=500
producer.security.protocol=SASL_SSL
producer.sasl.mechanism=SCRAM-SHA-512
producer.sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="######" password="#####";
producer.ssl.truststore.location=/var/ssl/private/kafka_connect.truststore.jks
producer.ssl.truststore.password=${securepass:/var/ssl/private/kafka-connect-security.properties:connect-distributed.properties/producer.ssl.truststore.password}
```

To enable per-connector configuration properties and override the default worker properties, add the following `connector.client.config.override.policy` configuration parameter to the worker properties file.

`connector.client.config.override.policy`
: The class name or alias implementation of `ConnectorClientConfigOverridePolicy`. This defines configurations that can be overridden by the connector. The default implementation is `All`. Other possible policies are `None` and `Principal`.
* Type: string
* Default: All
* Valid Values: [All, None, Principal]
* Importance: medium

When `connector.client.config.override.policy=All`, each connector that belongs to the worker is allowed to override the worker configuration. This is implemented by adding one of the following override prefixes to the source and sink connector configurations:

* `producer.override.`
* `consumer.override.`

```properties
# Authentication settings for Connect workers
ssl.keystore.location=/var/private/ssl/kafka.worker.keystore.jks
ssl.keystore.password=worker1234
ssl.key.password=worker1234
```

Connect workers manage the producers used by source connectors and the consumers used by sink connectors. So, for the connectors to leverage security, you also have to override the default producer/consumer configuration that the worker uses.

### Reporter and Kerberos security

The following configuration example shows a sink connector with all the necessary configuration properties for Reporter and Kerberos security. This example shows the [Prometheus Metrics Sink Connector for Confluent Platform](https://docs.confluent.io/kafka-connectors/prometheus-metrics/current/index.html), but can be modified for any applicable sink connector.

```json
{
  "name" : "prometheus-connector",
  "config" : {
    "topics":"prediction-metrics",
    "connector.class" : "io.confluent.connect.prometheus.PrometheusMetricsSinkConnector",
    "tasks.max" : "1",
    "confluent.topic.bootstrap.servers":"localhost:9092",
    "confluent.topic.ssl.truststore.location":"/etc/pki/hadoop/kafkalab.jks",
    "confluent.topic.ssl.truststore.password":"xxxx",
    "confluent.topic.ssl.keystore.location":"/etc/pki/hadoop/kafkalab.jks",
    "confluent.topic.ssl.keystore.password":"xxxx",
    "confluent.topic.ssl.key.password":"xxxx",
    "confluent.topic.security.protocol":"SASL_SSL",
    "confluent.topic.replication.factor": "3",
    "confluent.topic.sasl.kerberos.service.name":"kafka",
    "confluent.topic.sasl.jaas.config":"com.sun.security.auth.module.Krb5LoginModule required \nuseKeyTab=true \nstoreKey=true \nkeyTab=\"/etc/security/keytabs/svc.kfkconnect.lab.keytab\" \nprincipal=\"svc.kfkconnect.lab@DS.DTVENG.NET\";",
    "prometheus.scrape.url": "http://localhost:8889/metrics",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "false",
    "behavior.on.error": "LOG",
    "reporter.result.topic.replication.factor": "3",
    "reporter.error.topic.replication.factor": "3",
    "reporter.bootstrap.servers":"localhost:9092",
    "reporter.producer.ssl.truststore.location":"/etc/pki/hadoop/kafkalab.jks",
    "reporter.producer.ssl.truststore.password":"xxxx",
    "reporter.producer.ssl.keystore.location":"/etc/pki/hadoop/kafkalab.jks",
    "reporter.producer.ssl.keystore.password":"xxxx",
    "reporter.producer.ssl.key.password":"xxxx",
    "reporter.producer.security.protocol":"SASL_SSL",
    "reporter.producer.sasl.kerberos.service.name":"kafka",
    "reporter.producer.sasl.jaas.config":"com.sun.security.auth.module.Krb5LoginModule required \nuseKeyTab=true \nstoreKey=true \nkeyTab=\"/etc/security/keytabs/svc.kfkconnect.lab.keytab\" \nprincipal=\"svc.kfkconnect.lab@DS.DTVENG.NET\";",
    "reporter.admin.ssl.truststore.location":"/etc/pki/hadoop/kafkalab.jks",
    "reporter.admin.ssl.truststore.password":"xxxx",
    "reporter.admin.ssl.keystore.location":"/etc/pki/hadoop/kafkalab.jks",
    "reporter.admin.ssl.keystore.password":"xxxx",
    "reporter.admin.ssl.key.password":"xxxx",
    "reporter.admin.security.protocol":"SASL_SSL",
    "reporter.admin.sasl.kerberos.service.name":"kafka",
"reporter.admin.sasl.jaas.config":"com.sun.security.auth.module.Krb5LoginModule required \nuseKeyTab=true \nstoreKey=true \nkeyTab=\"/etc/security/keytabs/svc.kfkconnect.lab.keytab\" \nprincipal=\"svc.kfkconnect.lab@DS.DTVENG.NET\";", "confluent.license":"eyJ0eXAiOiJK ...omitted" } ``` ## Step 1: Download and start Confluent Platform In this step, you start by cloning a GitHub repository. This repository contains a Docker compose file and some required configuration files. The `docker-compose.yml` file sets ports and Docker environment variables such as the replication factor and listener properties for Confluent Platform and its components. To learn more about the settings in this file, see [Docker Image Configuration Reference for Confluent Platform](../installation/docker/config-reference.md#config-reference). 1. Clone the [Confluent Platform all-in-one example repository](https://github.com/confluentinc/cp-all-in-one/tree/latest/cp-all-in-one), for example: ```bash git clone https://github.com/confluentinc/cp-all-in-one.git ``` 2. Change to the cloned repository’s root directory: ```bash cd cp-all-in-one ``` 3. The default branch may not be the latest. Check out the branch for the version you want to run, for example, 8.1.0-post: ```bash git checkout 8.1.0-post ``` 4. The `docker-compose.yml` file is located in a nested directory. Navigate into the following directory: ```bash cd cp-all-in-one ``` 5. Start the Confluent Platform stack with the `-d` option to run in detached mode: ```bash docker compose up -d ``` #### NOTE If you using an Docker Compose V1, you need to use a dash in the `docker compose` commands. For example: ```bash docker-compose up -d ``` To learn more, see [Migrate to Compose V2](https://docs.docker.com/compose/releases/migrate/). Each Confluent Platform component starts in a separate container. Your output should resemble the following. Your output may vary slightly from these examples depending on your operating system. ```bash ✔ Network cp-all-in-one_default Created 0.0s ✔ Container flink-jobmanager Started 0.5s ✔ Container broker Started 0.5s ✔ Container prometheus Started 0.5s ✔ Container flink-taskmanager Started 0.5s ✔ Container flink-sql-client Started 0.5s ✔ Container alertmanager Started 0.5s ✔ Container schema-registry Started 0.5s ✔ Container connect Started 0.6s ✔ Container rest-proxy Started 0.6s ✔ Container ksqldb-server Started 0.6s ✔ Container control-center Started 0.7s ``` 6. 
Verify that the services are up and running: ```bash docker compose ps ``` Your output should resemble: ```none NAME IMAGE COMMAND SERVICE CREATED STATUS PORTS alertmanager confluentinc/cp-enterprise-alertmanager:2.2.1 "alertmanager-start" alertmanager 8 minutes ago Up 8 minutes 0.0.0.0:9093->9093/tcp, [::]:9093->9093/tcp broker confluentinc/cp-server:8.1.0 "/etc/confluent/dock…" broker 8 minutes ago Up 8 minutes 0.0.0.0:9092->9092/tcp, [::]:9092->9092/tcp, 0.0.0.0:9101->9101/tcp, [::]:9101->9101/tcp connect cnfldemos/cp-server-connect-datagen:0.6.4-7.6.0 "/etc/confluent/dock…" connect 8 minutes ago Up 8 minutes 0.0.0.0:8083->8083/tcp, [::]:8083->8083/tcp control-center confluentinc/cp-enterprise-control-center-next-gen:2.2.1 "/etc/confluent/dock…" control-center 8 minutes ago Up 8 minutes 0.0.0.0:9021->9021/tcp, [::]:9021->9021/tcp flink-jobmanager cnfldemos/flink-kafka:1.19.1-scala_2.12-java17 "/docker-entrypoint.…" flink-jobmanager 8 minutes ago Up 8 minutes 0.0.0.0:9081->9081/tcp, [::]:9081->9081/tcp flink-sql-client cnfldemos/flink-sql-client-kafka:1.19.1-scala_2.12-java17 "/docker-entrypoint.…" flink-sql-client 8 minutes ago Up 8 minutes 6123/tcp, 8081/tcp flink-taskmanager cnfldemos/flink-kafka:1.19.1-scala_2.12-java17 "/docker-entrypoint.…" flink-taskmanager 8 minutes ago Up 8 minutes 6123/tcp, 8081/tcp ksqldb-server confluentinc/cp-ksqldb-server:8.1.0 "/etc/confluent/dock…" ksqldb-server 8 minutes ago Up 8 minutes 0.0.0.0:8088->8088/tcp, [::]:8088->8088/tcp prometheus confluentinc/cp-enterprise-prometheus:2.2.1 "prometheus-start" prometheus 8 minutes ago Up 8 minutes 0.0.0.0:9090->9090/tcp, [::]:9090->9090/tcp rest-proxy confluentinc/cp-kafka-rest:8.1.0 "/etc/confluent/dock…" rest-proxy 8 minutes ago Up 8 minutes 0.0.0.0:8082->8082/tcp, [::]:8082->8082/tcp schema-registry confluentinc/cp-schema-registry:8.1.0 "/etc/confluent/dock…" schema-registry 8 minutes ago Up 8 minutes 0.0.0.0:8081->8081/tcp, [::]:8081->8081/tcp ``` After a few minutes, if the state of any component isn’t **Up**, run the `docker compose up -d` command again, or try `docker compose restart `, for example: ```bash docker compose restart control-center ``` ### Kafka For Kafka in KRaft mode, you must configure a node to be a broker or a controller. In addition, you must create a unique cluster ID and format the log directories with that ID. Typically in a production environment, you should have a minimum of three brokers and three controllers. * Navigate to the KRaft configuration files located in the `/etc/kafka/` directory. In this directory, you will find three sample property files for different node roles: - `broker.properties`: Use this file to configure a broker node. - `controller.properties`: Use this file to configure a controller node. - `server.properties`: Use this file to configure a node that runs in combined mode as both a broker and a controller. This mode is not supported for production environments. Choose the appropriate properties file for the node’s role in your KRaft cluster and then customize the settings in that file. * Configure the `process.roles`, `node.id` and `controller.quorum.voters` for each node. - For `process.roles`, set whether the node will be a `broker` or a `controller`. `combined` mode, meaning `process.roles` is set to `broker,controller`, is currently not supported and should only be used for experimentation. - Set a system-wide unique ID for the `node.id` for each broker/controller. 
- `controller.quorum.voters` should be a comma-separated list of controllers in the format `nodeID@hostname:port`

```bash
############################# Server Basics #############################

# The role of this server. Setting this puts us in KRaft mode
process.roles=broker

# The node id associated with this instance's roles
node.id=2

# The connect string for the controller quorum
controller.quorum.voters=1@controller1:9093,3@controller3:9093,5@controller5:9093
```

* Configure how brokers and clients communicate with the broker using `listeners`, and where controllers listen with `controller.listener.names`.

- `listeners`: Comma-separated list of URIs and listener names to listen on in the format `listener_name://host_name:port`
- `controller.listener.names`: Comma-separated list of `listener_name` entries for listeners used by the controller.

For more information, see [KRaft Configuration for Confluent Platform](../../kafka-metadata/config-kraft.md#configure-kraft).

* Configure security for your environment.

- For general security guidance, see [KRaft Security in Confluent Platform](../../security/component/kraft-security.md#kraft-security).
- For role-based access control (RBAC), see [Configure Metadata Service (MDS) in Confluent Platform](../../kafka/configure-mds/index.md#rbac-mds-config).
- For configuring SASL/SCRAM for broker-to-broker communication, see [KRaft-based Confluent Platform clusters](../../security/authentication/sasl/scram/overview.md#sasl-scram-kraft-based-clusters).

```properties
# List of Kafka brokers to connect to, e.g. PLAINTEXT://hostname:9092,SSL://hostname2:9092
kafkastore.bootstrap.servers=PLAINTEXT://hostname:9092,SSL://hostname2:9092
```

This configuration is for a three-node, multi-node cluster. For more information, see [Deploy Schema Registry in Production on Confluent Platform](../../schema-registry/installation/deployment.md#schema-registry-prod).

### Configure the LDAP identity provider

This configuration shows the LDAP context to identify LDAP users and groups to the MDS. The baseline LDAP configuration procedure for MDS is shown, followed by detailed descriptions of the essential configuration options.

1. Ensure you have this information available before you begin.

- The hostname (LDAP server URL, for example, `LDAPSERVER.EXAMPLE.COM`), port (for example, `389`), and any other security mechanisms (such as TLS)
- The full DN (distinguished name) of LDAP users
- If you have a complex LDAP directory tree, consider developing search filters for your configuration. These filters help MDS to trim LDAP search results.

2. Edit your Kafka properties file (`/etc/kafka/server.properties`).

3. Add the following baseline configuration for your identity provider (LDAP).

```properties
############################# Identity Provider Settings (LDAP) #############################

# Search groups for group-based authorization.
ldap.group.name.attribute=
ldap.group.object.class=group
ldap.group.member.attribute=member
ldap.group.member.attribute.pattern=CN=(.*),DC=rbac,DC=confluent,DC=io
ldap.group.search.base=CN=Users,DC=rbac,DC=confluent,DC=io

# Limit the scope of searches to subtrees off of base
ldap.user.search.scope=2

# Enable filters to limit search to only those groups needed
ldap.group.search.filter=(|(CN=)(CN=))

# Kafka authenticates to the directory service with the bind user.
ldap.java.naming.provider.url=ldap://:389
ldap.java.naming.security.authentication=simple
ldap.java.naming.security.credentials=
ldap.java.naming.security.principal=

# Locate users. Make sure that these attributes and object classes match what is in your directory service.
ldap.user.name.attribute=
ldap.user.object.class=user
ldap.user.search.base=
```

4. Adjust the configuration details for your environment, particularly the content in brackets (`<>`). Pay special attention to the following as you work:

* Nested LDAP groups are not supported.
* If you enable LDAP authentication for Kafka clients by adding [the LDAP callback handler](../../security/authentication/ldap/client-authentication-ldap.md#client-auth-with-ldap) (not shown in this configuration):
  - Specify `ldap.user.password.attribute` only if your LDAP server does not support simple bind.
  - If you define this property (`io.confluent.security.auth.provider.ldap.LdapAuthenticateCallbackHandler`), LDAP will perform the user search and return the password back to Kafka, and Kafka will perform the password check.
  - The LDAP server will return the user’s hashed password, so Kafka cannot authenticate the user unless the user’s properties file also uses the hashed password.

5. Save and close the property file.

6. After configuring LDAP, but before configuring MDS, connect to and query your LDAP server to verify your LDAP connection information. It is recommended that you use an LDAP tool to do this (for example, JXplorer).

7. When *all* sections of the MDS configuration are complete and your LDAP connection is verified, [Start Confluent Platform](../../installation/overview.md#installation).

The following sections provide details about the baseline LDAP configuration options for user and group-based authorization. For more details about LDAP configuration, see [Configure LDAP Group-Based Authorization for MDS](ldap-auth-config.md#ldap-auth-config) and [Configure LDAP Authentication](ldap-auth-mds.md#ldap-auth-mds).

### MDS REST client configurations

If a component client (such as Schema Registry, ksqlDB, Confluent Control Center, or Connect) configured to communicate with MDS includes an incorrect username or password, it can result in an endless loop of authentication attempts, which can flood your REST client exception log and degrade performance. For example:

```none
[2021-01-25 05:11:58,330] ERROR [pool-17-thread-1] Error while refreshing active metadata server urls, retrying (io.confluent.security.auth.client.rest.RestClient)
io.confluent.security.auth.client.rest.exceptions.RestClientException: Unauthorized; error code: 401
        at io.confluent.security.auth.client.rest.RestClient$HTTPRequestSender.lambda$submit$0(RestClient.java:353)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
```

To control the number of authentication retry attempts, include the following options in your REST client MDS configuration:

- `confluent.metadata.server.urls.max.retries`
- `confluent.metadata.server.urls.fail.on.401`

For details about these configuration options, refer to [REST client configurations](mds-configuration.md#rest-client-mds-config).

### Handling large message sizes

We strongly recommend that you adhere to the default maximum size of 1 MB for messages. When it is absolutely necessary to increase the maximum message size, the following are a few of the many implications you should consider.
Also consider alternative options such as using compression and/or splitting up messages.

Heap fragmentation
: Consistently large messages likely cause heap fragmentation on the broker side, requiring significant JVM tuning to maintain consistent performance.

Dirty page cache
: Accessing messages that are no longer available in the page cache is slow. With larger messages, fewer messages can fit in the page cache, causing degraded performance.

Kafka client buffer sizes
: Default buffer sizes on the client side are tuned for small messages (<1 MB). You will have to tune client-side buffers on both the producer and consumer to properly handle the messages. See [max.message.bytes](/platform/current/installation/configuration/topic-configs.html#max-message-bytes).

To configure Kafka to handle larger messages, set the following configuration parameters at the level you need: Producer, Consumer, and Topic. If all topics need this configuration, set it in the Broker configuration, but this is not recommended for the reasons listed above.

| Scope    | Config Parameter                               | Notes                                                                    |
|----------|------------------------------------------------|--------------------------------------------------------------------------|
| Topic    | `max.message.bytes`                            | Recommended to set the maximum message size at the topic level.          |
| Broker   | `message.max.bytes`                            | Setting the maximum message size at the broker level is not recommended. |
| Producer | `max.request.size`                             | Required for the producer-level change of the maximum message size.      |
| Producer | `batch.size`, `buffer.memory`                  | Use these parameters for performance tuning.                             |
| Consumer | `fetch.max.bytes`, `max.partition.fetch.bytes` | Use these to set the maximum message size at the consumer level.         |

For example, if you want to be able to handle 2 MB messages, you need to configure as below.

Topic configuration:

```none
max.message.bytes=2097152
```

Producer configuration:

```none
max.request.size=2097152
```

### License Client Configuration

A Kafka client is used to check the license topic for compliance. Review the following information about how to configure this license client when using principal propagation.

Configure license client authentication
: When using principal propagation, client license authentication is inherited from the inter-broker listeners.

Configure license client authorization
: When using principal propagation and RBAC or ACLs, you must configure client authorization for the license topic.

#### NOTE

The `_confluent-command` internal topic is available as the preferred alternative to the `_confluent-license` topic for components such as Schema Registry, REST Proxy, and Confluent Server (which were previously using `_confluent-license`). Both topics will be supported going forward. Here are some guidelines:

- New deployments (Confluent Platform 6.2.1 and later) will default to using `_confluent-command` as shown below.
- Existing clusters will continue using the `_confluent-license` topic unless manually changed.
- Newly created clusters on Confluent Platform 6.2.1 and later will default to creating the `_confluent-command` topic, and only existing clusters that already have a `_confluent-license` topic will continue to use it.

- **RBAC authorization** Run this command to add `ResourceOwner` for the component user for the Confluent license topic resource (default name is `_confluent-command`).
```none confluent iam rbac role-binding create \ --role ResourceOwner \ --principal User: \ --resource Topic:_confluent-command \ --kafka-cluster ``` - **ACL authorization** Run this command to configure Kafka authorization, where bootstrap server, client configuration, service account ID is specified. This grants create, read, and write on the `_confluent-command` topic. ```none kafka-acls --bootstrap-server --command-config \ --add --allow-principal User: --operation Create --operation Read --operation Write \ --topic _confluent-command ``` ## High Availability for pull queries ksqlDB supports [pull queries](../developer-guide/ksqldb-reference/select-pull-query.md#ksqldb-reference-select-pull-query), which you use to query materialized state that is stored while executing a [persistent query](../concepts/queries.md#ksqldb-concepts-queries-persistent). This works without issue when all nodes in your ksqlDB cluster are operating correctly, but what happens when a node storing that state goes down? First, you must start multiple nodes and make sure inter-node communication is configured so that query forwarding works correctly: ```properties listeners=http://0.0.0.0:8088 ksql.advertised.listener=http://host1.example.com:8088 ``` The `ksql.advertised.listener` configuration specifies the URL that is propagated to other nodes for inter-node requests, so it must be reachable from other hosts/pods in the cluster. Inter-node requests are critical in a multi-node cluster. For more information, see [configuring listeners of a ksqlDB cluster](installation/server-config.md#ksqldb-install-configure-server-configuring-listeners). While waiting for a failed node to restart is one possibility, this approach may incur more downtime than you want, and it may not be possible if there is a more serious failure. The other possibility is to have replicas of the data, ready to go when they’re needed. Fortunately, Kafka Streams provides a mechanism to do this: ```properties ksql.streams.num.standby.replicas=1 ksql.query.pull.enable.standby.reads=true ``` This first configuration tells Kafka Streams to use a separate task that operates independently of the active (writer) state store to build up a replica of the state. The second config indicates that reading is allowed from the replicas (or *standbys*) if reading fails from the active store. This approach is sufficient to enable high availability for pull queries in ksqlDB, but it requires that every request must try the active first. A better approach is to use a *heartbeating* mechanism to detect failed nodes preemptively, before a pull query arrives, so the request can forward straight to a replica. Set the following configs to detect failed nodes preemptively. ```properties ksql.heartbeat.enable=true ksql.lag.reporting.enable=true ``` The first configuration enables heartbeating, which should improve the speed of request handling significantly during failures, as described above. The second config allows for lag data of each of the standbys to be collected and sent to the other nodes to make routing decisions. In this case, the lag is defined by how many messages behind the active a given standby is. If ensuring freshness is a priority, you can provide a threshold in a pull query request to avoid the largest outliers: ```sql SET 'ksql.query.pull.max.allowed.offset.lag'='100'; SELECT * FROM QUERYABLE_TABLE WHERE ID = 456; ``` This configuration causes the request to consider only standbys that are within 100 messages of the active host. 
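Taken together, the settings discussed in this section would appear in the ksqlDB server properties along the following lines. This is a consolidated sketch of the options shown above; the host name and values are illustrative:

```properties
# Inter-node communication: advertise an address that other nodes can reach
listeners=http://0.0.0.0:8088
ksql.advertised.listener=http://host1.example.com:8088

# Build standby replicas of state stores and allow pull queries to read from them
ksql.streams.num.standby.replicas=1
ksql.query.pull.enable.standby.reads=true

# Detect failed nodes preemptively and report standby lag for routing decisions
ksql.heartbeat.enable=true
ksql.lag.reporting.enable=true
```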
With these configurations, you can introduce as much redundancy as you require and ensure that your pull queries succeed with controlled lag and low latency.

### Mount Volumes

Various features (plugins, UDFs, embedded connectors) may require that you mount volumes into the Docker container. To do this, follow [the official docker documentation](https://docs.docker.com/storage/volumes/). As an example using `docker-compose`, you can mount a UDF directory and use it like this:

```yaml
ksqldb-server:
  image: confluentinc/cp-ksqldb-server:8.1.0
  hostname: ksqldb-server
  container_name: ksqldb-server
  depends_on:
    - broker
    - schema-registry
  ports:
    - "8088:8088"
  volumes:
    - "./extensions/:/opt/ksqldb-udfs"
  environment:
    KSQL_LISTENERS: "http://0.0.0.0:8088"
    KSQL_BOOTSTRAP_SERVERS: "broker:9092"
    KSQL_KSQL_SCHEMA_REGISTRY_URL: "http://schema-registry:8081"
    KSQL_KSQL_LOGGING_PROCESSING_STREAM_AUTO_CREATE: "true"
    KSQL_KSQL_LOGGING_PROCESSING_TOPIC_AUTO_CREATE: "true"
    # Configuration for UDFs
    KSQL_KSQL_EXTENSION_DIR: "/opt/ksqldb-udfs"
```

#### ksqlDB Quickstart stack

Download the `docker-compose.yml` file from the **Include Kafka** tab of the [ksqlDB Quick Start](../../quickstart.md#ksqldb-quick-start). This `docker-compose.yml` file defines a stack with these features:

- Starts one ksqlDB Server instance.
- Does not start Schema Registry, so Avro and Protobuf schemas aren't available.
- Starts the ksqlDB CLI container automatically.

Use the following command to start the ksqlDB CLI in the running `ksqldb-cli` container.

```bash
docker exec -it ksqldb-cli ksql http://ksqldb-server:8088
```

#### Interactive ksqlDB clusters pre Kafka 2.0

[Interactive ksqlDB clusters](../how-it-works.md#ksqldb-architecture-interactive-deployment) (the default configuration) require that the authenticated ksqlDB user has open access to create, read, write, and delete topics, and to use any consumer group. Specifically, interactive ksqlDB clusters require these ACLs:

- The `DESCRIBE_CONFIGS` operation on the `CLUSTER` resource type.
- The `CREATE` operation on the `CLUSTER` resource type.
- The `DESCRIBE`, `READ`, `WRITE` and `DELETE` operations on all `TOPIC` resource types.
- The `DESCRIBE` and `READ` operations on all `GROUP` resource types.

It's still possible to restrict the authenticated ksqlDB user from accessing specific resources using `DENY` ACLs. For example, you can add a `DENY` ACL to stop SQL queries from accessing a topic that contains sensitive data.

For example, given the following setup:

- A 3-node ksqlDB cluster with ksqlDB servers running on IPs 198.51.100.0, 198.51.100.1, 198.51.100.2
- Authenticating with the Kafka cluster as a 'KSQL1' user.

You would then use the `kafka-acls` tool to grant the `KSQL1` principal the ACLs listed above so that ksqlDB can operate against the Kafka cluster.

# Manage Metadata Schemas in ksqlDB for Confluent Platform

Use the `ksql-migrations` tool to manage metadata schemas for your ksqlDB clusters by applying statements from *migration files* to your ksqlDB clusters. This enables you to keep your SQL statements for creating streams, tables, and queries in version control and manage the versions of your ksqlDB clusters based on the migration files that have been applied.

```none
usage: ksql-migrations [ {-c | --config-file} ] [ ]

Commands are:
    apply      Migrates the metadata schema to a new schema version.
    create     Creates a blank migration file with the specified description, which can then be populated with ksqlDB statements and applied as the next schema version.
destroy-metadata Destroys all ksqlDB server resources related to migrations, including the migrations metadata stream and table and their underlying Kafka topics. WARNING: this is not reversible! help Display help information info Displays information about the current and available migrations. initialize-metadata Initializes the migrations schema metadata (ksqlDB stream and table) on the ksqlDB server. new-project Creates a new migrations project directory structure and config file. validate Validates applied migrations against local files. See 'ksql-migrations help ' for more information on a specific command. ``` The `ksql-migrations` tool supports migrations files containing the following types of ksqlDB statements: - `CREATE STREAM` - `CREATE TABLE` - `CREATE STREAM ... AS SELECT` - `CREATE TABLE ... AS SELECT` - `CREATE OR REPLACE` - `INSERT INTO ... AS SELECT` - `PAUSE ` - `RESUME ` - `TERMINATE ` - `DROP STREAM` - `DROP TABLE` - `ALTER STREAM` - `ALTER TABLE` - `INSERT INTO ... VALUES` - `CREATE CONNECTOR` - `DROP CONNECTOR` - `CREATE TYPE` - `DROP TYPE` - `SET ` - `UNSET ` - `DEFINE ` - available if both `ksql-migrations` and the server are version 0.18 or newer. - `UNDEFINE ` - available if both `ksql-migrations` and the server are version 0.18 or newer. - `ASSERT SCHEMA` - available if both `ksql-migrations` and the server are version 0.27 or newer. - `ASSERT TOPIC` - available if both `ksql-migrations` and the server are version 0.27 or newer. Any properties or variables set using the `SET`, `UNSET`, `DEFINE` and `UNDEFINE` are applied in the current migration file only. They do not carry over to the next migration file, even if multiple migration files are applied as part of the same `ksql-migrations apply` command #### NOTE In the following examples, the `AVRO` schema string in Schema Registry is a single-line raw string without newline characters (`\n`). The strings are shown as human-readable text for convenience. For example, the following a physical schema is in `AVRO` format and is registered with Schema Registry with ID 1: ```json { "schema": { "type": "record", "name": "PageViewValueSchema", "namespace": "io.confluent.ksql.avro_schemas", "fields": [ { "name": "page_name", "type": "string", "default": "abc" }, { "name": "ts", "type": "int", "default": 123 } ] } } ``` The following `CREATE` statement defines a stream on the `pageviews` topic and specifies the physical schema that has an ID of `1`. ```sql CREATE STREAM pageviews (pageId INT KEY) WITH ( KAFKA_TOPIC='pageviews-avro-topic', KEY_FORMAT='KAFKA', VALUE_FORMAT='AVRO',VALUE_SCHEMA_ID=1,PARTITIONS=1 ); ``` The following output from the `describe pageviews` command shows the inferred logical schema for the `pageviews` stream: ```sql DESCRIBE pageviews; Name : PAGEVIEWS Field | Type PAGEID | INTEGER (key) page_name | VARCHAR(STRING) ts | INTEGER ``` If `WRAP_SINGLE_VALUE` is `false` in the statement, and if `KEY_SCHEMA_ID` is set, `ROWKEY` is used as the key’s column name. If `VALUE_SCHEMA_ID` is set, `ROWVAL` is used as the value’s column name. The physical schema is used as the column data type. For example, the following physical schema is `AVRO` and is defined in Schema Registry with ID `2`: ```json {"schema": "int"} ``` The following `CREATE` statement defines a table on the `pageview-count` topic and specifies the physical schema that has ID `2`. 
```sql hl_lines="7"
CREATE TABLE pageview_count (
    pageId INT PRIMARY KEY
  ) WITH (
    KAFKA_TOPIC='pageview-count',
    KEY_FORMAT='KAFKA',
    VALUE_FORMAT='AVRO',
    VALUE_SCHEMA_ID=2,
    WRAP_SINGLE_VALUE=false,
    PARTITIONS=1
  );
```

The inferred logical schema for the `pageview_count` table is:

```none
Name : PAGEVIEW_COUNT

Field  | Type
PAGEID | INTEGER (primary key)
ROWVAL | INTEGER
```

For more information about `WRAP_SINGLE_VALUE`, see [Single field unwrapping](../reference/serialization.md#ksqldb-serialization-formats-single-field-unwrapping).

### Data Serialization

When a schema ID is provided, and schema inference is successful, ksqlDB can create the data source. When writing to the data source, the physical schema inferred by the schema ID is used to serialize data, instead of the logical schema that's used in other cases. Because ksqlDB's logical schema accepts `null` values but the physical schema may not, serialization can fail even if the inserted value is valid for the logical schema.

The following example shows a physical schema that's defined in Schema Registry with ID `1`. No default values are specified for the `page_name` and `ts` fields.

```json hl_lines="8-9 12-13"
{
  "schema": {
    "type": "record",
    "name": "PageViewValueSchema",
    "namespace": "io.confluent.ksql.avro_schemas",
    "fields": [
      {
        "name": "page_name",
        "type": "string"
      },
      {
        "name": "ts",
        "type": "int"
      }
    ]
  }
}
```

The following example creates a stream with schema ID `1`:

```sql
CREATE STREAM pageviews (
    pageId INT KEY
  ) WITH (
    KAFKA_TOPIC='pageviews-avro-topic',
    KEY_FORMAT='KAFKA',
    VALUE_FORMAT='AVRO',
    VALUE_SCHEMA_ID=1,
    PARTITIONS=1
  );
```

ksqlDB infers the following schema for `pageviews`:

```none
Name : PAGEVIEWS

Field     | Type
PAGEID    | INTEGER (key)
page_name | VARCHAR(STRING)
ts        | INTEGER
```

If you insert values into `pageviews` with `null` values, ksqlDB returns an error:

```sql
INSERT INTO pageviews VALUES (1, null, null);
```

```none
Failed to insert values into 'PAGEVIEWS'. Could not serialize value: [ null | null ].
Error serializing message to topic: pageviews-avro-topic1.
Invalid value: null used for required field: "page_name", schema type: STRING
```

This error occurs because `page_name` and `ts` are required fields without default values in the specified physical schema.

## Step 4: Create a stream

You're ready to create a [stream](concepts/streams.md#ksqldb-concepts-streams). A stream associates a schema with an underlying Kafka topic. You use the [CREATE STREAM](developer-guide/ksqldb-reference/create-stream.md#ksqldb-reference-create-stream) statement to register a stream on a topic. If the topic doesn't exist yet, ksqlDB creates it on the Kafka broker.

In the ksqlDB CLI, copy the following SQL and press Enter to run the statement.

```sql
CREATE STREAM riderLocations (profileId VARCHAR, latitude DOUBLE, longitude DOUBLE)
  WITH (kafka_topic='locations', value_format='json', partitions=1);
```

Your output should resemble:

```none
Message
Stream created
```

Here's what each parameter in the CREATE STREAM statement does:

- `kafka_topic`: Name of the Kafka topic underlying the stream. In this case, it's created automatically, because it doesn't exist yet, but you can create streams over topics that exist already.
- `value_format`: Encoding of the messages stored in the Kafka topic.
For JSON encoding, each row is stored as a JSON object whose keys and values are column names and values, for example: ```json {"profileId": "c2309eec", "latitude": 37.7877, "longitude": -122.4205} ``` - `partitions`: Number of partitions to create for the `locations` topic. This parameter is not needed for topics that exist already. ## Streams A stream is a partitioned, immutable, append-only collection that represents a series of historical facts. For example, the rows of a stream could model a sequence of financial transactions, like “Alice sent $100 to Bob”, followed by “Charlie sent $50 to Bob”. Once a row is inserted into a stream, it can never change. New rows can be appended at the end of the stream, but existing rows can never be updated or deleted. Each row is stored in a particular partition. Every row, implicitly or explicitly, has a key that represents its identity. All rows with the same key reside in the same partition. To create a stream, use the `CREATE STREAM` command. The following example statement specifies a name for the new stream, the names of the columns, and the data type of each column. ```sql CREATE STREAM s1 ( k VARCHAR KEY, v1 INT, v2 VARCHAR ) WITH ( kafka_topic = 's1', partitions = 3, value_format = 'json' ); ``` This creates a new stream named `s1` with three columns: `k`, `v1`, and `v2`. The column `k` is designated as the key of this stream, which controls the partition that each row is stored in. When the data is stored, the value portion of each row’s underlying Kafka record is serialized in the JSON format. Under the hood, each stream corresponds to a [Kafka topic](../../concepts/apache-kafka-primer.md#ksqldb-apache-kafka-primer-topics) with a registered schema. If the backing topic for a stream doesn’t exist when you declare it, ksqlDB creates it on your behalf, as shown in the previous example statement. You can also declare a stream on top of an existing topic. When you do that, ksqlDB simply registers its associated schema. If topic `s2` already exists, the following statement register a new stream over it: ```sql CREATE STREAM s2 ( k1 VARCHAR KEY, v1 VARCHAR ) WITH ( kafka_topic = 's2', value_format = 'json' ); ``` # Create Clickstream Data Analysis Pipeline Using ksqlDB in Confluent Platform This example shows how you can use ksqlDB to process a stream of click data, aggregate and filter it, and join to information about the users. Visualisation of the results is provided by Grafana, on top of data streamed to Elasticsearch. These steps will guide you through how to setup your environment and run the clickstream analysis tutorial from a Docker container. ![image](ksqldb/images/clickstream_demo_flow.png) Prerequisites: : - Docker - Docker version 1.11 or later is [installed and running](https://docs.docker.com/engine/installation/). - Docker Compose is [installed](https://docs.docker.com/compose/install/). Docker Compose is installed by default with Docker for Mac. - Docker memory is allocated minimally at 6 GB. When using Docker Desktop for Mac, the default Docker memory allocation is 2 GB. You can change the default allocation to 6 GB in Docker. Navigate to **Preferences** > **Resources** > **Advanced**. - Internet connectivity - [Operating System](../../installation/versions-interoperability.md#operating-systems) currently supported by Confluent Platform - Networking and Kafka on Docker - Configure your hosts and ports to allow both internal and external components to the Docker network to communicate. 
- (Optional) [curl](https://curl.se/).
  - In the steps below, you will download a Docker Compose file. You can download this file any way you like, but the instructions below provide the explicit curl command you can use to download the file.
- [jq](https://stedolan.github.io/jq/) version 1.6 or later
- If you are using Linux as your host, for the Elasticsearch container to start successfully you must first run:

  ```bash
  sudo sysctl -w vm.max_map_count=262144
  ```

## Create the Clickstream Data

Once you've confirmed all the Docker containers are running, create the source connectors that generate mock data. This demo leverages the embedded Connect worker in ksqlDB.

1. Launch the ksqlDB CLI:

   ```bash
   docker-compose exec ksqldb-cli ksql http://ksqldb-server:8088
   ```

2. Ensure the ksqlDB server is ready to receive requests by running the following until it succeeds:

   ```sql
   show topics;
   ```

   The output should look similar to:

   ```none
   Kafka Topic | Partitions | Partition Replicas
   ```

3. Run the script [create-connectors.sql](https://github.com/confluentinc/examples/tree/latest/clickstream/ksql/ksql-clickstream-demo/demo/create-connectors.sql) that executes the ksqlDB statements to create three source connectors for generating mock data.

   ```sql
   RUN SCRIPT '/scripts/create-connectors.sql';
   ```

   The output should look similar to:

   ```none
   CREATE SOURCE CONNECTOR datagen_clickstream_codes WITH (
     'connector.class' = 'io.confluent.kafka.connect.datagen.DatagenConnector',
     'kafka.topic' = 'clickstream_codes',
     'quickstart' = 'clickstream_codes',
     'maxInterval' = '20',
     'iterations' = '100',
     'format' = 'json',
     'key.converter' = 'org.apache.kafka.connect.converters.IntegerConverter');

   Message
   Created connector DATAGEN_CLICKSTREAM_CODES
   [...]
   ```

4. The `clickstream` generator is now running, simulating the stream of clicks. Sample the messages in the `clickstream` topic:

   ```sql
   print clickstream limit 3;
   ```

   Your output should resemble:

   ```bash
   Key format: HOPPING(JSON) or TUMBLING(JSON) or HOPPING(KAFKA_STRING) or TUMBLING(KAFKA_STRING) or KAFKA_STRING
   Value format: JSON or KAFKA_STRING
   rowtime: 2020/06/11 10:38:42.449 Z, key: 222.90.225.227, value: {"ip":"222.90.225.227","userid":12,"remote_user":"-","time":"1","_time":1,"request":"GET /images/logo-small.png HTTP/1.1","status":"302","bytes":"1289","referrer":"-","agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"}
   rowtime: 2020/06/11 10:38:42.528 Z, key: 111.245.174.248, value: {"ip":"111.245.174.248","userid":30,"remote_user":"-","time":"11","_time":11,"request":"GET /site/login.html HTTP/1.1","status":"302","bytes":"14096","referrer":"-","agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"}
   rowtime: 2020/06/11 10:38:42.705 Z, key: 122.152.45.245, value: {"ip":"122.152.45.245","userid":11,"remote_user":"-","time":"21","_time":21,"request":"GET /images/logo-small.png HTTP/1.1","status":"407","bytes":"4196","referrer":"-","agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"}
   Topic printing ceased
   ```

5. The second data generator produces the HTTP status codes.
Sample the messages in the `clickstream_codes` topic: ```sql print clickstream_codes limit 3; ``` Your output should resemble: ```bash Key format: KAFKA_INT Value format: JSON or KAFKA_STRING rowtime: 2020/06/11 10:38:40.222 Z, key: 200, value: {"code":200,"definition":"Successful"} rowtime: 2020/06/11 10:38:40.688 Z, key: 404, value: {"code":404,"definition":"Page not found"} rowtime: 2020/06/11 10:38:41.006 Z, key: 200, value: {"code":200,"definition":"Successful"} Topic printing ceased ``` 6. The third data generator is for the user information. Sample the messages in the `clickstream_users` topic: ```sql print clickstream_users limit 3; ``` Your output should resemble: ```bash Key format: KAFKA_INT Value format: JSON or KAFKA_STRING rowtime: 2020/06/11 10:38:40.815 Z, key: 1, value: {"user_id":1,"username":"Roberto_123","registered_at":1410180399070,"first_name":"Greta","last_name":"Garrity","city":"San Francisco","level":"Platinum"} rowtime: 2020/06/11 10:38:41.001 Z, key: 2, value: {"user_id":2,"username":"akatz1022","registered_at":1410356353826,"first_name":"Ferd","last_name":"Pask","city":"London","level":"Gold"} rowtime: 2020/06/11 10:38:41.214 Z, key: 3, value: {"user_id":3,"username":"akatz1022","registered_at":1483293331831,"first_name":"Oriana","last_name":"Romagosa","city":"London","level":"Platinum"} Topic printing ceased ``` 7. Go to Confluent Control Center UI at [http://localhost:9021](http://localhost:9021) and view the three kafka-connect-datagen source connectors created with the ksqlDB CLI. ![Datagen Connectors](ksqldb/images/c3_datagen_connectors.png) ### Create the ksqlDB source streams For ksqlDB to be able to use the topics that Debezium created, you must declare streams over it. Because you configured Kafka Connect with Schema Registry, you don’t need to declare the schema of the data for the streams, because it’s inferred from the schema that Debezium writes with. Run the following statement to create a stream over the `customers` table: ```sql CREATE STREAM customers WITH ( kafka_topic = 'customers.public.customers', value_format = 'avro' ); ``` Do the same for `orders`. For this stream, specify that the timestamp of the event is derived from the data itself. Specifically, it’s extracted and parsed from the `ts` field. ```sql CREATE STREAM orders WITH ( kafka_topic = 'my-replica-set.logistics.orders', value_format = 'avro', timestamp = 'ts', timestamp_format = 'yyyy-MM-dd''T''HH:mm:ss' ); ``` Finally, repeat the same for `shipments`: ```sql CREATE STREAM shipments WITH ( kafka_topic = 'my-replica-set.logistics.shipments', value_format = 'avro', timestamp = 'ts', timestamp_format = 'yyyy-MM-dd''T''HH:mm:ss' ); ``` ### Run the microservice Compile the program with: ```bash mvn compile ``` And run it: ```bash mvn exec:java -Dexec.mainClass="io.ksqldb.tutorial.EmailSender" ``` If everything is configured correctly, emails will be sent whenever an anomaly is detected. There are a few things to note with this simple implementation. First, if you start more instances of this microservice, the partitions of the `possible_anomalies` topic will be load balanced across them. This takes advantage of the standard [Kafka consumer groups](/platform/current/clients/consumer.html#consumer-groups) behavior. Second, this microservice is configured to checkpoint its progress every `100` milliseconds through the `ENABLE_AUTO_COMMIT_CONFIG` configuration. That means any successfully processed messages will not be reprocessed if the microservice is taken down and turned on again. 
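As a rough illustration only (not the tutorial's actual source), the consumer behavior described above maps to standard Kafka consumer settings along these lines; the group ID is a made-up placeholder for this sketch:

```properties
# Hypothetical consumer group for this sketch; all instances of the microservice
# that share this group ID split the partitions of possible_anomalies between them.
group.id=email-sender

# Commit (checkpoint) consumed offsets automatically, roughly every 100 ms,
# so already-processed messages are not re-read after a restart.
enable.auto.commit=true
auto.commit.interval.ms=100
```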
Finally, note that ksqlDB emits a new event every time a tumbling window changes. ksqlDB uses a model called "refinements" to continually emit new changes to stateful aggregations. For example, if an anomaly was detected because three credit card transactions were found in a given interval, an event would be emitted from the table. If a fourth is detected in the same interval, another event is emitted. Because SendGrid does not (at the time of writing) support idempotent email submission, you would need to have a small piece of logic in your program to prevent sending an email multiple times for the same period. This is omitted for brevity.

If you wish, you can continue the example by inserting more events into the `transactions` topic.

### Create the ksqlDB calls stream

For ksqlDB to be able to use the topic that Debezium created, you must declare a stream over it. Because you configured Kafka Connect with Schema Registry, you don't need to declare the schema of the data for the stream; it's inferred from the schema that Debezium writes with.

Run the following at the ksqlDB CLI:

```sql
CREATE STREAM calls WITH (
    kafka_topic = 'call-center-db.call-center.calls',
    value_format = 'avro'
);
```

### Listing cluster links

**Example Command**

```bash
kafka-cluster-links --list --bootstrap-server localhost:9093
```

**Example Output**

```bash
Link name: 'example-link', link ID: '123-some-link-id', remote cluster ID: '123-some-cluster-id', local cluster ID: '456-some-other-cluster-id', remote cluster available: 'true'
```

You can list existing cluster links. The command returns the link name, link ID (an internally allocated unique ID), the cluster ID of the linked cluster, and whether the linked cluster is available or not.

`--link`
: If provided, only lists the specified cluster link.
  * Type: string

`--command-config`
: Property file containing configurations to be passed to the [AdminClient](../../installation/configuration/admin-configs.md#cp-config-admin). For example, with security credentials for authorization and authentication.
  * Type: string

`--include-topics`
: If provided, includes a list of all mirror topics on this cluster link.
  * Type: string

You must have `DESCRIBE CLUSTER` authorization to list cluster links.

### Viewing a cluster link task status

You can view the status of the following configurable tasks:

- [Consumer offset sync](mirror-topics-cp.md#mirror-topics-consumer-offsets)
- [ACL sync (migrate ACLs)](security.md#cluster-link-acls-migrate)
- [Topic configurations sync](mirror-topics-cp.md#sync-topic-configs)
- [Auto-Create mirror topics](mirror-topics-cp.md#auto-create-mirror-topics-concepts)

To view the status of any given task on Confluent Platform, use the following command:

```bash
confluent kafka link task list
```

Or:

```bash
./bin/kafka-cluster-links.sh ... --list-tasks --link
```

### Deleting a cluster link

**Example Command**

```bash
kafka-cluster-links --bootstrap-server localhost:9093 \
  --delete \
  --link example-link
```

**Example Output**

```bash
Cluster link 'example-link' deletion successfully completed.
```

To delete an existing link, use `kafka-cluster-links` along with [bootstrap-server](#bootstrap-cluster-links) and these flags.

`--link`
: (Required) The name of the cluster link to delete.
  * Type: string

`--command-config`
: Property file containing configurations to be passed to the [AdminClient](../../installation/configuration/admin-configs.md#cp-config-admin).
For example, with security credentials for authorization and authentication. * Type: string `--validate-only` : If provided, validates the cluster link deletion but doesn’t apply the delete. `--force` : Force deletion of a link even if there are mirror topics are currently linked with it. You must have `ALTER CLUSTER` authorization to delete a cluster link, as described in [Authorization (ACLs)](security.md#cluster-link-acls). #### Create the Principal and ACLs to allow the cluster link to read from cluster A The cluster link needs a principal that is authorized to read data from cluster A. You created the “link” principal in the cluster setup step, above, and now you will assign it the required privileges. 1. Give the link’s principal the **Describe:Cluster ACL**. ```bash $CONFLUENT_HOME/bin/kafka-acls --command-config my-examples/command.config --bootstrap-server localhost:9092 \ --add --allow-principal User:link --operation Describe --cluster ``` This ACL is specifically required for bidirectional mode. 2. At a minimum, give the link’s principal **Read:Topics** and **DescribeConfigs:Topics** on the topics that the cluster link is allowed to read from. This example allows the cluster link to read data from all topics. Alternatively, only specific topic names or prefixes can be given. These can be different from the topic ACLs given on the remote cluster. 3. (Recommended) Assign additional ACLs for syncing consumer offsets, which is a critical feature of a bidirectional cluster link. To learn about consumer offset sync configuration options, see `consumer.offset.sync.enable` and `consumer.offset.sync.ms` in [Configuration Options](#cp-cluster-link-config-options). - Grant the link’s principal **Describe** permissions on all topics. ```bash $CONFLUENT_HOME/bin/kafka-acls --command-config my-examples/command.config --bootstrap-server localhost:9092 --add --allow-principal User:link --operation Describe --topic "*" ``` Your output should resemble: ```bash Adding ACLs for resource `ResourcePattern(resourceType=TOPIC, name=*, patternType=LITERAL)`: (principal=User:link, host=*, operation=DESCRIBE, permissionType=ALLOW) ``` - Grant the link’s principal **Describe** permissions on all consumer groups. ```bash $CONFLUENT_HOME/bin/kafka-acls --command-config my-examples/command.config --bootstrap-server localhost:9092 --add --allow-principal User:link --operation Describe --operation Read --group "*" ``` Your output should resemble: ```bash Adding ACLs for resource `ResourcePattern(resourceType=GROUP, name=*, patternType=LITERAL)`: (principal=User:link, host=*, operation=READ, permissionType=ALLOW) (principal=User:link, host=*, operation=DESCRIBE, permissionType=ALLOW) Current ACLs for resource `ResourcePattern(resourceType=GROUP, name=*, patternType=LITERAL)`: (principal=User:link, host=*, operation=DESCRIBE, permissionType=ALLOW) (principal=User:link, host=*, operation=READ, permissionType=ALLOW) ``` - Grant the link’s principal **DescribeConfigs** permissions on the cluster. ```bash $CONFLUENT_HOME/bin/kafka-acls --command-config my-examples/command.config --bootstrap-server localhost:9092 --add --allow-principal User:link --operation DescribeConfigs --cluster ``` Your output should resemble: ```bash Adding ACLs for resource `ResourcePattern(resourceType=CLUSTER, name=kafka-cluster, patternType=LITERAL)`: (principal=User:link, host=*, operation=DESCRIBE_CONFIGS, permissionType=ALLOW) ``` 4. 
(Optional) Assign additional ACLs for [syncing (migrating) ACLs](security.md#cluster-link-acls-migrate) or using [prefixing](/platform/current/multi-dc-deployments/cluster-linking/mirror-topics-cp.html#prefix-mirror-topics-and-consumer-group-names) plus [auto-create mirror topics](/platform/current/multi-dc-deployments/cluster-linking/mirror-topics-cp.html#auto-create-mirror-topics). These can be different from the ACLs given on the remote cluster.

## Replicator with RBAC

When using RBAC, Replicator clients should use token authentication as described in [Configure Clients for SASL/OAUTHBEARER authentication in Confluent Platform](../../security/authentication/sasl/oauthbearer/configure-clients.md#security-sasl-rbac-oauthbearer-clientconfig). These configurations should be prefixed with the usual Replicator prefixes of `src.kafka.` and `dest.kafka.`. An example configuration for source and destination clusters that are RBAC-enabled is below:

```bash
src.kafka.sasl.jaas.config=org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required \
  username="sourceUser" \
  password="xxx" \
  metadataServerUrls="http://sourceHost:8090";
src.kafka.security.protocol=SASL_PLAINTEXT
src.kafka.sasl.mechanism=OAUTHBEARER
src.kafka.sasl.login.callback.handler.class=io.confluent.kafka.clients.plugins.auth.token.TokenUserLoginCallbackHandler

dest.kafka.sasl.jaas.config=org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required \
  username="destUser" \
  password="xxx" \
  metadataServerUrls="http://destHost:8090";
dest.kafka.security.protocol=SASL_PLAINTEXT
dest.kafka.sasl.mechanism=OAUTHBEARER
dest.kafka.sasl.login.callback.handler.class=io.confluent.kafka.clients.plugins.auth.token.TokenUserLoginCallbackHandler
```

For Replicator executable, these configurations should not be prefixed and should be placed in the files referred to by `--consumer.config` and `--producer.config`.

### Verify topic replication across the clusters

When Replicator finishes initialization, it checks the origin cluster for topics that need to be replicated. In this case, it finds `test-topic` and creates the corresponding topic in the destination cluster. You can verify this with the following command.

```none
./bin/kafka-topics --describe --topic test-topic.replica --bootstrap-server localhost:9092
```

Note that you are checking the existence of `test-topic.replica` because `test-topic` was renamed when it was replicated to the destination cluster, according to your configuration.

Your output should look similar to this:

```none
./bin/kafka-topics --describe --topic test-topic.replica --bootstrap-server localhost:9092
Topic: test-topic.replica  PartitionCount: 1  ReplicationFactor: 1  Configs: message.timestamp.type=CreateTime,segment.bytes=1073741824
    Topic: test-topic.replica  Partition: 0  Leader: 0  Replicas: 0  Isr: 0  Offline: 0
```

You can also list and describe the topics on the destination cluster. Replicated topics, like `test-topic.replica`, will be listed.

```none
./bin/kafka-topics --list --bootstrap-server localhost:9092
```

At any time after you've created the topic in the origin cluster, you can begin sending data to it using a Kafka producer to write to `test-topic` in the origin cluster. You can then confirm that the data has been replicated by consuming from `test-topic.replica` in the destination cluster.
For example, to send a sequence of numbers using Kafka's console producer, run the following command in a new terminal window:

```none
seq 10000 | ./bin/kafka-console-producer --topic test-topic --broker-list localhost:9082
```

You can confirm delivery in the destination cluster using the console consumer in its own terminal window:

```none
./bin/kafka-console-consumer --from-beginning --topic test-topic.replica --bootstrap-server localhost:9092
```

If the numbers 1 to 10,000 appear in the consumer output, you have successfully created multi-cluster replication. Press `Ctrl+C` to end the consumer readout and return to the command prompt.

## Run Example

1. Clone the [confluentinc/examples](https://github.com/confluentinc/examples) GitHub repository and change directory to the Schema Translation example.

   ```bash
   git clone https://github.com/confluentinc/examples
   cd examples/replicator-schema-translation
   ```

2. Start the entire example by running a single command that creates source and destination clusters automatically and adds a schema to the source Schema Registry. This takes less than 5 minutes to complete.

   ```bash
   docker-compose up -d
   ```

3. Wait at least 2 minutes and then verify the example has completely started by checking the subjects in the source and destination Schema Registries.

   ```bash
   # Source Schema Registry should show one subject, i.e., the output should be ["testTopic-value"]
   docker-compose exec connect curl http://srcSchemaregistry:8085/subjects

   # Destination Schema Registry should show no subjects, i.e., the output should be []
   docker-compose exec connect curl http://destSchemaregistry:8086/subjects
   ```

4. To prepare for schema translation, put the source Schema Registry in "READONLY" mode and the destination registry in "IMPORT" mode. Note that this works only when the destination Schema Registry has no registered subjects (as is true in this example), otherwise the import would fail with a message similar to "Cannot import since found existing subjects".

   ```bash
   docker-compose exec connect /etc/kafka/scripts/set_sr_modes_pre_translation.sh
   ```

   Your output should resemble:

   ```bash
   Setting srcSchemaregistry to READONLY mode: {"mode":"READONLY"}
   Setting destSchemaregistry to IMPORT mode: {"mode":"IMPORT"}
   ```

5. Submit Replicator to perform the translation.

   ```bash
   docker-compose exec connect /etc/kafka/scripts/submit_replicator.sh
   ```

   Your output should show the posted Replicator configuration. The key configuration that enables the schema translation is `schema.subject.translator.class=io.confluent.connect.replicator.schemas.DefaultSubjectTranslator`.

   ```json
   {
     "name": "testReplicator",
     "config": {
       "connector.class": "io.confluent.connect.replicator.ReplicatorSourceConnector",
       "topic.whitelist": "_schemas",
       "topic.rename.format": "${topic}.replica",
       "key.converter": "io.confluent.connect.replicator.util.ByteArrayConverter",
       "value.converter": "io.confluent.connect.replicator.util.ByteArrayConverter",
       "src.kafka.bootstrap.servers": "srcKafka1:10091",
       "dest.kafka.bootstrap.servers": "destKafka1:11091",
       "tasks.max": "1",
       "confluent.topic.replication.factor": "1",
       "schema.subject.translator.class": "io.confluent.connect.replicator.schemas.DefaultSubjectTranslator",
       "schema.registry.topic": "_schemas",
       "schema.registry.url": "http://destSchemaregistry:8086",
       "name": "testReplicator"
     },
     "tasks": [],
     "type": "source"
   }
   ```

6. Verify the schema translation by revisiting the subjects in the source and destination Schema Registries.
```bash # Source Schema Registry should show one subject, i.e., the output should be ["testTopic-value"] docker-compose exec connect curl http://srcSchemaregistry:8085/subjects # Destination Schema Registry should show one subject, i.e., the output should be ["testTopic.replica-value"] docker-compose exec connect curl http://destSchemaregistry:8086/subjects ``` 7. To complete the example, reset both Schema Registries to `READWRITE` mode, this completes the migration process: ```bash docker-compose exec connect /etc/kafka/scripts/set_sr_modes_post_translation.sh ``` ## Workflows and examples You can integrate Maven Plugin goals with [GitHub Actions](https://docs.github.com/en/actions) into a continuous integration/continuous deployment (CI/CD) pipleline to manage schemas on Schema Registry. A general example for developing and validating an Apache Kafka® client application with a Python producer and consumer is provided in the [kafka-github-actions demo repo](https://github.com/ybyzek/kafka-github-actions). Here is an alternative sample [pom.xml](https://maven.apache.org/guides/introduction/introduction-to-the-pom.html) with project configurations for more detailed validate and register steps. ```bash 4.0.0 io.confluent GitHub-Actions-Demo 1.0 confluent https://packages.confluent.io/maven/ <$CONFLUENT_SCHEMA_REGISTRY_URL> <$CONFLUENT_BASIC_AUTH_USER_INFO> 8.1.0 io.confluent kafka-schema-registry-maven-plugin ${confluent.version} ${schemaRegistryUrl} ${schemaRegistryBasicAuthUserInfo} validate validate validate src/main/resources/order.avsc src/main/resources/flight.proto PROTOBUF set-compatibility validate set-compatibility FORWARD_TRANSITIVE FORWARD_TRANSITIVE test-local validate test-local-compatibility src/main/resources/order.avsc src/main/resources/flight.proto ProtoBuf src/main/resources/flightSchemas src/main/resources/orderSchemas FORWARD_TRANSITIVE FORWARD_TRANSITIVE test-compatibility validate test-compatibility src/main/resources/order.avsc src/main/resources/flight.proto PROTOBUF register register src/main/resources/order.avsc src/main/resources/flight.proto PROTOBUF ``` The following workflows can be coded as GitHub actions to accomplish CICD for schema management. 1. When a pull request is created to merge a new schema to master, validate the schema, check local schema compatibility, set compatibility of subject, and test schema compatibility with subject. 
```properties run: mvn validate ``` The validate step would include: ```bash mvn schema-registry:validate@validate mvn schema-registry:test-local-compatibility@test-local mvn schema-registry:set-compatibility@set-compatibility mvn schema-registry:test-compatibility@test-compatibility ``` Integrated with GitHub Actions, the `pull-request.yaml` for this step might look like this: ```bash name: Testing branch for compatibility before merging on: pull_request: branches: [ master ] paths: [src/main/resources/*] jobs: validate: runs-on: ubuntu-latest steps: - uses: actions/checkout@v2 - uses: actions/setup-java@v2 with: java-version: '11' distribution: 'temurin' cache: maven - name: Validate if schema is valid run: mvn schema-registry:validate@validate test-local-compatibility: needs: validate runs-on: ubuntu-latest steps: - uses: actions/checkout@v2 - uses: actions/setup-java@v2 with: java-version: '11' distribution: 'temurin' cache: maven - name: Test schema with locally present schema run: mvn schema-registry:test-local-compatibility@test-local set-compatibility: needs: test-local-compatibility runs-on: ubuntu-latest steps: - uses: actions/checkout@v2 - uses: actions/setup-java@v2 with: java-version: '11' distribution: 'temurin' cache: maven - name: Set compatibility of subject run: mvn schema-registry:set-compatibility@set-compatibility test-compatibility: needs: set-compatibility runs-on: ubuntu-latest steps: - uses: actions/checkout@v2 - uses: actions/setup-java@v2 with: java-version: '11' distribution: 'temurin' cache: maven - name: Test schema with subject run: mvn schema-registry:test-compatibility@test-compatibility ``` If compatibility checking passes a new pull request is created for approval. 2. Register schema when a pull request is approved and merged to master. Run the action to register the new schema on the Schema Registry: ```bash run: mvn schema-registry:register@register ``` The `push.yaml` for this step would look like this: ```bash name: Registering Schema on merge of pull request on: push: branches: [ master ] paths: [src/main/resources/*] jobs: register-schema: runs-on: ubuntu-latest steps: - uses: actions/checkout@v2 - uses: actions/setup-java@v2 with: java-version: '11' distribution: 'temurin' cache: maven - name: Register Schema run: mvn io.confluent:kafka-schema-registry-maven-plugin:register@register ``` ## Avro deserializer You can plug in `KafkaAvroDeserializer` to `KafkaConsumer` to receive messages of any Avro type from Kafka. In the following example, messages are received with a key of type `string` and a value of type Avro record from Kafka. When getting the message key or value, a `SerializationException` may occur if the data is not well formed. The examples below use the default hostname and port for the Kafka bootstrap server (`localhost:9092`) and Schema Registry (`localhost:8081`). 
```none import org.apache.kafka.clients.consumer.Consumer; import org.apache.kafka.clients.consumer.ConsumerRecord; import org.apache.kafka.clients.consumer.ConsumerRecords; import org.apache.kafka.clients.consumer.KafkaConsumer; import org.apache.kafka.clients.consumer.ConsumerConfig; import org.apache.avro.generic.GenericRecord; import java.io.FileInputStream; import java.io.IOException; import java.io.InputStream; import java.nio.file.Files; import java.nio.file.Paths; import java.util.Arrays; import java.util.Properties; import java.util.Random; Properties props = new Properties(); props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); props.put(ConsumerConfig.GROUP_ID_CONFIG, "group1"); props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer"); props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "io.confluent.kafka.serializers.KafkaAvroDeserializer"); props.put("schema.registry.url", "http://localhost:8081"); props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest"); String topic = "topic1"; final Consumer consumer = new KafkaConsumer(props); consumer.subscribe(Arrays.asList(topic)); try { while (true) { ConsumerRecords records = consumer.poll(100); for (ConsumerRecord record : records) { System.out.printf("offset = %d, key = %s, value = %s \n", record.offset(), record.key(), record.value()); } } } finally { consumer.close(); } ``` With Avro, it is not necessary to use a property to specify a specific type, since the type can be derived directly from the Avro schema, using the namespace and name of the Avro type. This allows the Avro deserializer to be used out of the box with topics that have records of heterogeneous Avro types. This would be the case when using the `RecordNameStrategy` (or `TopicRecordNameStrategy`) to store multiple types in the same topic, as described in Martin Kleppmann’s blog post [Should You Put Several Event Types in the Same Kafka Topic?](https://www.confluent.io/blog/put-several-event-types-kafka-topic/). (An alternative is to use schema references, as described in [Multiple event types in the same topic](#multiple-event-types-same-topic-avro) and [Putting Several Event Types in the Same Topic – Revisited](https://www.confluent.io/blog/multiple-event-types-in-the-same-kafka-topic/)) This differs from the [Protobuf](serdes-protobuf.md#sr-deserializer-protobuf) and [JSON Schema](serdes-json.md#sr-deserializer-json) deserializers, where in order to return a specific rather than a generic type, you must use a specific property. To return a specific type in Avro, you must add the following configuration: ```bash props.put(KafkaAvroDeserializerConfig.SPECIFIC_AVRO_READER_CONFIG, true); ``` Here is a summary of specific and generic return types for each schema format. | | Avro | Protobuf | JSON Schema | |---------------|--------------------------------------------------------------------|--------------------------------------------------------------|----------------------------------------------------------------| | Specific type | Generated class that implements org.apache.avro.SpecificRecord | Generated class that extends com.google.protobuf.Message | Java class (that is compatible with Jackson serialization) | | Generic type | org.apache.avro.GenericRecord | com.google.protobuf.DynamicMessage | com.fasterxml.jackson.databind.JsonNode | ### Confluent Platform 1. Start Confluent Platform using the following command: > ```bash > confluent local services start > ``` > > 2. 
Verify registered schema types.

   > Schema Registry supports arbitrary schema types. You should verify which schema types are currently registered with Schema Registry.
   > To do so, type the following command (assuming you use the default URL and port for Schema Registry, `localhost:8081`):
   > ```bash
   > curl http://localhost:8081/schemas/types
   > ```
   > The response will be one or more of the following. If additional schema format plugins are installed, these will also be available.
   > ```bash
   > ["JSON", "PROTOBUF", "AVRO"]
   > ```
   > Alternatively, use the curl `--silent` flag, and pipe the command through [jq](https://stedolan.github.io/jq/) (`curl --silent http://localhost:8081/schemas/types | jq`) to get nicely formatted output:
   > ```bash
   > [
   >   "JSON",
   >   "PROTOBUF",
   >   "AVRO"
   > ]
   > ```

3. Use the producer to send JSON Schema records in JSON as the message value.

   > The new topic, `transactions-json`, will be created as a part of this producer command if it does not already exist.
   > This command starts a producer, and creates a schema for the `transactions-json` topic. The schema has two fields, `id` and `amount`.
   > ```none
   > kafka-json-schema-console-producer --bootstrap-server localhost:9092 \
   >   --property schema.registry.url=http://localhost:8081 --topic transactions-json \
   >   --property value.schema='{"type":"object", "properties":{"id":{"type":"string"},"amount":{"type":"number"} }}'
   > ```

4. Type the following command in the shell, and hit return.

   > ```none
   > { "id":"1000", "amount":500 }
   > ```

5. Open a new terminal window, and use the consumer to read from topic `transactions-json` and get the value of the message in JSON.

   > ```none
   > kafka-json-schema-console-consumer --bootstrap-server localhost:9092 --from-beginning --topic transactions-json --property schema.registry.url=http://localhost:8081
   > ```
   > You should see the following in the console.
   > ```none
   > {"id":"1000","amount":500}
   > ```
   > Leave this consumer running.

6. Use the producer to send another record as the message value, which includes a new property not explicitly declared in the schema.

   > JSON Schema has an open content model, which allows any number of additional properties to appear in a JSON document without being specified in the JSON schema.
   > This is achieved with `additionalProperties` set to `true`, which is the default. If you do not explicitly disable `additionalProperties` (by setting it to `false`), undeclared properties are allowed in records. These next few steps demonstrate this unique aspect of JSON Schema.
   > Return to the producer session that is already running and send the following message, which includes a new property `"customer_id"` that is not declared in the schema with which we started this producer. (Hit return to send the message.)
   > ```none
   > {"id":"1000","amount":500,"customer_id":"1221"}
   > ```

7. Return to your running consumer to read from topic `transactions-json` and get the new message.

   > You should see the new output added to the original.
   > ```none
   > {"id":"1000","amount":500}
   > {"id":"1000","amount":500,"customer_id":"1221"}
   > ```
   > The message with the new property (`customer_id`) is successfully produced and read. If you try this with the other schema formats (Avro, Protobuf), it will fail at the producer command because those specifications require that all properties be explicitly declared in the schemas.
   > Keep this consumer running.

8. Start a producer and pass a JSON Schema with `additionalProperties` explicitly set to `false`.
    > Return to the producer command window, and stop the producer with Ctrl+C.
    > Type the following in the shell, and press return. This is the same producer and topic (`transactions-json`) used in the previous steps.
    > The schema is almost the same as the previous one, but in this example `additionalProperties` is explicitly set to `false` as a part of the schema.
    >
    > ```none
    > kafka-json-schema-console-producer --bootstrap-server localhost:9092 --property schema.registry.url=http://localhost:8081 --topic transactions-json \
    >   --property value.schema='{"type":"object", "properties":{"id":{"type":"string"},"amount":{"type":"number"} }, "additionalProperties": false}'
    > ```

9. In another shell, use `curl` to get the top-level compatibility configuration.

    > ```none
    > curl --silent -X GET http://localhost:8081/config
    > ```
    >
    > Example result (this is the default):
    >
    > ```none
    > {"compatibilityLevel":"BACKWARD"}
    > ```

10. Update the compatibility requirements globally.

    > ```none
    > curl -X PUT -H "Content-Type: application/vnd.schemaregistry.v1+json" \
    >   --data '{"compatibility": "NONE"}' \
    >   http://localhost:8081/config
    > ```
    >
    > The output will be:
    >
    > ```none
    > {"compatibilityLevel":"NONE"}
    > ```

11. Start a new producer and pass a JSON Schema with `additionalProperties` explicitly set to `false`.

    > (You can shut down the previous producer, and start this one in the same window.)
    >
    > ```none
    > kafka-json-schema-console-producer --bootstrap-server localhost:9092 \
    >   --property schema.registry.url=http://localhost:8081 --topic transactions-json \
    >   --property value.schema='{"type":"object", "properties":{"id":{"type":"string"}, "amount":{"type":"number"} }, "additionalProperties": false}'
    > ```

12. Attempt to use the producer to send another record as the message value, which includes a new property not explicitly declared in the schema.

    > ```none
    > { "id":"1001","amount":500,"customer_id":"this-will-break"}
    > ```
    >
    > This will break. You will get the following error:
    >
    > ```none
    > org.apache.kafka.common.errors.SerializationException: Error serializing JSON message
    > ...
    > Caused by: org.apache.kafka.common.errors.SerializationException: JSON {"id":"1001","amount":500,"customer_id":"this-will-break"} does not match schema
    > {"type":"object","properties":{"id":{"type":"string"},"amount":{"type":"number"}},"additionalProperties":false} at
    > io.confluent.kafka.serializers.json.AbstractKafkaJsonSchemaSerializer.serializeImpl(AbstractKafkaJsonSchemaSerializer.java:132)
    > ... 5 more
    > Caused by: org.everit.json.schema.ValidationException: #: extraneous key [customer_id] is not permitted
    > ...
    > ```
    >
    > The consumer will continue running, but no new messages will be displayed.
    > This is the same behavior you would see by default if using Avro or Protobuf in this scenario.

13. Rerun the producer in default mode as before and send a follow-on message with an undeclared property.

    > In the producer command window, stop the producer with Ctrl+C.
    > Run the original producer command. There is no need to explicitly declare `additionalProperties` as `true` (although you could), as this is the default.
    >
    > ```none
    > kafka-json-schema-console-producer --bootstrap-server localhost:9092 \
    >   --property schema.registry.url=http://localhost:8081 --topic transactions-json \
    >   --property value.schema='{"type":"object", "properties":{"id":{"type":"string"},"amount":{"type":"number"} }}'
    > ```

14. Use the producer to send another record as the message value, which again includes a new property not explicitly declared in the schema.
    > ```none
    > { "id":"1001","amount":500,"customer_id":"1222"}
    > ```

15. Return to the consumer session to read the new message.

    > The consumer should still be running and reading from topic `transactions-json`. You will see the following new message in the console.
    >
    > ```none
    > {"id":"1001","amount":500,"customer_id":"1222"}
    > ```
    >
    > More specifically, if you followed all steps in order and started the consumer with the `--from-beginning` flag
    > as mentioned earlier, the consumer shows a history of all messages sent:
    >
    > ```none
    > {"id":"1000","amount":500}
    > {"id":"1000","amount":500,"customer_id":"1221"}
    > {"id":"1001","amount":500,"customer_id":"1222"}
    > ```

16. In another shell, use this [curl](https://curl.haxx.se/docs/manual.html) command (piped through `jq` for readability) to query the schemas that were registered with Schema Registry as versions 1 and 2.

    > To query version 1 of the schema, type:
    >
    > ```none
    > curl --silent -X GET http://localhost:8081/subjects/transactions-json-value/versions/1/schema | jq .
    > ```
    >
    > Here is the expected output for version 1:
    >
    > ```none
    > {
    >   "type": "object",
    >   "properties": {
    >     "id": {
    >       "type": "string"
    >     },
    >     "amount": {
    >       "type": "number"
    >     }
    >   }
    > }
    > ```
    >
    > To query version 2 of the schema, type:
    >
    > ```none
    > curl --silent -X GET http://localhost:8081/subjects/transactions-json-value/versions/2/schema | jq .
    > ```
    >
    > Here is the expected output for version 2:
    >
    > ```none
    > {
    >   "type": "object",
    >   "properties": {
    >     "id": {
    >       "type": "string"
    >     },
    >     "amount": {
    >       "type": "number"
    >     }
    >   },
    >   "additionalProperties": false
    > }
    > ```

17. View the latest version of the schema in more detail by running this command.

    > ```none
    > curl --silent -X GET http://localhost:8081/subjects/transactions-json-value/versions/latest | jq .
    > ```
    >
    > Here is the expected output of the above command:
    >
    > ```none
    > {
    >   "subject": "transactions-json-value",
    >   "version": 2,
    >   "id": 2,
    >   "schemaType": "JSON",
    >   "schema": "{\"type\":\"object\",\"properties\":{\"id\":{\"type\":\"string\"},\"amount\":{\"type\":\"number\"}},\"additionalProperties\":false}"
    > }
    > ```

18. Use Confluent Control Center to examine schemas and messages.

    > Messages that were successfully produced also show on Control Center ([http://localhost:9021/](http://localhost:9021/))
    > in **Topics > Messages**. You may have to select a partition or jump to a timestamp to see messages sent earlier.
    > (For timestamp, type in a number, which will default to partition `1/Partition: 0`, and press return. To get the message view shown here,
    > select the **cards** icon on the upper right.)
    >
    > ![image](images/serdes-json-c3-messages.png)
    >
    > Schemas you create are available on the **Schemas** tab for the selected topic.
    >
    > ![image](images/serdes-json-c3-schema.png)

19. Run shutdown and cleanup tasks.

    - You can stop the consumer and producer with Ctrl+C in their respective command windows.
    - To stop Confluent Platform, type `confluent local services stop`.
    - If you would like to clear out existing data (topics, schemas, and messages) before starting again with another test, type `confluent local destroy`.

### System topics and security configurations

The following configurations for system topics are available:

- `exporter.config.topic` - Stores configurations for the exporters. The default name for this topic is `_exporter_configs`, and its default/required configuration is: `numPartitions=1`, `replicationFactor=3`, and `cleanup.policy=compact`.
- `exporter.state.topic` - Stores the status of the exporters.
The default name for this topic is `_exporter_states`, and its default/required configuration is: `numPartitions=1`, `replicationFactor=3`, and `cleanup.policy=compact`.

If you are using role-based access control (RBAC), `exporter.config.topic` and `exporter.state.topic` require `ResourceOwner` on these topics, as does the `_schemas` internal topic. See also [Use Role-Based Access Control (RBAC) in Confluent Cloud](https://docs.confluent.io/cloud/current/access-management/access-control/cloud-rbac.html#) and [Configuring Role-Based Access Control for Schema Registry on Confluent Platform](https://docs.confluent.io/platform/current/schema-registry/security/rbac-schema-registry.html).

If you are configuring Schema Registry on Confluent Platform using the [Schema Registry Security Plugin](https://docs.confluent.io/platform/current/confluent-security-plugins/schema-registry/install.html), you must activate both the exporter and the [Schema Registry security plugin](https://docs.confluent.io/platform/current/confluent-security-plugins/schema-registry/install.html#activate-the-plugins) by specifying both extension classes in the `$CONFLUENT_HOME/etc/schema-registry/schema-registry.properties` file:

```bash
resource.extension.class=io.confluent.kafka.schemaregistry.security.SchemaRegistrySecurityResourceExtension,io.confluent.schema.exporter.SchemaExporterResourceExtension
```

The configuration for the exporter resource extension class in the `schema-registry.properties` file is described in [Set up source and destination environments](https://docs.confluent.io/platform/current/schema-registry/schema-linking-cp.html#set-up-source-and-destination-environments) in Schema Linking on Confluent Platform.

## Prerequisites and Setting Schema Registry URLs on the Brokers

Basic requirements to run these examples are generally the same as those described for the [Schema Registry Tutorial](schema_registry_onprem_tutorial.md#sr-tutorial-prereqs), with the exception of Maven, which is not needed here. Also, Confluent Platform version 5.4.0 or later is required.

As an additional prerequisite to enable Schema ID Validation on the brokers, you must specify `confluent.schema.registry.url` in the Kafka `server.properties` file (`$CONFLUENT_HOME/etc/kafka/server.properties`) before you start Confluent Platform. This tells the broker how to connect to Schema Registry. For example:

```none
confluent.schema.registry.url=http://schema-registry:8081
```

This configuration accepts a comma-separated list of URLs for Schema Registry instances. This setting is required to make Schema ID Validation available both from the [Confluent CLI](/ccloud-cli/current/command-reference/index.html) and on the [Control Center for Confluent Platform](https://docs.confluent.io/control-center/current/overview.html).

### Basic Authentication

For this setup, the brokers are configured to authenticate to Schema Registry using [basic authentication](../security/authentication/http-basic-auth/overview.md#http-basic-auth). Define the following settings on each broker (`$CONFLUENT_HOME/etc/kafka/server.properties`).

```bash
confluent.schema.registry.url=http://<schema-registry-host>:<port>
confluent.basic.auth.credentials.source=<credentials-source>
confluent.basic.auth.user.info=<username>:<password> # required only if credentials source is set to USER_INFO
```

- The property `confluent.basic.auth.credentials.source` defines the type of credentials to use (user name and password). These are literals, not variables.
- If you set `confluent.basic.auth.credentials.source` to `USER_INFO`, you must also specify `confluent.basic.auth.user.info`.

## Configure Kafka clients

You can configure the JAAS configuration property for each client in the `producer.properties` or `consumer.properties` file. The login module describes how clients, such as producers and consumers, connect to the Confluent Server broker. The following is an example configuration for a Kafka client to use token authentication:

```bash
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required \
  username="tokenID123" \
  password="lAYYSFmLs4bTjf+lTZ1LCHR/ZZFNA==" \
  tokenauth="true";
```

Clients use the `username` and `password` options to configure the token ID and token HMAC, and the `tokenauth` option to indicate to the server that token authentication is being used. In this example, clients connect to the Confluent Server broker using the token ID `tokenID123`. Different clients within a JVM may connect using different tokens by specifying different token details in `sasl.jaas.config`.

#### NOTE
For details on all required and optional broker configuration properties, see [Kafka Broker and Controller Configuration Reference for Confluent Platform](../../../installation/configuration/broker-configs.md#cp-config-brokers).

1. Configure the truststore, keystore, and password in the `server.properties` file of every broker. Because this stores passwords directly in the broker configuration file, it is important to restrict access to these files using file system permissions.

   ```bash
   ssl.truststore.location=/var/private/ssl/kafka.server.truststore.jks
   ssl.truststore.password=test1234
   ssl.keystore.location=/var/private/ssl/kafka.server.keystore.jks
   ssl.keystore.password=test1234
   ssl.key.password=test1234
   ```

   Note that `ssl.truststore.password` is technically optional, but strongly recommended. If a password is not set, access to the truststore is still available, but integrity checking is disabled.

2. If you want to enable TLS for interbroker communication, add the following to the broker properties file (this setting defaults to `PLAINTEXT`):

   ```bash
   security.inter.broker.protocol=SSL
   ```

3. Configure the ports for the Apache Kafka® brokers to listen for client and interbroker TLS (`SSL`) connections. You should configure `listeners`, and optionally, `advertised.listeners` if the value is different from `listeners`.

   ```bash
   listeners=SSL://kafka1:9093
   advertised.listeners=SSL://localhost:9093
   ```

4. Configure both TLS (`SSL`) ports and `PLAINTEXT` ports if:

   * TLS is not enabled for interbroker communication
   * Some clients connecting to the Confluent Platform cluster do not use TLS

   ```bash
   listeners=PLAINTEXT://kafka1:9092,SSL://kafka1:9093
   advertised.listeners=PLAINTEXT://localhost:9092,SSL://localhost:9093
   ```

   Note that `advertised.host.name` and `advertised.port` configure a single `PLAINTEXT` port and are incompatible with secure protocols. Use `advertised.listeners` instead.

5. To enable the broker to authenticate clients (two-way authentication), you must configure all the brokers for client authentication. Set this to `required` rather than `requested`, because with `requested` misconfigured clients can still connect successfully, which provides a false sense of security.

   ```bash
   ssl.client.auth=required
   ```

#### NOTE
If you specify `ssl.client.auth=required`, client authentication fails if valid client certificates are not provided.
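Putting the preceding steps together, a broker that uses TLS only (no `PLAINTEXT` port) with required client authentication might use a `server.properties` fragment like the following sketch. It simply consolidates the example values shown in the steps above; the host names, paths, and passwords are examples to replace with your own:

```bash
# TLS (SSL) listener for client and interbroker traffic; no PLAINTEXT port
listeners=SSL://kafka1:9093
advertised.listeners=SSL://localhost:9093
security.inter.broker.protocol=SSL

# Trust store and key store (example paths and passwords from the steps above)
ssl.truststore.location=/var/private/ssl/kafka.server.truststore.jks
ssl.truststore.password=test1234
ssl.keystore.location=/var/private/ssl/kafka.server.keystore.jks
ssl.keystore.password=test1234
ssl.key.password=test1234

# Require client certificates (two-way authentication)
ssl.client.auth=required
```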
If you have SASL listeners defined, you can enable mTLS on them in parallel by setting `ssl.client.auth` with the following listener prefix:

```bash
listener.name.<listener-name>.ssl.client.auth
```

For details, see [KIP-684](https://cwiki.apache.org/confluence/display/KAFKA/KIP-684+-+Support+mutual+TLS+authentication+on+SASL_SSL+listeners#KIP684SupportmutualTLSauthenticationonSASL_SSLlisteners-UseadifferentconfigurationoptionforenablingmTLSwithSASL_SSL).

#### SEE ALSO
To see an example Confluent Replicator configuration, refer to the [TLS source authentication demo script](https://github.com/confluentinc/examples/tree/latest//replicator-security/scripts/submit_replicator_source_ssl_auth.sh). For demos of common security configurations, refer to [Replicator security demos](https://github.com/confluentinc/examples/tree/latest//replicator-security).

To configure Confluent Replicator for a destination cluster with TLS authentication, modify the Replicator JSON configuration to include the following:

```bash
{
  "name":"replicator",
  "config":{
    ....
    "dest.kafka.ssl.truststore.location":"/etc/kafka/secrets/kafka.connect.truststore.jks",
    "dest.kafka.ssl.truststore.password":"confluent",
    "dest.kafka.ssl.keystore.location":"/etc/kafka/secrets/kafka.connect.keystore.jks",
    "dest.kafka.ssl.keystore.password":"confluent",
    "dest.kafka.ssl.key.password":"confluent",
    "dest.kafka.security.protocol":"SSL"
    ....
  }
}
```

Additionally, the following properties are required in the Connect worker:

```bash
security.protocol=SSL
ssl.truststore.location=/etc/kafka/secrets/kafka.connect.truststore.jks
ssl.truststore.password=confluent
ssl.keystore.location=/etc/kafka/secrets/kafka.connect.keystore.jks
ssl.keystore.password=confluent
ssl.key.password=confluent

producer.security.protocol=SSL
producer.ssl.truststore.location=/etc/kafka/secrets/kafka.connect.truststore.jks
producer.ssl.truststore.password=confluent
producer.ssl.keystore.location=/etc/kafka/secrets/kafka.connect.keystore.jks
producer.ssl.keystore.password=confluent
producer.ssl.key.password=confluent
```

For more details, see [general security configuration for Connect workers](../../../connect/security.md#connect-security).

### Configure Connect worker-level configurations for connectors

Add the following configurations to enable OAuth authentication for Kafka Connect workers, allowing them to securely produce and consume messages using the SASL_SSL protocol. By specifying the OAUTHBEARER mechanism, these settings ensure that both producers and consumers authenticate using OAuth tokens, leveraging the OAuthBearerLoginCallbackHandler for token management. The use of SASL_SSL ensures that data in transit is encrypted, enhancing the security of your Kafka Connect deployment. Replace the placeholder values with your actual configuration values.
```properties
producer.security.protocol=SASL_SSL
producer.sasl.mechanism=OAUTHBEARER
producer.sasl.login.callback.handler.class=org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginCallbackHandler
producer.sasl.jaas.config=org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required \
  clientId="<client-id>" \
  clientSecret="<client-secret>" \
  scope="<scope>";

consumer.security.protocol=SASL_SSL
consumer.sasl.mechanism=OAUTHBEARER
consumer.sasl.login.callback.handler.class=org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginCallbackHandler
consumer.sasl.jaas.config=org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required \
  clientId="<client-id>" \
  clientSecret="<client-secret>" \
  scope="<scope>";
```

## How to Migrate

This section explains the process to follow to migrate your Confluent Platform clusters from mTLS-based to OAuth authentication.

1. List the existing ACLs, using the following Confluent CLI command:

   ```text
   confluent kafka acl list
   ```

   This command lists the ACLs currently in place. Save this list for comparison after you complete the migration.

2. Make sure you are running Confluent Platform version 7.7 or later.

3. Configure the `AuthenticationHandler` according to [Use the AuthenticationHandler Class for Multi-Protocol Authentication in Confluent Platform](../multi-protocol/authenticationhandler.md#authenticationhandler).

4. Enable OAuth/OIDC for Confluent Server and other Confluent Platform services.

   * [Configure Confluent Schema Registry for OAuth Authentication in Confluent Platform](configure-sr.md#configure-sr-for-oauth)
   * [Configure Kafka Connect for OAuth Authentication in Confluent Platform](configure-connect.md#configure-connect-for-oauth)
   * [Configure Confluent Server Brokers for OAuth Authentication in Confluent Platform](configure-cs.md#configure-cs-for-oauth)

5. Change the authentication for one or more clients from mTLS to OAuth.

6. Bring your cluster up.

7. Restart your client applications.

8. List the ACLs again to verify the principals remain compatible.

   ```text
   confluent kafka acl list
   ```

   Compare the list to the one you created in step 1. If the principals remain the same, no changes to the ACLs are necessary. The ACLs continue to work as before and are evaluated based on the OAuth principal instead of the mTLS principal.

Alternatively, you can use [Ansible Playbooks](https://docs.confluent.io/ansible/current/overview.html) or [Confluent for Kubernetes](https://docs.confluent.io/operator/current/) to upgrade to 7.7 and migrate from mTLS to OAuth.

### Enable RBAC and Metadata Service (MDS)

Brokers are now ready to be RBAC-enabled. Perform these configuration updates for each broker and incrementally update all brokers using a rolling restart.

Configure each broker to use Confluent Server Authorizer. You must retain the ACL provider along with RBAC to ensure that existing ACLs are still applied. Configure at least one principal in `super.users` for brokers in the metadata cluster to enable role bindings to be created for other clusters. In this example, the user `admin` is granted access to create role bindings for any cluster.

```RST
authorizer.class.name=io.confluent.kafka.security.authorizer.ConfluentServerAuthorizer
confluent.authorizer.access.rule.providers=ZK_ACL,CONFLUENT
super.users=User:admin
```

Follow the instructions in [Configure Metadata Service (MDS) in Confluent Platform](../../../kafka/configure-mds/index.md#rbac-mds-config) to create a key pair for MDS. Configure MDS on the broker. You must update paths to the key files to match your setup.
```RST
confluent.metadata.server.listeners=http://0.0.0.0:8090
confluent.metadata.server.advertised.listeners=http://localhost:8090
confluent.metadata.server.authentication.method=BEARER
confluent.metadata.server.token.key.path=<path-to-token-key-pair.pem>
```

If you are using other Confluent Platform components, create a new listener to enable token-based authentication using MDS.

```RST
listeners=EXTERNAL://:9092,INTERNAL://:9093,TOKEN://:9094
advertised.listeners=EXTERNAL://localhost:9092,INTERNAL://localhost:9093,TOKEN://localhost:9094
listener.security.protocol.map=EXTERNAL:SASL_PLAINTEXT,INTERNAL:SASL_PLAINTEXT,TOKEN:SASL_PLAINTEXT
listener.name.token.sasl.enabled.mechanisms=OAUTHBEARER
listener.name.token.oauthbearer.sasl.jaas.config= \
  org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required \
  publicKeyPath="/path/to/publickey.pem";
listener.name.token.oauthbearer.sasl.login.callback.handler.class=io.confluent.kafka.server.plugins.auth.token.TokenBearerServerLoginCallbackHandler
listener.name.token.oauthbearer.sasl.server.callback.handler.class=io.confluent.kafka.server.plugins.auth.token.TokenBearerValidatorCallbackHandler
```

If you are using [LDAP group-based authorization](../ldap/configure.md#kafka-ldap-config) in any of your clusters, you must [configure LDAP](../../csa-introduction.md#confluent-server-authorizer) for brokers running MDS. You must use the prefix `ldap.` in all the LDAP configs. For example:

```RST
ldap.java.naming.provider.url=ldap://LDAPSERVER.EXAMPLE.COM:3268/DC=EXAMPLE,DC=COM
```

If you are enabling RBAC in other Confluent Platform components, you should configure brokers running MDS with LDAP configs that match your LDAP server to enable centralized authentication using LDAP. Refer to [Configure LDAP Authentication](../../../kafka/configure-mds/ldap-auth-mds.md#ldap-auth-mds) for details.

If your metadata cluster has fewer than three brokers, adjust the replication factor for metadata topics. For example:

```RST
confluent.metadata.topic.replication.factor=2
confluent.license.replication.factor=2
```

### Roles for accessing topics, streams, and tables

Use the following Confluent CLI commands to give an interactive user the necessary roles for creating streams and tables.

SHOW or PRINT a topic :

- `ResourceOwner` role on the Kafka topic
- `DeveloperRead` role on the Schema Registry subject, if the topic has an Avro, Protobuf, or JSON_SR schema

This role enables an interactive user to display the specified topic by using the SHOW and PRINT statements. Also, users can CREATE streams and tables from these topics. The ksqlDB service principal doesn’t need a role on the topic for these statements.

#### NOTE
The subject’s name is the topic’s name appended with `-value`.

```bash
# Grant read-only access for a user to read a topic.
confluent iam rbac role-binding create \
  --principal User:$USER_NAME \
  --role ResourceOwner \
  --resource Topic:$TOPIC_NAME \
  $KAFKA_ID
```

```bash
# Grant read-only access for a user to read a subject.
confluent iam rbac role-binding create \ --principal User:$USER_NAME \ --role DeveloperRead \ --resource Subject:${TOPIC_NAME}-value $KAFKA_ID \ --schema-registry-cluster $SR_ID ``` SELECT from a stream or table : - `ResourceOwner` role on the source topic - `DeveloperRead` role on the `_confluent-ksql-${KSQLDB_ID}` consumer group - `ResourceOwner` role on the `_confluent-ksql-${KSQLDB_ID}transient` transient topics, for tables - `DeveloperRead` role on the Schema Registry subject, if the topic has an Avro, Protobuf, or JSON_SR schema - `ResourceOwner` role on the `_confluent-ksql-transient*` subjects, for tables that use Avro (not required for streams) These roles enable a user to read from a stream or a table by using the SELECT statement. If a SELECT statement contains a JOIN that uses an unauthorized topic, the SELECT fails with an authorization error. ```bash # Grant read-only access for a user to read a topic. confluent iam rbac role-binding create \ --principal User:$USER_NAME \ --role ResourceOwner \ --resource Topic:$TOPIC_NAME \ $KAFKA_ID ``` ```bash # For tables: grant access to the transient query topics. # This is a limitation of ksqlDB tables. Giving this permission to # the prefixed topics lets the user view tables from other queries. confluent iam rbac role-binding create \ --principal User:$USER_NAME \ --role ResourceOwner \ --resource Topic:_confluent-ksql-${KSQLDB_ID}transient \ --prefix \ $KAFKA_CLUSTER_ID # For tables that use Avro: grant access to the transient subjects. confluent iam rbac role-binding create \ --principal User:$USER_NAME \ --role ResourceOwner \ --resource Subject:_confluent-ksql-${KSQLDB_ID}transient \ --prefix \ $KAFKA_CLUSTER_ID \ --schema-registry-cluster $SR_ID ``` ```bash # Grant read-only access for a user to read from a consumer group. confluent iam rbac role-binding create \ --principal User:$USER_NAME \ --role DeveloperRead \ --resource Group:_confluent-ksql-${KSQLDB_ID} \ --prefix \ $KAFKA_ID ``` ```bash # Grant read-only access for a user to read a subject. confluent iam rbac role-binding create \ --principal User:$USER_NAME \ --role DeveloperRead \ --resource Subject:${TOPIC_NAME}-value $KAFKA_ID \ --schema-registry-cluster $SR_ID ``` Write to a topic with INSERT : - `DeveloperWrite` role on the Kafka topic - `ResourceOwner` or `DeveloperWrite` role on the Schema Registry subject, if the topic has an Avro, Protobuf, or JSON_SR schema These roles enable a user to write data by using INSERT statements. The INSERT INTO statement contains a SELECT clause that requires the user to have read permissions on the topic in the query. ```bash # Grant write access for a user to a topic. confluent iam rbac role-binding create \ --principal User:$USER_NAME \ --role DeveloperWrite \ --resource Topic:$TOPIC_NAME \ $KAFKA_ID ``` ```bash # Grant full access for a user to create a subject and write to it. confluent iam rbac role-binding create \ --principal User:$USER_NAME \ --role ResourceOwner \ --resource Subject:${TOPIC_NAME}-value $KAFKA_ID \ --schema-registry-cluster $SR_ID ``` CREATE STREAM : - `ResourceOwner` role on the source topic - `DeveloperRead` role on the `_confluent-ksql-${KSQLDB_ID}` consumer groups - `ResourceOwner` or `DeveloperWrite` role on the Schema Registry subject, if the source topic has an Avro, Protobuf, or JSON_SR schema These roles enable an interactive user to register a stream or table on the specified topic by using the CREATE STREAM statement. 
If the topic has an Avro, Protobuf, or JSON_SR schema, the interactive user and the ksqlDB service principal must have full access for the subject in Schema Registry. ```bash # Grant read-only access for a user to a topic. confluent iam rbac role-binding create \ --principal User:$USER_NAME \ --role ResourceOwner \ --resource Topic:$TOPIC_NAME \ $KAFKA_ID # Grant read-only access for the ksql service principal to a topic. confluent iam rbac role-binding create \ --principal User:ksql \ --role ResourceOwner \ --resource Topic:$TOPIC_NAME \ $KAFKA_ID ``` ```bash # Grant read-only access for a user to the ksqlDB consumer groups. confluent iam rbac role-binding create \ --principal User:$USER_NAME \ --role DeveloperRead \ --resource Group:_confluent-ksql-${KSQLDB_ID} \ --prefix \ $KAFKA_ID # Grant read-only access for the ksqlDB service principal to the ksqlDB consumer groups. confluent iam rbac role-binding create \ --principal User:ksql \ --role DeveloperRead \ --resource Group:_confluent-ksql-${KSQLDB_ID} \ --prefix \ $KAFKA_ID ``` ```bash # Grant full access for a user to create a subject and write to it. confluent iam rbac role-binding create \ --principal User:$USER_NAME \ --role ResourceOwner \ --resource Subject:${TOPIC_NAME}-value \ $KAFKA_ID \ --schema-registry-cluster $SR_ID # Grant full access for the ksqlDB service principal to create a subject and write to it. confluent iam rbac role-binding create \ --principal User:ksql \ --role ResourceOwner \ --resource Subject:${TOPIC_NAME}-value \ $KAFKA_ID \ --schema-registry-cluster $SR_ID ``` CREATE TABLE : - `ResourceOwner` role on the source topic - `ResourceOwner` role on the `_confluent-ksql-${KSQLDB_ID}transient` transient topics - `ResourceOwner` or `DeveloperWrite` role on the Schema Registry subject, if the source topic has an Avro, Protobuf, or JSON_SR schema - `ResourceOwner` role on `_confluent-ksql-${KSQLDB_ID}*` subjects (for tables that use Avro, Protobuf, or JSON_SR) These roles enable an interactive user to register a table on the specified topic by using the CREATE TABLE statement. If the topic has an Avro, Protobuf, or JSON_SR schema, the interactive user and the ksqlDB service principal must have full access for the subject in Schema Registry. #### NOTE The `ResourceOwner` role on the transient topics is a limitation of KSQL tables. Giving this permission to the prefixed topics lets the user view tables from other queries. ```bash # Grant read-only access for a user to a topic. confluent iam rbac role-binding create \ --principal User:$USER_NAME \ --role ResourceOwner \ --resource Topic:$TOPIC_NAME \ $KAFKA_ID # Grant read-only access for ksql service principal to a topic. confluent iam rbac role-binding create \ --principal User:ksql \ --role ResourceOwner \ --resource Topic:$TOPIC_NAME \ $KAFKA_ID ``` ```bash # Grant full access for a user to the transient query topics. # This is a limitation of ksqlDB tables. Giving this permission to # the prefixed topics lets the user view tables from other queries. confluent iam rbac role-binding create \ --principal User:$USER_NAME \ --role ResourceOwner \ --resource Topic:_confluent-ksql-${KSQLDB_ID}transient \ --prefix \ $KAFKA_ID # Grant full access for the ksql service principal to ksqlDB transient topics. confluent iam rbac role-binding create \ --principal User:ksql \ --role ResourceOwner \ --resource Topic:_confluent-ksql-${KSQLDB_ID}transient \ --prefix \ $KAFKA_ID ``` ```bash # Grant full access for a user to create a subject and write to it. 
confluent iam rbac role-binding create \ --principal User:$USER_NAME \ --role ResourceOwner \ --resource Subject:${TOPIC_NAME}-value \ $KAFKA_ID \ --schema-registry-cluster $SR_ID # Grant full access for the ksql service principal to create a subject and write to it. confluent iam rbac role-binding create \ --principal User:ksql \ --role ResourceOwner \ --resource Subject:${TOPIC_NAME}-value \ $KAFKA_ID \ --schema-registry-cluster $SR_ID # For tables that use Avro, Protobuf, or JSON_SR: # Grant full access for the ksql service principal to all internal ksql subjects. confluent iam rbac role-binding create \ --principal User:ksql \ --role ResourceOwner \ --resource Subject:_confluent-ksql-${KSQLDB_ID} \ --prefix \ $KAFKA_ID \ --schema-registry-cluster $SR_ID ``` Create a stream or table with a persistent query : - `ResourceOwner` role on the source topic - `ResourceOwner` role on the ksqlDB sink topic - `ResourceOwner` or `DeveloperWrite` role on the Schema Registry sink subject, if the source topic has an Avro, Protobuf, or JSON_SR schema - `DeveloperRead` role on `_confluent-ksql-${KSQLDB_ID}*` subjects (for tables that use Avro, Protobuf, or JSON_SR) These roles enable a user to create streams and tables with persistent queries. Because ksqlDB creates a new *sink topic*, the user must have sufficient permissions to create, read, and write to the sink topic. The `ResourceOwner` role is necessary on the sink topic, because the interactive user and the ksqlDB service principal need permissions to create the sink topic if it doesn’t exist already. #### NOTE The sink topic has the same name as the stream or table and is all uppercase. If the topic has an Avro schema, the interactive user and the ksqlDB service principal must have `ResourceOwner` or `DeveloperWrite` permission on the sink topic’s subject in Schema Registry. For tables that are created with a persistent query and use Avro, Protobuf, or JSON_SR the ksqlDB service principal must have `DeveloperRead` permission on all internal subjects. #### NOTE The subject’s name is the sink topic’s name appended with `-value`. ```bash # Grant read-only access for a user to a topic. confluent iam rbac role-binding create \ --principal User:$USER_NAME \ --role ResourceOwner \ --resource Topic:$SOURCE_TOPIC_NAME \ $KAFKA_ID # Grant read-only access for the ksql service principal to a topic. confluent iam rbac role-binding create \ --principal User:ksql \ --role ResourceOwner \ --resource Topic:$SOURCE_TOPIC_NAME \ $KAFKA_ID ``` ```bash # Grant read-only access for a user to a topic. confluent iam rbac role-binding create \ --principal User:$USER_NAME \ --role ResourceOwner \ --resource Topic:$SINK_TOPIC_NAME \ $KAFKA_ID # Grant read-only access for the ksql service principal to a topic. confluent iam rbac role-binding create \ --principal User:ksql \ --role ResourceOwner \ --resource Topic:$SINK_TOPIC_NAME \ $KAFKA_ID ``` ```bash # Grant full access for a user to create a subject and write to it. confluent iam rbac role-binding create \ --principal User:$USER_NAME \ --role ResourceOwner \ --resource Subject:${SINK_TOPIC_NAME}-value \ $KAFKA_ID \ --schema-registry-cluster $SR_ID # Grant full access for the ksql service principal to create a subject and write to it. 
confluent iam rbac role-binding create \ --principal User:ksql \ --role ResourceOwner \ --resource Subject:${SINK_TOPIC_NAME}-value \ $KAFKA_ID \ --schema-registry-cluster $SR_ID # For tables that use Avro, Protobuf, or JSON_SR created with a persistent query: # Grant read access for the ksql service principal to all internal ksql subjects. confluent iam rbac role-binding create \ --principal User:ksql \ --role DeveloperRead \ --resource Subject:_confluent-ksql-${KSQLDB_ID} \ --prefix \ $KAFKA_ID \ --schema-registry-cluster $SR_ID ``` Full control over a topic and schema : - `ResourceOwner` role on the Kafka topic - `ResourceOwner` role on the Schema Registry subject, if the topic has an Avro, Protobuf, or JSON_SR schema Use the following commands to grant a user full control over a topic and its schema, including permissions to delete the topic and schema. ```bash # Grant full access for a user to manage a topic. confluent iam rbac role-binding create \ --principal User:$USER_NAME \ --role ResourceOwner \ --resource Topic:$TOPIC_NAME \ $KAFKA_ID ``` ```bash # Grant full access for a user to manage a subject. confluent iam rbac role-binding create \ --principal User:$USER_NAME \ --role ResourceOwner \ --resource Subject:${TOPIC_NAME}-value $KAFKA_ID \ --schema-registry-cluster $SR_ID ``` Delete a topic : - `DeveloperManage` role on the Kafka topic - `DeveloperManage` role on the Schema Registry subject, if the topic has an Avro, Protobuf, or JSON_SR schema These roles enable a user to delete a topic by using the DROP STREAM/TABLE [DELETE TOPIC] statements. Use the following commands to grant a user delete access to a topic and corresponding schema. ```bash # Grant delete access for a user to a topic. confluent iam rbac role-binding create \ --principal User:$USER_NAME \ --role DeveloperManage \ --resource Topic:$TOPIC_NAME \ $KAFKA_ID ``` ```bash # Grant delete access for a user to a subject. confluent iam rbac role-binding create \ --principal User:$USER_NAME \ --role DeveloperManage \ --resource Subject:${TOPIC_NAME}-value $KAFKA_ID \ --schema-registry-cluster $SR_ID ``` ### POST /security/1.0/principals/{principal}/roles/{roleName} **Binds the principal to a cluster-scoped role for a specific cluster or in the given scope.** Callable by Admins. * **Parameters:** * **principal** (*string*) – Fully-qualified KafkaPrincipal string for a user or group. * **roleName** (*string*) – The name of the cluster-scoped role to bind the user to. **Example request:** ```http POST /security/1.0/principals/{principal}/roles/{roleName} HTTP/1.1 Host: example.com Content-Type: application/json { "clusterName": "string", "clusters": { "kafka-cluster": "string", "connect-cluster": "string", "ksql-cluster": "string", "schema-registry-cluster": "string", "cmf": "string", "flink-environment": "string" } } ``` * **Status Codes:** * [204 No Content](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.5) – Role Granted * *default* – Error Response **Example response:** ```http HTTP/1.1 default - Content-Type: application/json { "status_code": 1, "error_code": 1, "type": "string", "message": "string", "errors": [ { "error_type": "string", "message": "string" } ] } ``` ### DELETE /security/1.0/principals/{principal}/roles/{roleName} **Remove the role (cluster or resource scoped) from the principal at the given scope/cluster.** No-op if the user doesn’t have the role. Callable by Admins. * **Parameters:** * **principal** (*string*) – Fully-qualified KafkaPrincipal string for a user or group. 
* **roleName** (*string*) – The name of the role. **Example request:** ```http DELETE /security/1.0/principals/{principal}/roles/{roleName} HTTP/1.1 Host: example.com Content-Type: application/json { "clusterName": "string", "clusters": { "kafka-cluster": "string", "connect-cluster": "string", "ksql-cluster": "string", "schema-registry-cluster": "string", "cmf": "string", "flink-environment": "string" } } ``` * **Status Codes:** * [204 No Content](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.5) – Role removal processed. * *default* – Error Response **Example response:** ```http HTTP/1.1 default - Content-Type: application/json { "status_code": 1, "error_code": 1, "type": "string", "message": "string", "errors": [ { "error_type": "string", "message": "string" } ] } ``` ### POST /security/1.0/principals/{principal}/roles/{roleName}/bindings **Incrementally grant the resources to the principal at the given scope/cluster using the given role.** Callable by Admins+ResourceOwners. * **Parameters:** * **principal** (*string*) – Fully-qualified KafkaPrincipal string for a user or group. * **roleName** (*string*) – The name of the role. **Example request:** ```http POST /security/1.0/principals/{principal}/roles/{roleName}/bindings HTTP/1.1 Host: example.com Content-Type: application/json { "scope": { "clusterName": "string", "clusters": { "kafka-cluster": "string", "connect-cluster": "string", "ksql-cluster": "string", "schema-registry-cluster": "string", "cmf": "string", "flink-environment": "string" } }, "resourcePatterns": [ { "resourceType": "string", "name": "string", "patternType": "string" } ] } ``` * **Status Codes:** * [204 No Content](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.5) – Granted * *default* – Error Response **Example response:** ```http HTTP/1.1 default - Content-Type: application/json { "status_code": 1, "error_code": 1, "type": "string", "message": "string", "errors": [ { "error_type": "string", "message": "string" } ] } ``` ### DELETE /security/1.0/principals/{principal}/roles/{roleName}/bindings **Incrementally remove the resources from the principal at the given scope/cluster using the given role.** Callable by Admins+ResourceOwners. * **Parameters:** * **principal** (*string*) – Fully-qualified KafkaPrincipal string for a user or group. * **roleName** (*string*) – The name of the role. **Example request:** ```http DELETE /security/1.0/principals/{principal}/roles/{roleName}/bindings HTTP/1.1 Host: example.com Content-Type: application/json { "scope": { "clusterName": "string", "clusters": { "kafka-cluster": "string", "connect-cluster": "string", "ksql-cluster": "string", "schema-registry-cluster": "string", "cmf": "string", "flink-environment": "string" } }, "resourcePatterns": [ { "resourceType": "string", "name": "string", "patternType": "string" } ] } ``` * **Status Codes:** * [204 No Content](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.5) – Resources Removed * *default* – Error Response **Example response:** ```http HTTP/1.1 default - Content-Type: application/json { "status_code": 1, "error_code": 1, "type": "string", "message": "string", "errors": [ { "error_type": "string", "message": "string" } ] } ``` ### PUT /security/1.0/principals/{principal}/roles/{roleName}/bindings **Overwrite existing resource grants.** Callable by Admins+ResourceOwners. * **Parameters:** * **principal** (*string*) – Fully-qualified KafkaPrincipal string for a user or group. * **roleName** (*string*) – The name of the role. 
**Example request:** ```http PUT /security/1.0/principals/{principal}/roles/{roleName}/bindings HTTP/1.1 Host: example.com Content-Type: application/json { "scope": { "clusterName": "string", "clusters": { "kafka-cluster": "string", "connect-cluster": "string", "ksql-cluster": "string", "schema-registry-cluster": "string", "cmf": "string", "flink-environment": "string" } }, "resourcePatterns": [ { "resourceType": "string", "name": "string", "patternType": "string" } ] } ``` * **Status Codes:** * [204 No Content](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.5) – Resources Set * *default* – Error Response **Example response:** ```http HTTP/1.1 default - Content-Type: application/json { "status_code": 1, "error_code": 1, "type": "string", "message": "string", "errors": [ { "error_type": "string", "message": "string" } ] } ``` ### POST /security/1.0/lookup/principals/{principal}/roleNames **Returns the effective list of role names for a principal.** For groups, these are the roles that are bound. For users, this is the combination of roles granted to the specific user and roles granted to the user’s groups. Callable by Admins+User. * **Parameters:** * **principal** (*string*) – Fully-qualified KafkaPrincipal string for a user or group. **Example request:** ```http POST /security/1.0/lookup/principals/{principal}/roleNames HTTP/1.1 Host: example.com Content-Type: application/json { "clusterName": "string", "clusters": { "kafka-cluster": "string", "connect-cluster": "string", "ksql-cluster": "string", "schema-registry-cluster": "string", "cmf": "string", "flink-environment": "string" } } ``` * **Status Codes:** * [200 OK](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.1) – List of role names. **Example response:** ```http HTTP/1.1 200 OK Content-Type: application/json [ "Cluster Admin", "Security Admin" ] ``` * *default* – Error Response **Example response:** ```http HTTP/1.1 default - Content-Type: application/json { "status_code": 1, "error_code": 1, "type": "string", "message": "string", "errors": [ { "error_type": "string", "message": "string" } ] } ``` ### POST /security/1.0/lookup/role/{roleName} **Look up the KafkaPrincipals who have the given role for the given scope.** Callable by Admins. * **Parameters:** * **roleName** (*string*) – Role name to look up. **Example request:** ```http POST /security/1.0/lookup/role/{roleName} HTTP/1.1 Host: example.com Content-Type: application/json { "clusterName": "string", "clusters": { "kafka-cluster": "string", "connect-cluster": "string", "ksql-cluster": "string", "schema-registry-cluster": "string", "cmf": "string", "flink-environment": "string" } } ``` * **Status Codes:** * [200 OK](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.1) – List of fully-qualified KafkaPrincipals. **Example response:** ```http HTTP/1.1 200 OK Content-Type: application/json [ "User:alice", "Group:FinanceAdmin" ] ``` * *default* – Error Response **Example response:** ```http HTTP/1.1 default - Content-Type: application/json { "status_code": 1, "error_code": 1, "type": "string", "message": "string", "errors": [ { "error_type": "string", "message": "string" } ] } ``` ### POST /security/1.0/lookup/role/{roleName}/resource/{resourceType}/name/{resourceName} **Look up the KafkaPrincipals who have the given role on the specified resource for the given scope.** Callable by Admins. * **Parameters:** * **roleName** (*string*) – Role name to look up. * **resourceType** (*string*) – Type of resource to look up. 
* **resourceName** (*string*) – Name of resource to look up. **Example request:** ```http POST /security/1.0/lookup/role/{roleName}/resource/{resourceType}/name/{resourceName} HTTP/1.1 Host: example.com Content-Type: application/json { "clusterName": "string", "clusters": { "kafka-cluster": "string", "connect-cluster": "string", "ksql-cluster": "string", "schema-registry-cluster": "string", "cmf": "string", "flink-environment": "string" } } ``` * **Status Codes:** * [200 OK](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.1) – List of fully-qualified KafkaPrincipals. **Example response:** ```http HTTP/1.1 200 OK Content-Type: application/json [ "User:alice", "Group:FinanceAdmin" ] ``` * *default* – Error Response **Example response:** ```http HTTP/1.1 default - Content-Type: application/json { "status_code": 1, "error_code": 1, "type": "string", "message": "string", "errors": [ { "error_type": "string", "message": "string" } ] } ``` ### GET /security/1.0/registry/clusters **Returns a list of all clusters in the registry, optionally filtered by cluster type.** If the calling principal doesn’t have permissions to see the full cluster info, some information (“hosts”, “protocol”, etc) is redacted. Callable by Admins+User. * **Query Parameters:** * **clusterType** (*string*) – Optionally filter down by cluster type. **Example request:** ```http GET /security/1.0/registry/clusters HTTP/1.1 Host: example.com ``` * **Status Codes:** * [200 OK](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.1) – List of Clusters. **Example response:** ```http HTTP/1.1 200 OK Content-Type: application/json [ { "clusterName": "string", "scope": { "clusters": { "kafka-cluster": "string", "connect-cluster": "string", "ksql-cluster": "string", "schema-registry-cluster": "string", "cmf": "string", "flink-environment": "string" } }, "hosts": [ { "host": "string", "port": 1 } ], "protocol": "SASL_PLAINTEXT" } ] ``` * *default* – Error Response **Example response:** ```http HTTP/1.1 default - Content-Type: application/json { "status_code": 1, "error_code": 1, "type": "string", "message": "string", "errors": [ { "error_type": "string", "message": "string" } ] } ``` ### POST /security/1.0/registry/clusters **Define/overwrite named clusters.** May result in a 409 Conflict if the name and scope combination of any cluster conflicts with existing clusters in the registry. Callable by Admins. **Example request:** ```http POST /security/1.0/registry/clusters HTTP/1.1 Host: example.com Content-Type: application/json [ { "clusterName": "string", "scope": { "clusters": { "kafka-cluster": "string", "connect-cluster": "string", "ksql-cluster": "string", "schema-registry-cluster": "string", "cmf": "string", "flink-environment": "string" } }, "hosts": [ { "host": "string", "port": 1 } ], "protocol": "SASL_PLAINTEXT" } ] ``` * **Status Codes:** * [204 No Content](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.5) – Clusters added. * *default* – Error Response **Example response:** ```http HTTP/1.1 default - Content-Type: application/json { "status_code": 1, "error_code": 1, "type": "string", "message": "string", "errors": [ { "error_type": "string", "message": "string" } ] } ``` ### GET /security/1.0/registry/clusters/{clusterName} **Returns the information for a single named cluster, assuming the cluster exists and is visible to the calling principal.** Callable by Admins+User. * **Parameters:** * **clusterName** (*string*) – The name of cluster (ASCII printable characters without spaces). 
**Example request:** ```http GET /security/1.0/registry/clusters/{clusterName} HTTP/1.1 Host: example.com ``` * **Status Codes:** * [200 OK](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.1) – The named cluster, if it exists and the caller has permission to see it. **Example response:** ```http HTTP/1.1 200 OK Content-Type: application/json { "clusterName": "string", "scope": { "clusters": { "kafka-cluster": "string", "connect-cluster": "string", "ksql-cluster": "string", "schema-registry-cluster": "string", "cmf": "string", "flink-environment": "string" } }, "hosts": [ { "host": "string", "port": 1 } ], "protocol": "SASL_PLAINTEXT" } ``` * *default* – Error Response **Example response:** ```http HTTP/1.1 default - Content-Type: application/json { "status_code": 1, "error_code": 1, "type": "string", "message": "string", "errors": [ { "error_type": "string", "message": "string" } ] } ``` ### GET /security/1.0/lookup/managed/clusters/principal/{principal} **Identifies the scopes for the rolebindings that a user can see.** May include rolebindings from scopes and clusters that never existed or previously existed (in other words, rolebindings that have been decommissioned, but are still defined in the system). Callable by Admins+User. * **Parameters:** * **principal** (*string*) – Fully-qualified KafkaPrincipal string for a user or group. * **Query Parameters:** * **clusterType** (*string*) – Filter down by cluster type. **Example request:** ```http GET /security/1.0/lookup/managed/clusters/principal/{principal} HTTP/1.1 Host: example.com ``` * **Status Codes:** * [200 OK](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.1) – List of Scopes **Example response:** ```http HTTP/1.1 200 OK Content-Type: application/json [ { "clusters": { "kafka-cluster": "string", "connect-cluster": "string", "ksql-cluster": "string", "schema-registry-cluster": "string", "cmf": "string", "flink-environment": "string" } } ] ``` * *default* – Error Response **Example response:** ```http HTTP/1.1 default - Content-Type: application/json { "status_code": 1, "error_code": 1, "type": "string", "message": "string", "errors": [ { "error_type": "string", "message": "string" } ] } ``` ### GET /security/1.0/lookup/rolebindings/principal/{principal} **List all rolebindings for the specified principal for all scopes and clusters that have any rolebindings.** Be aware that this simply looks at the rolebinding data, and does not mean that the clusters actually exist. Callable by Admins+User. * **Parameters:** * **principal** (*string*) – Fully-qualified KafkaPrincipal string for a user or group. * **Query Parameters:** * **clusterType** (*string*) – Filter down by a cluster type. **Example request:** ```http GET /security/1.0/lookup/rolebindings/principal/{principal} HTTP/1.1 Host: example.com ``` * **Status Codes:** * [200 OK](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.1) – List of RoleBindings for the user per scope.
**Example response:** ```http HTTP/1.1 200 OK Content-Type: application/json [ { "scope": { "clusterName": "string", "clusters": { "kafka-cluster": "string", "connect-cluster": "string", "ksql-cluster": "string", "schema-registry-cluster": "string", "cmf": "string", "flink-environment": "string" } }, "rolebindings": {} } ] ``` * *default* – Error Response **Example response:** ```http HTTP/1.1 default - Content-Type: application/json { "status_code": 1, "error_code": 1, "type": "string", "message": "string", "errors": [ { "error_type": "string", "message": "string" } ] } ``` ### POST /security/1.0/lookup/rolebindings/principal/{principal} **List all rolebindings for the specified principal and scope.** Callable by Admins+User. * **Parameters:** * **principal** (*string*) – Fully-qualified KafkaPrincipal string for a user or group. **Example request:** ```http POST /security/1.0/lookup/rolebindings/principal/{principal} HTTP/1.1 Host: example.com Content-Type: application/json { "clusters": { "kafka-cluster": "string", "connect-cluster": "string", "ksql-cluster": "string", "schema-registry-cluster": "string", "cmf": "string", "flink-environment": "string" } } ``` * **Status Codes:** * [200 OK](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.1) – Item per Scope **Example response:** ```http HTTP/1.1 200 OK Content-Type: application/json { "scope": { "clusterName": "string", "clusters": { "kafka-cluster": "string", "connect-cluster": "string", "ksql-cluster": "string", "schema-registry-cluster": "string", "cmf": "string", "flink-environment": "string" } }, "rolebindings": {} } ``` * *default* – Error Response **Example response:** ```http HTTP/1.1 default - Content-Type: application/json { "status_code": 1, "error_code": 1, "type": "string", "message": "string", "errors": [ { "error_type": "string", "message": "string" } ] } ``` ### POST /security/1.0/lookup/managed/clusters/principal/{principal} **Identify the rolebinding abilities (view vs manage) the user has on the specified scope.** Used by the Confluent Control Center UI to control access to rolebinding add/remove buttons. Callable by Admins+ResourceOwners. * **Parameters:** * **principal** (*string*) – Fully-qualified KafkaPrincipal string for a user or group. **Example request:** ```http POST /security/1.0/lookup/managed/clusters/principal/{principal} HTTP/1.1 Host: example.com Content-Type: application/json { "clusters": { "kafka-cluster": "string", "connect-cluster": "string", "ksql-cluster": "string", "schema-registry-cluster": "string", "cmf": "string", "flink-environment": "string" } } ``` * **Status Codes:** * [200 OK](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.1) – The rolebinding abilities the user has for a specified scope. **Example response:** ```http HTTP/1.1 200 OK Content-Type: application/json { "cluster": [ "string" ], "resources": {} } ``` * *default* – Error Response **Example response:** ```http HTTP/1.1 default - Content-Type: application/json { "status_code": 1, "error_code": 1, "type": "string", "message": "string", "errors": [ { "error_type": "string", "message": "string" } ] } ``` ### POST /security/1.0/lookup/managed/rolebindings/principal/{principal} **Identify the rolebindings this user can see and manage.** Callable by Admins+ResourceOwners. * **Parameters:** * **principal** (*string*) – Fully-qualified KafkaPrincipal string for a user or group. * **Query Parameters:** * **resourceType** (*string*) – Filter down by resource type.
**Example request:** ```http POST /security/1.0/lookup/managed/rolebindings/principal/{principal} HTTP/1.1 Host: example.com Content-Type: application/json { "clusters": { "kafka-cluster": "string", "connect-cluster": "string", "ksql-cluster": "string", "schema-registry-cluster": "string", "cmf": "string", "flink-environment": "string" } } ``` * **Status Codes:** * [200 OK](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.1) – Rolebindings that the user can manage. **Example response:** ```http HTTP/1.1 200 OK Content-Type: application/json { "scope": { "clusters": { "kafka-cluster": "string", "connect-cluster": "string", "ksql-cluster": "string", "schema-registry-cluster": "string", "cmf": "string", "flink-environment": "string" } }, "cluster_role_bindings": {}, "resource_role_bindings": {} } ``` * *default* – Error Response **Example response:** ```http HTTP/1.1 default - Content-Type: application/json { "status_code": 1, "error_code": 1, "type": "string", "message": "string", "errors": [ { "error_type": "string", "message": "string" } ] } ``` ### POST /security/1.0/rbac/principals **List of MDS cached users and groups.** For use by a rolebinding admin on the provided scope. Callable by Admins+ResourceOwners, but not broker super.users. * **Query Parameters:** * **type** (*string*) – The type of principals requested. **Example request:** ```http POST /security/1.0/rbac/principals HTTP/1.1 Host: example.com Content-Type: application/json { "clusters": { "kafka-cluster": "string", "connect-cluster": "string", "ksql-cluster": "string", "schema-registry-cluster": "string", "cmf": "string", "flink-environment": "string" } } ``` * **Status Codes:** * [200 OK](https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.2.1) – The list of principals for the requested type, or all principals. **Example response:** ```http HTTP/1.1 200 OK Content-Type: application/json [ "Group:admin", "Group:developers", "Group:users", "User:alice", "User:bob", "User:charlie", "User:david" ] ``` * *default* – Error Response **Example response:** ```http HTTP/1.1 default - Content-Type: application/json { "status_code": 1, "error_code": 1, "type": "string", "message": "string", "errors": [ { "error_type": "string", "message": "string" } ] } ```

## Grant topic permissions

To interact with topics using the [Kafka CLI tools](../../../tools/cli-reference.md#cp-all-cli), you must provide a JAAS configuration that enables the Kafka CLI tools to authenticate with a broker. You can provide the JAAS configuration using a file (`--command-config`) or using the command line options `--producer-property` or `--consumer-property` for the producer or consumer. This configuration is required for creating topics, producing, consuming, and more. For example:

```none
kafka-console-producer --producer-property sasl.mechanism=OAUTHBEARER
```

The value you specify in `sasl.mechanism` depends on your broker’s security configuration for the port. In this case, OAUTHBEARER is used because it is the default configuration in the automated RBAC demo. However, you can use any authentication mechanism exposed by your broker.

```bash
# Grant read-only access for a user to a topic.
confluent iam rbac role-binding create \
   --principal User:<user-name> \
   --role DeveloperRead \
   --resource Topic:<topic-name> \
```

When creating role bindings for Schema Registry, ksqlDB, and Connect, you must provide two identifiers: the Kafka cluster identifier and an additional component cluster identifier.
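For example, a role binding on a Schema Registry subject must pass both the Kafka cluster identifier and the Schema Registry cluster identifier. The following is a minimal sketch that reuses the variable conventions (`$USER_NAME`, `$TOPIC_NAME`, `$KAFKA_ID`, `$SR_ID`) from the role-binding examples earlier in this section; it assigns the `DeveloperWrite` role on the subject for a topic:

```bash
# Grant write access for a user to a subject in a Schema Registry cluster.
# Both the Kafka cluster identifier and the Schema Registry cluster
# identifier are provided, following the pattern shown earlier.
confluent iam rbac role-binding create \
   --principal User:$USER_NAME \
   --role DeveloperWrite \
   --resource Subject:${TOPIC_NAME}-value \
   $KAFKA_ID \
   --schema-registry-cluster $SR_ID
```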
## Prerequisites - [Migrate legacy audit log configurations](audit-logs-cli-config.md#migrate-kafka-cluster-audit-log-configs) from all of your Kafka clusters into a combined JSON policy. #### IMPORTANT You must satisfy this prerequisite before registering your Kafka clusters. - [Register](../../cluster-registry.md#cluster-registry-registering) all of your Kafka clusters, including the MDS cluster, in the [Cluster Registry in Confluent Platform](../../cluster-registry.md#cluster-registry). #### NOTE MDS cluster registration does not occur by default. You must explicitly register the MDS cluster in the cluster registry before registering other clusters. - Configure all of your registered clusters to use the same MDS for RBAC. - The MDS cluster uses the admin client to communicate with registered clusters (managed clusters). Ensure that the MDS admin client can connect to all of your registered clusters by having them expose an authentication token listener (for example, `listener.name.external.sasl.enabled.mechanisms=OAUTHBEARER`), and registering that listener’s port in the cluster registry. When using SASL_SSL, only use TLS keys that are verifiable by certificates in your client trust stores. - Configure a cluster to receive the audit logs. Set up an audit log writer user (with a name like `auditlogwriter`) on that cluster with the ability to write to the destination topics. For example, grant the DeveloperWrite role on the topic prefix `confluent-audit-log-events`. For details, refer to [Configure the audit log writer to the destination cluster](#mds-cluster-exporter-config). - Grant the [AuditAdmin](../../authorization/rbac/rbac-predefined-roles.md#rbac-predefined-roles) role on all your Kafka clusters to users or groups who will be managing the audit log configuration. #### NOTE The recommended way to grant permissions is for the audit log administrator to run any of the Confluent CLI [confluent audit-log](https://docs.confluent.io/confluent-cli/current/command-reference/audit-log/index.html) commands. If the administrator does not yet have the required permissions, the error message returns a list of recommended role bindings to grant to the user: ```none confluent login --url "http://mds.example.com:8090" # authenticate as user "alice" confluent audit-log config describe Error: 403 Forbidden User:alice not permitted to DescribeConfigs on one or more clusters. Fix it: confluent iam rbac role-binding create --role AuditAdmin --principal User:alice --kafka-cluster DBS26_qTQ-mT23p5opUK_g confluent iam rbac role-binding create --role AuditAdmin --principal User:alice --kafka-cluster prz9a_-xqqlRgmekDoLw4U ``` ### Docker configuration When you enable security for the Confluent Platform, you must pass secrets (for example, credentials, certificates, keytabs, Kerberos config) to the container. The images handle this by using the credentials available in the secrets directory. The containers specify a Docker volume for secrets, which the admin must map to a directory on the host that contains the required secrets.
For example, if the `securities.properties` file is located on the host in `/scripts/security`, and you want it mounted at `/secrets` in the Docker container, then you would specify: ```yaml volumes: - ./scripts/security:/secrets ``` To configure secrets protection in Docker images, you must manually add the following configuration to the `docker-compose.yml` file: ```yaml CONFLUENT_SECURITY_MASTER_KEY: <COMPONENT>_CONFIG_PROVIDERS: "securepass" <COMPONENT>_CONFIG_PROVIDERS_SECUREPASS_CLASS: "io.confluent.kafka.security.config.provider.SecurePassConfigProvider" ``` `<COMPONENT>` can be any of the following: - `KAFKA` - `KSQL` - `CONNECT` - `SCHEMA_REGISTRY` - `CONTROL_CENTER` For details about Docker configuration options, refer to [Docker Image Configuration Reference for Confluent Platform](../../../installation/docker/config-reference.md#config-reference). For a Confluent Server broker, your configuration should look like the following: ```yaml CONFLUENT_SECURITY_MASTER_KEY: KAFKA_CONFIG_PROVIDERS: "securepass" KAFKA_CONFIG_PROVIDERS_SECUREPASS_CLASS: "io.confluent.kafka.security.config.provider.SecurePassConfigProvider" ``` ## Opening multiple ports Alternatively, you might choose to open multiple ports so that different protocols can be used for broker-broker and broker-client communication. If you want to use TLS encryption throughout (for example, for broker-broker and broker-client communication), but also want to add SASL authentication to the broker-client connection: 1. Open two additional ports during the first restart: ```bash listeners=PLAINTEXT://broker1:9091,SSL://broker1:9092,SASL_SSL://broker1:9093 ``` 2. Restart the Kafka clients again, changing their configuration to point at the newly-opened, SASL and TLS secured port: ```bash bootstrap.servers=[broker1:9093,...] security.protocol=SASL_SSL ...etc ``` For more details, refer to [SASL](authentication/overview.md#kafka-sasl-auth). 3. The second server restart would switch the cluster to use encrypted broker-broker communication using the TLS port you previously opened on port 9092: ```bash listeners=PLAINTEXT://broker1:9091,SSL://broker1:9092,SASL_SSL://broker1:9093 security.inter.broker.protocol=SSL ``` 4. The final restart secures the cluster by closing the `PLAINTEXT` port: ```bash listeners=SSL://broker1:9092,SASL_SSL://broker1:9093 security.inter.broker.protocol=SSL ``` ### Step 3 - Start the consumer with decryption To start the consumer with decryption, run the `kafka-avro-console-consumer` command for the KMS provider that you want to use, where `<bootstrap-url>` is the bootstrap URL for your Confluent Platform cluster. ```shell ./bin/kafka-avro-console-consumer --bootstrap-server <bootstrap-url> \ --topic test \ --property schema.registry.url=<schema-registry-url> \ --consumer.config config.properties ``` After you run the producer and consumer, you can verify that the data is encrypted and decrypted by using the `kafka-configs --describe` command for the topic. ```shell kafka-configs --bootstrap-server <bootstrap-url> \ --entity-type topics \ --entity-name test \ --describe ``` Example test records should look like this: ```text {"f1": "foo"} {"f1": "foo", "f2": {"string": "bar"}} ``` ## Configure Confluent Server brokers Administrators can configure a mix of secure and unsecured clients. This tutorial ensures that all broker/client and interbroker network communication is encrypted in the following manner: * All broker/client communication uses the `SASL_SSL` security protocol, which ensures that the communication is encrypted and authenticated using SASL/PLAIN.
* All interbroker communication uses the `SSL` security protocol, which ensures that the communication is encrypted and authenticated using TLS. * The unsecured `PLAINTEXT` port is not enabled. The steps are as follows: 1. Enable the desired security protocols and ports in each Confluent Server broker’s `server.properties`. Notice that both `SSL` and `SASL_SSL` are enabled. ```bash listeners=SSL://:9093,SASL_SSL://:9094 # KRaft-specific configurations for the broker role # process.roles should be 'broker' for a dedicated broker node, 'controller' for a dedicated controller node, # or 'broker,controller' for a combined node. process.roles=broker,controller node.id={unique_node_id} # Unique ID for this broker/controller node # The list of controller nodes in the KRaft quorum. # Format: {node_id}@{host}:{port},{node_id}@{host}:{port},... # For example: 1@controller1:9093,2@controller2:9093,3@controller3:9093 controller.quorum.voters={node_id_1}@{host_1}:{port_1},{node_id_2}@{host_2}:{port_2},{node_id_3}@{host_3}:{port_3} ``` 2. To enable the Confluent Server brokers to authenticate each other using mutual TLS (mTLS) authentication, you need to configure all the Confluent Server brokers for client authentication (in this case, the requesting broker is the “client”). We recommend setting `ssl.client.auth=required`. We discourage configuring it as `requested` because misconfigured brokers will still connect successfully and it provides a false sense of security. ```bash security.inter.broker.protocol=SSL ssl.client.auth=required ``` 3. Define the TLS/SSL truststore, keystore, and password in the `server.properties` file of every Confluent Server broker. Because this stores passwords directly in the Confluent Server broker configuration file, it is important to restrict access to these files using file system permissions. ```bash ssl.truststore.location=/var/ssl/private/kafka.server.truststore.jks ssl.truststore.password=test1234 ssl.keystore.location=/var/ssl/private/kafka.server.keystore.jks ssl.keystore.password=test1234 ssl.key.password=test1234 ``` 4. Enable SASL/PLAIN mechanism in the `server.properties` file of every broker. ```bash sasl.enabled.mechanisms=PLAIN ``` 5. Create the broker’s JAAS configuration file in each Confluent Server broker’s `config` directory, let’s call it `kafka_server_jaas.conf` for this example. * Configure a `KafkaServer` section used when the broker validates client connections, including those from other brokers. The broker properties `username` and `password` are used to initiate connections to other brokers, and in this example, `kafkabroker` is the user for interbroker communication. The `user_{userName}` property set defines the passwords for all other clients that connect to the broker. In this example, there are three users: `kafkabroker`, `kafka-broker-metric-reporter`, and `client`. #### NOTE Note the two semicolons in each section. ```bash KafkaServer { org.apache.kafka.common.security.plain.PlainLoginModule required username="kafkabroker" password="kafkabroker-secret" user_kafkabroker="kafkabroker-secret" user_kafka-broker-metric-reporter="kafkabroker-metric-reporter-secret" user_client="client-secret"; }; ``` 6. If you are using Confluent Control Center to monitor your deployment, and if the monitoring cluster backing Confluent Control Center is also configured with the same security protocols, you must configure the Confluent Metrics Reporter for security as well. Add these configurations to the `server.properties` file of each Confluent Server broker.
```bash metric.reporters=io.confluent.metrics.reporter.ConfluentMetricsReporter confluent.metrics.reporter.security.protocol=SASL_SSL confluent.metrics.reporter.ssl.truststore.location=/var/ssl/private/kafka.server.truststore.jks confluent.metrics.reporter.ssl.truststore.password=test1234 confluent.metrics.reporter.sasl.mechanism=PLAIN confluent.metrics.reporter.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \ username="kafka-broker-metric-reporter" \ password="kafka-broker-metric-reporter-secret"; ``` 7. To enable ACLs, we need to configure an authorizer. Kafka provides a simple authorizer implementation, and to use it, you can add the following to `server.properties`: ```shell authorizer.class.name=kafka.security.authorizer.AclAuthorizer ``` 8. The default behavior is such that if a resource has no associated ACLs, then no one is allowed to access the resource, except super users. Setting Confluent Server broker principals as super users is a convenient way to give them the required access to perform interbroker operations. Because this tutorial configures the interbroker security protocol as SSL, set the super user name to be the `distinguished name` configured in the broker’s certificate. (See other [authorization configuration options](authorization/acls/overview.md#kafka-auth-superuser)). ```bash super.users=User:<broker1-dn>;User:<broker2-dn>;User:<broker3-dn>;User:<controller-dn>;User:kafka-broker-metric-reporter ``` Combining the configuration steps described above gives the complete contents of each Confluent Server broker’s `server.properties` file. ### Configure Console Producer and Consumer The command line tools for console producer and consumer are convenient ways to send and receive a small amount of data to the cluster. They are clients and thus need security configurations as well. 1. Create a `client_security.properties` file with the security configuration parameters described above, with no additional configuration prefix. ```bash security.protocol=SASL_SSL ssl.truststore.location=/var/ssl/private/kafka.client.truststore.jks ssl.truststore.password=test1234 sasl.mechanism=PLAIN sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \ username="client" \ password="client-secret"; ``` 2. Pass in the properties file when using the command line tools. ```bash kafka-console-producer --bootstrap-server kafka1:9094 --topic test-topic --producer.config client_security.properties kafka-console-consumer --bootstrap-server kafka1:9094 --topic test-topic --consumer.config client_security.properties ``` ## Replicator Confluent Replicator is a type of Confluent Platform source connector that replicates data from a source to a destination Confluent Platform cluster. An embedded consumer inside Replicator consumes data from the source cluster, and an embedded producer inside the Kafka Connect worker produces data to the destination cluster.
Take the basic client security configuration: ```bash security.protocol=SASL_SSL ssl.truststore.location=/var/ssl/private/kafka.client.truststore.jks ssl.truststore.password=test1234 sasl.mechanism=PLAIN sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \ username="client" \ password="client-secret"; ``` And configure Replicator for the following: * Top-level Replicator consumer from the origin cluster, with an additional configuration prefix `src.kafka.` Combining the configuration steps described above, the Replicator JSON properties file contains the following configuration settings: ```bash { "name":"replicator", "config":{ .... "src.kafka.security.protocol" : "SASL_SSL", "src.kafka.ssl.truststore.location" : "var/private/ssl/kafka.server.truststore.jks", "src.kafka.ssl.truststore.password" : "test1234", "src.kafka.sasl.mechanism" : "PLAIN", "src.kafka.sasl.jaas.config" : "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"replicator\" password=\"replicator-secret\";", .... } } ``` #### default.timestamp.extractor A timestamp extractor pulls a timestamp from an instance of [ConsumerRecord](/platform/current/clients/javadocs/javadoc/org/apache/kafka/clients/consumer/ConsumerRecord.html). Timestamps are used to control the progress of streams. The default extractor is [FailOnInvalidTimestamp](/platform/current/streams/javadocs/javadoc/org/apache/kafka/streams/processor/FailOnInvalidTimestamp.html). This extractor retrieves built-in timestamps that are automatically embedded into Kafka messages by the Kafka producer client since [Kafka version 0.10](https://cwiki.apache.org/confluence/x/eaSnAw). Depending on the setting of Kafka’s server-side `log.message.timestamp.type` broker and `message.timestamp.type` topic parameters, this extractor provides you with: * **event-time** processing semantics if `log.message.timestamp.type` is set to `CreateTime` aka “producer time” (which is the default). This represents the time when a Kafka producer sent the original message. If you use Kafka’s official producer client or one of Confluent’s producer clients, the timestamp represents milliseconds since the epoch. * **ingestion-time** processing semantics if `log.message.timestamp.type` is set to `LogAppendTime` aka “broker time”. This represents the time when the Kafka broker received the original message, in milliseconds since the epoch. The `FailOnInvalidTimestamp` extractor throws an exception if a record contains an invalid, that is, negative, built-in timestamp, because Kafka Streams would not process this record but silently drop it. Invalid built-in timestamps can occur for various reasons: if, for example, you consume a topic that is written to by pre-0.10 Kafka producer clients or by third-party producer clients that don’t support the new Kafka 0.10 message format yet; another situation in which this may happen is after upgrading your Kafka cluster from `0.9` to `0.10`, where all the data that was generated with `0.9` does not include the `0.10` message timestamps. If you have data with invalid timestamps and want to process it, then there are two alternative extractors available. Both work on built-in timestamps, but handle invalid timestamps differently. 
* [LogAndSkipOnInvalidTimestamp](/platform/current/streams/javadocs/javadoc/org/apache/kafka/streams/processor/LogAndSkipOnInvalidTimestamp.html): This extractor logs a warning message and returns the invalid timestamp to Kafka Streams, which will not process but silently drop the record. This log-and-skip strategy allows Kafka Streams to make progress instead of failing if there are records with an invalid built-in timestamp in your input data. * [UsePartitionTimeOnInvalidTimestamp](/platform/current/streams/javadocs/javadoc/org/apache/kafka/streams/processor/UsePartitionTimeOnInvalidTimestamp.html): This extractor returns the record’s built-in timestamp if it is valid, that is, not negative. If the record does not have a valid built-in timestamp, the extractor returns the previously extracted valid timestamp from a record of the same topic partition as the current record as a timestamp estimation. If no timestamp can be estimated, it throws an exception. Another built-in extractor is [WallclockTimestampExtractor](/platform/current/streams/javadocs/javadoc/org/apache/kafka/streams/processor/WallclockTimestampExtractor.html). This extractor does not actually “extract” a timestamp from the consumed record but rather returns the current time in milliseconds from the system clock (`System.currentTimeMillis()`), which effectively means Kafka Streams operates on the basis of the so-called **processing-time** of events. You can also provide your own timestamp extractors, for instance to retrieve timestamps embedded in the payload of messages. If you can’t extract a valid timestamp, you can either throw an exception, return a negative timestamp, or estimate a timestamp. Returning a negative timestamp results in data loss, as the corresponding record isn’t processed, but instead, it’s dropped silently. If you want to estimate a new timestamp, you can use the value provided by `previousTimestamp`, that is, a Kafka Streams timestamp estimation. Here is an example of a custom `TimestampExtractor` implementation: ```java import org.apache.kafka.clients.consumer.ConsumerRecord; import org.apache.kafka.streams.processor.TimestampExtractor; // Extracts the embedded timestamp of a record (giving you "event-time" semantics). public class MyEventTimeExtractor implements TimestampExtractor { @Override public long extract(final ConsumerRecord<Object, Object> record, final long previousTimestamp) { // `Foo` is your own custom class, which we assume has a method that returns // the embedded timestamp (milliseconds since midnight, January 1, 1970 UTC). long timestamp = -1; final Foo myPojo = (Foo) record.value(); if (myPojo != null) { timestamp = myPojo.getTimestampInMillis(); } if (timestamp < 0) { // Invalid timestamp! Attempt to estimate a new timestamp, // otherwise fall back to wall-clock time (processing-time). if (previousTimestamp >= 0) { return previousTimestamp; } else { return System.currentTimeMillis(); } } return timestamp; } } ``` You would then define the custom timestamp extractor in your Kafka Streams configuration as follows: ```java import java.util.Properties; import org.apache.kafka.streams.StreamsConfig; Properties streamsConfiguration = new Properties(); streamsConfiguration.put(StreamsConfig.DEFAULT_TIMESTAMP_EXTRACTOR_CLASS_CONFIG, MyEventTimeExtractor.class); ``` ## RBAC role bindings Kafka Streams supports role-based access control (RBAC) for controlling access to resources in your Kafka clusters. The following table shows required RBAC roles for access to cluster resources.
| Resource | Role | Command | Notes | |------------------------------------------|--------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------| | Input topic | `DeveloperRead` | ```bash confluent iam rbac role-binding create \ --principal User: \ --role DeveloperRead \ --resource Topic: \ --kafka-cluster ``` | | | Output topic | `DeveloperWrite` | ```bash confluent iam rbac role-binding create \ --principal User: \ --role DeveloperWrite \ --resource Topic: \ --kafka-cluster ``` | | | Internal topic | `ResourceOwner` | ```bash confluent iam rbac role-binding create \ --principal User: \ --role ResourceOwner \ --prefix \ --resource Topic: \ --kafka-cluster ``` | Required on all internal topics for internal topic management, for example, internal delete calls. | | Idempotent Producer | `DeveloperWrite` | ```bash confluent iam rbac role-binding create \ --principal User: \ --role DeveloperWrite \ --resource Cluster: \ --kafka-cluster ``` | The role binding is on the cluster, because no topic is involved. | | Transactional Producer | `DeveloperWrite` | ```bash confluent iam rbac role-binding create \ --principal User: \ --role DeveloperWrite \ --prefix \ --resource Transactional-Id: \ --kafka-cluster ``` | When `processing.guarantee` is set to `exactly_once` or `exactly_once_v2`. | | Consumer group | `DeveloperRead` | ```bash confluent iam rbac role-binding create \ --principal User: \ --role DeveloperRead \ --prefix \ --resource Group: \ --kafka-cluster ``` | | | Schema Registry with input/output topics | `DeveloperRead` | ```bash confluent iam rbac role-binding create \ --principal User: \ --role DeveloperRead \ --prefix \ --resource Subject: \ --kafka-cluster ``` | The resource also may be `Subject:`. | | Schema Registry with internal topics | `ResourceOwner` | ```bash confluent iam rbac role-binding create \ --principal User: \ --role ResourceOwner \ --prefix \ --resource Subject: \ --kafka-cluster ``` | If internal topic schema usage is enabled. | | Confluent Cloud governance features | `DataDiscovery` or `DataSteward` | ```bash confluent iam rbac role-binding create \ --principal User: \ --role DataDiscovery \ --prefix \ --resource Subject: \ --kafka-cluster ``` | Required for Confluent Cloud governance features, like Stream Catalog, for searching, tagging, or managing business metadata topics. | # Kafka Streams Quick Start for Confluent Platform Confluent for VS Code provides project scaffolding for many different Apache Kafka® clients, including Kafka Streams. The generated project has everything you need to compile and run a simple Kafka Streams application that you can extend with your code. This guide shows you how to build a Kafka Streams application that connects to a Kafka cluster. You’ll learn how to: - Create a Kafka Streams project using Confluent for VS Code - Process streaming data with Kafka Streams operations - Run your application in a Docker container Confluent for VS Code generates a project for a Kafka Streams application that consumes messages from an input topic and produces messages to an output topic by using the following code. 
```java builder.stream(INPUT_TOPIC, Consumed.with(stringSerde, stringSerde)) .peek((k, v) -> LOG.info("Received raw event: {}", v)) .mapValues(value -> generateEnrichedEvent()) .peek((k, v) -> LOG.info("Generated enriched event: {}", v)) .to(OUTPUT_TOPIC, Produced.with(stringSerde, stringSerde)); ``` ### Standalone REST Proxy For the next few steps, use the REST Proxy that is running as a standalone service. 1. Use the standalone REST Proxy to try to produce a message to the topic `users`, referencing schema ID `9`. This schema was registered in Schema Registry in the previous section. It should fail due to an authorization error. ```text docker compose exec restproxy curl -X POST \ -H "Content-Type: application/vnd.kafka.avro.v2+json" \ -H "Accept: application/vnd.kafka.v2+json" \ --cert /etc/kafka/secrets/restproxy.certificate.pem \ --key /etc/kafka/secrets/restproxy.key \ --tlsv1.2 \ --cacert /etc/kafka/secrets/snakeoil-ca-1.crt \ --data '{"value_schema_id": 9, "records": [{"value": {"user":{"userid": 1, "username": "Bunny Smith"}}}]}' \ -u appSA:appSA \ https://restproxy:8086/topics/users ``` Your output should resemble: ```JSON {"offsets":[{"partition":null,"offset":null,"error_code":40301,"error":"Not authorized to access topics: [users]"}],"key_schema_id":null,"value_schema_id":9} ``` 2. Create a role binding for the client permitting it to produce to the topic `users`. Get the Kafka cluster ID: ```none KAFKA_CLUSTER_ID=$(curl -s https://localhost:8091/v1/metadata/id --tlsv1.2 --cacert scripts/security/snakeoil-ca-1.crt | jq -r ".id") ``` Create the role binding: ```text # Create the role binding for the topic users docker compose exec tools bash -c "confluent iam rbac role-binding create \ --principal User:appSA \ --role DeveloperWrite \ --resource Topic:users \ --kafka-cluster-id $KAFKA_CLUSTER_ID" ``` 3. Again try to produce a message to the topic `users`. It should pass this time. ```text docker compose exec restproxy curl -X POST \ -H "Content-Type: application/vnd.kafka.avro.v2+json" \ -H "Accept: application/vnd.kafka.v2+json" \ --cert /etc/kafka/secrets/restproxy.certificate.pem \ --key /etc/kafka/secrets/restproxy.key \ --tlsv1.2 \ --cacert /etc/kafka/secrets/snakeoil-ca-1.crt \ --data '{"value_schema_id": 9, "records": [{"value": {"user":{"userid": 1, "username": "Bunny Smith"}}}]}' \ -u appSA:appSA \ https://restproxy:8086/topics/users ``` Your output should resemble: ```JSON {"offsets":[{"partition":1,"offset":0,"error_code":null,"error":null}],"key_schema_id":null,"value_schema_id":9} ``` 4. Create consumer instance `my_avro_consumer`. ```text docker compose exec restproxy curl -X POST \ -H "Content-Type: application/vnd.kafka.v2+json" \ --cert /etc/kafka/secrets/restproxy.certificate.pem \ --key /etc/kafka/secrets/restproxy.key \ --tlsv1.2 \ --cacert /etc/kafka/secrets/snakeoil-ca-1.crt \ --data '{"name": "my_consumer_instance", "format": "avro", "auto.offset.reset": "earliest"}' \ -u appSA:appSA \ https://restproxy:8086/consumers/my_avro_consumer ``` Your output should resemble: ```text {"instance_id":"my_consumer_instance","base_uri":"https://restproxy:8086/consumers/my_avro_consumer/instances/my_consumer_instance"} ``` 5. Subscribe `my_avro_consumer` to the `users` topic.
```text docker compose exec restproxy curl -X POST \ -H "Content-Type: application/vnd.kafka.v2+json" \ --cert /etc/kafka/secrets/restproxy.certificate.pem \ --key /etc/kafka/secrets/restproxy.key \ --tlsv1.2 \ --cacert /etc/kafka/secrets/snakeoil-ca-1.crt \ --data '{"topics":["users"]}' \ -u appSA:appSA \ https://restproxy:8086/consumers/my_avro_consumer/instances/my_consumer_instance/subscription ``` 6. Try to consume messages for `my_avro_consumer` subscriptions. It should fail due to an authorization error. ```text docker compose exec restproxy curl -X GET \ -H "Accept: application/vnd.kafka.avro.v2+json" \ --cert /etc/kafka/secrets/restproxy.certificate.pem \ --key /etc/kafka/secrets/restproxy.key \ --tlsv1.2 \ --cacert /etc/kafka/secrets/snakeoil-ca-1.crt \ -u appSA:appSA \ https://restproxy:8086/consumers/my_avro_consumer/instances/my_consumer_instance/records ``` Your output should resemble: ```text {"error_code":40301,"message":"Not authorized to access group: my_avro_consumer"} ``` 7. Create a role binding for the client permitting it access to the consumer group `my_avro_consumer`. Get the Kafka cluster ID: ```none KAFKA_CLUSTER_ID=$(curl -s https://localhost:8091/v1/metadata/id --tlsv1.2 --cacert scripts/security/snakeoil-ca-1.crt | jq -r ".id") ``` Create the role binding: ```text # Create the role binding for the group my_avro_consumer docker compose exec tools bash -c "confluent iam rbac role-binding create \ --principal User:appSA \ --role ResourceOwner \ --resource Group:my_avro_consumer \ --kafka-cluster-id $KAFKA_CLUSTER_ID" ``` 8. Again try to consume messages for `my_avro_consumer` subscriptions. It should fail due to a different authorization error. ```text # Note: Issue this command twice due to https://github.com/confluentinc/kafka-rest/issues/432 docker compose exec restproxy curl -X GET \ -H "Accept: application/vnd.kafka.avro.v2+json" \ --cert /etc/kafka/secrets/restproxy.certificate.pem \ --key /etc/kafka/secrets/restproxy.key \ --tlsv1.2 \ --cacert /etc/kafka/secrets/snakeoil-ca-1.crt \ -u appSA:appSA \ https://restproxy:8086/consumers/my_avro_consumer/instances/my_consumer_instance/records docker compose exec restproxy curl -X GET \ -H "Accept: application/vnd.kafka.avro.v2+json" \ --cert /etc/kafka/secrets/restproxy.certificate.pem \ --key /etc/kafka/secrets/restproxy.key \ --tlsv1.2 \ --cacert /etc/kafka/secrets/snakeoil-ca-1.crt \ -u appSA:appSA \ https://restproxy:8086/consumers/my_avro_consumer/instances/my_consumer_instance/records ``` Your output should resemble: ```JSON {"error_code":40301,"message":"Not authorized to access topics: [users]"} ``` 9. Create a role binding for the client permitting it access to the topic `users`. Get the Kafka cluster ID: ```none KAFKA_CLUSTER_ID=$(curl -s https://localhost:8091/v1/metadata/id --tlsv1.2 --cacert scripts/security/snakeoil-ca-1.crt | jq -r ".id") ``` Create the role binding: ```text # Create the role binding for the topic users docker compose exec tools bash -c "confluent iam rbac role-binding create \ --principal User:appSA \ --role DeveloperRead \ --resource Topic:users \ --kafka-cluster-id $KAFKA_CLUSTER_ID" ``` 10. Again try to consume messages for `my_avro_consumer` subscriptions. It should pass this time.
```text # Note: Issue this command twice due to https://github.com/confluentinc/kafka-rest/issues/432 docker compose exec restproxy curl -X GET \ -H "Accept: application/vnd.kafka.avro.v2+json" \ --cert /etc/kafka/secrets/restproxy.certificate.pem \ --key /etc/kafka/secrets/restproxy.key \ --tlsv1.2 \ --cacert /etc/kafka/secrets/snakeoil-ca-1.crt \ -u appSA:appSA \ https://restproxy:8086/consumers/my_avro_consumer/instances/my_consumer_instance/records docker compose exec restproxy curl -X GET \ -H "Accept: application/vnd.kafka.avro.v2+json" \ --cert /etc/kafka/secrets/restproxy.certificate.pem \ --key /etc/kafka/secrets/restproxy.key \ --tlsv1.2 \ --cacert /etc/kafka/secrets/snakeoil-ca-1.crt \ -u appSA:appSA \ https://restproxy:8086/consumers/my_avro_consumer/instances/my_consumer_instance/records ``` Your output should resemble: ```JSON [{"topic":"users","key":null,"value":{"userid":1,"username":"Bunny Smith"},"partition":1,"offset":0}] ``` 11. Delete the consumer instance `my_avro_consumer`. ```text docker compose exec restproxy curl -X DELETE \ -H "Content-Type: application/vnd.kafka.v2+json" \ --cert /etc/kafka/secrets/restproxy.certificate.pem \ --key /etc/kafka/secrets/restproxy.key \ --tlsv1.2 \ --cacert /etc/kafka/secrets/snakeoil-ca-1.crt \ -u appSA:appSA \ https://restproxy:8086/consumers/my_avro_consumer/instances/my_consumer_instance ``` ### Configure monitoring connection 1. Define the listener configuration for the monitoring interceptors: ```yaml kafka_connect_replicator_monitoring_interceptor_listener: ssl_enabled: true sasl_protocol: kerberos ``` 2. Define the basic monitoring configuration: ```yaml kafka_connect_replicator_monitoring_interceptor_bootstrap_servers: ``` 3. Define the security configuration for the monitoring connection. ```yaml kafka_connect_replicator_monitoring_interceptor_kerberos_principal: kafka_connect_replicator_monitoring_interceptor_kerberos_keytab_path: kafka_connect_replicator_monitoring_interceptor_ssl_ca_cert_path: kafka_connect_replicator_monitoring_interceptor_ssl_cert_path: kafka_connect_replicator_monitoring_interceptor_ssl_key_path: kafka_connect_replicator_monitoring_interceptor_ssl_key_password: ``` 4. For RBAC-enabled deployment, define additional custom properties for the monitoring connection. `kafka_connect_replicator_monitoring_interceptor` configs default to match `kafka_connect_replicator` configs. The following are required only if you are producing metrics to a different cluster than where you are storing your configs. Specify either the Kafka cluster id (`kafka_connect_replicator_monitoring_interceptor_kafka_cluster_id`) or the cluster name (`kafka_connect_replicator_monitoring_interceptor_kafka_cluster_name`). 
```yaml kafka_connect_replicator_monitoring_interceptor_rbac_enabled: true kafka_connect_replicator_monitoring_interceptor_erp_tls_enabled: kafka_connect_replicator_monitoring_interceptor_erp_host: kafka_connect_replicator_monitoring_interceptor_erp_admin_user: kafka_connect_replicator_monitoring_interceptor_erp_admin_password: password kafka_connect_replicator_monitoring_interceptor_kafka_cluster_id: kafka_connect_replicator_monitoring_interceptor_kafka_cluster_name: kafka_connect_replicator_monitoring_interceptor_erp_pem_file: ``` ### Connect to Confluent Cloud Schema Registry To enable components to connect to Confluent Cloud Schema Registry, get the Schema Registry URL, the api key, and the secret, and set the following variables in the `hosts.yml` file: * `ccloud_schema_registry_enabled` * `ccloud_schema_registry_url` * `ccloud_schema_registry_key` * `ccloud_schema_registry_secret` For example: ```yaml all: vars: ccloud_schema_registry_enabled: true ccloud_schema_registry_url: https://psrc-zzzzz.europe-west3.gcp.confluent.cloud ccloud_schema_registry_key: AAAAAAAAAAAAAAAA ccloud_schema_registry_secret: bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb ``` See a sample inventory file for Confluent Cloud Kafka and Schema Registry configuration at the following location: ```bash https://github.com/confluentinc/cp-ansible/blob/8.1.0-post/docs/sample_inventories/ccloud.yml ``` ### Configure SASL/GSSAPI (Kerberos) authentication The Ansible playbook does not currently configure Key Distribution Center (KDC) and Active Directory KDC configurations. You must set up your own KDC independently of the playbook and provide your own keytabs to configure SASL/GSSAPI (SASL with Kerberos): * Create principals within your organization’s Kerberos KDC server for each component and for each host in each component. * Generate keytabs for these principals. The keytab files must be present on the Ansible control node. To install Kerberos packages and configure the client configuration file on each host, add the following configuration parameters in the `hosts.yaml` file. * Specify whether to install Kerberos packages and to configure the client configuration file. The default value is `true`. If the hosts already have the client configuration file configured, set `kerberos_configure` to `false`. ```yaml all: vars: kerberos_configure: ``` * Specify the client configuration file. The default value is `/etc/krb5.conf`. Use this variable only when you want to specify a custom location of the client configuration file. ```yaml all: vars: kerberos_client_config_file_dest: ``` If `kerberos_configure` is set to `true`, Confluent Ansible will generate the client config file at this location on the host nodes. If `kerberos_configure` is set to `false`, Confluent Ansible will expect the client configuration file to be present at this location on the host nodes. * Specify the *realm* part of the Kafka broker Kerberos principal and the hostname of machine with KDC running. ```yaml all: vars: kerberos: realm: kdc_hostname: admin_hostname: ``` The example below shows the Kerberos configuration settings for the Kerberos principal, `kafka/kafka1.hostname.com@EXAMPLE.COM`. ```yaml all: vars: kerberos_configure: true kerberos: realm: example.com kdc_hostname: ip-192-24-45-82.us-west.compute.internal admin_hostname: ip-192-24-45-82.us-west.compute.internal ``` Each host in the inventory file also needs to set variables that define their Kerberos principal and the location of the keytab on the Ansible controller. 
The `hosts.yml` inventory file should look like: ```yaml kafka_controller: hosts: ip-192-24-34-224.us-west.compute.internal: kafka_controller_kerberos_keytab_path: /tmp/keytabs/kafka-ip-192-24-34-224.us-west.compute.internal.keytab kafka_controller_kerberos_principal: kafka/ip-192-24-34-224.us-west.compute.internal@REALM.EXAMPLE.COM ip-192-24-37-15.us-west.compute.internal: kafka_controller_kerberos_keytab_path: /tmp/keytabs/kafka-ip-192-24-34-224.us-west.compute.internal.keytab kafka_controller_kerberos_principal: kafka/ip-192-24-34-224.us-west.compute.internal@REALM.EXAMPLE.COM ip-192-24-34-224.us-west.compute.internal: kafka_controller_kerberos_keytab_path: /tmp/keytabs/kafka-ip-192-24-34-224.us-west.compute.internal.keytab kafka_controller_kerberos_principal: kafka/ip-192-24-34-224.us-west.compute.internal@REALM.EXAMPLE.COM ``` ```yaml kafka_broker: hosts: ip-192-24-34-224.us-west.compute.internal: kafka_broker_kerberos_keytab_path: /tmp/keytabs/kafka-ip-192-24-34-224.us-west.compute.internal.keytab kafka_broker_kerberos_principal: kafka/ip-192-24-34-224.us-west.compute.internal@REALM.EXAMPLE.COM ip-192-24-37-15.us-west.compute.internal: kafka_broker_kerberos_keytab_path: /tmp/keytabs/kafka-ip-192-24-34-224.us-west.compute.internal.keytab kafka_broker_kerberos_principal: kafka/ip-192-24-34-224.us-west.compute.internal@REALM.EXAMPLE.COM ip-192-24-34-224.us-west.compute.internal: kafka_broker_kerberos_keytab_path: /tmp/keytabs/kafka-ip-192-24-34-224.us-west.compute.internal.keytab kafka_broker_kerberos_principal: kafka/ip-192-24-34-224.us-west.compute.internal@REALM.EXAMPLE.COM ``` ```yaml schema_registry: hosts: ip-192-24-34-224.us-west.compute.internal: schema_registry_kerberos_keytab_path: /tmp/keytabs/schemaregistry-ip-192-24-34-224.us-west.compute.internal.keytab schema_registry_kerberos_principal: schemaregistry/ip-192-24-34-224.us-west.compute.internal@REALM.EXAMPLE.COM ``` ```yaml kafka_connect: hosts: ip-192-24-34-224.us-west.compute.internal: kafka_connect_kerberos_keytab_path: /tmp/keytabs/connect-ip-192-24-34-224.us-west.compute.internal.keytab kafka_connect_kerberos_principal: connect/ip-192-24-34-224.us-west.compute.internal@REALM.EXAMPLE.COM ``` ```yaml kafka_rest: hosts: ip-192-24-34-224.us-west.compute.internal: kafka_rest_kerberos_keytab_path: /tmp/keytabs/restproxy-ip-192-24-34-224.us-west.compute.internal.keytab kafka_rest_kerberos_principal: restproxy/ip-192-24-34-224.us-west.compute.internal@REALM.EXAMPLE.COM ``` ```yaml ksql: hosts: ip-192-24-34-224.us-west.compute.internal: ksql_kerberos_keytab_path: /tmp/keytabs/ksql-ip-192-24-34-224.us-west.compute.internal.keytab ksql_kerberos_principal: ksql/ip-192-24-34-224.us-west.compute.internal@REALM.EXAMPLE.COM ``` ```yaml control_center_next_gen: hosts: ip-192-24-34-224.us-west.compute.internal: control_center_next_gen_kerberos_keytab_path: /tmp/keytabs/controlcenter-ip-192-24-34-224.us-west.compute.internal.keytab control_center_next_gen_kerberos_principal: controlcenter/ip-192-24-34-224.us-west.compute.internal@REALM.EXAMPLE.COM ``` ### Configure single sign-on authentication for Confluent Control Center and Confluent CLI In Confluent Ansible, you can configure single sign-on (SSO) authentication for Control Center using OpenID Connect (OIDC). As a prerequisite for SSO, you need to configure: * [An OIDC-compliant identity provider (IdP)](https://docs.confluent.io/platform/current/security/authentication/sso-for-c3/configure-sso-using-oidc.html#step-1-establish-a-trust-relationship-between-cp-and-identity-provider). 
For RBAC with mTLS, you can use the [file-based authentication](ansible-authorize.md#ansible-file-based-authentication) without an IdP. * [The MDS](https://docs.confluent.io/ansible/current/ansible-authorize.html#role-based-access-control). For SSO, RBAC needs to be enabled, and RBAC requires MDS. To use SSO in Control Center or Confluent CLI, specify the following variables in your inventory file. For details on these variables, refer to [Configure SSO for Confluent Control Center using OIDC](https://docs.confluent.io/platform/current/control-center/security/sso/configure-sso-using-oidc.html). * `sso_mode` To enable SSO, set to `oidc`. * `sso_groups_claim` The groups claim in JSON Web Tokens (JWT). Default: `groups` * `sso_sub_claim` The sub claim in JWT. Default: `sub` * `sso_issuer_url` The issuer URL, which is typically the authorization server’s URL. This value is compared to the issuer claim in the JWT token for verification. * `sso_jwks_uri` The JSON Web Key Set (JWKS) URI. It is used to verify any JSON Web Token (JWT) issued by the IdP. * `sso_authorize_uri` The base URI for the authorize endpoint, which initiates an OAuth authorization request. * `sso_token_uri` The IdP token endpoint, from which MDS requests a token. * `sso_client_id` The client ID for authorization and token requests to the IdP. * `sso_client_password` The client password for authorization and token requests to the IdP. * `sso_groups_scope` Optional. The name of the custom groups scope. Use this setting to handle a case where the `groups` field is not present in tokens by default, and you have configured a custom scope for issuing groups. The name of the scope could be anything, such as `groups`, `allow_groups`, `offline_access`, etc. `offline_access` is a well-defined scope used to request a refresh token. This scope can be requested when the `sso_refresh_token` setting is set to `true`. The scope is defined in the OIDC RFC and is not specific to any IdP. Possible values: `groups`, `openid`, `offline_access`, etc. Default: `groups` * `sso_refresh_token` Configures whether the `offline_access` scope can be requested in the authorization URI. Set this to `false` if offline tokens are not allowed for the user or client in the IdP. As described in [SSO Session management](https://docs.confluent.io/platform/current/control-center/security/sso/configure-sso-using-oidc.html#step-3-customize-additional-security-and-usability), for RBAC to work as expected, the default value of `true` should not be changed to `false`. Default: `true` * `sso_cli` To enable SSO in Confluent CLI, set it to `true`. When enabling SSO in the CLI, you must also provide `sso_device_authorization_uri`. Default: `false` * `sso_device_authorization_uri` The device authorization endpoint of the IdP. Required to enable SSO in Confluent CLI. * `sso_idp_cert_path` TLS certificate (full path of file on the control node) of the IdP domain for OIDC SSO in Control Center or Confluent CLI. Required when the IdP server has TLS enabled with a custom certificate.
The following is an example snippet of an inventory file for setting up Confluent Platform with RBAC, SASL/PLAIN protocol, and Control Center SSO: ```yaml all: vars: ansible_connection: ssh ansible_user: ec2-user ansible_become: true ansible_ssh_private_key_file: /home/ec2-user/guest.pem ## TLS Configuration - Custom Certificates ssl_enabled: true #### SASL Authentication Configuration #### sasl_protocol: plain ## RBAC Configuration rbac_enabled: true ## LDAP CONFIGURATION kafka_broker_custom_properties: ldap.java.naming.factory.initial: com.sun.jndi.ldap.LdapCtxFactory ldap.com.sun.jndi.ldap.read.timeout: 3000 ldap.java.naming.provider.url: ldaps://ldap1:636 ldap.java.naming.security.principal: uid=mds,OU=rbac,DC=example,DC=com ldap.java.naming.security.credentials: password ldap.java.naming.security.authentication: simple ldap.user.search.base: OU=rbac,DC=example,DC=com ldap.group.search.base: OU=rbac,DC=example,DC=com ldap.user.name.attribute: uid ldap.user.memberof.attribute.pattern: CN=(.*),OU=rbac,DC=example,DC=com ldap.group.name.attribute: cn ldap.group.member.attribute.pattern: CN=(.*),OU=rbac,DC=example,DC=com ldap.user.object.class: account ## LDAP USERS mds_super_user: mds mds_super_user_password: password kafka_broker_ldap_user: kafka_broker kafka_broker_ldap_password: password schema_registry_ldap_user: schema_registry schema_registry_ldap_password: password kafka_connect_ldap_user: connect_worker kafka_connect_ldap_password: password ksql_ldap_user: ksql ksql_ldap_password: password kafka_rest_ldap_user: rest_proxy kafka_rest_ldap_password: password control_center_next_gen_ldap_user: control_center control_center_next_gen_ldap_password: password ## Variables to enable SSO in Control Center sso_mode: oidc # necessary configs in MDS server for sso in C3 sso_groups_claim: groups sso_sub_claim: sub sso_groups_scope: groups sso_issuer_url: sso_jwks_uri: sso_authorize_uri: sso_token_uri: sso_client_id: sso_client_password: sso_refresh_token: true kafka_controller: hosts: demo-controller-0: demo-controller-1: demo-controller-2: kafka_broker: hosts: demo-broker-0: demo-broker-1: demo-broker-2: schema_registry: hosts: demo-sr-0: kafka_connect: hosts: demo-connect-0: kafka_rest: hosts: demo-rest-0: ksql: hosts: demo-ksql-0: control_center_next_gen: hosts: demo-c3-0: ``` # Specify exporter for switchover sr_switch_over_exporter_name: "cp-to-cc-exporter" password_encoder_secret: ``` **Sync Schemas to a specific context:** 1. Export the default context in Confluent Platform to the `site1` context in Confluent Cloud, and import all schemas in the `site1` context in Confluent Cloud to the default context in Confluent Platform. ```yaml # When using contexts unified_stream_manager: schema_registry_endpoint: "https://psrc-xyz.us-east-1.aws.confluent.cloud" authentication_type: basic basic_username: "your-cc-api-key" basic_password: "your-cc-api-secret" remote_context: "site1" schema_exporters: - name: "production-exporter" subjects: ["*"] context_type: "CUSTOM" context: "site1" schema_importers: - name: "production-importer" subjects: [":.site1:*"] context: "." sr_switch_over_exporter_name: "production-exporter" password_encoder_secret: ``` 2. Export the `corp` context in Confluent Platform to the `site1` context in Confluent Cloud, and import all schemas in the `site1.corp` context in Confluent Cloud to the `corp` context in Confluent Platform. 
```yaml # When using contexts unified_stream_manager: schema_registry_endpoint: "https://psrc-xyz.us-east-1.aws.confluent.cloud" authentication_type: basic basic_username: "your-cc-api-key" basic_password: "your-cc-api-secret" remote_context: "site1" schema_exporters: - name: "production-exporter-2" subjects: [":.corp:*"] context_type: "CUSTOM" context: "site1" schema_importers: - name: "production-importer-2" subjects: [":.site1.corp:*"] context: "site1" sr_switch_over_exporter_name: "production-exporter-2" password_encoder_secret: ``` ## Upgrade notes Before you start the upgrade process, review the following changes and make any necessary updates. * ZooKeeper removal in Confluent Platform 8.0 ZooKeeper was removed in Confluent Platform 8.0 and is no longer supported in that version. Follow the steps in [Upgrade ZooKeeper-based Confluent Platform deployment](#ansible-upgrade-zk) to migrate your ZooKeeper-based deployment to KRaft before you upgrade to Confluent Platform 8.0. * Upgrade Control Center from 2.0 or 2.1 to 2.2 in Confluent Ansible 8.0 Confluent Platform 8.0 does not work with Control Center 2.0 or 2.1. So, when upgrading Confluent Platform to 8.0 and Control Center to 2.2, you must upgrade Control Center before upgrading Kafka. This dependency is only specific to when you upgrade to Confluent Platform 8.0. * Upgrade Confluent Control Center (Legacy) alerts to Control Center in Confluent Ansible 8.0 Starting in the 8.0 release, Confluent Ansible and Confluent Platform no longer support Confluent Control Center (Legacy). If you have alerts you need to migrate from Confluent Control Center (Legacy) to Control Center, before you upgrade your Confluent Platform deployment to 8.0, upgrade the Confluent Platform to 7.9.1 and migrate your alerts as described in [Control Center Alert Migration](https://docs.confluent.io/control-center/current/installation/alert-migrate.html). Your Confluent Control Center (Legacy) and Control Center must be configured with the same Kafka bootstrap endpoint to point to the same Kafka cluster for alert migration. Upgrading Confluent Control Center (Legacy) to Control Center or upgrading Confluent Control Center (Legacy) metrics to Control Center is not supported. * Upgrade Log4j to Log4j 2 in Confluent Ansible 8.0 Starting in the 8.0 release, Confluent Ansible and Confluent Platform only support Log4j 2. When upgrading from Confluent Ansible 7.x to 8.x, the custom Log4j configurations on your 7.x cluster are not automatically converted to Log4j 2 configurations. You need to explicitly define the variables for Log4j 2 as described in [Configure Log4j 2](ansible-configure.md#ansible-log4j2). In 8.x, by default, Confluent Ansible sets up Log4j 2 with the default values mentioned in [VARIABLES.md](https://github.com/confluentinc/cp-ansible/blob/master/docs/VARIABLES.md). * SASL/SCRAM default version The default SASL/SCRAM version was changed from 256 to 512. If the version of SASL/SCRAM is specified as 256 in your `server.properties`, you must update your inventory and change `sasl_protocol: scram` to `sasl_protocol: scram256`. * Enable Admin REST APIs When upgrading from 5.5.x to 6.2.x, you must enable Admin REST APIs by setting the following property in your inventory file.
If Admin REST APIs is not enabled, component upgrades will fail: ```yaml kafka_broker_rest_proxy_enabled: true ``` * Disable canonicalization If canonicalization has not been enabled during the Confluent Platform cluster creation, explicitly set the following property in the `hosts.yml` inventory file. ```yaml kerberos: canonicalize: false ``` * Variable name updates in `hosts.yaml` Misspelled variable names were corrected in version `7.2.2`. If you are upgrading from a version earlier than `7.2.2` to version `7.2.2` or later, make the following updates in your inventory file: * From: `kakfa_connect_replicator_` * To: `kafka_connect_replicator_` ## Vector search with Pinecone The following example assumes a Pinecone API key as shown in [Pinecone Quick Start](https://docs.pinecone.io/guides/get-started/quickstart) and an OpenAI connection as shown in [Connection resource](../../flink/reference/statements/create-model.md#flink-sql-create-model-connection-resource). - Follow this [Pinecone notebook](https://colab.research.google.com/github/pinecone-io/examples/blob/master/docs/semantic-search.ipynb) to create an index of LangChain docs. This example shows the following steps: 1. Run the following command to create a connection resource named `azureopenai_connection` that uses your Azure API key. ```sql CREATE CONNECTION azureopenai_connection WITH ( 'type' = 'azureopenai', 'endpoint' = '<endpoint>', 'api-key' = '<api-key>' ); ``` 2. Run the following command to create a connection resource named `pinecone_connection` that uses your Pinecone credentials. ```sql CREATE CONNECTION pinecone_connection WITH ( 'type' = 'pinecone', 'endpoint' = '<endpoint>', 'api-key' = '<api-key>' ); ``` 3. Run the following statements to create the tables. ```sql CREATE TABLE text_input (input STRING); CREATE TABLE embedding_output (question STRING, embedding ARRAY<FLOAT>); -- Create the search table. CREATE TABLE pinecone ( text STRING, embeddings ARRAY<FLOAT> ) WITH ( 'connector' = 'pinecone', 'pinecone.connection' = 'pinecone_connection' ); ``` 4. Run the following statements to create and run the embedding model. ```sql -- Create the embedding model. CREATE MODEL openaiembed INPUT (input STRING) OUTPUT (embedding ARRAY<FLOAT>) WITH ( 'task' = 'embedding', 'provider' = 'azureopenai', 'azureopenai.input_format'='OPENAI-EMBED', 'azureopenai.connection' = 'azureopenai_connection' ); -- Insert testing data. INSERT INTO embedding_output SELECT * FROM text_input, LATERAL TABLE(ML_PREDICT('openaiembed', input)); INSERT INTO text_input VALUES ('what is LangChain?'), ('how do I use the LLMChain in LangChain?'), ('what is a pipeline in LangChain?'), ('how to partially format prompt templates'); ``` 5. Run the following statements to execute the vector search. ```sql -- Run the vector search. SELECT * FROM embedding_output, LATERAL TABLE(VECTOR_SEARCH_AGG(pinecone, DESCRIPTOR(embeddings), embedding, 3)); -- Or flatten the result. CREATE TABLE pinecone_result AS SELECT * FROM embedding_output, LATERAL TABLE(VECTOR_SEARCH_AGG(pinecone, DESCRIPTOR(embeddings), embedding, 3)); SELECT * FROM pinecone_result CROSS JOIN UNNEST(search_results) AS T(text, embeddings, score); ``` ## Do I get charged for internal topics created by Kafka Streams or ksqlDB? For Basic and Standard clusters using the legacy billing model, partitions for internal topics, prefixed with an underscore `_`, created by Confluent components like ksqlDB and Kafka Streams do count toward partition billing. However, topics internal to Kafka itself, like consumer offsets, do not count.
For more information, see [Partitions](overview.md#partition-billing). ### Partitions Confluent Cloud does not charge for partitions on any type of Kafka cluster, but the number of partitions you use can have an impact on eCKU usage. To determine eCKU limits for partitions, Confluent Cloud counts only pre-replication (leader) partitions across a cluster. For more information, see [eCKU/CKU comparison](../clusters/cluster-types.md#ecku-comparison-table).
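As an illustration with hypothetical numbers: a topic created with 30 partitions and a replication factor of 3 has 30 leader partitions and 60 follower replicas, and only the 30 leader partitions count toward the eCKU partition limit.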
**Legacy partition billing for Basic and Standard clusters** Confluent Cloud charges for partitions on Basic and Standard clusters. You are charged for the number of unique partitions that exist on your cluster during a given hour. - Basic clusters receive 10 partitions free of charge. - Standard clusters receive 500 partitions free of charge. - Enterprise clusters have no partition-based charges. - Dedicated clusters have no partition-based charges. For billing purposes, partitions for topics that you create and partitions for internal topics are counted. Internal topics are topics that are automatically created by Confluent components such as ksqlDB, Kafka Streams, and Connect, and prefixed with an underscore (`_`). Partitions for topics that are internal to Kafka itself and are not visible in the Cloud Console, such as consumer offsets, do not count against partition limits or toward partition billing.
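For example, using hypothetical numbers: a Standard cluster with 800 unique partitions during a given hour is billed for 300 partitions for that hour, because the first 500 partitions are free of charge.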
### Common properties The following table provides several common configuration properties for Producers and Consumers that you can review for potential modification. | Configuration property | Java default | librdkafka default | Notes | |------------------------------------------|-------------------|------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | `client.id` | empty string | rdkafka | You should set the `client.id` to something meaningful in your application, especially if you are running multiple clients or want to easily trace logs or activities to specific client instances. | | `connections.max.idle.ms` | 540000 ms (9 min) | See librdkafka `socket.timeout.ms` | You can change this when an intermediate load balancer disconnects idle connections after inactivity. For example: AWS 350 seconds, Azure 4 minutes, Google Cloud 10 minutes. | | `sasl.kerberos.service.name` | null | kafka | Changing the default service name will cause issues for those who don’t have it configured. | | `socket.connection.setup.timeout.max.ms` | 30000 ms (30 sec) | not available | librdkafka doesn’t have exponential backoff for this timeout. | | `socket.connection.setup.timeout.ms` | 10000 ms (10 sec) | 30000 ms (30 sec) | librdkafka doesn’t have exponential backoff for this timeout. | | `metadata.max.age.ms` | 300000 ms (5 min) | 900000 ms (15 min) | librdkafka has the `topic.metadata.refresh.interval.ms` property that defaults to 300000 milliseconds (5 minutes). | | `reconnect.backoff.max.ms` | 1000 ms (1 sec) | 10000 ms (10 sec) | | | `reconnect.backoff.ms` | 50 ms | 100 ms | | | `max.in.flight.requests.per.connection` | 5 | 1000000 | librdkafka produces to a single partition per batch, setting it to 5 limits producing to 5 partitions per broker. | ## Features All clusters have the following features: - [Kafka ACLs](../security/access-control/acls/overview.md#acl-manage) - [Fully-managed replica placement](resilience.md#confluent-cloud-resilience) - [User interface to manage consumer lag](../monitoring/monitor-lag.md#cloud-monitoring-lag) - [Topic management](../topics/overview.md#cloud-topics-manage) - [Fully-Managed Connectors](../connectors/overview.md#kafka-connect-cloud) - [View and consume Connect logs](../connectors/logging-cloud-connectors.md#ccloud-connector-logging) - [Stream Governance](../stream-governance/index.md#cloud-dg) - [Stream Catalog](../stream-governance/stream-catalog.md#cloud-stream-catalog) - [Stream Lineage](../stream-governance/stream-lineage.md#cloud-stream-lineage) - [Encryption-at-rest](https://confluent.safebase.us/?itemUid=ef061e5b-a2f4-469e-92bc-ab973e3d7842&source=title) - [TLS for data in transit](../security/encrypt/tls.md#manage-data-in-transit-with-tls) - [Role-based Access Control (RBAC)](../security/access-control/rbac/overview.md#cloud-rbac) (Basic clusters do not support RBAC roles for resources within the Kafka cluster) ### Feature comparison table The tables below offer comparisons of the features supported by only some Kafka cluster types. 
| Feature | [Basic](#basic-cluster) | [Standard](#standard-cluster) | [Enterprise](#enterprise-cluster) | [Dedicated](#dedicated-cluster) | [Freight](#freight-cluster) | |---------------------------------------------------------------------------------------------------------------------|---------------------------|---------------------------------|-------------------------------------|----------------------------------------------------------|-------------------------------| | [Exactly Once Semantics](/platform/current/streams/concepts.html#streams-concepts-processing-guarantees) | Yes | Yes | Yes | Yes | No | | [Key based compacted storage](/platform/current/kafka/design.html#log-compaction) | Yes | Yes | Yes | Yes | No | | [Custom Connectors](../connectors/bring-your-connector/overview.md#cc-bring-your-connector) | Yes | Yes | No | Yes | No | | [Flink](../flink/overview.md#ccloud-flink) | Yes | Yes | Yes | Yes | No | | [ksqlDB](../ksqldb/overview.md#cloud-ksqldb-create-stream-processing-apps) | Yes | Yes | No | Yes | No | | [Public networking](../networking/overview.md#cloud-networking-support-public) | Yes | Yes | No | Yes | No | | [Private networking](../networking/overview.md#cloud-networking-support-public) | No | No | Yes | Yes | Yes | | [OAuth](../security/authenticate/workload-identities/identity-providers/oauth/overview.md#oauth-overview) | No | Yes | Yes | Yes | Yes | | [Mutual TLS (mTLS)](../security/authenticate/workload-identities/identity-providers/mtls/overview.md#mtls-overview) | No | No | No | Yes | No | | [Audit logs](../monitoring/audit-logging/cloud-audit-log-concepts.md#cloud-audit-logs) | No | Yes | Yes | Yes | Yes | | [Self-managed encryption keys](../security/encrypt/byok/overview.md#byok-encrypted-clusters) | No | No | Yes | Yes | No | | [Automatic Elastic scaling](../billing/overview.md#e-cku-definition) | Yes | Yes | Yes | No | Yes | | [Stream Sharing](../stream-sharing/index.md#cloud-data-sharing) | Yes | Yes | No | Yes but all private networking options are not supported | No | | [Client Quotas](client-quotas.md#client-quotas) | No | No | No | Yes | No | | [Access Transparency](../monitoring/audit-logging/access-transparency-overview.md#access-transparency-overview) | No | No | No | Yes | No | ## Replicator Deployment Options Migrating topic data is achieved by running Replicator in one of three modes. They are functionally equivalent, but you might prefer one over the other based on your starting point. | Replicator Mode | Advantages and Scenarios | |---------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | As a connector within a distributed Connect cluster (on a VM) | Ideal if you already have a Connect cluster in use with the destination cluster. | | As a packaged executable on a VM | Isolates three easy-to-use config files (for replicator, consumer, producer), and avoids having to explicitly configure the Connect cluster. The [Quick Start](replicator-cloud-quickstart.md#cloud-replicator-quickstart) walks through an example of running Replicator as this type of executable. | | As a packaged executable on Kubernetes | Similar to the above, but might be easier to start as a single isolated task. 
Ideal if you are already managing tasks within Kubernetes. | #### Configure properties There are three config files for the executable (consumer, producer, and replication), and the minimal configuration changes for these are shown below. * `consumer.properties` ```none ssl.endpoint.identification.algorithm=https sasl.mechanism=PLAIN bootstrap.servers= retry.backoff.ms=500 sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="" password=""; security.protocol=SASL_SSL ``` * `producer.properties` ```none ssl.endpoint.identification.algorithm=https sasl.mechanism=PLAIN bootstrap.servers= sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="" password=""; security.protocol=SASL_SSL ``` * `replication.properties` - No special configuration is required in `replication.properties`. ### Replicator - [Replicator Quick Start to Migrate Topic Data on Confluent Cloud](replicator-cloud-quickstart.md#cloud-replicator-quickstart) - [Confluent Replicator to Confluent Cloud Configurations](/platform/current/tutorials/examples/ccloud/docs/replicator-to-cloud-configuration-types.html) - [On-Premises to Confluent Cloud example](/platform/current/tutorials/cp-demo/docs/index.html) - [Multi-DC Deployment Architectures](/platform/current/multi-dc-deployments/index.html) - [Replicator for Multi-Datacenter Replication](/platform/current/multi-dc-deployments/replicator/index.html) - [Tutorial: Replicating Data Between Clusters](/platform/current/multi-dc-deployments/replicator/replicator-quickstart.html#replicator-quickstart) - [Configure and Run Replicator](/platform/current/multi-dc-deployments/replicator/replicator-run.html#replicator-run) - [Disaster Recovery for Multi-Datacenter Apache Kafka Deployments](https://www.confluent.io/white-paper/disaster-recovery-for-multi-datacenter-apache-kafka-deployments/). #### Create a custom connector Use the following command to create a custom connector. Command syntax: ```bash confluent connect cluster create [flags] ``` For example: ```bash confluent connect cluster create --config-file connector-config.json --cluster lkc-abcd123 --environment env-a12b34 ``` Example output: ```bash +------+---------------------+ | ID | clcc-wzxp69 | | Name | my-custom-connector | +------+---------------------+ ``` Note that the ID of a Custom Connector starts with a prefix of `clcc` and the ID of a Managed Connector starts with a prefix of `lcc`. The JSON payload file consists of the following configuration properties. Note that the connector name used in both instances of `"name"` in the payload must be consistent. ```json { "name": "my-custom-connector", "config": { "name": "my-custom-connector", "kafka.api.key": "********", "kafka.api.secret": "********", "confluent.connector.type": "CUSTOM", "confluent.custom.plugin.id": "custom-plugin-l65664", "tasks.max": "1", "interval.ms": "10000", "kafka.topic": "my-kafka-topic" } } ``` #### Update a custom connector configuration Use the following command to update a custom connector configuration. You use a JSON payload file that contains all the configuration properties used to create the original connector, with any changes needed for the update. In the example JSON used in this example, the `tasks.max` is updated from `1` to `2`. 
Command syntax:

```bash
confluent connect cluster update [flags]
```

For example:

```bash
confluent connect cluster update clcc-wzxp69 --config-file connector-config.json --cluster lkc-abcd123 --environment env-a12b34
```

Example output:

```bash
Updated connector "clcc-wzxp69"
```

The JSON payload file consists of the following configuration properties. Note that the connector name used in both instances of `"name"` in the payload must be consistent.

```json
{
  "name": "my-custom-connector",
  "config": {
    "name": "my-custom-connector",
    "kafka.api.key": "********",
    "kafka.api.secret": "********",
    "confluent.connector.type": "CUSTOM",
    "confluent.custom.plugin.id": "custom-plugin-l65664",
    "tasks.max": "2",
    "interval.ms": "10000",
    "kafka.topic": "my-kafka-topic"
  }
}
```

### Export log messages

The connector stores log messages in a Kafka topic. You can export log data using any of the following options:

* Export logs using a Confluent connector: For example, the [Elasticsearch Service Sink connector for Confluent Cloud](../cc-elasticsearch-service-sink.md#cc-elasticsearch-service-sink) or the [Elasticsearch Service Sink connector for Confluent Platform](https://docs.confluent.io/kafka-connectors/elasticsearch/current/overview.html) can export logs to Elasticsearch. Several other connectors can also be used to export logs.
* Create a custom integration using the [Kafka REST API for topics](https://docs.confluent.io/cloud/current/api.html#tag/Topic-(v3)) to deliver log messages to a destination logging service.

To manually configure a destination service to capture logs, you need the following:

* Bootstrap server endpoint: This is provided on the **Cluster Settings** page. For example, `pkc-abc123..aws.confluent.cloud:9092`. You can also get this information using the following Confluent CLI command:

```bash
confluent kafka cluster describe
```

* Log topic name: Get this from the topics page. For example, `clcc--app-logs`. You can also get this information using the following Confluent CLI command:

```bash
confluent kafka topic list
```

This information is also provided in the UI in **Cluster settings**.

![View cluster settings](images/ccloud-byoc-log-cluster-settings.png)

#### NOTE

Configuration properties that are not shown in the Cloud Console use the default values. See [Configuration Properties](#cc-alloydb-sink-config-properties) for all property values and definitions.

1. Select an **Input Kafka record value format** (data coming from the Kafka topic): AVRO, JSON_SR (JSON Schema), or PROTOBUF. A valid schema must be available in [Schema Registry](../get-started/schema-registry.md#cloud-sr-config) to use a schema-based message format.
2. Select an **insert mode** to use:
   - `INSERT`: Use the standard `INSERT` row function. An error occurs if the row already exists in the table.
   - `UPSERT`: This mode is similar to `INSERT`. However, if the row already exists, the `UPSERT` function overwrites column values with the new values provided.

### **Show advanced configurations**

- **Schema context**: Select a schema context to use for this connector, if using a schema-based data format. This property defaults to the **Default** context, which configures the connector to use the default schema set up for Schema Registry in your Confluent Cloud environment. A schema context allows you to use separate schemas (like schema sub-registries) tied to topics in different Kafka clusters that share the same Schema Registry environment.
For example, if you select a non-default context, a **Source** connector uses only that schema context to register a schema, and a **Sink** connector reads only from that schema context. For more information about setting up a schema context, see [What are schema contexts and when should you use them?](../sr/faqs-cc.md#faq-schema-contexts).
- **Auto create table**: Whether to automatically create the destination table if it is missing.
- **Auto add columns**: Whether to automatically add columns in the table if they are missing.

#### NOTE

Auto create tables and Auto add columns are optional. These properties set whether to automatically create tables or columns if they are missing relative to the input record schema. If not used, both default to `false`. When Auto create tables is set to `true`, the connector creates a table name using `${topic}` (that is, the Kafka topic name). For more information, see [Table names and Kafka topic names](#cc-alloydb-sink-truncation-behavior) and the [AlloyDB Sink configuration properties](#cc-alloydb-sink-config-properties).

- **Database timezone**: Name of the timezone used in the connector when querying with time-based criteria. Defaults to `UTC`.
- **Table name format**: A format string for the destination table name, which may contain `${topic}` as a placeholder for the originating topic name.
- **Table types**: The comma-separated types of database tables to which the sink connector can write.
- **Fields included**: List of comma-separated record value field names. If empty, all fields from the record value are used.
- **PK mode**: The primary key mode. Options are:
  - `kafka`: Kafka coordinates are used as the primary key. Must be used with the **PK Fields** property.
  - `none`: No primary keys are used.
  - `record_key`: Fields from the record key are used. May be a primitive or a struct.
  - `record_value`: Fields from the Kafka record value are used. Must be a struct type.
- **PK Fields**: List of comma-separated primary key field names. Options are:
  - `kafka`: Must be three values representing the Kafka coordinates. If left empty, the coordinates default to `__connect_topic,__connect_partition,__connect_offset`.
  - `none`: PK Fields are not used.
  - `record_key`: If left empty, all fields from the key struct are used. Otherwise, the fields listed in this property are extracted from the key struct. A single field name must be configured for a primitive key.
  - `record_value`: Used to extract fields from the record value. If left empty, all fields from the value struct are used.
- **When to quote SQL identifiers**: When to quote table names, column names, and other identifiers in SQL statements.
- **Max rows per batch**: Maximum number of rows to include in a single batch when polling for new data. This setting can be used to limit the amount of data buffered internally in the connector.
- **Input Kafka record key format**: Sets the input Kafka record key format. This must be set to an appropriate format if `pk.mode=record_key` is used. Valid entries are AVRO, JSON_SR, PROTOBUF, or STRING. Note that you must have Confluent Cloud Schema Registry configured if using a schema-based message format such as AVRO, JSON_SR, or PROTOBUF.
- **Delete on null**: Whether to treat null record values as deletes. Requires `pk.mode` to be `record_key`.

**Auto-restart policy**

- **Enable Connector Auto-restart**: Control the auto-restart behavior of the connector and its task in the event of user-actionable errors.
Defaults to `true`, enabling the connector to automatically restart in case of user-actionable errors. Set this property to `false` to disable auto-restart for failed connectors. In such cases, you would need to manually restart the connector. **Consumer configuration** - **Max poll interval(ms)**: Set the maximum delay between subsequent consume requests to Kafka. Use this property to improve connector performance in cases when the connector cannot send records to the sink system. The default is 300,000 milliseconds (5 minutes). - **Max poll records**: Set the maximum number of records to consume from Kafka in a single request. Use this property to improve connector performance in cases when the connector cannot send records to the sink system. The default is 500 records. **Transforms** - **Single Message Transforms**: To add a new SMT, see [Add transforms](single-message-transforms.md#cc-single-message-transforms-ui). For more information about unsupported SMTs, see [Unsupported transformations](single-message-transforms.md#cc-single-message-transforms-unsupported-transforms). See [Configuration Properties](#cc-alloydb-sink-config-properties) for all property values and definitions. 3. Click **Continue**. #### NOTE Configuration properties that are not shown in the Cloud Console use the default values. See [Configuration Properties](#cc-amazon-cloudwatch-metrics-sink-config-properties) for all property values and definitions. 1. Select the **Input Kafka record value** format (data coming from the Kafka topic): AVRO, JSON_SR, or PROTOBUF. A valid schema must be available in [Schema Registry](../get-started/schema-registry.md#cloud-sr-config) to use a schema-based message format. ### **Show advanced configurations** - **Schema context**: Select a schema context to use for this connector, if using a schema-based data format. This property defaults to the **Default** context, which configures the connector to use the default schema set up for Schema Registry in your Confluent Cloud environment. A schema context allows you to use separate schemas (like schema sub-registries) tied to topics in different Kafka clusters that share the same Schema Registry environment. For example, if you select a non-default context, a **Source** connector uses only that schema context to register a schema and a **Sink** connector uses only that schema context to read from. For more information about setting up a schema context, see [What are schema contexts and when should you use them?](../sr/faqs-cc.md#faq-schema-contexts). - **Behavior on malformed metric**: The connector’s behavior if the Kafka record does not contain an expected field. Valid options are `LOG` and `FAIL`. `LOG` will log and skip the malformed records, and `FAIL` will fail the connector. **Auto-restart policy** - **Enable Connector Auto-restart**: Control the auto-restart behavior of the connector and its task in the event of user-actionable errors. Defaults to `true`, enabling the connector to automatically restart in case of user-actionable errors. Set this property to `false` to disable auto-restart for failed connectors. In such cases, you would need to manually restart the connector. **Consumer configuration** - **Max poll interval(ms)**: Set the maximum delay between subsequent consume requests to Kafka. Use this property to improve connector performance in cases when the connector cannot send records to the sink system. The default is 300,000 milliseconds (5 minutes). 
- **Max poll records**: Set the maximum number of records to consume from Kafka in a single request. Use this property to improve connector performance in cases when the connector cannot send records to the sink system. The default is 500 records. **Transforms** - **Single Message Transforms**: To add a new SMT, see [Add transforms](single-message-transforms.md#cc-single-message-transforms-ui). For more information about unsupported SMTs, see [Unsupported transformations](single-message-transforms.md#cc-single-message-transforms-unsupported-transforms). See [Configuration Properties](#cc-amazon-cloudwatch-metrics-sink-config-properties) for all property values and definitions. 2. Click **Continue**. #### NOTE Configuration properties that are not shown in the Cloud Console use the default values. See [Configuration Properties](#cc-amazon-dynamodb-sink-config-properties) for all property values and definitions. 1. Select the **Input Kafka record value** (data coming from the Kafka topic): AVRO, JSON_SR, PROTOBUF, or JSON. A valid schema must be available in [Schema Registry](../get-started/schema-registry.md#cloud-sr-config) to use a schema-based message format (for example, Avro, JSON Schema, or Protobuf). See [Schema Registry Enabled Environments](limits.md#connect-ccloud-environment-limits) for additional information. 2. In the **DynamoDB hash key** and **DynamoDB sort key** fields, enter the hash key and sort key, respectively. By default, the Kafka partition number is used for the hash key and the record offset is used as the sort key. For a few examples of how these keys work with other record references, see [DynamoDB hash keys and sort keys](#cc-amazon-dynamodb-sink-hash-sort). Note that the maximum size of a partition using the default configuration is limited to 10 GB (defined by Amazon DynamoDB). ### **Show advanced configurations** - **Schema context**: Select a schema context to use for this connector, if using a schema-based data format. This property defaults to the **Default** context, which configures the connector to use the default schema set up for Schema Registry in your Confluent Cloud environment. A schema context allows you to use separate schemas (like schema sub-registries) tied to topics in different Kafka clusters that share the same Schema Registry environment. For example, if you select a non-default context, a **Source** connector uses only that schema context to register a schema and a **Sink** connector uses only that schema context to read from. For more information about setting up a schema context, see [What are schema contexts and when should you use them?](../sr/faqs-cc.md#faq-schema-contexts). - **Table name format**: A format string for the destination table name, which may contain `${topic}` as a placeholder for the originating topic name. For example, to create a table named `kafka-orders` based on a Kafka topic named `orders`, you would enter `kafka-${topic}` in this field. **Auto-restart policy** - **Enable Connector Auto-restart**: Control the auto-restart behavior of the connector and its task in the event of user-actionable errors. Defaults to `true`, enabling the connector to automatically restart in case of user-actionable errors. Set this property to `false` to disable auto-restart for failed connectors. In such cases, you would need to manually restart the connector. **Transforms** - **Single Message Transforms**: To add a new SMT, see [Add transforms](single-message-transforms.md#cc-single-message-transforms-ui). 
For more information about unsupported SMTs, see [Unsupported transformations](single-message-transforms.md#cc-single-message-transforms-unsupported-transforms). See [Configuration Properties](#cc-amazon-dynamodb-sink-config-properties) for all property values and definitions.

3. Click **Continue**.

#### NOTE

Configuration properties that are not shown in the Cloud Console use the default values. See [Configuration Properties](#cc-amazon-redshift-sink-config-properties) for all property values and definitions.

1. Select the **Input Kafka record value** format (data coming from the Kafka topic): AVRO, JSON_SR (JSON Schema), or PROTOBUF. A valid schema must be available in [Schema Registry](../get-started/schema-registry.md#cloud-sr-config) to use a schema-based message format (for example, Avro, JSON_SR (JSON Schema), or Protobuf). See [Schema Registry Enabled Environments](limits.md#connect-ccloud-environment-limits) for additional information.

### **Show advanced configurations**

- **Schema context**: Select a schema context to use for this connector, if using a schema-based data format. This property defaults to the **Default** context, which configures the connector to use the default schema set up for Schema Registry in your Confluent Cloud environment. A schema context allows you to use separate schemas (like schema sub-registries) tied to topics in different Kafka clusters that share the same Schema Registry environment. For example, if you select a non-default context, a **Source** connector uses only that schema context to register a schema, and a **Sink** connector reads only from that schema context. For more information about setting up a schema context, see [What are schema contexts and when should you use them?](../sr/faqs-cc.md#faq-schema-contexts).
- **Table name format**: A format string for the destination table name, which may contain `${topic}` as a placeholder for the originating topic name. For example, to create a table named `kafka-orders` based on a Kafka topic named `orders`, you would enter `kafka-${topic}` in this field.
- **Database timezone**: Name of the JDBC timezone that should be used in the connector when inserting time-based values.
- **Batch size**: Specifies how many records to attempt to batch together for insertion into the destination table.
- **Auto create table**: Whether to automatically create the destination table if it is missing.
- **Auto add columns**: Whether to automatically add columns in the table if they are missing.

**Auto-restart policy**

- **Enable Connector Auto-restart**: Control the auto-restart behavior of the connector and its task in the event of user-actionable errors. Defaults to `true`, enabling the connector to automatically restart in case of user-actionable errors. Set this property to `false` to disable auto-restart for failed connectors. In such cases, you would need to manually restart the connector.

**Consumer configuration**

- **Max poll interval(ms)**: Set the maximum delay between subsequent consume requests to Kafka. Use this property to improve connector performance in cases when the connector cannot send records to the sink system. The default is 300,000 milliseconds (5 minutes).
- **Max poll records**: Set the maximum number of records to consume from Kafka in a single request. Use this property to improve connector performance in cases when the connector cannot send records to the sink system. The default is 500 records.
**Transforms**

- **Single Message Transforms**: To add a new SMT, see [Add transforms](single-message-transforms.md#cc-single-message-transforms-ui). For more information about unsupported SMTs, see [Unsupported transformations](single-message-transforms.md#cc-single-message-transforms-unsupported-transforms).

See [Configuration Properties](#cc-amazon-redshift-sink-config-properties) for all property values and definitions.

2. Click **Continue**.

#### NOTE

* This Quick Start is for the fully-managed Confluent Cloud connector. If you are installing the connector locally for Confluent Platform, see [Amazon SQS Source Connector for Confluent Platform](https://docs.confluent.io/kafka-connectors/sqs/current/).
* If you require private networking for fully-managed connectors, make sure to set up the proper networking beforehand. For more information, see [Manage Networking for Confluent Cloud Connectors](networking/internet-resource.md#clusters-connect-cloud).

The connector converts an Amazon SQS message into a Kafka record, with the following structure:

* The key encodes the SQS queue name and message ID in a struct. For FIFO queues, it also includes the message group ID.
* The value encodes the body of the SQS message and various message attributes in a struct.
* Each header encodes message attributes that may be present in the SQS message.

For record schema details, see [Record Schemas](#cc-amazon-sqs-record-schemas).

For **standard queues**, the connector supports best-effort ordering guarantees, which means there is a chance that records will end up in a different order in Kafka. For **FIFO queues**, the connector guarantees that records are inserted into Kafka in the order they were inserted into Amazon SQS, as long as the destination Kafka topic has exactly one partition. If the destination topic has more than one partition, you can use a [Single Message Transform (SMT)](single-message-transforms.md#cc-single-message-transforms) to set the partition based on the `MessageGroupId` field in the key.

Note that the connector provides **at-least-once delivery**, which means the connector can introduce duplicate records in Kafka for both standard and FIFO queues.

#### NOTE

Configuration properties that are not shown in the Cloud Console use the default values. See [Configuration Properties](#cc-amazon-lambda-sink-config-properties) for all property values and definitions.

1. Select the **Input Kafka record value** format (data coming from the Kafka topic): AVRO, JSON_SR, PROTOBUF, JSON, or BYTES. A valid schema must be available in [Schema Registry](../get-started/schema-registry.md#cloud-sr-config) to use a schema-based message format.

### **Show advanced configurations**

- **Schema context**: Select a schema context to use for this connector, if using a schema-based data format. This property defaults to the **Default** context, which configures the connector to use the default schema set up for Schema Registry in your Confluent Cloud environment. A schema context allows you to use separate schemas (like schema sub-registries) tied to topics in different Kafka clusters that share the same Schema Registry environment.
- **AWS Lambda invocation type**: The mode in which the AWS Lambda function is invoked. Two modes are supported: **sync** and **async**. For more details about Lambda invocation, see [Synchronous invocation](https://docs.aws.amazon.com/lambda/latest/dg/invocation-sync.html) or [Asynchronous invocation](https://docs.aws.amazon.com/lambda/latest/dg/invocation-async.html). - **Batch size**: The maximum number of Kafka records to combine in a single AWS Lambda function invocation. You should set this as high as possible, without exceeding AWS Lambda invocation payload limits. To disable batching of records, set this value to 1. - **Record Converter Class**: Record converter class to convert Kafka records to AWS Lambda payload. **Auto-restart policy** - **Enable Connector Auto-restart**: Control the auto-restart behavior of the connector and its task in the event of user-actionable errors. Defaults to `true`, enabling the connector to automatically restart in case of user-actionable errors. Set this property to `false` to disable auto-restart for failed connectors. In such cases, you would need to manually restart the connector. **Consumer configuration** - **Max poll interval(ms)**: Set the maximum delay between subsequent consume requests to Kafka. Use this property to improve connector performance in cases when the connector cannot send records to the sink system. The default is 300,000 milliseconds (5 minutes). - **Max poll records**: Set the maximum number of records to consume from Kafka in a single request. Use this property to improve connector performance in cases when the connector cannot send records to the sink system. The default is 500 records. **Transforms** - **Single Message Transforms**: To add a new SMT, see [Add transforms](single-message-transforms.md#cc-single-message-transforms-ui). For more information about unsupported SMTs, see [Unsupported transformations](single-message-transforms.md#cc-single-message-transforms-unsupported-transforms). See [Configuration Properties](#cc-amazon-lambda-sink-config-properties) for all property values and definitions. 2. Click **Continue**. #### JDBC-based Source Connectors and the MongoDB Atlas Source Connector The [Source connector service account](#cloud-service-account-source-connectors) section provides basic ACL entries for source connector service accounts. Several source connectors allow a topic prefix. When a prefix is used and the following connectors are created using the CLI or API, you need to add ACL entries. 
* [MySQL Source (JDBC) Connector for Confluent Cloud](cc-mysql-source.md#cc-mysql-source) * [PostgreSQL Source (JDBC) Connector for Confluent Cloud](cc-postgresql-source.md#cc-postgresql-source) * [Microsoft SQL Server Source (JDBC) Connector for Confluent Cloud](cc-microsoft-sql-server-source.md#cc-microsoft-sql-server-source) * [Oracle Database Source (JDBC) Connector for Confluent Cloud](cc-oracle-db-source.md#cc-oracle-db-source) * [Get Started with the MongoDB Atlas Source Connector for Confluent Cloud](cc-mongo-db-source.md#cc-mongo-db-source) * [Snowflake Source Connector for Confluent Cloud](cc-snowflake-source/cc-snowflake-source.md#cc-snowflake-source) Add the following ACL entries for these source connectors: ```none confluent kafka acl create --allow --service-account "" --operations create --prefix --topic "" ``` ```none confluent kafka acl create --allow --service-account "" --operations write --prefix --topic "" ``` ### Datagen Source ```json { "connector.class": "DatagenSource", "kafka.api.key": "${KEY}", "kafka.api.secret": "${SECRET}", "kafka.topic": "datagen-source-smt-insert-field", "max.interval": "3000", "name": "DatagenSourceSmtInsertField", "output.data.format": "JSON", "quickstart": "ORDERS", "tasks.max": "1", "transforms": "insert", "transforms.insert.type": "org.apache.kafka.connect.transforms.InsertField$Value", "transforms.insert.partition.field": "PartitionField", "transforms.insert.static.field": "InsertedStaticField", "transforms.insert.static.value": "SomeValue", "transforms.insert.timestamp.field": "TimestampField", "transforms.insert.topic.field": "TopicField" } ``` ### Datagen Source ```json { "connector.class": "DatagenSource", "kafka.api.key": "${KEY}", "kafka.api.secret": "${SECRET}", "kafka.topic": "datagen-source-smt-set-schema-metadata", "max.interval": "3000", "name": "DatagenSourceSmtSetSchemaMetadata", "output.data.format": "AVRO", "quickstart": "ORDERS", "tasks.max": "1", "transforms": "setSchemaMetadata", "transforms.setSchemaMetadata.type": "org.apache.kafka.connect.transforms.SetSchemaMetadata$Value", "transforms.setSchemaMetadata.schema.name": "schema_name", "transforms.setSchemaMetadata.schema.version": "12" } ``` ### Routes configuration Routes are Confluent Gateway endpoints where client applications connect to stream data. Confluent Gateway uses routes to define how client applications connect to Kafka clusters. Clients connect to the Gateway as if it were a Kafka cluster, while the Gateway handles routing and governance. ```yaml gateway: routes: - name: --- [1] endpoint: --- [2] brokerIdentificationStrategy: --- [3] type: --- [4] pattern: --- [5] streamingDomain: --- [6] name: --- [7] bootstrapServerId: --- [8] security: --- [9] ``` * [1] The unique name for the route. * [2] The `host:port` combination that Confluent Gateway will listen on. This is the external address clients use to bootstrap to the Kafka cluster. * [3] Specifies the strategy for mapping client requests to a specific Kafka broker. * [4] The type of broker identification strategy. Set to `port` (default) or `host`. * `port` strategy: Each Kafka broker is identified using a unique port number. This is the default strategy. Clients connect to different ports to reach specific brokers (for example, port 9092 to connect with broker-0, port 9093 to connect with broker-1). The `nodeIdRanges` for the streaming domain you set in [Streaming domains configuration](#gateway-config-streaming-domains-docker) is used. 
`nodeIdRanges` should be present in all of the clusters associated with route’s streaming domain. * `host` strategy: Each Kafka broker is represented using a unique hostname. Clients use different host names to reach specific brokers (for example, `broker-0.kafka.company.com`, `broker-1.kafka.company.com`), and the gateway routes based on the SNI header. The `pattern` setting ([5]) is used. * [5] The pattern for the broker identification strategy. Required if the type ([4]) is `host`. For example, `broker-$(nodeId).eu-gw.sales.example.com:9092`. * [6] The reference to a streaming domain. * [7] The name of the streaming domain. Must be a valid name from the `gateway.streamingDomains[].name`. * [8] The bootstrap server ID. Must match `kafkaCluster.bootstrapServers[].id`. * [9] The security configuration. See [Configure Security for Confluent Cloud Gateway](gateway-security.md#gateway-security-docker) section for details. An example configuration for Confluent Gateway routes: ```yaml routes: - name: eu-sales endpoint: eu-gw.sales.example.com:9092 brokerIdentificationStrategy: type: host pattern: broker-$(nodeId).eu-gw.sales.example.com:9092 streamingDomain: name: sales bootstrapServerId: SASL_SSL-1 ``` ## Client switchover between Kafka clusters To perform client switchover: 1. Have your Confluent Gateway configured with two Streaming Domains for two Kafka clusters. For example: ```yaml streamingDomains: - name: kafka1-domain type: kafka kafkaCluster: name: kafka-cluster-1 bootstrapServers: - id: internal-plaintext-listener endpoint: "kafka-1:44444" - name: kafka2-domain type: kafka kafkaCluster: name: kafka-cluster-2 bootstrapServers: - id: internal-plaintext-listener endpoint: "kafka-2:22222" ``` 2. Reconfigure the Route to point to the destination Kafka cluster by updating the Streaming Domain and corresponding bootstrap server ID in the Route. For example, the following configuration points the `switchover-route` to the `kafka1-domain` streaming domain, and the clients send and receive messages from the source Kafka cluster, `kafka-cluster-1`. ```yaml streamingDomains: - name: kafka1-domain type: kafka kafkaCluster: name: kafka-cluster-1 bootstrapServers: - id: internal-plaintext-listener endpoint: "kafka-1:44444" - name: kafka2-domain type: kafka kafkaCluster: name: kafka-cluster-2 bootstrapServers: - id: internal-plaintext-listener endpoint: "kafka-2:22222" routes: - name: switchover-route endpoint: "host.docker.internal:19092" streamingDomain: name: kafka1-domain bootstrapServerId: internal-plaintext-listener ``` When you update the `switchover-route` to point to the `kafka2-domain` streaming domain, the clients will start sending and receiving new messages from the destination Kafka cluster, `kafka-cluster-2`. ```yaml streamingDomains: - name: kafka1-domain type: kafka kafkaCluster: name: kafka-cluster-1 bootstrapServers: - id: internal-plaintext-listener endpoint: "kafka-1:44444" - name: kafka2-domain type: kafka kafkaCluster: name: kafka-cluster-2 bootstrapServers: - id: internal-plaintext-listener endpoint: "kafka-2:22222" routes: - name: switchover-route endpoint: "host.docker.internal:19092" streamingDomain: name: kafka2-domain bootstrapServerId: internal-plaintext-listener ``` 3. Stop and restart the Confluent Gateway container. When the Confluent Gateway container is restarted, the clients continue sending and receiving new messages from the destination Kafka cluster. No changes are required on the producer or consumer side. 
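If you run Confluent Gateway with Docker Compose, the stop-and-restart in step 3 can be done with standard Compose commands. The following is a minimal sketch that assumes the Gateway service in your Compose file is named `gateway`; adjust the service name to match your setup.

```bash
# Restart the Gateway service so it picks up the updated route configuration.
# Assumes the Confluent Gateway service in docker-compose.yml is named "gateway".
docker compose restart gateway

# Or stop and start the service explicitly.
docker compose stop gateway
docker compose up -d gateway
```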
# Configure Security for Confluent Cloud Gateway This section provides details on the following security configurations for Confluent Cloud Gateway (Confluent Gateway) using Docker Compose. * [Authentication](#gateway-auth-docker) * [TLS/SSL](#gateway-ssl-docker) * [Secret stores](#gateway-secret-stores-docker) * [Passwords](#gateway-password-docker) For the security configuration steps using Confluent for Kubernetes (CFK), see [Configure Security for Confluent Gateway using CFK](https://docs.confluent.io/operator/current/gateway/co-gateway-security.html). The top-level layout for the Confluent Gateway security configuration is as follows: ```yaml gateway: secretStores: streamingDomains: kafkaCluster: bootstrapServers: - id: endpoint: ssl: routes: - name: security: auth: ssl: swapConfig: ``` For `streamingDomains.kafkaCluster.bootstrapServers.ssl` and `routes.security.ssl`, see the [SSL configuration](#gateway-ssl-docker) section. #### **Cluster authentication for authentication swapping** Configure how Confluent Gateway authenticates to the Kafka cluster for authentication swapping. **SASL authentication** ```yaml gateway: routes: - name: security: auth: swap swapConfig: clusterAuth: sasl: mechanism: --- [1] callbackHandlerClass: --- [2] jaasConfig: file: --- [3] oauth: tokenEndpointUri: --- [4] connectionsMaxReauthMs: --- [5] ``` * [1] The SASL mechanism to use. Set to `PLAIN` for SASL/PLAIN authentication, or set to `OAUTHBEARER` for SASL/OAUTHBEARER authentication. * [2] The callback handler class to use. Set to `org.apache.kafka.common.security.plain.PlainServerCallbackHandler` for SASL/PLAIN authentication. * [3] The path to the JAAS configuration file. * [4] The URI for the OAuth token endpoint. * [5] The maximum re-authentication time in milliseconds. **JAAS configuration file content for SASL/PLAIN authentication** ```properties org.apache.kafka.common.security.plain.PlainLoginModule required username="%s" password="%s"; ``` **JAAS configuration file content for SASL/OAUTHBEARER authentication** ```properties org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required clientId="%s" clientSecret="%s"; ``` ## Access a ksqlDB cluster by using an API key Confluent Cloud ksqlDB supports authentication with a Confluent Cloud API key. You can use an API key to access the hosted ksqlDB cluster by using the ksqlDB CLI or HTTPS requests. Run the `confluent ksql cluster list` command to get the URL of the ksqlDB endpoint. ```bash confluent ksql cluster list ``` Your output should resemble: ```none ID | Name | Topic Prefix | Kafka Cluster | Storage | Endpoint | Status ---------------+-------------+--------------+--------------------+---------+----------------------------------------------------------+--------- lksqlc-ab123 | ksqldb-app1 | pksqlc-zz321 | lkc-bc456j | 500 | https://pksqlc-zz321.us-central1.gcp.confluent.cloud:443 | UP ``` Follow these guidelines for both the ksqlDB CLI and REST API commands: - For ``, use the endpoint value provided by the `confluent ksql cluster list` command, for example, `https://pksqlc-zz321.us-central1.gcp.confluent.cloud:443`. - For `` and ``, use an API key provided by the `confluent api-key create --resource ` command. #### IMPORTANT You must use a resource-specific key created for the ksqlDB cluster. API keys for Confluent Cloud or the Kafka cluster don’t work and cause an authorization error. 
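As an illustration, a minimal HTTPS request to the ksqlDB endpoint with `curl` might look like the following. The endpoint shown is the example value from the output above, and the `SHOW STREAMS;` statement and the placeholder key and secret are assumptions for this sketch; substitute your own endpoint and a resource-specific ksqlDB API key and secret.

```bash
# Send a ksqlDB statement to the cluster endpoint using basic authentication
# with a resource-specific ksqlDB API key and secret (placeholders shown).
curl --silent --request POST \
  --url 'https://pksqlc-zz321.us-central1.gcp.confluent.cloud:443/ksql' \
  --user '<ksqldb-api-key>:<ksqldb-api-secret>' \
  --header 'Content-Type: application/vnd.ksql.v1+json; charset=utf-8' \
  --data '{"ksql": "SHOW STREAMS;", "streamsProperties": {}}'
```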
## Set up the cluster

In the following steps, you install the Confluent CLI and use it to sign in to Confluent Cloud, get the cluster endpoint, and create a topic and an API key that you will use to configure the MQTT proxy.

1. Install the Confluent CLI as described in [the Confluent CLI installation guide](https://docs.confluent.io/confluent-cli/current/install.html). For a list of all of the Confluent CLI commands, see [Confluent CLI Command Reference](https://docs.confluent.io/confluent-cli/current/command-reference/index.html).

2. Sign in to your Confluent Cloud cluster.

```bash
confluent login
```

Your output should resemble:

```none
Enter your Confluent credentials:
Email: jdoe@myemail.io
Password: ***********************

Logged in as "jdoe@myemail.io"
Using environment "t118" ("default")
```

3. Run the `confluent kafka cluster list` command to get the Kafka cluster ID.

```bash
confluent kafka cluster list
```

Your output should resemble:

```bash
     Id     | Name |   Type    | Cloud |  Region  | Availability | Status
------------+------+-----------+-------+----------+--------------+---------
  lkc-m1234 | Dev  | BASIC     | gcp   | us-west4 | single-zone  | UP
  lkc-r1234 | Test | BASIC     | gcp   | us-east4 | single-zone  | UP
  lkc-g1234 | Prod | DEDICATED | gcp   | us-west4 | single-zone  | UP
```

4. Set the active Kafka cluster. In this example, the cluster ID is `lkc-m1234`.

```bash
confluent kafka cluster use lkc-m1234
```

5. Run the `confluent kafka cluster describe` command to get the endpoint for your Confluent Cloud cluster.

```bash
confluent kafka cluster describe
```

Your output should resemble:

```text
+--------------+--------------------------------------------------------+
| Id           | lkc-m1234                                              |
| Name         | mqtt-proxy-quickstart                                  |
| Type         | BASIC                                                  |
| Ingress      | 100                                                    |
| Egress       | 100                                                    |
| Storage      | 5000                                                   |
| Cloud        | gcp                                                    |
| Availability | single-zone                                            |
| Region       | us-west2                                               |
| Status       | UP                                                     |
| Endpoint     | SASL_SSL://pkc-12345.us-west2.gcp.confluent.cloud:9092 |
| ApiEndpoint  | https://pkac-12345.us-west2.gcp.confluent.cloud        |
+--------------+--------------------------------------------------------+
```

Save the `Endpoint` value, which you’ll use to configure the bootstrap server for the MQTT Proxy.

6. Create a Kafka topic that the MQTT proxy will produce to. Use the Confluent CLI to create a topic named `temperature`.

```bash
confluent kafka topic create temperature
```

7. Create a Kafka API key and secret that the MQTT proxy can use to access Confluent Cloud. You must specify the cluster with the `resource` flag for this step.

```bash
confluent api-key create --resource lkc-m1234
```

Your output should resemble:

```text
It may take a couple of minutes for the API key to be ready.
Save the API key and secret. The secret is not retrievable later.
+---------+------------------------------------------------------------------+
| API Key | ABCXQHYDZXMMUDEF                                                 |
| Secret  | aBCde3s54+4Xv36YKPLDKy2aklGr6x/ShUrEX5D1Te4AzRlphFlr6eghmPX81HTF |
+---------+------------------------------------------------------------------+
```

#### IMPORTANT

**Save the API key and secret.** You need this information to configure your applications that communicate with Confluent Cloud. This is the *only* time that you can access, view, and save the key and secret.

## Set up your environment and run a Table API program

Use [uv](https://docs.astral.sh/uv/) to create a virtual environment that contains all required dependencies and project files.

1. Use one of the following commands to install uv.
```bash curl -LsSf https://astral.sh/uv/install.sh | sh # or brew install uv # or pip install uv ``` 2. Create a new virtual environment. ```bash uv venv --python 3.11 ``` 3. Copy the following code into a file named `hello_table_api.py`. ```python # /// script # requires-python = ">=3.9,<3.12" # dependencies = [ # "confluent-flink-table-api-python-plugin>=2.1-8", # ] # /// from pyflink.table.confluent import ConfluentSettings, ConfluentTools from pyflink.table import TableEnvironment, Row from pyflink.table.expressions import col, row def run(): # Set up the connection to Confluent Cloud settings = ConfluentSettings.from_global_variables() env = TableEnvironment.create(settings) # Run your first Flink statement in Table API env.from_elements([row("Hello world!")]).execute().print() # Or use SQL env.sql_query("SELECT 'Hello world!'").execute().print() # Structure your code with Table objects - the main ingredient of Table API. table = env.from_path("examples.marketplace.clicks") \ .filter(col("user_agent").like("Mozilla%")) \ .select(col("click_id"), col("user_id")) table.print_schema() print(table.explain()) # Use the provided tools to test on a subset of the streaming data expected = ConfluentTools.collect_materialized_limit(table, 50) actual = [Row(42, 500)] if expected != actual: print("Results don't match!") if __name__ == "__main__": run() ``` 4. Run the following command to execute the Table API program from the directory where you created `hello_table_api.py`. ```bash uv run hello_table_api.py ``` ## Step 5. Deploy a Flink SQL statement To use Flink, you must create a Flink compute pool. A compute pool represents a set of compute resources that are bound to a region and are used to run your Flink SQL statements. For more information, see [Compute Pools](../concepts/compute-pools.md#flink-sql-compute-pools). 1. Create a new compute pool by adding the following code to “main.tf”. ```terraform # Create a Flink compute pool to execute a Flink SQL statement. resource "confluent_flink_compute_pool" "my_compute_pool" { display_name = "my_compute_pool" cloud = local.cloud region = local.region max_cfu = 10 environment { id = confluent_environment.my_env.id } depends_on = [ confluent_environment.my_env ] } ``` 2. Create a Flink-specific API key, which is required for submitting statements to Confluent Cloud, by adding the following code to “main.tf”. ```terraform # Create a Flink-specific API key that will be used to submit statements. data "confluent_flink_region" "my_flink_region" { cloud = local.cloud region = local.region } resource "confluent_api_key" "my_flink_api_key" { display_name = "my_flink_api_key" owner { id = confluent_service_account.my_service_account.id api_version = confluent_service_account.my_service_account.api_version kind = confluent_service_account.my_service_account.kind } managed_resource { id = data.confluent_flink_region.my_flink_region.id api_version = data.confluent_flink_region.my_flink_region.api_version kind = data.confluent_flink_region.my_flink_region.kind environment { id = confluent_environment.my_env.id } } depends_on = [ confluent_environment.my_env, confluent_service_account.my_service_account ] } ``` 3. Deploy a Flink SQL statement on Confluent Cloud by adding the following code to “main.tf”. The statement consumes data from `examples.marketplace.orders`, aggregates in 1 minute windows and ingests the filtered data into `sink_topic`. Because you’re using a Service Account, the statement runs in Confluent Cloud continuously until manually stopped. 
```terraform
# Deploy a Flink SQL statement to Confluent Cloud.
resource "confluent_flink_statement" "my_flink_statement" {
  organization {
    id = data.confluent_organization.my_org.id
  }
  environment {
    id = confluent_environment.my_env.id
  }
  compute_pool {
    id = confluent_flink_compute_pool.my_compute_pool.id
  }
  principal {
    id = confluent_service_account.my_service_account.id
  }
  # This SQL reads data from source_topic, filters it, and ingests the filtered data into sink_topic.
  statement = <<EOT
  ...
  EOT
}
```

```bash
export ENV_REGION_ID="." # example: "env-z3y2x1.aws.us-east-1"
```

The `ENV_REGION_ID` variable is a concatenation of your environment ID and the cloud provider region of your Kafka cluster, separated by a `.` character. To see the available regions, run the `confluent flink region list` command.

3. Run the following command to send a POST request to the `api-keys` endpoint. The REST API uses basic authentication, which means that you provide a base64-encoded string made from your Cloud API key and secret in the request header.

```bash
curl --request POST \
  --url 'https://api.confluent.cloud/iam/v2/api-keys' \
  --header "Authorization: Basic $(echo -n "${CLOUD_API_KEY}:${CLOUD_API_SECRET}" | base64 -w 0)" \
  --header 'content-type: application/json' \
  --data "{\"spec\":{\"display_name\":\"flinkapikey\",\"owner\":{\"id\":\"${PRINCIPAL_ID}\"},\"resource\":{\"api_version\":\"fcpm/v2\",\"id\":\"${ENV_REGION_ID}\"}}}"
```

Your output should resemble:

```json
{
  "api_version": "iam/v2",
  "id": "KJDYFDMBOBDNQEIU",
  "kind": "ApiKey",
  "metadata": {
    "created_at": "2023-12-15T23:10:20.406556Z",
    "resource_name": "crn://api.confluent.cloud/organization=b0b21724-4586-4a07-b787-d0bb5aacbf87/user=u-lq1dr3/api-key=KJDYFDMBOBDNQEIU",
    "self": "https://api.confluent.cloud/iam/v2/api-keys/KJDYFDMBOBDNQEIU",
    "updated_at": "2023-12-15T23:10:20.406556Z"
  },
  "spec": {
    "description": "",
    "display_name": "flinkapikey",
    "owner": {
      "api_version": "iam/v2",
      "id": "u-lq1dr3",
      "kind": "User",
      "related": "https://api.confluent.cloud/iam/v2/users/u-lq2dr7",
      "resource_name": "crn://api.confluent.cloud/organization=b0b21724-4586-4a07-b787-d0bb5aacbf87/user=u-lq2dr7"
    },
    "resource": {
      "api_version": "fcpm/v2",
      "id": "env-z3q9rd.aws.us-east-1",
      "kind": "Region",
      "related": "https://api.confluent.cloud/fcpm/v2/regions?cloud=aws",
      "resource_name": "crn://api.confluent.cloud/organization=b0b21724-4586-4a07-b787-d0bb5aacbf87/environment=env-z3q9rd/flink-region=aws.us-east-1"
    },
    "secret": "B0BYFzyd0bb5Q58ZZJJYV52mbwDDHnZx21f0gOTz2k6Qv2V9I4KraVztwFOlQx6z"
  }
}
```
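If you want to confirm that the key was created, you can read it back by ID from the same `api-keys` endpoint. This is a minimal sketch; the key ID below is the example `id` value from the response above, so substitute your own.

```bash
# Read the newly created API key by ID (the secret is only returned at creation time).
curl --request GET \
  --url 'https://api.confluent.cloud/iam/v2/api-keys/KJDYFDMBOBDNQEIU' \
  --header "Authorization: Basic $(echo -n "${CLOUD_API_KEY}:${CLOUD_API_SECRET}" | base64 -w 0)"
```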
## AI_TOOL_INVOKE

Invoke a registered tool, either externally by using an MCP server or locally by using a [UDF](../../concepts/user-defined-functions.md#flink-sql-udfs), as part of an AI workflow.

Syntax:

```sql
AI_TOOL_INVOKE(model_name, input_prompt, remote_udf_descriptor, mcp_tool_descriptor [, invocation_config]);
```

Description:

The AI_TOOL_INVOKE function enables large language models (LLMs) to access various tools. The LLM decides which tools should be accessed, then the AI_TOOL_INVOKE function invokes the tools, gets the responses, and returns the responses to the LLM. The function returns a map that includes all the tools that were accessed, along with their responses and the status of each call, indicating whether it was a SUCCESS or FAILURE.

The following models are supported:

- Anthropic
- AzureOpenAI
- Gemini
- OpenAI

#### NOTE

The AI_TOOL_INVOKE function is available for preview. A Preview feature is a Confluent Cloud component that is being introduced to gain early feedback from developers. Preview features can be used for evaluation and non-production testing purposes or to provide feedback to Confluent. The warranty, SLA, and Support Services provisions of your agreement with Confluent do not apply to Preview features. Confluent may discontinue providing preview releases of the Preview features at any time in Confluent’s sole discretion.

Configuration:

- `model_name`: Name of the model entity to call [STRING].
- `input_prompt`: Input prompt to pass to the LLM [STRING].
- `remote_udf_descriptor`: Map to pass UDF names as keys and function descriptions as values [MAP]. A maximum of 3 UDFs can be passed.
- `mcp_tool_descriptor`: Map to pass MCP tool names as keys and tool descriptions as values [MAP]. A maximum of 5 tools can be passed. This additional description is passed to the LLM as an “Additional description” if the MCP server already has a description for the tool; if the server doesn’t have a description, `mcp_tool_descriptor` is added as the description. You can leave it empty, in which case no changes are made to the description provided by the server.
- `invocation_config` (optional): Map to pass the config to manage function behavior, for example, `MAP['debug', true, 'on_error', 'continue']`.

Example:

The following example shows how to invoke a UDF and a registered external tool or API as part of an AI workflow.

When you create an MCP server connection, specify the following options:

- `endpoint`: Defines the base URL for all non-SSE communications with the MCP server, including other HTTP calls and general data exchange.
- `sse-endpoint`: Specifies the explicit URL endpoint used to establish a Server-Sent Events (SSE) connection with the MCP server. If omitted, the client defaults to constructing the SSE endpoint by appending `/sse` to the domain specified in `endpoint`.
- `transport-type`: Specifies the transport type to use for the connection. Valid values are `SSE` and `STREAMABLE_HTTP`. The default is `SSE`.

```sql
-- Create an MCP server connection.
CREATE CONNECTION claims_mcp_server WITH (
  'type' = 'mcp_server',
  'endpoint' = 'https://mcp.deepwiki.com',
  'sse-endpoint' = 'https://mcp.deepwiki.com/sse',
  'api-key' = 'api_key'
);
```

```sql
-- Create a model that uses the MCP server connection.
CREATE MODEL tool_invoker
INPUT (input_message STRING)
OUTPUT (tool_calls STRING)
WITH(
  'provider' = 'openai',
  'openai.connection' = 'openai_connection',
  'openai.system_prompt' = 'Select the best tools to complete the task',
  'mcp.connection' = 'claims_mcp_server'
);

-- Create a table that contains the input prompts.
CREATE TABLE claims_verified (
  id INT,
  customer_id INT
);

-- Run the AI_TOOL_INVOKE function.
SELECT
  id,
  customer_id,
  AI_TOOL_INVOKE(
    'tool_invoker',
    customer_id,
    MAP['udf_1', 'udf_1 description', 'udf_2', 'udf_2 description'],
    MAP['tool_1', 'tool_1_description', 'tool_2', 'tool_2_description']
  ) AS verified_result
FROM claims_verified;
```

## Step 4: Query Iceberg tables from Spark

In this step, you read Iceberg tables created by Tableflow by using [PySpark](https://spark.apache.org/docs/latest/api/python/index.html).

- Ensure that Docker is installed and running in your development environment.

1. Run the following command to start PySpark in a Docker container. In this command, the AWS_REGION option must match your Kafka cluster region, for example, `us-west-2`.
```bash docker run -d \ --name spark-iceberg \ -v $(pwd)/warehouse:/home/iceberg/warehouse \ -v $(pwd)/notebooks:/home/iceberg/notebooks/notebooks \ -e AWS_REGION=${YOUR_CLUSTER_REGION} \ -p 8888:8888 \ -p 8080:8080 \ -p 10000:10000 \ -p 10001:10001 \ tabulario/spark-iceberg ``` Once the container has started successfully, you can access Jupyter notebooks in your browser by going to [http://localhost:8888](http://localhost:8888). ![Screenshot of Jupyter notebooks in PySpark](topics/tableflow/images/tableflow-iceberg-reader.png) 2. Upload the following `ipynb` file by clicking **Upload**. This file pre-populates the notebook that you use to test Tableflow.
tableflow-quickstart.ipynb ```json { "cells": [ { "cell_type": "markdown", "id": "2b3b8256-432a-46a8-8542-837777aada52", "metadata": {}, "source": [ "## Register rest catalog as default catalog for Spark" ] }, { "cell_type": "code", "execution_count": 1, "id": "e4d27656-867c-464e-a8c0-4b590fd7aae2", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "24/05/18 07:27:44 WARN SparkSession: Using an existing Spark session; only runtime SQL configurations will take effect.\n" ] } ], "source": [ "from pyspark.sql import SparkSession\n", "\n", "conf = (\n", " pyspark.SparkConf()\n", " .setAppName('Jupyter')\n", " .set(\"spark.sql.extensions\", \"org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions\")\n", " .set(\"spark.sql.catalog.tableflowdemo\", \"org.apache.iceberg.spark.SparkCatalog\")\n", " .set(\"spark.sql.catalog.tableflowdemo.type\", \"rest\")\n", " .set(\"spark.sql.catalog.tableflowdemo.uri\", \"\")\n", " .set(\"spark.sql.catalog.tableflowdemo.credential\", \":\")\n", " .set(\"spark.sql.catalog.tableflowdemo.io-impl\", \"org.apache.iceberg.aws.s3.S3FileIO\")\n", " .set(\"spark.sql.catalog.tableflowdemo.rest-metrics-reporting-enabled\", \"false\")\n", " .set(\"spark.sql.defaultCatalog\", \"tableflowdemo\")\n", " .set(\"spark.sql.catalog.tableflowdemo.s3.remote-signing-enabled\", \"true\")\n", ")\n", "spark = SparkSession.builder.config(conf=conf).getOrCreate()\n" ] }, { "cell_type": "markdown", "id": "3f7f0ed8-39bf-4ad1-ad72-d2f6e010c4b5", "metadata": {}, "source": [ "## List all the tables in the db" ] }, { "cell_type": "code", "execution_count": null, "id": "89fc7044-4f9e-47f7-8fca-05b15da88a9c", "metadata": {}, "outputs": [], "source": [ "%%sql \n", "SHOW TABLES in ``" ] }, { "cell_type": "markdown", "id": "9282572f-557d-4bb7-9f3e-511e86889304", "metadata": {}, "source": [ "## Query all records in the table" ] }, { "cell_type": "code", "execution_count": null, "id": "345f8fef-9d1f-4cc5-8015-babdf4102988", "metadata": {}, "outputs": [], "source": [ "%%sql \n", "SELECT *\n", "FROM ``.`stock-trades`;" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.18" } }, "nbformat": 4, "nbformat_minor": 5 } ```
A new notebook named **tableflow-quickstart** appears.

3. Double-click the **tableflow-quickstart** notebook to open it.

   ![Screenshot of Jupyter notebooks in PySpark showing the Confluent Tableflow Playground notebook](topics/tableflow/images/tableflow-playground-notebook.png)

4. Update the following properties of the Spark configuration with the values from [Step 3](#cloud-tableflow-quick-start-managed-storage-credentials):

   - `spark.sql.catalog.tableflowdemo.uri`
   - `spark.sql.catalog.tableflowdemo.credential`

5. Update the queries with the information corresponding to your cluster and topics, and then run each cell individually from the **Run** menu.

   #### NOTE
   Query with the cluster ID, not the cluster name.

   You can see the list of tables and table data in the cells’ output.

## How do I connect Confluent Cloud for Flink SQL to a Confluent Cloud Kafka topic?

Connect using Apache Flink® connectors with proper Kafka properties and Schema Registry integration. For more information, see [Stream Processing with Confluent Cloud for Apache Flink](../flink/overview.md#ccloud-flink).

### Create a new service account with an API key/secret pair

1. Run the following command to create a new service account:

   ```bash
   confluent iam service-account create demo-app-1 --description "Service account for demo application" -o json
   ```

2. Verify that your output resembles:

   ```text
   {
     "id": "sa-123456",
     "name": "demo-app-1",
     "description": "Service account for demo application"
   }
   ```

   The value of the service account ID, in this case `sa-123456`, will differ in your output.

3. Create an API key and secret for the service account `sa-123456` for the Kafka cluster `lkc-x6m01` by running the following command:

   ```bash
   confluent api-key create --service-account sa-123456 --resource lkc-x6m01 -o json
   ```

4. Verify that your output resembles:

   ```text
   {
     "key": "ESN5FSNDHOFFSUEV",
     "secret": "nzBEyC1k7zfLvVON3vhBMQrNRjJR7pdMc2WLVyyPscBhYHkMwP6VpPVDTqhctamB"
   }
   ```

   The value of the service account’s API key, in this case `ESN5FSNDHOFFSUEV`, and API secret, in this case `nzBEyC1k7zfLvVON3vhBMQrNRjJR7pdMc2WLVyyPscBhYHkMwP6VpPVDTqhctamB`, will differ in your output.

5. Create a local configuration file `/tmp/client.config` with Confluent Cloud connection information using the newly created Kafka cluster and the API key and secret for the service account. Substitute your values for the bootstrap server and the credentials you just created.

   ```text
   sasl.mechanism=PLAIN
   security.protocol=SASL_SSL
   bootstrap.servers=pkc-4kgmg.us-west-2.aws.confluent.cloud:9092
   sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username='ESN5FSNDHOFFSUEV' password='nzBEyC1k7zfLvVON3vhBMQrNRjJR7pdMc2WLVyyPscBhYHkMwP6VpPVDTqhctamB';
   ```

6. Wait about 90 seconds for the Confluent Cloud cluster to be ready and for the service account credentials to propagate.

## Clean up Confluent Cloud resources

1. Complete the following steps to delete the managed connector:

   1. Find the connector ID:

      ```bash
      confluent connect list
      ```

      The output should display something similar to the following. Locate your connector ID; in this case, the connector ID is `lcc-zno83`.

      ```text
            ID    |           Name            | Status  |  Type  | Trace
      ------------+---------------------------+---------+--------+--------
        lcc-zno83 | datagen_ccloud_pageviews  | RUNNING | source |
      ```

   2. Delete the connector, referencing the connector ID from the previous step:

      ```bash
      confluent connect delete lcc-zno83
      ```

      You should see: `Deleted connector "lcc-zno83".`

2.
Run the following command to delete the service account: ```bash confluent iam service-account delete sa-123456 ``` 3. Complete the following steps to delete all the Kafka topics: 1. Delete `demo-topic-1`: ```bash confluent kafka topic delete demo-topic-1 ``` You should see: `Deleted topic "demo-topic-1"`. 2. Delete `demo-topic-2`: ```bash confluent kafka topic delete demo-topic-2 ``` You should see: `Deleted topic "demo-topic-2"`. 3. Delete `demo-topic-3`: ```bash confluent kafka topic delete demo-topic-3 ``` You should see: `Deleted topic "demo-topic-3"`. 4. Run the following command to delete the user API key: ```bash confluent api-key delete QX7X4VA4DFJTTOIA ``` Note that the service account API key was deleted when you deleted the service account. 5. Delete the Kafka cluster: ```bash confluent kafka cluster delete lkc-x6m01 ``` 6. Delete the environment: ```bash confluent environment delete env-5qz2q ``` You should see: `Deleted environment "env-5qz2q"`. If the tutorial ends prematurely, you may receive the following error message when trying to run the example again (`confluent environment create ccloud-stack-000000-beginner-cli`): ```text Error: 1 error occurred: * error creating account: Account name is already in use Failed to create environment ccloud-stack-000000-beginner-cli. Please troubleshoot and run again ``` In this case, run the following script to delete the example’s topics, Kafka cluster, and environment: ```bash ./cleanup.sh ``` ## Flags ```none --file string REQUIRED: Input filename. --overwrite Overwrite existing topics with the same name. --kafka-api-key string Kafka cluster API key. --schema-registry-endpoint string The URL of the Schema Registry cluster. --kafka-endpoint string Endpoint to be used for this Kafka cluster. ``` ## Examples Create a Java client configuration file. ```none confluent kafka client-config create java --schema-registry-api-key my-sr-key --schema-registry-api-secret my-sr-secret ``` Create a Java client configuration file with arguments. ```none confluent kafka client-config create java --environment env-123 --cluster lkc-123456 --api-key my-key --api-secret my-secret --schema-registry-api-key my-sr-key --schema-registry-api-secret my-sr-secret ``` Create a Java client configuration file, redirecting the configuration to a file and the warnings to a separate file. ```none confluent kafka client-config create java --schema-registry-api-key my-sr-key --schema-registry-api-secret my-sr-secret 1> my-client-config-file.config 2> my-warnings-file ``` Create a Java client configuration file, redirecting the configuration to a file and keeping the warnings in the console. ```none confluent kafka client-config create java --schema-registry-api-key my-sr-key --schema-registry-api-secret my-sr-secret 1> my-client-config-file.config 2>&1 ``` ## Examples Create a Ktor client configuration file. ```none confluent kafka client-config create ktor --schema-registry-api-key my-sr-key --schema-registry-api-secret my-sr-secret ``` Create a Ktor client configuration file with arguments. ```none confluent kafka client-config create ktor --environment env-123 --cluster lkc-123456 --api-key my-key --api-secret my-secret --schema-registry-api-key my-sr-key --schema-registry-api-secret my-sr-secret ``` Create a Ktor client configuration file, redirecting the configuration to a file and the warnings to a separate file. 
```none confluent kafka client-config create ktor --schema-registry-api-key my-sr-key --schema-registry-api-secret my-sr-secret 1> my-client-config-file.config 2> my-warnings-file ``` Create a Ktor client configuration file, redirecting the configuration to a file and keeping the warnings in the console. ```none confluent kafka client-config create ktor --schema-registry-api-key my-sr-key --schema-registry-api-secret my-sr-secret 1> my-client-config-file.config 2>&1 ``` ## Examples Create a Python client configuration file. ```none confluent kafka client-config create python --schema-registry-api-key my-sr-key --schema-registry-api-secret my-sr-secret ``` Create a Python client configuration file with arguments. ```none confluent kafka client-config create python --environment env-123 --cluster lkc-123456 --api-key my-key --api-secret my-secret --schema-registry-api-key my-sr-key --schema-registry-api-secret my-sr-secret ``` Create a Python client configuration file, redirecting the configuration to a file and the warnings to a separate file. ```none confluent kafka client-config create python --schema-registry-api-key my-sr-key --schema-registry-api-secret my-sr-secret 1> my-client-config-file.config 2> my-warnings-file ``` Create a Python client configuration file, redirecting the configuration to a file and keeping the warnings in the console. ```none confluent kafka client-config create python --schema-registry-api-key my-sr-key --schema-registry-api-secret my-sr-secret 1> my-client-config-file.config 2>&1 ``` ## Examples Create a REST API client configuration file. ```none confluent kafka client-config create restapi --schema-registry-api-key my-sr-key --schema-registry-api-secret my-sr-secret ``` Create a REST API client configuration file with arguments. ```none confluent kafka client-config create restapi --environment env-123 --cluster lkc-123456 --api-key my-key --api-secret my-secret --schema-registry-api-key my-sr-key --schema-registry-api-secret my-sr-secret ``` Create a REST API client configuration file, redirecting the configuration to a file and the warnings to a separate file. ```none confluent kafka client-config create restapi --schema-registry-api-key my-sr-key --schema-registry-api-secret my-sr-secret 1> my-client-config-file.config 2> my-warnings-file ``` Create a REST API client configuration file, redirecting the configuration to a file and keeping the warnings in the console. ```none confluent kafka client-config create restapi --schema-registry-api-key my-sr-key --schema-registry-api-secret my-sr-secret 1> my-client-config-file.config 2>&1 ``` ## Examples Create a Spring Boot client configuration file. ```none confluent kafka client-config create springboot --schema-registry-api-key my-sr-key --schema-registry-api-secret my-sr-secret ``` Create a Spring Boot client configuration file with arguments. ```none confluent kafka client-config create springboot --environment env-123 --cluster lkc-123456 --api-key my-key --api-secret my-secret --schema-registry-api-key my-sr-key --schema-registry-api-secret my-sr-secret ``` Create a Spring Boot client configuration file, redirecting the configuration to a file and the warnings to a separate file. ```none confluent kafka client-config create springboot --schema-registry-api-key my-sr-key --schema-registry-api-secret my-sr-secret 1> my-client-config-file.config 2> my-warnings-file ``` Create a Spring Boot client configuration file, redirecting the configuration to a file and keeping the warnings in the console. 
```none confluent kafka client-config create springboot --schema-registry-api-key my-sr-key --schema-registry-api-secret my-sr-secret 1> my-client-config-file.config 2>&1 ``` ### On-Premises ```none --destination-cluster string Destination cluster ID. --destination-bootstrap-server string Bootstrap server address of the destination cluster. Can alternatively be set in the configuration file using key "bootstrap.servers". --remote-cluster string Remote cluster ID for bidirectional cluster links. --remote-bootstrap-server string Bootstrap server address of the remote cluster for bidirectional links. Can alternatively be set in the configuration file using key "bootstrap.servers". --source-api-key string An API key for the source cluster. For links at destination cluster, this is used for remote cluster authentication. For links at source cluster, this is used for local cluster authentication. If specified, the cluster will use SASL_SSL with PLAIN SASL as its mechanism for authentication. If you wish to use another authentication mechanism, do not specify this flag, and add the security configurations in the configuration file. --source-api-secret string An API secret for the source cluster. For links at destination cluster, this is used for remote cluster authentication. For links at source cluster, this is used for local cluster authentication. If specified, the cluster will use SASL_SSL with PLAIN SASL as its mechanism for authentication. If you wish to use another authentication mechanism, do not specify this flag, and add the security configurations in the configuration file. --destination-api-key string An API key for the destination cluster. This is used for remote cluster authentication links at the source cluster. If specified, the cluster will use SASL_SSL with PLAIN SASL as its mechanism for authentication. If you wish to use another authentication mechanism, do not specify this flag, and add the security configurations in the configuration file. --destination-api-secret string An API secret for the destination cluster. This is used for remote cluster authentication for links at the source cluster. If specified, the cluster will use SASL_SSL with PLAIN SASL as its mechanism for authentication. If you wish to use another authentication mechanism, do not specify this flag, and add the security configurations in the configuration file. --remote-api-key string An API key for the remote cluster for bidirectional links. This is used for remote cluster authentication. If specified, the cluster will use SASL_SSL with PLAIN SASL as its mechanism for authentication. If you wish to use another authentication mechanism, do not specify this flag, and add the security configurations in the configuration file. --remote-api-secret string An API secret for the remote cluster for bidirectional links. This is used for remote cluster authentication. If specified, the cluster will use SASL_SSL with PLAIN SASL as its mechanism for authentication. If you wish to use another authentication mechanism, do not specify this flag, and add the security configurations in the configuration file. --local-api-key string An API key for the local cluster for bidirectional links. This is used for local cluster authentication if remote link's connection mode is Inbound. If specified, the cluster will use SASL_SSL with PLAIN SASL as its mechanism for authentication. If you wish to use another authentication mechanism, do not specify this flag, and add the security configurations in the configuration file. 
--local-api-secret string An API secret for the local cluster for bidirectional links. This is used for local cluster authentication if remote link's connection mode is Inbound. If specified, the cluster will use SASL_SSL with PLAIN SASL as its mechanism for authentication. If you wish to use another authentication mechanism, do not specify this flag, and add the security configurations in the configuration file. --config strings A comma-separated list of "key=value" pairs, or path to a configuration file containing a newline-separated list of "key=value" pairs. --dry-run Validate a link, but do not create it. --no-validate Create a link even if the source cluster cannot be reached. --url string Base URL of REST Proxy Endpoint of Kafka Cluster (include "/kafka" for embedded Rest Proxy). Must set flag or CONFLUENT_REST_URL. --certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent REST Proxy. --client-cert-path string Path to client cert to be verified by Confluent REST Proxy. Include for mTLS authentication. --client-key-path string Path to client private key, include for mTLS authentication. --no-authentication Include if requests should be made without authentication headers and user will not be prompted for credentials. --prompt Bypass use of available login credentials and prompt for Kafka Rest credentials. --context string CLI context name. ``` ### On-Premises ```none --url string Base URL of REST Proxy Endpoint of Kafka Cluster (include "/kafka" for embedded Rest Proxy). Must set flag or CONFLUENT_REST_URL. --certificate-authority-path string Path to a PEM-encoded Certificate Authority to verify the Confluent REST Proxy. --client-cert-path string Path to client cert to be verified by Confluent REST Proxy. Include for mTLS authentication. --client-key-path string Path to client private key, include for mTLS authentication. --no-authentication Include if requests should be made without authentication headers and user will not be prompted for credentials. --prompt Bypass use of available login credentials and prompt for Kafka Rest credentials. --partitions uint32 Number of topic partitions. --replication-factor uint32 Number of replicas. --config strings A comma-separated list of "key=value" pairs, or path to a configuration file containing a newline-separated list of "key=value" pairs. --if-not-exists Exit gracefully if topic already exists. ``` ### On-Premises | Command | Description | |---------------------------------------------------------------------------------|------------------------------------------------------------| | [confluent audit-log](audit-log/index.md#confluent-audit-log) | Manage audit log configuration. | | [confluent cloud-signup](confluent_cloud-signup.md#confluent-cloud-signup) | Sign up for Confluent Cloud. | | [confluent cluster](cluster/index.md#confluent-cluster) | Retrieve metadata about Confluent Platform clusters. | | [confluent completion](confluent_completion.md#confluent-completion) | Print shell completion code. | | [confluent configuration](configuration/index.md#confluent-configuration) | Configure the Confluent CLI. | | [confluent connect](connect/index.md#confluent-connect) | Manage Kafka Connect. | | [confluent context](context/index.md#confluent-context) | Manage CLI configuration contexts. | | [confluent flink](flink/index.md#confluent-flink) | Manage Apache Flink. | | [confluent iam](iam/index.md#confluent-iam) | Manage RBAC, ACL and IAM permissions. 
| | [confluent kafka](kafka/index.md#confluent-kafka) | Manage Apache Kafka. | | [confluent ksql](ksql/index.md#confluent-ksql) | Manage ksqlDB. | | [confluent local](local/index.md#confluent-local) | Manage a local Confluent Platform development environment. | | [confluent login](confluent_login.md#confluent-login) | Log in to Confluent Cloud or Confluent Platform. | | [confluent logout](confluent_logout.md#confluent-logout) | Log out of Confluent Platform. | | [confluent plugin](plugin/index.md#confluent-plugin) | Manage Confluent plugins. | | [confluent prompt](confluent_prompt.md#confluent-prompt) | Add Confluent CLI context to your terminal prompt. | | [confluent schema-registry](schema-registry/index.md#confluent-schema-registry) | Manage Schema Registry. | | [confluent secret](secret/index.md#confluent-secret) | Manage secrets for Confluent Platform. | | [confluent shell](confluent_shell.md#confluent-shell) | Start an interactive shell. | | [confluent update](confluent_update.md#confluent-update) | Update the Confluent CLI. | | [confluent version](confluent_version.md#confluent-version) | Show version of the Confluent CLI. | ### Produce and consume with Schema Registry Confluent CLI supports producing and consuming with Schema Registry functionalities. You can register a schema with a local file as you produce (writing data in that schema), and then read data from the schema as you consume. 1. In the Schema Registry cluster you registered, retrieve the following information the CLI requires: - Schema Registry endpoint. - The value format. - A local schema file. 2. When using Schema Registry, you must log in to Kafka as the MDS token will be used to authenticate the Schema Registry client: ```text confluent login --ca-cert-path \ --url https://: ``` 3. Produce and consume using the Confluent CLI commands. An example CLI command to produce to `test-topic`: ```text confluent kafka topic produce test-topic \ --protocol SASL_SSL \ --bootstrap ":19091" \ --username admin --password secret \ --value-format avro \ --schema ~/schema.avsc \ --sr-endpoint https://localhost:8085 \ --ca-location scripts/security/snakeoil-ca-1.crt ``` An example CLI command to consume from `test-topic`: ```text confluent kafka topic consume test-topic -b \ --protocol SASL_SSL \ --bootstrap ":19091" \ --username admin --password secret \ --value-format avro \ --sr-endpoint https://localhost:8085 \ --ca-location scripts/security/snakeoil-ca-1.crt ``` - `--schema` is the path to your local schema file. - Specify `--value-format` according to the format of the schema file: `avro`, `json` or `protobuf`. When later consuming, it should also be set to the same value. - `--sr-endpoint` is the endpoint to the Schema Registry cluster. - `--ca-location` is required flag when working with schemas. It’s used to authenticate the Schema Registry client. It might be the same file that you use for SSL verification. ## Use case 1: Confluent Cloud cluster with public networking For users with a simple setup–with Confluent CLI installed and with internet connectivity –you want to do two tasks: 1. List the API keys: ```text confluent api-key list ``` 2. List the topics on the cluster: ```text confluent kafka topic list --environment env-639yqq --cluster lkc-nykw7z ``` In this use case, `lkc-nykw7z` is a basic cluster with public/internet endpoints. Both of the previous commands will egress using your workstation’s default gateway to the internet, but to two different internet endpoints. 
The API Key request goes to an `api.confluent.cloud` endpoint (Control Plane), and the topic list request goes to the broker’s Kafka REST API endpoint (Data Plane). Here are the two requests:

```text
GET https://api.confluent.cloud/iam/v2/api-keys?page_size=100&spec.owner=&spec.resource= HTTP/2.0
```

```text
GET https://pkc-ldvj1.ap-southeast-2.aws.confluent.cloud/kafka/v3/clusters/lkc-nykw7z/topics
```

If network administration and firewall rules prevent direct outbound internet connections and an outbound forward proxy server is required, you must configure the Confluent CLI to use the proxy server. The workstation user must add the following to the `.bashrc` file. Note that you must supply the proxy server host and port, for example, `http://:`:

```text
export HTTPS_PROXY=http://localhost:8080
```

Both of the previous connections will be routed using the proxy.

## Cluster overview page

The overview page for a single Apache Kafka® cluster provides a summary view of the cluster and its connected services.

![Normal mode and Reduced infrastructure mode](images/basics-c3-cluster-overview.png)

The following table describes the panels found on the Clusters page by mode. All of the panels are clickable and navigate you directly to the relevant sections.

| Section | Normal mode | Reduced infrastructure mode |
|---------|-------------|-----------------------------|
| [Brokers overview](brokers.md#c3-brokers-overview-metrics) | Total brokers with production and consumption throughput. | Not visible in Reduced infrastructure mode. |
| [Topics overview](topics/overview.md#c3-all-topics) | Total topics, total partitions, under replicated partitions, out of sync replicas. | Total topics and total partitions. |
| [Connect overview](connect.md#c3-all-connect-clusters-page) | Number of Connect clusters and connector status. | Same as Normal mode. |
| [ksqlDB overview](ksql.md#c3-ksql-clusters-page) | Number of ksqlDB clusters and persistent queries. | Same as Normal mode. |

You can view and edit cluster properties and broker configurations in the Cluster settings pages. When you click the Cluster settings sub-menu for a cluster, the **General** tab appears by default.

### confluent.controlcenter.auth.restricted.roles

Specify a list of roles with limited read-only access. You must include roles added here in `confluent.controlcenter.rest.authentication.roles`. For users that are members of roles included in this list, the following features and options are unavailable:

* Add, delete, pause, or resume connectors
* Browse connectors
* View connector settings
* Upload connector configs
* Create, delete, or edit alerts (triggers or actions)
* Edit a license
* Edit brokers
* Press submit on cluster forms
* Edit, create, or delete schemas
* Edit data flow queries
* [Inspect topics](../topics/messages.md#c3-topic-message-browser)
* Type in the KSQL editor
* [Run or stop ksqlDB queries](../ksql.md#controlcenter-userguide-ksql)
* Add ksqlDB streams or tables

For fine-grained access control, consider configuring [role-based access control (RBAC)](../security/c3-rbac.md#controlcenter-security-rbac).

* Type: list
* Default: “”
* Importance: low

### ZooKeeper Considerations

- You must use a special command to start Prometheus on macOS.

1.
Download the Confluent Platform archive (7.7 to 7.9 supported) and run these commands: ```bash wget https://packages.confluent.io/archive/7.9/confluent-7.9.0.tar.gz ``` ```bash tar -xvf confluent-7.9.0.tar.gz ``` ```bash cd confluent-7.9.0 ``` ```bash export CONFLUENT_HOME=`pwd` ``` 2. Update broker configurations to emit metrics to Prometheus by adding the following configurations to: `etc/kafka/server.properties` ```bash metric.reporters=io.confluent.telemetry.reporter.TelemetryReporter confluent.telemetry.exporter._c3.type=http confluent.telemetry.exporter._c3.enabled=true confluent.telemetry.exporter._c3.metrics.include=io.confluent.kafka.server.request.(?!.*delta).*|io.confluent.kafka.server.server.broker.state|io.confluent.kafka.server.replica.manager.leader.count|io.confluent.kafka.server.request.queue.size|io.confluent.kafka.server.broker.topic.failed.produce.requests.rate.1.min|io.confluent.kafka.server.tier.archiver.total.lag|io.confluent.kafka.server.request.total.time.ms.p99|io.confluent.kafka.server.broker.topic.failed.fetch.requests.rate.1.min|io.confluent.kafka.server.broker.topic.total.fetch.requests.rate.1.min|io.confluent.kafka.server.partition.caught.up.replicas.count|io.confluent.kafka.server.partition.observer.replicas.count|io.confluent.kafka.server.tier.tasks.num.partitions.in.error|io.confluent.kafka.server.broker.topic.bytes.out.rate.1.min|io.confluent.kafka.server.request.total.time.ms.p95|io.confluent.kafka.server.controller.active.controller.count|io.confluent.kafka.server.session.expire.listener.zookeeper.disconnects.total|io.confluent.kafka.server.request.total.time.ms.p999|io.confluent.kafka.server.controller.active.broker.count|io.confluent.kafka.server.request.handler.pool.request.handler.avg.idle.percent.rate.1.min|io.confluent.kafka.server.session.expire.listener.zookeeper.disconnects.rate.1.min|io.confluent.kafka.server.controller.unclean.leader.elections.rate.1.min|io.confluent.kafka.server.replica.manager.partition.count|io.confluent.kafka.server.controller.unclean.leader.elections.total|io.confluent.kafka.server.partition.replicas.count|io.confluent.kafka.server.broker.topic.total.produce.requests.rate.1.min|io.confluent.kafka.server.controller.offline.partitions.count|io.confluent.kafka.server.socket.server.network.processor.avg.idle.percent|io.confluent.kafka.server.partition.under.replicated|io.confluent.kafka.server.log.log.start.offset|io.confluent.kafka.server.log.tier.size|io.confluent.kafka.server.log.size|io.confluent.kafka.server.tier.fetcher.bytes.fetched.total|io.confluent.kafka.server.request.total.time.ms.p50|io.confluent.kafka.server.tenant.consumer.lag.offsets|io.confluent.kafka.server.session.expire.listener.zookeeper.expires.rate.1.min|io.confluent.kafka.server.log.log.end.offset|io.confluent.kafka.server.broker.topic.bytes.in.rate.1.min|io.confluent.kafka.server.partition.under.min.isr|io.confluent.kafka.server.partition.in.sync.replicas.count|io.confluent.telemetry.http.exporter.batches.dropped|io.confluent.telemetry.http.exporter.items.total|io.confluent.telemetry.http.exporter.items.succeeded|io.confluent.telemetry.http.exporter.send.time.total.millis|io.confluent.kafka.server.controller.leader.election.rate.(?!.*delta).*|io.confluent.telemetry.http.exporter.batches.failed confluent.telemetry.exporter._c3.client.base.url=http://localhost:9090/api/v1/otlp confluent.telemetry.exporter._c3.client.compression=gzip confluent.telemetry.exporter._c3.api.key=dummy confluent.telemetry.exporter._c3.api.secret=dummy 
confluent.telemetry.exporter._c3.buffer.pending.batches.max=80 confluent.telemetry.exporter._c3.buffer.batch.items.max=4000 confluent.telemetry.exporter._c3.buffer.inflight.submissions.max=10 confluent.telemetry.metrics.collector.interval.ms=60000 confluent.telemetry.remoteconfig._confluent.enabled=false confluent.consumer.lag.emitter.enabled=true ``` 3. Download the Control Center archive and run these commands: ```bash wget https://packages.confluent.io/confluent-control-center-next-gen/archive/confluent-control-center-next-gen-2.3.0.tar.gz ``` ```bash tar -xvf confluent-control-center-next-gen-2.3.0.tar.gz ``` ```bash cd confluent-control-center-next-gen-2.3.0 ``` 4. Start Control Center. To start Control Center, you must have three dedicated command windows: one for Prometheus, another for the Control Center process, and a third for Alertmanager. Run the following commands from `CONTROL_CENTER_HOME` in all command windows. 1. Start Prometheus. ```bash bin/prometheus-start ``` 2. Start Alertmanager. ```bash bin/alertmanager-start ``` 3. Start Control Center. ```bash bin/control-center-start etc/confluent-control-center/control-center-dev.properties ``` 5. Start Confluent Platform. Start ZooKeeper. ```bash bin/zookeeper-server-start etc/kafka/zookeeper.properties ``` Start Kafka. ```bash bin/kafka-server-start etc/kafka/server.properties ``` ### Bad security configuration * Check the security configuration for all brokers, Telemetry Reporter, and Control Center (see [debugging check configuration](#check-configurations)). For example, is it SASL_SSL, SASL_PLAINTEXT, SSL? * Possible errors include: ```bash ERROR SASL authentication failed using login context 'Client'. (org.apache.zookeeper.client.ZooKeeperSaslClient) ``` ```bash Caused by: org.apache.kafka.common.KafkaException: java.lang.IllegalArgumentException: No serviceName defined in either JAAS or Kafka configuration ``` ```bash org.apache.kafka.common.errors.IllegalSaslStateException: Unexpected handshake request with client mechanism GSSAPI, enabled mechanisms are [GSSAPI] ``` * Verify that the correct Java Authentication and Authorization Service (JAAS) configuration was detected. * If ACLs are enabled, check them. * To verify that you can communicate with the cluster, try to produce and consume using `console-*` with the same security settings. ### HTTP Basic authentication enabled for Schema Registry Whenever you have HTTP Basic authentication configured for Schema Registry, you must provide a username and password for Control Center to communicate correctly with Schema Registry. For a single cluster or the first cluster in a multi-cluster deployment, set the following properties, where the `user.info` contains a `:` that you have configured for Schema Registry. 
```bash confluent.controlcenter.schema.registry.basic.auth.credentials.source=USER_INFO confluent.controlcenter.schema.registry.basic.auth.user.info=: ``` For multi-cluster deployment, to set the remaining clusters, use: ```bash confluent.controlcenter.schema.registry..basic.auth.credentials.source=USER_INFO confluent.controlcenter.schema.registry..basic.auth.user.info=: ``` A multi-cluster deployment Schema Registry might look like the following: ```bash // first Schema Registry cluster confluent.controlcenter.schema.registry.url= confluent.controlcenter.schema.registry.basic.auth.credentials.source=USER_INFO confluent.controlcenter.schema.registry.basic.auth.user.info=: // additional Schema Registry clusters confluent.controlcenter.schema.registry..url= confluent.controlcenter.schema.registry..basic.auth.credentials.source=USER_INFO confluent.controlcenter.schema.registry..basic.auth.user.info=: ``` See [Schema Registry](/platform/current/security/authentication/http-basic-auth/overview.html#basic-auth-sr) for steps to configure HTTP Basic authentication for Schema Registry. ### HTTP Basic authentication enabled for Connect Whenever you have HTTP Basic authentication configured for Connect, you must provide a username and password for Control Center to communicate correctly with Connect. Set the `confluent.controlcenter.connect..basic.auth.user.info` property to a value that contains `:` that you have configured for Connect. ```bash confluent.controlcenter.connect..basic.auth.user.info=: ``` See [Connect REST API](/platform/current/security/authentication/http-basic-auth/overview.html#basic-auth-kconnect) for steps to configure HTTP Basic authentication for Connect. ## View My Role Assignments To access the Assignments page: 1. [Log in](c3-rbac-login.md#c3-rbac-login) to Control Center. 2. From the Control Center **Administration menu**, click the **View my role assignments** option. 3. Click the **Assignments** tab. ![View My Role Assignments Cluster Level page](images/c3-rbac-my-role-assign.png) Use this page to: - View the clusters for which you have role assignments. - Search for clusters by name and ID. - Filter the cluster view by cluster type: Connect, Kafka, ksqlDB, Schema Registry. #### NOTE If there is only one type of cluster you are authorized for, the **Cluster type** (All clusters) list does not appear. Only the cluster types that you have role permissions for appear in the list. - Drill into a cluster to access the Cluster roles and Resource roles pages where you can view your role assignments for a cluster and its resources. ![View My Connect Cluster Resource Role Assignments page](images/c3-rbac-view-connect-cluster-role.png) - Click the relevant tab to navigate to the appropriate resource scope page for a cluster, such as: - Consumer **Group**, **Topic**, or **Transactional ID** tab for a Kafka cluster. - **Subject** tab for a Schema Registry cluster. - **Connector** tab for a Connect cluster. # Configure Control Center to work with Kafka ACLs on Confluent Platform Before attempting to create and use Access Control Lists (ACLs), you should familiarize yourself with [ACL concepts](/platform/current/security/authorization/acls/overview.html#acl-concepts). Doing so can help you avoid common pitfalls that can occur when creating and using ACLs to manage access to components and cluster data. Standard Apache Kafka® authorization and encryption options are available for [control center](../installation/configuration.md#kafka-encryption-authentication-authorization-settings). 
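As a companion to the ACL concepts referenced above, the following sketch shows what a single ACL binding looks like programmatically, using the confluent-kafka Python `AdminClient`. This is only an illustration of the binding fields (resource, principal, host, operation, permission), not a Control Center-specific requirement, and the broker address, principal, and topic name are placeholders.

```python
# Illustration only: create one ALLOW/Read ACL for a placeholder principal on a
# placeholder topic with the confluent-kafka Python AdminClient.
from confluent_kafka.admin import (
    AdminClient, AclBinding, AclOperation, AclPermissionType,
    ResourcePatternType, ResourceType,
)

admin = AdminClient({"bootstrap.servers": "localhost:9092"})  # placeholder broker

binding = AclBinding(
    ResourceType.TOPIC, "example-topic",          # resource type and name (placeholders)
    ResourcePatternType.LITERAL,                  # match this topic name exactly
    "User:example-user", "*",                     # principal and host (placeholders)
    AclOperation.READ, AclPermissionType.ALLOW,   # what the principal is allowed to do
)

# create_acls() returns one future per binding; result() raises if creation failed.
for acl, future in admin.create_acls([binding]).items():
    future.result()
    print(f"Created ACL: {acl}")
```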
### Asynchronous writes

With librdkafka, you first need to create a `rd_kafka_topic_t` handle for the topic you want to write to. Then you can use `rd_kafka_produce` to send messages to it. For example:

```c
rd_kafka_topic_t *rkt = rd_kafka_topic_new(rk, topic, topic_conf);

if (rd_kafka_produce(rkt, RD_KAFKA_PARTITION_UA,
                     RD_KAFKA_MSG_F_COPY,
                     payload, payload_len,
                     key, key_len,
                     NULL) == -1) {
    fprintf(stderr, "%% Failed to produce to topic %s: %s\n",
            topic, rd_kafka_err2str(rd_kafka_errno2err(errno)));
}
```

You can pass topic-specific configuration in the third argument to `rd_kafka_topic_new`. The previous example passed `topic_conf`, which was seeded with a configuration for acknowledgments. Passing `NULL` causes the producer to use the default configuration.

The second argument to `rd_kafka_produce` can be used to set the desired partition for the message. If set to `RD_KAFKA_PARTITION_UA`, as in this case, librdkafka uses the default partitioner to select the partition for this message. The third argument indicates that librdkafka should copy the payload and key, which lets you free them after the call returns.

If you want to invoke some code after the write has completed, you have to configure it on initialization:

```c
static void on_delivery(rd_kafka_t *rk,
                        const rd_kafka_message_t *rkmessage,
                        void *opaque) {
    if (rkmessage->err)
        fprintf(stderr, "%% Message delivery failed: %s\n",
                rd_kafka_message_errstr(rkmessage));
}

void init_rd_kafka() {
    rd_kafka_conf_t *conf = rd_kafka_conf_new();
    rd_kafka_conf_set_dr_msg_cb(conf, on_delivery);

    // initialization omitted
}
```

The delivery callback in librdkafka is invoked in the user’s thread by calling `rd_kafka_poll`. A common pattern is to call this function after every call to the produce API, but this may not be sufficient to ensure regular delivery reports if the message produce rate is not steady. However, this API does not provide a direct way to block for the result of any particular message delivery. If you need to do this, then see the synchronous write example below.

#### Asynchronous Commits

```python
def consume_loop(consumer, topics):
    try:
        consumer.subscribe(topics)

        msg_count = 0
        while running:
            msg = consumer.poll(timeout=1.0)
            if msg is None: continue

            if msg.error():
                if msg.error().code() == KafkaError._PARTITION_EOF:
                    # End of partition event
                    sys.stderr.write('%% %s [%d] reached end at offset %d\n' %
                                     (msg.topic(), msg.partition(), msg.offset()))
                elif msg.error():
                    raise KafkaException(msg.error())
            else:
                msg_process(msg)
                msg_count += 1
                if msg_count % MIN_COMMIT_COUNT == 0:
                    consumer.commit(asynchronous=True)
    finally:
        # Close down consumer to commit final offsets.
        consumer.close()
```

In this example, the consumer sends the request and returns immediately by using asynchronous commits. The `asynchronous` parameter to `commit()` is changed to `True`. The value is passed in explicitly, but asynchronous commits are the default if the parameter is not included. The API gives you a callback which is invoked when the commit either succeeds or fails. The commit callback can be any callable and can be passed as a configuration parameter to the consumer constructor.
```python from confluent_kafka import Consumer def commit_completed(err, partitions): if err: print(str(err)) else: print("Committed partition offsets: " + str(partitions)) conf = {'bootstrap.servers': "host1:9092,host2:9092", 'group.id': "foo", 'default.topic.config': {'auto.offset.reset': 'smallest'}, 'on_commit': commit_completed} consumer = Consumer(conf) ``` ### Override Default Configuration Properties You can override the replication factor using `confluent.topic.replication.factor`. For example, when using a Kafka cluster as a destination with less than three brokers (for development and testing) you should set the `confluent.topic.replication.factor` property to `1`. You can override producer-specific properties by using the `producer.override.*` prefix (for source connectors) and consumer-specific properties by using the `consumer.override.*` prefix (for sink connectors). You can use the defaults or customize the other properties as well. For example, the `confluent.topic.client.id` property defaults to the name of the connector with `-licensing` suffix. You can specify the configuration settings for brokers that require SSL or SASL for client connections using this prefix. You cannot override the cleanup policy of a topic because the topic always has a single partition and is compacted. Also, do not specify serializers and deserializers using this prefix; they are ignored if added. ## Distributed This configuration is used typically along with [distributed mode](/platform/current/connect/concepts.html#distributed-workers). 1. Create a file named `connector.json` using the following JSON configuration example: ```bash { "name": "connector1", "config": { "connector.class": "io.confluent.connect.activemq.ActiveMQSourceConnector", "kafka.topic":"MyKafkaTopicName", "activemq.url":"tcp://localhost:61616", "jms.destination.name":"testing", "confluent.license":"", "confluent.topic.bootstrap.servers":"localhost:9092" } } ``` You can change the `confluent.topic.*` properties to fit your specific environment. If running on a single-node Kafka cluster, you must include the following: `"confluent.topic.replication.factor":"1"`. Leave the `confluent.license` property blank for a 30-day trial. For more details, see the [configuration options](source_connector_config.md#activemq-source-connector-license-config). To explore other options when connecting to ActiveMQ, see the [Configuration Reference for ActiveMQ Source Connector for Confluent Platform](source_connector_config.md#activemq-source-connector-config) page. For details about the ActiveMQ URL parameters, see the [Apache ActiveMQ](https://activemq.apache.org/connection-configuration-uri.html) documentation. 2. Use `curl` to post the configuration to one of the Kafka Connect Workers. Change `http://localhost:8083/` the endpoint of one of your Kafka Connect worker(s). ```bash curl -s -X POST -H 'Content-Type: application/json' --data @connector.json http://localhost:8083/connectors ``` ### Source Connector Configuration Start the services using the Confluent CLI: ```bash confluent local start ``` Create a configuration file named aws-cloudwatch-logs-source-config.json with the following contents. 
```text
{
  "name": "aws-cloudwatch-logs-source",
  "config": {
    "connector.class": "io.confluent.connect.aws.cloudwatch.logs.AwsCloudWatchSourceConnector",
    "tasks.max": "1",
    "aws.cloudwatch.logs.url": "https://logs.us-east-2.amazonaws.com",
    "aws.cloudwatch.log.group": "my-log-group",
    "aws.cloudwatch.log.streams": "my-log-stream",
    "name": "aws-cloudwatch-logs-source",
    "confluent.topic.bootstrap.servers": "localhost:9092",
    "confluent.topic.replication.factor": "1"
  }
}
```

The important configuration parameters used here are:

- **aws.cloudwatch.logs.url**: The endpoint URL that the source connector connects to for pulling the specified logs.
- **aws.cloudwatch.log.group**: The AWS CloudWatch log group under which the log streams are contained.
- **aws.cloudwatch.log.streams**: A list of AWS CloudWatch log streams from which the logs are pulled. The default value is to use all log streams from the configured log group.
- **tasks.max**: The maximum number of tasks that should be created for this connector.
- You may pass your [AWS Credentials](https://docs.confluent.io/kafka-connect-kinesis/current/index.html#aws-credentials) to the AWS CloudWatch Logs Connector through your source connector configuration. To pass AWS credentials in the source configuration, set the **aws.access.key.id** and **aws.secret.access.key** parameters:

  ```text
  "aws.access.key.id":
  "aws.secret.access.key":
  ```

Run this command to start the AWS CloudWatch Logs Source connector.

```bash
confluent local load aws-cloudwatch-logs-source --config aws-cloudwatch-logs-source-config.json
```

To check that the connector started successfully, view the Connect worker’s log by running:

```bash
confluent local services connect log
```

Start a Kafka consumer in a separate terminal session to view the data exported by the connector into the Kafka topic.

```text
path/to/confluent/bin/kafka-console-consumer --bootstrap-server localhost:9092 --topic my-log-group.my-log-stream --from-beginning
```

Finally, stop the Confluent services using the command:

```bash
confluent local stop
```

### REST-based example

This configuration is used typically along with [distributed workers](/platform/current/connect/concepts.html#distributed-workers). Write the following JSON to `connector.json`, configure all of the required values, and use the command below to post the configuration to one of the distributed Connect workers. Check here for more information about the Kafka Connect [REST API](/platform/current/connect/references/restapi.html).

```bash
{
  "name" : "aws-cloudwatch-logs-source-connector",
  "config" : {
    "name" : "aws-cloudwatch-logs-source-connector",
    "connector.class" : "io.confluent.connect.aws.cloudwatch.logs.AwsCloudWatchSourceConnector",
    "tasks.max" : "1",
    "aws.access.key.id" : "< Optional Configuration >",
    "aws.secret.access.key" : "< Optional Configuration >",
    "aws.cloudwatch.log.group" : "< Required Configuration >",
    "aws.cloudwatch.log.streams" : "< Optional Configuration - defaults to all log streams in the log group >"
  }
}
```

Use curl to post the configuration to one of the Kafka Connect workers. Change `http://localhost:8083/` to the endpoint of one of your Kafka Connect workers.
```bash curl -s -X POST -H 'Content-Type: application/json' --data @connector.json http://localhost:8083/connectors ``` ```bash curl -s -X PUT -H 'Content-Type: application/json' --data @connector.json \ http://localhost:8083/connectors/aws-cloudwatch-logs-source-connector/config ``` ### REST-based example This configuration is used typically along with [distributed workers](/platform/current/connect/concepts.html#distributed-workers). Write the following JSON to `connector.json`, configure all of the required values, and use the command below to post the configuration to one the distributed connect workers. Check here for more information about the Kafka Connect [Kafka Connect REST Interface](/platform/current/connect/references/restapi.html). ```bash { "name" : "aws-cloudwatch-metrics-sink-connector", "config" : { "name": "aws-cloudwatch-metrics-sink", "connector.class": "io.confluent.connect.aws.cloudwatch.metrics.AwsCloudWatchMetricsSinkConnector", "tasks.max": "1", "aws.cloudwatch.metrics.url": "https://monitoring.us-east-2.amazonaws.com", "aws.cloudwatch.metrics.namespace": "service-namespace", "behavior.on.malformed.metric": "fail", "confluent.topic.bootstrap.servers": "localhost:9092", "confluent.topic.replication.factor": "1" } } ``` Use curl to post the configuration to one of the Kafka Connect workers. Change `http://localhost:8083/` to the endpoint of one of your Kafka Connect workers. ```bash curl -s -X POST -H 'Content-Type: application/json' --data @connector.json http://localhost:8083/connectors ``` ```bash curl -s -X PUT -H 'Content-Type: application/json' --data @connector.json \ http://localhost:8083/connectors/aws-cloudwatch-metrics-sink-connector/config ``` ### REST-based example This configuration is typically used with [distributed workers](/platform/current/connect/concepts.html#distributed-workers). Write the following JSON to `connector.json`, configure all of the required values. Use the command below to post the configuration to one of the distributed Kafka Connect worker(s). See Kafka Connect [REST API](/platform/current/connect/references/restapi.html) for more information. ```bash { "name": "LambdaSinkConnector", "config" : { "connector.class" : "io.confluent.connect.aws.lambda.AwsLambdaSinkConnector", "tasks.max" : "1", "topics" : "< Required Configuration >", "aws.lambda.function.name" : "< Required Configuration >", "aws.lambda.invocation.type" : "sync", "aws.lambda.batch.size" : "50", "behavior.on.error" : "fail", "confluent.topic.bootstrap.servers" : "localhost:9092", "confluent.topic.replication.factor" : "1" } } ``` Use curl to post the configuration to one of the Kafka Connect workers. Change `http://localhost:8083/` the endpoint of one of your Kafka Connect workers. ```bash curl -s -X POST -H 'Content-Type: application/json' --data @connector.json http://localhost:8083/connectors ``` For details about using this connector with Kafka Connect Reporter, see [Connect Reporter](/kafka-connectors/self-managed/userguide.html#userguide-connect-reporter). ## REST-based Example This configuration is used typically along with [distributed workers](/platform/current/connect/concepts.html#distributed-workers). Write the following JSON to `connector.json`, configure all of the required values, and use the command below to post the configuration to one the distributed connect workers. Refer to [REST API](/platform/current/connect/references/restapi.html) for more information about the Kafka Connect. 
**Connect Distributed REST example:**

```json
{
  "name": "EventHubsSourceConnector1",
  "config": {
    "confluent.topic.bootstrap.servers": "< Required Configuration >",
    "connector.class": "io.confluent.connect.azure.eventhubs.EventHubsSourceConnector",
    "kafka.topic": "< Required Configuration >",
    "tasks.max": "1",
    "max.events": "< Optional Configuration >",
    "azure.eventhubs.sas.keyname": "< Required Configuration >",
    "azure.eventhubs.sas.key": "< Required Configuration >",
    "azure.eventhubs.namespace": "< Required Configuration >",
    "azure.eventhubs.hub.name": "< Required Configuration >"
  }
}
```

Use curl to post the configuration to one of the Kafka Connect workers. Change http://localhost:8083/ to the endpoint of one of your Kafka Connect workers.

**Create a new connector:**

```bash
curl -s -X POST -H 'Content-Type: application/json' --data @connector.json http://localhost:8083/connectors
```

**Update an existing connector:**

```bash
curl -s -X PUT -H 'Content-Type: application/json' --data @connector.json http://localhost:8083/connectors/EventHubsSourceConnector1/config
```

#### NOTE
Provide `datadog.api.key`, `datadog.domain`, and `behavior.on.error`, and then start the Datadog metrics connector by loading its configuration with the following command.

```bash
confluent local load datadog-metrics-sink --config datadog-metrics-sink-connector.properties
{
  "name": "datadog-metrics-sink",
  "config": {
    "connector.class": "io.confluent.connect.datadog.metrics.DatadogMetricsSinkConnector",
    "tasks.max": "1",
    "topics": "datadog-metrics-topic",
    "datadog.api.key": "< your-api-key >",
    "datadog.domain": "COM",
    "behavior.on.error": "fail",
    "key.converter": "io.confluent.connect.avro.AvroConverter",
    "key.converter.schema.registry.url": "http://localhost:8081",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "http://localhost:8081",
    "confluent.topic.bootstrap.servers": "localhost:9092",
    "confluent.topic.replication.factor": "1",
    "reporter.bootstrap.servers": "localhost:9092"
  },
  "tasks": []
}
```

#### NOTE
Change the `confluent.topic.bootstrap.servers` property to include your broker address(es) and change the `confluent.topic.replication.factor` to `3` for staging or production use.

Use curl to post a configuration to one of the Kafka Connect workers. Change `http://localhost:8083/` to the endpoint of one of your Kafka Connect workers.

```bash
curl -sS -X POST -H 'Content-Type: application/json' --data @config.json http://localhost:8083/connectors
```

Use the following command to update the configuration of an existing connector.

```bash
curl -s -X PUT -H 'Content-Type: application/json' --data @config.json http://localhost:8083/connectors/FirebaseSinkConnector/config
```

Confirm that the connector is in a `RUNNING` state by running the following command:

```bash
curl http://localhost:8083/connectors/FirebaseSinkConnector/status | jq
```

The output should resemble:

```bash
{
  "name":"FirebaseSinkConnector",
  "connector":{
    "state":"RUNNING",
    "worker_id":"127.0.1.1:8083"
  },
  "tasks":[
    {
      "id":0,
      "state":"RUNNING",
      "worker_id":"127.0.1.1:8083"
    }
  ],
  "type":"sink"
}
```

When you query the `/connectors/FirebaseSinkConnector/status` endpoint, the state of the connector and its tasks should be `RUNNING`. To produce Avro data to the Kafka topic `artists`, use the following command.
```bash ./bin/kafka-avro-console-producer --broker-list localhost:9092 --topic artists \ --property parse.key=true \ --property key.schema='{"type":"string"}' \ --property "key.separator=:" \ --property value.schema='{"type":"record","name":"artists","fields":[{"name":"name","type":"string"},{"name":"genre","type":"string"}]}' ``` While the console is waiting for the input, use the following three records and paste each of them on the console. ```bash "artistId1":{"name":"Michael Jackson","genre":"Pop"} "artistId2":{"name":"Bob Dylan","genre":"American folk"} "artistId3":{"name":"Freddie Mercury","genre":"Rock"} ``` To produce Avro data to Kafka topic: `songs`, use the following command. ```bash ./bin/kafka-avro-console-producer --broker-list localhost:9092 --topic songs \ --property parse.key=true \ --property key.schema='{"type":"string"}' \ --property "key.separator=:" \ --property value.schema='{"type":"record","name":"songs","fields":[{"name":"title","type":"string"},{"name":"artist","type":"string"}]}' ``` While the console is waiting for the input, paste the following three records on the Firebase console. ```bash "songId1":{"title":"billie jean","artist":"Michael Jackson"} "songId2":{"title":"hurricane","artist":"Bob Dylan"} "songId3":{"title":"bohemian rhapsody","artist":"Freddie Mercury"} ``` Finally, check the Firebase console to ensure that the collections named `artists` and `songs` were created and the records are in the format defined in the [Firebase database structure](#firebase-data-format). ### REST-based example Use this setting with [distributed workers](/platform/current/connect/concepts.html#distributed-workers). Write the following JSON to `config.json`, configure all of the required values, and use the following command to post the configuration to one of the distributed connect workers. For more information about the Kafka Connect REST API, see [this documentation](/platform/current/connect/references/restapi.html). ```json { "name" : "FirebaseSourceConnector", "config" : { "connector.class" : "io.confluent.connect.firebase.FirebaseSourceConnector", "tasks.max" : "1", "gcp.firebase.credentials.path" : "file-path-to-your-gcp-service-account-json-file", "gcp.firebase.database.reference": "https://.firebaseio.com", "gcp.firebase.snapshot" : "true", "confluent.topic.bootstrap.servers": "localhost:9092", "confluent.topic.replication.factor": "1", "confluent.license": " Omit to enable trial mode " } } ``` #### NOTE For staging or production use: - Change the `confluent.topic.bootstrap.servers` property to include your broker address(es). - Change the `confluent.topic.replication.factor` to `3` for staging or production use. - Change `http://localhost:8083/` to the endpoint of one of your Connect worker(s). Use curl to post a configuration to one of the Connect workers. 
```bash curl -sS -X POST -H 'Content-Type: application/json' --data @config.json http://localhost:8083/connectors ``` Confirm that the connector is in a `RUNNING` state by running the following command: ```bash curl http://localhost:8083/connectors/MyGithubConnector/status ``` The output should resemble the example below: ```bash { "name":"MyGithubConnector", "connector":{ "state":"RUNNING", "worker_id":"127.0.1.1:8083" }, "tasks":[ { "id":0, "state":"RUNNING", "worker_id":"127.0.1.1:8083" } ], "type":"source" } ``` Enter the following command to consume records written by the connector to the Kafka topic: ```bash ./kafka-avro-console-consumer --bootstrap-server localhost:9092 --topic github-stargazers --from-beginning ``` The output should resemble the example below: ```bash { "type": { "string": "STARGAZERS" }, "createdAt": null, "data": { "data": { "login": { "string": "User.Name" }, "id": { "int": 1234 }, "node_id": { "string": "MDQ6VXNlcjM0OTE3MTE=" }, "avatar_url": { "string": "https://avatars2.githubusercontent.com/u/1234?v=4" }, "gravatar_id": { "string": "" }, "url": { "string": "https://api.github.com/users/User.Name" }, "html_url": { "string": "https://github.com/User.Name" }, "followers_url": { "string": "https://api.github.com/users/User.Name/followers" }, "following_url": { "string": "https://api.github.com/users/User.Name/following{/other_user}" }, "gists_url": { "string": "https://api.github.com/users/User.Name/gists{/gist_id}" }, "starred_url": { "string": "https://api.github.com/users/User.Name/starred{/owner}{/repo}" }, "subscriptions_url": { "string": "https://api.github.com/users/User.Name/subscriptions" }, "organizations_url": { "string": "https://api.github.com/users/User.Name/orgs" }, "repos_url": { "string": "https://api.github.com/users/User.Name/repos" }, "events_url": { "string": "https://api.github.com/users/User.Name/events{/privacy}" }, "received_events_url": { "string": "https://api.github.com/users/User.Name/received_events" }, "type": { "string": "User" }, "site_admin": { "boolean": false } } }, "id": { "string": "1234" } } ``` ### Template parameters The HTTP Sink connector forwards the message (record) value to the HTTP API. You can add parameters to have the connector construct a unique HTTP API URL containing the record key and topic name. For example, you enter `http://eshost1:9200/api/messages/${topic}/${key}` to have the HTTP API URL contain the topic name and record key. In addition to the `${topic}` and `${key}` parameters, you can also refer to fields from the Kafka record. As shown in the following example, you may want the connector to construct a URL that uses the Order ID and Customer ID. 
The following example shows the Avro format the producer uses to generate records in the Apache Kafka® topic `order`: ```json { "name": "MyClass", "type": "record", "namespace": "com.acme.avro", "fields": [ { "name": "customerId", "type": "int" }, { "name": "order", "type": { "name": "order", "type": "record", "fields": [ { "name": "id", "type": "int" }, { "name": "amount", "type": "int" } ] } } ] } ``` To send the Order ID and Customer ID, you would use the following URL in the HTTP API URL (`http.api.url`) configuration property: ```properties "http.api.url" : "http://eshost1:9200/api/messages/order/${order.id}/customer/${customerId}/" ``` Assuming the data in the Kafka topic contains the following values: ```json { "customerId": 123, "order": { "id": 1, "amount": 12345 } } ``` The connector constructs the following URL: ```bash http://eshost1:9200/api/messages/order/1/customer/123/ ``` ## Distributed This configuration is used typically along with [distributed mode](/platform/current/connect/concepts.html#distributed-workers). Write the following JSON to connector.json, configure all of the required values, and use the command below to post the configuration to one the distributed connect worker(s). ```bash { "name": "connector1", "config": { "connector.class": "io.confluent.connect.ibm.mq.IbmMQSourceConnector", "kafka.topic":"MyKafkaTopicName", "mq.hostname":"localhost", "mq.transport.type":"client", "mq.queue.manager":"QMA", "mq.channel":"SYSTEM.DEF.SVRCONN", "jms.destination.name":"testing", "confluent.license":"", "confluent.topic.bootstrap.servers":"localhost:9092" } } ``` Change the `confluent.topic.*` properties as required to suit your environment. If running on a single-node Kafka cluster you will need to include `"confluent.topic.replication.factor":"1"`. Leave the `confluent.license` property blank for a 30 day trial. See the [configuration options](source_connector_config.md#ibmmq-source-connector-license-config) for more details. Use curl to post the configuration to one of the Kafka Connect Workers. Change `http://localhost:8083/` the endpoint of one of your Kafka Connect worker(s). ```bash curl -s -X POST -H 'Content-Type: application/json' --data @connector.json http://localhost:8083/connectors ``` #### Exactly once delivery The connector supports exactly once semantics when the following conditions are met: - All the connect workers in the cluster have the [exactly.once.support](https://docs.confluent.io/platform/current/installation/configuration/connect/index.html#exactly-once-source-support) property set to `enabled`. For more information, see [exactly once source worker](https://kafka.apache.org/documentation.html#connect_exactlyoncesource) . - The connect worker is running in a distributed mode. Exactly once delivery cannot be supported in standalone mode. - The connect worker principal should have the required ACLs. For more information on the required ACLs, see [ACLs for exactly once source](https://kafka.apache.org/documentation.html#connect_exactlyonce) . - The connector is configured with the `state.topic.name` property. When these conditions are met, the connector processes each record exactly once, even through failures or restarts. It uses the state topic to track progress of the records it has processed, allowing it to resume from the last processed record in case of a failure. You must set the state topic only when you first create the connector. Changing the topic name after the connector creation can result in duplicates. 
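Putting the conditions above together, the following hedged sketch shows one way such a connector might be submitted to a distributed Connect cluster with the state topic set at creation time. It assumes the Python `requests` library, a worker already running in distributed mode with exactly-once source support enabled, and placeholder values for the MQ and topic settings (reusing the property names from the distributed example earlier in this section).

```python
# Hedged sketch: create an IBM MQ source connector with a state topic for
# exactly-once delivery. All connection values are placeholders; the Connect
# worker must run in distributed mode with exactly-once source support enabled.
import json
import requests

connector = {
    "name": "ibmmq-source-eos",
    "config": {
        "connector.class": "io.confluent.connect.ibm.mq.IbmMQSourceConnector",
        "tasks.max": "1",                              # only a single task/consumer is supported
        "kafka.topic": "MyKafkaTopicName",
        "mq.hostname": "localhost",
        "mq.transport.type": "client",
        "mq.queue.manager": "QMA",
        "mq.channel": "SYSTEM.DEF.SVRCONN",
        "jms.destination.name": "testing",
        "state.topic.name": "ibmmq-source-eos-state",  # set only when the connector is first created
        "confluent.topic.bootstrap.servers": "localhost:9092",
    },
}

# Post to a Connect worker; change the URL to one of your worker endpoints.
response = requests.post(
    "http://localhost:8083/connectors",
    headers={"Content-Type": "application/json"},
    data=json.dumps(connector),
)
response.raise_for_status()
print(response.json())
```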
For exactly-once semantics, the connector requires only one consumer of the MQ destination. Hence, it does not support more than one task or receiver thread. The connector uses a transactional producer for writing records to the Kafka topic, guaranteeing exactly-once delivery. Any Kafka consumer reading from the topic must also set the [isolation.level](https://docs.confluent.io/platform/current/installation/configuration/consumer-configs.html#isolation-level) property to `read_committed`.

#### NOTE

Change the `confluent.topic.bootstrap.servers` property to include your broker address(es), and change the `confluent.topic.replication.factor` to `3` for staging or production use.

Use curl to post a configuration to one of the Connect workers. Change `http://localhost:8083/` to the endpoint of one of your Connect workers.

```bash
curl -sS -X POST -H 'Content-Type: application/json' --data @config.json http://localhost:8083/connectors
```

Enter the following command to confirm that the connector is in a `RUNNING` state:

```bash
curl http://localhost:8083/connectors/MyJiraConnector/status
```

The output should resemble the example below:

```json
{
  "name": "MyJiraConnector",
  "connector": {
    "state": "RUNNING",
    "worker_id": "127.0.1.1:8083"
  },
  "tasks": [
    {
      "id": 0,
      "state": "RUNNING",
      "worker_id": "127.0.1.1:8083"
    }
  ],
  "type": "source"
}
```

Enter the following command to consume records written by the connector to the Kafka topic:

```bash
./bin/kafka-avro-console-consumer --bootstrap-server localhost:9092 --topic jira-topic-roles --from-beginning
```

The output should resemble the example below:

```json
{
  "type": "roles",
  "data": {
    "self": "/rest/api/2/role/10100",
    "name": "Project_Name",
    "id": 10111,
    "description": "A test role added to the project",
    "scope": null,
    "actors": {
      "array": [
        {
          "id": 10012,
          "displayName": "Jira_Actor_Name",
          "type": "user-role-actor",
          "actorUser": {
            "accountId": "101"
          }
        }
      ]
    }
  }
}
```

#### Override Default Configuration Properties

You can override the replication factor using `confluent.topic.replication.factor`. For example, when using a Kafka cluster as a destination with fewer than three brokers (for development and testing), set the `confluent.topic.replication.factor` property to `1`.

You can override producer-specific properties by using the `producer.override.*` prefix (for source connectors) and consumer-specific properties by using the `consumer.override.*` prefix (for sink connectors).

You can use the defaults or customize the other properties as well. For example, the `confluent.topic.client.id` property defaults to the name of the connector with a `-licensing` suffix. You can specify the configuration settings for brokers that require SSL or SASL for client connections using this prefix. You cannot override the cleanup policy of a topic because the topic always has a single partition and is compacted. Also, do not specify serializers and deserializers using this prefix; they are ignored if added.
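The following sketch shows what such overrides can look like in a connector configuration; the values are illustrative only, and producer overrides are subject to the worker's client configuration override policy:

```properties
# License topic settings for a development cluster with a single broker
confluent.topic.replication.factor=1
# Client settings for brokers that require SSL, passed through the confluent.topic. prefix
confluent.topic.security.protocol=SSL
# Producer override for a source connector (illustrative value)
producer.override.linger.ms=100
```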
## REST-based example

This configuration is typically used with [distributed workers](/platform/current/connect/concepts.html#distributed-workers). Write the following JSON to `connector.json`, configure all of the required values, and use the command below to post the configuration to one of the distributed Connect workers. For more information, see the Kafka Connect [REST API](/platform/current/connect/references/restapi.html).

**Connect Distributed REST example:**

```json
{
  "config": {
    "name": "MapRDBSinkConnector1",
    "connector.class": "io.confluent.connect.mapr.db.MapRDBSinkConnector",
    "tasks.max": "1",
    "mapr.table.map.": ""
  }
}
```

Use curl to post the configuration to one of the Kafka Connect workers. Change `http://localhost:8083/` to the endpoint of one of your Kafka Connect workers.

**Create a new connector:**

```bash
curl -s -X POST -H 'Content-Type: application/json' --data @connector.json http://localhost:8083/connectors
```

**Update an existing connector:**

```bash
curl -s -X PUT -H 'Content-Type: application/json' --data @connector.json http://localhost:8083/connectors/MapRDBSinkConnector1/config
```

### REST-based example

This configuration is typically used with [distributed workers](/platform/current/connect/concepts.html#distributed-workers). Write the following JSON to `connector.json`, configure all of the required values, and use the command below to post the configuration to one of the distributed Connect workers. For more information, see the Kafka Connect [REST API](/platform/current/connect/references/restapi.html).

**Connect Distributed REST example:**

```json
{
  "config": {
    "name": "MqttSinkConnector1",
    "connector.class": "io.confluent.connect.mqtt.MqttSinkConnector",
    "tasks.max": "1",
    "topics": "< Required Configuration >",
    "mqtt.server.uri": "< Required Configuration >"
  }
}
```

Use `curl` to post the configuration to one of the Kafka Connect workers. Change `http://localhost:8083/` to the endpoint of one of your Kafka Connect workers.

- **Create a new connector**:

```bash
curl -s -X POST -H 'Content-Type: application/json' --data @connector.json http://localhost:8083/connectors
```

- **Update an existing connector**:

```bash
curl -s -X PUT -H 'Content-Type: application/json' --data @connector.json http://localhost:8083/connectors/MqttSinkConnector1/config
```

### REST-based example

This configuration is typically used with [distributed workers](/platform/current/connect/concepts.html#distributed-workers). Write the following JSON to `connector.json`, configure all of the required values, and use the command below to post the configuration to one of the distributed Connect workers. For more information, see the Kafka Connect [REST API](/platform/current/connect/references/restapi.html).

**Connect Distributed REST example:**

```json
{
  "config": {
    "name": "MqttSourceConnector1",
    "connector.class": "io.confluent.connect.mqtt.MqttSourceConnector",
    "tasks.max": "1",
    "mqtt.server.uri": "< Required Configuration >",
    "mqtt.topics": "< Required Configuration >"
  }
}
```

Use curl to post the configuration to one of the Kafka Connect workers. Change `http://localhost:8083/` to the endpoint of one of your Kafka Connect workers.

- **Create a new connector**:

```bash
curl -s -X POST -H 'Content-Type: application/json' --data @connector.json http://localhost:8083/connectors
```

- **Update an existing connector**:

```bash
curl -s -X PUT -H 'Content-Type: application/json' --data @connector.json http://localhost:8083/connectors/MqttSourceConnector1/config
```
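After posting the configuration, you can optionally confirm that the connector started by querying its status through the same REST API (shown here for the connector name used above):

```bash
curl -s http://localhost:8083/connectors/MqttSourceConnector1/status
```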
### REST-based example

This configuration is typically used for [distributed workers](/platform/current/connect/concepts.html#distributed-workers). See the Kafka Connect [REST API](/platform/current/connect/references/restapi.html) for details.

1. Write the following JSON sample code to `connector.json` and set all of the required parameters.

```json
{
  "name": "NetezzaSinkConnector",
  "config": {
    "connector.class": "io.confluent.connect.netezza.NetezzaSinkConnector",
    "tasks.max": "1",
    "topics": "orders",
    "connection.host": "192.168.24.74",
    "connection.port": "5480",
    "connection.database": "SYSTEM",
    "connection.user": "admin",
    "connection.password": "password",
    "batch.size": "10000",
    "auto.create": "true"
  }
}
```

2. Use curl to post the configuration to one of the Kafka Connect workers. Change `http://localhost:8083/` to the endpoint of one of your Kafka Connect workers.

```bash
curl -s -X POST -H 'Content-Type: application/json' --data @connector.json http://localhost:8083/connectors
```

```bash
curl -s -X PUT -H 'Content-Type: application/json' --data @connector.json http://localhost:8083/connectors/NetezzaSinkConnector/config
```

3. To verify the data in Netezza, log in to Netezza and connect to the Netezza database with the following command:

```bash
[nz@netezza ~]$nzsql
```

4. Run the following SQL query to verify the records:

```bash
SYSTEM.ADMIN(ADMIN)=> select * from orders;
foo|50.0|100|999
```

### The connector fails with “Redo Log consumer failed to subscribe…”

This may occur if the connector can’t read from the redo log topic because security is configured on the Kafka cluster. When security is enabled on a Kafka cluster, you must configure the `redo.log.consumer.*` properties accordingly. For example, for an SSL-secured (non-Confluent Cloud) cluster, you can configure the following properties:

```json
"redo.log.consumer.security.protocol": "SSL",
"redo.log.consumer.ssl.truststore.location": "",
"redo.log.consumer.ssl.truststore.password": "",
"redo.log.consumer.ssl.keystore.location": "",
"redo.log.consumer.ssl.keystore.password": "",
"redo.log.consumer.ssl.key.password": "",
"redo.log.consumer.ssl.truststore.type": "",
"redo.log.consumer.ssl.keystore.type": "",
```

If you configure the connector to send data to a Confluent Cloud cluster, you can configure the following properties:

```json
"redo.log.consumer.bootstrap.servers": "XXXXXXXX",
"redo.log.consumer.security.protocol": "SASL_SSL",
"redo.log.consumer.ssl.endpoint.identification.algorithm": "https",
"redo.log.consumer.sasl.mechanism": "PLAIN",
"redo.log.consumer.sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username='XXXXXXXXX' password='XXXXXXXXXX';"
```

### REST-based example

This configuration is typically used with [distributed workers](/platform/current/connect/concepts.html#distributed-workers). Write the following JSON to `kafka-connect-redis.json`, configure all of the required values, and use the command below to post the configuration to one of the distributed Connect workers. For more information, see the Kafka Connect [REST API](/platform/current/connect/references/restapi.html).

```json
{
  "name": "kafka-connect-redis",
  "config": {
    "name": "kafka-connect-redis",
    "connector.class": "com.github.jcustenborder.kafka.connect.redis.RedisSinkConnector",
    "topics": "users",
    "tasks.max": "1",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "org.apache.kafka.connect.storage.StringConverter"
  }
}
```

Use curl to post the configuration to one of the Kafka Connect workers. Change `http://localhost:8083/` to the endpoint of one of your Kafka Connect workers.
```bash
curl -s -X POST -H 'Content-Type: application/json' --data @kafka-connect-redis.json http://localhost:8083/connectors
```

## Write JSON message values into ServiceNow

The example settings file is shown below.

1. Create a `servicenow-sink-json.json` file with the following contents.

#### NOTE

All user-defined tables in ServiceNow start with `u_`.

```bash
// substitute <> with your config
{
  "name": "ServiceNowSinkJSONConnector",
  "config": {
    "connector.class": "io.confluent.connect.servicenow.ServiceNowSinkConnector",
    "topics": "test_table_json",
    "servicenow.url": "https://.service-now.com/",
    "tasks.max": "1",
    "servicenow.table": "u_test_table",
    "servicenow.user": "",
    "servicenow.password": "",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "true",
    "confluent.topic.bootstrap.servers": "localhost:9092",
    "confluent.license": "", // leave it empty for evaluation license
    "confluent.topic.replication.factor": "1",
    "reporter.bootstrap.servers": "localhost:9092",
    "reporter.error.topic.name": "test-error",
    "reporter.error.topic.replication.factor": 1,
    "reporter.error.topic.key.format": "string",
    "reporter.error.topic.value.format": "string",
    "reporter.result.topic.name": "test-result",
    "reporter.result.topic.key.format": "string",
    "reporter.result.topic.value.format": "string",
    "reporter.result.topic.replication.factor": 1
  }
}
```

#### NOTE

For details about using this connector with Kafka Connect Reporter, see [Connect Reporter](/kafka-connectors/self-managed/userguide.html#userguide-connect-reporter).

2. Load the ServiceNow Sink connector by posting the configuration to the Connect REST server.

```bash
confluent local load ServiceNowSinkJSONConnector --config servicenow-sink-json.json
```

3. Confirm that the connector is in a `RUNNING` state.

```bash
confluent local status ServiceNowSinkJSONConnector
```

4. To produce some records into the `test_table_json` topic, first start a Kafka producer.

#### NOTE

All user-defined columns in ServiceNow start with `u_`.

```bash
kafka-console-producer \
  --broker-list localhost:9092 \
  --topic test_table_json
```

5. The console producer is now waiting for input, so you can go ahead and insert some records into the topic.

```json
{"schema": {"type": "struct", "fields": [{"type": "string", "optional": false, "field": "u_name"},{"type": "float", "optional": false, "field": "u_price"}, {"type": "int64","optional":false,"field": "u_quantity"}],"optional": false,"name": "products"}, "payload": {"u_name": "laptop", "u_price": 999.50, "u_quantity": 3}}
{"schema": {"type": "struct", "fields": [{"type": "string", "optional": false, "field": "u_name"},{"type": "float", "optional": false, "field": "u_price"}, {"type": "int64","optional":false,"field": "u_quantity"}],"optional": false,"name": "products"}, "payload": {"u_name": "pencil", "u_price": 0.99, "u_quantity": 10}}
{"schema": {"type": "struct", "fields": [{"type": "string", "optional": false, "field": "u_name"},{"type": "float", "optional": false, "field": "u_price"}, {"type": "int64","optional":false,"field": "u_quantity"}],"optional": false,"name": "products"}, "payload": {"u_name": "pen", "u_price": 1.99, "u_quantity": 5}}
```
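As an optional check that is not part of the original steps, you can consume the reporter result topic configured above to see the connector's write results; this assumes the reporter topics exist and use the string formats shown in the example configuration:

```bash
kafka-console-consumer \
  --bootstrap-server localhost:9092 \
  --topic test-result \
  --from-beginning
```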
## Override Default Configuration Properties

You can override the replication factor using `confluent.topic.replication.factor`. For example, when using a Kafka cluster as a destination with fewer than three brokers (for development and testing), set the `confluent.topic.replication.factor` property to `1`.

You can override producer-specific properties by using the `producer.override.*` prefix (for source connectors) and consumer-specific properties by using the `consumer.override.*` prefix (for sink connectors).

You can use the defaults or customize the other properties as well. For example, the `confluent.topic.client.id` property defaults to the name of the connector with a `-licensing` suffix. You can specify the configuration settings for brokers that require SSL or SASL for client connections using this prefix. You cannot override the cleanup policy of a topic because the topic always has a single partition and is compacted. Also, do not specify serializers and deserializers using this prefix; they are ignored if added.

### CSV with Schema Example

This example reads CSV files and writes them to Kafka. It parses them using the schema specified in `key.schema` and `value.schema`.

1. Create a data directory and generate test data.

```bash
curl "https://api.mockaroo.com/api/58605010?count=1000&key=25fd9c80" > "data/csv-spooldir-source.csv"
```

2. Create a `spooldir.properties` file with the following contents:

```properties
name=CsvSchemaSpoolDir
tasks.max=1
connector.class=com.github.jcustenborder.kafka.connect.spooldir.SpoolDirCsvSourceConnector
input.path=/path/to/data
input.file.pattern=csv-spooldir-source.csv
error.path=/path/to/error
finished.path=/path/to/finished
halt.on.error=false
topic=spooldir-testing-topic
csv.first.row.as.header=true
key.schema={\n \"name\" : \"com.example.users.UserKey\",\n \"type\" : \"STRUCT\",\n \"isOptional\" : false,\n \"fieldSchemas\" : {\n \"id\" : {\n \"type\" : \"INT64\",\n \"isOptional\" : false\n }\n }\n}
value.schema={\n \"name\" : \"com.example.users.User\",\n \"type\" : \"STRUCT\",\n \"isOptional\" : false,\n \"fieldSchemas\" : {\n \"id\" : {\n \"type\" : \"INT64\",\n \"isOptional\" : false\n },\n \"first_name\" : {\n \"type\" : \"STRING\",\n \"isOptional\" : true\n },\n \"last_name\" : {\n \"type\" : \"STRING\",\n \"isOptional\" : true\n },\n \"email\" : {\n \"type\" : \"STRING\",\n \"isOptional\" : true\n },\n \"gender\" : {\n \"type\" : \"STRING\",\n \"isOptional\" : true\n },\n \"ip_address\" : {\n \"type\" : \"STRING\",\n \"isOptional\" : true\n },\n \"last_login\" : {\n \"type\" : \"STRING\",\n \"isOptional\" : true\n },\n \"account_balance\" : {\n \"name\" : \"org.apache.kafka.connect.data.Decimal\",\n \"type\" : \"BYTES\",\n \"version\" : 1,\n \"parameters\" : {\n \"scale\" : \"2\"\n },\n \"isOptional\" : true\n },\n \"country\" : {\n \"type\" : \"STRING\",\n \"isOptional\" : true\n },\n \"favorite_color\" : {\n \"type\" : \"STRING\",\n \"isOptional\" : true\n }\n }\n}
```

3. Load the SpoolDir CSV Source connector.

```bash
confluent local load spooldir --config spooldir.properties
```

#### IMPORTANT

Don’t use the [Confluent CLI](https://docs.confluent.io/confluent-cli/current/index.html) in production environments.

4. Validate that messages are sent to Kafka serialized with Avro.

```bash
kafka-avro-console-consumer --topic spooldir-testing-topic --from-beginning --bootstrap-server localhost:9092
```
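For readability, the escaped `key.schema` value in the properties file above corresponds to the following JSON; `value.schema` follows the same pattern, with one entry per CSV column:

```json
{
  "name": "com.example.users.UserKey",
  "type": "STRUCT",
  "isOptional": false,
  "fieldSchemas": {
    "id": {
      "type": "INT64",
      "isOptional": false
    }
  }
}
```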
## JSON Source Connector Example

This example follows the same steps as the Quick Start. Review the Quick Start for help running the Confluent Platform and installing the Spool Dir connectors.

1. Generate a JSON dataset using the command below:

```bash
curl "https://api.mockaroo.com/api/17c84440?count=500&key=25fd9c80" > "json-spooldir-source.json"
```

2. Create a `spooldir.properties` file with the following contents:

```properties
name=JsonSpoolDir
tasks.max=1
connector.class=com.github.jcustenborder.kafka.connect.spooldir.SpoolDirJsonSourceConnector
input.path=/path/to/data
input.file.pattern=json-spooldir-source.json
error.path=/path/to/error
finished.path=/path/to/finished
halt.on.error=false
topic=spooldir-json-topic
```

3. Load the SpoolDir JSON Source connector using the Confluent CLI [confluent local services connect connector load](https://docs.confluent.io/confluent-cli/current/command-reference/local/services/connect/connector/confluent_local_services_connect_connector_load.html) command.

```bash
confluent local load spooldir --config spooldir.properties
```

#### IMPORTANT

Don’t use the [confluent local](https://docs.confluent.io/confluent-cli/current/command-reference/local/index.html) commands in production environments.

### Passwordless OAuth/OIDC authentication with client assertion

Starting with version 8.0, Confluent Platform supports OAuth client assertion, a secure, passwordless approach to credential management. It uses asymmetric encryption-based authentication, extending Confluent Platform OAuth, and allows you to:

* Avoid deploying usernames and passwords while securing Confluent Platform.
* Streamline and automate periodic client credential rotation without manual intervention for client applications.

A client assertion is a JSON Web Token (JWT) that carries identity and security information and is presented as proof of the client’s identity. The following client assertion flows are supported in CFK:

* JSON Web Token (JWT) assertion retrieval from file flow. This flow is not recommended for production use cases; use the local client assertion flow in production instead.
* Local client assertion flow.

In CFK 3.0, OAuth client assertion is supported for the following resources:

* Day 1 components: Kafka, KRaft, MDS, Schema Registry
* Day 2 application resources: KafkaTopic, Kafka REST Class, ConfluentRoleBinding, Schema, SchemaExporter, ClusterLinking

## Customize Confluent Platform pods with Pod Overlay

Confluent for Kubernetes (CFK) supports a subset of the Kubernetes PodTemplateSpec in the CFK API (`spec.podTemplate` in the component custom resource), where you configure the StatefulSet PodTemplate for Confluent Platform components. To set and use additional Kubernetes features that are not supported by the CFK API, you can use the Pod Overlay feature.

Example use cases for the Pod Overlay feature include:

* Deploying a Confluent Platform cluster with a custom init container that runs alongside the CFK init container. In this case, the custom init container runs before the CFK init container.
* Using a newly introduced Kubernetes feature that has not yet been added to the CFK API.

Make sure that you do not have conflicting values between what’s set in the CFK podTemplate API and in Pod Overlay. For example, if you specify `podSecurityContext` in `kafka.spec.podTemplate`, you cannot use Pod Overlay to specify different values in `spec.template.spec.securityContext`, as illustrated in the sketch below.
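The following is a hypothetical sketch of the kind of conflict to avoid; the `runAsUser` values are illustrative only:

```yaml
# In the Kafka component CR (CFK podTemplate API) -- illustrative values
spec:
  podTemplate:
    podSecurityContext:
      runAsUser: 1001
---
# In the Pod Overlay template -- do not also set a conflicting security context here
spec:
  template:
    spec:
      securityContext:
        runAsUser: 2000   # conflicts with the podSecurityContext above; avoid this
```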
To use Pod Overlay:

1. Create a template file (``) with the settings you want to add:

```yaml
spec:
  template:
```

The template file must start with `spec: template:` and must follow the [Kubernetes StatefulSetSpec API](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.26/#statefulsetspec-v1-apps). You can configure fields only inside `spec.template`. Fields specified outside of `spec.template` are considered invalid.

The following example is for a custom init container:

```yaml
spec:
  template:
    spec:
      initContainers:
        - name: busybox
          image: busybox:1.28
          command: ["echo", "I am a custom init-container"]
          imagePullPolicy: IfNotPresent
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
```

Note that when `hostNetwork` is set to `true`, `dnsPolicy` must be set to `ClusterFirstWithHostNet`.

2. Create a ConfigMap (``) using the file created in the previous step (``). You must use `pod-template.yaml` as the key with the `--from-file` option.

```bash
kubectl create configmap --from-file=pod-template.yaml= -n
```

3. Add the `platform.confluent.io/pod-overlay-configmap-name` annotation on the Confluent Platform component resource CR. For example:

```yaml
kind: Kafka
metadata:
  name: kafka
  namespace: operator
  annotations:
    platform.confluent.io/pod-overlay-configmap-name:
```

Following the Kubernetes convention, a ConfigMap can only be referenced by pods residing in the same namespace, so CFK looks for `` within the same namespace as the component CR object.

For configuration examples, see [the tutorial for Pod Overlay](https://github.com/confluentinc/confluent-kubernetes-examples/tree/master/advanced-configuration/pod-overlay).
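As an optional verification that is not part of the original steps, you can inspect the generated StatefulSet to confirm the overlay fields were merged; this assumes the StatefulSet is named after the component CR (`kafka` in the `operator` namespace in the example above):

```bash
# Look for the custom init container in the StatefulSet pod template
kubectl get statefulset kafka -n operator -o yaml | grep -A 5 initContainers
```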