.. _connect_userguide:

Getting Started with |kconnect-long|
------------------------------------

.. meta::
   
   :title: Getting Started with Kafka Connect
   :keywords: Connect, Kafka Connect, Kafka connectors, Connect worker, connectors
   :description: This document provides instructions for getting started with Kafka Connect.

This document provides information about how to get started with
|kconnect-long|. You should read and understand :ref:`Kafka Connect Concepts
<connect_concepts>` before getting started. The following topics are covered in this document:

* :ref:`connect_userguide-planning-install`
* :ref:`connect_installing_plugins`
* :ref:`connect_userguide_standalone_config` and :ref:`connect_configuring_workers`
* :ref:`connect_configuring_converters`
* :ref:`connect-override-producer-consumer`
* :ref:`Connect Reporter <userguide-connect-reporter>`
* :ref:`connect-ops-config-provider`
* :ref:`Next Steps (additional references and demo links) <connect_next-steps>`

.. _connect_userguide-planning-install:

-------------------------
Deployment Considerations
-------------------------

|kconnect-long| has only one prerequisite for getting started: a set of |ak|
brokers. These |ak| brokers can be earlier broker versions or the latest
version. See :ref:`cross-component-compatibility` for details.

Even though there is only one prerequisite, there are a few deployment options
to consider beforehand. Understanding and acting on these deployment options
ensures your |kconnect-long| deployment will scale and support the long-term
needs of your data pipeline.

|sr-long|
~~~~~~~~~

Although :ref:`Schema Registry <schemaregistry_kafka_connect>` is not a required
service for |kconnect-long|, it enables you to easily use Avro, Protobuf, and
JSON Schema as common data formats for the |ak| records that connectors read
from and write to. This keeps the need to write custom code at a minimum and
standardizes your data in a flexible format. You also get the added benefit of
schema evolution and enforced compatibility rules. For additional information,
see :ref:`schemaregistry_kafka_connect` and :ref:`connect_configuring_converters`.

.. _connect_standalone_v_distributed:

Standalone vs. Distributed Mode
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Connectors and tasks are logical units of work and run as a process. This
process is called a **worker** in |kconnect-long|. There are two modes for
running workers: *standalone mode* and *distributed mode*. You should identify
which mode works best for your environment before getting started.

**Standalone mode** is useful for development and testing |kconnect-long| on a local machine. It can also be used for environments that typically use single agents (for example, sending web server logs to |ak|).

**Distributed mode** runs |kconnect| workers on multiple machines (nodes). These form a |kconnect| cluster. |kconnect-long| distributes running connectors across the cluster. You can add more nodes or remove nodes as your needs evolve. 

Distributed mode is also more fault tolerant. If a node unexpectedly leaves the
cluster, |kconnect-long| automatically distributes the work of that node to
other nodes in the cluster. And, because |kconnect-long| stores connector
configurations, status, and offset information inside the |ak| cluster where it
is safely replicated, losing the node where a |kconnect| worker runs does not
result in any lost data.

.. important::
   
   Distributed mode is recommended for production environments because of scalability, high availability, and management benefits.

Operating Environment
~~~~~~~~~~~~~~~~~~~~~

|kconnect| workers operate well in containers and in managed environments, such
as Kubernetes, Apache Mesos, Docker Swarm, or YARN. The distributed worker
stores all state in |ak|, which makes the cluster easier to manage. By design,
|kconnect-long| does not automatically handle restarting or scaling workers.
This means your existing cluster management solution can continue to be used
transparently. Note that the standalone worker state is stored on the local file system.

.. seealso::
   
   * See :ref:`cpdocker_intro` for more information about using Docker.
   * See :ref:`operator-about-intro` for information about deploying and managing |cp| in a Kubernetes environment.

|kconnect-long| workers are JVM processes that can run on shared machines with
sufficient resources. Hardware requirements for |kconnect| workers are similar
to those of standard Java producers and consumers. Resource requirements mainly
depend on the types of connectors operated by the workers. More memory is
required for environments where large messages are sent. More memory is also required for environments where large numbers of messages get buffered before being written in aggregate form to an external system. Using compression continuously requires a more powerful CPU. 

.. tip::
   
   If you have multiple workers running concurrently on a single machine, make
   sure you know the resource limits (CPU and memory). Start with the default
   heap size setting and :ref:`monitor internal metrics <kafka_monitoring>` and
   the system. Verify that the CPU, memory, and network (10GbE or greater) are
   sufficient for the load.

.. _connect_installing_plugins:

-----------------------------
Installing |kconnect| Plugins
-----------------------------

|kconnect-long| is designed to be extensible so developers can create custom connectors, transforms, or converters, and users can install and run them.

A |kconnect-long| plugin is a set of JAR files containing the implementation of one or more connectors, transforms, or converters. |kconnect| isolates plugins from one another so that libraries in one plugin are not affected by the libraries in any other plugin. This is very important when mixing and matching connectors from multiple providers.

.. caution::

   It is common to have many plugins installed in a |kconnect| deployment. Make sure to only have **one version** of each plugin installed.

The |cp| comes bundled with several commonly used connectors, transforms, and converters. All of these can be used without having to first install the corresponding plugins. Bundled connectors include the following:

* :ref:`JDBC Source Connector <jdbc-source-configs-overview>`: reads tables from common DBMSes and writes them as records to |ak| topics.
* :ref:`JDBC Sink Connector <connect_jdbc-sink>`: consumes records from |ak| topics and inserts, updates, and deletes rows in DBMS tables.
* :ref:`Elasticsearch Sink Connector <elasticsearch-overview>`: consumes records from |ak| topics and writes them as documents to Elasticsearch.
* :ref:`Amazon S3 Sink Connector <connect_s3>`: consumes records from |ak| topics and writes them as aggregate container files to an S3 bucket.

For a full list of supported connectors, see :ref:`connect_bundled_connectors`.

.. note::
   
   Make sure to check out `Confluent Hub <https://www.confluent.io/hub/>`__. You can browse the large ecosystem of connectors, transforms, and converters to find the components that suit your needs and easily install them into your local |cp| environment. See :ref:`confluent_hub_client` for |c-hub| Client installation instructions.

.. _connect_plugins:

A |kconnect-long| plugin can be:

* a **directory** on the file system that contains all required JAR files and third-party dependencies for the plugin. This is most common and is preferred.
* a single **uber JAR** containing all of the class files for the plugin and its third-party dependencies.

.. important::
   
   A plugin should never contain any libraries provided by the |kconnect-long|
   runtime.
   
|kconnect-long| finds the plugins using a *plugin path* defined as a comma-separated list of directory paths in the ``plugin.path`` :ref:`worker configuration property <connect_configuring_workers>`. The following shows an example ``plugin.path`` worker configuration property:

::
   
   plugin.path=/usr/local/share/kafka/plugins

To install a plugin, place the plugin directory or uber JAR (or a symbolic link
that resolves to one of these) in a directory already listed in the plugin path.
Or, you can update the plugin path by adding the absolute path of the directory
containing the plugin. Using the plugin path example above, you would create a
``/usr/local/share/kafka/plugins`` directory **on each machine** running
|kconnect| and then place the plugin directories (or uber JARs) there.
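
Because ``plugin.path`` is a comma-separated list, a worker can also search more
than one directory. The following is a minimal sketch of that form;
``/opt/connectors`` is a hypothetical second directory used only for
illustration:

.. sourcecode:: properties

   # Each entry is a directory that contains plugin directories or uber JARs.
   # /opt/connectors is an assumed path, not a default location.
   plugin.path=/usr/local/share/kafka/plugins,/opt/connectors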

When you start your |kconnect| workers, each worker discovers all connectors,
transforms, and converter plugins found inside the directories on the plugin
path. When you use a connector, transform, or converter, the |kconnect| worker
loads the classes from the respective plugin first, followed by the
|kconnect-long| runtime and Java libraries. |kconnect| explicitly avoids all of
the libraries in other plugins. This prevents conflicts and makes it very easy
to add and use connectors and transforms developed by different providers.

Earlier versions of |kconnect-long| required a different approach to installing
connectors, transforms, and converters. All the scripts for running |kconnect|
recognized the ``CLASSPATH`` environment variable. You would export this
variable to define the list of paths to the connector JAR files. An example of
the older ``CLASSPATH`` export variable mechanism is shown below:

.. codewithvars:: bash

     export CLASSPATH=/path/to/my/connectors/*
     bin/connect-standalone standalone.properties new-custom-connector.properties


.. caution::
   
   Exporting ``CLASSPATH`` is not recommended. Using this mechanism to create a
   path to plugins can result in library conflicts that can cause
   |kconnect-long| and connectors to fail. Use the ``plugin.path`` configuration
   property which properly isolates each plugin from other plugins and
   libraries.
   
.. include:: includes/classpath-v-pluginpath.rst

.. _connect_userguide_standalone_config:

---------------
Running Workers
---------------

The following sections provide information about running workers in standalone mode or distributed mode.

Standalone Mode
~~~~~~~~~~~~~~~

Standalone mode is typically used for development and testing, or for
lightweight, single-agent environments (for example, sending web server logs to
|ak|). The following shows an example command that launches a worker in
standalone mode:

.. codewithvars:: bash

      bin/connect-standalone worker.properties connector1.properties [connector2.properties connector3.properties ...]

The first parameter (``worker.properties``) is the :ref:`worker configuration
properties <connect_configuring_workers>` file. Note that ``worker.properties``
is an example file name. You can use any valid file name for your worker
configuration file. This file gives you control over settings such as the |ak|
cluster to use and serialization format. For an example configuration file that
uses `Avro <http://avro.apache.org/docs/current/>`__ and :ref:`Schema Registry
<schemaregistry_kafka_connect>` in a standalone mode, open the file located at
``etc/schema-registry/connect-avro-standalone.properties``. You can copy and
modify this file for use as your standalone worker properties file.

The second parameter (``connector1.properties``) is the 
:ref:`connector configuration properties <connect_managing_standalone_mode>` file. All connectors have configuration properties that are loaded with the worker. As shown in the example, you can launch multiple connectors using this command.
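
As a rough sketch of what such a file can contain, the following connector
properties use the ``FileStreamSource`` example connector from |ak|. The
connector name, file path, and topic name are placeholder values. Depending on
your version, the FileStream connector plugin may first need to be made
available on the ``plugin.path``:

.. sourcecode:: properties

   # connector1.properties (example file name)
   name=local-file-source
   connector.class=org.apache.kafka.connect.file.FileStreamSourceConnector
   tasks.max=1
   # Placeholder input file and destination topic
   file=/tmp/test.txt
   topic=connect-test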

If you run multiple standalone workers on the same host machine, the following two configuration properties must be unique for each worker:

* ``offset.storage.file.filename``: the storage file name for connector offsets. This file is stored on the local filesystem in standalone mode. Using the same file name for two workers will cause offset data to be deleted or overwritten with different values.
* ``rest.port``: the port the REST interface listens on for HTTP requests. This must be unique for each worker running on one host machine.
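
The two properties above could be set as in the following sketch for two
standalone workers on one host. The file names, offset file paths, and the
second port number are illustrative assumptions; any values work as long as
they differ between the workers:

.. sourcecode:: properties

   # worker-a.properties (hypothetical file name)
   offset.storage.file.filename=/tmp/connect-a.offsets
   rest.port=8083

   # worker-b.properties (hypothetical file name)
   offset.storage.file.filename=/tmp/connect-b.offsets
   rest.port=8084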

.. _connect_userguide_distributed_config:

Distributed Mode
~~~~~~~~~~~~~~~~

|kconnect| stores connector and task configurations, offsets, and status in several |ak| topics. These are referred to as |kconnect-long| internal topics. It is important that these internal topics have a high replication factor, a compaction cleanup policy, and an appropriate number of partitions.

|kconnect-long| can automatically create the internal topics when it starts up,
using the |kconnect| distributed worker configuration properties to specify the
topic names, replication factor, and number of partitions for these topics.
|kconnect| verifies that the properties meet the requirements and creates all
topics with compaction cleanup policy.

Allowing |kconnect| to automatically create these internal topics is
recommended. However, you may want to manually create the topics. Two examples
of when you would manually create these topics are provided below:

* For security purposes, the broker may be configured to not allow clients like |kconnect| to create |ak| topics.
* You may require other advanced topic-specific settings that are not automatically set by |kconnect| or that are different from the auto-created settings.

The following example commands show how to manually create compacted and
replicated |ak| topics before starting |kconnect|. Make sure to adhere to the
:ref:`distributed worker <connect_userguide_dist_worker_config>` guidelines when entering parameters.


::

   # config.storage.topic=connect-configs
   bin/kafka-topics --create --bootstrap-server localhost:9092 --topic connect-configs --replication-factor 3 --partitions 1 --config cleanup.policy=compact

::

   # offset.storage.topic=connect-offsets
   bin/kafka-topics --create --bootstrap-server localhost:9092 --topic connect-offsets --replication-factor 3 --partitions 50 --config cleanup.policy=compact

::

   # status.storage.topic=connect-status
   bin/kafka-topics --create --bootstrap-server localhost:9092 --topic connect-status --replication-factor 3 --partitions 10 --config cleanup.policy=compact
  
.. note::
   
   All workers in a |kconnect| cluster use the same internal topics. Workers in
   a different cluster must use different internal topics. See
   :ref:`connect_userguide_dist_worker_config` for details.

Distributed mode does not have any additional command-line parameters other than
loading the worker configuration file. New workers will either start a new group
or join an existing one with a matching ``group.id``. Workers then coordinate
with the consumer groups to distribute the work to be done. See
:ref:`connect_userguide_dist_worker_config` for details about how new workers
get added.

The following shows an example command that launches a worker in distributed mode:

.. codewithvars:: bash

     bin/connect-distributed worker.properties
     
For an example distributed mode configuration file that uses `Avro <http://avro.apache.org/docs/current/>`__ and :ref:`Schema Registry
<schemaregistry_intro>`, open
``etc/schema-registry/connect-avro-distributed.properties``. You can make a copy
of this file, modify it, and use it as the new ``worker.properties`` file. Note that
``worker.properties`` is an example file name. You can use any valid file name
for your properties file.

In standalone mode, connector configuration property files are added as
command-line parameters. However, in distributed mode, connectors are deployed
and managed using a REST API request. To create connectors, you start the worker
and then make a REST request to create the connector. REST request examples are
provided in many :ref:`supported connector <connect_bundled_connectors>`
documents. For instance, see the :ref:`Azure Blob Storage Source connector
REST-based example <azure_blob_storage_source_connector-rest-example>` for one
example.

.. note::
   
   If you run multiple distributed workers on one host machine for development
   and testing, the ``rest.port`` configuration property must be unique for each
   worker. This is the port the REST interface listens on for HTTP requests.

.. _connect_configuring_workers:

-------------------------------
Worker Configuration Properties
-------------------------------

Regardless of the mode used, |kconnect-long| workers are configured by passing a
worker configuration properties file as the first parameter. For example:

.. codewithvars:: bash

     bin/connect-distributed worker.properties
   
Sample worker configuration properties files are included with |cp| to help you get started. The locations of the Avro sample files are listed below:

* ``etc/schema-registry/connect-avro-distributed.properties``
* ``etc/schema-registry/connect-avro-standalone.properties``

Use one of these files as a starting point. These files contain the necessary
configuration properties to use the Avro converters that integrate with |sr|.
They are configured to work well with |ak| and |sr| services running locally.
They do not require running more than a single broker, making it easy for you to
test |kconnect-long| locally. 

The example configuration files can also be modified for production deployments
by using the correct hostnames for |ak| and |sr| and acceptable (or default)
values for the internal topic replication factor.

.. _common-worker-configs-ug:

Common Configuration Properties
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The following are several common worker configuration properties you need to get
started. Many more configuration options are provided in :ref:`Kafka Connect
Worker Configs <connect_allconfigs>`.

``bootstrap.servers``
  A list of host/port pairs to use for establishing the initial connection to the |ak| cluster. The client uses all servers regardless of which servers are specified for bootstrapping. The list only impacts the initial hosts used to discover the full set of servers. This list should be in the form ``host1:port1,host2:port2,...``. Since these servers are just used for the initial connection to discover the full cluster membership (which may change dynamically), this list need not contain the full set of servers (you may want more than one though, in case a server is down).

  * Type: list
  * Default: [localhost:9092]
  * Importance: high

``key.converter``
  Converter class for key |kconnect| data. This controls the format of the data that will be written to |ak| for source connectors or read from |ak| for sink connectors. Popular formats include Avro and JSON.

  * Type: class
  * Default:
  * Importance: high

``value.converter``
  Converter class for value |kconnect| data. This controls the format of the data that will be written to |ak| for source connectors or read from |ak| for sink connectors. Popular formats include Avro and JSON.

  * Type: class
  * Default:
  * Importance: high

``rest.host.name``
  Hostname for the REST API. If this is set, the REST API binds only to this interface.

  * Type: string
  * Importance: low

``rest.port``
  Port for the REST API to listen on.

  * Type: int
  * Default: 8083
  * Importance: low

``plugin.path``
  The comma-separated list of paths to directories that contain :ref:`Kafka Connect plugins <connect_installing_plugins>`.

  * Type: string
  * Default:
  * Importance: low
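
Taken together, the common properties above might appear in a worker
configuration file as in the following sketch. The broker host names are
hypothetical, and the converter choices are only one possible combination, not a
recommendation:

.. sourcecode:: properties

   # broker1 and broker2 are placeholder host names
   bootstrap.servers=broker1:9092,broker2:9092
   # String keys and schemaless JSON values, chosen here for illustration
   key.converter=org.apache.kafka.connect.storage.StringConverter
   value.converter=org.apache.kafka.connect.json.JsonConverter
   value.converter.schemas.enable=false
   # REST API port (8083 is the documented default)
   rest.port=8083
   plugin.path=/usr/local/share/kafka/plugins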

.. _connect_userguide_dist_worker_config:

Distributed Configuration Properties
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:ref:`Distributed Workers <connect_userguide_distributed_config>` that are
configured with matching ``group.id`` values automatically discover each other
and form a |kconnect-long| cluster. All Workers in the cluster use the same
three internal |ak| topics to share connector configurations, offset data, and
status updates. For this reason all distributed worker configurations **in the
same Connect cluster** must have matching ``config.storage.topic``,
``offset.storage.topic``, and ``status.storage.topic`` properties.
   
In addition to the three required topic names, the distributed worker
configuration **should** have identical values for the properties listed below.
This ensures that any worker in the cluster will create missing topics with the
desired property values. Note that these configuration properties have
:ref:`practical default values <connect-dist-work-config>`.

* ``config.storage.replication.factor``
* ``offset.storage.replication.factor``
* ``offset.storage.partitions``
* ``status.storage.replication.factor``
* ``status.storage.partitions``
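
For illustration, a distributed worker configuration that sets these properties
explicitly, using the default values documented later in this section, might
look like the following sketch. The topic names mirror the manual topic creation
example earlier in this document:

.. sourcecode:: properties

   group.id=connect-cluster
   config.storage.topic=connect-configs
   offset.storage.topic=connect-offsets
   status.storage.topic=connect-status
   # Values below are the documented defaults; use the same values on every worker
   config.storage.replication.factor=3
   offset.storage.replication.factor=3
   offset.storage.partitions=25
   status.storage.replication.factor=3
   status.storage.partitions=5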

As each distributed worker starts up, it uses the internal |ak| topics if they
already exist. If not, the worker attempts to create the topics using the worker
configuration properties. This gives you the option of manually creating these
topics before starting |kconnect-long|, if you require topic-specific settings
or when |kconnect-long| does not have the necessary :ref:`privileges to create
the topics <connect_security>`. If you do create the topics manually, make sure
to follow the guidelines provided in the list of :ref:`configuration properties
<connect-distributed-worker-properties>`.

If you need to create a distributed worker that is independent of an existing |kconnect| cluster, you must create new worker configuration properties. The following configuration properties must be different from the worker configurations used in an existing cluster:

* ``group.id``
* ``config.storage.topic``
* ``offset.storage.topic``
* ``status.storage.topic``
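
As a sketch, a worker for a second, independent cluster could use values such as
the following. The ``-b`` suffixes are hypothetical names chosen for this
example; what matters is that all four values differ from those of the existing
cluster:

.. sourcecode:: properties

   # Worker configuration for a second, separate Connect cluster
   group.id=connect-cluster-b
   config.storage.topic=connect-configs-b
   offset.storage.topic=connect-offsets-b
   status.storage.topic=connect-status-b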

.. important::
   
   * |kconnect| clusters cannot share Group IDs or internal topics. Simply changing a ``group.id`` will not create a new worker separate from an existing |kconnect| cluster. The new ``group.id`` must also have unique internal topics associated with it. This requires setting unique ``config.storage.topic``, ``offset.storage.topic``, and ``status.storage.topic`` configuration properties for the new ``group.id``.
   
   * You must also use different connector names than those used in the existing |kconnect| cluster, because a consumer group is created based on the connector name. Each connector in a |kconnect| cluster shares the same consumer group.

The following lists and defines the distributed worker properties:

``group.id``
  A unique string that identifies the |kconnect| cluster group this worker belongs to.

  * Type: string
  * Default: connect-cluster
  * Importance: high
  
  .. _connect-distributed-worker-properties:

``config.storage.topic``
  The name of the topic where connector and task configuration data are stored. This *must* be the same for all workers with the same ``group.id``. At startup, |kconnect-long| attempts to automatically create this topic with a single partition and a compacted cleanup policy to avoid losing data. It uses the existing topic if present. If you choose to create this topic manually, **always** create it as a compacted topic with a single partition and a high replication factor (3x or more).

  * Type: string
  * Default: ""
  * Importance: high

``config.storage.replication.factor``
  The replication factor used when |kconnect| creates the topic used to store connector and task configuration data. This should **always**
  be at least 3 for a production system, but cannot be larger than the number of |ak| brokers in the cluster.

  * Type: short
  * Default: 3
  * Importance: low

``offset.storage.topic``
  The name of the topic where connector offsets are stored. This *must* be the same for all workers with the same ``group.id``.
  At startup, |kconnect-long| attempts to automatically create this topic with multiple partitions and a compacted cleanup policy to avoid losing
  data. It uses the existing topic if present. If you choose to create this topic manually, **always** create it as a compacted, highly replicated (3x or more) topic with a large number of partitions to support large |kconnect-long| clusters (that is, 25 or 50 partitions like the |ak| built-in ``__consumer_offsets`` topic).

  * Type: string
  * Default: ""
  * Importance: high

``offset.storage.replication.factor``
  The replication factor used when |kconnect| creates the topic used to store connector offsets. This should **always**
  be at least 3 for a production system, but cannot be larger than the number of |ak| brokers in the cluster.

  * Type: short
  * Default: 3
  * Importance: low

``offset.storage.partitions``
  The number of partitions used when |kconnect| creates the topic used to store connector offsets. A large value is necessary to support large |kconnect-long| clusters (that is, 25 or 50 partitions like the |ak| built-in ``__consumer_offsets`` topic).

  * Type: int
  * Default: 25
  * Importance: low

``status.storage.topic``
  The name of the topic where connector and task status updates are stored. This *must* be the same for all workers
  with the same ``group.id``. At startup, |kconnect-long| attempts to automatically create this topic with multiple partitions and a compacted cleanup policy to avoid losing data. It uses the existing topic if present. If you choose to create this topic manually, **always** create it as a compacted, highly replicated (3x or more) topic with multiple partitions.

  * Type: string
  * Default: ""
  * Importance: high

``status.storage.replication.factor``
  The replication factor used when |kconnect| creates the topic used to store connector and task status updates.
  This should **always** be at least 3 for a production system, but cannot be larger than the number of |ak| brokers in the cluster.

  * Type: short
  * Default: 3
  * Importance: low

``status.storage.partitions``
  The number of partitions used when |kconnect| creates the topic used to store connector and task status updates.

  * Type: int
  * Default: 5
  * Importance: low

Standalone Configuration Properties
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In addition to the common worker configuration options, the following property is available in standalone mode.

``offset.storage.file.filename``
  The file to store connector offsets in. By storing offsets on disk, a standalone process can be stopped and started on a single node and resume where it previously left off.

  * Type: string
  * Default: ""
  * Importance: high
 

For additional configuration properties, see the following sections:

* |kconnect| and |sr|: See :ref:`schemaregistry_kafka_connect`.
* Producer configuration properties: See :ref:`kafka_producer`.
* Consumer configuration properties: See :ref:`kafka_consumer`.
* All |ak| configuration properties: See :ref:`cp-config-reference`.

.. _connect_configuring_converters:

------------------------------------
Configuring Key and Value Converters
------------------------------------

The converters packaged with the |cp| are listed below:

.. include:: includes/converter-list.rst

The ``key.converter`` and ``value.converter`` properties are where you specify
the type of :ref:`converter <connect_converters>` to use. Default converters for
all connectors are specified in the :ref:`worker configuration
<common-worker-configs-ug>`. However, any connector can override the default
converters by completely defining a key, value, and header converter. We
recommend you define the default key, value, and header converters that most
connectors can use in the worker, and then define them in a connector's
configuration if that connector requires different converters. The default
``header.converter`` defined in the worker serializes header values as strings
using the ``StringConverter`` and deserializes header values to the most
appropriate numeric, boolean, array, or map representation. Schemas are not
serialized but are inferred upon deserialization when possible.

.. important::
   
   Converter configuration properties in the :ref:`worker configuration
   <common-worker-configs-ug>` are used by all connectors running on the worker,
   unless a converter is added to a connector configuration.

   If a converter is added to a connector configuration, all converter
   properties in the worker configuration prefixed with the converter type added
   (``key.converter.*`` and/or ``value.converter.*``) are not used. Be careful
   when adding converters to a connector configuration. For example, if the
   following value converter properties are present in the worker configuration:

   .. sourcecode:: properties

      value.converter=io.confluent.connect.avro.AvroConverter
      value.converter.schema.registry.url=http://localhost:8081

   and you add the following properties to your connector configuration:

   .. sourcecode:: json

      {
       "value.converter": "AvroConverter",
       "value.converter.basic.auth.credentials.source": "USER_INFO",
       "value.converter.basic.auth.user.info": "<username>:<password>"
      }

   an error will occur when the connector is started because the required |sr| URL property ``value.converter.schema.registry.url=http://localhost:8081`` is not provided to the converter.
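
One way to avoid the error described above is to restate every required
converter property in the connector configuration, including the |sr| URL. The
following sketch shows the keys in properties form, as used by a standalone
connector file; the same keys and values would apply in a JSON connector
configuration. The URL repeats the worker example, and the credential
placeholders are left unfilled:

.. sourcecode:: properties

   # Connector-level override: worker value.converter.* properties are not inherited
   value.converter=io.confluent.connect.avro.AvroConverter
   value.converter.schema.registry.url=http://localhost:8081
   value.converter.basic.auth.credentials.source=USER_INFO
   value.converter.basic.auth.user.info=<username>:<password>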

The following sections provide converter descriptions and examples. For details
about how these converters work in |sr|, see
:ref:`schemaregistry_kafka_connect`.

Avro
~~~~

To use the ``AvroConverter`` with :ref:`Schema Registry
<schemaregistry_kafka_connect>`, you specify the ``key.converter`` and
``value.converter`` properties in the worker configuration. An additional
converter property must also be added that provides the |sr| URL. The example
below shows the ``AvroConverter`` key and value properties that are added to the
configuration:

.. sourcecode:: properties
   
   key.converter=io.confluent.connect.avro.AvroConverter
   key.converter.schema.registry.url=http://localhost:8081
   value.converter=io.confluent.connect.avro.AvroConverter
   value.converter.schema.registry.url=http://localhost:8081
   
The Avro key and value converters can be used independently from each other. For example, you may want to use a ``StringConverter`` for keys and the ``AvroConverter`` or ``JsonConverter`` for values. Independent key and value properties are shown below:

.. sourcecode:: properties
   
      key.converter=org.apache.kafka.connect.storage.StringConverter
      value.converter=io.confluent.connect.avro.AvroConverter
      value.converter.schema.registry.url=http://localhost:8081
      
.. _connect-json-protobuf:

JSON Schema and Protobuf
~~~~~~~~~~~~~~~~~~~~~~~~

Both JSON Schema and Protobuf converters are implemented in the same way as the
Avro converter. The following two examples show configurations
using the ``ProtobufConverter`` or ``JsonSchemaConverter`` for the value
converter and using ``StringConverter`` for the key:

.. sourcecode:: properties
   
      key.converter=org.apache.kafka.connect.storage.StringConverter
      value.converter=io.confluent.connect.protobuf.ProtobufConverter
      value.converter.schema.registry.url=http://localhost:8081
      
.. sourcecode:: properties
   
      key.converter=org.apache.kafka.connect.storage.StringConverter
      value.converter=io.confluent.connect.json.JsonSchemaConverter
      value.converter.schema.registry.url=http://localhost:8081

Both Avro and JSON Schema express their schemas as JSON and are lenient if
unrecognized properties are encountered. This allows the converter to use custom
JSON properties to capture any |kconnect-long| schema objects with no equivalent
in Avro or JSON Schema. 

However, Protobuf has its own Interface Definition Language (IDL) which differs
from JSON and does not allow for custom ad-hoc properties. For this reason, the
conversion from the |kconnect-long| schema to Protobuf may cause data loss or
inconsistencies if there is no direct equivalent in Protobuf.

For example, the |kconnect-long| schema supports int8, int16, and int32 data
types. Protobuf supports int32 and int64. When |kconnect| data is converted to
Protobuf, int8 and int16 fields are mapped to int32 or int64 with no indication
that the source was int8 or int16.

JSON Schema has only two numeric types, ``number`` and ``integer``. However,
the JSON Schema Converter (``JsonSchemaConverter``) records the |kconnect| type
of any field with no JSON Schema equivalent in a property named
``connect.type``. This property is ignored by the JSON Schema parser, so fields
can be restored to the proper type by downstream components.

For full encoding details, see `JSON encoding for Avro <https://avro.apache.org/docs/current/spec.html#json_encoding>`__ and `JSON encoding for Protobuf <https://developers.google.com/protocol-buffers/docs/proto3#json>`__.

Additionally, JSON Schema supports three means of combining schemas: `allOf, anyOf, and oneOf <https://json-schema.org/understanding-json-schema/reference/combining.html>`__. However, the JSON Schema converter only supports oneOf, treating it similarly to how the Avro converter handles unions and how the Protobuf converter handles oneof.

.. note::
   
   If you're configuring Avro, Protobuf, or JSON Schema converters in an environment configured for Role-Based Access Control (RBAC), see :ref:`key and value converters with RBAC <connect-rbac-key-value-converters>`.
   
For details about how converters work with |sr|, see
:ref:`schemaregistry_kafka_connect`. 

The following converters are not used with |sr|.
   
JSON (without |sr|)
~~~~~~~~~~~~~~~~~~~

If you need to use JSON without |sr| for |kconnect| data, you can use the
``JsonConverter`` included with |ak|. The example below shows the
``JsonConverter`` key and value properties that are added to the configuration:

.. sourcecode:: properties
  
  key.converter=org.apache.kafka.connect.json.JsonConverter
  value.converter=org.apache.kafka.connect.json.JsonConverter
  key.converter.schemas.enable=false
  value.converter.schemas.enable=false

When the properties ``key.converter.schemas.enable`` and
``value.converter.schemas.enable`` are set to ``true``, the key or value is not
treated as plain JSON, but rather as a composite JSON object containing both an
internal schema and the data. When these are enabled for a source connector,
both the schema and data are in the composite JSON object. When these are
enabled for a sink connector, the schema and data are extracted from the
composite JSON object. Note that this implementation never uses |sr|.
  
When the properties ``key.converter.schemas.enable`` and
``value.converter.schemas.enable`` are set to ``false``, only the data is passed
along, without the schema. This reduces the payload overhead for applications
that do not need a schema. Note that the |ak| ``JsonConverter`` enables schemas
by default, so set these properties explicitly if you want plain JSON.
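
The two properties are independent of each other. As a minimal sketch, the
following configuration turns the schema envelope on for values only. Leaving
keys as plain JSON is an assumption made for illustration:

.. sourcecode:: properties

  key.converter=org.apache.kafka.connect.json.JsonConverter
  value.converter=org.apache.kafka.connect.json.JsonConverter
  # Keys are written as plain JSON with no embedded schema.
  key.converter.schemas.enable=false
  # Values are written as a composite JSON object with two top-level
  # fields: "schema" (the Connect schema) and "payload" (the data).
  value.converter.schemas.enable=true

A sink connector reading with this configuration expects every record value to
carry both the ``schema`` and ``payload`` fields.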

String format and raw bytes
~~~~~~~~~~~~~~~~~~~~~~~~~~~

``org.apache.kafka.connect.storage.StringConverter`` is used to convert the internal |kconnect| format to simple string format. When converting |kconnect| data to bytes, the schema is ignored and data is converted to a simple string. When converting from bytes to |kconnect| data format, the converter returns an optional string schema and a string (or null).

``org.apache.kafka.connect.converters.ByteArrayConverter`` does not convert data. Bytes are passed through the connector directly with no conversion.
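
As an illustration, a connector that should move record values without
interpreting them might pair the two converters as follows. The combination is
an assumption made for this sketch, not a recommendation for any particular
connector:

.. sourcecode:: properties

   # Keys are read and written as plain strings.
   key.converter=org.apache.kafka.connect.storage.StringConverter
   # Values are passed through as raw bytes, with no conversion.
   value.converter=org.apache.kafka.connect.converters.ByteArrayConverter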
   
.. tip::
   
   For a deep dive into converters, see: `Converters and Serialization Explained <https://www.confluent.io/blog/kafka-connect-deep-dive-converters-serialization-explained/>`__.


.. _connect-override-producer-consumer:

----------------------------------
|kconnect| Producers and Consumers
----------------------------------

Internally, |kconnect-long| uses standard Java producers and consumers to
communicate with |ak|. |kconnect| configures default settings for these producer
and consumer instances. These settings include properties that ensure data is
delivered to |ak| in order and without any data loss.

Default |kconnect| Producer properties
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

By default, |kconnect| configures the |ak| producers for source connectors with
the following important properties:

* Points the producer's bootstrap servers to the same |ak| cluster used by the |kconnect| cluster.
* Configures key and value serializers that work with the connector's key and value converters.
* Generates a producer ``client.id`` based on the connector and task, using the pattern ``connector-producer-<connectorName>-<taskId>``.
* Sets ``acks=all`` to ensure each message produced is properly written to all in-sync replicas (ISRs).
* For retriable exceptions, |kconnect| configures the producer with the following properties to reduce the potential for data duplication during infinite retries:

  - ``request.timeout.ms=<max>``
  - ``max.block.ms=<max>``
  - ``max.in.flight.requests.per.connection=1``
  - ``delivery.timeout.ms=<max>``

You can override these defaults by using the ``producer.*`` properties in the worker configuration or by using the ``producer.override.*`` properties in connector configurations, but changing these default properties may compromise the delivery guarantees of |kconnect|.

Producer and Consumer overrides
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You may need to override default settings other than those described in the previous section. The following two examples show when this might be required.

**Worker override example**

Consider a standalone process that runs a log file connector. For the logs being
collected, you might prefer low-latency, best-effort delivery. That is, when
there are connectivity issues, minimal data loss may be acceptable for your
application in order to avoid data buffering on the client. This keeps log
collection as lightweight as possible. 

To override :ref:`producer configuration properties <cp-config-producer>` and
:ref:`consumer configuration properties <cp-config-consumer>` for all connectors
controlled by the worker, you prefix worker configuration properties with
``producer.`` or ``consumer.`` as shown in the example below:

::

   producer.retries=1
   consumer.max.partition.fetch.bytes=10485760

The example above overrides the default producer ``retries`` property to retry
sending messages only one time. The consumer override increases the default
amount of data fetched from a partition per request to 10 MB.

These configuration changes are applied to **all connectors** controlled by the
worker. Be careful when making any changes to these settings on workers running
in distributed mode.

**Per-connector override example**

By default, the producers and consumers used for connectors are created using the same properties that |kconnect| uses for its own internal topics. That means that the same |ak| principal needs to be able to read and write to all the internal topics and all of the topics used by the connectors. 

You may want the producers and consumers used for connectors to use a different
|ak| principal. It is possible for connector configurations to override worker
properties used to create producers and consumers. These are prefixed with
``producer.override.`` and ``consumer.override.``. For additional information
about per-connector overrides, see :ref:`connect-override-config`.
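
The following is a minimal sketch of such an override, assuming a cluster
secured with SASL/PLAIN over TLS. The principal name ``connector-user`` and the
``<password>`` placeholder are hypothetical. The worker must permit client
overrides through its ``connector.client.config.override.policy`` property. The
``Principal`` policy assumed here allows only the security-related overrides
used in this example.

Worker configuration:

.. sourcecode:: properties

   # Allow connectors to override security.protocol, sasl.mechanism,
   # and sasl.jaas.config, and nothing else.
   connector.client.config.override.policy=Principal

Source connector configuration:

.. sourcecode:: properties

   # The producer for this connector authenticates as its own principal
   # instead of the principal configured for the worker.
   producer.override.security.protocol=SASL_SSL
   producer.override.sasl.mechanism=PLAIN
   producer.override.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="connector-user" password="<password>";

A sink connector would use the same properties with the ``consumer.override.``
prefix.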

.. note::
  
  For detailed information about producers and consumers, see
  :ref:`kafka_producer` and :ref:`kafka_consumer`. For a list of configuration
  properties, see :ref:`cp-config-producer` and :ref:`cp-config-consumer`.

.. _userguide-connect-reporter:

-------------------
|kconnect| Reporter
-------------------

.. include:: includes/connect-reporter.rst

Reporter and Kerberos security
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. include:: includes/reporter-security-properties.rst

.. _disable-connect-reporter:

Disabling |kconnect| Reporter
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. include:: includes/disable-connect-reporter.rst

.. _connect-ops-config-provider:

------------------------
ConfigProvider interface
------------------------

.. include:: includes/config-provider.rst
   
.. _connect_next-steps:

----------
Next Steps
----------

After getting started with your deployment, you may want to check out the following additional |kconnect-long| documentation:

* :ref:`connect_quickstart`
* :ref:`connect-logging`
* :ref:`Upgrade Kafka Connect <upgrade_connect>`
* :ref:`connect_security`
* :ref:`connect_userguide_rest`
* :ref:`schemaregistry_kafka_connect`
* :ref:`Upgrading a Connector Plugin <connect_upgrading_plugin>`
* :ref:`connect-override-config`
* :ref:`Adding Connectors or Software (Docker) <connect_adding_connectors_to_images>`

.. tip::
   
   Try out our :devx-examples:`end-to-end demos|README.md` for |kconnect-long| on-premises, |ccloud|, and |co-long|.