Debezium MySQL Source Connector Configuration Properties

The Debezium MySQL Source connector can be configured using a variety of configuration properties.

Note

These are properties for the self-managed connector. If you are using Confluent Cloud, see MySQL CDC Source (Debezium) Connector for Confluent Cloud.

Required Parameters

name
A unique name for the connector. Trying to register again with the same name will fail. This property is required by all Kafka Connect connectors.
connector.class
The name of the Java class for the connector. Always specify io.debezium.connector.mysql.MySqlConnector for the MySQL connector.
tasks.max

The maximum number of tasks that should be created for this connector. The MySQL connector always uses a single task and therefore does not use this value, so the default is always acceptable.

  • Type: int
  • Default: 1
database.hostname

IP address or hostname of the MySQL database server.

  • Type: string
database.port

Integer port number of the MySQL database server.

  • Type: int
  • Importance: low
  • Default: 3306
database.user

Name of the MySQL user to use when connecting to the MySQL database server.

  • Type: string
  • Importance: high
database.password

Password to use when connecting to the MySQL database server.

  • Type: password
  • Importance: high
database.server.name

Logical name that identifies and provides a namespace for the particular MySQL database server/cluster being monitored. The logical name should be unique across all other connectors, since it is used as a prefix for all Kafka topic names emanating from this connector. Defaults to host:port, where host is the value of the database.hostname property and port is the value of the database.port property. Confluent recommends changing the default to a meaningful name.

  • Type: string
  • Importance: low
  • Default: database.hostname:database.port
database.server.id

A numeric ID of this database client, which must be unique across all currently-running database processes in the MySQL cluster. This connector joins the MySQL database cluster as another server (with this unique ID) so it can read the binlog. By default, a random number is generated between 5400 and 6400. Confluent recommends setting a value.

  • Type: int
  • Importance: low
  • Default: random
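
As a rough illustration of how the connection properties above fit together, the following fragment shows them as a flat set of connector properties. The connector name, host, credentials, server name, and server ID are all hypothetical placeholder values, and a working configuration also needs the database history properties described later in this document.

```json
{
  "name": "inventory-connector",
  "connector.class": "io.debezium.connector.mysql.MySqlConnector",
  "tasks.max": "1",
  "database.hostname": "mysql.example.com",
  "database.port": "3306",
  "database.user": "debezium",
  "database.password": "dbz-secret",
  "database.server.name": "inventory_server",
  "database.server.id": "184054"
}
```

Setting database.server.name and database.server.id explicitly, as sketched here, follows the recommendation above rather than relying on the defaults.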

Settings for the binary.handling.mode property include the following:

  • bytes: represents binary data as a byte array.
  • base64: represents binary data as a base64-encoded String.
  • hex: represents binary data as a hex-encoded (base16) String.
database.include.list

An optional comma-separated list of regular expressions that match database names to be monitored. Any database name not included in the include list will be excluded from monitoring. By default all databases will be monitored. May not be used with database.exclude.list.

  • Type: list of strings
  • Importance: low
  • Default: empty string
database.exclude.list

An optional comma-separated list of regular expressions that match database names to be excluded from monitoring. Any database name not included in the exclude list will be monitored. May not be used with database.include.list.

  • Type: list of strings
  • Importance: low
  • Default: empty string
table.include.list

An optional comma-separated list of regular expressions that match fully-qualified table identifiers for tables to be monitored. Any table not included in the include list will be excluded from monitoring. Each identifier is of the form databaseName.tableName. By default the connector will monitor every non-system table in each monitored schema. May not be used with table.exclude.list.

  • Type: list of strings
  • Importance: low
  • Default: empty string
table.exclude.list

An optional comma-separated list of regular expressions that match fully-qualified table identifiers for tables to be excluded from monitoring. Any table not included in the exclude list will be monitored. Each identifier is of the form databaseName.tableName. May not be used with table.include.list.

  • Type: list of strings
  • Importance: low
  • Default: empty string
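
A minimal sketch of the include lists, assuming hypothetical database names (inventory and anything starting with sales_) and hypothetical table names. Because the values are regular expressions, literal dots are escaped, and the backslash is doubled inside a JSON string.

```json
{
  "database.include.list": "inventory,sales_.*",
  "table.include.list": "inventory\\.customers,inventory\\.orders"
}
```

The sketch uses only include lists, since each include list may not be combined with its matching exclude list.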
column.exclude.list

An optional, comma-separated list of regular expressions that match the fully-qualified names of columns to exclude from change event record values. Fully-qualified names for columns are of the form databaseName.tableName.columnName.

  • Type: list of strings
  • Importance: low
  • Default: empty string
column.include.list

An optional, comma-separated list of regular expressions that match the fully-qualified names of columns to include in change event record values. Fully-qualified names for columns are of the form databaseName.tableName.columnName.

  • Type: list of strings
  • Importance: low
  • Default: empty string
column.truncate.to.length.chars

An optional comma-separated list of regular expressions that match the fully-qualified names of character-based columns. The column values are truncated in the change event message values if the field values are longer than the specified number of characters, where length in the property name is replaced by that number. Multiple properties with different lengths can be used in a single configuration, although in each the length must be a positive integer. Fully-qualified names for columns are in the form databaseName.tableName.columnName, or databaseName.schemaName.tableName.columnName.

  • Type: list of strings
  • Importance: low
  • Default: n/a
column.mask.with.length.chars

An optional comma-separated list of regular expressions that match the fully-qualified names of character-based columns. The column values are replaced in the change event message values with a field value consisting of the specified number of asterisk (*) characters, where length in the property name is replaced by that number. Multiple properties with different lengths can be used in a single configuration, although in each the length must be a positive integer. Fully-qualified names for columns are in the form databaseName.tableName.columnName, or databaseName.schemaName.tableName.columnName.

  • Type: list of strings
  • Importance: low
  • Default: n/a
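
The column-level properties above might be combined as in the following sketch. The table and column names are hypothetical, and the lengths 20 and 10 are arbitrary values substituted for the length part of the property names.

```json
{
  "column.exclude.list": "inventory\\.customers\\.ssn",
  "column.truncate.to.20.chars": "inventory\\.customers\\.notes",
  "column.mask.with.10.chars": "inventory\\.customers\\.email"
}
```

With these assumed settings, the ssn column would be dropped from event values, notes would be cut to at most 20 characters, and email would be replaced with 10 asterisks.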
column.propagate.source.type

An optional comma-separated list of regular expressions that match the fully-qualified names of columns whose original type and length should be added as a parameter to the corresponding field schemas in the emitted change messages. The schema parameters __debezium.source.column.type, __debezium.source.column.length, and __debezium.source.column.scale are used to propagate the original type name, length (for variable-width types), and scale, respectively. Useful to properly size corresponding columns in sink databases. Fully-qualified names for columns are in the form databaseName.tableName.columnName, or databaseName.schemaName.tableName.columnName.

  • Type: list of strings
  • Importance: low
  • Default: n/a
database.propagate.source.type

An optional, comma-separated list of regular expressions that match the database-specific data type name of columns whose original type and length should be added as a parameter to the corresponding field schemas in the emitted change event records. The following schema parameters are used to propagate the original type name, length (for variable-width types), and scale:

  1. __debezium.source.column.type
  2. __debezium.source.column.length
  3. __debezium.source.column.scale

This is useful to properly size corresponding columns in sink databases. Fully-qualified data type names are of one of these forms:

  1. databaseName.tableName.typeName
  2. databaseName.schemaName.tableName.typeName

For more information on how MySQL connectors map data types for the list of MySQL-specific data type names, see Data type mappings in the Debezium documentation.

  • Type: list of strings
  • Importance: low
  • Default: n/a
time.precision.mode

Time, date, and timestamps can be represented with different kinds of precision. Settings include the following:

  1. adaptive_time_microseconds: (the default) captures the date, datetime, and timestamp values exactly as they are in the database. It uses millisecond, microsecond, or nanosecond precision values based on the database column’s type. The exception is TIME type fields, which are always captured as microseconds.
  2. adaptive: (deprecated) captures the time and timestamp values exactly as they are in the database using millisecond, microsecond, or nanosecond precision values. These values are based on the database column type.
  3. connect: represents time and timestamp values using Kafka Connect’s built-in representations for Time, Date, and Timestamp. It uses millisecond precision regardless of database column precision.
  • Type: string
  • Importance: low
  • Default: adaptive_time_microseconds
decimal.handling.mode

Specifies how the connector should handle values for DECIMAL and NUMERIC columns. Settings include the following:

  1. precise: (the default) represents values precisely using java.math.BigDecimal, encoded in change events in binary form.
  2. double: represents values using double, which may result in a loss of precision but is far easier to use.
  3. string: encodes values as formatted strings, which are easy to consume, but semantic information about the real type is lost.
  • Type: string
  • Importance: low
  • Default: precise
bigint.unsigned.handling.mode

Specifies how BIGINT UNSIGNED columns should be represented in change events. Settings include the following:

  1. precise uses java.math.BigDecimal to represent values, which are encoded in the change events using a binary representation and Kafka Connect’s org.apache.kafka.connect.data.Decimal type.
  2. long (the default) represents values using Java’s long, which may not offer the same precision but is far easier to use in consumers. long is usually the preferable setting. Use precise only when working with values larger than 2^63, because these values cannot be conveyed using long.
  • Type: string
  • Importance: low
  • Default: long
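
The three representation properties above can be overridden together. The following sketch picks non-default values purely for illustration; it is not a recommendation.

```json
{
  "time.precision.mode": "connect",
  "decimal.handling.mode": "string",
  "bigint.unsigned.handling.mode": "precise"
}
```

Under these settings, temporal values would use Kafka Connect's millisecond-precision types, DECIMAL and NUMERIC values would arrive as strings, and BIGINT UNSIGNED values would be encoded as Decimal.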
include.schema.changes

Boolean value that specifies whether the connector should publish changes in the database schema to a Kafka topic with the same name as the logical server name (database.server.name). Each schema change is recorded using a key that contains the database name and a value that includes the DDL statement(s). This is independent of how the connector internally records database history.

  • Type: boolean
  • Importance: low
  • Default: true
include.schema.comments

Boolean value that specifies whether the connector should parse and publish table and column comments on metadata objects. Enabling this option increases memory usage. The number and size of logical schema objects largely determine how much memory the Debezium connectors consume, and adding potentially large string data to each of them can be expensive.

  • Type: boolean
  • Importance: low
  • Default: false
include.query

Boolean value that specifies whether the connector should include the original SQL query that generated the change event. Note: This option requires MySQL be configured with the binlog_rows_query_log_events option set to ON. Query will not be present for events generated from the snapshot process.

Warning

Enabling this option may expose tables or fields that are explicitly excluded or masked, because the original SQL statement is included in the change event.

  • Type: boolean
  • Importance: low
  • Default: false
event.deserialization.failure.handling.mode

Specifies how the connector should react to exceptions during deserialization of binlog events. fail propagates the exception (indicating the problematic event and its binlog offset), causing the connector to stop. warn causes the problematic event to be skipped and the problematic event and its binlog offset to be logged (make sure that the logger is set to the WARN or ERROR level). ignore causes the problematic event to be skipped.

  • Type: string
  • Importance: low
  • Default: fail
inconsistent.schema.handling.mode

Specifies how the connector should react to binlog events that relate to tables that are not present in the internal schema representation (that is, the internal representation is not consistent with the database). fail throws an exception (indicating the problematic event and its binlog offset), causing the connector to stop. warn causes the problematic event to be skipped and the problematic event and its binlog offset to be logged (make sure that the logger is set to the WARN or ERROR level). ignore causes the problematic event to be skipped.

  • Type: string
  • Importance: low
  • Default: fail
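
As a sketch of the two failure-handling properties above, the following fragment switches both from the default fail to warn, so that problematic events are skipped and logged instead of stopping the connector. Whether skipping events is acceptable depends on the deployment; the values are shown only to illustrate the syntax.

```json
{
  "event.deserialization.failure.handling.mode": "warn",
  "inconsistent.schema.handling.mode": "warn"
}
```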
max.queue.size

Positive integer value that specifies the maximum size of the blocking queue into which change events read from the database log are placed before they are written to Kafka. This queue can provide backpressure to the binlog reader when, for example, writes to Kafka are slower or if Kafka is not available. Events that appear in the queue are not included in the offsets periodically recorded by this connector. Defaults to 8192, and should always be larger than the maximum batch size specified in the max.batch.size property.

  • Type: int
  • Importance: low
  • Default: 8192
max.batch.size

Positive integer value that specifies the maximum size of each batch of events that should be processed during each iteration of this connector. Defaults to 2048.

  • Type: int
  • Importance: low
  • Default: 2048
max.queue.size.in.bytes

Long value for the maximum size in bytes of the blocking queue. The feature is disabled by default and becomes active when set to a positive long value.

  • Type: long
  • Importance: low
  • Default: 0
poll.interval.ms

Positive integer value that specifies the number of milliseconds the connector should wait during each iteration for new change events to appear. Defaults to 500 milliseconds.

  • Type: int
  • Importance: low
  • Default: 500
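
The queue and batch settings above interact, so an example may help. The numbers in this sketch are arbitrary illustrative values, chosen only so that max.queue.size stays larger than max.batch.size as required.

```json
{
  "max.queue.size": "16384",
  "max.batch.size": "4096",
  "poll.interval.ms": "250"
}
```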
connect.timeout.ms

A positive integer value that specifies the maximum time in milliseconds this connector should wait after trying to connect to the MySQL database server before timing out. Defaults to 30 seconds.

  • Type: int
  • Importance: low
  • Default: 30000 (30 seconds)
gtid.source.includes

A comma-separated list of regular expressions that match source UUIDs in the GTID set used to find the binlog position in the MySQL server. Only the GTID ranges that have sources matching one of these include patterns will be used. May not be used with gtid.source.excludes.

  • Type: list of strings
  • Importance: low
gtid.source.excludes

A comma-separated list of regular expressions that match source UUIDs in the GTID set used to find the binlog position in the MySQL server. Only the GTID ranges that have sources matching none of these exclude patterns will be used. May not be used with gtid.source.includes.

  • Type: list of strings
  • Importance: low
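
A minimal sketch of a GTID source filter, assuming a single hypothetical server UUID. Because gtid.source.includes and gtid.source.excludes may not be used together, only one of them appears.

```json
{
  "gtid.source.includes": "3e11fa47-71ca-11e1-9e33-c80aa9429562"
}
```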
gtid.new.channel.position (deprecated and scheduled for removal)

When set to latest, and when the connector sees a new GTID channel, the connector starts consuming from the last executed transaction in that GTID channel. If set to earliest, the Debezium connector starts reading that channel from the first available (not purged) GTID position. earliest is useful when you have an active-passive MySQL setup where Debezium is connected to the primary. In this case, during failover the secondary with a new UUID (and GTID channel) starts receiving writes before Debezium is connected. These writes would be lost when using latest.

  • Type: string
  • Importance: low
  • Default: earliest
tombstones.on.delete

Controls whether a tombstone event should be generated after a delete event. When set to true, the delete operations are represented by a delete event and a subsequent tombstone event. When set to false, only a delete event is sent. Emitting the tombstone event (the default behavior) allows Kafka to completely delete all events pertaining to the given key once the source record got deleted.

  • Type: boolean
  • Importance: low
  • Default: true
message.key.columns

A list of expressions that specify the columns that the connector uses to form custom message keys for change event records that it publishes to the Kafka topics for specified tables.

By default, Debezium uses the primary key column of a table as the message key for records that it emits. In place of the default, or to specify a key for tables that lack a primary key, you can configure custom message keys based on one or more columns.

To establish a custom message key for a table, list the table, followed by the columns to use as the message key. Each list entry takes the following format:

<fully-qualified_tableName>:<keyColumn>,<keyColumn>

To base a table key on multiple column names, insert commas between the column names.

Each fully-qualified table name is a regular expression in the following format:

<databaseName>.<tableName>

The property can include entries for multiple tables. Use a semicolon to separate table entries in the list. The following example sets the message key for the tables inventory.customers and purchase.orders:

inventory.customers:pk1,pk2;(.*).purchase.orders:pk3,pk4

For the table inventory.customers, the columns pk1 and pk2 are specified as the message key. For the purchase.orders tables in any database, the columns pk3 and pk4 serve as the message key.

There is no limit to the number of columns that you use to create custom message keys. However, it’s best to use the minimum number that are required to specify a unique key.

  • Type: list
  • Default: n/a
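
Placed in a connector configuration, the example entry above would look roughly like the following fragment. The table and column names are the same illustrative ones used in the text.

```json
{
  "message.key.columns": "inventory.customers:pk1,pk2;(.*).purchase.orders:pk3,pk4"
}
```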
binary.handling.mode

Specifies how binary columns (for example, blob, binary, varbinary) should be represented in change events. Valid settings are bytes, base64, and hex.

  • Type: string
  • Importance: low
  • Default: bytes

Advanced Parameters

connect.keep.alive

A Boolean value that specifies whether a separate thread should be used to ensure that the connection to the MySQL server/cluster is kept alive.

  • Type: boolean
  • Default: true
table.ignore.builtin

A Boolean value that specifies whether built-in system tables should be ignored. This applies regardless of the table include and exclude lists. By default, system tables are excluded from having their changes captured, and no events are generated when changes are made to any system tables.

  • Type: boolean
  • Default: true
database.ssl.mode

Specifies whether to use an encrypted connection. Possible settings are:

  1. disabled: Specifies the use of an unencrypted connection.
  2. preferred: Establishes an encrypted connection if the server supports secure connections. If the server does not support secure connections, falls back to an unencrypted connection.
  3. required: Establishes an encrypted connection or fails if one cannot be made for any reason.
  4. verify_ca: Behaves like required but additionally it verifies the server TLS certificate against the configured Certificate Authority (CA) certificates and fails if the server TLS certificate does not match any valid CA certificates.
  5. verify_identity: Behaves like verify_ca but additionally verifies the server certificate matches the host of the remote connection.
  • Type: string
  • Default: disabled
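
A short sketch of the advanced parameters above, assuming an encrypted connection is mandatory. The stricter verify_ca and verify_identity modes typically need additional certificate settings that are not covered in this document, so required is used here.

```json
{
  "connect.keep.alive": "true",
  "table.ignore.builtin": "true",
  "database.ssl.mode": "required"
}
```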

Database History Parameters

database.history.kafka.topic

The full name of the Kafka topic where the connector will store the database schema history.

  • Type: string
  • Importance: high
database.history.kafka.bootstrap.servers

A list of host/port pairs that the connector will use for establishing an initial connection to the Kafka cluster. This connection will be used for retrieving database schema history previously stored by the connector, and for writing each DDL statement read from the source database. This should point to the same Kafka cluster used by the Kafka Connect process.

  • Type: list of strings
  • Importance: high

Note

If the Kafka cluster is secured, you must add the security properties prefixed with database.history.consumer.* and database.history.producer.* to the connector configuration, as shown below:

"database.history.consumer.security.protocol": "SASL_SSL",
"database.history.consumer.sasl.mechanism": "PLAIN",
"database.history.consumer.sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"key\" password=\"secret\";",

"database.history.producer.security.protocol": "SASL_SSL",
"database.history.producer.sasl.mechanism": "PLAIN",
"database.history.producer.sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required username=\"key\" password=\"secret\";",
database.history.kafka.recovery.poll.interval.ms

An integer value that specifies the maximum number of milliseconds the connector should wait during startup/recovery while polling for persisted data. The default is 100 ms.

  • Type: int
  • Default: 100
database.history.kafka.recovery.attempts

The maximum number of times that the connector should try to read persisted history data before the connector recovery fails with an error. The maximum amount of time to wait after receiving no data is recovery.attempts x recovery.poll.interval.ms.

  • Type: int
  • Default: 4
database.history.skip.unparseable.ddl

A Boolean value that specifies whether the connector should ignore malformed or unknown database statements or stop processing so a human can fix the issue. The safe default is false. Skipping should be used only with care as it can lead to data loss or mangling when the binlog is being processed.

  • Type: boolean
  • Default: false
database.history.store.only.captured.tables.ddl

A Boolean value that specifies whether the connector should record all DDL statements or only those for captured tables. true records only those DDL statements that are relevant to tables whose changes are being captured by Debezium. Set to true with care because the missing data might become necessary if you change which tables have their changes captured.

The safe default is false.

  • Type: boolean
  • Default: false
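
The database history parameters above could be set as in this sketch. The topic name and broker addresses are hypothetical; the remaining values simply restate the documented defaults to show the syntax.

```json
{
  "database.history.kafka.topic": "dbhistory.inventory",
  "database.history.kafka.bootstrap.servers": "broker1:9092,broker2:9092",
  "database.history.kafka.recovery.poll.interval.ms": "100",
  "database.history.kafka.recovery.attempts": "4",
  "database.history.skip.unparseable.ddl": "false",
  "database.history.store.only.captured.tables.ddl": "false"
}
```

With these recovery values, the connector would wait at most 4 x 100 ms after receiving no data before recovery fails.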

Signal Parameters

signal.kafka.topic

The name of the Kafka topic that the connector monitors for ad-hoc signals.

  • Type: string
signal.kafka.bootstrap.servers

A list of host/port pairs that the connector uses for establishing an initial connection to the Kafka cluster. Each pair should point to the same Kafka cluster used by the Kafka Connect process.

  • Type: list
signal.kafka.poll.timeout.ms

An integer value that specifies the maximum number of milliseconds the connector should wait when polling signals. The default is 100ms.

  • Type: int
  • Default: 100
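
A minimal sketch of the signal parameters, assuming a hypothetical topic name and the same hypothetical brokers that the Kafka Connect process uses.

```json
{
  "signal.kafka.topic": "dbz-signals",
  "signal.kafka.bootstrap.servers": "broker1:9092,broker2:9092",
  "signal.kafka.poll.timeout.ms": "100"
}
```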

Auto topic creation

For more information about Auto topic creation, see Configuring Auto Topic Creation for Source Connectors.

Note

Configuration properties accept regular expressions (regex) that are defined as Java regex.

topic.creation.groups

A list of group aliases that are used to define per-group topic configurations for matching topics. A default group always exists and matches all topics.

  • Type: List of String types
  • Default: empty
  • Possible Values: The values of this property refer to any additional groups. A default group is always defined for topic configurations.
topic.creation.$alias.replication.factor

The replication factor for new topics created by the connector. This value must not be larger than the number of brokers in the Kafka cluster. If this value is larger than the number of Kafka brokers, an error occurs when the connector attempts to create a topic. This is a required property for the default group. This property is optional for any other group defined in topic.creation.groups. Other groups use the Kafka broker default value.

  • Type: int
  • Default: n/a
  • Possible Values: >= 1 for a specific valid value or -1 to use the Kafka broker’s default value.
topic.creation.$alias.partitions

The number of topic partitions created by this connector. This is a required property for the default group. This property is optional for any other group defined in topic.creation.groups. Other groups use the Kafka broker default value.

  • Type: int
  • Default: n/a
  • Possible Values: >= 1 for a specific valid value or -1 to use the Kafka broker’s default value.
topic.creation.$alias.include

A list of strings that represent regular expressions that match topic names. This list is used to include topics with matching values, and apply this group’s specific configuration to the matching topics. $alias applies to any group defined in topic.creation.groups. This property does not apply to the default group.

  • Type: List of String types
  • Default: empty
  • Possible Values: Comma-separated list of exact topic names or regular expressions.
topic.creation.$alias.exclude

A list of strings representing regular expressions that match topic names. This list is used to exclude topics with matching values from getting the group’s specific configuration. $alias applies to any group defined in topic.creation.groups. This property does not apply to the default group. Note that exclusion rules override any inclusion rules for topics.

  • Type: List of String types
  • Default: empty
  • Possible Values: Comma-separated list of exact topic names or regular expressions.
topic.creation.$alias.${kafkaTopicSpecificConfigName}

Any of the topic-level configurations described in Changing Broker Configurations Dynamically for the version of the Kafka broker where the records will be written. The broker’s topic-level configuration value is used if the configuration is not specified for the rule. $alias applies to the default group as well as any group defined in topic.creation.groups.

  • Type: property values
  • Default: Kafka broker value
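
The topic creation properties above might be combined as follows. This is a sketch only: the group alias compacted, the topic name pattern, and the replication and partition counts are all hypothetical, and cleanup.policy stands in for any topic-level configuration.

```json
{
  "topic.creation.groups": "compacted",
  "topic.creation.default.replication.factor": "3",
  "topic.creation.default.partitions": "6",
  "topic.creation.compacted.include": "inventory_server\\.inventory\\..*",
  "topic.creation.compacted.cleanup.policy": "compact"
}
```

Topics matching the include pattern would get the compacted group's settings, with partitions and replication factor falling back to the Kafka broker defaults for that group. All other topics would use the default group's values.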

You can find more advanced configuration properties and details in the Debezium connector for MySQL documentation.

Note

Portions of the information provided here derive from documentation originally produced by the Debezium Community. Work produced by Debezium is licensed under Creative Commons 3.0.