Important
You are viewing documentation for an older version of Confluent Platform.
Installing and Running Replicator Executable
You can use the Replicator Executable to run Replicator from the command line. The Replicator Executable bundles a distributed Kafka Connect Worker and a Replicator Connector in a single application.
Install the Replicator Executable by downloading Confluent Enterprise from http://confluent.io/download/. The Replicator Executable is not available in Confluent Open Source.
Extract the contents of the archive or install Confluent Enterprise with the package manager of your Linux distribution.
Confluent Enterprise is available in rpm and deb packages as well as zip and tar.gz archives.
Once Confluent Enterprise is installed, you can find the Replicator Executable at bin/replicator within your installation directory. Running bin/replicator without arguments prints a list of all available command line arguments.
Configure producers and consumers with Replicator Executable
The Replicator Executable expects the properties required to connect to the origin and destination clusters to be passed in files. Use --consumer.config to point to the file that contains the properties Replicator uses to connect to the origin cluster. For example, a file named consumer.properties with these contents:
zookeeper.connect=localhost:2171
bootstrap.servers=localhost:9082
Next, place the producer properties for the destination cluster in a file and pass that file with the --producer.config argument. For example, point to a file called producer.properties containing:
zookeeper.connect=localhost:2181
bootstrap.servers=localhost:9092
Note
The ZooKeeper properties are optional.
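The two example files can be created with shell here-documents, for instance when scripting a local test setup. This is a minimal sketch that uses the sample addresses from above; because the ZooKeeper properties are optional, they are kept only as comments.

```shell
#!/bin/sh
# Origin cluster: read by Replicator's consumer (pass with --consumer.config).
# The quoted 'EOF' stops the shell from expanding anything in the file body.
cat > consumer.properties <<'EOF'
# zookeeper.connect=localhost:2171   (optional)
bootstrap.servers=localhost:9082
EOF

# Destination cluster: used by Replicator's producer (pass with --producer.config).
cat > producer.properties <<'EOF'
# zookeeper.connect=localhost:2181   (optional)
bootstrap.servers=localhost:9092
EOF
```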
The properties specified in these files do not need the origin-, destination-, producer-, or consumer-specific prefixes that the Replicator connector configuration requires (for example, bootstrap.servers rather than dest.kafka.bootstrap.servers). Replicator Executable infers the role of each file from the command line parameter used to pass it.
The parameters --consumer.config and --producer.config are required in every call to bin/replicator.
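Because both parameters are always required, a wrapper script can fail fast with a clear message when either file is missing, instead of starting Replicator at all. This is a minimal sketch: start_replicator and the REPLICATOR_BIN variable are hypothetical names, and the default path assumes the script is run from the installation directory.

```shell
#!/bin/sh
# Hypothetical wrapper around bin/replicator. REPLICATOR_BIN is an assumed
# variable; by default it points at bin/replicator relative to the
# installation directory.
start_replicator() {
  if [ "$#" -lt 2 ]; then
    echo "usage: start_replicator CONSUMER_CONFIG PRODUCER_CONFIG [ARGS...]" >&2
    return 1
  fi
  consumer_cfg=$1
  producer_cfg=$2
  shift 2
  # Both files are required by every call to bin/replicator.
  for f in "$consumer_cfg" "$producer_cfg"; do
    if [ ! -r "$f" ]; then
      echo "start_replicator: cannot read config file: $f" >&2
      return 1
    fi
  done
  # Remaining arguments (for example --cluster.id) are passed through unchanged.
  "${REPLICATOR_BIN:-bin/replicator}" \
    --consumer.config "$consumer_cfg" \
    --producer.config "$producer_cfg" \
    "$@"
}

# Example:
#   start_replicator ./consumer.properties ./producer.properties --cluster.id replicator
```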
To enable monitoring interceptors, you can include their properties either in the same files or in separate ones, and pass them to Replicator Executable with the --consumer.monitoring.config and --producer.monitoring.config parameters, respectively. These properties do not require a producer. or consumer. prefix. For example, use interceptor.classes rather than producer.interceptor.classes.
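A minimal sketch of separate monitoring files, assuming the Confluent monitoring interceptors (the io.confluent.monitoring.clients.interceptor classes shipped with Confluent Platform) and a monitoring cluster reachable at localhost:9092. The address and the exact set of properties are illustrative assumptions; adjust them to your deployment. As described above, interceptor.classes carries no producer. or consumer. prefix.

```shell
#!/bin/sh
# Monitoring settings for Replicator's consumer (origin side).
# No "consumer." prefix is used. localhost:9092 is an assumed address.
cat > consumer-monitoring.properties <<'EOF'
interceptor.classes=io.confluent.monitoring.clients.interceptor.MonitoringConsumerInterceptor
confluent.monitoring.interceptor.bootstrap.servers=localhost:9092
EOF

# Monitoring settings for Replicator's producer (destination side).
cat > producer-monitoring.properties <<'EOF'
interceptor.classes=io.confluent.monitoring.clients.interceptor.MonitoringProducerInterceptor
confluent.monitoring.interceptor.bootstrap.servers=localhost:9092
EOF

# Usage (not run here):
#   replicator \
#     --consumer.config ./consumer.properties \
#     --producer.config ./producer.properties \
#     --consumer.monitoring.config ./consumer-monitoring.properties \
#     --producer.monitoring.config ./producer-monitoring.properties \
#     --cluster.id replicator
```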
Configure replication and run Replicator Executable
The other required parameter you need to pass to Replicator Executable is --cluster.id. This parameter defines a unique identifier for the cluster that is formed when several instances of Replicator Executable start with the same --cluster.id.
Note
All instances of the Replicator Executable with the same --cluster.id should be started with the exact same overall configuration.
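The requirement that all instances sharing a --cluster.id use the same configuration can be checked before each instance starts. This is a minimal sketch, assuming each host keeps consumer.properties, producer.properties, and replication.properties in one directory and that a reference copy is available locally; config_matches is a hypothetical helper. It only compares the files, so command line arguments such as --cluster.id still need to be kept identical separately, for example by launching every instance from the same script.

```shell
#!/bin/sh
# config_matches REFERENCE_DIR LOCAL_DIR
# Hypothetical pre-flight check: succeed only if this instance's
# configuration files are byte-for-byte identical to the reference copies.
config_matches() {
  ref_dir=$1
  local_dir=$2
  for name in consumer.properties producer.properties replication.properties; do
    # cmp -s prints nothing; it exits non-zero if the files differ or one is missing.
    if ! cmp -s "$ref_dir/$name" "$local_dir/$name"; then
      echo "config_matches: $local_dir/$name differs from $ref_dir/$name (or is missing)" >&2
      return 1
    fi
  done
  return 0
}

# Example: config_matches ./reference ./config || exit 1
```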
You can now specify the configuration properties related to data replication. There are multiple ways to do this:
- Store all the configuration in a file, and pass this file to Replicator Executable using the --replication.config parameter. For example:
$ replicator \
  --consumer.config ./consumer.properties \
  --producer.config ./producer.properties \
  --cluster.id replicator \
  --replication.config ./replication.properties
- Pass the replication properties on the command line using individual parameters, each corresponding to a property. For example, specify the topics to replicate using --topic.whitelist and pass the Confluent license using --confluent.license:
$ replicator \
  --consumer.config ./consumer.properties \
  --producer.config ./producer.properties \
  --cluster.id replicator \
  --topic.whitelist test-topic \
  --confluent.license "XYZ"
- Put some replication properties in a file and pass the rest as command line arguments. For example:
$ replicator \
  --consumer.config ./consumer.properties \
  --producer.config ./producer.properties \
  --cluster.id replicator \
  --replication.config ./replication.properties \
  --topic.whitelist test-topic
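A replication.properties file for the first and third approaches might look like the following sketch. It assumes the file uses the same property names as the command line parameters without the leading --, as the confluent.license property mentioned under License Key suggests; the topic name and rename format are example values.

```shell
#!/bin/sh
# Sketch of a replication configuration file for --replication.config.
# The quoted 'EOF' keeps ${topic} literal, so Replicator (not the shell)
# substitutes the origin topic name.
cat > replication.properties <<'EOF'
topic.whitelist=test-topic
topic.rename.format=${topic}.replica
topic.auto.create=true
# confluent.license=<your license key>
EOF
```

With the third approach, a parameter given on the command line, such as --topic.whitelist, overrides the same property in this file.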
Configuring Logging with Replicator Executable
Replicator Executable reads its logging settings from the file etc/kafka-connect-replicator/replicator-log4j.properties. By default, it writes to the console, but for production deployments you should log to a file. Before you start Replicator Executable, add these lines to the replicator-log4j.properties file:
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=logs/replicator.log
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=[%d] %p %m (%c:%L)%n
log4j.appender.file.MaxFileSize=10MB
log4j.appender.file.MaxBackupIndex=5
log4j.appender.file.append=true
Then add the newly defined appender, file, to the log4j.rootLogger parameter:
# By default
log4j.rootLogger=INFO, stdout, file
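The two logging changes above can also be applied by a script, for example when provisioning several hosts. This is a minimal sketch: enable_file_logging is a hypothetical helper that appends the appender definition if it is not already present and adds file to the log4j.rootLogger line, assuming that line already exists in the file.

```shell
#!/bin/sh
# Hypothetical helper: enable the rolling file appender in a log4j 1.x
# properties file such as replicator-log4j.properties.
enable_file_logging() {
  cfg=$1

  # Append the appender definition only if it is not defined yet.
  if ! grep -q '^log4j\.appender\.file=' "$cfg"; then
    # Start on a fresh line in case the file lacks a trailing newline.
    printf '\n' >> "$cfg"
    cat >> "$cfg" <<'EOF'
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=logs/replicator.log
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=[%d] %p %m (%c:%L)%n
log4j.appender.file.MaxFileSize=10MB
log4j.appender.file.MaxBackupIndex=5
log4j.appender.file.append=true
EOF
  fi

  # Add "file" to the root logger's appender list unless it is already listed.
  awk '/^log4j\.rootLogger=/ && !/,[ ]*file[ ]*,/ && !/,[ ]*file[ ]*$/ { $0 = $0 ", file" }
       { print }' "$cfg" > "$cfg.tmp" && mv "$cfg.tmp" "$cfg"
}

# Example, from the installation directory:
#   enable_file_logging etc/kafka-connect-replicator/replicator-log4j.properties
```

Because logs/replicator.log is a relative path, it is resolved against the working directory from which Replicator Executable is started.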
License Key
Without the license key, you can use Replicator Executable for a 30-day trial period. If you are a Confluent customer, you can contact Confluent Support and ask for a Replicator license key. Then pass the key you received with the --confluent.license command line parameter, or add it as the confluent.license property in the replication configuration file that you pass to --replication.config.
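Command line arguments are typically visible to other users on the same host (for example through ps), so keeping the key in the replication configuration file can be preferable to --confluent.license. This is a minimal sketch: set_license is a hypothetical helper, and the CONFLUENT_LICENSE environment variable in the example is an assumed name, not something Replicator reads itself.

```shell
#!/bin/sh
# Hypothetical helper: write the Confluent license key into a replication
# configuration file as the confluent.license property, replacing any
# existing value. The function body runs in a subshell so umask 077 only
# applies here: the rewritten file is readable by its owner only.
set_license() (
  umask 077
  cfg=$1
  key=$2
  if [ -f "$cfg" ]; then
    # Keep every line except a previous confluent.license entry.
    # grep -v exits 1 when it outputs nothing, which is not an error here.
    grep -v '^confluent\.license=' "$cfg" > "$cfg.tmp" || true
  else
    : > "$cfg.tmp"
  fi
  printf 'confluent.license=%s\n' "$key" >> "$cfg.tmp"
  mv "$cfg.tmp" "$cfg"
)

# Example, with the key in a hypothetical CONFLUENT_LICENSE environment variable:
#   set_license ./replication.properties "$CONFLUENT_LICENSE"
```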
Command Line Parameters of Replicator Executable
The available command line parameters are:
Command line parameter | Value | Description |
---|---|---|
--blacklist | <Topic Blacklist> | A comma-separated list of topics that should not be replicated, even if they are included in the whitelist or matched by the regular expression. |
--cluster.id | <Replicator Cluster Id> (required) | Specifies the unique identifier for the Replicator cluster. |
--cluster.threads | <Total Replicator threads> | The total number of threads across all workers in the Replicator cluster. If this command starts another Replicator worker in an existing cluster, this parameter can be used to change the number of threads in the whole cluster. |
--confluent.license | <Confluent License Key> | Your Confluent license key that enables you to use Replicator. Without the license key, you can use Replicator for a 30-day trial period. If you are a subscriber, contact Confluent Support for more information. |
--consumer.config | <consumer.properties> (required) | Specifies the location of the file that contains the configuration settings for the consumer reading from the origin cluster. |
--consumer.monitoring.config | <consumer-monitoring.properties> | Specifies the location of the file that contains the producer settings for the Kafka cluster where monitoring information about the Replicator consumer is sent. This must be specified if monitoring is to be enabled, but it may point to a different Kafka cluster than the origin or destination clusters. Use the same file as --producer.config to write metrics to the destination cluster. |
-h, --help | | Display help information. |
--producer.config | <producer.properties> (required) | Specifies the location of the file that contains the configuration settings for the producer writing to the destination cluster. |
--producer.monitoring.config | <producer-monitoring.properties> | Specifies the location of the file that contains the producer settings for the Kafka cluster where monitoring information about the Replicator producer is sent. This must be specified if monitoring is to be enabled, but it may point to a different Kafka cluster than the origin or destination clusters. Use the same file as --producer.config to write metrics to the destination cluster. |
--replication.config | <replication.properties> | Specifies the location of the file that contains the configuration settings for replication. When used, any property in this file can be overridden by a command line parameter. When this is not supplied, all of the properties defining how topics are replicated should be specified on the command line. |
--topic.auto.create | true/false | Whether to automatically create topics in the destination cluster if required. |
--topic.config.sync | true/false | Whether to periodically sync topic configuration to the destination cluster. |
--topic.config.sync.interval.ms | <Topic Config Sync Interval(ms)> | How often to check for configuration changes when --topic.config.sync is enabled. |
--topic.create.backoff.ms | <Topic Creation Backoff(ms)> | Time to wait before retrying automatic topic creation or expansion. |
--topic.poll.interval.ms | <Topic Poll Interval(ms)> | Specifies how frequently to poll the source cluster for new topics. |
--topic.preserve.partitions | true/false | Whether to automatically increase the number of partitions in the destination cluster to match the source cluster and ensure that messages replicated from the source cluster use the same partition in the destination cluster. |
--topic.regex | <Regular Expression to Match Topics for Replication> | A regular expression that matches the names of the topics to be replicated. Any topic that matches this expression (or is listed in the whitelist) and is not in the blacklist will be replicated. |
--topic.rename.format | <Rename Format> | A format string for the topic name in the destination cluster, which may contain ${topic} as a placeholder for the originating topic name. For example, ${topic}_dc1 maps the topic orders to the destination topic name orders_dc1. This can also be set in the file specified by --replication.config. |
--topic.timestamp.type | <Topic Timestamp Type> | The timestamp type for the topics in the destination cluster. |
--whitelist | <Topic Whitelist> | A comma-separated list of the names of topics that should be replicated. Any topic that is in this list and not in the blacklist will be replicated. |
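The topic selection and renaming rules in the table can be modeled in a few lines of shell to check which topics a given configuration would pick up and what they would be called. This is an illustration, not Replicator's implementation: is_replicated and rename_topic are hypothetical names, the lists are assumed to be comma-separated without spaces, and the regex is treated as a POSIX extended regular expression that must match the whole topic name, whereas Replicator uses Java regular expressions.

```shell
#!/bin/sh
# Illustration of the table's rules, not Replicator's implementation.

# is_replicated TOPIC WHITELIST REGEX BLACKLIST
# Succeeds if TOPIC is listed in WHITELIST or matches REGEX, and is not
# listed in BLACKLIST.
is_replicated() {
  topic=$1 whitelist=$2 regex=$3 blacklist=$4
  # The blacklist wins over both the whitelist and the regex.
  case ",$blacklist," in
    *",$topic,"*) return 1 ;;
  esac
  case ",$whitelist," in
    *",$topic,"*) return 0 ;;
  esac
  # -x requires the regex to match the entire topic name (an assumption).
  if [ -n "$regex" ] && printf '%s\n' "$topic" | grep -Eqx -- "$regex"; then
    return 0
  fi
  return 1
}

# rename_topic FORMAT TOPIC
# Prints FORMAT with every ${topic} placeholder replaced by TOPIC.
rename_topic() {
  format=$1 topic=$2 result=
  ph='${topic}'
  while :; do
    case $format in
      *"$ph"*)
        # Copy the text before the first placeholder, then the topic name.
        result=$result${format%%"$ph"*}$topic
        format=${format#*"$ph"}
        ;;
      *)
        result=$result$format
        break
        ;;
    esac
  done
  printf '%s\n' "$result"
}
```

For example, rename_topic '${topic}_dc1' orders prints orders_dc1, matching the --topic.rename.format row above.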