Important
You are viewing documentation for an older version of Confluent Platform. For the latest version, see the current Confluent Platform documentation.
Spool Dir Connectors for Confluent Platform¶
The Kafka Connect Spool Dir connector watches a directory and reads data as new files are written to the input directory. Once a file has been read, it is moved to the configured finished.path directory. Each record in the input file is converted based on either a user-supplied schema or an auto-generated schema.
The following connectors are included with the Connect Spool Dir connector package:
- CSV Source Connector for Confluent Platform
- JSON Source Connector for Confluent Platform
- Schemaless JSON Source Connector for Confluent Platform
- Line-Delimited Source Connector for Confluent Platform
- Extended Log File Format Source Connector for Confluent Platform
Install the Spool Dir Connector Package¶
You can install this connector by using the Confluent Hub client (recommended) or you can manually download the ZIP file.
Install the connector using Confluent Hub¶
- Prerequisite
- Confluent Hub Client must be installed. This is installed by default with Confluent Enterprise.
Navigate to your Confluent Platform installation directory and run the following command to install the latest (latest) connector version. The connector must be installed on every machine where Connect will run.
confluent-hub install jcustenborder/kafka-connect-spooldir:latest
You can install a specific version by replacing latest with a version number. For example:
confluent-hub install jcustenborder/kafka-connect-spooldir:1.0.31
Install the connector manually¶
Download and extract the ZIP file for your connector and then follow the manual connector installation instructions.
License¶
The Spool Dir connector is an open source connector and does not require a Confluent Enterprise License.
Configuration Properties¶
For a complete list of configuration properties, see the specific connector documentation.
Note
For an example of how to get Kafka Connect connected to Confluent Cloud, see Distributed Cluster in Connect Kafka Connect to Confluent Cloud.
Quick Start¶
The following steps show the SpoolDirCsvSourceConnector loading a mock CSV file to a Kafka topic named spooldir-testing-topic. The other connectors are similar but load from different file types.
- Prerequisites
- Confluent Platform
- Confluent CLI (requires separate installation)
Install the connector through the Confluent Hub Client.
# run from your Confluent Platform installation directory
confluent-hub install jcustenborder/kafka-connect-spooldir:latest
Tip
By default, it will install the plugin into share/confluent-hub-components and add the directory to the plugin path. If this is the first connector you have installed, you may need to restart the connect server for the plugin path change to take effect.
Start Confluent Platform using the Confluent CLI confluent local commands.
Tip
The command syntax for the Confluent CLI development commands changed in 5.3.0. These commands have been moved to confluent local. For example, the syntax for confluent start is now confluent local start. For more information, see confluent local.
confluent local start
Create a data directory and generate test data.
mkdir data && curl "https://api.mockaroo.com/api/58605010?count=1000&key=25fd9c80" > "data/csv-spooldir-source.csv"
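If the Mockaroo endpoint is unreachable or the API key is rejected, a small hand-written file is enough to exercise the connector. The following is a minimal sketch of that fallback; the column names and rows are illustrative assumptions, not the schema returned by the Mockaroo URL above.

```shell
# Hypothetical fallback: write a tiny CSV by hand instead of calling the Mockaroo API.
# The columns below are illustrative assumptions, not the Mockaroo schema.
mkdir -p data
cat > data/csv-spooldir-source.csv <<'EOF'
id,first_name,last_name,email
1,Ada,Lovelace,ada@example.com
2,Alan,Turing,alan@example.com
3,Grace,Hopper,grace@example.com
EOF

# Show the header row, then count the data rows (total lines minus the header).
head -n 1 data/csv-spooldir-source.csv
rows=$(( $(wc -l < data/csv-spooldir-source.csv) - 1 ))
echo "data rows: ${rows}"
```

The file keeps the name csv-spooldir-source.csv so that it still matches the input.file.pattern used in this quick start.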
Set up directories for files with errors and files that finished successfully.
mkdir error && mkdir finished
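The directory setup can also be written so that it is safe to re-run and reports problems early. This is a sketch only: it checks permissions for the current shell user, which is an assumption that holds when the Connect worker is started by the same user (as with the local development commands).

```shell
# -p makes the step idempotent: directories that already exist are left alone.
mkdir -p data error finished

# Check that each directory exists and is writable by the current user.
# Assumption: the Connect worker runs as this same user.
for dir in data error finished; do
  if [ -d "$dir" ] && [ -w "$dir" ]; then
    echo "ok: $(pwd)/$dir"
  else
    echo "not writable: $dir" >&2
  fi
done
```

The absolute paths printed here are the values to use for input.path, error.path, and finished.path in the connector configuration.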
Create a spooldir.json file with the following contents:
{
  "name": "CsvSpoolDir",
  "config": {
    "tasks.max": "1",
    "connector.class": "com.github.jcustenborder.kafka.connect.spooldir.SpoolDirCsvSourceConnector",
    "input.path": "/path/to/data",
    "input.file.pattern": "csv-spooldir-source.csv",
    "error.path": "/path/to/error",
    "finished.path": "/path/to/finished",
    "halt.on.error": "false",
    "topic": "spooldir-testing-topic",
    "csv.first.row.as.header": "true",
    "schema.generation.enabled": "true"
  }
}
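The /path/to/... values are placeholders and need to be replaced with real absolute paths. One way to do that, sketched below, is to generate the file from the shell so the paths are filled in from the current directory. It assumes the data, error, and finished directories sit directly under the directory where you run it, and that this path contains no double quotes or backslashes (either would need JSON escaping).

```shell
# Assumption: data/, error/, and finished/ live under the current directory.
base_dir="$(pwd)"

# The heredoc delimiter is unquoted so that ${base_dir} is expanded into the JSON.
cat > spooldir.json <<EOF
{
  "name": "CsvSpoolDir",
  "config": {
    "tasks.max": "1",
    "connector.class": "com.github.jcustenborder.kafka.connect.spooldir.SpoolDirCsvSourceConnector",
    "input.path": "${base_dir}/data",
    "input.file.pattern": "csv-spooldir-source.csv",
    "error.path": "${base_dir}/error",
    "finished.path": "${base_dir}/finished",
    "halt.on.error": "false",
    "topic": "spooldir-testing-topic",
    "csv.first.row.as.header": "true",
    "schema.generation.enabled": "true"
  }
}
EOF

# Print the three path settings to confirm they were expanded.
grep -E '"(input|error|finished)\.path"' spooldir.json
```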
Load the SpoolDir CSV Source Connector.
Caution
You must include a double dash (--) between the connector name and your flag. For more information, see this post.
confluent local load spooldir -- -d spooldir.json
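Outside the confluent local development commands, a connector definition in this name/config JSON form can also be submitted to the Kafka Connect REST API. The sketch below wraps that request in a small helper. The load_connector function name and the CONNECT_URL variable are hypothetical, and the sketch assumes the Connect worker's REST endpoint is at localhost:8083, which may differ in your environment.

```shell
# Assumption: the Connect worker's REST API listens on localhost:8083.
CONNECT_URL="${CONNECT_URL:-http://localhost:8083}"

# Hypothetical helper: POST a connector definition file to the Connect REST API.
# The file must have the form {"name": "...", "config": {...}}.
load_connector() {
  config_file="$1"
  if [ ! -f "$config_file" ]; then
    echo "config file not found: $config_file" >&2
    return 1
  fi
  curl -s -X POST -H "Content-Type: application/json" \
    --data @"$config_file" "${CONNECT_URL}/connectors"
}
```

With a worker running, calling load_connector spooldir.json would submit the definition. Connect registers the connector under the "name" value from the file.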
Important
Don’t use the confluent local commands in production environments.
Confirm that the connector is in a RUNNING state.
confluent local status spooldir
Confirm that the messages are being sent to Kafka.
kafka-avro-console-consumer \
  --bootstrap-server localhost:9092 \
  --property schema.registry.url=http://localhost:8081 \
  --topic spooldir-testing-topic \
  --from-beginning | jq '.'
Confirm that the source CSV file has been moved to the finished directory.
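A quick way to make this check is to see which of the three directories currently holds the file. The file_location helper below is a hypothetical sketch, and it assumes you run it from the directory that contains data, error, and finished.

```shell
# Hypothetical helper: print which directory currently holds the named file.
# Assumption: data/, error/, and finished/ are under the current directory.
file_location() {
  name="$1"
  for dir in finished error data; do
    if [ -f "$dir/$name" ]; then
      echo "$dir"
      return 0
    fi
  done
  echo "missing"
  return 1
}
```

Running file_location csv-spooldir-source.csv should print finished once the connector has processed the file. A result of data suggests the file has not been picked up yet, and error suggests the connector could not process it and moved it to the configured error.path.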