Kafka Connect Spool Dir Connectors

The Kafka Connect Spool Dir Connectors provide the capability to watch a directory for files and read the data as new files are written to the input directory. Once a file has been read, it will be placed into the configured finished.path directory. Each record in the input file will be converted based on the user-supplied schema or an auto-generated schema.

The following connectors are included with the Connect Spool Dir Connector package:

Install Spool Dir Connectors

You can install this connector by using the Confluent Hub client (recommended) or you can manually download the ZIP file.

Install the connector using Confluent Hub

Prerequisite
The Confluent Hub Client must be installed. It is installed by default with Confluent Enterprise.

Navigate to your Confluent Platform installation directory and run the following command to install the latest connector version. The connector must be installed on every machine where Connect will run.

confluent-hub install jcustenborder/kafka-connect-spooldir:latest

You can install a specific version by replacing latest with a version number. For example:

confluent-hub install jcustenborder/kafka-connect-spooldir:1.0.31

Install Connector Manually

Download and extract the ZIP file for your connector and then follow the manual connector installation instructions.

License

The Spool Dir connector is an open source connector and does not require a Confluent Enterprise License.

Quick Start

The following steps show the SpoolDirCsvSourceConnector loading a mock CSV file to a Kafka topic named spooldir-testing-topic. The other connectors are similar but load from different file types.

  1. Install the connector through the Confluent Hub Client.

    # run from your Confluent Platform installation directory
    confluent-hub install jcustenborder/kafka-connect-spooldir:latest
    

    Tip

    By default, the client installs the plugin into share/confluent-hub-components and adds that directory to the plugin path. If this is the first connector you have installed, you may need to restart the Connect server for the plugin path change to take effect.

  2. Start the Confluent Platform.

    confluent start
    
  3. Create a data directory and generate test data.

    mkdir data && curl "https://api.mockaroo.com/api/58605010?count=1000&key=25fd9c80" > "data/csv-spooldir-source.csv"
    
  4. Set up directories for files with errors and files that finished successfully.

    mkdir error && mkdir finished
    
  5. Create a spooldir.json file with the following contents:

    {
      "name": "CsvSpoolDir",
      "config": {
        "tasks.max": "1",
        "connector.class": "com.github.jcustenborder.kafka.connect.spooldir.SpoolDirCsvSourceConnector",
        "input.path": "/path/to/data",
        "input.file.pattern": "csv-spooldir-source.csv",
        "error.path": "/path/to/error",
        "finished.path": "/path/to/finished",
        "halt.on.error": "false",
        "topic": "spooldir-testing-topic",
        "csv.first.row.as.header": "true",
        "schema.generation.enabled": "true"
      }
    }
    
  6. Load the SpoolDir CSV Source Connector.

    confluent load spooldir -d spooldir.json
    

    Important

    Don’t use the Confluent CLI in production environments.

  7. Confirm that the connector is in a RUNNING state.

    confluent status spooldir
    
  8. Confirm that the messages are being sent to Kafka.

    kafka-avro-console-consumer \
        --bootstrap-server localhost:9092 \
        --property schema.registry.url=http://localhost:8081 \
        --topic spooldir-testing-topic \
        --from-beginning | jq '.'
    
  9. Confirm that the source CSV file has been moved to the finished directory.
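
The check in step 9 can be scripted. The following is a minimal sketch, assuming the data and finished directories from steps 3 and 4 are in the current working directory and that the connector keeps the original file name when it moves a processed file. The helper name file_was_moved is hypothetical and is not part of the connector or the Confluent CLI.

```shell
# Hypothetical helper: succeeds only when the file is gone from the input
# directory and present (under the same name) in the finished directory.
file_was_moved() {
  fwm_input_dir="$1"
  fwm_finished_dir="$2"
  fwm_file_name="$3"
  [ ! -e "$fwm_input_dir/$fwm_file_name" ] && [ -e "$fwm_finished_dir/$fwm_file_name" ]
}

# Usage: directory and file names follow the quick start above.
if file_was_moved data finished csv-spooldir-source.csv; then
  echo "source file was moved to the finished directory"
else
  echo "source file has not been moved yet"
fi
```

If the file has not been moved, the error directory from step 4 is the next place to look, since files that fail processing are placed in error.path.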