Configuration Properties

The BigQuery sink connector can be configured using a variety of configuration properties.

datasets

Names for the datasets that Kafka topics write to. Entries take the form <topic regex>=<dataset>.

  • Type: list
  • Importance: high
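For example, a datasets mapping might route one group of topics to a dedicated dataset and everything else to a fallback. All topic and dataset names below are hypothetical:

```properties
# Route topics beginning with "orders-" to the sales_data dataset;
# all remaining topics go to default_dataset (names are illustrative).
datasets=orders-.*=sales_data,.*=default_dataset
```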
project

The BigQuery project to write to.

  • Type: string
  • Importance: high
topics

A list of Kafka topics to read from.

  • Type: list
  • Importance: high
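Putting the three high-importance properties above together, a minimal sink configuration might look like the following sketch. The connector name, class, and all values are illustrative and should be checked against your installed connector version:

```properties
# Minimal BigQuery sink configuration (all values hypothetical).
name=bigquery-sink
connector.class=com.wepay.kafka.connect.bigquery.BigQuerySinkConnector
topics=orders,customers
project=my-gcp-project
datasets=.*=my_dataset
```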
autoUpdateSchemas

Designates whether or not to automatically update BigQuery schemas.

  • Type: boolean
  • Default: false
  • Importance: high
bigQueryMessageTimePartitioning

Designates whether or not to use the message time when inserting records. The default is the connector processing time.

  • Type: boolean
  • Default: false
  • Importance: high
autoCreateTables

Automatically create BigQuery tables if they don’t already exist.

  • Type: boolean
  • Default: false
  • Importance: high
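As a sketch, enabling automatic table creation and schema updates generally goes hand in hand with configuring the schemaRetriever class described below, since the connector needs a schema source to create or update tables. The retriever class name shown is an assumption based on the connector's Schema Registry integration and should be verified for your version:

```properties
# Let the connector create tables and evolve schemas automatically.
autoCreateTables=true
autoUpdateSchemas=true
# A schema retriever implementation must be supplied for these to
# take effect; the class name below is illustrative.
schemaRetriever=com.wepay.kafka.connect.bigquery.schemaregistry.schemaretriever.SchemaRegistrySchemaRetriever
```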
gcsBucketName

The name of the bucket where Google Cloud Storage (GCS) blobs are located. These blobs are used to batch-load to BigQuery. This is applicable only if enableBatchLoad is configured.

  • Type: string
  • Default: “”
  • Importance: high
queueSize

The maximum size (or -1 for no maximum size) of the worker queue for BigQuery write requests before all topics are paused. This is a soft limit; the size of the queue can go over this before topics are paused. All topics resume once a flush is requested or the size of the queue drops under half of the maximum size.

  • Type: long
  • Default: -1
  • Valid Values: [-1,…]
  • Importance: high
bigQueryRetry

The number of retry attempts made for a BigQuery request that fails with a backend error or a quota exceeded error.

  • Type: int
  • Default: 0
  • Valid Values: [0,…]
  • Importance: medium
bigQueryRetryWait

The minimum amount of time, in milliseconds, to wait between retry attempts for a BigQuery backend or quota exceeded error.

  • Type: long
  • Default: 1000
  • Valid Values: [0,…]
  • Importance: medium
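Taken together, the two retry properties above could be tuned as follows to retry transient backend or quota-exceeded errors up to three times, waiting at least two seconds between attempts (values are illustrative):

```properties
# Retry failed BigQuery requests up to 3 times,
# waiting at least 2000 ms between attempts.
bigQueryRetry=3
bigQueryRetryWait=2000
```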
keyfile

The file containing a JSON key with BigQuery service account credentials.

  • Type: string
  • Default: null
  • Importance: medium
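A minimal credentials fragment, assuming a hypothetical service-account key file path and project ID:

```properties
# Hypothetical project ID and key file location.
project=my-gcp-project
keyfile=/etc/kafka-connect/bq-service-account.json
```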
sanitizeTopics

Designates whether to automatically sanitize topic names before using them as table names. If not enabled, topic names are used as table names.

  • Type: boolean
  • Default: false
  • Importance: medium
schemaRetriever

A class that can be used for automatically creating tables and/or updating schemas.

  • Type: class
  • Default: null
  • Importance: medium
threadPoolSize

The size of the BigQuery write thread pool. This establishes the maximum number of concurrent writes to BigQuery.

  • Type: int
  • Default: 10
  • Valid Values: [1,…]
  • Importance: medium
topicsToTables

A list of mappings from topic regexes to table names. These take the form of <topic regex>=<format string>. If the format string references placeholders (for example, $1), the regex must include the corresponding capture groups.

  • Type: list
  • Default: null
  • Importance: medium
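For example, a capture group in the regex can be referenced with $1 in the format string. The topic and table names below are hypothetical:

```properties
# Map topics like "kcbq-orders" to tables like "orders_table",
# and route all other topics to a single catch-all table.
topicsToTables=kcbq-(.*)=$1_table,.*=catch_all
```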
allBQFieldsNullable

If true, no fields in any produced BigQuery schema are REQUIRED. All non-nullable Avro fields are translated as NULLABLE (or REPEATED, if arrays).

  • Type: boolean
  • Default: false
  • Importance: low
avroDataCacheSize

The size of the cache to use when converting schemas from Avro to Kafka Connect.

  • Type: int
  • Default: 100
  • Valid Values: [0,…]
  • Importance: low
batchLoadIntervalSec

The interval, in seconds, at which to attempt GCS-to-BigQuery load jobs. Only relevant if enableBatchLoad is configured.

  • Type: int
  • Default: 120
  • Importance: low
convertDoubleSpecialValues

Designates whether +Infinity is converted to Double.MAX_VALUE and whether -Infinity and NaN are converted to Double.MIN_VALUE to ensure successful delivery to BigQuery.

  • Type: boolean
  • Default: false
  • Importance: low
enableBatchLoad

Beta feature; use with caution. The sublist of topics to be batch loaded through GCS.

  • Type: list
  • Default: “”
  • Importance: low
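A sketch of a batch-load setup combining enableBatchLoad with the gcsBucketName and batchLoadIntervalSec properties described earlier. Topic and bucket names are hypothetical:

```properties
# Batch-load the "logs" topic through GCS,
# running a load job every 5 minutes.
enableBatchLoad=logs
gcsBucketName=my-staging-bucket
batchLoadIntervalSec=300
```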
includeKafkaData

Whether to include an extra block containing the Kafka source topic, offset, and partition information in the resulting BigQuery rows.

  • Type: boolean
  • Default: false
  • Importance: low