Important

You are viewing documentation for an older version of Confluent Platform.

Configuration Properties

The BigQuery sink connector supports the following configuration properties.

datasets

Names of the datasets that Kafka topics write to. Each entry in the list takes the form <topic regex>=<dataset>.

Type: list
Importance: high
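
As a rough illustration of the <topic regex>=<dataset> form, the Python sketch below resolves a topic name to a dataset by trying each mapping in order. The helper names (parse_mappings, resolve_dataset), the sample mappings, and the choice of full-string matching with the first match winning are assumptions made for illustration; the connector's actual matching rules are not described on this page.

```python
import re

def parse_mappings(entries):
    """Split each '<topic regex>=<dataset>' entry at the last '='."""
    pairs = []
    for entry in entries:
        # Splitting at the last '=' lets the regex itself contain '='.
        pattern, _, dataset = entry.rpartition("=")
        pairs.append((re.compile(pattern.strip()), dataset.strip()))
    return pairs

def resolve_dataset(topic, mappings):
    """Return the dataset of the first regex that matches the whole topic name."""
    for pattern, dataset in mappings:
        if pattern.fullmatch(topic):  # assumption: whole-name match
            return dataset
    return None  # no mapping applies to this topic

# Hypothetical mappings: topics starting with "orders_" go to "sales",
# everything else goes to "default_dataset".
mappings = parse_mappings(["orders_.*=sales", ".*=default_dataset"])
print(resolve_dataset("orders_eu", mappings))
print(resolve_dataset("clicks", mappings))
```

In this sketch the order of the list matters, so a catch-all entry such as .*=<dataset> belongs last.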

project

The BigQuery project to write to.

Type: string
Importance: high

topics

A list of Kafka topics to read from.

Type: list
Importance: high

autoUpdateSchemas

Designates whether or not to automatically update BigQuery schemas.

Type: boolean
Default: false
Importance: high

bigQueryMessageTimePartitioning

Designates whether or not to use the message time when inserting records. The default is the connector processing time.

Type: boolean
Default: false
Importance: high

autoCreateTables

Automatically create BigQuery tables if they don’t already exist.

Type: boolean
Default: false
Importance: high

gcsBucketName

The name of the bucket where Google Cloud Storage (GCS) blobs are located. These blobs are used to batch-load to BigQuery. This is applicable only if enableBatchLoad is configured.

Type: string
Default: “”
Importance: high

queueSize

The maximum size (or -1 for no maximum size) of the worker queue for BigQuery write requests before all topics are paused. This is a soft limit; the size of the queue can go over this before topics are paused. All topics resume once a flush is requested or the size of the queue drops under half of the maximum size.

Type: long
Default: -1
Valid Values: [-1,…]
Importance: high
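
The pause and resume rule described above can be sketched as a small state machine. The class name TopicPauseGate and the exact comparison used to trigger a pause (size at or above the maximum) are assumptions for illustration; only the half-of-maximum resume threshold and the flush rule come from the description.

```python
class TopicPauseGate:
    """Tracks whether topics are paused, given a soft queue-size limit."""

    def __init__(self, max_size):
        self.max_size = max_size  # -1 means no maximum size
        self.paused = False

    def update(self, queue_size, flush_requested=False):
        """Record the current queue size and return True if topics are paused."""
        if self.max_size == -1:
            self.paused = False  # unlimited queue: never pause
        elif self.paused:
            # Resume on a flush request or once the queue is under half the maximum.
            if flush_requested or queue_size < self.max_size / 2:
                self.paused = False
        elif queue_size >= self.max_size:
            # Soft limit: the queue may already be past the maximum when we notice.
            self.paused = True
        return self.paused

gate = TopicPauseGate(max_size=100)
for size in (50, 120, 60, 49):
    print(size, gate.update(size))
```

Note that a queue size of 60 does not resume the topics in this sketch, because 60 is not under half of the maximum of 100.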

bigQueryRetry

The number of retry attempts made for a BigQuery request that fails with a backend error or a quota exceeded error.

Type: int
Default: 0
Valid Values: [0,…]
Importance: medium

bigQueryRetryWait

The minimum amount of time, in milliseconds, to wait between retry attempts for a BigQuery backend or quota exceeded error.

Type: long
Default: 1000
Valid Values: [0,…]
Importance: medium
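
To show how bigQueryRetry and bigQueryRetryWait might interact, here is a minimal retry loop in Python. RetryableError, call_with_retries, and flaky_request are hypothetical names; the sketch waits exactly the configured minimum between attempts, whereas the connector may wait longer.

```python
import time

class RetryableError(Exception):
    """Stands in for a BigQuery backend error or quota exceeded error."""

def call_with_retries(request, retries=0, retry_wait_ms=1000, sleep=time.sleep):
    """Call request(); on a retryable error, retry up to `retries` more times."""
    attempts = 0
    while True:
        try:
            return request()
        except RetryableError:
            if attempts >= retries:
                raise  # retries exhausted: surface the error
            attempts += 1
            sleep(retry_wait_ms / 1000.0)  # wait the minimum interval

# Demo: a request that fails twice and then succeeds. The sleep function is
# replaced with a recorder so the example runs without real delays.
attempt_log = []
waits = []

def flaky_request():
    attempt_log.append(1)
    if len(attempt_log) < 3:
        raise RetryableError("backend error")
    return "ok"

result = call_with_retries(flaky_request, retries=2, retry_wait_ms=500, sleep=waits.append)
print(result, len(attempt_log), waits)
```

With the default of 0 retries, the first retryable error is raised immediately.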

keyfile

The file containing a JSON key with BigQuery service account credentials.

Type: string
Default: null
Importance: medium

sanitizeTopics

Designates whether to automatically sanitize topic names before using them as table names. If not enabled, topic names are used as table names.

Type: boolean
Default: false
Importance: medium

schemaRetriever

A class that can be used for automatically creating tables and/or updating schemas.

Type: class
Default: null
Importance: medium

threadPoolSize

The size of the BigQuery write thread pool. This establishes the maximum number of concurrent writes to BigQuery.

Type: int
Default: 10
Valid Values: [1,…]
Importance: medium

topicsToTables

A list of mappings from topic regexes to table names. Each mapping takes the form <topic regex>=<format string>. The regex must include capture groups, which the format string references using placeholders (for example, $1).

Type: list
Default: null
Importance: medium
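
The following Python sketch illustrates how a <topic regex>=<format string> mapping with a $1 placeholder could produce a table name. The function table_for_topic and the sample mapping are hypothetical, and full-string matching is an assumption; the $1 placeholder syntax is translated by hand because Python's own regex replacement syntax differs.

```python
import re

def table_for_topic(topic, mappings):
    """Return the table name from the first mapping whose regex matches the topic."""
    for entry in mappings:
        # Split at the last '=' so the regex itself may contain '='.
        pattern, _, fmt = entry.rpartition("=")
        match = re.fullmatch(pattern, topic)  # assumption: whole-name match
        if match:
            # Replace each $N placeholder with the text of capture group N
            # (an unmatched optional group becomes an empty string).
            return re.sub(
                r"\$(\d+)",
                lambda m: match.group(int(m.group(1))) or "",
                fmt,
            )
    return None  # no mapping applies

mappings = ["db_(.*)_events=$1_table"]
print(table_for_topic("db_orders_events", mappings))  # prints "orders_table"
```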

allBQFieldsNullable

If true, no fields in any produced BigQuery schema are REQUIRED. All non-nullable Avro fields are translated as NULLABLE (or REPEATED, if arrays).

Type: boolean
Default: false
Importance: low
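
The effect of allBQFieldsNullable on field modes can be sketched as a small decision function. The name bigquery_mode and its parameters are hypothetical, and treating array fields as REPEATED regardless of the flag is an assumption based on the description above.

```python
def bigquery_mode(is_array, is_nullable, all_fields_nullable=False):
    """Pick a BigQuery field mode for an Avro field."""
    if is_array:
        return "REPEATED"  # assumption: arrays are always REPEATED
    if is_nullable or all_fields_nullable:
        return "NULLABLE"
    return "REQUIRED"  # only reachable when all_fields_nullable is False

# A non-nullable, non-array field with and without the flag.
print(bigquery_mode(is_array=False, is_nullable=False))
print(bigquery_mode(is_array=False, is_nullable=False, all_fields_nullable=True))
```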

avroDataCacheSize

The size of the cache to use when converting schemas from Avro to Kafka Connect.

Type: int
Default: 100
Valid Values: [0,…]
Importance: low

batchLoadIntervalSec

The interval, in seconds, at which to attempt to run GCS to BigQuery load jobs. This is applicable only if enableBatchLoad is configured.

Type: int
Default: 120
Importance: low

convertDoubleSpecialValues

Designates whether +Infinity is converted to Double.MAX_VALUE and whether -Infinity and NaN are converted to Double.MIN_VALUE to ensure successful delivery to BigQuery.

Type: boolean
Default: false
Importance: low
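
A minimal Python sketch of the conversion described above, taking the description literally and assuming the constants refer to Java's Double.MAX_VALUE and Double.MIN_VALUE. The function name convert_special_double is hypothetical. In Java, Double.MIN_VALUE is the smallest positive nonzero double (about 4.9e-324), not the most negative value, so under this reading -Infinity and NaN become a tiny positive number.

```python
import math
import sys

JAVA_DOUBLE_MAX_VALUE = sys.float_info.max  # same value as Java's Double.MAX_VALUE
JAVA_DOUBLE_MIN_VALUE = 5e-324              # smallest positive double, as in Java

def convert_special_double(value, enabled=False):
    """Map +Infinity, -Infinity, and NaN to finite values when enabled."""
    if not enabled:
        return value  # default (false): pass values through unchanged
    if math.isnan(value) or value == -math.inf:
        return JAVA_DOUBLE_MIN_VALUE
    if value == math.inf:
        return JAVA_DOUBLE_MAX_VALUE
    return value  # ordinary finite values are untouched

for v in (math.inf, -math.inf, float("nan"), 1.5):
    print(v, convert_special_double(v, enabled=True))
```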

enableBatchLoad

Beta feature; use with caution. The sublist of topics to be batch loaded through GCS.

Type: list
Default: “”
Importance: low

includeKafkaData

Designates whether to include an extra block containing the Kafka source topic, offset, and partition information in the resulting BigQuery rows.

Type: boolean
Default: false
Importance: low
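
To tie the properties together, the Python sketch below assembles a sample connector configuration and prints it as JSON in the name/config shape used by the Kafka Connect REST API. Every value is a placeholder (connector name, project, topics, dataset, and keyfile path are hypothetical), and the connector.class value is an assumption that does not appear on this page; check it against your installed connector.

```python
import json

connector_config = {
    "name": "bigquery-sink-example",  # hypothetical connector name
    "config": {
        # Assumption: class name of the BigQuery sink connector.
        "connector.class": "com.wepay.kafka.connect.bigquery.BigQuerySinkConnector",
        # High-importance properties from this page (placeholder values).
        "topics": "orders,clicks",
        "project": "my-gcp-project",
        "datasets": ".*=my_dataset",
        "autoCreateTables": "true",
        "autoUpdateSchemas": "true",
        # Medium-importance properties (placeholder values).
        "keyfile": "/path/to/keyfile.json",
        "bigQueryRetry": "3",
        "bigQueryRetryWait": "1000",
        "sanitizeTopics": "true",
    },
}

# Connector configuration values are passed as strings.
payload = json.dumps(connector_config, indent=2)
print(payload)
```

Properties that are left out, such as queueSize or threadPoolSize, fall back to the defaults listed above.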