Important
You are viewing documentation for an older version of Confluent Platform.
Configuration Properties
The BigQuery sink connector supports the following configuration properties.
datasets
Names for the datasets that Kafka topics write to. Each entry takes the form <topic regex>=<dataset>.
- Type: list
- Importance: high
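The following minimal Python sketch shows how a datasets value could be resolved for a topic. It assumes that entries are tried in order, that the first regex matching the whole topic name wins, and that each entry is split on its last "=" (dataset names cannot contain "="). The function names and example dataset names are hypothetical; this is not the connector's API.

```python
import re


def parse_dataset_mappings(value):
    """Split a comma-separated 'regex=dataset' list into (pattern, dataset) pairs."""
    mappings = []
    for entry in value.split(","):  # Kafka Connect list values are comma-separated
        # Assumption: split on the last '=' because dataset names cannot contain '='.
        regex, sep, dataset = entry.strip().rpartition("=")
        if not sep or not regex or not dataset:
            raise ValueError(f"expected '<topic regex>=<dataset>', got {entry!r}")
        mappings.append((re.compile(regex), dataset))
    return mappings


def dataset_for_topic(topic, mappings):
    """Return the dataset of the first regex that matches the entire topic name."""
    for pattern, dataset in mappings:
        if pattern.fullmatch(topic):
            return dataset
    raise LookupError(f"no datasets entry matches topic {topic!r}")


# Hypothetical config: order topics go to one dataset, everything else to another.
mappings = parse_dataset_mappings("orders-.*=sales,.*=misc")
orders_dataset = dataset_for_topic("orders-eu", mappings)  # "sales"
clicks_dataset = dataset_for_topic("clicks", mappings)  # "misc"
```

In this sketch, a catch-all entry such as .*=misc must come last, because it would otherwise shadow the more specific entries.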
project
The BigQuery project to write to.
- Type: string
- Importance: high
topics
A list of Kafka topics to read from.
- Type: list
- Importance: high
autoUpdateSchemas
Designates whether or not to automatically update BigQuery schemas.
- Type: boolean
- Default: false
- Importance: high
bigQueryMessageTimePartitioning
Designates whether to use the message time when inserting records into time-partitioned tables. If false (the default), the connector processing time is used.
- Type: boolean
- Default: false
- Importance: high
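The following Python sketch makes the difference concrete by picking a daily partition from either timestamp. It assumes day partitions in UTC that are addressed with BigQuery's table$YYYYMMDD partition decorator. The function name and timestamps are hypothetical, and the sketch is not the connector's implementation.

```python
from datetime import datetime, timezone


def partition_decorator(table, message_ts_ms, processing_ts_ms, use_message_time):
    """Return 'table$YYYYMMDD' for the UTC day of the chosen timestamp (milliseconds)."""
    # bigQueryMessageTimePartitioning=true selects the Kafka message timestamp;
    # false (the default) selects the time at which the connector processes the record.
    ts_ms = message_ts_ms if use_message_time else processing_ts_ms
    day = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
    return f"{table}${day:%Y%m%d}"


# A record produced late on 2020-01-01 (UTC) but processed on 2020-01-02.
message_ts = 1_577_923_199_000  # 2020-01-01T23:59:59Z
processing_ts = 1_577_923_260_000  # 2020-01-02T00:01:00Z
by_message = partition_decorator("orders", message_ts, processing_ts, True)
by_processing = partition_decorator("orders", message_ts, processing_ts, False)
```

The example record lands in the 2020-01-01 partition when message time is used, and in the 2020-01-02 partition otherwise.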
autoCreateTables
Automatically create BigQuery tables if they don’t already exist.
- Type: boolean
- Default: false
- Importance: high
gcsBucketName
The name of the bucket where Google Cloud Storage (GCS) blobs are located. These blobs are used to batch-load to BigQuery. This is applicable only if enableBatchLoad is configured.
- Type: string
- Default: ""
- Importance: high
queueSize
The maximum size (or -1 for no maximum size) of the worker queue for BigQuery write requests before all topics are paused. This is a soft limit; the size of the queue can go over this before topics are paused. All topics resume once a flush is requested or the size of the queue drops under half of the maximum size.
- Type: long
- Default: -1
- Valid Values: [-1,…]
- Importance: high
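A small counter can model the soft limit and the resume rule. The following Python class is a hypothetical illustration, not the connector's code. It assumes the limit is checked only after requests are enqueued, which explains how the queue can overshoot queueSize. It also assumes that topics pause once the pending count exceeds queueSize.

```python
class WriteQueueModel:
    """Hypothetical model of the queueSize pause/resume rule."""

    def __init__(self, queue_size=-1):
        self.queue_size = queue_size  # -1 means no maximum size
        self.pending = 0
        self.paused = False

    def enqueue(self, num_requests):
        # Requests are added first and the limit is checked afterwards,
        # so the queue can grow past queue_size before topics are paused.
        self.pending += num_requests
        if self.queue_size != -1 and self.pending > self.queue_size:
            self.paused = True  # pause all topics

    def complete(self, num_requests):
        self.pending -= num_requests
        # Resume once the queue drops under half of the maximum size.
        if self.paused and self.pending < self.queue_size / 2:
            self.paused = False

    def flush(self):
        # A flush request resumes all topics regardless of the queue size.
        self.paused = False


queue = WriteQueueModel(queue_size=10)
queue.enqueue(8)   # 8 pending: below the limit, topics keep running
queue.enqueue(5)   # 13 pending: over the soft limit, topics pause
queue.complete(6)  # 7 pending: not yet under 5, still paused
paused_at_7 = queue.paused
queue.complete(3)  # 4 pending: under half of 10, topics resume
```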
bigQueryRetry
The number of retry attempts made for a BigQuery request that fails with a backend error or a quota exceeded error.
- Type: int
- Default: 0
- Valid Values: [0,…]
- Importance: medium
bigQueryRetryWait
The minimum amount of time, in milliseconds, to wait between retry attempts for a BigQuery backend or quota exceeded error.
- Type: long
- Default: 1000
- Valid Values: [0,…]
- Importance: medium
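Together, bigQueryRetry and bigQueryRetryWait describe a bounded retry loop. The following minimal Python sketch uses a hypothetical RetryableBigQueryError to stand in for backend and quota-exceeded errors. The optional random jitter on top of the minimum wait is also an assumption, included only because the wait is documented as a minimum.

```python
import random
import time


class RetryableBigQueryError(Exception):
    """Hypothetical stand-in for a BigQuery backend or quota-exceeded error."""


def call_with_retries(request, retries=0, retry_wait_ms=1000, jitter_ms=0, sleep=time.sleep):
    """Run request(), retrying up to `retries` times on retryable errors."""
    attempt = 0
    while True:
        try:
            return request()
        except RetryableBigQueryError:
            if attempt >= retries:
                raise  # out of retries: surface the error
            attempt += 1
            # Wait at least retry_wait_ms, plus an optional random jitter.
            wait_ms = retry_wait_ms + random.randint(0, jitter_ms)
            sleep(wait_ms / 1000)


# Usage: a request that fails twice, then succeeds.
failures_left = [2]


def flaky_insert():
    if failures_left[0] > 0:
        failures_left[0] -= 1
        raise RetryableBigQueryError("backendError")
    return "inserted"


waits = []
result = call_with_retries(flaky_insert, retries=3, retry_wait_ms=1000, sleep=waits.append)
```

With the default bigQueryRetry of 0, the first retryable error is raised immediately.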
keyfile
The file containing a JSON key with BigQuery service account credentials.
- Type: string
- Default: null
- Importance: medium
sanitizeTopics
Designates whether to automatically sanitize topic names before using them as table names. If not enabled, topic names are used as table names.
- Type: boolean
- Default: false
- Importance: medium
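The following Python sketch shows what sanitizing can look like. It assumes a conservative rule: table names keep only ASCII letters, digits, and underscores, and every other character is replaced with an underscore. The connector's exact rules may differ, and sanitize_table_name is a hypothetical name.

```python
import re


def sanitize_table_name(topic):
    """Replace every character outside [A-Za-z0-9_] with an underscore."""
    # Topic names often contain '.' and '-', which this rule rewrites as '_'.
    return re.sub(r"[^A-Za-z0-9_]", "_", topic)


sanitized = sanitize_table_name("web.page-views")  # "web_page_views"
```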
schemaRetriever
A class that can be used for automatically creating tables and/or updating schemas.
- Type: class
- Default: null
- Importance: medium
threadPoolSize
The size of the BigQuery write thread pool. This establishes the maximum number of concurrent writes to BigQuery.
- Type: int
- Default: 10
- Valid Values: [1,…]
- Importance: medium
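threadPoolSize acts like a fixed-size worker pool: however many write requests are queued, at most threadPoolSize of them run at once. The following Python sketch illustrates this with concurrent.futures.ThreadPoolExecutor. The sleep stands in for a BigQuery insert call, and run_writes is a hypothetical helper.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_writes(batches, thread_pool_size):
    """Write batches on a fixed-size pool; return (rows written, peak concurrency)."""
    lock = threading.Lock()
    active = 0
    peak = 0

    def write(batch):
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        try:
            time.sleep(0.01)  # stand-in for a BigQuery insert request
            return len(batch)
        finally:
            with lock:
                active -= 1

    with ThreadPoolExecutor(max_workers=thread_pool_size) as pool:
        written = sum(pool.map(write, batches))
    return written, peak


written, peak = run_writes([["row1", "row2"]] * 20, thread_pool_size=4)
```

Raising the pool size allows more concurrent writes, at the cost of more simultaneous requests against BigQuery quotas.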
topicsToTables
A list of mappings from topic regexes to table names. Each entry takes the form <topic regex>=<format string>. The regex must include capture groups that are referenced in the format string using placeholders (for example, $1).
- Type: list
- Default: null
- Importance: medium
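The following minimal Python sketch shows how a topicsToTables entry could turn a topic name into a table name. It makes three assumptions: the regex must match the whole topic, entries are tried in order, and $1, $2, and so on expand to the corresponding capture groups (the same convention as Java's Matcher.replaceAll). resolve_table is a hypothetical name, and the sketch does not claim to match the connector's code.

```python
import re


def resolve_table(topic, entries):
    """Return the table name for `topic`, or None if no entry matches."""
    for entry in entries:
        # Assumption: split on the last '=', since the format string contains none.
        regex, _, fmt = entry.rpartition("=")
        match = re.fullmatch(regex, topic)
        if match:
            # Expand $1, $2, ... to the corresponding capture groups.
            return re.sub(r"\$(\d+)", lambda ref: match.group(int(ref.group(1))), fmt)
    return None


entries = [r"db\.(\w+)\.(\w+)=$1_$2"]
orders_table = resolve_table("db.sales.orders", entries)  # "sales_orders"
```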
allBQFieldsNullable
If true, no fields in any produced BigQuery schema are REQUIRED. All non-nullable Avro fields are translated as NULLABLE (or REPEATED, if arrays).
- Type: boolean
- Default: false
- Importance: low
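The rule comes down to a small decision about each field's mode. The following Python sketch follows the description above. bigquery_mode and its parameters are hypothetical, and arrays are always REPEATED because BigQuery represents arrays with that mode.

```python
def bigquery_mode(is_array, is_optional, all_fields_nullable=False):
    """Pick the BigQuery column mode for a field converted from Avro."""
    if is_array:
        return "REPEATED"  # arrays map to REPEATED whether or not the flag is set
    if is_optional or all_fields_nullable:
        return "NULLABLE"  # allBQFieldsNullable=true never produces REQUIRED
    return "REQUIRED"


modes = {
    "required field, flag off": bigquery_mode(False, False),
    "required field, flag on": bigquery_mode(False, False, True),
    "array, flag on": bigquery_mode(True, False, True),
}
```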
avroDataCacheSize
The size of the cache to use when converting schemas from Avro to Kafka Connect.
- Type: int
- Default: 100
- Valid Values: [0,…]
- Importance: low
batchLoadIntervalSec
The interval, in seconds, at which to attempt to run GCS to BigQuery load jobs. Only relevant if enableBatchLoad is configured.
- Type: int
- Default: 120
- Importance: low
convertDoubleSpecialValues
Designates whether +Infinity is converted to Double.MAX_VALUE and whether -Infinity and NaN are converted to Double.MIN_VALUE to ensure successful delivery to BigQuery.
- Type: boolean
- Default: false
- Importance: low
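The following Python sketch implements the conversion as described. Java's Double.MIN_VALUE is the smallest positive double (about 4.9e-324), not the most negative one, so -Infinity and NaN both become a tiny positive number. Python's 5e-324 literal and sys.float_info.max hold the same values as Java's Double.MIN_VALUE and Double.MAX_VALUE. The function name is hypothetical.

```python
import math
import sys

JAVA_DOUBLE_MAX_VALUE = sys.float_info.max  # same value as Java's Double.MAX_VALUE
JAVA_DOUBLE_MIN_VALUE = 5e-324  # Java's Double.MIN_VALUE: smallest positive subnormal


def convert_double_special_values(value):
    """Map +Infinity, -Infinity, and NaN to finite doubles as described above."""
    if value == math.inf:
        return JAVA_DOUBLE_MAX_VALUE
    if value == -math.inf or math.isnan(value):
        return JAVA_DOUBLE_MIN_VALUE
    return value  # ordinary values pass through unchanged


converted = [convert_double_special_values(v) for v in (math.inf, -math.inf, math.nan, 2.5)]
```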
enableBatchLoad
Beta feature: use with caution. The sublist of topics to be batch loaded through GCS.
- Type: list
- Default: ""
- Importance: low
includeKafkaData
Whether to include an extra block containing the Kafka source topic, offset, and partition information in the resulting BigQuery rows.
- Type: boolean
- Default: false
- Importance: low
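The properties above are supplied together as a single connector configuration. The following Python sketch builds a request body for the Kafka Connect REST API's POST /connectors endpoint. The connector class name is an assumption to check against the installed plugin, and every value (project, datasets, topics, key file path) is a hypothetical placeholder. Kafka Connect takes all configuration values as strings, with lists written as comma-separated values.

```python
import json


def to_config_value(value):
    """Render a Python value the way Kafka Connect expects configuration strings."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (list, tuple)):
        return ",".join(value)  # list-type properties are comma-separated
    return str(value)


def connector_request_body(name, config):
    """Build the JSON body for POST /connectors."""
    return json.dumps({"name": name, "config": {k: to_config_value(v) for k, v in config.items()}})


body = connector_request_body(
    "bigquery-sink",
    {
        # Assumed class name; verify it against the connector version you run.
        "connector.class": "com.wepay.kafka.connect.bigquery.BigQuerySinkConnector",
        "tasks.max": 1,
        "topics": ["orders", "clicks"],
        "project": "my-gcp-project",
        "datasets": [".*=kafka_data"],
        "keyfile": "/path/to/bigquery-key.json",
        "sanitizeTopics": True,
        "bigQueryRetry": 3,
        "bigQueryRetryWait": 1000,
    },
)
```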