Best Practices for Oracle CDC Source Connector for Confluent Platform¶
This document describes the best practices for configuring the Oracle CDC connector.
- Check Database Prerequisites
- Use Latest Connector Version
- oracle.service.name for PDB
- Full Supplemental Logging
- Archive Log Files Retention Period
- Unique Redo Log Topic Per Connector
- Tasks Count
- Redo Log Fetch Size
- Connection Pool Settings
- Up-to-date Database Stats
- Numeric Type Mapping
- Transactions Buffering
- Snapshots in Parallel
- Infrequently Updated Databases
Check Database Prerequisites¶
Some errors may occur if you do not satisfy all the database prerequisites before configuring the connector. To ensure you meet all requirements, see Oracle Database Prerequisites before moving forward.
Use Latest Connector Version¶
Use the latest version of the connector from Confluent Hub. It includes every enhancement, bug fix, and performance improvement made to the connector. The changelog records the specific changes in each version.
oracle.service.name for PDB¶
Set oracle.service.name to the container database (CDB) service name when using a pluggable database (PDB).
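For example, a connection configuration for a multi-tenant setup might look like the following sketch, where the host, credentials, and CDB/PDB names are placeholders for your environment:

```json
{
  "connector.class": "io.confluent.connect.oracle.cdc.OracleCdcSourceConnector",
  "oracle.server": "database.example.com",
  "oracle.port": "1521",
  "oracle.sid": "ORCLCDB",
  "oracle.pdb.name": "ORCLPDB1",
  "oracle.service.name": "ORCLCDB",
  "oracle.username": "C##MYUSER",
  "oracle.password": "**********"
}
```

In this sketch, oracle.service.name points at the CDB service (ORCLCDB), while oracle.pdb.name identifies the pluggable database that contains the tables of interest.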
Full Supplemental Logging¶
The connector’s performance depends on the size of the redo log and the number of redo log records it needs to process. To minimize the redo log generated, enable full supplemental logging only for the tables of interest, not for the entire database.
For a multi-tenant database with tables of interest in a PDB:
- Up to version 21c, enable minimal supplemental logging for the root container (CDB$ROOT), and enable full supplemental logging for the tables of interest in the PDB.
- For version 23ai, enable minimal supplemental logging either at the root container (CDB$ROOT) or at the PDB level, and enable full supplemental logging for the tables of interest in the PDB.
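As an illustration of the pre-23ai layout described in the first bullet, the database-side statements typically look like the following sketch; the container, schema, and table names are placeholders, and your DBA may prefer different tooling:

```sql
-- In the root container: minimal supplemental logging for the whole CDB
ALTER SESSION SET CONTAINER = CDB$ROOT;
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;

-- In the PDB: full (all-column) supplemental logging for only the tables of interest
ALTER SESSION SET CONTAINER = ORCLPDB1;
ALTER TABLE MYUSER.ORDERS ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
```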
Archive Log Files Retention Period¶
Set the retention time for archived redo log files to be longer than the maximum time the connector is allowed to be out of service. Confluent recommends you set log retention policies to at least 24 hours. If you have a shorter retention policy and your tables see little activity, the connector may not be able to find a record with the last committed SCN.
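To sanity-check the effective retention, one option is to query the oldest archived redo log still available on disk (assuming the connector user, or a DBA, can read V$ARCHIVED_LOG):

```sql
-- Oldest archived redo log that has not yet been deleted
SELECT MIN(first_time) AS oldest_available_log
FROM   v$archived_log
WHERE  deleted = 'NO';
```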
Unique Redo Log Topic Per Connector¶
You must not share redo log topics among connectors; doing so can cause unexpected behavior.
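For example, each connector instance can be given its own topic through redo.log.topic.name; the names below are illustrative:

```json
{
  "name": "orders-cdc",
  "config": {
    "redo.log.topic.name": "orders-cdc-redo-log"
  }
}
```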
Tasks Count¶
You can configure the connector to use as few as one task by setting tasks.max to 1, or scale to as many tasks as required to capture all table changes. For maximum parallelism, set the number of tasks to be one plus the number of tables being captured.
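For instance, a connector capturing three tables could run with tasks.max set to 4, one task for the redo log plus one per table during the snapshot; the table regex below is illustrative:

```json
{
  "table.inclusion.regex": "ORCLPDB1[.]MYUSER[.](ORDERS|CUSTOMERS|PAYMENTS)",
  "tasks.max": "4"
}
```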
Redo Log Fetch Size¶
Consider increasing redo.log.row.fetch.size from the default (10) to increase throughput. To find an optimal setting, benchmark the workload with different values. The optimal value depends on various factors, including network latency, available memory, and driver version.
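A benchmarking run might start from a sketch like the following; the value shown is only a starting point to measure against, not a recommendation:

```json
{
  "redo.log.row.fetch.size": "1000"
}
```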
Connection Pool Settings¶
The connector uses one connection to stream changes from the Oracle database. In addition, it uses one connection per table during the initial snapshot phase. Once the snapshot is complete, only the task that reads the redo log from the database requires a database connection to stream changes into the redo log topic.
Up-to-date Database Stats¶
The connector reads from the ALL_TABLES view to get the tables accessible to the current user. Ensure the database statistics are up to date to improve query performance. This is especially important on Confluent Cloud, where timeouts are set on these queries. In particular, keep the dictionary and fixed object statistics current:
- DBMS_STATS.GATHER_DICTIONARY_STATS
- DBMS_STATS.GATHER_FIXED_OBJECTS_STATS
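For example, from SQL*Plus as a suitably privileged user (gathering these statistics can take time on large dictionaries, so schedule it during a quiet period):

```sql
-- Refresh data dictionary and fixed object statistics
EXEC DBMS_STATS.GATHER_DICTIONARY_STATS;
EXEC DBMS_STATS.GATHER_FIXED_OBJECTS_STATS;
```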
Numeric Type Mapping¶
You can use the numeric.mapping configuration property to map numeric types with known precision and scale to their best-matching primitive type. The numeric.mapping documentation describes the specific precision and scale required for each numeric type to map to a given Connect primitive type.
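For example, assuming best_fit is among the values supported by your connector version, eligible NUMBER columns can be mapped to the closest primitive type:

```json
{
  "numeric.mapping": "best_fit"
}
```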
Transactions Buffering¶
Starting with version 2.0.0, the connector buffers uncommitted transactions in the connector’s memory. If you have long-running transactions and record.buffer.mode is set to connector, ensure your Connect workers have sufficient memory. If they do not, consider breaking these long-running transactions into smaller transactions to avoid a potential out-of-memory issue.
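If you keep in-memory buffering, the relevant setting is a single property; sizing the Connect worker heap (for example through KAFKA_HEAP_OPTS) is then an operational concern outside the connector configuration:

```json
{
  "record.buffer.mode": "connector"
}
```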
Snapshots in Parallel¶
You can configure the connector to perform snapshots in parallel for large tables that are partitioned in Oracle. Set snapshot.by.table.partitions to true to assign more than one task to one table (if the table is partitioned). This reduces the overall time required to perform the snapshot by scaling out the number of tasks.
Note
When running a connector with snapshot.by.table.partitions=true, create table-specific topics ahead of time.
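A sketch of the relevant settings follows; the task count is illustrative and should reflect how many partitions you want snapshotted concurrently:

```json
{
  "snapshot.by.table.partitions": "true",
  "tasks.max": "8"
}
```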
Infrequently Updated Databases¶
Use the heartbeat feature (see the following note) in environments where the connector is configured to capture tables that are infrequently updated, so that the offsets stored in the source offsets topic can move forward. Otherwise, a task restart could cause the connector to fail with an ORA-01291: missing logfile or ORA-01292: no log file error if the archived redo log file corresponding to the stored source offset has been purged from the database.
Note
This feature is available from version 2.3.0 for Oracle 19c and in subsequent supported database versions. For supported versions prior to Oracle 19c, use version 2.7.0.
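Assuming heartbeat.interval.ms is the heartbeat property in your connector version, a sketch might look like the following, with the interval chosen to be comfortably shorter than the archived log retention period:

```json
{
  "heartbeat.interval.ms": "300000"
}
```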