Best Practices for Oracle CDC Source Connector for Confluent Platform

This document describes the best practices for configuring the Oracle CDC connector.

Check Database Prerequisites

Errors can occur if the database prerequisites are not satisfied before you configure the connector. To ensure you meet all requirements, review Oracle Database Prerequisites before moving forward.

Use Latest Connector Version

Use the latest version of the connector from Confluent Hub. The latest version includes every enhancement, bug fix, and performance improvement made to the connector. The changelog records the specific changes in each version.

oracle.service.name for PDB

Set oracle.service.name to the container database (CDB) service name when using a pluggable database (PDB).
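
For example, a partial configuration sketch for a multi-tenant setup might look like the following (properties format; the host, service, and PDB names are placeholders, and oracle.pdb.name is assumed to be the property that selects the PDB in your connector version — verify the property names against the configuration reference):

  # Connect through the CDB service name, not the PDB service name
  oracle.server=database.example.com
  oracle.port=1521
  oracle.service.name=ORCLCDB
  # The pluggable database that contains the tables of interest
  oracle.pdb.name=ORCLPDB1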

Full Supplemental Logging

The connector’s performance depends on the size of the redo log and the number of redo log records it needs to process. To minimize the amount of redo generated, enable full supplemental logging only for the tables of interest, not for the entire database (example commands follow the list below).

For a multi-tenant database with tables of interest in a PDB:

  • Up to version 21c, enable minimal supplemental logging for the root container (CDB$ROOT), and enable full supplemental logging for the tables of interest in the PDB.
  • For version 23ai, enable minimal supplemental logging either at the root container (CDB$ROOT) or at the PDB level, and enable full supplemental logging for the tables of interest in the PDB.
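
As a sketch, the SQL for the up-to-21c case typically looks like the following (ORCLPDB1 and MYSCHEMA.MY_TABLE are placeholder names; for 23ai you can run the minimal supplemental logging statement in the PDB instead):

  -- In the root container (CDB$ROOT): enable minimal supplemental logging
  ALTER SESSION SET CONTAINER = CDB$ROOT;
  ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;

  -- In the PDB: enable full supplemental logging only for the tables of interest
  ALTER SESSION SET CONTAINER = ORCLPDB1;
  ALTER TABLE MYSCHEMA.MY_TABLE ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;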

Archive Log Files Retention Period

Set the retention time for archived redo log files to be longer than the maximum time the connector is allowed to be out of service. Confluent recommends you set log retention policies to at least 24 hours. If you have a shorter retention policy and your tables have little activity, the connector may not be able to find a record with the last committed SCN.
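
Retention itself is controlled by your backup and deletion policies (for example, RMAN and the fast recovery area), so the exact setup depends on your environment. As a quick sanity check, a query along these lines shows how far back the available archived logs reach:

  -- Oldest archived redo log that is still available; this timestamp should be
  -- older than the longest outage you need to tolerate (at least 24 hours)
  SELECT MIN(FIRST_TIME) AS oldest_available_archived_log
  FROM   V$ARCHIVED_LOG
  WHERE  DELETED = 'NO';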

Unique Redo Log Topic Per Connector

Do not share a redo log topic among connectors; sharing one can cause unexpected behavior. Configure a dedicated redo log topic for each connector.
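
For example, give every connector its own topic name (assuming the redo.log.topic.name property; the names below are placeholders):

  # Each connector writes to its own redo log topic
  name=oracle-cdc-orders
  redo.log.topic.name=redo-log-oracle-cdc-orders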

Tasks Count

You can configure the connector to use as few as one task by setting tasks.max to 1, or scale to as many tasks as required to capture all table changes. For maximum parallelism, set the number of tasks to be one plus the number of tables being captured.
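
As an illustration, a connector capturing three tables could be configured for maximum parallelism as follows (the table.inclusion.regex value and table names are placeholders for your own table selection):

  # Three captured tables: one snapshot task per table plus one task
  # that reads the redo log
  table.inclusion.regex=ORCLPDB1[.]MYSCHEMA[.](ORDERS|CUSTOMERS|PRODUCTS)
  tasks.max=4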

Redo Log Fetch Size

Consider increasing redo.log.row.fetch.size from the default (10) to improve throughput. To find an optimal setting, benchmark the workload with different values. The optimal value depends on several factors, including network latency, available memory, and driver version.
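
For example, a benchmarking run might start from a larger value such as the one below (purely illustrative, not a recommendation):

  # Default is 10; larger values reduce round trips to the database
  # at the cost of additional memory per fetch
  redo.log.row.fetch.size=1000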

Connection Pool Settings

The connector uses one connection to stream changes from the Oracle database. In addition, it uses one connection per table during the initial snapshot phase. Once the snapshot is complete, only the task that reads the redo log requires a database connection to stream database changes into the redo log topic. Size the connection pool to cover the snapshot connections plus the one used for reading the redo log.

Up-to-date Database Stats

The connector reads from the ALL_TABLES view to determine which tables are accessible to the current user. Ensure the database statistics are up to date so these queries perform well. This is especially important on Confluent Cloud, where timeouts are set on these queries. Keep statistics current by running the following procedures:

  • DBMS_STATS.GATHER_DICTIONARY_STATS
  • DBMS_STATS.GATHER_FIXED_OBJECTS_STATS
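
For example, from SQL*Plus (or another client) as a user with the required privileges:

  -- Refresh optimizer statistics for the data dictionary and fixed objects
  EXEC DBMS_STATS.GATHER_DICTIONARY_STATS;
  EXEC DBMS_STATS.GATHER_FIXED_OBJECTS_STATS;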

Numeric Type Mapping

You can use the numeric.mapping configuration property to map numeric types with known precision and scale to their best-matching primitive types. The numeric.mapping documentation lists the precision and scale each numeric type requires to map to a given Connect primitive type.
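
For example, assuming the best_fit option documented for numeric.mapping, the following maps eligible NUMBER columns to primitive types instead of the Connect DECIMAL logical type (check the numeric.mapping reference for the full list of options):

  # Map numeric columns with known precision and scale to the
  # best-matching primitive Connect type
  numeric.mapping=best_fit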

Transactions Buffering

Starting with version 2.0.0, the connector buffers uncommitted transactions in the connector’s memory. If you have long-running transactions and record.buffer.mode is set to connector, ensure your Connect workers have sufficient memory. If sufficient memory is not available, consider breaking these long-running transactions into smaller transactions to avoid a potential out-of-memory issue.
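
As a sketch, keep the buffer mode explicit and size the Connect worker heap for your largest in-flight transactions (the heap values below are illustrative only):

  # Buffer uncommitted transactions in the connector's memory
  record.buffer.mode=connector
  # Worker side (not a connector property): increase the JVM heap if needed,
  # for example: export KAFKA_HEAP_OPTS="-Xms4g -Xmx8g" before starting the worker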

Snapshots in Parallel

You can configure the connector to perform snapshots in parallel for large tables that are partitioned in Oracle. Set snapshot.by.table.partitions to true to assign more than one task to one table (if the table is partitioned). This reduces the overall time required to perform the snapshot by scaling out the number of tasks.

Note

When running a connector with snapshot.by.table.partitions=true, create table-specific topics ahead of time.
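
For example (partial configuration; the table-specific topics still need to be created ahead of time, as noted above):

  # Let multiple tasks snapshot different partitions of the same table
  snapshot.by.table.partitions=true
  # More tasks than tables can help, because partitioned tables can be
  # snapshotted by several tasks in parallel (value is illustrative)
  tasks.max=8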

Infrequently Updated Databases

Use the heartbeat feature (see the following note) in environments where the connector captures infrequently updated tables, so that the offsets stored in the source offsets topic can move forward. Otherwise, a task restart could cause the connector to fail with an ORA-01291: missing logfile or ORA-01292: no log file error if the archived redo log file corresponding to the stored source offset has been purged from the database.

Note

This feature is available starting with connector version 2.3.0 for Oracle 19c and later supported database versions. For supported database versions prior to Oracle 19c, use connector version 2.7.0.
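
A minimal sketch, assuming the heartbeat interval is configured with heartbeat.interval.ms (verify the exact property name and any related heartbeat topic settings in the configuration reference for your connector version):

  # Emit a heartbeat periodically so stored offsets advance even when the
  # captured tables see no changes (interval value is illustrative)
  heartbeat.interval.ms=300000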