This document describes the best practices for configuring the Oracle CDC connector.
- Check Database Prerequisites
- Use Latest Connector Version
- oracle.service.name for PDB
- Full Supplemental Logging
- Archive Log Files Retention Period
- Unique Redo Log Topic Per Connector
- Tasks Count
- Redo Log Fetch Size
- Connection Pool Settings
- Up-to-date Database Stats
- Numeric Type Mapping
- Transactions Buffering
- Snapshots in Parallel
- Infrequently Updated Databases
Check Database Prerequisites¶
Some errors may occur if you do not satisfy all the database prerequisites before configuring the connector. To ensure you meet all requirements, see Oracle Database Prerequisites before moving forward.
Use Latest Connector Version¶
Use the latest version of the connector from Confluent Hub. This includes every enhancement, bug fix, and performance improvement made to the connector. The changelog records the specific changes in each version.
oracle.service.name for PDB¶
Set oracle.service.name to the container database (CDB) service name when
using a pluggable database (PDB).
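For example, a minimal configuration fragment for a PDB setup might look like the following (the SID, service name, and PDB name below are placeholders; substitute your own):

```json
{
  "oracle.sid": "ORCLCDB",
  "oracle.service.name": "ORCLCDB",
  "oracle.pdb.name": "ORCLPDB1"
}
```

Note that oracle.service.name points at the CDB service, while the PDB is identified separately.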
Full Supplemental Logging¶
The connector’s performance depends on the size of the redo log and the count of redo log records that it needs to process. To minimize the redo log generated, enable full supplemental logging only for the tables of interest and not the entire database.
For a multi-tenant database with the tables of interest in a PDB, enable minimal supplemental logging at the root container (CDB$ROOT) and full supplemental logging for the tables of interest in the PDB.
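As a sketch, the corresponding Oracle statements might look like the following (the schema and table names are placeholders):

```sql
-- In the root container (CDB$ROOT): enable minimal supplemental logging
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;

-- In the PDB: enable full supplemental logging per table of interest
ALTER TABLE myschema.orders ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
```

Scoping full supplemental logging to individual tables keeps redo log volume to a minimum.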
Archive Log Files Retention Period¶
Set the retention time for archived redo log files to be longer than the maximum time the connector is allowed to be out of service. Confluent recommends you set log retention policies to at least 24 hours. If you have a shorter retention policy and your table doesn’t have many activities, the connector may not be able to find a record with the last committed SCN.
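As one illustration, a recovery window of at least one day can be configured with RMAN; this is a sketch, and the right settings depend on your backup and archiving strategy:

```sql
-- RMAN commands (run in an RMAN session, not SQL*Plus)
CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 1 DAYS;
SHOW RETENTION POLICY;
```

Verify that whatever process deletes archived redo logs honors a window longer than the connector's maximum expected downtime.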
Unique Redo Log Topic Per Connector¶
Use a unique redo log topic for each connector. Sharing a redo log topic among connectors can cause unexpected behavior.
Tasks Count¶
You can configure the connector to use as few as one task by setting
tasks.max to 1, or scale to as many tasks as required to capture all table
changes. For maximum parallelism, set the number of tasks to be one plus the
number of tables being captured.
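For example, to capture three tables with maximum parallelism, a configuration fragment (table names hypothetical) could reserve one task for reading the redo log plus one per table:

```json
{
  "table.inclusion.regex": "ORCLPDB1\\.MYSCHEMA\\.(ORDERS|CUSTOMERS|PAYMENTS)",
  "tasks.max": "4"
}
```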
Redo Log Fetch Size¶
Consider increasing the
redo.log.row.fetch.size from the default (10) to
increase throughput. To find an optimal setting, benchmark the workload with
different values. The optimal value depends on various factors, including network
latency, available memory, and driver version.
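For instance, a benchmark run might try a much larger fetch size than the default; the value below is illustrative, not a recommendation:

```json
{
  "redo.log.row.fetch.size": "1000"
}
```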
Connection Pool Settings¶
The connector uses one connection to stream changes from the Oracle database, plus one connection per table during the initial snapshot phase. Once the snapshot is complete, only the task that reads the redo log from the database requires a connection to stream database changes into the redo log topic. Size your connection pool and database session limits accordingly.
Up-to-date Database Stats¶
The connector reads from the
ALL_TABLES view to get the tables accessible to the
current user. Ensure the database statistics are up to date to improve query performance.
This is especially important on Confluent Cloud, where timeouts are set on these queries.
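For example, you could refresh optimizer statistics for the schema that owns the captured tables (the schema name is a placeholder):

```sql
-- Gather up-to-date optimizer statistics for one schema (SQL*Plus syntax)
EXEC DBMS_STATS.GATHER_SCHEMA_STATS(ownname => 'MYSCHEMA');
```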
Numeric Type Mapping¶
You can use the
numeric.mapping configuration property to map numeric types
with known precision and scale to their best-matching primitive type.
The numeric.mapping documentation lists the precision and scale required for each
numeric type to map to a given Connect primitive type.
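For instance, setting the property to best_fit (one of its documented values) maps eligible NUMBER columns to the narrowest matching Connect type instead of the default DECIMAL:

```json
{
  "numeric.mapping": "best_fit"
}
```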
Transactions Buffering¶
Starting from version 2.0.0, the connector buffers uncommitted transactions in
the connector’s memory. If you have long-running transactions and
record.buffer.mode is set to
connector, ensure you have sufficient
memory for your Connect workers. If not, consider breaking these long-running
transactions into smaller transactions to avoid a potential out-of-memory error.
Snapshots in Parallel¶
You can configure the connector to perform snapshots in parallel for large
tables that are partitioned in Oracle. Set
snapshot.by.table.partitions to
true to assign more than one task to a table (if the table is
partitioned). This reduces the overall time required to perform the snapshot by
scaling out the number of tasks.
When running a connector with parallel snapshots enabled, create the
table-specific topics ahead of time.
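A configuration sketch for parallel snapshots of a partitioned table might look like the following (the task count is illustrative):

```json
{
  "snapshot.by.table.partitions": "true",
  "tasks.max": "3"
}
```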
Infrequently Updated Databases¶
Use the heartbeat feature (available from version 2.3.0) in environments where
the connector is configured to capture tables that are infrequently updated so
that the offsets stored in the source offsets topic can move forward. Otherwise,
a task restart could cause the connector to fail with a missing
logfile error if the archived redo log file corresponding to the stored source
offset has been purged from the database.
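Assuming the heartbeat interval property documented for the connector, a fragment such as the following would emit a heartbeat every five minutes so that source offsets keep advancing even when the captured tables are idle:

```json
{
  "heartbeat.interval.ms": "300000"
}
```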