Best Practices for Oracle CDC Source Connector for Confluent Platform¶
This document describes the best practices for configuring the Oracle CDC connector.
- Check Database Prerequisites
- Use Latest Connector Version
- oracle.service.name for PDB
- Full Supplemental Logging
- Archive Log Files Retention Period
- Unique Redo Log Topic Per Connector
- Tasks Count
- Redo Log Fetch Size
- Connection Pool Settings
- Up-to-date Database Stats
- Numeric Type Mapping
- Transactions Buffering
- Snapshots in Parallel
- Infrequently Updated Databases
Check Database Prerequisites¶
Some errors may occur if you do not satisfy all the database prerequisites before configuring the connector. To ensure you meet all requirements, see Oracle Database Prerequisites before moving forward.
Use Latest Connector Version¶
Use the latest version of the connector from Confluent Hub. It includes every enhancement, bug fix, and performance improvement made to the connector. The changelog records the specific changes in each version.
oracle.service.name for PDB¶
Set oracle.service.name to the container database (CDB) service name when using a pluggable database (PDB).
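For example, a connection configuration for a multi-tenant setup might look like the following sketch, where the host, credentials, and CDB/PDB names are placeholders for your environment:

```json
{
  "connector.class": "io.confluent.connect.oracle.cdc.OracleCdcSourceConnector",
  "oracle.server": "database.example.com",
  "oracle.port": "1521",
  "oracle.sid": "ORCLCDB",
  "oracle.pdb.name": "ORCLPDB1",
  "oracle.service.name": "ORCLCDB",
  "oracle.username": "C##MYUSER",
  "oracle.password": "**********"
}
```

In this sketch, oracle.service.name points at the CDB service (ORCLCDB), while oracle.pdb.name identifies the pluggable database that contains the tables of interest.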
Full Supplemental Logging¶
The connector’s performance depends on the size of the redo log and the number of redo log records it needs to process. To minimize the redo log generated, enable full supplemental logging only for the tables of interest, not for the entire database.
For a multi-tenant database with tables of interest in a PDB:
- Up to version 21c, enable minimal supplemental logging for the root container (CDB$ROOT), and enable full supplemental logging for the tables of interest in the PDB.
- For version 23ai, enable minimal supplemental logging either at the root container (CDB$ROOT) or at the PDB level, and enable full supplemental logging for the tables of interest in the PDB.
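As an illustration of the pre-23ai layout described in the first bullet, the database-side statements typically look like the following sketch; the container, schema, and table names are placeholders, and your DBA may prefer different tooling:

```sql
-- In the root container: minimal supplemental logging for the whole CDB
ALTER SESSION SET CONTAINER = CDB$ROOT;
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;

-- In the PDB: full (all-column) supplemental logging for only the tables of interest
ALTER SESSION SET CONTAINER = ORCLPDB1;
ALTER TABLE MYUSER.ORDERS ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
```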
Archive Log Files Retention Period¶
Set the retention time for archived redo log files to be longer than the maximum time the connector is allowed to be out of service. Confluent recommends you set log retention policies to at least 24 hours. If you have a shorter retention policy and your tables see little activity, the connector may not be able to find a record with the last committed SCN.
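To sanity-check the effective retention, one option is to query the oldest archived redo log still available on disk (assuming the connector user, or a DBA, can read V$ARCHIVED_LOG):

```sql
-- Oldest archived redo log that has not yet been deleted
SELECT MIN(first_time) AS oldest_available_log
FROM   v$archived_log
WHERE  deleted = 'NO';
```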
Unique Redo Log Topic Per Connector¶
You must not share redo log topics among connectors; doing so can cause unexpected behavior.
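For example, each connector instance can be given its own topic through redo.log.topic.name; the names below are illustrative:

```json
{
  "name": "orders-cdc",
  "config": {
    "redo.log.topic.name": "orders-cdc-redo-log"
  }
}
```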
Tasks Count¶
You can configure the connector to use as few as one task by setting tasks.max to 1, or scale to as many tasks as required to capture all table changes. For maximum parallelism, set the number of tasks to be one plus the number of tables being captured.
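For instance, a connector capturing three tables could run with tasks.max set to 4, one task for the redo log plus one per table during the snapshot; the table regex below is illustrative:

```json
{
  "table.inclusion.regex": "ORCLPDB1[.]MYUSER[.](ORDERS|CUSTOMERS|PAYMENTS)",
  "tasks.max": "4"
}
```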
Redo Log Fetch Size¶
Consider increasing redo.log.row.fetch.size from the default (10) to increase throughput. To find an optimal setting, benchmark the workload with different values. The optimal value depends on various factors, including network latency, available memory, and driver version.
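A benchmarking run might start from a sketch like the following; the value shown is only a starting point to measure against, not a recommendation:

```json
{
  "redo.log.row.fetch.size": "1000"
}
```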
Connection Pool Settings¶
The connector uses one connection to stream changes from the Oracle database. In addition, it uses one connection per table during the initial snapshot phase. Once the snapshot is complete, only the task that reads the redo log from the database requires a database connection to stream changes into the redo log topic.
Up-to-date Database Stats¶
The connector reads from the ALL_TABLES view to get the tables accessible to the current user. Ensure the database statistics are up to date to improve query performance. This is especially important on Confluent Cloud, where timeouts are set on these queries. In particular, keep the dictionary and fixed object statistics current:
- DBMS_STATS.GATHER_DICTIONARY_STATS
- DBMS_STATS.GATHER_FIXED_OBJECTS_STATS
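For example, from SQL*Plus as a suitably privileged user (gathering these statistics can take time on large dictionaries, so schedule it during a quiet period):

```sql
-- Refresh data dictionary and fixed object statistics
EXEC DBMS_STATS.GATHER_DICTIONARY_STATS;
EXEC DBMS_STATS.GATHER_FIXED_OBJECTS_STATS;
```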
Numeric Type Mapping¶
You can use the numeric.mapping configuration property to map numeric types with known precision and scale to their best-matching primitive type. The numeric.mapping documentation describes the specific precision and scale required for each numeric type to map to a given Connect primitive type.
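For example, assuming best_fit is among the values supported by your connector version, eligible NUMBER columns can be mapped to the closest primitive type:

```json
{
  "numeric.mapping": "best_fit"
}
```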
Transactions Buffering¶
Starting with version 2.0.0, the connector buffers uncommitted transactions in the connector’s memory. If you have long-running transactions and record.buffer.mode is set to connector, ensure your Connect workers have sufficient memory. If they do not, consider breaking these long-running transactions into smaller transactions to avoid a potential out-of-memory issue.
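If you keep in-memory buffering, the relevant setting is a single property; sizing the Connect worker heap (for example through KAFKA_HEAP_OPTS) is then an operational concern outside the connector configuration:

```json
{
  "record.buffer.mode": "connector"
}
```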
Snapshots in Parallel¶
You can configure the connector to perform snapshots in parallel for large tables that are partitioned in Oracle. Set snapshot.by.table.partitions to true to assign more than one task to one table (if the table is partitioned). This reduces the overall time required to perform the snapshot by scaling out the number of tasks.
Note
When running a connector with snapshot.by.table.partitions=true, create table-specific topics ahead of time.
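A sketch of the relevant settings follows; the task count is illustrative and should reflect how many partitions you want snapshotted concurrently:

```json
{
  "snapshot.by.table.partitions": "true",
  "tasks.max": "8"
}
```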
Infrequently Updated Databases¶
Use the heartbeat feature (see the following note) in environments where the connector is configured to capture tables that are infrequently updated, so that the offsets stored in the source offsets topic can move forward. Otherwise, a task restart could cause the connector to fail with an ORA-01291: missing logfile or ORA-01292: no log file error if the archived redo log file corresponding to the stored source offset has been purged from the database.
Note
This feature is available from version 2.3.0 for Oracle 19c and in subsequent supported database versions. For supported versions prior to Oracle 19c, use version 2.7.0.
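Assuming heartbeat.interval.ms is the heartbeat property in your connector version, a sketch might look like the following, with the interval chosen to be comfortably shorter than the archived log retention period:

```json
{
  "heartbeat.interval.ms": "300000"
}
```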