Manage Private Networking for Cluster Linking on Confluent Cloud

The following sections describe supported cluster combinations, commands, configurations, use cases, and walkthroughs for private networking on Confluent Cloud.

Note

Cluster Linking across clouds between privately networked clusters is in Limited Availability.

Supported cluster combinations

Cluster Linking is fundamentally a networking feature: it copies data over the network, so at least one of the clusters involved must have connectivity to the other. The networking parameters of each cluster determine whether the two clusters can be linked, and whether the destination cluster or the source cluster must initiate the connection. By default, the destination cluster initiates the connection. A special mode called “source-initiated links” allows the source cluster to initiate the connection instead.

Consult the following tables to determine whether a given combination of source and destination clusters is supported.

Confluent Cloud source and destination clusters


Cloud providers Amazon Web Services (AWS), Microsoft Azure, and Google Cloud are all supported.

Confluent Cloud combinations

Each entry below lists a source cluster, a destination cluster, whether the combination is supported, and related notes.

Source: Confluent Cloud - Basic, Standard, or Dedicated cluster with secure public endpoints [1]
Destination: Confluent Cloud - Any Enterprise cluster with private networking, or any Dedicated cluster
Supported: Yes
  • Source and destination clusters can be on the same or different cloud providers
  • Source and destination clusters can be in the same or different Confluent Cloud organizations

Source: Confluent Cloud - Public Dedicated cluster on any supported cloud provider
Destination: Confluent Cloud - Public Dedicated cluster on any supported cloud provider
Supported: Yes
  • Source and destination clusters can be on the same or different cloud providers
  • Source and destination clusters can be in the same or different Confluent Cloud organizations

Source: Confluent Cloud - Public Dedicated cluster on any supported cloud provider
Destination: Confluent Cloud - Private Dedicated cluster on any supported cloud provider
Supported: Yes
  • Destination-initiated, source-initiated, or bidirectional links can be used
  • Source and destination clusters can be in the same or different Confluent Cloud organizations

Source: Confluent Cloud - Privately networked Enterprise cluster on any supported cloud provider [1]
Destination: Confluent Cloud - Public Dedicated cluster on any supported cloud provider
Supported: Yes
  • Must use a source-initiated link as described in Private to public Cluster Linking
  • Source and destination clusters can be in the same or different Confluent Cloud organizations

Source: Confluent Cloud - Public Dedicated cluster on any supported cloud provider [1]
Destination: Confluent Cloud - Privately networked Enterprise cluster on any supported cloud provider
Supported: Yes
  • Source and destination clusters can be in the same or different Confluent Cloud organizations

Source: Confluent Cloud - Enterprise cluster with AWS, Azure, or Google Cloud private networking [2] [3] [4] [6]
Destination: Confluent Cloud - Enterprise cluster with AWS, Azure, or Google Cloud private networking
Supported: Yes
  • Clusters must be in the same Confluent Cloud organization

Source: Confluent Cloud - Enterprise cluster with AWS, Azure, or Google Cloud private networking [2] [3] [4] [6]
Destination: Confluent Cloud - Dedicated cluster with AWS, Azure, or Google Cloud private networking
Supported: Yes
  • Clusters must be in the same Confluent Cloud organization

Source: Confluent Cloud - Dedicated cluster with AWS, Azure, or Google Cloud private networking [2] [3] [4] [6]
Destination: Confluent Cloud - Enterprise cluster with AWS, Azure, or Google Cloud private networking
Supported: Yes
  • Clusters must be in the same Confluent Cloud organization

Source: Confluent Cloud - Dedicated cluster with AWS, Azure, or Google Cloud private networking [2] [3] [4] [6]
Destination: Confluent Cloud - Dedicated cluster with AWS, Azure, or Google Cloud private networking
Supported: Yes
  • Clusters must be in the same Confluent Cloud organization

Source: Confluent Cloud - Dedicated cluster with private networking
Destination: Confluent Cloud - Dedicated cluster with public networking
Supported: Yes
  • Must use a source-initiated link as described in Private to public Cluster Linking
  • Cannot be created in the Confluent Cloud Console
  • Source and destination clusters can be in the same or different Confluent Cloud organizations
  • Transit Gateway networking for Dedicated clusters is available only on AWS

Source: Confluent Cloud - Basic, Standard, or Freight cluster
Destination: Confluent Cloud - Basic, Standard, or Freight cluster
Supported: No
  • Cluster Linking between Basic, Standard, and Freight clusters is not currently supported

Source: Confluent Cloud - (Legacy) Dedicated cluster with AWS Transit Gateway networking [5]
Destination: Confluent Cloud - (Legacy) Dedicated cluster with AWS Transit Gateway networking
Supported: Yes
  • Requires proper Transit Gateway configuration
  • Neither cluster can use the CIDR 198.18.0.0/15
  • Clusters must be in the same Confluent Cloud organization
  • Transit Gateway networking for Dedicated clusters is available only on AWS


Confluent Platform and Confluent Cloud

A Confluent Platform cluster can be on either side of the link (source or destination). The supported Confluent Cloud cluster types are Dedicated and Enterprise. Cloud providers Amazon Web Services (AWS), Microsoft Azure, and Google Cloud are all supported.

Confluent Platform and Confluent Cloud combinations

Source: Confluent Platform 7.1.0 or later
Destination: Confluent Cloud - Any Enterprise cluster with private networking, or any Dedicated cluster
Supported: Yes
  • Must use a source-initiated link
  • Source Confluent Platform cluster must have connectivity to the destination cluster
  • Brokers must be Confluent Server

Source: Confluent Platform 5.4 or later with public endpoints on all brokers
Destination: Confluent Cloud - Any Enterprise cluster with private networking, or any Dedicated cluster
Supported: Yes
  • For AWS only. If egress static IP addresses are used, a firewall in front of Confluent Platform can filter on those IP addresses
  • Cluster link must use SASL/PLAIN, SASL/SCRAM, and/or mTLS

Source: Confluent Platform 5.4 or later without public endpoints
Destination: Confluent Cloud - A privately networked Enterprise cluster, or a Dedicated cluster with VPC Peering, VNet Peering, or Transit Gateway
Supported: Yes
  • Destination Confluent Cloud cluster must have connectivity to the source Confluent Platform cluster
  • Cluster link must use SASL/PLAIN, SASL/SCRAM, and/or mTLS
  • Your cloud networking must be configured to allow the Confluent Cloud cluster to reach the brokers on the source cluster

Source: Confluent Cloud - A Basic or Standard cluster, or a Dedicated cluster with secure public endpoints
Destination: Confluent Platform 7.0.0 or later
Supported: Yes

Source: Confluent Cloud - A cluster with private networking
Destination: Confluent Platform 7.0.0 or later
Supported: Yes
  • Destination Confluent Platform cluster must have connectivity to the source Confluent Cloud cluster

For details on combinations involving only Confluent Platform, see Supported platform and tools compatibilities.

Confluent Cloud and Kafka

The following combinations are supported for private networking with Confluent Cloud and Apache Kafka®.

Source: Kafka 2.4 or later with public endpoints on all brokers
Destination: Confluent Cloud - Any Dedicated or Enterprise cluster
Supported: Yes
  • For AWS only. If egress static IP addresses are used, a firewall in front of the Kafka cluster can filter on those IP addresses
  • Cluster link must use SASL/PLAIN, SASL/SCRAM, and/or mTLS

Source: Kafka 2.4 or later without public endpoints
Destination: Confluent Cloud - Any Dedicated cluster with VPC Peering, VNet Peering, or Transit Gateway
Supported: Yes
  • Destination Confluent Cloud cluster must have connectivity to the source Kafka cluster
  • Cluster link must use SASL/PLAIN, SASL/SCRAM, and/or mTLS
  • Your cloud networking must be configured to allow the Confluent Cloud cluster to reach the brokers on the source cluster

Limitations and Summary

The matrices of supported and unsupported cluster combinations above are detailed. At a high level, keep the following general rules in mind.

The following cluster combinations are supported for cross-cloud scenarios; that is, clusters can be in different cloud providers:

  • Any public cluster can be linked with a privately networked cluster.

  • “Public to Public” using destination-initiated, source-initiated, or bidirectional links is supported across organizations and cloud providers.

  • “Public to Private” using destination-initiated, source-initiated, or bidirectional links is supported across organizations and cloud providers.

  • “Private to Public” using source-initiated or bidirectional links is supported across organizations and cloud providers. (This is not supported with destination-initiated links.)

  • “Private to Private” clusters across cloud providers Azure, AWS, and Google Cloud are supported. It is not supported across organizations.

Conversely, the following combinations are not supported for cross-cloud scenarios:

  • “Private to Public” with destination-initiated links is not supported across organizations or cloud providers. This is only supported with source-initiated links.

Diagrams of supported combinations for private networking

Confluent Cloud to Confluent Cloud

../../_images/cluster-link-private-net-cloud-to-cloud.png

Confluent Cloud to Confluent Platform/Kafka

../../_images/cluster-link-private-net-cloud-to-cp-kafka.png

Supported regions

Supported AWS Regions

The following AWS regions are supported in the current release.

The Americas and Canada

  • ca-central-1: Canada (Central)
  • ca-west-1: Canada (West)
  • us-west-2: US West (Oregon)
  • us-east-1: US East (N. Virginia)
  • us-east-2: US East (Ohio)
  • sa-east-1: South America (São Paulo)

Europe

  • eu-north-1: Europe (Stockholm)
  • eu-south-1: Europe (Milan)
  • eu-south-2: Europe (Spain)
  • eu-central-1: Europe (Frankfurt)
  • eu-central-2: Europe (Zurich)
  • eu-west-1: Europe (Ireland)
  • eu-west-2: Europe (London)
  • eu-west-3: Europe (Paris)

Middle East

  • me-central-1: Middle East (UAE)
  • me-south-1: Middle East (Bahrain)
  • il-central-1: Israel (Tel Aviv)

Africa

  • af-south-1: Africa (Cape Town)

Asia Pacific

  • ap-south-1: Asia Pacific (Mumbai)
  • ap-south-2: Asia Pacific (Hyderabad)
  • ap-east-1: Asia Pacific (Hong Kong)
  • ap-northeast-1: Asia Pacific (Tokyo)
  • ap-northeast-2: Asia Pacific (Seoul)
  • ap-northeast-3: Asia Pacific (Osaka)
  • ap-southeast-1: Asia Pacific (Singapore)
  • ap-southeast-2: Asia Pacific (Sydney)
  • ap-southeast-3: Asia Pacific (Jakarta)
  • ap-southeast-4: Asia Pacific (Melbourne)

Supported Azure Regions

The following Azure regions are supported in the current release.

The Americas and Canada

  • canadacentral: Canada (Central)
  • eastus: United States (Virginia)
  • eastus2: United States (Virginia)
  • centralus: United States (Iowa)
  • southcentralus: United States (Texas)
  • westus2: United States (Washington)
  • westus3: United States (Phoenix)
  • mexicocentral: Mexico (Central)
  • brazilsouth: Brazil (São Paulo State)

Europe and Scandinavia

  • uksouth: United Kingdom (London)
  • germanywestcentral: Germany (Frankfurt)
  • francecentral: France (Paris)
  • italynorth: Italy (Milan)
  • spaincentral: Spain (Madrid)
  • westeurope: Europe (Netherlands)
  • northeurope: Europe (Ireland)
  • switzerlandnorth: Switzerland (Zurich)
  • norwayeast: Norway (Oslo)
  • swedencentral: Sweden (Gävle)

Middle East, Asia Pacific, and Africa

  • uaenorth: United Arab Emirates (Dubai)
  • japaneast: Japan (Tokyo)
  • southeastasia: Asia Pacific (Singapore)
  • eastasia: Asia Pacific (Hong Kong)
  • southafricanorth: South Africa (Johannesburg)
  • centralindia: India (Pune)
  • australiaeast: Australia (New South Wales)

Supported Google Cloud Regions

The following Google Cloud regions are currently supported.

The Americas and Canada

  • us-central1: Iowa (United States)
  • us-east1: South Carolina (United States)
  • us-east4: Northern Virginia (United States)
  • us-west1: Oregon (United States)
  • us-west2: Los Angeles (United States)
  • us-west3: Salt Lake City, Utah (United States)
  • us-west4: Las Vegas, Nevada (United States)
  • us-south1: Dallas, Texas (United States)
  • northamerica-northeast1: Montreal (Canada)
  • northamerica-northeast2: Toronto (Canada)
  • southamerica-east1: São Paulo (Brazil)
  • southamerica-west1: Santiago (Chile)

Europe and Scandinavia

  • europe-west1: Belgium (Europe)
  • europe-west2: London (Europe)
  • europe-west3: Frankfurt (Germany)
  • europe-west4: Netherlands (Europe)
  • europe-west6: Zurich (Switzerland)
  • europe-west8: Milan (Italy)
  • europe-west9: Paris (France)
  • europe-west12: Turin (Italy)
  • europe-north1: Finland (Finland)
  • europe-central2: Warsaw (Poland)
  • europe-southwest1: Madrid (Spain)

Middle East and Asia Pacific

  • me-central1: Doha (Qatar)
  • me-central2: Dammam (Saudi Arabia)
  • me-west1: Tel Aviv (Israel)
  • asia-east1: Taiwan (Asia Pacific)
  • asia-east2: Hong Kong (Asia Pacific)
  • asia-northeast1: Tokyo (Japan)
  • asia-northeast2: Osaka (Japan)
  • asia-northeast3: Seoul (South Korea)
  • asia-south1: Mumbai (India)
  • asia-south2: Delhi (India)
  • asia-southeast1: Singapore (Asia Pacific)
  • asia-southeast2: Jakarta (Indonesia)
  • australia-southeast1: Sydney (Australia)
  • australia-southeast2: Melbourne (Australia)

Private to public Cluster Linking

Private to public Cluster Linking has some specific requirements and configurations. These are described below, followed by a walkthrough.

Requirements

These instructions apply to cluster links which fall under this category:

Source Cluster

Dedicated Confluent Cloud cluster with private networking. Any cloud provider, region, and networking type (for example, Private Link, VPC Peering, Transit Gateway, or Private Service Connect)

Destination Cluster

Dedicated Confluent Cloud cluster with secure public endpoints.

The clusters can be in the same or different Confluent Cloud organizations.

Such a cluster link requires using a “source-initiated link”, which differs from other cluster links in these ways:

  • A cluster link is created on the destination (as usual) but with certain configurations (described below).

  • After the cluster link is created on the destination cluster, a cluster link object must be created on the source cluster, too. It must have certain configurations (described below).

  • This type of cluster link cannot be created on the Confluent Cloud Console. It must be created using the cluster REST APIs, the Confluent CLI, Terraform, or Confluent for Kubernetes.

  • Deleting either cluster link object (source or destination) will stop data from being replicated between the clusters.

  • Both cluster link objects must be deleted in order to stop billing charges. If one is deleted but the other remains, hourly billing charges may continue to accrue.

  • The cluster link’s service account must have a CLUSTER:ALTER ACL on the destination cluster. Alternatively, the service account can have the CloudClusterAdmin RBAC role on the destination cluster.
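
As a sketch of that last requirement (the service account ID and cluster ID are placeholders, and flag names can differ slightly across Confluent CLI versions), the ACL could be granted with:

    # Grant CLUSTER:ALTER on the destination (public) cluster to the cluster
    # link's service account. Alternatively, assign it the CloudClusterAdmin role.
    confluent kafka acl create --allow \
      --service-account sa-xxxxxx \
      --operations alter \
      --cluster-scope \
      --cluster <destination-cluster-id>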

Configurations

Destination cluster

The cluster link on the Destination cluster needs these configurations:

  • connection.mode=INBOUND

  • No bootstrap server; instead, the source cluster’s cluster link object will get this.

  • No security configuration or authentication credentials; instead, the source cluster’s cluster link object will get these.

  • Any other optional settings for this cluster link, such as consumer offset sync, ACL sync, and a prefix.
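
For illustration, here is a minimal sketch of the destination-side half of a source-initiated link (the link name, cluster IDs, file name, and optional settings are placeholders; check the CLI reference for the exact flags in your CLI version):

    # dest-link.properties -- destination half of a source-initiated link.
    # No bootstrap server and no credentials here; the source side supplies those.
    connection.mode=INBOUND
    # Optional settings, for example:
    # consumer.offset.sync.enable=true

    # Create the cluster link on the destination (public) cluster:
    confluent kafka link create my-private-to-public-link \
      --cluster <destination-cluster-id> \
      --source-cluster <source-cluster-id> \
      --config dest-link.properties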

Source cluster

The cluster link on the Source cluster needs these configurations:

  • link.mode=SOURCE

  • connection.mode=OUTBOUND

  • Bootstrap server set to the Destination cluster’s bootstrap server

  • Security credentials to authenticate to the Destination cluster (either an API key or OAuth). The service account or user that owns these credentials must have the CLUSTER:ALTER ACL on the destination (public) cluster, or alternatively the CloudClusterAdmin RBAC role.

  • Security credentials to authenticate to the Source cluster (either an API key or OAuth).
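
A corresponding minimal sketch of the source-side half (placeholders throughout; the security settings are abbreviated, and flag names may differ by CLI version):

    # source-link.properties -- source half of a source-initiated link.
    link.mode=SOURCE
    connection.mode=OUTBOUND
    bootstrap.servers=<destination-cluster-bootstrap-server>
    # ...plus security settings (API key or OAuth) for authenticating to the
    # destination cluster and to the source cluster, per the list above.

    # Create the cluster link object on the source (private) cluster:
    confluent kafka link create my-private-to-public-link \
      --cluster <source-cluster-id> \
      --destination-cluster <destination-cluster-id> \
      --config source-link.properties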


Legacy workflows using network linking

Although less efficient than the newer solutions described in How to use cluster links with private networking, the following workflows using network linking are still supported. If you already have these workflows in place, you can continue to use them.

(Legacy) Cluster Linking between AWS Transit Gateway attached Confluent Cloud clusters

Warning

This legacy workflow requires you to use AWS Transit Gateway to create a cluster link with private networking. Support for private linking with AWS and Azure now enables you to bypass this setup. The currently recommended workflow for both AWS and Azure is described in Cluster Linking between Confluent Cloud clusters using Azure, AWS, or Google Cloud private networking.

Confluent provides Cluster Linking between AWS Transit Gateway attached Confluent Cloud clusters as a fully managed solution for geo-replication, multi-region architectures, high availability and disaster recovery, data sharing, and aggregation.

This section describes how to use Cluster Linking to sync data between two private Confluent Cloud clusters in different AWS regions that are each attached to an AWS Transit Gateway. You can provision new Confluent Cloud clusters or use existing AWS Transit Gateway attached or AWS virtual private cloud (VPC) Peered Confluent Cloud clusters.

../../_images/cluster-link-private-net-aws-gate-arch.png

Limitations

  • This is limited to Confluent Cloud clusters that use AWS Transit Gateway as their networking type.

    Tip

    AWS VPC Peered clusters can be seamlessly converted to AWS Transit Gateway clusters with a Confluent support ticket. The Confluent Cloud clusters can be in the same or in different Confluent Cloud Environments or Organizations. The Transit Gateways can be in the same or different AWS Accounts. Connecting clusters from different organizations is useful for data sharing between organizations.

  • The clusters must be provisioned with different CIDRs. The address ranges cannot overlap.

  • The CIDRs for both clusters must be within one of the following private address ranges:

    • 10.0.0.0/8

    • 100.64.0.0/10

    • 172.16.0.0/12

    • 192.168.0.0/16

  • Neither cluster’s CIDR can be 198.18.0.0/15, even though it is otherwise a valid Confluent Cloud CIDR.

  • This configuration does not support combinations with other networking types, such as PrivateLink, or with other cloud providers, such as Google Cloud or Microsoft Azure.

Setup

Step 1: Create the networks and clusters
  1. Determine:

    • the two regions to use

    • the two non-overlapping /16 CIDRs for the two Confluent Cloud clusters to use

    • the AWS account(s) to use

    This decision will depend on your architecture and business requirements.

    Note that:

    • You can use only one region, but most use cases will involve two (or more) different regions.

    • It is possible for the Cluster Linking to be between different AWS accounts or Confluent Cloud accounts, but most use cases will involve one AWS account and one Confluent Cloud account.

  2. Provision two AWS Transit Gateways, one in each region, and create two resource shares, one for each Transit Gateway, as described in Use AWS Transit Gateway on Confluent Cloud under Networking.

    ../../_images/cluster-link-private-net-aws-gateways.png
  3. Provision a new Transit Gateway enabled Confluent Cloud network, as described in Use AWS Transit Gateway on Confluent Cloud.

    You will need to specify the network ID and the ARN of the resource share containing that region’s Transit Gateway. It is possible to seamlessly convert an existing AWS VPC Peered Confluent Cloud cluster in that region to a Transit Gateway attached cluster.

    ../../_images/cluster-link-priv-net-provision-aws-gate-dedicated-cluster.png
  4. Connect a “Command VPC” from which to issue commands, and create topics on the Confluent Cloud cluster in “Region 1”.

    ../../_images/cluster-link-private-net-aws-gate-command-VPC.png
    1. Create a new VPC in Region 1, from which to run commands against your Confluent Cloud cluster. For purposes of this example, call this the “Command VPC”.

    2. Attach the Command VPC to your Transit Gateway. (Make sure you have a route in the Transit Gateway’s route table that points to the Command VPC for the Command VPC’s CIDR range.)

      ../../_images/cluster-link-private-net-aws-gate-command-VPC-attach-to-transit-gateway.png
    3. In the Command VPC’s route table, create the following routes if they do not already exist:

      Destination: Command VPC CIDR range
      Target: local

      Destination: Confluent Cloud CIDR range in this region
      Target: Transit Gateway in this region

      Destination: Confluent Cloud CIDR range in the other region
      Target: Transit Gateway in this region

      Destination: 0.0.0.0/0
      Target: An Internet Gateway (create one if needed) [7]

  5. Create and launch an EC2 instance in that VPC.

  6. SSH into the EC2 instance. (If needed for this step, create an Elastic IP and assign it to this EC2 instance.)

  7. Install the Confluent CLI.

  8. Log on to the Confluent CLI with confluent login.

  9. Select your Confluent environment with confluent environment use <environment-ID>. (You can list your environments with confluent environment list.)

  10. Select your Confluent cluster in this region with confluent kafka cluster use <cluster-id>. (You can list your clusters with confluent kafka cluster list.)

  11. List the topics in your Confluent Cloud cluster with confluent kafka topic list. If this command fails, your Command EC2 instance may not be able to reach your Confluent Cloud cluster. Your networking may not be correctly set up. Make sure you followed the steps above. See the Troubleshoot section if needed.

  12. Create a new topic with confluent kafka topic create my-topic --partitions 1. If you have more Kafka clients, you can spin them up in VPCs attached to the Transit Gateway, and produce to and consume from the cluster.
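
For reference, steps 8 through 12 can be run as the following sequence (the environment ID, cluster ID, and topic name are placeholders):

    confluent login
    confluent environment list
    confluent environment use <environment-id>
    confluent kafka cluster list
    confluent kafka cluster use <cluster-id>
    confluent kafka topic list
    confluent kafka topic create my-topic --partitions 1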

Cluster Linking between two Confluent Cloud clusters in the same region

If you want to create a cluster link between two AWS Transit Gateway Confluent Cloud clusters in the same AWS region, this is a special case in which the requirements may differ depending on the networking setup. Here are some scenarios and factors to consider:

  • If both Confluent Cloud clusters are in the same Confluent Cloud network, the additional configuration described in previous sections is not necessary. Only one Transit Gateway is required, without any Transit Gateway peering or changes to its route table.

  • If the Confluent Cloud clusters are in different Confluent Cloud networks, but both are attached to the same Transit Gateway, then the only requirement is that they use the same Transit Gateway Route Table. The routes from each CIDR to each Confluent VPC must be in the same route table. No Transit Gateway Peering is required.

  • If the Confluent Cloud clusters are attached to different Transit Gateways, then the configuration described above is required. The steps for two Transit Gateways in the same region are the same as for two Transit Gateways in different regions.

Management responsibilities

Every cluster link runs as a continuous service managed by Confluent Cloud. Keeping a cluster link running is a shared responsibility between Confluent and its customers:

  • Confluent is responsible for the Cluster Linking service.

  • The customer is responsible for the network that facilitates the connection between the two clusters.

To operate a cluster link between two AWS Transit Gateway Confluent Cloud clusters, Confluent requires that the AWS networking be configured as laid out in the sections on this page.

Troubleshoot Transit Gateway Cluster Linking


For more troubleshooting help, see Troubleshoot Cluster Linking on Confluent Cloud.

My CLI commands / API calls are failing

CLI commands often fail if you are not running them from a location that has connectivity to your Confluent Cloud cluster. Verify that the machine running the CLI (for example, the Command EC2 instance) has connectivity to the Confluent Cloud cluster.

  • You can test connectivity to the cluster with:

    confluent kafka cluster describe <destination-cluster-id>
    
  • Get the URL from the REST Endpoint field, and test reaching it with telnet:

    telnet <url> 443
    

    Tip

    Note the space in front of 443 instead of a colon (:).

  • Example success showing connectivity:

    telnet pkc-z3000.us-west-2.aws.confluent.cloud 443
    Trying 10.18.72.172...
    Connected to pkc-z3000.us-west-2.aws.confluent.cloud.
    Escape character is '^]'.
    
  • Example failure (the command will hang indefinitely):

    telnet pkc-z3000.us-west-2.aws.confluent.cloud 443
    ...
    

If this process ends in failure, you are not running the CLI commands from a location that has connectivity to your Confluent Cloud cluster. Work with your networking team to ensure your instance is attached to your Transit Gateway and has a route through it to the Confluent Cloud cluster.
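
If telnet is not installed on the instance, a minimal alternative check is a bash /dev/tcp probe (a sketch, assuming bash and the coreutils timeout command are available; replace the hostname with your cluster's REST endpoint):

    timeout 5 bash -c 'cat < /dev/null > /dev/tcp/pkc-z3000.us-west-2.aws.confluent.cloud/443' \
      && echo "reachable" || echo "unreachable"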

Troubleshoot Transit Gateway connectivity

Ensure the Transit Gateways, Peering, and Route Tables are properly configured.

Assuming a transit gateway in Region A and one in Region B, with Confluent Cloud CIDRs CIDR-A and CIDR-B:

Transit Gateway Region A route table

  Destination: CIDR-A
  Target: Confluent VPC A (Confluent is responsible for setting this)

  Destination: CIDR-B
  Target: Transit Gateway B via the peering connection

Transit Gateway Region B route table

  Destination: CIDR-A
  Target: Transit Gateway A via the peering connection

  Destination: CIDR-B
  Target: Confluent VPC B (Confluent is responsible for setting this)

Note that each transit gateway has routes set for both CIDRs. A missing route is a common cause of connectivity failures: if the transit gateways are not both set up with routes for both CIDRs, the clusters will not have connectivity to each other. The path between the two Confluent Cloud clusters and the transit gateways must allow traffic over port 9092.
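
To check which routes a Transit Gateway route table actually contains, you can query it with the AWS CLI (a sketch; the route table ID and region are placeholders):

    aws ec2 search-transit-gateway-routes \
      --transit-gateway-route-table-id tgw-rtb-xxxxxxxxxxxxxxxxx \
      --filters "Name=state,Values=active" \
      --region <region-a>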

You can test if the cross-region connectivity works like this:

  1. Attach a Test VPC in Region A to A’s transit gateway.

  2. Launch an EC2 instance in the Test VPC (or you can use the “Command VPC” if you set one up per the previous steps).

  3. Route the EC2 instance to the transit gateway in Region A:

    1. Test VPC route table:

      Destination: CIDR-Test-VPC
      Target: local

      Destination: CIDR-A
      Target: Transit gateway A

      Destination: CIDR-B
      Target: Transit gateway A (also)

    2. Transit gateway Region A route table: add a route for CIDR-Test-VPC with the Test VPC attachment as the target.

    3. Transit gateway Region B route table: add a route for CIDR-Test-VPC with Transit Gateway A via the peering connection as the target (needed for bidirectional connectivity).

  4. SSH into the EC2 instance. (If needed, assign an Elastic IP to the EC2 instance.)

  5. Check connectivity to Cluster A (its bootstrap server resolves to addresses in CIDR-A):

    telnet <Cluster-A-bootstrap-server> 9092
    

    If successful, you should see similar output:

    Trying 10.18.72.172...
    Connected to pkc-z3000.us-west-2.aws.confluent.cloud.
    Escape character is '^]'.
    
  6. Check connectivity across the peering connection to Cluster B (its bootstrap server resolves to addresses in CIDR-B):

    telnet <Cluster-B-bootstrap-server> 9092
    

    If successful, you should see similar output:

    Trying 10.18.72.172...
    Connected to pkc-z3000.us-west-2.aws.confluent.cloud.
    Escape character is '^]'.
    

Confluent Cloud billing considerations

There are cost differences associated with private vs. public networking. These are detailed under Cluster Linking in the Billing documentation. Examples are provided there for public networking.

Here is further detail specifically related to the private-public-private pattern outlined for Cluster Linking.

Private-Public-Private pattern cost considerations and security

When adding the jump cluster architecture to an existing Origin and Target cluster, new Confluent Cloud cost will be introduced by:

  • The Jump cluster (an hourly cost)

  • The two cluster links: one from origin to Jump, and one from Jump to Target (an hourly cost)

  • The data throughput: ClusterLinkRead and ClusterLinkWrite are charged once on each cluster link, so the replicated data is billed twice (a per-GB cost).
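
As a rough illustration using symbolic quantities rather than actual rates: if T GB per month flow from Origin to Target through the Jump cluster, the per-GB throughput charge applies to roughly 2T GB in total, because the data is read and written once on the Origin-to-Jump link and once again on the Jump-to-Target link, in addition to the two hourly link charges and the Jump cluster's hourly charge.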

How to keep costs as low as possible and maintain security

Consider these suggestions and recommendations for keeping costs as low as possible while still achieving the desired level of security on the cluster links.

Use a single zone for the jump cluster

Using a “single zone” cluster for the Jump cluster will reduce the hourly cost of the jump cluster, if the replication use case can tolerate the “single zone” service level agreement (SLA). Here are points to consider for this strategy:

  • “Multi-zone” clusters are always recommended for clusters that run production traffic (which the Origin and Target clusters likely do), as they come with a higher SLA. However, the “single zone” Jump cluster’s SLA may be sufficient if the use case is data sharing, data aggregation, or even disaster recovery.

  • If there is a disruption in the availability zone used by a “single zone” jump cluster, it will only impact the replication flow from Origin to Target. Producers and consumers on both Origin and Target clusters will be unaffected. When the disruption ends, if the Jump cluster returns to healthy status, the cluster links will automatically resume and catch the Target cluster up with any data that it missed.

  • For a Data Sharing or Data Aggregation architecture, replication is used to send a copy of data from the Origin to the Target, so the Target’s consumers can read the data. An outage of the availability zone of a single zone Jump cluster would stop the cluster links from replicating new data to the Target cluster. During this time, the Target’s consumers can still read any data that was already replicated to the Target. When the outage is over, if the Jump cluster recovers in good health, the Target’s replicated topics will automatically catch up and the consumers will be able to read data. Topics on the Target cluster that do not use replication will be unaffected. Therefore, the only effect of an availability zone outage on a single zone Jump cluster is a delay in some replicated data. A multi-zone Jump cluster would avoid this case, but for an added cost.

  • For a Disaster Recovery (DR) architecture, replication is used to keep topics on the Target cluster up-to-date and ready for failover, should an outage hit the Origin cluster. If the availability zone of a single zone Jump cluster has an outage, this would temporarily halt replication, thereby increasing the RPO at the moment. Hypothetically, if some time after this outage began and before it resolved, the Origin cluster’s region (a different cloud region from the Jump cluster) also experienced an outage, then its applications would fail over to the Target cluster without access to recent data produced since the Jump cluster’s zone outage. The net effect is that, in this corner case, a multi-zone Jump cluster would yield a lower RPO than a single zone jump cluster, for an added cost. However, as of this writing, there are no documented cases of a cloud provider experiencing a regional outage and an availability zone outage in a different region at the same time. Because this hypothetical corner case is exceedingly rare, most enterprise companies require disaster recovery for only a single cloud region outage at a time, and therefore would be served by a single zone Jump cluster.

Use bidirectional cluster links if data needs to flow in both directions

If data needs to flow in both directions (Origin to Target, and Target to Origin), using a bidirectional link is most cost-effective.

Using the “bidirectional” cluster link mode allows topics to be mirrored from Origin to Target, and also from Target back to Origin, without any additional cluster links. (Bidirectional mode is always recommended when using Cluster Linking for Disaster Recovery.)

A single Jump cluster can be used for both directions, as long as prefixing is used on at least one of the cluster links.
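
A minimal sketch of such a link configuration (the prefix value is a placeholder, and the full set of configs depends on your setup):

    # One bidirectional cluster link through the Jump cluster, with a prefix so
    # that mirror topics created in each direction do not collide.
    link.mode=BIDIRECTIONAL
    cluster.link.prefix=<origin-prefix>.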
