Workloads Scheduling with Confluent for Kubernetes

Overview

To get optimal performance from Confluent components, you can control how the component pods are scheduled on Kubernetes nodes. For example, you can keep pods off nodes that run other resource-intensive applications, schedule pods on dedicated nodes, or schedule pods on the nodes with the most suitable hardware.

To keep components from competing for the same resources, such as storage or network, and degrading the performance of every workload involved, schedule component pods so that they do not share nodes with other critical workloads.

Confluent for Kubernetes supports the following features for scheduling Confluent component pods on Kubernetes nodes:

Taints and tolerations
Affinity and anti-affinity
Pod topology spread constraints
One replica per node

Taints and tolerations

Taints and tolerations work together to ensure that pods are not scheduled onto specific nodes in Kubernetes.

Taints are applied to nodes, and tolerations are applied to pods. A pod that does not declare a toleration matching a node's taint cannot be scheduled onto that node.

Affinity and anti-affinity

Taints and tolerations keep pods off particular nodes, but they do not guarantee that pods are scheduled onto the nodes where you intend them to run.

To place pods on the nodes you want, use another Kubernetes feature: affinity and anti-affinity. Confluent for Kubernetes supports the following types of Kubernetes affinity features:

Node affinity
Pod affinity and anti-affinity

Node affinity

The node affinity feature allows you to schedule workloads (pods) onto specific nodes, for example, to optimize for various resources, such as storage, CPU, and networking.

With Confluent for Kubernetes, you can create node affinity rules to specify which Kubernetes nodes your Confluent pods are eligible to be scheduled on. For example, if you have special hardware that you want to use to run one or more Confluent pods, use node affinity to pin those pods to the special hardware nodes.

Pod affinity and pod anti-affinity

The pod affinity feature allows you to specify that a pod should be scheduled on the same node as another pod.

The pod anti-affinity feature allows you to specify that a set of pods should be scheduled away from one another on different nodes, availability zones, or several other potential topology domains. For example, you can use pod anti-affinity to ensure that the Kafka brokers do not share nodes with other resource-intensive or critical workloads.

The affinity and anti-affinity rules apply at the namespace level.

Pod topology spread constraints

Using pod topology spread constraints, you can control the distribution of your pods across nodes, zones, regions, or other user-defined topology domains, achieving high availability and efficient cluster resource utilization.

You first label nodes to provide topology information, such as regions, zones, and nodes. Then in Confluent component CRs, you define spread constraints, such as which pods to group together, which topology domains they are spread among, and the acceptable skew.

Pod topology constraints only apply to pods within the same namespace.

One replica per node

You can configure CFK to run only one replica of a component on each Kubernetes node.

This scheduling method guarantees that Confluent Platform components do not compete for system resources with other applications running on the same node.

The rule applies at the namespace level and only to the replicas from the same cluster.

Confluent recommends that you enable the one replica per node setting for Kafka and ZooKeeper.

For other Confluent Platform components, you can enable the one replica per node setting for availability, but the setting is not required for reliability.

Configure taint and toleration for Confluent components

Configure taint

Taint the nodes using the kubectl taint nodes command as described in Taint and Toleration.

Configure toleration

When a pod is configured with a toleration, the pod “tolerates” the taint that matches the triple <key,value,effect> using the matching operator <operator>.

When a pod tolerates a node taint, the pod can be scheduled to the node.

Configure the tolerations property of the Confluent component CRs as follows, and create or update the resource.

spec:
  podTemplate:
    tolerations:
    - effect:                --- [1]
      key:                   --- [2]
      operator:              --- [3]
      value:                 --- [4]
      tolerationSeconds:     --- [5]

[1] The taint effect to match. The effect indicates what happens to pods that do not tolerate the taint. Allowed values are NoSchedule, PreferNoSchedule, NoExecute.
When set to an empty value, this toleration matches all taint effects.
[2] The taint key that the toleration applies to.
If the key is empty, operator must be set to Exists. This combination means to match all values and all keys.
[3] The match operator to compare key to the value. Allowed operators are Exists and Equal, and Equal is the default.
Exists is equivalent to wildcard for value, so that a pod can tolerate all taints of the key.
[4] The taint value the toleration matches to. If the operator is Exists, the value must be empty.
[5] The period of time the toleration tolerates the taint. Only applies when effect is set to NoExecute. Otherwise, this field is ignored.
By default, it is not set, which means to tolerate the taint forever (do not evict).
Zero and negative values will be treated as 0 (evict immediately).

For example, for the following node taint:

kubectl taint nodes node1 myKey=myValue:NoSchedule

The following CR snippet matches the taint created above, so the pod can be scheduled onto the node:

spec:
  podTemplate:
    tolerations:
    - key: "myKey"
      operator: "Equal"
      value: "myValue"
      effect: "NoSchedule"

For more information on the fields, see Taints and tolerations.
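
As a further illustration of the operator and tolerationSeconds fields, the following sketch tolerates the node.kubernetes.io/unreachable taint, which Kubernetes adds to a node that becomes unreachable. The 300-second window is an arbitrary value chosen for this example, not a CFK default.

```yaml
spec:
  podTemplate:
    tolerations:
    - key: "node.kubernetes.io/unreachable"
      operator: "Exists"        # matches any value of the key, so value is left unset
      effect: "NoExecute"       # tolerationSeconds only applies with this effect
      tolerationSeconds: 300    # illustrative value: stay bound for 300 seconds, then be evicted
```

Because the operator is Exists, the value field is omitted, as required by the field description above.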

Configure node affinity for Confluent components

This section describes how to configure node affinity scheduling rules for Confluent component pods.

Using node affinity, you specify which nodes Confluent component pods can run on based on labels on the nodes (matchExpressions) or on node fields (matchFields). Each match requirement is a <key, operator, value> triple.

Two types of node affinity rules are supported:

requiredDuringSchedulingIgnoredDuringExecution
A pod must satisfy these rules to be scheduled onto a node.
preferredDuringSchedulingIgnoredDuringExecution
The scheduler prefers to schedule the pods to the nodes that satisfy the specified affinity expression, but it may choose a node that violates one or more of the expressions.
The node with the greatest sum of weights is most preferred.
An empty preferred scheduling term matches all objects with implicit weight 0.
A null preferred scheduling term matches no objects.

Configure node affinity with requiredDuringSchedulingIgnoredDuringExecution

The pod can be scheduled to a node if one of the nodeSelectorTerms matches.

A nodeSelectorTerms item is a match if all matchExpressions match.

To create the node affinity rules to match node labels, configure the nodeAffinity property of the Confluent components CRs as follows and create or update the resource. Set either matchExpressions or matchFields.

spec:
  podTemplate:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:     --- [1]
            - matchExpressions:  --- [2]
                - key:           --- [3]
                  operator:      --- [4]
                  values:        --- [5]
              matchFields:       --- [6]
                - key:           --- [7]
                  operator:      --- [8]
                  values:        --- [9]

[1] Required. A list of node selector terms.
[2] A list of node selector requirements by node’s labels.
A node selector requirement is a selector that contains values, a key, and an operator that relates the key and values.
[3] Required. The label key that the selector applies to.
[4] Required. Represents how the key is related to the values. Valid operators are In, NotIn, Exists, DoesNotExist, Gt, and Lt.
[5] An array of string values.
If the operator is In or NotIn, the values array must be non-empty.
If the operator is Exists or DoesNotExist, the values array must be empty.
If the operator is Gt or Lt, the values array must have a single element, which will be interpreted as an integer.
[6] A list of node selector requirements by node’s fields.
A node selector requirement is a selector that contains values, a key, and an operator that relates the key and values.
[7] Required. The field key that the selector applies to.
[8] Required. Represents how the values are related to the key. Valid operators are In, NotIn, Exists, DoesNotExist, Gt, and Lt.
[9] An array of string values.
If the operator is In or NotIn, the values array must be non-empty.
If the operator is Exists or DoesNotExist, the values array must be empty.
If the operator is Gt or Lt, the values array must have a single element, which will be interpreted as an integer.

An example CR snippet:

spec:
  podTemplate:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: "node-role.kubernetes.io/kafka-connect"
              operator: In
              values: ["true"]
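
To illustrate how terms and expressions combine, the following sketch defines two nodeSelectorTerms. A node is eligible if it satisfies either term, and within a term every expression must match. The disktype and cpu-count labels are hypothetical and would have to exist on your nodes.

```yaml
spec:
  podTemplate:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          # Term 1: both expressions must match.
          - matchExpressions:
            - key: disktype                      # hypothetical node label
              operator: In
              values: ["ssd"]
            - key: topology.kubernetes.io/zone
              operator: NotIn
              values: ["us-central1-a"]
          # Term 2: an alternative; matching this term alone is sufficient.
          - matchExpressions:
            - key: cpu-count                     # hypothetical label holding an integer
              operator: Gt
              values: ["16"]                     # single element, interpreted as an integer
```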

Configure node affinity with preferredDuringSchedulingIgnoredDuringExecution

To create the node affinity rules to match node selector expressions or node selector fields, configure the Confluent component CRs as follows and create or update the resource. Set either matchExpressions or matchFields.

spec:
  podTemplate:
    affinity:
      nodeAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - preference:            --- [1]
            matchExpressions:    --- [2]
            - key:               --- [3]
              operator:          --- [4]
              values:            --- [5]
            matchFields:         --- [6]
            - key:               --- [7]
              operator:          --- [8]
              values:            --- [9]
          weight:                --- [10]

[1] Required. A node selector term, associated with the corresponding weight.
[2] A list of node selector requirements by node’s labels.
[3] Required. The label key that the selector applies to.
[4] Required. Represents how the key is related to the values. Valid operators are In, NotIn, Exists, DoesNotExist, Gt, and Lt.
[5] An array of string values.
If the operator is In or NotIn, the values array must be non-empty.
If the operator is Exists or DoesNotExist, the values array must be empty.
If the operator is Gt or Lt, the values array must have a single element, which will be interpreted as an integer.
[6] A list of node selector requirements by node’s fields.
[7] Required. The field key that the selector applies to.
[8] Required. Represents how the key is related to the values. Valid operators are In, NotIn, Exists, DoesNotExist, Gt, and Lt.
[9] An array of string values.
If the operator is In or NotIn, the values array must be non-empty.
If the operator is Exists or DoesNotExist, the values array must be empty.
If the operator is Gt or Lt, the values array must have a single element, which will be interpreted as an integer.
[10] Required. Weight associated with matching the corresponding nodeSelectorTerm in the range 1 - 100.
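
An example CR snippet may help here. The following sketch expresses two weighted preferences; the scheduler adds up the weights of the terms each node satisfies and favors the node with the highest total. The disktype label and the weights are hypothetical values chosen for illustration.

```yaml
spec:
  podTemplate:
    affinity:
      nodeAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 80                   # stronger preference
          preference:
            matchExpressions:
            - key: disktype            # hypothetical node label
              operator: In
              values: ["ssd"]
        - weight: 20                   # weaker preference
          preference:
            matchExpressions:
            - key: topology.kubernetes.io/zone
              operator: In
              values: ["us-central1-a"]
```

A node that satisfies both terms scores 100, a node with only the hypothetical disktype label scores 80, and a node that satisfies neither can still be chosen if no better node is available.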

Configure pod affinity and anti-affinity for Confluent components

This section describes how to configure pod affinity and pod anti-affinity scheduling rules, which determine whether a pod is co-located with, or kept away from, other pods that are already running in the same node or zone.

Use topologyKey to specify the node label that defines the topology domain, such as a node or a zone, within which pods are co-located or kept apart. The nodes on which the selected pods are running must carry a label whose key matches the topologyKey.

topologyKey is typically used to indicate the granularity at which the affinity rule is applied.

For example, if you do not want to run Kafka with some other applications on the same node, set the topologyKey to kubernetes.io/hostname to specify that the Kafka pods should not be scheduled on nodes with the same hostname.

As another example, suppose you set the topologyKey to topology.kubernetes.io/zone for an anti-affinity rule, and a node in zone us-central1-a is running a Kafka pod. The scheduler then looks for a node in a different zone, that is, a node whose topology.kubernetes.io/zone label value is not us-central1-a.
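
The zone-level case can be sketched as the following CR snippet. It assumes the Kafka cluster is named kafka, so that its pods carry the label app: kafka; adjust the selector to your own cluster name.

```yaml
spec:
  podTemplate:
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: kafka                             # assumes a cluster named "kafka"
          topologyKey: topology.kubernetes.io/zone   # keep matching pods in different zones
```

Because this is a required rule, a replica stays unscheduled when every zone already runs a matching pod, so a rule like this only suits clusters with at least as many zones as replicas.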

Two types of pod affinity/anti-affinity rules are supported:

requiredDuringSchedulingIgnoredDuringExecution
A pod must satisfy these rules to be scheduled onto a node.
preferredDuringSchedulingIgnoredDuringExecution
The scheduler prefers to schedule the pods to nodes that satisfy the specified affinity expression, but it may choose a node that violates one or more of the expressions.
The node with the greatest sum of weights is most preferred.

Pod affinity and anti-affinity apply within a namespace as pods are namespaced.

Configure pod affinity with preferredDuringSchedulingIgnoredDuringExecution

Configure the affinity property of the Confluent components CRs as follows and create or update the resources.

spec:
  podTemplate:
    affinity:
      podAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:     --- [1]
              namespaces:        --- [2]
              labelSelector:     --- [3]
              topologyKey:       --- [4]
            weight:              --- [5]

[1] Required. A pod affinity term, associated with the corresponding weight.
[2] The namespaces the labelSelector applies to (matches against). Null or empty list means the namespace of this pod.
[3] A query used to find matching pods. See Pod label selector for detail.
[4] Required. The key of the node label that defines the topology domain.
This pod should be co-located (affinity) or not co-located (anti-affinity) with the pods matching the labelSelector in the specified namespaces. Co-located is defined as running on a node whose value of the label with topologyKey matches that of any node on which any of the selected pods are running.
An empty topologyKey is not allowed.
[5] Required. The weight associated with matching the corresponding podAffinityTerm in the range 1 - 100.
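
For illustration, the following sketch could be placed in a Connect CR to express a preference for running in the same zone as Kafka pods. The label app: kafka assumes a Kafka cluster named kafka, and the weight of 50 is an arbitrary example value.

```yaml
spec:
  podTemplate:
    affinity:
      podAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 50                                   # illustrative weight in the range 1 - 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app: kafka                             # assumes a cluster named "kafka"
            topologyKey: topology.kubernetes.io/zone   # co-locate at zone granularity
```

Because namespaces is omitted, the selector only matches pods in the namespace of the pod being scheduled.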

Configure pod affinity and pod anti-affinity with requiredDuringSchedulingIgnoredDuringExecution

Configure the affinity property of the Confluent components CRs as follows and create or update the resources.

spec:
  podTemplate:
    affinity:
      podAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:      --- [1]
            namespaces:         --- [2]
            topologyKey:        --- [3]

[1] A query used to find matching pods. See Pod label selector for detail.
[2] The namespaces the labelSelector applies to (matches against). Null or empty list means the namespace of this pod.
[3] Required. The key of the node label that defines the topology domain.
This pod should be co-located (affinity) or not co-located (anti-affinity) with the pods matching the labelSelector in the specified namespaces.
An empty topologyKey is not allowed.
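
As a sketch of the namespaces field, the following required anti-affinity rule keeps a component's pods off any node that runs pods with a given label in another namespace. The workload-class label and the analytics namespace are hypothetical names invented for this example.

```yaml
spec:
  podTemplate:
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              workload-class: io-heavy       # hypothetical label on the other workload
          namespaces: ["analytics"]          # hypothetical namespace to match against
          topologyKey: kubernetes.io/hostname
```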

Pod label selector

Use labelSelector in pod affinity, pod anti-affinity, and pod topology spread constraints to find the pods that the constraints apply to.

labelSelector:
  matchExpressions:    --- [1]
  - key:               --- [2]
    operator:          --- [3]
    values:            --- [4]
  matchLabels:         --- [5]

[1] A list of label selector requirements. The requirements are ANDed.
A selector contains values, a key, and an operator that relates the key and values.
[2] Required. The label key that the selector applies to.
[3] Required. Represents how the key is related to a set of values. Valid operators are In, NotIn, Exists, and DoesNotExist.
[4] An array of string values.
If the operator is In or NotIn, the values array must be non-empty.
If the operator is Exists or DoesNotExist, the values array must be empty.
[5] A map of {key,value} pairs. The requirements are ANDed.

CFK adds the following labels you can use in labelSelector:

Label key	Label value
app	<cluster name>
clusterId	operator
confluent-platform	"true"
platform.confluent.io/type	<CR type>
type	<cluster name>
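
As a minimal sketch, a selector can combine matchLabels and matchExpressions, in which case a pod must satisfy both. The cluster names kafka and zookeeper are assumptions for this example.

```yaml
labelSelector:
  matchLabels:
    confluent-platform: "true"        # quoted so the value is a string, not a boolean
  matchExpressions:
  - key: app
    operator: In
    values: ["kafka", "zookeeper"]    # assumed cluster names
```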

An example CR snippet that prevents two Connect pods from being scheduled onto the same host:

spec:
  podTemplate:
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - connect
            topologyKey: "kubernetes.io/hostname"

Configure pod topology spread constraints

The topologySpreadConstraints property describes how matching Confluent pods should be scheduled across topology domains.

All topologySpreadConstraints are ANDed.

Specify the following in the Confluent component CRs that you want to apply pod topology spread constraints to:

spec:
  podTemplate:
    topologySpreadConstraints:
      - labelSelector:           --- [1]
        maxSkew:                 --- [2]
        topologyKey:             --- [3]
        whenUnsatisfiable:       --- [4]

[1] Used to find matching pods. Pods that match this label selector are counted to determine the number of pods in their corresponding topology domain. See Pod label selector for detail.
[2] Required. Describes the degree to which pods may be unevenly distributed. Default value is 1 for the most even distribution. 0 is not allowed.
[3] Required. The key of node labels. Nodes that have a label with this key and identical values are considered to be in the same topology.
[4] Required. Indicates how to deal with a pod if it doesn’t satisfy the spread constraint. Allowed values are DoNotSchedule (default) and ScheduleAnyway.

The following example spreads Kafka pods evenly across all worker nodes:

spec:
  podTemplate:
    topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            type: kafka-pods
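
Because all constraints are ANDed, several constraints can be combined. The following sketch enforces an even spread across zones and additionally prefers an even spread across nodes. The label app: kafka assumes a Kafka cluster named kafka.

```yaml
spec:
  podTemplate:
    topologySpreadConstraints:
    # Hard constraint: zone pod counts may differ by at most 1.
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app: kafka                  # assumes a cluster named "kafka"
    # Soft constraint: balance across nodes when possible.
    - maxSkew: 1
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: ScheduleAnyway
      labelSelector:
        matchLabels:
          app: kafka
```

With DoNotSchedule, a pod that would violate the zone constraint stays pending, whereas ScheduleAnyway only lowers the priority of nodes that would increase the skew.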

Configure one replica per node

The oneReplicaPerNode property uses the pod anti-affinity capability to enforce one pod per node, assigning a dedicated node to the pod workload.

The oneReplicaPerNode property is disabled by default.

When this property is enabled (oneReplicaPerNode: true), the pod anti-affinities are disabled.

When you change this property in an existing cluster, the cluster will roll.

Configure the oneReplicaPerNode property for the Confluent CRs as follows and create or update the resources.

spec:
  oneReplicaPerNode:    # Set it to true or false.
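
For context, the following sketch shows where the property sits in a Kafka CR. The resource name, namespace, and replica count are assumptions for this example, and the other fields that a complete Kafka CR needs are omitted.

```yaml
apiVersion: platform.confluent.io/v1beta1
kind: Kafka
metadata:
  name: kafka              # assumed cluster name
  namespace: confluent     # assumed namespace
spec:
  replicas: 3              # illustrative replica count
  oneReplicaPerNode: true  # at most one Kafka replica of this cluster per node
  # Remaining fields of the Kafka CR are omitted in this sketch.
```

With this setting and three replicas, the cluster would presumably need at least three schedulable nodes, because each replica requires a node of its own.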