Workloads Scheduling with Confluent for Kubernetes

Overview

To get the best performance from Confluent components, you can control how the component pods are scheduled on Kubernetes nodes. For example, you can keep pods off nodes that run other resource-intensive applications, schedule pods on dedicated nodes, or schedule pods on the nodes with the most suitable hardware.

To keep components from competing for the same resources, such as storage or network, and degrading the performance of both workloads, schedule component pods so that they do not share nodes with other critical workloads.

Confluent for Kubernetes supports the following features for scheduling Confluent component pods on Kubernetes nodes:

Taints and tolerations

Taints and tolerations work together to ensure that pods are not scheduled onto inappropriate nodes in Kubernetes.

Taints are applied to nodes, and tolerations are applied to pods. A pod that does not declare a toleration matching a node's taint cannot be scheduled onto that node.

Affinity and anti-affinity

Taints and tolerations keep pods off particular nodes, but they do not guarantee that pods are scheduled onto the nodes where you intend them to run.

To schedule the pods you do want on the right node, use another Kubernetes feature, affinity and anti-affinity. Confluent for Kubernetes supports the following types of Kubernetes affinity features:

  • Node affinity
  • Pod affinity and anti-affinity

Node affinity

The node affinity feature allows you to schedule workloads (pods) onto specific nodes, for example, to optimize for various resources, such as storage, CPU, and networking.

With Confluent for Kubernetes, you can create node affinity rules to specify which Kubernetes nodes your Confluent pods are eligible to be scheduled on. For example, if you have special hardware that you want to use to run one or more Confluent pods, use node affinity to pin those pods to the special hardware nodes.
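
As a rough sketch of how the two features complement each other, the following CR fragment combines a toleration (so the pods are allowed onto tainted, dedicated nodes) with a required node affinity rule (so the pods are placed only on those nodes). The dedicated key and kafka value are hypothetical and assume that the nodes were tainted and labeled beforehand, as noted in the comments.

```yaml
# Assumes the dedicated nodes were prepared with a hypothetical key/value:
#   kubectl taint nodes <node> dedicated=kafka:NoSchedule
#   kubectl label nodes <node> dedicated=kafka
spec:
  podTemplate:
    tolerations:               # Lets the pods onto the tainted nodes.
    - key: "dedicated"
      operator: "Equal"
      value: "kafka"
      effect: "NoSchedule"
    affinity:
      nodeAffinity:            # Keeps the pods on the labeled nodes only.
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: "dedicated"
              operator: In
              values: ["kafka"]
```

The taint keeps other workloads off the dedicated nodes, while the affinity rule keeps the Confluent pods from landing anywhere else.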

Pod affinity and pod anti-affinity

Pod affinity allows you to specify that a pod should be scheduled on the same node as another pod.

Pod anti-affinity allows you to specify a pod to be scheduled on a different node from another pod. For example, you can use pod anti-affinity to ensure that the Kafka brokers do not share nodes with other resource-intensive or critical workloads.

The affinity and anti-affinity rules apply at the namespace level.

One replica per node

You can configure CFK to run only one replica of a component on each Kubernetes node.

This scheduling method guarantees that Confluent Platform components do not compete for system resources with other applications running on the same node.

This rule applies at the namespace level and only to the replicas from the same cluster.

Confluent recommends that you enable the one replica per node setting for Kafka and ZooKeeper, and disable the one replica per node setting for other Confluent Platform components.

Bin packing

Confluent recommends running ZooKeeper and Kafka on individual Kubernetes nodes.

You can bin pack other Confluent components. Bin packing places component tasks on nodes in the cluster that have the least remaining CPU and memory capacity. Bin packing maximizes node utilization and can reduce the number of nodes required.

Bin packing is disabled by default at the namespace level. You can enable bin packing by setting the oneReplicaPerNode: false property in the component custom resource (CR).

Important

Bin packing components is not recommended for production deployments.

Configure taint and toleration for Confluent components

Configure taint

Taint the nodes using the kubectl taint nodes command as described in Taint and Toleration.

Configure toleration

When a pod is configured with a toleration, the pod “tolerates” the taint that matches the triple <key,value,effect> using the matching operator <operator>.

When a pod tolerates a node taint, the pod can be scheduled to the node.

Configure the tolerations property of the Confluent component CRs as follows, and create or update the resource.

spec:
  podTemplate:
    tolerations:
    - effect:            # The taint effect that the toleration matches.
                         # Allowed values are: NoSchedule,
                         # PreferNoSchedule, NoExecute.
                         # When set to an empty value, this toleration
                         # matches all taint effects.

      key:               # The taint key that the toleration applies to.
                         # An empty key matches all taint keys.
                         # If the key is empty, operator must be set to
                         # Exists; this combination means to match all
                         # values and all keys.

      operator:          # The match operator to compare key to the
                         # value.
                         # Allowed operators are: Exists and Equal
                         # Equal is the default.
                         # Exists is equivalent to wildcard for
                         # value, so that a pod can tolerate all taints
                         # of the key.

      value:             # The taint value the toleration matches to.
                         # If the operator is Exists, the value must be
                         # empty.

      tolerationSeconds: # The period of time the toleration
                         # tolerates the taint. Only applies when
                         # effect is set to NoExecute.
                         # Otherwise this field is ignored.
                         # By default, it is not set, which means
                         # to tolerate the taint forever (do not evict).
                         # Zero and negative values will be treated as
                         # 0 (evict immediately).

For example, for the following node taint:

kubectl taint nodes node1 myKey=myValue:NoSchedule

The following CR snippet matches the taint created above, and the pod can be scheduled onto the node:

spec:
  podTemplate:
    tolerations:
    - key: "myKey"
      operator: "Equal"
      value: "myValue"
      effect: "NoSchedule"

For more information on the fields, see Taints and tolerations.
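
The field descriptions above also allow tolerations that match by key only, or that expire after a period of time. The following is a minimal sketch under assumed names: the dedicated key is hypothetical, while node.kubernetes.io/unreachable is a taint that Kubernetes itself applies to unreachable nodes.

```yaml
spec:
  podTemplate:
    tolerations:
    # Exists ignores the taint value, so this tolerates any taint
    # with the (hypothetical) key "dedicated" and effect NoSchedule.
    - key: "dedicated"
      operator: "Exists"
      effect: "NoSchedule"
    # tolerationSeconds only applies to NoExecute: the pod stays on an
    # unreachable node for 300 seconds and is then evicted.
    - key: "node.kubernetes.io/unreachable"
      operator: "Exists"
      effect: "NoExecute"
      tolerationSeconds: 300
```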

Configure node affinity for Confluent components

This section describes how to configure node affinity scheduling rules for Confluent component pods.

Using node affinity, you specify which nodes Confluent component pods can run on based on labels on the nodes (matchExpressions) or on node fields (matchFields). Each match requirement is a <key, operator, value> triple.

Two types of node affinity rules are supported:

  • requiredDuringSchedulingIgnoredDuringExecution

    A pod must satisfy these rules to be scheduled onto a node.

  • preferredDuringSchedulingIgnoredDuringExecution

    The scheduler prefers to schedule the pods to the nodes that satisfy the specified affinity expression, but it may choose a node that violates one or more of the expressions.

    The node with the greatest sum of weights is most preferred.

    An empty preferred scheduling term matches all objects with implicit weight 0.

    A null preferred scheduling term matches no objects.

Configure node affinity with requiredDuringSchedulingIgnoredDuringExecution

The pod can be scheduled to a node if one of the nodeSelectorTerms matches.

A nodeSelectorTerms item is a match if all matchExpressions match.

To create the node affinity rules to match node labels, configure the nodeAffinity property of the Confluent components CRs as follows and create or update the resource. Set either matchExpressions or matchFields.

spec:
  podTemplate:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:

          nodeSelectorTerms:    # Required. A list of node selector terms.

          - matchExpressions:   # A list of node selector requirements by
                                # node's labels.
                                # A node selector requirement is a
                                # selector that contains values, a key,
                                # and an operator that relates
                                # the key and values.

            - key:              # Required. The label key that the
                                # selector applies to.

              operator:         # Required. Represents a key's
                                # relationship to a set of values.
                                # Valid operators are In, NotIn,
                                # Exists, DoesNotExist, Gt, and Lt.

              values:           # An array of string values. If the
                                # operator is In or NotIn, the values
                                # array must be non-empty.
                                # If the operator is Exists or
                                # DoesNotExist, the values array must be
                                # empty.
                                # If the operator is Gt or Lt, the values
                                # array must have a single element, which
                                # will be interpreted as an integer. This
                                # array is replaced during a strategic
                                # merge patch.

            matchFields:        # A list of node selector requirements
                                # by node's fields.
                                # A node selector requirement is a
                                # selector that contains values, a key,
                                # and an operator that relates the key
                                # and values.

            - key:              # Required. The field key that the selector
                                # applies to.

              operator:         # Required. Represents a key's
                                # relationship to a set of values.
                                # Valid operators are In, NotIn,
                                # Exists, DoesNotExist, Gt, and Lt.

              values:           # An array of string values.
                                # If the operator is In or NotIn, the values
                                # array must be non-empty.
                                # If the operator is Exists or DoesNotExist,
                                # the values array must be empty.
                                # If the operator is Gt or Lt, the values
                                # array must have a single element, which
                                # will be interpreted as an integer. This
                                # array is replaced during a strategic
                                # merge patch.

An example CR snippet:

spec:
  podTemplate:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: "node-role.kubernetes.io/kafka-connect"
              operator: In
              values: ["true"]
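
To illustrate the matching rules stated above (any one nodeSelectorTerms item may match, but all matchExpressions inside an item must match), here is a sketch with two terms. The disktype and dedicated label keys and their values are hypothetical.

```yaml
spec:
  podTemplate:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          # Term 1: the node must have BOTH labels.
          - matchExpressions:
            - key: "disktype"
              operator: In
              values: ["ssd"]
            - key: "topology.kubernetes.io/zone"
              operator: In
              values: ["us-central1-a"]
          # Term 2: alternatively, any node that carries a "dedicated"
          # label, whatever its value (values must be empty for Exists).
          - matchExpressions:
            - key: "dedicated"
              operator: Exists
```

A node is eligible if it satisfies term 1 or term 2; a node with only disktype=ssd in another zone and no dedicated label is not.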

Configure node affinity with preferredDuringSchedulingIgnoredDuringExecution

To create the node affinity rules to match node selector expressions or node selector fields, configure the Confluent components CRs as follows and create or update the resource. Set either matchExpressions or matchFields.

spec:
  podTemplate:
    affinity:
      nodeAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:

        - preference:         # Required. A node selector term,
                              # associated with the corresponding weight.

            matchExpressions: # A list of node selector
                              # requirements by node's labels.

            - key:            # Required. The label key that the selector
                              # applies to.

              operator:       # Required. Represents a key's
                              # relationship to a set of values. Valid
                              # operators are In, NotIn, Exists,
                              # DoesNotExist, Gt, and Lt.

              values:         # An array of string values. If the
                              # operator is In or NotIn, the values
                              # array must be non-empty. If the
                              # operator is Exists or DoesNotExist, the
                              # values array must be empty. If the
                              # operator is Gt or Lt, the values array
                              # must have a single element, which will
                              # be interpreted as an integer. This
                              # array is replaced during a strategic
                              # merge patch.

            matchFields:      # A list of node selector requirements by
                              # node's fields.

            - key:            # Required. The field key that the
                              # selector applies to.

              operator:       # Required. Represents a key's
                              # relationship to a set of values. Valid
                              # operators are In, NotIn, Exists,
                              # DoesNotExist, Gt, and Lt.

              values:         # An array of string values.
                              # If the operator is In or NotIn, the values
                              # array must be non-empty.
                              # If the operator is Exists or DoesNotExist,
                              # the values array must be empty.
                              # If the operator is Gt or Lt, the values
                              # array must have a single element, which
                              # will be interpreted as an integer. This
                              # array is replaced during a strategic
                              # merge patch.

          weight:             # Required. Weight associated with matching
                              # the corresponding nodeSelectorTerm,
                              # in the range 1-100.
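
As an illustration of how weights combine, the following sketch defines two preferences. The disktype label and the zone value are hypothetical and only stand in for labels that exist on your nodes.

```yaml
spec:
  podTemplate:
    affinity:
      nodeAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        # Strong preference for SSD-backed nodes.
        - weight: 80
          preference:
            matchExpressions:
            - key: "disktype"
              operator: In
              values: ["ssd"]
        # Weaker preference for a particular zone.
        - weight: 20
          preference:
            matchExpressions:
            - key: "topology.kubernetes.io/zone"
              operator: In
              values: ["us-central1-a"]
```

For this rule, a node that matches both preferences contributes a sum of 100, a node with only the SSD label contributes 80, and a node with only the zone label contributes 20. Because the rule is a preference, the scheduler can still place the pod on a node that matches neither.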

Configure pod affinity and anti-affinity for Confluent components

This section describes how to configure pod affinity and pod anti-affinity scheduling rules, which determine whether a pod is co-located with, or kept away from, other pods that are already running in the same node or zone.

Use topologyKey to specify the node label that defines where pods are co-located or kept apart. The nodes on which the selected pods are running must carry a label whose key matches the topologyKey.

topologyKey is typically used to indicate the granularity at which the affinity rule is applied.

For example, if you do not want to run Kafka with some other applications on the same node, set the topologyKey to kubernetes.io/hostname to specify that the Kafka pods should not be scheduled on nodes with the same hostname.

As another example, suppose that you set the topologyKey to topology.kubernetes.io/zone for an anti-affinity rule and a node in zone us-central1-a is already running a Kafka pod. The scheduler then looks for a node in a different zone, that is, a node whose topology.kubernetes.io/zone label value is not us-central1-a.

Two types of pod affinity/anti-affinity rules are supported:

  • requiredDuringSchedulingIgnoredDuringExecution

    A pod must satisfy these rules to be scheduled onto a node.

  • preferredDuringSchedulingIgnoredDuringExecution

    The scheduler prefers to schedule the pods to nodes that satisfy the specified affinity expression, but it may choose a node that violates one or more of the expressions.

    The node with the greatest sum of weights is most preferred.

Pod affinity and anti-affinity apply within a namespace as pods are namespaced.

Configure pod affinity with preferredDuringSchedulingIgnoredDuringExecution

Configure the affinity property of the Confluent components CRs as follows and create or update the resources.

spec:
  podTemplate:
    affinity:
      podAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - podAffinityTerm:       # Required. A pod affinity term,
                                 # associated with the corresponding
                                 # weight.

            labelSelector:       # A label query over a set of pods.

              matchExpressions:  # matchExpressions is a list of
                                 # label selector requirements. The
                                 # requirements are ANDed.
                                 # A label selector requirement is
                                 # a selector that contains values,
                                 # a key, and an operator that
                                 # relates the key and values.

              - key:             # Required. key is the label key that
                                 # the selector applies to.

                operator:        # Required. operator represents a key's
                                 # relationship to a set of values. Valid
                                 # operators are In, NotIn, Exists and
                                 # DoesNotExist.

                values:          # values is an array of string values.
                                 # If the operator is In or NotIn, the
                                 # values array must be non-empty.
                                 # If the operator is Exists or
                                 # DoesNotExist, the values array must
                                 # be empty. This array is replaced
                                 # during a strategic merge patch.

              matchLabels:       # matchLabels is a map of {key,value}
                                 # pairs. A single {key,value} in the
                                 # matchLabels map is equivalent to an
                                 # element of matchExpressions, whose key
                                 # field is "key", the operator is "In",
                                 # and the values array contains only
                                 # "value". The requirements are ANDed.

            topologyKey:         # Required. This pod should be
                                 # co-located (affinity) or not
                                 # co-located (anti-affinity) with the
                                 # pods matching the labelSelector in
                                 # the specified namespaces, where
                                 # co-located is defined as running on a
                                 # node whose value of the label with
                                 # key topologyKey matches that of any
                                 # node on which any of the selected
                                 # pods are running. Empty topologyKey
                                 # is not allowed.

          weight:                # Required. weight associated with
                                 # matching the corresponding
                                 # podAffinityTerm, in the range 1-100.
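
A filled-in version of the template might look like the following sketch, which asks the scheduler to prefer placing a component's pods in the same zone as Kafka pods. It assumes a Kafka cluster named kafka, so that its pods carry the label app: kafka; the weight of 50 is an arbitrary illustrative value.

```yaml
spec:
  podTemplate:
    affinity:
      podAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 50
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app: kafka     # Assumes the Kafka cluster is named "kafka".
            # Co-location is evaluated per zone rather than per node.
            topologyKey: "topology.kubernetes.io/zone"
```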

Configure pod affinity and pod anti-affinity with requiredDuringSchedulingIgnoredDuringExecution

Configure the affinity property of the Confluent components CRs as follows and create or update the resources.

spec:
  podTemplate:
    affinity:
      podAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:          # A label query over a set of
                                  # resources, in this case pods.

            matchExpressions:     # A list of label selector requirements.
                                  # The requirements are ANDed.
                                  # A label selector requirement is a
                                  # selector that contains values, a
                                  # key, and an operator that relates
                                  # the key and values.

            - key:                # Required. key is the label key that
                                  # the selector applies to.

              operator:           # Required. operator represents a key's
                                  # relationship to a set of values.
                                  # Valid operators are In, NotIn, Exists
                                  # and DoesNotExist.

              values:             # values is an array of string values.
                                  # If the operator is In or NotIn, the
                                  # values array must be non-empty.
                                  # If the operator is Exists or
                                  # DoesNotExist, the values array must be
                                  # empty. This array is replaced during a
                                  # strategic merge patch.

            matchLabels:          # matchLabels is a map of {key,value}
                                  # pairs. A single {key,value} in the
                                  # matchLabels map is equivalent to an
                                  # element of matchExpressions, whose key
                                  # field is "key", the operator is "In",
                                  # and the values array contains only
                                  # "value". The requirements are ANDed.

          namespaces:             # Specifies which namespaces
                                  # the labelSelector applies to (matches
                                  # against); null or empty list means
                                  # "this pod's namespace".

          topologyKey:            # Required. Label of the cluster node.
                                  # This pod should be
                                  # co-located (affinity) or not
                                  # co-located (anti-affinity) with the
                                  # pods matching the labelSelector in the
                                  # specified namespaces.
                                  # An empty topologyKey is not allowed.

CFK adds the following labels you can use in labelSelector:

Label key                    Label value
app                          <cluster name>
clusterId                    operator
confluent-platform           “true”
platform.confluent.io/type   <CR type>
type                         <cluster name>

An example CR snippet that prevents two Connect pods from being scheduled onto the same host:

spec:
  podTemplate:
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - connect
            topologyKey: "kubernetes.io/hostname"
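
The zone-level behavior described earlier for topology.kubernetes.io/zone can be sketched in the same way. This variant uses the preferred form so that pods can still be scheduled when there are fewer zones than replicas. It assumes a Connect cluster named connect, and, because this document states that pod anti-affinities are disabled while oneReplicaPerNode is enabled, it also assumes that property is set to false for the component.

```yaml
spec:
  oneReplicaPerNode: false     # Assumption: needed for the rule to apply.
  podTemplate:
    affinity:
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - connect
            # Prefer zones that do not already run a matching pod.
            topologyKey: "topology.kubernetes.io/zone"
```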

Configure one replica per node

The oneReplicaPerNode property uses the pod anti-affinity capability to run only one pod per node, which gives each pod a dedicated node for its workload.

oneReplicaPerNode is enabled by default.

When this property is enabled, the pod anti-affinities are disabled.

When you change this property in an existing cluster, the cluster will roll.

Configure the oneReplicaPerNode property for Confluent CRs as follows and create or update the resources.

spec:
  oneReplicaPerNode:    # Set it to true or false.
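
As a sketch of the recommendation given earlier (one replica per node for Kafka, bin packing allowed for other components), the following fragments show the property in two CRs. The resource names, the confluent namespace, and the replica counts are hypothetical, and other fields that a complete CR requires, such as the image settings, are omitted.

```yaml
# Kafka: keep the default so that each broker gets its own node.
apiVersion: platform.confluent.io/v1beta1
kind: Kafka
metadata:
  name: kafka                # Hypothetical name.
  namespace: confluent       # Hypothetical namespace.
spec:
  replicas: 3
  oneReplicaPerNode: true
---
# Connect: allow several replicas to share a node (bin packing).
apiVersion: platform.confluent.io/v1beta1
kind: Connect
metadata:
  name: connect              # Hypothetical name.
  namespace: confluent
spec:
  replicas: 2
  oneReplicaPerNode: false
```

With oneReplicaPerNode set to true, the Kubernetes cluster needs at least as many schedulable nodes as the component has replicas; otherwise some pods remain pending.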