Configure Pod Scheduling for Confluent Platform Using Confluent for Kubernetes
To get the optimal performance out of Confluent components, you can control how the component pods are scheduled on Kubernetes nodes. For example, you can configure pods not to be scheduled on the same node as other resource intensive applications, pods to be scheduled on dedicated nodes, or pods to be scheduled on the nodes with the most suitable hardware.
To avoid components competing for the same resources, such as storage or network, and causing performance degradation in both pods, schedule component pods in a way that avoids sharing nodes with other critical workloads.
Confluent for Kubernetes supports the following features for scheduling Confluent component pods on Kubernetes nodes:
- Taints and tolerations
Taints and tolerations work together to ensure that pods are not scheduled onto specific nodes in Kubernetes.
Taints are applied to nodes, and tolerations are applied to pods. A pod can be scheduled onto a tainted node only if it declares a toleration that matches the taint.
- Affinity and anti-affinity
Taints and tolerations keep pods off specific nodes, but they do not guarantee that pods are scheduled onto the nodes where you intend them to run.
To schedule pods onto the right nodes, use another Kubernetes feature, affinity and anti-affinity. Confluent for Kubernetes supports the following types of Kubernetes affinity features:
Node affinity
The node affinity feature allows you to schedule workloads (pods) onto specific nodes, for example, to optimize for various resources, such as storage, CPU, and networking.
With Confluent for Kubernetes, you can create node affinity rules to specify which Kubernetes nodes your Confluent pods are eligible to be scheduled on. For example, if you have special hardware that you want to use to run one or more Confluent pods, use node affinity to pin those pods to the special hardware nodes.
Pod affinity and anti-affinity
The pod affinity feature allows you to specify that a pod should be scheduled on the same node as another pod.
The pod anti-affinity feature allows you to specify that a set of pods should be scheduled away from one another on different nodes, availability zones, or several other potential topology domains. For example, you can use pod anti-affinity to ensure that the Kafka brokers do not share nodes with other resource-intensive or critical workloads.
The affinity and anti-affinity rules apply at the namespace level.
- Pod topology spread constraints
Using pod topology spread constraints, you can control the distribution of your pods across nodes, zones, regions, or other user-defined topology domains, achieving high availability and efficient cluster resource utilization.
You first label nodes to provide topology information, such as regions, zones, and nodes. Then in Confluent component CRs, you define spread constraints, such as which pods to group together, which topology domains they are spread among, and the acceptable skew.
Pod topology spread constraints only apply to pods within the same namespace.
- One replica per node
You can configure CFK to run at most one replica of a cluster on each Kubernetes node.
This scheduling method guarantees that Confluent Platform components do not compete for system resources with other applications running on the same node.
The rule applies at the namespace level and only to the replicas from the same cluster.
Confluent recommends that you enable the one replica per node setting for Kafka and ZooKeeper (Confluent Platform 7.9 and earlier only).
For other Confluent Platform components, you can enable the one replica per node setting for availability, but the setting is not required for reliability.
Configure taint and toleration for Confluent components
Configure taint
Taint the nodes using the kubectl taint nodes command as described in Taint and Toleration.
Configure toleration
When a pod is configured with a toleration, the pod “tolerates” the taint that
matches the triple <key,value,effect> using the matching operator
<operator>.
When a pod tolerates a node taint, the pod can be scheduled to the node.
Configure the tolerations property of the Confluent component CRs as follows, and create or update the resource.
spec:
  podTemplate:
    tolerations:
    - effect:               --- [1]
      key:                  --- [2]
      operator:             --- [3]
      value:                --- [4]
      tolerationSeconds:    --- [5]
[1] Indicates what to do with intolerant pods. Allowed values are NoSchedule, PreferNoSchedule, and NoExecute. When set to an empty value, this toleration matches all taint effects.
[2] The taint key that the toleration applies to.
If the key is empty, operator must be set to Exists. This combination means to match all values and all keys.
[3] The match operator to compare key to the value. Allowed operators are Exists and Equal, and Equal is the default. Exists is equivalent to a wildcard for value, so that a pod can tolerate all taints of the key.
[4] The taint value the toleration matches to. If the operator is Exists, the value must be empty.
[5] The period of time the toleration tolerates the taint. Only applies when effect is set to NoExecute. Otherwise, this field is ignored.
By default, it is not set, which means to tolerate the taint forever (do not evict).
Zero and negative values will be treated as 0 (evict immediately).
For example, for the following node taint:
kubectl taint nodes node1 myKey=myValue:NoSchedule
The following CR snippet matches the taint created above, so the pod can be scheduled onto the node:
spec:
  podTemplate:
    tolerations:
    - key: "myKey"
      operator: "Equal"
      value: "myValue"
      effect: "NoSchedule"
For more information on the fields, see Taints and tolerations.
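As a further illustration of the tolerationSeconds field, the following sketch assumes a hypothetical taint with the key dedicated and the NoExecute effect. The key name, the node name in the comment, and the 3600-second value are assumptions chosen for the example, not values that CFK requires.

```yaml
# Assumed taint, applied beforehand (hypothetical node and key names):
#   kubectl taint nodes node1 dedicated=kafka:NoExecute
spec:
  podTemplate:
    tolerations:
    - key: "dedicated"
      operator: "Exists"        # Matches any value of the key, so value is omitted.
      effect: "NoExecute"
      tolerationSeconds: 3600   # Tolerate the taint for up to one hour, then evict.
```

Omitting tolerationSeconds in this sketch would make the pod tolerate the taint indefinitely.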
Configure node affinity for Confluent components
This section describes how to configure node affinity scheduling rules for Confluent component pods.
Using node affinity, you specify which nodes Confluent component pods can run on
based on labels on the nodes (matchExpressions) or on node fields
(matchFields). Each match requirement is a <key, operator, value>
triple.
Two types of node affinity rules are supported:
requiredDuringSchedulingIgnoredDuringExecution
A pod must satisfy these rules to be scheduled onto a node.
preferredDuringSchedulingIgnoredDuringExecution
The scheduler prefers to schedule the pods to the nodes that satisfy the specified affinity expression, but it may choose a node that violates one or more of the expressions.
The node with the greatest sum of weights is most preferred.
An empty preferred scheduling term matches all objects with implicit weight 0.
A null preferred scheduling term matches no objects.
Configure node affinity with requiredDuringSchedulingIgnoredDuringExecution
The pod can be scheduled to a node if one of the nodeSelectorTerms matches.
A nodeSelectorTerms item is a match if all matchExpressions match.
To create the node affinity rules to match node labels, configure the
nodeAffinity property of the Confluent components CRs as follows and create
or update the resource. Set either matchExpressions or matchFields.
spec:
  podTemplate:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:        --- [1]
          - matchExpressions:       --- [2]
            - key:                  --- [3]
              operator:             --- [4]
              values:               --- [5]
            matchFields:            --- [6]
            - key:                  --- [7]
              operator:             --- [8]
              values:               --- [9]
[1] Required. A list of node selector terms.
[2] A list of node selector requirements by node’s labels.
A node selector requirement is a selector that contains values, a key, and an operator that relates the key and values.
[3] Required. The label key that the selector applies to.
[4] Required. Represents how the key is related to the values. Valid operators are In, NotIn, Exists, DoesNotExist, Gt, and Lt.
[5] An array of string values.
If the operator is In or NotIn, the values array must be non-empty.
If the operator is Exists or DoesNotExist, the values array must be empty.
If the operator is Gt or Lt, the values array must have a single element, which will be interpreted as an integer.
[6] A list of node selector requirements by node’s fields.
A node selector requirement is a selector that contains values, a key, and an operator that relates the key and values.
[7] Required. The field key that the selector applies to.
[8] Required. Represents how the values are related to the key. Valid operators are In, NotIn, Exists, DoesNotExist, Gt, and Lt.
[9] An array of string values.
If the operator is In or NotIn, the values array must be non-empty.
If the operator is Exists or DoesNotExist, the values array must be empty.
If the operator is Gt or Lt, the values array must have a single element, which will be interpreted as an integer.
An example CR snippet:
spec:
  podTemplate:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: "node-role.kubernetes.io/kafka-connect"
              operator: In
              values: ["true"]
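To make the matching logic concrete, the following sketch combines two nodeSelectorTerms. Within a term, all matchExpressions must match; across terms, one matching term is enough. The disktype and node-role.kubernetes.io/kafka labels and the zone values are hypothetical and must exist on your nodes for the rule to have any effect.

```yaml
spec:
  podTemplate:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          # Term 1: both expressions must match (AND).
          - matchExpressions:
            - key: "disktype"                        # hypothetical node label
              operator: In
              values: ["ssd"]
            - key: "topology.kubernetes.io/zone"
              operator: In
              values: ["us-central1-a", "us-central1-b"]
          # Term 2: an alternative to term 1 (OR).
          - matchExpressions:
            - key: "node-role.kubernetes.io/kafka"   # hypothetical node label
              operator: Exists                       # values must be empty for Exists
```

With this sketch, a pod is eligible for any SSD node in either of the two zones, or for any node that carries the node-role.kubernetes.io/kafka label regardless of its value.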
Configure node affinity with preferredDuringSchedulingIgnoredDuringExecution
To create the node affinity rules to match node selector expressions or node
selector fields, configure the Confluent component CRs as follows and create
or update the resource. Set either matchExpressions or matchFields.
spec:
  podTemplate:
    affinity:
      nodeAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - preference:               --- [1]
            matchExpressions:       --- [2]
            - key:                  --- [3]
              operator:             --- [4]
              values:               --- [5]
            matchFields:            --- [6]
            - key:                  --- [7]
              operator:             --- [8]
              values:               --- [9]
          weight:                   --- [10]
[1] Required. A node selector term, associated with the corresponding weight.
[2] A list of node selector requirements by node’s labels.
[3] Required. The label key that the selector applies to.
[4] Required. Represents how the key is related to the values. Valid operators are In, NotIn, Exists, DoesNotExist, Gt, and Lt.
[5] An array of string values.
If the operator is In or NotIn, the values array must be non-empty.
If the operator is Exists or DoesNotExist, the values array must be empty.
If the operator is Gt or Lt, the values array must have a single element, which will be interpreted as an integer.
[6] A list of node selector requirements by node’s fields.
[7] Required. The field key that the selector applies to.
[8] Required. Represents how the key is related to the values. Valid operators are In, NotIn, Exists, DoesNotExist, Gt, and Lt.
[9] An array of string values.
If the operator is In or NotIn, the values array must be non-empty.
If the operator is Exists or DoesNotExist, the values array must be empty.
If the operator is Gt or Lt, the values array must have a single element, which will be interpreted as an integer.
[10] Required. Weight associated with matching the corresponding nodeSelectorTerm, in the range 1-100.
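As an illustration of how weights combine, the following sketch declares two preferences. The disktype label, the zone value, and the weights 80 and 20 are assumptions chosen for the example.

```yaml
spec:
  podTemplate:
    affinity:
      nodeAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 80                                 # stronger preference
          preference:
            matchExpressions:
            - key: "disktype"                        # hypothetical node label
              operator: In
              values: ["ssd"]
        - weight: 20                                 # weaker preference
          preference:
            matchExpressions:
            - key: "topology.kubernetes.io/zone"
              operator: In
              values: ["us-central1-a"]
```

For this rule, a node that satisfies both preferences accumulates a weight of 100, a node with only the SSD label accumulates 80, and a node only in the preferred zone accumulates 20. Because the rule is a preference, a pod can still be scheduled onto a node that matches neither term.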
Configure pod affinity and anti-affinity for Confluent components
This section describes how to configure pod affinity and pod anti-affinity scheduling rules, for example, whether to co-locate or to avoid putting a pod in the same node or zone as some other pods already running on the node.
Use topologyKey to specify the nodes to co-locate or avoid scheduling pods.
The label on the nodes on which selected pods are running must match the
topologyKey.
topologyKey is typically used to indicate the granularity at which the
affinity rule is applied.
For example, if you do not want to run Kafka with some other applications on the
same node, set the topologyKey to kubernetes.io/hostname to specify that
the Kafka pods should not be scheduled on nodes with the same hostname.
As another example, suppose you set the topologyKey to
topology.kubernetes.io/zone for an anti-affinity rule, and a node in zone
us-central1-a is already running a Kafka pod. The scheduler then looks for a
node in a different zone, that is, a node whose topology.kubernetes.io/zone
label value is not us-central1-a.
Two types of pod affinity/anti-affinity rules are supported:
requiredDuringSchedulingIgnoredDuringExecution
A pod must satisfy these rules to be scheduled onto a node.
preferredDuringSchedulingIgnoredDuringExecution
The scheduler prefers to schedule the pods to nodes that satisfy the specified affinity expression, but it may choose a node that violates one or more of the expressions.
The node with the greatest sum of weights is most preferred.
Pod affinity and anti-affinity apply within a namespace as pods are namespaced.
Configure pod affinity with preferredDuringSchedulingIgnoredDuringExecution
Configure the affinity property of the Confluent components CRs as follows
and create or update the resources.
spec:
  podTemplate:
    affinity:
      podAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - podAffinityTerm:          --- [1]
            namespaces:             --- [2]
            labelSelector:          --- [3]
            topologyKey:            --- [4]
          weight:                   --- [5]
[1] Required. A pod affinity term, associated with the corresponding weight.
[2] The namespaces the labelSelector applies to (matches against). Null or empty list means the namespace of this pod.
[3] A query used to find matching pods. See Pod label selector for detail.
[4] Required. The label of the cluster node.
This pod should be co-located (affinity) or not co-located (anti-affinity) with the pods matching the labelSelector in the specified namespaces. Co-located is defined as running on a node whose value of the label with topologyKey matches that of any node on which any of the selected pods are running.
An empty topologyKey is not allowed.
[5] Required. The weight associated with matching the corresponding podAffinityTerm in the range 1-100.
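A minimal sketch of a preferred pod affinity rule follows. It assumes the CR belongs to a component such as Connect that benefits from running in the same zone as the Kafka brokers, and that the Kafka cluster is named kafka so that its pods carry the label app: kafka. Both are assumptions for the example.

```yaml
spec:
  podTemplate:
    affinity:
      podAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            # namespaces is omitted, so the selector applies to this pod's namespace.
            labelSelector:
              matchLabels:
                app: kafka                           # assumes a Kafka cluster named "kafka"
            topologyKey: "topology.kubernetes.io/zone"
```

Using the zone label as the topologyKey asks for co-location at zone granularity. Using kubernetes.io/hostname instead would ask for the same node.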
Configure pod affinity and pod anti-affinity with requiredDuringSchedulingIgnoredDuringExecution
Configure the affinity property of the Confluent components CRs as follows
and create or update the resources.
spec:
  podTemplate:
    affinity:
      podAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:            --- [1]
          namespaces:               --- [2]
          topologyKey:              --- [3]
[1] A query used to find matching pods. See Pod label selector for detail.
[2] The namespaces the labelSelector applies to (matches against). Null or empty list means the namespace of this pod.
[3] Required. The label of the cluster node.
This pod should be co-located (affinity) or not co-located (anti-affinity) with the pods matching the labelSelector in the specified namespaces.
An empty topologyKey is not allowed.
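The zone example described earlier in this section can be sketched as a required pod anti-affinity rule, which takes the same fields under podAntiAffinity instead of podAffinity. The label app: kafka assumes a Kafka cluster named kafka.

```yaml
spec:
  podTemplate:
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: kafka                             # assumes a Kafka cluster named "kafka"
          topologyKey: "topology.kubernetes.io/zone"
```

Because the rule is required, each zone can host at most one matching pod. If the cluster has more replicas than zones with schedulable nodes, the extra pods stay unscheduled, so a preferred rule may be the safer choice in that situation.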
Pod label selector
Use labelSelector in pod affinity, pod anti-affinity, and pod topology spread
constraints to find the matching pods that the constraints apply to.
labelSelector:
  matchExpressions:     --- [1]
  - key:                --- [2]
    operator:           --- [3]
    values:             --- [4]
  matchLabels:          --- [5]
[1] A list of label selector requirements. The requirements are ANDed.
A selector contains values, a key, and an operator that relates the key and values.
[2] Required. The label key that the selector applies to.
[3] Required. Represents how the key is related to a set of values. Valid operators are In, NotIn, Exists, and DoesNotExist.
[4] An array of string values.
If the operator is In or NotIn, the values array must be non-empty.
If the operator is Exists or DoesNotExist, the values array must be empty.
[5] A map of {key,value} pairs. The requirements are ANDed.
CFK adds the following labels you can use in labelSelector:
| Label key | Label value |
|---|---|
| app | <cluster name> |
| clusterId | operator |
| confluent-platform | “true” |
| platform.confluent.io/type | <CR type> |
| type | <cluster name> |
An example CR snippet to not schedule two Connect apps onto the same host:
spec:
  podTemplate:
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - connect
          topologyKey: "kubernetes.io/hostname"
Configure pod topology spread constraints
TopologySpreadConstraints describes how matching Confluent pods should be
scheduled across topology domains.
All topologySpreadConstraints are ANDed.
Specify the following in the Confluent component CRs that you want to apply pod topology spread constraints to:
spec:
  podTemplate:
    topologySpreadConstraints:
    - labelSelector:          --- [1]
      maxSkew:                --- [2]
      topologyKey:            --- [3]
      whenUnsatisfiable:      --- [4]
[1] Used to find matching pods. Pods that match this label selector are counted to determine the number of pods in their corresponding topology domain. See Pod label selector for detail.
[2] Required. Describes the degree to which pods may be unevenly distributed. Default value is 1 for the most even distribution. 0 is not allowed.
[3] Required. The key of node labels. Nodes that have a label with this key and identical values are considered to be in the same topology.
[4] Required. Indicates how to deal with a pod if it doesn’t satisfy the spread constraint. Allowed values are DoNotSchedule (default) and ScheduleAnyway.
The following example balances Kafka pods evenly across all worker nodes:
spec:
  podTemplate:
    topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: ScheduleAnyway
      labelSelector:
        matchLabels:
          type: kafka-pods
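Because all topologySpreadConstraints are ANDed, you can combine a strict constraint at one level with a best-effort constraint at another. The following sketch assumes a Kafka cluster named kafka (so that its pods carry the label app: kafka) and nodes labeled with topology.kubernetes.io/zone. Both are assumptions for the example.

```yaml
spec:
  podTemplate:
    topologySpreadConstraints:
    # Strict: the pod count per zone may differ by at most 1.
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app: kafka                                 # assumes a Kafka cluster named "kafka"
    # Best effort: also try to balance pods across nodes.
    - maxSkew: 1
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: ScheduleAnyway
      labelSelector:
        matchLabels:
          app: kafka
```

As a worked example under these assumptions, six matching pods across three zones satisfy the first constraint only when the zones end up with two pods each. A 3/2/1 distribution has a skew of 3 - 1 = 2, which exceeds maxSkew: 1.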
Configure one replica per node
The oneReplicaPerNode property uses the pod anti-affinity capability to run at
most one pod per node, which gives each pod workload a dedicated node.
The oneReplicaPerNode property is disabled by default.
When this property is enabled (oneReplicaPerNode: true), the pod
anti-affinities are disabled.
When you change this property in an existing cluster, the cluster will roll.
Configure the oneReplicaPerNode property for the Confluent CRs as follows
and create or update the resources.
spec:
  oneReplicaPerNode:      # Set it to true or false.
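For context, the following sketch shows where the property sits in a Kafka CR. The resource name, namespace, and replica count are hypothetical, and the other fields a working Kafka CR needs are omitted.

```yaml
apiVersion: platform.confluent.io/v1beta1
kind: Kafka
metadata:
  name: kafka                 # hypothetical cluster name
  namespace: confluent        # hypothetical namespace
spec:
  replicas: 3
  oneReplicaPerNode: true     # At most one Kafka pod of this cluster per node.
  # Remaining required fields (for example, image and storage settings) are omitted.
```

With three replicas and this setting enabled, the cluster needs at least three schedulable nodes. With fewer nodes, some pods would be expected to remain unscheduled.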