Workload Scheduling with Confluent for Kubernetes
Overview
To get optimal performance out of Confluent components, control how the component pods are scheduled on Kubernetes nodes. For example, you can configure pods not to be scheduled on the same node as other resource-intensive applications, to be scheduled on dedicated nodes, or to be scheduled on the nodes with the most suitable hardware.
To avoid components competing for the same resources, such as storage or network, and degrading each other's performance, schedule component pods so that they do not share nodes with other critical workloads.
Confluent for Kubernetes supports the following features for scheduling Confluent component pods on Kubernetes nodes:
- Taints and tolerations
- Affinity and anti-affinity
- Pod topology spread constraints
- One replica per node
Taints and tolerations
Taints and tolerations work together to ensure that pods are not scheduled onto specific nodes in Kubernetes.
Taints are applied to nodes, and tolerations are applied to pods. A pod that does not declare a toleration for a node's taint cannot be scheduled onto that node.
Affinity and anti-affinity
Taints and tolerations keep pods off specific nodes, but they do not guarantee that pods get scheduled onto the nodes where you intend them to run.
To schedule the pods you do want onto the right nodes, use another Kubernetes feature, affinity and anti-affinity. Confluent for Kubernetes supports the following types of Kubernetes affinity features:
- Node affinity
- Pod affinity and anti-affinity
Node affinity
The node affinity feature allows you to schedule workloads (pods) onto specific nodes, for example, to optimize for various resources, such as storage, CPU, and networking.
With Confluent for Kubernetes, you can create node affinity rules to specify which Kubernetes nodes your Confluent pods are eligible to be scheduled on. For example, if you have special hardware that you want to use to run one or more Confluent pods, use node affinity to pin those pods to the special hardware nodes.
Pod affinity and pod anti-affinity
The pod affinity feature allows you to specify that a pod should be scheduled on the same node as another pod.
The pod anti-affinity feature allows you to specify that a set of pods should be scheduled away from one another on different nodes, availability zones, or several other potential topology domains. For example, you can use pod anti-affinity to ensure that the Kafka brokers do not share nodes with other resource-intensive or critical workloads.
The affinity and anti-affinity rules apply at the namespace level.
Pod topology spread constraints
Using pod topology spread constraints, you can control the distribution of your pods across nodes, zones, regions, or other user-defined topology domains, achieving high availability and efficient cluster resource utilization.
You first label nodes to provide topology information, such as regions, zones, and nodes. Then in Confluent component CRs, you define spread constraints, such as which pods to group together, which topology domains they are spread among, and the acceptable skew.
Pod topology spread constraints only apply to pods within the same namespace.
One replica per node
You can configure CFK to enforce that only one replica runs on each Kubernetes node.
This scheduling method guarantees that Confluent Platform components do not compete for system resources with other applications running on the same node.
The rule applies at the namespace level and only to the replicas from the same cluster.
Confluent recommends that you enable the one replica per node setting for Kafka and ZooKeeper.
For other Confluent Platform components, you can enable the one replica per node setting for availability, but the setting is not required for reliability.
Configure taint and toleration for Confluent components
Configure taint
Taint the nodes using the kubectl taint nodes command as described in Taints and Tolerations.
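For example, the following commands use an illustrative node name (node1) and taint key (dedicated); the first taints the node so that only pods with a matching toleration can be scheduled onto it, and the second removes the taint again:

# Taint node1: only pods that tolerate dedicated=kafka:NoSchedule can be scheduled here.
kubectl taint nodes node1 dedicated=kafka:NoSchedule

# Remove the taint by appending "-" to the effect.
kubectl taint nodes node1 dedicated=kafka:NoSchedule-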
Configure toleration
When a pod is configured with a toleration, the pod "tolerates" the taint that matches the triple <key, value, effect> using the matching <operator>.
When a pod tolerates a node taint, the pod can be scheduled onto the node.
Configure the tolerations property of the Confluent component CRs as follows, and create or update the resource.
spec:
  podTemplate:
    tolerations:
    - effect:            --- [1]
      key:               --- [2]
      operator:          --- [3]
      value:             --- [4]
      tolerationSeconds: --- [5]
[1] Indicates what to do with intolerant pods. Allowed values are NoSchedule, PreferNoSchedule, and NoExecute. When set to an empty value, this toleration matches all taint effects.

[2] The taint key that the toleration applies to. If the key is empty, operator must be set to Exists. This combination means to match all values and all keys.

[3] The match operator to compare the key to the value. Allowed operators are Exists and Equal, and Equal is the default. Exists is equivalent to a wildcard for value, so that a pod can tolerate all taints with the key.

[4] The taint value the toleration matches to. If the operator is Exists, the value must be empty.

[5] The period of time the toleration tolerates the taint. Only applies when effect is set to NoExecute; otherwise, this field is ignored. By default, it is not set, which means tolerate the taint forever (do not evict). Zero and negative values are treated as 0 (evict immediately).
For example, for the following node taint:
kubectl taint nodes node1 myKey=myValue:NoSchedule
The following CR snippet matches the taint created above, and the pod can be scheduled onto the node:
spec:
  podTemplate:
    tolerations:
    - key: "myKey"
      operator: "Equal"
      value: "myValue"
      effect: "NoSchedule"
For more information on the fields, see Taints and tolerations.
Configure node affinity for Confluent components
This section describes how to configure node affinity scheduling rules for Confluent component pods.
Using node affinity, you specify which nodes Confluent component pods can run on, based on the node labels (matchExpressions) or the node fields (matchFields). Each match requirement is a <key, operator, value> triple.
Two types of node affinity rules are supported:
requiredDuringSchedulingIgnoredDuringExecution
A pod must satisfy these rules to be scheduled onto a node.
preferredDuringSchedulingIgnoredDuringExecution
The scheduler prefers to schedule the pods to the nodes that satisfy the specified affinity expression, but it may choose a node that violates one or more of the expressions.
The node with the greatest sum of weights is most preferred.
An empty preferred scheduling term matches all objects with implicit weight 0.
A null preferred scheduling term matches no objects.
Configure node affinity with requiredDuringSchedulingIgnoredDuringExecution
The pod can be scheduled onto a node if one of the nodeSelectorTerms matches. A nodeSelectorTerms item matches if all of its matchExpressions match.

To create node affinity rules that match node labels, configure the nodeAffinity property of the Confluent component CRs as follows and create or update the resource. Set either matchExpressions or matchFields.
spec:
  podTemplate:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:      --- [1]
          - matchExpressions:     --- [2]
            - key:                --- [3]
              operator:           --- [4]
              values:             --- [5]
            matchFields:          --- [6]
            - key:                --- [7]
              operator:           --- [8]
              values:             --- [9]
[1] Required. A list of node selector terms.

[2] A list of node selector requirements by the node's labels. A node selector requirement is a selector that contains values, a key, and an operator that relates the key and values.

[3] Required. The label key that the selector applies to.

[4] Required. Represents how the key is related to the values. Valid operators are In, NotIn, Exists, DoesNotExist, Gt, and Lt.

[5] An array of string values. If the operator is In or NotIn, the values array must be non-empty. If the operator is Exists or DoesNotExist, the values array must be empty. If the operator is Gt or Lt, the values array must have a single element, which will be interpreted as an integer.

[6] A list of node selector requirements by the node's fields. A node selector requirement is a selector that contains values, a key, and an operator that relates the key and values.

[7] Required. The field key that the selector applies to.

[8] Required. Represents how the key is related to the values. Valid operators are In, NotIn, Exists, DoesNotExist, Gt, and Lt.

[9] An array of string values. If the operator is In or NotIn, the values array must be non-empty. If the operator is Exists or DoesNotExist, the values array must be empty. If the operator is Gt or Lt, the values array must have a single element, which will be interpreted as an integer.
An example CR snippet:
spec:
  podTemplate:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: "node-role.kubernetes.io/kafka-connect"
              operator: In
              values: ["true"]
Configure node affinity with preferredDuringSchedulingIgnoredDuringExecution
To create node affinity rules that match node selector expressions or node selector fields, configure the Confluent component CRs as follows and create or update the resource. Set either matchExpressions or matchFields.
spec:
  podTemplate:
    affinity:
      nodeAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - preference:             --- [1]
            matchExpressions:     --- [2]
            - key:                --- [3]
              operator:           --- [4]
              values:             --- [5]
            matchFields:          --- [6]
            - key:                --- [7]
              operator:           --- [8]
              values:             --- [9]
          weight:                 --- [10]
[1] Required. A node selector term, associated with the corresponding weight.

[2] A list of node selector requirements by the node's labels.

[3] Required. The label key that the selector applies to.

[4] Required. Represents how the key is related to the values. Valid operators are In, NotIn, Exists, DoesNotExist, Gt, and Lt.

[5] An array of string values. If the operator is In or NotIn, the values array must be non-empty. If the operator is Exists or DoesNotExist, the values array must be empty. If the operator is Gt or Lt, the values array must have a single element, which will be interpreted as an integer.

[6] A list of node selector requirements by the node's fields.

[7] Required. The field key that the selector applies to.

[8] Required. Represents how the key is related to the values. Valid operators are In, NotIn, Exists, DoesNotExist, Gt, and Lt.

[9] An array of string values. If the operator is In or NotIn, the values array must be non-empty. If the operator is Exists or DoesNotExist, the values array must be empty. If the operator is Gt or Lt, the values array must have a single element, which will be interpreted as an integer.

[10] Required. The weight associated with matching the corresponding nodeSelectorTerm, in the range 1 - 100.
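An example CR snippet, assuming nodes carry an illustrative disktype label; the scheduler prefers, but does not require, nodes labeled disktype=ssd:

spec:
  podTemplate:
    affinity:
      nodeAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - preference:
            matchExpressions:
            - key: disktype        # illustrative node label
              operator: In
              values: ["ssd"]
          weight: 50               # 1 - 100; higher means stronger preference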
Configure pod affinity and anti-affinity for Confluent components
This section describes how to configure pod affinity and pod anti-affinity scheduling rules, for example, whether to co-locate a pod with, or keep it away from, other pods already running in the same node or zone.
Use topologyKey to specify the topology domain in which pods are co-located or kept apart. The label on the nodes on which the selected pods are running must match the topologyKey. topologyKey typically indicates the granularity at which the affinity rule is applied.

For example, if you do not want to run Kafka alongside certain other applications on the same node, set the topologyKey to kubernetes.io/hostname to specify that the Kafka pods should not be scheduled on nodes with the same hostname.

As another example, suppose you set the topologyKey to topology.kubernetes.io/zone for an anti-affinity rule, and a node in zone us-central1-a is running a Kafka pod. The scheduler then looks for a node in a different zone, that is, a node whose topology.kubernetes.io/zone label value is not us-central1-a.
Two types of pod affinity/anti-affinity rules are supported:
requiredDuringSchedulingIgnoredDuringExecution
A pod must satisfy these rules to be scheduled onto a node.
preferredDuringSchedulingIgnoredDuringExecution
The scheduler prefers to schedule the pods to nodes that satisfy the specified affinity expression, but it may choose a node that violates one or more of the expressions.
The node with the greatest sum of weights is most preferred.
Pod affinity and anti-affinity apply within a namespace as pods are namespaced.
Configure pod affinity with preferredDuringSchedulingIgnoredDuringExecution
Configure the affinity property of the Confluent component CRs as follows and create or update the resources.
spec:
  podTemplate:
    affinity:
      podAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - podAffinityTerm:    --- [1]
            namespaces:       --- [2]
            labelSelector:    --- [3]
            topologyKey:      --- [4]
          weight:             --- [5]
[1] Required. A pod affinity term, associated with the corresponding weight.

[2] The namespaces the labelSelector applies to (matches against). A null or empty list means the namespace of this pod.

[3] A query used to find matching pods. See Pod label selector for details.

[4] Required. The label of the cluster node. This pod should be co-located (affinity) or not co-located (anti-affinity) with the pods matching the labelSelector in the specified namespaces. Co-located is defined as running on a node whose value of the label with topologyKey matches that of any node on which any of the selected pods are running. An empty topologyKey is not allowed.

[5] Required. The weight associated with matching the corresponding podAffinityTerm, in the range 1 - 100.
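As a sketch, assuming a Kafka cluster whose pods carry the label app: kafka (see the CFK label table in Pod label selector below), the following prefers to schedule this component's pods in the same zone as the Kafka pods:

spec:
  podTemplate:
    affinity:
      podAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - podAffinityTerm:
            labelSelector:
              matchLabels:
                app: kafka                            # assumes a Kafka cluster named "kafka"
            topologyKey: topology.kubernetes.io/zone  # co-locate at the zone level
          weight: 100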
Configure pod affinity and pod anti-affinity with requiredDuringSchedulingIgnoredDuringExecution
Configure the affinity property of the Confluent component CRs as follows and create or update the resources.
spec:
  podTemplate:
    affinity:
      podAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:    --- [1]
          namespaces:       --- [2]
          topologyKey:      --- [3]
[1] A query used to find matching pods. See Pod label selector for details.

[2] The namespaces the labelSelector applies to (matches against). A null or empty list means the namespace of this pod.

[3] Required. The label of the cluster node. This pod should be co-located (affinity) or not co-located (anti-affinity) with the pods matching the labelSelector in the specified namespaces. An empty topologyKey is not allowed.
Pod label selector
Use labelSelector in pod affinity, pod anti-affinity, and pod topology spread constraints to find the matching pods that the constraints apply to.
labelSelector:
  matchExpressions:   --- [1]
  - key:              --- [2]
    operator:         --- [3]
    values:           --- [4]
  matchLabels:        --- [5]
[1] A list of label selector requirements. The requirements are ANDed. A selector contains values, a key, and an operator that relates the key and values.

[2] Required. The label key that the selector applies to.

[3] Required. Represents how the key is related to a set of values. Valid operators are In, NotIn, Exists, and DoesNotExist.

[4] An array of string values. If the operator is In or NotIn, the values array must be non-empty. If the operator is Exists or DoesNotExist, the values array must be empty.

[5] A map of {key, value} pairs. The requirements are ANDed.
CFK adds the following labels that you can use in labelSelector:

| Label key | Label value |
| --- | --- |
| app | <cluster name> |
| clusterId | operator |
| confluent-platform | "true" |
| platform.confluent.io/type | <CR type> |
| type | <cluster name> |
An example CR snippet that prevents two Connect apps from being scheduled onto the same host:
spec:
  podTemplate:
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - connect
          topologyKey: "kubernetes.io/hostname"
Configure pod topology spread constraints
topologySpreadConstraints describes how matching Confluent pods should be scheduled across topology domains. All topologySpreadConstraints are ANDed.

Specify the following in the Confluent component CRs that you want to apply pod topology spread constraints to:
spec:
  podTemplate:
    topologySpreadConstraints:
    - labelSelector:      --- [1]
      maxSkew:            --- [2]
      topologyKey:        --- [3]
      whenUnsatisfiable:  --- [4]
[1] Used to find matching pods. Pods that match this label selector are counted to determine the number of pods in their corresponding topology domain. See Pod label selector for details.

[2] Required. Describes the degree to which pods may be unevenly distributed. The default value is 1, which gives the most even distribution. 0 is not allowed.

[3] Required. The key of node labels. Nodes that have a label with this key and identical values are considered to be in the same topology domain.

[4] Required. Indicates how to deal with a pod if it doesn't satisfy the spread constraint. Allowed values are DoNotSchedule (default) and ScheduleAnyway.
The following example specifies to evenly balance Kafka pods across all worker nodes:
spec:
  podTemplate:
    topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: ScheduleAnyway
      labelSelector:
        matchLabels:
          type: kafka-pods
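Similarly, the following sketch spreads the same Kafka pods across availability zones instead of hosts, assuming the nodes carry the standard topology.kubernetes.io/zone label; with whenUnsatisfiable: DoNotSchedule, a pod stays pending rather than violating the skew:

spec:
  podTemplate:
    topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone  # spread across zones
      whenUnsatisfiable: DoNotSchedule          # keep the pod pending instead of violating the skew
      labelSelector:
        matchLabels:
          type: kafka-pods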
Configure one replica per node
The oneReplicaPerNode property enforces running one pod per Kubernetes node through the pod anti-affinity capability, assigning a dedicated node to the pod workload. oneReplicaPerNode is disabled by default.

When this property is enabled (oneReplicaPerNode: true), the pod anti-affinities are disabled.

When you change this property in an existing cluster, the cluster rolls.

Configure the oneReplicaPerNode property in the Confluent CRs as follows and create or update the resources.
spec:
  oneReplicaPerNode: # Set to true or false.
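For example, a minimal sketch of a Kafka CR with the setting enabled, assuming a cluster named kafka in the confluent namespace (other required fields omitted):

apiVersion: platform.confluent.io/v1beta1
kind: Kafka
metadata:
  name: kafka
  namespace: confluent
spec:
  replicas: 3
  oneReplicaPerNode: true   # each of the 3 broker pods gets its own node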