Control Center Alerts for Confluent Platform

Control Center enables you to detect anomalous events in the monitoring data and configure alerts that fire when those events are detected. For example, you can detect when a cluster goes down and have an email sent.

Note

When RBAC is enabled in Control Center, there are some nuances to trigger, action, and alert access. For details, see About alerts access.

To learn how to access the Alerts page, see Manage Control Center Alerts and History for Confluent Platform.

Triggers and actions

An alert consists of a trigger and one or more actions.

The component types you can choose for a trigger, and the metrics you can define for it, depend on whether you are running Control Center in Normal mode or Reduced infrastructure mode. When you create a trigger, incompatible component types and metrics are filtered out.

Trigger component types that are only compatible with Normal mode:

  • Topics
  • Brokers

Trigger component types compatible with both Normal mode and Reduced infrastructure mode:

  • Consumer groups
  • Clusters
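The mode filtering described above amounts to a lookup from each component type to the modes that support it. The following Python sketch only illustrates that rule and is not Control Center code; `Mode`, `COMPATIBLE_MODES`, and `selectable_component_types` are hypothetical names, and metric filtering is left out.

```python
from enum import Enum
from typing import List


class Mode(Enum):
    # Hypothetical labels for the two Control Center modes.
    NORMAL = "normal"
    REDUCED = "reduced"


# Hypothetical table built from the lists above: the modes that support
# each trigger component type.
COMPATIBLE_MODES = {
    "Topics": {Mode.NORMAL},
    "Brokers": {Mode.NORMAL},
    "Consumer groups": {Mode.NORMAL, Mode.REDUCED},
    "Clusters": {Mode.NORMAL, Mode.REDUCED},
}


def selectable_component_types(mode: Mode) -> List[str]:
    """Return the component types offered when creating a trigger in `mode`."""
    return [ctype for ctype, modes in COMPATIBLE_MODES.items() if mode in modes]


print(selectable_component_types(Mode.REDUCED))  # ['Consumer groups', 'Clusters']
```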

Each trigger is based on a metric, a condition, and a value; together these criteria determine when the trigger fires. For more about triggers, see Manage Control Center Alert Triggers for Confluent Platform.

When the criteria are met, the actions associated with the trigger are executed.
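To make the metric, condition, and value structure concrete, here is a minimal Python sketch of evaluating a single trigger. It assumes comparison-style conditions (the names in `CONDITIONS` are illustrative), and `Trigger`, `should_fire`, and the `consumer_lag` metric name are hypothetical; this is not a Control Center API.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict

# Assumed condition names, each mapped to a comparison of (observed, threshold).
CONDITIONS: Dict[str, Callable[[float, float], bool]] = {
    "greater than": operator.gt,
    "less than": operator.lt,
    "equal to": operator.eq,
    "not equal to": operator.ne,
}


@dataclass
class Trigger:
    name: str
    metric: str     # hypothetical metric name
    condition: str  # key into CONDITIONS
    value: float    # threshold the observed metric is compared against

    def should_fire(self, observed: float) -> bool:
        """Return True when the observed metric value meets the criteria."""
        return CONDITIONS[self.condition](observed, self.value)


lag_trigger = Trigger("High lag", "consumer_lag", "greater than", 1000)
print(lag_trigger.should_fire(1500))  # True
print(lag_trigger.should_fire(200))   # False
```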

Supported actions for a trigger include:

  • Sending an email notification to one or more accounts
  • Sending a Slack webhook notification
  • Sending a PagerDuty webhook notification that creates an incident ticket

A trigger can be associated with any number of defined actions. For more about actions, see Manage Control Center Alert Actions for Confluent Platform.

To learn how to configure some example triggers and actions, see Control Center Alerts Usage Example for Confluent Platform.

When a trigger fires, it executes all of its associated enabled actions whose Max send rate has not been exceeded. If an action's Max send rate has been exceeded, the trigger event is added to a queue for that action and is included the next time the action is executed. As a result, a single action notification can report a set of trigger events, not just one.

Note

Queuing does not occur while alerts (actions) are paused, because triggers are ignored for the duration of the pause.

The maximum number of triggered events per alert (default: 1000) is controlled by the confluent.controlcenter.max.trigger.events.per.alert.config option.
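The send-rate and queuing rules above can be modeled as follows. This Python sketch assumes that Max send rate can be simplified to a minimum interval between sends (`min_interval_s`), and that the per-alert event cap drops the oldest queued events once reached; both are assumptions for illustration only. `Action`, `fire_trigger`, and `MAX_TRIGGER_EVENTS_PER_ALERT` are hypothetical names; only the default of 1000 comes from the text.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional

# Mirrors the documented default for the maximum triggered events per alert.
MAX_TRIGGER_EVENTS_PER_ALERT = 1000


@dataclass
class Action:
    name: str
    # Hypothetical simplification of "Max send rate": minimum seconds between sends.
    min_interval_s: float
    enabled: bool = True
    last_sent_at: Optional[float] = None
    # Trigger events waiting for the next send. Assumption: once the cap is
    # reached, the oldest queued events are dropped.
    pending: deque = field(
        default_factory=lambda: deque(maxlen=MAX_TRIGGER_EVENTS_PER_ALERT))
    # Batches "delivered" so far (stands in for email, Slack, or PagerDuty).
    sent: list = field(default_factory=list)

    def handle(self, event: str, now: float) -> None:
        self.pending.append(event)
        rate_ok = (self.last_sent_at is None
                   or now - self.last_sent_at >= self.min_interval_s)
        if rate_ok:
            # One notification reports every queued trigger event at once.
            self.sent.append(list(self.pending))
            self.pending.clear()
            self.last_sent_at = now
        # Otherwise the event stays queued until the next allowed send.


def fire_trigger(event: str, actions: list, now: float,
                 alerts_paused: bool = False) -> None:
    if alerts_paused:
        return  # Paused: the trigger is ignored and nothing is queued.
    for action in actions:
        if action.enabled:
            action.handle(event, now)


email = Action("email", min_interval_s=60)
fire_trigger("lag high @t=0", [email], now=0)
fire_trigger("lag high @t=10", [email], now=10)  # rate exceeded: queued
fire_trigger("lag high @t=20", [email], now=20, alerts_paused=True)  # ignored
fire_trigger("lag high @t=70", [email], now=70)  # flushes the queue
print(email.sent)  # [['lag high @t=0'], ['lag high @t=10', 'lag high @t=70']]
```

In this run, the first event is sent immediately, the second is queued because the send rate is exhausted, the third is discarded because alerts are paused, and the fourth flushes the queue so that one notification reports two trigger events.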

Detection of anomalous events (the triggering criteria) is decoupled from the alert actions taken when a triggering event occurs. Triggers and actions are defined independently, so you can freely choose one or more actions to perform when a trigger fires.
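The separation between triggers and actions can be sketched as two independent registries plus alerts that bind them. In this hypothetical Python model (`TRIGGERS`, `ACTIONS`, `Alert`, and `actions_for` are illustrative names, not Control Center APIs), an action such as a Slack notification is defined once and reused by several alerts, and an alert may only reference triggers and actions that already exist.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Triggers and actions are defined independently; neither refers to the other.
TRIGGERS = {"cluster-down": "Cluster down", "high-lag": "Consumer lag too high"}
ACTIONS = {"email-ops": "Email the ops team", "slack-oncall": "Slack webhook"}


@dataclass(frozen=True)
class Alert:
    """Binds one trigger to one or more actions."""
    trigger_id: str
    action_ids: Tuple[str, ...]

    def __post_init__(self) -> None:
        # An alert may only reference triggers and actions that already exist.
        if self.trigger_id not in TRIGGERS:
            raise KeyError(f"unknown trigger: {self.trigger_id}")
        missing = [a for a in self.action_ids if a not in ACTIONS]
        if missing:
            raise KeyError(f"unknown actions: {missing}")


# The same Slack action is reused by both alerts.
ALERTS = [
    Alert("cluster-down", ("email-ops", "slack-oncall")),
    Alert("high-lag", ("slack-oncall",)),
]


def actions_for(trigger_id: str) -> List[str]:
    """Collect every action bound to a trigger across all alerts."""
    return [a for alert in ALERTS if alert.trigger_id == trigger_id
            for a in alert.action_ids]


print(actions_for("cluster-down"))  # ['email-ops', 'slack-oncall']
```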

Each time Control Center receives interceptor data, the metric values (such as consumption difference and latency) for the corresponding time windows are updated to reflect the new data. All newly updated metric values are then checked against every configured trigger to determine whether it should fire.

Note

Interceptors can report data for any point in time, so alerting works across all time windows, not only those near real time.
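The update-then-check cycle described above can be sketched in Python as follows. The fixed 15-second window, the single summed metric, and the `low throughput` trigger are hypothetical simplifications; Control Center's actual windowing and metrics are not modeled. The second batch includes a late record for an old window, which is still re-evaluated.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

WINDOW_S = 15  # hypothetical window width, in seconds

# Running total of one metric per window, keyed by window start time.
window_totals: Dict[int, int] = defaultdict(int)


def ingest(records: List[Tuple[int, int]],
           triggers: Dict[str, Callable[[int], bool]]) -> List[Tuple[str, int]]:
    """Fold (timestamp, count) records into windows, then check triggers.

    Only windows touched by this batch are re-evaluated, but a record may
    land in any window, including one far in the past.
    """
    updated = set()
    for ts, count in records:
        start = ts - ts % WINDOW_S
        window_totals[start] += count
        updated.add(start)
    fired = []
    for start in sorted(updated):
        for name, predicate in triggers.items():
            if predicate(window_totals[start]):
                fired.append((name, start))
    return fired


triggers = {"low throughput": lambda total: total < 10}
first = ingest([(100, 50)], triggers)          # window 90 now totals 50
second = ingest([(3, 4), (100, 5)], triggers)  # late data for window 0
print(first, second)  # [] [('low throughput', 0)]
```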

Buffer for consumer group triggers (deprecated)

Tip

The buffer feature for this trigger has been deprecated. It will be removed from Control Center in a future release. Do not rely on the buffer value.

Triggers for consumer groups have an associated buffer value. The buffer lets you require an alertable state to persist for a configurable period of time, which prevents a consumer group trigger from firing prematurely.
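Although the buffer is deprecated, the idea is easy to illustrate: a trigger fires only after its alertable state has held continuously for the buffer period. The following Python sketch is a hypothetical model of that behavior (the `BufferedTrigger` class and the rule that recovery resets the clock are assumptions), not a description of how Control Center implements it.

```python
from typing import Optional


class BufferedTrigger:
    """Fire only after the alertable state has held for `buffer_s` seconds."""

    def __init__(self, buffer_s: float) -> None:
        self.buffer_s = buffer_s
        self.alertable_since: Optional[float] = None

    def observe(self, now: float, alertable: bool) -> bool:
        if not alertable:
            # Assumption: any recovery resets the buffer clock.
            self.alertable_since = None
            return False
        if self.alertable_since is None:
            self.alertable_since = now
        return now - self.alertable_since >= self.buffer_s


t = BufferedTrigger(buffer_s=60)
results = [
    t.observe(0, True),     # alertable, but buffer not yet elapsed
    t.observe(30, True),
    t.observe(45, False),   # state cleared: clock resets
    t.observe(50, True),    # clock restarts at 50
    t.observe(110, True),   # held for 60 s: fires
]
print(results)  # [False, False, False, False, True]
```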