.. _controlcenter_userguide_alerts:

Alerts
======

|c3-short| provides functionality for detecting anomalous events in your monitoring data and performing
actions when those events occur.

.. _concepts:

Concepts
--------

Triggers can be defined for topics, brokers, consumer groups, and clusters. Each trigger is based on a metric,
a condition, and a value that together determine when the trigger fires. When these criteria are met, any actions
associated with the trigger are executed.

Detection of anomalous events (*triggering* criteria) is decoupled from the alert *actions* that should be taken when
a triggering event occurs. This means that triggers and actions are defined independently, which provides flexibility
when setting one or more actions to perform when a trigger fires.

Each time interceptor data is received by |c3-short|, metric values (consumption difference and
latency) of the corresponding time windows are updated to reflect the new data. All newly updated
metric values are then checked against all configured triggers to determine whether a trigger should fire.

.. note::

  Interceptors can report data for any point in time, so alerting works across all time
  windows, not just those near real time.

A trigger can be associated with any number of defined actions. When a trigger fires, each associated
action is executed unless it has exceeded its *max send rate*. If an action has exceeded its max send rate,
the trigger event is added to a queue for that action and is included the next time the action is
executed (a single action execution can report a set of trigger events, not just one).
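
The queuing behavior described above can be modeled with a small, hypothetical Python sketch. It assumes
the max send rate is enforced as a sliding window of ``max_sends`` executions per ``window_seconds``, and it
only flushes queued events when a later trigger fires. The class name, its fields, and the sliding-window rule
are illustrative assumptions, not |c3-short| internals.

.. sourcecode:: python

   from collections import deque
   from dataclasses import dataclass, field
   from typing import Deque, List, Optional


   @dataclass
   class ThrottledAction:
       """Hypothetical model of an alert action with a max send rate.

       Assumption: at most ``max_sends`` executions are allowed within any
       ``window_seconds`` period (a sliding window).
       """

       max_sends: int
       window_seconds: float
       sent_at: Deque[float] = field(default_factory=deque)  # past execution times
       pending: List[str] = field(default_factory=list)  # events not yet reported

       def on_trigger(self, event: str, now: float) -> Optional[List[str]]:
           """Handle one trigger event; return the reported batch, or None if queued."""
           # Drop executions that are older than the rate window.
           while self.sent_at and now - self.sent_at[0] >= self.window_seconds:
               self.sent_at.popleft()
           self.pending.append(event)
           if len(self.sent_at) >= self.max_sends:
               return None  # max send rate exceeded: keep the event queued
           # Rate not exceeded: execute once and report every queued event.
           batch, self.pending = self.pending, []
           self.sent_at.append(now)
           return batch


   if __name__ == "__main__":
       action = ThrottledAction(max_sends=1, window_seconds=3600)  # "1 Per hour"
       print(action.on_trigger("lag on cg-a", now=0))     # reported immediately
       print(action.on_trigger("lag on cg-b", now=60))    # queued, returns None
       print(action.on_trigger("lag on cg-a", now=3700))  # reports both queued events

In this model, the third call reports the queued ``lag on cg-b`` event together with the new one, which
mirrors how a single action execution can report a set of trigger events.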

.. _buffer_cg_triggers:

Buffer for consumer group triggers
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Because of normal lag in the system, time windows close to real time frequently have
metric values that would be cause for concern if the time window were further behind
real time. For this reason, triggers for consumer groups have an associated *buffer* value. The buffer
requires an alertable state to persist for a configurable period of time, which prevents a consumer
group trigger from firing prematurely.

A triggered event for a time window that is within *buffer* seconds of real time is not immediately registered
against actions. When the time window moves more than *buffer* seconds behind real time, any associated
metric value that would still cause the trigger to fire is then registered against the appropriate actions.
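
As a rough illustration, the following hypothetical Python sketch applies a *Greater than* consumer group
trigger with a buffer. It assumes each time window is identified by its start time in seconds and that windows
still inside the buffer are simply re-checked on a later evaluation. The names ``WindowValue`` and
``windows_to_register`` are illustrative, not part of |c3-short|.

.. sourcecode:: python

   from dataclasses import dataclass
   from typing import List


   @dataclass
   class WindowValue:
       window_start: float  # start of the time window, in seconds since the epoch
       value: float  # metric value for the window, such as consumption difference


   def windows_to_register(windows: List[WindowValue], threshold: float,
                           buffer_seconds: float, now: float) -> List[WindowValue]:
       """Return windows that breach ``threshold`` and are older than the buffer.

       Windows within ``buffer_seconds`` of ``now`` are skipped; a later call
       re-checks them once they have fallen far enough behind real time.
       """
       return [
           w for w in windows
           if now - w.window_start > buffer_seconds and w.value > threshold
       ]


   if __name__ == "__main__":
       windows = [WindowValue(0, 500), WindowValue(100, 500), WindowValue(110, 5)]
       # At t=130, only the window that started at t=0 is older than the 60 s buffer.
       print([w.window_start for w in windows_to_register(windows, 100, 60, now=130)])
       # At t=200, the window that started at t=100 has also aged past the buffer.
       print([w.window_start for w in windows_to_register(windows, 100, 60, now=200)])

The window that started at t=110 is never registered, because its value stays below the threshold even
after it leaves the buffer.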

.. note:: Setting a condition for the buffer in seconds is only applicable
          to :ref:`consumer group triggers<consumergroup_triggers>`.

.. _alerts_overview:

Overview page
-------------

To access the Alerts Overview page, from the navigation menu, click **Alerts -> Overview**.

The **Overview** page has the following main sections, accessible by clicking the tabs:

- **HISTORY** - :ref:`Historical alert log<alert_history>`
- **TRIGGERS** - :ref:`trigger_mgmnt`
- **ACTIONS** - :ref:`actions_mgmnt`

.. figure:: images/c3alertssubmenus.png
   :scale: 50%


.. _trigger_mgmnt:

Trigger Management
------------------

Clicking on the **Triggers** tab shows a summary of all configured triggers:

.. figure:: images/c3alertstriggersoverview.png
   :scale: 50%

The page is blank until you define at least one trigger.

You can edit or delete an existing trigger using the edit or delete links in this table, or create a new trigger
using the **+ New trigger** button.

You can also initiate creating a new trigger by clicking on a consumer group delivery or latency
monitoring chart progress indicator, and clicking the **Set up an alert** button. This pre-populates the new trigger form with
relevant information.

.. figure:: images/c3alertsprepopulate.png


.. _trigger_form:

New/Edit Trigger Form
^^^^^^^^^^^^^^^^^^^^^

Complete the New Trigger form to define the criteria that activate the associated alert actions. If you opened the form
by clicking **Set up an alert** from a context menu, some of the fields are already populated for you.

The following types of triggers can be created:

- :ref:`Topic<topic_triggers>`
- :ref:`Consumer Groups<consumergroup_triggers>`
- :ref:`Cluster<cluster_triggers>`
- :ref:`Broker<broker_triggers>`

.. _topic_triggers:

^^^^^^^^^^^^^^
Topic Triggers
^^^^^^^^^^^^^^

.. figure:: images/c3alertstopictrigger.png
    :scale: 50%

Trigger name
  A unique name used to identify the trigger (for example: *topic name* production requests).
  Uniqueness is not enforced, but you should use different names to avoid confusion.

Component type
  Should be pre-selected as **Topic** if you clicked **Set up an alert** from the **System Health > Topics** tab. Otherwise,
  select **Topic**.

Cluster id
  A topic trigger is limited to a specific cluster ID. To monitor a topic in multiple clusters,
  create a separate trigger for each cluster.

Condition
  The rule for matching topic names against the *Topic name* field (below). A topic matches when its name
  **Equals**, **Begins with**, **Ends with**, or **Contains** the specified value.

  .. note::

    For example, selecting **Contains** and then entering 'topic' into the value
    field will match 'my topic', 'topical', and 'topics with data'. If **Begins with** is selected, the trigger will only match
    'topical' and 'topics with data', not 'my topic'.

Topic name
  The name or part of a topic name to be triggered against. Works in conjunction with *Condition* in order to match against
  one or many topics.

  .. note::

    If multiple topics match the topic name, the trigger is evaluated *per topic*, not in aggregate.
    For example, if two topics **Begin with** 'topic' and the trigger is set to **Bytes in** *greater than 100*,
    each topic fires the trigger independently when its own **Bytes in** exceeds 100.

  .. warning::

     A message appears when more than five topics match the criteria. Narrow the criteria when you
     see this message.


Metric
  The metric to check for the trigger alert. Possible values are:

  Bytes in
    Number of bytes per second coming into a topic.
  Bytes out
    Amount of bytes per second going out from a topic (does not account for internal replication traffic).

  .. note::

    Prior to |ak-tm| 0.11.0.0, ``BytesOutPerSec`` accounted for both consumer traffic
    and internal replication traffic. It now accounts only for consumer traffic for this topic.
    Adjust existing alerts accordingly.


  Out of sync replica count
    .. include:: includes/topicInSyncReplica.rst
  Production request count
    Number of produce requests per second to a topic in a cluster.
  Under replicated topic partitions
    Number of under-replicated topic partitions. For example, use this metric to find out whether
    a Kafka broker that holds a specific topic partition has crashed.

Condition
  The comparison between the value of the monitored metric and the *Value* field. The trigger fires
  when the comparison is true. Possible values are **Greater than**, **Less than**, **Equal to**,
  or **Not equal to**.

Value
  The value to which the topic *Metric* is compared.
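
The name matching and per-topic evaluation described above can be sketched in Python. This is a hypothetical
model of the form's semantics, assuming the latest metric value for each topic is already available in a
dictionary. ``firing_topics`` and the two lookup tables are illustrative names, not a |c3-short| API.

.. sourcecode:: python

   import operator
   from typing import Callable, Dict, List

   # Hypothetical mapping of the topic-name Condition choices to predicates.
   NAME_MATCHERS: Dict[str, Callable[[str, str], bool]] = {
       "Equals": lambda name, value: name == value,
       "Begins with": lambda name, value: name.startswith(value),
       "Ends with": lambda name, value: name.endswith(value),
       "Contains": lambda name, value: value in name,
   }

   # Hypothetical mapping of the metric Condition choices to comparisons.
   METRIC_CONDITIONS: Dict[str, Callable[[float, float], bool]] = {
       "Greater than": operator.gt,
       "Less than": operator.lt,
       "Equal to": operator.eq,
       "Not equal to": operator.ne,
   }


   def firing_topics(metric_by_topic: Dict[str, float], name_condition: str,
                     topic_name: str, metric_condition: str,
                     value: float) -> List[str]:
       """Return every matching topic whose own metric satisfies the condition.

       Each topic is compared separately (per topic), never as a sum.
       """
       matches = NAME_MATCHERS[name_condition]
       compare = METRIC_CONDITIONS[metric_condition]
       return [
           topic for topic, metric in metric_by_topic.items()
           if matches(topic, topic_name) and compare(metric, value)
       ]


   if __name__ == "__main__":
       bytes_in = {"my topic": 500.0, "topical": 150.0, "topics with data": 80.0}
       print(firing_topics(bytes_in, "Begins with", "topic", "Greater than", 100))
       print(firing_topics(bytes_in, "Contains", "topic", "Greater than", 100))

With **Begins with**, 'my topic' is excluded by the name rule and 'topics with data' by the metric rule, so
only 'topical' fires. With **Contains**, 'my topic' and 'topical' each fire on their own.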

.. _consumergroup_triggers:

^^^^^^^^^^^^^^^^^^^^^^^
Consumer Group Triggers
^^^^^^^^^^^^^^^^^^^^^^^

.. figure:: images/c3alertsedittrigger.png
    :scale: 50%


Trigger name
    A unique name used to identify the trigger (for example: *consumer group name* under consumption).
    Uniqueness is not enforced, but you should use different names to avoid confusion.

Component type
  Should be set to **Consumer group**. For other component types, see the corresponding trigger sections on this page.

Consumer group name
    The name of the consumer group to monitor for anomalies.

Metric
    The metric to monitor. One of **maximum latency (ms)**, **average latency (ms)** or **consumption difference**.

Condition
    The comparison between the value of the monitored metric and the *Value* field. The trigger fires
    when the comparison is true. Possible values are **Greater than**, **Less than**, **Equal to**,
    or **Not equal to**.

Value
  The value to which the monitored consumer group *Metric* is compared.

Buffer
    The number of seconds a time window must fall behind real time before it is considered for triggering
    (refer to :ref:`buffer_cg_triggers` for more information).

.. _cluster_triggers:

^^^^^^^^^^^^^^^^
Cluster Triggers
^^^^^^^^^^^^^^^^

.. figure:: images/c3alertscluster.png
    :scale: 50%


Trigger name
    A unique name that identifies the trigger (for example: Cluster |zk| down).
    Uniqueness is not enforced, but you should use different names to avoid confusion.

Clusters
  One or more clusters to evaluate against the trigger conditions.

  .. note:: If multiple clusters are selected, the trigger is generally evaluated *per cluster*, not in aggregate.
            See the *Metric* descriptions below for details.

Metric
  The metrics below are evaluated on a cluster-wide basis.

  .. important:: Any **cluster** that meets the *Condition* below triggers an associated action.

  Under replicated topic partitions
    .. include:: includes/brokerClusterUnderReplicated.rst

    A trigger should be created for values ``> 0``.

  Offline topic partitions
    .. include:: includes/brokerClusterOfflineTopicPartitions.rst

    A trigger should be created for values ``> 0``.

  |zk| status
    Whether brokers can connect to |zk|. Possible values are **Online** and **Offline**.

  |zk| expiration rate
    .. include:: includes/brokerClusterZooKeeperExpires.rst

  Active controller count
    .. include:: includes/brokerClusterActiveController.rst

    A trigger should be created for values ``!= 1``.

  Leader election rate
    .. include:: includes/brokerClusterLeaderElection.rst

  Unclean election count
    .. include:: includes/brokerClusterUncleanCount.rst

    A trigger should be created for values ``!= 0``.

Condition
    The comparison between the value of the monitored metric and the *Value* field. The trigger fires
    when the comparison is true. Possible values are **Greater than**, **Less than**, **Equal to**,
    **Not equal to**, **Online**, or **Offline**, depending on the *Metric* selected.

Value
  The value to which the cluster *Metric* is compared.
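
The recommended thresholds in the *Metric* descriptions above can be collected into a small Python sketch that
checks one cluster at a time. The metric keys and the ``breached_triggers`` function are hypothetical names
chosen for illustration; in |c3-short| you configure these thresholds through the trigger form, not in code.

.. sourcecode:: python

   import operator
   from typing import Callable, Dict, List, Tuple

   # Recommended thresholds from the metric descriptions above. The keys are
   # hypothetical metric names used only for this illustration.
   RECOMMENDED_CLUSTER_TRIGGERS: Dict[str, Tuple[Callable[[float, float], bool], float]] = {
       "under_replicated_topic_partitions": (operator.gt, 0),  # > 0
       "offline_topic_partitions": (operator.gt, 0),  # > 0
       "active_controller_count": (operator.ne, 1),  # != 1
       "unclean_election_count": (operator.ne, 0),  # != 0
   }


   def breached_triggers(cluster_metrics: Dict[str, float]) -> List[str]:
       """Return the recommended triggers that would fire for a single cluster."""
       return [
           name
           for name, (compare, threshold) in RECOMMENDED_CLUSTER_TRIGGERS.items()
           if name in cluster_metrics and compare(cluster_metrics[name], threshold)
       ]


   if __name__ == "__main__":
       clusters = {
           "cluster-a": {"under_replicated_topic_partitions": 0, "offline_topic_partitions": 0,
                         "active_controller_count": 1, "unclean_election_count": 0},
           "cluster-b": {"under_replicated_topic_partitions": 3, "offline_topic_partitions": 0,
                         "active_controller_count": 0, "unclean_election_count": 0},
       }
       # Each cluster is checked on its own, mirroring per-cluster evaluation.
       for cluster_id, metrics in clusters.items():
           print(cluster_id, breached_triggers(metrics))

Here ``cluster-a`` breaches nothing, while ``cluster-b`` breaches the under-replicated partitions and active
controller count thresholds.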

.. _broker_triggers:

^^^^^^^^^^^^^^^
Broker Triggers
^^^^^^^^^^^^^^^

.. figure:: images/c3alertsbroker.png
    :scale: 50%


Trigger name
    A unique name used to identify the trigger (for example: Broker fetch request latency).
    Uniqueness is not enforced, but you should use different names to avoid confusion.

Broker clusters
  One or more clusters whose individual brokers are evaluated against the trigger conditions.

  .. note::

    If multiple clusters are selected, the trigger is generally evaluated *per cluster*, not in aggregate.
    See the *Metric* descriptions below for details.

Metric
  The metrics below are evaluated on a per-broker basis.

  .. important:: Any **broker** that meets the *Condition* below fires the trigger independently.

  Bytes in
    Number of bytes per second produced to a broker.
  Bytes out
    Number of bytes per second fetched from a broker (does not account for internal replication traffic).

  .. note::

    Prior to Kafka 0.11.0.0, ``BytesOutPerSec`` accounted for both consumer traffic
    and internal replication traffic. It now accounts only for consumer traffic for this broker.
    Adjust existing alerts accordingly.

  Production request latency
    .. include:: includes/brokerProductionRequestLatency.rst
  Production request count
    Total number of produce requests to a broker (requests per minute).
  Fetch request latency
    .. include:: includes/brokerFetchRequestLatency.rst

Condition
    The comparison between the value of the monitored metric and the *Value* field. The trigger fires
    when the comparison is true. Possible values are **Greater than**, **Less than**, **Equal to**,
    **Not equal to**, **Online**, or **Offline**, depending on the *Metric* selected.

Value
  The value to which the broker *Metric* is compared.

.. _actions_mgmnt:

Actions Management
------------------

After you create a :ref:`trigger<trigger_form>`, you can go to the action management page to associate
it with one or more existing actions or, if none exist, to create a new action.

.. figure:: images/c3-trigger-saved.png
   :scale: 50%

Before you can send email actions, you must enable :ref:`email_settings` and configure |c3-short| to
communicate with your SMTP server.

At a minimum, set the following properties:

.. codewithvars:: bash

   # Enable sending email from Control Center
   confluent.controlcenter.mail.enabled=true
   # Host name of your SMTP server
   confluent.controlcenter.mail.host.name=mymail.server
   # Port that your SMTP server listens on
   confluent.controlcenter.mail.port=25
   # Confluent recommends setting rest.listeners explicitly, because it determines
   # the Control Center link that is embedded in the body of alert emails
   confluent.controlcenter.rest.listeners=http://control-center.server:9021

Clicking on the **Actions** tab shows a summary of all configured actions:

.. figure:: images/c3alertsactionsoverview.png
   :scale: 50%

You can edit or delete an existing action using the edit or delete links, or create a new action
using the **+ New action** button.

.. _new_edit_action_form:

New or Edit Action Form
^^^^^^^^^^^^^^^^^^^^^^^

Complete the Action form to specify the action to take when any of the triggers associated with the
action fires.

.. figure:: images/c3alertseditaction.png
   :scale: 50%


A description of each field follows (all fields are required):

Action name
    A unique name for the action (for example: email DevOps on call). Uniqueness is not enforced, but you should
    use different names to avoid confusion.

Enabled/Disabled
    Whether the action is currently enabled. Use this field to temporarily disable an action without deleting it.

Triggers
    One or more triggers that will cause the action to execute. Refer to :ref:`concepts` for more information.

Action
    The type of action to perform. Currently, the only available action is **Send email**.

Recipient email address
    The email address or addresses associated with this action. A message is sent to the specified email address each time the
    action is executed. Separate multiple email addresses with a comma.

Subject
    The subject line of the email associated with the action.

Max send rate
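    The maximum rate at which the action is executed, expressed as a count and a frequency: **Per hour** (default),
    **Per minute**, **Per 4 hours**, **Per 8 hours**, or **Per day**. For example, enter 1 and select **Per day** to send
    the alert at most once daily. Trigger events that occur after the rate is exceeded are queued and reported the next
    time the action is executed. Refer to :ref:`concepts` for more information.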


|c3-short| offline status
-------------------------

A red banner appears at the top of every page when |c3-short| goes offline. This happens when the Kafka cluster that |c3-short| uses is offline or unreachable.

.. figure:: images/c3clusterdown.png


Create alerts for |c3-short| cluster offline status
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To send an email alert to recipients when the |c3-short| cluster goes offline, use either of the following methods:

#. Add the following lines to your properties file (``<path-to-confluent>/etc/confluent-control-center/control-center.properties``).

   .. sourcecode:: bash

      # Automatically create Control Center cluster down trigger and action pair during start up
      confluent.controlcenter.alert.cluster.down.autocreate=true
      confluent.controlcenter.alert.cluster.down.to.email=emailOnCall@example.com,emailDevOps@example.com
      confluent.controlcenter.alert.cluster.down.send.rate=12

   .. note::

    You can edit the auto-created trigger and actions in the Alerts UI.

#. Create the trigger and action manually by using the :ref:`trigger UI<trigger_form>` and the :ref:`action UI<new_edit_action_form>`.

.. _alert_history:

Alerts History
--------------

To view the alert history, from the navigation menu, click **Alerts -> Overview**. The **History** tab
displays by default. The page is blank until there is trigger history to show:

.. figure:: images/c3-no-alerts-history-yet.png
   :scale: 50%

After actions have been triggered, the History page shows a summary of triggers that caused
an action to be executed. The alert history does not list every triggered event:

- Alerts triggered by consumer lag, cluster status, or broker status events do
  *not* populate history. Alert emails for these events do *not* contain a link to the alerts history.

- Only alerts triggered by topic status events (streams topology) populate history. Alert
  emails for these events contain a link to the alerts history, based on the address
  configured in ``confluent.controlcenter.rest.listeners``.

.. comment out until alerts history is accumulated and can be shown

.. You can see contextual information for some items by clicking the view link.  

.. _integration_alerts:

Integration (REST API) page
---------------------------

To access the Alerts Integration page, from the navigation menu, click **Alerts -> Integration**.

The **REST API** page provides details of the alerts REST endpoint that can be used to programmatically
obtain historical alert information.

.. figure:: images/c3alertsintegration.png
   :scale: 50%

See :ref:`alerts-integration-rest-api`.

.. _alerts-integration-rest-api:

REST API
^^^^^^^^

.. http:get:: /2.0/alerts/history

    Get the most recent alerts.

    :query int limit: The maximum number of records to return
    :query long ts: The most recent alert to return (in milliseconds since epoch)
    :>jsonarr string guid: The unique ID of this alert
    :>jsonarr string timestamp: Milliseconds since the epoch when this alert was issued
    :>jsonarr map monitoringTrigger: Trigger definition that caused this alert to be issued
    :>jsonarr string monitoringTrigger.guid:  The unique ID of this trigger
    :>jsonarr string monitoringTrigger.name: The name of this trigger
    :>jsonarr array triggers: The trigger causes associated with ``monitoringTrigger``
    :>jsonarr string triggers[i].window: Time window (in milliseconds since the epoch) of the underlying data that caused this trigger to fire
    :>jsonarr array actions: Actions taken due to the firing of ``monitoringTrigger``
    :>jsonarr string actions[i].guid: The unique ID of the action taken
    :>jsonarr string actions[i].name: The name of the action taken
    :>jsonarr map actions[i].email: The email address and subject of the alert that was sent

    **Example request**:

    .. sourcecode:: http

      GET /2.0/alerts/history HTTP/1.1
      Accept: application/json

    **Example response**:

    .. sourcecode:: http

      HTTP/1.1 200 OK
      Content-Type: application/json

      [
        {
          "guid": "50c0e74a-6368-43bf-bff7-fa51beff9ad9",
          "timestamp": "1516207447488",
          "monitoringTrigger": {
            "guid": "c8d72271-9f57-44b5-a6a4-97c97f0d1668",
            "name": "rock-cg-0 consumption"
          },
          "triggers": [
            {
              "window": "1516207320000",
              "hasError": false,
              "component": {
                "componentId": "rock-cg-0"
              },
              "longValue": "0"
            }
          ],
          "actions": [
            {
              "guid": "f593d79d-1bb7-4179-8997-6a7c8045dd8e",
              "name": "email DevOps on call",
              "email": {
                "address": "devops-oncall@example.com",
                "subject": "rock-cg-0 consumption alert"
              }
            }
          ]
        }
      ]
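
The following Python sketch shows one way to call this endpoint with only the standard library and summarize
the response. It assumes |c3-short| is reachable at ``http://localhost:9021`` without authentication; adjust the
base URL, and add credentials if your deployment requires them. The helper names ``history_url``, ``summarize``,
and ``fetch_history`` are illustrative.

.. sourcecode:: python

   import json
   import urllib.parse
   import urllib.request
   from typing import Any, Dict, List, Optional

   # Assumption: Control Center listens on its usual default port, without auth.
   BASE_URL = "http://localhost:9021"


   def history_url(base_url: str, limit: Optional[int] = None,
                   ts: Optional[int] = None) -> str:
       """Build the /2.0/alerts/history URL with the optional query parameters."""
       params: Dict[str, int] = {}
       if limit is not None:
           params["limit"] = limit
       if ts is not None:
           params["ts"] = ts  # milliseconds since the epoch
       query = urllib.parse.urlencode(params)
       return f"{base_url}/2.0/alerts/history" + (f"?{query}" if query else "")


   def summarize(alerts: List[Dict[str, Any]]) -> List[str]:
       """Turn the JSON array into one line per alert: time, trigger, recipients."""
       lines = []
       for alert in alerts:
           issued_ms = int(alert["timestamp"])  # timestamps arrive as strings
           trigger = alert["monitoringTrigger"]["name"]
           recipients = [a["email"]["address"]
                         for a in alert.get("actions", []) if "email" in a]
           lines.append(f"{issued_ms} {trigger} -> {', '.join(recipients)}")
       return lines


   def fetch_history(base_url: str = BASE_URL, limit: int = 10) -> List[Dict[str, Any]]:
       """GET the most recent alerts and decode the JSON response body."""
       request = urllib.request.Request(history_url(base_url, limit=limit),
                                        headers={"Accept": "application/json"})
       with urllib.request.urlopen(request) as response:
           return json.load(response)


   # Usage (requires a running Control Center):
   #     for line in summarize(fetch_history(limit=5)):
   #         print(line)

Parsing ``timestamp`` with ``int()`` accounts for the example response, where timestamps are returned as
JSON strings rather than numbers.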