Stream Monitoring

Stream Monitoring provides information about how many messages were produced and consumed over time, highlighting any discrepancies between the two. It also provides statistics about how long it takes for messages to be consumed after production.

Chart Types

There are two chart types: one for visualizing message delivery (expected versus actual consumption counts) and one for visualizing latency (statistics on the time taken for messages to be consumed after production). These charts are always presented together, with the message delivery chart at the top and the latency chart at the bottom.

../../_images/c3deliverylatency.png

Message delivery (top) and latency (bottom) charts.

All times referenced on the charts relate to the time at which messages were sent. More specifically, they are the timestamps included in Kafka messages added at the time messages were produced. By default, these timestamps will be generated by the Kafka Client, but an application may override them. For more information about the use of timestamps, refer to the Concept Guide.

All data presented by the charts are binned values. The bin size is uniform across all charts displayed at any given time and is annotated in the message delivery chart legend (in the above chart, the bin size is 15 seconds). The bin size is chosen dynamically to best match the data and display. It depends on the time domain of the chart, the browser window size, and the screen resolution. The minimum bin size is 15 seconds.

The delivery chart shows the number of messages expected to be consumed in each time-bin as a step chart and the number of messages actually consumed as an area chart.

Note

Remember that the messages associated with a particular time-bin are those that were produced over the corresponding time range. The time at which messages were consumed never affects which time-bin they are associated with.
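As a rough illustration of this rule, the sketch below assigns messages to time bins using only their production timestamps. It is not Control Center's implementation: the function names, the `(produced_at_ms, consumed_at_ms)` tuple layout, and the sample data are assumptions made for the example, and the bin size is fixed at the 15-second minimum rather than chosen dynamically.

```python
from collections import Counter

BIN_SIZE_MS = 15_000  # 15-second bins, the minimum bin size (assumed fixed here)


def bin_start(produced_at_ms, bin_size_ms=BIN_SIZE_MS):
    """Return the start of the time bin that contains a production timestamp."""
    return produced_at_ms - (produced_at_ms % bin_size_ms)


def count_by_bin(produced_timestamps_ms, bin_size_ms=BIN_SIZE_MS):
    """Count messages per bin, keyed by bin start time in milliseconds."""
    return Counter(bin_start(ts, bin_size_ms) for ts in produced_timestamps_ms)


# Hypothetical messages as (produced_at_ms, consumed_at_ms) pairs.
messages = [(1_000, 2_000), (14_999, 40_000), (15_000, 15_500), (29_000, 61_000)]

# Only the production timestamp decides the bin; the consumption time is ignored.
delivery_bins = count_by_bin(produced for produced, _consumed in messages)
```

Note that the second message is consumed at 40,000 ms but still counts toward the bin starting at 0, because it was produced at 14,999 ms.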

A gap between the “expected consumption” line and the “consumed” area indicates that some messages that were produced have not yet been consumed by a consumer group that is reading from the topic. Typically, there will be a gap between expected and actual consumption very close to real time, and this gap will diminish over time. (It takes some time for messages to move through a pipeline to be processed.) If a gap persists past one minute behind real time, Control Center will highlight the gap in orange to help draw your attention to it.

It’s also possible for more messages to be consumed than expected (this can happen in the case of consumer failure). In this case, the area chart will be higher than the line chart for the affected time bin, and the corresponding area on the chart will also be highlighted in orange.

../../_images/c3consumptiondiff.png

Example of both under and over consumption
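The highlighting rules described above can be sketched as a small classifier over one time bin. This is only an illustration under stated assumptions: the label names (`ok`, `pending`, `under`, `over`), the function signature, and the exact handling of the one-minute boundary are hypothetical, not taken from Control Center.

```python
def classify_bin(expected, consumed, bin_end_ms, now_ms, grace_ms=60_000):
    """Classify one time bin as 'ok', 'pending', 'under', or 'over'.

    expected / consumed are message counts for the bin; bin_end_ms is the end
    of the bin's production-time range; grace_ms is the one-minute allowance.
    """
    if consumed > expected:
        # More messages consumed than produced, e.g. after a consumer failure.
        return "over"
    if consumed == expected:
        return "ok"
    # consumed < expected: flag the gap only once the bin is old enough.
    if now_ms - bin_end_ms > grace_ms:
        return "under"
    return "pending"


now = 120_000
old_gap = classify_bin(expected=100, consumed=90, bin_end_ms=15_000, now_ms=now)
recent_gap = classify_bin(expected=100, consumed=90, bin_end_ms=105_000, now_ms=now)
surplus = classify_bin(expected=100, consumed=110, bin_end_ms=105_000, now_ms=now)
```

In this sketch, `under` and `over` correspond to the bins that would be highlighted in orange, while `pending` is the normal gap close to real time.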

The latency chart shows the minimum latency, average latency, and maximum latency for messages sent within each time bin. See the Concept Guide for more details about how latency is calculated.
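For intuition, the per-bin latency statistics can be sketched as follows. The sketch assumes that latency is simply the consumption time minus the production timestamp; the Concept Guide is the authority on the actual calculation. The function name, tuple layout, and sample data are hypothetical.

```python
from collections import defaultdict


def latency_stats(messages, bin_size_ms=15_000):
    """Compute min/avg/max latency per bin.

    messages is an iterable of (produced_at_ms, consumed_at_ms) pairs.
    Latency is assumed to be consumed_at_ms - produced_at_ms.
    """
    latencies = defaultdict(list)
    for produced, consumed in messages:
        start = produced - (produced % bin_size_ms)  # bin by production time
        latencies[start].append(consumed - produced)
    return {
        start: {
            "min": min(values),
            "avg": sum(values) / len(values),
            "max": max(values),
        }
        for start, values in sorted(latencies.items())
    }


sample = [(1_000, 2_000), (14_999, 40_000), (15_000, 15_500)]
stats = latency_stats(sample)
```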

Page Layout

The stream monitoring page is split into two sections. The top section (aggregate view) provides a visualization of a specific subset of all messages, and the bottom section (partition view) shows a partitioning of this subset. The available views are:

All messages, partitioned by:
  1. Consumer group, or
  2. Topic
A specific consumer group, partitioned by:
  1. Consumer, or
  2. Topic / partition
A specific consumer, partitioned by:
  1. Topic / partition

All charts on a given page show information corresponding to the same time range. Hovering over a chart displays information pertinent to the time bin that the mouse is currently over across all charts. This allows for easy comparison of metrics across partitions and the aggregate data.

Time Range and Cluster Selection

The current time range (which pertains to all charts shown on screen at any given time) is displayed at the top right-hand side of the page (in this example, “October 24, Last 30 minutes”). Clicking it opens the time range selector:

../../_images/datepicker.png

You can use the selector to select one of three types of time ranges:

  1. Static - A specific time range with constant start and end times.
  2. Rolling - A time range where the end time is always equal to the current time and the extent of the time range is held constant.
  3. Growing - A time range where the end time is always equal to the current time and the start time is held constant.
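The difference between the three types is easiest to see by resolving each one at two different moments. The following is a minimal sketch; the function names and the millisecond values are made up for illustration and do not reflect how Control Center represents time ranges internally.

```python
def static_range(start_ms, end_ms, now_ms):
    """Static: start and end are fixed, regardless of the current time."""
    return (start_ms, end_ms)


def rolling_range(extent_ms, now_ms):
    """Rolling: the end tracks the current time and the width stays constant."""
    return (now_ms - extent_ms, now_ms)


def growing_range(start_ms, now_ms):
    """Growing: the end tracks the current time and the start stays fixed."""
    return (start_ms, now_ms)


THIRTY_MINUTES_MS = 30 * 60 * 1000

# Resolve each range at two hypothetical "current" times, ten minutes apart.
earlier, later = 10_000_000, 10_600_000

static_windows = [static_range(1_000_000, 2_000_000, t) for t in (earlier, later)]
rolling_windows = [rolling_range(THIRTY_MINUTES_MS, t) for t in (earlier, later)]
growing_windows = [growing_range(9_000_000, t) for t in (earlier, later)]
```

Between the two moments, the static window does not change, the rolling window shifts forward while keeping its 30-minute width, and the growing window keeps its start and becomes ten minutes wider.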

Next to the time range selector is the cluster selector, which allows the current Kafka cluster to be selected in a multi-cluster setting. Note that an aggregate view of all messages across all clusters is not available.

Summary Statistics

Statistics summarizing the aggregate view data over the current visible time window are displayed near the top right of the screen.

../../_images/c3summarystats.png

Average latency is a weighted average (by consumed count) of the average latency over all visible time bins.

Overall completion is the sum of actual completion over all visible time bins as a percentage of the sum of expected completion over all visible time bins. This number can be greater than 100% in the case of over consumption.

If there is any under-consumption sufficiently behind real time (currently fixed at 1 minute behind real time, rounded up to the nearest bin size), an orange pin will be shown next to the overall consumption value to highlight this.

Note

It’s possible for the orange pin to be visible while the overall consumption value is greater than 100%. In this case, there has been both under-consumption and over-consumption in different time bins, with more over-consumption than under-consumption in total.
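The summary statistics can be reproduced on a small example. This is a sketch under assumptions, not Control Center's code: the dictionary keys, the function name, and the sample bins are hypothetical, and "rounded up to the nearest bin size" is interpreted here as rounding the one-minute allowance up to a whole number of bins.

```python
def summarize(bins, now_ms, bin_size_ms=15_000, grace_ms=60_000):
    """Summarize visible bins.

    Each bin is a dict with start_ms, expected, consumed, avg_latency_ms.
    Returns (average latency, overall completion %, orange pin shown).
    """
    total_expected = sum(b["expected"] for b in bins)
    total_consumed = sum(b["consumed"] for b in bins)

    # Average latency, weighted by the consumed count of each bin.
    avg_latency = (
        sum(b["avg_latency_ms"] * b["consumed"] for b in bins) / total_consumed
        if total_consumed else None
    )
    # Sum of actual over sum of expected, as a percentage.
    completion_pct = (
        100.0 * total_consumed / total_expected if total_expected else None
    )
    # Assumed reading of the rule: allowance rounded up to whole bins.
    cutoff_ms = -(-grace_ms // bin_size_ms) * bin_size_ms  # ceiling division
    pin = any(
        b["consumed"] < b["expected"]
        and now_ms - (b["start_ms"] + bin_size_ms) > cutoff_ms
        for b in bins
    )
    return avg_latency, completion_pct, pin


visible_bins = [
    {"start_ms": 0, "expected": 100, "consumed": 90, "avg_latency_ms": 200},
    {"start_ms": 15_000, "expected": 100, "consumed": 130, "avg_latency_ms": 400},
]
avg_latency, completion_pct, pin = summarize(visible_bins, now_ms=200_000)
```

Here the first bin is under-consumed and old enough to trigger the pin, while the second bin is over-consumed by a larger amount, so the overall value exceeds 100% even though the pin is shown.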

Missing Metrics Data

Lost or duplicate messages sent by your application can be seen as a difference between expected and actual consumption values in the message delivery chart.

It’s also possible for messages sent by the Confluent metrics interceptors to be lost or duplicated. When this occurs, it’s indicated in the interface by showing a herringbone pattern on the axis. An error like this means that we can’t tell if any of your application messages were lost, delayed, or duplicated. (It is possible that they were lost, and also possible that they were not.)

../../_images/errorcloseup.png

Lost stream monitoring metrics