Get the Status of ksqlDB Servers on Confluent Platform
The /clusterStatus resource gives you information about the status
of all ksqlDB servers in a ksqlDB cluster, which can be useful for
troubleshooting. Enable this endpoint by setting ksql.heartbeat.enable
to true. Optionally, you can also set ksql.lag.reporting.enable to true
to have your ksqlDB servers report state store lag, which will then
also be returned with the response from the /clusterStatus endpoint.
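As a minimal sketch, assuming the settings live in the ksqlDB server properties file (commonly named ksql-server.properties; the name and location in your deployment may differ), the two settings described above might look like this:

```properties
# Assumed location: the ksqlDB server properties file.
# Required: turns on heartbeats, which enables the /clusterStatus endpoint.
ksql.heartbeat.enable=true
# Optional: has each server report state store lag in the /clusterStatus response.
ksql.lag.reporting.enable=true
```

Apply the same settings on every server in the cluster so that each one sends heartbeats and reports lag.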
Note
ksqlDB servers in a cluster discover each other through persistent queries.
If you have no persistent queries running, then the /clusterStatus
endpoint returns information only for the server that was queried,
rather than for all servers in the cluster.
You can use the curl command to query the /clusterStatus endpoint for a
particular server:
curl -sX GET "http://localhost:8088/clusterStatus" | jq '.'
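The same request can be made from a script. The following is a minimal Python sketch using only the standard library. The function name fetch_cluster_status and its opener parameter are illustrative choices, not part of ksqlDB, and the default base URL assumes the same local server as the curl command above.

```python
import json
import urllib.request


def fetch_cluster_status(base_url="http://localhost:8088", timeout=5.0,
                         opener=urllib.request.urlopen):
    """Return the parsed JSON body of GET <base_url>/clusterStatus.

    `opener` is a hypothetical hook (it defaults to urllib's urlopen) so the
    function can be exercised without a running ksqlDB server.
    """
    url = base_url.rstrip("/") + "/clusterStatus"
    with opener(url, timeout=timeout) as response:
        return json.load(response)


# Example (requires a reachable ksqlDB server with heartbeats enabled):
#   status = fetch_cluster_status("http://localhost:8088")
#   print(sorted(status["clusterStatus"]))
```

Because each server reports the cluster as it sees it, querying more than one server and comparing the results can help when the servers disagree about which hosts are alive.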
The response object contains a clusterStatus field with the following
information for each ksqlDB server (represented as host:port):
- hostAlive (boolean): whether the server is alive, as determined by heartbeats received by the queried server
- lastStatusUpdateMs (long): epoch timestamp, in milliseconds, for when the last status update was received for this server, by the queried server
- activeStandbyPerQuery (object): for each query ID, a collection of active and standby partitions and state stores on this server
- hostStoreLags (object): state store lag information. Empty unless ksql.lag.reporting.enable is set to true.
- hostStoreLags.stateStoreLags (object): partition-level lag breakdown for each state store.
- hostStoreLags.updateTimeMs (long): epoch timestamp, in milliseconds, for when the last lag update was received for this server, by the queried server
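To make the field descriptions concrete, here is a hedged Python sketch that walks an already-parsed response and reports, for each server, whether it is alive and its total offset lag. The function name summarize_hosts is hypothetical. The lagByPartition and offsetLag keys are taken from the sample output on this page, and the sketch assumes they appear under every state store entry.

```python
def summarize_hosts(response):
    """Map each host:port to its liveness and summed offset lag.

    total_offset_lag is None when the server reported no lag information
    (for example, when ksql.lag.reporting.enable is not set to true).
    """
    summary = {}
    for host, info in response.get("clusterStatus", {}).items():
        stores = (info.get("hostStoreLags") or {}).get("stateStoreLags") or {}
        total_lag = None
        if stores:
            # Sum offsetLag over every partition of every state store.
            total_lag = sum(
                partition["offsetLag"]
                for store in stores.values()
                for partition in store.get("lagByPartition", {}).values()
            )
        summary[host] = {
            "alive": bool(info.get("hostAlive")),
            "last_status_update_ms": info.get("lastStatusUpdateMs"),
            "total_offset_lag": total_lag,
        }
    return summary
```

A server with alive set to False, or with a last_status_update_ms far in the past, is a reasonable first place to look when troubleshooting.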
For a two-node cluster running a single CREATE TABLE ... AS SELECT query,
with lag reporting enabled, your output should resemble:
{
  "clusterStatus": {
    "localhost:8088": {
      "hostAlive": true,
      "lastStatusUpdateMs": 1617609098808,
      "activeStandbyPerQuery": {
        "CTAS_MY_AGG_TABLE_3": {
          "activeStores": [
            "Aggregate-Aggregate-Materialize"
          ],
          "activePartitions": [
            {
              "topic": "my_stream",
              "partition": 1
            },
            {
              "topic": "my_stream",
              "partition": 3
            },
            {
              "topic": "_confluent-ksql-default_query_CTAS_MY_AGG_TABLE_3-Aggregate-GroupBy-repartition",
              "partition": 1
            },
            {
              "topic": "_confluent-ksql-default_query_CTAS_MY_AGG_TABLE_3-Aggregate-GroupBy-repartition",
              "partition": 3
            }
          ],
          "standByStores": [],
          "standByPartitions": []
        }
      },
      "hostStoreLags": {
        "stateStoreLags": {
          "_confluent-ksql-default_query_CTAS_MY_AGG_TABLE_3#Aggregate-Aggregate-Materialize": {
            "lagByPartition": {
              "1": {
                "currentOffsetPosition": 0,
                "endOffsetPosition": 0,
                "offsetLag": 0
              },
              "3": {
                "currentOffsetPosition": 0,
                "endOffsetPosition": 0,
                "offsetLag": 0
              }
            },
            "size": 2
          }
        },
        "updateTimeMs": 1617609168917
      }
    },
    "other.ksqldb.host:8088": {
      "hostAlive": true,
      "lastStatusUpdateMs": 1617609172614,
      "activeStandbyPerQuery": {
        "CTAS_MY_AGG_TABLE_3": {
          "activeStores": [
            "Aggregate-Aggregate-Materialize"
          ],
          "activePartitions": [
            {
              "topic": "my_stream",
              "partition": 0
            },
            {
              "topic": "my_stream",
              "partition": 2
            },
            {
              "topic": "_confluent-ksql-default_query_CTAS_MY_AGG_TABLE_3-Aggregate-GroupBy-repartition",
              "partition": 0
            },
            {
              "topic": "_confluent-ksql-default_query_CTAS_MY_AGG_TABLE_3-Aggregate-GroupBy-repartition",
              "partition": 2
            }
          ],
          "standByStores": [],
          "standByPartitions": []
        }
      },
      "hostStoreLags": {
        "stateStoreLags": {
          "_confluent-ksql-default_query_CTAS_MY_AGG_TABLE_3#Aggregate-Aggregate-Materialize": {
            "lagByPartition": {
              "0": {
                "currentOffsetPosition": 1,
                "endOffsetPosition": 1,
                "offsetLag": 0
              },
              "2": {
                "currentOffsetPosition": 0,
                "endOffsetPosition": 0,
                "offsetLag": 0
              }
            },
            "size": 2
          }
        },
        "updateTimeMs": 1617609170111
      }
    }
  }
}
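One way to read output like this is to invert activeStandbyPerQuery and ask which server is active for each partition of a topic. The Python sketch below does that for an already-parsed response. The function name active_partition_owners is hypothetical, and the sketch assumes the response shape shown in the sample.

```python
def active_partition_owners(response, query_id, topic):
    """Return {partition: host} for the active partitions of `topic` in `query_id`."""
    owners = {}
    for host, info in response.get("clusterStatus", {}).items():
        query = info.get("activeStandbyPerQuery", {}).get(query_id, {})
        for entry in query.get("activePartitions", []):
            if entry["topic"] == topic:
                owners[entry["partition"]] = host
    return owners
```

For the sample output, calling active_partition_owners(status, "CTAS_MY_AGG_TABLE_3", "my_stream") would map partitions 1 and 3 to localhost:8088 and partitions 0 and 2 to other.ksqldb.host:8088.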