Monitoring is essential for large data clusters. We rely on logs to locate failures and inefficiencies, and we need comprehensive metrics to help us manage Kafka clusters. This article discusses how Kafka monitoring works and covers some common third-party monitoring tools.

1. Kafka Monitoring

First, let's look at how Kafka monitoring works. Third-party tools are built on the same mechanism, and we can also implement our own monitoring on top of it. The official documentation on monitoring is at:

kafka.apache.org/documentati…

Kafka uses Yammer Metrics, a Java monitoring library, for monitoring.

Kafka exposes a number of monitoring metrics by default. They can be accessed remotely over the JMX interface by setting JMX_PORT before starting the broker or the clients:

    JMX_PORT=9997 bin/kafka-server-start.sh config/server.properties

Each monitoring metric in Kafka is defined as a JMX MBean, an instance of a managed resource.
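
As an illustration, here is a minimal sketch of reading one of these MBeans programmatically over JMX. It assumes a broker started locally with JMX_PORT=9997 as above; the MBean and attribute names are taken from the broker metrics listed further below.

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class JmxRead {
    public static void main(String[] args) throws Exception {
        // Assumes a local broker started with JMX_PORT=9997
        JMXServiceURL url =
                new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:9997/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            // Byte-in rate MBean; its meter exposes Count, OneMinuteRate, FifteenMinuteRate, ...
            ObjectName bytesIn =
                    new ObjectName("kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec");
            Object rate = conn.getAttribute(bytesIn, "FifteenMinuteRate");
            System.out.println("BytesInPerSec FifteenMinuteRate = " + rate);
        }
    }
}
```

This is essentially what JConsole and the JmxTool described below do under the hood.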

You can use JConsole (Java Monitoring and Management Console), a visual monitoring and management tool based on JMX, to view the monitoring results:

Figure 2 Jconsole

You can then find the various Kafka metrics under the MBeans tab.

The MBean naming convention is kafka.xxx:type=xxx,name=xxx, for example kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec.

It is mainly divided into the following categories:

(There are many monitoring metrics; only some of them are shown here. See the official documentation for details.)

Broker metrics used for graphing and alerting (kafka.server is server-related, kafka.network is network-related):

| Description | MBean name | Normal value |
| --- | --- | --- |
| Message in rate | kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec | |
| Byte in rate from clients | kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec | |
| Byte in rate from other brokers | kafka.server:type=BrokerTopicMetrics,name=ReplicationBytesInPerSec | |
| Request rate | kafka.network:type=RequestMetrics,name=RequestsPerSec,request={Produce\|FetchConsumer\|FetchFollower} | |
| Error rate | kafka.network:type=RequestMetrics,name=ErrorsPerSec,request=([-.\w]+),error=([-.\w]+) | Number of errors in responses counted per-request-type, per-error-code. If a response contains multiple errors, all are counted. error=NONE indicates successful responses. |

Common metrics for producer/consumer/connect/streams monitoring (these monitor the Kafka clients at runtime):

| Metric/Attribute name | Description | MBean name |
| --- | --- | --- |
| connection-close-rate | Connections closed per second in the window. | kafka.[producer\|consumer\|connect]:type=[producer\|consumer\|connect]-metrics,client-id=([-.\w]+) |
| connection-close-total | Total connections closed in the window. | kafka.[producer\|consumer\|connect]:type=[producer\|consumer\|connect]-metrics,client-id=([-.\w]+) |

Common per-broker metrics for producer/consumer/connect/streams monitoring (reported per broker node the client communicates with):

| Metric/Attribute name | Description | MBean name |
| --- | --- | --- |
| outgoing-byte-rate | The average number of outgoing bytes sent per second for a node. | kafka.[producer\|consumer\|connect]:type=[consumer\|producer\|connect]-node-metrics,client-id=([-.\w]+),node-id=([0-9]+) |
| outgoing-byte-total | The total number of outgoing bytes sent for a node. | kafka.[producer\|consumer\|connect]:type=[consumer\|producer\|connect]-node-metrics,client-id=([-.\w]+),node-id=([0-9]+) |

Producer monitoring (metrics on the producer's invocation process):

| Metric/Attribute name | Description | MBean name |
| --- | --- | --- |
| waiting-threads | The number of user threads blocked waiting for buffer memory to enqueue their records. | kafka.producer:type=producer-metrics,client-id=([-.\w]+) |
| buffer-total-bytes | The maximum amount of buffer memory the client can use (whether or not it is currently used). | kafka.producer:type=producer-metrics,client-id=([-.\w]+) |
| buffer-available-bytes | The total amount of buffer memory that is not being used (either unallocated or in the free list). | kafka.producer:type=producer-metrics,client-id=([-.\w]+) |
| bufferpool-wait-time | The fraction of time an appender waits for space allocation. | kafka.producer:type=producer-metrics,client-id=([-.\w]+) |
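
These client metrics can also be read in code, without going through JMX, via the client's metrics() method; the consumer and Kafka Streams expose an analogous method. Below is a minimal sketch for the producer (the broker address and topic name are placeholders):

```java
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.MetricName;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProducerMetricsExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("test", "key", "value")); // placeholder topic

            // metrics() exposes the same names listed above, e.g. buffer-available-bytes
            Map<MetricName, ? extends Metric> metrics = producer.metrics();
            for (Map.Entry<MetricName, ? extends Metric> entry : metrics.entrySet()) {
                MetricName name = entry.getKey();
                if ("producer-metrics".equals(name.group())
                        && "buffer-available-bytes".equals(name.name())) {
                    System.out.println(name.name() + " = " + entry.getValue().metricValue());
                }
            }
        }
    }
}
```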

Consumer monitoring (metrics collected during consumer calls):

| Metric/Attribute name | Description | MBean name |
| --- | --- | --- |
| commit-latency-avg | The average time taken for a commit request | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) |
| commit-latency-max | The max time taken for a commit request | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) |
| commit-rate | The number of commit calls per second | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) |
| commit-total | The total number of commit calls | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) |

Connect monitoring:

| Attribute name | Description |
| --- | --- |
| connector-count | The number of connectors run in this worker. |
| connector-startup-attempts-total | The total number of connector startups that this worker has attempted. |

Streams monitoring:

| Metric/Attribute name | Description | MBean name |
| --- | --- | --- |
| commit-latency-avg | The average execution time in ms for committing, across all running tasks of this thread. | kafka.streams:type=stream-metrics,client-id=([-.\w]+) |
| commit-latency-max | The maximum execution time in ms for committing, across all running tasks of this thread. | kafka.streams:type=stream-metrics,client-id=([-.\w]+) |
| poll-latency-avg | The average execution time in ms for polling, across all running tasks of this thread. | kafka.streams:type=stream-metrics,client-id=([-.\w]+) |

These metrics cover every aspect of how we use Kafka, and there is also kafka.log for log-related information. Each MBean has specific attributes underneath it.

The attributes to watch include inbound and outbound rates, the ISR change rate, producer-side batch size and thread counts, consumer-side latency and throughput, and so on. Of course, we also need to pay attention to JVM- and OS-level monitoring; there are general-purpose tools for those, so they are not described here.

Now that the principles of Kafka monitoring are basically clear, note that most third-party monitoring tools build on this same layer. Below are a few mainstream monitoring tools.

2. JmxTool

JmxTool is not a framework, but a tool that Kafka provides by default for viewing JMX metrics in real time.

Go to the Kafka installation directory and run the bin/kafka-run-class.sh kafka.tools.JmxTool command to obtain the help information of the JmxTool tool.

For example, if we want to monitor the inbound rate, we can type the command:

    bin/kafka-run-class.sh kafka.tools.JmxTool --object-name kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec --jmx-url service:jmx:rmi:///jndi/rmi://:9997/jmxrmi --date-format "YYYY-MM-dd HH:mm:ss" --attributes FifteenMinuteRate --reporting-interval 5000

The value of BytesInPerSec is printed on the console every 5 seconds:

    kafka_2.12-2.0.0 $ bin/kafka-run-class.sh kafka.tools.JmxTool --object-name kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec --jmx-url service:jmx:rmi:///jndi/rmi://:9997/jmxrmi --date-format "YYYY-MM-dd HH:mm:ss" --attributes FifteenMinuteRate --reporting-interval 5000
    Trying to connect to JMX url: service:jmx:rmi:///jndi/rmi://:9997/jmxrmi.
    "time","kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec:FifteenMinuteRate"
    2018-08-10 14:52:15,784224.2587058166
    2018-08-10 14:52:20,1003401.2319497257
    2018-08-10 14:52:25,1125080.6160773218
    2018-08-10 14:52:30,1593394.1860063889

3. Kafka Manager

Kafka Manager is an open-source monitoring framework written in Scala and released by Yahoo in 2015. The GitHub address is github.com/yahoo/kafka…

Requirements:

  1. Kafka 0.8.x, 0.9.x, 0.10.x, or 0.11.x
  2. Java 8+

Download kafka-manager.

Configuration: conf/application.conf

    kafka-manager.zkhosts="my.zookeeper.host.com:2181,other.zookeeper.host.com:2181"

Deployment: SBT deployment is used here

    ./sbt clean dist

Startup:

    $ bin/kafka-manager

Specify the port:

    $ bin/kafka-manager -Dconfig.file=/path/to/application.conf -Dhttp.port=8080

Permissions:

    $ bin/kafka-manager -Djava.security.auth.login.config=/path/to/my-jaas.conf

Then access localhost:8080.

You can see the monitoring page:

Figure topic

Figure broker

The pages are concise yet feature-rich, and the tool is open source and free, so it is recommended. Note, however, that the current version only supports Kafka 0.8.x / 0.9.x / 0.10.x / 0.11.x, which requires special attention.

4. kafka-monitor

kafka-monitor is a Kafka monitoring framework from LinkedIn. The GitHub address is github.com/linkedin/ka…

Based on Gradle 2.0 and above, supports Java 7 and Java 8.

It supports Kafka from 0.8 to 2.0, and users can check out different branches according to their needs.

Use:

Compile:

    $ git clone https://github.com/linkedin/kafka-monitor.git
    $ cd kafka-monitor
    $ ./gradlew jar

Modify the configuration: config/kafka-monitor.properties

"zookeeper.connect" = "localhost:2181"
Copy the code

Startup:

    $ ./bin/kafka-monitor-start.sh config/kafka-monitor.properties

Single-cluster startup:

    $ ./bin/single-cluster-monitor.sh --topic test --broker-list localhost:9092 --zookeeper localhost:2181

Multi-cluster startup:

    $ ./bin/kafka-monitor-start.sh config/multi-cluster-monitor.properties

Then visit localhost:8080 to see the monitoring page

Figure kafkamonitor

We can also query other metrics through HTTP requests:

    curl localhost:8778/jolokia/read/kmf.services:type=produce-service,name=*/produce-availability-avg

Overall, its web UI is relatively simple and not widely used, but the HTTP interface is very useful, and many Kafka versions are supported.

5. Kafka Offset Monitor

Website address http://quantifind.github.io/KafkaOffsetMonitor/

GitHub address: github.com/quantifind/…

Usage: after downloading, run:

    java -cp KafkaOffsetMonitor-assembly-0.3.0.jar:kafka-offset-monitor-another-db-reporter.jar \
      com.quantifind.kafka.offsetapp.OffsetGetterWeb \
      --zk zk-server1,zk-server2 \
      --port 8080 \
      --refresh 10.seconds \
      --retain 2.days \
      --pluginsArgs anotherDbHost=host1,anotherDbPort=555

Then look at localhost:8080

Figure offsetmonitor1

Figure offsetmonitor2

The project focuses on monitoring offsets, and its pages are rich, but it has not been updated since 2015 and does not support the latest Kafka versions. A maintained fork is available at https://github.com/Morningstar/kafka-offset-monitor.

6. Cruise Control

In August 2017, LinkedIn open-sourced Cruise Control, a framework for monitoring large clusters that also includes a number of operational functions. LinkedIn reportedly has over 20,000 Kafka clusters, and the project is still being updated.

Github address: github.com/linkedin/cr…

Use:

Download:

    git clone https://github.com/linkedin/cruise-control.git && cd cruise-control/

Compile:

    ./gradlew jar

Modify config/cruisecontrol.properties: set bootstrap.servers and zookeeper.connect.

Start:

    ./gradlew jar copyDependantLibs
    ./kafka-cruise-control-start.sh [-jars PATH_TO_YOUR_JAR_1,PATH_TO_YOUR_JAR_2] config/cruisecontrol.properties [port]

Access after startup:

http://localhost:9090/kafkacruisecontrol/state

There is no web UI; everything is provided in the form of REST APIs.

The interface list is as follows: github.com/linkedin/cr…

The framework is flexible enough that users can optimize their clusters by capturing various metrics based on their own situation.

Seven, Doctorkafka

DoctorKafka is Pinterest’s open source Kafka cluster self-healing and workload balancing tool.

Pinterest is a social site for sharing pictures. They use Kafka as a centralized messaging tool for data ingestion, stream processing, and other scenarios. As the number of users grew, the Kafka clusters grew larger, and managing them became a heavy burden for the operations team, so they developed DoctorKafka, a Kafka cluster self-healing and workload balancing tool, which they have open-sourced on GitHub.

Use:

Download:

    git clone [git-repo-url] doctorkafka
    cd doctorkafka

Compile:

    mvn package -pl kafkastats -am

Start:

    java -server \
        -Dlog4j.configurationFile=file:./log4j2.xml \
        -cp lib/*:kafkastats-0.2.4.8.jar \
        com.pinterest.doctorkafka.stats.KafkaStatsMain \
          -broker 127.0.0.1 \
          -jmxport 9999 \
          -topic brokerstats \
          -zookeeper zookeeper001:2181/cluster1 \
          -uptimeinseconds 3600 \
          -pollingintervalinseconds 60 \
          -ostrichport 2051 \
          -tsdhostport localhost:18126 \
          -kafka_config /etc/kafka/server.properties \
          -producer_config /etc/kafka/producer.properties \
          -primary_network_ifacename eth0

The page is as follows:

Figure dockerkafka

After startup, DoctorKafka periodically checks the status of each cluster. When it detects a failed broker, it transfers the failed broker's workload to brokers with sufficient bandwidth, and it raises an alarm if there are not enough resources in the cluster to reallocate. In short, it is a framework that automatically keeps the cluster healthy.

8. Burrow

Burrow is LinkedIn’s open source framework for monitoring consumer Lag.

The github address is github.com/linkedin/Bu…

With Burrow there is no need to set a lag threshold: consumer status is evaluated completely dynamically, based on the consumption process.

Burrow supports reading offsets from both the Kafka offsets topic and ZooKeeper, covering both old and new Kafka versions.

Burrow supports HTTP and email alerts.

By default, Burrow provides only HTTP interfaces (HTTP endpoint). Data is in JSON format, and there is no Web UI.

Installation and use:

1. Clone github.com/linkedin/Burrow to a directory outside of $GOPATH. Alternatively, you can export GO111MODULE=on to enable Go modules.
2. cd to the source directory.
3. go mod tidy
4. go install

Example:

List all monitored Kafka clusters:

    curl -s http://localhost:8000/v3/kafka | jq
    {
      "error": false,
      "message": "cluster list returned",
      "clusters": [
        "kafka"
      ],
      "request": {
        "url": "/v3/kafka",
        "host": "kafka"
      }
    }

Other frameworks:

kafka-web-console: github.com/claudemamo/…

Kafkat: github.com/airbnb/kafk…

Capillary: github.com/keenlabs/ca…

Chaperone: github.com/uber/chaper…

There are many more, but we need to make our choices based on our version of Kafka.

For more blog posts on real-time computing, Kafka, and other related technologies, welcome to follow Real-time Streaming Computing.