Monitoring is essential for large data clusters. Logs help us diagnose failures and inefficiencies, and complete metrics help us manage Kafka clusters. This article discusses Kafka's monitoring principles and some common third-party monitoring tools.
1. Kafka Monitoring
First, the principles of Kafka monitoring: third-party tools build on the same mechanisms, so we can also implement monitoring ourselves. The official monitoring documentation is at kafka.apache.org/documentati…
Kafka uses Yammer Metrics, a Java monitoring library, for monitoring.
Kafka has a number of monitoring metrics by default, which are accessed remotely over the JMX interface by setting JMX_PORT before starting brokers and clients:
JMX_PORT=9997 bin/kafka-server-start.sh config/server.properties
Each monitoring metric in Kafka is defined as a JMX MBean, an instance of a managed resource.
You can use JConsole (Java Monitoring and Management Console), a JMX-based visual monitoring and management tool, to view the monitoring results:
Figure 2 Jconsole
You can then find the various Kafka metrics under MBeans.
The MBean naming convention is kafka.xxx:type=xxx,xxx=xxx.
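To make the convention concrete, here is a minimal Python sketch (not part of Kafka, written just for illustration) that splits an MBean name of this form into its domain and key properties:

```python
def parse_mbean_name(mbean: str):
    """Split a JMX MBean name like 'kafka.server:type=X,name=Y'
    into its domain and a dict of key properties."""
    domain, _, props = mbean.partition(":")
    properties = dict(kv.split("=", 1) for kv in props.split(","))
    return domain, properties


domain, props = parse_mbean_name(
    "kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec")
print(domain)          # kafka.server
print(props["name"])   # MessagesInPerSec
```

The domain (kafka.server, kafka.network, etc.) tells you which subsystem a metric belongs to, and the key properties identify the specific metric.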
It is mainly divided into the following categories:
(There are many monitoring metrics; only some are listed here. See the official documentation for details.)
Graphing and Alerting Monitoring
kafka.server covers server-related metrics, and kafka.network covers network-related metrics.
Description | Mbean name | Normal value |
---|---|---|
Message in rate | kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec | |
Byte in rate from clients | kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec | |
Byte in rate from other brokers | kafka.server:type=BrokerTopicMetrics,name=ReplicationBytesInPerSec | |
Request rate | kafka.network:type=RequestMetrics,name=RequestsPerSec,request={Produce|FetchConsumer|FetchFollower} | |
Error rate | kafka.network:type=RequestMetrics,name=ErrorsPerSec,request=([-.\w]+),error=([-.\w]+) | Number of errors in responses counted per-request-type, per-error-code. If a response contains multiple errors, all are counted. error=NONE indicates successful responses. |
Common metrics for producer/consumer/connect/streams monitoring:
These monitor the Kafka clients at runtime.
Metric/Attribute name | Description | Mbean name |
---|---|---|
connection-close-rate | Connections closed per second in the window. | kafka.[producer|consumer|connect]:type=[producer|consumer|connect]-metrics,client-id=([-.\w]+) |
connection-close-total | Total connections closed in the window. | kafka.[producer|consumer|connect]:type=[producer|consumer|connect]-metrics,client-id=([-.\w]+) |
Common per-broker metrics for producer/consumer/connect/streams monitoring:
These monitor the connection to each broker.
Metric/Attribute name | Description | Mbean name |
---|---|---|
outgoing-byte-rate | The average number of outgoing bytes sent per second for a node. | kafka.[producer|consumer|connect]:type=[consumer|producer|connect]-node-metrics,client-id=([-.\w]+),node-id=([0-9]+) |
outgoing-byte-total | The total number of outgoing bytes sent for a node. | kafka.[producer|consumer|connect]:type=[consumer|producer|connect]-node-metrics,client-id=([-.\w]+),node-id=([0-9]+) |
Producer monitoring:
Monitoring of the producer invocation process.
Metric/Attribute name | Description | Mbean name |
---|---|---|
waiting-threads | The number of user threads blocked waiting for buffer memory to enqueue their records. | kafka.producer:type=producer-metrics,client-id=([-.\w]+) |
buffer-total-bytes | The maximum amount of buffer memory the client can use (whether or not it is currently used). | kafka.producer:type=producer-metrics,client-id=([-.\w]+) |
buffer-available-bytes | The total amount of buffer memory that is not being used (either unallocated or in the free list). | kafka.producer:type=producer-metrics,client-id=([-.\w]+) |
bufferpool-wait-time | The fraction of time an appender waits for space allocation. | kafka.producer:type=producer-metrics,client-id=([-.\w]+) |
Consumer monitoring:
Monitoring during the consumer call.
Metric/Attribute name | Description | Mbean name |
---|---|---|
commit-latency-avg | The average time taken for a commit request | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) |
commit-latency-max | The max time taken for a commit request | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) |
commit-rate | The number of commit calls per second | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) |
commit-total | The total number of commit calls | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) |
Connect monitoring:
Attribute name | Description |
---|---|
connector-count | The number of connectors run in this worker. |
connector-startup-attempts-total | The total number of connector startups that this worker has attempted. |
Streams monitoring:
Metric/Attribute name | Description | Mbean name |
---|---|---|
commit-latency-avg | The average execution time in ms for committing, across all running tasks of this thread. | kafka.streams:type=stream-metrics,client-id=([-.\w]+) |
commit-latency-max | The maximum execution time in ms for committing across all running tasks of this thread. | kafka.streams:type=stream-metrics,client-id=([-.\w]+) |
poll-latency-avg | The average execution time in ms for polling, across all running tasks of this thread. | kafka.streams:type=stream-metrics,client-id=([-.\w]+) |
These metrics cover most aspects of Kafka usage; there is also kafka.log for log-related information, and each MBean exposes specific attributes.
Commonly watched attributes include the inbound and outbound rates, the ISR change rate, producer-side batch sizes and thread counts, and consumer-side latency and throughput. Of course, JVM-level and OS-level monitoring also deserve attention; there are general-purpose tools for those, which are not described here.
With Kafka's monitoring principles understood, note that most third-party monitoring tools are built as improvements on this same layer. Here are a few mainstream monitoring tools.
2. JmxTool
JmxTool is not a framework, but a tool that Kafka provides by default for viewing JMX metrics in real time.
Go to the Kafka installation directory and run the bin/kafka-run-class.sh kafka.tools.JmxTool command to obtain the help information of the JmxTool tool.
For example, if we want to monitor the inbound rate, we can type the command:
bin/kafka-run-class.sh kafka.tools.JmxTool --object-name kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec --jmx-url service:jmx:rmi:///jndi/rmi://:9997/jmxrmi --date-format "YYYY-MM-dd HH:mm:ss" --attributes FifteenMinuteRate --reporting-interval 5000
The value of BytesInPerSec is printed on the console every 5 seconds:
$ bin/kafka-run-class.sh kafka.tools.JmxTool --object-name kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec --jmx-url service:jmx:rmi:///jndi/rmi://:9997/jmxrmi --date-format "YYYY-MM-dd HH:mm:ss" --attributes FifteenMinuteRate --reporting-interval 5000
Trying to connect to JMX url: service:jmx:rmi:///jndi/rmi://:9997/jmxrmi.
"time","kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec:FifteenMinuteRate"
2018-08-10 14:52:15,784224.2587058166
2018-08-10 14:52:20,1003401.2319497257
2018-08-10 14:52:25,1125080.6160773218
2018-08-10 14:52:30,1593394.1860063889
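Because JmxTool emits CSV, its output is easy to post-process. Here is a minimal Python sketch (assuming output shaped like the transcript above) that extracts the rate samples and finds the peak:

```python
import csv
import io

# Sample of JmxTool's CSV output (values taken from the transcript above)
output = """\
"time","kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec:FifteenMinuteRate"
2018-08-10 14:52:15,784224.2587058166
2018-08-10 14:52:20,1003401.2319497257
2018-08-10 14:52:25,1125080.6160773218
2018-08-10 14:52:30,1593394.1860063889"""

rows = list(csv.reader(io.StringIO(output)))
samples = rows[1:]                           # skip the header row
rates = [float(rate) for _, rate in samples]
print(max(rates))  # peak FifteenMinuteRate in the window
```

The same approach works for any attribute JmxTool reports, which makes it easy to pipe into simple alerting scripts.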
3. Kafka-Manager
Kafka-Manager is an open-source monitoring framework written in Scala, released by Yahoo in 2015. The GitHub address is github.com/yahoo/kafka…
Requirements:
- Kafka 0.8.*, 0.9.*, 0.10.*, or 0.11.*
- Java 8+
Download Kafka-Manager.
Configuration: conf/application.conf
kafka-manager.zkhosts="my.zookeeper.host.com:2181,other.zookeeper.host.com:2181"
Deployment: SBT is used here.
./sbt clean dist
Startup:
Specify the port:
$ bin/kafka-manager -Dconfig.file=/path/to/application.conf -Dhttp.port=8080
With authentication:
$ bin/kafka-manager -Djava.security.auth.login.config=/path/to/my-jaas.conf
Then access localhost:8080.
You can see the monitoring page:
Figure topic
Figure broker
The page is very concise and has many rich features; it is open source and free, and its use is recommended. Note, however, that the current version only supports Kafka 0.8.*, 0.9.*, 0.10.*, or 0.11.*.
4. Kafka-Monitor
Kafka-Monitor is a monitoring framework for Kafka open-sourced by LinkedIn. The GitHub address is github.com/linkedin/ka…
It is based on Gradle 2.0 and above and supports Java 7 and Java 8.
It supports Kafka from 0.8 to 2.0; users can download the branch matching their needs.
Use:
Compile:
$ git clone https://github.com/linkedin/kafka-monitor.git
$ cd kafka-monitor
$ ./gradlew jar
Modify the configuration: config/kafka-monitor.properties
"zookeeper.connect" = "localhost:2181"
Startup:
$ ./bin/kafka-monitor-start.sh config/kafka-monitor.properties
Single-cluster startup:
$ ./bin/single-cluster-monitor.sh --topic test --broker-list localhost:9092 --zookeeper localhost:2181
Multi-cluster startup:
$ ./bin/kafka-monitor-start.sh config/multi-cluster-monitor.properties
Then visit localhost:8080 to see the monitoring page
Figure kafkamonitor
We can also query other metrics through HTTP requests:
curl localhost:8778/jolokia/read/kmf.services:type=produce-service,name=*/produce-availability-avg
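The endpoint above is served by Jolokia, which replies with a JSON envelope. The sketch below shows how such a reply could be consumed in Python; the response body here is hypothetical (it follows Jolokia's standard read envelope with request/value/status fields, but the MBean name and metric value are made up for illustration):

```python
import json

# Hypothetical Jolokia read response: the envelope (request/value/status)
# follows Jolokia's standard format, but the MBean name and metric value
# are invented for this example.
response_text = """{
  "request": {"type": "read",
              "mbean": "kmf.services:type=produce-service,name=single-cluster-monitor"},
  "value": {"produce-availability-avg": 1.0},
  "status": 200
}"""

reply = json.loads(response_text)
availability = None
if reply["status"] == 200:
    availability = reply["value"]["produce-availability-avg"]
print(availability)
```

In a real deployment the response text would come from the curl call shown above, and a value well below 1.0 would indicate produce availability problems.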
In general, its web features are relatively simple and it does not have many users, but the HTTP interface is very useful and many Kafka versions are supported.
5. Kafka Offset Monitor
Website: http://quantifind.github.io/KafkaOffsetMonitor/
GitHub address: github.com/quantifind/…
Usage: execute the following after downloading:
java -cp KafkaOffsetMonitor-assembly-0.3.0.jar:kafka-offset-monitor-another-db-reporter.jar \
  com.quantifind.kafka.offsetapp.OffsetGetterWeb \
  --zk zk-server1,zk-server2 \
  --port 8080 \
  --refresh 10.seconds \
  --retain 2.days \
  --pluginsArgs anotherDbHost=host1,anotherDbPort=555
Then look at localhost:8080
Figure offsetmonitor1
Figure offsetmonitor2
The project focuses on monitoring offsets, and its pages are rich, but it has not been updated since 2015 and does not support the latest versions of Kafka. A maintained fork is available at https://github.com/Morningstar/kafka-offset-monitor.
6. Cruise Control
In August 2017, LinkedIn open-sourced Cruise Control, a framework for monitoring large clusters that also includes a number of operations functions. LinkedIn reportedly has over 20,000 Kafka clusters, and the project is still being updated.
Github address: github.com/linkedin/cr…
Usage:
Download:
$ git clone https://github.com/linkedin/cruise-control.git && cd cruise-control
Compile:
$ ./gradlew jar
Modify config/cruisecontrol.properties: set bootstrap.servers and zookeeper.connect.
Startup:
$ ./gradlew jar copyDependantLibs
$ ./kafka-cruise-control-start.sh [-jars PATH_TO_YOUR_JAR_1,PATH_TO_YOUR_JAR_2] config/cruisecontrol.properties [port]
Access after startup:
http://localhost:9090/kafkacruisecontrol/state
There are no pages; everything is provided in the form of REST APIs.
The interface list is as follows: github.com/linkedin/cr…
The framework is flexible enough that users can optimize their clusters by capturing various metrics based on their own situation.
7. DoctorKafka
DoctorKafka is Pinterest’s open source Kafka cluster self-healing and workload balancing tool.
Pinterest is a social site for sharing pictures. They use Kafka as a centralized messaging tool for data ingestion, stream processing, and other scenarios. As the number of users grew, the Kafka cluster grew larger, and the complexity of managing it became a heavy burden for the operations team, so they developed DoctorKafka, a Kafka cluster self-healing and workload balancing tool, and recently open-sourced the project on GitHub.
Usage:
Download:
$ git clone [git-repo-url] doctorkafka
$ cd doctorkafka
Compile:
$ mvn package -pl kafkastats -am
Startup:
$ java -server \
    -Dlog4j.configurationFile=file:./log4j2.xml \
    -cp lib/*:kafkastats-0.2.4.8.jar \
    com.pinterest.doctorkafka.stats.KafkaStatsMain \
    -broker 127.0.0.1 \
    -jmxport 9999 \
    -topic brokerstats \
    -zookeeper zookeeper001:2181/cluster1 \
    -uptimeinseconds 3600 \
    -pollingintervalinseconds 60 \
    -ostrichport 2051 \
    -tsdhostport localhost:18126 \
    -kafka_config /etc/kafka/server.properties \
    -producer_config /etc/kafka/producer.properties \
    -primary_network_ifacename eth0
The page is as follows:
Figure dockerkafka
After startup, DoctorKafka periodically checks the status of each cluster. When it detects a failed broker, it transfers the failed broker's workload to brokers with sufficient bandwidth, and it raises an alarm if there are not enough resources in the cluster to reallocate. It is a framework that automatically maintains cluster health.
8. Burrow
Burrow is LinkedIn’s open source framework for monitoring consumer Lag.
The github address is github.com/linkedin/Bu…
With Burrow monitoring Kafka, there is no need to set lag thresholds; evaluation is completely dynamic, based on the consumption process.
Burrow supports reading offsets from both Kafka topics and ZooKeeper, covering both old and new versions of Kafka.
Burrow supports HTTP and email alerts.
By default, Burrow provides only HTTP interfaces (HTTP endpoint). Data is in JSON format, and there is no Web UI.
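To illustrate the idea of threshold-free, window-based evaluation, consider classifying a consumer from a sliding window of (committed offset, log-end offset) samples. This is a much-simplified Python sketch of the concept, not Burrow's actual algorithm:

```python
def evaluate_consumer(samples):
    """Classify a consumer from a window of (committed_offset, log_end_offset)
    samples, in the spirit of Burrow's windowed evaluation (much simplified)."""
    lags = [end - committed for committed, end in samples]
    if all(lag == 0 for lag in lags):
        return "OK"                    # fully caught up for the whole window
    committed = [c for c, _ in samples]
    if committed[-1] == committed[0] and lags[-1] > 0:
        return "STALLED"               # commits are not moving but lag remains
    if all(b >= a for a, b in zip(lags, lags[1:])) and lags[-1] > lags[0]:
        return "WARNING"               # lag grows monotonically over the window
    return "OK"                        # lag fluctuates but is being worked off


print(evaluate_consumer([(100, 110), (100, 120), (100, 130)]))  # STALLED
```

The key point is that no fixed lag number triggers an alert; the trend of commits and lag over the window does, which is why Burrow needs no per-topic threshold tuning.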
Installation and use:
$ git clone https://github.com/linkedin/Burrow.git   # to a directory outside of $GOPATH; alternatively, export GO111MODULE=on to enable Go modules
$ cd Burrow
$ go mod tidy
$ go install
Example:
List all monitored Kafka clusters:
$ curl -s http://localhost:8000/v3/kafka | jq
{
  "error": false,
  "message": "cluster list returned",
  "clusters": [
    "kafka",
    "kafka"
  ],
  "request": {
    "url": "/v3/kafka",
    "host": "kafka"
  }
}
Other frameworks:
- kafka-web-console: github.com/claudemamo/…
- Kafkat: github.com/airbnb/kafk…
- Capillary: github.com/keenlabs/ca…
- Chaperone: github.com/uber/chaper…
There are many more, but we need to make our choices based on our version of Kafka.