Didi Experts on Operations 1.2: Examples of Monitoring Indicators (Linux, Redis, Application Business Level)

Many users ask what this indicator means or what that indicator means, and even a long explanation often does not help. The core reason is that the user lacks knowledge of the very target they want to monitor. For example, net.in.dropped is the number of inbound packets dropped per second by the NIC. If you have never run the ifconfig command and do not know what causes network packet loss, the value is really hard to interpret.

The following explains how some common monitoring indicators are obtained, to give you a basic feel for Linux. Sit tight and hold on.

1. Examples of Linux-related indicators

Many of the indicators on Linux come from the /proc directory, which is a very special directory.

The /proc directory on a Linux system is a file system of its own, the proc file system. Unlike ordinary file systems, /proc is a pseudo file system (that is, a virtual file system). It holds a set of special files that reflect the current running state of the kernel. Through these files, users can view information about the system hardware and the currently running processes, and can even change the kernel's running state by modifying some of them.

Because of these particularities, the files under /proc are often called virtual files, and they have some unique characteristics. For example, some of them return a large amount of information when viewed, yet the file itself is shown as 0 bytes in size. In addition, the time and date attributes of most of these files are the current system time and date, because their content lives in memory and can be refreshed at any time.

For ease of viewing and use, these files are grouped by topic into directories and subdirectories. For example, the /proc/scsi directory holds information about all SCSI devices in the current system, and a /proc/N directory holds information about a running process, where N is the process ID (as you would expect, the directory disappears once the process terminates).
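To make the per-process directories concrete, here is a minimal Python sketch that lists the process IDs visible under /proc. The function name list_pids and its proc_root parameter are made up for illustration; the only thing it relies on is that each running process has a directory whose name is its numeric ID.

```python
import os


def list_pids(proc_root="/proc"):
    """Return the sorted process IDs that have a directory under proc_root."""
    pids = []
    for name in os.listdir(proc_root):
        # Per-process directories are named after the numeric process ID;
        # entries such as "scsi" or "meminfo" are skipped.
        if name.isdigit() and os.path.isdir(os.path.join(proc_root, name)):
            pids.append(int(name))
    return sorted(pids)
```

On a Linux machine, calling list_pids() twice a few seconds apart usually returns slightly different lists, because directories appear and disappear as processes start and exit.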

Most virtual files can be viewed with file viewing commands such as cat, more, or less. Some are easy to read, while others are not. The less readable ones are better viewed through commands such as apm, free, lspci, or top, which present their content in a friendlier form.

Let's pick a simple indicator: the system load averages over the last 1, 5, and 15 minutes. These three indicators are used all the time, and you can see them by running the uptime command, for example:

[root@10-255-0-103 ~]# uptime
 10:49:34 up 10 days, 16:54,  1 user,  load average: 0.09, 0.14, 0.18

A monitoring system does not collect this data by running the uptime command, which would be crude and inefficient. Instead, it reads the /proc/loadavg file. Reading from /proc involves no real disk I/O, so it is very efficient. Take a look at the contents of this file:

[root@10-255-0-103 ~]# cat /proc/loadavg
0.09 0.14 0.18 3/271 9043
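As a rough illustration of what a collector does with this file, the Python sketch below splits the single line of /proc/loadavg into named values. The metric names (load.1min and so on) and the function names are invented for this example and are not taken from any particular monitoring system.

```python
def parse_loadavg(text):
    """Parse the single line of /proc/loadavg into named fields."""
    fields = text.split()
    running, total = fields[3].split("/")
    return {
        "load.1min": float(fields[0]),
        "load.5min": float(fields[1]),
        "load.15min": float(fields[2]),
        "procs.running": int(running),  # currently runnable scheduling entities
        "procs.total": int(total),      # scheduling entities that exist
        "last_pid": int(fields[4]),     # ID of the most recently created process
    }


def read_loadavg(path="/proc/loadavg"):
    """Read and parse the file; only meaningful on Linux."""
    with open(path) as f:
        return parse_loadavg(f.read())


# The sample line is the one shown in the article.
print(parse_loadavg("0.09 0.14 0.18 3/271 9043"))
```

The first three values are the same numbers uptime prints, which is why the collector has no need to start an extra process.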

Memory usage works the same way: the memory-related indicators come from the /proc/meminfo file.

The fields of most interest are MemTotal, MemFree, and MemAvailable.
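Here is a small Python sketch of how those three fields could be turned into a usage percentage. The sample numbers are invented, and the formula (total minus available, divided by total) is one common convention that I am assuming here, not necessarily what any given monitoring system computes.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into a dict of field name -> number (mostly kB)."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep or not rest.strip():
            continue
        # Most lines look like "MemTotal:  8000000 kB"; a few have no unit.
        info[name.strip()] = int(rest.split()[0])
    return info


def mem_used_percent(info):
    """Used memory as a percentage, counting MemAvailable as still usable."""
    # MemAvailable is missing on old kernels; falling back to MemFree
    # is an assumption of this sketch.
    available = info.get("MemAvailable", info["MemFree"])
    return 100.0 * (info["MemTotal"] - available) / info["MemTotal"]


# Illustrative values only, not taken from a real machine.
sample = """MemTotal:        8000000 kB
MemFree:         1000000 kB
MemAvailable:    6000000 kB
HugePages_Total:       0
"""
print(mem_used_percent(parse_meminfo(sample)))  # 25.0
```

Using MemFree alone would make the machine look far busier than it is, since memory used for cache can be reclaimed.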

Finally, consider disk usage. This is harder to obtain than the two indicators above, because reading a file under /proc is not enough. The Linux command is df -h, which shows the usage of each mount point. A monitoring system actually collects this data in two steps.

First, read /proc/mounts to get all mount points and filter out the virtual ones. Then, for each remaining mount point, make a statfs system call to get the usage of that partition.
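The two steps can be sketched in Python as follows. Python exposes this information through os.statvfs rather than a direct statfs wrapper, so that is what the sketch uses. The set of file system types treated as virtual is my own assumption and would need tuning for a real environment.

```python
import os

# File system types treated as virtual in this sketch (an assumed list).
VIRTUAL_FS = {
    "proc", "sysfs", "tmpfs", "devtmpfs", "devpts", "cgroup", "cgroup2",
    "securityfs", "debugfs", "mqueue", "pstore",
}


def parse_mounts(text, skip=VIRTUAL_FS):
    """Step 1: return (device, mount_point, fs_type) for non-virtual mounts."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        device, mount_point, fs_type = fields[:3]
        # Mount points containing spaces appear escaped (\040) in
        # /proc/mounts; this sketch does not unescape them.
        if fs_type not in skip:
            mounts.append((device, mount_point, fs_type))
    return mounts


def disk_usage(mount_point):
    """Step 2: query the file system that contains mount_point."""
    st = os.statvfs(mount_point)
    total = st.f_blocks * st.f_frsize
    free = st.f_bfree * st.f_frsize
    avail = st.f_bavail * st.f_frsize  # space usable by unprivileged users
    used = total - free
    denom = used + avail  # the basis df uses for its Use% column
    percent = 100.0 * used / denom if denom else 0.0
    return {"total": total, "used": used, "avail": avail, "used_percent": percent}


sample = """proc /proc proc rw,nosuid 0 0
/dev/sda1 / ext4 rw,relatime 0 0
tmpfs /run tmpfs rw,nosuid 0 0
"""
print(parse_mounts(sample))
```

On a real machine the collector would pass the contents of /proc/mounts to parse_mounts and then call disk_usage on each returned mount point.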

2. Examples of Redis-related indicators

Here I have a Redis instance, and I will run the info command to show you the output:

As you can see, the output contains all kinds of indicators. When we talk about monitoring Redis, we mean monitoring these indicators.

A user-friendly monitoring system lets you configure the Redis connection address directly, and then connects automatically and runs the command to collect the indicators. If page-based configuration is not offered, a plug-in mechanism usually is, so that users can write scripts that collect monitoring indicators and push the data to the monitoring server.
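As a rough idea of what such a collection script does with the info output, the Python sketch below turns the text into a dictionary of indicators. The sample text is a hand-written fragment with illustrative values, and the function name parse_redis_info is made up. How the script obtains the text (for example through a Redis client library) and how it pushes the result to the monitoring server are left out.

```python
def parse_redis_info(text):
    """Parse the text returned by the Redis INFO command into a dict."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        # Section headers such as "# Memory" and blank lines carry no value.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue
        try:
            metrics[key] = float(value) if "." in value else int(value)
        except ValueError:
            metrics[key] = value  # non-numeric values stay as strings
    return metrics


# Hand-written fragment; the values are illustrative.
sample = """# Clients
connected_clients:3

# Memory
used_memory:1048576
mem_fragmentation_ratio:1.25

# Keyspace
db0:keys=10,expires=0,avg_ttl=0
"""
print(parse_redis_info(sample))
```

Only the numeric entries are useful as time series; string entries such as the per-database line would need a second parsing step.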

3. Examples of application business-level indicators

Application-level monitoring mainly covers interface success rate, response time, QPS, and so on. It is best to collect this data in one place, in a shared HTTP or RPC framework or in a unified Layer 7 access layer, to reduce the integration cost for each service.
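One way to picture collecting these indicators in a single shared place is a wrapper that every interface handler passes through. The Python sketch below is only an illustration under that assumption: the class InterfaceStats and its methods are invented, it is not thread-safe, and a real framework would hook in at its own middleware or interceptor layer.

```python
import functools
import time
from collections import defaultdict


class InterfaceStats:
    """Collects call count, error count, and total latency per interface."""

    def __init__(self):
        self.calls = defaultdict(int)
        self.errors = defaultdict(int)
        self.total_seconds = defaultdict(float)

    def instrument(self, name):
        """Decorator that records one call of the wrapped handler."""
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                start = time.perf_counter()
                try:
                    return func(*args, **kwargs)
                except Exception:
                    self.errors[name] += 1
                    raise
                finally:
                    self.calls[name] += 1
                    self.total_seconds[name] += time.perf_counter() - start
            return wrapper
        return decorator

    def snapshot(self, name, window_seconds):
        """Summarize what was recorded during a window of the given length."""
        calls = self.calls[name]
        ok = calls - self.errors[name]
        return {
            "qps": calls / window_seconds,
            "success_rate": ok / calls if calls else 1.0,
            "avg_latency_seconds": self.total_seconds[name] / calls if calls else 0.0,
        }


stats = InterfaceStats()


@stats.instrument("get_order")
def get_order(order_id):
    return {"id": order_id}


get_order(1)
print(stats.snapshot("get_order", window_seconds=1))
```

Because the wrapper lives in the shared layer, individual services get success rate, latency, and QPS without writing any collection code themselves.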

Take the data receiving module of the monitoring system itself as an example: suppose I want to count how many data points it receives per second. The collection logic then has to be written into the code of that module.
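Such embedded collection logic can be as small as a counter that the receiving code increments and a reporter that reads it periodically. The Python sketch below assumes a hypothetical PointCounter class; it is not how any particular monitoring system implements this.

```python
import threading


class PointCounter:
    """Thread-safe counter for data points received since the last flush."""

    def __init__(self):
        self._lock = threading.Lock()
        self._count = 0

    def add(self, n=1):
        """Called by the receiving code for every batch of n points."""
        with self._lock:
            self._count += n

    def flush(self, interval_seconds):
        """Return points per second since the last flush and reset the count."""
        with self._lock:
            count, self._count = self._count, 0
        return count / interval_seconds


counter = PointCounter()
counter.add(300)
counter.add(300)
print(counter.flush(10))  # 60.0
```

A background thread would typically call flush once per reporting interval and send the result to the monitoring server as an ordinary indicator.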

Extracting indicators from logs is another typical method. The usual approach is to ship logs to a central ELK cluster and write query statements to search and analyze them. That is a fairly heavyweight approach, so Nightingale provides a lighter one for your reference. The core logic is to configure regular expressions on the server, which delivers them to the agent. The agent reads the log file as a stream and matches each line against the regular expression. When a line matches, indicator information can be extracted from it.

Example 1: when a program triggers an OOM (out of memory) condition, the keywords Out of memory are written to /var/log/messages. To raise an alarm when the system triggers OOM, I can write a regular expression that is matched against /var/log/messages, count how many log lines matched in the last minute, and report that count as an indicator.
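Example 1 can be sketched in Python roughly as follows. This is not Nightingale's agent code: the function names are invented, the sample log lines are illustrative, and log rotation is ignored. The idea is simply that the agent keeps the file open, reads whatever was appended since the last check, and counts the lines that match.

```python
import re

OOM_PATTERN = re.compile(r"Out of memory")


def read_new_lines(log_file):
    """Return the lines appended since the previous call.

    log_file is an already open text file; its read position is kept
    between calls, so each call only sees new content.
    """
    return log_file.readlines()


def count_matches(lines, pattern=OOM_PATTERN):
    """Number of lines in which the pattern occurs."""
    return sum(1 for line in lines if pattern.search(line))


# Illustrative lines, not copied from a real system.
sample_lines = [
    "Mar  3 10:00:01 host kernel: Out of memory: Kill process 4321 (java)\n",
    "Mar  3 10:00:02 host sshd[999]: Accepted publickey for root\n",
]
print(count_matches(sample_lines))  # 1
```

Calling read_new_lines once a minute and reporting count_matches of the result yields the indicator described above, and an alarm rule can then fire whenever the value is greater than zero.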

Example 2: suppose a trading program prints the transaction amount in the log for every order. We can write a regular expression to extract the amount from the log, and then compute the total transaction amount in the last minute, the average amount per order, and so on.
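For Example 2, the regular expression needs a capture group so that the amount itself can be pulled out of each matching line. The Python sketch below assumes a made-up log format in which every order line contains amount=NUMBER; the real pattern depends entirely on how the trading program writes its logs.

```python
import re

# Assumed log format: "... order=1001 amount=20.5 ..." (hypothetical).
AMOUNT_PATTERN = re.compile(r"amount=(\d+(?:\.\d+)?)")


def summarize_amounts(lines, pattern=AMOUNT_PATTERN):
    """Extract the captured amount from each matching line and aggregate."""
    amounts = []
    for line in lines:
        match = pattern.search(line)
        if match:
            amounts.append(float(match.group(1)))
    total = sum(amounts)
    return {
        "count": len(amounts),
        "total": total,
        "avg": total / len(amounts) if amounts else 0.0,
    }


# Lines the agent might have read during the last minute (illustrative).
sample_lines = [
    "2021-01-01 10:00:01 order=1001 amount=20.5 status=paid",
    "2021-01-01 10:00:20 heartbeat ok",
    "2021-01-01 10:00:45 order=1002 amount=9.5 status=paid",
]
print(summarize_amounts(sample_lines))
```

Floating-point numbers are used here for brevity; code that handles real money would more likely use decimal.Decimal.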

Didi Logi

The Didi Logi log service suite has been refined inside Didi for more than seven years. It covers log collection, log storage, log computing, log retrieval, and log analysis, with targeted optimization of its PaaS component capabilities and of engine stability and scalability.

So far, Didi Logi-KafkaManager has been open sourced from this suite. Logi-Agent, Logi-LogX, Logi-ElasticSearchManager, and other PaaS components will be open sourced later.

1. GitHub: z.didi.cn/4newP

2. Quick demo: http://117.51.150.133:8080/kafka (account/password: admin/admin)

3. FAQ: github.com/didi/Logi-K…

4. Manual: github.com/didi/Logi-K…

5. Didi Logi-KafkaManager cloud platform construction summary: mp.weixin.qq.com/s/9qSZIkqCn…

6. Video tutorial series: mp.weixin.qq.com/s/9X7gH0tpt…

Didi Nightingale

Didi Nightingale is a distributed, highly available operations monitoring system. Its most distinctive feature is hybrid cloud support: it handles traditional physical machine and virtual machine scenarios as well as K8s container scenarios. Beyond monitoring, Didi Nightingale also provides CMDB and operations automation capabilities, and many companies build their own operations platforms on top of it.

GitHub: z.didi.cn/4wurz

Official documentation: n9e.didiyun.com

gocn.vip/topics/1081…

Audio Q&A: m.ximalaya.com/keji/450958…

Video tutorial: m.bilibili.com/space/44253…

Secondary development: xie.infoq.cn/article/30d…

If you run into problems using Didi Logi-KafkaManager or Nightingale, or have questions you would like to discuss with the developers, you can scan the QR code below to join the Didi Logi and Nightingale open source user group and ask there.

The group includes the leads of the Didi Logi-KafkaManager and Nightingale projects, Didi senior expert engineers Zhang Liang and Qin Xiaohui, along with other experts who answer questions online. To join, follow the [Didi cloud Obsuite] official account and reply Kafka or nightingale to add the assistant, noting Kafka or nightingale in your request.
