Back-end developer at TX, sharing articles on back-end technology, machine learning, data structures and algorithms, computer fundamentals, programmer interviews, and more. Welcome to the public account “Rendongxue Programming”.

Custom monitoring based on Grafana

Preface

This article is fairly detailed, so I have put its outline up front to help you follow the structure. It is also my first KM article, so if anything in it falls short, corrections and criticism are welcome.

1. Background

1.1. Cross monitoring

**Monitoring is a critical part of the whole product life cycle. Operations focuses on hardware and basic monitoring, R&D focuses on monitoring middleware and the application layer, and product focuses on monitoring core business metrics.** Self-monitoring across the whole pipeline of data reporting, transmission, storage, and application makes it possible to collect monitoring data in real time, anticipate faults, raise alarms, and so on. But if the self-monitoring itself fails, how is that discovered? This calls for external cross monitoring that watches over the life cycle of the self-monitoring.

1.2. Monitoring tool selection

As the saying goes, “no monitoring, no operations”, so the importance of the monitoring system is self-evident. For how to choose a monitoring system and some monitoring basics, see the article “Monitoring system selection”. For my purposes, the cross-monitoring system needs to import an ES data source, visualize the data in real time, and send timely alarms through multiple channels once a threshold is triggered. Kibana, one of the ELK “Three Musketeers”, can also do visualization, but I did not find it easy to use QAQ.

Grafana is a cross-platform, open source visualization tool that queries and presents data from the data sources you configure. It supports up to 14 data sources, including MySQL and Elasticsearch, and supports alert configuration for most of them. So I chose Grafana, which is simple and easy to use!

Summary:

  • Cross monitoring, which combines external inspection with internal self-checks, keeps monitoring data queryable when internal monitoring fails and adds depth and layers to the monitoring.

  • The monitoring charts are multi-layered: charts are grouped and arranged hierarchically by relevance. Aggregated, grouped, multi-layered charts are critical for quickly locating the root cause of a problem.

  • Custom monitoring in Grafana is easy and convenient to configure.

2. In practice

As mentioned above, Grafana was chosen as the cross-monitoring tool. It has to import an ES data source, visualize the data in real time, and send alarms through multiple channels once a threshold is triggered. Now for the hands-on part.

2.1. Configure the DataSource

This step mainly consists of filling in the data source settings to produce a valid data source.
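The article configures the data source through the interface. For reference, here is a minimal sketch of the same step done through Grafana's HTTP API (`POST /api/datasources`). The data source name, the ES address, the index pattern `monitor-*`, and the field layout of the request body are assumptions for illustration and may differ between Grafana versions:

```python
import json
import urllib.request


def build_es_datasource(name, es_url, index, time_field="@timestamp"):
    """Build a JSON body for Grafana's POST /api/datasources endpoint."""
    return {
        "name": name,
        "type": "elasticsearch",
        "access": "proxy",   # Grafana's backend queries ES on the browser's behalf
        "url": es_url,
        "database": index,   # index name or pattern (assumed field layout)
        "jsonData": {"timeField": time_field},
    }


def create_datasource(grafana_url, api_key, payload):
    """Send the payload to Grafana; needs a reachable server and an admin API key."""
    request = urllib.request.Request(
        grafana_url.rstrip("/") + "/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical values; nothing is sent until create_datasource() is called.
payload = build_es_datasource("es-cross-monitor", "http://localhost:9200", "monitor-*")
print(json.dumps(payload, indent=2))
```

Only the payload is built and printed here; calling `create_datasource` against a real Grafana instance is left to the reader.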

2.2. Configure the Dashboard

Once the data source is configured, you can add panels and configure them yourself. Several panel types are available:

Taking Graph as an example, the following figure shows how to add and configure a dashboard. The buttons boxed in red in the upper right corner are, in order: New, Star, Share, Save, Settings, query mode, Time range, Zoom out (widens a small time range to a larger one), and Refresh.

2.3. Configure Variables

This section sets up template variables to make later queries in the interface easier. With flexibly configured drop-down lists, you can customize how the monitoring charts are aggregated and locate problems quickly.

For how to write the Query statement, refer to the official documentation:

| Query | Description |
| --- | --- |
| `{"find": "fields", "type": "keyword"}` | Returns a list of field names with the index type keyword. |
| `{"find": "terms", "field": "@hostname", "size": 1000}` | Returns a list of values for a field using a terms aggregation. The query uses the current dashboard time range as its time range. |
| `{"find": "terms", "field": "@hostname", "query": "<lucene query>"}` | Returns a list of values for a field using a terms aggregation and the specified Lucene query filter. The query uses the current dashboard time range as its time range. |
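The three query forms above are plain JSON strings typed into the variable's Query field. As a small sketch, they can be generated like this; the helper names and the `@env:prod` filter are made up for illustration:

```python
import json


def fields_query(field_type="keyword"):
    """Variable query that lists field names of a given index type."""
    return json.dumps({"find": "fields", "type": field_type})


def terms_query(field, size=None, lucene_query=None):
    """Variable query that lists the values of a field via a terms aggregation."""
    query = {"find": "terms", "field": field}
    if size is not None:
        query["size"] = size
    if lucene_query is not None:
        query["query"] = lucene_query  # optional Lucene filter
    return json.dumps(query)


print(fields_query())
print(terms_query("@hostname", size=1000))
print(terms_query("@hostname", lucene_query="@env:prod"))  # hypothetical filter
```

Each printed line can be pasted into the Query field as-is.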

Once a valid variable is set up, save it to see a preview of its values.

Note: configuring the variable alone is not enough. The drop-down box has no effect on the data query until the variable is bound in the query statement!
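To make "binding" concrete: the panel's Lucene query has to mention the variable, for example `@hostname:$hostname`, and Grafana fills in the value selected in the drop-down box. The sketch below only imitates that substitution so the effect is visible; it is a rough approximation that ignores Lucene escaping, not Grafana's actual implementation, and the host names are made up:

```python
from string import Template


def render_lucene(query, **selected):
    """Substitute $name placeholders; multi-value selections become OR groups."""
    rendered = {}
    for name, value in selected.items():
        if isinstance(value, (list, tuple)):
            quoted = ['"%s"' % item for item in value]
            rendered[name] = "(" + " OR ".join(quoted) + ")"
        else:
            rendered[name] = str(value)
    return Template(query).substitute(rendered)


# Hypothetical host names chosen in the drop-down box.
print(render_lucene("@hostname:$hostname", hostname="web01"))
print(render_lucene("@hostname:$hostname", hostname=["web01", "web02"]))
```

A query without `$hostname` in it is unaffected by the drop-down box, which is why the variable appears to "not work" until it is bound.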

2.4. Connecting alarms to Nebula

The Nebula alarm management system is a general-purpose alarm management solution. It provides alarm access and management capabilities such as reporting, masking, subscription, convergence, recovery, query, notification (by phone call, WeChat, Enterprise WeChat, email, and mini program), escalation, automatic processing, and statistical analysis. Thanks to its structured definition of alarm data and its rich open APIs, the system is highly customizable. It also connects seamlessly with the Tencent Yunxing cloud work order system, duty system, and process engine, which gives it strong automatic processing capabilities. At present it mainly serves Tencent Cloud infrastructure (IaaS) operations scenarios, and most basic cloud alarms are already connected to it.

Nebula Alarm management system address: Alarm query

2.4.1. Related configuration

  • **Step 1:** Check whether the Nebula alarm channel already exists. If it does not, add a new channel under Alerting > Notification Channels.

Note: The channel needs to be configured only once, the first time the interface is used. If a Nebula alarm channel already exists, go to step 2.

  • **Step 2:** Configure the alarm conditions. Select the Alert option on the left side of the panel page to configure them.

Note that an error is displayed saying that queries containing template variables cannot be used for alarms. There are two workarounds:

1. Multiple-query solution: In the following figure, query A uses template variables, so no alarm can be configured on it. Add a query B that does not use template variables and configure the alarm on B. (Note that query B is hidden in the view so that it does not overlap with A.)

2. Dual-view solution: Set up two views, one for monitoring and the other for alerting. The rest of the configuration is the same as in the first solution.

The two approaches do not differ much; the dual-view approach may be more convenient and intuitive.
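The multiple-query solution can be sketched as a simplified panel definition. The structure below follows the general shape of a Grafana panel with a classic alert rule, but the field values, the `level:error` query, and the threshold are assumptions for illustration. The small checker reports any alert condition that points at a query containing a template variable:

```python
import re

# Simplified, hypothetical panel: A is for display, B is the alert-only twin.
panel = {
    "title": "Error count",
    "targets": [
        {"refId": "A", "query": "@hostname:$hostname AND level:error"},
        {"refId": "B", "query": "level:error", "hide": True},
    ],
    "alert": {
        "name": "Error count alert",
        "conditions": [
            {
                "type": "query",
                "query": {"params": ["B", "5m", "now"]},  # evaluate query B
                "reducer": {"type": "avg"},
                "evaluator": {"type": "gt", "params": [100]},
            }
        ],
    },
}

# Matches $var, ${var} and [[var]] style template variables.
VARIABLE = re.compile(r"\$\w+|\$\{\w+|\[\[\w+\]\]")


def alert_queries_with_variables(panel):
    """Return refIds used by alert conditions whose query contains a variable."""
    by_ref = {target["refId"]: target for target in panel["targets"]}
    offending = []
    for condition in panel.get("alert", {}).get("conditions", []):
        ref_id = condition["query"]["params"][0]
        if VARIABLE.search(by_ref[ref_id]["query"]):
            offending.append(ref_id)
    return offending


print(alert_queries_with_variables(panel))
```

Pointing the condition at `"A"` instead of `"B"` makes the checker report `A`, which mirrors the error Grafana shows in the interface.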

With the template variable problem resolved, back to the main topic: configuring the alarm conditions.

  • **Step 3:** Configure the alarm information.

  • **Step 4:** Subscribe to the alarm in Nebula.

2.4.2. Testing alarms

  • **Alarm trigger test:** After the alarm conditions and rules are configured, you can run an alarm test to check whether the alarm would be triggered.

  • **Enable alarm rules:** If none of the steps above failed, congratulations: you can now enable the alarm rules for real monitoring!

  • Alarm effect:

  • **Alarm history query:** Grafana can also query the alarm history.

  • **Dashboard status display:** The dashboard also shows red and green lines to indicate the different alarm states.
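To make the trigger test and the red/green states above more concrete, here is a minimal sketch of how a threshold condition such as "avg of the query is above 100" could be evaluated. It imitates the idea of a reducer followed by a comparison; it is not Grafana's evaluation code, and the sample values and threshold are made up:

```python
from statistics import mean

# Reduce a series of values to a single number.
REDUCERS = {
    "avg": mean,
    "max": max,
    "min": min,
    "sum": sum,
    "last": lambda values: values[-1],
}

# Compare the reduced number with the threshold.
EVALUATORS = {
    "gt": lambda value, threshold: value > threshold,
    "lt": lambda value, threshold: value < threshold,
}


def evaluate(values, reducer="avg", evaluator="gt", threshold=100):
    """Return 'alerting', 'ok', or 'no_data' for one evaluation of the rule."""
    if not values:
        return "no_data"
    reduced = REDUCERS[reducer](values)
    return "alerting" if EVALUATORS[evaluator](reduced, threshold) else "ok"


print(evaluate([80, 120, 130]))  # average is 110, above the threshold
print(evaluate([80, 90, 95]))    # average is below the threshold
print(evaluate([]))              # nothing returned by the query
```

In this picture, a red line on the dashboard corresponds to a point where the result switched to `alerting`, and a green line to a switch back to `ok`.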

3. References

This article mainly draws on the following articles; thanks to their authors.

The Grafana collapsed

Grafana high availability and Alerting