This is the third day of my August challenge.

Preface

Dear friends, I feel I have posted rather a lot of technical articles recently, and I wonder whether any of them have helped you solve practical problems at work. Why do I ask? Because it is precisely the small pieces of knowledge covered in those articles that, added together, freed me to a certain extent from fragmented, busy operations and maintenance (O&M) work. Readers who followed along carefully will know that the skills that resolve real pain points are not necessarily the grand, sophisticated ones; on the contrary, the details we usually overlook are often the key to the problem. Only by hitting the nail on the head can we do the right thing.

So over the next few days I will share my thoughts on some of the problems I have run into in operations and maintenance, hoping they give you some inspiration.

This post is about integrating the CMDB with the Zabbix monitoring system.

The status quo

1. Insufficient maintenance

(1) In the current O&M environment the monitoring system is relatively isolated, so host groups, templates, and monitoring items all have to be created by hand. As the number of hosts and services grows, the groups in the monitoring system have drifted far from the actual business groups, architecture groups, and so on.

(2) Poor communication and coordination between departments has left the monitoring system in disorder. A set of maintenance conventions has been defined, covering naming, monitoring intervals, fault classification, and so on, but monitoring remains chaotic.

2. Insufficient performance

(1) As the number of servers grows, monitoring has to cover basic monitoring, network monitoring, service monitoring, log monitoring, and more. Monitoring also has to become ever more fine-grained, which inevitably degrades the performance of the monitoring system. Optimizations are therefore needed, such as switching from passive to active checks, classifying monitoring items, and tuning monitoring intervals.

(2) Alarm floods and false alarms caused by network jitter and other factors create unnecessary trouble for O&M staff, so alarm convergence is needed to avoid them.
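As a rough illustration of what alarm convergence could look like, the sketch below suppresses repeated alarms for the same host and trigger inside a fixed time window. The `Alarm` fields, the `converge` function, and the 300-second window are all assumptions chosen for illustration; they are not part of Zabbix or of the environment described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    host: str        # hypothetical field: host name
    trigger: str     # hypothetical field: trigger or check name
    timestamp: int   # seconds since some reference point


def converge(alarms, window=300):
    """Forward an alarm only if no alarm with the same (host, trigger)
    was forwarded during the previous `window` seconds."""
    last_kept = {}   # (host, trigger) -> timestamp of the last forwarded alarm
    forwarded = []
    for alarm in sorted(alarms, key=lambda a: a.timestamp):
        key = (alarm.host, alarm.trigger)
        previous = last_kept.get(key)
        if previous is None or alarm.timestamp - previous >= window:
            forwarded.append(alarm)
            last_kept[key] = alarm.timestamp
    return forwarded


raw = [
    Alarm("web-01", "ping loss", 0),
    Alarm("web-01", "ping loss", 60),    # repeat inside the window: suppressed
    Alarm("web-01", "ping loss", 400),   # window has elapsed: forwarded again
    Alarm("db-01", "ping loss", 30),     # different host: forwarded
]
kept = converge(raw)
print([(a.host, a.timestamp) for a in kept])
```

A fixed window is the simplest possible policy. Grouping by business or by dependency would need extra context about each host, which is exactly the kind of data a CMDB can supply.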

3. Lack of integration

(1) Virtual machines are delivered to JumpServer and the CMDB through Blue Whale standard operation and maintenance. Delivery to the monitoring system relies on Zabbix's automatic host discovery, which only provides basic host monitoring. The CMDB is not connected to Zabbix.

(2) Port monitoring and service monitoring are added to the monitoring system automatically by the system go-live pipeline, but business groupings cannot be mapped into Zabbix.

(3) Zabbix and Blue Whale should be able to self-heal based on the CMDB, which is not yet the case.

Analysis

1. Why does insufficient maintenance occur?

Start from the server provisioning process. Before a server goes live, we have already decided which service will be deployed on it, along with the service's port and URL. During go-live the server passes through OS installation, registration in JumpServer, and registration in the CMDB, all of which are already handled by Blue Whale standard operation and maintenance. When it comes to the monitoring system, however, we only add basic host monitoring through automatic discovery, and then add port and service monitoring automatically when the system goes online. Service groups are ignored entirely and can only be maintained by hand afterwards.

2. Why does insufficient performance occur?

(1) Insufficient performance does not appear all at once; it is a chain reaction caused by the steadily growing number of monitoring items. We therefore need to prepare tiered templates for each level in advance and deliver the matching templates automatically, so that an incomplete understanding of monitoring among staff at different levels does not cause a sudden deterioration in performance. This avoids a sudden degradation of the monitoring system, although gradual degradation over time cannot be eliminated completely.

(2) Although Zabbix meets most of our needs, we can offload part of its work by complementing it with other monitoring tools, such as:

  • Grafana: part of the alarm sources can be plugged into Grafana;
  • Alertmanager: provides advanced alarm management that effectively solves some of our alarm problems;
  • Cacti: hands network-layer monitoring over to a dedicated network monitoring tool;
  • Prometheus: container-level monitoring;
  • ELK: monitoring of application traffic;
  • APM: distributed tracing.

Our monitoring must therefore be a platform that integrates several monitoring tools rather than relying on Zabbix alone to cover every monitoring requirement, which is not realistic.

3. Why does insufficient integration between the CMDB and Zabbix occur?

What is the role of the CMDB in the monitoring system? If it is used only for asset management, its role in the enterprise will keep shrinking until it becomes a "chicken rib": something of little value that is nevertheless a pity to throw away. We therefore need the CMDB to act as the data foundation for all O&M systems, including the monitoring system. This reveals the root of the insufficient integration: the CMDB does not play that supporting role for the monitoring system.

Outlook

1. Integrating the CMDB

The CMDB is connected to the monitoring system. When a new object is added to the CMDB, it is automatically added to the monitoring system, and when configuration data changes, the monitoring system can raise the necessary alarms. When a machine goes online it is automatically registered in the CMDB; business changes are automatically registered in the CMDB; role changes and downtime for maintenance are recorded in the CMDB as well. The monitoring system then only needs to maintain the corresponding rules to add monitoring on its own, which makes up for the shortcomings of automatic discovery. Flexibility also improves greatly: once the CMDB has the relevant information and status of a device, it can proactively update the monitoring system and correct monitoring that was added earlier but never kept up to date.
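One way to picture the "maintain the rules, let the CMDB drive the rest" idea is a small translation step from a CMDB record to a Zabbix API request. The sketch below only builds the JSON-RPC body for the Zabbix `host.create` method; authentication and the HTTP call are deliberately left out. The rule table, the CMDB field names (`hostname`, `ip`, `role`), and every template and group ID are assumptions made up for illustration.

```python
import json

# Hypothetical rule table: CMDB role -> Zabbix template IDs and host group IDs.
# The IDs are placeholders; real values come from your own Zabbix instance.
RULES = {
    "nginx": {"templates": ["10001"], "groups": ["21"]},
    "mysql": {"templates": ["10002"], "groups": ["22"]},
}
DEFAULT_RULE = {"templates": ["10000"], "groups": ["20"]}  # basic host monitoring only


def build_host_create(cmdb_host, request_id=1):
    """Translate one CMDB host record into a `host.create` JSON-RPC request body."""
    rule = RULES.get(cmdb_host.get("role"), DEFAULT_RULE)
    return {
        "jsonrpc": "2.0",
        "method": "host.create",
        "params": {
            "host": cmdb_host["hostname"],
            "interfaces": [{
                "type": 1,        # 1 = Zabbix agent interface
                "main": 1,        # default interface of this type
                "useip": 1,       # connect by IP rather than DNS name
                "ip": cmdb_host["ip"],
                "dns": "",
                "port": "10050",  # default Zabbix agent port
            }],
            "groups": [{"groupid": g} for g in rule["groups"]],
            "templates": [{"templateid": t} for t in rule["templates"]],
        },
        "id": request_id,
    }


# Hypothetical CMDB record; the field names are assumptions.
record = {"hostname": "web-01", "ip": "10.0.0.11", "role": "nginx"}
payload = build_host_create(record)
print(json.dumps(payload, indent=2))
```

Because the role-to-template mapping lives in one table, changing a rule changes what every future host receives, and hosts with an unknown role still fall back to basic monitoring.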

2. CMDB data support

The CMDB supplies the monitoring system with the supporting data needed to produce multi-dimensional, standardized alarm information. What does that mean? Normally the Zabbix monitoring system only tells us something like "metric XX on object XX is alarming", plus some details about the object. We do not learn which application system the alarm belongs to, which environment it is in, whether it is highly available, who owns the application, which systems depend on it, or whether a change is in progress, all of which O&M staff need in order to make further judgments and arrangements. We therefore need a system that provides the application hierarchy topology, cluster information, module information, resource instances, relationships, and similar data. That system is the CMDB.

When an alarm fires, the alarm system can query the CMDB for the full configuration information about the affected object and so deliver the most accurate, complete, and standardized alarm information.
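The enrichment step described above can be sketched as a lookup that attaches CMDB context to a raw alarm before it is sent out. The dictionary below stands in for a real CMDB query, and every field name (`application`, `environment`, `owner`, and so on) is an assumption picked to mirror the questions listed in the text.

```python
# Hypothetical in-memory stand-in for a CMDB query; a real system would call the CMDB API.
CMDB = {
    "web-01": {
        "application": "order-service",
        "environment": "production",
        "owner": "alice",
        "high_availability": True,
        "depended_on_by": ["checkout", "billing"],
        "change_in_progress": False,
    },
}


def enrich_alarm(alarm, cmdb=CMDB):
    """Return a copy of the alarm with CMDB context attached under the `context` key."""
    context = cmdb.get(alarm["host"])
    enriched = dict(alarm)  # leave the caller's alarm untouched
    # Flag unknown hosts explicitly so that missing CMDB data is itself visible.
    enriched["context"] = context if context is not None else {"cmdb": "no record found"}
    return enriched


alarm = {"host": "web-01", "metric": "cpu.load", "severity": "high"}
enriched = enrich_alarm(alarm)
print(enriched["context"]["application"], enriched["context"]["owner"])
```

An alarm for a host that the CMDB does not know about is still delivered, but it carries an explicit marker, which also helps surface gaps in the CMDB data.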

Solution

1. Blue Whale monitoring, Zabbix, and Grafana complement one another as multiple monitoring systems

  • A. Blue Whale monitoring has the natural advantage of being integrated with the CMDB, and it is already used to monitor memory, TCP, Nginx, MySQL, and other components, covering the open-source tools used in the current production environment. It can therefore be used for business monitoring and some basic monitoring.
  • B. Zabbix handles what Blue Whale monitoring does not provide: basic monitoring, network monitoring, hardware monitoring, vSphere monitoring, and so on.
  • C. Grafana + ELK + Alertmanager serve as visual monitoring tools, using their alarm convergence features to monitor the running state of applications.

2. Unified data source

As the name implies, monitoring needs the CMDB as its unified data source. When several systems coexist, however, strong API integration capability is required. We therefore need to choose an integration point within the existing systems that integrates well, such as Blue Whale standard operation and maintenance, and use it to make the API calls to the different platforms.

Extension: 3 objective facts about monitoring systems
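To make the idea of a single integration point more concrete, here is a minimal sketch in which one dispatcher pushes the same CMDB record to several monitoring platforms through a shared interface. The class names, the `register` method, and the returned strings are all hypothetical; the adapters are placeholders that only describe what a real API call would do.

```python
from typing import Protocol


class MonitoringBackend(Protocol):
    """Interface that every platform adapter is assumed to implement."""
    name: str

    def register(self, host: dict) -> str: ...


class ZabbixBackend:
    name = "zabbix"

    def register(self, host: dict) -> str:
        # Placeholder: a real adapter would call the Zabbix API here.
        return f"zabbix: registered {host['hostname']}"


class BlueWhaleBackend:
    name = "blue_whale"

    def register(self, host: dict) -> str:
        # Placeholder: a real adapter would call the Blue Whale monitoring API here.
        return f"blue_whale: registered {host['hostname']}"


def dispatch(host: dict, backends: list) -> dict:
    """Send one CMDB host record to every backend and collect results by backend name."""
    return {backend.name: backend.register(host) for backend in backends}


results = dispatch({"hostname": "web-01"}, [ZabbixBackend(), BlueWhaleBackend()])
print(results)
```

Supporting another platform then means writing one more adapter, while the CMDB side and the dispatcher stay unchanged.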

  1. The purpose of enterprise monitoring governance is to detect, resolve, and predict problems in time, not to integrate monitoring systems for its own sake.
  2. Enterprise IT architectures are complex today and will become more so, and one or two monitoring products can hardly cover every monitoring requirement. No single product or vendor does everything; each has its own strengths.
  3. New businesses, systems, and scenarios (containers, for example) create new monitoring requirements. Enterprises will inevitably run several monitoring products side by side, so it is imperative to build a monitoring platform whose capabilities can keep growing.

Conclusion

Starting from the current state of Zabbix monitoring and the role of the CMDB, this article has explained the relationship between the two and gone on to discuss the purpose of a monitoring system and how it can grow sustainably. Of course, operations and maintenance can be done without these tools; what matters is choosing the right tool at the right stage to keep the business reliable.