Abstract: The alarm function is an essential module of various cloud platforms. Personalized alarm configuration plays an important role in helping users and O&M personnel discover problems in time.

This article was shared in the Huawei Cloud Community as "The Alarm Framework of the GaussDB(DWS) Intelligent Database Monitoring System Is Online!" by Codefulture.

This document describes the DMS alarm framework from the following aspects:

  • Background of the alarm framework of the intelligent database monitoring system
  • Implementation and use of the alarm framework
  • Limitations and expectations of the alarm framework

A brief introduction to the alarm framework of database intelligent monitoring system

The alarm function is an essential module of various cloud platforms, including Alibaba Cloud, Tencent Cloud, and Huawei Cloud itself. Personalized alarm configuration helps users and O&M personnel discover problems in a timely manner.

The alarm framework of the Intelligent Database Monitoring System (DMS), the alarm framework for short, monitors cluster information of the data warehouse and was developed for clusters of version 8.1.1 or later. If your cluster version is earlier than 8.1.1, or DMS is not installed, the secondary alarm function cannot be used.

The alarm function was designed and developed in-house based on product, business, and customer requirements. To help users get familiar with the function and start using it quickly, the design also borrowed related concepts from other platforms and adjusted them to our own situation. The first version has now been designed and developed.

Implementation and use of the alarm framework

1. Implementing the alarm framework

Before we talk about the implementation, we need to understand the concepts involved in the alarm framework.

  • Alarm indicators: Alarm indicators are actual monitored items, such as CPU usage, disk usage, and I/O.
  • Alarm policy: An alarm policy is the smallest unit that triggers an alarm. Each policy is specific to an alarm indicator. Alarm policies are classified into threshold policies and status policies.
  • Alarm rules: Alarm rules are the smallest unit of actual monitoring (task scheduling) and a collection of alarm policies. Alarm rules include default rules and custom rules.
  • Default alarm rules: The default alarm rules are basic alarm items provided by the system. With only simple configuration, users can start receiving alarm information.
  • Custom alarm rules: When the default alarm rules cannot meet actual requirements, you can create custom alarm rules.
  • Relationships among rules, alarm policies, and alarm indicators: An alarm rule (default or custom) can contain multiple alarm policies, and the policies in a rule can relate to each other in different ways. The currently defined relationships are as follows:
  1. Independent (or): The policies are unrelated to each other. An alarm is generated as soon as any one policy meets its conditions.
  2. Priority: All policies in the rule monitor the same indicator but with different thresholds. The system checks the thresholds in descending order to determine whether to send an alarm.
  3. And: An alarm is generated only if all policies meet their conditions.
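To make the three relationships concrete, here is a minimal sketch of evaluating one rule against a snapshot of indicator values. It is not DMS code. It covers threshold policies only, and it assumes a policy fires when the current value reaches its threshold. The names Relation, ThresholdPolicy, and evaluate_rule are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    OR = "or"              # independent: any policy that fires raises an alarm
    PRIORITY = "priority"  # same indicator, thresholds checked from high to low
    AND = "and"            # an alarm only if every policy fires


@dataclass
class ThresholdPolicy:
    indicator: str   # hypothetical indicator name, e.g. "cpu_usage"
    threshold: float
    severity: str

    def fires(self, metrics: dict[str, float]) -> bool:
        # Assumption: a policy fires when the current value reaches its threshold.
        value = metrics.get(self.indicator)
        return value is not None and value >= self.threshold


def evaluate_rule(policies: list[ThresholdPolicy], relation: Relation,
                  metrics: dict[str, float]) -> list[ThresholdPolicy]:
    """Return the policies of one rule that produce an alarm for `metrics`."""
    if relation is Relation.OR:
        return [p for p in policies if p.fires(metrics)]
    if relation is Relation.AND:
        all_fire = bool(policies) and all(p.fires(metrics) for p in policies)
        return list(policies) if all_fire else []
    # PRIORITY: walk thresholds in descending order and report only the first hit.
    for p in sorted(policies, key=lambda p: p.threshold, reverse=True):
        if p.fires(metrics):
            return [p]
    return []
```

Consider a rule with two cpu_usage policies, one at 80 (major) and one at 95 (critical). A reading of 97 returns only the critical policy under PRIORITY, but both policies under OR. This matches the idea that a priority rule should report the most severe threshold crossed rather than every one.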

With these concepts in place, the alarm framework can be broken into three parts: monitoring indicator collection, alarm policy customization, and alarm task scheduling.

1-1. Collecting monitoring indicators

To monitor a database, you must collect its indicators. This means obtaining the real-time or periodic status of the database and cluster through appropriate statistics and queries, and then triggering alarms according to the alarm policies.
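The collection step can be pictured as sampling each indicator on every cycle and keeping a bounded history per cluster. Later steps then query that history over a time window. In DMS the values come from statistics and queries against the database. In this sketch each collector is a stand-in callable, and the IndicatorStore and collect_once names, as well as the in-memory buffer, are hypothetical.

```python
from __future__ import annotations

import time
from collections import defaultdict, deque
from typing import Callable, Optional


class IndicatorStore:
    """In-memory store of recent (timestamp, value) samples per cluster and indicator."""

    def __init__(self, max_samples: int = 1440) -> None:
        # Each series keeps at most `max_samples` points; older points are dropped.
        self._series: dict[tuple[str, str], deque] = defaultdict(
            lambda: deque(maxlen=max_samples))

    def record(self, cluster: str, indicator: str, value: float, ts: float) -> None:
        self._series[(cluster, indicator)].append((ts, value))

    def window(self, cluster: str, indicator: str, since: float) -> list[tuple[float, float]]:
        """Samples taken at or after `since`, oldest first."""
        series = self._series.get((cluster, indicator), ())
        return [(ts, v) for ts, v in series if ts >= since]


def collect_once(store: IndicatorStore, cluster: str,
                 collectors: dict[str, Callable[[], float]],
                 ts: Optional[float] = None) -> None:
    """Run every collector once and record the results under one timestamp."""
    ts = time.time() if ts is None else ts
    for indicator, collect in collectors.items():
        store.record(cluster, indicator, collect(), ts)
```

In a real deployment, each collector would wrap a SQL query against the cluster's system views. The bounded deque keeps memory flat no matter how long collection runs.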

1-2. Customizing alarm policies

The following figure shows the composition of an alarm policy. Different combinations of configuration items produce different policy configurations. Later iterations will add more configurable items to support more service scenarios.
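One way to model a policy as a combination of configuration items is a small immutable record. This sketch is not the figure's actual schema. Its fields mirror the items the configuration page allows users to modify: bound clusters, threshold, duration period, suppression condition, and alarm severity. The class name, field names, and default values are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PolicyType(Enum):
    THRESHOLD = "threshold"
    STATUS = "status"


@dataclass(frozen=True)
class AlarmPolicyConfig:
    indicator: str                     # monitored item, e.g. "disk_usage"
    policy_type: PolicyType
    threshold: Optional[float] = None  # required for threshold policies only
    duration_minutes: int = 5          # how far back indicator data is queried
    suppression_minutes: int = 60      # no repeat alarm within this period
    severity: str = "warning"
    clusters: tuple[str, ...] = ()     # empty tuple means "all clusters"

    def __post_init__(self) -> None:
        # Reject combinations that could never be evaluated.
        if self.policy_type is PolicyType.THRESHOLD and self.threshold is None:
            raise ValueError("a threshold policy needs a threshold")
        if self.duration_minutes <= 0:
            raise ValueError("duration_minutes must be positive")
        if self.suppression_minutes < 0:
            raise ValueError("suppression_minutes must not be negative")

    def applies_to(self, cluster: str) -> bool:
        return not self.clusters or cluster in self.clusters
```

Validating in `__post_init__` means an invalid combination is rejected when the policy is created, not later when the scheduler runs it.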

1-3. Alarm task scheduling

Monitoring indicators is a periodic process, so the alarm framework needs a stable scheduler to run its tasks. It currently uses the distributed scheduling framework Quartz. The following figure shows the execution logic of a scheduled task.
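DMS uses Quartz, a Java job-scheduling library. To keep this illustration self-contained, the sketch swaps in Python's standard `sched` module to show the same periodic pattern. Since an alarm rule is the smallest unit of task scheduling, each task here stands for evaluating one rule. Unlike a Quartz cluster, this sketch runs in a single process with no persistence or misfire handling. The function name schedule_periodic is hypothetical.

```python
import sched
from typing import Callable


def schedule_periodic(scheduler: sched.scheduler, interval: float,
                      task: Callable[[], None], runs: int) -> None:
    """Run `task` now and then every `interval` seconds, `runs` times in total."""
    def tick(remaining: int) -> None:
        task()
        if remaining > 1:
            # Re-arm the next run relative to the scheduler's current time.
            scheduler.enter(interval, 1, tick, (remaining - 1,))

    if runs > 0:
        scheduler.enter(0, 1, tick, (runs,))
```

In practice you would create the scheduler as `sched.scheduler(time.monotonic, time.sleep)` and call `run()`. Because the time and delay functions are injectable, the sketch can also be driven by a fake clock in tests.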

2. Using the alarm framework

The DMS alarm framework is located under the Alarm Management menu of Data Warehouse Service.

The home page provides statistics on the alarms generated in the past week. You can view both the statistics and detailed information about each alarm.
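The one-week statistics amount to filtering alarm records by time and aggregating them. This sketch groups by severity, but that grouping, the AlarmRecord shape, and the weekly_summary name are hypothetical and do not describe how the DMS home page computes its figures.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class AlarmRecord:
    rule: str
    severity: str
    raised_at: datetime


def weekly_summary(alarms: list[AlarmRecord], now: datetime) -> Counter:
    """Count alarms raised in the 7 days up to `now`, grouped by severity."""
    since = now - timedelta(days=7)
    return Counter(a.severity for a in alarms if since <= a.raised_at <= now)
```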

Click View Alarm Rules to view the list of alarm rules.

The alarm framework provides custom alarm rules and default alarm rules. The default alarm rules are not built-in. You can add custom alarms based on your own requirements.

Click “Create Rule” or “Modify” to open the configuration page.

Currently, only the bound clusters, threshold, duration period, suppression condition, and alarm severity can be modified. Other options will be opened up in future versions to give users more configuration choices.

2-1. Description of the modification items

  1. You can bind the secondary alarm rule to multiple clusters. By default, it applies to all clusters.
  2. You can adjust the upper or lower limit that triggers an alarm. Each indicator comes with a default threshold range, which you can modify based on actual conditions.
  3. By modifying the duration period, you can lengthen or shorten the time range of indicator data that is queried. This lets you check for sustained changes in an indicator over a long period, or for abnormal changes at a specific moment.
  4. By modifying the suppression condition, you can control how often alarms are sent. Repeated alarms within the suppression period are not sent again.
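The duration period and the suppression condition work together. The duration period decides which samples are inspected, and the suppression period decides whether a repeated alarm is actually sent. In this minimal sketch, an alarm condition holds only if every sample in the duration window reaches the threshold. That rule, and the SuppressingAlarmChecker name, are assumptions for illustration, not the documented DMS behavior.

```python
from __future__ import annotations

from typing import Optional


class SuppressingAlarmChecker:
    """Hypothetical check combining a duration window with alarm suppression."""

    def __init__(self, threshold: float, duration_s: float, suppression_s: float) -> None:
        self.threshold = threshold
        self.duration_s = duration_s        # how far back samples are considered
        self.suppression_s = suppression_s  # minimum gap between two sent alarms
        self._last_sent: Optional[float] = None

    def check(self, samples: list[tuple[float, float]], now: float) -> bool:
        """Return True if an alarm should be sent at `now`.

        `samples` are (timestamp, value) pairs; only those inside the
        duration window [now - duration_s, now] are considered.
        """
        window = [v for ts, v in samples if now - self.duration_s <= ts <= now]
        # Assumption: the condition holds only if every sample in the window breaches.
        breached = bool(window) and all(v >= self.threshold for v in window)
        if not breached:
            return False
        if self._last_sent is not None and now - self._last_sent < self.suppression_s:
            return False  # still inside the suppression period: do not resend
        self._last_sent = now
        return True
```

A longer duration window makes the check less sensitive to short spikes. A longer suppression period reduces repeated notifications for a problem that persists.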

Limitations and expectations of the alarm framework

The DMS alarm framework is still under construction and has several shortcomings: it needs more monitoring indicators, it should support more policy configuration modes, and adding new alarm items is still inconvenient.

Beyond solving these pain points, we hope the alarm framework can be linked with the system's other functional modules to make the monitoring system more “intelligent”.
