Preface

The previous article set up SkyWalking together with Nacos, a gateway and a demo service. This article covers how to configure SkyWalking alarm rules and how to send the alarm data to DingTalk.

Main Text

Basic Alarm Process

SkyWalking generates alarms by periodically checking the tracing data collected by the SkyWalking collector against the configured alarm rules, for example rules on service response time or on service percentage. When a metric reaches the specified threshold, SkyWalking raises the corresponding alarm. Alarm messages are sent by a thread pool that asynchronously calls a webhook endpoint whose URL is defined by the user. This lets developers implement any kind of notification, such as DingTalk or email alarms, inside that webhook endpoint.
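The asynchronous hand-off can be pictured with a minimal sketch. `AlarmDispatcher` is a hypothetical class, not SkyWalking source code, and the webhooks are modelled as in-process callbacks instead of HTTP calls; the point is only that the rule checker submits work to a thread pool and returns immediately.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

// Hypothetical sketch, not SkyWalking code: each alarm message is handed to
// every registered webhook on a thread pool, so the checker never waits for
// a slow receiver. Webhooks are in-process callbacks here, not HTTP POSTs.
final class AlarmDispatcher implements AutoCloseable {
    private final ExecutorService pool = Executors.newFixedThreadPool(2);
    private final List<Consumer<String>> webhooks;

    AlarmDispatcher(List<Consumer<String>> webhooks) {
        this.webhooks = List.copyOf(webhooks);
    }

    // Schedules one task per webhook and returns at once; the returned
    // future completes when every webhook has received the message.
    CompletableFuture<Void> dispatch(String alarmMessage) {
        CompletableFuture<?>[] tasks = webhooks.stream()
                .map(hook -> CompletableFuture.runAsync(() -> hook.accept(alarmMessage), pool))
                .toArray(CompletableFuture<?>[]::new);
        return CompletableFuture.allOf(tasks);
    }

    // Stops accepting new tasks; already submitted tasks still run to completion.
    @Override
    public void close() {
        pool.shutdown();
    }
}
```

For example, registering two callbacks (one standing in for a DingTalk sender, one for email) and calling `dispatch("...").join()` delivers the message to both. In SkyWalking itself, the receivers are the URLs listed under `webhooks` in the alarm configuration.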

Alarm Configuration

  1. To configure SkyWalking alarms, edit config/alarm-settings.yml. In this file, `rules` is the list of alarm rules. Each rule is keyed by its name (the first one below is `endpoint_percent_rule`); rule names must be unique and must end with `_rule`. A rule supports the following properties:

| Attribute | Meaning |
| --- | --- |
| metrics-name | The metric the rule watches. This is different from the rule name: it must be one of the metrics SkyWalking computes (full list: https://github.com/apache/skywalking/blob/master/docs/en/setup/backend/backend-alarm.md#list-of-all-potential-metrics-name). Common ones are `endpoint_percent` (endpoint success percentage, used by `endpoint_percent_rule`) and `service_percent` (service success percentage, used by `service_percent_rule`). |
| threshold | The threshold value; the metric is compared against it using `op`. |
| op | The comparison operator: `>`, `<` or `=`. For example, `metrics-name: endpoint_percent`, `threshold: 75`, `op: <` means the rule matches when the endpoint success percentage is below 75%. |
| period | The size of the time window, in minutes, within which the metric is checked against the rule. |
| count | How many times the metric must match the condition within the period before an alarm is sent. |
| silence-period | After an alarm fires, for how many checks the same alarm stays silent. Defaults to the value of `period`. |
| message | The content of the alarm message. |
| include-names | The names of the services this rule applies to. If omitted, the rule applies to all services of the metric. |
```yaml
rules:
  # Rule unique name, must be ended with `_rule`.
  endpoint_percent_rule:
    # Metrics value need to be long, double or int
    metrics-name: endpoint_percent
    threshold: 75
    op: <
    # The length of time to evaluate the metrics
    period: 10
    # How many times after the metrics match the condition, will trigger alarm
    count: 3
    # How many times of checks, the alarm keeps silence after alarm triggered, default as same as period.
    silence-period: 10

  service_percent_rule:
    metrics-name: service_percent
    # [Optional] Default, match all services in this metrics
    include-names:
      - service_a
      - service_b
    threshold: 85
    op: <
    period: 10
    count: 4

webhooks:
  - http://127.0.0.1/alarm/test
```
  2. Webhook URL (customizable). Besides the rules, you also need to define the webhook endpoint that SkyWalking calls when an alarm rule is triggered; this is the `webhooks` entry in the configuration above. Pay attention to the indentation of the URL: it must be indented by two spaces, and the setting did not take effect before that indentation was added.
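To make the interplay of `op`, `threshold`, `period`, `count` and `silence-period` more concrete, here is a simplified model of how one rule could be evaluated once per minute. It is an approximation for illustration only, not SkyWalking's actual evaluator: `SimpleAlarmRule` and its methods are hypothetical, the window counts matching minutes within the last `period` minutes, and `silence-period` is counted in checks, as the comment in the configuration file describes.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, simplified model of a single alarm rule. SkyWalking's real
// rule evaluation in the OAP server differs in detail.
final class SimpleAlarmRule {
    enum Op { GREATER, LESS, EQUAL }

    private final Op op;
    private final double threshold;
    private final int period;         // window size, in one-minute checks
    private final int count;          // matches needed inside the window
    private final int silencePeriod;  // checks to stay quiet after firing

    private final Deque<Boolean> window = new ArrayDeque<>();
    private int silenceLeft = 0;

    SimpleAlarmRule(Op op, double threshold, int period, int count, int silencePeriod) {
        this.op = op;
        this.threshold = threshold;
        this.period = period;
        this.count = count;
        this.silencePeriod = silencePeriod;
    }

    // Applies the configured comparison operator to one metric value.
    private boolean matches(double value) {
        switch (op) {
            case GREATER: return value > threshold;
            case LESS:    return value < threshold;
            default:      return value == threshold;
        }
    }

    // Feeds one minute's metric value; returns true if an alarm fires now.
    boolean check(double value) {
        window.addLast(matches(value));
        if (window.size() > period) {
            window.removeFirst(); // keep only the last `period` minutes
        }
        long matched = window.stream().filter(Boolean::booleanValue).count();
        if (silenceLeft > 0) {
            silenceLeft--;        // still inside the silence period
            return false;
        }
        if (matched >= count) {
            silenceLeft = silencePeriod;
            return true;
        }
        return false;
    }
}
```

Feeding it the `endpoint_percent_rule` settings (`op: <`, `threshold: 75`, `period: 10`, `count: 3`, `silence-period: 10`) with the per-minute values 90, 70, 60, 80, 50 makes the fifth check fire, because 50 is the third value below 75 inside the window. The next ten checks stay silent even if the values keep matching.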

After finishing the configuration, restart SkyWalking for it to take effect.

Implementing the Webhook Interface

Next, write the webhook endpoint, http://127.0.0.1/alarm/test. In the current version, SkyWalking calls the webhook with an HTTP POST whose request body is a JSON array like the following:

```json
[
  {
    "scopeId": 1,
    "name": "gateway",
    "id0": 3,
    "id1": 0,
    "alarmMessage": "Response time of service gateway is more than 1000ms in 3 minutes of last 10 minutes.",
    "startTime": 1569552742633
  },
  {
    "scopeId": 1,
    "name": "en-exercise",
    "id0": 2,
    "id1": 0,
    "alarmMessage": "Response time of service en-exercise is more than 1000ms in 3 minutes of last 10 minutes.",
    "startTime": 1569552742633
  }
]
```

- `scopeId`: the scope type of the alarm (the constants are defined in org.apache.skywalking.oap.server.core.source.DefaultScopeDefine)
- `name`: the name of the alarmed service
- `id0`: the ID corresponding one-to-one to the service name
- `id1`: currently unused
- `alarmMessage`: the alarm message text
- `startTime`: the alarm timestamp, in milliseconds

The endpoint can therefore be defined as follows:

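```java
import java.util.List;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RestController;

@RequestMapping("/alarm")
@RestController
public class AlarmController {
    @Autowired
    AlarmService alarmService;

    // SkyWalking POSTs a JSON array of alarm messages to /alarm/test
    @RequestMapping(value = "/test", method = RequestMethod.POST)
    public void alarm(@RequestBody List<AlarmMessageDto> alarmMessageList) {
        System.out.println(alarmMessageList.toString());
        // Process the alarm information
        alarmService.doAlarm(alarmMessageList);
    }
}
```

```java
import lombok.Data;

// One element of the webhook payload; Lombok's @Data generates getters and setters
@Data
public class AlarmMessageDto {
    private int scopeId;
    private String name;
    private int id0;
    private int id1;
    private String alarmMessage;
    private long startTime;
}
```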
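`startTime` arrives as a Unix timestamp in milliseconds; 1569552742633 from the sample payload is 2019-09-27 02:52:22 UTC. Before putting it into a notification, the receiver will usually want a readable time. A small hypothetical helper based on `java.time` could look like this (the class name and the output pattern are assumptions, not part of SkyWalking):

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for a webhook receiver: turns the payload's
// epoch-millisecond startTime into a readable timestamp in a given time zone.
final class AlarmTimeFormatter {
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    static String format(long startTimeMillis, ZoneId zone) {
        return FORMAT.format(Instant.ofEpochMilli(startTimeMillis).atZone(zone));
    }
}
```

With the DTO above, `AlarmTimeFormatter.format(dto.getStartTime(), ZoneId.of("Asia/Shanghai"))` would turn the sample value into `2019-09-27 10:52:22`.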

Forwarding Alarms

As shown above, once the webhook endpoint receives the data, the processing logic goes into the service call `alarmService.doAlarm(alarmMessageList)`. That is where you decide where alarms are delivered, for example DingTalk or email. How to integrate with DingTalk is covered in the DingTalk documentation and in many blog posts, so it is not repeated here.
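For DingTalk, a common route is a custom group robot: DingTalk generates a webhook URL for it (carrying an `access_token` parameter), and the robot accepts a JSON body such as `{"msgtype":"text","text":{"content":"..."}}`. The sketch below is one possible sender, not the method used in this article. `DingTalkAlarmSender` is hypothetical, it uses only the JDK 11+ `java.net.http` client, and it assumes the robot's security settings do not require a signature (if a keyword is configured, the alarm text must contain it).

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;

// Hypothetical DingTalk sender for SkyWalking alarm messages. Assumes a
// DingTalk custom-robot webhook URL whose security settings need no signature.
final class DingTalkAlarmSender {
    private final HttpClient client = HttpClient.newHttpClient();
    private final URI robotUrl;

    DingTalkAlarmSender(URI robotUrl) {
        this.robotUrl = robotUrl;
    }

    // Builds a DingTalk "text" message with one alarm message per line.
    static String buildTextPayload(List<String> alarmMessages) {
        String content = String.join("\n", alarmMessages);
        return "{\"msgtype\":\"text\",\"text\":{\"content\":\""
                + escapeJson(content) + "\"}}";
    }

    // Minimal JSON string escaping: quotes, backslashes and control characters.
    static String escapeJson(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // POSTs the alarms to the robot URL and returns the response body,
    // which the caller can inspect for DingTalk's result fields.
    String send(List<String> alarmMessages) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(robotUrl)
                .header("Content-Type", "application/json; charset=utf-8")
                .POST(HttpRequest.BodyPublishers.ofString(buildTextPayload(alarmMessages)))
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

Inside `doAlarm`, the DTO list could be mapped with `alarmMessageList.stream().map(AlarmMessageDto::getAlarmMessage).collect(Collectors.toList())` and passed to `send`. Building the JSON by hand keeps the sketch dependency-free; in a Spring project, serializing a small payload object with Jackson would be the more idiomatic choice.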

Conclusion

This article introduced the basic principle of SkyWalking alarms, how to configure alarm rules, and how to write the webhook endpoint that receives them. I plan to do more customized development on SkyWalking in the future and look forward to exchanging ideas with you.
