Configuring Alarm Rules

Alertmanager deployment

  • Downloading binary packages
https://prometheus.io/download/

Install Alertmanager

  • Installation steps
[root@prometheus ~]# tar xf alertmanager-0.20.0-rc.0.linux-amd64.tar.gz -C /usr/local/
[root@prometheus ~]# ln -sv /usr/local/alertmanager-0.20.0-rc.0.linux-amd64/ /usr/local/Prometheus_alertmanager
[root@prometheus ~]# cd /usr/local/Prometheus_alertmanager/

Configure Alertmanager email alarms

  • Modify the Alertmanager configuration file
[root@prometheus Prometheus_alertmanager]# cp alertmanager.yml alertmanager.yml.bak
[root@prometheus Prometheus_alertmanager]# vim alertmanager.yml
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.163.com:25'         # SMTP server address
  smtp_from: '[email protected]'            # sender address
  smtp_auth_username: '[email protected]'   # mailbox user
  smtp_auth_password: 'XXXXX'               # mail client authorization password
  smtp_require_tls: false
route:                        # alarm distribution policy
  group_by: ['alertname']     # label used to group alarms
  # group_wait:               # wait before the first notification of a new group (value missing in the source)
  group_interval: 30s         # interval between notifications for the same group
  repeat_interval: 20m        # interval before a still-firing alarm is sent again
  receiver: 'Node_warning'    # default receiver
  # routes:                   # specifies which groups receive which messages
  # - receiver: 'Node_warning'
  #   continue: true
  #   group_wait: 10s
  #   match_re:
  #     service: mysql|cassandra   # alarms with service=mysql or service=cassandra go to the database receiver
  # - receiver: 'MySQL_warning'
  #   group_wait: 10s
  #   match_re:
  #     severity: warning
receivers:                    # receiver group
- name: 'Node_warning'
  email_configs:
  - to: '[email protected]'
#- name: 'MySQL_warning'
#  email_configs:
#  - to: '[email protected]'

Test

  • Check the configuration
[root@prometheus Prometheus_alertmanager]# ./amtool check-config alertmanager.yml
Checking 'alertmanager.yml'  SUCCESS
Found:
 - global config
 - route
 - 0 inhibit rules
 - 1 receivers
 - 0 templates

The configuration check succeeded.

  • Configure systemd to start Alertmanager
[root@prometheus ~]# vim /lib/systemd/system/alertmanager.service
[Unit]
Description=Alertmanager
After=network.target

[Service]
ExecStart=/usr/local/Prometheus_alertmanager/alertmanager --config.file=/usr/local/Prometheus_alertmanager/alertmanager.yml

[Install]
WantedBy=multi-user.target

# Reload systemd, start the service, and enable it at boot
[root@prometheus ~]# systemctl daemon-reload
[root@prometheus ~]# systemctl start alertmanager
[root@prometheus ~]# systemctl enable alertmanager

Open http://alertmanager_ip:9093 in a browser to view the Alertmanager web UI.
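Beyond opening the web UI, one way to confirm that the email route works is to post a hand-made alert to Alertmanager's v2 API. The following is a minimal sketch, not part of the original setup: the address in AM_URL, the alert name ManualTest, and the label values are all assumptions chosen for illustration.

```shell
# Assumed Alertmanager address; adjust to your host
AM_URL="${AM_URL:-http://localhost:9093}"

# Hypothetical test alert; label and annotation values are illustrative only
payload='[{"labels":{"alertname":"ManualTest","severity":"warning","instance":"test-host"},"annotations":{"summary":"Manual test alert"}}]'

# POST the alert to the v2 API; print a hint instead of failing if the server is unreachable
curl -s --max-time 5 -X POST -H 'Content-Type: application/json' \
  -d "$payload" "$AM_URL/api/v2/alerts" \
  || echo "Alertmanager not reachable at $AM_URL"
```

If the route and SMTP settings are correct, the alert should appear in the web UI and an email should be sent to the configured receiver. Because the payload sets no end time, the alert should resolve on its own once resolve_timeout has passed.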

Establish communication

  • Set the Alertmanager address in the Prometheus configuration file
[root@prometheus ~]# cd /usr/local/Prometheus
[root@prometheus Prometheus]# vim prometheus.yml
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 127.0.0.1:9093
rule_files:
  - "rules/node_rules.yml"
  # - "rules/mysql_rules.yml"
  • Configure alarm rules
[root@prometheus Prometheus]# mkdir rules
[root@prometheus Prometheus]# vim rules/node_rules.yml
groups:
- name: Test
  rules:
  - alert: The memory usage is too high
    expr: 100-(node_memory_Buffers_bytes+node_memory_Cached_bytes+node_memory_MemFree_bytes)/node_memory_MemTotal_bytes*100 > 90
    for: 30s    # how long the condition must hold before the alarm is sent to Alertmanager
    labels:
      severity: warning
    annotations:
      summary: "Instance {{ $labels.instance }} memory usage is too high"
      description: "{{ $labels.instance }} of job {{ $labels.job }} memory usage exceeds 90%, current memory usage [{{ $value }}]."
  - alert: The CPU usage is too high
    expr: 100-avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) by(instance)*100 > 90
    for: 30s
    labels:
      severity: warning
    annotations:
      summary: "Instance {{ $labels.instance }} CPU usage is too high"
      description: "{{ $labels.instance }} of job {{ $labels.job }} CPU usage exceeds 90%, current utilization [{{ $value }}]."
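To make the memory expression concrete, the sketch below evaluates the same arithmetic by hand with awk. The byte values are hypothetical sample numbers, not measurements from a real host, and the helper name mem_used_pct is introduced here only for illustration.

```shell
# Hypothetical sample values in bytes (not taken from a real host)
mem_total=8000000000
mem_free=300000000
buffers=100000000
cached=200000000

# Same arithmetic as the rule: 100 - (Buffers + Cached + MemFree) / MemTotal * 100
mem_used_pct() {
  awk -v t="$1" -v f="$2" -v b="$3" -v c="$4" \
    'BEGIN { printf "%.1f\n", 100 - (b + c + f) / t * 100 }'
}

usage=$(mem_used_pct "$mem_total" "$mem_free" "$buffers" "$cached")
echo "memory usage: ${usage}%"

# Compare against the rule's threshold of 90
threshold=90
if awk -v u="$usage" -v th="$threshold" 'BEGIN { exit !(u > th) }'; then
  echo "condition true: the alert would enter PENDING"
else
  echo "condition false: the alert stays inactive"
fi
```

With these sample numbers, 600 MB of 8 GB is free, buffered, or cached, so usage is 92.5% and the rule's condition holds.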

Test the rules

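  • Reload Prometheus so that it loads the alarm rules (the reload endpoint is only available when Prometheus is started with --web.enable-lifecycle)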
[root@prometheus Prometheus]# curl -XPOST http://localhost:9090/-/reload
  • You can view the alarm status on the Alerts page of the Prometheus web UI
- Green: normal.
- PENDING: the condition is met, but the alert has not yet been sent to Alertmanager because for: 30s is configured in the rule.
- FIRING: 30 seconds later the status changes from PENDING to FIRING and Prometheus sends the alarm to Alertmanager, where the alert is then displayed.
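The states listed above can also be read from the command line through the Prometheus HTTP API endpoint /api/v1/alerts. The sketch below counts active alerts per state; the address in PROM_URL and the helper name alert_states are assumptions, and the plain-text match relies on the API returning compact JSON (no space after the colon).

```shell
# Assumed address of the local Prometheus server
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Count alerts per state from /api/v1/alerts JSON read on stdin
# (simple text match so that no JSON tool is required)
alert_states() {
  grep -o '"state":"[a-z]*"' | cut -d'"' -f4 | sort | uniq -c
}

# Prints nothing if the server is unreachable or no alert is active
curl -s --max-time 5 "$PROM_URL/api/v1/alerts" | alert_states
```

An alert that is still inside its for: 30s window should be counted under pending, and under firing once it has been handed to Alertmanager.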