Common Monitoring Indicators

node_exporter standard performance indicators

# CPU usage (%)
100 - (avg(irate(node_cpu_seconds_total{mode="idle",instance="10.12.69.173:9100"}[1m])) * 100)
# Memory usage (%)
100 - (((node_memory_MemFree_bytes+node_memory_Buffers_bytes+node_memory_Cached_bytes)/node_memory_MemTotal_bytes) * 100)
# Disk usage (%) of the root filesystem
100 - (node_filesystem_free_bytes{fstype=~"ext4|xfs",mountpoint="/",instance="10.12.69.173:9100"} / node_filesystem_size_bytes{fstype=~"ext4|xfs",mountpoint="/",instance="10.12.69.173:9100"} * 100)
# Network receive rate (bytes per second)
rate(node_network_receive_bytes_total{device="eth0",instance="10.12.69.173:9100",job="node_exporter"}[1m])
# Average system load over 5 minutes
node_load5{instance="10.12.69.173:9100",job="node_exporter"}

MySQL performance indicators

# Whether MySQL is up (0 means down)
mysql_up
# Slow queries per second
rate(mysql_global_status_slow_queries[5m])
# Number of connections (a gauge, so no rate() is applied)
mysql_global_status_threads_connected
# Remaining available connections
mysql_global_variables_max_connections - mysql_global_status_threads_connected
# Replication status: whether the slave SQL thread is running
mysql_slave_status_slave_sql_running
# Replication delay in seconds behind the master (a gauge)
mysql_slave_status_seconds_behind_master

Pod performance indicators

# Pod memory usage (% of the memory limit; containers without a limit yield +Inf and are filtered out)
container_memory_usage_bytes{container_name!=""} / container_spec_memory_limit_bytes{container_name!=""} * 100 != +Inf
# Pod CPU usage (% of one core)
sum by (pod_name)(rate(container_cpu_usage_seconds_total{image!=""}[1m])) * 100

Create rules

Node Alarm Rules

[root@prometheus Prometheus]# mkdir rules
[root@prometheus Prometheus]# vim rules/node_rules.yml
groups:
- name: Test
  rules:
  - alert: Memory usage is too high
    expr: 100 - (node_memory_Buffers_bytes+node_memory_Cached_bytes+node_memory_MemFree_bytes)/node_memory_MemTotal_bytes*100 > 90
    for: 30s  # the condition must hold this long before the alert is sent to Alertmanager
    labels:
      severity: warning
    annotations:
      summary: "Instance {{ $labels.instance }} memory usage is too high"
      description: "{{ $labels.instance }} of job {{ $labels.job }} memory usage exceeds 90%, current usage [{{ $value }}]."
  - alert: CPU usage is too high
    # Aggregate by instance and job so that both labels remain available to the annotations
    expr: 100 - avg by (instance, job) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100 > 90
    for: 30s
    labels:
      severity: warning
    annotations:
      summary: "Instance {{ $labels.instance }} CPU usage is too high"
      description: "{{ $labels.instance }} of job {{ $labels.job }} CPU usage exceeds 90%, current usage [{{ $value }}]."
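Prometheus only evaluates rule files that are listed in its main configuration. The fragment below is a minimal sketch of the relevant part of prometheus.yml, assuming the configuration file sits in the same directory as the rules directory created above. The Alertmanager address localhost:9093 is a hypothetical placeholder, not a value taken from this setup.

```yaml
# prometheus.yml (fragment)
rule_files:
  # Relative paths are resolved against the directory of this config file;
  # the glob picks up node_rules.yml and any other rule file in rules/.
  - "rules/*.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - "localhost:9093"  # hypothetical Alertmanager address, replace with your own
```

Before reloading Prometheus, each rule file can be validated with `promtool check rules rules/node_rules.yml`.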

MySQL Alarm Rules

Writing Recording Rules

[root@prometheus ~]# vim /usr/local/Prometheus/prometheus/rules/mysql_rules.yml
groups:
- name: mysql_rules
  rules:
  - record: mysql:status
    expr: mysql_up{instance=~".*9104"}
  - record: mysql:uptime
    expr: mysql_global_status_uptime{job="mysqld_exporter"}
  - record: mysql:mysql_threads_connected
    expr: mysql_global_status_threads_connected{job="mysqld_exporter"}
  - record: mysql:mysql_threads_running
    expr: mysql_global_status_threads_running{job="mysqld_exporter"}
  - record: mysql:mysql_aborted_connects
    expr: increase(mysql_global_status_aborted_connects{job="mysqld_exporter"}[2m])
  - record: mysql:mysql_slow_queries
    expr: increase(mysql_global_status_slow_queries{job="mysqld_exporter"}[2m])
  - record: mysql:mysql_table_locks
    expr: increase(mysql_global_status_table_locks_waited{job="mysqld_exporter"}[2m])
  - record: mysql:mysql_qps
    expr: rate(mysql_global_status_queries{job="mysqld_exporter"}[2m])

Writing Alarm Rules

[root@prometheus ~]# vim /usr/local/Prometheus/prometheus/rules/
groups:
- name: mysql_alerts
  rules:
  - alert: MySQL_Down_Alert
    expr: mysql:status == 0
    for: 1m
    labels:
      metric_type: db_monitor
      resource: db
      severity: critical
    annotations:
      summary: "Host {{ $labels.nodename }} database exception!"
      description: "The database {{ $labels.job }} on host {{ $labels.nodename }} may be abnormal, please check!"
  - alert: MySQL_uptime_Alert
    expr: mysql:uptime < 1
    for: 1m
    labels:
      metric_type: db_monitor
      resource: db
      severity: critical
    annotations:
      summary: "Host {{ $labels.nodename }} database exception!"
      description: "The database on host {{ $labels.nodename }} is abnormal."
  - alert: MySQL_threads_connected_Alert
    expr: mysql:mysql_threads_connected > 100
    for: 1m
    labels:
      metric_type: db_monitor
      resource: db
      severity: critical
    annotations:
      summary: "Database indicator threads_connected on host {{ $labels.nodename }} exceeded the threshold!"
      description: "threads_connected on host {{ $labels.nodename }} exceeded the threshold, current value: {{ humanize $value }}"
  - alert: MySQL_threads_running_Alert
    expr: mysql:mysql_threads_running > 200
    for: 1m
    labels:
      metric_type: db_monitor
      resource: db
      severity: critical
    annotations:
      summary: "Database indicator threads_running on host {{ $labels.nodename }} exceeded the threshold!"
      description: "threads_running on host {{ $labels.nodename }} exceeded the threshold, current value: {{ humanize $value }}"
  - alert: MySQL_aborted_connects_Alert
    expr: mysql:mysql_aborted_connects > 10
    for: 1m
    labels:
      metric_type: db_monitor
      resource: db
      severity: critical
    annotations:
      summary: "Database indicator aborted_connects on host {{ $labels.nodename }} exceeded the threshold!"
      description: "aborted_connects on host {{ $labels.nodename }} exceeded the threshold, current value: {{ humanize $value }}"
  - alert: MySQL_slow_queries_Alert
    expr: mysql:mysql_slow_queries > 1
    for: 1m
    labels:
      metric_type: db_monitor
      resource: db
      severity: critical
    annotations:
      summary: "Database indicator slow_queries on host {{ $labels.nodename }} exceeded the threshold!"
      description: "slow_queries on host {{ $labels.nodename }} exceeded the threshold, current value: {{ humanize $value }}"
  - alert: MySQL_table_locks_Alert
    expr: mysql:mysql_table_locks > 1
    for: 1m
    labels:
      metric_type: db_monitor
      resource: db
      severity: critical
    annotations:
      summary: "Database indicator table_locks on host {{ $labels.nodename }} exceeded the threshold!"
      description: "table_locks on host {{ $labels.nodename }} exceeded the threshold, current value: {{ humanize $value }}"
  - alert: MySQL_qps_Alert
    expr: mysql:mysql_qps > 500
    for: 1m
    labels:
      metric_type: db_monitor
      resource: db
      severity: critical
    annotations:
      summary: "Database indicator QPS on host {{ $labels.nodename }} exceeded the threshold!"
      description: "QPS on host {{ $labels.nodename }} exceeded the threshold, current value: {{ humanize $value }}"

Pod Alarm Rules

groups:
- name: noah_pod.rules
  rules:
  - alert: PodMemUsage
    expr: container_memory_usage_bytes{container_name!=""} / container_spec_memory_limit_bytes{container_name!=""} * 100 != +Inf > 80
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.name }}: Pod high memory usage detected"
      description: "{{ $labels.name }}: Pod memory usage is above 80% (current value is: {{ $value }})"
  - alert: PodCpuUsage
    expr: sum by (pod_name)(rate(container_cpu_usage_seconds_total{image!=""}[1m])) * 100 > 80
    for: 2m
    labels:
      severity: warning
    annotations:
      # Only pod_name survives the sum by (pod_name) aggregation
      summary: "{{ $labels.pod_name }}: Pod high CPU usage detected"
      description: "{{ $labels.pod_name }}: Pod CPU usage is above 80% (current value is: {{ $value }})"
