Common Monitoring Metrics

  • Standard node_exporter performance metrics
# CPU usage
100 - (avg(irate(node_cpu_seconds_total{mode="idle",instance="10.12.69.173:9100"}[1m])) * 100)
# Memory usage
100 - (((node_memory_MemFree_bytes+node_memory_Buffers_bytes+node_memory_Cached_bytes)/node_memory_MemTotal_bytes) * 100)
# Disk usage
100 - (node_filesystem_free_bytes{fstype=~"ext4|xfs",mountpoint="/",instance="10.12.69.173:9100"} / node_filesystem_size_bytes{fstype=~"ext4|xfs",mountpoint="/",instance="10.12.69.173:9100"} * 100)
# Network bytes received (a cumulative counter; wrap it in rate(...[1m]) to get bytes per second)
node_network_receive_bytes_total{device="eth0",instance="10.12.69.173:9100",job="node_exporter"}
# 5-minute load average
node_load5{instance="10.12.69.173:9100",job="node_exporter"}
  • MySQL performance metrics
# Whether MySQL is up (1) or down (0)
mysql_up
# Slow queries per second
rate(mysql_global_status_slow_queries[5m])
# Current number of connections (a gauge, so it is read directly rather than through rate())
mysql_global_status_threads_connected
# Remaining connections before max_connections is reached
mysql_global_variables_max_connections - mysql_global_status_threads_connected
# Replication: whether the SQL thread is running
mysql_slave_status_slave_sql_running
# Replication lag in seconds (also a gauge)
mysql_slave_status_seconds_behind_master
  • Pod performance metrics
# Pod memory usage (percent of the memory limit; != +Inf drops containers without a limit)
container_memory_usage_bytes{container_name!=""} / container_spec_memory_limit_bytes{container_name!=""} * 100 != +Inf
# Pod CPU usage (percent of one core)
sum by (pod_name)(rate(container_cpu_usage_seconds_total{image!=""}[1m])) * 100
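The expressions above can be typed into the Prometheus expression browser, and they can also be evaluated from a script through the HTTP API's `/api/v1/query` endpoint. The following is a minimal sketch, not the article's own tooling: the server address `http://localhost:9090` and the helper names `build_query_url`, `parse_instant_vector`, and `query` are assumptions made for illustration, and the parser only handles instant-vector results.

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://localhost:9090"  # assumption: replace with your server address


def build_query_url(base_url, promql):
    """Return the instant-query URL for a PromQL expression."""
    params = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + params


def parse_instant_vector(payload):
    """Map each series' sorted label pairs to its sample value as a float."""
    if payload.get("status") != "success":
        raise RuntimeError("query failed: %s" % payload.get("error"))
    samples = {}
    for series in payload["data"]["result"]:
        labels = tuple(sorted(series["metric"].items()))
        _timestamp, value = series["value"]  # the value arrives as a string
        samples[labels] = float(value)
    return samples


def query(promql, base_url=PROMETHEUS_URL):
    """Run an instant query against a live Prometheus server."""
    url = build_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_instant_vector(json.load(resp))
```

With a reachable server, `query("mysql_up")` would return one entry per scraped MySQL instance; the two helper functions can be exercised without any network access.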

Create rules

Node Alarm Rules

[root@prometheus Prometheus]# mkdir rules
[root@prometheus Prometheus]# vim rules/node_rules.yml
groups:
- name: Test
  rules:
  - alert: Memory usage is too high
    expr: 100-(node_memory_Buffers_bytes+node_memory_Cached_bytes+node_memory_MemFree_bytes)/node_memory_MemTotal_bytes*100 > 90
    for: 30s  # The condition must hold this long before the alert fires and is sent to Alertmanager
    labels:
      severity: warning
    annotations:
      summary: "Instance {{ $labels.instance }} memory usage is too high"
      description: "{{ $labels.instance }} of job {{ $labels.job }} memory usage exceeds 90%, current usage [{{ $value }}]."
  - alert: CPU usage is too high
    # job is kept in by() so that {{ $labels.job }} resolves in the description
    expr: 100-avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) by(instance, job)*100 > 90
    for: 30s
    labels:
      severity: warning
    annotations:
      summary: "Instance {{ $labels.instance }} CPU usage is too high"
      description: "{{ $labels.instance }} of job {{ $labels.job }} CPU usage exceeds 90%, current usage [{{ $value }}]."
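To make the memory rule's arithmetic concrete, here is a small worked sketch of the same formula in Python. The function names and the sample figures (in GiB) are invented for illustration; only the formula and the 90 threshold come from the rule's `expr`.

```python
def memory_usage_percent(mem_free, buffers, cached, mem_total):
    """Mirror the rule: 100 - (Buffers + Cached + MemFree) / MemTotal * 100."""
    reclaimable = mem_free + buffers + cached
    return 100 - reclaimable / mem_total * 100


def exceeds_threshold(usage_percent, threshold=90):
    """True when the expression would return a sample, i.e. usage > threshold."""
    return usage_percent > threshold


# Hypothetical host: 16 GiB total, 0.5 free, 0.25 buffers, 0.25 cached.
usage = memory_usage_percent(0.5, 0.25, 0.25, 16)
print(usage, exceeds_threshold(usage))
```

A host in this state sits above the threshold, but the alert would only fire once the condition has held for the `for:` duration.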

MySQL Alarm Rules

  • Write the recording rules
[root@prometheus ~]# vim /usr/local/Prometheus/prometheus/rules/mysql_rules.yml
groups:
- name: mysql_rules
  rules:
  - record: mysql:status
    expr: mysql_up{instance=~".*9104"}
  - record: mysql:uptime
    expr: mysql_global_status_uptime{job="mysqld_exporter"}
  - record: mysql:mysql_threads_connected
    expr: mysql_global_status_threads_connected{job="mysqld_exporter"}
  - record: mysql:mysql_threads_running
    expr: mysql_global_status_threads_running{job="mysqld_exporter"}
  - record: mysql:mysql_aborted_connects
    expr: increase(mysql_global_status_aborted_connects{job="mysqld_exporter"}[2m])
  - record: mysql:mysql_slow_queries
    expr: increase(mysql_global_status_slow_queries{job="mysqld_exporter"}[2m])
  - record: mysql:mysql_table_locks
    expr: increase(mysql_global_status_table_locks_waited{job="mysqld_exporter"}[2m])
  - record: mysql:mysql_qps
    expr: rate(mysql_global_status_queries{job="mysqld_exporter"}[2m])
  • Write the alerting rules
[root@prometheus ~]# vim /usr/local/Prometheus/prometheus/rules/
groups:
- name: mysql_alerts
  rules:
  - alert: MySQL_Down_Alert
    expr: mysql:status==0
    for: 1m
    labels:
      metric_type: db_monitor
      resource: db
      severity: critical
    annotations:
      summary: "Host {{ $labels.nodename }} database exception!"
      description: "{{ $labels.job }} on host {{ $labels.nodename }} may have a problem, please check!"
  - alert: MySQL_uptime_Alert
    expr: mysql:uptime<1
    for: 1m
    labels:
      metric_type: db_monitor
      resource: db
      severity: critical
    annotations:
      summary: "Host {{ $labels.nodename }} database exception!"
      description: "The database on host {{ $labels.nodename }} is abnormal."
  - alert: MySQL_threads_connected_Alert
    expr: mysql:mysql_threads_connected > 100
    for: 1m
    labels:
      metric_type: db_monitor
      resource: db
      severity: critical
    annotations:
      summary: "Database metric Threads_connected on host {{ $labels.nodename }} exceeded the threshold!"
      description: "Threads_connected on host {{ $labels.nodename }} exceeds the threshold, current value: {{ humanize $value }}."
  - alert: MySQL_threads_running_Alert
    expr: mysql:mysql_threads_running > 200
    for: 1m
    labels:
      metric_type: db_monitor
      resource: db
      severity: critical
    annotations:
      summary: "Database metric Threads_running on host {{ $labels.nodename }} exceeded the threshold!"
      description: "Threads_running on host {{ $labels.nodename }} exceeds the threshold, current value: {{ humanize $value }}."
  - alert: MySQL_aborted_connects_Alert
    expr: mysql:mysql_aborted_connects > 10
    for: 1m
    labels:
      metric_type: db_monitor
      resource: db
      severity: critical
    annotations:
      summary: "Database metric Aborted_connects on host {{ $labels.nodename }} exceeded the threshold!"
      description: "Aborted_connects on host {{ $labels.nodename }} exceeds the threshold, current value: {{ humanize $value }}."
  - alert: MySQL_slow_queries_Alert
    expr: mysql:mysql_slow_queries > 1
    for: 1m
    labels:
      metric_type: db_monitor
      resource: db
      severity: critical
    annotations:
      summary: "Database metric Slow_queries on host {{ $labels.nodename }} exceeded the threshold!"
      description: "Slow_queries on host {{ $labels.nodename }} exceeds the threshold, current value: {{ humanize $value }}."
  - alert: MySQL_table_locks_Alert
    expr: mysql:mysql_table_locks > 1
    for: 1m
    labels:
      metric_type: db_monitor
      resource: db
      severity: critical
    annotations:
      summary: "Database metric Table_locks on host {{ $labels.nodename }} exceeded the threshold!"
      description: "Table_locks on host {{ $labels.nodename }} exceeds the threshold, current value: {{ humanize $value }}."
  - alert: MySQL_qps_Alert
    expr: mysql:mysql_qps > 500
    for: 1m
    labels:
      metric_type: db_monitor
      resource: db
      severity: critical
    annotations:
      summary: "Database metric QPS on host {{ $labels.nodename }} exceeded the threshold!"
      description: "QPS on host {{ $labels.nodename }} exceeds the threshold, current value: {{ humanize $value }}."
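Because the alerting rules are built on the recorded series, an alert that names a series no recording rule produces (for example `mysql_threads_connected` instead of `mysql:mysql_threads_connected`) loads without error but never fires. The sketch below is one rough way to catch that kind of typo before deploying. It is an illustrative helper, not part of Prometheus: the function name and the naive tokenizer are assumptions, and it only suits simple `series > number` expressions like the ones in this section, not expressions with functions or label matchers.

```python
import re

# Names produced by the recording rules in this section.
RECORDED = {
    "mysql:status",
    "mysql:uptime",
    "mysql:mysql_threads_connected",
    "mysql:mysql_threads_running",
    "mysql:mysql_aborted_connects",
    "mysql:mysql_slow_queries",
    "mysql:mysql_table_locks",
    "mysql:mysql_qps",
}

# Metric names may contain letters, digits, underscores and colons,
# and may not start with a digit.
METRIC_TOKEN = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def unknown_series(expr, recorded=RECORDED):
    """Return the names in a simple expression that no recording rule defines."""
    return [name for name in METRIC_TOKEN.findall(expr) if name not in recorded]


print(unknown_series("mysql:mysql_qps > 500"))
print(unknown_series("mysql_threads_connected > 100"))
```

An empty list means every referenced name matches a recorded series; anything else points at a likely misspelling. For syntax checks of the rule files themselves, `promtool check rules <file>` remains the standard tool.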

Pod Alarm Rules

groups:
- name: noah_pod.rules
  rules:
  - alert: PodMemUsage
    expr: container_memory_usage_bytes{container_name!=""} / container_spec_memory_limit_bytes{container_name!=""} * 100 != +Inf > 80
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.name }}: Pod High Mem usage detected"
      description: "{{ $labels.name }}: Pod Mem is above 80% (current value is: {{ $value }})"
  - alert: PodCpuUsage
    expr: sum by (pod_name)(rate(container_cpu_usage_seconds_total{image!=""}[1m])) * 100 > 80
    for: 2m
    labels:
      severity: warning
    annotations:
      # sum by (pod_name) keeps only the pod_name label, so the annotations use it
      summary: "{{ $labels.pod_name }}: Pod High CPU usage detected"
      description: "{{ $labels.pod_name }}: Pod CPU is above 80% (current value is: {{ $value }})"
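The `!= +Inf` filter in the memory rule matters because a container without a memory limit reports a limit of 0, and in PromQL dividing a positive value by zero yields `+Inf`, which would otherwise pass the `> 80` comparison. The sketch below imitates that filtering in Python. The function names and sample byte counts are invented for illustration, and the division-by-zero handling is written out by hand because Python raises an exception where PromQL returns `+Inf` or `NaN`.

```python
import math


def usage_percent(usage_bytes, limit_bytes):
    """Percent of limit used, following PromQL's division-by-zero behaviour."""
    if limit_bytes == 0:
        # PromQL: positive / 0 is +Inf, 0 / 0 is NaN.
        return math.inf if usage_bytes > 0 else math.nan
    return usage_bytes / limit_bytes * 100


def pods_over_threshold(samples, threshold=80):
    """Apply `!= +Inf` and then `> threshold`, as the alert expression does."""
    over = {}
    for name, (usage_bytes, limit_bytes) in samples.items():
        percent = usage_percent(usage_bytes, limit_bytes)
        if percent != math.inf and percent > threshold:
            over[name] = percent
    return over


# Hypothetical containers: (usage bytes, limit bytes); "c" has no limit set.
samples = {"a": (896, 1024), "b": (256, 1024), "c": (512, 0)}
print(pods_over_threshold(samples))
```

Only container `a` is reported: `b` is under the threshold and `c` is dropped by the infinity filter rather than raising a false alarm.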