1. Implementation principle and components
Components:
- node_exporter collects server metrics and exposes them on port 9100
- Prometheus scrapes the monitoring data from port 9100 of each server and provides a query interface for other components
- Grafana visualizes the data, offering a better-looking interface for monitoring data plus simple alerting
- Consul provides service discovery, so monitored servers can be registered automatically (if only a single machine is monitored, configure it directly in the Prometheus configuration file; Consul is not needed)
- Alertmanager handles the detailed alert routing rules and forwards alert information to the web service of prometheus-webhook-dingtalk
- prometheus-webhook-dingtalk formats the alert messages and sends them to DingTalk
2. Install and configure Prometheus
Installation:
# Download
wget https://github.com/prometheus/prometheus/releases/download/v2.7.2/prometheus-2.7.2.linux-amd64.tar.gz
# Extract
tar -xzf prometheus-2.7.2.linux-amd64.tar.gz
Basic configuration file prometheus.yml:
global:
  scrape_interval: 15s      # Collect data every 15s
  evaluation_interval: 15s  # Evaluate the rules every 15s

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']

# Alert rule files
rule_files: ['rules.yml']

scrape_configs:
  - job_name: 'prometheus-server'
    static_configs:
      - targets: ['localhost:9100']

  # The following job uses Consul for automatic service registration
  - job_name: 'node'
    consul_sd_configs:
      - server: '127.0.0.1:8500'
        services: []
    relabel_configs:
      - source_labels: [__meta_consul_tags]
        regex: .*prometheus-target.*
        action: keep
      # Replace the IP in instance with the service name registered in Consul
      - source_labels: [__meta_consul_service]
        target_label: instance
Start:
./prometheus &
Visit http://ip:9090/targets to see which targets are online and which are offline.
To open the Prometheus web interface, enter http://ip:9090 in the browser and select a metric such as node_load1 to check whether data is being collected.
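The same check can be scripted against the Prometheus HTTP API instead of the browser. The following is a minimal sketch, assuming Prometheus is reachable at http://localhost:9090; the helper names build_query_url, extract_samples and query are illustrative and not part of any library.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url: str, promql: str) -> str:
    """Build the instant-query URL (/api/v1/query) for a PromQL expression."""
    params = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


def extract_samples(response_body: str) -> dict:
    """Map each instance label to its sample value from an instant-query response."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload}")
    return {
        item["metric"].get("instance", "<unknown>"): float(item["value"][1])
        for item in payload["data"]["result"]
    }


def query(base_url: str, promql: str) -> dict:
    """Run the query against a live Prometheus server (needs network access)."""
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=5) as resp:
        return extract_samples(resp.read().decode("utf-8"))
```

Calling query("http://localhost:9090", "node_load1") should return one entry per scraped instance; an empty dictionary suggests that no target is exposing the metric yet.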
3. Node_exporter installation
# Download the package
wget https://github.com/prometheus/node_exporter/releases/download/v0.18.1/node_exporter-0.18.1.linux-amd64.tar.gz
# Extract
tar -xzf node_exporter-0.18.1.linux-amd64.tar.gz
# Start
./node_exporter &
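To confirm that node_exporter is serving data, its /metrics endpoint on port 9100 can be read directly. The sketch below parses the plain-text exposition format into a dictionary. It is a simplified parser that assumes lines carry no trailing timestamp (which is how node_exporter normally emits them), and the function names are made up for this example.

```python
import urllib.request


def parse_metrics(text: str) -> dict:
    """Parse text exposition lines into {series: value}.

    Comment lines (# HELP / # TYPE) and blank lines are skipped.
    Assumes each sample line ends with the value (no timestamp).
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last whitespace-separated token on the line.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


def fetch_metrics(url: str = "http://localhost:9100/metrics") -> dict:
    """Fetch and parse metrics from a running node_exporter (needs network access)."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

For example, fetch_metrics()["node_load1"] gives the one-minute load average that Prometheus will later scrape.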
4. Grafana installation
# Download the package from the official website (change the version as needed)
wget https://dl.grafana.com/oss/release/grafana_6.2.5_amd64.deb
# Install dependencies
sudo apt-get install -y adduser libfontconfig1
# Install
sudo dpkg -i grafana_6.2.5_amd64.deb
# Start
sudo service grafana-server start
# Start on boot
sudo update-rc.d grafana-server defaults
Grafana listens on port 3000 by default. The default username and password are admin/admin.
For details, see the official Grafana installation and usage guide.
Once Prometheus, node_exporter, and Grafana are installed, open http://ip:3000 in a browser and log in to Grafana as admin. The first login forces you to change the password.
After logging in, add Prometheus as a data source.
Click the gear icon => Data Sources => Add data source, as shown below:
[Figure: adding the Prometheus data source]
After the data source is added, you can build your own dashboards. Here is one of mine:
[Figure: the author's dashboard]
5. Consul installation
Installation:
# Docker installation
# Pull the image
docker pull consul
# Run and bind the port
docker run --name consul -d -p 8500:8500 consul
# Zip package installation
# Download the installation package (pick the build that matches your architecture)
wget https://releases.hashicorp.com/consul/1.5.2/consul_1.5.2_linux_arm64.zip
# Extract
unzip consul_1.5.2_linux_arm64.zip
# Copy consul to a directory on PATH
cp consul /usr/local/bin/
# Start
consul agent -server -ui -bootstrap-expect 1 -data-dir /tmp/consul -bind=192.168.50.19 -client 0.0.0.0 2>&1 &
# -bind   => IP address of this server
# -client => IP address on which the agent accepts client connections
Add the following job under scrape_configs in prometheus.yml:
  - job_name: 'consul-prometheus'
    consul_sd_configs:
      # Consul address
      - server: 'xx.xx.xx.xx:8500'
        services: []
    relabel_configs:
      - source_labels: [__meta_consul_tags]
        regex: .*prometheus-target.*
        action: keep
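The keep rule above works because Prometheus exposes the Consul tags of a service as a single string in __meta_consul_tags, joined and surrounded by the tag separator (a comma by default), and matches the regex against the whole value. A rough Python sketch of that behavior follows; Python's re module stands in for the RE2 engine that Prometheus uses, and both function names are hypothetical.

```python
import re


def consul_tags_label(tags, separator=","):
    """Build the __meta_consul_tags value: tags joined and surrounded by the separator."""
    return separator + separator.join(tags) + separator


def keep(label_value: str, regex: str) -> bool:
    """Emulate action: keep, where the regex must match the entire label value."""
    return re.fullmatch(regex, label_value) is not None


PATTERN = r".*prometheus-target.*"

# A service tagged prometheus-target is kept, regardless of other tags.
print(keep(consul_tags_label(["web", "prometheus-target"]), PATTERN))
# A service without the tag is dropped.
print(keep(consul_tags_label(["web"]), PATTERN))
```

Because the match is anchored, a bare regex of prometheus-target would not match the value ",prometheus-target,", which is why the pattern is wrapped in .* on both sides.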
Register a service:
{
  "id": "prometheus-server",
  "name": "prometheus-node",
  "address": "192.168.50.19",
  "port": 9100,
  "tags": ["prometheus-target"],
  "checks": [
    {
      "http": "http://www.baidu.com",
      "interval": "15s"
    }
  ]
}
Here address is the IP address of the service being registered, checks.http is the URL used for the health check, and checks.interval is the health check interval.
Save the configuration above as a JSON file (regitor.json in this example) and register it:
curl --request PUT --data @regitor.json http://localhost:8500/v1/agent/service/register
Deregister a service:
Send a PUT request to http://localhost:8500/v1/agent/service/deregister/ followed by the ID of the service:
curl --request PUT http://localhost:8500/v1/agent/service/deregister/userService1  # userService1 is the ID of the service to remove
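When many servers need to be registered, the two curl calls above can be wrapped in a small script. This is a sketch using only the standard library and the same two agent endpoints; the function names and the CONSUL constant are assumptions made for illustration.

```python
import json
import urllib.parse
import urllib.request

CONSUL = "http://localhost:8500"  # assumed address of the local Consul agent


def register_request(service: dict, consul_url: str = CONSUL) -> urllib.request.Request:
    """Build the PUT request that registers a service definition."""
    return urllib.request.Request(
        f"{consul_url}/v1/agent/service/register",
        data=json.dumps(service).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def deregister_request(service_id: str, consul_url: str = CONSUL) -> urllib.request.Request:
    """Build the PUT request that removes a service by its ID."""
    quoted = urllib.parse.quote(service_id, safe="")
    return urllib.request.Request(
        f"{consul_url}/v1/agent/service/deregister/{quoted}",
        method="PUT",
    )


def send(request: urllib.request.Request) -> int:
    """Send a prepared request to Consul and return the HTTP status (needs network access)."""
    with urllib.request.urlopen(request, timeout=5) as resp:
        return resp.status
```

For instance, send(register_request({"id": "prometheus-server", "name": "prometheus-node", "address": "192.168.50.19", "port": 9100, "tags": ["prometheus-target"]})) registers the node from the JSON example, and send(deregister_request("prometheus-server")) removes it again.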
6. Install and configure Alertmanager
Installation:
# Download the installation package
wget https://github.com/prometheus/alertmanager/releases/download/v0.17.0/alertmanager-0.17.0.linux-amd64.tar.gz
# Extract
tar -xzf alertmanager-0.17.0.linux-amd64.tar.gz
# Start
./alertmanager
alertmanager.yml configuration:
global:
  resolve_timeout: 5m
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h   # How often a still-firing alert is sent again
  receiver: 'webhook'   # Notification channel; the webhook here feeds the DingTalk robot
receivers:
  - name: 'webhook'
    webhook_configs:
      # Address of prometheus-webhook-dingtalk, which forwards the messages to DingTalk
      - url: 'http://localhost:8060/dingtalk/ops_dingding/send'
        send_resolved: true  # Whether to send a notification after the alert is resolved
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
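The inhibit_rules entry above mutes a warning alert while a critical alert with the same alertname, dev and instance labels is firing. The sketch below models that decision in plain Python as an illustration, not as Alertmanager's actual code. Note that the alert rules in this article do not set a severity label, so the labels used here are hypothetical.

```python
SOURCE_MATCH = {"severity": "critical"}
TARGET_MATCH = {"severity": "warning"}
EQUAL = ["alertname", "dev", "instance"]


def matches(labels: dict, matcher: dict) -> bool:
    """True when every label in the matcher has the same value on the alert."""
    return all(labels.get(name) == value for name, value in matcher.items())


def is_inhibited(target: dict, firing_alerts: list) -> bool:
    """True when some firing source alert mutes the target alert.

    A label missing on both alerts counts as equal (both are treated as empty).
    """
    if not matches(target, TARGET_MATCH):
        return False
    return any(
        matches(source, SOURCE_MATCH)
        and all(source.get(name, "") == target.get(name, "") for name in EQUAL)
        for source in firing_alerts
    )


critical = {"alertname": "disk alarm", "instance": "192.168.50.19:9100", "severity": "critical"}
warning = {"alertname": "disk alarm", "instance": "192.168.50.19:9100", "severity": "warning"}
print(is_inhibited(warning, [critical]))
```

A warning for a different instance would not be muted, because the instance label has to be equal on both alerts.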
Add the following to prometheus.yml:
rule_files: ['rules.yml']
rules.yml configuration example:
groups:
  - name: host_monitoring
    rules:
      - alert: memory alert
        expr: ((node_memory_MemFree_bytes) + (node_memory_Cached_bytes) + (node_memory_Buffers_bytes)) / 1024 / 1024 < 500
        for: 2m
        labels:
          team: node
        annotations:
          # Alert_type: memory alarm
          # Server: '{{$labels.instance}}'
          # summary: "{{$labels.instance}}: High Memory usage detected"
          explain: "Free memory < 500MB, value: {{ $value }}MB"
          # description: "{{$labels.instance}}: Memory usage is above 80% (current value is: {{ $value }})"
      - alert: disk alarm
        expr: (max(node_filesystem_avail_bytes{device=~"/dev.*"}) by (instance)) / 1024 / 1024 < 1024
        for: 2m
        labels:
          team: node
        annotations:
          # Alert_type: disk alarm
          # Server: '{{$labels.instance}}'
          explain: "Available disk capacity less than 1 GiB, value: {{ $value }}MiB"
      - alert: service alarm
        expr: up == 0
        for: 2m
        labels:
          team: node
        annotations:
          # Alert_type: service alarm
          # Server: '{{$labels.instance}}'
          explain: "Node_exporter service disconnected"
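The arithmetic behind the memory and disk expressions is easy to check by hand: each one divides a byte count by 1024 twice to get MiB and compares the result with a threshold. The following sketch reproduces that calculation with made-up sample values; it does not model the for: 2m hold period, and the function names are illustrative.

```python
MIB = 1024 * 1024


def memory_alert(mem_free: int, cached: int, buffers: int, threshold_mib: int = 500):
    """Mirror the memory rule: (MemFree + Cached + Buffers) / 1024 / 1024 < 500."""
    available_mib = (mem_free + cached + buffers) / MIB
    return available_mib < threshold_mib, available_mib


def disk_alert(avail_bytes_by_device: dict, threshold_mib: int = 1024):
    """Mirror the disk rule: max available bytes per instance, in MiB, below 1024."""
    largest_mib = max(avail_bytes_by_device.values()) / MIB
    return largest_mib < threshold_mib, largest_mib


# 200 MiB free + 150 MiB cached + 50 MiB buffers = 400 MiB, below the 500 MiB threshold.
print(memory_alert(200 * MIB, 150 * MIB, 50 * MIB))
# The disk rule looks only at the device with the most free space on an instance.
print(disk_alert({"/dev/sda1": 512 * MIB, "/dev/sdb1": 2048 * MIB}))
```

One consequence of using max in the disk rule is that a nearly full partition does not trigger the alert as long as another /dev device on the same instance still has more than 1 GiB available.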
After the configuration is complete, the alert rules are visible at http://ip:9090/alerts.
7. prometheus-webhook-dingtalk installation and configuration
Since prometheus-webhook-dingtalk is written in Golang, we install Golang first:
# Golang installation and configuration
# Download the package
wget https://dl.google.com/go/go1.10.3.linux-amd64.tar.gz
# Extract
tar -C /usr/local -xzf go1.10.3.linux-amd64.tar.gz
# Add the Go binaries to PATH: append the two export lines below to /etc/profile
vim /etc/profile
export GOROOT=/usr/local/go
export PATH=$PATH:$GOROOT/bin
# Reload the profile
source /etc/profile
prometheus-webhook-dingtalk installation and configuration:
# Create the directory under Golang's src directory and cd into it
mkdir -p /usr/local/go/src/github.com/timonwong
cd /usr/local/go/src/github.com/timonwong
# Clone the project and compile
git clone https://github.com/timonwong/prometheus-webhook-dingtalk.git
cd prometheus-webhook-dingtalk
make
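Start:
# dingding_webhook stands for the webhook address of the DingTalk robot
./prometheus-webhook-dingtalk --ding.profile="ops_dingding=dingding_webhook" 2>&1 1>dingding.log &
# The service listens on port 8060 by default; run netstat -a | grep 8060 to check that it started normally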
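Before waiting for a real alert, the DingTalk path can be exercised by posting a hand-made notification to the webhook address configured in alertmanager.yml. The sketch below builds a payload in the shape of Alertmanager's webhook format (version 4) with sample label values; whether the explain annotation shows up in the message depends on the template in use, and the helper names are assumptions made for this example.

```python
import json
import urllib.request
from datetime import datetime, timezone

WEBHOOK_URL = "http://localhost:8060/dingtalk/ops_dingding/send"


def build_test_alert(instance: str, explain: str) -> dict:
    """Build a notification shaped like Alertmanager's webhook payload."""
    labels = {"alertname": "service alarm", "instance": instance, "team": "node"}
    annotations = {"explain": explain}
    return {
        "version": "4",
        "groupKey": '{}:{alertname="service alarm"}',
        "status": "firing",
        "receiver": "webhook",
        "groupLabels": {"alertname": "service alarm"},
        "commonLabels": labels,
        "commonAnnotations": annotations,
        "externalURL": "http://localhost:9093",
        "alerts": [
            {
                "status": "firing",
                "labels": labels,
                "annotations": annotations,
                "startsAt": datetime.now(timezone.utc).isoformat(),
                "endsAt": "0001-01-01T00:00:00Z",
                "generatorURL": "http://localhost:9090/graph",
            }
        ],
    }


def webhook_request(payload: dict, url: str = WEBHOOK_URL) -> urllib.request.Request:
    """Build the POST request carrying the payload as JSON."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing webhook_request(build_test_alert("192.168.50.19:9100", "Node_exporter service disconnected")) to urllib.request.urlopen sends the test message, provided prometheus-webhook-dingtalk is running on port 8060.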
At this point, all the configuration is complete. When a server triggers an alert, the DingTalk robot automatically sends a message to the group.
~~~ OVER ~~~