Monitoring host CPU, GPU, memory, I/O, etc. with Prometheus + Grafana
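Prerequisites
- Docker
- For GPU metrics: an NVIDIA driver and the NVIDIA Container Toolkit, which docker run --gpus requires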
The client
Node Exporter
Node Exporter collects hardware and OS metrics exposed by *NIX kernels (CPU, memory, disk I/O, network, etc.). Download, extract, and start it:
wget https://github.com/prometheus/node_exporter/releases/download/v1.1.2/node_exporter-1.1.2.linux-amd64.tar.gz
tar xvfz node_exporter-1.1.2.linux-amd64.tar.gz
cd node_exporter-1.1.2.linux-amd64
nohup ./node_exporter &
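The commands above hard-code version 1.1.2 and the amd64 architecture. A minimal sketch, assuming Bash, that parameterizes both; node_exporter_url and install_node_exporter are hypothetical helper names, and the defaults are simply the values used above, not a recommendation of a current release.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: build the Node Exporter release URL from a version and
# architecture, then download, extract, and start it in the background.

node_exporter_url() {
  local version="$1" arch="$2"
  echo "https://github.com/prometheus/node_exporter/releases/download/v${version}/node_exporter-${version}.linux-${arch}.tar.gz"
}

install_node_exporter() {
  # Defaults mirror the commands above; pass other values to override them.
  local version="${1:-1.1.2}" arch="${2:-amd64}"
  local dir="node_exporter-${version}.linux-${arch}"
  wget "$(node_exporter_url "$version" "$arch")"
  tar xvfz "${dir}.tar.gz"
  # Run in a subshell so the caller's working directory is unchanged.
  (cd "$dir" && nohup ./node_exporter &)
}

# Usage (assuming the release ships an archive for that architecture):
#   install_node_exporter 1.1.2 arm64
```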
View data:
$ curl http://localhost:9100/metrics
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0
go_gc_duration_seconds{quantile="0.25"} 0
go_gc_duration_seconds{quantile="0.5"} 0
...
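The full /metrics output is long. To inspect one metric, a small awk filter helps. This is a minimal sketch, and metric_samples is a hypothetical helper name. It keeps only sample lines whose metric name is exactly the argument, so # HELP/# TYPE comments and suffixed series such as _sum and _count are dropped.

```shell
# Hypothetical helper: read Prometheus text-format metrics on stdin and print
# only the sample lines for one metric name (with or without labels).
metric_samples() {
  local name="$1"
  # A sample line starts with the name followed by "{" (labels) or " " (value).
  awk -v n="$name" 'index($0, n "{") == 1 || index($0, n " ") == 1'
}

# Usage:
#   curl -s http://localhost:9100/metrics | metric_samples node_load1
```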
DCGM Exporter
DCGM Exporter collects NVIDIA GPU metrics (clocks, temperatures, utilization, etc.) through NVIDIA DCGM. Run it as a Docker container:
docker run -d --restart=always --gpus all -p 9400:9400 nvidia/dcgm-exporter
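The --gpus all flag in the command above only works when the host has an NVIDIA driver and the NVIDIA Container Toolkit installed. A minimal pre-flight sketch follows; missing_cmds is a hypothetical helper that only checks whether commands are on PATH, not whether the toolkit is configured correctly.

```shell
# Hypothetical helper: print each command not found on PATH and
# return non-zero if any are missing.
missing_cmds() {
  local rc=0 c
  for c in "$@"; do
    if ! command -v "$c" >/dev/null 2>&1; then
      echo "missing: $c"
      rc=1
    fi
  done
  return "$rc"
}

# Usage:
#   missing_cmds docker nvidia-smi || echo "install the missing tools first"
# End-to-end check, similar to the sample workload in NVIDIA's install guide:
#   docker run --rm --gpus all ubuntu nvidia-smi
```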
View data:
$ curl localhost:9400/metrics
# HELP DCGM_FI_DEV_SM_CLOCK SM clock frequency (in MHz).
# TYPE DCGM_FI_DEV_SM_CLOCK gauge
# HELP DCGM_FI_DEV_MEM_CLOCK Memory clock frequency (in MHz).
# TYPE DCGM_FI_DEV_MEM_CLOCK gauge
# HELP DCGM_FI_DEV_MEMORY_TEMP Memory temperature (in C).
...
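dcgm-exporter emits one sample per GPU, distinguished by labels. A minimal awk sketch to list one gauge per GPU, assuming each sample carries a gpu="N" label and no trailing timestamp (the label layout may vary between dcgm-exporter versions, so check your own output first); dcgm_per_gpu is a hypothetical name.

```shell
# Hypothetical helper: for one DCGM gauge, print "gpu=<index> <value>" per GPU.
# Assumes each sample has a gpu="N" label and no trailing timestamp.
dcgm_per_gpu() {
  local name="$1"
  awk -v n="$name" '
    index($0, n "{") == 1 && match($0, /[{,]gpu="[0-9]+"/) {
      # Skip the 6-char prefix ({ or , plus gpu=") and the closing quote.
      print "gpu=" substr($0, RSTART + 6, RLENGTH - 7), $NF
    }'
}

# Usage:
#   curl -s localhost:9400/metrics | dcgm_per_gpu DCGM_FI_DEV_SM_CLOCK
```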
The server
Prometheus
Create ~/prometheus.yml, replacing 192.167.200.91 with the IP address of the host running the exporters:
global:
  scrape_interval: 15s

scrape_configs:
  # Node Exporter
  - job_name: node
    static_configs:
      - targets: ['192.167.200.91:9100']
  # DCGM Exporter
  - job_name: dcgm
    static_configs:
      - targets: ['192.167.200.91:9400']
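Both jobs scrape the same host, so the file can be rendered from a single variable. A sketch, assuming Bash; write_prometheus_config is a hypothetical helper, and the YAML it writes matches the configuration above with the host substituted.

```shell
# Hypothetical helper: write prometheus.yml with both exporters on one host.
# $1 = exporter host/IP, $2 = output path.
write_prometheus_config() {
  local host="$1" out="$2"
  cat > "$out" <<EOF
global:
  scrape_interval: 15s

scrape_configs:
  # Node Exporter
  - job_name: node
    static_configs:
      - targets: ['${host}:9100']
  # DCGM Exporter
  - job_name: dcgm
    static_configs:
      - targets: ['${host}:9400']
EOF
}

# Usage:
#   write_prometheus_config 192.167.200.91 ~/prometheus.yml
# Optional validation with promtool, which ships in the prom/prometheus image:
#   docker run --rm -v ~/prometheus.yml:/etc/prometheus/prometheus.yml \
#     --entrypoint promtool prom/prometheus check config /etc/prometheus/prometheus.yml
```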
Run the Docker image:
docker run -d --restart=always \
-p 9090:9090 \
-v ~/prometheus.yml:/etc/prometheus/prometheus.yml \
prom/prometheus
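Open http://localhost:9090/ for the Prometheus web UI.
Open http://localhost:9090/targets to check the scrape targets; the node and dcgm jobs should both be UP.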
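The same target status is available from the Prometheus HTTP API at /api/v1/targets, which is handy on a headless server. A minimal sketch that avoids a jq dependency by grepping the health field; it assumes Prometheus returns compact JSON ("health":"up" with no spaces), and target_health is a hypothetical name.

```shell
# Hypothetical helper: count scrape targets by health ("up", "down", "unknown")
# from the JSON returned by GET /api/v1/targets, printing e.g. "up=2".
# Assumes compact JSON, i.e. "health":"up" without spaces.
target_health() {
  grep -o '"health":"[a-z]*"' | cut -d'"' -f4 | sort | uniq -c | awk '{print $2 "=" $1}'
}

# Usage:
#   curl -s http://localhost:9090/api/v1/targets | target_health
```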
Grafana
Run the Docker image:
docker run -d --restart=always -p 3000:3000 grafana/grafana
Open http://localhost:3000/ and log in as admin/admin (the default credentials; Grafana then asks you to set a new password).
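Add a data source
Choose Prometheus and set the URL to the Prometheus server, e.g. http://192.167.200.91:9090 if Prometheus runs on the same host as the exporters. Inside the Grafana container, localhost refers to the container itself, so http://localhost:9090 will not reach Prometheus.
Click Save & Test; Grafana should confirm that the data source is working.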
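The same data source can also be created through Grafana's HTTP API (POST /api/datasources). A sketch under stated assumptions: the data source name "Prometheus", the admin:admin credentials, and the helper names are all hypothetical, and "access":"proxy" means Grafana's backend queries Prometheus (the "Server" access mode).

```shell
# Hypothetical helper: JSON body for a Prometheus data source at URL $1.
datasource_json() {
  local url="$1"
  printf '{"name":"Prometheus","type":"prometheus","url":"%s","access":"proxy"}' "$url"
}

# Hypothetical helper: POST the data source to Grafana at $1 (assumes admin:admin).
create_datasource() {
  local grafana="$1" prom_url="$2"
  curl -s -u admin:admin -H 'Content-Type: application/json' \
    -X POST "${grafana}/api/datasources" \
    -d "$(datasource_json "$prom_url")"
}

# Usage:
#   create_datasource http://localhost:3000 http://192.167.200.91:9090
```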
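Import dashboards
Dashboard 8919, "Node Exporter for Prometheus Dashboard" by Starsl.cn (on Grafana's Import page, enter the dashboard ID and select the Prometheus data source):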
View dashboard:
Dashboard 12239, "NVIDIA DCGM Exporter Dashboard" by NVIDIA:
View dashboard:
GoCoding: sharing hands-on experience from personal practice. Follow the GoCoding public account for more!