Monitor hosts and containers

Node_exporter monitoring

Node_exporter is an exporter that collects a wide range of host metrics, including CPU, memory, and disk data. It is installed on each node (host) you want to monitor.

Download and install

Download node_exporter from prometheus.io/download/#n…, decompress the package, and install it on the node.

The port number

The default port is 9100. The following command changes the listen port to 9600 and the metrics path to /node_metrics:

node_exporter --web.listen-address=":9600" --web.telemetry-path="/node_metrics"

Collectors

node_exporter --no-collector.arp

The --no-collector.arp flag disables the arp collector.

The full list of collectors is available at github.com/prometheus/…

textfile collector

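The textfile collector exposes custom metrics that you write into *.prom files in a directory. For example, a metadata metric describing the host:

metadata{role="docker_server",datacenter="NJ"} 1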

Save it as /var/lib/node_exporter/textfile_collector/metadata.prom and tell node_exporter which directory to read:

node_exporter --collector.textfile.directory /var/lib/node_exporter/textfile_collector
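Scripts that feed the textfile collector should write their files atomically so node_exporter never reads a half-written file. A common pattern, sketched below in Python, is to write to a temporary file in the same directory and then rename it into place. The helper names format_sample and write_prom_file are hypothetical; the directory is whatever you passed to --collector.textfile.directory.

```python
import os
import tempfile


def format_sample(name, labels, value):
    """Render one sample line in the Prometheus text exposition format."""
    def escape(v):
        # Backslash, double quote and newline must be escaped inside label values.
        return v.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

    pairs = ",".join(f'{k}="{escape(str(v))}"' for k, v in sorted(labels.items()))
    return f"{name}{{{pairs}}} {value}\n"


def write_prom_file(directory, filename, lines):
    """Write lines to directory/filename atomically: temp file first, then rename."""
    target = os.path.join(directory, filename)
    # The temp file sits in the same directory (same filesystem) so the rename is atomic;
    # its .tmp suffix keeps the textfile collector, which reads *.prom, from picking it up.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            f.writelines(lines)
        # mkstemp creates the file as 0600; make it readable for node_exporter's user.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, target)
    except BaseException:
        os.unlink(tmp_path)
        raise
    return target
```

For example, write_prom_file("/var/lib/node_exporter/textfile_collector", "metadata.prom", [format_sample("metadata", {"role": "docker_server", "datacenter": "NJ"}, 1)]) produces the metadata.prom file described above (labels are written in sorted order, which Prometheus does not care about).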

Systemd collector

The systemd collector reports the state of systemd units. This example limits it to three services:

docker.service: Docker daemon
ssh.service: SSH daemon
rsyslog.service: rsyslog daemon

node_exporter --collector.textfile.directory /var/lib/node_exporter/textfile_collector --collector.systemd --collector.systemd.unit-whitelist="(docker|ssh|rsyslog).service"

In newer node_exporter releases this flag is named --collector.systemd.unit-include.
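Before deploying a unit pattern, it can help to test it against real unit names. This Python sketch assumes the collector matches the pattern against the whole unit name (as if anchored with ^ and $); the unit names are only examples. Note that the unescaped . in the pattern matches any character, so (docker|ssh|rsyslog)\.service is the stricter form.

```python
import re

# Pattern from the --collector.systemd.unit-whitelist flag above.
UNIT_PATTERN = r"(docker|ssh|rsyslog).service"


def selected_units(pattern, units):
    """Return the units whose full name matches pattern (assumed anchored at both ends)."""
    compiled = re.compile(pattern)
    return [u for u in units if compiled.fullmatch(u)]
```

Also keep in mind that some distributions name the SSH unit sshd.service, which this pattern does not match.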

Scraping

Add a job for the node_exporter instances to the scrape configuration in prometheus.yml (9100 is node_exporter's default port):

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node'
    static_configs:
      - targets: ['ip1:9100', 'ip2:9100']
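For each static target, Prometheus sends an HTTP GET to the target's metrics path and attaches job and instance labels to everything it scrapes. By default the scheme is http, the path is /metrics, and instance is the target's host:port. The Python sketch below uses a hypothetical dict in place of the parsed YAML and only derives those URLs and labels; it does not scrape anything.

```python
# Hypothetical in-memory version of the scrape_configs section above.
SCRAPE_CONFIGS = [
    {"job_name": "prometheus", "static_configs": [{"targets": ["localhost:9090"]}]},
    {"job_name": "node", "static_configs": [{"targets": ["ip1:9100", "ip2:9100"]}]},
]


def scrape_targets(configs, scheme="http", metrics_path="/metrics"):
    """Yield (url, labels) for every static target, using Prometheus' defaults."""
    for job in configs:
        job_scheme = job.get("scheme", scheme)
        job_path = job.get("metrics_path", metrics_path)
        for static in job.get("static_configs", []):
            for address in static["targets"]:
                url = f"{job_scheme}://{address}{job_path}"
                yield url, {"job": job["job_name"], "instance": address}
```

If node_exporter was started with the custom port and path shown earlier, the node job would also need metrics_path: '/node_metrics' and targets on port 9600.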

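Filtering collectors

The params block passes collect[] parameters with each scrape, so node_exporter returns metrics only from the listed collectors (here cpu and meminfo):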

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node'
    static_configs:
      - targets: ['ip1:9100', 'ip2:9100']
    params:
      collect[]:
        - cpu
        - meminfo
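The params map ends up as query parameters on the scrape URL, with a list value becoming a repeated parameter. A small Python sketch of how such a URL is built and how the server side reads it back; the helper names are hypothetical and the brackets in collect[] are percent-encoded on the wire.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def scrape_url(address, params, path="/metrics"):
    """Build a scrape URL; list values become repeated query parameters."""
    query = urlencode(params, doseq=True)
    return f"http://{address}{path}?{query}" if query else f"http://{address}{path}"


def requested_collectors(url):
    """Server-side view: which collectors a scrape URL asks for."""
    return parse_qs(urlsplit(url).query).get("collect[]", [])
```

For the node job above, scrape_url("ip1:9100", {"collect[]": ["cpu", "meminfo"]}) yields a URL asking for just the cpu and meminfo collectors.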

Monitor Docker containers

cAdvisor runs as a Docker container started with docker run. It listens on port 8080, and its web UI is at the /containers path.

docker run \
  ...
  --publish=8080:8080 \
  google/cadvisor:latest

Scrape lifecycle

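Service discovery → configuration → relabeling → scrape → metric relabeling. Relabeling appears twice because it runs in two phases: once on the discovered targets before the scrape, and once on the scraped samples before they are stored.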

Labels

Changing or adding labels creates a new time series.
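A time series is identified by its metric name plus its complete set of label name/value pairs, so any label change produces a different identity. The Python sketch below keys series by a frozenset of labels, a simplification of how a time-series database indexes them; the function names are hypothetical.

```python
from collections import defaultdict


def series_key(name, labels):
    """Identity of a series: the metric name (stored as __name__) plus every label pair."""
    return frozenset({**labels, "__name__": name}.items())


def ingest(store, name, labels, timestamp, value):
    """Append a sample to the series identified by name + labels, creating it if new."""
    store[series_key(name, labels)].append((timestamp, value))
```

Label order does not matter, but adding a label such as env="prod" to otherwise identical samples creates a second series in the store.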

Relabeling

The easiest way to remember the two phases: relabel_configs is applied before the scrape, and metric_relabel_configs is applied after it. Metric relabeling is typically used to:

Drop unnecessary metrics
Remove sensitive or unwanted labels from metrics
Add, edit, or modify a metric's label values or label format

Dropping metrics

Use action: drop. Every series whose source_labels values match the regex is dropped.
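Relabel rules work on the values of source_labels joined with a separator (; by default), and the regex has to match the whole joined string. A Python sketch of the drop action under those assumptions; the metric names in the regex are only illustrative, and Python's re stands in for the Go regex engine Prometheus uses (the alternation syntax here is the same in both).

```python
import re

# Roughly mirrors a metric_relabel_configs rule such as:
#   - source_labels: [__name__]
#     regex: 'container_tasks_state|container_memory_failures_total'
#     action: drop


def relabel_drop(series, source_labels, regex, separator=";"):
    """Return the series that survive a 'drop' rule."""
    pattern = re.compile(regex)
    kept = []
    for labels in series:
        # Join the source label values; a missing label counts as an empty string.
        value = separator.join(labels.get(name, "") for name in source_labels)
        # The regex is anchored at both ends, hence fullmatch.
        if pattern.fullmatch(value) is None:
            kept.append(labels)
    return kept
```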

Replacing label values

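replace is the default action: if no action is specified, Prometheus assumes you want to replace. The regex is matched against the joined source_labels values, and target_label is set to replacement, in which $1, $2, and so on refer to the regex's capture groups (replacement defaults to $1). A related setting is honor_labels, which decides what happens when a scraped label clashes with a label Prometheus attaches itself, such as job or instance. It is false by default, in which case Prometheus keeps its own value and renames the scraped label by adding the exported_ prefix (for example, exported_job).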
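A Python sketch of a single replace rule, assuming the semantics described above: the regex must fully match the joined source values, otherwise the labels are left alone. The label names id and container_id are illustrative, and the $1/${1} expansion is simplified compared with Go's template expansion.

```python
import re


def relabel_replace(labels, source_labels, target_label,
                    regex="(.*)", replacement="$1", separator=";"):
    """Apply one 'replace' rule (the default action) to a label set; returns a new dict."""
    value = separator.join(labels.get(name, "") for name in source_labels)
    match = re.fullmatch(regex, value)  # regex is anchored at both ends
    if match is None:
        return dict(labels)  # no match: labels are left unchanged

    def expand(ref):
        # $1 or ${1} -> text of capture group 1 (empty if the group did not participate)
        group = int(ref.group(1) or ref.group(2))
        return match.group(group) or ""

    new_value = re.sub(r"\$(\d+)|\$\{(\d+)\}", expand, replacement)
    return {**labels, target_label: new_value}
```

With regex="/docker/([a-z0-9]+)", a series with id="/docker/3b1f0e9c" gains container_id="3b1f0e9c", while a series with id="/system.slice" passes through unchanged.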

Dropping labels

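Use action: labeldrop, which removes every label whose name matches the regex. Be careful: labels are what make each time series unique. If dropping a label leaves two series with the same label set, they are no longer unique, which causes problems such as samples being rejected as duplicates.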
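A Python sketch of labeldrop plus a uniqueness check you could run on sample data before rolling out a rule. The function names and the example series are hypothetical; the point is that dropping id is only safe while some other label still tells the series apart.

```python
import re
from collections import Counter


def labeldrop(labels, regex):
    """Remove every label whose NAME fully matches regex (labeldrop matches names, not values)."""
    pattern = re.compile(regex)
    return {k: v for k, v in labels.items() if not pattern.fullmatch(k)}


def duplicate_label_sets(series):
    """Return label sets that occur more than once, i.e. series that are no longer unique."""
    counts = Counter(frozenset(s.items()) for s in series)
    return [dict(key) for key, n in counts.items() if n > 1]
```

Here two containers that share the name web but have different id values collapse into one label set once id is dropped.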

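The USE method

USE stands for Utilization, Saturation, and Errors: for each resource, such as CPU, memory, or disk, check how busy it is, how much work is waiting for it, and whether it is producing errors.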

CPU utilization

100 - avg(irate(node_cpu_seconds_total{job="node",mode="idle"}[5m])) by (instance) * 100
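The query can be reproduced by hand: irate takes the per-second increase between the last two samples of each CPU's idle counter, avg by (instance) averages that idle fraction across CPUs, and subtracting from 100 gives the busy percentage. A Python sketch with made-up counter values for a two-CPU host; unlike irate, it ignores counter resets.

```python
def cpu_utilization(prev, curr, interval_seconds):
    """Busy percentage for one instance from two samples of per-CPU idle counters (seconds).

    Mirrors 100 - avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100,
    assuming prev and curr map the cpu label to node_cpu_seconds_total{mode="idle"}.
    """
    idle_rates = [(curr[cpu] - prev[cpu]) / interval_seconds for cpu in curr]
    return 100 - sum(idle_rates) / len(idle_rates) * 100
```

With 15 seconds between samples, CPU 0 idling for 12 s (0.8) and CPU 1 for 9 s (0.6) average to 0.7 idle, so the host is about 30% busy.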

CPU saturation

A load average below the number of CPUs is usually normal; a load average that stays above that number for an extended period indicates CPU saturation.

Number of CPUs per instance (each CPU exports exactly one idle series):

count by (instance) (node_cpu_seconds_total{mode="idle"})

Load averages:

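node_load1, node_load5, and node_load15 expose the 1-, 5-, and 15-minute load averages. The following query returns the instances whose 1-minute load average is more than twice their number of CPUs: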

node_load1 > on (instance) 2 * count by (instance) (node_cpu_seconds_total{mode="idle"})

Memory usage

All of these metrics are in bytes:

node_memory_MemTotal_bytes: total memory on the host
node_memory_MemFree_bytes: free (unused) memory on the host
node_memory_Buffers_bytes: memory in the buffer cache
node_memory_Cached_bytes: memory in the page cache

Together, the last three approximate the memory available to applications, so the usage percentage is:

(node_memory_MemTotal_bytes - (node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes)) / node_memory_MemTotal_bytes * 100
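The same arithmetic in Python, with made-up byte counts: used memory is the total minus (free + buffers + cached), and the available share is 100 minus the used share.

```python
def memory_usage_percent(total, free, buffers, cached):
    """Share of memory in use: total minus (free + buffers + cached), as a percentage."""
    used = total - (free + buffers + cached)
    return used / total * 100


def memory_available_percent(total, free, buffers, cached):
    """Share of memory still available: 100 minus the usage percentage."""
    return 100 - memory_usage_percent(total, free, buffers, cached)
```

For a 16 GiB host with 2 GiB free, 1 GiB of buffers and 9 GiB cached, 4 GiB is in use: 25% used, 75% available.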

Memory saturation

1024 * sum by (instance) (
(rate(node_vmstat_pswpin[1m]) + rate(node_vmstat_pswpout[1m]))
)

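node_exporter reads these counters from /proc/vmstat; they accumulate from the last boot, and rate() turns them into per-second values. The query treats them as KB and multiplies by 1024 to get bytes (on Linux these counters count pages, so multiplying by the page size, typically 4096 bytes, may be more accurate).

node_vmstat_pswpin: data swapped in, read from disk into memory
node_vmstat_pswpout: data swapped out, written from memory to disk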

Disk usage

For disks, we only measure disk usage (how full each filesystem is) rather than utilization, saturation, or errors.

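Percentage of the /data filesystem that is in use:

(1 - node_filesystem_free_bytes{mountpoint="/data"} / node_filesystem_size_bytes{mountpoint="/data"}) * 100

To predict whether a filesystem is about to fill up, predict_linear fits a linear regression to the last hour of free-space samples and extrapolates 4 hours (4*3600 seconds) ahead; a negative result means the filesystem is expected to run out of space within 4 hours:

predict_linear(node_filesystem_free_bytes{job="node"}[1h], 4*3600) < 0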
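predict_linear is based on simple least-squares linear regression. A Python sketch of the same idea, using made-up free-space samples taken every 60 seconds for an hour and shrinking by 2 bytes per second. As a simplification, it extrapolates from the last sample, whereas Prometheus extrapolates from the query evaluation time.

```python
def predict_linear(samples, seconds_ahead):
    """Fit value = intercept + slope * t by least squares and extrapolate.

    samples: list of (timestamp_seconds, value) pairs with at least two
    distinct timestamps.
    """
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    slope = cov / var
    intercept = mean_v - slope * mean_t
    return intercept + slope * (samples[-1][0] + seconds_ahead)
```

With those samples, free space 4 hours after the last sample is predicted to be about -16000 bytes, so the alert condition (< 0) holds, while 1 hour ahead it is still positive.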

Service status

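The systemd collector exposes node_systemd_unit_state, with one series per unit and state; the series for the unit's current state has the value 1 and the others have the value 0.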

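Combining with the metadata metric

The metadata metric written through the textfile collector can be used to filter other metrics. This query returns the current state of docker.service, but only on hosts whose metadata metric has datacenter="SF":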

node_systemd_unit_state{name="docker.service"} == 1
and on (instance, job)
metadata{datacenter="SF"}
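The and on (instance, job) operator keeps a left-hand series only if some right-hand series has the same instance and job values; the left-hand labels and values pass through unchanged. A Python sketch of that set operation, with series represented as hypothetical (labels, value) pairs:

```python
def and_on(left, right, on):
    """Evaluate `left and on (<on labels>) right` on lists of (labels, value) pairs.

    A missing label is treated as an empty string, as PromQL does.
    """
    def key(labels):
        return tuple(labels.get(name, "") for name in on)

    right_keys = {key(labels) for labels, _ in right}
    return [(labels, value) for labels, value in left if key(labels) in right_keys]
```

Only the docker.service series from the host that has a matching metadata{datacenter="SF"} series survives.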

Query persistence

Prometheus evaluates rules at the interval set by evaluation_interval. Rule files are listed in prometheus.yml:

rule_files:
  - "rules/node_rules.yml"

Recording rules: create a new metric from a query

They are typically used to build aggregates across many time series and to precompute expensive queries.

Alerting rules: generate alerts from queries
Visualization: visualize queries in dashboards such as Grafana


Prometheus Monitoring Actual Combat Chapter 4 Monitoring Hosts and Containers
