This article belongs to the K8s monitoring series; the other articles are:
- K8s Monitoring (1) Installing Prometheus
- K8s Monitoring (2) Monitoring cluster components and pods
- K8s Monitoring (3) Prometheus Adapter
This is the fourth article in the K8s monitoring series, and it covers monitoring host (node) metrics. node_exporter is the official exporter and what most people use for this, but I prefer Telegraf, because node_exporter has the following major pain points:
- There are too many metrics. For CPU alone, each core produces 6 metrics; with 72 cores that is 432 metrics just for CPU.
- You can’t customize which metrics to collect: a collector is either on or off, there is no way to keep only part of its output.
- Custom monitoring scripts are not supported.
- There are no metrics for the 11 TCP connection states (or maybe I just don’t know where to look?), and I can’t make sense of the huge number of network metrics it does expose.
Telegraf has no such problems. With that in mind, this article will deploy both, and the choice is up to you.
As usual, all YAML files have been uploaded to GitHub.
node_exporter
Just note the following points; a minimal sketch illustrating them follows the list:
- Use a DaemonSet so that every K8s node runs one instance;
- The host’s /proc and /sys must be visible to the container; here the host root filesystem is mounted directly;
- Use the host network namespace.
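To make these points concrete, here is a minimal, hedged sketch of the relevant parts of such a DaemonSet. The names, image tag and flag values are illustrative assumptions, not copies of the files in the repository; the actual manifests are on GitHub.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring
spec:
  selector:
    matchLabels:
      name: node-exporter
  template:
    metadata:
      labels:
        name: node-exporter
    spec:
      hostNetwork: true                      # use the host network namespace
      hostPID: true
      containers:
      - name: node-exporter
        image: prom/node-exporter:v0.18.1    # version tag is an assumption
        args:
        - --path.procfs=/host/proc           # read the host's /proc via the mounted root
        - --path.sysfs=/host/sys             # read the host's /sys via the mounted root
        - --path.rootfs=/host
        volumeMounts:
        - name: root
          mountPath: /host                   # the whole host root is mounted read-only
          readOnly: true
      volumes:
      - name: root
        hostPath:
          path: /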
There are only 5 deployment files, all of which start with node-exporter; just kubectl apply them all. Then verify on any node:
curl 127.0.0.1:9100/metrics
Once you can see the metrics, modify prometheus-config.yml and add the following job:
- job_name: node_exporter
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - monitoring
  scrape_interval: 30s
  scrape_timeout: 30s
  tls_config:
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
  - action: keep
    source_labels:
    - __meta_kubernetes_service_label_k8s_app
    regex: node-exporter
Note that in the meta label, the - in the Service label name (k8s-app) must be written as _, i.e. __meta_kubernetes_service_label_k8s_app; otherwise Prometheus will report an error on reload.
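To illustrate the mapping (a hedged example, not the exact file from the repository): if the node-exporter Service carries a k8s-app: node-exporter label, service discovery exposes it on the targets as a meta label with the dash converted to an underscore, which is exactly what the keep rule above matches.

# Service fragment (illustrative)
metadata:
  labels:
    k8s-app: node-exporter

# becomes, on the discovered targets:
#   __meta_kubernetes_service_label_k8s_app="node-exporter"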
After the modification, run kubectl apply -f prometheus-config.yml. At this point you had better also check inside the Prometheus pod that the configuration file is valid before reloading:
curl -XPOST POD_IP:9090/-/reload
The new job can then be seen on the Targets page of the Prometheus web UI.
telegraf
node_exporter has too many obscure metrics, which also consume extra resources, so I choose the more customizable Telegraf. Telegraf is a metrics collection tool written in Go by InfluxData, whose other product, InfluxDB, is well known; together with Chronograf and Kapacitor, these make up InfluxData’s TICK monitoring stack.
I won’t go into TICK here; we will only use Telegraf. Telegraf is similar to Logstash in that it is divided into four parts: inputs, processors, aggregators and outputs, and each part gets its concrete functionality from plugins. In other words, all of Telegraf’s functionality is provided by plugins, and the plugins fall into these four categories.
This article uses inputs, outputs and processors; aggregators (which compute the maximum, minimum, average and so on over a time window) are left for interested readers to explore.
Here we use Telegraf to collect the host’s performance metrics. Since there are many of them (CPU, memory, disk, network, etc.), multiple input plugins are used. Some plugins offer options that give fine-grained control over which metrics to collect, which is handy and far more flexible than node_exporter’s collection.
Since these metrics will be scraped by Prometheus, the Prometheus client output plugin is used, and all collected metrics are exposed on its metrics page.
Let’s start with its configuration file; all of Telegraf’s configuration lives in this one file. Before getting into it, we need to know a couple of Telegraf’s own concepts (an example follows the list):
- field: the name of an individual metric;
- tag: a label attached to a metric.
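For intuition, a hedged example of how these map onto what Prometheus finally sees (the host tag and the values are illustrative; the config below actually sets omit_hostname = true, so there would be no host tag): a mem measurement with used and total fields and a host tag becomes one Prometheus metric per field, with tags becoming labels.

# Telegraf internal representation (line protocol, illustrative values)
mem,host=k8s-node1 used=2048,total=8192

# exposed by the prometheus_client output (metric_version = 2) roughly as
mem_used{host="k8s-node1"} 2048
mem_total{host="k8s-node1"} 8192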
To avoid repetition, I have pasted the ConfigMap content directly; we only need to pay attention to the part starting at [agent].
apiVersion: v1
kind: ConfigMap
metadata:
  name: telegraf
  namespace: monitoring
  labels:
    name: telegraf
data:
  telegraf.conf: |+
    [agent]
      interval = "10s"
      round_interval = true
      collection_jitter = "1s"
      omit_hostname = true
    [[outputs.prometheus_client]]
      listen = ":9273"
      collectors_exclude = ["gocollector", "process"]
      metric_version = 2
    [[inputs.cpu]]
      percpu = false
      totalcpu = true
      collect_cpu_time = false
      report_active = false
    [[inputs.disk]]
      ignore_fs = ["tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs", "squashfs"]
      [inputs.disk.tagdrop]
        path = ["/etc/telegraf", "/dev/termination-log", "/etc/hostname", "/etc/hosts", "/etc/resolv.conf"]
    [[inputs.diskio]]
    [[inputs.kernel]]
    [[inputs.mem]]
      fielddrop = ["slab", "wired", "commit_limit", "committed_as", "dirty", "high_free", "high_total", "huge_page_size", "huge_pages_free", "low_free", "low_total", "mapped", "page_tables", "sreclaimable", "sunreclaim", "swap_cached", "swap_free", "vmalloc_chunk", "vmalloc_total", "vmalloc_used", "write_back", "write_back_tmp"]
    [[inputs.processes]]
    [[inputs.system]]
    [[inputs.netstat]]
    [[inputs.net]]
      ignore_protocol_stats = true
      interfaces = ["eth*", "bond*", "em*"]
      fielddrop = ["packets_sent", "packets_recv"]
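Before wiring this into a DaemonSet, you can sanity-check a config like this locally. Telegraf’s --test flag runs the inputs once and prints the gathered metrics to stdout instead of sending them to outputs (the path here is an assumption):

telegraf --config ./telegraf.conf --test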
The configuration file
The official documentation for the Telegraf configuration file is here; there is not much in it and it’s worth a quick look, but it doesn’t matter if you skip it, because I will explain everything I use. Telegraf’s configuration file is in TOML format, where [] indicates a dictionary (table) and [[]] indicates a list (array of tables). Expressed as YAML, the configuration above looks like this:
agent:
  # collection interval
  interval: 30s
  # round collection timestamps to the interval; without this it seems metrics would only be collected once
  round_interval: true
  # if several inputs fire at exactly the same time the CPU may spike; use this jitter to stagger them
  collection_jitter: 1s
  # do not add a hostname tag to every metric
  omit_hostname: true
inputs:
  - disk:
      # do not collect the listed file systems
      ignore_fs: []
      # if a metric carries a path tag with one of the following values, drop it
      tagdrop:
        path: ["/etc/telegraf", "/dev/termination-log", "/etc/hostname", "/etc/hosts", "/etc/resolv.conf"]
  - system: {}
  - cpu:
      # per-core metrics; node_exporter always collects these and cannot turn them off, Telegraf can
      percpu: false
      # definitely keep this on: overall CPU usage
      totalcpu: true
      # CPU time counters; decide for yourself whether you need them
      collect_cpu_time: false
      # adds an extra active metric, the sum of everything except idle; if CPU time is not collected,
      # 100 minus idle already gives the active value, so I leave this off
      report_active: false
  - mem:
      # see the mem input documentation for which fields it collects;
      # I don't understand many of the memory metrics, so I simply drop them -- decide for yourself
      fielddrop: []
outputs:
  - prometheus_client:
      # listen address for the metrics endpoint
      listen: ":9273"
      # exclude the Go runtime (goroutines, GC, etc.) and process collectors
      collectors_exclude: ["gocollector", "process"]
Among Telegraf’s four parts, only the processor has no keyword of its own in this configuration; what it is used for here is filtering the metrics of inputs, outputs and aggregators. The fielddrop and tagdrop entries in the input configuration above belong to the processor and are used to filter metrics. The filtering keywords are:
- namepass: filters by metric name; pass means whitelist, its value is a list, and list elements may use wildcards;
- namedrop: the blacklist counterpart. Note that the name is not the same as the field: the memory input has a total field, for example, but the resulting metric name is mem_total;
- fieldpass: filters by field name; its value is also a list;
- fielddrop: the blacklist counterpart;
- tagpass: only metrics whose tags contain one of the given key/value pairs are collected. Note that its value type is a dictionary, as shown above;
- tagdrop: the blacklist counterpart;
- taginclude: this one removes tags rather than metrics; its value is a list, and only tags in the list are kept;
- tagexclude: deletes all tags in the list.
I only used tagdrop and fielddrop; use whichever you need. With these the processor makes it easy to drop the metrics we don’t care about, which is very convenient. A small example follows.
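A hedged sketch of how these filters compose on an input; the interface names and field list are illustrative assumptions and are not part of my actual config:

[[inputs.net]]
  # only keep metrics whose interface tag matches one of these values
  [inputs.net.tagpass]
    interface = ["eth0", "bond0"]
[[inputs.mem]]
  # whitelist only the fields we care about instead of blacklisting the rest
  fieldpass = ["used", "total", "available", "used_percent"]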
With this background, the configuration I use here should be easy to understand. I only collect a handful of common system metrics; if you need more, check the official input plugin documentation, which lists plugins for all kinds of things.
pod
Telegraf naturally also runs as a DaemonSet, and there are a few key points about its Pod configuration:
- Do not mount /proc and /sys separately; mount the host root instead, otherwise the disk metrics will be problematic.
- Use the HOST_PROC, HOST_SYS and HOST_MOUNT_PREFIX environment variables so that Telegraf reads from the mounted host directories.
- hostNetwork and hostPID must both be true.
- Use securityContext to run the Pod as a non-root user. The uid you specify is a uid on the host, regardless of whether that user exists in the image. Once the Pod is running, you can check which user it runs as from the host with ps -ef | grep telegraf.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: telegraf
  namespace: monitoring
  labels:
    k8s-app: telegraf
spec:
  selector:
    matchLabels:
      name: telegraf
  template:
    metadata:
      labels:
        name: telegraf
    spec:
      containers:
      - name: telegraf
        image: telegraf:1.13.2-alpine
        resources:
          limits:
            memory: 500Mi
          requests:
            cpu: 500m
            memory: 500Mi
        env:
        - name: "HOST_PROC"
          value: "/host/proc"
        - name: "HOST_SYS"
          value: "/host/sys"
        - name: "HOST_MOUNT_PREFIX"
          value: "/host"
        volumeMounts:
        - name: config
          mountPath: /etc/telegraf
          readOnly: true
        - mountPath: /host
          name: root
          readOnly: true
      hostNetwork: true
      hostPID: true
      nodeSelector:
        kubernetes.io/os: linux
      securityContext:
        runAsNonRoot: true
        runAsUser: 65534
      tolerations:
      - operator: Exists
      terminationGracePeriodSeconds: 30
      volumes:
      - name: config
        configMap:
          name: telegraf
      - hostPath:
          path: /
        name: root
I won’t say much about the Service; it exists only so that Prometheus can discover Telegraf, and a hedged sketch of it is shown below. After applying all three files, modify the Prometheus configuration as described in the next section.
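A sketch of what such a Service might look like; the names and port mapping are assumptions, and the only parts that really matter for the scrape config below are the k8s-app: telegraf label and port 9273:

apiVersion: v1
kind: Service
metadata:
  name: telegraf
  namespace: monitoring
  labels:
    k8s-app: telegraf        # matched by the keep relabel rule below
spec:
  selector:
    name: telegraf
  ports:
  - name: metrics
    port: 9273
    targetPort: 9273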
Modify Prometheus configuration
The Prometheus configuration may vary depending on usage.
- job_name: telegraf
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - monitoring
  scrape_interval: 30s
  scrape_timeout: 30s
  tls_config:
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
  - action: keep
    source_labels:
    - __meta_kubernetes_service_label_k8s_app
    regex: telegraf
  - source_labels:
    - __meta_kubernetes_endpoint_node_name
    target_label: instance
The only addition here compared with the node_exporter job is the last relabel rule, which sets the instance label to the node name instead of the default __address__. If you want to keep the default instance, you can map the node name to any other label name you like; I chose to overwrite instance rather than add yet another label to the already numerous metrics.
The node name is meant to work together with the kubectl top node command (covered in the previous article of this series), so the value of the instance label must correspond exactly to what you see with kubectl get node. Of course, if kubectl top node already works for you as-is, there is no need to add this relabel at all.
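A quick, hedged way to check the correspondence (the node name k8s-node1 is just an example):

# node names as the API server knows them
kubectl get node

# the same name should now appear as the instance label in Prometheus, e.g.
# mem_used{instance="k8s-node1"}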
After modifying and applying the file, exec into the Prometheus container and check whether /etc/prometheus/config/prometheus.yml has been updated. Once it has, go back to the host and run:
curl -XPOST PROMETHEUS_CONTAINER_IP:9090/-/reload
After reload, you can access the metrics page directly from the host IP.
curl IP:9273/metrics
You can see that the metrics are clear and easy to understand, and there are far fewer of them than node_exporter produces.
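As a quick sanity check, a couple of illustrative queries over these metrics; the metric names follow the inputs configured above, but treat them as assumptions (for instance, the path label value may differ depending on your mounts):

# overall CPU usage per node, in percent
100 - cpu_usage_idle{cpu="cpu-total"}

# root filesystem usage, in percent
disk_used_percent{path="/"}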
Modify Prometheus Adapter configuration
In the previous article we deployed the Prometheus Adapter and used it to provide the Resource Metrics API, which is what makes the kubectl top command work. But since I dropped the metrics carrying the id="/" label, the default node metric queries no longer return anything.
If you want them back, you can simply restore the metrics that were dropped. The default queries look like this:
# cpu
sum(rate(container_cpu_usage_seconds_total{<<.LabelMatchers>>, id='/'}[1m])) by (<<.GroupBy>>)
# memory
sum(container_memory_working_set_bytes{<<.LabelMatchers>>,id='/'}) by (<<.GroupBy>>)
But restoring all of them would bring back a lot of metrics, and the losses outweigh the gains. Now that we are collecting the host’s metrics anyway, we can simply have the adapter query the host’s metrics instead of the containers’. So we only need to replace those two queries with these two:
# cpu
100-cpu_usage_idle{cpu="cpu-total", <<.LabelMatchers>>}
# mem
mem_used{<<.LabelMatchers>>}
But you need to make sure that the following configuration exists:
resources:
  overrides:
    instance:
      resource: nodes
This configuration maps the Kubernetes node resource to the instance label. When you run kubectl top node, the adapter fetches all the nodes and substitutes each node name into the query expression, for example the CPU query for the node named k8s-node1:
100-cpu_usage_idle{cpu="cpu-total", instance="k8s-node1"}
The complete configuration can be seen on GitHub; it is really just those two query statements that changed.
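For orientation, a hedged sketch of where those node queries sit in the adapter’s resource rules. Only the node-related parts are shown; the containerQuery settings from the previous article stay as they were, and details such as the window value are assumptions rather than copies of my actual file:

resourceRules:
  cpu:
    # node CPU usage taken from Telegraf instead of cAdvisor
    nodeQuery: 100 - cpu_usage_idle{cpu="cpu-total", <<.LabelMatchers>>}
    resources:
      overrides:
        instance:
          resource: nodes
  memory:
    # node memory usage taken from Telegraf
    nodeQuery: mem_used{<<.LabelMatchers>>}
    resources:
      overrides:
        instance:
          resource: nodes
  window: 1m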
After applying, delete the Prometheus Adapter pod so it restarts, and then kubectl top node works, although the CPU values it shows are not quite right. I am not familiar enough with the adapter’s implementation logic to explain why; study it yourself if you are interested. The memory values are accurate, so just look at memory.