This article belongs to the K8s monitoring series; the other articles are:
- K8s Monitoring (2): Monitoring cluster components and pods
- K8s Monitoring (3): Prometheus-Adapter
- K8s Monitoring (4): Monitoring the hosts
For K8s monitoring we need to accomplish the following:
- Monitor the master/node machines themselves;
- Monitor cluster components such as apiserver and etcd;
- Monitor the pods we care about;
- Custom monitoring, such as JMX.
With these metrics in place, we then need to:
- Define the corresponding alerting rules;
- Provide a webhook to deliver alerts;
- Deploy Grafana for dashboards;
- Make the monitoring components themselves highly available;
- And deploy the K8s metrics-server.
At present the recognized industry standard for K8s monitoring is Prometheus, which handles the job well. However, monitoring with native Prometheus requires quite a bit of work and an understanding of both K8s and Prometheus itself.
Why not use Prometheus-Operator
For convenience, CoreOS offers Prometheus-Operator. It wraps Prometheus and provides four custom K8s resource types (CustomResourceDefinitions), letting you add new monitoring jobs and alerting rules by defining manifests instead of editing the Prometheus configuration file by hand, which makes the whole process more K8s-native.
On top of that, CoreOS also provides kube-prometheus, which builds on Prometheus-Operator to add highly available Prometheus and Alertmanager, node-exporter for host monitoring, the Prometheus Adapter for the Kubernetes Metrics APIs, and Grafana, all as a one-click deployment.
Whether that is a good thing I can't say, but it is certainly not problem-free: Prometheus-Operator is still in beta, and many of the extra components exist only so that you don't have to touch configuration files directly; those components are an additional cost.
Also, because you can't edit the configuration file directly, changing it is very difficult once you need to: the file is generated automatically, and if you want to modify its relabel_configs, you can only append rules after the ones it generates.
Say you want to delete a label it generated for you: you can't remove the rule that created it, so you end up adding two more rules just to undo it. And if you get it wrong, you may not even notice, because Prometheus-Operator reloads the configuration for you.
If your needs are simple, kube-prometheus is fine. But if you want to understand the mechanics, or you have your own customization needs, go native: anything Prometheus-Operator can do, you can do yourself.
How to do it
This article starts from zero and works through the initial monitoring requirements bit by bit. Even if you still want to use kube-prometheus afterwards, you should have no trouble with it once you have read this series.
All we need to do is deploy the Prometheus image into K8s and use kubernetes_sd_config to discover and monitor the cluster; that is, of course, exactly what Prometheus-Operator does too.
While Prometheus can discover every pod in a K8s cluster, pod IPs change all the time, and discovering all pods without grouping them by category would be messy.
Therefore I will discover only endpoints, just as Prometheus-Operator does. Since creating a service creates an endpoint, we can group pods by service and then use one Prometheus job per class of pods, which is very tidy.
I've uploaded all the files to GitHub, so you can clone them instead of copying and pasting back and forth.
In this article the K8s version is 1.14.2, installed with kubeadm. Also, this article will not cover Prometheus itself in any depth, so some Prometheus background is assumed.
Query the Kubelet metrics page
To be clear: an application that wants to be monitored by Prometheus must expose a URL that, when accessed, prints all of its metrics line by line as plain text; Prometheus scrapes this URL to obtain the data. The default is http[s]://IP:PORT/metrics.
Because Prometheus has become a de facto standard, all K8s components expose this /metrics URL. For host-level monitoring, or for applications that do not expose such a URL, an intermediary can collect the relevant metrics and expose them for Prometheus to scrape.
Such intermediaries are called exporters, for example node_exporter and jmx_exporter. There are many official and unofficial exporters, and you can also write your own using the client libraries.
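For reference, when you curl such a URL the response is plain text in the Prometheus exposition format, roughly like this (an illustrative fragment, not from any particular exporter):
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 12.34
# HELP http_requests_total Total number of HTTP requests.
# TYPE http_requests_total counter
http_requests_total{code="200",method="get"} 1027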
If Prometheus can collect over HTTP, then so can curl. So before letting Prometheus collect anything, I will use curl to show the data to be collected, so that everyone has a clear picture.
Of course, since this is a K8s environment, the collected metrics will carry labels such as namespace, service, pod name, and IP/port, which you can choose to add or not.
Without further ado, let’s take a look at Kubelet’s metrics. Create a directory to store all subsequent manifest files:
mkdir -p /opt/kube/prometheus
Start by creating a namespace under which all monitoring related resources are placed:
# vim 00namespace.yml
apiVersion: v1
kind: Namespace
metadata:
name: monitoring
# kubectl apply -f 00namespace.yml
As we know, pods are created by kubelet, so pod metrics (CPU, memory, network traffic, etc.) are provided by kubelet. Let's visit kubelet's metrics page to see what's available.
As a daemon, kubelet listens on port 10250 by default, so we can query it directly from a master or node:
# curl -k https://127.0.0.1:10250/metrics/cadvisor
Unauthorized
A few notes:
- You must use HTTPS;
- /metrics/cadvisor is kubelet's pod-related metrics endpoint; there is also /metrics, which exposes kubelet's own metrics;
- -k skips certificate validation, because the whole cluster uses self-signed certificates, so validating them is pointless.
The Unauthorized response tells us that without authentication we cannot see the metrics. Authentication is simple: just use a token, so we need to create one first. You should know that when you create a serviceAccount, K8s automatically generates a secret for it, and that secret contains a token.
So we create a ClusterRole with the needed permissions and a ClusterRoleBinding to bind them to a serviceAccount, and the serviceAccount's token then carries those permissions.
Create three files: prometheus-clusterRole.yml, prometheus-clusterRoleBinding.yml, and prometheus-serviceAccount.yml.
prometheus-clusterRole.yml:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: prometheus-k8s
rules:
- apiGroups:
- ""
resources:
- nodes/metrics
verbs:
- get
- nonResourceURLs:
- /metrics
verbs:
- get
prometheus-clusterRoleBinding.yml:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: prometheus-k8s
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: prometheus-k8s
subjects:
- kind: ServiceAccount
name: prometheus-k8s
namespace: monitoring
prometheus-serviceAccount.yml:
apiVersion: v1
kind: ServiceAccount
metadata:
name: prometheus-k8s
namespace: monitoring
We have now created a serviceAccount named prometheus-k8s. It can be used not only to fetch kubelet metrics; Prometheus itself will also run under this serviceAccount later.
kubectl apply -f prometheus-clusterRole.yml
kubectl apply -f prometheus-clusterRoleBinding.yml
kubectl apply -f prometheus-serviceAccount.yml
When created, a secret containing the token is automatically generated:
# kubectl -n monitoring get secret
NAME TYPE DATA AGE
prometheus-k8s-token-xmkd4 kubernetes.io/service-account-token 3 3h26m
Retrieve the token:
token=`kubectl -n monitoring get secret prometheus-k8s-token-xmkd4 -o jsonpath={.data.token} | base64 -d`
Then use this token to access Kubelet’s metrics page:
curl -k https://127.0.0.1:10250/metrics/cadvisor -H "Authorization: Bearer $token"
Just put the token in the request header and you can see all the monitoring metrics.
You will see that quite a few metrics carry labels like these:
{container="",container_name="",id="/system.slice/tuned.service",image="",name="",namespace="",pod="",pod_name="",state="stopped"}
I have no idea what these metrics mean or whether they are useful, so I am going to drop them all. This is a common problem with Prometheus: exporters are eager to expose every metric they can, and if you keep everything by default without curating it, you may end up storing several times more data than you need (for many people it truly is useless, because they will never look at it), wasting a lot of resources.
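If you do want to drop such series at scrape time, metric_relabel_configs is the tool, placed inside whichever scrape job collects them. A minimal sketch, assuming we discard every series whose id label falls under /system.slice/ (the regex is an assumption; adjust it to whatever you actually want to drop):
metric_relabel_configs:
- source_labels: [id]
  regex: /system\.slice/.*
  action: drop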
Besides the /metrics/cadvisor URL, kubelet also has a /metrics URL, which exposes kubelet's own metrics rather than the pods'. To be honest, I can't make sense of any of that data, so I am still debating whether to collect it.
With this approach you should be able to access the other K8s components as well, except etcd, which requires client certificate verification.
Query etcd metrics
The URL of etcd's metrics page is also /metrics, but you must provide a certificate to access it, because etcd verifies the client certificate. You can, of course, make the metrics page use HTTP instead of HTTPS via the startup flag --listen-metrics-urls http://ip:port, in which case no certificate is needed.
Although etcd runs in a container, it uses hostNetwork, so it can be reached directly on port 2379 of the master. By default it uses HTTPS, so we need to provide its client certificate. If K8s was installed with kubeadm, the etcd certificates are in the /etc/kubernetes/pki/etcd/ directory.
So the command to access etcd is:
curl https://127.0.0.1:2379/metrics --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/healthcheck-client.crt --key /etc/kubernetes/pki/etcd/healthcheck-client.key
Later we will mount these three files into the Prometheus container so that it can collect etcd's monitoring data.
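Incidentally, if you would rather avoid certificates entirely, the HTTP metrics flag mentioned above would go into etcd's static pod manifest. A sketch, assuming a kubeadm layout (the manifest path and port here are assumptions, not taken from a real cluster):
# /etc/kubernetes/manifests/etcd.yaml (hypothetical excerpt)
- --listen-metrics-urls=http://127.0.0.1:2381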
Install Prometheus
Prometheus itself depends on a few things, so we have some preparation to do before installing it.
We will create two ConfigMaps: one for Prometheus's configuration file and one for the alerting rule files. The configuration file must be kept in a ConfigMap; do not store it directly inside the container, or it will be lost when the container dies.
Prometheus configuration file
Create the Prometheus configuration ConfigMap prometheus-config.yml with the following contents:
apiVersion: v1
data:
prometheus.yml: |
global:
evaluation_interval: 30s
scrape_interval: 30s
external_labels:
prometheus: monitoring/k8s
rule_files:
- /etc/prometheus/rules/*.yml
scrape_configs:
- job_name: prometheus
honor_labels: false
kubernetes_sd_configs:
- role: endpoints
namespaces:
names:
- monitoring
scrape_interval: 30s
relabel_configs:
- action: keep
source_labels:
- __meta_kubernetes_service_label_prometheus
regex: k8s
- source_labels:
- __meta_kubernetes_endpoint_address_target_kind
- __meta_kubernetes_endpoint_address_target_name
separator: ;
regex: Pod;(.*)
replacement: ${1}
target_label: pod
- source_labels:
- __meta_kubernetes_namespace
target_label: namespace
- source_labels:
- __meta_kubernetes_service_name
target_label: service
- source_labels:
- __meta_kubernetes_pod_name
target_label: pod
- source_labels:
- __meta_kubernetes_service_name
target_label: job
replacement: ${1}
- target_label: endpoint
replacement: web
kind: ConfigMap
metadata:
name: prometheus
namespace: monitoring
A brief note on this configuration file: it is a first version, and you can see there is only one job in it, which monitors Prometheus itself. kubernetes_sd_configs is used to automatically discover nodes, services, pods, endpoints, and ingresses in K8s.
Prometheus discovers itself through endpoints. You may wonder why not just scrape its own 127.0.0.1:9090 directly? Because there may be more than one Prometheus instance, and they should all fall under the same job.
Of course, if you don't mind, scraping yourself directly is no problem at all. Then there is a pile of relabel_configs, whose purpose I'll explain one by one.
Let’s start with the first configuration:
- action: keep
source_labels:
- __meta_kubernetes_service_label_prometheus
regex: k8s
You should know that every service creates an endpoint, but Prometheus's endpoints discovery finds all endpoints in the specified namespaces. So how do we ensure that Prometheus only keeps the endpoints we need? The answer is relabel_configs, and that is what the keep action does here.
Prometheus keeps an endpoint only if the label of its service contains prometheus=k8s. So we will create a service for Prometheus and give it the label prometheus: k8s.
No URL is specified; Prometheus takes the default URL /metrics.
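Incidentally, if some target served its metrics on a different path, you could rewrite __metrics_path__ during relabeling. A sketch using the common prometheus.io/path annotation convention (a community convention, not something built into Prometheus):
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
  regex: (.+)
  target_label: __metrics_path__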
Move on to the next configuration:
- source_labels:
- __meta_kubernetes_endpoint_address_target_kind
- __meta_kubernetes_endpoint_address_target_name
separator: ;
regex: Pod;(.*)
replacement: ${1}
target_label: pod
Suppose __meta_kubernetes_endpoint_address_target_kind is Pod and __meta_kubernetes_endpoint_address_target_name is prometheus-0. Joined with the separator ;, they become Pod;prometheus-0. The regular expression Pod;(.*) matches this, ${1} takes the first capture group (prometheus-0), and that value is written to the pod label.
So this section adds a pod=prometheus-0 label to all metrics collected from such a target.
If the value of __meta_kubernetes_endpoint_address_target_kind is not Pod, no label is added.
The remaining entries copy the specified meta labels to regular labels, because meta labels are discarded after relabeling.
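Put together, a series scraped by this job ends up looking something like this (the instance address is illustrative):
up{endpoint="web",instance="10.244.1.10:9090",job="prometheus",namespace="monitoring",pod="prometheus-0",service="prometheus"} 1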
Create it:
kubectl apply -f prometheus-config.yml
Prometheus rule file
We don’t need any rule files at the moment, but since the configMap will be mounted into the container, we create an empty rule file.
Create a prometheus-config-rulefiles.yml file with the following contents:
apiVersion: v1
data:
k8s.yml: ""
kind: ConfigMap
metadata:
name: prometheus-rulefiles
namespace: monitoring
Create:
kubectl apply -f prometheus-config-rulefiles.yml
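To give a sense of what will go here later, a rule file might eventually look something like this (a hypothetical example; alerting rules are covered in a later article):
groups:
- name: example
  rules:
  - alert: TargetDown
    expr: up == 0
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.job }}/{{ $labels.instance }} is down"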
Role and RoleBinding
Prometheus will run under the serviceAccount prometheus-k8s created earlier, but so far that serviceAccount has no permission to view services or endpoints.
Since we use kubernetes_sd_config mainly with endpoints discovery, prometheus-k8s needs more permissions.
We need to create additional Roles and bind those permissions to the serviceAccount prometheus-k8s via RoleBindings. To keep permissions minimal, ClusterRole is not used.
Create four files: prometheus-roleConfig.yml, prometheus-roleBindingConfig.yml, prometheus-roleSpecificNamespaces.yml, and prometheus-roleBindingSpecificNamespaces.yml, with the following contents.
prometheus-roleConfig.yml:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: prometheus-k8s-config
namespace: monitoring
rules:
- apiGroups:
- ""
resources:
- configmaps
verbs:
- get
prometheus-roleBindingConfig.yml:
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: prometheus-k8s-config
namespace: monitoring
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: prometheus-k8s-config
subjects:
- kind: ServiceAccount
name: prometheus-k8s
namespace: monitoring
prometheus-roleSpecificNamespaces.yml:
apiVersion: rbac.authorization.k8s.io/v1
items:
- apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: prometheus-k8s
namespace: default
rules:
- apiGroups:
- ""
resources:
- services
- endpoints
- pods
verbs:
- get
- list
- watch
- apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: prometheus-k8s
namespace: kube-system
rules:
- apiGroups:
- ""
resources:
- services
- endpoints
- pods
verbs:
- get
- list
- watch
- apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: prometheus-k8s
namespace: monitoring
rules:
- apiGroups:
- ""
resources:
- services
- endpoints
- pods
verbs:
- get
- list
- watch
kind: RoleList
prometheus-roleBindingSpecificNamespaces.yml:
apiVersion: rbac.authorization.k8s.io/v1
items:
- apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: prometheus-k8s
namespace: default
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: prometheus-k8s
subjects:
- kind: ServiceAccount
name: prometheus-k8s
namespace: monitoring
- apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: prometheus-k8s
namespace: kube-system
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: prometheus-k8s
subjects:
- kind: ServiceAccount
name: prometheus-k8s
namespace: monitoring
- apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: prometheus-k8s
namespace: monitoring
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: prometheus-k8s
subjects:
- kind: ServiceAccount
name: prometheus-k8s
namespace: monitoring
kind: RoleBindingList
Of the permissions above, the config Role is for reading the ConfigMap, and the others are what Prometheus uses for K8s service discovery. Finally, the RoleBindings bind all of these permissions to the serviceAccount prometheus-k8s.
These permissions are what the Prometheus container will later use to access the API server and components within the cluster.
Final application:
kubectl apply -f prometheus-roleBindingConfig.yml
kubectl apply -f prometheus-roleBindingSpecificNamespaces.yml
kubectl apply -f prometheus-roleConfig.yml
kubectl apply -f prometheus-roleSpecificNamespaces.yml
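If you want to check that the bindings took effect, kubectl can impersonate the serviceAccount:
# should print "yes"
kubectl -n monitoring auth can-i list endpoints --as=system:serviceaccount:monitoring:prometheus-k8s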
Create the PV
Prometheus stores its data on disk, so we use a StatefulSet, which needs storage. I will simply use NFS; you may need to set one up yourself, which I won't cover here.
Create prometheus-pv.yml:
apiVersion: v1
kind: PersistentVolume
metadata:
name: prometheus
labels:
name: prometheus
spec:
nfs:
path: /data/prometheus
server: 10.11.3.
accessModes: ["ReadWriteMany", "ReadWriteOnce"]
capacity:
storage: 1Ti
Then apply:
kubectl apply -f prometheus-pv.yml
Create a service
A StatefulSet must have a headless service, and we also need a service for endpoint discovery. A single service can do both jobs: we just give it the label prometheus: k8s.
So create a file prometheus-service.yml with the following contents:
apiVersion: v1
kind: Service
metadata:
name: prometheus
namespace: monitoring
labels:
prometheus: k8s
spec:
clusterIP: None
ports:
- name: web
port: 9090
protocol: TCP
targetPort: web
selector:
app: prometheus
type: ClusterIP
A label selector app: prometheus is defined above, so the Prometheus pods must carry this label.
Create:
kubectl apply -f prometheus-service.yml
Create etcd secret
If you don't plan to monitor etcd, skip this step and remove the secret-related mounts from the Prometheus manifest below.
Just create a secret:
kubectl -n monitoring create secret generic etcd-client-cert --from-file=/etc/kubernetes/pki/etcd/ca.crt --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.crt --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.key
For later reuse, it is recommended that you append --dry-run -o yaml to the end of this command and save the output as prometheus-secret.yml.
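In full, the save step might look like this (the same command as above, just with the dry-run flags added):
kubectl -n monitoring create secret generic etcd-client-cert \
    --from-file=/etc/kubernetes/pki/etcd/ca.crt \
    --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
    --from-file=/etc/kubernetes/pki/etcd/healthcheck-client.key \
    --dry-run -o yaml > prometheus-secret.yml
Since --dry-run does not actually create anything, apply the saved file manually: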
kubectl apply -f prometheus-secret.yml
Deploy Prometheus
With all the preparation done, it's time to deploy Prometheus itself. Create the file prometheus.yml with the following contents:
apiVersion: apps/v1
kind: StatefulSet
metadata:
labels:
app: prometheus
prometheus: k8s
name: prometheus
namespace: monitoring
spec:
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
app: prometheus
prometheus: k8s
serviceName: prometheus
template:
metadata:
creationTimestamp: null
labels:
app: prometheus
prometheus: k8s
spec:
serviceAccount: prometheus-k8s
containers:
- args:
- --web.console.templates=/etc/prometheus/consoles
- --web.console.libraries=/etc/prometheus/console_libraries
- --config.file=/etc/prometheus/config/prometheus.yml
- --storage.tsdb.path=/prometheus
- --web.enable-admin-api
- --storage.tsdb.retention.time=20d
- --web.enable-lifecycle
- --storage.tsdb.no-lockfile
- --web.external-url=http://example.com/
- --web.route-prefix=/
image: prom/prometheus:v2.11.1
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 6
httpGet:
path: /-/healthy
port: web
scheme: HTTP
periodSeconds: 5
successThreshold: 1
timeoutSeconds: 3
name: prometheus
ports:
- containerPort: 9090
name: web
protocol: TCP
readinessProbe:
failureThreshold: 120
httpGet:
path: /-/ready
port: web
scheme: HTTP
periodSeconds: 5
successThreshold: 1
timeoutSeconds: 3
resources:
requests:
memory: 400Mi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /etc/prometheus/config
name: config
readOnly: true
- mountPath: /prometheus
name: prometheus-data
#subPath: prometheus-db
- mountPath: /etc/prometheus/rules/
name: prometheus-rulefiles
- mountPath: /etc/prometheus/secrets/etcd-client-cert
name: secret-etcd-client-cert
readOnly: true
volumes:
- name: config
configMap:
defaultMode: 420
name: prometheus
- name: prometheus-rulefiles
configMap:
defaultMode: 420
name: prometheus-rulefiles
- name: secret-etcd-client-cert
secret:
defaultMode: 420
secretName: etcd-client-cert
updateStrategy:
type: RollingUpdate
volumeClaimTemplates:
- metadata:
name: prometheus-data
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Ti
volumeMode: Filesystem
I won't go into StatefulSet basics here, just a few highlights:
- --storage.tsdb.retention.time=20d means Prometheus keeps the collected data for only 20 days; this value should not be too large. If you need long-term history, write it out to remote storage such as VictoriaMetrics, Thanos, InfluxDB, or OpenTSDB (see the sketch after this list);
- --web.enable-admin-api enables the admin API, which lets you delete monitoring data and so on;
- serviceAccount must be prometheus-k8s, otherwise all the earlier permission work is wasted;
- The pod must carry the label app: prometheus, otherwise it will not be selected by the service created earlier;
- Two ConfigMaps, a Secret, and a storage volume are mounted.
There’s nothing else to say, just do it:
kubectl apply -f prometheus.yml
Then wait for Prometheus to start successfully:
kubectl -n monitoring get pod -w
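Note that the --web.enable-lifecycle flag above also exposes a reload endpoint, so later configuration changes do not require restarting the pod. Once port 9090 is reachable (for example via the port-forward below), and after the updated ConfigMap has propagated into the container, you can reload with:
curl -X POST http://127.0.0.1:9090/-/reload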
Visit Prometheus
Instead of creating an ingress right away, we forward the pod's port to the host:
kubectl -n monitoring port-forward --address 0.0.0.0 pod/prometheus-0 9090:9090
You can then open the Prometheus page on port 9090 of the current host. To confirm that Prometheus itself is being monitored, click Status and select Targets.
You can see the six additional labels we added earlier with relabel_configs; from now on, every metric you query in Prometheus will carry them. If you don't need some of them, just delete the corresponding rules from the configuration.
You can also hover over the labels to see all the meta labels as they were before relabeling. If you want any of them, add the corresponding relabel rules to the configuration file.
Click Prometheus in the upper-left corner to return to the home page, type any letter in the search box, and click one of the suggested metrics to see all of its labels and their values.
As you can see, the six additional labels are all there, and you can use them in future queries.
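For example, a query filtered by the new labels might look like this (prometheus_build_info is one of Prometheus's own metrics):
prometheus_build_info{namespace="monitoring",pod="prometheus-0"}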
Finally, create an ingress file for Prometheus named prometheus-ingress.yml:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
name: prometheus
namespace: monitoring
spec:
rules:
- host: example.com
http:
paths:
- path: /
backend:
serviceName: prometheus
servicePort: 9090
Be careful to replace the example.com above with your Prometheus domain name.
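Then apply it like the other manifests:
kubectl apply -f prometheus-ingress.yml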