Of all the objects in the Kubernetes API, Events is one of the most overlooked types. Compared with other objects, events are produced very frequently and are not kept in etcd for long; by default an event is retained for only one hour. When we run kubectl describe against an object, we may no longer be able to see its historical events because they have already expired, which is unfriendly to cluster users. Besides simply viewing cluster events, we may also need to track specific Warning events, such as those for Pod lifecycles, ReplicaSets, or worker node status, and raise alerts on them. Before diving into this topic, let's first look at the structure of a Kubernetes Event. The following are the important fields as described in the official documentation:
- Message: A human-readable description of the status of this operation
- Involved Object: The object that the event is about, like Pod, Deployment, Node, etc.
- Reason: A short, machine-understandable string, in other words, an enum
- Source: The component reporting this event, a short machine-understandable string, e.g., kube-scheduler
- Type: Currently holds only Normal & Warning, but custom types can be given if desired.
- Count: The number of times the event has occurred
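These fields can be inspected directly with kubectl; for example, Warning events can be filtered with a field selector:

```bash
# Show recent events in the current namespace; the TYPE, REASON and MESSAGE
# columns map to the fields described above
kubectl get events

# Only Warning events; field selectors also work on reason, involvedObject.kind, etc.
kubectl get events --field-selector type=Warning
```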
For these events, we need a collection tool that ships them to a persistent store for later analysis. In the past we typically exported Kubernetes events to Elasticsearch for indexing and analysis.
Since this article is about analyzing Kubernetes events with Loki, we roughly follow this pipeline for event processing:
```
kubernetes-api --> event-exporter --> fluentd --> loki --> grafana
```
At present, Kubernetes events can be collected with kube-eventer from Alibaba Cloud or kubernetes-event-exporter from Opsgenie. (KubeSphere also has kube-events, but it has to be used together with the CRDs of its own components, so it is not covered here.)
Once events are written to Loki, they can be queried visually in Grafana using LogQL v2. For example, we can break Kubernetes events down statistically by level and type, and the dashboard lets us quickly spot cluster anomalies.
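As a sketch, a LogQL v2 query that counts events per minute grouped by their type could look like the following; the job label and the JSON field names are assumptions that depend on how your pipeline labels the exporter output:

```logql
# Events per minute, grouped by the event type (Normal / Warning).
# Assumes the exporter's JSON output is shipped with {job="kubernetes-event-exporter"}.
sum by (type) (
  count_over_time({job="kubernetes-event-exporter"} | json | __error__="" [1m])
)
```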
kubernetes-event-exporter
The first step is to deploy kubernetes-event-exporter, which prints cluster events to the container's stdout so that they can be picked up by the log collector:
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: event-exporter
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: event-exporter
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view
subjects:
  - kind: ServiceAccount
    namespace: kube-system
    name: event-exporter
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: event-exporter-cfg
  namespace: kube-system
data:
  config.yaml: |
    logLevel: error
    logFormat: json
    route:
      routes:
        - match:
            - receiver: "dump"
    receivers:
      - name: "dump"
        file:
          path: "/dev/stdout"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: event-exporter
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: event-exporter
      version: v1
  template:
    metadata:
      labels:
        app: event-exporter
        version: v1
    spec:
      serviceAccountName: event-exporter
      containers:
        - name: event-exporter
          image: opsgenie/kubernetes-event-exporter:0.9
          imagePullPolicy: IfNotPresent
          args:
            - -conf=/data/config.yaml
          volumeMounts:
            - mountPath: /data
              name: cfg
      volumes:
        - name: cfg
          configMap:
            name: event-exporter-cfg
```
Once the container is running, kubectl logs shows the cluster events that the event-exporter container prints in JSON format.
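For example (names as in the manifest above):

```bash
# Tail the exporter's output; each line is one cluster event serialized as JSON
kubectl -n kube-system logs deploy/event-exporter --tail=5
```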
Fluentd and Fluent Bit, which usually already run on a Kubernetes cluster, collect container logs by default, so all we need to do is forward these logs to Loki.
For details on connecting Fluentd with Loki, see the earlier article "Fluentd and Loki".
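If Fluentd is the collector, a minimal output sketch using the fluent-plugin-grafana-loki plugin could look like the following; the match tag, the Loki URL, and the job label are assumptions for illustration and should target only the event-exporter container's logs in your setup:

```
<match kubernetes.**>
  @type loki
  url "http://loki:3100"
  # Attach a static label so the exporter's events are easy to find in Grafana;
  # which tag to match depends on how your Kubernetes log collection is configured
  extra_labels {"job":"kubernetes-event-exporter"}
  <buffer>
    flush_interval 10s
  </buffer>
</match>
```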
Finally, we can query the ingested Kubernetes events in Grafana.
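In Grafana Explore, a simple stream query is enough to verify that the events are arriving, for example (again assuming the job label above):

```logql
# All Warning events, parsed from the exporter's JSON output
{job="kubernetes-event-exporter"} | json | type="Warning"
```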
Event extension: Node Problem Detector
Kubernetes itself does not produce many events about nodes, and it cannot surface lower-level node states (such as kernel deadlocks or an unresponsive container runtime) as events. The Node Problem Detector is a good complement here: it reports more detailed node problems to Kubernetes as NodeConditions and Events.
Installing the Node Problem Detector is very simple and can be done with just two Helm commands:
```bash
helm repo add deliveryhero https://charts.deliveryhero.io/
helm install node-problem-detector deliveryhero/node-problem-detector
```
The Node Problem Detector lets users run custom scripts to generate events. In this article, in addition to the default configuration, we define a network monitor that runs a conntrack check on each node:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: node-problem-detector-config
  namespace: kube-system
data:
  network_problem.sh: |
    #!/bin/bash
    readonly OK=0
    readonly NONOK=1
    readonly UNKNOWN=2

    readonly NF_CT_COUNT_PATH='/proc/sys/net/netfilter/nf_conntrack_count'
    readonly NF_CT_MAX_PATH='/proc/sys/net/netfilter/nf_conntrack_max'
    readonly IP_CT_COUNT_PATH='/proc/sys/net/ipv4/netfilter/ip_conntrack_count'
    readonly IP_CT_MAX_PATH='/proc/sys/net/ipv4/netfilter/ip_conntrack_max'

    if [[ -f $NF_CT_COUNT_PATH ]] && [[ -f $NF_CT_MAX_PATH ]]; then
      readonly CT_COUNT_PATH=$NF_CT_COUNT_PATH
      readonly CT_MAX_PATH=$NF_CT_MAX_PATH
    elif [[ -f $IP_CT_COUNT_PATH ]] && [[ -f $IP_CT_MAX_PATH ]]; then
      readonly CT_COUNT_PATH=$IP_CT_COUNT_PATH
      readonly CT_MAX_PATH=$IP_CT_MAX_PATH
    else
      exit $UNKNOWN
    fi

    readonly conntrack_count=$(< $CT_COUNT_PATH) || exit $UNKNOWN
    readonly conntrack_max=$(< $CT_MAX_PATH) || exit $UNKNOWN
    readonly conntrack_usage_msg="${conntrack_count} out of ${conntrack_max}"

    if (( conntrack_count > conntrack_max * 9 / 10 )); then
      echo "Conntrack table usage over 90%: ${conntrack_usage_msg}"
      exit $NONOK
    else
      echo "Conntrack table usage: ${conntrack_usage_msg}"
      exit $OK
    fi
  network-problem-monitor.json: |
    {
      "plugin": "custom",
      "pluginConfig": {
        "invoke_interval": "30s",
        "timeout": "5s",
        "max_output_length": 80,
        "concurrency": 3
      },
      "source": "network-plugin-monitor",
      "metricsReporting": true,
      "conditions": [],
      "rules": [
        {
          "type": "temporary",
          "reason": "ConntrackFull",
          "path": "/config/network_problem.sh",
          "timeout": "5s"
        }
      ]
    }
```
Then edit the DaemonSet of node-problem-detector to mount the custom script and rules defined above:
```yaml
...
containers:
  - name: node-problem-detector
    command:
      - /node-problem-detector
      - --logtostderr
      - --config.system-log-monitor=/config/kernel-monitor.json,/config/docker-monitor.json
      - --config.custom-plugin-monitor=/config/network-problem-monitor.json
      - --prometheus-address=0.0.0.0
      - --prometheus-port=20258
      - --k8s-exporter-heartbeat-period=5m0s
...
volumes:
  - name: config
    configMap:
      defaultMode: 0777
      name: node-problem-detector-config
      items:
        - key: kernel-monitor.json
          path: kernel-monitor.json
        - key: docker-monitor.json
          path: docker-monitor.json
        - key: network-problem-monitor.json
          path: network-problem-monitor.json
        - key: network_problem.sh
          path: network_problem.sh
```
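Once the DaemonSet pods restart with the new configuration, problems reported by the custom plugin show up as ordinary events and can be checked with kubectl, for example:

```bash
# Events raised by the custom conntrack check (reason as defined in the rules above)
kubectl get events -A --field-selector reason=ConntrackFull
```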
Grafana analysis panel
Xiaobai has contributed a Loki-based Kubernetes event analysis dashboard to Grafana Labs. We can download the dashboard from the following page:
Grafana.com/grafana/das…
After importing the dashboard into Grafana, we need to modify its log queries and replace {job="kubernetes-event-exporter"} with the label attached by our own exporter pipeline.
After that, we get the following analysis panel.
How about it? Pretty handy, right?
Follow "Cloud Native Xiaobai" to join the Loki learning group.