1 Background
Before Kubernetes emerged, applications were almost always deployed on individual machines. When pressure increased, a traditional IDC architecture could only scale horizontally by adding servers to the cluster to gain computing power. With the rise of cloud computing, the configuration of existing servers can be adjusted dynamically to absorb load, or elastic scaling can adjust the number of back-end servers according to business volume, cluster load, and monitoring data. Logs, as part of any application system, are usually used to troubleshoot faults and find root causes when the system misbehaves. Traditionally, logs were analyzed with grep and other common Linux text tools.
To support faster development and iteration, in recent years we began containerizing our services, embracing the Kubernetes ecosystem, and moving workloads fully to the cloud. At this stage, logs grew explosively in both volume and variety, and the demand for digital and intelligent log analysis kept rising. A unified logging platform therefore emerged as a necessity.
2 Difficulties in Collecting Kubernetes Logs
There are plenty of simple, mature logging solutions, so we will not repeat them here; we focus only on building a logging system for Kubernetes. Logging on Kubernetes differs considerably from our previous solutions based on physical machines and virtual machines. For example:
- The form of logs becomes more complex. Besides physical-machine/virtual-machine logs, there are container standard output, container files, container events, Kubernetes events, and other information to collect.
- The environment becomes far more dynamic. In Kubernetes, node downtime, nodes going offline or online, Pod destruction, and scaling out or in are all routine. Logs can therefore be ephemeral (for example, once a Pod is destroyed its logs are no longer visible), so log data must be shipped to the server side in real time, and the collector must adapt to this dynamic environment.
- A single client request passes through CDN, Ingress, Service Mesh, Pod, and other components, involving many kinds of infrastructure, and the number of log types grows accordingly: K8s system-component logs, audit logs, Service Mesh logs, Ingress logs, and so on.
3 Description of Kubernetes Log Files
For Kubernetes log collection, we deploy the collector as a DaemonSet. During collection, the namespaces of the K8s cluster are used as the classification key, and a separate Kafka topic is created for each namespace name.
By default, Kubernetes creates symbolic links to the container log files under the /var/log/containers and /var/log/pods directories. The /var/log/containers directory contains all container logs on the host, named as follows:
[podName]_[nameSpace]_[containerName]-[containerId].log
The example above is for a Pod created by a Deployment; names from other controllers such as DaemonSet and StatefulSet vary slightly, but they all have one thing in common:
*_[nameSpace]_*.log
Knowing this naming convention, you can move on to deploying and configuring Filebeat.
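As a quick illustration, this common pattern is what makes per-namespace collection possible. A hypothetical Filebeat input glob for a namespace named my-app (the name is made up; the real configuration follows below) would be:

filebeat.inputs:
# Hypothetical example: select all container logs of namespace "my-app"
- type: container
  paths:
    - /var/log/containers/*_my-app_*.log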
4 Filebeat
4.1 Deployment
Filebeat is deployed as a DaemonSet:
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: filebeat-config
  namespace: log
data:
  filebeat.yml: |-
    filebeat.inputs:
    - type: container
      enabled: true
      paths:
        - /var/log/containers/*_default_*.log
      fields:
        namespace: default
        env: dev
        k8s: cluster-dev
    - type: container
      enabled: true
      paths:
        - /var/log/containers/*_kube-system_*.log
      fields:
        namespace: kube-system
        env: dev
        k8s: cluster-dev
    filebeat.config.modules:
      path: ${path.config}/modules.d/*.yml
      reload.enabled: false
    output.kafka:
      hosts: ["175.27.159.78:9092", "175.27.159.78:9093", "175.27.159.78:9094"]
      topic: '%{[fields.k8s]}-%{[fields.namespace]}'
      partition.round_robin:
        reachable_only: true
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: filebeat
  namespace: log
  labels:
    k8s-app: filebeat
spec:
  selector:
    matchLabels:
      k8s-app: filebeat
  template:
    metadata:
      labels:
        k8s-app: filebeat
    spec:
      serviceAccountName: filebeat
      terminationGracePeriodSeconds: 30
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      containers:
      - name: filebeat
        image: docker.elastic.co/beats/filebeat:7.12.0
        args: ["-c", "/etc/filebeat.yml", "-e"]
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        securityContext:
          runAsUser: 0
          # If using Red Hat OpenShift uncomment this:
          #privileged: true
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 100Mi
        volumeMounts:
        - name: config
          mountPath: /etc/filebeat.yml
          readOnly: true
          subPath: filebeat.yml
        - name: data
          mountPath: /usr/share/filebeat/data
        - name: varlibdockercontainers
          mountPath: /data/docker/containers
          readOnly: true
        - name: varlog
          mountPath: /var/log
          readOnly: true
      volumes:
      - name: config
        configMap:
          defaultMode: 0640
          name: filebeat-config
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /data/docker/containers
      # data folder stores a registry of read status for all files, so we don't send everything again on a Filebeat pod restart
      - name: data
        hostPath:
          # When filebeat runs as non-root user, this directory needs to be writable by group (g+w).
          path: /var/lib/filebeat-data
          type: DirectoryOrCreate
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: filebeat
subjects:
- kind: ServiceAccount
  name: filebeat
  namespace: log
roleRef:
  kind: ClusterRole
  name: filebeat
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: filebeat
  labels:
    k8s-app: filebeat
rules:
- apiGroups: [""] # "" indicates the core API group
  resources:
    - namespaces
    - pods
    - nodes
  verbs:
    - get
    - watch
    - list
- apiGroups: ["apps"]
  resources:
    - replicasets
  verbs: ["get", "list", "watch"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: filebeat
  namespace: log
  labels:
    k8s-app: filebeat
[root@master filebeat]# kubectl apply -f filebeat-daemonset.yaml
configmap/filebeat-config created
daemonset.apps/filebeat created
clusterrolebinding.rbac.authorization.k8s.io/filebeat created
clusterrole.rbac.authorization.k8s.io/filebeat created
serviceaccount/filebeat created
4.2 Introduction to the Filebeat Configuration File
Here is the configuration structure of Filebeat:
filebeat.inputs:
filebeat.config.modules:
processors:
output.xxxxx:
The structure is roughly like this, and the data flow is simple: inputs read log lines and turn them into events, processors enrich or transform each event, and the output ships the events to their destination.
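A minimal filebeat.yml sketch of that flow, with a console output used purely for illustration (this article ships to Kafka, and the input glob here is just a placeholder):

filebeat.inputs:             # 1. inputs read log lines and create events
- type: container
  paths:
    - /var/log/containers/*.log
processors:                  # 2. processors enrich or transform each event
- drop_fields:
    fields: ["agent"]
    ignore_missing: true
output.console:              # 3. the output ships events to their destination
  pretty: true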
4.3 Inputs
If you collect from multiple clusters, you can still classify by namespace, but the topic name should include the K8s cluster name so that the clusters stay separate. In inputs, write a glob pattern that picks up the log files of the specified namespace, for example:
filebeat.inputs:
- type: container
  enabled: true
  paths:
    - /var/log/containers/*_default_*.log
  fields:
    namespace: default
    env: dev
    k8s: cluster-dev
If there is more than one namespace, it can be arranged as follows:
filebeat.inputs:
- type: container
  enabled: true
  paths:
    - /var/log/containers/*_default_*.log
  fields:
    namespace: default
    env: dev
    k8s: cluster-dev
- type: container
  enabled: true
  paths:
    - /var/log/containers/*_kube-system_*.log
  fields:
    namespace: kube-system
    env: dev
    k8s: cluster-dev
Each input adds a custom field named namespace, which will become the Kafka topic name. But with many namespaces, how do we create the topics dynamically in the output? The topic option supports field references:
output.kafka:
  hosts: ["10.0.105.74:9092", "10.0.105.76:9092", "10.0.105.96:9092"]
  topic: '%{[fields.namespace]}'
  partition.round_robin:
    reachable_only: true
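As an aside, if you prefer explicit routing over field templating, the Kafka output also supports a topics list with per-rule conditions. A sketch, with made-up topic names:

output.kafka:
  hosts: ["10.0.105.74:9092", "10.0.105.76:9092", "10.0.105.96:9092"]
  topic: 'other-logs'               # hypothetical fallback topic
  topics:
    - topic: 'kube-system-logs'     # hypothetical topic name
      when.equals:
        fields.namespace: 'kube-system'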
So for now, the complete configuration file is as follows:
apiVersion: v1
kind: ConfigMap
metadata:
  name: filebeat-config
  namespace: log
data:
  filebeat.yml: |-
    filebeat.inputs:
    - type: container
      enabled: true
      paths:
        - /var/log/containers/*_default_*.log
      fields:
        namespace: default
        env: dev
        k8s: cluster-dev
    - type: container
      enabled: true
      paths:
        - /var/log/containers/*_kube-system_*.log
      fields:
        namespace: kube-system
        env: dev
        k8s: cluster-dev
    filebeat.config.modules:
      path: ${path.config}/modules.d/*.yml
      reload.enabled: false
    output.kafka:
      hosts: ["175.27.159.78:9092", "175.27.159.78:9093", "175.27.159.78:9094"]
      topic: '%{[fields.k8s]}-%{[fields.namespace]}'
      partition.round_robin:
        reachable_only: true
4.4 Processors
If you did nothing more with the logs, the job would end here. But when you actually read a log, what is missing? You only know the log content and the namespace it came from; you do not know which service or Pod the log belongs to, let alone the image address of the service. None of that is available with the configuration above, so more information has to be added.
This is where the processors configuration comes in.
4.4.1 Adding Basic K8s Information
When collecting K8s logs with only the configuration above, the events carry no Pod information, such as:
- Pod Name
- Pod UID
- Namespace
- Labels
An example log before adding the metadata:
{
  "@timestamp": "2021-05-06T02:47:09.256Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "_doc",
    "version": "7.12.0"
  },
  "log": {
    "file": {
      "path": "/var/log/containers/metrics-server-5549c7694f-7vb66_kube-system_metrics-server-9108765e17c7e325abd665fb0f53c8f4b3077c698cb88392099dfbafb0475709.log"
    },
    "offset": 15842
  },
  "stream": "stderr",
  "message": "E0506 02:47:09.254911       1 reststorage.go:160] Unable to fetch pod metrics for pod log/filebeat-s67ds: no metrics known for pod",
  "input": {
    "type": "container"
  },
  "fields": {
    "env": "dev",
    "k8s": "cluster-dev",
    "namespace": "kube-system"
  },
  "ecs": {
    "version": "1.8.0"
  },
  "host": {
    "name": "node-03"
  },
  "agent": {
    "hostname": "node-03",
    "ephemeral_id": "1c87559a-cfca-4708-8f28-e4fc6441943c",
    "id": "f9cf0cd4-eccf-4d8b-bd24-2bff25b4083b",
    "name": "node-03",
    "type": "filebeat",
    "version": "7.12.0"
  }
}
To add this information, use the add_kubernetes_metadata processor, which, as the name suggests, enriches each event with K8s metadata. Here is an example:
processors:
  - add_kubernetes_metadata:
      host: ${NODE_NAME}
      matchers:
        - logs_path:
            logs_path: "/var/log/containers/"
- host: specifies the node Filebeat is running on, for cases where it cannot be detected automatically, such as when Filebeat runs in host network mode.
- matchers: construct the lookup keys that are matched against the identifiers created by the indexers.
- logs_path: the base path of container logs. If not specified, the default log path of the platform Filebeat runs on is used.
After adding the K8s metadata, the K8s information appears in the log. Here is the log format with the metadata added:
{
  "@timestamp": "2021-05-06T03:01:58.512Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "_doc",
    "version": "7.12.0"
  },
  "agent": {
    "hostname": "node-03",
    "ephemeral_id": "c0f94fc0-b128-4eb9-b9a3-387f4cae44b7",
    "id": "f9cf0cd4-eccf-4d8b-bd24-2bff25b4083b",
    "name": "node-03",
    "type": "filebeat",
    "version": "7.12.0"
  },
  "ecs": {
    "version": "1.8.0"
  },
  "stream": "stdout",
  "input": {
    "type": "container"
  },
  "host": {
    "name": "node-03"
  },
  "container": {
    "id": "6791d22d210507becd7306ead1eeda9a4c558b5ca0630ed5af4f8b1b220fb4a7",
    "runtime": "docker",
    "image": {
      "name": "nginx:1.10"
    }
  },
  "kubernetes": {
    "namespace": "default",
    "replicaset": {
      "name": "nginx-5b946576d4"
    },
    "labels": {
      "app": "nginx",
      "pod-template-hash": "5b946576d4"
    },
    "container": {
      "name": "nginx",
      "image": "nginx:1.10"
    },
    "deployment": {
      "name": "nginx"
    },
    "node": {
      "name": "node-03",
      "uid": "4340750b-1bb4-4d61-a9aa-4715c7326988",
      "labels": {
        "kubernetes_io/arch": "amd64",
        "kubernetes_io/hostname": "node-03",
        "kubernetes_io/os": "linux",
        "beta_kubernetes_io/arch": "amd64",
        "beta_kubernetes_io/os": "linux"
      },
      "hostname": "node-03"
    },
    "namespace_uid": "8d1dad4b-bea0-469d-9858-51147822de79",
    "pod": {
      "name": "nginx-5b946576d4-6kftk",
      "uid": "cc8c943a-919c-4e15-9cde-05358b8588c1"
    }
  },
  "log": {
    "offset": 2039,
    "file": {
      "path": "/var/log/containers/nginx-5b946576d4-6kftk_default_nginx-6791d22d210507becd7306ead1eeda9a4c558b5ca0630ed5af4f8b1b220fb4a7.log"
    }
  },
  "message": "2021-05-06 11:01:58 10.234.2.11 - - \"GET / HTTP/1.1\" 200 612 \"-\" \"curl/7.29.0\" \"-\"",
  "fields": {
    "k8s": "cluster-dev",
    "namespace": "default",
    "env": "dev"
  }
}
The kubernetes key now contains Pod information, node information, namespace information, and so on: basically all the key K8s attributes.
However, the log now carries too much information; more than half of it is not what we want, so we need to remove the fields that are of no use to us.
4.4.2 Deleting Unnecessary Fields
processors:
  - drop_fields:
      # Delete unnecessary fields
      fields:
        - host
        - ecs
        - log
        - agent
        - input
        - stream
        - container
      ignore_missing: true
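One thing to keep in mind: processors run in the order they are declared, and the logs_path matcher of add_kubernetes_metadata relies on the log file path recorded in the event, so the log field should only be dropped after the metadata has been added. A sketch of the combined order:

processors:
  - add_kubernetes_metadata:      # needs the event's log file path to match the Pod
      host: ${NODE_NAME}
      matchers:
        - logs_path:
            logs_path: "/var/log/containers/"
  - drop_fields:                  # safe to drop log only after matching is done
      fields:
        - host
        - ecs
        - log
        - agent
        - input
        - stream
        - container
      ignore_missing: true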
4.4.3 Adding a Log Time
As the log above shows, there is no dedicated field for the log's own time. There is a @timestamp, but it is not Beijing time, and what we want is the time recorded in the log itself. The time appears at the beginning of message, so how do we extract it into a separate field? This is where the script processor comes in: a small JavaScript snippet splits message on spaces and joins the first two tokens (the date and the time) into a new time field, which the timestamp processor then parses.
processors:
  - script:
      lang: javascript
      id: format_time
      tag: enable
      source: >
        function process(event) {
          var str = event.Get("message");
          var time = str.split(" ").slice(0, 2).join(" ");
          event.Put("time", time);
        }
  - timestamp:
      field: time
      timezone: Asia/Shanghai
      layouts:
        - '2006-01-02 15:04:05'
        - '2006-01-02 15:04:05.999'
      test:
        - '2019-06-22 16:33:51'
After this addition there is a time field that can be used later on. Note that the layouts follow Go's reference-time format (2006-01-02 15:04:05).
{
  "@timestamp": "2021-05-06T04:….560Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "_doc",
    "version": "7.12.0"
  },
  "message": "2021-05-06 11:32:10 10.234.2.11 - - \"GET / HTTP/1.1\" 200 612 \"-\" \"curl/7.29.0\" \"-\"",
  "fields": {
    "k8s": "cluster-dev",
    "namespace": "default",
    "env": "dev"
  },
  "time": "2021-05-06 11:32:10",
  "kubernetes": {
    "replicaset": {
      "name": "nginx-deployment-6c4b886b"
    },
    "labels": {
      "app": "nginx-deployment",
      "pod-template-hash": "6c4b886b"
    },
    "container": {
      "name": "nginx",
      "image": "nginx:1.19.5"
    },
    "deployment": {
      "name": "nginx-deployment"
    },
    "node": {
      "uid": "07d8a1a4-e10f-4331-adf0-2fd7d5817c2d",
      "labels": {
        "beta_kubernetes_io/os": "linux",
        "kubernetes_io/arch": "amd64",
        "kubernetes_io/hostname": "node-02",
        "kubernetes_io/os": "linux",
        "beta_kubernetes_io/arch": "amd64"
      },
      "hostname": "node-02",
      "name": "node-02"
    },
    "namespace_uid": "8d1dad4b-bea0-469d-9858-51147822de79",
    "pod": {
      "name": "nginx-deployment-6c4b886b-6rbhw",
      "uid": "78a28548-3d34-4df6-9a76-c651b39ff934"
    },
    "namespace": "default"
  }
}
4.4.4 Optimizing the K8s Metadata Structure
The kubernetes object is still bulkier than we need. The idea is to extract only the fields we care about (podName, nameSpace, imageAddr, hostName) into a new k8s object and then drop the original kubernetes field. The final configuration is as follows:
processors:
  - script:
      lang: javascript
      id: format_k8s
      tag: enable
      source: >
        function process(event) {
          var k8s = event.Get("kubernetes");
          var newK8s = {
            podName: k8s.pod.name,
            nameSpace: k8s.namespace,
            imageAddr: k8s.container.name,
            hostName: k8s.node.hostname
          };
          event.Put("k8s", newK8s);
        }
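Note that the script only builds the new k8s object; removing the original kubernetes field, as described above, takes a separate step, for example a drop_fields processor following the script. A minimal sketch (this step is not shown in the original configuration):

processors:
  # ... the format_k8s script processor from above goes first ...
  - drop_fields:
      # remove the original, verbose kubernetes object
      fields:
        - kubernetes
      ignore_missing: true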
The resulting log:
{
  "@timestamp": "2021-05-06T05:….351Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "_doc",
    "version": "7.12.0"
  },
  "fields": {
    "k8s": "cluster-dev",
    "namespace": "default",
    "env": "dev"
  },
  "k8s": {
    "hostName": "node-02",
    "podName": "nginx-deployment-6c4b886b-6rbhw",
    "nameSpace": "default",
    "imageAddr": "nginx"
  },
  "message": "06/May/2021:05:33:25 +0000 10.234.2.11 - - \"GET / HTTP/1.1\" 200 612 \"-\" \"curl/7.29.0\" \"-\""
}