The figure above shows the EFK architecture and the common log-collection patterns in a K8s environment.
Log requirements
1. Collect microservice logs centrally, so that the complete logs of a request can be traced by its request ID.
2. Measure the response time of each request interface; if it exceeds the maximum allowed response time, raise an alarm and tune the offender.
3. Rank slow SQL statements and raise alarms.
4. List exception logs and raise alarms.
5. Rank slow page requests and raise alarms.
Collecting K8s logs
K8s itself does not collect logs for you; you have to do it yourself.
K8s treats container logs as cluster-level logs:
container destruction, Pod rescheduling, or node downtime must not make the logs inaccessible.
Containers write logs to stdout and stderr, and these are stored on the host under
/var/lib/docker/containers.
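You can see this layout directly on a typical Docker-based node; the paths below assume Docker's default json-file logging driver:

```sh
# each container's stdout/stderr is kept as JSON lines by the Docker daemon
ls /var/lib/docker/containers/<container-id>/
# <container-id>-json.log  config.v2.json  ...

# the kubelet also symlinks these files under /var/log/containers,
# which is what node-level logging agents usually tail
ls /var/log/containers/
```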
Forwarding through a node-level logging agent
Deploy a DaemonSet so that every node runs one logging-agent Pod that collects that node's logs.
For example, Fluentd reads the container logs from the host's data disk and ships them to a log storage service or a message queue. A minimal DaemonSet sketch follows the table below.
Advantages and disadvantages:
Aspect | Notes |
---|---|
Advantages | 1. Only one Pod per node is needed to collect logs. 2. No intrusion into the application is required. |
Disadvantages | Application logs must be written directly to the container's stdout/stderr. |
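A minimal sketch of such a DaemonSet, assuming the fluent/fluentd-kubernetes-daemonset image and an Elasticsearch service named elasticsearch-logging; a real deployment would also need RBAC and a Fluentd configuration tuned to your log formats:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-logging
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: fluentd-logging
  template:
    metadata:
      labels:
        app: fluentd-logging
    spec:
      tolerations:
      - key: node-role.kubernetes.io/master   # also collect logs on master nodes
        effect: NoSchedule
      containers:
      - name: fluentd
        image: fluent/fluentd-kubernetes-daemonset:v1.4-debian-elasticsearch
        env:
        - name: FLUENT_ELASTICSEARCH_HOST     # assumed Elasticsearch service name
          value: elasticsearch-logging
        - name: FLUENT_ELASTICSEARCH_PORT
          value: "9200"
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: dockercontainers                # where Docker keeps container stdout/stderr
          mountPath: /var/lib/docker/containers
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: dockercontainers
        hostPath:
          path: /var/lib/docker/containers
```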
Forwarding to the logging service through a sidecar container inside the Pod
Start a sidecar container such as Fluentd inside the Pod; it reads the log files from a volume shared with the application container and forwards them to the log server.
Log input source: log files
Log processing: a logging-agent such as Fluentd
Log storage: for example Elasticsearch or Kafka
Advantages and disadvantages:
Aspect | Notes |
---|---|
Advantages | 1. Simple to deploy. 2. Friendly to the host machine. |
Disadvantages | 1. Consumes more resources (one agent per Pod). 2. Logs cannot be viewed with kubectl logs. |
Example:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: counter
spec:
  containers:
  # application container: writes two log files into the shared volume
  - name: count
    image: busybox
    args:
    - /bin/sh
    - -c
    - >
      i=0;
      while true;
      do
        echo "$i: $(date)" >> /var/log/1.log;
        echo "$(date) INFO $i" >> /var/log/2.log;
        i=$((i+1));
        sleep 1;
      done
    volumeMounts:
    - name: varlog
      mountPath: /var/log
  # sidecar: Fluentd reads the shared files and forwards them
  - name: count-agent
    image: k8s.gcr.io/fluentd-gcp:1.30
    env:
    - name: FLUENTD_ARGS
      value: -c /etc/fluentd-config/fluentd.conf
    volumeMounts:
    - name: varlog
      mountPath: /var/log
    - name: config-volume
      mountPath: /etc/fluentd-config
  volumes:
  - name: varlog
    emptyDir: {}
  - name: config-volume
    configMap:
      name: fluentd-config
```
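The Pod above mounts a ConfigMap named fluentd-config that is not shown. A minimal sketch of what it could contain, following the upstream Kubernetes docs example but with the output swapped to Elasticsearch (the host, port, and the fluent-plugin-elasticsearch output are assumptions):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-config
data:
  fluentd.conf: |
    # tail both files written by the application container
    <source>
      type tail
      format none
      path /var/log/1.log
      pos_file /var/log/1.log.pos
      tag count.format1
    </source>
    <source>
      type tail
      format none
      path /var/log/2.log
      pos_file /var/log/2.log.pos
      tag count.format2
    </source>
    # assumed output: requires the fluent-plugin-elasticsearch plugin
    <match **>
      type elasticsearch
      host elasticsearch-logging
      port 9200
      logstash_format true
    </match>
```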
Forwarding to stdout through a sidecar container inside the Pod
This fits the scenario where the application container can only write logs to files, not to stdout/stderr.
A sidecar container reads those log files and reprints them to its own stdout/stderr,
after which the node-level logging-agent pattern above takes over. A sketch follows the table below.
Advantages and disadvantages:
Aspect | Notes |
---|---|
Advantages | Only a shared volume is involved, so the CPU and memory overhead is low. |
Disadvantages | The same log exists twice on the host (the file plus the sidecar's stdout copy), wasting disk space. |
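A minimal sketch of such a streaming sidecar, following the pattern in the upstream Kubernetes docs:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: counter-streaming
spec:
  containers:
  # application container: can only write to a file
  - name: count
    image: busybox
    args:
    - /bin/sh
    - -c
    - >
      i=0;
      while true;
      do
        echo "$i: $(date)" >> /var/log/1.log;
        i=$((i+1));
        sleep 1;
      done
    volumeMounts:
    - name: varlog
      mountPath: /var/log
  # streaming sidecar: reprints the file to its own stdout
  - name: count-log-1
    image: busybox
    args: [/bin/sh, -c, 'tail -n+1 -f /var/log/1.log']
    volumeMounts:
    - name: varlog
      mountPath: /var/log
  volumes:
  - name: varlog
    emptyDir: {}
```

Now `kubectl logs counter-streaming count-log-1` shows the stream, and a node-level agent picks it up automatically.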
The application container sends logs directly to the logging service
This pattern fits scenarios where a mature logging system is already deployed and logs do not need to pass through K8s at all.
EFK introduction
fluentd
Fluentd is an open source data collector that provides a unified logging layer.
Fluentd lets you unify log collection so you can make better use of, and better understand, your data.
Four characteristics:
**Unified logging layer**
Fluentd decouples data sources from backend systems by providing a unified logging layer in between.
**Simple and flexible**
More than 500 plugins connect it to many data sources and output destinations, on top of a simple core.
**Proven**
More than 5,000 data-driven companies have validated Fluentd; its largest user collects logs from more than 50,000 servers.
**Cloud native**
It is a member project of the CNCF (Cloud Native Computing Foundation).
Four advantages:
**Unified JSON logging**
Fluentd tries to structure data as JSON wherever possible. This unifies all aspects of processing log data: collecting, filtering, buffering, and outputting logs to multiple destinations. Downstream data processing is much easier with JSON, since it has enough structure to be accessible while retaining flexible schemas.
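For illustration only (the tag and the fields are made up), a single Fluentd event pairs a tag and a timestamp with a JSON record:

```
2021-03-05 13:02:01 +0000 app.web: {"method":"GET","path":"/api/v1/users","status":200,"latency_ms":42}
```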
**Pluggable architecture**
Fluentd has a flexible plugin system that allows the community to extend its functionality. More than 500 community-contributed plugins connect many data sources and destinations. With plugins you can start making better use of your logs right away.
**Minimum resources required**
Written in C and Ruby, Fluentd requires minimal system resources: about 40 MB of memory is enough to process around 13,000 events per second. If you need an even smaller footprint, use Fluent Bit, a lighter-weight Fluentd.
**Reliable core**
Fluentd supports memory- and file-based buffering to prevent inter-node data loss. It also supports robust failover and can be configured for high availability. More than 2,000 data-driven companies rely on Fluentd across different products to better use and understand their log data.
Reasons to use Fluentd:
**Simple and flexible**
Fluentd installs on your machine in about 10 minutes and can be downloaded immediately. More than 500 plugins connect it to data sources and destinations, and plugins are easy to develop and deploy.
**Open source**
Fully open source under the Apache 2.0 license.
**Reliable and high performance**
More than 5,000 data-driven companies rely on Fluentd across different products and services to better use and understand data; in fact, a Datadog survey places it among the top 7 technologies running on Docker.
Some Fluentd users collect data from thousands of machines in real time, and each instance needs only about 40 MB of memory, which saves a lot of memory at scale.
**Community**
You can join the community to improve Fluentd and help others use it better.
Endorsed and used by big companies: Microsoft, Amazon, PPTV.
Paired with Elasticsearch and Kibana, it forms a complete logging suite: you can quickly set up an EFK cluster, collect application logs, and build performance rankings from them.
elasticsearch
Elasticsearch is a distributed, RESTful search and analytics engine
capable of addressing a growing number of use cases. As the heart of the Elastic Stack,
it centrally stores your data and helps you discover the expected and uncover the unexpected.
Details: www.elastic.co/guide/cn/el…
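As a quick illustration of the RESTful interface (the index name and the address localhost:9200 are assumptions):

```sh
# count ERROR-level entries in one day's logstash-format index
curl -s "http://localhost:9200/logstash-2021.03.05/_count?q=level:ERROR"
```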
kibana
Kibana is an open-source data analysis and visualization platform and a member of the Elastic Stack,
designed to work with Elasticsearch. You can use Kibana to search, view, and interact with
the data in Elasticsearch indices, and easily analyze and present it in a variety of
charts, tables, and maps.
Kibana makes big data easy to understand. Its simple,
browser-based interface makes it easy to quickly create and share dynamic dashboards that track real-time changes in Elasticsearch data.
Details: www.elastic.co/guide/cn/ki…
Containerized EFK implementation path
Github.com/kayrus/elk-…
Just clone the code, configure the context and namespace, and install:
```sh
cd elk-kubernetes
./deploy.sh --watch
```
Here’s the deploy.sh script for a quick look:
```sh
#!/bin/sh

CDIR=$(cd `dirname "$0"` && pwd)
cd "$CDIR"

print_red() {
  printf '%b' "\033[91m$1\033[0m\n"
}

print_green() {
  printf '%b' "\033[92m$1\033[0m\n"
}

# evaluate a *.yaml.tmpl file, substituting shell variables
render_template() {
  eval "echo \"$(cat "$1")\""
}

KUBECTL_PARAMS="--context=250091890580014312-cc3174dcd4fc14cf781b6fc422120ebd8"
NAMESPACE=${NAMESPACE:-sm}
KUBECTL="kubectl ${KUBECTL_PARAMS} --namespace=\"${NAMESPACE}\""

eval "kubectl ${KUBECTL_PARAMS} create namespace \"${NAMESPACE}\""

# pick the ready, schedulable nodes labeled sm.efk=data
#NODES=$(eval "${KUBECTL} get nodes -l 'kubernetes.io/role!=master' -o go-template=\"{{range .items}}{{\\\$name := .metadata.name}}{{\\\$unschedulable := .spec.unschedulable}}{{range .status.conditions}}{{if eq .reason \\\"KubeletReady\\\"}}{{if eq .status \\\"True\\\"}}{{if not \\\$unschedulable}}{{\\\$name}}{{\\\"\\\\n\\\"}}{{end}}{{end}}{{end}}{{end}}{{end}}\"")
NODES=$(eval "${KUBECTL} get nodes -l 'sm.efk=data' -o go-template=\"{{range .items}}{{\\\$name := .metadata.name}}{{\\\$unschedulable := .spec.unschedulable}}{{range .status.conditions}}{{if eq .reason \\\"KubeletReady\\\"}}{{if eq .status \\\"True\\\"}}{{if not \\\$unschedulable}}{{\\\$name}}{{\\\"\\\\n\\\"}}{{end}}{{end}}{{end}}{{end}}{{end}}\"")

ES_DATA_REPLICAS=$(echo "$NODES" | wc -l)

if [ "$ES_DATA_REPLICAS" -lt 3 ]; then
  print_red "Minimum amount of Elasticsearch data nodes is 3 (in case when you have 1 replica shard), you have ${ES_DATA_REPLICAS} worker nodes"
  print_red "Won't deploy more than one Elasticsearch data pod per node, exiting..."
  exit 1
fi

print_green "Labeling nodes which will serve Elasticsearch data pods"
for node in $NODES; do
  eval "${KUBECTL} label node ${node} elasticsearch.data=true --overwrite"
done

# render and apply the templates, then the plain manifests
for yaml in *.yaml.tmpl; do
  render_template "${yaml}" | eval "${KUBECTL} create -f -"
done

for yaml in *.yaml; do
  eval "${KUBECTL} create -f \"${yaml}\""
done

# (re)create the configmaps idempotently via dry-run | apply
eval "${KUBECTL} create configmap es-config --from-file=es-config --dry-run -o yaml" | eval "${KUBECTL} apply -f -"
eval "${KUBECTL} create configmap fluentd-config --from-file=docker/fluentd/td-agent.conf --dry-run -o yaml" | eval "${KUBECTL} apply -f -"
eval "${KUBECTL} create configmap kibana-config --from-file=kibana.yml --dry-run -o yaml" | eval "${KUBECTL} apply -f -"

eval "${KUBECTL} get pods $@"
```
A brief breakdown of the deployment process: the script fixes the kubectl context and namespace (sm by default) and creates the namespace; selects the ready, schedulable nodes labeled sm.efk=data and aborts unless at least 3 are available for Elasticsearch data pods; labels those nodes with elasticsearch.data=true; renders the *.yaml.tmpl templates and applies them along with the plain *.yaml manifests; and finally creates the es-config, fluentd-config, and kibana-config ConfigMaps via dry-run plus apply.
My K8s environment has not been built successfully yet; detailed installation notes will follow once it is.
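Once the script finishes, a quick way to check the deployment (the namespace comes from the script; the Kibana service name is an assumption, check the repo's manifests):

```sh
# watch the EFK pods come up in the script's namespace
kubectl --namespace=sm get pods -w

# hypothetical service name: expose Kibana locally on http://localhost:5601
kubectl --namespace=sm port-forward svc/kibana 5601:5601
```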
summary
EFK with a node-level logging agent is the most common way to collect application logs.
Creating original content is not easy and your attention is precious, so sharing is appreciated! Please credit the source when reprinting. Let's communicate and improve together; feedback is welcome.