The figure above shows the EFK architecture and the common log collection patterns in a Kubernetes environment.

Log requirements

1. Collect microservice logs centrally, so that a complete log trail can be traced by request ID.

2. Measure the time consumed by each request interface; raise an alarm when it exceeds the maximum response time and carry out targeted tuning.

3. Rank slow SQL statements and raise alarms.

4. List exception logs and raise alarms.

5. Rank slow page requests and raise alarms.

Collect K8S logs

Kubernetes itself does not collect logs for you; you have to do it yourself.

Kubernetes handles container logs with cluster-level logging.

That is, container destruction, Pod rescheduling, or Node downtime should not affect access to the container logs.

Logs that containers write to stdout and stderr are stored on the host, under /var/lib/docker/containers.

Forwarding through a logging agent on the Node

Deploy a DaemonSet so that a logging-agent Pod runs on each Node and collects logs.

For example, Fluentd reads the container logs from the corresponding host directory and ships them to a log storage service or message queue; a minimal DaemonSet sketch is given after the comparison below.

Advantages and disadvantages analysis:

Advantages: 1. Only one log-collecting Pod needs to be deployed per Node. 2. No intrusion into the application is required.
Disadvantages: Application logs must be written directly to the container's stdout and stderr.
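
To make this concrete, here is a minimal, hypothetical DaemonSet sketch. The image tag, the FLUENT_ELASTICSEARCH_* environment variables (supported by the fluent/fluentd-kubernetes-daemonset images), and the Elasticsearch service name are assumptions for illustration; RBAC, tolerations, and resource limits are omitted.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-logging
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: fluentd-logging
  template:
    metadata:
      labels:
        name: fluentd-logging
    spec:
      containers:
      - name: fluentd
        # Assumed image; pick the tag that matches your log backend.
        image: fluent/fluentd-kubernetes-daemonset:v1-debian-elasticsearch
        env:
        # Assumed Elasticsearch service name and port.
        - name: FLUENT_ELASTICSEARCH_HOST
          value: "elasticsearch"
        - name: FLUENT_ELASTICSEARCH_PORT
          value: "9200"
        volumeMounts:
        # Host directories where the container runtime keeps stdout/stderr logs.
        - name: varlog
          mountPath: /var/log
        - name: dockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: dockercontainers
        hostPath:
          path: /var/lib/docker/containers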

Forwarding to the logging service through a sidecar container in the Pod

A sidecar container such as Fluentd is started in the Pod; it reads the log files from a volume shared with the application container and ships them to the log server.

Log input source: log files

Log processing: a logging-agent such as Fluentd

Log storage: for example, Elasticsearch or Kafka

Advantages and disadvantages analysis:

Advantages: 1. Simple to deploy. 2. Friendly to the host machine.
Disadvantages: 1. Consumes more resources. 2. Logs cannot be viewed with kubectl logs.

Example:

apiVersion: v1
kind: Pod
metadata:
  name: counter
spec:
  containers:
  - name: count
    image: busybox
    args:
    - /bin/sh
    - -c
    - >
      i=0;
      while true;
      do
        echo "$i: $(date)" >> /var/log/1.log;
        echo "$(date) INFO $i" >> /var/log/2.log;
        i=$((i+1));
        sleep 1;
      done
    volumeMounts:
    - name: varlog
      mountPath: /var/log
  - name: count-agent
    image: k8s.gcr.io/fluentd-gcp:1.30
    env:
    - name: FLUENTD_ARGS
      value: -c /etc/fluentd-config/fluentd.conf
    volumeMounts:
    - name: varlog
      mountPath: /var/log
    - name: config-volume
      mountPath: /etc/fluentd-config
  volumes:
  - name: varlog
    emptyDir: {}
  - name: config-volume
    configMap:
      name: fluentd-config


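The example above mounts a ConfigMap named fluentd-config but does not show it. A minimal sketch of such a ConfigMap is given below; it tails the two files written by the application container. The output plugin (stdout here) is only a placeholder assumption, and in practice it would point at your real backend such as Elasticsearch or Kafka.

apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-config
data:
  fluentd.conf: |
    # Tail the two log files written by the count container.
    <source>
      type tail
      format none
      path /var/log/1.log
      pos_file /var/log/1.log.pos
      tag count.format1
    </source>
    <source>
      type tail
      format none
      path /var/log/2.log
      pos_file /var/log/2.log.pos
      tag count.format2
    </source>
    # Placeholder output; replace with your log storage service.
    <match **>
      type stdout
    </match>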

Forwarding to stdout through a sidecar container in the Pod

This applies when the application container can only write logs to files and cannot write to stdout and stderr.

A sidecar container reads those log files and re-prints them to stdout and stderr,

so that the Node-level logging-agent approach described above can be used; see the sketch after the comparison below.

Advantages and disadvantages analysis:

Advantages: Low overhead; the sidecar only streams files from the shared volume to stdout, so it consumes little CPU and memory.
Disadvantages: Two identical copies of each log end up on the host, which wastes disk space.
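
A minimal sketch of this pattern, assuming the application can only write to /var/log/app.log on a shared emptyDir volume (the names and paths are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: app-with-stdout-sidecar
spec:
  containers:
  - name: app
    image: busybox
    # Stand-in for an application that can only log to a file.
    args: [/bin/sh, -c, 'i=0; while true; do echo "$(date) INFO $i" >> /var/log/app.log; i=$((i+1)); sleep 1; done']
    volumeMounts:
    - name: applog
      mountPath: /var/log
  - name: log-to-stdout
    image: busybox
    # The sidecar only streams the file to its own stdout, where the
    # Node-level logging agent (and kubectl logs) can pick it up.
    args: [/bin/sh, -c, 'tail -n+1 -F /var/log/app.log']
    volumeMounts:
    - name: applog
      mountPath: /var/log
  volumes:
  - name: applog
    emptyDir: {}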

The application container directly outputs logs to the logging service

This mode applies to scenarios where a mature logging system is already in place and logs do not need to go through Kubernetes.

EFK introduction

fluentd

Fluentd is an open-source data collector for a unified logging layer.

Fluentd lets you unify log collection so that you can use and understand your data better.

Four characteristics:

Unified logging layer

Fluentd decouples data sources from backend systems by providing a unified logging layer in between.

Simple and flexible

It provides more than 500 plugins connecting many data sources and output destinations, on top of a simple core.

Proven

Fluentd has been extensively validated by more than 5,000 data-driven companies; its largest user collects logs from more than 50,000 servers.

Cloud native

It is a member project of the CNCF (Cloud Native Computing Foundation).

Four advantages:

Unified JSON Log

Fluentd tries to structure data as JSON wherever possible. This unifies all aspects of processing log data: collecting, filtering, buffering, and outputting logs to multiple destinations. Downstream data processing is also easier with JSON, because it has enough structure to be accessible while retaining flexible schemas.

Plug-in architecture

Fluentd has a flexible plugin system that allows the community to extend its functionality. More than 500 community-contributed plugins connect it to many data sources and destinations. With these plugins, you can start making better use of your logs right away.

Minimal resource consumption

Written in a combination of C and Ruby, Fluentd requires very few system resources: an instance runs in about 40 MB of memory and can handle around 13,000 events per second. If you need an even smaller footprint, you can use Fluent Bit, a lighter-weight alternative to Fluentd.

A reliable core

Fluentd supports memory- and file-based buffering to prevent data loss between nodes. Failover is also supported, and a highly available mode can be configured. More than 2,000 data-driven companies rely on Fluentd in different products to better use and understand their log data.

Reasons to use FluentD:

Simple and flexible

Fluentd can be installed and running on your machine within 10 minutes; you can download it right away. More than 500 plugins connect it to many data sources and destinations, and plugins are easy to develop and deploy.

Open source

Fully open source under the Apache 2.0 license.

Reliable and high performance

More than 5,000 data-driven companies rely on Fluentd across different products and services to better use and understand their data. In fact, according to a Datadog survey, it is among the top 7 technologies running on Docker.

Some Fluentd users collect data from thousands of machines in real time, and each instance needs only about 40 MB of memory, which saves a lot of memory when scaling out.

community

You can help improve Fluentd and help others use it better.

Endorsed and used by big companies such as Microsoft, Amazon, and PPTV.

Combined with Elasticsearch and Kibana, Fluentd forms the EFK logging suite. Next we will quickly set up an EFK cluster, collect application logs, and configure performance rankings.

elasticsearch

Elasticsearch is a distributed, RESTful search and data analytics engine capable of addressing a growing number of use cases. As the heart of the Elastic Stack, it stores your data centrally and helps you discover the expected and uncover the unexpected.

Details: www.elastic.co/guide/cn/el…

kibana

Kibana is an open-source data analysis and visualization platform and a member of the Elastic Stack, designed to work with Elasticsearch. You can use Kibana to search, view, and interact with the data stored in Elasticsearch indices, and easily analyze and present it in a variety of charts, tables, and maps.

Kibana makes big data easy to understand. Its simple, browser-based interface lets you quickly create and share dynamic dashboards that track changes in Elasticsearch data in real time.

Details: www.elastic.co/guide/cn/ki…

Containerized EFK implementation path

Github.com/kayrus/elk-…

Just pull the code down, then configure the context and namespace, and install it:

cd elk-kubernetes

./deploy.sh --watch

Here’s the deploy.sh script for a quick look:

#!/bin/sh

CDIR=$(cd `dirname "$0"` && pwd)
cd "$CDIR"

print_red() {
  printf '%b' "\033[91m$1\033[0m\n"
}

print_green() {
  printf '%b' "\033[92m$1\033[0m\n"
}

render_template() {
  eval "echo \"$(cat "$1")\""
}


KUBECTL_PARAMS="--context=250091890580014312-cc3174dcd4fc14cf781b6fc422120ebd8"
NAMESPACE=${NAMESPACE:-sm}
KUBECTL="kubectl ${KUBECTL_PARAMS} --namespace=\"${NAMESPACE}\""

eval "kubectl ${KUBECTL_PARAMS} create namespace \"${NAMESPACE}\""

#NODES=$(eval "${KUBECTL} get nodes -l 'kubernetes.io/role!=master' -o go-template=\"{{range .items}}{{\\\$name := .metadata.name}}{{\\\$unschedulable := .spec.unschedulable}}{{range .status.conditions}}{{if eq .reason \\\"KubeletReady\\\"}}{{if eq .status \\\"True\\\"}}{{if not \\\$unschedulable}}{{\\\$name}}{{\\\"\\\\n\\\"}}{{end}}{{end}}{{end}}{{end}}{{end}}\"")
NODES=$(eval "${KUBECTL} get nodes -l 'sm.efk=data' -o go-template=\"{{range .items}}{{\\\$name := .metadata.name}}{{\\\$unschedulable := .spec.unschedulable}}{{range .status.conditions}}{{if eq .reason \\\"KubeletReady\\\"}}{{if eq .status \\\"True\\\"}}{{if not \\\$unschedulable}}{{\\\$name}}{{\\\"\\\\n\\\"}}{{end}}{{end}}{{end}}{{end}}{{end}}\"")

ES_DATA_REPLICAS=$(echo "$NODES" | wc -l)

if [ "$ES_DATA_REPLICAS" -lt 3 ]; then
  print_red "Minimum amount of Elasticsearch data nodes is 3 (in case when you have 1 replica shard), you have ${ES_DATA_REPLICAS} worker nodes"
  print_red "Won't deploy more than one Elasticsearch data pod per node, exiting..."
  exit 1
fi

print_green "Labeling nodes which will serve Elasticsearch data pods"
for node in $NODES; do
  eval "${KUBECTL} label node ${node} elasticsearch.data=true --overwrite"
done

for yaml in *.yaml.tmpl; do
  render_template "${yaml}" | eval "${KUBECTL} create -f -"
done

for yaml in *.yaml; do
  eval "${KUBECTL} create -f \"${yaml}\""
done

eval "${KUBECTL} create configmap es-config --from-file=es-config --dry-run -o yaml" | eval "${KUBECTL} apply -f -"

eval "${KUBECTL} create configmap fluentd-config --from-file=docker/fluentd/td-agent.conf --dry-run -o yaml" | eval "${KUBECTL} apply -f -"

eval "${KUBECTL} create configmap kibana-config --from-file=kibana.yml --dry-run -o yaml" | eval "${KUBECTL} apply -f -"

eval "${KUBECTL} get pods $@"

A brief breakdown of the deployment process:

1. Select the nodes that will host Elasticsearch data Pods (here via the sm.efk=data label) and check that there are at least three of them.
2. Label those nodes with elasticsearch.data=true.
3. Render the *.yaml.tmpl templates and create the resulting resources, then create the plain *.yaml manifests.
4. Create the es-config, fluentd-config, and kibana-config ConfigMaps from local files.
5. List the Pods in the namespace to watch the deployment come up.

My Kubernetes environment has not been built successfully yet; detailed installation notes will be provided once the setup succeeds.

summary

Collecting application logs with EFK through a logging agent is a common approach.

Creating original content is not easy, your attention is precious, and sharing is worth even more! Please indicate the source when reprinting. Let's communicate and make progress together; feedback is welcome.