Why you need a distributed logging system

In earlier projects, if you wanted to use logs to locate bugs or performance problems in business services in production, you had to log in to each service instance and run commands against its log files, which made troubleshooting very inefficient.

In a microservice architecture, multiple service instances are deployed on different physical machines, so the logs of each microservice end up scattered across those machines. Once the cluster is large enough, the traditional way of looking up logs described above becomes impractical. Logs in a distributed system therefore need to be managed centrally; open-source components such as syslog can be used to collect logs from all servers in one place.
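For instance, a one-line rsyslog rule can forward everything to a central server (a minimal sketch; the server address is a placeholder):

```conf
# /etc/rsyslog.d/forward.conf: forward all facilities/priorities to a
# central syslog server; "logs.example.com" is a placeholder address.
*.* @@logs.example.com:514   # @@ = TCP; a single @ would mean UDP
```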

However, once the log files are centralized, we still face the problem of searching them and computing statistics over them, for example to find out which services raised alarms or threw exceptions. In the past, when an online fault occurred, it was common to see developers and operations staff downloading service logs and analyzing them with Linux commands such as grep, awk, and wc. This approach is labor-intensive and inefficient, and for more demanding query, sorting, and statistics operations, or across a large number of machines, it quickly falls short.
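Concretely, that traditional triage looks something like this (a minimal sketch with a hypothetical log path, repeated machine by machine):

```sh
# The traditional, per-machine triage; the log path is hypothetical.
grep "ERROR" /var/log/myapp/app.log | wc -l        # how many errors?
awk '/ERROR/ {print $1}' /var/log/myapp/app.log \
  | sort | uniq -c | sort -rn | head               # errors grouped by the first field (e.g. date)
```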

Methods of collecting container logs

If logs are kept inside a container, they are lost when the container is deleted. And with so many containers, the traditional way of viewing logs is simply not practical. To understand how container logs are collected, let’s first consider three questions:

  • What is the difference between log collection in K8s and traditional log collection?
  • What kind of logs are collected?
  • After determining the type of logs to be collected, how do you collect them?

You can think about these questions, and we’ll get to them later.

Classification of container logs

Container logs fall into several categories; for K8s itself there are three:

1. Events generated as resources run. For example, after creating a Pod in a K8s cluster, you can view the Pod’s details, including its events, with the kubectl describe pod command.

2. Logs generated by applications running inside containers, such as Tomcat, Nginx, or PHP runtime logs, viewed with kubectl logs redis-master-bobr0. This is also the type covered by the official documentation and by most articles online.

3. Service logs of the K8s components themselves, checked with, for example, systemctl status kubelet.
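Side by side, the commands for the three log types look roughly like this (the pod name is just the example used above):

```sh
# One command per log type above:
kubectl describe pod redis-master-bobr0   # 1. resource events (see the Events section)
kubectl logs redis-master-bobr0           # 2. application stdout/stderr logs
systemctl status kubelet                  # 3. K8s component service status
journalctl -u kubelet                     #    ...and the component's full logs via journald
```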

The K8s approach

In K8s, containers write logs to the console (stdout/stderr), and Docker itself provides the ability to collect them. But when an application writes logs to local files inside the container, there is no good built-in collection method. In addition, Pods are created and destroyed dynamically, so Pod attributes (log file path and log source) keep changing. Beyond that, the process is similar to traditional log collection.

Generally speaking, there are several log collection schemes:

  • Collection outside the container. Mount a host directory as the container’s log directory, then collect the logs on the host.
  • Collection on the node. Deploy a log collector on every node, for example as a DaemonSet, mount the containers’ log directories to the host, and have the collector capture logs from the node’s /var/log/kubelet/pods and /var/lib/docker/containers/ directories.
  • Network collection. Applications inside the container send logs directly to the log center, for example Java programs that use Log4j 2 to format log events and ship them to a remote endpoint.
  • A dedicated log collection container attached to the Pod. Add a log collection container to each application Pod and use an emptyDir volume to share the log directory so the collector can read it (a minimal sketch of this pattern follows this list).
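Here is a minimal sketch of that last pattern. The image names, Pod name, and log path are assumptions; the sidecar simply tails the shared file, whereas a real setup would run a collector such as Filebeat or Fluentd in its place:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-log-sidecar      # hypothetical name
spec:
  volumes:
    - name: app-logs
      emptyDir: {}                # log directory shared by both containers
  containers:
    - name: app
      image: my-app:latest        # hypothetical application image
      volumeMounts:
        - name: app-logs
          mountPath: /var/log/app # the app writes its log files here
    - name: log-collector         # sidecar; a real collector would go here
      image: busybox
      args: [/bin/sh, -c, 'tail -n+1 -F /var/log/app/app.log']
      volumeMounts:
        - name: app-logs
          mountPath: /var/log/app # same directory, read side
```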

The officially recommended solution runs Elasticsearch and Kibana inside the K8s cluster and deploys Fluentd as a DaemonSet on each node.
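As a rough illustration of that setup (not the official manifest: the Elasticsearch Service address is an assumption, and the ServiceAccount/RBAC and tolerations a real deployment needs are omitted):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      containers:
        - name: fluentd
          image: fluent/fluentd-kubernetes-daemonset:v1-debian-elasticsearch
          env:
            - name: FLUENT_ELASTICSEARCH_HOST
              value: elasticsearch.logging.svc.cluster.local  # assumed Service name
            - name: FLUENT_ELASTICSEARCH_PORT
              value: "9200"
          volumeMounts:           # read the node's container logs
            - name: varlog
              mountPath: /var/log
            - name: dockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: dockercontainers
          hostPath:
            path: /var/lib/docker/containers
```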

The ELKB distributed log system

ELKB is a complete distributed log collection system that solves the problems of collecting, searching, and analyzing logs described above. ELKB stands for Elasticsearch, Logstash, Kibana, and Filebeat. Elasticsearch is the data model layer and Kibana is the view layer; Logstash and Elasticsearch are implemented in Java, while Kibana is built on Node.js.

  • Kibana: visualization platform. Kibana is a web UI for searching, analyzing, and visualizing log data stored in Elasticsearch indices. It retrieves data through Elasticsearch’s REST interface and renders the stored data visually (a sample query against that interface follows this list). Users can build custom views and run targeted queries and filters over the data.

  • Elasticsearch: distributed search engine. It is highly scalable, highly reliable, and easy to manage. It supports full-text search, structured search, and analytics, and can combine all three. Elasticsearch is built on Lucene and is now one of the most widely used open-source search engines; Wikipedia, Stack Overflow, GitHub, and others build their search on it.

  • Logstash: data collection and processing engine. It dynamically ingests data from a variety of sources, filters, parses, enriches, and formats it, and then stores it for later use.

  • Filebeat: lightweight data collection engine, based on the original logstash-forwarder source code. In other words, Filebeat is the successor to logstash-forwarder and is the first-choice log shipper in the ELK stack.
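As mentioned under Kibana above, Elasticsearch exposes a REST interface. A quick illustration (the index name is hypothetical):

```sh
# Full-text query through Elasticsearch's REST API;
# "app-logs" is a hypothetical index name.
curl -s 'http://localhost:9200/app-logs/_search?q=message:error&size=5'
```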

In this architecture, the Logstash instances connect directly to Elasticsearch. A Logstash instance reads data from its sources (Java logs, Nginx logs, etc.) through input plugins, filters the events through filter plugins, and writes them to Elasticsearch through output plugins.
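A minimal pipeline sketch of that input, filter, and output flow (the file path and grok pattern are illustrative assumptions):

```conf
input {
  file {
    path => "/var/log/nginx/access.log"   # hypothetical source
    start_position => "beginning"
  }
}
filter {
  grok {
    # parse standard combined-format access logs into fields
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "nginx-%{+YYYY.MM.dd}"       # one index per day
  }
}
```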


When log volume is high, Logstash runs into heavy resource consumption. Filebeat was introduced to solve this problem. Based on the logstash-forwarder source code, Filebeat is written in Go, does not depend on a Java environment, and is light on memory and CPU, which makes it well suited to running as an agent on servers.

In one comparison, Filebeat consumed only about 70% of the CPU that Logstash did while collecting logs roughly seven times faster. In practice, Filebeat addresses Logstash’s resource consumption problem at low cost and with stable service quality.
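In this role, Filebeat tails log files on each host and ships them to Logstash (or directly to Elasticsearch). A minimal filebeat.yml sketch, with hypothetical paths and endpoint:

```yaml
filebeat.inputs:
  - type: log                 # tail matching log files
    paths:
      - /var/log/myapp/*.log  # hypothetical application log path
output.logstash:
  hosts: ["logstash:5044"]    # assumed Logstash address (Beats input port)
```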

Summary

This article introduced the concepts behind the distributed log systems EFK and ELKB. Logs mainly record discrete events, with detailed information about a particular point or stage of program execution. ELKB is a good solution to the problem that, in a microservice architecture, service instances are numerous and scattered, which makes their logs hard to collect and analyze.

The next article moves on to concrete practice: how to build an EFK logging system on K8s and collect the corresponding microservice logs.
