Alibaba Cloud Log Service has once again upgraded its logging solution for Kubernetes (K8S): collection can be deployed across a cluster within 1 minute, scales dynamically with the cluster, and provides one-stop collection of all data sources, including host logs, container logs, and container stdout.


View the original article: click.aliyun.com/m/42852/


Background


Docker is hugely popular, and Kubernetes (K8S for short) is the most popular orchestrator in the Docker ecosystem. Compared with physical machines and VMs, Docker offers a simpler, more lightweight, and more cost-effective way to deploy, operate, and maintain applications. On top of Docker, K8S further abstracts away the underlying infrastructure, forming a true one-stop deployment and operations solution.


K8S provides powerful capabilities such as workload scheduling, horizontal scaling, health monitoring, and high-availability maintenance, along with abstraction and management of networking and file systems, so it is very convenient both for existing applications and for applications built directly on K8S. But one problem still gives developers and operators headaches: log collection.


Analysis of the difficulties


For applications deployed on VMs or physical machines, log collection technology is relatively mature, with well-established tools such as Logstash, Fluentd, and Filebeat. In Docker, and especially in K8S, however, there is still no good log collection solution. The main reasons are:
  1. Many collection targets: host logs, in-container logs, and container stdout all need to be collected. Each data source has its own collection software, but there is no one-stop solution.
  2. Elastic scaling is hard: K8S is a distributed cluster whose services and environments scale elastically, which makes dynamic collection and data integrity major challenges.
  3. High operation and maintenance cost: existing solutions can only stitch together multiple collection tools, so the stability of the assembled system is hard to guarantee, and the lack of centralized management, configuration, and monitoring creates a huge O&M burden.
  4. High intrusiveness: Docker driver extensions require modifying the underlying engine, and running one collection agent per container causes resource contention and waste.
  5. Low collection performance: a single Docker engine normally runs dozens or even hundreds of containers, and under such load the collection performance and resource consumption of open-source agents are worrying.


Based on Alibaba's years of accumulated experience in container service log collection, combined with the feedback and requests from users since the Alibaba Cloud Kubernetes beta, Log Service now brings a truly one-stop log solution to K8S.


Solution introduction


Solution overview


As shown in the figure above, you only need to deploy one Logtail container on each node of the Kubernetes cluster to achieve one-stop collection of all data sources: host logs, container logs, container stdout, and more. We provide a DaemonSet deployment template for K8S, so the whole cluster can be covered within 1 minute, and subsequent dynamic scaling of the cluster requires no additional deployment work for collection. For details, see the Usage section.


The Log Service collection agent, Logtail, has reached millions of deployments, collecting data from tens of thousands of applications and petabytes of data every day, and has been battle-tested through multiple Double 11 and Double 12 events. For related technical deep dives, see the articles: Multi-tenant isolation technology + Double 11 practice results, and Log order-preserving collection based on polling + inotify.


Relying on the powerful capabilities of Alibaba Cloud Log Service, the collected log data can be put to use immediately:
  1. Context query: quickly locate abnormal entries in massive volumes of data and view the surrounding context logs of the Container/Pod where the exception occurred
  2. Real-time analysis of massive data: complete statistical analysis over 100 million log entries within 1 second
  3. Built-in reports and alerting, serving management, development, and operations alike
  4. Stream computing integration: Storm, Flink, Blink, Spark Streaming, and more
  5. External visualization: Grafana and DataV can be connected easily
  6. Log archiving and delivery: delivery to OSS for archive storage and to MaxCompute for offline analysis


Advantages of the collection scheme


The overall advantages of Log Service are not covered here; this article focuses on the advantages of the Log Service Kubernetes collection scheme, summarized in the following points.
Scheme comparison
Compared with the mainstream log collectors Logstash and Fluentd:


| Category       | Item                                | Logtail                                              | Logstash                              | Fluentd                                  |
|----------------|-------------------------------------|------------------------------------------------------|---------------------------------------|------------------------------------------|
| Collection     | Host files                          | Supported                                            | Supported                             | Supported                                |
| Collection     | Container files                     | Automatic discovery                                  | Static collection                     | Static collection                        |
| Collection     | Container stdout                    | Automatic discovery                                  | Plug-in extension                     | Docker driver                            |
| Processing     | Data processing                     | Any combination of regex, anchor, delimiter, JSON    | Plug-in extension                     | Plug-in extension                        |
| Processing     | Automatic tagging                   | Supported                                            | No K8S support                        | No K8S support                           |
| Processing     | Filtering                           | Regular expressions                                  | Plug-in extension                     | Plug-in extension                        |
| Configuration  | Automatic config update             | Supported                                            | Manual reload                         | Supported                                |
| Configuration  | Server-side configuration           | Supported                                            | Beta version with limited features    | Requires additional management software  |
| Performance    | Collection performance              | Minimal mode 160 MB/s per core, regex 20 MB/s        | About 2 MB/s per core                 | About 3-5 MB/s per core                  |
| Performance    | Resource consumption                | Average CPU 2%, memory 40 MB                         | More than 10x that of Logtail         | More than 10x that of Logtail            |
| Reliability    | Data persistence                    | Supported                                            | Plug-in support                       | Plug-in support                          |
| Reliability    | Collection checkpoint preservation  | Fully supported                                      | Files only                            | Plug-in support                          |
| Monitoring     | Local monitoring                    | Supported                                            | Supported                             | Supported                                |
| Monitoring     | Server-side monitoring              | Supported                                            | Beta version with limited features    | Requires additional monitoring software  |


Usage


Deploying K8S log collection takes three steps and covers the whole cluster within one minute (see [K8S Collection Help] for details). This is probably the simplest K8S log collection deployment you have ever seen:
  1. Deploy the Logtail DaemonSet. Effort required: one wget, editing 3 parameters with vi, and running one kubectl command
  2. In the Log Service console, create a custom-identity machine group (no additional operations are needed when the cluster scales dynamically). Effort required: a few clicks in the web console and entering one identity
  3. In the Log Service console, create the collection configurations (all collection is configured on the server side, with no local O&M required). Effort required: for stdout collection, a few clicks in the web console; for file collection, a few clicks plus entering 2 paths
  • In addition to K8S, Log Service also supports standard Docker deployments


Core technology introduction


Custom-identity machine groups


The key to supporting K8S elastic scaling is Logtail's custom-identity machine group. Schemes for remotely managing collection agents typically identify machines by IP address or hostname, which works well when the cluster is small and the environment changes little. Once the machine count grows and elastic scaling becomes the norm, however, O&M costs rise exponentially.


Drawing on several years of experience operating collection agents inside Alibaba Group, we designed a configuration and machine management model that is more flexible, easier to use, and more loosely coupled:
  1. Besides static IP-based machine groups, a machine group can also be defined by a custom identity. Every Logtail that declares that identity is automatically associated with the corresponding machine group.
  2. A Logtail instance can belong to multiple machine groups, and a machine group can contain multiple Logtail instances, decoupling Logtail from machine groups.
  3. One collection configuration can be applied to multiple machine groups, and one machine group can be associated with multiple collection configurations, decoupling machine groups from collection configurations. A minimal model of these relationships is sketched below.
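To make the decoupling concrete, here is a minimal Go sketch of the three-way relationship between Logtail instances, machine groups, and collection configurations. The type and field names (MachineGroup, CollectionConfig, groupsFor, and so on) are invented for illustration and do not reflect the actual Log Service data model.

```go
package main

import "fmt"

// CollectionConfig is a named collection configuration (illustrative only).
type CollectionConfig struct {
	Name string
	Path string // e.g. an in-container log path
}

// MachineGroup is identified either by static IPs or by a custom identity string.
type MachineGroup struct {
	Name           string
	CustomIdentity string              // empty for static-IP groups
	Configs        []*CollectionConfig // many configs per group
}

// Logtail models one agent instance; it declares a custom identity on startup.
type Logtail struct {
	Hostname string
	Identity string
}

// groupsFor returns every machine group whose custom identity matches the agent,
// so one agent may belong to many groups and one group may contain many agents.
func groupsFor(agent Logtail, groups []*MachineGroup) []*MachineGroup {
	var out []*MachineGroup
	for _, g := range groups {
		if g.CustomIdentity != "" && g.CustomIdentity == agent.Identity {
			out = append(out, g)
		}
	}
	return out
}

func main() {
	nginxCfg := &CollectionConfig{Name: "nginx-access", Path: "/var/log/nginx/access.log"}
	k8sGroup := &MachineGroup{
		Name:           "k8s-cluster-a",
		CustomIdentity: "k8s-cluster-a",
		Configs:        []*CollectionConfig{nginxCfg},
	}

	// A freshly scaled-out node starts Logtail with the same identity and
	// automatically picks up every config associated with the group.
	agent := Logtail{Hostname: "node-42", Identity: "k8s-cluster-a"}
	for _, g := range groupsFor(agent, []*MachineGroup{k8sGroup}) {
		for _, c := range g.Configs {
			fmt.Printf("agent %s applies config %s (%s)\n", agent.Hostname, c.Name, c.Path)
		}
	}
}
```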


Mapping these concepts onto K8S enables a variety of flexible configurations:
  1. A K8S cluster corresponds to one custom-identity machine group, and all Logtail instances in the cluster use the same identity. When the K8S cluster scales, the Logtail DaemonSet scales with it, and each newly started Logtail automatically pulls all collection configurations associated with the machine group.
  2. A K8S cluster can hold many different collection configurations, set according to the requirements of different Pods. Every configuration that collects from containers supports IncludeLabel and ExcludeLabel filtering (a sketch of the filtering semantics follows this list).
  3. The same configuration can be applied to multiple K8S clusters. If you run several clusters and some services share the same log collection logic, one configuration can serve all of them with no additional setup.
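As an illustration of what IncludeLabel/ExcludeLabel filtering could look like, the Go sketch below assumes one plausible semantics: a container is skipped if any exclude label matches, and, when include labels are given, collected only if at least one of them matches. The exact matching rules are an assumption for illustration, not documented Logtail behavior.

```go
package main

import "fmt"

// shouldCollect decides whether a container's labels pass the filter.
// Assumed semantics: any matching exclude pair skips the container; if the
// include set is non-empty, at least one include pair must match.
// An empty value matches on key presence alone.
func shouldCollect(labels, includes, excludes map[string]string) bool {
	for k, v := range excludes {
		if got, ok := labels[k]; ok && (v == "" || got == v) {
			return false
		}
	}
	if len(includes) == 0 {
		return true
	}
	for k, v := range includes {
		if got, ok := labels[k]; ok && (v == "" || got == v) {
			return true
		}
	}
	return false
}

func main() {
	labels := map[string]string{"app": "nginx", "env": "prod"}
	include := map[string]string{"app": "nginx"}
	exclude := map[string]string{"env": "test"}
	fmt.Println(shouldCollect(labels, include, exclude)) // true: collected
}
```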


Automatic container discovery


Like many applications with built-in automatic container discovery (Logspout, Metricbeat, Telegraf, and so on), Logtail needs to track the containers running on each node. Current open-source implementations typically use a "scan once + listen for events" pattern: fetch the full list of containers at startup, then subscribe to Docker engine events and apply incremental updates.


This approach is fairly efficient, but it can miss information with some probability:
  1. Events that occur between the initial full listing and the moment the Docker engine event listener is established are lost
  2. Event listening may be terminated by an exception, and incremental updates between the termination and the re-established listener are lost
Logtail therefore implements automatic container discovery by combining event listening with periodic full scans (a sketch follows this list):
  1. Register the event listener first, then perform the full scan
  2. Run a full scan at intervals to refresh all metadata (the interval is kept long enough that the extra load on the Docker engine is negligible)
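The Go sketch below illustrates the ordering described above: register the event listener first, run an initial full scan, then rescan periodically to repair anything missed. It uses a hypothetical ContainerSource interface rather than the real Docker SDK, so all API names here are placeholders.

```go
package discovery

import (
	"log"
	"time"
)

// Container and ContainerSource stand in for the real Docker engine API.
type Container struct{ ID, Name string }

type ContainerSource interface {
	ListAll() ([]Container, error) // full scan of all running containers
	Events() <-chan Container      // incremental container-created events
}

type discoverer struct {
	src   ContainerSource
	known map[string]Container
}

func (d *discoverer) run(rescanEvery time.Duration) {
	// 1. Register the event listener BEFORE the full scan so that containers
	//    created in between are not missed.
	events := d.src.Events()

	// 2. Initial full scan.
	d.fullScan()

	// Interval kept long so the periodic scan adds negligible engine load.
	ticker := time.NewTicker(rescanEvery)
	for {
		select {
		case c := <-events:
			d.known[c.ID] = c // incremental update from the event stream
		case <-ticker.C:
			// 3. Periodic full scan repairs anything lost while the event
			//    stream was broken or being re-established.
			d.fullScan()
		}
	}
}

func (d *discoverer) fullScan() {
	cs, err := d.src.ListAll()
	if err != nil {
		log.Printf("full scan failed: %v", err)
		return
	}
	fresh := make(map[string]Container, len(cs))
	for _, c := range cs {
		fresh[c.ID] = c
	}
	d.known = fresh
}
```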


Automatic rendering of container file configurations


For container log collection you only need to configure the file path as seen inside the container, and all collection modes are supported: minimal, Nginx template, regex, delimiter, JSON, and so on. Compared with traditional absolute-path collection, in-container paths are highly dynamic, so we implemented a scheme that automatically matches container paths and renders collection configurations (sketched after this list):
  1. Based on the configured in-container path, Logtail looks up the mapping between the container path and the host path
  2. Using the host path and the container's metadata (container name, Pod, namespace, ...), it renders an ordinary file collection configuration
  3. The Logtail file collection module loads the rendered configuration and collects the data
  4. When the container is destroyed, the rendered configuration is deleted
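The Go sketch below illustrates steps 1 and 2: resolving an in-container log path to a host path through the container's mount points and rendering a per-container file configuration tagged with metadata. The types and field names are illustrative only, not Logtail internals.

```go
package render

import (
	"path/filepath"
	"strings"
)

// Mount maps a directory inside the container to a directory on the host.
type Mount struct{ ContainerDir, HostDir string }

// ContainerMeta is the container metadata attached to the rendered config.
type ContainerMeta struct {
	ContainerName, PodName, Namespace string
	Mounts                            []Mount
}

// FileConfig is an ordinary host-file collection configuration.
type FileConfig struct {
	HostPath string
	Tags     map[string]string
}

// Render turns a configured in-container path into a host-file config, or
// returns false if no mount covers the path. Step 4 (container destroyed)
// would simply drop the rendered config again.
func Render(inContainerPath string, meta ContainerMeta) (FileConfig, bool) {
	for _, m := range meta.Mounts {
		if strings.HasPrefix(inContainerPath, m.ContainerDir) {
			rel := strings.TrimPrefix(inContainerPath, m.ContainerDir)
			return FileConfig{
				HostPath: filepath.Join(m.HostDir, rel),
				Tags: map[string]string{
					"container_name": meta.ContainerName,
					"pod":            meta.PodName,
					"namespace":      meta.Namespace,
				},
			}, true
		}
	}
	return FileConfig{}, false
}
```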


Reliability assurance


Reliability assurance is one of the most important and most difficult parts of log collection. In Logtail's design, process exits, abnormal terminations, and program upgrades are treated as normal events, and Logtail must keep data as complete as possible when they happen. Because container data collection is highly dynamic, Logtail adds, on top of its existing reliability guarantees, checkpoint maintenance mechanisms for container stdout and for container files.


Checkpoint management of container standard output


  1. Checkpoints for a container's stdout and stderr are stored separately
  2. Checkpoint saving policy: periodically dump the current checkpoints of all containers, and force a save on configuration update and process exit (see the sketch after this list)
  3. When a configuration is loaded, collection resumes from the checkpoint by default; if no checkpoint exists, collection starts from 5 seconds before the current time
  4. Because checkpoints are not deleted when a configuration is deleted, stale checkpoints are cleaned up periodically in the background
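A minimal Go sketch of the checkpoint policy described above, assuming a simple JSON dump to a local file; the structure, file format, and cleanup rule are invented for illustration and are not Logtail's actual implementation.

```go
package checkpoint

import (
	"encoding/json"
	"os"
	"sync"
	"time"
)

// Point records how far stdout and stderr of one container have been read;
// the two streams are checkpointed separately.
type Point struct {
	StdoutOffset int64     `json:"stdout_offset"`
	StderrOffset int64     `json:"stderr_offset"`
	Updated      time.Time `json:"updated"`
}

type Store struct {
	mu     sync.Mutex
	path   string
	points map[string]Point // keyed by container ID
}

func NewStore(path string) *Store {
	return &Store{path: path, points: make(map[string]Point)}
}

func (s *Store) Update(containerID string, p Point) {
	s.mu.Lock()
	defer s.mu.Unlock()
	p.Updated = time.Now()
	s.points[containerID] = p
}

// Dump writes all checkpoints to disk; called periodically, and forced on
// configuration update or process exit.
func (s *Store) Dump() error {
	s.mu.Lock()
	defer s.mu.Unlock()
	data, err := json.Marshal(s.points)
	if err != nil {
		return err
	}
	return os.WriteFile(s.path, data, 0o644)
}

// Sweep drops checkpoints not updated for maxAge, covering configurations
// that were deleted without cleaning up their checkpoints.
func (s *Store) Sweep(maxAge time.Duration) {
	s.mu.Lock()
	defer s.mu.Unlock()
	for id, p := range s.points {
		if time.Since(p.Updated) > maxAge {
			delete(s.points, id)
		}
	}
}
```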


Checkpoint management of container files


  1. The file collection checkpoints are saved together with the container's meta mapping
  2. Before the checkpoints are loaded, the container-to-file mapping is loaded first
  3. Because container state changes that happen while Logtail is stopped cannot be observed, all current configurations are re-rendered on every startup, and Logtail guarantees that loading the same container configuration multiple times is idempotent (see the sketch below).
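The Go sketch below illustrates the idempotence requirement in step 3: loaded configurations are keyed by (container ID, configuration name), so re-rendering everything after a restart and loading the same pair again is a no-op. This is an illustrative structure, not Logtail's actual code.

```go
package loader

import "sync"

type key struct{ ContainerID, ConfigName string }

// Loader makes config loading idempotent: rendering the same container/config
// pair twice (e.g. after a restart re-renders everything) has no extra effect.
type Loader struct {
	mu     sync.Mutex
	loaded map[key]struct{}
}

func New() *Loader { return &Loader{loaded: make(map[key]struct{})} }

// Load returns true only the first time a given (container, config) pair is
// seen; callers start collection only on a true return.
func (l *Loader) Load(containerID, configName string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	k := key{containerID, configName}
	if _, ok := l.loaded[k]; ok {
		return false // already loaded; idempotent no-op
	}
	l.loaded[k] = struct{}{}
	return true
}
```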


Conclusion


The solution provided by Alibaba Cloud Log Service neatly solves the problem of log collection on K8S: what used to require multiple pieces of software and dozens of deployment steps is now one agent and three operations to get logs onto the cloud, a change users sum up in one word: great. The quality of life of log operations staff improves considerably.

