Increasing demand for data acquisition

In the previous article, we described how Logtail combines Polling and Inotify to preserve log order during collection. Based on this combination and a log rotation queue, Logtail achieves order-preserving, efficient, and reliable collection for a single configuration.

However, log collection is never the job of a single user or application. The logs collected on a typical server include resource metrics, system monitoring logs, Nginx access logs, middleware request data, security audit logs, and logs from the various components of each application. With Docker, a conservative estimate is 6-7 log types per container; a physical machine running 50 containers therefore needs more than 300 collection configurations for container logs alone.

Today, the people who need this data range from the CEO watching the market to the engineer debugging with the logs; almost everyone touches logs in some way, which is what makes us a data company. Ensuring that all of these logs are collected effectively is therefore a problem that any qualified log collection agent must solve.






Features and challenges of multi-tenant isolation

Multi-tenant isolation technology has been in use since the mainframes of the 1960s and now appears in many applications and systems. Each application or system interprets multi-tenant isolation differently; in this article we only explore its application to log collection.

Multi-tenant isolation features

First, let's look at what multi-tenant isolation requires in a log collection scenario. We summarize it in five points: isolation, fairness, reliability, controllability, and cost effectiveness.

  • Isolation: The most basic feature of multi-tenant isolation. Collection tasks must not affect each other, and blocking in some collection configurations must not affect other collection tasks
  • Fairness: Fairness among configurations must be ensured at every stage (read, process, send). A configuration writing a large volume of logs must not reduce the chance that other configurations are processed
  • Reliability: Critical in any scenario. Under multi-tenant isolation, the agent may suspend collection for some configurations when collection is blocked, but it must ensure that no data is lost when collection resumes
  • Controllability: Mainly reflected in control over resources and behavior. The agent should keep each configuration's resource usage within a reasonable range, and be able to control its collection rate and pause/resume collection
  • Cost effectiveness: The most important consideration when implementing the features above. Achieving the best possible isolation with the least possible resource usage is the key to technical feasibility and applicability

Multi-tenant isolation challenges

Logtail is gradually becoming part of the group's infrastructure. For a service inside the group, hundreds of collection configurations are the norm, and each configuration may differ in priority, log rate, processing, and upload destination. How can we effectively isolate these custom configurations and guarantee, as a matter of QoS, that normal configurations are not affected by a few abnormal ones?

The hundreds of configurations on a server cover application logs of different importance: some are high priority and some are low priority. At critical moments the low-priority logs can be throttled or stopped, but after the degradation period the data written before the stop should still be recoverable. How can we effectively throttle low-priority configurations, degrade them dynamically, and still guarantee that no logs are lost?

Nearly one million Logtail instances are currently deployed across the group and the public cloud. If each Logtail saves 1 MB of memory, nearly 1 TB of memory is saved overall. Moreover, a log collection agent is positioned as an auxiliary of the business, and a server cannot devote significant resources to log collection. How do we schedule configurations as fairly and efficiently as possible with the lowest possible footprint?

Multi-tenant isolation solutions of industry collection agents

First, let's look at the main techniques that industry collection agents use for multi-tenant isolation. Here we focus on Logstash, Fluentd, and the recently popular Filebeat, and compare the three products against the five features above.

| Feature | Logstash | Fluentd | Filebeat |
| --- | --- | --- | --- |
| Isolation | At least one thread per configuration, with an independent persistable queue | At least one thread per configuration, with an independent persistable queue | One or more goroutines per configuration, with an independent queue |
| Fairness | No coordination between configurations; relies on multi-thread scheduling | No coordination between configurations; relies on multi-thread scheduling | No coordination between configurations; relies on Go runtime scheduling |
| Reliability | Guaranteed by a persistable queue cache | Guaranteed by a persistable queue cache | Collection stops when the queue is full |
| Controllability | Can limit persistent queue resources; collection is stopped by deleting the configuration; supports remote configuration | Can limit persistent queue resources; collection is stopped by deleting the configuration; local configuration only | Can limit queue resource usage; collection is stopped by deleting the configuration |
| Cost effectiveness | Lower | Lower | Higher |

Logstash, Fluentd, and Filebeat all use a pipeline architecture: depending on the language, independent threads or goroutines implement the pipelines; each pipeline executes its stages in sequence, and pipelines run independently of each other. This approach offers good isolation and a simple implementation, and suits small-scale scenarios. However, as the number of configurations grows, the number of threads/goroutines grows proportionally, and resource usage becomes hard to control with many collection configurations. In addition, every pipeline relies entirely on the underlying layer (the operating system or the Go runtime) for scheduling; when CPU demand cannot be fully met, a configuration with a large data volume takes up more execution time and reduces the chance that configurations with little data get resources.

Logtail Multi-tenant isolation solution

The overall architecture

Unlike the mainstream open-source collection agents, Logtail adopts a more elaborate architecture: event discovery, data reading, parsing, and sending each use a fixed number of threads (only the number of parsing threads is configurable), and the thread count does not grow with the number of configurations. Although all configurations run in the same execution environment, a series of measures ensures isolation between the processing flows of different configurations, fair scheduling among configurations, reliable data collection, controllable behavior, and very high cost effectiveness.






Let's focus on the key techniques that enable multi-tenant isolation under this architecture.

Time-slice based collection scheduling

Mainstream agents allocate an independent thread or goroutine per configuration to read data, whereas Logtail uses only one thread for all data reading. The main reasons are:

  • A single thread is sufficient for event processing and data reading across all configurations, because the bottleneck of reading is the disk rather than computation. An ordinary server almost never generates more than 100 MB of logs per second, while Logtail's reading thread can read more than 200 MB/s (and even more on SSD)
  • Another advantage of single-threading is that event processing and data reading can run in a lock-free environment, which is more cost-effective than multi-threading

With a single thread, however, resources can be distributed unevenly across configurations. Under simple first-come-first-served (FCFS) scheduling, once a file with a high write rate occupies the processing unit, it keeps running until it is fully processed and voluntarily releases the resource, which can starve the other files being collected. We therefore adopt a time-slice based collection scheduling scheme to keep scheduling among configurations as fair as possible.






  1. Logtail merges Polling and Inotify events into a lock-free event queue (see the previous article); each file modification event triggers a log read.
  2. A read starts from the file's LastReadOffset and tries to read up to EOF within a fixed time slice.
  3. If the read finishes within the time slice, the event is considered handled and is removed.
  4. If the read does not finish within the time slice, the event is pushed back to the end of the queue to be scheduled again later.

This scheme treats every collection target fairly and gives every file a chance to be scheduled, effectively eliminating starvation among collection targets.
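A minimal sketch of such a time-slice read loop is shown below. The names (FileReadEvent, ReadChunk, kTimeSlice) and the re-opening of the file per chunk are illustrative assumptions, not Logtail's real internals; in the real agent the event queue is continuously refilled by Polling/Inotify.

```cpp
#include <chrono>
#include <cstddef>
#include <deque>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical event type: one entry per modified file.
struct FileReadEvent {
    std::string path;          // file that triggered the modify event
    std::streamoff offset = 0; // LastReadOffset: resume position from the previous read
};

constexpr auto kTimeSlice = std::chrono::milliseconds(50); // fixed slice per event
constexpr std::size_t kChunkSize = 512 * 1024;             // read granularity

// Read one chunk starting at ev.offset; return true once EOF is reached.
// (Re-opening the file each time keeps the sketch short; a real agent keeps it open.)
bool ReadChunk(FileReadEvent& ev) {
    std::ifstream in(ev.path, std::ios::binary);
    if (!in.is_open()) return true;          // treat unreadable files as finished
    in.seekg(ev.offset);
    std::vector<char> buf(kChunkSize);
    in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
    ev.offset += in.gcount();
    // ... hand the chunk to the parse queue here ...
    return in.eof();
}

void ScheduleReads(std::deque<FileReadEvent>& events) {
    using Clock = std::chrono::steady_clock;
    while (!events.empty()) {
        FileReadEvent ev = events.front();
        events.pop_front();
        const auto deadline = Clock::now() + kTimeSlice;
        bool eof = false;
        // Keep reading this file until EOF or until the time slice expires.
        while (!eof && Clock::now() < deadline) {
            eof = ReadChunk(ev);
        }
        if (!eof) {
            // Unfinished within the slice: push the event back to the tail so
            // other configurations are not starved by one fast-growing file.
            events.push_back(ev);
        }
        // Finished events are simply discarded.
    }
}
```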

Multi-level high/low watermark feedback queues

Time-slice based collection scheduling ensures that each configuration's logs are scheduled fairly when data is read, which meets the basic fairness requirement of multi-tenant isolation, but it does not help with isolation. For example, if a collection configuration is blocked by heavy processing or a network exception, the blocked configuration is still scheduled; its queue eventually reaches the upper limit and blocks the data reading thread, affecting the other, normal configurations.

We therefore designed multi-level high/low watermark feedback queues to coordinate and schedule effectively across collection configurations as well as across the reading, parsing, and sending steps. The name resembles the multilevel feedback queue used in process scheduling, but the queue implementation and application scenario are quite different.






  • Multi-level:
    • Here, multi-level refers to the multiple processing stages: each stage has such a queue, and adjacent queues are linked to each other
    • For example, in Logtail's read, parse, and send flow, one queue sits between read -> parse and another between parse -> send
  • High/low watermarks:
    • Each queue has two watermarks
    • When the queue grows to its high watermark, non-urgent writes stop (writes are still allowed in special cases such as process restart or data splitting)
    • When the queue drains from the high watermark down to the low watermark, writes are allowed again
  • Feedback:
    • There are two kinds of feedback: synchronous and asynchronous
    • Before data in the current queue is processed, the status of the downstream queue is checked synchronously; if the downstream queue is at its high watermark, the current queue is skipped
    • When the current queue drains from its high watermark down to its low watermark, it asynchronously notifies the upstream queue associated with it (a minimal sketch of such a queue follows this list)
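The watermark behavior and the asynchronous feedback could be sketched as a single queue like the one below. The class name, the callback, the per-queue lock, and the std::deque storage are simplifications chosen for illustration, not Logtail's actual implementation.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Sketch of one high/low watermark queue. onDrainToLowWater stands in for the
// asynchronous signal sent back to the upstream level.
template <typename T>
class WatermarkQueue {
public:
    WatermarkQueue(std::size_t lowWater, std::size_t highWater,
                   std::function<void()> onDrainToLowWater)
        : low_(lowWater), high_(highWater), feedback_(std::move(onDrainToLowWater)) {}

    // Non-urgent producers check this before pushing (the synchronous check done
    // by the upstream stage); urgent writes such as rotation splits may bypass it.
    bool AcceptsWrites() {
        std::lock_guard<std::mutex> lk(mu_);
        return !blocked_;
    }

    void Push(T item) {
        std::lock_guard<std::mutex> lk(mu_);
        items_.push_back(std::move(item));
        if (items_.size() >= high_) blocked_ = true;  // reached high watermark
    }

    bool Pop(T& out) {
        bool notify = false;
        {
            std::lock_guard<std::mutex> lk(mu_);
            if (items_.empty()) return false;
            out = std::move(items_.front());
            items_.pop_front();
            if (blocked_ && items_.size() <= low_) {
                blocked_ = false;   // drained from high down to low watermark
                notify = true;      // trigger asynchronous feedback upstream
            }
        }
        if (notify && feedback_) feedback_();
        return true;
    }

private:
    const std::size_t low_, high_;
    std::function<void()> feedback_;
    std::mutex mu_;
    std::deque<T> items_;
    bool blocked_ = false;
};
```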






Since there are multiple configurations, we create a set of such queues for each configuration. Each queue is implemented as an array of pointers, and all configuration queues at the same level share a common lock, which is friendly to both performance and memory usage. The structure of the multi-level high/low watermark feedback queues in Logtail is as follows:






Taking the log parsing step as an example, let's look at how the multi-level feedback queues behave:

  1. Initially the parsing thread is in the Wait state; it enters the FindJob state when data arrives or when one of the downstream send queues drains from its high watermark to its low watermark.
  2. FindJob scans, starting from the position after the last-processed queue, for a queue that has data and whose downstream queue can still be written to. If such a queue is found, the thread enters the Process state; otherwise it returns to Wait.
  3. After parsing the current job, Process checks whether the queue the job came from has drained to its low watermark. If so, the thread enters the Feedback state; otherwise it returns to FindJob to look for the next valid job.
  4. Feedback sends a signal to the associated upstream queue, carrying the current queue's ID as a parameter to trigger the upstream stage; after the signal is sent, the thread returns to FindJob.

When processing is driven by the multi-level high/low watermark feedback queues, any queue whose downstream queue is blocked is simply skipped, so the thread is never stalled by a blocked job, which gives high isolation. FindJob also records the ID of the queue it serviced last time and starts the next search from the queue after that ID, ensuring fair scheduling across configurations.
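To make the cycle concrete, here is a minimal single-iteration sketch of the FindJob/Process/Feedback logic. Level, ParseLoopOnce, and the watermark fields are hypothetical names; parsing and upstream signalling are elided.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// One batch of data plus its configuration id; the real job carries buffers and file meta.
struct Job { int configId = 0; };

struct Level {                            // one configuration's queues at two adjacent stages
    std::deque<Job> parseQueue;           // read -> parse
    std::deque<Job> sendQueue;            // parse -> send
    std::size_t sendHighWater = 32;       // skip this config when its send queue is full
    std::size_t parseLowWater = 8;        // feed back upstream when drained to this level
};

void ParseLoopOnce(std::vector<Level>& configs, std::size_t& lastIndex) {
    // FindJob: resume scanning after the queue serviced last time (fairness),
    // and skip any configuration whose downstream send queue is at high water
    // (isolation: a blocked configuration never stalls the thread).
    for (std::size_t step = 1; step <= configs.size(); ++step) {
        std::size_t i = (lastIndex + step) % configs.size();
        Level& c = configs[i];
        if (c.parseQueue.empty() || c.sendQueue.size() >= c.sendHighWater) continue;

        // Process: parse one job (parsing itself omitted) and push it downstream.
        Job job = c.parseQueue.front();
        c.parseQueue.pop_front();
        c.sendQueue.push_back(job);

        // Feedback: if the parse queue just drained to its low watermark, signal
        // the upstream read stage with this queue's id (signalling omitted).
        if (c.parseQueue.size() == c.parseLowWater) {
            // NotifyReader(i);
        }
        lastIndex = i;
        return;
    }
    // No runnable job found: the caller goes back to the Wait state.
}
```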

Flow control and blocking processing

The multi-level high/low watermark feedback queues in the previous section solve isolation and fairness across configurations, but controllability and reliability still have gaps. For example: 1. The collection traffic of each configuration cannot be controlled precisely; collection can only be stopped by deleting the configuration. 2. If a configuration is completely blocked while its log files rotate, the data written before the rotation may be lost by the time the blockage is cleared.

Our solution covers three parts: event processing, data reading logic, and data sending control:

  1. Event processing is decoupled from data reading: even if the read-related queue is full, events are still processed as usual. Event processing here mainly updates the file metadata and puts rotated files into the rotation queue, so no data is lost even when a configuration is blocked or paused.
  2. If an event were simply pushed back to the end of the event queue whenever its configuration's parse queue is full, it would cause many pointless reschedules and spin the CPU. Instead, when the parse queue is full the event is moved into a dedicated Blocked queue, and moved back into the event queue when the parse queue's asynchronous feedback arrives.
  3. Each configuration's queue in the sender is associated with a SenderInfo, which records whether the network is currently healthy, whether the quota is sufficient, and the maximum send rate allowed for that configuration. Every time the sender takes data from a queue it acts according to the SenderInfo, which covers retries on network failure, retries on quota errors, status updates, flow control, and so on (see the sketch below).
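The article does not show SenderInfo's definition; below is a hedged sketch of what such per-configuration flow control could look like, using a simple token bucket and hypothetical field names.

```cpp
#include <algorithm>
#include <chrono>

// Illustrative per-configuration send state: network/quota status plus a
// token-bucket rate limit. Field and method names are assumptions.
struct SenderInfo {
    bool   networkOk = true;             // last send succeeded at the network level
    bool   quotaOk = true;               // project/logstore quota not exceeded
    double maxBytesPerSec = 1 << 20;     // per-configuration send rate limit
    double tokens = 0;                   // bytes currently available to send
    std::chrono::steady_clock::time_point lastRefill = std::chrono::steady_clock::now();

    // Returns true if `bytes` may be sent now under the configured rate limit
    // and the current network/quota status.
    bool TryAcquire(double bytes) {
        const auto now = std::chrono::steady_clock::now();
        const double elapsed = std::chrono::duration<double>(now - lastRefill).count();
        lastRefill = now;
        // Refill the bucket, capped at one second's worth of budget.
        tokens = std::min(maxBytesPerSec, tokens + elapsed * maxBytesPerSec);
        if (!networkOk || !quotaOk || tokens < bytes) return false;
        tokens -= bytes;
        return true;
    }
};
```

In such a design, a send failure would flip networkOk or quotaOk and trigger a retry later, while lowering maxBytesPerSec throttles a configuration without deleting it.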

Overall process review

Let’s review the techniques used by Logtail in multi-tenant isolation, as shown below:

  • Time-slice based collection scheduling ensures isolation and fairness for each configuration at the data entry point
  • Multi-level high/low watermark feedback queues ensure isolation and fairness across stages and across configurations at very low resource cost
  • The non-blocking event processing mechanism keeps reliability high even when files rotate while a configuration is blocked or stopped
  • Per-configuration flow control and stop-collection policies, together with dynamic configuration updates, keep data collection highly controllable






Combining these designs, we built a virtual multi-tenant pipeline structure with minimal resources:






The proof is in the pudding

Cost effectiveness in multi-tenant isolation is not something you can argue with words alone; the best way to show it is with data:

Log collection behind Double 11

Logtail today carries data collection for the entire Alibaba Cloud site, all cloud products and services, deployments in every region worldwide, and key services of Alibaba Group (Taobao, Tmall, Cainiao, and so on). It collects real-time data from nearly a million servers every day, connecting thousands of applications and consumers. During this year's Double 11 (Singles' Day), Logtail collected nearly all the data generated by Ant Financial's applications, users, and servers.

Logtail is currently installed on hundreds of thousands of Ant Financial machines, with an average of nearly 100 collection configurations per machine, collecting petabytes of logs from thousands of applications every day. During Double 11, to keep data collection for the core applications running smoothly, collection for hundreds of applications was degraded and stopped ahead of midnight and then resumed batch by batch after the midnight peak. Logtail caught up on five hours of peak data for those hundreds of applications within three hours, while guaranteeing that no logs were lost during the stop even if files were rotated or removed. The figure below shows Logtail's CPU and memory usage while catching up on the data:

Even during Double 11, Logtail's average CPU usage was only 1.7% of a single core, and its memory peaked at 42 MB on average. While catching up on data, CPU rose by 0.4 percentage points and memory by 7 MB. Functionality aside, Logtail controls resources in a way that open-source collection agents cannot match.

Performance comparison

Among Logstash, Fluentd, and Filebeat, the best-performing multi-tenant implementation is Filebeat, so we compare against Filebeat here.

  • Test machine configuration: CPU: 16-core Intel(R) Xeon(R) CPU E5-2682 v4 @ 2.50GHz; MEM: 64 GB; DISK: four 1 TB SSDs (one SSD dedicated to writing logs)

  • Filebeat's published benchmark is about 30K logs per second. To improve Filebeat's performance, we set its file read buffer to 512 KB, wrote its output to a separate SSD, and set the log rotation size to 4 GB (roughly doubling the performance of the published benchmark)

Small number of configurations

Below are the CPU and memory usage of Logtail and Filebeat in minimal mode (single-line logs, no log parsing) at 0.1 MB/s, 1 MB/s, and 5 MB/s with 1, 2, 4, and 8 configurations:

| Configurations | 0.1 MB/s | 1 MB/s | 5 MB/s |
| --- | --- | --- | --- |
| 1 | FB: 1.8%, 22 MB / LT: 0.5%, 34 MB | FB: 15.9%, 22 MB / LT: 1.3%, 36 MB | FB: 76.3%, 26 MB / LT: 5.5%, 42 MB |
| 2 | FB: 3.0%, 22 MB / LT: 0.6%, 35 MB | FB: 27.5%, 24 MB / LT: 2.5%, 44 MB | FB: 137.9%, 31 MB / LT: 8.3%, 50 MB |
| 4 | FB: 6.3%, 30 MB / LT: 0.8%, 35 MB | FB: 52.4%, 30 MB / LT: 4.3%, 59 MB | FB: 190.4%, 31 MB (lost data) / LT: 15.3%, 69 MB |
| 8 | FB: 10.9%, 36 MB / LT: 1.1%, 46 MB | FB: 103%, 37 MB / LT: 6.5%, 82 MB | FB: (lost data) / LT: 30.5%, 83 MB |

(FB = Filebeat, LT = Logtail; each cell shows CPU usage and memory.)

Filebeat controls memory well, but its throughput lags behind Logtail's. Filebeat's processing capacity is about 18 MB/s; with test logs of about 300 bytes each, that is roughly 60K logs per second (the published benchmark is about 30K/s).

Filebeat does hold an advantage over Logstash and Fluentd (roughly 10x faster). In minimal mode (no log parsing, comparable to Filebeat), Logtail reaches about 150 MB/s of processing capacity, about 8 times the optimized Filebeat. Since Filebeat consumes about 200% CPU to reach 18 MB/s while Logtail consumes only 100% CPU to reach 150 MB/s, Logtail holds roughly a 10-fold advantage over Filebeat in CPU cost effectiveness.

Large Number of Configurations

The following compares the performance of Filebeat and Logtail with 100 configurations:

| Agent | 0.01 MB/s | 0.1 MB/s | 1 MB/s |
| --- | --- | --- | --- |
| Logtail | 2.7%, 60 MB | 8.0%, 65 MB | 65.4%, 98 MB |
| Filebeat | 13.3%, 236 MB | 102.5%, 238 MB | (lost data) |

(Each cell shows CPU usage and memory.)

With 100 configurations, Filebeat's memory footprint rises sharply, while Logtail's memory grows relatively little. Logtail's CPU consumption with 100 configurations is about the same as with 2 configurations at the same total data rate. Profiling Filebeat's CPU shows that about 20% of it goes to scheduling-related futex calls: as the number of configurations grows, even Go's goroutine scheduling consumes noticeably more CPU.

Conclusion

Data collection software may look small and simple, but everything must be reconsidered at large scale: many users, many machines, and huge volumes of data. Logtail, the collection agent of Alibaba Cloud Log Service, is such an agent, proven across millions of deployments, several petabytes of data per day, and nearly 10,000 applications.

Compared with open-source software, our biggest advantage is the environment that Alibaba and Double 11 provide for honing these capabilities. Open-source agents may not even consider the multi-tenant isolation problem shared today, but I believe it will inevitably arise once business scale grows large enough. I hope this sharing is helpful to you.

Looking ahead

Improvements

  • Logtail currently handles fairness across tenants well, but it does not yet optimize for configuration priority. We need to invest more effort here: different types of data differ in importance, and collection priority needs to be handled more precisely.
  • Logtail's own monitoring is currently not visible to users. In the future we will package Logtail's runtime and error data into an agent monitoring and alerting solution, making Logtail's end-to-end collection offering more complete.

New features

The upcoming Logtail release will also support HTTP, MySQL binlog, and MySQL SQL input sources.

  • With HTTP as an input source, the user configures specific URLs; Logtail periodically requests them, processes the responses, and uploads the data to Log Service. This can collect data from Nginx, HAProxy, Docker daemons, and other components that expose HTTP interfaces.
  • With MySQL binlog as an input source (RDS included), data is synchronized to Log Service in binlog form, similar to Canal
  • With MySQL SQL as an input source, you can use SELECT statements to customize which data is collected into Log Service; incremental collection is supported

If you need more input sources or have other questions, please join DingTalk group 11775223 to reach us:









References

Multilevel feedback queue

cpu scheduling

log service

logtail vs logstash, fluentd

logstash

filebeat

fluentd