My recent articles have focused mostly on team growth and process, with little real technical substance. So today I want to share my thoughts on building a real-time log analysis system and go a bit deeper into the technology. The starting point for this system was simple: during promotional campaigns, our operations colleagues wanted to see user behavior data and order status in real time, so they could adjust operational strategy promptly and effectively based on the data. Although the user base of our internet finance products is nowhere near that of the leading domestic e-commerce companies during their big promotions, the log volume during a campaign is still more than an ordinary architecture can handle. After some research, we settled on the following architecture for the real-time log analysis system:


Flume + Kafka + Storm forms the foundation, with the "classic" Redis + MySQL combination handling storage of the processed log data. Below I discuss each component in turn: why we chose it and what we were thinking along the way.

Collecting log data with Flume

Initially we had two ideas for collecting logs. One was to process the log files in batches with shell scripts; the other was to report log data directly from a dedicated set of APIs inside the application. Both schemes were quickly rejected. The problem with the first is the heavy workload: the scripts are awkward to maintain and the maintenance cost is high. The problem with the second is that it is highly intrusive to business code and the APIs are hard to adjust or update in time; most importantly, it also affects the performance of the business services.

After some investigation we decided to use the third-party framework Flume for log collection. Flume is a distributed, efficient log collection system that can aggregate the massive log file data scattered across different servers into a centralized store, and it integrates well with Kafka.

Why not use Logstash?

Frankly, I knew nothing about Flume before this project. At SF I used the ELK stack (Elasticsearch, Logstash, Kibana) for log processing, so Logstash is what I am more familiar with. We chose Flume for two reasons:

1. The Flume + Kafka combination is the more mature solution; since we had already decided on Kafka for the messaging layer, it was natural to work backwards and pick Flume.

2. Flume's strength is the reliability of its transmission, and since we are analyzing business data, stability is naturally a key consideration. A Flume agent runs in a JVM, and one agent is deployed on each server. Flume collects the log data produced by the application services, wraps it into events, and delivers them to the agent's source. Flume provides point-to-point reliability: data in one server's Flume agent channel is removed only after it has been transferred to the channel of the agent on the next server or correctly persisted to the local file storage system. A minimal agent configuration is sketched after this list.
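To make this concrete, below is a minimal sketch of what such an agent configuration could look like. It is not our production setup: the file paths, topic name, and broker addresses are assumptions for illustration. The agent tails an application log with an exec source, buffers events in a file channel for durability, and forwards them to Kafka through the Kafka sink (Flume 1.6-style property names).

```
# Minimal Flume agent sketch -- paths, topic and hosts are illustrative assumptions
agent1.sources  = r1
agent1.channels = c1
agent1.sinks    = k1

# exec source: tail the application log file
agent1.sources.r1.type     = exec
agent1.sources.r1.command  = tail -F /data/app/logs/app.log
agent1.sources.r1.channels = c1

# file channel: events are kept on disk until the sink confirms delivery
agent1.channels.c1.type          = file
agent1.channels.c1.checkpointDir = /data/flume/checkpoint
agent1.channels.c1.dataDirs      = /data/flume/data

# Kafka sink: forward events to the topic consumed downstream by Storm
agent1.sinks.k1.type       = org.apache.flume.sink.kafka.KafkaSink
agent1.sinks.k1.topic      = app-log
agent1.sinks.k1.brokerList = kafka1:9092,kafka2:9092
agent1.sinks.k1.channel    = c1
```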

Set up the message processing system

Kafka provides functionality similar to JMS, but with a completely different design; it is not an implementation of the JMS specification. Kafka stores messages by topic: the sender is called the producer and the receiver the consumer. A Kafka cluster consists of multiple Kafka instances, each instance (server) being called a broker. The cluster, producers, and consumers all rely on ZooKeeper to keep the system available and to store cluster metadata.

(Image from Kafka website)

In the log analysis system, Kafka effectively acts as a data buffer pool. Because it is a log-file-based message system, data is not easily lost; it records each consumer's consumption position, and users can choose the offset from which to start consuming, so messages can be consumed repeatedly. It also supports both queue and publish/subscribe consumption modes, which makes it very flexible and a good fit for Storm, and it makes full use of Linux I/O to improve read and write speed.
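As a rough illustration of the consumer side of Kafka (independent of the Storm spout used in the actual system), the sketch below uses the plain Kafka Java client to subscribe to a topic and control the starting offset. The broker address, group id, and topic name are assumptions for illustration only.

```java
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class LogConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka1:9092");   // assumed broker address
        props.put("group.id", "log-analysis");           // consumer group id (illustrative)
        props.put("auto.offset.reset", "earliest");      // start from the beginning if no offset is recorded
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("app-log")); // assumed topic name
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records) {
                    // the offset is tracked per partition, so consumption can be replayed from any position
                    System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                }
            }
        }
    }
}
```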

Streaming computing with Storm

Storm is an open-source distributed real-time computation system, often described as a streaming computing framework. What is streaming computing? In plain terms, it is just what the name suggests: data flows in continuously, results are computed as the data passes through, and then the next piece of data is processed. For example, a financial product runs around the clock, transactions keep happening, and every user action is written to the logs. Storm is a typical engine for this model. Its usual deployment looks like this: a message queue system such as Kafka sits in the middle and buffers the messages first, while Storm's many nodes run distributed, parallel processing programs over the data.

The flow from the Kafka consumer into Storm is as follows:

In Storm's programming model, implementing this processing requirement means building a data-source Spout component, two business-logic Bolt components, and a Topology structure into which these components are wired. A Spout generates data or receives it from an external source; a Bolt consumes the data stream emitted by a Spout (or another Bolt) and implements user-defined processing logic, and for complex processing several Bolts can be chained to work together. Tuples are Storm's data model, consisting of values and their corresponding fields.
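As a concrete (and purely illustrative) example, a parsing Bolt could look like the sketch below, using Storm 1.x-style imports. The assumed log format ("userId|action|..."), the field names, and the class name are placeholders, not our actual business logic.

```java
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

// A minimal parsing Bolt: consumes raw log lines and emits (userId, action) tuples.
// The "userId|action|..." log format assumed here is purely illustrative.
public class LogParseBolt extends BaseBasicBolt {

    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        String line = input.getString(0);   // the raw log line from the upstream spout
        String[] parts = line.split("\\|");
        if (parts.length >= 2) {
            collector.emit(new Values(parts[0], parts[1]));
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // the field names define the schema of the tuples this Bolt emits
        declarer.declare(new Fields("userId", "action"));
    }
}
```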

Storm also introduces the concept of stream grouping, which decides where the tuples emitted by a Spout or Bolt are sent next; in other words, the program specifies which component each component receives its tuples from. Storm provides several grouping mechanisms, such as shuffleGrouping, which distributes the tuples of a stream randomly so that each Bolt task receives roughly the same number of tuples. Finally, the Spout and the Bolts are assembled into a Topology object in the program and submitted to the Storm cluster for execution: the topology wires the one Spout and two Bolts together, and the direction of data flow between them and their parallelism are set by chaining the shuffleGrouping calls, as sketched below.
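Wiring the components together might then look roughly like this. The component names and parallelism hints are illustrative, and LogKafkaSpout stands in for whichever Kafka spout implementation is used (for example one from the storm-kafka integration); it is a placeholder, not a class defined here.

```java
import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.topology.TopologyBuilder;

public class LogTopologySketch {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();

        // the spout reads log messages from Kafka (LogKafkaSpout is a placeholder for a real Kafka spout)
        builder.setSpout("log-spout", new LogKafkaSpout(), 1);

        // first Bolt parses raw lines; shuffleGrouping spreads tuples evenly over its tasks
        builder.setBolt("parse-bolt", new LogParseBolt(), 2)
               .shuffleGrouping("log-spout");

        // second Bolt aggregates the parsed events (see the Redis sketch further below)
        builder.setBolt("stats-bolt", new LogStatsBolt(), 2)
               .shuffleGrouping("parse-bolt");

        Config conf = new Config();
        conf.setNumWorkers(2);   // number of worker processes for the topology

        StormSubmitter.submitTopology("log-analysis-topology", conf, builder.createTopology());
    }
}
```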

One final conclusion

Based on the discussion of the Flume + Kafka + Storm (FKS) combination above, the real-time log analysis system runs as follows:

  1. Flume monitors the logs produced by the business systems, captures every log entry in real time, and pushes it into the Kafka message system;

  2. The messages in Kafka are then consumed by Storm: a user-defined Storm topology analyzes the log entries and writes the results to the Redis cache (see the sketch after this list);

  3. Meanwhile, the data cached in Redis is periodically synchronized to MySQL;

  4. To serve the various front-end systems, a set of API services is built so that the data can be retrieved conveniently along each dimension.
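For step 2, the sketch below shows roughly what the terminal Bolt writing to Redis might do, using the Jedis client. The Redis host, key layout, and field names are assumptions for illustration; the periodic Redis-to-MySQL synchronization mentioned in step 3 would be handled by a separate job.

```java
import java.util.Map;

import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Tuple;

import redis.clients.jedis.Jedis;

// Terminal Bolt: counts user actions and keeps the running totals in Redis,
// from where they are periodically synchronized to MySQL by a separate job.
public class LogStatsBolt extends BaseBasicBolt {

    private transient Jedis jedis;

    @Override
    public void prepare(Map stormConf, TopologyContext context) {
        jedis = new Jedis("redis-host", 6379);   // assumed Redis address
    }

    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        String userId = input.getStringByField("userId");
        String action = input.getStringByField("action");
        // keep per-action counters in Redis hashes, e.g. HINCRBY stats:action <action> 1
        jedis.hincrBy("stats:action", action, 1);
        jedis.hincrBy("stats:user:" + userId, action, 1);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // terminal Bolt: nothing is emitted downstream
    }
}
```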
