With the spread of microservices, our applications are deployed in a distributed way. As a result, application logs are scattered across servers, monitoring and troubleshooting become difficult, and efficiency is low. A centralized log platform is meant to solve this problem.

Background

All of our applications were connected to CAT long ago, which lets us check application performance indicators such as traffic volume and failed calls online in real time. The call chains across the various platforms have also been connected, which basically meets our application performance monitoring requirements.

However, the applications' service logs are not pushed to CAT, so when a problem occurs online it is hard to check service logs and troubleshoot in the traditional way, server by server. Building a centralized platform for service logs was therefore urgent. After some research, we chose Elastic's ELK logging solution for an online demo.

There are two main reasons:

  • ELK provides the functions we need and is highly extensible.
  • ELK is an open source project with low maintenance cost.

Relevant concepts

ELK is a suite of solutions whose name stands for three software products: Elasticsearch, Logstash and Kibana. Beats is a newer addition to the suite.

  • E: represents Elasticsearch, which stores and retrieves logs.
  • L: represents Logstash, which collects, filters, and formats logs.
  • K: represents Kibana, which visualizes log data.
  • Beats: a family of lightweight data collectors.

The Beats family currently has four members, divided by function:

  • Filebeat: collects file data.
  • Packetbeat: collects network traffic data.
  • Metricbeat: collects system-level CPU usage, memory, file system, disk IO, and network IO statistics.
  • Winlogbeat: collects Windows event log data.

In our log platform, Filebeat is the log file collection tool. Filebeat can easily collect the log files of Nginx, MySQL, Redis, Syslog and other applications.
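To make "collecting a log file" concrete, here is a minimal sketch of how a file collector can pick up only the lines appended since its last read, by remembering a byte offset. This is not Filebeat's actual implementation; the function name `read_new_lines` and the file name `app.log` are made up for illustration.

```python
import os
import tempfile


def read_new_lines(path, offset):
    """Return (complete new lines, new offset) for the file at `path`.

    Only complete lines (ending in a newline) are consumed; a partially
    written last line is left for the next call.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    end = data.rfind(b"\n") + 1  # 0 when no complete line is available yet
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end


# Usage: the application appends to the log between two collection rounds.
with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "app.log")  # hypothetical log file
    with open(log_path, "wb") as f:
        f.write(b"first\nsecond\npart")
    lines, offset = read_new_lines(log_path, 0)       # ["first", "second"]
    with open(log_path, "ab") as f:
        f.write(b"ial\n")
    more, offset = read_new_lines(log_path, offset)   # ["partial"]
```

Persisting the offset between runs is what would let such a collector resume after a restart without re-sending old lines.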

Log Platform Architecture

In general, an ELK centralized log platform works as follows: data collectors deployed on the application servers collect log data in near real time and push it to Logstash on the log filtering nodes. Logstash then pushes the formatted log data to Elasticsearch for storage. Kibana retrieves and visualizes the logs centrally through Elasticsearch.

Of course, the ELK centralized logging platform did not reach this final form in one step; it evolved over time.
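The flow described above (collect, filter and format, store, retrieve) can be sketched in a few lines. This is a toy in-memory model of the roles, not how Beats, Logstash, or Elasticsearch are implemented; the log line format, the host name `app-01`, and every function and class name below are assumptions made for illustration.

```python
import re
from collections import defaultdict

# Assumed log line format: "<date> <time> <LEVEL> <message>"
LINE_PATTERN = re.compile(
    r"^(?P<timestamp>\S+ \S+) (?P<level>[A-Z]+) (?P<message>.*)$"
)


def collect(lines, host):
    """Collector role: attach the source host to each non-empty raw line."""
    for line in lines:
        line = line.rstrip("\n")
        if line:
            yield {"host": host, "raw": line}


def filter_and_format(event):
    """Filter role: parse the raw line into structured fields."""
    match = LINE_PATTERN.match(event["raw"])
    if match is None:
        return None  # drop lines that do not fit the assumed format
    return {"host": event["host"], **match.groupdict()}


class LogStore:
    """Storage role: keep documents plus a tiny inverted index on the message."""

    def __init__(self):
        self.docs = []
        self.index = defaultdict(set)  # lowercase word -> ids of matching docs

    def add(self, doc):
        doc_id = len(self.docs)
        self.docs.append(doc)
        for word in re.findall(r"\w+", doc["message"].lower()):
            self.index[word].add(doc_id)

    def search(self, word):
        """Retrieval role: return the documents whose message contains `word`."""
        return [self.docs[i] for i in sorted(self.index.get(word.lower(), ()))]


def run_pipeline(lines, host, store):
    """Push every parseable line from one host through the whole flow."""
    for event in collect(lines, host):
        doc = filter_and_format(event)
        if doc is not None:
            store.add(doc)


store = LogStore()
run_pipeline(
    [
        "2017-12-20 10:00:01 ERROR payment timeout",
        "garbage",
        "2017-12-20 10:00:02 INFO payment ok",
    ],
    "app-01",
    store,
)
hits = store.search("timeout")  # one document, with level "ERROR"
```

The point of the sketch is the separation of roles: the collector does almost no work, parsing is concentrated in one place, and queries go only to the store.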

ES + Logstash + Kibana

In the original architecture, Logstash handled both data collection and filtering and was deployed on the application servers. Because Logstash filters a large volume of logs, it consumed part of the application system's performance and led to poor resource allocation. In addition, the log filtering configuration was spread across the application servers, which made centralized configuration management inconvenient.

The introduction of Logstash-forwarder

This architecture introduces Logstash-forwarder as the data collector and moves Logstash off the application servers. The application servers only collect data, and data filtering is centralized on the log platform servers, which solves the problems above. However, communication between Logstash-forwarder and Logstash must be encrypted with SSL, which makes deployment more difficult, and system performance does not improve significantly. In addition, Logstash-forwarder is not positioned as a data collection plug-in, so the system is not easy to extend.

The introduction of Beats

This architecture builds on the Logstash-forwarder architecture and replaces Logstash-forwarder with Beats. Beats has a lower system overhead, so its impact on application server performance is negligible. In addition, Beats works as a set of data collection plug-ins: the Beats with different functions can be enabled as required, which makes the system more flexible and extensible. For example, if only Filebeat is enabled on an application server, only log file data is collected. If you later need to collect system performance data, you can enable Metricbeat without much modification or configuration.

This ELK+Beats architecture already covers most application scenarios. However, when the service system is large and the log volume is high with near-real-time requirements, the service system and the log system remain coupled to each other.
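The "enable as required" idea can be illustrated with a small plug-in registry. This is only a conceptual sketch: in practice each Beat is a separately installed and configured agent, and the `Agent` class, collector functions, and sample values here are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Set

# A collector is any function that returns a batch of events (plain dicts).
Collector = Callable[[], Iterable[dict]]


class Agent:
    """Hypothetical agent that runs only the collectors that are enabled."""

    def __init__(self) -> None:
        self._collectors: Dict[str, Collector] = {}
        self._enabled: Set[str] = set()

    def register(self, name: str, collector: Collector) -> None:
        self._collectors[name] = collector

    def enable(self, name: str) -> None:
        if name not in self._collectors:
            raise KeyError(f"unknown collector: {name}")
        self._enabled.add(name)

    def collect_once(self) -> List[dict]:
        """Run every enabled collector once and tag events with their source."""
        events = []
        for name in sorted(self._enabled):
            for event in self._collectors[name]():
                events.append({"source": name, **event})
        return events


def file_collector() -> List[dict]:
    # Stand-in for a log file collector; the line is sample data.
    return [{"message": "2017-12-20 10:00:01 ERROR payment timeout"}]


def metric_collector() -> List[dict]:
    # Stand-in for a system metrics collector; the value is sample data.
    return [{"cpu_percent": 12.5}]


agent = Agent()
agent.register("file", file_collector)
agent.register("metric", metric_collector)

agent.enable("file")
logs_only = agent.collect_once()      # only log file events

agent.enable("metric")                # later: turn on metrics as well
logs_and_metrics = agent.collect_once()
```

Adding a new kind of data means registering and enabling one more collector; the existing ones are untouched.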

The introduction of the queue

This architecture introduces a message queue to smooth out network transmission, which reduces the likelihood of network congestion and, above all, of data loss. It also decouples the service system from the log system, giving better flexibility and scalability.
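The buffering effect of a queue between the collectors and the log platform can be sketched with Python's standard `queue` module. This is an in-process stand-in, assuming one producer and one consumer; a real deployment would use an external message broker, which the text does not name, and the function names below are illustrative.

```python
import queue
import threading

SENTINEL = None  # marks the end of the stream in this sketch


def ship_logs(lines, buffer):
    """Producer (collector side): `put` blocks while the buffer is full."""
    for line in lines:
        buffer.put(line)
    buffer.put(SENTINEL)


def index_logs(buffer, sink):
    """Consumer (log platform side): drains the buffer at its own pace."""
    while True:
        line = buffer.get()
        if line is SENTINEL:
            break
        sink.append(line)


def run_with_queue(lines, capacity=100):
    """Move `lines` from producer to consumer through a bounded queue."""
    buffer = queue.Queue(maxsize=capacity)
    sink = []
    consumer = threading.Thread(target=index_logs, args=(buffer, sink))
    consumer.start()
    ship_logs(lines, buffer)
    consumer.join()
    return sink


received = run_with_queue([f"line {i}" for i in range(10)], capacity=2)
```

Neither side calls the other directly: the producer only knows the queue, and the consumer only knows the queue. Note that this in-process queue slows the producer down when it is full, whereas a persistent broker would absorb bursts on disk instead.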

Conclusion

The mature ELK+Beats architecture is the preferred solution for a centralized logging platform because of its strong scalability. In an actual deployment, whether to introduce a message queue depends on the number of service systems. In an early deployment, you can keep things simple and deploy without a message queue, then add one later when you need to scale.
