1. Overview

ELK has become one of the most popular centralized logging solutions. It is mainly composed of Beats, Logstash, Elasticsearch, Kibana, and other components, which together provide a one-stop solution for real-time log collection, storage, and display. This article introduces common ELK architectures and solutions to typical problems encountered when using them.

  1. Filebeat: A lightweight data collection engine that consumes very few server resources. It is a newer member of the ELK family and can replace Logstash as the log collection engine on the application server side.
  2. Logstash: Data collection engine. Compared with Filebeat, it integrates a large number of plug-ins and supports a rich set of data sources. The collected data can be filtered, analyzed, and formatted before being forwarded.
  3. Elasticsearch: Distributed data search engine based on Apache Lucene. It provides centralized storage and analysis as well as powerful data search and aggregation capabilities.
  4. Kibana: Data visualization platform. This web platform lets you view Elasticsearch data in real time and provides rich charting and statistics functions.

2. Common deployment architectures of ELK

2.1. Logstash as the log collector

In this architecture, a Logstash component is deployed on each application server as the log collector. Logstash filters, analyzes, and formats the collected data and sends it to Elasticsearch for storage; finally, Kibana is used for visualization. The downside of this architecture is that Logstash consumes a lot of server resources, so it increases the load on the application servers.
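
As an illustration only, a Logstash instance running directly on an application server in this architecture could be sketched roughly as follows (the log path and addresses are assumptions, not taken from a real deployment):

input {
  file {
    path => "/home/project/elk/logs/*.log"    # application log files on this server (assumed path)
    start_position => "beginning"             # also read existing file content on first start
  }
}

filter {
  # parsing / filtering / formatting would go here, e.g. grok and date
}

output {
  elasticsearch {
    hosts => "localhost:9200"                 # Elasticsearch address (assumed)
  }
}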

2.2. Filebeat as the log collector

The only difference between this architecture and the first is that the log collector on the application side is changed to Filebeat. Filebeat is lightweight and occupies few server resources, so it is well suited as the log collector on the application servers. Generally, Filebeat is used together with Logstash.

2.3. Deployment architecture with a cache queue

This architecture adds a Kafka message queue (or another message queue) on top of the second architecture: the data collected by Filebeat is sent to Kafka, and Logstash then reads the data from Kafka. This architecture mainly addresses log collection under large data volumes. The main purpose of the cache queue is to improve data safety and to balance the load on Logstash and Elasticsearch.
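
As a rough sketch of this architecture (the Kafka address and topic name are assumptions, and the exact option names can vary between Filebeat and Logstash plugin versions), Filebeat would publish the collected data to Kafka:

output.kafka:
   hosts: ["localhost:9092"]    # Kafka brokers (assumed address)
   topic: "app-logs"            # assumed topic name

and Logstash would consume from the same topic before writing to Elasticsearch:

input {
  kafka {
    bootstrap_servers => "localhost:9092"    # same Kafka brokers (assumed)
    topics => ["app-logs"]                   # same assumed topic
  }
}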

2.4. Summary of the above three architectures

The first deployment architecture is rarely used nowadays because of its resource consumption, while the second is currently the most widely used. As for the third, I see no need to introduce a message queue unless there are other requirements, because even with large data volumes Filebeat uses a backpressure-sensitive protocol when sending data to Logstash or Elasticsearch: if Logstash is busy processing data, it tells Filebeat to slow down its reads, and once the congestion is resolved, Filebeat resumes its initial speed and continues sending data.

3. Problems and solutions

Question: How do I merge multi-line log entries into a single event?

Logs in application systems are generally printed in a specific format, and the data belonging to one log entry may span multiple lines. Therefore, when using ELK to collect logs, the multiple lines belonging to the same log entry must be merged.

Solution: Use the multiline merge plugin in Filebeat or Logstash.

When using the multiline plugin, note that different ELK deployment architectures configure multiline differently. With the first deployment architecture in this article, multiline needs to be configured in Logstash; with the second, multiline needs to be configured in Filebeat instead of Logstash.

1. Multiline configuration in Filebeat:

filebeat.prospectors:
    -
       paths:
          - /home/project/elk/logs/test.log
       input_type: log 
       multiline:
            pattern: '^\['
            negate: true
            match: after
output:
   logstash:
      hosts: ["localhost:5044"]Copy the code
  • pattern: a regular expression
  • negate: defaults to false, meaning lines that match pattern are merged into the previous line; when true, lines that do not match pattern are merged into the previous line
  • match: after means merge to the end of the previous line; before means merge to the beginning of the previous line

For example:

pattern: '\['
negate: true
match: after

This configuration merges lines that do not match the pattern into the end of the previous line.
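
For instance, given raw log lines like the following (an illustrative stack trace, not taken from the original logs), the second and third lines do not match the pattern and would therefore be appended to the first line, producing a single log event:

[ERROR][20170811 10:07:31,359][SomeService:88]Service call failed
java.lang.NullPointerException: null
    at com.example.SomeService.call(SomeService.java:88)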

2. Multiline configuration in Logstash:

input {
  beats {
    port => 5044
  }
}

filter {
  multiline {
    pattern => "%{LOGLEVEL}\s*\]"
    negate => true
    what => "previous"
  }
}

output {
  elasticsearch {
    hosts => "localhost:9200"
  }
}

(1) The what attribute value previous in Logstash is equivalent to after in Filebeat, and next is equivalent to before in Filebeat.
(2) pattern => "%{LOGLEVEL}\s*\]" uses Logstash's predefined grok patterns; the full list of built-in patterns is available at https://github.com/logstash-plugins/logstash-patterns-core/tree/master/patterns

Question: How do I replace the time field displayed in Kibana with the time from the log message?

By default, the time field we view in Kibana differs from the time in the log message, because the default time field records the moment the log was collected rather than the time written in the log. Therefore, this field needs to be replaced with the time taken from the log message.

Solution: Use the grok parsing plugin together with the date formatting plugin.

Configure grok parsing and date formatting in the filter section of the Logstash configuration file, for example:

input {
  beats {
    port => 5044
  }
}

filter {
  multiline {
    pattern => "%{LOGLEVEL}\s*\]\[%{YEAR}%{MONTHNUM}%{MONTHDAY}\s+%{TIME}\]"
    negate => true
    what => "previous"
  }
  grok {
    match => [ "message" , "(?<customer_time>%{YEAR}%{MONTHNUM}%{MONTHDAY}\s+%{TIME})" ]
  }
  date {
    match => [ "customer_time", "yyyyMMdd HH:mm:ss,SSS" ]   # format matching times like "20170811 10:07:31,359"
  }
}

output {
  elasticsearch {
    hosts => "localhost:9200"
  }
}

For example, the log format to be matched is "[DEBUG][20170811 10:07:31,359][DefaultBeanDefinitionDocumentReader:106]Loading bean definitions", and there are two ways to parse the time field out of the log:

① By referencing an external grok pattern file. For example, the pattern file is customer_patterns and its content is: CUSTOMER_TIME %{YEAR}%{MONTHNUM}%{MONTHDAY}\s+%{TIME}. It can then be referenced in Logstash like this:

filter {
  grok {
    patterns_dir => ["./customer-patterms/mypatterns"]        # path to the custom pattern file
    match => [ "message", "%{CUSTOMER_TIME:customer_time}" ]  # use the custom grok pattern
  }
}

② By configuring the pattern inline, using the syntax (?<custom field name>regular expression), as in:

filter {
  grok {
    match => [ "message" , "(?<customer_time>%{YEAR}%{MONTHNUM}%{MONTHDAY}\s+%{TIME})" ]
  }
}
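
Taking the sample log line above, a rough sketch of the relevant fields in the resulting event (rubydebug-style output; other metadata fields are omitted, and the exact @timestamp value depends on the timezone Logstash applies when parsing customer_time):

{
        "message" => "[DEBUG][20170811 10:07:31,359][DefaultBeanDefinitionDocumentReader:106]Loading bean definitions",
  "customer_time" => "20170811 10:07:31,359",
     "@timestamp" => 2017-08-11T10:07:31.359Z
}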

Question: How do I view the data of different system log modules in Kibana?

Generally, the log data displayed in Kibana is a mixture of data from different system modules. How can we select or filter so that only the logs of a specified system module are viewed?

Solution: Add a field that identifies the system module, or create separate ES indexes for different system modules.

1. Add a field that identifies the system module; Kibana can then filter and query the data of different modules based on this field. Using the second deployment architecture as an example, the configuration in Filebeat is:

filebeat.prospectors:
    -
       paths:
          - /home/project/elk/logs/account.log
       input_type: log 
       multiline:
            pattern: '^\['
            negate: true
            match: after
       fields:               # add the new log_from field
          log_from: account
    -
       paths:
          - /home/project/elk/logs/customer.log
       input_type: log 
       multiline:
            pattern: '^\['
            negate: true
            match: after
       fields:
          log_from: customer
output:
   logstash:
      hosts: ["localhost:5044"]

The log_from field is added to identify the logs of different system modules.
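
As a usage sketch: with Filebeat's default settings, custom fields declared under fields are nested below a fields prefix in the resulting events (unless fields_under_root: true is set), so a filter typed into the Kibana search bar would look roughly like:

fields.log_from: "account"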

2. Configure different ES indexes for different system modules, and then create the corresponding index patterns in Kibana; you can then select the data of a specific system module from the index pattern drop-down box on the page. Using the second deployment architecture, this is done in two steps: ① In the Filebeat configuration:

filebeat.prospectors:
    -
       paths:
          - /home/project/elk/logs/account.log
       input_type: log 
       multiline:
            pattern: '^\['
            negate: true
            match: after
       document_type: account

    -
       paths:
          - /home/project/elk/logs/customer.log
       input_type: log 
       multiline:
            pattern: '^\['
            negate: true
            match: after
       document_type: customer
output:
   logstash:
      hosts: ["localhost:5044"]Copy the code

document_type is used to identify the different system modules.

② Modify the output configuration in Logstash to:

output {
  elasticsearch {
    hosts => "localhost:9200"
    index => "%{type}"
  }
}

The index option is added to the output; %{type} means that the ES index is created based on the value of document_type.

4. Summary

This article mainly introduces three deployment architectures for ELK real-time log analysis and the problems that the different architectures solve. Of the three, the second deployment architecture is the most popular and most commonly used. It also covers some common problems and solutions when using ELK for log analysis. ELK can be used not only for centralized query and management of distributed log data, but also for monitoring project applications and server resources. See the official documentation for more information.