1. Introduction

In our previous article, “ELK Special: Day 2 — Logstash Configuration for Nginx Log Analysis,” we set up and debugged a basic ELK Stack around a web business log collection scenario, but what we built was only a prototype. A real operations scenario calls for several additions, which we explore below.

2. Scenario and solution

2.1 Adding a Field

In real operation and maintenance work, a business system usually runs in multiple environments, at minimum dev (development), test (online testing), and production (live). Although we can distinguish log sources through the agent.hostname field that Logstash generates automatically, other requirements and scenarios call for adding our own fields: distinguishing different log types on the same server, for example, or applying differentiated processing to individual entries within the same log file.

2.1.1 Adding a Field in Filebeat

2.1.1.1 Example of Configuring Filebeat

We can add fields in Filebeat's configuration file using the fields configuration option. Field values can be strings, arrays, or dictionaries. A configuration example:

filebeat.inputs:
- type: log
  enabled: true
  paths:
   - /var/log/nginx/hexo_access.log
  fields:
    env: test # Identify the test environment
    nginx_log_type: access
- type: log
  enabled: true
  paths:
   - /var/log/nginx/hexo_error.log
  fields:
    env: test
    nginx_log_type: error # Error log

Reference: www.elastic.co/guide/en/be…
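
Two details are worth knowing beyond the example above. Field values are not limited to strings, and Filebeat's fields_under_root option lifts custom fields out of the fields namespace to the top level of the event, so they arrive in Logstash as [env] rather than [fields][env]. A minimal sketch, assuming the same input as above (the datacenter key is purely illustrative):

filebeat.inputs:
- type: log
  enabled: true
  paths:
   - /var/log/nginx/hexo_access.log
  fields:
    env: test
    datacenter: [dc1, dc2]   # array values are accepted too (illustrative key)
  fields_under_root: true    # place custom fields at the top level of the event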

2.1.1.2 Logstash Processing of Custom Fields

With custom fields added in the Filebeat configuration, events arrive in Logstash carrying them under the fields namespace, e.g. fields.nginx_log_type: access. To filter and match on a custom field explicitly in the Logstash pipeline, configure the following:

filter {
  if [fields][nginx_log_type] == "access" {
    grok { ... }
  }
}

For the use of if conditionals in the Logstash pipeline configuration file, see the reference documentation: www.elastic.co/guide/en/lo…
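
Since the Filebeat configuration above ships both access and error logs, the same field can drive separate branches for each log type. A minimal sketch, with the error branch left as a placeholder:

filter {
  if [fields][nginx_log_type] == "access" {
    grok { ... }   # access-log pattern, as shown earlier
  } else if [fields][nginx_log_type] == "error" {
    grok { ... }   # an error-log pattern would go here
  }
}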

2.1.2 Adding a Field in the Logstash Pipeline

The Logstash pipeline configuration supports adding or modifying fields through the mutate plugin. A configuration example:

input { stdin { } }
filter {
  grok {
    match => { "message"= >"%{IPORHOST:remote_addr} - %{DATA:remote_user} \[%{HTTPDATE:time_local}\] \"%{WORD:request_method} %{DATA:uri} HTTP/%{NUMBER:http_version}\" %{NUMBER:response_code} %{NUMBER:body_sent_bytes} \"%{DATA:http_referrer}\" \"%{DATA:http_user_agent}\"" }
  }
  date {
    match => [ "time_local"."dd/MMM/yyyy:HH:mm:ss Z" ]
  }
  mutate {
    add_field => {
      "new_field"= >"new_static_value"
      "foo_%{request_method}"= >"something different from %{uri}"
    }
  }
}
output {
  stdout { codec => rubydebug }
}

Note that existing field values can be referenced as variables when adding or modifying fields. Running the configuration above against a sample access-log line produces:

{
     "request_method" => "GET",
      "response_code" => "304",
        "remote_user" => "-",
               "host" => "logstash",
            "message" => "202.105.107.186 - - [18/Aug/2021:11:47:13 +0800] \"GET /images/alipay.png HTTP/1.1\" 304 0 \"http://rondochen.com/ELK2/\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36\"",
    "http_user_agent" => "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36",
         "@timestamp" => 2021-08-18T03:47:13.000Z,
          "new_field" => "new_static_value",
      "http_referrer" => "http://rondochen.com/ELK2/",
        "remote_addr" => "202.105.107.186",
         "time_local" => "18/Aug/2021:11:47:13 +0800",
    "body_sent_bytes" => "0",
       "http_version" => "1.1",
           "@version" => "1",
                "uri" => "/images/alipay.png",
            "foo_GET" => "something different from /images/alipay.png"
}
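
Besides add_field, the mutate filter offers further operations on existing fields. A brief sketch of some commonly used ones, reusing field names from the example above:

filter {
  mutate {
    rename       => { "remote_addr" => "client_ip" }          # rename a field
    replace      => { "new_field" => "value from %{uri}" }    # overwrite a field's value
    convert      => { "body_sent_bytes" => "integer" }        # change a field's data type
    remove_field => [ "http_version" ]                        # drop a field entirely
  }
}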

Reference: www.elastic.co/guide/en/lo…

2.2 Identifying the geographical location of an IP address

In common business analysis scenarios we often need to analyze access sources: finding the most visited pages on the site, the busiest point in time, or simply measuring page load speed. All of these can be answered by querying the log content we have already prepared. But if we also want to know where visitors come from, and which province or city contributes the most users, we need to resolve IP addresses to geographic locations.

The Logstash geoip plugin identifies the region an IP address belongs to through the GeoLite2 database and adds the required fields automatically. An example configuration:

input {}

filter {
    grok { ... }
    geoip {
      source => "remote_addr"
      target => "geoip"
    }
}

output {}

Note that the geoip plugin can only resolve an address correctly after log parsing has completed, so that the field named by its source option, here remote_addr, actually contains an IP address or hostname. The target option controls where the lookup results go; here they are grouped under a field named geoip.

When using geoip for the first time, you may need to wait a few minutes for the GeoLite2 database to finish initializing before the plugin works properly.

The following is an example of the result returned after location recognition:

{..."geoip"= > {"country_name"= >"China"."continent_code"= >"AS"."location"= > {"lat"= >22.5333."lon"= >114.1333
        },
         "country_code2"= >"CN"."city_name"= >"Shenzhen"."ip"= >"14.154.29.133"."region_name"= >"Guangdong"."longitude"= >114.1333."country_code3"= >"CN"."timezone"= >"Asia/Shanghai"."region_code"= >"GD"."latitude"= >22.5333}}Copy the code

Reference: www.elastic.co/guide/en/lo…

2.3 Centralized Pattern Management

In our last post, we used the grok plugin with the pattern written directly in the pipeline configuration file to parse log content:

input { stdin { } }
filter {
  ...
  grok {
    match => { "message" => "%{IPORHOST:remote_addr} - %{DATA:remote_user} \[%{HTTPDATE:time_local}\] \"%{WORD:request_method} %{DATA:uri} HTTP/%{NUMBER:http_version}\" %{NUMBER:response_code} %{NUMBER:body_sent_bytes} \"%{DATA:http_referrer}\" \"%{DATA:http_user_agent}\"" }
  }
  ...
}
output { stdout { codec => rubydebug } }

The problem with this approach is maintainability. When several pipeline configuration files handle the same log format, every change to the format has to be repeated in each of them, and the long inline patterns make the configuration files cluttered.

We can maintain the patterns centrally as follows:

  1. Create a file /etc/logstash/pattern.d/mypattern

  2. Place the pattern we need in the file mypattern as follows:

    NGINXCOMBINEDLOG %{IPORHOST:remote_addr} - %{DATA:remote_user} \[%{HTTPDATE:time_local}\] \"%{WORD:request_method} %{DATA:uri} HTTP/%{NUMBER:http_version}\" %{NUMBER:response_code} %{NUMBER:body_sent_bytes} \"%{DATA:http_referrer}\" \"%{DATA:http_user_agent}\"
  3. Modify the pipeline configuration file to the following form:

    input {  }
    filter {
      grok {
        patterns_dir => [ "/etc/logstash/pattern.d" ]
        match => { "message" => "%{NGINXCOMBINEDLOG}" }
      }
      ...
    }
    output { }

This way, when the pattern needs to be modified, we only have to edit one file, and the change takes effect in every pipeline that references it.
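
For example, with Logstash's multiple-pipelines feature, each pipeline declared in pipelines.yml can point at its own configuration file while sharing the same pattern directory (the file paths below are assumptions; adjust them to your layout):

# /etc/logstash/pipelines.yml
- pipeline.id: nginx-access
  path.config: "/etc/logstash/conf.d/nginx_access.conf"   # uses patterns_dir => ["/etc/logstash/pattern.d"]
- pipeline.id: nginx-error
  path.config: "/etc/logstash/conf.d/nginx_error.conf"    # same pattern directory, nothing duplicated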

In addition, the Logstash developers provide patterns for many common log formats, which we can download and reference directly:

github.com/logstash-pl…

Reference documentation for patterns_dir: www.elastic.co/guide/en/lo…

2.4 Consolidating the Configuration Files

Combining the optimizations described above, we end up with the following configuration files:

  1. Filebeat configuration

    logging.level: info
    logging.to_files: true
    logging.files:
      path: /var/log/filebeat
      name: filebeat
      keepfiles: 7
    
    filebeat.inputs:
    - type: log
      enabled: true
      paths:
       - /var/log/nginx/hexo_access.log
      fields:
        env: test
        nginx_log_type: access
    - type: log
      enabled: true
      paths:
       - /var/log/nginx/hexo_error.log
      fields:
        env: test
        nginx_log_type: error
    
    setup.template.settings:
      index.number_of_shards: 1
    
    output.logstash:
      hosts: ["192.168.0.211:5400"]
  2. Logstash pipeline configuration

    input {
            beats {
                    host => "0.0.0.0"
                    port => 5400
            }
    }
    
    filter {
      if [fields][nginx_log_type] == "access" {
        grok {
          patterns_dir => ["/etc/logstash/pattern.d"]
          match => { "message"= >"%{NGINXCOMBINEDLOG}" }
        }
        date {
          match => [ "time_local"."dd/MMM/yyyy:HH:mm:ss Z" ]
        }
        geoip {
          source => "remote_addr"
        }
      }
    }
    
    output {
            elasticsearch { 
                    hosts => ["192.168.0.212:9200"] 
                    index => "rc_index_pattern-%{+YYYY.MM.dd}"}}Copy the code

After modifying the configuration files, restart the services for the changes to take effect.
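
Assuming both services run as systemd units, which is the usual case for package installations:

systemctl restart filebeat
systemctl restart logstash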

Tips:

To make Logstash reload its pipeline configuration dynamically without a full restart, send the process a SIGHUP:

kill -SIGHUP ${logstash_pid}
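
Alternatively, Logstash can watch its configuration and reload it automatically, which avoids the manual signal. In logstash.yml (the 3-second interval is just an example):

# /etc/logstash/logstash.yml
config.reload.automatic: true
config.reload.interval: 3s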

3. Summary

In the practice above, we made several optimizations to the Logstash and Filebeat configurations to bring them closer to an actual production scenario. After these optimizations we can distinguish production traffic from test traffic in Kibana using the fields.env field, and we can then move on to business analysis of our site.
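
For instance, a KQL query in Kibana's search bar selects one environment directly:

fields.env : "test"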

For reference, the complete JSON of one indexed document in Elasticsearch looks like this:

{
  "_index": "rc_index_pattern-2021.08.18",
  "_type": "_doc",
  "_id": "oPxeV3sB-S8uwXpk0k-d",
  "_version": 1,
  "_score": null,
  "fields": {
    "agent.version.keyword": [ "7.13.4" ],
    "http_referrer.keyword": [ "http://rondochen.com/" ],
    "geoip.timezone": [ "Asia/Shanghai" ],
    "remote_addr.keyword": [ "202.105.107.186" ],
    "host.name.keyword": [ "hexo" ],
    "geoip.region_name.keyword": [ "Guangdong" ],
    "geoip.country_code2.keyword": [ "CN" ],
    "geoip.country_name.keyword": [ "China" ],
    "agent.hostname.keyword": [ "hexo" ],
    "request_method.keyword": [ "GET" ],
    "remote_user": [ "-" ],
    "ecs.version.keyword": [ "1.8.0" ],
    "geoip.region_code.keyword": [ "GD" ],
    "geoip.city_name.keyword": [ "Shenzhen" ],
    "agent.name": [ "hexo" ],
    "host.name": [ "hexo" ],
    "geoip.longitude": [ 114.1333 ],
    "fields.env.keyword": [ "test" ],
    "geoip.location.lat": [ 22.5333 ],
    "agent.id.keyword": [ "d2f43da1-5024-4000-9251-0bcc8fc10697" ],
    "http_version": [ "1.1" ],
    "time_local": [ "18/Aug/2021:11:47:13 +0800" ],
    "@version.keyword": [ "1" ],
    "geoip.region_name": [ "Guangdong" ],
    "input.type": [ "log" ],
    "log.offset": [ 2593 ],
    "agent.hostname": [ "hexo" ],
    "tags": [ "beats_input_codec_plain_applied" ],
    "agent.id": [ "d2f43da1-5024-4000-9251-0bcc8fc10697" ],
    "geoip.continent_code.keyword": [ "AS" ],
    "ecs.version": [ "1.8.0" ],
    "message.keyword": [ "202.105.107.186 - - [18/Aug/2021:11:47:13 +0800] \"GET /ELK2/ HTTP/1.1\" 200 15101 \"http://rondochen.com/\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36\"" ],
    "body_sent_bytes": [ "15101" ],
    "geoip.latitude": [ 22.5333 ],
    "http_referrer": [ "http://rondochen.com/" ],
    "agent.version": [ "7.13.4" ],
    "geoip.continent_code": [ "AS" ],
    "response_code": [ "200" ],
    "input.type.keyword": [ "log" ],
    "geoip.region_code": [ "GD" ],
    "tags.keyword": [ "beats_input_codec_plain_applied" ],
    "remote_user.keyword": [ "-" ],
    "geoip.country_code3.keyword": [ "CN" ],
    "request_method": [ "GET" ],
    "fields.nginx_log_type": [ "access" ],
    "geoip.ip": [ "202.105.107.186" ],
    "fields.nginx_log_type.keyword": [ "access" ],
    "http_user_agent": [ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36" ],
    "uri.keyword": [ "/ELK2/" ],
    "agent.type": [ "filebeat" ],
    "geoip.country_code3": [ "CN" ],
    "geoip.country_code2": [ "CN" ],
    "http_user_agent.keyword": [ "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36" ],
    "@version": [ "1" ],
    "geoip.country_name": [ "China" ],
    "log.file.path.keyword": [ "/var/log/nginx/hexo_access.log" ],
    "http_version.keyword": [ "1.1" ],
    "agent.type.keyword": [ "filebeat" ],
    "agent.ephemeral_id.keyword": [ "e3425b80-edff-41c6-a6db-41d3e4904130" ],
    "remote_addr": [ "202.105.107.186" ],
    "fields.env": [ "test" ],
    "agent.name.keyword": [ "hexo" ],
    "time_local.keyword": [ "18/Aug/2021:11:47:13 +0800" ],
    "geoip.city_name": [ "Shenzhen" ],
    "message": [ "202.105.107.186 - - [18/Aug/2021:11:47:13 +0800] \"GET /ELK2/ HTTP/1.1\" 200 15101 \"http://rondochen.com/\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36\"" ],
    "geoip.ip.keyword": [ "202.105.107.186" ],
    "uri": [ "/ELK2/" ],
    "@timestamp": [ "2021-08-18T03:47:13.000Z" ],
    "body_sent_bytes.keyword": [ "15101" ],
    "geoip.location.lon": [ 114.1333 ],
    "response_code.keyword": [ "200" ],
    "log.file.path": [ "/var/log/nginx/hexo_access.log" ],
    "agent.ephemeral_id": [ "e3425b80-edff-41c6-a6db-41d3e4904130" ],
    "geoip.timezone.keyword": [ "Asia/Shanghai" ]
  },
  "highlight": {
    "agent.hostname.keyword": [ "@kibana-highlighted-field@hexo@/kibana-highlighted-field@" ]
  },
  "sort": [ 1629258433000 ]
}

Reprinted from: ELK Special: Day 3 — Adding Custom Fields and Identifying the Geographic Location of IP Addresses with the geoip Plugin