1. Introduction
In our previous article, "ELK Special: Day 2 — Logstash Configuration for Nginx Log Analysis," we set up and debugged the essential ELK Stack components for a web business log collection scenario, but we only completed a prototype. A real operations scenario calls for a few necessary additions, which we explore below.
2. Scenario and solution
2.1 Adding a Field
In real operations and maintenance work, a business usually runs in multiple environments, at minimum dev (development), test (online testing), and production (live). Although we can distinguish the log source through the agent.hostname field that Logstash generates automatically, other requirements and scenarios call for adding our own fields to the logs, such as distinguishing different log types on the same server, or applying differentiated processing to individual entries within the same log file.
2.1.1 Adding a Field in Filebeat
2.1.1.1 Filebeat Configuration Example
We can add fields in Filebeat's configuration file using the fields configuration option. Field values can be scalars, arrays, or dictionaries. A configuration example follows:
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/nginx/hexo_access.log
  fields:
    env: test                  # identifies the test environment
    nginx_log_type: access
- type: log
  enabled: true
  paths:
    - /var/log/nginx/hexo_error.log
  fields:
    env: test
    nginx_log_type: error      # error log
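Since fields values can also be arrays or dictionaries, the same mechanism supports richer metadata. A minimal sketch (the owners and service entries here are made up for illustration, not part of our setup):

fields:
  env: test
  owners: ["ops", "web"]      # array value
  service:                    # dictionary value
    name: hexo
    tier: frontend

By default, Filebeat nests these custom fields under a top-level fields key in each event, which is why they surface downstream as fields.env and fields.nginx_log_type; setting fields_under_root: true would place them at the root of the event instead.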
Reference: www.elastic.co/guide/en/be…
2.1.1.2 Logstash Processing of Custom Fields
After custom fields are added in the Filebeat configuration, they enter Logstash in the form fields.nginx_log_type: access. To explicitly filter and match on these custom fields in the Logstash pipeline, configure the following:
filter {
  if [fields][nginx_log_type] == "access" {
    grok { ... }
  }
}
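For instance, a fuller sketch that routes both log types could look like the following (the error branch is a placeholder, since we do not define an error-log pattern in this series):

filter {
  if [fields][nginx_log_type] == "access" {
    grok { ... }
  } else if [fields][nginx_log_type] == "error" {
    # an error-log grok pattern would go here
    grok { ... }
  }
}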
For the use of if conditionals in the Logstash pipeline configuration file, see the reference documentation: www.elastic.co/guide/en/lo…
2.1.2 Adding a Field in Logstash
The Logstash pipeline configuration supports adding or modifying fields through the mutate filter plugin. A configuration example:
input { stdin { } }
filter {
  grok {
    match => { "message" => "%{IPORHOST:remote_addr} - %{DATA:remote_user} \[%{HTTPDATE:time_local}\] \"%{WORD:request_method} %{DATA:uri} HTTP/%{NUMBER:http_version}\" %{NUMBER:response_code} %{NUMBER:body_sent_bytes} \"%{DATA:http_referrer}\" \"%{DATA:http_user_agent}\"" }
  }
  date {
    match => [ "time_local", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
  mutate {
    add_field => {
      "new_field" => "new_static_value"
      "foo_%{request_method}" => "something different from %{uri}"
    }
  }
}
output {
  stdout { codec => rubydebug }
}
Note that we can also use %{field} references to build field names and values from existing event content. Feeding a sample access log line into the pipeline above produces output like this:
{
     "request_method" => "GET",
      "response_code" => "304",
        "remote_user" => "-",
               "host" => "logstash",
            "message" => "202.105.107.186 - - [18/Aug/2021:11:47:13 +0800] \"GET /images/alipay.png HTTP/1.1\" 304 0 \"http://rondochen.com/ELK2/\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36\"",
    "http_user_agent" => "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36",
         "@timestamp" => 2021-08-18T03:47:13.000Z,
          "new_field" => "new_static_value",
      "http_referrer" => "http://rondochen.com/ELK2/",
        "remote_addr" => "202.105.107.186",
         "time_local" => "18/Aug/2021:11:47:13 +0800",
    "body_sent_bytes" => "0",
       "http_version" => "1.1",
           "@version" => "1",
                "uri" => "/images/alipay.png",
            "foo_GET" => "something different from /images/alipay.png"
}
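Beyond add_field, the mutate filter also supports operations such as rename, replace, and remove_field for modifying existing fields. A minimal sketch (field names chosen purely for illustration):

filter {
  mutate {
    rename       => { "remote_addr" => "client_ip" }   # rename a field
    replace      => { "new_field"   => "overwritten" } # overwrite its value
    remove_field => [ "foo_GET" ]                      # drop a field entirely
  }
}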
Reference: www.elastic.co/guide/en/lo…
2.2 Identifying the geographical location of an IP address
In common business analysis scenarios, we often need to analyze the sources of traffic: for example, finding the most-visited pages on the site, the busiest points in time, or simply measuring page load speed, all of which can be retrieved from the log content we have already prepared. But if we also want to know where visits come from, that is, which province or city accounts for the most users, we need to resolve the IP addresses to geographic locations.
The Logstash geoip filter plugin resolves the region of an IP address through the GeoLite2 database and adds the relevant fields automatically. An example configuration:
input { }
filter {
  grok { ... }
  geoip {
    source => "remote_addr"
    target => "geoip"
  }
}
output { }
Note that the geoip plugin can only resolve an address correctly if the log content has already been parsed, so that the field named by its source option already holds the IP address or hostname to look up. The lookup result is organized into a single field whose name is set by the target option, geoip in this example.
When using GeoIP for the first time, you may need to wait a few minutes for the GeoLite2 database to complete initialization before it works properly.
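Before wiring geoip into the full pipeline, it can be verified with a quick throwaway test from stdin. A minimal sketch (the file name test-geoip.conf is arbitrary):

input { stdin { } }
filter {
  grok { match => { "message" => "%{IPORHOST:remote_addr}" } }
  geoip {
    source => "remote_addr"
    target => "geoip"
  }
}
output { stdout { codec => rubydebug } }

Run it and type (or pipe) an IP address:

echo "14.154.29.133" | bin/logstash -f test-geoip.conf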
The following is an example of the result returned after location recognition:
{..."geoip"= > {"country_name"= >"China"."continent_code"= >"AS"."location"= > {"lat"= >22.5333."lon"= >114.1333
},
"country_code2"= >"CN"."city_name"= >"Shenzhen"."ip"= >"14.154.29.133"."region_name"= >"Guangdong"."longitude"= >114.1333."country_code3"= >"CN"."timezone"= >"Asia/Shanghai"."region_code"= >"GD"."latitude"= >22.5333}}Copy the code
Reference: www.elastic.co/guide/en/lo…
2.3 Centralized Pattern Management
In our last post, we used the grok plugin with the pattern written directly in the pipeline configuration file to parse log content:
input { stdin { } }
filter {
  ...
  grok {
    match => { "message" => "%{IPORHOST:remote_addr} - %{DATA:remote_user} \[%{HTTPDATE:time_local}\] \"%{WORD:request_method} %{DATA:uri} HTTP/%{NUMBER:http_version}\" %{NUMBER:response_code} %{NUMBER:body_sent_bytes} \"%{DATA:http_referrer}\" \"%{DATA:http_user_agent}\"" }
  }
  ...
}
output {
  stdout { codec => rubydebug }
}
The problem with this approach is maintainability. When multiple pipeline configuration files handle the same log format, every change to the format forces us to edit each of them, and the long inline patterns clutter the configuration files.
We can maintain the patterns centrally as follows:
- Create a file /etc/logstash/pattern.d/mypattern
- Place the pattern we need in the file mypattern, as follows:

NGINXCOMBINEDLOG %{IPORHOST:remote_addr} - %{DATA:remote_user} \[%{HTTPDATE:time_local}\] \"%{WORD:request_method} %{DATA:uri} HTTP/%{NUMBER:http_version}\" %{NUMBER:response_code} %{NUMBER:body_sent_bytes} \"%{DATA:http_referrer}\" \"%{DATA:http_user_agent}\"

- Modify the pipeline configuration file to the following form:

input { }
filter {
  grok {
    patterns_dir => [ "/etc/logstash/pattern.d" ]
    match => { "message" => "%{NGINXCOMBINEDLOG}" }
  }
  ...
}
output { }
This way, when we need to modify a pattern, we only have to edit one file, and the change takes effect in every pipeline that references it.
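The pattern file format is simply one NAME-pattern pair per line, and custom patterns can compose Grok's built-in ones. A sketch of extending the same file (the NGINXERRORDATE entry is illustrative only and not used later in this article):

# /etc/logstash/pattern.d/mypattern: one "NAME pattern" pair per line
NGINXCOMBINEDLOG %{IPORHOST:remote_addr} - ... (the full pattern from above)
NGINXERRORDATE %{YEAR}/%{MONTHNUM}/%{MONTHDAY} %{TIME}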
In addition, the Logstash developers provide us with many common log patterns that we can download and reference directly:
Github.com/logstash-pl…
Patterns_dir reference documentation: www.elastic.co/guide/en/lo…
2.4 Consolidating Configuration Files
Combining the optimizations above, we end up with the following configuration files:
- Filebeat configuration:

logging.level: info
logging.to_files: true
logging.files:
  path: /var/log/filebeat
  name: filebeat
  keepfiles: 7
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/nginx/hexo_access.log
  fields:
    env: test
    nginx_log_type: access
- type: log
  enabled: true
  paths:
    - /var/log/nginx/hexo_error.log
  fields:
    env: test
    nginx_log_type: error
setup.template.settings:
  index.number_of_shards: 1
output.logstash:
  hosts: ["192.168.0.211:5400"]
- Logstash pipeline configuration:

input {
  beats {
    host => "0.0.0.0"
    port => 5400
  }
}
filter {
  if [fields][nginx_log_type] == "access" {
    grok {
      patterns_dir => ["/etc/logstash/pattern.d"]
      match => { "message" => "%{NGINXCOMBINEDLOG}" }
    }
    date {
      match => [ "time_local", "dd/MMM/yyyy:HH:mm:ss Z" ]
    }
    geoip {
      source => "remote_addr"
    }
  }
}
output {
  elasticsearch {
    hosts => ["192.168.0.212:9200"]
    index => "rc_index_pattern-%{+YYYY.MM.dd}"
  }
}
After modifying the configuration files, restart both services for the changes to take effect.
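For example, on a systemd-managed host (an assumption about the deployment; adjust to your init system):

systemctl restart filebeat
systemctl restart logstash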
Tips:
Logstash can also reload its pipeline configuration dynamically without a full restart:
kill -SIGHUP ${logstash_pid}
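Alternatively, Logstash can watch for configuration changes on its own when started with the --config.reload.automatic flag (the pipeline path below is hypothetical):

bin/logstash -f /etc/logstash/conf.d/nginx.conf --config.reload.automatic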
3. Summary
In the practice above, we made a few optimizations to the Logstash and Filebeat configurations to bring them closer to a real production scenario. After these changes, we can distinguish production from test traffic in Kibana using the fields.env field. From here, we can begin business analysis of our site.
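For example, in Kibana's Discover search bar, a KQL filter like the following (using the field values from our configuration) isolates test-environment access logs:

fields.env : "test" and fields.nginx_log_type : "access"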
For reference, here is the complete JSON of one indexed example document:
{" _index ":" rc_index_pattern - 2021.08.18 ", "_type" : "_doc", "_id" : "oPxeV3sB - S8uwXpk0k - d", "_version" : 1, "_score" : Null, "fields": {"agent.version.keyword": ["7.13.4"], "http_referrer. Keyword ": [ "http://rondochen.com/" ], "geoip.timezone": [ "Asia/Shanghai" ], "remote_addr.keyword": ["202.105.107.186"], "how.name.keyword ": ["hexo"], "geoip.region_name. Keyword ": [ "Guangdong" ], "geoip.country_code2.keyword": [ "CN" ], "geoip.country_name.keyword": [ "China" ], "agent.hostname.keyword": [ "hexo" ], "request_method.keyword": [ "GET" ], "remote_user": / "-", "ecs. Version. Keyword:" [] "1.8.0 comes with", "geoip. Region_code. Keyword:" [] "GD", "geoip. City_name. Keyword" : [ "Shenzhen" ], "agent.name": [ "hexo" ], "host.name": [ "hexo" ], "geoip.longitude": [114.1333], "fields. Env. Keyword ": ["test"], "geoip.location.lat": [22.5333], "agent.id.keyword": [" D2F43DA1-5024-4000-9251-0BCC8FC10697 "], "http_version": ["1.1"], "time_local": [ "18/Aug/2021:11:47:13 +0800" ], "@version.keyword": [ "1" ], "geoip.region_name": [ "Guangdong" ], "input.type": [ "log" ], "log.offset": [ 2593 ], "agent.hostname": [ "hexo" ], "tags": [ "beats_input_codec_plain_applied" ], "agent.id": [ "d2f43da1-5024-4000-9251-0bcc8fc10697" ], "geoip.continent_code.keyword": [ "AS" ], "ecs.version": "], "[" 1.8.0 comes with the message. The keyword" : ["202.105.107.186 - - [18/Aug/ 201:11:47:13 +0800] \"GET /ELK2/ HTTP/1.1\" 200 15101 \"http://rondochen.com/\" \"Mozilla/5.0 (Windows NT 10.0; Win64; X64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36\""], "body_sent_bytes": ["15101"], "geoip.latitude": [22.5333], "http_referrer": ["http://rondochen.com/"], "agent.version": [" 7.13.4 "], "geoip. Continent_code" : [" AS "], "response_code" : [" 200 "], "input. The type. The keyword" : [ "log" ], "geoip.region_code": [ "GD" ], "tags.keyword": [ "beats_input_codec_plain_applied" ], "remote_user.keyword": [ "-" ], "geoip.country_code3.keyword": [ "CN" ], "request_method": [ "GET" ], "fields.nginx_log_type": ["access"], "geoip. IP ": ["202.105.107.186"], "fields.nginx_log_type. Keyword ": ["access"], "http_user_agent": ["Mozilla/5.0 (Windows NT 10.0; Win64; X64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36"], "ury.keyword ": [ "/ELK2/" ], "agent.type": [ "filebeat" ], "geoip.country_code3": [ "CN" ], "geoip.country_code2": [ "CN" ], "http_user_agent.keyword": ["Mozilla/5.0 (Windows NT 10.0; Win64; X64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36"], "@version": [ "1" ], "geoip.country_name": [ "China" ], "log.file.path.keyword": ["/var/log/nginx/hexo_access.log"], "http_version.keyword": ["1.1"], "agent.type.keyword": [ "filebeat" ], "agent.ephemeral_id.keyword": [ "e3425b80-edff-41c6-a6db-41d3e4904130" ], "remote_addr": [" 202.105.107.186 "], "fields. The env" : / "test", "agent. Name. Keyword:" [] "hexo", "time_local. Keyword" : [ "18/Aug/2021:11:47:13 +0800" ], "geoip.city_name": [ "Shenzhen" ], "message": ["202.105.107.186 - - [18/Aug/ 201:11:47:13 +0800] \"GET /ELK2/ HTTP/1.1\" 200 15101 \"http://rondochen.com/\" \"Mozilla/5.0 (Windows NT 10.0; Win64; X64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36\"], "geoip.ip.keyword": [" 202.105.107.186 "], "uri" : / / ELK2 "/", "@ timestamp:" [" of "the 2021-08-18 T03: there. 000 z]," body_sent_bytes. 
Keyword ": ["15101"], "geoip.location.lon": [114.1333], "response_code.keyword": ["200"], "log.file.path": [ "/var/log/nginx/hexo_access.log" ], "agent.ephemeral_id": [ "e3425b80-edff-41c6-a6db-41d3e4904130" ], "geoip.timezone.keyword": [ "Asia/Shanghai" ] }, "highlight": { "agent.hostname.keyword": [ "@kibana-highlighted-field@hexo@/kibana-highlighted-field@" ] }, "sort": [1629258433000]}Copy the code
Reprinted from: ELK Special: Day 3 — Adding custom fields and identifying the geographic location of IP addresses with the geoip plugin