1. Introduction

In “ELK Special Topic: Day1 — Minimize ELK cluster construction and Collect Nginx logs” we completed the ELK cluster setup, successfully collected Nginx logs, and displayed them in Kibana. However, we found that on the Kibana page each collected Nginx log line was stored as a single text field, with none of its content parsed into separate fields.

In this article we investigate and solve that problem.

The cluster architecture diagram is as follows:

2. Requirements analysis

2.1 Nginx Log format

Module ngx_http_log_module: the default Nginx log format is combined:

log_format combined '$remote_addr - $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent '
                    '"$http_referer" "$http_user_agent"';

In this experimental environment, one line of the Nginx access_log looks like this:

192.168.0.123 - - [11/Aug/2021:09:31:49 +0800] "GET /2021/07/31/ELK1/ HTTP/1.1" 200 17188 "http://192.168.0.125/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36"

Matching this line against the Nginx log format configuration, we can break it down as follows:

remote_addr        192.168.0.123
remote_user        - (empty)
time_local         11/Aug/2021:09:31:49 +0800
request            GET /2021/07/31/ELK1/ HTTP/1.1
status             200
body_bytes_sent    17188
http_referer       http://192.168.0.125/
http_user_agent    Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36

2.2 Index contents in ES

As we can see on the Kibana page, ES stores the whole Nginx log line as a single text field:

On the Kibana Discover page, with only the message field selected, the display is as follows:

The complete document indexed in ES looks like this:

{" _index ":" rc_index_pattern - 2021.08.11 ", "_type" : "_doc", "_id" : "iPzWMnsB - S8uwXpkUkQ3", "_version" : 1, "_score" : Null, "fields": {"agent.version.keyword": ["7.13.4"], "input.type. Keyword ": ["log"], "host.name. Keyword ": [ "hexo" ], "tags.keyword": [ "beats_input_codec_plain_applied" ], "agent.hostname.keyword": [ "hexo" ], "agent.type": [" filebeat "], "ecs. Version. Keyword:" [] "1.8.0 comes with", "@ version:"/" 1 ", "agent. The name:" [] "hexo", "the host. The name" : [ "hexo" ], "log.file.path.keyword": [ "/var/log/nginx/hexo_access.log" ], "agent.type.keyword": [ "filebeat" ], "agent.ephemeral_id.keyword": [ "43714af4-5bf6-43c0-9c1a-4c223fddc273" ], "agent.name.keyword": [ "hexo" ], "agent.id.keyword": [ "d2f43da1-5024-4000-9251-0bcc8fc10697" ], "input.type": [ "log" ], "@version.keyword": [ "1" ], "log.offset": [ 4678 ], "agent.hostname": [ "hexo" ], "message": [" 192.168.0.123 - [11 / Aug / 2021:09:31:49 + 0800] \ "GET / 2021/07/31 ELK1 / HTTP / 1.1 \" 200\17188 "http://192.168.0.125/\" \"Mozilla/5.0 (Windows NT 10.0; Win64; X64) AppleWebKit/537.36 (KHTML like Gecko) Chrome/92.0.4515.131 Safari/537.36\"], "tags": ["beats_input_codec_plain_applied"], "@timestamp": [" 2021-08-11t01:31:50.630z "], "agent. Id ": [" D2F43DA1-5024-4000-9251-0BCC8FC10697 "], "ECs. version": ["1.8.0"], "log.file.path": [ "/var/log/nginx/hexo_access.log" ], "message.keyword": [" 192.168.0.123 - [11 / Aug / 2021:09:31:49 + 0800] \ "GET / 2021/07/31 ELK1 / HTTP / 1.1 \" 200\17188 "http://192.168.0.125/\" \"Mozilla/5.0 (Windows NT 10.0; Win64; X64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36\"], "agent.ephemeral_id": [" 43714AF4-5BF6-43C0-9C1A-4c223fDDC273 "], "agent. Version ": ["7.13.4"]}, "sort": [1628645510630]}Copy the code

2.3 Problem Analysis

It is easy to see that when Nginx logs are stored in ES, the log content is not split into fields or parsed, so we cannot search or aggregate on individual parts of the logs in Kibana.

To solve this problem, we can modify the Logstash configuration so that the log content is split into fields and parsed before it is stored in ES.

3. Logstash configuration roadmap

Referring to How Logstash Works, we can see that Logstash implements log parsing and field handling through a rich set of plugins.

3.1 How to modify the collected log content

In the Day1 cluster setup, our Logstash pipeline did not use the filter stage. For the requirement in this article, however, a filter is needed to process the collected log content before it is written to ES.

3.2 How to split and parse the log content

Grok is currently one of the most widely used Logstash filter plugins; it makes it easy to parse unstructured text into structured fields that can then be searched.

Note: see the documentation on Debugging grok expressions.
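As a quick illustration of the %{SYNTAX:SEMANTIC} form that grok uses (a generic sketch, not part of our Nginx pipeline; the sample line and field names are made up for illustration), the following filter would turn a line such as 55.3.244.1 GET /index.html 15824 0.043 into the fields client, method, request, bytes and duration:

filter {
  grok {
    # each %{SYNTAX:SEMANTIC} pair applies a built-in pattern (IP, WORD, ...) and names the captured field
    match => { "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" }
  }
}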

3.3 How to test the configuration

We can use a codec plugin in the Logstash output. With a stdout output configured, the final events Logstash produces are printed to the screen. In addition, we can use a formatting codec such as rubydebug to make the output easier to read.

3.4 Logstash pipeline configuration draft

Based on the above analysis, we can draft a Logstash pipeline test configuration roughly as follows:

input { stdin { } }
# the filter section (grok, etc.) will be added after debugging the pattern
output { stdout { codec => rubydebug } }

4. Modify the Logstash configuration

4.1 Using Grok to Identify Log Content

Kibana's main menu includes the Dev Tools page, which provides a Grok Debugger that makes it easy to test grok patterns against log content.

In Sample Data, we enter the Nginx log line to be matched:

192.168.0.123 - - [11/Aug/2021:09:31:49 +0800] "GET /2021/07/31/ELK1/ HTTP/1.1" 200 17188 "http://192.168.0.125/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36"

Based on the log content and grok syntax, we enter the following in Grok Pattern:

%{IPORHOST:remote_addr} - %{DATA:remote_user} \[%{HTTPDATE:time_local}\] \"%{WORD:request_method} %{DATA:uri} HTTP/%{NUMBER:http_version}\" %{NUMBER:response_code} %{NUMBER:body_sent_bytes} \"%{DATA:http_referrer}\" \"%{DATA:http_user_agent}\"

Finally, click Simulate to see the parsing result:

{"remote_addr": "192.168.0.123", "response_code": "200", "time_local": "11/Aug/ 209:09:31:49 +0800", "http_version": "1.1", "request_method" : "GET", "uri" : "/ 2021/07/31 ELK1 /", "http_user_agent" : "Mozilla / 5.0 (Windows NT 10.0; Win64; X64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36", "remote_user": "-", "body_sent_bytes": "17188", "http_referrer" : "http://192.168.0.125/"}Copy the code

As you can see, grok has correctly parsed the fields in the Nginx log content, and the JSON keys will become the fields we see in Kibana.

4.2 Testing the Logstash pipeline configuration

Combining the debugging approach introduced in the previous section, we write a test configuration file logstash-test.conf to verify that Logstash parses the logs correctly. We feed log lines to Logstash from the keyboard (stdin) and check the processed result on standard output (stdout).

input { stdin { } }
filter {
  grok {
    match => { "message"= >"%{IPORHOST:remote_addr} - %{DATA:remote_user} \[%{HTTPDATE:time_local}\] \"%{WORD:request_method} %{DATA:uri} HTTP/%{NUMBER:http_version}\" %{NUMBER:response_code} %{NUMBER:body_sent_bytes} \"%{DATA:http_referrer}\" \"%{DATA:http_user_agent}\""}} # with the date plugin, use the nginx log as the logstash event date {match => ["time_local"."dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}
output {
  stdout { codec => rubydebug }
}

4.3 Checking the Syntax of the Configuration File

We start the Logstash process directly from the command line and add the --config.test_and_exit parameter at startup to check the configuration file for syntax errors.

Full command: /usr/share/logstash/bin/logstash --config.test_and_exit -f /root/logstash-test.conf

You can see that at the end of the command execution, the screen displays Config Validation Result: OK, indicating that the configuration file has no syntax errors.

4.4 Verifying Logstash's parsed log output

Run the following command to start the logstash process:

/usr/share/logstash/bin/logstash -f /root/logstash-test.conf

When Logstash finishes initializing, "The stdin plugin is now waiting for input:" is displayed. Enter the log line at the prompt, and the processed result is printed:

"@version" => "1", "body_sent_bytes" => "17188", "Message" => "192.168.0.123 - - [11/Aug/ 2020:09:31:49 +0800] \"GET /2021/07/31/ELK1/ HTTP/1.1\" 200 17188 \ \ "http://192.168.0.125/\" "Mozilla / 5.0 (Windows NT 10.0; Win64; X64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36\", "request_method" => "GET", "Uri" => "/2021/07/31/ELK1/", "http_user_agent" => "Mozilla/5.0 (Windows NT 10.0; Win64; X64) AppleWebKit/537.36 (KHTML like Gecko) Chrome/92.0.4515.131 Safari/537.36", "host" => "logstash", "Remote_user" = > "-", "time_local" = > "11 / Aug / 2021:09:31:49 + 0800", "http_referrer" = > "http://192.168.0.125/", "Response_code" = > "200", "http_version" = > "1.1", "remote_addr" = > "192.168.0.123." "@ timestamp" = > 2021-08-11 T01:31:49. 000 zCopy the code

4.5 Outputting the parsed log content from Logstash to ES

After the above test, we can confirm that Logstash successfully splits and parses the Nginx log content. Finally, we adjust the configuration file so that the processed log content is output to ES.

Edit /etc/logstash/conf.d/nginx-es.conf as follows:

input {
        beats {
                host => "0.0.0.0"
                port => 5400
        }
}

filter {
  grok {
    match => { "message"= >"%{IPORHOST:remote_addr} - %{DATA:remote_user} \[%{HTTPDATE:time_local}\] \"%{WORD:request_method} %{DATA:uri} HTTP/%{NUMBER:http_version}\" %{NUMBER:response_code} %{NUMBER:body_sent_bytes} \"%{DATA:http_referrer}\" \"%{DATA:http_user_agent}\"" }
  }
  date {
    match => [ "time_local"."dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}

output {
        elasticsearch { 
                hosts => ["192.168.0.212:9200"] 
                index => "rc_index_pattern-%{+YYYY.MM.dd}"}}Copy the code

Finally, start or restart Logstash: sudo systemctl start logstash.service or sudo systemctl restart logstash.service.

5. Test results

Use a browser to visit the test page so that it generates access logs. Then open the Kibana page; the newly collected logs are now correctly parsed (figure below):

For comparison, we select a log entry collected before the change, which was not parsed (figure below):

As we can see, before the log content was parsed, Logstash could not recognize the timestamp inside the log line, so the index could only use the time the document arrived in ES as the event timestamp, which differs from the actual time in the log by about one second. That gap mainly reflects the time spent by Filebeat shipping the log and by Logstash processing it. Once the log content is parsed, the timestamp displayed in Kibana matches the time recorded in the log itself.

6. Summary

Using Nginx logs as the example scenario, this article briefly introduced how to split and parse log content by modifying the Logstash configuration file, so that the collected logs can be searched and aggregated.

Along the way, it also introduced the plugins, tools, and general approach we use to analyze log content.

With this Logstash configuration adjustment, we can easily filter log content on the Kibana page. In real business scenarios, this helps us better understand what is happening in the background.

For example, if we want to see today's access records for the path /2021/07/31/ELK1/, we can search in Kibana like this:
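A minimal KQL query for this, assuming the uri field produced by our grok pattern (the "today" part is set with Kibana's time picker rather than in the query itself):

uri : "/2021/07/31/ELK1/"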

In the future, we will continue to explore Kibana's visualization and charting features on top of log search, so that statistics can be presented intuitively.

7. Further reading

Codec plugins

Filter plugins

Grok syntax

Kibana Query Language (KQL)

Reprinted from: Rondo’s notebook