1. Introduction
In the article "ELK Special Topic: Day1 — Minimize ELK cluster construction and Collect Nginx logs" we finished building the ELK cluster, successfully collected Nginx logs, and displayed them on the Kibana page. However, we found that in Kibana the entire Nginx log line was stored as a single text field, without the log content being parsed or recognized.
In this article we will analyze and solve this problem.
The cluster architecture diagram is as follows:
2. Requirements analysis
2.1 Nginx Log format
According to the ngx_http_log_module documentation, the default Nginx log format is combined:
log_format combined '$remote_addr - $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent '
                    '"$http_referer" "$http_user_agent"';
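In Nginx, this format is referenced from the access_log directive. A minimal sketch for this environment (the actual server block is not shown in the original article; the log path is taken from the Filebeat-collected field log.file.path):

# inside the server/http block of the nginx configuration (assumed, for illustration)
access_log /var/log/nginx/hexo_access.log combined;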
In this experimental environment, one line of the Nginx access_log looks like this:
192.168.0.123 - - [11/Aug/2021:09:31:49 +0800] "GET /2021/07/31/ELK1/ HTTP/1.1" 200 17188 "http://192.168.0.125/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36"
Combined with the Nginx log format configuration, we can get the following:
remote_addr     | 192.168.0.123
remote_user     | -
time_local      | 11/Aug/2021:09:31:49 +0800
request         | GET /2021/07/31/ELK1/ HTTP/1.1
status          | 200
body_bytes_sent | 17188
http_referer    | http://192.168.0.125/
http_user_agent | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36
2.2 Index contents in ES
As we can see on the Kibana page, ES stores the whole Nginx log line as a single text field:
On the Kibana Discover page, with only the message field selected, the display is as follows:
The complete document indexed in ES looks like this:
{" _index ":" rc_index_pattern - 2021.08.11 ", "_type" : "_doc", "_id" : "iPzWMnsB - S8uwXpkUkQ3", "_version" : 1, "_score" : Null, "fields": {"agent.version.keyword": ["7.13.4"], "input.type. Keyword ": ["log"], "host.name. Keyword ": [ "hexo" ], "tags.keyword": [ "beats_input_codec_plain_applied" ], "agent.hostname.keyword": [ "hexo" ], "agent.type": [" filebeat "], "ecs. Version. Keyword:" [] "1.8.0 comes with", "@ version:"/" 1 ", "agent. The name:" [] "hexo", "the host. The name" : [ "hexo" ], "log.file.path.keyword": [ "/var/log/nginx/hexo_access.log" ], "agent.type.keyword": [ "filebeat" ], "agent.ephemeral_id.keyword": [ "43714af4-5bf6-43c0-9c1a-4c223fddc273" ], "agent.name.keyword": [ "hexo" ], "agent.id.keyword": [ "d2f43da1-5024-4000-9251-0bcc8fc10697" ], "input.type": [ "log" ], "@version.keyword": [ "1" ], "log.offset": [ 4678 ], "agent.hostname": [ "hexo" ], "message": [" 192.168.0.123 - [11 / Aug / 2021:09:31:49 + 0800] \ "GET / 2021/07/31 ELK1 / HTTP / 1.1 \" 200\17188 "http://192.168.0.125/\" \"Mozilla/5.0 (Windows NT 10.0; Win64; X64) AppleWebKit/537.36 (KHTML like Gecko) Chrome/92.0.4515.131 Safari/537.36\"], "tags": ["beats_input_codec_plain_applied"], "@timestamp": [" 2021-08-11t01:31:50.630z "], "agent. Id ": [" D2F43DA1-5024-4000-9251-0BCC8FC10697 "], "ECs. version": ["1.8.0"], "log.file.path": [ "/var/log/nginx/hexo_access.log" ], "message.keyword": [" 192.168.0.123 - [11 / Aug / 2021:09:31:49 + 0800] \ "GET / 2021/07/31 ELK1 / HTTP / 1.1 \" 200\17188 "http://192.168.0.125/\" \"Mozilla/5.0 (Windows NT 10.0; Win64; X64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36\"], "agent.ephemeral_id": [" 43714AF4-5BF6-43C0-9C1A-4c223fDDC273 "], "agent. Version ": ["7.13.4"]}, "sort": [1628645510630]}Copy the code
2.3 Problem Analysis
It is easy to see that when the Nginx logs are stored in ES, ES does not split them into fields or recognize their content, so we cannot search or aggregate the log content in Kibana.
To solve this, we can modify the Logstash configuration so that the log content is split into fields and parsed before it is written to ES.
3. Logstash configuration roadmap
Referring to How Logstash Works, we can see that Logstash provides a rich set of plugins for parsing log content and organizing it into fields.
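As described in How Logstash Works, every pipeline has three stages. A minimal sketch of the skeleton we will fill in below (the plugin names in the comments are simply the ones used later in this article):

input {
  # where events come from: beats, stdin, ...
}
filter {
  # how events are parsed and enriched: grok, date, ...
}
output {
  # where events are sent: elasticsearch, stdout, ...
}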
3.1 How to Modify the Collected Log Content
In the Day1 cluster setup, our Logstash pipeline did not use the filter stage at all. For the requirements of this article, however, a filter is needed to process the collected log content before it is written to storage.
3.2 How to Split and Identify Log Content
Grok is currently one of the most widely used Logstash filter plugins. It parses unstructured text into structured, searchable fields.
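As a minimal illustration (IPORHOST and NUMBER are standard built-in grok patterns), a pattern such as the following, applied to a line like 192.168.0.123 200, produces the two fields remote_addr and status; each %{PATTERN:field} pair matches one piece of text and stores it under the given field name:

%{IPORHOST:remote_addr} %{NUMBER:status}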
Note: see section 4.1 for how to debug grok expressions.
3.3 How to Test the Configuration
We can use a codec in the Logstash output stage. By sending events to the stdout output plugin, we can see the final result produced by Logstash printed on the screen, and by choosing a codec such as rubydebug we can make that output easier to read.
3.4 Logstash Pipeline Configuration Draft
Based on the above analysis, a first draft of the Logstash pipeline test configuration looks roughly like this:
input { stdin { } }
# filter { ... }  # the parsing filter will be added here
output { stdout { codec => rubydebug } }
4. Modify the Logstash configuration
4.1 Using Grok to Identify Log Content
The main menu of the Kibana page includes Dev Tools, which provides a Grok Debugger that makes it easy to debug grok patterns against log content.
In Sample Data, we enter the Nginx log line that needs to be parsed and matched:
192.168.0.123 - - [11/Aug/2021:09:31:49 +0800] "GET /2021/07/31/ELK1/ HTTP/1.1" 200 17188 "http://192.168.0.125/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36"
Based on the log content and grok syntax, we enter the following in the Grok Pattern field:
%{IPORHOST:remote_addr} - %{DATA:remote_user} \[%{HTTPDATE:time_local}\] \"%{WORD:request_method} %{DATA:uri} HTTP/%{NUMBER:http_version}\" %{NUMBER:response_code} %{NUMBER:body_sent_bytes} \"%{DATA:http_referrer}\" \"%{DATA:http_user_agent}\"
Finally click Simulate to display the identification results:
{"remote_addr": "192.168.0.123", "response_code": "200", "time_local": "11/Aug/ 209:09:31:49 +0800", "http_version": "1.1", "request_method" : "GET", "uri" : "/ 2021/07/31 ELK1 /", "http_user_agent" : "Mozilla / 5.0 (Windows NT 10.0; Win64; X64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36", "remote_user": "-", "body_sent_bytes": "17188", "http_referrer" : "http://192.168.0.125/"}Copy the code
As you can see, grok has correctly identified each field in the Nginx log line, and the keys in this JSON will become the fields we see in Kibana.
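As an aside, grok also ships with a ready-made pattern for this log format. Assuming a Logstash version where it is available, the whole combined log line can alternatively be matched with a single built-in pattern, at the cost of less control over the resulting field names:

%{COMBINEDAPACHELOG}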
4.2 Testing the Logstash Pipeline Configuration
Following the debugging approach introduced in the previous section, we write a test configuration file, logstash-test.conf, to verify that Logstash processes the logs correctly: we feed log content to Logstash from the keyboard (stdin) and check the processing result on standard output.
input { stdin { } }

filter {
  grok {
    match => { "message" => "%{IPORHOST:remote_addr} - %{DATA:remote_user} \[%{HTTPDATE:time_local}\] \"%{WORD:request_method} %{DATA:uri} HTTP/%{NUMBER:http_version}\" %{NUMBER:response_code} %{NUMBER:body_sent_bytes} \"%{DATA:http_referrer}\" \"%{DATA:http_user_agent}\"" }
  }
  # With the date plugin, use the timestamp from the nginx log as the Logstash event timestamp
  date {
    match => [ "time_local", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}

output {
  stdout { codec => rubydebug }
}
4.3 Checking the Syntax of the Configuration File
We start the Logstash process directly from the command line and add the --config.test_and_exit parameter at startup to check the configuration file for syntax errors.
Full command: /usr/share/logstash/bin/logstash --config.test_and_exit -f /root/logstash-test.conf
You can see that at the end of the command execution, the screen displays Config Validation Result: OK, indicating that the configuration file has no syntax errors.
4.4 Verifying the Parsed Log Output from Logstash
Run the following command to start the logstash process:
/usr/share/logstash/bin/logstash -f /root/logstash-test.conf
Once Logstash has finished initializing, it prints "The stdin plugin is now waiting for input:". Paste the log line from the keyboard, and the following output is produced:
"@version" => "1", "body_sent_bytes" => "17188", "Message" => "192.168.0.123 - - [11/Aug/ 2020:09:31:49 +0800] \"GET /2021/07/31/ELK1/ HTTP/1.1\" 200 17188 \ \ "http://192.168.0.125/\" "Mozilla / 5.0 (Windows NT 10.0; Win64; X64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36\", "request_method" => "GET", "Uri" => "/2021/07/31/ELK1/", "http_user_agent" => "Mozilla/5.0 (Windows NT 10.0; Win64; X64) AppleWebKit/537.36 (KHTML like Gecko) Chrome/92.0.4515.131 Safari/537.36", "host" => "logstash", "Remote_user" = > "-", "time_local" = > "11 / Aug / 2021:09:31:49 + 0800", "http_referrer" = > "http://192.168.0.125/", "Response_code" = > "200", "http_version" = > "1.1", "remote_addr" = > "192.168.0.123." "@ timestamp" = > 2021-08-11 T01:31:49. 000 zCopy the code
4.5 Outputting the Parsed Log Content from Logstash to ES
After the above tests, we can confirm that Logstash successfully splits and parses the Nginx log content. Finally, we adjust the configuration file so that the processed log content is written to ES.
Edit /etc/logstash/conf.d/nginx-es.conf as follows:
input {
  beats {
    host => "0.0.0.0"
    port => 5400
  }
}

filter {
  grok {
    match => { "message" => "%{IPORHOST:remote_addr} - %{DATA:remote_user} \[%{HTTPDATE:time_local}\] \"%{WORD:request_method} %{DATA:uri} HTTP/%{NUMBER:http_version}\" %{NUMBER:response_code} %{NUMBER:body_sent_bytes} \"%{DATA:http_referrer}\" \"%{DATA:http_user_agent}\"" }
  }
  date {
    match => [ "time_local", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}

output {
  elasticsearch {
    hosts => ["192.168.0.212:9200"]
    index => "rc_index_pattern-%{+YYYY.MM.dd}"
  }
}
Finally, start or restart Logstash: sudo systemctl start logstash.service (or sudo systemctl restart logstash.service).
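Before restarting, it can be worth re-running the syntax check from section 4.3 against the new pipeline file. A suggested sequence, assuming the same paths as above:

/usr/share/logstash/bin/logstash --config.test_and_exit -f /etc/logstash/conf.d/nginx-es.conf
sudo systemctl restart logstash.service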
5. Test results
Use a browser to visit the test page so that new access logs are generated. Then open the Kibana page: the newly collected logs have been parsed correctly (figure below):
For comparison, we select a log entry collected earlier, before parsing was enabled (figure below):
As we can see, before the log content was parsed, Logstash could not recognize the timestamp inside the log line, so the index could only use the time at which the event was written to ES as the timestamp, which differs from the actual time in the log by about one second. This difference mainly comes from the time spent by Filebeat shipping the log and by Logstash processing it. Once the log content is parsed, the timestamp displayed in Kibana matches the time in the log line.
6. Summary
Using Nginx logs as the example scenario, this article briefly introduced how to split and parse log content by modifying the Logstash configuration file, so that the collected logs can be searched and aggregated.
Along the way, it also briefly covered the plugins, tools, and debugging approach we use to analyze log content.
With this Logstash configuration adjustment, we can easily filter log content on the Kibana page. In real business scenarios, this helps us better understand what is going on in the backend.
For example, if I want to see today's access records for the path /2021/07/31/ELK1/kibana.png, I can search for it in Kibana like this:
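A query along these lines works in the Kibana search bar (KQL), with the time picker set to Today; the uri field name comes from our grok pattern, and the exact path is just an example asset from this site:

uri : "/2021/07/31/ELK1/kibana.png"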
In the future, we will build on this log retrieval work to explore Kibana's visualization features, so that statistics can be presented intuitively.
7. Related links
Codec plugins
Filter plugins
Grok syntax
Kibana query language (KQL)
Reprinted from: Rondo’s notebook