Continuing from the previous article: Initial ELK construction for distributed link tracking.
1. Introduction
In the last article we set up the ELK environment: logs dropped into a specified folder made it into ES, but several problems remained. In this article we will work through them one by one.
Configuration snippets referenced in the middle of the article are marked with @x tags; you can search for the same tags in the complete configuration files at the bottom.
2. Problems
2.1 Multi-line log problem
Cause
As we all know, a Java exception stack trace spans multiple lines. With the previous configuration, Kibana rendered each line as a separate event, which made the logs completely unusable.
Solution
Add the following configuration under the log input node of filebeat.yml. Any line that does not match the regular expression (i.e. does not start with a date) is appended after the preceding matching line, so a whole stack trace becomes a single message. @1
multiline:
  pattern: '^\s*(\d{4}|\d{2})\-(\d{2}|[a-zA-Z]{3})\-(\d{2}|\d{4})'
  negate: true
  match: after
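For illustration, a record like the one below (hypothetical content, using the || separators the grok patterns in the next section expect) is now merged into one event: only the first line starts with a date, so the exception and at lines are appended to it.

2022-02-08 12:00:00.123||ERROR||com.example.DemoService||request failed
java.lang.NullPointerException: name is null
    at com.example.DemoService.handle(DemoService.java:42)
    at java.base/java.lang.Thread.run(Thread.java:833)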
2.2 Data format problems
Cause
Logs arrive as raw text; they need to be cleaned and parsed into structured fields to ensure data quality.
Solution
- Create a patterns folder under the Logstash root directory.
- Create a pattern file inside it.
- Edit the pattern file and add the following grok expressions for data formatting.
DATETIME \d{4}-\d{1,2}-\d{1,2}\s+\d{1,2}:\d{1,2}:\d{1,2}(\.\d{1,3})
STACK_TRACE ((.+Exception:.*)|(.+Error:.*)|(\s+at\s.*))?(.*\s*)*
LINE \|{2}
VLINE \s*\|{2}\s*
VLIN1 \|
NOTVLINE [^\s\|]*
NOTSQUAR [^\]]*
JAVA_LINE_LOG_SIMPLATE01 %{DATETIME:timestamp}%{LINE}%{DATA:level}%{LINE}%{DATA:logger}%{LINE}%{GREEDYDATA:more}
JAVA_LINE_LOG_MULTILINE01 %{DATETIME:timestamp}%{LINE}%{DATA:level}%{LINE}%{DATA:logger}%{LINE}%{DATA:more}[\n]+%{STACK_TRACE:stacktrace}
- Add configuration @2 to Logstash's sync.conf (note: this file is created manually; see the previous article). A sketch follows this list.
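A minimal sketch of the @2 grok block (the complete version is in section 4.2). The sample message in the comment is hypothetical, but follows the timestamp||level||logger||message layout the patterns above expect:

filter {
  grok {
    # load the custom patterns defined above
    patterns_dir => ["../config/patterns"]
    # a line such as
    #   2022-02-08 12:00:00.123||INFO||com.example.DemoService||user login ok
    # is parsed into the timestamp, level, logger and more fields
    match => [ "message", "%{JAVA_LINE_LOG_SIMPLATE01}" ]
  }
}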
2.3 Data filtering
Cause
When a system produces a lot of logs, some are needed and some are not. To keep the data complete, nothing is filtered out; events that cannot be parsed are simply tagged with parse_error, the same idea as a logical delete.
Solution
Add configuration @3 to the Logstash configuration file, as sketched below.
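A sketch of the @3 block (complete version in section 4.2). When grok fails to parse an event, Logstash adds the standard _grokparsefailure tag, which we convert into a parse_error field instead of dropping the event:

if "_grokparsefailure" in [tags] {
  mutate {
    # mark the event rather than discarding it (logical delete)
    add_field => { "parse_error" => "true" }
  }
}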
2.4 Time zone problem
Cause
Most developers have run into time zone trouble, whether with MySQL or Docker: timestamps that are off by 8 hours. In this environment there are two times: the time the log entry was generated and the time it was written into ES.
Solution
Add configuration @4 to the Logstash configuration file, as sketched below.
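A sketch of the @4 date filter (complete version in section 4.2). It parses the timestamp field extracted by grok into @timestamp, interpreting it as Asia/Shanghai so events are no longer shifted by 8 hours. The format string assumes timestamps shaped like the DATETIME pattern from section 2.2:

date {
  # convert the grok-extracted "timestamp" field into @timestamp
  target   => "@timestamp"
  timezone => "Asia/Shanghai"
  match    => [ "timestamp", "yyyy-MM-dd HH:mm:ss.SSS" ]
}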
2.5 Redundant fields
Cause
After data cleaning, some fields are no longer needed and should be removed.
Solution
Add configuration @5 to the Logstash configuration file, as sketched below.
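The @5 block from the complete file in section 4.2. Note that mutate applies add_field before remove_field, so the file path is copied into logfile before the log field is dropped:

mutate {
  # drop fields that are no longer needed after parsing
  remove_field => [
    "@version", "agent", "host", "ecs", "input", "log", "more", "tags"
  ]
  # keep the source log file path under a simpler name
  add_field => { "logfile" => "%{[log][file][path]}" }
}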
3. Results
3.1 Effect
After the above configuration, documents in ES carry the parsed fields (timestamp, level, logger, message and so on) instead of raw lines.
3.2 Search

The parsed fields can now be queried individually in Kibana.
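Two illustrative KQL queries (the field names come from the grok patterns above; the values are hypothetical):

level : "ERROR"
parse_error : "true"

The first returns all error-level events; the second returns the events that grok could not parse.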
4. Complete configuration files
4.1 filebeat
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /Users/zyq/project/study-project/logs/*.log
  fields_under_root: true
  fields:
    app_type: Java

  ### Multiline options @1
  multiline:
    pattern: '^\s*(\d{4}|\d{2})\-(\d{2}|[a-zA-Z]{3})\-(\d{2}|\d{4})'
    negate: true
    match: after

filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false

setup.template.settings:
  index.number_of_shards: 1
  index.number_of_replicas: 0

setup.kibana:

output.logstash:
  # The Logstash hosts
  hosts: ["localhost:5044"]
  # pretty: true
  # enable: true

processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~
  - add_docker_metadata: ~
  - add_kubernetes_metadata: ~
4.2 logstash
input {
  beats {
    port => 5044
  }
}

filter {
  # @2
  if [agent][type] == "filebeat" {                # is the source Filebeat?
    if [app_type] == "Java" {                     # is it a Java app?
      if "multiline" in [log][flags] {            # multi-line or single-line event?
        grok {
          patterns_dir => ["../config/patterns"]
          match => [ "message", "%{JAVA_LINE_LOG_MULTILINE01}" ]  # multi-line pattern
        }
        mutate {
          add_field => { "multiline" => "true" }
        }
      } else {
        grok {
          patterns_dir => ["../config/patterns"]
          match => [ "message", "%{JAVA_LINE_LOG_SIMPLATE01}" ]   # single-line pattern
        }
      }

      date {                                      # time zone configuration @4
        target   => "@timestamp"
        timezone => "Asia/Shanghai"
        match    => [ "timestamp", "yyyy-MM-dd HH:mm:ss.SSS" ]
      }

      if "_grokparsefailure" in [tags] {          # @3 tag parse failures instead of dropping them
        mutate {
          add_field => { "parse_error" => "true" }
        }
      } else {
        mutate {
          replace => { "message" => "%{[more]}" }
        }
      }

      mutate {                                    # @5 remove redundant fields
        remove_field => [
          "@version", "agent", "host", "ecs", "input", "log", "more", "tags"
        ]
        add_field => { "logfile" => "%{[log][file][path]}" }
      }
    }
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
    #user => "elastic"
    #password => "changeme"
  }
}
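To apply the changes, restart both processes. Assuming the default directory layout (the paths are illustrative):

# from the Filebeat directory
./filebeat -e -c filebeat.yml

# from the Logstash directory
bin/logstash -f config/sync.conf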