Curly Brace MC (WeChat official account: Huakuohao-MC) focuses on Java fundamentals and big data, with an emphasis on experience sharing and personal growth.

Some things are already half done once we are brave enough to take the first step, and learning ELK is clearly one of them. Many people who want to learn ELK don't know where to begin because so many components are involved; the result is that they never start and stay stuck in place.

Most programmers start learning a new language with "Hello World". ELK has its own "Hello World" too; it just needs a few more components and a bit more configuration.

In this article I will walk you through building a real-time log search platform. Logs generated by a (simulated) business system are continuously collected into Elasticsearch and finally displayed through Kibana.

If some of the details in this article confuse you, don't be hard on yourself; keep going and it will make sense eventually. After all, you didn't understand every line of the Hello World program when you first started learning Java either.

ELK is an acronym for the three components Elasticsearch, Logstash, and Kibana. That was the original name; the newer term is Elastic Stack, which adds the Beats family, used mainly for data collection, alongside Elasticsearch, Kibana, and Logstash.

A quick note: this article is based on CentOS 7.5 and the latest Elastic Stack release at the time of writing, 7.6. It covers only single-node setup and configuration; cluster configuration and tuning are beyond its scope. The article assumes the business system produces logs in the following format:

07801302020021914554950568859|127.0.0.1|2020-02-19 14:55:49 [INFO] [Thread-4] [com.hello.frank.test.TestUser] - user msg jack

Elasticsearch

Introduction

Elasticsearch is a distributed engine for data storage, search, and analysis. It has many use cases, but it is most often used to store log data for day-to-day operations and business analysis.

Installation

# Download
curl -L -O https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.6.1-linux-x86_64.tar.gz
# Unpack
tar -xzvf elasticsearch-7.6.1-linux-x86_64.tar.gz
cd elasticsearch-7.6.1
# Start
./bin/elasticsearch

The configuration file for Elasticsearch is config/elasticsearch.yml. By default Elasticsearch only accepts connections from the local machine, so uncomment the network.host setting in that file and change it to 0.0.0.0 to allow remote access.
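For reference, a minimal sketch of what that change in config/elasticsearch.yml might look like. Binding to 0.0.0.0 is fine for a throwaway demo box but should not be done on an exposed machine, and the single-node discovery setting is my own assumption to keep the 7.x bootstrap checks happy on one machine:

network.host: 0.0.0.0
# once network.host is set, a 7.x node runs production bootstrap checks,
# so a lone demo node also needs to declare how it forms a cluster:
discovery.type: single-node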

You can use curl http://hostIp:9200 to test whether Elasticsearch is up. The response looks like this:

{
  "name" : "localhost",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "1ew0o-aXSpq8Tfv0zCWE3Q",
  "version" : {
    "number" : "7.6.0",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "7f634e9f44834fbc12724506cc1da681b0c3b1e3",
    "build_date" : "2020-02-06T00:09:00.449973Z",
    "build_snapshot" : false,
    "lucene_version" : "8.4.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

Note: if startup fails with the error "max virtual memory areas vm.max_map_count [65530] is too low", you need to raise the limit: sudo sysctl -w vm.max_map_count=262144.
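If you also want the setting to survive a reboot, a sketch along these lines should work, assuming you have root privileges:

# raise the limit for the running system
sudo sysctl -w vm.max_map_count=262144
# persist it across reboots
echo "vm.max_map_count=262144" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p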

Filebeat

Introduction

The Elastic Stack provides a number of Beats for collecting different kinds of data, such as Filebeat for log files, Metricbeat for system metrics, and Packetbeat for network packets. Here I use the installation and use of Filebeat as an example to show how to collect application logs.

Installation

# Download
curl -L -O https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-7.6.1-linux-x86_64.tar.gz
# Unpack
tar -xzvf filebeat-7.6.1-linux-x86_64.tar.gz

Configuration

To collect logs with Filebeat you only need a few simple changes to its configuration file, filebeat.yml; all configuration is done in this one file.

The input configuration:

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/*.log

This simple configuration collects all .log files directly under /var/log. You can also configure /var/log/*/*.log to collect all .log files in subdirectories of /var/log; note that this pattern does not match .log files in /var/log itself.
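If you want to cover both the directory itself and its subdirectories, you can simply list both patterns. A minimal sketch, with example paths of my own:

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/*.log      # .log files directly under /var/log
    - /var/log/*/*.log    # .log files one level down, e.g. /var/log/myapp/app.log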

Filebeat supports many output types. The most common are Kafka and Logstash, and you can also write directly to Elasticsearch.

The output configuration for Elasticsearch:

output.elasticsearch:
  hosts: ["ES-host:9200"]

The output configuration for Logstash:

output.logstash:
  hosts: ["logstash-host:5044"]

Logstash can filter and clean logs. If log volume is high, a single Logstash node may not be enough; Filebeat supports sending output to multiple Logstash instances at the same time.

The load-balancing configuration across multiple Logstash instances looks like this:

output.logstash:
  hosts: ["localhost:5044", "localhost:5045"]
  loadbalance: true

Compared with the single-node configuration, you only add a loadbalance attribute and append the new node to the hosts array.

Note that the loadbalance attribute only applies to the Redis, Logstash, and Elasticsearch outputs. Kafka handles load balancing itself, so Filebeat does not need to worry about it.
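For comparison, a minimal sketch of a Kafka output; the broker addresses and the topic name are placeholders of mine, not values from the setup above:

output.kafka:
  # Kafka brokers; the Kafka client balances across partitions on its own
  hosts: ["kafka-host1:9092", "kafka-host2:9092"]
  topic: "app-log"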

The default number of workers per host is 1. If you want to increase it, just add the worker attribute.

filebeat.inputs:
- type: log
  paths:
    - /var/log/*.log
output.logstash:
  hosts: ["localhost:5044", "localhost:5045"]
  loadbalance: true
  worker: 2

The above configuration results in 4 workers in total (2 hosts × 2 workers per host).

Startup

Start Filebeat with ./filebeat -e -c filebeat.yml -d "publish". If Elasticsearch has started successfully and your Filebeat output points to Elasticsearch, your logs will begin streaming into Elasticsearch as soon as Filebeat is running.

Filebeat uses the data directory under its installation directory to track the state of the log files it has already read. If you want to collect the same log file repeatedly during testing, you need to clear the data directory each time and then restart Filebeat.
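A sketch of how that reset might look during testing, run from the Filebeat installation directory; this wipes Filebeat's record of what it has already read, so only do it on a test machine:

# stop Filebeat first, then remove its state
rm -rf data/
# start again; the configured files are read from the beginning
./filebeat -e -c filebeat.yml -d "publish"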

Logstash

Introduction

Logstash is a powerful data-processing tool that processes the data sent from data sources and forwards the results to Elasticsearch. In the ELK stack it acts as the link between collection and storage.

Installation

# Download
curl -L -O https://artifacts.elastic.co/downloads/logstash/logstash-7.6.1.tar.gz
# Unpack
tar -xzvf logstash-7.6.1.tar.gz

Configuration

Logstash provides input plugins for reading data from all kinds of data sources. Here we configure it to receive the data sent by Filebeat, do some simple processing, and send the result to Elasticsearch for storage.

In the Logstash config directory, create a configuration file named blog-pipeline.conf.

input {
    beats {
        port => "5044"
    }
}

output {
    #stdout { codec => rubydebug }
    elasticsearch {
        hosts => [ "localhost" ]
        index => "blog-demo"
    }
}

The configuration is simple and consists of two parts: input and output. The port in the input section is the Filebeat port mentioned above, and the output section sends the data to Elasticsearch. The commented-out line is there for debugging: if you uncomment it, the events are printed to the console, which makes debugging easier and avoids polluting the production data in Elasticsearch.

The index concept in Elasticsearch can be loosely understood as the equivalent of a table in a relational database. If the index option is not configured, the output defaults to a logstash-prefixed index name.
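If you prefer one index per day, which makes it easier to age out old logs later, the index option can include a date pattern. A sketch based on the blog-demo name used above:

output {
    elasticsearch {
        hosts => [ "localhost" ]
        # %{+YYYY.MM.dd} expands to the event date, e.g. blog-demo-2020.02.19
        index => "blog-demo-%{+YYYY.MM.dd}"
    }
}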

Startup

Start Logstash with ./bin/logstash -f ./config/blog-pipeline.conf. Once it is running you can see the collected logs in the Logstash console (remember to enable stdout { codec => rubydebug } when debugging).

Grok filter

If you only store log lines in Elasticsearch as-is, there is no need for Logstash at all, because Filebeat can do that job well on its own. If you want to process the log lines, the Grok filter plugin is a good choice.

Grok is essentially a library of ready-made, named regular expressions built on top of ordinary regexes, which you can also extend with your own patterns. If you need to debug a Grok pattern, an online Grok debugger is very handy.

Let's write a pattern that matches the log sample shown at the beginning of this article. This filter is only meant to demonstrate how Grok is used, so it does not discard any part of the log line; in practice you would extract just the fields you care about and store those.

The Grok plugin is easy to use: just add the following filter block to the Logstash configuration file.

filter {
    grok {
        match => { "message" => "%{GREEDYDATA:traceid}\|%{IPV4:serverip}\|%{GREEDYDATA:logdate}\[%{LOGLEVEL:loglevel}\] \[%{GREEDYDATA:thread}\] \[%{GREEDYDATA:classname}\] -%{GREEDYDATA:logmessage}" }
        overwrite => [ "message" ]
        remove_field => [ "host", "ecs",
                          "agent", "version",
                          "log", "input", "tags",
                          "@version", "message" ]
    }
}

The complete Logstash configuration is as follows:

input {
    beats {
        port => "5044"
    }
}

filter {
    grok {
        match => { "message" => "%{GREEDYDATA:traceid}\|%{IPV4:serverip}\|%{GREEDYDATA:logdate}\[%{LOGLEVEL:loglevel}\] \[%{GREEDYDATA:thread}\] \[%{GREEDYDATA:classname}\] -%{GREEDYDATA:logmessage}" }
        overwrite => [ "message" ]
        remove_field => [ "host", "ecs",
                          "agent", "version",
                          "log", "input", "tags",
                          "@version", "message" ]
    }
}

output {
    #stdout { codec => rubydebug }
    elasticsearch {
        hosts => [ "es-host" ]
        index => "blog-demo"
    }
}
Startup

Start Logstash again. If you are using stdout { codec => rubydebug }, the parsed events are printed to the console.

Kibana

Introduction

Kibana is an open source platform for data analysis and visualization, often used together with Elasticsearch; it provides a search and analysis interface for the data stored in Elasticsearch.

Installation

# Download
curl -L -O https://artifacts.elastic.co/downloads/kibana/kibana-7.6.1-linux-x86_64.tar.gz
# Unpack
tar -xzvf kibana-7.6.1-linux-x86_64.tar.gz
# Start
cd kibana-7.6.1-linux-x86_64
./bin/kibana

Configuration

The configuration file for Kibana is config/kibana.yml and the default port is 5601. Before starting Kibana you need to tell it which Elasticsearch to connect to: in the configuration file, set elasticsearch.hosts to ["http://ES-host:9200"].
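A minimal sketch of the relevant lines in config/kibana.yml; the host names are placeholders:

# port Kibana listens on; 5601 is the default
server.port: 5601
# bind to all interfaces so the page is reachable from other machines
server.host: "0.0.0.0"
# the Elasticsearch node(s) Kibana talks to
elasticsearch.hosts: ["http://ES-host:9200"]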

You can now access Kibana from your browser. If you want to query the logs stored in Elasticsearch, you need to do the following on the Kibana page.

In Kibana's left sidebar, open Management, find the index pattern management section under Kibana, and add an index pattern for blog-demo (the index name configured in Logstash). Once it is added you can search the logs. I'll show a screenshot just to give you a sense of it, with arrows pointing out the areas of interest.
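Once the index pattern is in place, the fields extracted by the Grok filter can be used directly in the Discover search bar. For example, assuming Kibana's default KQL query language and the field names from the filter above, something like the following would narrow the view to INFO-level lines from one class:

loglevel : "INFO" and classname : "com.hello.frank.test.TestUser"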

Wrapping up

This is just a primer on what each component of the Elastic Stack does and how they fit together; if you have the energy to dig deeper into any of them, there is a lot more to learn.


Recommended reading

How to learn Java NIO

This may be the nature of the conflict between product and development

Essential Reading List for Java Programmers – Basic Edition

, END,

Curly Brace MC

Java · Big Data · Personal Growth

WeChat ID: Huakuohao-MC