Putting each microservice component into practice involves supporting tools. This article introduces some of the tools used in day-to-day microservice development to help us build more robust microservices and to troubleshoot problems and performance bottlenecks in them.
We will focus on ELK (short for Elasticsearch, Logstash, and Kibana), or rather ELKB (ELK + Filebeat). Filebeat is a lightweight transport tool for forwarding and centralizing log data.
Why do you need a distributed logging system
In earlier projects, if we wanted to use logs to locate bugs or performance problems in the production environment, operations (O&M) staff had to run commands against the log files of each service instance, which made troubleshooting inefficient.
In a microservice architecture, many service instances are deployed on different physical machines, so the logs of each microservice are scattered across those machines. Once the cluster is large enough, the traditional way of viewing logs described above becomes quite impractical. Logs in a distributed system therefore need to be centrally managed, and open source components such as syslog can be used to collect the logs from all servers.
However, once the log files are centralized, we still face the problems of searching them and producing statistics: which services raised alarms or threw exceptions needs to be accounted for in detail. In the past, when an online fault occurred, it was common for developers and operations staff to download the service logs and search and aggregate them with Linux commands such as grep, awk and wc. This approach is inefficient and labor-intensive, and for more demanding needs such as querying, sorting and aggregating across a huge number of machines it is clearly inadequate.
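For example, a typical ad hoc triage session looks something like the following sketch (the log path, field position, and keyword are made up for illustration):

# count ERROR lines in the local log files of one service instance
$ grep -c "ERROR" /var/log/order-service/app.log

# aggregate the most frequent error messages (field 5 is assumed to hold the message)
$ grep "ERROR" /var/log/order-service/app.log | awk '{print $5}' | sort | uniq -c | sort -rn | head

# total error count, which still has to be repeated on every host
$ grep "ERROR" /var/log/order-service/app.log | wc -l

Every one of these commands has to be run on every machine that hosts an instance, which is exactly the pain point a centralized log system removes.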
The ELKB distributed log system
ELKB is a complete distributed log collection system that solves the problems of collecting, searching and analyzing logs described above. ELKB stands for Elasticsearch, Logstash, Kibana, and Filebeat. Elasticsearch acts as the data and model layer, and Kibana as the view layer. Logstash and Elasticsearch are implemented in Java, while Kibana is built on Node.js.
The following describes the functions of these components and their roles in the log collection system.
Installation and use of Elasticsearch
Elasticsearch is a real-time full-text search and analytics engine that collects, analyzes, and stores data. It is a scalable distributed system that exposes its efficient search capabilities through open REST and Java APIs, and it is built on top of the Apache Lucene search library.
Elasticsearch can be used to search all kinds of documents. It offers scalable, near real-time search with multi-tenancy, can scale out to hundreds of nodes, and can handle petabytes of structured or unstructured data.
Elasticsearch is distributed, which means an index can be split into shards, each with zero or more replicas. Each node hosts one or more shards and acts as a coordinator, delegating operations to the correct shards; rebalancing and routing are handled automatically. Related data is typically stored in the same index, which consists of one or more primary shards and zero or more replica shards. Once an index has been created, the number of primary shards cannot be changed.
Elasticsearch is used for full-text search, structured search, analytics, or any combination of the three. It is document-oriented, meaning it stores entire objects or documents. Elasticsearch not only stores documents but also indexes the contents of each document so that they can be retrieved. In Elasticsearch you index, search, sort, and filter documents rather than rows and columns of data.
For convenience, we will install Elasticsearch directly using Docker:
$ docker run -d --name elasticsearch docker.elastic.co/elasticsearch/elasticsearch:5.4.0
xpack.security.enabled is turned on by default, and we disable it to switch off login authentication for Elasticsearch. Log in to the container and edit the configuration as follows:
$ docker exec -it elasticsearch bash
$ vim config/elasticsearch.yml

cluster.name: "docker-cluster"
network.host: 0.0.0.0
http.cors.enabled: true
http.cors.allow-origin: "*"
xpack.security.enabled: false
# minimum_master_nodes need to be explicitly set when bound on a public IP
# set to 1 to allow single node clusters
# Details: https://github.com/elastic/elasticsearch/pull/17288
discovery.zen.minimum_master_nodes: 1
After modifying the configuration file, exit and restart the container. To preserve the configuration for later use, we create a new image from this container: first obtain the container's ID, then commit a new image based on that container.
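The container ID can be looked up with docker ps; a small sketch (the output line is illustrative, the ID being the one used in the commit command below):

$ docker ps --filter name=elasticsearch --format "{{.ID}}\t{{.Image}}\t{{.Names}}"
a404c6c174a2    docker.elastic.co/elasticsearch/elasticsearch:5.4.0    elasticsearch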
$ docker commit -a "add config" -m "dev" a404c6c174a2 es:latest
sha256:5cb8c995ca819765323e76cccea8f55b423a6fa2eecd9c1048b2787818c1a994
This gives us a new image, es:latest. We run the new image:
docker run -d --name es -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" es:latest
We check that the installation was successful by accessing the built-in endpoint provided by Elasticsearch.
[root@VM_1_14_centos ~]# curl 'http://localhost:9200/_nodes/http?pretty'
{
  "_nodes" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "cluster_name" : "docker-cluster",
  "nodes" : {
    "8iH5v9C-q9GA3ASUPM4CAW" : {
      "name" : "8iH5v9C",
      "transport_address" : "10.0.1.14:9300",
      "host" : "10.0.1.14",
      "ip" : "10.0.1.14",
      "version" : "5.4.0",
      "build_hash" : "780f8c4",
      "roles" : [ "master", "data", "ingest" ],
      "attributes" : {
        "ml.enabled" : "true"
      },
      "http" : {
        "bound_address" : [ "[::]:9200" ],
        "publish_address" : "10.0.1.14:9200",
        "max_content_length_in_bytes" : 104857600
      }
    }
  }
}
As you can see, Elasticsearch has been successfully installed. Elasticsearch acts as a storage source for log data and provides efficient search performance.
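As a quick illustration of the document model described earlier, the sketch below indexes a single document and then searches for it over the REST API; the index name app-logs and its fields are made up for this example:

# index a document (index "app-logs", type "log", id 1 are all illustrative)
$ curl -X PUT 'http://localhost:9200/app-logs/log/1?pretty' -H 'Content-Type: application/json' -d '
{
  "service": "order-service",
  "level": "error",
  "message": "timeout while calling the payment service"
}'

# full-text search on the message field
$ curl 'http://localhost:9200/app-logs/_search?q=message:timeout&pretty'

The search response returns the matching documents together with relevance scores, which is exactly the capability the log system builds on.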
We also recommend the Elasticsearch visualization tool elasticsearch-head, whose installation is very simple:
$ docker run -p 9100:9100 mobz/elasticsearch-head:5
elasticsearch-head is a client plug-in for monitoring the status of Elasticsearch, including data visualization and adding, deleting, modifying, and querying data.
The interface after installation is as follows:
Installation and use of Logstash
Logstash is a data processing tool that focuses on analyzing logs. It works as follows:
A data source first passes data to Logstash; here we use Filebeat to ship the log data. Logstash's main components are the input (data input), the filter (data processing), and the output (data output).
Logstash filters and formats the data (into JSON format) and sends it to Elasticsearch for storage and indexing. Kibana then provides the front-end pages for searching and turning the results into charts.
Now let's install and use Logstash. First, download and unpack it:
# download logstash
$ wget https://artifacts.elastic.co/downloads/logstash/logstash-5.4.3.tar.gz
# decompress logstash
$ tar -zxvf logstash-5.4.3.tar.gz
If the download is slow, you can choose a domestic mirror instead. Once it has been unpacked, we need to configure Logstash, which basically means defining the input, filter, and output mentioned above.
[root@VM_1_14_centos elk]# cat logstash-5.4.3/client.conf
input {
  beats {
    port => 5044
    codec => "json"
  }
}
output {
  elasticsearch {
    hosts => ["127.0.0.1:9200"]
    index => "logstash-app-error-%{+YYYY.MM.dd}"
  }
  stdout { codec => rubydebug }
}
The input supports file, syslog, and beats, and only one of them can be selected in the configuration. Here we configure it in beats mode so that it receives data from Filebeat.
The filter is used to process events that match certain rules. Common filters include grok, which parses arbitrary text and turns it into a structured format; geoip, which adds geographical information; drop, which discards some events; and mutate, which modifies documents. Here is an example of filter usage:
filter {
  geoip {
    source => "clientIp"
  }
}
The output supports Elasticsearch, file, graphite, statsd, and so on. By default, the filtered data is output to Elasticsearch; when we do not want to output to Elasticsearch, we need to specify the desired output explicitly.
An event can go through multiple outputs during processing, but once all outputs have been executed, the event completes its life cycle.
Our configuration outputs the log information to Elasticsearch. With the configuration file in place, we start Logstash:
$ bin/logstash -f client.conf
Sending Logstash's logs to /elk/logstash-5.4.3/logs which is now configured via log4j2.properties
[2020-10-30T14:12:26,056][INFO ][logstash.outputs.elasticsearch] Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://127.0.0.1:9200/]}}
[2020-10-30T14:12:26,062][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://127.0.0.1:9200/, :path=>"/"}
log4j:WARN No appenders could be found for logger (org.apache.http.client.protocol.RequestAuthCache).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
[2020-10-30T14:12:26,209][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>#<URI::HTTP:0x1abac0 URL:http://127.0.0.1:9200/>}
[2020-10-30T14:12:26,225][INFO ][logstash.outputs.elasticsearch] Using mapping template from {:path=>nil}
[2020-10-30T14:12:26,288][INFO ][logstash.outputs.elasticsearch] Attempting to install template {:manage_template=>{"template"=>"logstash-*", "version"=>50001, "settings"=>{"index.refresh_interval"=>"5s"}, "mappings"=>{"_default_"=>{"_all"=>{"enabled"=>true, "norms"=>false}, "dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword"}}}}}], "properties"=>{"@timestamp"=>{"type"=>"date", "include_in_all"=>false}, "@version"=>{"type"=>"keyword", "include_in_all"=>false}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}}}
[2020-10-30T14:12:26,304][INFO ][logstash.outputs.elasticsearch] New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>[#<URI::Generic:0x2fec3fe6 URL://127.0.0.1:9200>]}
[2020-10-30T14:12:26,312][INFO ][logstash.pipeline] Starting pipeline {"id"=>"main", "pipeline.workers"=>4, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>5, "pipeline.max_inflight"=>500}
[2020-10-30T14:12:27,226][INFO ][logstash.inputs.beats] Beats inputs: Starting input listener {:address=>"0.0.0.0:5044"}
[2020-10-30T14:12:27,319][INFO ][logstash.pipeline] Pipeline main started
[2020-10-30T14:12:27,422][INFO ][logstash.agent] Successfully started Logstash API endpoint {:port=>9600}
From the console output we can see that Logstash has started normally.
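As an extra sanity check, the monitoring API that the last log line reports on port 9600 can be queried directly; a minimal sketch:

# query the Logstash node info through the API endpoint started on port 9600
$ curl 'http://localhost:9600/?pretty'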
Installation and use of Kibana
Kibana is a web-based graphical interface for searching, analyzing, and visualizing log data stored in Elasticsearch indices. Kibana calls the Elasticsearch interfaces and visualizes the returned data. It uses Elasticsearch's REST API to retrieve the data, allowing users not only to create customized dashboard views of their own data, but also to query and filter the data in ad hoc ways.
The installation of Kibana is relatively simple, we can install it based on Docker:
$ docker run --name kibana -e ELASTICSEARCH_URL=http://127.0.0.1:9200 -p 5601:5601 -d kibana:5.6.9
In the startup command we set the ELASTICSEARCH_URL environment variable, which points Kibana at the local Elasticsearch instance at 127.0.0.1:9200.
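To confirm that Kibana came up correctly, a couple of quick checks (a sketch, output omitted):

# follow the container logs until Kibana reports that it is ready
$ docker logs -f kibana

# or probe the HTTP port directly
$ curl -I http://localhost:5601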
Installation and use of Filebeat
Filebeat is a lightweight transport tool for forwarding and centralizing log data. Filebeat monitors specified log files or locations, collects log events, and forwards them to Logstash, Kafka, Redis, etc., or directly to Elasticsearch for indexing.
Now let’s start installing and configuring Filebeat:
# download filebeat
$ wget https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-5.4.3-linux-x86_64.tar.gz
# decompress
$ tar -zxvf filebeat-5.4.3-linux-x86_64.tar.gz
$ mv filebeat-5.4.3-linux-x86_64 filebeat
$ cd filebeat
# configure filebeat
$ vi client.yml
filebeat.prospectors:
- input_type: log
  paths:
    - /var/log/*.log

output.logstash:
  hosts: ["localhost:5044"]
In the Filebeat configuration, input_type supports input from log, syslog, stdin, redis, udp, docker, tcp and netflow. The configuration above reads from log, and only the log files in the /var/log/ directory are collected. The output section configures Filebeat to send to Logstash, so that Logstash can perform additional processing on the data Filebeat collects.
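Before starting the shipper it can be worth validating the configuration file; a minimal sketch (the -configtest flag applies to Filebeat 5.x and has since been replaced by the test config subcommand in newer releases):

# check client.yml for syntax errors without starting Filebeat
$ ./filebeat -configtest -c client.yml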
Once configured, we start Filebeat:
$ ./filebeat -e -c client.yml
2020/10/30 06:46:31.764391 beat.go:285: INFO Home path: [/elk/filebeat] Config path: [/elk/filebeat] Data path: [/elk/filebeat/data] Logs path: [/elk/filebeat/logs]
2020/10/30 06:46:31.764426 beat.go:186: INFO Setup Beat: filebeat; Version: 5.4.3
2020/10/30 06:46:31.764522 logstash.go:90: INFO Max Retries set to: 3
2020/10/30 06:46:31.764588 outputs.go:108: INFO Activated logstash as output plugin.
2020/10/30 06:46:31.764586 metrics.go:23: INFO Metrics logging every 30s
2020/10/30 06:46:31.764664 publish.go:295: INFO Publisher name: VM_1_14_centos
2020/10/30 06:46:31.765299 async.go:63: INFO Flush Interval set to: 1s
2020/10/30 06:46:31.765315 async.go:64: INFO Max Bulk Size set to: 2048
2020/10/30 06:46:31.765563 beat.go:221: INFO filebeat start running.
2020/10/30 06:46:31.765592 registrar.go:85: INFO Registry file set to: /elk/filebeat/data/registry
2020/10/30 06:46:31.765630 registrar.go:106: INFO Loading registrar data from /elk/filebeat/data/registry
2020/10/30 06:46:31.766100 registrar.go:123: INFO States Loaded from registrar: 6
2020/10/30 06:46:31.766136 crawler.go:38: INFO Loading Prospectors: 1
2020/10/30 06:46:31.766209 registrar.go: INFO Starting Registrar
2020/10/30 06:46:31.766256 INFO Start sending events to output
2020/10/30 06:46:31.766291 prospector_log.go:65: INFO Prospector with previous states loaded: 0
2020/10/30 06:46:31.766390 prospector.go:124: INFO Starting prospector of type: log; id: 2536729917787673381
2020/10/30 06:46:31.766422 crawler.go:58: INFO Loading and starting Prospectors completed. Enabled prospectors: 1
2020/10/30 06:46:31.766430 spooler.go:63: INFO Starting spooler: spool_size: 2048; idle_timeout: 5s
2020/10/30 06:47:01.764888 metrics.go:34: INFO No non-zero metrics in the last 30s
2020/10/30 06:47:31.764929 metrics.go:34: INFO No non-zero metrics in the last 30s
2020/10/30 06:48:01.765134 metrics.go:34: INFO No non-zero metrics in the last 30s
When Filebeat starts, it launches one or more inputs that look in the locations specified for log data. For each log file it finds, Filebeat starts a harvester (collector). Each harvester reads a single log file for new content and sends the new log data to libbeat, which aggregates the events and sends the aggregated data to the output configured for Filebeat.
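The per-file state the harvesters keep, that is which files have been read and up to which offset, lives in the registry file whose path appears in the startup log above; a small sketch for inspecting it:

# the registry records each harvested file and the offset already shipped
$ cat /elk/filebeat/data/registry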
Using ELKB in practice
With the ELKB components installed, we can start to integrate them. First, let's review ELKB's log collection flow.
Filebeat watches the application's log files and sends the data to Logstash, which filters and formats it, for example into JSON. Logstash then sends the processed log data to Elasticsearch, which stores it and indexes it for search. Finally, Kibana provides the visualization pages.
After running all the components, let's first look at the index changes in elasticsearch-head:
A new index, filebeat-2020.10.12, has appeared, which means the ELKB distributed log collection pipeline has been set up successfully. Visit http://localhost:9100 to take a closer look at the index data:
As the two screenshots above show, new log data has been generated from the mysqld.log file in the /var/log/ directory. The volume of data is large, and in a production environment we need to filter it according to the actual business and process the corresponding log formats.
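To check the pipeline end to end, we can append a test line to a file that matches the /var/log/*.log pattern and then search for it in Elasticsearch; a rough sketch (the file name and message are made up, and the search goes across all indices to avoid depending on the exact index name):

# write a test entry into a file watched by Filebeat
$ echo '{"level":"error","msg":"elkb smoke test"}' | sudo tee -a /var/log/test.log

# a few seconds later, search Elasticsearch for the test keyword
$ curl 'http://localhost:9200/_search?q=smoke&pretty'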
elasticsearch-head is only a simple Elasticsearch client; with Kibana we can go much further, searching the Elasticsearch data, performing mathematical transformations, and slicing and dicing it as required.
Visiting http://localhost:5601 gives us the log information shown in the figure above: Filebeat listens to the mysql log and the entries are displayed in Kibana. Kibana is well suited to processing large amounts of data and turning it into bar charts, line charts, scatter plots, histograms, pie charts and maps, which are not shown here.
Summary
This article mainly introduced the distributed log collection system ELKB. Logs are mainly used to record discrete events, that is, detailed information about a certain point or stage of program execution. ELKB nicely solves the problem that, under a microservice architecture, service instances are numerous and scattered, making logs hard to collect and analyze. Because of limited space, this article only covered the installation and use of ELKB. In Go microservices, logging frameworks such as logrus and zap are generally used to output logs in a given format to a specified location; readers can build a microservice themselves to practice this.