Introduction
In recent years, data on the Internet has been generated at an ever-increasing rate. To help users find the content they want faster and more accurately, site search and in-application search have become indispensable features. At the same time, the data accumulated by enterprises keeps growing, and with it the demand for analyzing, processing, and visualizing massive amounts of data.
Elasticsearch is gaining traction in this area. Last year Elastic partnered with Alibaba Cloud to offer Elasticsearch as a cloud service, and in October Elastic went public. Elasticsearch is becoming more and more popular in the enterprise, as evidenced by the Elastic China Developer Conference held in November and the fact that almost every cloud vendor now offers Elasticsearch as a cloud search service.
Let's look at the introduction from the official website; the core keywords are search and analytics:
Elasticsearch is a distributed, RESTful search and analytics engine capable of solving a growing number of use cases. As the heart of the Elastic Stack, it centrally stores your data so you can discover the expected and uncover the unexpected.
Product advantages:

- Speed
- Scalability
- Resiliency
- Flexibility
In many scenarios, Elasticsearch is used together with other Elastic products such as Kibana and Logstash. ELK refers to Elasticsearch, Logstash, and Kibana; the Elastic Stack refers to all of Elastic's open source products.
Application scenarios (image from Elastic's website):
A hands-on scenario
Next, let’s look at an application scenario.
Scenario: a back-end application is deployed on a cloud server and writes its logs to files. The requirements: collect the log content, parse each log line, and obtain structured data that is easy to search, process, and visualize.
Solution: use Filebeat to forward the logs to Logstash, which parses and transforms the data and forwards it to Elasticsearch; Kibana then searches and visualizes the data stored in Elasticsearch.
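The resulting data flow, end to end:

```
/root/logs/*.log -> Filebeat -> Logstash (grok parsing) -> Elasticsearch (index: java_log) -> Kibana
```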
The product versions used in this case are as follows:
System: CentOS. The components are deployed on separate servers here, but they can also be installed together.
1. Kibana v6.2.3 (IP: 192.168.0.26)
2. Elasticsearch v6.2.3 (IP: 192.168.0.26)
3. Filebeat v6.2.3 (IP: 192.168.0.25)
4. Logstash v6.2.3 (IP: 192.168.0.25)
Assume a log line has the following content (the log file is placed in the /root/logs directory):
```
2018-11-08 20:46:25,949|https-jsse-nio-10.44.97.19-8979-exec-11|INFO|CompatibleClusterServiceImpl.getClusterResizeStatus.resizeStatus=|com.huawei.hwclouds.rds.trove.api.service.impl.CompatibleClusterServiceImpl.getResizeStatus(CompatibleClusterServiceImpl.java:775)
```

A log line contains five fields, separated by "|":

```
2018-11-08 20:46:25,949                                            # time
https-jsse-nio-10.44.97.19-8979-exec-11                            # thread name
INFO                                                               # log level
CompatibleClusterServiceImpl.getClusterResizeStatus.resizeStatus=  # log content
com.huawei.hwclouds.rds.trove.api.service.impl.CompatibleClusterServiceImpl.getResizeStatus(CompatibleClusterServiceImpl.java:775)  # class name
```
The file layout is as follows (Elasticsearch and Kibana are on a different server and already running):
The logs directory stores the application logs to be collected; logstash.conf is the Logstash configuration file.
The logstash.conf contents are as follows:
```
input {
  beats {
    port => 5044
  }
}

filter {
  grok {
    match => { "message" => "%{GREEDYDATA:Timestamp}\|%{GREEDYDATA:ThreadName}\|%{WORD:LogLevel}\|%{GREEDYDATA:TextInformation}\|%{GREEDYDATA:ClassName}" }
  }
  date {
    match => [ "Timestamp", "yyyy-MM-dd HH:mm:ss,SSS" ]
  }
}

output {
  elasticsearch {
    hosts => "192.168.0.26:9200"
    manage_template => false
    index => "java_log"
  }
}
```
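With this grok pattern, the sample log line above is parsed into a structured event roughly like the following (only the extracted fields are shown; Beats and Logstash add further metadata). The date filter then parses the Timestamp field to set the event's @timestamp:

```
{
  "Timestamp": "2018-11-08 20:46:25,949",
  "ThreadName": "https-jsse-nio-10.44.97.19-8979-exec-11",
  "LogLevel": "INFO",
  "TextInformation": "CompatibleClusterServiceImpl.getClusterResizeStatus.resizeStatus=",
  "ClassName": "com.huawei.hwclouds.rds.trove.api.service.impl.CompatibleClusterServiceImpl.getResizeStatus(CompatibleClusterServiceImpl.java:775)"
}
```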
Next, start Logstash. Once started successfully, it listens on port 5044 and waits for log data to come in.
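A typical way to start it (a sketch, assuming logstash.conf sits in the Logstash installation directory you run this from):

```
bin/logstash -f logstash.conf
```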
Take a look at the Filebeat configuration file:
```
filebeat.prospectors:
- type: log
  enabled: true
  # Paths to the log files to collect
  paths:
    - /root/logs/*.log

filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false

setup.template.settings:
  index.number_of_shards: 3
  #index.codec: best_compression
  #_source.enabled: false

setup.kibana:
  host: "192.168.0.26:5601"

# Send output to Logstash
output.logstash:
  # The Logstash hosts
  hosts: ["localhost:5044"]
```
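Note that Filebeat supports only one active output: since we send to Logstash here, make sure the default output.elasticsearch section in filebeat.yml stays commented out.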
Start Filebeat (when the log file is updated, Filebeat picks up the change and forwards the new lines).
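For reference, a typical start command (run from the Filebeat installation directory; -e prints Filebeat's own logs to stderr):

```
./filebeat -e -c filebeat.yml
```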
Finally, use Kibana to visualize the logs. Create an index pattern named java_log in Kibana, then query the log data on the Discover page.
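If nothing shows up, you can first verify that documents actually reached the java_log index; a quick check with curl (assuming Elasticsearch's HTTP port 9200, as configured above):

```
curl 'http://192.168.0.26:9200/java_log/_search?q=LogLevel:INFO&pretty'
```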
Create charts on the Visualize page, then combine them on a Dashboard.
With that, a simple log collection, analysis, and visualization pipeline is complete.
There's a lot more the Elastic Stack can do; in-application search, for example, is another common use case.
References:
1. Alibaba Cloud partners with Elastic: https://www.elastic.co/cn/blog/alibaba-cloud-to-offer-elasticsearch-kibana-and-x-pack-in-china
2. Getting started with Beats: https://www.elastic.co/guide/en/beats/libbeat/6.2/getting-started.html
3. Logstash grok filter plugin: https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html