1 Introduction to the ELK Technology Stack


The necessity of log analysis

  • Logs provide us with necessary information about the behavior of a system. However, the content and format of logs may differ between services, or even between components of the same system
  • Because logs are so diverse, they are useful for many purposes, for example troubleshooting, performing simple status checks, or generating reports. Web server logs can be used to analyze traffic patterns across multiple products. The logs of an e-commerce website can be used to analyze whether packages sent from a particular location are frequently returned, and what the possible reasons are
  • Here are some common use cases of log analysis
  1. Debugging problems
  2. Performance analysis
  3. Security analysis
  4. Predictive analysis
  5. Internet of Things (IoT) logs

Debugging problems

  • Debugging is one of the most common reasons for enabling logging in an application. The simplest and most frequent use of debug logs is to look for specific error messages or events that have occurred
  • Once the bug or problem is located, a log analysis solution can help capture application information and provide a log snapshot of the problem as it occurred, for the development team to use for further analysis

Performance analysis

  • Log analysis helps optimize or debug system performance, often by providing an understanding of how resources are used in the system. Logging can help analyze the usage of individual resources, multithreaded behavior in an application, and potential deadlock conditions

For example, you can look at response times and HTTP response codes in web server logs to see how each service is performing

Security analysis

  • For any organization, logs can play a key role in managing application security, especially in detecting security vulnerabilities, application abuse, and malicious attacks

Predictive analysis

  • Predictive analysis has been one of the research hotspots of recent years. Log and event data can be used for highly accurate predictive analysis. Predictive models help identify potential customers, plan resources, manage and optimize inventory, and improve the efficiency of workloads and resource scheduling. They also help guide marketing strategies, user targeting, ad delivery strategies, and more

Internet of Things (IoT) logs

  • When it comes to IoT devices, logs are critical for monitoring and managing the system, keeping downtime to a minimum, and quickly resolving any major bugs

Challenges of log analysis

  • The current log analysis process involves examining logs on multiple servers, written by different components and systems of the application. Analyzing these logs is a time-consuming and tedious task. The main challenges are
  1. Inconsistent log formats
  2. Scattered logs
  3. The need for expertise

Inconsistent log formats

  • Each application or device has its own log format, and each format requires its own expert to interpret it. Searching across different log formats is also very difficult

Scattered logs

  • In applications, logs tend to be distributed across different servers and different components. Multiple components logging in multiple locations increase the complexity of log analysis

The ELK technology stack

  • The ELK platform is a complete log analysis solution built on an open source technology stack: Elasticsearch for deep search and data analytics; Logstash for centralized log management, including transferring and forwarding logs from multiple servers and enriching and parsing them; and finally Kibana, which provides powerful and beautiful data visualizations. The ELK stack is currently maintained and supported by Elastic

Elasticsearch

  • Elasticsearch is a distributed open source search engine based on Apache Lucene, released under the Apache 2.0 license (which means it can be downloaded, used, and modified free of charge). On top of Lucene's real-time search it provides scalability, reliability, and multi-tenant capability. Elasticsearch functionality can be accessed through its JSON-based RESTful API
  • Many large companies use Elasticsearch, including Github, SoundCloud, FourSquare, Netflix, and many other well-known companies. Here are some typical use cases

Wikipedia: uses Elasticsearch to provide full-text search, as well as product features such as search-as-you-type and search suggestions

Github: uses Elasticsearch to index over 8 million code repositories and cross-platform events, providing real-time search capabilities

  • Key features of Elasticsearch include
  1. It is an open source, distributed, scalable, and highly available real-time document store
  2. It provides real-time search and analysis capabilities
  3. It provides a comprehensive RESTful API, including search and many other features such as bulk search, geolocation search, autocomplete, contextual search suggestions, and result snippets (see the example after this list)
  4. It is easy to scale horizontally and integrates easily with cloud infrastructures such as AWS
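
For example, a minimal sketch of the JSON-based RESTful API using curl (the index name logs and the document fields below are illustrative assumptions, not values from the original text):

# Index a sample log document into a hypothetical "logs" index
curl -X PUT 'http://localhost:9200/logs/_doc/1' -H 'Content-Type: application/json' -d '
{
  "host": "webserver-01",
  "message": "error connecting to database"
}'

# Full-text search for documents whose message field contains "error"
curl 'http://localhost:9200/logs/_search?q=message:error'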

Logstash

  • Logstash is a data pipeline used to collect, parse, and analyze large volumes of structured and unstructured data and events generated by various systems. It provides input plugins for different data sources and platforms, is designed to efficiently process logs, events, and unstructured data sources, and then emits the result data through output plugins such as files, standard output (for example, the console running Logstash), or Elasticsearch
  • Key features of Logstash
  1. Centralized data processing: Logstash processes data centrally through its data pipeline. Using different input and output plugins, a variety of input sources can be converted into a single usable format
  2. Support for custom log formats: logs written by different applications usually have their own special formats. Logstash can parse and process logs in a large number of custom formats. It ships with many filter plugins out of the box and also supports user-written custom plugins
  3. Plugin development: custom plugins can be developed and published, and a large number of such plugins are already available

Kibana

  • Kibana is an open source data visualization platform released under the Apache 2.0 license. It visualizes the various structured and unstructured data stored in Elasticsearch indexes
  • Key features of Kibana are as follows
  1. It provides a flexible analysis and visualization platform for business intelligence
  2. It provides real-time analysis, summarization, charting, and debugging capabilities
  3. Provides an intuitive and user-friendly interface, and is highly customizable to drag and drop and align charts as needed
  4. Multiple dashboards can be managed and saved. Dashboards can be shared and embedded across multiple systems
  5. Snapshots of log search results can be shared and different problem handling processes can be isolated

ELK data pipeline

  • A typical data pipeline for the ELK technology stack looks like the figure below

  • In a typical ELK data pipeline, logs from multiple application servers are shipped by Logstash shippers to a centralized Logstash indexer, the indexer outputs the processed data to an Elasticsearch cluster, and Kibana then queries the Elasticsearch cluster to build dashboards that visualize the log data
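
A rough sketch of the indexer side of such a pipeline is shown below (the Beats input on port 5044, the hosts, and the index name are illustrative assumptions, not values from the original text):

input {
  # Receive log events shipped from multiple application servers (e.g. via Filebeat)
  beats {
    port => 5044
  }
}
output {
  # Write the processed events into the Elasticsearch cluster
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "app-logs-%{+YYYY.MM.dd}"
  }
}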

Elasticsearch

  • The Elasticsearch configuration files are usually stored in the config directory of the installation directory. There are two files: elasticsearch.yml and logging.yml. The former configures the properties of different Elasticsearch modules, such as the network address, paths, and so on; the latter configures Elasticsearch's own logging options

Path

  • Specifies the path to the data and log files
path:
  logs: /var/log/elasticsearch
  data: /var/data/elasticsearch

Cluster name

  • Specifies the name of the production cluster; the cluster automatically discovers and adds nodes based on this name
cluster:
  name: <name of your cluster>

Node name

  • Specify the default name for each node
node:
  name: <name of your node>
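
Putting these options together, a minimal elasticsearch.yml might look like the sketch below (the cluster name, node name, and network settings are illustrative assumptions):

cluster:
  name: my-log-cluster
node:
  name: node-1
path:
  logs: /var/log/elasticsearch
  data: /var/data/elasticsearch
network:
  host: 0.0.0.0
http:
  port: 9200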

Logstash

The simplest Logstash pipeline reads events from standard input and writes them to standard output:
bin/logstash -e 'input { stdin{} } output { stdout{} }'
$ logstash -e 'input { stdin{} } output { stdout{} }'
Sending Logstash logs to /usr/local/Cellar/logstash/6.4.2/libexec/logs which is now configured via log4j2.properties
[2020-06-19T01:09:49,205][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified
[2020-06-19T01:09:49,726][INFO ][logstash.runner] Starting Logstash
[INFO ][logstash.pipeline] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>12, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50}
[2020-06-19T01:09:51,463][INFO ][logstash.pipeline] Pipeline started successfully {:pipeline_id=>"main", :thread=>"#<Thread:0x6e7affd run>"}
The stdin plugin is now waiting for input:
[INFO ][logstash.agent] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[2020-06-19T01:09:51,918][INFO ][logstash.agent] Successfully started Logstash API endpoint {:port=>9600}
abc
{
      "@version" => "1",
    "@timestamp" => 2020-06-18T17:09:56.696Z,
          "host" => "YEEDOMLIU-MB12",
       "message" => "abc"
}
The rubydebug codec formats each output event as a structured Ruby-style hash, which is convenient for debugging:
bin/logstash -e 'input { stdin{} } output { stdout{ codec => rubydebug } }'
$ logstash -e 'input { stdin{} } output { stdout{ codec => rubydebug } }'
Sending Logstash logs to /usr/local/Cellar/logstash/6.4.2/libexec/logs which is now configured via log4j2.properties
[2020-06-19T01:11:09,563][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified
[2020-06-19T01:11:10,091][INFO ][logstash.runner] Starting Logstash
[INFO ][logstash.pipeline] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>12, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50}
[2020-06-19T01:11:12,060][INFO ][logstash.pipeline] Pipeline started successfully {:pipeline_id=>"main", :thread=>"#<Thread:0x3b0f0f77 run>"}
The stdin plugin is now waiting for input:
[INFO ][logstash.agent] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[2020-06-19T01:11:12,334][INFO ][logstash.agent] Successfully started Logstash API endpoint {:port=>9600}
hello baby
{
       "message" => "hello baby",
      "@version" => "1",
          "host" => "YEEDOMLIU-MB12",
    "@timestamp" => 2020-06-18T17:11:18.873Z
}
  • The output above shows the most common fields of a Logstash event
  1. message: contains the complete input message or event
  2. @timestamp: contains the time when the event was indexed; if the date filter plugin is used, this can instead be taken from a field in the message that specifies the event time
  3. host: generally represents the host that generated the event

Logstash file input plugin

For example, read an Apache log file as input and then print it to standard output

input {
  file {
    type => "apache"
    path => "/user/packpub/intro-to-elk/elk.log"
  }
}
output {
  stdout { codec => rubydebug }
}

Logstash Elasticsearch output plugin

bin/logstash -e 'input { stdin{} } output { elasticsearch { hosts => ["localhost"] } }'

Configuring Logstash

  • Logstash configuration files are written in a JSON-like format. The path to the configuration file can be specified with the -f flag, and it can even be a directory containing multiple configuration files for the different plugin types (input, filter, and output)
bin/logstash -f ../conf/logstash.conf

If you want to check a configuration file for syntax errors before running it, you can run: bin/logstash --configtest ../conf/logstash.conf (in newer Logstash versions, --config.test_and_exit). This command only checks the configuration file; it does not actually run Logstash

Logstash plugins

  • Common plugins fall into three categories
  1. Input plugins
  2. Filter plugins
  3. Output plugins

Input plugins

  1. file: reads the event stream from a log file
  2. redis: reads the event stream from a Redis instance
  3. stdin: reads the event stream from standard input
  4. syslog: reads the event stream from syslog messages over the network
  5. ganglia: reads the event stream from ganglia packets over UDP
  6. lumberjack: reads the event stream using the lumberjack protocol
  7. eventlog: reads the event stream from the Windows event log
  8. s3: reads the event stream from files in an Amazon S3 bucket
  9. elasticsearch: reads the event stream from the search results of an Elasticsearch cluster

Filter plugins

  1. date: parses a date field from the incoming event and uses it as the Logstash timestamp field
  2. drop: discards all incoming events that match specific filter conditions
  3. grok: a very powerful filter plugin that parses unstructured log events into structured data (see the sketch after this list)
  4. multiline: parses multiple lines from the same input source into a single log event
  5. dns: resolves any specified field to an IP address
  6. mutate: renames, deletes, modifies, or replaces any field in the event
  7. geoip: uses the Maxmind IP database to resolve an IP field into geographic location information
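
For example, a minimal sketch of a filter section that parses an Apache access log line with grok and then uses the date filter to take the event time from the parsed timestamp field (the COMBINEDAPACHELOG pattern and date format below are the usual ones for Apache logs; adjust them to your own log format):

filter {
  grok {
    # Parse the raw Apache access log line into structured fields
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    # Use the parsed request time as the event @timestamp
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}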

Output plugins

  1. file: writes the event to a file on disk
  2. email: sends an email when the output meets specific conditions
  3. elasticsearch: saves the output data to an Elasticsearch cluster (see the sketch after this list)
  4. stdout: writes the event to standard output
  5. redis: writes the event to a Redis queue acting as a broker
  6. mongodb: writes the output to MongoDB
  7. kafka: writes the event to a Kafka topic
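
A minimal sketch combining two of these output plugins: every event is indexed into Elasticsearch, and events with an ERROR level are additionally written to a local file (the loglevel field, hosts, and file path are illustrative assumptions):

output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
  # Also keep a local copy of error events
  if [loglevel] == "ERROR" {
    file {
      path => "/var/log/pipeline/errors-%{+YYYY-MM-dd}.log"
    }
  }
}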

Kibana

  • The Kibana configuration file is in the config directory

config/kibana.yml

port: 5601
host: "localhost"
elasticsearch_url: http://localhost:9200

Interface

  1. Discover
  2. Visualize
  3. Dashboard
  4. Settings

Discover

  • Interactively explore the data that matches the selected index pattern. You can submit search queries, filter the search results, and view document data
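
For example, the search bar on the Discover page accepts Lucene-style query strings such as the one below (the field names response and geoip.country_name are only assumptions about how the logs were parsed):

response:404 AND geoip.country_name:"United States"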

Visualize

  • Create new visualizations based on different data sources, such as a new interactive search, a saved search, or another existing visualization

Dashboard

  • A collection of saved visualizations arranged in different groups