Logstash - Data Stream Engine

Author | WenasWei

One Logstash

Logstash is an open source data collection engine with real-time pipelining capabilities. Logstash dynamically unifies data from different sources and normalizes it into destinations of your choice, cleansing and democratizing all your data for a variety of advanced downstream analytics and visualization use cases.

1.1 Introduction to Logstash

Logstash is a data stream engine:

  • It is an open source streaming ETL (Extract-Transform-Load) engine for data logistics
  • It lets you build a data flow pipeline in a matter of minutes
  • It scales horizontally and is resilient, with adaptive buffering
  • It is agnostic about data sources
  • It has a plugin ecosystem with more than 200 integrations and processors
  • Deployments can be monitored and managed with the Elastic Stack

Logstash is an open source data collection engine with real-time pipelining capabilities. Simply put, Logstash is a pipeline with real-time data transmission capability: it is responsible for moving data from the input end of the pipeline to the output end, and it also lets you add filters in the middle according to your own needs. Logstash provides a variety of powerful filters to suit many different application scenarios.

1.2 Data Processing

Logstash is a powerful tool that can be integrated with a variety of deployments. It provides a number of plug-ins that help you parse, enrich, transform, and buffer data from a variety of sources. If your data requires additional processing that isn’t in Beats, add Logstash to your deployment.

Logstash works with the most popular data sources in use today.

Logstash takes in logs, files, metrics, and web data. After being processed by Logstash, it becomes usable data that can be consumed by web apps, stored in data stores, or transformed into other streaming data:

  • Logstash works easily with Beats, which is the recommended approach
  • Logstash also works with well-known cloud vendors’ services to process their data
  • It can also consume from message queues such as Redis or Kafka
  • Logstash can also use JDBC to access RDBMS data
  • It can also work with IoT devices to process their data
  • Logstash not only sends data to Elasticsearch; it can also send data to many other destinations and serve as their input source for further processing

Two Logstash system architecture

Logstash consists of three main parts: inputs, filters, and outputs.

The Logstash event processing pipeline has three stages: inputs → filters → outputs. Filters are optional, so a pipeline may also be simply inputs → outputs. A minimal pipeline sketch follows the list below.

  • Inputs: required; they generate events. Commonly used: file, syslog, redis, kafka, beats (e.g. Filebeat)
  • Filters: optional; they modify events as they pass through. Commonly used: grok, mutate, drop, clone, geoip
  • Outputs: required; they ship events elsewhere. Commonly used: elasticsearch, file, graphite, kafka, statsd
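
The following minimal configuration shows all three stages wired together; the plugin choices (beats, grok, elasticsearch) and the values are illustrative only, not a prescribed setup:

input {
  beats {
    port => 5044                    # receive events shipped by Filebeat
  }
}
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }   # parse unstructured text into fields
  }
}
output {
  elasticsearch {
    hosts => ["127.0.0.1:9200"]     # index the processed events into Elasticsearch
  }
}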

Three Logstash installation

3.1 Environment List
  • Operating system: Linux (Ubuntu, #56-Ubuntu SMP Tue Jun 4 22:49:08 UTC 2019, x86_64)
  • Logstash version: 6.2.4
  • JDK version: 1.8.0_152
3.2 Installing the JDK on Linux
3.2.1 Decompress the package and move it to a specified directory (/usr/local)
(1) Decompress the package
tar -zxvf jdk-8u152-linux-x64.tar.gz
(2) Create a directory
mkdir -p /usr/local/java
(3) Move the installation package
mv jdk1.8.0_152/ /usr/local/java/
(4) Set the owner
chown -R root:root /usr/local/java/
3.2.2 Configuring Environment Variables
(1) Configure system environment variables
vi /etc/environment
(2) Add the following statement
PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games"
export JAVA_HOME=/usr/local/java/jdk1.8.0_152
export JRE_HOME=/usr/local/java/jdk1.8.0_152/jre
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
(3) Configure user environment variables
nano /etc/profile
(4) Add the following statements (they must be placed in the middle of the file)
if [ "$PS1" ]; then
  if [ "$BASH" ] && [ "$BASH" != "/bin/sh" ]; then
    # The file bash.bashrc already sets the default PS1.
    # PS1='\h:\w\$ '
    if [ -f /etc/bash.bashrc ]; then
      . /etc/bash.bashrc
    fi
  else
    if [ "`id -u`" -eq 0 ]; then
      PS1='# '
    else
      PS1='$ '
    fi
  fi
fi

export JAVA_HOME=/usr/local/java/jdk1.8.0_152
export JRE_HOME=/usr/local/java/jdk1.8.0_152/jre
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH:$HOME/bin

if [ -d /etc/profile.d ]; then
  for i in /etc/profile.d/*.sh; do
    if [ -r $i ]; then
      . $i
    fi
  done
  unset i
fi
(5) User environment variables take effect
source /etc/profile
(6) Check whether the installation is successful
$ java -version
java version "1.8.0_152"
Java(TM) SE Runtime Environment (build 1.8.0_152-b16)
Java HotSpot(TM) 64-Bit Server VM (build 25.152-b16, mixed mode)
3.3 Installing Logstash
3.3.1 Creating an Installation Directory
$ sudo mkdir /usr/local/logstash
3.3.2 Downloading the Logstash installation file
$ wget -P /usr/local/logstash https://artifacts.elastic.co/downloads/logstash/logstash-6.2.4.tar.gz
3.3.3 Decompressing the installation file
$ cd /usr/local/logstash/
$ sudo tar -zxvf logstash-6.2.4.tar.gz
3.3.4 Verifying that the installation is successful

Test: a quick start that uses standard input and standard output as the input and output, with no filters.

$ cd logstash-6.2.4/
$ ./bin/logstash -e 'input { stdin {} } output { stdout {} }'
Sending Logstash's logs to /usr/local/logstash/logstash-6.2.4/logs which is now configured via log4j2.properties
[2021-05-27T00:22:30,979][INFO ][logstash.runner          ] Starting Logstash {"logstash.version"=>"6.2.4"}
[2021-05-27T00:22:31,821][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600}
[2021-05-27T00:22:36,463][INFO ][logstash.pipeline        ] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>1, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50}
[2021-05-27T00:22:36,690][INFO ][logstash.pipeline        ] Pipeline started successfully {:pipeline_id=>"main", :thread=>"#<Thread:0x55a5abea run>"}
The stdin plugin is now waiting for input:
hello world
{
    "@timestamp" => 2021-05-26T16:22:52.527Z,
          "host" => "*******",
       "message" => "hello world",
      "@version" => "1"
}

Four Logstash parameters and configuration

4.1 Common Startup Parameters
Parameter | Description | Example
-e | Start an instance with the configuration given directly on the command line | ./bin/logstash -e 'input {stdin {}} output {stdout {}}'
-f | Start an instance with the specified configuration file | ./bin/logstash -f config/test.conf
-t | Test the correctness of a configuration file | ./bin/logstash -f config/test.conf -t
-l | Specify the log file name | ./bin/logstash -f config/test.conf -l logs/test.log
-w | Specify the number of filter (pipeline worker) threads, default 5 | ./bin/logstash -f config/test.conf -w 8
4.2 Configuration File Structure and Syntax
(1) section

Logstash uses {} to define a section. Plugins are defined inside a section, and multiple plugins can be defined in the same section, as follows:

input {
    stdin {
    }
    beats {
        port => 5044
    }
}
(2) Data type

Logstash supports only a small number of data types:

  • Boolean: ssl_enable => true
  • Number: port => 33
  • String: name => “Hello world”
  • Comments: # this is a comment
(3) Field reference

The data flowing through Logstash is called an Event object. An Event has a JSON-like structure, and its properties are called fields. To reference a field in a configuration file, write the field name in square brackets [], for example [type]. For nested fields, chain the names, for example [tags][type]. In addition, for array fields Logstash supports indexing, including negative (reverse) indexes, for example [tags][type][0] and [tags][type][-1].
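
For example, fields can be referenced both in conditionals and, via the %{} sprintf syntax, inside option values (the field names below are hypothetical and only illustrate the syntax):

filter {
  if [type] == "system" {                                # reference a top-level field
    mutate {
      add_field => { "first_tag" => "%{[tags][0]}" }     # reference a nested/array element inside a string
    }
  }
}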

(4) Conditional judgment

Logstash supports the following operators:

  • Equality: ==, !=, <, >, <=, >=
  • Regexp: =~, !~
  • Inclusion: in, not in
  • Boolean: and, or, nand, xor
  • Unary: !

For example:

if EXPRESSION {
  ...
} else if EXPRESSION {
  ...
} else {
  ...
}
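A concrete example combining several of these operators (the field names and values are hypothetical):

filter {
  if [type] == "nginx-access" and [status] =~ /^5\d\d/ {
    mutate { add_tag => ["server_error"] }     # tag 5xx responses
  } else if "debug" in [tags] {
    drop { }                                   # discard debug events entirely
  }
}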
(5) Environment variable reference

Logstash supports referencing system environment variables. If the environment variables do not exist, you can set the default value, for example:

export TCP_PORT=12345

input {
  tcp {
    port => "${TCP_PORT:54321}"
  }
}
4.3 Input Plugin

Input plugins include the following types. For details, see the official website documentation – common input plugins:

  • elasticsearch
  • exec
  • file
  • github
  • http
  • jdbc
  • jms
  • jmx
  • kafka
  • log4j
  • rabbitmq
  • redis
  • tcp
  • udp
  • unix
  • websocket
4.3.1 File reading plug-in

The file reading plug-in is mainly used to capture changes to files and wrap the changed content into Events for further processing or transmission.

  • Configuration examples
input {
  file {
    path => ["/var/log/*.log", "/var/log/message"]
    type => "system"
    start_position => "beginning"
  }
}
  • Commonly used parameters
Parameter | Type | Default | Description
add_field | hash | {} | Add fields to the Event
close_older | number | 3600 | Close the watch on a file that has not been updated for this many seconds
codec | string | "plain" | Codec used to decode the data after it is read
delimiter | string | "\n" | Line delimiter for file content; by default an Event is created per line
discover_interval | number | 15 | How often, in seconds, to check the paths for newly created files
4.3.2 TCP Listening Plug-in

The TCP plug-in has two working modes, client and server, used respectively for sending network data and listening for network data.

  • Configuration examples
tcp {
    port => 41414
}
  • Common Parameters (empty => same as above)
Parameter | Type | Default | Description
add_field | | |
codec | | |
enable_metric | | |
host | | |
id | | |
mode | "server" or "client" | "server" | In "server" mode Logstash listens for connections from clients; in "client" mode it connects to a server
port | number | (none) | Required; in server mode the port to listen on, in client mode the port to connect to
proxy_protocol | boolean | false | Proxy protocol support; only v1 is supported at this time
ssl_cert | | |
ssl_enable | | |
ssl_extra_chain_certs | | |
ssl_key | | |
ssl_key_passphrase | | |
ssl_verify | | |
tags | | |
type | | |
4.3.3 Redis reading plug-in

Used to read data information cached in Redis.

  • Configuration examples
input {
  redis {
    host => "127.0.0.1"
    port => 6379
    data_type => "list"
    key => "logstash-list"
  }
}
  • Common Parameters (empty => same as above)
Parameter | Type | Default | Description
add_field | | |
batch_count | number | 125 | Batch size for the Redis batch feature; requires Redis 2.6.0 or later
codec | | |
data_type | "list", "channel" or "pattern_channel" | (none) | Required; depending on the setting, Logstash reads from Redis with a different command: BLPOP, SUBSCRIBE, or PSUBSCRIBE respectively
db | number | 0 | The Redis database number to use
enable_metric | | |
host | string | "127.0.0.1" | Redis service address
id | | |
key | string | (none) | Required; the name of the Redis list or channel
password | string | (none) | Redis password
port | number | 6379 | Redis port number
tags | | |
threads | number | 1 |
timeout | number | 5 | Redis connection timeout, in seconds
Note:

For data_type, note that "channel" and "pattern_channel" are broadcast types: the same data is delivered to every Logstash node that subscribes to the channel, so in a Logstash cluster every node receives the same (duplicate) data. In the single-node case, "pattern_channel" can subscribe simultaneously to multiple keys that match the pattern.

4.3.4 Kafka read plug-in

Used to read data pushed to topics in Kafka.

  • Configuration examples
input {
  kafka {
    bootstrap_servers => "kafka-01:9092,kafka-02:9092,kafka-03:9092"
    topics_pattern => "elk-.*"
    consumer_threads => 5
    decorate_events => true
    codec => "json"
    auto_offset_reset => "latest"
    group_id => "logstash1"     # must be identical across a Logstash cluster
  }
}
  • Common parameters:
Parameter | Type | Default | Description
bootstrap_servers | string | localhost:9092 | Kafka broker list used to establish the initial connection to the cluster
topics_pattern | string | | Regular expression of topics to subscribe to; when this is used, the topics setting is ignored
consumer_threads | number | | Number of consumer threads; ideally the same as the number of partitions
decorate_events | string | none | Accepted values: none / basic / extended / false / true
codec | codec | plain | Codec for the input data
auto_offset_reset | string | | What to do when there is no initial offset in Kafka (see the note below)
group_id | string | logstash | Identifier of the consumer group this consumer belongs to

Note:

  • auto_offset_reset: earliest - automatically reset the offset to the earliest offset; latest - automatically reset the offset to the latest offset; none - throw an exception to the consumer if no previous offset is found for the consumer's group; anything else - throw an exception to the consumer.
  • decorate_events: none - no metadata is added; basic - record attributes are added; extended - record attributes plus record headers are added; false - deprecated alias for none; true - deprecated alias for basic.
4.4 Common Filter Plugin

The rich set of filter plug-ins is an important reason for the power of Logstash. Filter plug-ins process the event information flowing through the current Logstash instance: you can add fields, remove fields, convert field types, slice data with regular expressions, and so on, and you can also apply different processing to different data based on conditionals. For details, see the official documentation - common filter plugins.

4.4.1 Grok re capture

Grok is the best Logstash tool for parsing unstructured data into structured, queryable data. It is great for parsing syslog, Apache, MySQL, and other web logs.

(1) Predefined expression call:
  • Logstash ships with about 120 predefined regular expressions; you can call them by name with the syntax %{SYNTAX:SEMANTIC}
  • SYNTAX: the name of the predefined pattern
  • SEMANTIC: the name given to the content matched from the Event

For example, if the Event content is [debug] 127.0.0.1 - test log content, matching %{IP:client} yields client: 127.0.0.1. If you want to convert the data type while capturing, use for example %{NUMBER:num:int}; by default everything is returned as a string, and the only conversions currently supported by Logstash are "int" and "float".

A slightly more complete example:

  • Log file content: 55.3.244.1 GET /index.html 15824 0.043
  • Expression: %{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}
  • Configuration file contents:
input {
  file {
    path => "/var/log/http.log"
  }
}
filter {
  grok {
    match => {"message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}"}
  }
}
  • Output result:
client: 55.3.244.1
method: GET
request: /index.html
bytes: 15824
duration: 0.043
(2) Custom expression call

Syntax: (?<field_name>the pattern here)

Example: to capture a 10- or 11-character hexadecimal queue_id, use the expression (?<queue_id>[0-9a-f]{10,11}).

Installing a custom expression

Just like predefined expressions, you can add your custom expressions to Logstash and then use them as if they were predefined. The steps are as follows:

  • 1. Create a folder named "patterns" under the Logstash root directory. In the "patterns" folder, create a file named "extra".

  • 2. In the file "extra", add expressions in the format patternName regexp, separated by a space, for example:

# contents of ./patterns/postfix:
POSTFIX_QUEUEID [0-9a-f]{10,11}
  • 3. To use a custom expression, set the "patterns_dir" option to the directory where the expression files reside, as in the following example:

<1> Log content

Jan  1 06:25:43 mailserver14 postfix/cleanup[21403]: BEF25A72965: message-id=<[email protected]>

<2> Logstash configuration

filter {
  grok {
    patterns_dir => ["./patterns"]
    match => { "message" => "%{SYSLOGBASE} %{POSTFIX_QUEUEID:queue_id}: %{GREEDYDATA:syslog_message}" }
  }
}

<3> Running results

timestamp: Jan 1 06:25:43
logsource: mailserver14
program: postfix/cleanup
pid: 21403
queue_id: BEF25A72965
(3) Common grok configuration parameters (empty => same as above)
Parameter | Type | Default | Description
add_field | | |
add_tag | | |
break_on_match | boolean | true | If the match option contains multiple patterns, stop after the first successful match; set to false to try all patterns
enable_metric | | |
id | | |
keep_empty_captures | boolean | false | If true, keep empty captures as event fields
match | array | {} | The patterns to match, e.g. match => { "message" => ["Duration: %{NUMBER:duration}", "Speed: %{NUMBER:speed}"] }
named_captures_only | boolean | true | If true, only store named captures from grok
overwrite | array | [] | Fields to overwrite with the captured value, e.g. with match => { "message" => "%{SYSLOGBASE} %{DATA:message}" } use overwrite => ["message"]
patterns_dir | array | [] | Custom directories containing pattern files; at startup Logstash loads every file that matches patterns_files_glob
patterns_files_glob | string | "*" | Glob used to select the files inside patterns_dir
periodic_flush | boolean | false | Call the filter's flush method periodically
remove_field | array | [] | Remove fields from the Event, e.g. remove_field => ["foo_%{somefield}"]
remove_tag | array | [] | Remove tags, e.g. remove_tag => ["foo_%{somefield}"]
tag_on_failure | array | ["_grokparsefailure"] | Values appended to the tags field when nothing matches
tag_on_timeout | string | "_groktimeout" | Value appended to the tags field when a match times out
timeout_millis | number | 30000 | Timeout for a single match, in milliseconds; 0 disables the timeout
4.4.2 Date time processing plug-in

The date plugin is used to parse and format time fields, for example a value such as "Apr 17 09:32:01" with the format MMM dd HH:mm:ss. Logstash automatically stamps every Event with a timestamp, but that timestamp is the Event's processing time (essentially the time the input received the data), which can deviate from the time recorded in the log (mainly because of buffering). This plugin can be used to replace the default timestamp with the time the log entry actually occurred.
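
A typical usage sketch (the logdate field name and the time zone are assumptions for illustration):

filter {
  date {
    match    => ["logdate", "MMM dd HH:mm:ss", "ISO8601"]   # try each format in turn
    target   => "@timestamp"                                # overwrite the default event timestamp
    timezone => "Asia/Shanghai"                             # zone the source logs are assumed to use
  }
}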

Common Configuration Parameters (empty => same as above)
Parameter | Type | Default | Description
add_field | | |
add_tag | | |
enable_metric | | |
id | | |
locale | | |
match | array | [] | Time-field matching; multiple formats can be specified and are tried until one matches or the list is exhausted
periodic_flush | | |
remove_field | | |
remove_tag | | |
tag_on_failure | | |
target | string | "@timestamp" | The field in which to store the matched and parsed date; by default "@timestamp" is overwritten
timezone | string | (none) | The time zone used when parsing the time

Note:

match format: time-field matching; multiple formats can be defined and are tried until one matches or the list is exhausted. Format: [field, formats...], for example: match => ["logDate", "MMM dd yyyy HH:mm:ss", "MMM d yyyy HH:mm:ss", "ISO8601"]

4.4.3 Mutate data modification plug-in

The mutate plugin is another important Logstash plugin. It provides rich capabilities for handling data of basic types: you can rename, remove, replace, and modify fields in events.
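
A sketch that combines several mutate operations (all field names here are hypothetical):

filter {
  mutate {
    rename       => { "HOSTORIP" => "client_ip" }   # rename a field
    convert      => { "duration" => "float" }       # change a field's type
    gsub         => ["path", "/", "_"]              # replace every "/" in the path field with "_"
    strip        => ["message"]                     # trim surrounding whitespace
    remove_field => ["@version"]                    # drop a field entirely
  }
}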

Common Configuration Parameters (empty => same as above)
Parameter | Type | Default | Description
add_field | | |
add_tag | | |
convert | hash | (none) | Convert a field to the specified type; if the field is an array, all elements are converted; hashes are left untouched. Example: convert => { "fieldname" => "integer" }
enable_metric | | |
gsub | array | (none) | Replace by regular expression, given as [field, pattern, replacement, ...]. Example: gsub => ["fieldname", "/", "_", "fieldname2", "[\\?#-]", "."] replaces every "/" in fieldname with "_" and every "\", "?", "#" and "-" in fieldname2 with "."
id | | |
join | hash | (none) | Join the elements of an array field with the given separator; has no effect on non-array fields. Example, join the elements of fieldname with ",": join => { "fieldname" => "," }
lowercase | array | (none) | Convert the values of the given fields to lowercase
merge | hash | (none) | Merge two array or hash fields (strings are automatically converted to single-element arrays); an array cannot be merged with a hash. Example, merge added_field into dest_field: merge => { "dest_field" => "added_field" }
periodic_flush | | |
remove_field | | |
remove_tag | | |
rename | hash | (none) | Rename one or more fields. Example, rename HOSTORIP to client_ip: rename => { "HOSTORIP" => "client_ip" }
replace | hash | (none) | Replace the entire value of a field with a new value; variable references are supported. Example, set "message" to the value of "source_host" followed by ": My new message": replace => { "message" => "%{source_host}: My new message" }
split | hash | (none) | Split a string field into an array using the given separator; only works on string fields. Example, split fieldname on ",": split => { "fieldname" => "," }
strip | array | (none) | Remove whitespace at both ends of the field value. Example: strip => ["field1", "field2"]
update | hash | (none) | Update the value of an existing field. Example, set the sample field to "My new message": update => { "sample" => "My new message" }
uppercase | array | (none) | Convert strings to uppercase
4.4.4 JSON plug-in

The json plugin is used to decode JSON-formatted strings. Typically, among a pile of log messages, some are in JSON format and some are not.

(1) Configuration examples
json {
    source => ...
}
  • In the example configuration, message is a string in JSON format: "{\"uid\":3081609001,\"type\":\"signal\"}"
filter {
    json {
        source => "message"
        target => "jsoncontent"
    }
}
  • Output result:
{
  "@version": "1",
  "@timestamp": "2014-11-18T08:00:00.000Z",
  "host": "web121.mweibo.tc.sinanode.com",
  "message": "{\"uid\":3081609001,\"type\":\"signal\"}",
  "jsoncontent": {
    "uid": 3081609001,
    "type": "signal"
  }
}
  • If target is removed from the example configuration, the output is as follows:
{
  "@version": "1",
  "@timestamp": "2014-11-18T08:00:00.000Z",
  "host": "web121.mweibo.tc.sinanode.com",
  "message": "{\"uid\":3081609001,\"type\":\"signal\"}",
  "uid": 3081609001,
  "type": "signal"
}
(2) Common configuration parameters (empty => same as above)
Parameter | Type | Default | Description
add_field | | |
add_tag | | |
enable_metric | | |
id | | |
periodic_flush | | |
remove_field | | |
remove_tag | | |
skip_on_invalid_json | boolean | false | Whether to silently skip events whose JSON fails to parse
source | string | (none) | Required; the field containing the JSON string to decode
tag_on_failure | | |
target | string | (none) | The field in which to store the parsed JSON object; if not set, all fields of the JSON object are placed at the root of the event
4.4.5 ElasticSearch Query filtering plug-in

This plugin is used to query Elasticsearch and apply the query results to the current event.
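
A sketch of a typical use: look up a related event in Elasticsearch and copy one of its fields into the current event (the hosts, index, query, and field names here are assumptions for illustration):

filter {
  elasticsearch {
    hosts  => ["127.0.0.1:9200"]
    index  => "app-logs-*"
    query  => "type:start AND operation:%{[opid]}"   # find the matching "start" event
    fields => { "@timestamp" => "started" }          # copy its @timestamp into the field "started"
  }
}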

Common Configuration Parameters (empty => same as above)
Parameter | Type | Default | Description
add_field | | |
add_tag | | |
ca_file | string | (none) | Path to an SSL Certificate Authority file
enable_sort | boolean | true | Whether to sort the results
fields | array | {} | Fields to copy from the old event (the one found in Elasticsearch) into the new event
hosts | array | ["localhost:9200"] | List of Elasticsearch hosts
index | string | "" | Comma-separated list of Elasticsearch indexes to query; use "_all" or "" to search all indexes
password | string | (none) | Password
periodic_flush | | |
query | string | (none) | The Elasticsearch query string
remove_field | | |
remove_tag | | |
result_size | number | 1 | Number of results to return from the query
sort | string | "@timestamp:desc" | Comma-separated list of field:direction pairs used to sort the results
ssl | boolean | false | Whether to use SSL
tag_on_failure | | |
user | string | (none) | Username
4.5 Output Plugin
4.5.1 ElasticSearch output plugin

Write event information to Elasticsearch

(1) Configuration examples
output {
  elasticsearch {
    hosts => ["127.0.0.1:9200"]
    index => "filebeat-%{type}-%{+YYYY.MM.dd}"
    template_overwrite => true
  }
}
(2) Common configuration parameters (empty => same as above)
Parameter | Type | Default | Description
absolute_healthcheck_path | boolean | false | When healthcheck_path is configured, controls whether it is treated as an absolute path. For example, if Elasticsearch is reached at "http://localhost:9200/es" and healthcheck_path is "/health", the health check URL is "http://localhost:9200/health" when this parameter is true, and "http://localhost:9200/es/health" when it is false
absolute_sniffing_path | boolean | false | When sniffing_path is configured, controls whether it is treated as an absolute path. For example, if Elasticsearch is reached at "http://localhost:9200/es" and sniffing_path is "/_sniffing", the sniffing URL is "http://localhost:9200/_sniffing" when this parameter is true, and "http://localhost:9200/es/_sniffing" when it is false
action | string | "index" | The Elasticsearch operation: "index" indexes the Logstash event into Elasticsearch; "delete" deletes a document by id (document_id must be specified); "create" indexes a document, failing if a document with that id already exists; "update" updates a document by id
cacert | string | (none) | Path to a .cer or .pem certificate file used to validate the Elasticsearch server certificate
codec | | |
doc_as_upsert | boolean | false | Enable upsert mode for the update action: if the document does not exist, a new one is created
document_id | string | (none) | The Elasticsearch document id, used to overwrite a document already stored in Elasticsearch
4.5.2 Redis output plug-in

Used to write events into Redis as a cache. Normally, Logstash filter processing consumes system resources and complex filtering is time-consuming, so if events are generated faster than they can be processed, Redis can serve as a buffer.

(1) Configuration examples
output {
  redis {
    host => "127.0.0.1"
    port => 6379
    data_type => "list"
    key => "logstash-list"
  }
}
(2) Common configuration parameters (empty => same as above)
Parameter | Type | Default | Description
batch | boolean | false | Whether to use the Redis batch mode; only valid when data_type is "list"
batch_events | number | 50 | Batch size; an RPUSH is issued when the batch reaches this size
batch_timeout | number | 5 | Batch timeout; an RPUSH is issued when this much time has passed
codec | | |
congestion_interval | number | 1 | How often, in seconds, to check for congestion; if set to 0, every Event is checked
congestion_threshold | number | 0 |
data_type | "list" or "channel" | (none) | How data is stored in Redis: RPUSH for "list", PUBLISH for "channel"
db | number | 0 | The Redis database number to use
enable_metric | | |
host | array | ["127.0.0.1"] | List of Redis hosts; if several are configured, one is chosen at random, and if the current one becomes unavailable the next is chosen
id | | |
key | string | (none) | The name of the Redis list or channel; dynamic names are supported, for example logstash-%{type}
password | string | (none) | Password of the Redis service
port | number | 6379 | Port the Redis service listens on
reconnect_interval | number | 1 | Interval between reconnection attempts after a failed connection
shuffle_hosts | boolean | true | Shuffle the host list during Logstash startup
timeout | number | 5 | Redis connection timeout
workers | number | 1 | Number of workers (legacy option)
4.5.3 File output plug-in

Used to write events to a file.

(1) Configuration examples
output {
    file {
        path => ...
        codec => line { format => "custom format: %{message}"}
    }
}
(2) Common configuration parameters (empty => same as above)
Parameter | Type | Default | Description
codec | | |
create_if_deleted | boolean | true | If the target file has been deleted, create a new one when the next event is written
dir_mode | number | -1 | Directory access permissions; -1 means use the operating system default
enable_metric | | |
file_mode | number | -1 | File access permissions; -1 means use the operating system default
filename_failure | string | "_filepath_failures" | If the configured path is invalid, events are written to a file of this name inside the directory
flush_interval | number | 2 | Flush interval
gzip | boolean | false | Whether to gzip the output
id | | |
path | string | (none) | Required; the output file path, for example path => "./test-%{+YYYY-MM-dd}.txt"
workers | string | 1 | Number of workers (legacy option)
4.5.4 Kafka output plug-in

Used to output events to a Kafka topic.

(1) Configuration examples
output {
    kafka {
        bootstrap_servers => "localhost:9092"
        topic_id => "test"
        compression_type => "gzip"
    }
}
(2) Common configuration parameters (empty => same as above)
Parameter | Type | Default | Description
bootstrap_servers | string | | Kafka cluster address list, in the form host1:port1,host2:port2
topic_id | string | | The topic to which messages are produced
compression_type | string | none | Compression type for all data generated by the producer; one of none (default), gzip, snappy, or lz4
batch_size | number | 16384 | Controls the default batch size, in bytes
buffer_memory | number | 33554432 (32MB) | Total bytes of memory the producer can use to buffer records waiting to be sent to the server
max_request_size | number | 1048576 (1MB) | The maximum size of a request
4.6 Codec Plugin
4.6.1 JSON encoding plug-in

Lets you feed in data that is already in JSON format directly, so the filter/grok configuration can be omitted.

  • Configuration examples
json {
}
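For example, the codec can be attached to an input so that each received line is decoded as JSON without any grok parsing (the tcp input and port number are just an illustration):

input {
  tcp {
    port  => 12345
    codec => json { }     # decode each incoming line as a JSON object
  }
}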
  • Common Configuration Parameters
Parameter | Type | Default | Description
charset | string | "UTF-8" | The character set
enable_metric | | |
id | | |

Five Logstash Examples

5.1 Receiving Filebeat events and Sending them to Redis
input {
  beats {
    port => 5044
  }
}
output {
  redis {
    host => "127.0.0.1"
    port => 6379
    data_type => "list"
    key => "logstash-list"
  }
}
5.2 Reading data from Redis, branching on "type", processing each branch separately, and outputting to ES
input {
  redis {
    host => "127.0.0.1"
    port => 6379
    data_type => "list"
    key => "logstash-list"
  }
}
filter {
  if [type] == "application" {
    grok {
      match => ["message", "(?m)-(?<systemName>.+?) (?<logTime>(?>\d\d){1,2}-(?:0?[1-9]|1[0-2])-(?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9]) (?:2[0123]|[01]?[0-9]):(?:[0-5][0-9]):(?:(?:[0-5][0-9]|60)(?:[:.,][0-9]+)?)) \[(?<level>(\b\w+\b)) *\] (?<thread>(\b\w+\b)) \((?<point>.*?)\) - (?<content>.*)"]
    }
    date {
      match => ["logTime", "yyyy-MM-dd HH:mm:ss,SSS"]
    }
    json {
      source => "message"
    }
    date {
      match => ["timestamp", "yyyy-MM-dd HH:mm:ss,SSS"]
    }
  }
  if [type] == "application_bizz" {
    json {
      source => "message"
    }
    date {
      match => ["timestamp", "yyyy-MM-dd HH:mm:ss,SSS"]
    }
  }
  mutate {
    remove_field => ["@version", "beat", "logTime"]
  }
}
output {
  stdout { }
  elasticsearch {
    hosts => ["127.0.0.1:9200"]
    index => "filebeat-%{type}-%{+YYYY.MM.dd}"
    document_type => "%{documentType}"
    template_overwrite => true
  }
}

Six Application Scenarios

6.1 Using Logstash as the log collector

Architecture: Logstash captures and processes the logs, forwards them to Elasticsearch for storage, and the results are displayed in Kibana.

Features: because Logstash is deployed on every server, it consumes CPU and memory resources, so this approach suits servers with ample computing resources; otherwise it can easily degrade server performance and may even prevent the server from working properly.

6.2 Message Mode

Message mode: Beats did not support output to message queues (except in newer versions, 5.0 and above), so a Logstash instance is needed at both ends of the message queue. One Logstash collects data from the various data sources and, without any processing or transformation, sends it to a message queue (Kafka, Redis, RabbitMQ, etc.); another Logstash then takes the data from the message queue, transforms, analyzes, and filters it, and outputs it to Elasticsearch, where it is visualized in Kibana.

Architecture (the Logstash log-parsing nodes need sufficient server performance in every respect):

Pattern characteristics: this architecture is suitable for large log volumes. However, because the Logstash parsing nodes and Elasticsearch carry a heavy load, they can be deployed as clusters to share it. Introducing a message queue balances the network transmission, reducing the chance of network congestion and, in particular, of data loss. The problem of Logstash consuming too many system resources still exists, however.

Workflow: Filebeat collects → Logstash forwards to Kafka → Logstash processes and analyzes the data buffered in Kafka → output to ES → display in Kibana

6.3 Logstash (without Filebeat) collects files and outputs them to a Kafka cache; the Kafka data is then read and output to files or to ES
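
A sketch of the two pipelines this scenario involves; the paths, broker address, topic, and index names are placeholders:

# Shipper pipeline: tail log files and push raw events to Kafka
input {
  file {
    path => ["/var/log/app/*.log"]
    start_position => "beginning"
  }
}
output {
  kafka {
    bootstrap_servers => "kafka-01:9092"
    topic_id => "app-logs"
    codec => "json"
  }
}

# Indexer pipeline: consume from Kafka and write to Elasticsearch (or a file)
input {
  kafka {
    bootstrap_servers => "kafka-01:9092"
    topics => ["app-logs"]
    codec => "json"
  }
}
output {
  elasticsearch {
    hosts => ["127.0.0.1:9200"]
    index => "app-logs-%{+YYYY.MM.dd}"
  }
}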
6.4 Logstash synchronizes MySQL database data to ES (the JDBC plugin is bundled with Logstash 5 and later, so it can be used directly without downloading and installing it)
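
A minimal sketch of such a sync; the connection string, credentials, SQL statement, and index name are placeholders, and the path to the MySQL JDBC driver jar is an assumption:

input {
  jdbc {
    jdbc_driver_library => "/path/to/mysql-connector-java.jar"   # hypothetical driver location
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://127.0.0.1:3306/mydb"
    jdbc_user => "root"
    jdbc_password => "secret"
    schedule => "* * * * *"                                      # run the statement every minute
    statement => "SELECT * FROM orders WHERE updated_at > :sql_last_value"
  }
}
output {
  elasticsearch {
    hosts => ["127.0.0.1:9200"]
    index => "orders"
    document_id => "%{id}"     # assumes an id column; keeps repeated syncs idempotent
  }
}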

Seven Logstash and Flume

First of all, comparing their structures, you will find that the two are surprisingly similar! The Logstash Shipper, Broker, and Indexer correspond to Flume's Source, Channel, and Sink respectively. The difference is that Logstash is integrated and the Broker is not required, whereas Flume requires each part to be configured separately. This shows once again that design ideas in computing are universal; only the implementations differ.

From the programmer's point of view, as mentioned above, Flume really is tedious: you have to configure the source, channel, and sink by hand, and in a complex data collection environment you may have to do several configurations. With Logstash, the properties of the three parts are already defined; programmers simply choose what they need, and if something is missing they can develop a plugin, which is very convenient. Of course, Flume also has many plugins, but its channels come in essentially only two kinds: memory and file. As the reader can see, both are actually very flexible to configure; it simply depends on the scenario.

In fact, judging by their authors and historical background, the two were not designed for quite the same purpose. Flume was originally built to transport data into HDFS (it was not designed for log collection, which is a fundamental difference from Logstash), so it naturally focuses on data transmission: the programmer must be very clear about the routing of the entire data flow, and Flume has a stronger reliability story than Logstash. The channel mentioned above exists for persistence, and data is not deleted until it is confirmed to have been delivered to the next hop; this step is controlled by transactions, which makes reliability very good. Logstash, by contrast, clearly focuses on data preprocessing, because log fields usually need a lot of preprocessing to prepare them for parsing.

Why did I introduce Logstash first and Flume afterwards? There are several considerations:

  • First of all, Logstash is more like a general-purpose model, so it is easier for newcomers to understand, whereas Flume, being more lightweight, may be easier to grasp with some background in computer programming.

  • Secondly, in most cases Logstash is used more often. I have not counted the numbers myself, but in my experience Logstash can be used together with the other ELK components, so development and application are much simpler, the technology is mature, and the application scenarios are broad. Flume, on the other hand, needs to be combined with many other tools, so its scenarios are more targeted, not to mention that its configuration is rather complex.

In conclusion, we can understand their differences as follows:

Logstash is like a desktop computer you buy pre-assembled: the motherboard, power supply, hard disk, and case are all packed inside, and you can use it directly or modify it yourself.

Flume is like being given the complete set of parts, motherboard, power supply, hard disk; Flume does not assemble them for you, it just provides a manual that guides you through assembling them yourself to get things running.

Reference Documents:

  • [1] Stray siege lion. CSDN: blog.csdn.net/chenleiking… , 2017-06-22.
  • [2] Logstash website: www.elastic.co/cn/logstash