Logstash - Data Stream Engine

Author | WenasWei

One Logstash

Logstash is an open source data collection engine with real-time pipelining capabilities. Logstash dynamically unifies data from different sources and normalizes it into destinations of your choice, cleansing and democratizing all your data for a variety of advanced downstream analytics and visualization use cases.

1.1 Introduction to Logstash

Logstash is a data stream engine:

  • It is an open source streaming ETL (Extract-Transform-Load) engine for data logistics
  • It lets you build a data flow pipeline in a matter of minutes
  • It scales horizontally and is resilient, with adaptive buffering
  • It is agnostic about data sources
  • It has a plugin ecosystem with more than 200 integrations and processors
  • Deployments can be monitored and managed with the Elastic Stack

Logstash is an open source data collection engine with real-time pipelining capabilities. Simply put, Logstash is a pipeline with real-time data transmission capability: it is responsible for moving data from the input end of the pipeline to the output end, and it also lets you add filters in the middle according to your own needs. Logstash provides a variety of powerful filters to suit many different application scenarios.

1.2 Data Processing

Logstash is a powerful tool that can be integrated with a variety of deployments. It provides a number of plug-ins that help you parse, enrich, transform, and buffer data from a variety of sources. If your data requires additional processing that isn’t in Beats, add Logstash to your deployment.

Logstash works with the most popular data sources in use today.

Logstash takes in logs, files, metrics, and web data. After being processed by Logstash, it becomes usable data that can be consumed by web apps, stored in data stores, or transformed into other streaming data:

  • Logstash works easily with Beats, which is the recommended approach
  • Logstash also works with well-known cloud vendors’ services to process their data
  • It can also consume from message queues such as Redis or Kafka
  • Logstash can also use JDBC to access RDBMS data
  • It can also work with IoT devices to process their data
  • Logstash not only sends data to Elasticsearch; it can also send data to many other destinations and serve as their input source for further processing

Two Logstash system architecture

Logstash consists of three main parts: inputs, filters, and outputs.

The Logstash event processing pipeline has three stages: inputs → filters → outputs. Filters are optional, so a pipeline may also be simply inputs → outputs. A minimal pipeline sketch follows the list below.

  • Inputs: required; they generate events. Commonly used: file, syslog, redis, kafka, beats (e.g. Filebeat)
  • Filters: optional; they modify events as they pass through. Commonly used: grok, mutate, drop, clone, geoip
  • Outputs: required; they ship events elsewhere. Commonly used: elasticsearch, file, graphite, kafka, statsd
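
The following minimal configuration shows all three stages wired together; the plugin choices (beats, grok, elasticsearch) and the values are illustrative only, not a prescribed setup:

input {
  beats {
    port => 5044                    # receive events shipped by Filebeat
  }
}
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }   # parse unstructured text into fields
  }
}
output {
  elasticsearch {
    hosts => ["127.0.0.1:9200"]     # index the processed events into Elasticsearch
  }
}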

Three Logstash installation

3.1 Environment List
  • Operating system: Linux (Ubuntu, #56-Ubuntu SMP Tue Jun 4 22:49:08 UTC 2019, x86_64)
  • Logstash version: 6.2.4
  • JDK version: 1.8.0_152
3.2 Installing the JDK on Linux
3.2.1 Decompress the package and move it to a specified directory (/usr/local)
(1) Decompress the package
tar -zxvf jdk-8u152-linux-x64.tar.gz
(2) Create a directory
mkdir -p /usr/local/java
(3) Move the installation package
mv jdk1.8.0_152/ /usr/local/java/
(4) Set the owner
chown -R root:root /usr/local/java/
3.2.2 Configuring Environment Variables
(1) Configure system environment variables
vi /etc/environment
(2) Add the following statement
PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games"
export JAVA_HOME=/usr/local/java/jdk1.8.0_152
export JRE_HOME=/usr/local/java/jdk1.8.0_152/jre
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
(3) Configure user environment variables
nano /etc/profile
(4) Add the following statements (they must be placed in the middle of the file)
if [ "$PS1" ]; then
  if [ "$BASH" ] && [ "$BASH" != "/bin/sh" ]; then
    # The file bash.bashrc already sets the default PS1.
    # PS1='\h:\w\$ '
    if [ -f /etc/bash.bashrc ]; then
      . /etc/bash.bashrc
    fi
  else
    if [ "`id -u`" -eq 0 ]; then
      PS1='# '
    else
      PS1='$ '
    fi
  fi
fi

export JAVA_HOME=/usr/local/java/jdk1.8.0_152
export JRE_HOME=/usr/local/java/jdk1.8.0_152/jre
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH:$HOME/bin

if [ -d /etc/profile.d ]; then
  for i in /etc/profile.d/*.sh; do
    if [ -r $i ]; then
      . $i
    fi
  done
  unset i
fi
(5) User environment variables take effect
source /etc/profile
(6) Check whether the installation is successful
$ java -version
java version "1.8.0_152"
Java(TM) SE Runtime Environment (build 1.8.0_152-b16)
Java HotSpot(TM) 64-Bit Server VM (build 25.152-b16, mixed mode)
3.3 Installing Logstash
3.3.1 Creating an Installation Directory
$ sudo mkdir /usr/local/logstash
3.3.2 Downloading the Logstash installation file
$ wget -P /usr/local/logstash https://artifacts.elastic.co/downloads/logstash/logstash-6.2.4.tar.gz
3.3.3 Decompressing the installation file
$ cd /usr/local/logstash/
$ sudo tar -zxvf logstash-6.2.4.tar.gz
3.3.4 Verifying that the installation is successful

Test: a quick start that uses standard input and standard output as the input and output, with no filters.

$ cd logstash-6.2.4/
$ ./bin/logstash -e 'input { stdin {} } output { stdout {} }'
Sending Logstash's logs to /usr/local/logstash/logstash-6.2.4/logs which is now configured via log4j2.properties
[2021-05-27T00:22:30,979][INFO ][logstash.runner          ] Starting Logstash {"logstash.version"=>"6.2.4"}
[2021-05-27T00:22:31,821][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600}
[2021-05-27T00:22:36,463][INFO ][logstash.pipeline        ] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>1, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50}
[2021-05-27T00:22:36,690][INFO ][logstash.pipeline        ] Pipeline started successfully {:pipeline_id=>"main", :thread=>"#<Thread:0x55a5abea run>"}
The stdin plugin is now waiting for input:
hello world
{
    "@timestamp" => 2021-05-26T16:22:52.527Z,
          "host" => "*******",
       "message" => "hello world",
      "@version" => "1"
}

Four Logstash parameters and configuration

4.1 Common Startup Parameters
Parameter | Description | Example
-e | Start an instance with the configuration given directly on the command line | ./bin/logstash -e 'input {stdin {}} output {stdout {}}'
-f | Start an instance with the specified configuration file | ./bin/logstash -f config/test.conf
-t | Test the correctness of a configuration file | ./bin/logstash -f config/test.conf -t
-l | Specify the log file name | ./bin/logstash -f config/test.conf -l logs/test.log
-w | Specify the number of filter (pipeline worker) threads, default 5 | ./bin/logstash -f config/test.conf -w 8
4.2 Configuration File Structure and Syntax
(1) section

Logstash uses {} to define a section. Plugins are defined inside a section, and multiple plugins can be defined in the same section, as follows:

input {
    stdin {
    }
    beats {
        port => 5044
    }
}
(2) Data type

Logstash supports only a small number of data types:

  • Boolean: ssl_enable => true
  • Number: port => 33
  • String: name => “Hello world”
  • Comments: # this is a comment
(3) Field reference

The data flowing through Logstash is called an Event object. An Event has a JSON-like structure, and its properties are called fields. To reference a field in a configuration file, write the field name in square brackets [], for example [type]. For nested fields, chain the names, for example [tags][type]. In addition, for array fields Logstash supports indexing, including negative (reverse) indexes, for example [tags][type][0] and [tags][type][-1].
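
For example, fields can be referenced both in conditionals and, via the %{} sprintf syntax, inside option values (the field names below are hypothetical and only illustrate the syntax):

filter {
  if [type] == "system" {                                # reference a top-level field
    mutate {
      add_field => { "first_tag" => "%{[tags][0]}" }     # reference a nested/array element inside a string
    }
  }
}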

(4) Conditional judgment

Logstash supports the following operators:

  • Equality: ==, !=, <, >, <=, >=
  • Regexp: =~, !~
  • Inclusion: in, not in
  • Boolean: and, or, nand, xor
  • Unary: !

For example:

if EXPRESSION {
  ...
} else if EXPRESSION {
  ...
} else {
  ...
}
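A concrete example combining several of these operators (the field names and values are hypothetical):

filter {
  if [type] == "nginx-access" and [status] =~ /^5\d\d/ {
    mutate { add_tag => ["server_error"] }     # tag 5xx responses
  } else if "debug" in [tags] {
    drop { }                                   # discard debug events entirely
  }
}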
(5) Environment variable reference

Logstash supports referencing system environment variables. If the environment variables do not exist, you can set the default value, for example:

export TCP_PORT=12345

input {
  tcp {
    port => "${TCP_PORT:54321}"
  }
}
4.3 Input Plugin

Input plugins include the following types. For details, see the official website documentation – common input plugins:

  • elasticsearch
  • exec
  • file
  • github
  • http
  • jdbc
  • jms
  • jmx
  • kafka
  • log4j
  • rabbitmq
  • redis
  • tcp
  • udp
  • unix
  • websocket
4.3.1 File reading plug-in

The file reading plug-in is mainly used to capture changes to files and wrap the changed content into Events for further processing or transmission.

  • Configuration examples
input {
  file {
    path => ["/var/log/*.log", "/var/log/message"]
    type => "system"
    start_position => "beginning"
  }
}
  • Commonly used parameters
Parameter | Type | Default | Description
add_field | hash | {} | Add fields to the Event
close_older | number | 3600 | Close the watch on a file that has not been updated for this many seconds
codec | string | "plain" | Codec used to decode the data after it is read
delimiter | string | "\n" | Line delimiter for file content; by default an Event is created per line
discover_interval | number | 15 | How often, in seconds, to check the paths for newly created files
4.3.2 TCP Listening Plug-in

The TCP plug-in has two working modes, client and server, used respectively for sending network data and listening for network data.

  • Configuration examples
tcp {
    port => 41414
}
  • Common Parameters (empty => same as above)
Parameter | Type | Default | Description
add_field | | |
codec | | |
enable_metric | | |
host | | |
id | | |
mode | "server" or "client" | "server" | In "server" mode Logstash listens for connections from clients; in "client" mode it connects to a server
port | number | (none) | Required; in server mode the port to listen on, in client mode the port to connect to
proxy_protocol | boolean | false | Proxy protocol support; only v1 is supported at this time
ssl_cert | | |
ssl_enable | | |
ssl_extra_chain_certs | | |
ssl_key | | |
ssl_key_passphrase | | |
ssl_verify | | |
tags | | |
type | | |
4.3.3 Redis reading plug-in

Used to read data information cached in Redis.

  • Configuration examples
input {
  redis {
    host => "127.0.0.1"
    port => 6379
    data_type => "list"
    key => "logstash-list"
  }
}
  • Common Parameters (empty => same as above)
Parameter | Type | Default | Description
add_field | | |
batch_count | number | 125 | Batch size for the Redis batch feature; requires Redis 2.6.0 or later
codec | | |
data_type | "list", "channel" or "pattern_channel" | (none) | Required; depending on the setting, Logstash reads from Redis with a different command: BLPOP, SUBSCRIBE, or PSUBSCRIBE respectively
db | number | 0 | The Redis database number to use
enable_metric | | |
host | string | "127.0.0.1" | Redis service address
id | | |
key | string | (none) | Required; the name of the Redis list or channel
password | string | (none) | Redis password
port | number | 6379 | Redis port number
tags | | |
threads | number | 1 |
timeout | number | 5 | Redis connection timeout, in seconds
Note:

For data_type, note that "channel" and "pattern_channel" are broadcast types: the same data is delivered to every Logstash node that subscribes to the channel, so in a Logstash cluster every node receives the same (duplicate) data. In the single-node case, "pattern_channel" can subscribe simultaneously to multiple keys that match the pattern.

4.3.4 Kafka read plug-in

Used to read data pushed to topics in Kafka.

  • Configuration examples
input {
  kafka {
    bootstrap_servers => "kafka-01:9092,kafka-02:9092,kafka-03:9092"
    topics_pattern => "elk-.*"
    consumer_threads => 5
    decorate_events => true
    codec => "json"
    auto_offset_reset => "latest"
    group_id => "logstash1"     # must be identical across a Logstash cluster
  }
}
  • Common parameters:
Parameter | Type | Default | Description
bootstrap_servers | string | localhost:9092 | Kafka broker list used to establish the initial connection to the cluster
topics_pattern | string | | Regular expression of topics to subscribe to; when this is used, the topics setting is ignored
consumer_threads | number | | Number of consumer threads; ideally the same as the number of partitions
decorate_events | string | none | Accepted values: none / basic / extended / false / true
codec | codec | plain | Codec for the input data
auto_offset_reset | string | | What to do when there is no initial offset in Kafka (see the note below)
group_id | string | logstash | Identifier of the consumer group this consumer belongs to

Note:

  • auto_offset_reset: earliest - automatically reset the offset to the earliest offset; latest - automatically reset the offset to the latest offset; none - throw an exception to the consumer if no previous offset is found for the consumer's group; anything else - throw an exception to the consumer.
  • decorate_events: none - no metadata is added; basic - record attributes are added; extended - record attributes plus record headers are added; false - deprecated alias for none; true - deprecated alias for basic.
4.4 Common Filter Plugin

The rich set of filter plug-ins is an important reason for the power of Logstash. Filter plug-ins process the event information flowing through the current Logstash instance: you can add fields, remove fields, convert field types, slice data with regular expressions, and so on, and you can also apply different processing to different data based on conditionals. For details, see the official documentation - common filter plugins.

4.4.1 Grok re capture

Grok is the best Logstash tool for parsing unstructured data into structured, queryable data. It is great for parsing syslog, Apache, MySQL, and other web logs.

(1) Predefined expression call:
  • Logstash ships with about 120 predefined regular expressions; you can call them by name with the syntax %{SYNTAX:SEMANTIC}
  • SYNTAX: the name of the predefined pattern
  • SEMANTIC: the name given to the content matched from the Event

For example, if the Event content is [debug] 127.0.0.1 - test log content, matching %{IP:client} yields client: 127.0.0.1. If you want to convert the data type while capturing, use for example %{NUMBER:num:int}; by default everything is returned as a string, and the only conversions currently supported by Logstash are "int" and "float".

A slightly more complete example:

  • Log file content: 55.3.244.1 GET /index.html 15824 0.043
  • Expression: %{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}
  • Configuration file contents:
input {
  file {
    path => "/var/log/http.log"
  }
}
filter {
  grok {
    match => {"message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}"}
  }
}
  • Output result:
client: 55.3.244.1
method: GET
request: /index.html
bytes: 15824
duration: 0.043
(2) Custom expression call

Syntax: (?<field_name>the pattern here)

Example: to capture a 10- or 11-character hexadecimal queue_id, use the expression (?<queue_id>[0-9a-f]{10,11}).

Installing a custom expression

Just like predefined expressions, you can add your custom expressions to Logstash and then use them as if they were predefined. The steps are as follows:

  • 1. Create a folder named "patterns" under the Logstash root directory. In the "patterns" folder, create a file named "extra".

  • 2. In the file "extra", add expressions in the format patternName regexp, separated by a space, for example:

# contents of ./patterns/postfix:
POSTFIX_QUEUEID [0-9a-f]{10,11}
  • 3. To use a custom expression, set the "patterns_dir" option to the directory where the expression files reside, as in the following example:

<1> Log content

Jan  1 06:25:43 mailserver14 postfix/cleanup[21403]: BEF25A72965: message-id=<[email protected]>

<2> Logstash configuration

filter {
  grok {
    patterns_dir => ["./patterns"]
    match => { "message" => "%{SYSLOGBASE} %{POSTFIX_QUEUEID:queue_id}: %{GREEDYDATA:syslog_message}" }
  }
}

<3> Running results

timestamp: Jan 1 06:25:43
logsource: mailserver14
program: postfix/cleanup
pid: 21403
queue_id: BEF25A72965
(3) Common grok configuration parameters (empty => same as above)
Parameter | Type | Default | Description
add_field | | |
add_tag | | |
break_on_match | boolean | true | If the match option contains multiple patterns, stop after the first successful match; set to false to try all patterns
enable_metric | | |
id | | |
keep_empty_captures | boolean | false | If true, keep empty captures as event fields
match | array | {} | The patterns to match, e.g. match => { "message" => ["Duration: %{NUMBER:duration}", "Speed: %{NUMBER:speed}"] }
named_captures_only | boolean | true | If true, only store named captures from grok
overwrite | array | [] | Fields to overwrite with the captured value, e.g. with match => { "message" => "%{SYSLOGBASE} %{DATA:message}" } use overwrite => ["message"]
patterns_dir | array | [] | Custom directories containing pattern files; at startup Logstash loads every file that matches patterns_files_glob
patterns_files_glob | string | "*" | Glob used to select the files inside patterns_dir
periodic_flush | boolean | false | Call the filter's flush method periodically
remove_field | array | [] | Remove fields from the Event, e.g. remove_field => ["foo_%{somefield}"]
remove_tag | array | [] | Remove tags, e.g. remove_tag => ["foo_%{somefield}"]
tag_on_failure | array | ["_grokparsefailure"] | Values appended to the tags field when nothing matches
tag_on_timeout | string | "_groktimeout" | Value appended to the tags field when a match times out
timeout_millis | number | 30000 | Timeout for a single match, in milliseconds; 0 disables the timeout
4.4.2 Date time processing plug-in

The date plugin is used to parse and format time fields, for example a value such as "Apr 17 09:32:01" with the format MMM dd HH:mm:ss. Logstash automatically stamps every Event with a timestamp, but that timestamp is the Event's processing time (essentially the time the input received the data), which can deviate from the time recorded in the log (mainly because of buffering). This plugin can be used to replace the default timestamp with the time the log entry actually occurred.
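
A typical usage sketch (the logdate field name and the time zone are assumptions for illustration):

filter {
  date {
    match    => ["logdate", "MMM dd HH:mm:ss", "ISO8601"]   # try each format in turn
    target   => "@timestamp"                                # overwrite the default event timestamp
    timezone => "Asia/Shanghai"                             # zone the source logs are assumed to use
  }
}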

Common Configuration Parameters (empty => same as above)
Parameter | Type | Default | Description
add_field | | |
add_tag | | |
enable_metric | | |
id | | |
locale | | |
match | array | [] | Time-field matching; multiple formats can be specified and are tried until one matches or the list is exhausted
periodic_flush | | |
remove_field | | |
remove_tag | | |
tag_on_failure | | |
target | string | "@timestamp" | The field in which to store the matched and parsed date; by default "@timestamp" is overwritten
timezone | string | (none) | The time zone used when parsing the time

Note:

match format: time-field matching; multiple formats can be defined and are tried until one matches or the list is exhausted. Format: [field, formats...], for example: match => ["logDate", "MMM dd yyyy HH:mm:ss", "MMM d yyyy HH:mm:ss", "ISO8601"]

4.4.3 Mutate data modification plug-in

The mutate plugin is another important Logstash plugin. It provides rich capabilities for handling data of basic types: you can rename, remove, replace, and modify fields in events.
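
A sketch that combines several mutate operations (all field names here are hypothetical):

filter {
  mutate {
    rename       => { "HOSTORIP" => "client_ip" }   # rename a field
    convert      => { "duration" => "float" }       # change a field's type
    gsub         => ["path", "/", "_"]              # replace every "/" in the path field with "_"
    strip        => ["message"]                     # trim surrounding whitespace
    remove_field => ["@version"]                    # drop a field entirely
  }
}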

Common Configuration Parameters (empty => same as above)
Parameter | Type | Default | Description
add_field | | |
add_tag | | |
convert | hash | (none) | Convert a field to the specified type; if the field is an array, all elements are converted; hashes are left untouched. Example: convert => { "fieldname" => "integer" }
enable_metric | | |
gsub | array | (none) | Replace by regular expression, given as [field, pattern, replacement, ...]. Example: gsub => ["fieldname", "/", "_", "fieldname2", "[\\?#-]", "."] replaces every "/" in fieldname with "_" and every "\", "?", "#" and "-" in fieldname2 with "."
id | | |
join | hash | (none) | Join the elements of an array field with the given separator; has no effect on non-array fields. Example, join the elements of fieldname with ",": join => { "fieldname" => "," }
lowercase | array | (none) | Convert the values of the given fields to lowercase
merge | hash | (none) | Merge two array or hash fields (strings are automatically converted to single-element arrays); an array cannot be merged with a hash. Example, merge added_field into dest_field: merge => { "dest_field" => "added_field" }
periodic_flush | | |
remove_field | | |
remove_tag | | |
rename | hash | (none) | Rename one or more fields. Example, rename HOSTORIP to client_ip: rename => { "HOSTORIP" => "client_ip" }
replace | hash | (none) | Replace the entire value of a field with a new value; variable references are supported. Example, set "message" to the value of "source_host" followed by ": My new message": replace => { "message" => "%{source_host}: My new message" }
split | hash | (none) | Split a string field into an array using the given separator; only works on string fields. Example, split fieldname on ",": split => { "fieldname" => "," }
strip | array | (none) | Remove whitespace at both ends of the field value. Example: strip => ["field1", "field2"]
update | hash | (none) | Update the value of an existing field. Example, set the sample field to "My new message": update => { "sample" => "My new message" }
uppercase | array | (none) | Convert strings to uppercase
4.4.4 JSON plug-in

The json plugin is used to decode JSON-formatted strings. Typically, among a pile of log messages, some are in JSON format and some are not.

(1) Configuration examples
json {
    source => ...
}
  • In the example configuration, message is a string in JSON format: "{\"uid\":3081609001,\"type\":\"signal\"}"
filter {
    json {
        source => "message"
        target => "jsoncontent"
    }
}
  • Output result:
{
  "@version": "1",
  "@timestamp": "2014-11-18T08:00:00.000Z",
  "host": "web121.mweibo.tc.sinanode.com",
  "message": "{\"uid\":3081609001,\"type\":\"signal\"}",
  "jsoncontent": {
    "uid": 3081609001,
    "type": "signal"
  }
}
  • If target is removed from the example configuration, the output is as follows:
{
  "@version": "1",
  "@timestamp": "2014-11-18T08:00:00.000Z",
  "host": "web121.mweibo.tc.sinanode.com",
  "message": "{\"uid\":3081609001,\"type\":\"signal\"}",
  "uid": 3081609001,
  "type": "signal"
}
(2) Common configuration parameters (empty => same as above)
Parameter | Type | Default | Description
add_field | | |
add_tag | | |
enable_metric | | |
id | | |
periodic_flush | | |
remove_field | | |
remove_tag | | |
skip_on_invalid_json | boolean | false | Whether to silently skip events whose JSON fails to parse
source | string | (none) | Required; the field containing the JSON string to decode
tag_on_failure | | |
target | string | (none) | The field in which to store the parsed JSON object; if not set, all fields of the JSON object are placed at the root of the event
4.4.5 ElasticSearch Query filtering plug-in

This plugin is used to query Elasticsearch and apply the query results to the current event.
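
A sketch of a typical use: look up a related event in Elasticsearch and copy one of its fields into the current event (the hosts, index, query, and field names here are assumptions for illustration):

filter {
  elasticsearch {
    hosts  => ["127.0.0.1:9200"]
    index  => "app-logs-*"
    query  => "type:start AND operation:%{[opid]}"   # find the matching "start" event
    fields => { "@timestamp" => "started" }          # copy its @timestamp into the field "started"
  }
}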

Common Configuration Parameters (empty => same as above)
Parameter | Type | Default | Description
add_field | | |
add_tag | | |
ca_file | string | (none) | Path to an SSL Certificate Authority file
enable_sort | boolean | true | Whether to sort the results
fields | array | {} | Fields to copy from the old event (the one found in Elasticsearch) into the new event
hosts | array | ["localhost:9200"] | List of Elasticsearch hosts
index | string | "" | Comma-separated list of Elasticsearch indexes to query; use "_all" or "" to search all indexes
password | string | (none) | Password
periodic_flush | | |
query | string | (none) | The Elasticsearch query string
remove_field | | |
remove_tag | | |
result_size | number | 1 | Number of results to return from the query
sort | string | "@timestamp:desc" | Comma-separated list of field:direction pairs used to sort the results
ssl | boolean | false | Whether to use SSL
tag_on_failure | | |
user | string | (none) | Username
4.5 Output Plugin
4.5.1 ElasticSearch output plugin

Write event information to Elasticsearch

(1) Configuration examples
output {
  elasticsearch {
    hosts => ["127.0.0.1:9200"]
    index => "filebeat-%{type}-%{+YYYY.MM.dd}"
    template_overwrite => true
  }
}
(2) Common configuration parameters (empty => same as above)
Parameter | Type | Default | Description
absolute_healthcheck_path | boolean | false | When healthcheck_path is configured, controls whether it is treated as an absolute path. For example, if Elasticsearch is reached at "http://localhost:9200/es" and healthcheck_path is "/health", the health check URL is "http://localhost:9200/health" when this parameter is true, and "http://localhost:9200/es/health" when it is false
absolute_sniffing_path | boolean | false | When sniffing_path is configured, controls whether it is treated as an absolute path. For example, if Elasticsearch is reached at "http://localhost:9200/es" and sniffing_path is "/_sniffing", the sniffing URL is "http://localhost:9200/_sniffing" when this parameter is true, and "http://localhost:9200/es/_sniffing" when it is false
action | string | "index" | The Elasticsearch operation: "index" indexes the Logstash event into Elasticsearch; "delete" deletes a document by id (document_id must be specified); "create" indexes a document, failing if a document with that id already exists; "update" updates a document by id
cacert | string | (none) | Path to a .cer or .pem certificate file used to validate the Elasticsearch server certificate
codec | | |
doc_as_upsert | boolean | false | Enable upsert mode for the update action: if the document does not exist, a new one is created
document_id | string | (none) | The Elasticsearch document id, used to overwrite a document already stored in Elasticsearch
4.5.2 Redis output plug-in

Used to write events into Redis as a cache. Normally, Logstash filter processing consumes system resources and complex filtering is time-consuming, so if events are generated faster than they can be processed, Redis can serve as a buffer.

(1) Configuration examples
output {
  redis {
    host => "127.0.0.1"
    port => 6379
    data_type => "list"
    key => "logstash-list"
  }
}
(2) Common configuration parameters (empty => same as above)
Parameter | Type | Default | Description
batch | boolean | false | Whether to use the Redis batch mode; only valid when data_type is "list"
batch_events | number | 50 | Batch size; an RPUSH is issued when the batch reaches this size
batch_timeout | number | 5 | Batch timeout; an RPUSH is issued when this much time has passed
codec | | |
congestion_interval | number | 1 | How often, in seconds, to check for congestion; if set to 0, every Event is checked
congestion_threshold | number | 0 |
data_type | "list" or "channel" | (none) | How data is stored in Redis: RPUSH for "list", PUBLISH for "channel"
db | number | 0 | The Redis database number to use
enable_metric | | |
host | array | ["127.0.0.1"] | List of Redis hosts; if several are configured, one is chosen at random, and if the current one becomes unavailable the next is chosen
id | | |
key | string | (none) | The name of the Redis list or channel; dynamic names are supported, for example logstash-%{type}
password | string | (none) | Password of the Redis service
port | number | 6379 | Port the Redis service listens on
reconnect_interval | number | 1 | Interval between reconnection attempts after a failed connection
shuffle_hosts | boolean | true | Shuffle the host list during Logstash startup
timeout | number | 5 | Redis connection timeout
workers | number | 1 | Number of workers (legacy option)
4.5.3 File output plug-in

Used to write events to a file.

(1) Configuration examples
output {
    file {
        path => ...
        codec => line { format => "custom format: %{message}"}
    }
}
(2) Common configuration parameters (empty => same as above)
Parameter | Type | Default | Description
codec | | |
create_if_deleted | boolean | true | If the target file has been deleted, create a new one when the next event is written
dir_mode | number | -1 | Directory access permissions; -1 means use the operating system default
enable_metric | | |
file_mode | number | -1 | File access permissions; -1 means use the operating system default
filename_failure | string | "_filepath_failures" | If the configured path is invalid, events are written to a file of this name inside the directory
flush_interval | number | 2 | Flush interval
gzip | boolean | false | Whether to gzip the output
id | | |
path | string | (none) | Required; the output file path, for example path => "./test-%{+YYYY-MM-dd}.txt"
workers | string | 1 | Number of workers (legacy option)
4.5.4 Kafka output plug-in

Used to output events to a Kafka topic.

(1) Configuration examples
output {
    kafka {
        bootstrap_servers => "localhost:9092"
        topic_id => "test"
        compression_type => "gzip"
    }
}
(2) Common configuration parameters (empty => same as above)
Parameter | Type | Default | Description
bootstrap_servers | string | | Kafka cluster address list, in the form host1:port1,host2:port2
topic_id | string | | The topic to which messages are produced
compression_type | string | none | Compression type for all data generated by the producer; one of none (default), gzip, snappy, or lz4
batch_size | number | 16384 | Controls the default batch size, in bytes
buffer_memory | number | 33554432 (32MB) | Total bytes of memory the producer can use to buffer records waiting to be sent to the server
max_request_size | number | 1048576 (1MB) | The maximum size of a request
4.6 Codec Plugin
4.6.1 JSON encoding plug-in

Lets you feed in data that is already in JSON format directly, so the filter/grok configuration can be omitted.

  • Configuration examples
json {
}
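For example, the codec can be attached to an input so that each received line is decoded as JSON without any grok parsing (the tcp input and port number are just an illustration):

input {
  tcp {
    port  => 12345
    codec => json { }     # decode each incoming line as a JSON object
  }
}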
  • Common Configuration Parameters
Parameter | Type | Default | Description
charset | string | "UTF-8" | The character set
enable_metric | | |
id | | |

Five Logstash Examples

5.1 Receiving Filebeat events and Sending them to Redis
input {
  beats {
    port => 5044
  }
}
output {
  redis {
    host => "127.0.0.1"
    port => 6379
    data_type => "list"
    key => "logstash-list"
  }
}
5.2 Reading data from Redis, branching on "type", processing each branch separately, and outputting to ES
input {
  redis {
    host => "127.0.0.1"
    port => 6379
    data_type => "list"
    key => "logstash-list"
  }
}
filter {
  if [type] == "application" {
    grok {
      match => ["message", "(?m)-(?<systemName>.+?) (?<logTime>(?>\d\d){1,2}-(?:0?[1-9]|1[0-2])-(?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9]) (?:2[0123]|[01]?[0-9]):(?:[0-5][0-9]):(?:(?:[0-5][0-9]|60)(?:[:.,][0-9]+)?)) \[(?<level>(\b\w+\b)) *\] (?<thread>(\b\w+\b)) \((?<point>.*?)\) - (?<content>.*)"]
    }
    date {
      match => ["logTime", "yyyy-MM-dd HH:mm:ss,SSS"]
    }
    json {
      source => "message"
    }
    date {
      match => ["timestamp", "yyyy-MM-dd HH:mm:ss,SSS"]
    }
  }
  if [type] == "application_bizz" {
    json {
      source => "message"
    }
    date {
      match => ["timestamp", "yyyy-MM-dd HH:mm:ss,SSS"]
    }
  }
  mutate {
    remove_field => ["@version", "beat", "logTime"]
  }
}
output {
  stdout { }
  elasticsearch {
    hosts => ["127.0.0.1:9200"]
    index => "filebeat-%{type}-%{+YYYY.MM.dd}"
    document_type => "%{documentType}"
    template_overwrite => true
  }
}

Six Application Scenarios

6.1 Using Logstash as the log collector

Architecture: Logstash captures and processes the logs, forwards them to Elasticsearch for storage, and the results are displayed in Kibana.

Features: because Logstash is deployed on every server, it consumes CPU and memory resources, so this approach suits servers with ample computing resources; otherwise it can easily degrade server performance and may even prevent the server from working properly.

6.2 Message Mode

Message mode: Beats did not support output to message queues (except in newer versions, 5.0 and above), so a Logstash instance is needed at both ends of the message queue. One Logstash collects data from the various data sources and, without any processing or transformation, sends it to a message queue (Kafka, Redis, RabbitMQ, etc.); another Logstash then takes the data from the message queue, transforms, analyzes, and filters it, and outputs it to Elasticsearch, where it is visualized in Kibana.

Architecture (the Logstash log-parsing nodes need sufficient server performance in every respect):

Pattern characteristics: this architecture is suitable for large log volumes. However, because the Logstash parsing nodes and Elasticsearch carry a heavy load, they can be deployed as clusters to share it. Introducing a message queue balances the network transmission, reducing the chance of network congestion and, in particular, of data loss. The problem of Logstash consuming too many system resources still exists, however.

Workflow: Filebeat collects → Logstash forwards to Kafka → Logstash processes and analyzes the data buffered in Kafka → output to ES → display in Kibana

6.3 Logstash (without Filebeat) collects files and outputs them to a Kafka cache; the Kafka data is then read and output to files or to ES
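
A sketch of the two pipelines this scenario involves; the paths, broker address, topic, and index names are placeholders:

# Shipper pipeline: tail log files and push raw events to Kafka
input {
  file {
    path => ["/var/log/app/*.log"]
    start_position => "beginning"
  }
}
output {
  kafka {
    bootstrap_servers => "kafka-01:9092"
    topic_id => "app-logs"
    codec => "json"
  }
}

# Indexer pipeline: consume from Kafka and write to Elasticsearch (or a file)
input {
  kafka {
    bootstrap_servers => "kafka-01:9092"
    topics => ["app-logs"]
    codec => "json"
  }
}
output {
  elasticsearch {
    hosts => ["127.0.0.1:9200"]
    index => "app-logs-%{+YYYY.MM.dd}"
  }
}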
6.4 Logstash synchronizes MySQL database data to ES (the JDBC plugin is bundled with Logstash 5 and later, so it can be used directly without downloading and installing it)
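
A minimal sketch of such a sync; the connection string, credentials, SQL statement, and index name are placeholders, and the path to the MySQL JDBC driver jar is an assumption:

input {
  jdbc {
    jdbc_driver_library => "/path/to/mysql-connector-java.jar"   # hypothetical driver location
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://127.0.0.1:3306/mydb"
    jdbc_user => "root"
    jdbc_password => "secret"
    schedule => "* * * * *"                                      # run the statement every minute
    statement => "SELECT * FROM orders WHERE updated_at > :sql_last_value"
  }
}
output {
  elasticsearch {
    hosts => ["127.0.0.1:9200"]
    index => "orders"
    document_id => "%{id}"     # assumes an id column; keeps repeated syncs idempotent
  }
}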

Seven Logstash and Flume

First of all, comparing their structures, you will find that the two are surprisingly similar! The Logstash Shipper, Broker, and Indexer correspond to Flume's Source, Channel, and Sink respectively. The difference is that Logstash is integrated and the Broker is not required, whereas Flume requires each part to be configured separately. This shows once again that design ideas in computing are universal; only the implementations differ.

From the programmer's point of view, as mentioned above, Flume really is tedious: you have to configure the source, channel, and sink by hand, and in a complex data collection environment you may have to do several configurations. With Logstash, the properties of the three parts are already defined; programmers simply choose what they need, and if something is missing they can develop a plugin, which is very convenient. Of course, Flume also has many plugins, but its channels come in essentially only two kinds: memory and file. As the reader can see, both are actually very flexible to configure; it simply depends on the scenario.

In fact, judging by their authors and historical background, the two were not designed for quite the same purpose. Flume was originally built to transport data into HDFS (it was not designed for log collection, which is a fundamental difference from Logstash), so it naturally focuses on data transmission: the programmer must be very clear about the routing of the entire data flow, and Flume has a stronger reliability story than Logstash. The channel mentioned above exists for persistence, and data is not deleted until it is confirmed to have been delivered to the next hop; this step is controlled by transactions, which makes reliability very good. Logstash, by contrast, clearly focuses on data preprocessing, because log fields usually need a lot of preprocessing to prepare them for parsing.

Why did I introduce Logstash first and Flume afterwards? There are several considerations:

  • First of all, Logstash is more like a general-purpose model, so it is easier for newcomers to understand, whereas Flume, being more lightweight, may be easier to grasp with some background in computer programming.

  • Secondly, in most cases Logstash is used more often. I have not counted the numbers myself, but in my experience Logstash can be used together with the other ELK components, so development and application are much simpler, the technology is mature, and the application scenarios are broad. Flume, on the other hand, needs to be combined with many other tools, so its scenarios are more targeted, not to mention that its configuration is rather complex.

In conclusion, we can understand their differences as follows:

Logstash is like a desktop computer you buy pre-assembled: the motherboard, power supply, hard disk, and case are all packed inside, and you can use it directly or modify it yourself.

Flume is like being given the complete set of parts, motherboard, power supply, hard disk; Flume does not assemble them for you, it just provides a manual that guides you through assembling them yourself to get things running.

Reference Documents:

  • [1] Stray siege lion. CSDN: blog.csdn.net/chenleiking… , 2017-06-22.
  • [2] Logstash website: www.elastic.co/cn/logstash