ELK Usage Tips (Issue 2)

ELK Tips is a collection of tips for using ELK, gathered from the Elastic Chinese community.

I. Logstash

1. Filebeat: Non-zero metrics in the last 30s

Problem: Filebeat cannot send log data to Elasticsearch.
Error message: INFO [monitoring] log/log.go:124 Non-zero metrics in the last 30s
Solution: add the enabled: true attribute under both the input and the output.

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/*.log

output.elasticsearch:
  hosts: ["https://localhost:9200"]
  username: "filebeat_internal"
  password: "YOUR_PASSWORD"
  enabled: true

For both inputs and outputs, the enabled attribute defaults to true when it is not set.

2. Logstash generates monthly indexes

output {
    if [type] == "typeA" {
        elasticsearch {
            hosts => "127.0.0.1:9200"
            index => "log_%{+YYYY_MM}"
        }
    }
}

For comparison, the daily date pattern is %{+YYYY.MM.dd}.
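To see what such date patterns expand to, here is a minimal Java sketch that mimics the substitution for one event timestamp. The class and method names are hypothetical. It uses java.time pattern letters (yyyy, MM, dd) in place of the Logstash pattern letters, and it assumes the timestamp is formatted in UTC.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class IndexNameSketch {
    // Builds an index name from a prefix, a java.time date pattern,
    // and an ISO-8601 timestamp, formatting the date in UTC.
    public static String indexName(String prefix, String pattern, String isoTimestamp) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern(pattern).withZone(ZoneOffset.UTC);
        return prefix + fmt.format(Instant.parse(isoTimestamp));
    }

    public static void main(String[] args) {
        String ts = "2018-11-05T10:15:30Z";
        System.out.println(indexName("log_", "yyyy_MM", ts));    // monthly: log_2018_11
        System.out.println(indexName("log_", "yyyy.MM.dd", ts)); // daily:   log_2018.11.05
    }
}
```

All events from the same month land in the same index with the monthly pattern, while the daily pattern creates one index per day.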

3. Configuring Filebeat to delete specific fields

Filebeat provides a feature similar to Logstash filters, called processors. There are only a few kinds of processors: they cover the most common needs while keeping Filebeat lightweight.

Here are some common processors:

add_cloud_metadata: Adds metadata about the cloud server.
add_locale: Adds the local time zone.
decode_json_fields: Parses fields that contain JSON strings.
drop_event: Drops events that match the conditions.
drop_fields: Deletes the fields that match the conditions.
include_fields: Keeps only the fields that match the conditions.
rename: Renames fields.
add_kubernetes_metadata: Adds Kubernetes metadata.
add_docker_metadata: Adds metadata about the container.
add_host_metadata: Adds metadata about the host operating system.
dissect: Extracts fields from a string, similar to grok but based on delimiters rather than regular expressions.
dns: Performs reverse DNS lookups of IP addresses.
add_process_metadata: Adds metadata about a process.

Processors are configured as follows:

- type: <input_type>
  processors:
  - <processor_name>:
      when:
        <condition>
      <parameters>
...

4. Collecting FTP log files with Logstash

input {
    exec {
        codec => plain { }
        # Download the log file over FTP with curl
        command => "curl ftp://server/logs.log"
        # Run the command every 3000 seconds
        interval => 3000
    }
}

5. Logstash fails to start under docker-compose (Permission denied)

Setting the user option in docker-compose.yml so that the container starts as root resolves the permission problem:

$ cat docker-compose.yml

version: '2'
services:
  logstash:
    image: docker.elastic.co/logstash/logstash:6.4.2
    user: root
    command: id

6. Metricize Filter Plugin

The metricize filter splits one event that carries several metrics into multiple events, one per metric.

# Original event
{
    type => "type A"
    metric1 => "value1"
    metric2 => "value2"
}

# Configuration
filter {
  metricize {
    metrics => [ "metric1", "metric2" ]
  }
}

# Final output
{                               {
    type => "type A"                type => "type A"
    metric => "metric1"             metric => "metric2"
    value => "value1"               value => "value2"
}                               }
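The splitting behaviour can be sketched in plain Java. This is only an illustration of the idea, not the plugin's implementation: the class name is hypothetical, events are modelled as maps, and metrics missing from the event are simply skipped (an assumption).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MetricizeSketch {
    // Produces one event per listed metric. Fields that are not metrics are
    // copied into every output event; each metric becomes a metric/value pair.
    public static List<Map<String, Object>> metricize(Map<String, Object> event, List<String> metrics) {
        Map<String, Object> base = new LinkedHashMap<>(event);
        base.keySet().removeAll(metrics);
        List<Map<String, Object>> out = new ArrayList<>();
        for (String name : metrics) {
            if (!event.containsKey(name)) {
                continue; // assumption: metrics absent from the event are skipped
            }
            Map<String, Object> single = new LinkedHashMap<>(base);
            single.put("metric", name);
            single.put("value", event.get(name));
            out.add(single);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> event = new LinkedHashMap<>();
        event.put("type", "type A");
        event.put("metric1", "value1");
        event.put("metric2", "value2");
        for (Map<String, Object> e : metricize(event, Arrays.asList("metric1", "metric2"))) {
            System.out.println(e);
        }
        // {type=type A, metric=metric1, value=value1}
        // {type=type A, metric=metric2, value=value2}
    }
}
```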

II. Elasticsearch

1. Internal structure of ES inverted index

Lucene's inverted index stores document information per field. If both docName and docContent contain the term "apple", there are two separate posting lists, one for each field, as shown below:

docName:    "apple" -> doc1, doc2, doc3, ...
docContent: "apple" -> doc2, doc4, doc6, ...
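A minimal Java sketch of this per-field layout follows. It is a toy model under simplifying assumptions (whitespace tokenization, lowercasing, in-memory maps), with hypothetical class and method names; Lucene's real on-disk structures are far more elaborate.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

public class FieldInvertedIndexSketch {
    // field -> term -> sorted doc ids (a simplified posting list)
    private final Map<String, Map<String, SortedSet<Integer>>> index = new HashMap<>();

    // Tokenizes the text on whitespace and records the doc id under each term of the field.
    public void add(int docId, String field, String text) {
        for (String term : text.toLowerCase().split("\\s+")) {
            if (term.isEmpty()) {
                continue;
            }
            index.computeIfAbsent(field, f -> new HashMap<>())
                 .computeIfAbsent(term, t -> new TreeSet<>())
                 .add(docId);
        }
    }

    // Returns the posting list for a term within one field (empty if absent).
    public SortedSet<Integer> postings(String field, String term) {
        Map<String, SortedSet<Integer>> terms = index.get(field);
        if (terms == null) {
            return new TreeSet<Integer>();
        }
        SortedSet<Integer> docs = terms.get(term);
        return docs == null ? new TreeSet<Integer>() : docs;
    }

    public static void main(String[] args) {
        FieldInvertedIndexSketch idx = new FieldInvertedIndexSketch();
        idx.add(1, "docName", "apple pie");
        idx.add(2, "docName", "Apple");
        idx.add(2, "docContent", "apple juice");
        idx.add(4, "docContent", "green apple");
        System.out.println(idx.postings("docName", "apple"));    // [1, 2]
        System.out.println(idx.postings("docContent", "apple")); // [2, 4]
    }
}
```

The same term is looked up independently in each field, which is why a query on docName never touches the posting list of docContent.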

2. Jest or RestHighLevelClient?

RestHighLevelClient is the official client. It will continue to be officially supported and is updated along with ES, so the official high-level API is the recommended choice.

Jest is maintained by the community, so its updates lag behind. At the time of writing, the latest version supports ES 6.3.1, and only four issues were raised in the past month, which indicates low overall activity, so it is not recommended.

In addition, a Chinese translation of the TransportClient user manual is recommended; it is well translated: github.com/jackiehff/e…

3. Duplicate data when paging with From/Size on a single ES shard

Normally, paging with From/Size against a single ES shard does not return duplicate data. Possible causes of duplicates are:

No sort is specified;
Sorting is by score, but every clause in the query is a filter condition, so all scores are identical;
A sort is specified, but documents are being added, modified, or deleted in the index while paging.

For indexes with multiple shards, adding the preference parameter is recommended to keep paging results consistent.
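The equal-score case can be illustrated with a small Java simulation. This is not ES code: the class, the Doc type, and the page method are hypothetical, and the point is only that a unique tie-breaker (here the document id) makes the order, and therefore each page, deterministic regardless of the order in which documents are visited.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StablePagingSketch {
    public static final class Doc {
        public final String id;
        public final double score;

        public Doc(String id, double score) {
            this.id = id;
            this.score = score;
        }
    }

    // Sorts by score (highest first), breaks ties by id, then cuts out one From/Size page.
    public static List<String> page(List<Doc> docs, int from, int size) {
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort((a, b) -> {
            int byScore = Double.compare(b.score, a.score);
            return byScore != 0 ? byScore : a.id.compareTo(b.id); // tie-breaker
        });
        List<String> ids = new ArrayList<>();
        for (int i = from; i < Math.min(from + size, sorted.size()); i++) {
            ids.add(sorted.get(i).id);
        }
        return ids;
    }

    public static void main(String[] args) {
        // A filter-only query gives every hit the same score.
        List<Doc> firstVisit = Arrays.asList(new Doc("c", 1.0), new Doc("a", 1.0), new Doc("d", 1.0), new Doc("b", 1.0));
        List<Doc> secondVisit = Arrays.asList(new Doc("b", 1.0), new Doc("d", 1.0), new Doc("a", 1.0), new Doc("c", 1.0));
        System.out.println(page(firstVisit, 0, 2));  // [a, b]
        System.out.println(page(secondVisit, 2, 2)); // [c, d]
    }
}
```

Without the tie-breaker, the second request could legitimately return "a" or "b" again, which is the duplication described above.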

4. The number of object passed must be even but was [1]

When a JSON object is passed to setSource, ES returns the error: The number of object passed must be even but was [1]. In this case, either convert the JSON object to a Map, or convert it to a JSON string. For the string form, the content type of the string must be specified.

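IndexRequest indexRequest = new IndexRequest("index", "type", "id");
JSONObject doc = new JSONObject();
// Fails: a single object is read as an odd number of key/value arguments
// indexRequest.source(doc);
// Option 1: convert the JSON to a Map
indexRequest.source(JSONObject.parseObject((String) doc.get("json"), Map.class));
// Option 2: pass a JSON string together with its content type
indexRequest.source(JSON.toJSONString(doc), XContentType.JSON);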

5. Cross-cluster search

ES 6.x natively supports cross-cluster search. For the detailed configuration, see: www.elastic.co/guide/en/ki…

PUT _cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "cluster_one": {
          "seeds": [
            "127.0.0.1:9300"
          ]
        },
        "cluster_two": {
          "seeds": [
            "127.0.0.1:9301"
          ]
        },
        "cluster_three": {
          "seeds": [
            "127.0.0.1:9302"
          ]
        }
      }
    }
  }
}

ES 6.5 also introduces a new feature, Cross-Cluster Replication, which is worth a look if you are interested.

6. Setting where null values are placed when sorting in ES

GET /_search
{
    "sort": [
        { "price": { "missing": "_last" } }
    ],
    "query": {
        "term": { "product": "chocolate" }
    }
}

7. Handling cold (archived) data in ES

Use machines with relatively low specifications but large disks as ES warm nodes. The index.routing.allocation.require.box_type setting controls whether an index's data is placed on hot or warm nodes. If an index is rarely used, you can close it and reopen it when you need to search it.

8. ES similar article detection

For deduplicating large texts, consider the SimHash algorithm. SimHash extracts a 64-bit fingerprint from each document, and the similarity of two articles can be judged from the Hamming distance between their fingerprints. The Hamming distance calculation can be done with a plugin: github.com/joway/elast…
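The fingerprint and distance steps can be sketched in Java as follows. This is a simplified illustration, not the plugin's code: the class name is hypothetical, tokens are split on whitespace with equal weight, and FNV-1a is used as the 64-bit token hash (any well-mixed 64-bit hash would do). Real deployments usually use a proper tokenizer and term weights.

```java
import java.nio.charset.StandardCharsets;

public class SimHashSketch {
    // 64-bit FNV-1a hash of a token.
    public static long hash64(String token) {
        long h = 0xcbf29ce484222325L;
        for (byte b : token.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        return h;
    }

    // Every token votes +1 or -1 on each of the 64 bit positions;
    // a positive tally sets that bit in the fingerprint.
    public static long simHash(String text) {
        int[] votes = new int[64];
        for (String token : text.toLowerCase().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            long h = hash64(token);
            for (int i = 0; i < 64; i++) {
                votes[i] += ((h >>> i) & 1L) == 1L ? 1 : -1;
            }
        }
        long fingerprint = 0L;
        for (int i = 0; i < 64; i++) {
            if (votes[i] > 0) {
                fingerprint |= (1L << i);
            }
        }
        return fingerprint;
    }

    // Number of bit positions in which the two fingerprints differ.
    public static int hammingDistance(long a, long b) {
        return Long.bitCount(a ^ b);
    }

    public static void main(String[] args) {
        long a = simHash("elasticsearch is a distributed search and analytics engine");
        long b = simHash("elasticsearch is a distributed search and analysis engine");
        long c = simHash("tomcat is a servlet container for java web applications");
        System.out.println("near-duplicate distance: " + hammingDistance(a, b));
        System.out.println("unrelated distance:      " + hammingDistance(a, c));
    }
}
```

A small distance suggests near-duplicate articles; the cut-off value is a tuning decision that depends on the data.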

9. Terms aggregation query optimization

If only the top N results of the aggregation are needed, add "collect_mode": "breadth_first" to the terms aggregation;
You can also set "min_doc_count": 10 to require a minimum number of matching documents per term;
If you only need certain terms to be returned, use include and exclude to filter the terms;
If you want to fetch all terms but there are a great many of them, consider "Filtering Values with partitions" to fetch the aggregation in multiple batches.
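The idea behind fetching in partitions can be sketched in Java: every term is assigned to exactly one of N partitions by hashing, and each request processes a single partition. The class and method names are hypothetical, and String.hashCode is only a stand-in for whatever hash ES applies to term values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TermPartitionSketch {
    // Maps a term to a partition number in [0, numPartitions).
    // String.hashCode is a stand-in hash chosen for this sketch.
    public static int partitionOf(String term, int numPartitions) {
        return Math.floorMod(term.hashCode(), numPartitions);
    }

    // Keeps only the terms that fall into the requested partition.
    public static List<String> termsInPartition(List<String> terms, int partition, int numPartitions) {
        List<String> batch = new ArrayList<>();
        for (String term : terms) {
            if (partitionOf(term, numPartitions) == partition) {
                batch.add(term);
            }
        }
        return batch;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("apple", "banana", "cherry", "grape", "lemon", "mango", "peach");
        int numPartitions = 3;
        // One "request" per partition; together the batches cover every term exactly once.
        for (int p = 0; p < numPartitions; p++) {
            System.out.println("partition " + p + ": " + termsInPartition(terms, p, numPartitions));
        }
    }
}
```

Choosing more partitions makes each batch smaller at the cost of more requests.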

10. No search results because of the Tomcat character set

Two systems connect to the same ES service with identical configuration and code. With the same search criteria, one returns results and the other returns nothing. The investigation showed that Tomcat was misconfigured on one of the systems, so request parameters were garbled in transit and no data matched.
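The effect of such a mismatch can be reproduced in a few lines of Java. This sketch assumes the garbling comes from UTF-8 bytes being decoded as ISO-8859-1, which is one common misconfiguration; the original text does not say which charset was involved, and the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class CharsetMismatchSketch {
    // Decodes the UTF-8 bytes of a string as ISO-8859-1,
    // the kind of mistake a misconfigured container can make.
    public static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String keyword = "\u82f9\u679c"; // the Chinese word for "apple"
        String garbled = misdecode(keyword);
        System.out.println(keyword.equals(garbled)); // false: the search term no longer matches
        System.out.println(garbled.length());        // 6: each UTF-8 byte became one character
        System.out.println(misdecode("apple").equals("apple")); // true: ASCII is unaffected
    }
}
```

Because pure ASCII keywords survive the wrong decoding, this kind of problem tends to show up only for non-ASCII search terms.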

11. Setting the default analyzer for an ES index

By default, if a field does not specify an analyzer, ES uses the standard analyzer. You can change the default analyzer with the following settings.

Older versions of ES support setting a default index analyzer (default_index) and a default search analyzer (default_search); newer versions no longer support this.

PUT /index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "type": "ik_max_word",
          "tokenizer": "ik_max_word"
        }
      }
    }
  }
}

12. Magic parameters in ES

Index name: _index
Type name: _type
Document Id: _id
Score: _score
Index sort: _doc

If you have no special sorting requirements, it is recommended to use _doc for sorting, for example, when performing Scroll operations.

13. Rollup and late-arriving data in ES

A rollup job has a delay parameter that controls how long the job waits before rolling up new data. By default there is no delay, so once the data for an interval has been rolled up, data for that interval that arrives late is not processed.

The rollup API supports searching raw and rollup indexes together, so if data often arrives late, you can set a suitable delay, such as 1h, 6h, or even 24h. This delays the rollup index, but ensures that late data is processed.

In practice, rollup is generally used to aggregate historical data and reduce storage space, so a delay of several hours or even several days is reasonable. When searching the most recent raw index together with the historical rollup index, the data from both is combined to give correct aggregate results while keeping performance acceptable.

Rollup is still experimental, but very useful, especially when ES is used for data warehouse scenarios.

14. Retrieving all aggregation results in ES 6.x

In ES 2.x, setSize(0) returns all aggregation results. In ES 6.x, use setSize(Integer.MAX_VALUE), which is equivalent to 0 in 2.x.

15. ES Jar package conflict

The recommended solution is the maven-shade-plugin, which resolves Jar conflicts by relocating the conflicting packages into a different namespace. For details, see the article: www.jianshu.com/p/d9fb7afa6…

<plugins>
    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>2.4.1</version>
        <configuration>
            <createDependencyReducedPom>false</createDependencyReducedPom>
        </configuration>
        <executions>
            <execution>
                <phase>package</phase>
                <goals>
                    <goal>shade</goal>
                </goals>
                <configuration>
                    <relocations>
                        <relocation>
                            <pattern>com.google.guava</pattern>
                            <shadedPattern>net.luculent.elasticsearch.guava</shadedPattern>
                        </relocation>
                        <relocation>
                            <pattern>com.fasterxml.jackson</pattern>
                            <shadedPattern>net.luculent.elasticsearch.jackson</shadedPattern>
                        </relocation>
                        <relocation>
                            <pattern>org.joda</pattern>
                            <shadedPattern>net.luculent.elasticsearch.joda</shadedPattern>
                        </relocation>
                        <relocation>
                            <pattern>com.google.common</pattern>
                            <shadedPattern>net.luculent.elasticsearch.common</shadedPattern>
                        </relocation>
                        <relocation>
                            <pattern>com.google.thirdparty</pattern>
                            <shadedPattern>net.luculent.elasticsearch.thirdparty</shadedPattern>
                        </relocation>
                    </relocations>
                    <transformers>
                        <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer"/>
                        <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                    </transformers>
                </configuration>
            </execution>
        </executions>
    </plugin>
</plugins>

16. How does ES choose the shard that stores a document?

ES hashes the routing value of the document to be indexed, which defaults to its _id (either specified by the client or auto-generated). Early versions used the DJB2 hash algorithm for this, and ES 2.0 and later use Murmur3. The hash is then taken modulo the number of primary shards n: hash(_id) % n. The result determines which shard stores the document.
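The formula can be sketched in Java as follows. The sketch uses DJB2, the hash named in the text, purely as a simple stand-in; the hash a given ES version actually applies may differ, and the class and method names are hypothetical.

```java
public class ShardRoutingSketch {
    // DJB2 string hash: start at 5381, then hash = hash * 33 + c for each character.
    public static int djb2(String value) {
        int hash = 5381;
        for (int i = 0; i < value.length(); i++) {
            hash = ((hash << 5) + hash) + value.charAt(i);
        }
        return hash;
    }

    // hash(_id) % n, using floorMod so that a negative hash still yields a valid shard number.
    public static int shardFor(String id, int numberOfShards) {
        return Math.floorMod(djb2(id), numberOfShards);
    }

    public static void main(String[] args) {
        int numberOfShards = 5;
        for (String id : new String[] {"doc-1", "doc-2", "doc-3", "doc-4"}) {
            System.out.println(id + " -> shard " + shardFor(id, numberOfShards));
        }
    }
}
```

Because the shard number depends on n, changing the number of primary shards would send existing ids to different shards, which is why n is fixed when the index is created.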

III. Kibana

1. Displaying custom fields in Kibana's Discover screen

By default, Kibana's Discover screen displays only two columns, Time and _source. The left side of the screen lists many fields under Popular. To show a custom field on the Discover screen, click add next to the field you want to display.

2. Filebeat monitoring metrics explained

Total: 'All events newly created in the publishing pipeline'
Emitted: 'Events processed by the output (including retries)'
Acknowledged: 'Events acknowledged by the output (includes events dropped by the output)'
Queued: 'Events added to the event pipeline queue'

IV. Selected community articles

Elastic Certification Experience
A quick start with Logstash
When Elasticsearch meets Kafka–Kafka Connect
Elasticsearch separated hot and cold data read/write
Elasticsearch good practice
ELK Usage Tips (Issue 1)
