In the Elastic Stack, Logstash serves as an ETL tool that makes it possible to ingest large amounts of data. Elasticsearch provides index lifecycle management (ILM), which lets us move incoming data across hot, warm, and cold nodes and delete indices that no longer need to be kept. In today’s article, we’ll cover how to configure index lifecycle management for Logstash. Before you begin, you are advised to read the following documentation:

  • How to get started with Logstash
  • Logstash: Getting started with Logstash
  • Elasticsearch: An introduction to Index lifecycle management

In this experiment, I’ll use Elastic Stack 7.10 to demonstrate this.

 

Prerequisites

You will need to complete the exercises in the article “Logstash: Getting Started with Logstash, Part 2”, which describes in detail how to install and configure the required Elastic Stack components: Elasticsearch, Kibana, and Logstash.

In today’s practice, we start Elasticsearch a little differently: we create a two-node Elasticsearch cluster in which one node is hot and the other is warm. We start the cluster as follows.

See the article “Elasticsearch: Using shard filtering to control which node to assign indexes to” for how to run a two-node cluster. After installing Elasticsearch, open a terminal and run the following command:

./bin/elasticsearch -E node.name=node1 -E node.attr.data=hot -E node.max_local_storage_nodes=2

It will run a node called node1. At the same time, run the following command in another terminal:

./bin/elasticsearch -E node.name=node2 -E node.attr.data=warm -E node.max_local_storage_nodes=2

It runs another node called node2. You can run the following command to check:

GET _cat/nodes?v

Two nodes are displayed.
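The exact values depend on your machine; as an illustration only, the output looks roughly like this, with both node1 and node2 listed:

ip        heap.percent ram.percent cpu load_1m load_5m load_15m node.role  master name
127.0.0.1           25          99   5    2.87    2.73     2.66 cdhilmrstw *      node1
127.0.0.1           18          99   5    2.87    2.73     2.66 cdhilmrstw -      node2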

We can check the properties of both nodes with the following command:

GET _cat/nodeattrs?v&s=name
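Besides a few built-in attributes (such as the machine learning and xpack attributes, omitted here), you should see our custom data attribute, roughly like this:

node  host      ip        attr value
node1 127.0.0.1 127.0.0.1 data hot
node2 127.0.0.1 127.0.0.1 data warm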

Obviously one of the nodes is hot and the other is warm.

Now we have created our Elasticsearch cluster.

Our test environment is this two-node cluster: Logstash feeds data into Elasticsearch, node1 acts as the hot node, and node2 acts as the warm node.

In general, a typical index lifecycle has four phases: Hot, Warm, Cold, and Delete. You only need to enable the phases that suit your business needs.

 

Hands-on practice

In the following practice, we will adopt the following steps:

  • Create an Index Lifecycle Management (ILM) Policy
  • Create Index template
  • Import data and observe the results of ILM

Create the ILM Policy

First, open Kibana and navigate to Stack Management and then Index Lifecycle Policies. Let’s define an ILM policy called logstash:

Let’s define the hot phase configuration:

In the hot phase, rollover to a new index is triggered when any of the following conditions is met:

  • The size of the index exceeds 1 GB
  • The number of documents exceeds 5
  • The index is older than 30 days

The purpose of this is to prevent an index from becoming too large.

Next, let’s define the warm phase configuration:

Above, we enable the warm phase. In this phase the data is stored on nodes tagged warm. Since we only have one warm node in this exercise, I set the number of replicas to 0; in practice, more replicas mean more read capacity, so set this according to your business needs and hardware. I also enable Shrink index, which compresses all of the primary shards into a single primary shard in the warm phase. The number of primary shards mainly determines write (indexing) capacity, and in the warm phase we usually no longer ingest data, since ingestion happens only on the hot nodes.

For the sake of simplicity, we will not define any other phases in today’s exercise.

Click Save as new policy. This completes the definition of the policy.
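For reference, the request that Kibana sends when the policy is saved (what the Show request button displays) should be roughly equivalent to the following:

PUT _ilm/policy/logstash
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_size": "1gb",
            "max_age": "30d",
            "max_docs": 5
          },
          "set_priority": {
            "priority": 100
          }
        }
      },
      "warm": {
        "min_age": "0ms",
        "actions": {
          "allocate": {
            "number_of_replicas": 0,
            "require": {
              "data": "warm"
            }
          },
          "shrink": {
            "number_of_shards": 1
          },
          "set_priority": {
            "priority": 50
          }
        }
      }
    }
  }
}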

We can verify the saved policy with the following command:

GET _ilm/policy/logstash
{"logstash" : {"version" : 1, "modified_date" : "2020-12-07T06:21:20.988z ", "policy" : {"phases" : {"warm" : { "min_age" : "0ms", "actions" : { "allocate" : { "number_of_replicas" : 0, "include" : { }, "exclude" : { }, "require" : { "data" : "warm" } }, "shrink" : { "number_of_shards" : 1 }, "set_priority" : { "priority" : 50 } } }, "hot" : { "min_age" : "0ms", "actions" : { "rollover" : { "max_size" : "1gb", "max_age" : "30d", "max_docs" : 5 }, "set_priority" : { "priority" : 100 } } } } } } }Copy the code

 

Create the logstash template

Next we need to create an index template. We’ll call it logstash_template:

If you want to use Data Stream, see my previous article “Elastic: Data Stream for Index Lifecycle Management.” Click the Next button:

Let’s copy the following configuration into the edit box:

{
  "lifecycle": {
    "name": "logstash",
    "rollover_alias": "logstash"
  },
  "routing": {
    "allocation": {
      "require": {
        "data": "hot"
      }
    }
  },
  "refresh_interval": "1s",
  "number_of_shards": "2"
}

Above we defined the name of the lifecycle policy and the name of the rollover_alias. Click the Next button:

Click the Create template button.
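If you prefer the API to the UI, the template created here should correspond roughly to the following request; the same settings appear in the GET output further below:

PUT _index_template/logstash_template
{
  "index_patterns": ["logstash-*"],
  "template": {
    "settings": {
      "index": {
        "lifecycle": {
          "name": "logstash",
          "rollover_alias": "logstash"
        },
        "routing": {
          "allocation": {
            "require": {
              "data": "hot"
            }
          }
        },
        "refresh_interval": "1s",
        "number_of_shards": "2"
      }
    }
  }
}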

This creates the logstash_template. In addition to the template itself, the logstash rollover alias will be generated when the first document is imported. We can check with the following command:

GET _index_template/logstash_template

The command above returns the logstash_template we just defined. Note that this API differs from earlier versions: legacy index templates use the _template endpoint rather than _index_template, so check which one applies to the version you are running. The command above produces the following result:

{
  "index_templates" : [
    {
      "name" : "logstash_template",
      "index_template" : {
        "index_patterns" : [
          "logstash-*"
        ],
        "template" : {
          "settings" : {
            "index" : {
              "lifecycle" : {
                "name" : "logstash",
                "rollover_alias" : "logstash"
              },
              "routing" : {
                "allocation" : {
                  "require" : {
                    "data" : "hot"
                  }
                }
              },
              "refresh_interval" : "1s",
              "number_of_shards" : "2"
            }
          }
        },
        "composed_of" : [ ]
      }
    }
  ]
}

 

Define the Logstash configuration file

With the introduction of ILM, we need to redefine the weblog.conf file from the previous article “Logstash: Getting Started with Logstash, Part 2”:

weblog.conf

input {
  tcp {
    port => 9900
  }
}

filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }

  mutate {
    remove_field => [ "message" ]
  }

  geoip {
    source => "clientip"
  }

  useragent {
    source => "agent"
    target => "useragent"
  }

  date {
    match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
  }  

  mutate {
    remove_field => [ "timestamp" ]
  }    
}

output {
  stdout { }

  elasticsearch {
    hosts => ["localhost:9200"]
    ilm_rollover_alias => "logstash"
    ilm_pattern => "000001"
    ilm_policy => "logstash"
    #user => "elastic"
    #password => "password"
  }
}

Note that we defined the following lines in the elasticsearch output:

    ilm_rollover_alias => "logstash"
    ilm_pattern => "000001"
    ilm_policy => "logstash"

Please refer to the official documentation of the Elasticsearch output plugin if you are interested in the details. Here we fill in the ILM policy and rollover alias that we defined earlier; ilm_pattern controls the suffix of the indices created behind the rollover alias, so the first index will be logstash-000001.

 

Import the data through logstash and check the results

We start Logstash in a terminal:

./bin/logstash -f /Users/liuxg/demos/logstash/weblog.conf

Modify the preceding path based on the path of your configuration file.

To quickly see ILM in action, we reduce the ILM poll interval from the default of 10 minutes to 10 seconds:

PUT _cluster/settings
{
    "transient": {
      "indices.lifecycle.poll_interval": "10s"
    }
}
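When you are done with the exercise, you can restore the default poll interval by clearing the transient setting; setting a transient cluster setting to null removes it:

PUT _cluster/settings
{
    "transient": {
      "indices.lifecycle.poll_interval": null
    }
}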

Enter the following command in another terminal:

head -n 5 weblog-sample.log | nc localhost 9900

You can run the following command to check:

GET logstash/_search

We can see that all of the documents are written to the logstash-000001 index:

After a while, we type the following command:

GET logstash/_ilm/explain

It shows:

{ "indices" : { "logstash-000002" : { "index" : "logstash-000002", "managed" : true, "policy" : "Logstash ", "lifecycle_date_millis" : 1607324430865, "age" : "1.93m", "phase" : "hot", "phase_time_millis" : 1607324432647, "action" : "rollover", "action_time_millis" : 1607324443170, "step" : "check-rollover-ready", "step_time_millis" : 1607324443170, "phase_execution" : { "policy" : "logstash", "phase_definition" : { "min_age" : "0ms", "actions" : { "rollover" : { "max_size" : "1gb", "max_age" : "30d", "max_docs" : 5 }, "set_priority" : { "priority" : 100 } } }, "version" : 1, "modified_date_in_millis" : 1607322080988 } }, "shrink-logstash-000001" : { "index" : "shrink-logstash-000001", "managed" : true, "policy" : "Logstash ", "lifecycle_date_millis" : 1607324430710, "age" : "1.93m", "phase" : "warm"," Action ": "complete", "action_time_millis" : 1607324455330, "step" : "complete", "step_time_millis" : 1607324455330 } } }Copy the code

There are two indices shown above: logstash-000002 and shrink-logstash-000001. The shrink-logstash-000001 index exists because the five documents indexed in the hot phase met the trigger condition, so the index rolled over and then moved to the warm phase, where all of its primary shards were shrunk into a single primary shard, exactly as defined in the policy above.
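We can also list the two indices directly; the document counts and sizes will differ in your environment:

GET _cat/indices/logstash-*,shrink-logstash-*?v&s=index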

We then type the following command in the terminal:

head -n 1 weblog-sample.log | nc localhost 9900

We then use the following command to check:

GET logstash/_search
{
  "_source": "_index"
}
Copy the code

The command above shows:

{ "took" : 1, "timed_out" : false, "_shards" : { "total" : 3, "successful" : 3, "skipped" : 0, "failed" : 0 }, "hits" : {" total ": {" value" : 6, "base" : "eq"}, "max_score" : 1.0, "hits" : [{" _index ": "Duck-logstash -000001", "_type" : "_doc", "_id" : "rAEAPHYB9dcfabeQVSHM", "_score" : 1.0, "_source" : { } }, { "_index" : "shrink-logstash-000001", "_type" : "_doc", "_id" : "qgEAPHYB9dcfabeQVSHM", "_score" : 1.0, the "_source" : {}}, {" _index ":" the shrink - logstash - 000001 ", "_type" : "_doc", "_id" : "Qehimb9dcfabeqvshm ", "_score" : 1.0, "_source" : {}}, {"_index" :" duck-logstash -000001", "_type" : "_doc", "_id" : "rQEAPHYB9dcfabeQVSHN", "_score" : 1.0, "_source" : {}}, {" _index ": "Duck-logstash -000001", "_type" : "_doc", "_id" : "rgEAPHYB9dcfabeQVSHT", "_score" : 1.0, "_source" : {}}, {" _index ":" logstash - 000002 ", "_type" : "_doc", "_id" : "1 qegphyb9dcfabeqysh_", "_score" : 1.0, "_source" : {}}]}}Copy the code

That is, there are two indices: shrink-logstash-000001 and logstash-000002. The most recently imported document is stored in the logstash-000002 index (on the hot node), while the previous five documents are stored in shrink-logstash-000001 (on the warm node).
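We can also confirm that the shrunk index now has a single primary shard:

GET _cat/shards/shrink-logstash-000001?v

It should list one primary shard and no replicas, located on node2, our warm node.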

After the data is imported, we can use the following command to view the logstash alias that has been created:

GET _cat/aliases?v

The command above shows:

alias                    index                               filter routing.index routing.search is_write_index
.kibana                  .kibana_1                           -      -             -              -
.kibana_task_manager     .kibana_task_manager_1              -      -             -              -
apm-7.10.0-span          apm-7.10.0-span-000001              -      -             -              true
apm-7.10.0-transaction   apm-7.10.0-transaction-000001       -      -             -              true
logstash                 shrink-logstash-000001              -      -             -              -
logstash-000001          shrink-logstash-000001              -      -             -              -
apm-7.10.0-metric        apm-7.10.0-metric-000001            -      -             -              true
apm-7.10.0-profile       apm-7.10.0-profile-000001           -      -             -              true
.kibana-event-log-7.10.0 .kibana-event-log-7.10.0-000001     -      -             -              true
logstash                 logstash-000002                     -      -             -              true
metricbeat-7.10.0        metricbeat-7.10.0-2020.12.06-000001 -      -             -              true
apm-7.10.0-error         apm-7.10.0-error-000001             -      -             -              true
ilm-history-3            ilm-history-3-000001                -      -             -              true

From the list above, we can see that the logstash alias now points to logstash-000002, which is the current write index, as well as to shrink-logstash-000001.