Warning: This feature is in beta and is subject to change. The design and code are less mature than the formal GA functionality and are provided as-is without warranty. Beta functionality is not subject to the support SLA of official GA functionality.

Runtime fields are fields that are evaluated at query time. Runtime fields enable you to:

Add fields to existing documents without reindexing the data
Start working with your data without understanding how it is structured
Override the value returned from an indexed field at query time
Define fields for a specific use without modifying the underlying schema

You can access runtime fields from the search API just like any other field, and Elasticsearch treats them no differently. You can define runtime fields in the index mapping or in the search request. The choice is entirely up to you, and is part of the inherent flexibility of runtime fields.

Runtime fields are useful when working with log data (see the example), especially when you are unsure about the data structure. Searches are slower, but the index is much smaller, and you can process logs more quickly because you do not have to index every field.

You can also read two other Runtime Fields articles below:

Elasticsearch: Create Runtime field and use it in Kibana – released in 7.11
Elasticsearch: Dynamically create Runtime Fields-7.11 release

Benefits of Runtime Fields

Because runtime fields are not indexed, adding a runtime field does not increase the size of the index. You define runtime fields directly in the index mapping, which saves storage costs and speeds up ingestion. You can ingest data into the Elastic Stack faster and access it right away. Once you define a runtime field, you can immediately use it in search requests, aggregations, filtering, and sorting.

If you later turn a runtime field into an indexed field, you do not need to modify any queries that reference it. Better yet, a query can span some indices where the field is a runtime field and others where it is an indexed field (they can share the same field name). You have the flexibility to choose which fields to index and which to keep as runtime fields.

Essentially, the most important benefit of runtime fields is the ability to add fields to documents after they have been ingested. This simplifies mapping decisions because you do not have to decide up front how to parse your data, and you can always modify the mapping later using runtime fields. Runtime fields allow for smaller indices and faster ingest times, reducing resource consumption and operating costs.

Compromises

Runtime fields use less disk space and provide flexibility in how you access your data, but they can affect search performance depending on the computation defined in the runtime script.

To balance search performance and flexibility, index the fields that you commonly search and filter on, such as a timestamp. Elasticsearch automatically uses these indexed fields first when running a query, which keeps response times low. Filtering on indexed fields first also limits the number of documents for which Elasticsearch has to compute runtime field values. Using indexed fields together with runtime fields gives you flexibility in which data you index and how you define queries for the other fields.

Use the asynchronous search API to run searches that include runtime fields. This search method helps offset the performance impact of computing the runtime field value for each document that contains the field. If the query cannot return the result set synchronously, you get the results asynchronously as they become available.
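As a rough illustration of the two paragraphs above, the following Python sketch only builds the request (it does not send it): it filters on the indexed timestamp field first, aggregates on a runtime field, and targets the _async_search endpoint with its wait_for_completion_timeout parameter. The index and field names are borrowed from the example later in this article; the helper build_async_search, the runtime field name duration_sec, and the aggregation name avg_value are hypothetical names chosen for this sketch.

```python
import json


def build_async_search(index, runtime_field, script_source, start, end, wait="2s"):
    """Return (method, path, body) for an async search request.

    Hypothetical helper for illustration: it builds the request only,
    sending it to a cluster is left to the caller.
    """
    body = {
        # Runtime field computed at query time for each matching document.
        "runtime_mappings": {
            runtime_field: {"type": "double", "script": {"source": script_source}}
        },
        # Filter on the indexed timestamp field so fewer documents
        # need their runtime value computed.
        "query": {
            "bool": {
                "filter": [{"range": {"timestamp": {"gte": start, "lte": end}}}]
            }
        },
        "size": 0,
        "aggs": {"avg_value": {"avg": {"field": runtime_field}}},
    }
    path = f"/{index}/_async_search?wait_for_completion_timeout={wait}"
    return "POST", path, body


method, path, body = build_async_search(
    index="dur_log-1",
    runtime_field="duration_sec",  # assumed name, not from the article
    script_source="emit(doc['duration'].value / 1000.0)",
    start="2021-01-26 00:00:00",
    end="2021-01-28 23:59:59",
)
print(method, path)
print(json.dumps(body, indent=2))
```

If the search does not finish within the timeout, the response carries an id that can be polled later with GET _async_search/<id>.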

Important: Queries against runtime fields are considered expensive. If search.allow_expensive_queries is set to false, expensive queries are not allowed and Elasticsearch rejects any query against a runtime field.

Example

The following example demonstrates how to use runtime fields to fix errors in indexed data. We intentionally index documents that contain some errors, and then use a runtime field to shadow the indexed field. Users then see the correct information, computed by the runtime field, when they query the data or build visualizations in Kibana Lens. In this way, errors in indexed data can be fixed immediately by shadowing the field with a runtime field instead of reindexing. Runtime fields is the name given to the implementation of schema on read in Elasticsearch.

To create an index template called dur_log, type the following command in Kibana console:

# Create an index template which we will use to create multiple indices
PUT _index_template/dur_log
{
  "index_patterns": [
    "dur_log-*"
  ],
  "template": {
    "mappings": {
      "properties": {
        "timestamp": {
          "type": "date",
          "format": "yyyy-MM-dd HH:mm:ss"
        },
        "browser": {
          "type": "keyword"
        },
        "duration": {
          "type": "double"
        }
      }
    }
  }
}

The template above shows the mapping that any index whose name starts with dur_log- will have. It defines three fields:

timestamp
browser
duration

Next we use the bulk API to load documents into an index called dur_log-1:

# Load a few documents; the Firefox durations were erroneously entered in ms instead of sec
POST dur_log-1/_bulk
{"index":{}}
{"timestamp": "2021-01-25 10:01:12", "browser": "Chrome", "duration": 1.176}
{"index":{}}
{"timestamp": "2021-01-25 10:01:13", "browser": "Safari", "duration": 1.246}
{"index":{}}
{"timestamp": "2021-01-26 10:02:11", "browser": "Edge", "duration": 0.993}
{"index":{}}
{"timestamp": "2021-01-26 10:02:15", "browser": "Firefox", "duration": 1342}
{"index":{}}
{"timestamp": "2021-01-26 10:01:23", "browser": "Chrome", "duration": 1.151}
{"index":{}}
{"timestamp": "2021-01-27 10:01:54", "browser": "Chrome", "duration": 1.141}
{"index":{}}
{"timestamp": "2021-01-28 10:01:32", "browser": "Firefox", "duration": 984}
{"index":{}}
{"timestamp": "2021-01-29 10:01:21", "browser": "Edge", "duration": 1.233}
{"index":{}}
{"timestamp": "2021-01-30 10:02:07", "browser": "Safari", "duration": 1.312}
{"index":{}}
{"timestamp": "2021-01-30 10:01:19", "browser": "Chrome", "duration": 1.231}

The request above imports some data into Elasticsearch. The data has a visible problem: the Firefox documents have duration values of 984 and 1342, while the other browsers have duration values around 1. We can easily spot this problem with the following aggregation:

# Aggregate for average duration per browser
GET dur_log-1/_search
{
  "size": 0,
  "aggs": {
    "terms": {
      "terms": {
        "field": "browser"
      },
      "aggs": {
        "average duration": {
          "avg": {
            "field": "duration"
          }
        }
      }
    }
  }
}

The command above shows the result:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 20,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "terms" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        { "key" : "Chrome", "doc_count" : 8, "average duration" : { "value" : 1.17475 } },
        { "key" : "Edge", "doc_count" : 4, "average duration" : { "value" : 1.113 } },
        { "key" : "Firefox", "doc_count" : 4, "average duration" : { "value" : 1163.0 } },
        { "key" : "Safari", "doc_count" : 4, "average duration" : { "value" : 1.279 } }
      ]
    }
  }
}

The result shows that the average duration for Firefox is 1163, roughly a thousand times higher than for the other browsers.

What is the reason for this? The likely cause is that the wrong unit was used for duration when the data was ingested: the other browsers report duration in seconds, while Firefox reports it in milliseconds. When we display the data in Lens, it looks like this:

From the above, we can see that the Firefox values are far higher than those of the other browsers. Clearly its units are wrong. So how do we fix this error? One way is to correct the data source by dividing the Firefox duration values by 1000 and re-importing the data, or to handle the conversion in an ingest pipeline during import. But is there a way to show the correct data without re-importing it?

The answer is to use a runtime field. We define the runtime field as follows:

# Create a runtime field to shadow the indexed field and have the Firefox duration divided by 1000
GET dur_log-1/_search
{
  "runtime_mappings": {
    "duration": {
      "type": "double",
      "script": {
        "source": """if (doc['browser'].value == "Firefox") {emit(params._source['duration'] / 1000.0)} else {emit(params._source['duration'])}"""
      }
    }
  },
  "size": 0,
  "aggs": {
    "terms": {
      "terms": {
        "field": "browser"
      },
      "aggs": {
        "average duration": {
          "avg": {
            "field": "duration"
          }
        }
      }
    }
  }
}

Note the runtime_mappings section above:

  "runtime_mappings": {
    "duration": {
      "type": "double",
      "script": {
        "source": """if (doc['browser'].value == "Firefox") {emit(params._source['duration'] / 1000.0)} else {emit(params._source['duration'])}"""
      }
    }
  },

In the section above, we use a Painless script to handle duration specially for Firefox: if the browser is Firefox, its duration value is divided by 1000. Although this field has the same name as the duration field in the source, the search uses the duration defined in runtime_mappings first. The computed value appears in the search results, but the source data stored in Elasticsearch is unchanged. The command above shows the result:

{
  "took" : 21,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "terms" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        { "key" : "Chrome", "doc_count" : 4, "average duration" : { "value" : 1.17475 } },
        { "key" : "Edge", "doc_count" : 2, "average duration" : { "value" : 1.113 } },
        { "key" : "Firefox", "doc_count" : 2, "average duration" : { "value" : 1.163 } },
        { "key" : "Safari", "doc_count" : 2, "average duration" : { "value" : 1.279 } }
      ]
    }
  }
}

As you can see from the results above, the average duration for Firefox is now 1.163. This is in line with the duration values of the other browsers and is a plausible value.
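The arithmetic behind the two aggregation results can be checked locally. The following Python sketch is only a stand-in for what the Painless script does per document, not Elasticsearch code: it takes the ten browser and duration pairs from the bulk request and averages them per browser, once raw and once with the Firefox values divided by 1000. The function names shadowed_duration and average_by_browser are made up for this sketch.

```python
from collections import defaultdict

# The ten (browser, duration) pairs loaded with the bulk request in this article.
DOCS = [
    ("Chrome", 1.176), ("Safari", 1.246), ("Edge", 0.993), ("Firefox", 1342),
    ("Chrome", 1.151), ("Chrome", 1.141), ("Firefox", 984), ("Edge", 1.233),
    ("Safari", 1.312), ("Chrome", 1.231),
]


def shadowed_duration(browser, duration):
    """Mirror of the runtime script: Firefox values are in ms, so scale them to seconds."""
    return duration / 1000.0 if browser == "Firefox" else duration


def average_by_browser(docs, transform=lambda browser, duration: duration):
    """Average the (optionally transformed) duration per browser, like terms + avg."""
    totals = defaultdict(lambda: [0.0, 0])  # browser -> [sum, count]
    for browser, duration in docs:
        entry = totals[browser]
        entry[0] += transform(browser, duration)
        entry[1] += 1
    return {browser: total / count for browser, (total, count) in totals.items()}


raw = average_by_browser(DOCS)                       # indexed values as stored
fixed = average_by_browser(DOCS, shadowed_duration)  # values as the runtime field emits them

for browser in sorted(raw):
    print(f"{browser}: raw={raw[browser]:.5f} shadowed={fixed[browser]:.5f}")
```

Only the Firefox bucket changes between the two runs; the other browsers pass through the transform untouched, which matches the behavior of the else branch in the script.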

Defining runtime_mappings at search time solves our search problem, but it is not convenient for visualization in Kibana. Instead, we can add the runtime field definition to the index mapping:

# Add the runtime field to the mapping so all can use it
PUT dur_log-1/_mapping
{
  "runtime": {
    "duration": {
      "type": "double",
      "script": {
        "source": """if (doc['browser'].value == "Firefox") {emit(params._source['duration'] / 1000.0)} else {emit(params._source['duration'])}"""
      }
    }
  }
}

With the definition above, the correction applies to every search against the dur_log-1 index. To view the dur_log-1 mapping, run the following command:

GET dur_log-1/_mapping

The command above shows:

{
  "dur_log-1" : {
    "mappings" : {
      "runtime" : {
        "duration" : {
          "type" : "double",
          "script" : {
            "source" : """if (doc['browser'].value == "Firefox") {emit(params._source['duration'] / 1000.0)} else {emit(params._source['duration'])}""",
            "lang" : "painless"
          }
        }
      },
      "properties" : {
        "browser" : {
          "type" : "keyword"
        },
        "duration" : {
          "type" : "double"
        },
        "timestamp" : {
          "type" : "date",
          "format" : "yyyy-MM-dd HH:mm:ss"
        }
      }
    }
  }
}

Again, we use the following command for aggregation:

# Aggregate on duration and return all fields
GET dur_log-*/_search
{
  "size": 0, 
  "aggs": {
    "terms": {
      "terms": {
        "field": "browser"
      },
      "aggs": {
        "average duration": {
          "avg": {
            "field": "duration"
          }
        }
      }
    }
  }
}

The command above shows the result:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "terms" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        { "key" : "Chrome", "doc_count" : 4, "average duration" : { "value" : 1.17475 } },
        { "key" : "Edge", "doc_count" : 2, "average duration" : { "value" : 1.113 } },
        { "key" : "Firefox", "doc_count" : 2, "average duration" : { "value" : 1.163 } },
        { "key" : "Safari", "doc_count" : 2, "average duration" : { "value" : 1.279 } }
      ]
    }
  }
}

The result is now correct, even though the search request itself no longer contains a runtime_mappings section.

Likewise, we can display the data directly in Lens:

This time the statistics are right: the Firefox values are in line with those of the other browsers instead of being vastly different.
