In an Elasticsearch ingest node we can use a number of processors to transform our data, but each processor has a very specific, narrowly defined function. Is there a more flexible way to program Elasticsearch, and if so, what language does it use?
Yes: Elasticsearch uses a language called Painless, which was built specifically for Elasticsearch. Painless is a simple, secure scripting language designed for use with Elasticsearch. It is the default scripting language and can be safely used for both inline and stored scripts. Its syntax is Groovy-like, but since Elasticsearch 6.0 Groovy, JavaScript, and Python are no longer supported as scripting languages.
Using scripts, you can evaluate custom expressions in Elasticsearch. For example, you can use scripts to return “Script fields” as part of a search request, or to evaluate a custom score for a query.
How to use scripts:
The syntax of the script is:
"script": { "lang": "..." , "source" | "id": "..." , "params": { ... }}Copy the code
- The default value of lang is “painless”. In practice it can usually be omitted, since no other scripting languages are supported in recent versions
- source holds an inline script; alternatively, id refers to a stored script
- params passes named parameters into the script as inputs
Painless uses Java-style statements for branching, looping, and other control flow, so it reserves a number of keywords that cannot be used as identifiers. Painless has far fewer keywords than Java, only 15 in all. The following table lists all of them and shows which statement types Painless supports.
Painless keywords
if | else | while | do | for |
---|---|---|---|---|
in | continue | break | return | new |
try | catch | throw | this | instanceof |
Painless supports all of Java’s control statements except switch.
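To get a feel for these control statements, here is a minimal sketch of an update script that combines a for loop with an if/else branch. The index name some_index and the field sum are purely hypothetical examples, not taken from the rest of this article:
POST some_index/_update/1
{
  "script": {
    "lang": "painless",
    "source": """
      // sum 1 + 2 + 3 with a for loop
      int total = 0;
      for (int i = 1; i <= 3; ++i) {
        total += i;
      }
      // branch on the result before writing it back to the document
      if (total > 5) {
        ctx._source.sum = total;
      } else {
        ctx._source.sum = 0;
      }
    """
  }
}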
A simple example of using Painless
The inline script
Let’s start by creating a simple document:
PUT twitter/_doc/1 {"user" : "hello ", "message" :" nice weather today, go for a walk ", "uid" : 2, "age" : 20, "city" : "Beijing ", "province" : "Beijing", "country" : "Chinese", "address" : "haidian district in Beijing, China", "location" : {" lat ":" 39.970718 ", "says lon" : "116.325747"}}Copy the code
Suppose we now want to change the age in this document to 30. One way is to read the whole document, change the age to 30, and write the document back. That takes several steps: read the data, modify it, and write it again, which is obviously a hassle. Instead, we can modify it directly with Painless:
POST twitter/_update/1
{
"script": {
"source": "ctx._source.age = 30"
}
}
Here source contains our Painless code, written directly inside the DSL; such a script is called an inline script. We access the age in _source via ctx._source.age and modify it programmatically. Fetching the document afterwards shows:
{ "_index" : "twitter", "_type" : "_doc", "_id" : "1", "_version" : 16, "_seq_no" : 20, "_primary_term" : 1, "found" : True, "_source" : {" user ":" ShuangYuShu - zhang SAN ", "message" : "today the weather is good, walk to", "uid" : 2, "age" : 30, "city" : "Beijing", "province" : "Beijing", "country" : "Chinese", "address" : "haidian district in Beijing, China", "location" : {" lat ":" 39.970718 ", "says lon" : "116.325747"}}}Copy the code
Clearly age has changed to 30. This approach works, but a script is compiled the first time Elasticsearch sees it, and compiled scripts are cached for reuse; because the value is hard-coded, the script above must be recompiled every time we want to set a different age. A better approach is this:
POST twitter/_update/1
{
"script": {
"source": "ctx._source.age = params.value",
"params": {
"value": 34
}
}
}
This way the script source never changes and only needs to be compiled once; on subsequent calls we only change the values in params.
In Elasticsearch, the following two scripts:
"script": {
"source": "ctx._source.num_of_views += 2"
}
and
"script": {
"source": "ctx._source.num_of_views += 3"
}
are treated as two different scripts that each need to be compiled, so the best practice is to pass values in via params.
In addition to updates, we can also use a script query to search our documents:
GET twitter/_search
{
  "query": {
    "script": {
      "script": {
        "source": "doc['city.keyword'].contains(params.name)",
        "lang": "painless",
        "params": {
          "name": "Beijing"
        }
      }
    }
  }
}
The script above queries for all documents whose city field contains “Beijing”.
Stored script
Scripts can also be stored in the cluster state and then called by ID:
PUT _scripts/add_age
{
"script": {
"lang": "painless",
"source": "ctx._source.age += params.value"
}
}
Here we define a stored script called add_age that adds a value to the age in _source. We can then call it by its ID:
POST twitter/_update/1
{
"script": {
"id": "add_age",
"params": {
"value": 2
}
}
}
From the above implementation, we can see that age will be incremented by 2.
Access the fields in source
The syntax used to access field values in Painless depends on the context. Elasticsearch has many different Painless contexts, including ingest processor, update, update by query, sort, filter, and so on.
Context | Field access |
---|---|
Ingest node: fields are accessed directly on ctx | ctx.field_name |
Updates: fields are accessed through _source | ctx._source.field_name |
Updates here include _update, _reindex, and _update_by_query. Understanding the context is important: ctx exposes different fields depending on which API is in use. The following examples analyze some specific situations.
Painless script example
First we create a pipeline called add_field_c. For more information on how to create a pipeline, see my previous article “How to use the Pipeline API to handle events in Elasticsearch”.
Example 1
PUT _ingest/pipeline/add_field_c
{
"processors": [
{
"script": {
"lang": "painless",
"source": "ctx.field_c = (ctx.field_a + ctx.field_b) * params.value",
"params": {
"value": 2
}
}
}
]
}
This pipeline creates a new field, field_c, whose value is the sum of field_a and field_b multiplied by 2. Let’s create a document with it:
PUT test_script/_doc/1?pipeline=add_field_c
{
  "field_a": 10,
  "field_b": 20
}
Here we index the document through the add_field_c pipeline. A search of test_script then returns:
{ "took" : 147, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : , "hits" : {0} "total" : {" value ": 1, the" base ":" eq "}, "max_score" : 1.0, "hits" : [{" _index ": "Test_script _type", "" :" _doc ", "_id" : "1", "_score" : 1.0, "_source" : {" field_c ": 60," field_a ": 10," field_b ": 20}}]}}Copy the code
Obviously, we can see that field_c was created successfully.
Example 2
In ingest, you can use a script processor to manipulate metadata such as _index and _type. Here is an example of an ingest pipeline that sets the index to my_index and the type to _doc, regardless of what was provided in the original index request:
PUT _ingest/pipeline/my_index
{
"description": "use index:my_index and type:_doc",
"processors": [
{
"script": {
"source": """
ctx._index = 'my_index';
ctx._type = '_doc';
"""
}
}
]
}
Using the above pipeline, we can try to index a document to any_index:
PUT any_index/_doc/1?pipeline=my_index
{
  "message": "text"
}
The results are as follows:
{
"_index": "my_index",
"_type": "_doc",
"_id": "1",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"_seq_no": 89,
"_primary_term": 1,
}
That is, the actual document is stored in my_index, not any_index.
Example 3
PUT _ingest/pipeline/blogs_pipeline
{
"processors": [
{
"script": {
"source": """
if (ctx.category == "") {
ctx.category = "None"
}
"""
}
}
]
}
The pipeline above checks whether the category field is an empty string and, if so, sets it to “None”. Let’s use it with the test_script index from before:
PUT test_script/_doc/2?pipeline=blogs_pipeline
{
  "field_a": 5,
  "field_b": 10,
  "category": ""
}

GET test_script/_doc/2
The results are as follows:
{
"_index" : "test_script",
"_type" : "_doc",
"_id" : "2",
"_version" : 2,
"_seq_no" : 6,
"_primary_term" : 1,
"found" : true,
"_source" : {
"field_a" : 5,
"field_b" : 10,
"category" : "None"
}
}
Obviously, it has changed the category field to “None”.
Example 4
POST _reindex
{
"source": {
"index": "blogs"
},
"dest": {
"index": "blogs_fixed"
},
"script": {
"source": """
if (ctx._source.category == "") {
ctx._source.category = "None"
}
"""
}
}
The example above writes “None” when category is empty during a reindex. As these examples show, in a pipeline we work directly on ctx.field, while for updates we work on fields under ctx._source. This is the context difference mentioned earlier.
Example 5
PUT test/_doc/1
{
"counter" : 1,
"tags" : ["red"]
}
You can add a tag to the tags list using an update script (this is just a list, so the tag is appended even if it already exists):
POST test/_update/1
{
"script" : {
"source": "ctx._source.tags.add(params.tag)",
"lang": "painless",
"params" : {
"tag" : "blue"
}
}
}
Display result:
GET test/_doc/1
{
"_index" : "test",
"_type" : "_doc",
"_id" : "1",
"_version" : 4,
"_seq_no" : 3,
"_primary_term" : 11,
"found" : true,
"_source" : {
"counter" : 1,
"tags" : [
"red",
"blue"
]
}
}
Shows that “blue” has been successfully added to the tags list.
You can also remove tags from the tags list. The Painless function that deletes a tag takes the array index of the element to be deleted. To avoid possible runtime errors, you first need to ensure that the tag exists. If the list contains duplicates of the tag, this script removes only one match.
POST test/_update/1
{
"script": {
"source": "if (ctx._source.tags.contains(params.tag)) { ctx._source.tags.remove(ctx._source.tags.indexOf(params.tag)) }",
"lang": "painless",
"params": {
"tag": "blue"
}
}
}
GET test/_doc/1
Display result:
{
"_index" : "test",
"_type" : "_doc",
"_id" : "1",
"_version" : 5,
"_seq_no" : 4,
"_primary_term" : 11,
"found" : true,
"_source" : {
"counter" : 1,
"tags" : [
"red"
]
}
}
Clearly, “blue” has been removed from the list.
A simple Painless scripting exercise
To illustrate how Painless works, let’s load some hockey statistics into the Elasticsearch index:
PUT hockey/_bulk?refresh
{"index":{"_id":1}}
{"first":"johnny","last":"gaudreau","goals":[9,27,1],"assists":[17,46,0],"gp":[26,82,1],"born":"1993/08/13"}
{"index":{"_id":2}}
{"first":"sean","last":"monohan","goals":[7,54,26],"assists":[11,26,13],"gp":[26,82,82],"born":"1994/10/12"}
{"index":{"_id":3}}
{"first":"jiri","last":"hudler","goals":[5,34,36],"assists":[11,62,42],"gp":[24,80,79],"born":"1984/01/04"}
{"index":{"_id":4}}
{"first":"micheal","last":"frolik","goals":[4,6,15],"assists":[8,23,15],"gp":[26,82,82],"born":"1988/02/17"}
{"index":{"_id":5}}
{"first":"sam","last":"bennett","goals":[5,0,0],"assists":[8,1,0],"gp":[26,1,0],"born":"1996/06/20"}
{"index":{"_id":6}}
{"first":"dennis","last":"wideman","goals":[0,26,15],"assists":[11,30,24],"gp":[26,81,82],"born":"1983/03/20"}
{"index":{"_id":7}}
{"first":"david","last":"jones","goals":[7,19,5],"assists":[3,17,4],"gp":[26,45,34],"born":"1984/08/10"}
{"index":{"_id":8}}
{"first":"tj","last":"brodie","goals":[2,14,7],"assists":[8,42,30],"gp":[26,82,82],"born":"1990/06/07"}
{"index":{"_id":39}}
{"first":"mark","last":"giordano","goals":[6,30,15],"assists":[3,30,24],"gp":[26,60,63],"born":"1983/10/03"}
{"index":{"_id":10}}
{"first":"mikael","last":"backlund","goals":[3,15,13],"assists":[6,24,18],"gp":[26,82,82],"born":"1989/03/17"}
{"index":{"_id":11}}
{"first":"joe","last":"colborne","goals":[3,18,13],"assists":[6,20,24],"gp":[26,67,82],"born":"1990/01/30"}
Using Painless to access values in doc
Document values can be accessed through a Map called doc. For example, the following script sums the total number of goals scored by a player. This example uses an int and a for loop.
GET hockey/_search { "query": { "function_score": { "script_score": { "script": { "lang": "painless", "source": """ int total = 0; for (int i = 0; i < doc['goals'].length; ++i) { total += doc['goals'][i]; } return total; """}}}}}Copy the code
Here we compute the _score of each document with a script: each player’s goals are summed to form the final _score. We use the Map entry doc['goals'] to access the field values. The result is:
{ "took" : 25, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : , "hits" : {0} "total" : {" value ": 11," base ":" eq "}, "max_score" : 87.0, "hits" : [{" _index ": "Hockey", "_type" : "_doc", "_id" : "2", "_score" : 87.0, "_source" : {" first ":" Sean ", "last" : "monohan," "goals" : [ 7, 54, 26 ], "assists" : [ 11, 26, 13 ], "gp" : [ 26, 82, 82 ], "born" : "1994/10/12" } }, ...Copy the code
Alternatively, you can use script_fields instead of function_score to do the same:
GET hockey/_search { "query": { "match_all": {} }, "script_fields": { "total_goals": { "script": { "lang": "painless", "source": """ int total = 0; for (int i = 0; i < doc['goals'].length; ++i) { total += doc['goals'][i]; } return total; """}}}}Copy the code
The result displayed is:
{ "took" : 5, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : {" total ": {" value" : 11, "base" : "eq"}, "max_score" : 1.0, "hits" : [{" _index ":" hockey ", "_type" : "_doc", "_id" : "1", "_score" : 1.0, "fields" : {" total_goals ": [37]}}, {" _index" : "hockey", "_type" : "_doc", "_id" : "2", "_score" : 1.0, "fields" : {" total_goals ": [87]}},...Copy the code
The following example uses a Painless script to sort players by their combined first and last names. The names are accessed using doc['first.keyword'].value and doc['last.keyword'].value.
GET hockey/_search
{
"query": {
"match_all": {}
},
"sort": {
"_script": {
"type": "string",
"order": "asc",
"script": {
"lang": "painless",
"source": "doc['first.keyword'].value + ' ' + doc['last.keyword'].value"
}
}
}
}
Check for missing items
doc['field'].value throws an exception if the field is missing from the document.
To check whether a document is missing a value, use doc['field'].size() == 0.
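Putting the two together, here is a hedged sketch of a script field against the hockey index that returns the first goals value, or 0 when a document has no goals field at all (the field name first_goals_or_zero is just an example):
GET hockey/_search
{
  "script_fields": {
    "first_goals_or_zero": {
      "script": {
        "lang": "painless",
        "source": """
          // guard against documents that are missing the goals field
          if (doc['goals'].size() == 0) {
            return 0;
          }
          return doc['goals'][0];
        """
      }
    }
  }
}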
Update the field with Painless
You can also easily update fields. You can access the original source of the field using ctx._source.<field-name>.
First, let’s look at the player’s source data by submitting the following request:
GET hockey/_search
{
"stored_fields": [
"_id",
"_source"
],
"query": {
"term": {
"_id": 1
}
}
}
The result displayed is:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "hockey",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"first" : "johnny",
"last" : "gaudreau",
"goals" : [
9,
27,
1
],
"assists" : [
17,
46,
0
],
"gp" : [
26,
82,
1
],
"born" : "1993/08/13"
}
}
]
}
}
To change player 1’s last name to hockey, simply set ctx._source.last to the new value:
POST hockey/_update/1
{
"script": {
"lang": "painless",
"source": "ctx._source.last = params.last",
"params": {
"last": "hockey"
}
}
}
You can also add fields to the document. For example, this script restores the player’s last name and also adds a new field, nick, containing the nickname hockey.
POST hockey/_update/1
{
"script": {
"lang": "painless",
"source": """
ctx._source.last = params.last;
ctx._source.nick = params.nick
""",
"params": {
"last": "gaudreau",
"nick": "hockey"
}
}
}
The result displayed is:
GET hockey/_doc/1
{
"_index" : "hockey",
"_type" : "_doc",
"_id" : "1",
"_version" : 2,
"_seq_no" : 11,
"_primary_term" : 1,
"found" : true,
"_source" : {
"first" : "johnny",
"last" : "gaudreau",
"goals" : [
9,
27,
1
],
"assists" : [
17,
46,
0
],
"gp" : [
26,
82,
1
],
"born" : "1993/08/13",
"nick" : "hockey"
}
}
A new field called "nick" has been added.
We can even work with date fields to extract information such as the year:
GET hockey/_search
{
"script_fields": {
"birth_year": {
"script": {
"source": "doc.born.value.year"
}
}
}
}
Display result:
{ "took" : 0, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : {" total ": {" value" : 11, "base" : "eq"}, "max_score" : 1.0, "hits" : [{" _index ":" hockey ", "_type" : "_doc", "_id" : "2", "_score" : 1.0, "fields" : {" birth_year ": [1994]}},...Copy the code
Script Caching
The first time Elasticsearch sees a new script, it compiles it and stores the compiled version in a cache. Both inline and stored scripts are cached, and new scripts can evict older cached ones. By default the cache holds 100 scripts. We can change the cache size with script.cache.max_size and set an expiration time with script.cache.expire; these settings go in config/elasticsearch.yml.
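As a sketch, these settings would go into config/elasticsearch.yml roughly like this; the values shown are only examples, not recommendations:
# config/elasticsearch.yml -- example values only
script.cache.max_size: 200   # keep up to 200 compiled scripts in the cache
script.cache.expire: 10m     # evict compiled scripts that have been unused for 10 minutes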
Script debugging
Scripts are very hard to work with when they cannot be debugged, so a good debugging tool is extremely useful when writing them.
Debug.explain
Painless doesn’t have a REPL, and while one may arrive someday, it wouldn’t tell the whole story about debugging Painless scripts embedded in Elasticsearch, because the data, or “context”, that a script can access is so important. For now, the best way to debug an embedded script is to throw an exception at a chosen location. Although you can throw your own exception (throw new Exception('whatever')), Painless’s sandbox prevents you from accessing useful information such as the type of an object. So Painless provides a utility method, Debug.explain, which throws the exception for you. For example, you can use _explain to explore the context available to a script query.
PUT /hockey/_doc/1?refresh
{
  "first": "johnny",
  "last": "gaudreau",
  "goals": [9, 27, 1],
  "assists": [17, 46, 0],
  "gp": [26, 82, 1]
}

POST /hockey/_explain/1
{
  "query": {
    "script": {
      "script": "Debug.explain(doc.goals)"
    }
  }
}
The response shows that the class of doc.goals is org.elasticsearch.index.fielddata.ScriptDocValues.Longs:
{
"error": {
"root_cause": [
{
"type": "script_exception",
"reason": "runtime error",
"painless_class": "org.elasticsearch.index.fielddata.ScriptDocValues.Longs",
"to_string": "[1, 9, 27]",
"java_class": "org.elasticsearch.index.fielddata.ScriptDocValues$Longs",
"script_stack": [
"Debug.explain(doc.goals)",
" ^---- HERE"
],
"script": "Debug.explain(doc.goals)",
"lang": "painless"
}
],
"type": "script_exception",
"reason": "runtime error",
"painless_class": "org.elasticsearch.index.fielddata.ScriptDocValues.Longs",
"to_string": "[1, 9, 27]",
"java_class": "org.elasticsearch.index.fielddata.ScriptDocValues$Longs",
"script_stack": [
"Debug.explain(doc.goals)",
" ^---- HERE"
],
"script": "Debug.explain(doc.goals)",
"lang": "painless",
"caused_by": {
"type": "painless_explain_error",
"reason": null
}
},
"status": 400
}
You can use the same technique to see that _source is a LinkedHashMap in the _update API:
POST /hockey/_update/1
{
"script": "Debug.explain(ctx._source)"
}
The results are as follows:
{ "error": { "root_cause": [ { "type": "remote_transport_exception", "reason": "[localhost] [127.0.0.1:9300] [indices: data/write/update [s]]"}], "type" : "illegal_argument_exception", "" reason" : "failed to execute script", "caused_by": { "type": "script_exception", "reason": "runtime error", "painless_class": "java.util.LinkedHashMap", "to_string": "{first=johnny, last=gaudreau, goals=[9, 27, 1], assists=[17, 46, 0], gp=[26, 82, 1], born=1993/08/13, nick=hockey}", "java_class": "java.util.LinkedHashMap", "script_stack": [ "Debug.explain(ctx._source)", " ^---- HERE" ], "script": "Debug.explain(ctx._source)", "lang": "painless", "caused_by": { "type": "painless_explain_error", "reason": null } } }, "status": 400 }Copy the code
References:
[1] www.elastic.co/guide/en/el…
[2] www.elastic.co/guide/en/el…