In previous articles, I introduced Painless scripting and described its syntax and usage in detail. Those articles also covered best practices, such as why parameters are used, when to use “doc” values instead of “_source” when accessing document fields, and how to create fields dynamically.
Previous posts:
- Elasticsearch: Painless scripting
- Elasticsearch: Painless Script programming
In this article, we’ll explore more uses of Painless scripts: using scripts in query and filter contexts, using conditions in scripts, removing fields and nested fields, accessing nested objects, using scripts in scoring, and more.
The tutorial
First, let’s index the data set that we will use for the rest of this article:
PUT tweets/_bulk
{"index":{"_id":1}}
{"username":"tom","posted_date":"2017/07/25" ,"message": "I brought apple stock at the best price" ,"tags": ["stock","money"] , "info":{"device":"mobile", "os": "ios"}, "likes": 10}
{"index":{"_id":2}}
{"username":"mary","posted_date":"2017/06/25" ,"message": "Machine learning is the future" ,"tags": ["ai","tech"] , "info":{"device":"desktop", "os": "ios"}, "likes": 100}
{"index":{"_id":3}}
{"username":"tom","posted_date":"2017/07/27" ,"message": "just tweeting" ,"tags": ["confused"] , "info":{"device":"mobile", "os": "win"}, "likes": 0}
{"index":{"_id":4}}
{"username":"mary","posted_date":"2017/07/28" ,"message": "exploring painless" ,"tags": ["elastic"] , "info":{"device":"mobile", "os": "linux"}, "likes": 100}
{"index":{"_id":5}}
{"username":"mary","posted_date":"2017/05/20" ,"message": "painless is fun but its a new scripting language in the town" ,"tags": ["elastic","painless","scripting"] , "info":{"device":"mobile", "os": "linux"}, "likes": 1000}
Above, we used the Bulk API to index our sample data into the tweets index.
Script Query
Script queries let us execute a script against each document. They are typically used in a filter context. If you want to include a script in a query or filter context, be sure to embed the script in a script object (“script” : {}). That is why, in the following example, you will see a script object nested inside another script object.
Let’s try an example. Let’s find all the tweets that contain the string “painless” and whose message is longer than 25 characters.
GET tweets/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"message": "painless"
}
}
],
"filter": [
{
"script": {
"script": {
"source": "doc['message.keyword'].value.length() > params.length",
"params": {
"length": 25
}
}
}
}
]
}
}
}
Return result:
"hits" : [
{
"_index" : "tweets",
"_type" : "_doc",
"_id" : "5",
"_score" : 0.60910475,
"_source" : {
"username" : "mary",
"posted_date" : "2017/05/20",
"message" : "painless is fun but its a new scripting language in the town",
"tags" : [
"elastic",
"painless",
"scripting"
],
"info" : {
"device" : "mobile",
"os" : "linux"
},
"likes" : 1000
}
}
]
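To make the logic of the two clauses concrete, here is a minimal Python sketch that applies the same two conditions to the sample messages locally. It is not how Elasticsearch evaluates the query (the match clause runs against the inverted index and the script runs per document); the helper name `matches` and the simple whitespace tokenization are assumptions made for illustration.

```python
# Local emulation of the bool query above: a match on "painless" plus a
# script filter on message length. Illustrative only, not Elasticsearch code.
messages = {
    1: "I brought apple stock at the best price",
    2: "Machine learning is the future",
    3: "just tweeting",
    4: "exploring painless",
    5: "painless is fun but its a new scripting language in the town",
}

params = {"length": 25}  # mirrors the "params" object passed to the script


def matches(message: str, term: str, min_length: int) -> bool:
    """Return True when the message contains the term and exceeds min_length."""
    tokens = message.lower().split()  # rough stand-in for the standard analyzer
    return term in tokens and len(message) > min_length


hits = [doc_id for doc_id, msg in messages.items()
        if matches(msg, "painless", params["length"])]
print(hits)  # [5]
```

Document 4 also contains “painless”, but its message has only 18 characters, so the length condition filters it out.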
Scripts in Aggregations
Scripts can also be used in aggregations. Aggregations usually operate on the values of non-analyzed fields. With scripts, you can extract values from existing fields, combine values from multiple fields, and then aggregate on the newly derived values.
The tweets above only carry a “posted_date” value. What if we want to find the number of tweets per month? Here is an example that uses a script in an aggregation:
GET tweets/_search
{
"size": 0,
"aggs": {
"my_terms_agg": {
"terms": {
"script": {
"source": """
ZonedDateTime date = doc['posted_date'].value;
return date.getMonth()
"""
}
}
}
}
}
Above, the script extracts the month from each document’s posted_date, and the terms aggregation buckets the documents by that derived month:
"aggregations" : {
"my_terms_agg" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "JULY",
"doc_count" : 3
},
{
"key" : "JUNE",
"doc_count" : 1
},
{
"key" : "MAY",
"doc_count" : 1
}
]
}
}
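The bucket counts can be checked by hand. The following Python sketch derives the month from each sample posted_date and counts the documents per month, which is roughly what the scripted terms aggregation does. The `MONTHS` list and the `month_name` helper are illustrative names, not part of Elasticsearch.

```python
from collections import Counter
from datetime import datetime

# posted_date values of the five sample tweets, in document order
posted_dates = ["2017/07/25", "2017/06/25", "2017/07/27", "2017/07/28", "2017/05/20"]

# Upper-case month names, mirroring what date.getMonth() produced in the buckets
MONTHS = ["JANUARY", "FEBRUARY", "MARCH", "APRIL", "MAY", "JUNE",
          "JULY", "AUGUST", "SEPTEMBER", "OCTOBER", "NOVEMBER", "DECEMBER"]


def month_name(posted_date: str) -> str:
    """Parse a yyyy/MM/dd string and return the upper-case month name."""
    parsed = datetime.strptime(posted_date, "%Y/%m/%d")
    return MONTHS[parsed.month - 1]


# One bucket per derived month, with the number of documents in each
buckets = Counter(month_name(d) for d in posted_dates)
for key, doc_count in buckets.most_common():
    print(key, doc_count)
```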
Delete a field using a script
We can also use scripts to delete fields, including nested fields. All you have to do is call the remove method and pass in the name of the field. For example, suppose we want to remove the nested field “device” from the document with ID 5.
POST tweets/_update/5
{
"script": {
"source": "ctx._source.info.remove(params.fieldname)",
"params": {
"fieldname": "device"
}
}
}
Now let’s retrieve the document with ID 5:
GET tweets/_doc/5
Result returned:
"_source" : {
"username" : "mary",
"posted_date" : "2017/05/20",
"message" : "painless is fun but its a new scripting language in the town",
"tags" : [
"elastic",
"painless",
"scripting"
],
"info" : {
"os" : "linux"
},
"likes" : 1000
}
We can see that the device field under info has been deleted.
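In Painless, ctx._source behaves like a map, so remove here is the map’s remove-by-key operation. A rough Python analogue, with a plain dict standing in for ctx._source, looks like this (the function name `remove_nested_field` is hypothetical and exists only for this sketch):

```python
import copy

# Stand-in for ctx._source of document 5 before the update
source = {
    "username": "mary",
    "posted_date": "2017/05/20",
    "message": "painless is fun but its a new scripting language in the town",
    "tags": ["elastic", "painless", "scripting"],
    "info": {"device": "mobile", "os": "linux"},
    "likes": 1000,
}


def remove_nested_field(doc: dict, parent: str, fieldname: str) -> dict:
    """Return a copy of doc with doc[parent][fieldname] removed, if present."""
    updated = copy.deepcopy(doc)          # leave the original untouched
    updated[parent].pop(fieldname, None)  # no error when the key is absent
    return updated


updated = remove_nested_field(source, "info", "device")
print(updated["info"])  # {'os': 'linux'}
```

Passing the field name through params, as the update request does, keeps the script source constant, so the same script can be reused for different fields.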
Use Scripts to customize scores
When we perform a match query, Elasticsearch returns the matching documents and calculates a score for each of them to show how well it matches the given query. Although the default BM25 algorithm does a good job of relevance scoring, sometimes relevance must be determined by a different algorithm or enhanced with additional scoring heuristics. This is where the script_score and function_score features of Elasticsearch become useful.
Suppose we want to search for the text “painless” but show tweets with more “likes” at the top of the search results, much like a list of top trending tweets. Let’s see it in action.
GET tweets/_search
{
"query": {
"function_score": {
"query": {
"match": {
"message": "painless"
}
},
"script_score": {
"script": {
"source": "1 + doc['likes'].value"
}
}
}
}
}
Return result:
"Hits" : [{" _index ":" tweets ", "_type" : "_doc", "_id" : "5", "_score" : 529.9271, "_source" : {" username ": "mary", "posted_date" : "2017/05/20", "message" : "painless is fun but its a new scripting language in the town", "tags" : [ "elastic", "painless", "scripting" ], "info" : { "os" : "linux" }, "likes" : 1000 } }, { "_index" : "Tweets", "_type" : "_doc", "_id" : "4", "_score" : 98.51341, "_source" : {" username ":" Mary ", "posted_date" : "2017/07/28", "message" : "exploring painless", "tags" : [ "elastic" ], "info" : { "device" : "mobile", "os" : "linux" }, "likes" : 100 } } ]Copy the code
Without the custom score, a regular match query would rank document 4 first: its message is shorter, and the default relevance scoring favors matches in shorter fields, so document 4 scores higher than document 5. We can confirm this with a plain match query:
GET tweets/_search
{
"query": {
"match": {
"message": "painless"
}
}
}
The result returned is:
"hits" : [
{
"_index" : "tweets",
"_type" : "_doc",
"_id" : "4",
"_score" : 0.9753803,
"_source" : {
"username" : "mary",
"posted_date" : "2017/07/28",
"message" : "exploring painless",
"tags" : [
"elastic"
],
"info" : {
"device" : "mobile",
"os" : "linux"
},
"likes" : 100
}
},
{
"_index" : "tweets",
"_type" : "_doc",
"_id" : "5",
"_score" : 0.5293977,
"_source" : {
"username" : "mary",
"posted_date" : "2017/05/20",
"message" : "painless is fun but its a new scripting language in the town",
"tags" : [
"elastic",
"painless",
"scripting"
],
"info" : {
"os" : "linux"
},
"likes" : 1000
}
}
]
Here, the document with ID 4 has a higher score than the document with ID 5.
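The two result sets are consistent with function_score multiplying the query score by the script score, which is what happens when boost_mode is left at its default value of multiply (the request above does not override it). The following Python sketch reproduces the custom scores from the plain match scores; the `custom_score` helper is an illustrative name, and the numbers are copied from the responses shown in this article.

```python
# _score values from the plain match query, keyed by document ID
base_scores = {"4": 0.9753803, "5": 0.5293977}
# likes values from the documents' _source
likes = {"4": 100, "5": 1000}


def custom_score(query_score: float, like_count: int) -> float:
    """Combine the query score with the script result, assuming boost_mode multiply."""
    script_score = 1 + like_count       # same expression as the Painless source
    return query_score * script_score   # multiply is the default boost_mode


final_scores = {doc_id: custom_score(base_scores[doc_id], likes[doc_id])
                for doc_id in base_scores}
ranked = sorted(final_scores, key=final_scores.get, reverse=True)

for doc_id in ranked:
    print(doc_id, round(final_scores[doc_id], 4))
```

Document 5 now ranks first because its 1000 likes outweigh the relevance advantage of document 4.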