Preface
This article covers the most useful search techniques for Elasticsearch, along with a Java API implementation.
Data preparation
To illustrate the different types of ES queries, we will search a collection of book documents with the following fields:
title, authors, summary, publish_date, num_reviews, publisher
First, we create the index and then use the bulk API to index the documents in one batch.
# Set the index settings
PUT /bookdb_index
{ "settings": { "number_of_shards": 1 }}
# Bulk-index the data
POST /bookdb_index/book/_bulk
{ "index": { "_id": 1 }}
{ "title": "Elasticsearch: The Definitive Guide", "authors": ["clinton gormley", "zachary tong"], "summary": "A distibuted real-time search and analytics engine", "publish_date": "2015-02-07", "num_reviews": 20, "publisher": "oreilly" }
{ "index": { "_id": 2 }}
{ "title": "Taming Text: How to Find, Organize, and Manipulate It", "authors": ["grant ingersoll", "thomas morton", "drew farris"], "summary": "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization", "publish_date": "2013-01-24", "num_reviews": 12, "publisher": "manning" }
{ "index": { "_id": 3 }}
{ "title": "Elasticsearch in Action", "authors": ["radu gheorge", "matthew lee hinman", "roy russo"], "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms", "publish_date": "2015-12-03", "num_reviews": 18, "publisher": "manning" }
{ "index": { "_id": 4 }}
{ "title": "Solr in Action", "authors": ["trey grainger", "timothy potter"], "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr", "publish_date": "2014-04-05", "num_reviews": 23, "publisher": "manning" }
Note: The ES version used in this experiment is ES 6.3.0
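The bulk request body is newline-delimited JSON: each document takes one action line followed by one source line. As a rough sketch of how such a body could be assembled, the helper below uses only the Python standard library. The name build_bulk_body and the shortened sample documents are illustrative assumptions, not part of the original article.

```python
import json

def build_bulk_body(docs):
    """Return an NDJSON bulk body: one action line, then one source line per document."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_id": doc_id}}))  # action and metadata line
        lines.append(json.dumps(source))                       # document source line
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"

# Shortened sample documents (hypothetical subset of the fields used above).
books = [
    (1, {"title": "Elasticsearch: The Definitive Guide", "publisher": "oreilly"}),
    (4, {"title": "Solr in Action", "publisher": "manning"}),
]
body = build_bulk_body(books)
print(body, end="")
```

The resulting string can be sent as the body of POST /bookdb_index/book/_bulk with any HTTP client.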
1. Basic Match Query
1.1 Full-text Retrieval
There are two ways to perform full-text retrieval:
1) Use the search API with the query parameters as part of the URL
Example: the following performs a full-text search for "guide"
GET bookdb_index/book/_search?q=guide
[Results]
"hits": {
  "total": 2,
  "max_score": 1.3278645,
  "hits": [
    {
      "_index": "bookdb_index", "_type": "book", "_id": "4", "_score": 1.3278645,
      "_source": {
        "title": "Solr in Action",
        "authors": ["trey grainger", "timothy potter"],
        "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
        "publish_date": "2014-04-05", "num_reviews": 23, "publisher": "manning"
      }
    },
    {
      "_index": "bookdb_index", "_type": "book", "_id": "1", "_score": 1.2871116,
      "_source": {
        "title": "Elasticsearch: The Definitive Guide",
        "authors": ["clinton gormley", "zachary tong"],
        "summary": "A distibuted real-time search and analytics engine",
        "publish_date": "2015-02-07", "num_reviews": 20, "publisher": "oreilly"
      }
    }
  ]
}
2) Use the full ES DSL, with a JSON document as the request body. The result is the same as with method 1.
GET bookdb_index/book/_search
{
  "query": {
    "multi_match": {
      "query": "guide",
      "fields": ["_all"]
    }
  }
}
Explanation: The multi_match query is used in place of the match query as a convenient shorthand for running the same query against multiple fields. The fields property specifies which fields to query; in this case we want to query all fields in the document.
Note: ES 6.x does not enable the _all field by default. When fields is not specified, all fields are searched by default.
1.2 Specify specific field retrieval
Both APIs also let you specify which fields to search. For example, search for books with the words "in action" in the title field.
1) URL retrieval method
GET bookdb_index/book/_search?q=title:in action
[Results]
"hits": {
  "total": 2,
  "max_score": 1.6323128,
  "hits": [
    {
      "_index": "bookdb_index", "_type": "book", "_id": "3", "_score": 1.6323128,
      "_source": {
        "title": "Elasticsearch in Action",
        "authors": ["radu gheorge", "matthew lee hinman", "roy russo"],
        "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
        "publish_date": "2015-12-03", "num_reviews": 18, "publisher": "manning"
      }
    },
    {
      "_index": "bookdb_index", "_type": "book", "_id": "4", "_score": 1.6323128,
      "_source": {
        "title": "Solr in Action",
        "authors": ["trey grainger", "timothy potter"],
        "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
        "publish_date": "2014-04-05", "num_reviews": 23, "publisher": "manning"
      }
    }
  ]
}
2) DSL retrieval method
The full-body DSL, however, gives you more flexibility to create more complex queries (as we will see later) and to control how results are returned. In the following example, we specify the number of results to return, the offset to start from (useful for pagination), the document fields to return, and term highlighting.
- Number of results: size
- Offset: from
- Fields to return: _source
- Highlighting: highlight
GET bookdb_index/book/_search
{
  "query": {
    "match": {
      "title": "in action"
    }
  },
  "size": 2,
  "from": 0,
  "_source": ["title", "summary", "publish_date"],
  "highlight": {
    "fields": {
      "title": {}
    }
  }
}
[Results]
"hits": {
  "total": 2,
  "max_score": 1.6323128,
  "hits": [
    {
      "_index": "bookdb_index", "_type": "book", "_id": "3", "_score": 1.6323128,
      "_source": {
        "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
        "title": "Elasticsearch in Action",
        "publish_date": "2015-12-03"
      },
      "highlight": {
        "title": ["Elasticsearch <em>in</em> <em>Action</em>"]
      }
    },
    {
      "_index": "bookdb_index", "_type": "book", "_id": "4", "_score": 1.6323128,
      "_source": {
        "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
        "title": "Solr in Action",
        "publish_date": "2014-04-05"
      },
      "highlight": {
        "title": ["Solr <em>in</em> <em>Action</em>"]
      }
    }
  ]
}
Note:
- For multi-word queries, the match query lets you use the and operator instead of the default or operator: "operator": "and"
- You can also specify the minimum_should_match option to tweak the relevance of the returned results. For details, see the Elasticsearch Guide.
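To make the difference between the two operators concrete, here is a small offline sketch of the matching rule only (no scoring). It is not how Elasticsearch is implemented: the terms helper is a simplified stand-in for the analyzer, and the match function here is a hypothetical illustration that accepts minimum_should_match only as an integer count.

```python
import re

def terms(text):
    """Simplified stand-in for an analyzer: lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def match(query, field_value, operator="or", minimum_should_match=None):
    """Return True if the field value satisfies the query under the given rule."""
    query_terms = terms(query)
    field_terms = set(terms(field_value))
    matched = sum(1 for term in query_terms if term in field_terms)
    if minimum_should_match is not None:
        return matched >= minimum_should_match   # at least N query terms present
    if operator == "and":
        return matched == len(query_terms)       # every query term present
    return matched >= 1                          # default "or": any query term present

title = "Elasticsearch in Action"
print(match("solr action", title))                  # "or": one term is enough
print(match("solr action", title, operator="and"))  # "and": "solr" is missing
```

With the default operator the title matches because it contains "action"; with "operator": "and" it does not, because "solr" is absent.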
2. Multi-field Search
As we have already seen, to query multiple document fields in a search (for example, to search for the same query string in the title and summary), use the multi_match query
GET bookdb_index/book/_search
{
  "query": {
    "multi_match": {
      "query": "guide",
      "fields": ["title", "summary"]
    }
  }
}
[Results]
"hits": {
  "total": 3,
  "max_score": 2.0281231,
  "hits": [
    {
      "_index": "bookdb_index", "_type": "book", "_id": "1", "_score": 2.0281231,
      "_source": {
        "title": "Elasticsearch: The Definitive Guide",
        "authors": ["clinton gormley", "zachary tong"],
        "summary": "A distibuted real-time search and analytics engine",
        "publish_date": "2015-02-07", "num_reviews": 20, "publisher": "oreilly"
      }
    },
    {
      "_index": "bookdb_index", "_type": "book", "_id": "4", "_score": 1.3278645,
      "_source": {
        "title": "Solr in Action",
        "authors": ["trey grainger", "timothy potter"],
        "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
        "publish_date": "2014-04-05", "num_reviews": 23, "publisher": "manning"
      }
    },
    {
      "_index": "bookdb_index", "_type": "book", "_id": "3", "_score": 1.0333893,
      "_source": {
        "title": "Elasticsearch in Action",
        "authors": ["radu gheorge", "matthew lee hinman", "roy russo"],
        "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
        "publish_date": "2015-12-03", "num_reviews": 18, "publisher": "manning"
      }
    }
  ]
}
Note: Document 4 (_id=4) matches in the results above because the word "guide" appears in its summary.
3. Boosting
Since we are searching across multiple fields, we may want to boost the scores of one field. In the example below, we boost the score of the summary field by a factor of 3 to increase its importance, which in turn increases the relevance of document 4.
GET bookdb_index/book/_search
{
  "query": {
    "multi_match": {
      "query": "elasticsearch guide",
      "fields": ["title", "summary^3"]
    }
  },
  "_source": ["title", "summary", "publish_date"]
}
[Results]
"hits": {
  "total": 3,
  "max_score": 3.9835935,
  "hits": [
    {
      "_index": "bookdb_index", "_type": "book", "_id": "4", "_score": 3.9835935,
      "_source": {
        "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
        "title": "Solr in Action",
        "publish_date": "2014-04-05"
      }
    },
    {
      "_index": "bookdb_index", "_type": "book", "_id": "3", "_score": 3.1001682,
      "_source": {
        "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
        "title": "Elasticsearch in Action",
        "publish_date": "2015-12-03"
      }
    },
    {
      "_index": "bookdb_index", "_type": "book", "_id": "1", "_score": 2.0281231,
      "_source": {
        "summary": "A distibuted real-time search and analytics engine",
        "title": "Elasticsearch: The Definitive Guide",
        "publish_date": "2015-02-07"
      }
    }
  ]
}
Note: Boosting does not simply multiply the calculated score by the boost factor. The boost value that is actually applied goes through normalization and some internal optimization. See the Elasticsearch Guide for more information.
4. Bool Query
We can use the AND/OR/NOT operators to fine-tune our search queries and return more relevant or more specific results.
This is implemented in the search API as the bool query. A bool query accepts a must parameter (equivalent to AND), a must_not parameter (equivalent to NOT), and a should parameter (equivalent to OR).
For example, suppose I want to search for a book with the word "Elasticsearch" or "Solr" in the title, AND written by "clinton gormley", but NOT written by "radu gheorge":
GET bookdb_index/book/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "should": [
              { "match": { "title": "Elasticsearch" }},
              { "match": { "title": "Solr" }}
            ]
          }
        },
        { "match": { "authors": "clinton gormely" }}
      ],
      "must_not": [
        { "match": { "authors": "radu gheorge" }}
      ]
    }
  }
}
[Results]
"hits": {
  "total": 1,
  "max_score": 2.0749094,
  "hits": [
    {
      "_index": "bookdb_index", "_type": "book", "_id": "1", "_score": 2.0749094,
      "_source": {
        "title": "Elasticsearch: The Definitive Guide",
        "authors": ["clinton gormley", "zachary tong"],
        "summary": "A distibuted real-time search and analytics engine",
        "publish_date": "2015-02-07", "num_reviews": 20, "publisher": "oreilly"
      }
    }
  ]
}
There are two cases for should in a bool query:
- When must is present at the same level as should, the should clauses are optional. The more of them a document satisfies, the higher its score.
- When there is no must, at least one should clause must be satisfied by default.
Note: As you can see, a bool query can contain any other query type, including other Boolean queries, to create arbitrarily complex or deeply nested queries
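The must/should/must_not rules, including the two should cases and the nesting of bool queries, can be sketched offline as plain predicates. This is only an illustration of the matching logic under simplifying assumptions (no scoring, a naive tokenizer); the names tokens, match, and bool_query are hypothetical helpers, not Elasticsearch APIs.

```python
import re

BOOKS = {
    1: {"title": "Elasticsearch: The Definitive Guide", "authors": ["clinton gormley", "zachary tong"]},
    2: {"title": "Taming Text: How to Find, Organize, and Manipulate It", "authors": ["grant ingersoll", "thomas morton", "drew farris"]},
    3: {"title": "Elasticsearch in Action", "authors": ["radu gheorge", "matthew lee hinman", "roy russo"]},
    4: {"title": "Solr in Action", "authors": ["trey grainger", "timothy potter"]},
}

def tokens(value):
    """Naive analyzer stand-in: lowercase alphanumeric tokens of a string or list of strings."""
    text = " ".join(value) if isinstance(value, list) else value
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def match(field, query):
    """Predicate that is true when any query term occurs in the field (default 'or' operator)."""
    wanted = tokens(query)
    return lambda doc: bool(wanted & tokens(doc[field]))

def bool_query(must=(), should=(), must_not=()):
    """Predicate combining clauses; the result is itself a clause, so queries can be nested."""
    def predicate(doc):
        if any(not clause(doc) for clause in must):
            return False
        if any(clause(doc) for clause in must_not):
            return False
        if should and not must:
            # Without must, at least one should clause has to match.
            return any(clause(doc) for clause in should)
        return True  # with must present, should clauses are optional
    return predicate

query = bool_query(
    must=[
        bool_query(should=[match("title", "Elasticsearch"), match("title", "Solr")]),
        match("authors", "clinton gormley"),
    ],
    must_not=[match("authors", "radu gheorge")],
)
matching_ids = [doc_id for doc_id, doc in BOOKS.items() if query(doc)]
print(matching_ids)
```

Only document 1 satisfies all clauses, in line with the single hit shown in the results above.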
5. Fuzzy Queries
Fuzzy matching can be enabled on match and multi_match queries to catch spelling errors. The degree of fuzziness is specified as the Levenshtein distance from the original word.
GET bookdb_index/book/_search
{
  "query": {
    "multi_match": {
      "query": "comprihensiv guide",
      "fields": ["title", "summary"],
      "fuzziness": "AUTO"
    }
  },
  "_source": ["title", "summary", "publish_date"],
  "size": 2
}
[Results]
"hits": {
  "total": 2,
  "max_score": 2.4344182,
  "hits": [
    {
      "_index": "bookdb_index", "_type": "book", "_id": "4", "_score": 2.4344182,
      "_source": {
        "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
        "title": "Solr in Action",
        "publish_date": "2014-04-05"
      }
    },
    {
      "_index": "bookdb_index", "_type": "book", "_id": "1", "_score": 1.2871116,
      "_source": {
        "summary": "A distibuted real-time search and analytics engine",
        "title": "Elasticsearch: The Definitive Guide",
        "publish_date": "2015-02-07"
      }
    }
  ]
}
A fuzziness value of "AUTO" is equivalent to specifying a value of 2 when the term length is greater than 5. However, about 80% of spelling errors have an edit distance of 1, so setting fuzziness to 1 may improve overall search performance. For more information, see the Typos and Misspellings chapter of Elasticsearch: The Definitive Guide.
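For readers unfamiliar with the Levenshtein distance, the following is a minimal sketch of the classic dynamic-programming computation. It is a textbook implementation written for illustration, not code taken from Elasticsearch.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep when equal)
            ))
        previous = current
    return previous[-1]

print(levenshtein("comprihensiv", "comprehensive"))
```

The misspelled query term "comprihensiv" is two edits away from "comprehensive" (one substitution and one insertion), which is within the distance of 2 that "AUTO" allows for a term of that length.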
6. Wildcard Query
Wildcard queries let you specify a pattern to match instead of an entire term:
- ? matches any single character
- * matches zero or more characters
For example, to find all records that have an author whose name begins with the letter "t":
GET bookdb_index/book/_search
{
  "query": {
    "wildcard": {
      "authors": {
        "value": "t*"
      }
    }
  },
  "_source": ["title", "authors"],
  "highlight": {
    "fields": {
      "authors": {}
    }
  }
}
[Results]
"hits": {
  "total": 3,
  "max_score": 1,
  "hits": [
    {
      "_index": "bookdb_index", "_type": "book", "_id": "1", "_score": 1,
      "_source": {
        "title": "Elasticsearch: The Definitive Guide",
        "authors": ["clinton gormley", "zachary tong"]
      },
      "highlight": {
        "authors": ["zachary <em>tong</em>"]
      }
    },
    {
      "_index": "bookdb_index", "_type": "book", "_id": "2", "_score": 1,
      "_source": {
        "title": "Taming Text: How to Find, Organize, and Manipulate It",
        "authors": ["grant ingersoll", "thomas morton", "drew farris"]
      },
      "highlight": {
        "authors": ["<em>thomas</em> morton"]
      }
    },
    {
      "_index": "bookdb_index", "_type": "book", "_id": "4", "_score": 1,
      "_source": {
        "title": "Solr in Action",
        "authors": ["trey grainger", "timothy potter"]
      },
      "highlight": {
        "authors": ["<em>trey</em> grainger", "<em>timothy</em> potter"]
      }
    }
  ]
}
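The highlights above show that the pattern is tested against individual terms (for example "tong" in "zachary tong"), not against the whole field value. A rough offline imitation, assuming whitespace splitting stands in for the analyzer and using Python's fnmatch for the ? and * semantics, might look like this. Note that fnmatch also understands [seq] character sets, which goes beyond the two wildcards described here; wildcard_terms is a hypothetical helper name.

```python
from fnmatch import fnmatchcase

def wildcard_terms(pattern, values):
    """Return the individual terms in the field values that match the wildcard pattern."""
    return [
        term
        for value in values
        for term in value.split()       # naive tokenization, an assumption
        if fnmatchcase(term, pattern)   # ? = one character, * = zero or more
    ]

print(wildcard_terms("t*", ["clinton gormley", "zachary tong"]))
print(wildcard_terms("t*", ["trey grainger", "timothy potter"]))
```

The first call finds only "tong" and the second finds "trey" and "timothy", mirroring the highlighted terms in documents 1 and 4.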
7. Regular expression Query (Regexp Query)
Regular expressions let you specify more complex patterns than wildcard queries, as shown in the following example:
POST bookdb_index/book/_search
{
  "query": {
    "regexp": {
      "authors": "t[a-z]*y"
    }
  },
  "_source": ["title", "authors"],
  "highlight": {
    "fields": {
      "authors": {}
    }
  }
}
[Results]
"hits": {
  "total": 1,
  "max_score": 1,
  "hits": [
    {
      "_index": "bookdb_index", "_type": "book", "_id": "4", "_score": 1,
      "_source": {
        "title": "Solr in Action",
        "authors": ["trey grainger", "timothy potter"]
      },
      "highlight": {
        "authors": ["<em>trey</em> grainger", "<em>timothy</em> potter"]
      }
    }
  ]
}
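As with wildcards, the pattern has to match a whole term. The sketch below imitates that with Python's re.fullmatch over naively split terms. Python's regex dialect is not identical to the one Elasticsearch uses, so this is only an approximation for simple patterns like the one in the example; regexp_terms is a hypothetical helper name.

```python
import re

def regexp_terms(pattern, values):
    """Return the individual terms that the pattern matches in full."""
    compiled = re.compile(pattern)
    return [
        term
        for value in values
        for term in value.split()     # naive tokenization, an assumption
        if compiled.fullmatch(term)   # the whole term must match, not a substring
    ]

print(regexp_terms("t[a-z]*y", ["trey grainger", "timothy potter"]))
print(regexp_terms("t[a-z]*y", ["grant ingersoll", "thomas morton", "drew farris"]))
```

"trey" and "timothy" both start with "t" and end with "y", while "thomas" does not, which is why only document 4 is returned.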
8. Match Phrase Query
The match phrase query requires that all the terms in the query string be present in the document, in the order specified in the query string, and close to each other.
By default, the terms must be exactly adjacent, but you can specify a slop value that indicates how far apart the terms may be while the document is still considered a match.
GET bookdb_index/book/_search
{
  "query": {
    "multi_match": {
      "query": "search engine",
      "fields": ["title", "summary"],
      "type": "phrase",
      "slop": 3
    }
  },
  "_source": ["title", "summary", "publish_date"]
}
[Results]
"hits": {
  "total": 2,
  "max_score": 0.88067603,
  "hits": [
    {
      "_index": "bookdb_index", "_type": "book", "_id": "4", "_score": 0.88067603,
      "_source": {
        "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
        "title": "Solr in Action",
        "publish_date": "2014-04-05"
      }
    },
    {
      "_index": "bookdb_index", "_type": "book", "_id": "1", "_score": 0.51429313,
      "_source": {
        "summary": "A distibuted real-time search and analytics engine",
        "title": "Elasticsearch: The Definitive Guide",
        "publish_date": "2015-02-07"
      }
    }
  ]
}
Note: In the example above, a non-phrase query would usually give document _id 1 a higher score and rank it ahead of document _id 4, because its field length is shorter.
As a phrase query, however, the proximity of the terms is taken into account, so document _id 4 scores better.
9. Match Phrase Prefix Query
Match phrase prefix queries provide search-as-you-type, or a simple version of autocomplete, at query time without the need to prepare your data in any way.
Like the match_phrase query, it accepts a slop parameter to make the word order and relative positions somewhat less strict. It also accepts the max_expansions parameter to limit the number of terms matched, which reduces resource intensity.
GET bookdb_index/book/_search
{
  "query": {
    "match_phrase_prefix": {
      "summary": {
        "query": "search en",
        "slop": 3,
        "max_expansions": 10
      }
    }
  },
  "_source": ["title", "summary", "publish_date"]
}
Note: Query-time search-as-you-type has a performance cost. A better solution is index-time search-as-you-type. For more information, see the Completion Suggester API or the use of Edge-Ngram filters.
10. Query String
The query_string query provides a concise shorthand syntax for executing multi_match queries, bool queries, boosting, fuzzy matching, wildcards, regexp, and range queries.
In the example below, we run a fuzzy search for the terms "search algorithm" on books where one of the authors is "grant ingersoll" or "tom morton". We search all fields but apply a boost of 2 to the summary field.
GET bookdb_index/book/_search
{
  "query": {
    "query_string": {
      "query": "(saerch~1 algorithm~1) AND (grant ingersoll) OR (tom morton)",
      "fields": ["summary^2", "title", "authors", "publisher"]
    }
  },
  "_source": ["title", "summary", "authors"],
  "highlight": {
    "fields": {
      "summary": {}
    }
  }
}
[Results]
"hits": {
  "total": 1,
  "max_score": 3.571021,
  "hits": [
    {
      "_index": "bookdb_index", "_type": "book", "_id": "2", "_score": 3.571021,
      "_source": {
        "summary": "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization",
        "title": "Taming Text: How to Find, Organize, and Manipulate It",
        "authors": ["grant ingersoll", "thomas morton", "drew farris"]
      },
      "highlight": {
        "summary": ["organize text using approaches such as full-text <em>search</em>, proper name recognition, clustering, tagging"]
      }
    }
  ]
}
11. Simple Query String
The simple_query_string query is a version of the query_string query that is better suited to a single search box exposed to users: it replaces AND/OR/NOT with +/|/- respectively, and it discards the invalid parts of a query instead of throwing an exception when the user makes a mistake.
GET bookdb_index/book/_search
{
  "query": {
    "simple_query_string": {
      "query": "(saerch~1 algorithm~1) + (grant ingersoll) | (tom morton)",
      "fields": ["summary^2", "title", "authors", "publisher"]
    }
  },
  "_source": ["title", "summary", "authors"],
  "highlight": {
    "fields": {
      "summary": {}
    }
  }
}
[Results]
# Same results as above
12. Term/Terms Query
The examples in sections 1-11 above are all full-text searches. Sometimes we are more interested in a structured search, in which we want to find exact matches and return the results.
In the following example, we search for all books in the index published by Manning Publications (using the term and terms queries).
GET bookdb_index/book/_search
{
  "query": {
    "term": {
      "publisher": {
        "value": "manning"
      }
    }
  },
  "_source": ["title", "publish_date", "publisher"]
}
[Results]
"hits": {
  "total": 3,
  "max_score": 0.35667494,
  "hits": [
    {
      "_index": "bookdb_index", "_type": "book", "_id": "2", "_score": 0.35667494,
      "_source": {
        "publisher": "manning",
        "title": "Taming Text: How to Find, Organize, and Manipulate It",
        "publish_date": "2013-01-24"
      }
    },
    {
      "_index": "bookdb_index", "_type": "book", "_id": "3", "_score": 0.35667494,
      "_source": {
        "publisher": "manning",
        "title": "Elasticsearch in Action",
        "publish_date": "2015-12-03"
      }
    },
    {
      "_index": "bookdb_index", "_type": "book", "_id": "4", "_score": 0.35667494,
      "_source": {
        "publisher": "manning",
        "title": "Solr in Action",
        "publish_date": "2014-04-05"
      }
    }
  ]
}
The terms query lets you specify multiple terms to search for:
GET bookdb_index/book/_search
{
  "query": {
    "terms": {
      "publisher": ["oreilly", "manning"]
    }
  }
}
13. Term Query - Sorted
Term query results can be sorted as easily as the results of any other query. Multi-level sorting is also allowed.
GET bookdb_index/book/_search
{
  "query": {
    "term": {
      "publisher": {
        "value": "manning"
      }
    }
  },
  "_source": ["title", "publish_date", "publisher"],
  "sort": [
    { "publisher.keyword": { "order": "desc" }},
    { "title.keyword": { "order": "asc" }}
  ]
}
[Results]
"hits": {
  "total": 3,
  "max_score": null,
  "hits": [
    {
      "_index": "bookdb_index", "_type": "book", "_id": "3", "_score": null,
      "_source": {
        "publisher": "manning",
        "title": "Elasticsearch in Action",
        "publish_date": "2015-12-03"
      },
      "sort": ["manning", "Elasticsearch in Action"]
    },
    {
      "_index": "bookdb_index", "_type": "book", "_id": "4", "_score": null,
      "_source": {
        "publisher": "manning",
        "title": "Solr in Action",
        "publish_date": "2014-04-05"
      },
      "sort": ["manning", "Solr in Action"]
    },
    {
      "_index": "bookdb_index", "_type": "book", "_id": "2", "_score": null,
      "_source": {
        "publisher": "manning",
        "title": "Taming Text: How to Find, Organize, and Manipulate It",
        "publish_date": "2013-01-24"
      },
      "sort": ["manning", "Taming Text: How to Find, Organize, and Manipulate It"]
    }
  ]
}
Note: In Elasticsearch 6.x, sorting on an analyzed text field is disabled by default, which is why the sort above uses the keyword sub-fields (publisher.keyword and title.keyword).
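The effect of a multi-level sort (publisher descending, then title ascending) can be reproduced offline as a quick sanity check. This sketch relies on Python's stable sort and is only an analogy for the ordering, not a description of how Elasticsearch sorts internally; the list below is a hand-copied subset of the sample data.

```python
books = [
    {"publisher": "manning", "title": "Taming Text: How to Find, Organize, and Manipulate It"},
    {"publisher": "oreilly", "title": "Elasticsearch: The Definitive Guide"},
    {"publisher": "manning", "title": "Solr in Action"},
    {"publisher": "manning", "title": "Elasticsearch in Action"},
]

# Equivalent of the term filter on publisher.
hits = [book for book in books if book["publisher"] == "manning"]

# Sort by the secondary key first, then by the primary key.
# Because the sort is stable, ties on publisher keep their title order.
hits.sort(key=lambda book: book["title"])
hits.sort(key=lambda book: book["publisher"], reverse=True)

titles = [book["title"] for book in hits]
print(titles)
```

Since all three hits share the same publisher, the order is decided entirely by the second sort level, the title.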
14. Range Query
Another example of structured retrieval is range retrieval. In the example below, we searched for books published in 2015.
GET bookdb_index/book/_search
{
  "query": {
    "range": {
      "publish_date": {
        "gte": "2015-01-01",
        "lte": "2015-12-31"
      }
    }
  },
  "_source": ["title", "publish_date", "publisher"]
}
[Results]
"hits": {
  "total": 2,
  "max_score": 1,
  "hits": [
    {
      "_index": "bookdb_index", "_type": "book", "_id": "1", "_score": 1,
      "_source": {
        "publisher": "oreilly",
        "title": "Elasticsearch: The Definitive Guide",
        "publish_date": "2015-02-07"
      }
    },
    {
      "_index": "bookdb_index", "_type": "book", "_id": "3", "_score": 1,
      "_source": {
        "publisher": "manning",
        "title": "Elasticsearch in Action",
        "publish_date": "2015-12-03"
      }
    }
  ]
}
Note: Range queries apply to date, number, and string type fields
15. Filtered Query
(The filtered query is no longer available as of version 5.0; this section is kept for reference only.)
Filtered queries let you filter the results of a query. In the following example, we query for books with the term "elasticsearch" in the title or summary, but we want to filter the results to only those with 20 or more reviews.
POST /bookdb_index/book/_search
{
  "query": {
    "filtered": {
      "query": {
        "multi_match": {
          "query": "elasticsearch",
          "fields": ["title", "summary"]
        }
      },
      "filter": {
        "range": {
          "num_reviews": {
            "gte": 20
          }
        }
      }
    }
  },
  "_source": ["title", "summary", "publisher", "num_reviews"]
}
[Results]
"hits": [
  {
    "_index": "bookdb_index", "_type": "book", "_id": "1", "_score": 0.5955761,
    "_source": {
      "summary": "A distibuted real-time search and analytics engine",
      "publisher": "oreilly",
      "num_reviews": 20,
      "title": "Elasticsearch: The Definitive Guide"
    }
  }
]
Note: Filtered queries do not require a query to filter on. If no query is specified, a match_all query is run, which returns all documents in the index for filtering. In practice, the filter is run first, which reduces the surface area that needs to be queried. In addition, the filter is cached after its first use, which makes it very efficient.
Update: Filtered queries were removed in Elasticsearch 5.x in favor of bool queries. Here is the same example as above, rewritten as a bool query. The results returned are exactly the same.
GET bookdb_index/book/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "elasticsearch",
            "fields": ["title", "summary"]
          }
        }
      ],
      "filter": {
        "range": {
          "num_reviews": {
            "gte": 20
          }
        }
      }
    }
  },
  "_source": ["title", "summary", "publisher", "num_reviews"]
}
16. Multiple Filters
(No longer supported as of 5.x; kept for reference only.) Multiple filters can be combined by using a bool filter.
In the next example, the filter requires that the results have at least 20 reviews, must not have been published before 2015, and should be published by oreilly.
POST /bookdb_index/book/_search
{
  "query": {
    "filtered": {
      "query": {
        "multi_match": {
          "query": "elasticsearch",
          "fields": ["title", "summary"]
        }
      },
      "filter": {
        "bool": {
          "must": {
            "range": { "num_reviews": { "gte": 20 }}
          },
          "must_not": {
            "range": { "publish_date": { "lte": "2014-12-31" }}
          },
          "should": {
            "term": { "publisher": "oreilly" }
          }
        }
      }
    }
  },
  "_source": ["title", "summary", "publisher", "num_reviews", "publish_date"]
}
[Results]
"hits": [
  {
    "_index": "bookdb_index", "_type": "book", "_id": "1", "_score": 0.5955761,
    "_source": {
      "summary": "A distibuted real-time search and analytics engine",
      "publisher": "oreilly",
      "num_reviews": 20,
      "title": "Elasticsearch: The Definitive Guide",
      "publish_date": "2015-02-07"
    }
  }
]
17. Function Score: Field Value Factor
There may be cases where you want to factor the value of a particular field in the document into the relevance score calculation. This is typical when you want to boost the relevance of a document based on its popularity.
In our example, we would like to boost the more popular books (judged by the number of reviews). This can be done with the field_value_factor function score.
GET bookdb_index/book/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "search engine",
          "fields": ["title", "summary"]
        }
      },
      "field_value_factor": {
        "field": "num_reviews",
        "modifier": "log1p",
        "factor": 2
      }
    }
  },
  "_source": ["title", "summary", "publish_date", "num_reviews"]
}
[Results]
"hits": [
  {
    "_index": "bookdb_index", "_type": "book", "_id": "1", "_score": 1.5694137,
    "_source": {
      "summary": "A distibuted real-time search and analytics engine",
      "num_reviews": 20,
      "title": "Elasticsearch: The Definitive Guide",
      "publish_date": "2015-02-07"
    }
  },
  {
    "_index": "bookdb_index", "_type": "book", "_id": "4", "_score": 1.4725765,
    "_source": {
      "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
      "num_reviews": 23,
      "title": "Solr in Action",
      "publish_date": "2014-04-05"
    }
  },
  {
    "_index": "bookdb_index", "_type": "book", "_id": "3", "_score": 0.14181662,
    "_source": {
      "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
      "num_reviews": 18,
      "title": "Elasticsearch in Action",
      "publish_date": "2015-12-03"
    }
  },
  {
    "_index": "bookdb_index", "_type": "book", "_id": "2", "_score": 0.13297246,
    "_source": {
      "summary": "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization",
      "num_reviews": 12,
      "title": "Taming Text: How to Find, Organize, and Manipulate It",
      "publish_date": "2013-01-24"
    }
  }
]
Note 1: We could simply run a regular multi_match query and sort by the num_reviews field, but then we would lose the benefit of relevance scoring.
Note 2: There are a number of additional parameters (such as modifier, factor, and boost_mode) that tweak the extent of the boosting effect on the original relevance score. See the Elasticsearch guide.
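To get a feel for how much each book is boosted, the factor produced by field_value_factor can be computed by hand. The sketch below assumes the documented behavior of the log1p modifier (multiply the field value by factor, add 1, take the base-10 logarithm) and the default boost_mode of multiply; it covers only the two modifiers shown and is not a full reimplementation.

```python
import math

def field_value_factor(value, factor=1.0, modifier="none"):
    """Boost factor derived from a numeric field value (only 'none' and 'log1p' are sketched)."""
    scaled = factor * value
    if modifier == "log1p":
        return math.log10(scaled + 1)
    if modifier == "none":
        return scaled
    raise ValueError(f"modifier not covered by this sketch: {modifier}")

# num_reviews of the four sample books, keyed by _id.
for doc_id, num_reviews in [(1, 20), (2, 12), (3, 18), (4, 23)]:
    boost = field_value_factor(num_reviews, factor=2, modifier="log1p")
    print(doc_id, round(boost, 4))
```

Under the default boost_mode, the final score would be the query score multiplied by this factor, so the logarithm keeps heavily reviewed books from overwhelming textual relevance.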
18. Function Score: Decay Functions
Suppose that instead of boosting incrementally by the value of a field, you have an ideal value that you want to target, and you want the boost to decay the further a document moves away from that value. This is typically useful for boosts based on numeric fields such as price, on other numeric ranges, or on dates. In our example, we are searching for books on "search engines", ideally published around June 2014.
GET bookdb_index/book/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "search engine",
          "fields": ["title", "summary"]
        }
      },
      "functions": [
        {
          "exp": {
            "publish_date": {
              "origin": "2014-06-15",
              "scale": "30d",
              "offset": "7d"
            }
          }
        }
      ],
      "boost_mode": "replace"
    }
  },
  "_source": ["title", "summary", "publish_date", "num_reviews"]
}
[Results]
"hits": {
  "total": 4,
  "max_score": 0.22793062,
  "hits": [
    {
      "_index": "bookdb_index", "_type": "book", "_id": "4", "_score": 0.22793062,
      "_source": {
        "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
        "num_reviews": 23,
        "title": "Solr in Action",
        "publish_date": "2014-04-05"
      }
    },
    {
      "_index": "bookdb_index", "_type": "book", "_id": "1", "_score": 0.0049215667,
      "_source": {
        "summary": "A distibuted real-time search and analytics engine",
        "num_reviews": 20,
        "title": "Elasticsearch: The Definitive Guide",
        "publish_date": "2015-02-07"
      }
    },
    {
      "_index": "bookdb_index", "_type": "book", "_id": "2", "_score": 0.000009612435,
      "_source": {
        "summary": "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization",
        "num_reviews": 12,
        "title": "Taming Text: How to Find, Organize, and Manipulate It",
        "publish_date": "2013-01-24"
      }
    },
    {
      "_index": "bookdb_index", "_type": "book", "_id": "3", "_score": 0.0000049185574,
      "_source": {
        "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
        "num_reviews": 18,
        "title": "Elasticsearch in Action",
        "publish_date": "2015-12-03"
      }
    }
  ]
}
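Because boost_mode is replace, each score in this result is just the value of the decay function, so it can be checked by hand. The sketch below assumes the exponential decay formula from the Elasticsearch function_score documentation, exp(lambda * max(0, |value - origin| - offset)) with lambda = ln(decay) / scale, and the default decay of 0.5. The function name exp_decay and the use of whole days as the unit are choices made for this illustration.

```python
import math
from datetime import date

def exp_decay(value, origin, scale_days, offset_days=0, decay=0.5):
    """Exponential decay score for a date field, with distances measured in days."""
    distance = abs((value - origin).days)
    lam = math.log(decay) / scale_days               # negative rate constant
    return math.exp(lam * max(0, distance - offset_days))

origin = date(2014, 6, 15)
score_doc_4 = exp_decay(date(2014, 4, 5), origin, scale_days=30, offset_days=7)
score_doc_1 = exp_decay(date(2015, 2, 7), origin, scale_days=30, offset_days=7)
print(score_doc_4, score_doc_1)
```

Document 4 was published 71 days before the origin, which after the 7-day offset leaves 64 days, or a little over two half-lives of 30 days. That gives roughly 0.2279, in agreement with the score listed for document 4 above; document 1 comes out near 0.0049, also matching.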
19. Function Score: Script Scoring
In cases where the built-in scoring functions do not meet your needs, you have the option of specifying a Groovy script for scoring.
In our example, we specify a script that takes publish_date into consideration before deciding how much to factor in the number of reviews. Newer books may not have as many reviews yet, so they should not be penalized for that.
The scoring script is as follows:
publish_date = doc['publish_date'].value
num_reviews = doc['num_reviews'].value
if (publish_date > Date.parse('yyyy-MM-dd', threshold).getTime()) {
  my_score = Math.log(2.5 + num_reviews)
} else {
  my_score = Math.log(1 + num_reviews)
}
return my_score
To use the scoring script dynamically, we use the script_score parameter
GET /bookdb_index/book/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "search engine",
          "fields": ["title", "summary"]
        }
      },
      "functions": [
        {
          "script_score": {
            "script": {
              "params": {
                "threshold": "2015-07-30"
              },
              "lang": "groovy",
              "source": "publish_date = doc['publish_date'].value; num_reviews = doc['num_reviews'].value; if (publish_date > Date.parse('yyyy-MM-dd', threshold).getTime()) { return log(2.5 + num_reviews) }; return log(1 + num_reviews);"
            }
          }
        }
      ]
    }
  },
  "_source": ["title", "summary", "publish_date", "num_reviews"]
}
Note 1: To use dynamic scripting, it must be enabled for your Elasticsearch instance in the config/elasticsearch.yml file. It is also possible to use scripts that are stored on the Elasticsearch server. See the Elasticsearch reference docs for more information.
Note 2: JSON cannot contain embedded newline characters, so semicolons are used to separate statements.
By Tim Ojo, Aug. 05, 16 · Big Data Zone
Note: How can Groovy scripts be enabled in ES 6.3? Configuring script.allowed_types: inline and script.allowed_contexts: search, update did not work.
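The scoring rule in the script can be tried out offline without a running cluster. The following is a plain Python rendering of the same branch logic, assuming the natural logarithm as in Java's Math.log. The function name review_score is hypothetical, and nothing here involves the Elasticsearch scripting engine.

```python
import math
from datetime import date

def review_score(publish_date, num_reviews, threshold):
    """Score by review count, with a small bonus for books published after the threshold."""
    if publish_date > threshold:
        return math.log(2.5 + num_reviews)  # newer book: slightly larger offset
    return math.log(1 + num_reviews)

threshold = date(2015, 7, 30)
newer = review_score(date(2015, 12, 3), 18, threshold)  # "Elasticsearch in Action"
older = review_score(date(2015, 2, 7), 20, threshold)   # "Elasticsearch: The Definitive Guide"
print(newer, older)
```

With these inputs the newer book scores ln(20.5) and the older one ln(21), so the bonus narrows the gap between them without reversing the order.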
Java API implementation
The queries above are also implemented with the Java API; the code is at github.com/whirlys/ela…
23 Useful Elasticsearch Example Queries you need to know
For more, visit my personal blog: laijianfeng.org