Elasticsearch Search API (Request Body Search)

About the author: author of "RocketMQ Technology Insider" and maintainer of the WeChat public account "Middleware Interest Circle". A QR code for the account is at the end of the article; following it makes it easier to interact with the author.

This article is fairly long and may require some patience. It covers Elasticsearch's three paging methods, sorting, from/size, source filtering, doc value fields, post filter, highlighting, rescoring, search type, scroll, preference, explain, version, index boost, min_score, named queries, inner hits, field collapsing, and search after.

You can jump to the topics that interest you by keyword. Most of them come with Java usage examples.

1. Query

This section describes the request body of the Elasticsearch Search API and how search criteria are customized through it.

The search criteria in the request body are written in the Elasticsearch Query DSL. The query itself is defined under the query element.

GET /_search
{
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}

2. From/Size

Elasticsearch pages through a result set with the from and size parameters. from sets the offset of the first hit to return, and size sets the number of hits to return in this batch (it is applied on every shard). Elasticsearch is distributed by design: an index is split horizontally into a configured number of primary shards, so a query usually has to gather data from several shards on several nodes. This leads to a problem common to all distributed stores: deep paging. Elasticsearch provides another paging method, the Scroll API, which is discussed in detail later. Note: from + size cannot exceed the index.max_result_window setting, which defaults to 10000.
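To make the deep-paging cost concrete, here is a minimal sketch in plain Java. It is an illustrative model, not Elasticsearch code: the class DeepPagingSketch and its methods are hypothetical names, and hits are reduced to integer scores. It assumes that every shard returns its own top from + size entries and that the coordinating node merges them before cutting out the requested page.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Illustrative model of from/size paging across shards; not Elasticsearch code. */
final class DeepPagingSketch {

    /** Number of sorted entries the coordinating node has to merge for one page. */
    static long entriesMergedByCoordinator(int shards, int from, int size) {
        // Assumption: each shard returns its own top (from + size) entries,
        // because the coordinator cannot know in advance which shard holds the page.
        return (long) shards * (from + size);
    }

    /** Merges the per-shard scores and cuts out the requested page. */
    static List<Integer> page(List<List<Integer>> perShardTopHits, int from, int size) {
        List<Integer> merged = new ArrayList<>();
        for (List<Integer> shardHits : perShardTopHits) {
            merged.addAll(shardHits);
        }
        merged.sort(Collections.reverseOrder()); // best score first
        int start = Math.min(from, merged.size());
        int end = Math.min(from + size, merged.size());
        return new ArrayList<>(merged.subList(start, end));
    }

    public static void main(String[] args) {
        // 5 shards, last page allowed by the default max_result_window
        System.out.println(entriesMergedByCoordinator(5, 9990, 10));
    }
}
```

In this model, 5 shards with from = 9990 and size = 10 make the coordinating node merge 50000 entries to return 10 hits, which is the kind of cost that the index.max_result_window limit guards against.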

3. Sort

Elasticsearch supports sorting by one or more fields, in ascending (asc) or descending (desc) order. By default, results are sorted by _score. When sort is used, the sort values of each document are also returned as part of the response.

3.1 Sort order

Elasticsearch provides two sort orders: SortOrder.ASC (asc, ascending) and SortOrder.DESC (desc, descending). When sorting by _score, the default order is descending (desc). When sorting by a field, the default order is ascending (asc).

3.2 Sort mode

Elasticsearch supports sorting by array or multi-valued fields. The mode option controls which array value is picked to sort the document it belongs to. mode can take the following values:

min uses the smallest value in the array.
max uses the largest value in the array.
sum uses the sum of all values in the array.
avg uses the average of all values in the array.
median uses the median of all values in the array.

An example is as follows:

PUT /my_index/_doc/1?refresh
{
    "product": "chocolate",
    "price": [20, 4]
}

POST /_search
{
    "query" : {
        "term" : { "product" : "chocolate" }
    },
    "sort" : [
        {"price" : {"order" : "asc", "mode" : "avg"}}    // @1
    ]
}

When an array-valued field takes part in sorting, its elements are usually reduced to a single value first, such as the average, maximum, minimum, or sum. Elasticsearch selects this reduction through the sort mode (mode).
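The reduction performed by mode can be sketched in plain Java as follows. This is an illustration only, not Elasticsearch code: SortModeSketch and sortValue are hypothetical names, and taking the mean of the two middle values for an even-sized array in median is an assumption of this sketch.

```java
import java.util.Arrays;

/** Illustrative reduction of a multi-valued field to one sort value; not Elasticsearch code. */
final class SortModeSketch {

    static double sortValue(String mode, double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("no values to sort by");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        switch (mode) {
            case "min":
                return sorted[0];
            case "max":
                return sorted[sorted.length - 1];
            case "sum":
                return Arrays.stream(sorted).sum();
            case "avg":
                return Arrays.stream(sorted).sum() / sorted.length;
            case "median": {
                int mid = sorted.length / 2;
                // Assumption: for an even count, use the mean of the two middle values.
                return sorted.length % 2 == 1
                        ? sorted[mid]
                        : (sorted[mid - 1] + sorted[mid]) / 2.0;
            }
            default:
                throw new IllegalArgumentException("unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        double[] price = {20, 4}; // the "price" array from the example document
        for (String mode : new String[] {"min", "max", "sum", "avg", "median"}) {
            System.out.println(mode + " -> " + sortValue(mode, price));
        }
    }
}
```

For the price array [20, 4] from the example, avg yields 12.0, so the document is sorted as if its price were 12.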

3.3 Sorting within nested objects

Elasticsearch also supports sorting by fields inside one or more nested objects. The nested sort option accepts the following parameters:

path defines the nested object to sort on. The sort field must be a direct field inside this nested object. path is required when sorting by a nested field.
filter defines a filter that the inner objects under the nested path must match for their field values to be taken into account by sorting.
max_children is the maximum number of children to consider per root document when picking the sort value. The default is unlimited.
nested applies the same options to another nested path inside the current nested object, so nested sorts can themselves be nested.

"sort" : [
  {
    "parent.child.age" : {      // @1
        "mode" :  "min",
         "order" : "asc",
         "nested": {                // @2
            "path": "parent",
            "filter": {
                "range": {"parent.age": {"gte": 21}}
            },
            "nested": {                            // @3
                "path": "parent.child",
                "filter": {
                    "match": {"parent.child.name": "matt"}
                }
            }
         }
    }
  }
]

Code @1: the sort field name; dotted (cascading) field names are supported. Code @2: the nested attribute defines the nested sort, where path specifies the current nested object and filter defines the filter. Code @3: a nested attribute can contain another nested attribute.

3.4 Missing values

Because fields can be added to an index dynamically as documents are indexed, some documents may not contain the sort field. How are such documents ordered? Elasticsearch decides through the missing attribute, whose possible values are:

_last sorts documents without the field last. This is the default.
_first sorts documents without the field first.
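The effect of missing can be pictured with a small comparator sketch in plain Java. It is not Elasticsearch code: MissingSortSketch is a hypothetical name, and a null entry stands in for a document that has no value for the sort field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Illustrative ordering of documents that lack the sort field; not Elasticsearch code. */
final class MissingSortSketch {

    /** Sorts ascending; null models "this document has no value for the sort field". */
    static List<Integer> sortAscending(List<Integer> values, String missing) {
        Comparator<Integer> byValue = Comparator.naturalOrder();
        Comparator<Integer> order;
        if ("_first".equals(missing)) {
            order = Comparator.nullsFirst(byValue);
        } else {
            order = Comparator.nullsLast(byValue); // "_last" is the default
        }
        List<Integer> sorted = new ArrayList<>(values);
        sorted.sort(order);
        return sorted;
    }

    public static void main(String[] args) {
        List<Integer> prices = Arrays.asList(30, null, 10);
        System.out.println(sortAscending(prices, "_last"));  // [10, 30, null]
        System.out.println(sortAscending(prices, "_first")); // [null, 10, 30]
    }
}
```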

3.5 Ignoring unmapped fields

By default, the search request fails with an exception if a sort field has no mapping. unmapped_type lets you ignore such fields: it names a type, and if no mapping is found for the field name, Elasticsearch treats the field as being of that type, with no value in any document.

3.6 Geo sorting

Geo sorting will be explained in a follow-up article on geo types.

4. Field filtering (_source and stored_fields)

By default, the entire content of _source is returned for every hit. Field filtering lets the user return only selected fields of _source as required. The same mechanism is available in the Elasticsearch Document Get API.

5. Doc Value Fields

Doc value fields can be used as follows:

GET /_search
{
    "query" : {
        "match_all": {}
    },
    "docvalue_fields" : [
        {
            "field": "my_date_field",
            "format": "epoch_millis"
        }
    ]
}

docvalue_fields specifies the fields to return and the format to convert them to. Doc value fields also work for fields that are not stored (store set to false in the mapping). Wildcards are supported in field names, for example "field": "myfield*". Fields listed in docvalue_fields do not change the content of _source; their values are returned additionally under fields in each hit.

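The Java example code snippet is as follows (a full demo is given at the end of the article):

import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.sort.FieldSortBuilder;
import org.elasticsearch.search.sort.SortOrder;

SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(QueryBuilders.termQuery("user", "dingw"))
        .sort(new FieldSortBuilder("post_date").order(SortOrder.DESC))
        .docValueField("post_date", "epoch_millis");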

The result is as follows:

{
    "took":88,
    "timed_out":false,
    "_shards":{
        "total":5,
        "successful":5,
        "skipped":0,
        "failed":0
    },
    "hits":{
        "total":2,
        "max_score":null,
        "hits":[
            {
                "_index":"twitter",
                "_type":"_doc",
                "_id":"11",
                "_score":null,
                "_source":{
                    "post_date":"2009-11-19T14:12:12",
                    "message":"test bulk update",
                    "user":"dingw"
                },
                "fields":{
                    "post_date":[
                        "1258639932000"
                    ]
                },
                "sort":[
                    1258639932000
                ]
            },
            {
                "_index":"twitter",
                "_type":"_doc",
                "_id":"12",
                "_score":null,
                "_source":{
                    "post_date":"2009-11-18T14:12:12",
                    "message":"test bulk",
                    "user":"dingw"
                },
                "fields":{
                    "post_date":[
                        "1258553532000"
                    ]
                },
                "sort":[
                    1258553532000
                ]
            }
        ]
    }
}

6. Post Filter

A post filter is applied to the documents that already match the search query.

GET /shirts/_search
{
  "query": {
    "bool": {
      "filter": {
        "term": { "brand": "gucci" }      // @1
      }
    }
  },
  "post_filter": {     // @2
    "term": { "color": "red" }
  }
}

The index is first searched with the condition at @1 to obtain the matching documents, and those results are then filtered again by the condition at @2.

7. Highlighting (highlighting query results)

7.1 Highlighters supported by Elasticsearch

Highlighting marks the query keywords in the search results, so that the parts of a hit that matched the query are shown in a different color.

Note: a highlighter does not reflect the Boolean logic of the query when it extracts the terms to highlight. For some complex Boolean queries (such as nested Boolean queries or queries using minimum_should_match), the highlighting may therefore be slightly off.

Highlighting needs the actual content of the field. If the field is not stored (the mapping does not set store to true), the content is extracted from _source. Elasticsearch supports three highlighters, selected per field through type.

Unified highlighter: uses the Lucene Unified Highlighter. It breaks the text into sentences and scores the individual sentences with the BM25 algorithm as if they were documents in a corpus. It supports accurate phrase and multi-term (fuzzy, prefix, regex) highlighting. This is the default highlighter in Elasticsearch.
Plain highlighter: uses the standard Lucene highlighter. It is best suited to highlighting simple query matches in a single field. To reflect the query logic accurately, it creates a tiny in-memory index and re-runs the original query through Lucene's query execution planner to obtain low-level match information for the current document. If you need to highlight many fields, the unified highlighter or term_vector fields are recommended.

The plain highlighter analyzes text in real time. When a user runs a query, the search engine finds the docIDs of the matching documents, loads the content of the fields to highlight into memory, and runs the field's analyzer over that text. After analysis, a similarity algorithm picks the top N highest-scoring fragments, which are highlighted and returned. Suppose the user searches large documents and wants highlighting: with 40 hits per page of about 20 KB each, highlighting can slow the whole query down to nearly two seconds, even if similarity calculation and sorting cost nothing. On the other hand, real-time analysis makes Elasticsearch use fewer I/O resources and less storage (if the vocabulary is fairly complete, it can save about half the storage space compared with FVH). In effect, real-time highlighting spends CPU to relieve I/O pressure. For short fields such as an article title it is fast, and because it needs few I/O accesses it helps system throughput.

Resources: blog.csdn.net/kjsoftware/…

Fast vector highlighter: uses the Lucene Fast Vector Highlighter, which is based on term vectors. It requires term_vector to be set to with_positions_offsets in the mapping, so that positions and offsets are stored.

To address the performance of highlighting large text fields, the Lucene highlighting module provides a term-vector-based method called the fast vector highlighter (FVH). FVH uses the term vectors saved when the index was built to compute the highlighted fragments directly. Compared with plain highlighting, it skips the real-time analysis step and instead reads the analysis results from disk into memory. The precondition for FVH is therefore that the mapping stores term vectors with position and offset information when the index is built.

Note: FVH does not support span queries. If you need span queries, try another highlighter such as the unified highlighter.

The logic of FVH highlighting is as follows:

1. Parse the highlight query and extract the set of terms to highlight from the expression.
2. Read the term vectors of the document field from disk.
3. Walk through the term vectors and pick those that appear in the expression.
4. Read the term frequency from the selected term vectors, and obtain the positions and offsets from it.
5. Use the similarity algorithm to select the top N highest-scoring highlight groups.
6. Read the field content (multiple values are separated by a space) and cut out the highlighted fragments directly at the positions given by the term vectors.

Resources: blog.csdn.net/kjsoftware/…

7.2 Offsets strategy

A core problem that highlighting has to solve is locating the terms to highlight and their positions (position and offset) in the original text.

Elasticsearch provides three strategies for obtaining offsets:

The postings list: if index_options is set to offsets in the mapping, the unified highlighter uses this information to highlight documents without re-analyzing the text. It re-runs the original query directly on the postings and extracts the matching offsets from the index. This matters for large fields, because the text to highlight does not have to be re-analyzed. It also needs less disk space than the term_vector approach.
Term vectors: if term_vector is set to with_positions_offsets in the field mapping, the unified highlighter automatically uses the term vector to highlight the field. This is fast especially for large fields (> 1MB) and for highlighting multi-term queries such as prefix or wildcard, because it can access the dictionary of terms for each document. The fast vector highlighter works only when term_vector is set to with_positions_offsets in the field mapping.
Plain highlighting: used when there is no other option. It creates a tiny in-memory index and re-runs the original query through Lucene's query execution planner to access low-level match information for the current document. This is repeated for every field and every document that needs highlighting. The plain highlighter always uses this mode.

Note: for large texts, plain highlighting can require a lot of time and memory. To protect against this, the next major version of Elasticsearch will limit the number of text characters analyzed to 1,000,000. In 6.x there is no limit by default, but one can be set for a particular index with the index setting index.highlight.max_analyzed_offset.

7.3 Highlighting configuration items

Global highlight settings can be overridden at the field level.

boundary_chars: a string containing each boundary character. Defaults to .,!? \t\n
boundary_max_scan: how far to scan for boundary characters. Defaults to 20.
boundary_scanner: specifies how to break the highlighted fragments. Possible values are chars, sentence, and word.
chars: uses the characters specified by boundary_chars as highlight boundaries; boundary_max_scan controls how far to scan for them. Only valid for the fast vector highlighter.
sentence: breaks highlighted fragments at the next sentence boundary, as determined by Java's BreakIterator. The locale can be set with boundary_scanner_locale. This is the default behavior of the unified highlighter.
word: breaks highlighted fragments at the next word boundary, as determined by Java's BreakIterator.
boundary_scanner_locale: the locale used to find sentence and word boundaries. The value is a language tag, for example en-US, fr-FR, or ja-JP. More information can be found in the Locale language tag documentation. The default is Locale.ROOT.
encoder: whether the snippet should be HTML-encoded: default (no encoding) or html (HTML-escapes the snippet text and then inserts the highlighting tags).
fields: the fields to retrieve highlights for. Wildcards are supported; for example, comment_* retrieves highlights for all text and keyword fields whose names start with comment_. Note: only text and keyword fields are matched when wildcards are used.
force_source: whether to highlight based on _source even if the field is stored separately. Defaults to false.
fragmenter: how text is broken up in highlight snippets: simple or span. Only valid for the plain highlighter. Defaults to span.
simple: breaks the text into fragments of the same size.
span: breaks the text into fragments of the same size, but tries to avoid breaking up text between highlighted terms. This is helpful when querying for phrases.
fragment_offset: controls the margin from which to start highlighting. Only valid for the fast vector highlighter.
fragment_size: the size of a highlighted fragment in characters. Defaults to 100.
highlight_query: highlights matches for a query other than the search query. This is especially useful with a rescore query, because rescore queries are not taken into account by highlighting by default. In general, the search query should be included in highlight_query.
matched_fields: combines matches on multiple fields to highlight a single field. This is most intuitive for multi-fields that analyze the same string in different ways. All matched_fields must have term_vector set to with_positions_offsets, but only the field into which the matches are combined is loaded, so only that field benefits from having store set to true. Only valid for the fast vector highlighter.
no_match_size: the amount of text to return from the beginning of the field when there are no matching fragments to highlight. Defaults to 0 (nothing is returned).
number_of_fragments: the maximum number of fragments to return. If set to 0, no fragments are returned; instead, the whole field content is highlighted and returned. Defaults to 5.
order: defaults to none, which returns the highlighted fragments in the order in which they appear in the field. Set it to score to sort the fragments by relevance.
phrase_limit: controls the number of matching phrases in a document that are considered. It prevents the fast vector highlighter from analyzing too many phrases and consuming too much memory. When matched_fields is used, phrase_limit phrases per matched field are considered. Raising the limit increases query time and memory consumption. Only supported by the fast vector highlighter. Defaults to 256.
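pre_tags: used together with post_tags to define the HTML tags wrapped around highlighted text. By default, highlighted text is wrapped in <em> and </em>.
post_tags: used together with pre_tags to define the HTML tags wrapped around highlighted text. By default, highlighted text is wrapped in <em> and </em>.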
require_field_match: by default, only fields that contain a query match are highlighted. Set require_field_match to false to highlight all fields. Defaults to true.
tags_schema: defines the highlighting tag schema; set it to styled to use the built-in tag schema.
type: the highlighter to use. Possible values are unified, plain, and fvh. Defaults to unified.

7.4 Highlighting Demo

public static void testSearch_highlighting() {
    RestHighLevelClient client = EsClient.getClient();
    try {
        SearchRequest searchRequest = new SearchRequest();
        searchRequest.indices("map_highlighting_01");
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        sourceBuilder.query(
                // QueryBuilders.matchAllQuery()
                QueryBuilders.termQuery("context", "id")
        );
        HighlightBuilder highlightBuilder = new HighlightBuilder();
        highlightBuilder.field("context");
        sourceBuilder.highlighter(highlightBuilder);
        searchRequest.source(sourceBuilder);
        System.out.println(client.search(searchRequest, RequestOptions.DEFAULT));
    } catch (Exception e) {
        // TODO: handle exception
    }
}

The return value is as follows:

{
    "took":2,
    "timed_out":false,
    "_shards":{
        "total":5,
        "successful":5,
        "skipped":0,
        "failed":0
    },
    "hits":{
        "total":1,
        "max_score":0.2876821,
        "hits":[
            {
                "_index":"map_highlighting_01",
                "_type":"_doc",
                "_id":"erYsbmcBeEynCj5VqVTI",
                "_score":0.2876821,
                "_source":{
                    "context":"…"
                },
                "highlight":{    // @1
                    "context":[
                        "city of Chinese and western way can be accepted from the second generation of <em>id</em>."
                    ]
                }
            }
        ]
    }
}

The highlight element is keyed by field name. Each field returns fragments of the original text that contain the matched keyword, each at most fragment_size characters long. When rendering a page, these fragments are normally displayed in place of the original value so that the matches appear highlighted.

8. Rescoring

Rescoring is a re-scoring mechanism. A query first uses an efficient algorithm to find documents, and a second query is then applied only to the top N documents of that result. The second algorithm is usually more expensive but gives more precise matching.

The score of the rescore query is combined with the original query score according to score_mode:

total adds the two scores.
multiply multiplies the original score by the rescore query score. Useful for function query rescores.
avg takes the average of the two scores.
max takes the larger of the two scores.
min takes the smaller of the two scores.
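The five combination modes listed above can be written out as a small plain-Java function. This is a sketch for illustration, not Elasticsearch code: RescoreSketch and combine are hypothetical names, and any weighting of the two scores is left out.

```java
/** Illustrative combination of an original score and a rescore-query score; not Elasticsearch code. */
final class RescoreSketch {

    static double combine(String scoreMode, double originalScore, double rescoreScore) {
        switch (scoreMode) {
            case "total":
                return originalScore + rescoreScore;
            case "multiply":
                return originalScore * rescoreScore;
            case "avg":
                return (originalScore + rescoreScore) / 2.0;
            case "max":
                return Math.max(originalScore, rescoreScore);
            case "min":
                return Math.min(originalScore, rescoreScore);
            default:
                throw new IllegalArgumentException("unknown score_mode: " + scoreMode);
        }
    }

    public static void main(String[] args) {
        // A document scored 2.0 by the original query and 3.0 by the rescore query
        for (String mode : new String[] {"total", "multiply", "avg", "max", "min"}) {
            System.out.println(mode + " -> " + combine(mode, 2.0, 3.0));
        }
    }
}
```

Only the top N documents of the first pass would go through such a combination; documents outside that window keep their original score.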

9. Search Type

The search type. Possible values are query_then_fetch, query_and_fetch, and dfs_query_then_fetch. The default is query_then_fetch.

query_then_fetch: the request is first sent to the relevant shards according to the routing algorithm. At this stage each shard returns only document IDs and the information needed for sorting. The per-shard results are then merged and sorted, and the number of hits requested by the client (top N) is selected. Finally, the full documents are fetched from the shards by document ID.
query_and_fetch: deprecated in 5.4.x. The request goes directly to every shard, each shard returns the number of full documents requested by the client, and all of them are returned, so the client receives size * (number of shards after routing) documents.
dfs_query_then_fetch: before the request is sent to the shards, an extra round computes term frequencies and relevance statistics. The rest of the process is the same as query_then_fetch. Relevance scoring is more accurate with this search type, but performance is worse than with query_then_fetch.

10. Scroll

Scroll is another form of paging in Elasticsearch. While a search request returns a single "page" of results, the Scroll API can retrieve a large number of results (or even all of them) from a single search request, in much the same way as a cursor on a traditional database. The Scroll API is not intended for real-time user requests, but for processing large amounts of data, for example to reindex the contents of one index into a new index with a different configuration.

10.1 How to use the Scroll API

Using the Scroll API takes three steps:

Step 1: send the first search request with the scroll parameter, which specifies how long the scroll stays alive (similar to the lifetime of a database cursor).

POST /twitter/_search?scroll=1m
{
    "size": 100,
    "query": {
        "match" : {
            "title" : "elasticsearch"
        }
    }
}

This request returns an important value: the scroll_id.

Step 2: use the scroll_id to pull the next batch (the next page of data) from the Elasticsearch server.

POST  /_search/scroll 
{
    "scroll" : "1m", 
    "scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAD4WYm9laVYtZndUQlNsdDcwakFMNjU1QQ==" 
}

Repeat step 2 in a loop to process the data batch by batch.

Step 3: clear the scroll_id, similar to closing a database cursor, so that resources are released quickly.

DELETE /_search/scroll
{
    "scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAD4WYm9laVYtZndUQlNsdDcwakFMNjU1QQ=="
}

The following is a Java example of the Scroll API:

public static void testScoll() {
    RestHighLevelClient client = EsClient.getClient();
    String scrollId = null;
    try {
        System.out.println("step 1 start ");
        // step 1 start
        SearchRequest searchRequest = new SearchRequest();
        searchRequest.indices("map_highlighting_01");
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        sourceBuilder.query(QueryBuilders.termQuery("context", "id"));
        searchRequest.source(sourceBuilder);
        searchRequest.scroll(TimeValue.timeValueMinutes(1));
        SearchResponse result = client.search(searchRequest, RequestOptions.DEFAULT);
        scrollId = result.getScrollId();
        // step 1 end

        // step 2 start
        if (!StringUtils.isEmpty(scrollId)) {
            System.out.println("step 2 start ");
            SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
            scrollRequest.scroll(TimeValue.timeValueMinutes(1));
            while (true) { // loop until a batch comes back empty
                SearchResponse scollResponse = client.scroll(scrollRequest, RequestOptions.DEFAULT);
                if (scollResponse.getHits().getHits() == null
                        || scollResponse.getHits().getHits().length < 1) {
                    break;
                }
                scrollId = scollResponse.getScrollId();
                // process the documents of this batch
                scrollRequest.scrollId(scrollId);
            }
            // step 2 end
        }
        System.out.println(result);
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        if (!StringUtils.isEmpty(scrollId)) {
            System.out.println("step 3 start ");
            // step 3 start
            ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
            clearScrollRequest.addScrollId(scrollId);
            try {
                client.clearScroll(clearScrollRequest, RequestOptions.DEFAULT);
            } catch (IOException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
            // step 3 end
        }
    }
}

Note that the first search request returns not only the scroll_id but also the first batch of data.

10.2 Keeping the search context alive

The scroll parameter (passed to the search request and to every scroll request) tells Elasticsearch how long it should keep the search context alive. Its value (for example 1m, see Time units) does not need to be long enough to process all the data; it only needs to be long enough to process the previous batch of results. Each scroll request (with the scroll parameter) sets a new expiry time. If a scroll request does not pass in the scroll parameter, the search context is released as part of that scroll request. Internally, scroll works like a snapshot: when the first scroll request is received, a snapshot is taken of the results matched by the search context, and later changes to documents are not reflected in the results of that scroll.

For scroll queries that return a large number of documents, the scroll can be split into multiple slices that can be consumed independently. Slices are specified with slice.

For example:

GET /twitter/_search?scroll=1m     // @1
{
    "slice": {        // @11
        "id": 0,      // @12
        "max": 2      // @13
    },
    "query": {
        "match" : {
            "title" : "elasticsearch"
        }
    }
}

GET /twitter/_search?scroll=1m     // @2
{
    "slice": {
        "id": 1,
        "max": 2
    },
    "query": {
        "match" : {
            "title" : "elasticsearch"
        }
    }
}

@1 and @2 are two queries that can run in parallel, each reading one slice. @11: slice defines a sliced scroll. @12: the id of this slice. @13: the total number of slices.

This mechanism is well suited to processing data with multiple threads.

Slicing works as follows: the request is first forwarded to every shard; on each shard, matching documents are assigned to slices by hashCode(_uid) % number of slices; each shard then returns the documents of the requested slice to the coordinating node. By default, slicing is based on the _uid of the document. To make slicing more efficient, you can specify a slice field that meets the following conditions:

The field is numeric.
doc_values is enabled on the field.
Every document contains a value for the field.
The value of the field is set only when the document is created and is never updated.
The cardinality of the field is high (comparable to index selectivity in a database), so that each slice receives roughly the same number of documents.

Note that the maximum number of slices per scroll defaults to 1024; it can be changed with the index setting index.max_slices_per_scroll.

For example:

GET /twitter/_search?scroll=1m
{
    "slice": {
        "field": "date",
        "id": 0,
        "max": 10
    },
    "query": {
        "match" : {
            "title" : "elasticsearch"
        }
    }
}
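The default slice assignment described above (hash of the document identifier modulo the number of slices) can be sketched in plain Java. This is an illustration, not Elasticsearch code: SliceSketch is a hypothetical name, and String.hashCode() stands in for whatever hash the server applies to _uid.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative slice assignment for a sliced scroll; not Elasticsearch code. */
final class SliceSketch {

    /** Slice a document falls into: hash of its id modulo the number of slices. */
    static int sliceOf(String docId, int maxSlices) {
        // floorMod keeps the result in [0, maxSlices) even for negative hash codes.
        return Math.floorMod(docId.hashCode(), maxSlices);
    }

    /** Documents that the slice with the given id would return. */
    static List<String> docsForSlice(List<String> docIds, int sliceId, int maxSlices) {
        List<String> result = new ArrayList<>();
        for (String id : docIds) {
            if (sliceOf(id, maxSlices) == sliceId) {
                result.add(id);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("1", "2", "3", "4", "5");
        System.out.println(docsForSlice(ids, 0, 2)); // [2, 4]
        System.out.println(docsForSlice(ids, 1, 2)); // [1, 3, 5]
    }
}
```

Every document lands in exactly one slice, so two workers reading slice 0 and slice 1 together see each document once. This is also why a high-cardinality slice field matters: a field with few distinct values would leave some slices nearly empty.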

11. Preference

preference controls which shard copies in a replication group the search is executed on. By default, Elasticsearch selects from the available shard copies in an unspecified order; routing between copies is covered in more detail in the cluster section. This parameter lets you state a preference for which copy to use.

Possible values of preference:

_primary executes only on primary shards. Deprecated in 6.1.0 and will be removed in 7.x.
_primary_first prefers primary shards. Deprecated in 6.1.0 and will be removed in 7.x.
_replica executes only on replica shards. If there are multiple replicas, the order is random. Deprecated in 6.1.0 and will be removed in 7.x.
_replica_first prefers replica shards; if there are multiple replicas, the order is random. Deprecated in 6.1.0 and will be removed in 7.x.
_only_local executes only on shards allocated to the local node. The _only_local option guarantees that only shard copies on the local node are used, which is sometimes useful for troubleshooting. None of the other options fully guarantees that a particular shard copy is used, and on a changing index this may mean that repeated searches return different results when they run on shard copies in different refresh states.
_local prefers shards on the local node.
_prefer_nodes:abc,xyz prefers shard copies on the nodes with the given IDs (abc and xyz in this example).
_shards:2,3 restricts the operation to the specified shards (2 and 3 here). This preference can be combined with other preferences, but it must appear first: _shards:2,3|_local.
_only_nodes:abc*,x*yz,... restricts the operation to the nodes specified by node IDs.
Custom (string) value: any custom string; the copy is chosen as hashCode(value) % number of copies in the replication group. For example, a web application can use the sessionId as the preference value.
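The custom-string rule can be sketched in a few lines of plain Java. This follows the formula as stated in the text and is not Elasticsearch code: PreferenceSketch, copyFor, and the session id used below are hypothetical, and String.hashCode() stands in for the server-side hash.

```java
/** Illustrative mapping of a custom preference string to a shard copy; not Elasticsearch code. */
final class PreferenceSketch {

    /** Index of the copy chosen within a replication group of the given size. */
    static int copyFor(String preference, int copiesInGroup) {
        if (copiesInGroup <= 0) {
            throw new IllegalArgumentException("a replication group has at least one copy");
        }
        // floorMod keeps the index non-negative even when hashCode() is negative.
        return Math.floorMod(preference.hashCode(), copiesInGroup);
    }

    public static void main(String[] args) {
        String sessionId = "session-42"; // hypothetical session id
        // The same string maps to the same copy on every call.
        System.out.println(copyFor(sessionId, 3) == copyFor(sessionId, 3));
    }
}
```

Because the mapping depends only on the string, all searches from one session reach the same copy as long as the set of copies does not change, which keeps the results of repeated searches consistent for that user.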

12. Explain

Whether to explain how the score of each hit was computed.

GET /_search
{
    "explain": true,
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}

13. Version

If set to true, the current version number of each hit document is returned.

GET /_search
{
    "version": true,
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}

14. Index Boost

When searching across multiple indices, a different boost level can be configured for each index. This is handy when hits from one index matter more than hits from another.

The following is an example:

GET /_search
{
    "indices_boost" : [
        { "alias1" : 1.4 },
        { "index*" : 1.3 }
    ]
}

15. Min Score

min_score specifies the minimum score a document must have to be returned. Documents that score lower are not returned.

GET /_search
{
    "min_score": 0.5,
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}

16. Named Queries

Each filter and query can accept a _name in its top-level definition. The search response then includes a matched_queries array for every hit, listing the names of the queries the document matched. Tagging queries and filters only makes sense for bool queries.

The following is a Java example:

public static void testNamesQuery() {
    RestHighLevelClient client = EsClient.getClient();
    try {
        SearchRequest searchRequest = new SearchRequest();
        searchRequest.indices("esdemo");
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        sourceBuilder.query(
                QueryBuilders.boolQuery()
                        .should(QueryBuilders.termQuery("context", "fox").queryName("q1"))
                        .should(QueryBuilders.termQuery("context", "brown").queryName("q2"))
        );
        searchRequest.source(sourceBuilder);
        SearchResponse result = client.search(searchRequest, RequestOptions.DEFAULT);
        System.out.println(result);
    } catch (Throwable e) {
        e.printStackTrace();
    } finally {
        EsClient.close(client);
    }
}

The result is as follows:

{
    "took":4,
    "timed_out":false,
    "_shards":{
        "total":5,
        "successful":5,
        "skipped":0,
        "failed":0
    },
    "hits":{
        "total":2,
        "max_score":0.5753642,
        "hits":[
            {
                "_index":"esdemo",
                "_type":"matchquerydemo",
                "_id":"2",
                "_score":0.5753642,
                "_source":{
                    "context":"My quick brown as fox eats rabbits on a regular basis.",
                    "title":"Keeping pets healthy"
                },
                "matched_queries":[
                    "q1",
                    "q2"
                ]
            },
            {
                "_index":"esdemo",
                "_type":"matchquerydemo",
                "_id":"1",
                "_score":0.39556286,
                "_source":{
                    "context":"Brown rabbits are commonly seen brown.",
                    "title":"Quick brown rabbits"
                },
                "matched_queries":[
                    "q2"
                ]
            }
        ]
    }
}

As shown above, each matching document contains a matched_queries array that indicates which named queries the document matched.
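How matched_queries comes about can be sketched without a cluster. The plain-Java example below is only an illustration under simplifying assumptions: the tokens helper is a crude stand-in for an analyzer (lowercase, split on non-letters), and the class and method names are hypothetical, not part of any Elasticsearch API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of how named term clauses produce matched_queries.
final class NamedQueriesSketch {

    // Crude stand-in for an analyzer: lowercase, then split on non-letters.
    static Set<String> tokens(String text) {
        return new HashSet<>(Arrays.asList(text.toLowerCase(Locale.ROOT).split("[^a-z]+")));
    }

    // Returns the names of all clauses whose term occurs in the text.
    static List<String> matchedQueries(String text, Map<String, String> termByName) {
        Set<String> docTokens = tokens(text);
        List<String> matched = new ArrayList<>();
        for (Map.Entry<String, String> clause : termByName.entrySet()) {
            if (docTokens.contains(clause.getValue())) {
                matched.add(clause.getKey());
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        Map<String, String> termByName = new LinkedHashMap<>();
        termByName.put("q1", "fox");
        termByName.put("q2", "brown");
        // prints [q1, q2]
        System.out.println(matchedQueries("My quick brown as fox eats rabbits on a regular basis.", termByName));
        // prints [q2]
        System.out.println(matchedQueries("Brown rabbits are commonly seen brown.", termByName));
    }
}
```

The two sample texts are the context values of the documents in the response, and the sketch reports the same clause names for each of them.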

17, Inner hits Inner hits define what is returned for the inner (nested) level of each hit. The following options are supported:

from The offset of the first inner hit to return.
size The maximum number of inner hits to return.
sort How the inner hits are sorted.
name The name used for this inner hit definition in the response.

An example of inner hits is given in the next section.

18, Field collapsing Field collapsing allows search results to be collapsed (folded) by field value. Folding is done by keeping only the top-sorted document for each collapse key. The effect is somewhat similar to grouping by a field in an aggregation: the first level of the hit list contains the first document of each field value. For example, collapsing on the user field retrieves the best tweet of each user, sorted by the number of likes.

The following example shows how field collapsing is used.

First, query the tweets whose message contains elasticsearch:

GET /mapping_field_collapsing_twitter/_search
{
    "query": {
        "match": {
            "message": "elasticsearch"
        }
    },
    "sort": [
        { "likes": "desc" }
    ]
}

Return result:

{
    "took":8,
    "timed_out":false,
    "_shards":{
        "total":5,
        "successful":5,
        "skipped":0,
        "failed":0
    },
    "hits":{
        "total":5,
        "max_score":null,
        "hits":[
            {
                "_index":"mapping_field_collapsing_twitter",
                "_type":"_doc",
                "_id":"OYnecmcB-IBeb8B-bF2X",
                "_score":null,
                "_source":{
                    "message":"to be a elasticsearch",
                    "user":"user2",
                    "likes":3
                },
                "sort":[
                    3
                ]
            },
            {
                "_index":"mapping_field_collapsing_twitter",
                "_type":"_doc",
                "_id":"OonecmcB-IBeb8B-bF2q",
                "_score":null,
                "_source":{
                    "message":"to be elasticsearch",
                    "user":"user2",
                    "likes":3
                },
                "sort":[
                    3
                ]
            },
            {
                "_index":"mapping_field_collapsing_twitter",
                "_type":"_doc",
                "_id":"OInecmcB-IBeb8B-bF2G",
                "_score":null,
                "_source":{
                    "message":"elasticsearch is very high",
                    "user":"user1",
                    "likes":3
                },
                "sort":[
                    3
                ]
            },
            {
                "_index":"mapping_field_collapsing_twitter",
                "_type":"_doc",
                "_id":"O4njcmcB-IBeb8B-Rl2H",
                "_score":null,
                "_source":{
                    "message":"elasticsearch is high db",
                    "user":"user1",
                    "likes":1
                },
                "sort":[
                    1
                ]
            },
            {
                "_index":"mapping_field_collapsing_twitter",
                "_type":"_doc",
                "_id":"N4necmcB-IBeb8B-bF0n",
                "_score":null,
                "_source":{
                    "message":"very likes elasticsearch",
                    "user":"user1",
                    "likes":1
                },
                "sort":[
                    1
                ]
            }
        ]
    }
}

The result above lists all matching tweets of every user. What if you only want to show the single most-liked tweet of each user, or two tweets per user? This is where field collapsing comes in. Java demo:

public static void search_field_collapsing() {
    RestHighLevelClient client = EsClient.getClient();
    try {
        SearchRequest searchRequest = new SearchRequest();
        searchRequest.indices("mapping_field_collapsing_twitter");
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        sourceBuilder.query(
            QueryBuilders.matchQuery("message", "elasticsearch")
        );
        sourceBuilder.sort("likes", SortOrder.DESC);
        // Collapse the hits on the user field: one top hit per user.
        CollapseBuilder collapseBuilder = new CollapseBuilder("user");
        sourceBuilder.collapse(collapseBuilder);
        searchRequest.source(sourceBuilder);
        SearchResponse result = client.search(searchRequest, RequestOptions.DEFAULT);
        System.out.println(result);
    } catch (Throwable e) {
        e.printStackTrace();
    } finally {
        EsClient.close(client);
    }
}

The results are as follows:

{
    "took":22,
    "timed_out":false,
    "_shards":{
        "total":5,
        "successful":5,
        "skipped":0,
        "failed":0
    },
    "hits":{
        "total":5,
        "max_score":null,
        "hits":[
            {
                "_index":"mapping_field_collapsing_twitter",
                "_type":"_doc",
                "_id":"OYnecmcB-IBeb8B-bF2X",
                "_score":null,
                "_source":{
                    "message":"to be a elasticsearch",
                    "user":"user2",
                    "likes":3
                },
                "fields":{
                    "user":[
                        "user2"
                    ]
                },
                "sort":[
                    3
                ]
            },
            {
                "_index":"mapping_field_collapsing_twitter",
                "_type":"_doc",
                "_id":"OInecmcB-IBeb8B-bF2G",
                "_score":null,
                "_source":{
                    "message":"elasticsearch is very high",
                    "user":"user1",
                    "likes":3
                },
                "fields":{
                    "user":[
                        "user1"
                    ]
                },
                "sort":[
                    3
                ]
            }
        ]
    }
}

The above example returns only the first document of each user. What if you need to return two documents per user? You can configure this with inner_hits.

public static void search_field_collapsing() {
    RestHighLevelClient client = EsClient.getClient();
    try {
        SearchRequest searchRequest = new SearchRequest();
        searchRequest.indices("mapping_field_collapsing_twitter");
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        sourceBuilder.query(
            QueryBuilders.matchQuery("message", "elasticsearch")
        );
        sourceBuilder.sort("likes", SortOrder.DESC);
        CollapseBuilder collapseBuilder = new CollapseBuilder("user");
        // Return up to two documents per user inside each collapsed hit.
        InnerHitBuilder collapseHitBuilder = new InnerHitBuilder("collapse_inner_hit");
        collapseHitBuilder.setSize(2);
        collapseBuilder.setInnerHits(collapseHitBuilder);
        sourceBuilder.collapse(collapseBuilder);
        searchRequest.source(sourceBuilder);
        SearchResponse result = client.search(searchRequest, RequestOptions.DEFAULT);
        System.out.println(result);
    } catch (Throwable e) {
        e.printStackTrace();
    } finally {
        EsClient.close(client);
    }
}

The result is as follows:

{
    "took":42,
    "timed_out":false,
    "_shards":{ "total":5, "successful":5, "skipped":0, "failed":0 },
    "hits":{
        "total":5,
        "max_score":null,
        "hits":[
            {
                "_index":"mapping_field_collapsing_twitter",
                "_type":"_doc",
                "_id":"OYnecmcB-IBeb8B-bF2X",
                "_score":null,
                "_source":{ "message":"to be a elasticsearch", "user":"user2", "likes":3 },
                "fields":{ "user":[ "user2" ] },
                "sort":[ 3 ],
                "inner_hits":{
                    "collapse_inner_hit":{
                        "hits":{
                            "total":2,
                            "max_score":0.19363807,
                            "hits":[
                                {
                                    "_index":"mapping_field_collapsing_twitter",
                                    "_type":"_doc",
                                    "_id":"OonecmcB-IBeb8B-bF2q",
                                    "_score":0.19363807,
                                    "_source":{ "message":"to be elasticsearch", "user":"user2", "likes":3 }
                                },
                                {
                                    "_index":"mapping_field_collapsing_twitter",
                                    "_type":"_doc",
                                    "_id":"OYnecmcB-IBeb8B-bF2X",
                                    "_score":0.17225473,
                                    "_source":{ "message":"to be a elasticsearch", "user":"user2", "likes":3 }
                                }
                            ]
                        }
                    }
                }
            },
            {
                "_index":"mapping_field_collapsing_twitter",
                "_type":"_doc",
                "_id":"OInecmcB-IBeb8B-bF2G",
                "_score":null,
                "_source":{ "message":"elasticsearch is very high", "user":"user1", "likes":3 },
                "fields":{ "user":[ "user1" ] },
                "sort":[ 3 ],
                "inner_hits":{
                    "collapse_inner_hit":{
                        "hits":{
                            "total":3,
                            "max_score":0.2876821,
                            "hits":[
                                {
                                    "_index":"mapping_field_collapsing_twitter",
                                    "_type":"_doc",
                                    "_id":"O4njcmcB-IBeb8B-Rl2H",
                                    "_score":0.2876821,
                                    "_source":{ "message":"elasticsearch is high db", "user":"user1", "likes":1 }
                                },
                                {
                                    "_index":"mapping_field_collapsing_twitter",
                                    "_type":"_doc",
                                    "_id":"N4necmcB-IBeb8B-bF0n",
                                    "_score":0.2876821,
                                    "_source":{ "message":"very likes elasticsearch", "user":"user1", "likes":1 }
                                }
                            ]
                        }
                    }
                }
            }
        ]
    }
}

The result has two levels: the first level holds the top document of each user, and the inner_hits nested inside each of these hits hold that user's documents.
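The two-level shape can be sketched in plain Java. This is a minimal in-memory illustration of the idea, not how Elasticsearch implements collapsing; the Tweet class and the collapseByUser method are hypothetical. One simplification to note: the sketch keeps the inner documents in the outer sort order (likes descending), whereas the inner hits in the response above are ordered by score.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory illustration of collapsing by user with inner hits.
final class FieldCollapsingSketch {

    static final class Tweet {
        final String user;
        final String message;
        final int likes;

        Tweet(String user, String message, int likes) {
            this.user = user;
            this.message = message;
            this.likes = likes;
        }
    }

    // Sorts by likes (descending) and groups by user. Groups appear in the order
    // of their best tweet, and each group keeps at most innerSize tweets.
    static Map<String, List<Tweet>> collapseByUser(List<Tweet> tweets, int innerSize) {
        List<Tweet> sorted = new ArrayList<>(tweets);
        // List.sort is stable, so tweets with equal likes keep their input order.
        sorted.sort(Comparator.comparingInt((Tweet t) -> t.likes).reversed());
        Map<String, List<Tweet>> groups = new LinkedHashMap<>();
        for (Tweet t : sorted) {
            List<Tweet> group = groups.computeIfAbsent(t.user, k -> new ArrayList<>());
            if (group.size() < innerSize) {
                group.add(t);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Tweet> tweets = Arrays.asList(
            new Tweet("user2", "to be a elasticsearch", 3),
            new Tweet("user2", "to be elasticsearch", 3),
            new Tweet("user1", "elasticsearch is very high", 3),
            new Tweet("user1", "elasticsearch is high db", 1),
            new Tweet("user1", "very likes elasticsearch", 1));
        for (Map.Entry<String, List<Tweet>> group : collapseByUser(tweets, 2).entrySet()) {
            System.out.println(group.getKey() + " -> top: " + group.getValue().get(0).message
                + ", inner hits: " + group.getValue().size());
        }
    }
}
```

The first element of each group plays the role of the first-level hit, and the whole group plays the role of that hit's inner_hits.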

19, Search After Search after is the third paging method supported by Elasticsearch. It does not support jumping to an arbitrary page.

Elasticsearch supports three paging methods:
1. from + size. index.max_result_window controls the maximum value of (from + size). The default value is 10000, and an error is reported if this value is exceeded.
2. The Scroll API. It works like a snapshot, so it is not real-time, and keeping the scroll context alive costs resources.
3. Search after, introduced in this section, which queries the next page based on the results of the previous page.

The basic idea of search after is to sort by a group of fields whose combined values are globally unique. For each hit, the ES response returns a sort array containing that hit's values of the sort fields. The next-page query passes the sort array of the last hit of the current page as its search after condition, and ES returns the next batch of data that follows it.

The following is an example of Java:

public static void search_search_after() {
    RestHighLevelClient client = EsClient.getClient();
    try {
        SearchRequest searchRequest = new SearchRequest();
        searchRequest.indices("mapping_search_after");
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        sourceBuilder.query(
            QueryBuilders.termQuery("user", "user2")
        );
        sourceBuilder.size(1);
        sourceBuilder.sort("id", SortOrder.ASC);
        searchRequest.source(sourceBuilder);
        SearchResponse result = client.search(searchRequest, RequestOptions.DEFAULT);
        System.out.println(result);
        if (hasHit(result)) { // this page returned hits, so fetch the next page
            int length = result.getHits().getHits().length;
            SearchHit aLastHit = result.getHits().getHits()[length - 1];
            // start the next round of query
            sourceBuilder.searchAfter(aLastHit.getSortValues());
            result = client.search(searchRequest, RequestOptions.DEFAULT);
            System.out.println(result);
        }
    } catch (Throwable e) {
        e.printStackTrace();
    } finally {
        EsClient.close(client);
    }
}

private static boolean hasHit(SearchResponse result) {
    return !(result.getHits() == null
        || result.getHits().getHits() == null
        || result.getHits().getHits().length < 1);
}
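The cursor idea behind search after can also be shown without a cluster. The sketch below pages through an ascending in-memory list of ids, using the last value of each page as the cursor for the next one. It is only an illustration of the technique; the class and the nextPage method are hypothetical and it assumes a single, unique, ascending sort key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory illustration of cursor-based ("search after") paging.
final class SearchAfterSketch {

    // Returns up to size ids that are strictly greater than `after`.
    // sortedIds must be ascending; pass null as `after` for the first page.
    static List<Long> nextPage(List<Long> sortedIds, Long after, int size) {
        List<Long> page = new ArrayList<>();
        for (Long id : sortedIds) {
            if (page.size() >= size) {
                break;
            }
            if (after == null || id > after) {
                page.add(id);
            }
        }
        return page;
    }

    public static void main(String[] args) {
        List<Long> ids = Arrays.asList(1L, 2L, 3L, 4L, 5L);
        Long cursor = null;
        List<Long> page;
        // prints [1, 2], then [3, 4], then [5]
        while (!(page = nextPage(ids, cursor, 2)).isEmpty()) {
            System.out.println(page);
            cursor = page.get(page.size() - 1); // last sort value becomes the cursor
        }
    }
}
```

Since every page is defined relative to the previous cursor, there is no way to jump straight to page N, which is why this method does not support skipping pages.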

This article introduced the three paging methods of Elasticsearch, along with sorting, from, size, source filtering, doc value fields, post filter, highlighting, rescoring, search type, scroll, preference, explain, version, index boost, min_score, named queries, inner hits, field collapsing and search after.

I am Weige, keen on systematic analysis of mainstream Java middleware. Follow the WeChat public account "middleware interest circle"; reply "column" to get the navigation of the systematic columns, or reply "data" to get the author's learning mind maps.
