You can use two methods to filter search results:
- Use a Boolean query with a filter clause. Search requests apply Boolean filters to both search hits and aggregations.
- Use the post_filter parameter of the search API. Search requests apply post filters only to search hits, not to aggregations. You can use a post filter to calculate aggregations over a broader result set and then narrow down the hits even further.
You can also rescore hits after the post filter to improve relevance and reorder results.
Post filter
When you use the post_filter parameter to filter search results, the search hits are filtered after the aggregations are calculated. A post filter has no effect on the aggregation results.
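To make the order of operations concrete, here is a toy model in plain Python. It is not Elasticsearch code; the `search` helper and the sample documents are hypothetical. The query picks the matching documents, the aggregation is computed over all of them, and only then does the post filter trim the hits.

```python
from collections import Counter


def search(docs, query, agg_field, post_filter=None):
    """Toy model of a search request with a post filter (hypothetical helper)."""
    # 1. The main query selects the matching documents.
    matches = [d for d in docs if query(d)]
    # 2. The aggregation is computed over every query match.
    aggs = dict(Counter(d[agg_field] for d in matches))
    # 3. The post filter narrows only the hits that are returned.
    hits = [d for d in matches if post_filter is None or post_filter(d)]
    return hits, aggs


docs = [
    {"brand": "gucci", "color": "red"},
    {"brand": "gucci", "color": "blue"},
    {"brand": "prada", "color": "red"},
]


def is_gucci(d):
    return d["brand"] == "gucci"


def is_red(d):
    return d["color"] == "red"


hits, aggs = search(docs, is_gucci, "color", post_filter=is_red)
print(hits)  # [{'brand': 'gucci', 'color': 'red'}]
print(aggs)  # {'red': 1, 'blue': 1}
```

Calling `search(docs, is_gucci, "color")` without a post filter returns the same `aggs`; only the hits change.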
For example, suppose you are selling shirts that have the following properties:
PUT /shirts
{
  "mappings": {
    "properties": {
      "brand": { "type": "keyword" },
      "color": { "type": "keyword" },
      "model": { "type": "keyword" }
    }
  }
}

PUT /shirts/_doc/1?refresh
{
  "brand": "gucci",
  "color": "red",
  "model": "slim"
}
Suppose the user has specified two filters: color:red and brand:gucci. You only want to show them red shirts made by Gucci in the search results. Normally you would do this with a Boolean query:
GET /shirts/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "color": "red" }},
        { "term": { "brand": "gucci" }}
      ]
    }
  }
}
However, you also want to use faceted navigation to display a list of other options that the user can click on. Perhaps you have a model field that lets users limit their search results to red Gucci t-shirts or dress shirts.
This can be done with a terms aggregation:
GET /shirts/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "color": "red" }},
        { "term": { "brand": "gucci" }}
      ]
    }
  },
  "aggs": {
    "models": {
      "terms": { "field": "model" }
    }
  }
}
The models aggregation returns the most popular models of red Gucci shirts.
But perhaps you also want to tell the user how many Gucci shirts are available in other colors. If you just add a terms aggregation on the color field, you will only get back the color red, because your query returns only red Gucci shirts.
Instead, you want to include shirts of all colors during aggregation, and then apply the color filter only to the search results. This is the purpose of post_filter:
GET /shirts/_search
{
  "query": {
    "bool": {
      "filter": {
        "term": { "brand": "gucci" }
      }
    }
  },
  "aggs": {
    "colors": {
      "terms": { "field": "color" }
    },
    "color_red": {
      "filter": {
        "term": { "color": "red" }
      },
      "aggs": {
        "models": {
          "terms": { "field": "model" }
        }
      }
    }
  },
  "post_filter": {
    "term": { "color": "red" }
  }
}
- The main query now finds all Gucci shirts, regardless of color.
- The colors aggregation returns the popular colors of Gucci shirts.
- The color_red aggregation limits the models sub-aggregation to red Gucci shirts.
- Finally, post_filter removes colors other than red from the search hits.
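The same faceted navigation can be traced by hand over an in-memory list. This is a rough Python sketch, not how Elasticsearch computes aggregations: only the first shirt comes from the example above, the other documents are hypothetical, and `terms` is an illustrative stand-in for a terms aggregation (its tie ordering may differ from Elasticsearch's).

```python
from collections import Counter

# Hypothetical catalogue: only the first shirt comes from the example above.
shirts = [
    {"brand": "gucci", "color": "red", "model": "slim"},
    {"brand": "gucci", "color": "red", "model": "dress-shirt"},
    {"brand": "gucci", "color": "blue", "model": "slim"},
    {"brand": "gucci", "color": "green", "model": "t-shirt"},
    {"brand": "prada", "color": "red", "model": "slim"},
]


def terms(docs, field):
    """Rough stand-in for a terms aggregation: doc count per value, most frequent first."""
    return Counter(d[field] for d in docs).most_common()


# "query": Boolean filter on brand only, so every Gucci shirt matches.
gucci = [d for d in shirts if d["brand"] == "gucci"]

# "colors": runs on all Gucci shirts, whatever their color.
colors = terms(gucci, "color")

# "color_red" filter aggregation with its "models" sub-aggregation.
red_models = terms([d for d in gucci if d["color"] == "red"], "model")

# "post_filter": applied last, and only to the hits.
hits = [d for d in gucci if d["color"] == "red"]

print(colors)      # [('red', 2), ('blue', 1), ('green', 1)]
print(red_models)  # [('slim', 1), ('dress-shirt', 1)]
print(len(hits))   # 2
```

The colors facet still reports blue and green even though every returned hit is red, which is exactly what the post filter is for.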
Rescore filtered search results
Rescoring can help improve precision by reordering only the top documents (for example, 100-500) returned by the query and post_filter phases with a secondary (usually more expensive) algorithm, instead of applying the expensive algorithm to every document in the index.
A rescore request is executed on each shard before that shard returns its results, which are then sorted by the node handling the overall search request.
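The per-shard behavior can be pictured with the toy model below. It is plain Python, not Elasticsearch code; `shard_top_hits`, `coordinate`, and the sample scores are hypothetical. It assumes the requested page fits inside the window, so each shard returns only its rescored window.

```python
import heapq


def shard_top_hits(hits, rescore, window_size):
    """Rescore one shard's top `window_size` hits and return them best first.

    hits: (doc_id, score) pairs. Assumes the requested page fits inside the
    window, so hits below the window are not returned at all.
    """
    window = sorted(hits, key=lambda h: h[1], reverse=True)[:window_size]
    return sorted(((doc, rescore(doc, score)) for doc, score in window),
                  key=lambda h: h[1], reverse=True)


def coordinate(shards, rescore, window_size, size):
    # Each shard rescores locally before it answers...
    per_shard = [shard_top_hits(h, rescore, window_size) for h in shards]
    # ...and the coordinating node only merges the already sorted lists.
    merged = heapq.merge(*per_shard, key=lambda h: h[1], reverse=True)
    return list(merged)[:size]


shards = [
    [("a", 3.0), ("b", 2.0), ("c", 1.0)],
    [("d", 2.5), ("e", 0.5)],
]
bonus = {"c": 5.0, "e": 5.0}  # hypothetical second-pass signal


def rescore(doc, score):
    return score + bonus.get(doc, 0.0)


print(coordinate(shards, rescore, window_size=2, size=3))
# [('e', 5.5), ('a', 3.0), ('d', 2.5)]
```

Document `c` never receives its bonus because it falls outside its own shard's window, which is why the window must be large enough to contain the documents the second pass should promote.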
The rescore API currently has only one implementation: the query rescorer, which uses a query to adjust the scores. Alternative rescorers, such as a pair-wise rescorer, may be provided in the future.
NOTE: An error is raised if an explicit sort (other than _score in descending order) is provided with a rescore query.
NOTE: When exposing pagination to your users, do not change window_size as you step through each page (by passing different from values), because doing so can change the top hits and cause the results to shift confusingly as the user pages through them.
Query rescorer
The query rescorer executes a second query only on the top-K results returned by the query and post_filter phases. The number of documents examined on each shard is controlled by the window_size parameter, which defaults to 10.
By default, the scores of the original query and the rescore query are combined linearly to produce the final _score of each document. The relative importance of the original query and the rescore query can be controlled with query_weight and rescore_query_weight, respectively. Both default to 1.
For example:
POST /_search
{
  "query": {
    "match": {
      "message": {
        "operator": "or",
        "query": "the quick brown"
      }
    }
  },
  "rescore": {
    "window_size": 50,
    "query": {
      "rescore_query": {
        "match_phrase": {
          "message": {
            "query": "the quick brown",
            "slop": 2
          }
        }
      },
      "query_weight": 0.7,
      "rescore_query_weight": 1.2
    }
  }
}
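A minimal sketch of this linear combination on one shard, reusing the weights from the request above. It assumes the default total mode computes `original_score * query_weight + rescore_score * rescore_query_weight` for documents the rescore query matches, and `original_score * query_weight` for those it does not. `rescore_window` and the sample scores are hypothetical, and only the rescored window is returned; how hits below the window are handled is not modeled.

```python
def rescore_window(hits, rescore_scores, window_size=10,
                   query_weight=1.0, rescore_query_weight=1.0):
    """Toy query rescorer for one shard, default (total) score_mode.

    hits: (doc_id, original_score) pairs, best first.
    rescore_scores: doc_id -> rescore query score (hypothetical input);
    a doc missing from it is treated as not matched by the rescore query.
    Only the rescored window is returned.
    """
    rescored = []
    for doc, score in hits[:window_size]:
        new_score = score * query_weight
        if doc in rescore_scores:
            new_score += rescore_scores[doc] * rescore_query_weight
        rescored.append((doc, new_score))
    rescored.sort(key=lambda h: h[1], reverse=True)
    return rescored


hits = [("d1", 2.0), ("d2", 1.5), ("d3", 1.0)]
phrase_scores = {"d2": 1.0}  # hypothetical: only d2 matches the phrase
top = rescore_window(hits, phrase_scores, window_size=2,
                     query_weight=0.7, rescore_query_weight=1.2)
print([doc for doc, _ in top])  # ['d2', 'd1']
```

Here `d2` (about 1.05 + 1.2 = 2.25) overtakes `d1` (2.0 * 0.7 = 1.4) because only it matches the phrase, while `d3` is never examined because it lies outside the window.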
The way the scores are combined can be controlled with score_mode:
score_mode | Description |
---|---|
total | Add the original score and the rescore query score. This is the default. |
multiply | Multiply the original score by the rescore query score. Useful for function query rescores. |
avg | Average the original score and the rescore query score. |
max | Take the maximum of the original score and the rescore query score. |
min | Take the minimum of the original score and the rescore query score. |
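The table can be expressed as a small lookup of combining functions. This is a hypothetical Python sketch, not Elasticsearch's implementation; it assumes each weight is applied to its own score before the mode combines them, and that a document not matched by the rescore query keeps only its weighted original score.

```python
SCORE_MODES = {
    "total": lambda a, b: a + b,  # the default
    "multiply": lambda a, b: a * b,
    "avg": lambda a, b: (a + b) / 2,
    "max": max,
    "min": min,
}


def combine(score_mode, original, rescore=None,
            query_weight=1.0, rescore_query_weight=1.0):
    """Combine one document's scores (toy model, hypothetical helper)."""
    weighted_original = original * query_weight
    if rescore is None:
        # Assumed: if the rescore query does not match, only the weighted
        # original score is kept.
        return weighted_original
    return SCORE_MODES[score_mode](weighted_original,
                                   rescore * rescore_query_weight)


for mode in SCORE_MODES:
    print(mode, combine(mode, 2.0, 3.0))
# total 5.0, multiply 6.0, avg 2.5, max 3.0, min 2.0
```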
Multiple rescores
It is also possible to execute multiple rescores in sequence:
POST /_search
{
  "query": {
    "match": {
      "message": {
        "operator": "or",
        "query": "the quick brown"
      }
    }
  },
  "rescore": [
    {
      "window_size": 100,
      "query": {
        "rescore_query": {
          "match_phrase": {
            "message": {
              "query": "the quick brown",
              "slop": 2
            }
          }
        },
        "query_weight": 0.7,
        "rescore_query_weight": 1.2
      }
    },
    {
      "window_size": 10,
      "query": {
        "score_mode": "multiply",
        "rescore_query": {
          "function_score": {
            "script_score": {
              "script": {
                "source": "Math.log10(doc.count.value + 2)"
              }
            }
          }
        }
      }
    }
  ]
}
The first rescore gets the results of the query, the second gets the results of the first, and so on. The second rescore "sees" the sorting done by the first, so you can use a large window on the first rescore to pull documents into a smaller window for the second rescore.
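The chaining can be sketched on a single shard as follows. This is a toy Python model: `rescore_chain`, the bonus and popularity values, and the window sizes are hypothetical. As a stated simplification, hits below a window simply keep their previous order and scores after it, which is not necessarily how Elasticsearch treats them.

```python
def rescore_chain(hits, rescorers):
    """Apply rescorers in sequence on one shard (toy model).

    hits: (doc_id, score) pairs, best first.
    rescorers: list of (window_size, fn) where fn(doc_id, score) -> new score.
    """
    for window_size, fn in rescorers:
        window, rest = hits[:window_size], hits[window_size:]
        window = sorted(((doc, fn(doc, score)) for doc, score in window),
                        key=lambda h: h[1], reverse=True)
        # Simplification: hits below the window keep their previous order
        # and scores; this sketch does not model how Elasticsearch treats them.
        hits = window + rest
    return hits


hits = [("a", 4.0), ("b", 3.0), ("c", 2.0), ("d", 1.0)]
phrase_bonus = {"c": 10.0}           # hypothetical first-pass signal
popularity = {"a": 10.0, "c": 0.5}   # hypothetical second-pass factor

result = rescore_chain(hits, [
    (3, lambda doc, s: s + phrase_bonus.get(doc, 0.0)),  # large window first
    (2, lambda doc, s: s * popularity.get(doc, 1.0)),    # then a small window
])
print(result)  # [('a', 40.0), ('c', 6.0), ('b', 3.0), ('d', 1.0)]
```

Document `c` only reaches the second, smaller window because the first rescore moved it to the top; without that pass, the second window would contain `a` and `b` instead.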
See the official documentation: www.elastic.co/guide/en/el…