Elasticsearch master (7)

How to manually control the accuracy of full-text search results

1. Add a title field to the post data

 POST /waws/article/_bulk
 { "update": { "_id": "1" } }
 { "doc": { "title": "this is java and elasticsearch blog" } }
 { "update": { "_id": "2" } }
 { "doc": { "title": "this is java blog" } }
 { "update": { "_id": "3" } }
 { "doc": { "title": "this is elasticsearch blog" } }
 { "update": { "_id": "4" } }
 { "doc": { "title": "this is java, elasticsearch, hadoop blog" } }
 { "update": { "_id": "5" } }
 { "doc": { "title": "this is spark blog" } }
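
The bulk API expects newline-delimited JSON: each action line is followed by its own document line, and the body ends with a newline. As a rough sketch, such a body could be generated in Python as follows. The helper name build_bulk_body is made up for illustration; only the standard library is used, and nothing is sent to a cluster.

```python
import json

# Titles keyed by document ID, copied from the bulk request above.
titles = {
    "1": "this is java and elasticsearch blog",
    "2": "this is java blog",
    "3": "this is elasticsearch blog",
    "4": "this is java, elasticsearch, hadoop blog",
    "5": "this is spark blog",
}


def build_bulk_body(titles):
    """Return an NDJSON body: one action line plus one partial-document line per update."""
    lines = []
    for doc_id, title in titles.items():
        lines.append(json.dumps({"update": {"_id": doc_id}}))
        lines.append(json.dumps({"doc": {"title": title}}))
    # The bulk body must be terminated by a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body(titles)
print(body)
```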

2. Search for blogs with Java or ElasticSearch in the title

This differs from the term query used earlier: a term query searches for an exact value, whereas a match query performs full-text search. If the field being searched is not_analyzed, however, a match query is equivalent to a term query.
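
The difference can be seen in a toy model. The sketch below is not Elasticsearch code: analyze is a simplified stand-in for an analyzer (lowercase, split on non-alphanumeric characters), and index, term_query and match_query are hypothetical names. It only illustrates that a term lookup uses the value exactly as given, while a match query analyzes the query text first.

```python
import re
from collections import defaultdict

TITLES = {
    1: "this is java and elasticsearch blog",
    2: "this is java blog",
    3: "this is elasticsearch blog",
    4: "this is java, elasticsearch, hadoop blog",
    5: "this is spark blog",
}


def analyze(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


# Inverted index: token -> set of document IDs whose title contains it.
index = defaultdict(set)
for doc_id, title in TITLES.items():
    for token in analyze(title):
        index[token].add(doc_id)


def term_query(value):
    """Look the value up exactly as given, with no analysis."""
    return sorted(index.get(value, set()))


def match_query(text):
    """Analyze the query text, then return documents containing any resulting token."""
    ids = set()
    for token in analyze(text):
        ids |= index.get(token, set())
    return sorted(ids)


print(term_query("Java"))   # [] because the index only holds the lowercased token
print(match_query("Java"))  # [1, 2, 4] because the query is analyzed to "java" first
```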

 GET /waws/article/_search
 {
   "query": {
     "match": {
       "title": "java elasticsearch"
     }
   }
 }

 {
   "took": 1,
   "timed_out": false,
   "_shards": { "total": 5, "successful": 5, "failed": 0 },
   "hits": {
     "total": 3,
     "max_score": 0.5910557,
     "hits": [
       { "_index": "waws", "_type": "article", "_id": "4", "_score": 0.5910557,
         "_source": {
           "articleID": "QQPX-R-3956-#aD8", "userID": 2, "hidden": true,
           "postDate": "2017-01-02", "tag": ["java", "elasticsearch"],
           "tag_cnt": 2, "view_cnt": 80,
           "title": "this is java, elasticsearch, hadoop blog" } },
       { "_index": "waws", "_type": "article", "_id": "3", "_score": 0.2876821,
         "_source": {
           "articleID": "JODL-X-1937-#pV7", "userID": 2, "hidden": false,
           "postDate": "2017-01-01", "tag": ["hadoop"],
           "tag_cnt": 1, "view_cnt": 100,
           "title": "this is elasticsearch blog" } },
       { "_index": "waws", "_type": "article", "_id": "1", "_score": 0.26742277,
         "_source": {
           "articleID": "XHDK-A-1293-#fJ3", "userID": 1, "hidden": false,
           "postDate": "2017-01-01", "tag": ["java", "hadoop"],
           "tag_cnt": 2, "view_cnt": 30,
           "title": "this is java and elasticsearch blog" } }
     ]
   }
 }

3. Search for blogs with both Java and Elasticsearch in the title

Use the and operator flexibly. When every search keyword must match, set operator to and; a plain match query cannot express this on its own.
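
The effect of operator can be modelled in a few lines of Python. This is only a sketch under simplifying assumptions: analyze approximates an analyzer by lowercasing and splitting on non-alphanumeric characters, and match is a hypothetical helper, not an Elasticsearch client call.

```python
import re

TITLES = {
    1: "this is java and elasticsearch blog",
    2: "this is java blog",
    3: "this is elasticsearch blog",
    4: "this is java, elasticsearch, hadoop blog",
    5: "this is spark blog",
}


def analyze(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match(query, operator="or"):
    """Return the IDs of titles that match the query under the given operator."""
    wanted = analyze(query)
    # "and" requires every keyword to be present; "or" requires at least one.
    combine = all if operator == "and" else any
    return [
        doc_id
        for doc_id, title in TITLES.items()
        if combine(term in set(analyze(title)) for term in wanted)
    ]


print(match("java elasticsearch", operator="and"))  # [1, 4]
```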

 GET /waws/article/_search
 {
   "query": {
     "match": {
       "title": {
         "query": "java elasticsearch",
         "operator": "and"
       }
     }
   }
 }

 {
   "took": 1,
   "timed_out": false,
   "_shards": { "total": 5, "successful": 5, "failed": 0 },
   "hits": {
     "total": 2,
     "max_score": 0.7465237,
     "hits": [
       { "_index": "waws", "_type": "article", "_id": "4", "_score": 0.7465237,
         "_source": {
           "articleID": "QQPX-R-3956-#aD8", "userID": 2, "hidden": true,
           "postDate": "2017-01-02", "tag": ["java", "elasticsearch"],
           "tag_cnt": 2, "view_cnt": 80,
           "title": "this is java, elasticsearch, hadoop blog" } },
       { "_index": "waws", "_type": "article", "_id": "1", "_score": 0.53484553,
         "_source": {
           "articleID": "XHDK-A-1293-#fJ3", "userID": 1, "hidden": false,
           "postDate": "2017-01-01", "tag": ["java", "hadoop"],
           "tag_cnt": 2, "view_cnt": 30,
           "title": "this is java and elasticsearch blog" } }
     ]
   }
 }

4. Search for blogs whose title contains at least 3 of the 4 keywords Java, Elasticsearch, Spark, Hadoop

The second step in controlling the precision of search results is to specify how many of the keywords must match, at a minimum, for a document to be returned.
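
As a rough illustration of how a percentage such as 75% turns into a keyword count: Elasticsearch rounds the computed number down, so 75% of 4 keywords means 3. The sketch below models that with plain Python. required_matches and search are hypothetical helpers, only positive percentages and plain integers are handled, and analyze is a simplified stand-in for an analyzer.

```python
import re

TITLES = {
    1: "this is java and elasticsearch blog",
    2: "this is java blog",
    3: "this is elasticsearch blog",
    4: "this is java, elasticsearch, hadoop blog",
    5: "this is spark blog",
}


def analyze(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def required_matches(num_terms, minimum_should_match):
    """Convert "75%" (or a plain integer) into a keyword count, rounding down."""
    if isinstance(minimum_should_match, str) and minimum_should_match.endswith("%"):
        return num_terms * int(minimum_should_match[:-1]) // 100
    return int(minimum_should_match)


def search(query, minimum_should_match):
    """Return IDs of titles containing at least the required number of keywords."""
    terms = analyze(query)
    needed = required_matches(len(terms), minimum_should_match)
    return [
        doc_id
        for doc_id, title in TITLES.items()
        if sum(term in set(analyze(title)) for term in terms) >= needed
    ]


print(required_matches(4, "75%"))                          # 3
print(search("java elasticsearch spark hadoop", "75%"))    # [4]
```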

 GET /waws/article/_search
 {
   "query": {
     "match": {
       "title": {
         "query": "java elasticsearch spark hadoop",
         "minimum_should_match": "75%"
       }
     }
   }
 }

 {
   "took": 2,
   "timed_out": false,
   "_shards": { "total": 5, "successful": 5, "failed": 0 },
   "hits": {
     "total": 1,
     "max_score": 1.3375794,
     "hits": [
       { "_index": "waws", "_type": "article", "_id": "4", "_score": 1.3375794,
         "_source": {
           "articleID": "QQPX-R-3956-#aD8", "userID": 2, "hidden": true,
           "postDate": "2017-01-02", "tag": ["java", "elasticsearch"],
           "tag_cnt": 2, "view_cnt": 80,
           "title": "this is java, elasticsearch, hadoop blog" } }
     ]
   }
 }

5. Use bool to combine multiple search conditions when searching title

 GET /waws/article/_search
 {
   "query": {
     "bool": {
       "must":     { "match": { "title": "java" }},
       "must_not": { "match": { "title": "spark" }},
       "should": [
         { "match": { "title": "hadoop" }},
         { "match": { "title": "elasticsearch" }}
       ]
     }
   }
 }

 {
   "took": 1,
   "timed_out": false,
   "_shards": { "total": 5, "successful": 5, "failed": 0 },
   "hits": {
     "total": 3,
     "max_score": 1.3375794,
     "hits": [
       { "_index": "waws", "_type": "article", "_id": "4", "_score": 1.3375794,
         "_source": {
           "articleID": "QQPX-R-3956-#aD8", "userID": 2, "hidden": true,
           "postDate": "2017-01-02", "tag": ["java", "elasticsearch"],
           "tag_cnt": 2, "view_cnt": 80,
           "title": "this is java, elasticsearch, hadoop blog" } },
       { "_index": "waws", "_type": "article", "_id": "1", "_score": 0.53484553,
         "_source": {
           "articleID": "XHDK-A-1293-#fJ3", "userID": 1, "hidden": false,
           "postDate": "2017-01-01", "tag": ["java", "hadoop"],
           "tag_cnt": 2, "view_cnt": 30,
           "title": "this is java and elasticsearch blog" } },
       { "_index": "waws", "_type": "article", "_id": "2", "_score": 0.19856805,
         "_source": {
           "articleID": "KDKE-B-9947-#kL5", "userID": 1, "hidden": false,
           "postDate": "2017-01-02", "tag": ["java"],
           "tag_cnt": 1, "view_cnt": 50,
           "title": "this is java blog" } }
     ]
   }
 }

6. How the relevance score is calculated when bool combines multiple search conditions

Take the scores of the must and should clauses, add them up, and divide by the total number of must and should clauses.

  • Rank 1: contains java plus every keyword in should (hadoop and elasticsearch)
  • Rank 2: contains java plus elasticsearch from should
  • Rank 3: contains java but none of the keywords in should

The should clauses can affect the relevance score.

must guarantees that the keyword is present, and the document's relevance score for the search is calculated from the must condition.

On top of must, the conditions in should do not have to match, but the more of them that match, the higher the document's relevance score.
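
The rule described above can be tried out on the five titles. The sketch below is only an illustration of that rule, not of Elasticsearch's actual similarity computation: every matching clause is given a hypothetical score of 1.0, so the numbers will not equal the _score values in a real response, but the resulting order can be compared with the ranking listed above. rule_score and analyze are made-up helper names.

```python
import re

TITLES = {
    1: "this is java and elasticsearch blog",
    2: "this is java blog",
    3: "this is elasticsearch blog",
    4: "this is java, elasticsearch, hadoop blog",
    5: "this is spark blog",
}

MUST = ["java"]
MUST_NOT = ["spark"]
SHOULD = ["hadoop", "elasticsearch"]


def analyze(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rule_score(title, clause_score=1.0):
    """Score a title by the rule in the text, or return None if it is filtered out."""
    tokens = set(analyze(title))
    if not all(t in tokens for t in MUST) or any(t in tokens for t in MUST_NOT):
        return None
    matched = [t for t in MUST + SHOULD if t in tokens]
    # Sum of the matching clause scores divided by the total number of must and should clauses.
    return len(matched) * clause_score / len(MUST + SHOULD)


scores = {doc_id: rule_score(title) for doc_id, title in TITLES.items()}
ranked = sorted(
    (doc_id for doc_id, score in scores.items() if score is not None),
    key=lambda doc_id: scores[doc_id],
    reverse=True,
)
print(ranked)  # [4, 1, 2]
```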

7. Search for blogs whose title contains at least 3 of the 4 keywords Java, Hadoop, Spark, Elasticsearch

By default, none of the should clauses has to match. In the search above, for example, "this is java blog" matches neither should condition and is still returned.

There is an exception, however: if there is no must, at least one should clause has to match. The following search has four conditions in should; by default, a document is returned as soon as any one of them matches.

This can be controlled precisely: minimum_should_match sets how many of the four should conditions must match for a document to be returned.

 GET /waws/article/_search
 {
   "query": {
     "bool": {
       "should": [
         { "match": { "title": "java" }},
         { "match": { "title": "elasticsearch" }},
         { "match": { "title": "hadoop" }},
         { "match": { "title": "spark" }}
       ],
       "minimum_should_match": 3
     }
   }
 }
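
The default behaviour of should and the effect of minimum_should_match can be modelled together. This is a minimal sketch, assuming single-keyword clauses and a simplified analyzer; bool_match and search are hypothetical helpers that mimic the rules described above, not Elasticsearch APIs.

```python
import re

TITLES = {
    1: "this is java and elasticsearch blog",
    2: "this is java blog",
    3: "this is elasticsearch blog",
    4: "this is java, elasticsearch, hadoop blog",
    5: "this is spark blog",
}


def analyze(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bool_match(title, must=(), should=(), must_not=(), minimum_should_match=None):
    """Decide whether a title satisfies a bool query made of single-keyword clauses."""
    tokens = set(analyze(title))
    if minimum_should_match is None:
        # Default: 0 when must is present, otherwise 1 if there are should clauses.
        minimum_should_match = 0 if must or not should else 1
    should_hits = sum(1 for term in should if term in tokens)
    return (
        all(term in tokens for term in must)
        and not any(term in tokens for term in must_not)
        and should_hits >= minimum_should_match
    )


def search(**clauses):
    """Return the IDs of all titles that satisfy the given clauses."""
    return [doc_id for doc_id, title in TITLES.items() if bool_match(title, **clauses)]


four = ["java", "elasticsearch", "hadoop", "spark"]
print(search(should=four))                          # [1, 2, 3, 4, 5]: one keyword is enough
print(search(should=four, minimum_should_match=3))  # [4]: at least three keywords
```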

Elasticsearch master (8)

Analysis of the underlying principle of multi-word search based on term + bool

1. How a regular match is converted to term + should

 {
     "match": { "title": "java elasticsearch" }
 }

When a match query such as the one above searches for several words, ES automatically converts it into bool syntax: a bool should clause containing one term query per search word.

 {
   "bool": {
     "should": [
       { "term": { "title": "java" }},
       { "term": { "title": "elasticsearch" }}
     ]
   }
 }

2. How a match with operator and is converted to term + must

 {
     "match": {
         "title": {
             "query":    "java elasticsearch",
             "operator": "and"
         }
     }
 }

 {
   "bool": {
     "must": [
       { "term": { "title": "java" }},
       { "term": { "title": "elasticsearch" }}
     ]
   }
 }

3. How minimum_should_match is converted

 {
     "match": {
         "title": {
             "query": "java elasticsearch hadoop spark",
             "minimum_should_match": "75%"
         }
     }
 }

 {
   "bool": {
     "should": [
       { "term": { "title": "java" }},
       { "term": { "title": "elasticsearch" }},
       { "term": { "title": "hadoop" }},
       { "term": { "title": "spark" }}
     ],
     "minimum_should_match": 3
   }
 }
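
The three conversions above follow one pattern, which can be written down as a small function. This is a sketch of the rewrite as described here, not Elasticsearch's internal implementation: match_to_bool is a hypothetical name, whitespace splitting stands in for the field's analyzer, and only positive percentages are handled.

```python
def match_to_bool(field, query, operator="or", minimum_should_match=None):
    """Rewrite a match query as the equivalent bool query made of term clauses."""
    # Whitespace splitting stands in for the field's analyzer (an assumption).
    terms = [{"term": {field: token}} for token in query.lower().split()]
    if operator == "and":
        # Every word is required, so each term becomes a must clause.
        return {"bool": {"must": terms}}
    rewritten = {"bool": {"should": terms}}
    if minimum_should_match is not None:
        if isinstance(minimum_should_match, str) and minimum_should_match.endswith("%"):
            # A percentage becomes a clause count, rounded down: 75% of 4 terms is 3.
            minimum_should_match = len(terms) * int(minimum_should_match[:-1]) // 100
        rewritten["bool"]["minimum_should_match"] = minimum_should_match
    return rewritten


print(match_to_bool("title", "java elasticsearch"))
print(match_to_bool("title", "java elasticsearch", operator="and"))
print(match_to_bool("title", "java elasticsearch hadoop spark", minimum_should_match="75%"))
```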

Why did the previous lecture cover two ways to implement multi-value search? It was setting the stage for this one: match query -> bool + term.