Elasticsearch master (7)
How to manually control the accuracy of full-text search results
1. Add a title field to the post data
POST /waws/article/_bulk
{ "update": { "_id": "1" }}
{ "doc": { "title": "this is java and elasticsearch blog" }}
{ "update": { "_id": "2" }}
{ "doc": { "title": "this is java blog" }}
{ "update": { "_id": "3" }}
{ "doc": { "title": "this is elasticsearch blog" }}
{ "update": { "_id": "4" }}
{ "doc": { "title": "this is java, elasticsearch, hadoop blog" }}
{ "update": { "_id": "5" }}
{ "doc": { "title": "this is spark blog" }}
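The bulk body above is newline-delimited JSON: each action line is followed by its own document line, and the body ends with a newline. A minimal sketch of building such a body in Python follows; the helper name build_bulk_update_body is hypothetical, and the string it returns would still have to be sent to the _bulk endpoint by a client of your choice.

```python
import json

# Titles from step 1, keyed by document id.
TITLES = {
    1: "this is java and elasticsearch blog",
    2: "this is java blog",
    3: "this is elasticsearch blog",
    4: "this is java, elasticsearch, hadoop blog",
    5: "this is spark blog",
}


def build_bulk_update_body(titles):
    """Hypothetical helper: one action line plus one doc line per update."""
    lines = []
    for doc_id, title in titles.items():
        lines.append(json.dumps({"update": {"_id": str(doc_id)}}))
        lines.append(json.dumps({"doc": {"title": title}}))
    # Every line of a bulk body, including the last, ends with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_update_body(TITLES)
print(body, end="")
```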
2. Search for blogs with Java or ElasticSearch in the title
Unlike the term query covered earlier, which searches for an exact value, the match query performs full-text search: the query string is analyzed into terms before it is looked up. If the field being searched is not_analyzed, however, a match query is equivalent to a term query.
GET /waws/article/_search
{
  "query": {
    "match": {
      "title": "java elasticsearch"
    }
  }
}

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0.5910557,
    "hits": [
      {
        "_index": "waws",
        "_type": "article",
        "_id": "4",
        "_score": 0.5910557,
        "_source": {
          "articleID": "QQPX-R-3956-#aD8",
          "userID": 2,
          "hidden": true,
          "postDate": "2017-01-02",
          "tag": ["java", "elasticsearch"],
          "tag_cnt": 2,
          "view_cnt": 80,
          "title": "this is java, elasticsearch, hadoop blog"
        }
      },
      {
        "_index": "waws",
        "_type": "article",
        "_id": "3",
        "_score": 0.2876821,
        "_source": {
          "articleID": "JODL-X-1937-#pV7",
          "userID": 2,
          "hidden": false,
          "postDate": "2017-01-01",
          "tag": ["hadoop"],
          "tag_cnt": 1,
          "view_cnt": 100,
          "title": "this is elasticsearch blog"
        }
      },
      {
        "_index": "waws",
        "_type": "article",
        "_id": "1",
        "_score": 0.26742277,
        "_source": {
          "articleID": "XHDK-A-1293-#fJ3",
          "userID": 1,
          "hidden": false,
          "postDate": "2017-01-01",
          "tag": ["java", "hadoop"],
          "tag_cnt": 2,
          "view_cnt": 30,
          "title": "this is java and elasticsearch blog"
        }
      }
    ]
  }
}
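3. Search for blogs with both java and elasticsearch in the title
Use the and operator when every search keyword must match. A plain match query, which by default returns documents that contain any one of the keywords, cannot achieve this on its own.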
GET /waws/article/_search
{
  "query": {
    "match": {
      "title": {
        "query": "java elasticsearch",
        "operator": "and"
      }
    }
  }
}

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.7465237,
    "hits": [
      {
        "_index": "waws",
        "_type": "article",
        "_id": "4",
        "_score": 0.7465237,
        "_source": {
          "articleID": "QQPX-R-3956-#aD8",
          "userID": 2,
          "hidden": true,
          "postDate": "2017-01-02",
          "tag": ["java", "elasticsearch"],
          "tag_cnt": 2,
          "view_cnt": 80,
          "title": "this is java, elasticsearch, hadoop blog"
        }
      },
      {
        "_index": "waws",
        "_type": "article",
        "_id": "1",
        "_score": 0.53484553,
        "_source": {
          "articleID": "XHDK-A-1293-#fJ3",
          "userID": 1,
          "hidden": false,
          "postDate": "2017-01-01",
          "tag": ["java", "hadoop"],
          "tag_cnt": 2,
          "view_cnt": 30,
          "title": "this is java and elasticsearch blog"
        }
      }
    ]
  }
}
4. Search for blogs containing at least 3 of the 4 keywords java, elasticsearch, spark, hadoop
A further way to control the precision of search results is to specify what proportion of the keywords must match for a document to be returned. With four keywords, 75% means that at least three must match.
GET /waws/article/_search
{
  "query": {
    "match": {
      "title": {
        "query": "java elasticsearch spark hadoop",
        "minimum_should_match": "75%"
      }
    }
  }
}

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1.3375794,
    "hits": [
      {
        "_index": "waws",
        "_type": "article",
        "_id": "4",
        "_score": 1.3375794,
        "_source": {
          "articleID": "QQPX-R-3956-#aD8",
          "userID": 2,
          "hidden": true,
          "postDate": "2017-01-02",
          "tag": ["java", "elasticsearch"],
          "tag_cnt": 2,
          "view_cnt": 80,
          "title": "this is java, elasticsearch, hadoop blog"
        }
      }
    ]
  }
}
5. Use bool to combine multiple search conditions when searching the title
GET /waws/article/_search
{
  "query": {
    "bool": {
      "must": { "match": { "title": "java" }},
      "must_not": { "match": { "title": "spark" }},
      "should": [
        { "match": { "title": "hadoop" }},
        { "match": { "title": "elasticsearch" }}
      ]
    }
  }
}

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1.3375794,
    "hits": [
      {
        "_index": "waws",
        "_type": "article",
        "_id": "4",
        "_score": 1.3375794,
        "_source": {
          "articleID": "QQPX-R-3956-#aD8",
          "userID": 2,
          "hidden": true,
          "postDate": "2017-01-02",
          "tag": ["java", "elasticsearch"],
          "tag_cnt": 2,
          "view_cnt": 80,
          "title": "this is java, elasticsearch, hadoop blog"
        }
      },
      {
        "_index": "waws",
        "_type": "article",
        "_id": "1",
        "_score": 0.53484553,
        "_source": {
          "articleID": "XHDK-A-1293-#fJ3",
          "userID": 1,
          "hidden": false,
          "postDate": "2017-01-01",
          "tag": ["java", "hadoop"],
          "tag_cnt": 2,
          "view_cnt": 30,
          "title": "this is java and elasticsearch blog"
        }
      },
      {
        "_index": "waws",
        "_type": "article",
        "_id": "2",
        "_score": 0.19856805,
        "_source": {
          "articleID": "KDKE-B-9947-#kL5",
          "userID": 1,
          "hidden": false,
          "postDate": "2017-01-02",
          "tag": ["java"],
          "tag_cnt": 1,
          "view_cnt": 50,
          "title": "this is java blog"
        }
      }
    ]
  }
}
6. How the relevance score is calculated when bool combines multiple search conditions
The scores of the matching must and should clauses are added together and divided by the total number of must and should clauses.
- Ranked first: contains java plus both should keywords, hadoop and elasticsearch
- Ranked second: contains java plus elasticsearch from should
- Ranked third: contains java but none of the should keywords
should clauses can affect the relevance score.
must ensures that the keyword is present, and the document's relevance score for the search is calculated from the must condition.
On top of must, the should conditions do not have to match, but the more of them match, the higher the document's relevance score.
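The ranking described above can be imitated with a deliberately simplified score. In this sketch every matching must or should clause contributes exactly 1.0, which is an assumption made for illustration; Elasticsearch computes a real relevance score per clause, so the numbers differ from the response even though the order is the same. The name toy_score is hypothetical.

```python
import re

# The three documents returned by the bool search, keyed by id.
HITS = {
    4: "this is java, elasticsearch, hadoop blog",
    1: "this is java and elasticsearch blog",
    2: "this is java blog",
}
MUST = ["java"]
SHOULD = ["hadoop", "elasticsearch"]


def analyze(text):
    # Assumption: lowercase and split on non-alphanumeric characters.
    return re.findall(r"[a-z0-9]+", text.lower())


def toy_score(title):
    """Sum 1.0 per matching clause, divided by the number of clauses."""
    tokens = set(analyze(title))
    clauses = MUST + SHOULD
    matched = sum(1 for term in clauses if term in tokens)
    return matched / len(clauses)


ranked = sorted(HITS, key=lambda doc_id: toy_score(HITS[doc_id]), reverse=True)
for doc_id in ranked:
    print(doc_id, round(toy_score(HITS[doc_id]), 2))
```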
7. Search for blogs containing at least 3 of the 4 keywords java, hadoop, spark, elasticsearch
By default, none of the should clauses has to match. In the search above, for example, "this is java blog" matches none of the should conditions and is still returned.
There is an exception, however: if there is no must, at least one should clause has to match. In the following search, should has four conditions; by default, a document is returned as soon as any one of them matches.
This can be controlled precisely, though: minimum_should_match sets how many of the 4 should conditions must match for a document to be returned.
GET /waws/article/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": "java" }},
        { "match": { "title": "elasticsearch" }},
        { "match": { "title": "hadoop" }},
        { "match": { "title": "spark" }}
      ],
      "minimum_should_match": 3
    }
  }
}
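The default behaviour of should and the effect of minimum_should_match can be sketched in Python as below. The default rule in the code follows the description in this section (0 when a must clause is present, otherwise 1); the analyze function is simplified and bool_should_hits is a hypothetical helper.

```python
import re

TITLES = {
    1: "this is java and elasticsearch blog",
    2: "this is java blog",
    3: "this is elasticsearch blog",
    4: "this is java, elasticsearch, hadoop blog",
    5: "this is spark blog",
}


def analyze(text):
    # Assumption: lowercase and split on non-alphanumeric characters.
    return re.findall(r"[a-z0-9]+", text.lower())


def bool_should_hits(titles, should, must=(), minimum_should_match=None):
    """Filter documents by must terms and a minimum number of should terms."""
    if minimum_should_match is None:
        # Default as described in this section: optional next to must,
        # at least one when should stands alone.
        minimum_should_match = 0 if must else 1
    hits = []
    for doc_id, title in sorted(titles.items()):
        tokens = set(analyze(title))
        if not all(term in tokens for term in must):
            continue
        if sum(1 for term in should if term in tokens) >= minimum_should_match:
            hits.append(doc_id)
    return hits


FOUR = ["java", "elasticsearch", "hadoop", "spark"]
print(bool_should_hits(TITLES, FOUR))
print(bool_should_hits(TITLES, FOUR, minimum_should_match=3))
```

With the default, all five titles qualify because each contains at least one of the four keywords; with minimum_should_match set to 3, only document 4 remains.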
Elasticsearch master (8)
Analysis of the underlying principle of multi-word search based on term + bool
1. How a regular match is converted to term + should
{
  "match": { "title": "java elasticsearch" }
}
When a multi-term search such as the match query above is used, ES automatically converts it into bool syntax: a bool should that contains one term query per search term.
{
  "bool": {
    "should": [
      { "term": { "title": "java" }},
      { "term": { "title": "elasticsearch" }}
    ]
  }
}
2. How a match with the and operator is converted to term + must
{
  "match": {
    "title": {
      "query": "java elasticsearch",
      "operator": "and"
    }
  }
}

{
  "bool": {
    "must": [
      { "term": { "title": "java" }},
      { "term": { "title": "elasticsearch" }}
    ]
  }
}
3. How minimum_should_match is converted
{
  "match": {
    "title": {
      "query": "java elasticsearch hadoop spark",
      "minimum_should_match": "75%"
    }
  }
}

{
  "bool": {
    "should": [
      { "term": { "title": "java" }},
      { "term": { "title": "elasticsearch" }},
      { "term": { "title": "hadoop" }},
      { "term": { "title": "spark" }}
    ],
    "minimum_should_match": 3
  }
}
Why did the previous lecture cover two ways to implement multi-value search? It was setting the stage for this one: match query -> bool + term.
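The three rewrites in this part can be sketched as a single Python function that produces the equivalent bool + term body. This is an illustration of the query shapes only, not how Elasticsearch implements the conversion internally; match_to_bool is a hypothetical name, analyze is a simplified stand-in for the field's analyzer, and only a positive integer or positive percentage is handled for minimum_should_match.

```python
import json
import re


def analyze(text):
    # Assumption: lowercase and split on non-alphanumeric characters.
    return re.findall(r"[a-z0-9]+", text.lower())


def match_to_bool(field, text, operator="or", minimum_should_match=None):
    """Build the bool + term query equivalent to a match query."""
    terms = [{"term": {field: token}} for token in analyze(text)]
    if operator == "and":
        # Every term is required, so each one becomes a must clause.
        return {"bool": {"must": terms}}
    query = {"bool": {"should": terms}}
    if minimum_should_match is not None:
        if isinstance(minimum_should_match, str) and minimum_should_match.endswith("%"):
            # A percentage becomes a clause count, rounded down.
            percent = int(minimum_should_match[:-1])
            minimum_should_match = len(terms) * percent // 100
        query["bool"]["minimum_should_match"] = minimum_should_match
    return query


print(json.dumps(match_to_bool("title", "java elasticsearch"), indent=2))
print(json.dumps(match_to_bool("title", "java elasticsearch", operator="and"), indent=2))
print(json.dumps(
    match_to_bool("title", "java elasticsearch hadoop spark", minimum_should_match="75%"),
    indent=2,
))
```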