Elasticsearch index creation/data retrieval

ES 6.0 no longer recommends multiple types under one index; type names in requests are deprecated in 7.0 and removed in 8.0. An index created in 6.0 cannot contain more than one type. Types are being removed because they cause field type conflicts and reduce retrieval efficiency. (5.x to 6.x)
The _all field is also deprecated; use copy_to to build custom combined fields instead. (5.x to 6.x)
type: text/keyword decides whether a field is analyzed (tokenized); index: true/false decides whether it is indexed. (2.x to 5.x)
analyzer sets the analyzer for an individual field. (2.x to 5.x)
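
As a rough sketch of the copy_to replacement for _all mentioned above, the Python snippet below builds a 6.x-style mapping body in which two fields are copied into one combined field. The field name full_text and the helper copy_to_targets are hypothetical, chosen only for illustration; nothing is sent to a cluster.

```python
import json

# "full_text" is a hypothetical combined field; title and content copy into it.
mapping = {
    "mappings": {
        "_doc": {
            "properties": {
                "title": {"type": "text", "copy_to": "full_text"},
                "content": {"type": "text", "copy_to": "full_text"},
                "full_text": {"type": "text"},
            }
        }
    }
}


def copy_to_targets(mapping_body):
    """Map each combined field to the source fields that copy into it."""
    props = mapping_body["mappings"]["_doc"]["properties"]
    targets = {}
    for name, spec in props.items():
        target = spec.get("copy_to")
        if target:
            targets.setdefault(target, []).append(name)
    return targets


print(json.dumps(mapping, indent=2))  # request body for creating an index
print(copy_to_targets(mapping))       # {'full_text': ['title', 'content']}
```

A search against full_text then covers both source fields, which is roughly what _all used to provide for every field.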

Create indexes

First install the IK analysis plugin and restart Elasticsearch.

# Install the IK analysis plugin
elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.6.2/elasticsearch-analysis-ik-6.6.2.zip

Document field types reference: https://www.elastic.co/guide/…

Reference for other field parameters (each field type may support its own parameters): https://www.elastic.co/guide/…

Let’s create a new index with the name news:

The settings and mapping below do the following:
Set the default analyzer to IK, which handles Chinese text
Define the type with the default name _doc
Intentionally disable _source storage (to verify the store option)
title is not stored
author is not analyzed
content is stored

For the meaning of the _source field, see this post: https://blog.csdn.net/napoay/…

PUT /news
{
    "settings": {
        "number_of_shards": 5,
        "number_of_replicas": 1,
        "index": {
            "analysis.analyzer.default.type": "ik_smart"
        }
    },
    "mappings": {
        "_doc": {
            "_source": { "enabled": false },
            "properties": {
                "news_id": { "type": "integer", "index": true },
                "title": { "type": "text", "store": false },
                "author": { "type": "keyword" },
                "content": { "type": "text", "store": true },
                "created_at": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss" }
            }
        }
    }
}

# View the mapping that was created
GET /news/_mapping

Verify that the analyzer works

GET /_analyze
{ "analyzer": "ik_smart", "text": "I love my country" }

GET /_analyze
{ "analyzer": "ik_max_word", "text": "I love my country" }

GET /news/_analyze
{ "text": "I love my country!" }

GET /news/_analyze
{ "field": "author", "text": "I love my country" }

GET /news/_analyze
{ "field": "title", "text": "I love the motherland" }

Add documents

For illustration purposes, the following queries will use these documents as examples.

POST /news/_doc
{
    "news_id": 1,
    "title": 1,
    "author": 1,
    "content": "We learn popular call together, want want want want, in your face and a charming, whoops want want want want, my tail can matter wave",
    "created_at": "2019-03-26 11:55:20"
}

POST /news/_doc
{
    "news_id": 2,
    "title": "Together we learn cat",
    "author": "wanda cat won't be participle",
    "content": "we learn cat together, still want want want want, in your face and a charming, whoops want want want want, my tail can be kept shaking",
    "created_at": "2019-03-26 11:55:20"
}

POST /news/_doc
{
    "news_id": 3,
    "title": "can't make out",
    "author": "wanda cat",
    "content": "I can't make it up, just write some data and test it",
    "created_at": "2019-03-26 11:55:20"
}

Retrieve the data

GET /news/_doc/_search is the endpoint for searching the _doc type under the news index. The examples below use the REST API together with the query DSL.

match_all

Returns all documents, with no search conditions.

# Unconditional paged retrieval, sorted by news_id
GET /news/_doc/_search
{
    "query": { "match_all": {} },
    "from": 0,
    "size": 2,
    "sort": { "news_id": "desc" }
}

Because we disabled the _source field, ES only builds an inverted index on the data and does not store the original document, so the results contain no original document data. We did this mainly to demonstrate how highlighting works.

match

Many articles say that a match query always analyzes (tokenizes) the query text. That is not entirely correct: it also depends on the type of the field being searched. If the field is a keyword (not analyzed), the query text is not tokenized and match behaves like a term query.

We can use the _analyze API to see how text is processed for a given field:

GET /news/_analyze
{ "field": "title", "text": "Will the field be tokenized? Or not?" }
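
To make the point about field types concrete, here is a toy Python model (not Elasticsearch code) of how match and term treat text and keyword fields. It assumes a whitespace tokenizer as a stand-in for a real analyzer such as IK, and the function names analyze, match and term are hypothetical.

```python
def analyze(field_type, text):
    """Return the terms produced for a field of the given type."""
    if field_type == "keyword":
        return [text]                # keyword: the whole value is one term
    return text.lower().split()      # text: lowercase, split on whitespace


def match(field_type, stored, query):
    """Analyze the query with the field's analyzer, then match if any
    resulting term is among the indexed terms (OR semantics)."""
    indexed = set(analyze(field_type, stored))
    return any(t in indexed for t in analyze(field_type, query))


def term(field_type, stored, query):
    """Look the query text up as-is, without analyzing it."""
    return query in set(analyze(field_type, stored))


print(match("text", "we learn together", "we sing"))   # True
print(match("keyword", "wanda cat", "wanda"))          # False
print(match("keyword", "wanda cat", "wanda cat"))      # True
print(term("keyword", "wanda cat", "wanda cat"))       # True
```

On the keyword field, match and term give the same answer because neither splits the query text.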

Query:

GET /news/_doc/_search
{
    "query": { "match": { "title": "we will be tokenized" } },
    "highlight": { "fields": { "title": {} } }
}

If _source is disabled, a field must have its store attribute enabled so that its original value is kept; only then can the field be highlighted. Otherwise there is no original content in which to highlight the keywords.

multi_match

Searches several fields at once. For example, to find documents whose title or content contains the keywords:

GET /news/_doc/_search
{
    "query": {
        "multi_match": {
            "query": "we are good",
            "fields": ["title", "content"]
        }
    },
    "highlight": { "fields": { "title": {}, "content": {} } }
}

match_phrase

match_phrase takes some care to understand. What is a phrase query? Put simply, the field being searched must contain all the terms produced by analyzing the query text, and the relative positions of those terms in the document must be within the threshold set by slop. slop is the number of position moves allowed to make the terms line up with the query phrase. If slop is large enough, a document matches even when the terms are scattered throughout it.

content: I love China
match_phrase: I China
slop: 0  // no match
slop: 1  // match
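
The slop rule can be sketched in a few lines of Python. This is a simplified model, not Lucene's implementation: it assumes no term is repeated in the query, and the function name phrase_matches is hypothetical. For each query term it subtracts the term's offset in the query from its position in the document; the spread of the resulting values is the number of moves needed.

```python
from itertools import product


def phrase_matches(doc_tokens, query_tokens, slop=0):
    """Simplified sloppy-phrase check (assumes no repeated query terms)."""
    # Positions at which each query term occurs in the document.
    positions = [
        [i for i, tok in enumerate(doc_tokens) if tok == q]
        for q in query_tokens
    ]
    if any(not p for p in positions):
        return False  # every query term must be present
    # Try each combination of occurrences. A term's document position minus
    # its offset in the query is the phrase start it implies; the spread of
    # those implied starts is the number of position moves required.
    for combo in product(*positions):
        starts = [pos - offset for offset, pos in enumerate(combo)]
        if max(starts) - min(starts) <= slop:
            return True
    return False


doc = ["I", "love", "China"]
print(phrase_matches(doc, ["I", "China"], slop=0))  # False
print(phrase_matches(doc, ["I", "China"], slop=1))  # True
```

Under this model, terms in reversed order cost more: matching the query "b a" against the document "a b" needs a slop of 2.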

Test case

GET /news/_analyze
{ "field": "title", "text": "we learn" }

# response
{
    "tokens": [
        { "token": "we", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 0 },
        { "token": "learn", "start_offset": 2, "end_offset": 3, "type": "CN_CHAR", "position": 1 }
    ]
}

GET /news/_analyze
{ "field": "title", "text": "we together learn ..." }

# response
{
    "tokens": [
        { "token": "we", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 0 },
        { "token": "together", "start_offset": 2, "end_offset": 4, "type": "CN_WORD", "position": 1 },
        { "token": "learn", "start_offset": 4, "end_offset": 5, "type": "CN_CHAR", "position": 2 },
        ...
    ]
}

Note the position field. In the first analysis the two terms are adjacent (positions 0 and 1), while in the title they sit at positions 0 and 2, with another term between them. The phrase matches only when slop is large enough to cover the moves needed to bring the document's terms into the arrangement of the query phrase.

Here one move is needed, so slop must be greater than or equal to 1 for this document to match.

Run the phrase query:

GET /news/_doc/_search
{
    "query": {
        "match_phrase": {
            "title": { "query": "we learn", "slop": 1 }
        }
    },
    "highlight": { "fields": { "title": {} } }
}

Query results:

{
    ...
    {
        "_index": "news",
        "_type": "_doc",
        "_id": "if-CuGkBddO9SrfVBoil",
        "_score": 0.37229446,
        "highlight": {
            "title": ["<em></em><em>together we learn</em> meow"]
        }
    },
    {
        "_index": "news",
        "_type": "_doc",
        "_id": "iP-AuGkBddO9SrfVOIg3",
        "_score": 0.37229446,
        "highlight": {
            "title": ["<em></em><em>to learn</em> called"]
        }
    }
    ...
}

term

A term query looks up the query text in the index as a single exact term, without analyzing it. Whether a field's content is tokenized at index time is determined by the mapping. A tokenized title may have been indexed as the two terms ["we", "together"], but since no term ["we together"] exists, a term query for "we together" finds nothing. A keyword field is not tokenized when it is stored: the whole value is indexed as one term, and the query text is not tokenized either, so the match is exact.

GET /news/_doc/_search
{
    "query": { "term": { "title": "we together" } },
    "highlight": { "fields": { "title": {} } }
}

terms

terms takes several terms at once, as if the query text had been tokenized by hand.

{
    "query": { "terms": { "title": ["we", "together"] } },
    "highlight": { "fields": { "title": {} } }
}

Documents that contain any of the terms ["we", "together"] are returned.
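
The behaviour of term and terms follows directly from how an inverted index is laid out. Below is a minimal Python sketch, assuming a whitespace tokenizer in place of IK; the sample titles and the function names build_index, term_query and terms_query are hypothetical.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():  # stand-in for the analyzer
            index[token].add(doc_id)
    return index


def term_query(index, value):
    """Exact lookup of one term; the query text is not analyzed."""
    return set(index.get(value, set()))


def terms_query(index, values):
    """Union of the exact lookups for each given term."""
    hits = set()
    for value in values:
        hits |= term_query(index, value)
    return hits


# Hypothetical titles, used only for this illustration.
titles = {1: "we together learn woof", 2: "we together learn meow", 3: "cannot make it up"}
index = build_index(titles)

print(sorted(term_query(index, "we together")))       # []
print(sorted(term_query(index, "we")))                # [1, 2]
print(sorted(terms_query(index, ["we", "cannot"])))   # [1, 2, 3]
```

The lookup for "we together" is empty because the index only holds the separate terms "we" and "together".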

wildcard

Shell-style wildcard query: ? matches exactly one character and * matches any number of characters. The pattern is matched against the terms in the inverted index.

Find documents that contain a two-character term:

{
   "query": {
       "wildcard": {
               "title": "??"
       }
   },
   "highlight": {
        "fields": {
            "title": {},
            "content": {}
        }
    }
}

prefix

Prefix query: matches terms in the inverted index that start with the given prefix.

{
    "query": { "prefix": { "title": "I" } },
    "highlight": { "fields": { "title": {}, "content": {} } }
}

regexp

Regular expression query: matches terms in the inverted index against the given pattern.

Find documents that contain a term of 2 to 3 characters:

{
    "query": { "regexp": { "title": ".{2,3}" } },
    "highlight": { "fields": { "title": {}, "content": {} } }
}
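
The wildcard, prefix and regexp queries all test individual terms of the inverted index rather than whole field values. The Python sketch below models that with a hand-written index; the index contents and function names are hypothetical. Python's fnmatch and re are only approximations of the Elasticsearch syntax (fnmatch, for instance, also understands [seq] character sets).

```python
import re
from fnmatch import fnmatchcase


def matching_docs(index, predicate):
    """Union of the doc ids of every term accepted by the predicate."""
    hits = set()
    for term, doc_ids in index.items():
        if predicate(term):
            hits |= doc_ids
    return hits


def wildcard_query(index, pattern):
    return matching_docs(index, lambda t: fnmatchcase(t, pattern))


def prefix_query(index, prefix):
    return matching_docs(index, lambda t: t.startswith(prefix))


def regexp_query(index, pattern):
    compiled = re.compile(pattern)
    # fullmatch: the pattern has to cover the whole term.
    return matching_docs(index, lambda t: compiled.fullmatch(t) is not None)


# Hypothetical inverted index: term -> ids of documents containing it.
index = {"we": {1}, "together": {1, 2}, "cat": {2}, "meow": {2}, "a": {3}}

print(sorted(wildcard_query(index, "??")))     # [1]
print(sorted(prefix_query(index, "me")))       # [2]
print(sorted(regexp_query(index, ".{2,3}")))   # [1, 2]
```

Because matching is per term, a pattern such as ?? can hit a document whose title is much longer than two characters, as long as one of its terms has exactly two.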

bool

A bool query combines several queries: must (every clause must match), must_not (no clause may match), and should (at least one clause should match; when a must or filter clause is present, should clauses are optional and only raise the score).

{
    "query": {
        "bool": {
            "must": { "match": { "title": "absolute to have our" } },
            "must_not": { "term": { "title": "Never have I" } },
            "should": [
                { "match": { "content": "we" } },
                { "multi_match": { "query": "meet", "fields": ["title", "content"] } },
                { "match_phrase": { "title": "one can be" } }
            ],
            "filter": {
                "range": {
                    "created_at": {
                        "lt": "2020-12-05 12:00:00",
                        "gt": "2019-01-05 12:00:00"
                    }
                }
            }
        }
    },
    "highlight": { "fields": { "title": {}, "content": {} } }
}
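
How the four clause types interact can be modelled in plain Python. This is a toy evaluation for a single document, assuming each clause is a simple predicate and using a made-up score of one point per matching must or should clause; bool_query, contains and the sample document are hypothetical, and real relevance scoring is far more involved.

```python
def bool_query(doc, must=(), must_not=(), should=(), filters=()):
    """Return (matches, toy_score) for one document.

    filters and must_not only decide whether the document matches;
    must and should clauses also contribute to the toy score.
    """
    if not all(clause(doc) for clause in must):
        return False, 0
    if not all(clause(doc) for clause in filters):
        return False, 0
    if any(clause(doc) for clause in must_not):
        return False, 0
    should_hits = sum(1 for clause in should if clause(doc))
    # With no must or filter clause, at least one should clause has to match.
    if should and not must and not filters and should_hits == 0:
        return False, 0
    return True, len(must) + should_hits


def contains(field, word):
    """Build a clause that is true when the field contains the word."""
    def clause(doc):
        return word in doc[field].split()
    return clause


doc = {"title": "we together learn meow", "news_id": 2}
matched, score = bool_query(
    doc,
    must=[contains("title", "we")],
    must_not=[contains("title", "woof")],
    should=[contains("title", "meow"), contains("title", "sing")],
    filters=[lambda d: d["news_id"] >= 2],
)
print(matched, score)  # True 2
```

Only one of the two should clauses matches here, so it adds a single point on top of the must clause without affecting whether the document is returned.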

filter

filter is typically used together with a query such as match, to restrict the results to documents that also meet the filter's criteria. Filter clauses do not affect the relevance score.

{
   "query": {
        "bool": {
            "must": {
                "match_all": {}
            },
            "filter": {
                "range": {
                    "created_at": {
                        "lt": "2020-12-05 12:00:00",
                        "gt": "2017-12-05 12:00:00"
                    }
                }
            }
        }
   }
}

Or use it alone

{
    "query": {
        "constant_score": {
            "filter": {
                "range": {
                    "created_at": {
                        "lt": "2020-12-05 12:00:00",
                        "gt": "2017-12-05 12:00:00"
                    }
                }
            }
        }
    }
}

Multiple conditions, 2017-12-05 12:00:00 < created_at < 2020-12-05 12:00:00 and news_id >= 2:

{
   "query": {
       "constant_score" : {
            "filter": {
                "bool": {
                    "must": [
                        {
                            "range": {
                                "created_at": {
                                    "lt": "2020-12-05 12:00:00",
                                    "gt": "2017-12-05 12:00:00"
                                }
                            }
                        },
                        {
                            "range": {
                                "news_id": {
                                    "gte": 2
                                }
                            }
                        }
                    ]
                }
            }
       }
   }
}