- ES 6.0 no longer recommends multiple types in one index, and types will be removed completely in 7.0. An index created in 6.0 cannot contain more than one type. Types caused field-type conflicts and reduced search efficiency, which is why they are being removed. (5.x to 6.x)
- The _all field is also deprecated; use copy_to to build custom combined fields instead. (5.x to 6.x)
- type: text/keyword decides whether a field is analyzed (tokenized), and index: true/false decides whether it is indexed. (2.x to 5.x)
- analyzer sets the analyzer for an individual field. (2.x to 5.x)
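To make the copy_to replacement for _all concrete, here is a minimal sketch that builds a 6.x-style mapping body in Python. The combined field name `full_text` and the helper `mapping_with_copy_to` are hypothetical, chosen only for illustration; the body is printed as JSON so it could be pasted into a `PUT` request.

```python
import json

def mapping_with_copy_to(source_fields, combined_field="full_text"):
    """Build a 6.x-style mapping body in which every listed text field is
    copied into one combined field, taking over the role _all used to play.
    `full_text` is a hypothetical name chosen for this sketch."""
    properties = {
        name: {"type": "text", "copy_to": combined_field} for name in source_fields
    }
    # The combined field is an ordinary text field that only receives copies.
    properties[combined_field] = {"type": "text"}
    return {"mappings": {"_doc": {"properties": properties}}}

body = mapping_with_copy_to(["title", "content"])
print(json.dumps(body, indent=2))
```

A match query on `full_text` would then search the title and content values together, much as a query on _all did.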
Create the index
First install the IK analysis plugin, then restart the service:
elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.6.2/elasticsearch-analysis-ik-6.6.2.zip
Field data types reference: https://www.elastic.co/guide/…
Reference for other field parameters (different field types support different parameters): https://www.elastic.co/guide/…
Let’s create a new index named news. Notes on the request:
- the default analyzer is set to the IK analyzer (ik_smart), which handles Chinese text;
- the type is defined with the default name _doc;
- _source is deliberately disabled (to test the store option);
- title is not stored, author is not analyzed (keyword), and content is stored.
For what the _source field means, see this post: https://blog.csdn.net/napoay/…
PUT /news
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1,
    "index": { "analysis.analyzer.default.type": "ik_smart" }
  },
  "mappings": {
    "_doc": {
      "_source": { "enabled": false },
      "properties": {
        "news_id": { "type": "integer", "index": true },
        "title": { "type": "text", "store": false },
        "author": { "type": "keyword" },
        "content": { "type": "text", "store": true },
        "created_at": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss" }
      }
    }
  }
}

# View the mapping that was created
GET /news/_mapping
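Verify that the analyzer works:
GET /_analyze
{"analyzer": "ik_smart", "text": "I love my country"}

GET /_analyze
{"analyzer": "ik_max_word", "text": "I love my country"}

# No analyzer given: the index's default analyzer (ik_smart) is used
GET /news/_analyze
{"text": "I love my country!"}

GET /news/_analyze
{"field": "author", "text": "I love my country"}

GET /news/_analyze
{"field": "title", "text": "I love the motherland"}
Because author is a keyword field, the first of these two requests returns the whole text as a single token, while title is tokenized by the default IK analyzer.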
Add documents
For illustration purposes, the following queries will use these documents as examples.
POST /news/_doc
{"news_id": 1, "title": 1, "author": 1, "content": "We learn popular call together, want want want want, in your face and a charming, whoops want want want want, my tail can matter wave", "created_at": "2019-03-26 11:55:20"}

POST /news/_doc
{"news_id": 2, "title": "Together we learn cat", "author": "wanda cat won't be tokenized", "content": "we learn cat together, still want want want want, in your face and a charming, whoops want want want want, my tail can be kept shaking", "created_at": "2019-03-26 11:55:20"}

POST /news/_doc
{"news_id": 3, "title": "can't make out", "author": "wanda cat", "content": "I can't make it up, just write some data and test it", "created_at": "2019-03-26 11:55:20"}
Retrieve the data
GET /news/_doc/_search is the search endpoint for documents of type _doc in the news index; the examples below use the REST API plus the query DSL.
match_all
Matches all documents, with no search conditions.
# Paginated search with no conditions, sorted by news_id
GET /news/_doc/_search
{"query": {"match_all": {}}, "from": 0, "size": 2, "sort": {"news_id": "desc"}}
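The from/size pair works like an offset and a limit over the sorted hits. A minimal in-memory model of that behaviour, using only the news_id values of the three sample documents (the `paginate` helper is hypothetical, not part of any client library):

```python
def paginate(hits, sort_field, descending, from_, size):
    """In-memory model of from/size paging: sort the full result set,
    skip `from_` hits, then return at most `size` of them."""
    ordered = sorted(hits, key=lambda doc: doc[sort_field], reverse=descending)
    return ordered[from_:from_ + size]

# news_id values of the three sample documents
docs = [{"news_id": 1}, {"news_id": 2}, {"news_id": 3}]

first_page = paginate(docs, "news_id", True, from_=0, size=2)
second_page = paginate(docs, "news_id", True, from_=2, size=2)
print([d["news_id"] for d in first_page])   # [3, 2]
print([d["news_id"] for d in second_page])  # [1]
```

Because every page requires sorting everything before it, deep paging gets expensive; ES caps from + size at the index.max_result_window setting, which defaults to 10,000.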
Because we disabled the _source field, ES builds the inverted index from the data but does not keep the original documents, so the results contain no original document data. We did this mainly to demonstrate the highlighting mechanism.
match
Many articles say that a match query analyzes (tokenizes) the query text. That is not entirely accurate: the behavior also depends on the type of the field being searched. If the field is of type keyword (not analyzed), match behaves like a term query.
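The point about field types can be modelled in a few lines: match analyzes the query with the same analyzer as the field, so a keyword field only ever matches its exact whole value. This is a sketch under stated assumptions: `toy_text_analyzer` is a hypothetical stand-in for ik_smart (it just lowercases and splits on non-alphanumerics), and only the default "or" operator is modelled.

```python
import re

def toy_text_analyzer(text):
    # Hypothetical stand-in for ik_smart: lowercase, split on non-alphanumerics.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

def keyword_analyzer(text):
    # keyword fields keep the whole value as one unmodified term.
    return [text]

ANALYZERS = {"text": toy_text_analyzer, "keyword": keyword_analyzer}

def match(field_type, field_value, query_text):
    """match analyzes the query with the field's analyzer, then (with the
    default "or" operator) succeeds if any query term is an indexed term."""
    analyze = ANALYZERS[field_type]
    indexed_terms = set(analyze(field_value))
    return any(term in indexed_terms for term in analyze(query_text))

print(match("text", "Together we learn cat", "we learn"))  # True
print(match("keyword", "wanda cat", "cat"))                # False: behaves like term
print(match("keyword", "wanda cat", "wanda cat"))          # True: exact value only
```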
We can use the _analyze API to see how a field will process text:
GET /news/_analyze
{"field": "title", "text": "Tokenized? Not tokenized?"}
The query:
GET /news/_doc/_search
{"query": {"match": {"title": "we will be tokenized"}}, "highlight": {"fields": {"title": {}}}}
With _source disabled, a field's store attribute must be enabled to keep its original value; only then can it be highlighted. Otherwise there is no original content in which to highlight the keywords.
multi_match
Searches several fields. For example, to find documents whose title or content contains our keywords:
GET /news/_doc/_search
{"query": {"multi_match": {"query": "we are good", "fields": ["title", "content"]}}, "highlight": {"fields": {"title": {}, "content": {}}}}
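multi_match uses the best_fields type by default: each field is scored separately and the document takes the best field's score, plus tie_breaker times the other fields' scores (tie_breaker defaults to 0). The sketch below models only that combination step. The per-field score is a hypothetical toy (count of distinct query terms present), not ES's real BM25 relevance, and `toy_terms` stands in for the IK analyzer.

```python
import re

def toy_terms(text):
    # Hypothetical toy analyzer standing in for ik_smart.
    return set(re.findall(r"[0-9a-z]+", text.lower()))

def toy_field_score(field_text, query):
    # Toy relevance: how many distinct query terms occur in the field.
    return len(toy_terms(query) & toy_terms(field_text))

def multi_match_best_fields(doc, query, fields, tie_breaker=0.0):
    """best_fields combines per-field scores like dis_max: the best field's
    score plus tie_breaker times the scores of the other fields."""
    scores = [toy_field_score(doc.get(field, ""), query) for field in fields]
    best = max(scores)
    return best + tie_breaker * (sum(scores) - best)

doc = {"title": "Together we learn cat", "content": "we learn cat together"}
print(multi_match_best_fields(doc, "we are good", ["title", "content"]))       # 1.0
print(multi_match_best_fields(doc, "we are good", ["title", "content"], 0.5))  # 1.5
```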
match_phrase
match_phrase takes a little experimenting to understand. What is a phrase query? Put simply, the field must contain all the terms produced by analyzing the query text, and the distance between their positions in the document must stay within the threshold set by slop. slop is the number of positions the query terms may be moved to line up with their arrangement in the document; swapping the order of two terms costs a slop of 2. With a large enough slop, a document matches even if the terms are scattered throughout it.
Content: I love China
match_phrase "I China" with slop 0: no match
match_phrase "I China" with slop 1: match ("China" is moved one position)
Test case
GET /news/_analyze
{"field": "title", "text": "We learn"}
# response
{"tokens": [{"token": "We", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 0}, {"token": "learning", "start_offset": 2, "end_offset": 3, "type": "CN_CHAR", "position": 1}]}

GET /news/_analyze
{"field": "title", "text": "Together we learn cat"}
# response
{"tokens": [{"token": "We", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 0}, {"token": "together", "start_offset": 2, "end_offset": 4, "type": "CN_WORD", "position": 1}, {"token": "learning", "start_offset": 4, "end_offset": 5, "type": "CN_CHAR", "position": 2}, ...]}
Note the position values. In the query text the two terms are at positions 0 and 1; in the title they are at positions 0 and 2, with "together" in between. The second term has to be shifted one position to line up with the query phrase, so the phrase matches only if slop is at least that shift.
So slop must be at least 1 for this document to be found.
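The slop reasoning above can be checked with a small model. This is a simplified sketch, not Lucene's exact sloppy-phrase algorithm: for each query term at offset i, pick one of its document positions p and compute p - i; the arrangement fits if the spread max(p - i) - min(p - i) is at most slop. The function name `sloppy_phrase_match` is hypothetical, and the token lists reuse the positions from the _analyze output above.

```python
from itertools import product

def sloppy_phrase_match(doc_tokens, query_tokens, slop):
    """Simplified sloppy-phrase check: choose a document position p for each
    query term at offset i; the phrase fits if the spread of (p - i) over all
    terms is <= slop. An exact phrase has spread 0."""
    positions = {}
    for pos, tok in enumerate(doc_tokens):
        positions.setdefault(tok, []).append(pos)
    candidates = [positions.get(tok, []) for tok in query_tokens]
    if any(not c for c in candidates):
        return False  # some query term is missing from the field entirely
    for combo in product(*candidates):
        if len(set(combo)) < len(combo):
            continue  # one document position cannot serve two query terms
        shifted = [p - i for i, p in enumerate(combo)]
        if max(shifted) - min(shifted) <= slop:
            return True
    return False

print(sloppy_phrase_match(["I", "love", "China"], ["I", "China"], 0))  # False
print(sloppy_phrase_match(["I", "love", "China"], ["I", "China"], 1))  # True
print(sloppy_phrase_match(["We", "together", "learning"], ["We", "learning"], 1))  # True
print(sloppy_phrase_match(["I", "China"], ["China", "I"], 2))  # True: swap costs 2
```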
Use a phrase query:
GET /news/_doc/_search
{"query": {"match_phrase": {"title": {"query": "we learn", "slop": 1}}}, "highlight": {"fields": {"title": {}}}}
Query results:
{... {"_index": "news", "_type": "_doc", "_id": "if-CuGkBddO9SrfVBoil", "_score": 0.37229446, "highlight": {"title": ["<em></em><em>together we learn</em> meow"]}}, {"_index": "news", "_type": "_doc", "_id": "iP-AuGkBddO9SrfVOIg3", "_score": 0.37229446, "highlight": {"title": ["<em></em><em>to learn</em> called"]}} ...}
term
A term query treats its value as a single term and looks it up in the inverted index without analyzing it. Whether the field's values were analyzed when the documents were indexed, however, is determined by the mapping. If a title was indexed as the terms ["we", "together"], there is no term "we together", so the query below finds nothing. A keyword field is not analyzed: its whole value is indexed as one term, and the query value is not analyzed either, so matching requires the exact value.
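A tiny inverted index shows why the term query finds nothing. In this sketch `str.split` is a hypothetical stand-in for the IK analyzer, the title value is an illustrative English token stream, and `build_index`/`term_query` are hypothetical helpers, not ES APIs.

```python
from collections import defaultdict

def build_index(docs, analyzer):
    """Inverted index: term -> set of ids of documents containing that term."""
    index = defaultdict(set)
    for doc_id, value in docs.items():
        for term in analyzer(value):
            index[term].add(doc_id)
    return dict(index)

def term_query(index, value):
    # term does NOT analyze `value`: it must equal an indexed term exactly.
    return index.get(value, set())

titles = {2: "we together learn"}                   # hypothetical title value
text_index = build_index(titles, str.split)         # stand-in for a text field
keyword_index = build_index(titles, lambda v: [v])  # keyword field: one term

print(term_query(text_index, "we together"))           # set(): no such term
print(term_query(text_index, "we"))                    # {2}
print(term_query(keyword_index, "we together learn"))  # {2}: exact whole value
```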
GET /news/_doc/_search
{"query": {"term": {"title": "we together"}}, "highlight": {"fields": {"title": {}}}}
terms
terms takes several terms at once, like a manually tokenized query:
{"query": {"terms": {"title": ["we", "together"]}}, "highlight": {"fields": {"title": {}}}}
A document is returned if it contains any of the terms ["we", "together"].
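In inverted-index terms, terms is the union of the posting lists of each listed value, with no analysis of the values (just like term). A minimal sketch over a hypothetical title index; `terms_query` is an illustrative helper, not an ES API:

```python
def terms_query(index, values):
    """terms: a document matches if it contains ANY of the listed terms."""
    hits = set()
    for value in values:
        hits |= index.get(value, set())  # values are not analyzed, like term
    return hits

# Hypothetical inverted index for the title field: term -> document ids
index = {"we": {1, 2}, "together": {2}, "learn": {1, 2}, "cat": {2}}

print(sorted(terms_query(index, ["we", "together"])))   # [1, 2]
print(sorted(terms_query(index, ["together", "dog"])))  # [2]
```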
wildcard
Shell-style wildcard query: ? matches exactly one character and * matches any number of characters (including none). The pattern is matched against the terms in the inverted index.
Find documents whose title contains a two-character term:
{
"query": {
"wildcard": {
"title": "??"
}
},
"highlight": {
"fields": {
"title": {},
"content": {}
}
}
}
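A wildcard pattern is matched against whole terms in the term dictionary, not against the raw text. The sketch below translates ? and * into an anchored Python regex and filters a hypothetical list of terms; it does not model ES's backslash escaping, and `wildcard_terms` is an illustrative helper.

```python
import re

def wildcard_to_regex(pattern):
    """Translate a wildcard pattern (? = exactly one char, * = any run of
    chars, possibly empty) into a regex; every other character is literal."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)

def wildcard_terms(term_dictionary, pattern):
    rx = wildcard_to_regex(pattern)
    # fullmatch: the pattern must cover the whole term
    return sorted(t for t in term_dictionary if rx.fullmatch(t))

terms = ["we", "together", "learn", "cat", "me"]
print(wildcard_terms(terms, "??"))   # ['me', 'we']
print(wildcard_terms(terms, "l*"))   # ['learn']
```

Because a pattern that starts with ? or * cannot narrow the search to one region of the term dictionary, such queries have to examine every term and can be slow on large indexes.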
prefix
Prefix query: matches the terms in the inverted index that begin with the given prefix.
{"query": {"prefix": {"title": "I"}}, "highlight": {"fields": {"title": {}, "content": {}}}}
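Since a term dictionary is kept in sorted order, all terms sharing a prefix sit in one contiguous run. The sketch below illustrates that idea with `bisect` over a hypothetical sorted term list; it is a model of the lookup, not Lucene's actual term-dictionary structure.

```python
from bisect import bisect_left

def prefix_terms(sorted_terms, prefix):
    """Return terms starting with `prefix` from a sorted term list.
    Sorted order puts all such terms in one contiguous run."""
    start = bisect_left(sorted_terms, prefix)  # first term >= prefix
    out = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # past the run of matching terms
        out.append(term)
    return out

terms = sorted(["we", "wei", "together", "i", "it", "in", "learn"])
print(prefix_terms(terms, "i"))  # ['i', 'in', 'it']
print(prefix_terms(terms, "w"))  # ['we', 'wei']
```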
regexp
Regular expression query: matches the terms in the inverted index against the pattern. The pattern must match the whole term.
Find documents containing a term of 2 to 3 characters:
{"query": {"regexp": {"title": ".{2,3}"}}, "highlight": {"fields": {"title": {}, "content": {}}}}
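Because the regexp pattern is anchored to the whole term, `.{2,3}` means "terms of exactly 2 or 3 characters", not "terms containing 2 to 3 characters". The sketch uses Python's `re.fullmatch` to mirror that anchoring over a hypothetical term list. Lucene's regex syntax is not identical to Python's in general; for this simple pattern the two agree.

```python
import re

def regexp_terms(term_dictionary, pattern):
    """Anchored matching: the pattern must match the WHOLE term, so use
    fullmatch rather than search."""
    rx = re.compile(pattern)
    return sorted(t for t in term_dictionary if rx.fullmatch(t))

terms = ["a", "we", "cat", "learn"]
print(regexp_terms(terms, ".{2,3}"))  # ['cat', 'we']
```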
bool
A bool query combines several queries:
- must: every clause must match;
- must_not: no clause may match;
- should: optional clauses that raise the score; if the bool query has no must or filter clause, at least one should clause must match;
- filter: every clause must match, but it does not contribute to the score.
{
  "query": {
    "bool": {
      "must": {"match": {"title": "absolute to have our"}},
      "must_not": {"term": {"title": "Never have I"}},
      "should": [
        {"match": {"content": "we"}},
        {"multi_match": {"query": "meet", "fields": ["title", "content"]}},
        {"match_phrase": {"title": "one can be"}}
      ],
      "filter": {"range": {"created_at": {"lt": "2020-12-05 12:00:00", "gt": "2019-01-05 12:00:00"}}}
    }
  },
  "highlight": {"fields": {"title": {}, "content": {}}}
}
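Ignoring scores, which documents a bool query returns can be modelled with set logic. In this sketch each clause is represented by the set of document ids it matches, which is a simplifying assumption; `bool_query` is a hypothetical helper. The should default follows the rule stated above: optional when must or filter is present, otherwise at least one must match, unless minimum_should_match is given explicitly.

```python
def bool_query(all_ids, must=(), must_not=(), should=(), filters=(),
               minimum_should_match=None):
    """Set-based model of which documents a bool query returns (scores are
    ignored). Each clause argument is a sequence of sets of matching ids."""
    if minimum_should_match is None:
        # should is optional next to must/filter; otherwise one is required
        minimum_should_match = 0 if (must or filters) else (1 if should else 0)
    result = set()
    for doc in all_ids:
        if not all(doc in s for s in must):
            continue
        if not all(doc in s for s in filters):
            continue
        if any(doc in s for s in must_not):
            continue
        if sum(doc in s for s in should) < minimum_should_match:
            continue
        result.add(doc)
    return result

ids = {1, 2, 3}
print(bool_query(ids, must=[{1, 2}], must_not=[{2}], should=[{3}]))  # {1}
print(bool_query(ids, should=[{1}, {2}]))                           # {1, 2}
```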
filter
filter is typically combined with a query such as match to keep only the documents that meet the criteria. Filter clauses do not affect the score, and ES can cache them.
{
"query": {
"bool": {
"must": {
"match_all": {}
},
"filter": {
"range": {
"created_at": {
"lt": "2020-12-05 12:00:00",
"gt": "2017-12-05 12:00:00"
}
}
}
}
}
}
Or use it on its own, wrapped in constant_score, which gives every matching document the same score:
{ "query": { "constant_score" : { "filter": { "range": { "created_at": { "lt": "2020-12-05 12:00:00", "gt": "2017-12-05 12:00:00"}}}}}}
2017-12-05 12:00:00 < created_at < 2020-12-05 12:00:00 and news_id >= 2:
{
"query": {
"constant_score" : {
"filter": {
"bool": {
"must": [
{
"range": {
"created_at": {
"lt": "2020-12-05 12:00:00",
"gt": "2017-12-05 12:00:00"
}
}
},
{
"range": {
"news_id": {
"gte": 2
}
}
}
]
}
}
}
}
}
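To check what the last request returns, here is a small model of the range bounds (gt/gte/lt/lte, each optional) applied to the news_id and created_at values of the three sample documents. The helpers `in_range` and `parse` are hypothetical; `%Y-%m-%d %H:%M:%S` is Python's spelling of the mapping's yyyy-MM-dd HH:mm:ss format. constant_score gives every hit the same score, its boost, which defaults to 1.0.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"  # Python equivalent of yyyy-MM-dd HH:mm:ss

def parse(ts):
    return datetime.strptime(ts, FMT)

def in_range(value, gt=None, gte=None, lt=None, lte=None):
    """Mirror the range query's four bounds; omitted bounds are open."""
    if gt is not None and not value > gt:
        return False
    if gte is not None and not value >= gte:
        return False
    if lt is not None and not value < lt:
        return False
    if lte is not None and not value <= lte:
        return False
    return True

docs = [
    {"news_id": 1, "created_at": "2019-03-26 11:55:20"},
    {"news_id": 2, "created_at": "2019-03-26 11:55:20"},
    {"news_id": 3, "created_at": "2019-03-26 11:55:20"},
]

hits = [
    {"news_id": d["news_id"], "_score": 1.0}  # constant_score: every hit gets the boost
    for d in docs
    if in_range(parse(d["created_at"]),
                gt=parse("2017-12-05 12:00:00"), lt=parse("2020-12-05 12:00:00"))
    and in_range(d["news_id"], gte=2)
]
print(hits)  # news_id 2 and 3, each with _score 1.0
```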