1. Overview
Elasticsearch offers two main ways to write a query: URI queries and request-body queries. URI queries are lightweight and quick to type, while body queries are written in JSON and can express many more conditions. This article focuses on structured queries, filters and aggregations. It uses Elasticsearch 6.5.4, with the IK plugin for Chinese word segmentation.
See also: Installing and using Elasticsearch; Using the IK analyzer in Elasticsearch.
Create the following index in ES and import the data:
PUT /news
{
  "aliases": {
    "test.chixiao.news": {}
  },
  "mappings": {
    "news": {
      "dynamic": "false",
      "properties": {
        "id": {"type": "integer"},
        "title": {"analyzer": "ik_max_word", "type": "text"},
        "summary": {"analyzer": "ik_max_word", "type": "text"},
        "author": {"type": "keyword"},
        "source": {"type": "keyword"},
        "publishTime": {"type": "date"},
        "modifiedTime": {"type": "date"},
        "createTime": {"type": "date"},
        "docId": {"type": "keyword"},
        "voteCount": {"type": "integer"},
        "replyCount": {"type": "integer"}
      }
    }
  },
  "settings": {
    "index": {
      "refresh_interval": "1s",
      "number_of_shards": 3,
      "max_result_window": "10000000",
      "mapper": {"dynamic": "false"},
      "number_of_replicas": 1
    },
    "analysis": {
      "normalizer": {
        "lowercase": {
          "type": "custom",
          "char_filter": [],
          "filter": ["lowercase", "asciifolding"]
        }
      },
      "analyzer": {
        "1gram": {
          "type": "custom",
          "tokenizer": "ngram_tokenizer"
        }
      },
      "tokenizer": {
        "ngram_tokenizer": {
          "type": "nGram",
          "min_gram": "1",
          "max_gram": "1",
          "token_chars": ["letter", "digit"]
        }
      }
    }
  }
}
2. Queries
2.1 A query example
A simple query is shown below. Query clauses come in two kinds, query and filter, and both are written inside the top-level query object.
GET /news/_search
{
  "query": {"match_all": {}},
  "sort": [{"publishTime": {"order": "desc"}}],
  "size": 2,
  "from": 0,
  "_source": ["title", "id", "summary"]
}
The response:
{
  "took" : 7,
  "timed_out" : false,
  "_shards" : {"total" : 3, "successful" : 3, "skipped" : 0, "failed" : 0},
  "hits" : {
    "total" : 204,
    "max_score" : null,
    "hits" : [
      {
        "_index" : "news", "_type" : "news", "_id" : "228", "_score" : null,
        "_source" : {
          "summary" : "According to the Shaanxi High People's Court, on the morning of June 11 the Xi'an Intermediate People's Court ruled at second instance that Han and others in Shaanxi province were involved in illegal lending. The court rejected the appeal and upheld the original verdict. Intermediate person in Xi'an",
          "id" : 228,
          "title" : "Shaanxi's first routine loan case verdict: Gang sprayed pepper water on borrowers."
        },
        "sort" : [1560245097000]
      },
      {
        "_index" : "news", "_type" : "news", "_id" : "214", "_score" : null,
        "_source" : {
          "summary" : "NetEase Entertainment reported on June 11 that gossip media had spotted Cao Yunjin and his wife Tang Wan at the Tianjin Civil Affairs Bureau for a divorce. NetEase Entertainment asked Cao Yunjin's agent for confirmation and got an exclusive response: \"It is indeed a divorce.\"",
          "id" : 214,
          "title" : "Cao Yunjin admits divorce: Amicable Divorce was malicious and malicious."
        },
        "sort" : [1560244657000]
      }
    ]
  }
}
In the response, took is the time the search took, in milliseconds. _shards describes the shards that took part: this index has three shards, and all three responded successfully. hits holds the results: total is the number of matching documents, max_score is the highest relevance score (null here because the results are sorted by publishTime rather than by score), and the inner hits array contains the matching documents themselves.
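To make these fields concrete, here is a small Python sketch (standard library only) that reads them out of a response. It assumes the 6.x response shape shown above, where hits.total is a plain integer; the response is abbreviated to ids and sort values, and summarize and epoch_ms_to_utc are hypothetical helper names used only for illustration.

```python
import json
from datetime import datetime, timezone

# Abbreviated copy of the response above: only the fields used below are kept.
RESPONSE = """
{
  "took": 7,
  "timed_out": false,
  "_shards": {"total": 3, "successful": 3, "skipped": 0, "failed": 0},
  "hits": {
    "total": 204,
    "max_score": null,
    "hits": [
      {"_id": "228", "_score": null, "_source": {"id": 228}, "sort": [1560245097000]},
      {"_id": "214", "_score": null, "_source": {"id": 214}, "sort": [1560244657000]}
    ]
  }
}
"""


def epoch_ms_to_utc(ms: int) -> datetime:
    """Sort values for a date field are epoch milliseconds."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def summarize(resp: dict) -> dict:
    """Collect the commonly inspected parts of a 6.x search response."""
    shards = resp["_shards"]
    hits = resp["hits"]
    return {
        "took_ms": resp["took"],
        "all_shards_ok": shards["failed"] == 0 and shards["successful"] == shards["total"],
        "total_hits": hits["total"],      # an int in 6.x; 7.x returns an object by default
        "max_score": hits["max_score"],   # None when sorting by a field
        "ids": [h["_source"]["id"] for h in hits["hits"]],
        "published": [epoch_ms_to_utc(h["sort"][0]) for h in hits["hits"]],
    }


summary = summarize(json.loads(RESPONSE))
print(summary["ids"], summary["published"][0].isoformat())
```

The sort value 1560245097000 decodes to 2019-06-11 09:24:57 UTC, which matches the June 11 date in the article text.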
Clauses come in two types, filter and query. Exact-match filters are fast because they skip relevance scoring and their results are easy to cache.
2.2 Filter queries
- term
A term query finds documents whose field contains exactly the given value. FIELD is the name of a field in the index and VALUE is the value to match.
{
  "term": {
    "FIELD": {"value": "VALUE"}
  }
}
For example, to find news whose source is New Meridian:
GET /news/_search
{
  "query": {
    "term": {"source": {"value": "New Meridian"}}
  }
}
- bool
When several conditions need to be combined logically, use bool to group them. A bool query can contain:
{
  "bool": {
    "must": [], "should": [], "must_not": []
  }
}
must: documents must match every clause in it. must_not: documents must not match any clause in it. should: documents should match these clauses, and minimum_should_match sets how many of them must match. When minimum_should_match is 0, should clauses only add to the score and do not filter anything out. For example, the following query finds New Meridian articles whose id is 4 or 75:
GET /news/_search
{
  "query": {
    "bool": {
      "must": [
        {"term": {"source": {"value": "New Meridian"}}}
      ],
      "should": [
        {"term": {"id": {"value": "4"}}},
        {"term": {"id": {"value": "75"}}}
      ],
      "minimum_should_match": 1
    }
  }
}
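The must / should / must_not rules can be modeled in a few lines of Python. This is a toy in-memory sketch of the matching logic only, with no scoring and no inverted index; the term and bool_query helpers and the sample documents are hypothetical. It always uses an explicit minimum_should_match, whereas Elasticsearch's default depends on the surrounding clauses and context.

```python
from typing import Any, Callable, Dict, List, Sequence

Doc = Dict[str, Any]
Pred = Callable[[Doc], bool]


def term(field: str, value: Any) -> Pred:
    """Exact match; comparing as strings mimics ES accepting "4" for an integer field."""
    return lambda doc: field in doc and str(doc[field]) == str(value)


def bool_query(must: Sequence[Pred] = (), should: Sequence[Pred] = (),
               must_not: Sequence[Pred] = (), minimum_should_match: int = 0) -> Pred:
    """Toy bool: all must, no must_not, and at least minimum_should_match should clauses."""
    def matches(doc: Doc) -> bool:
        if not all(p(doc) for p in must):
            return False
        if any(p(doc) for p in must_not):
            return False
        return sum(p(doc) for p in should) >= minimum_should_match
    return matches


DOCS: List[Doc] = [
    {"id": 4, "source": "New Meridian"},
    {"id": 9, "source": "New Meridian"},
    {"id": 75, "source": "New Meridian"},
    {"id": 80, "source": "China News Network"},
]

query = bool_query(
    must=[term("source", "New Meridian")],
    should=[term("id", "4"), term("id", "75")],
    minimum_should_match=1,
)
print([d["id"] for d in DOCS if query(d)])  # [4, 75]
```

Setting minimum_should_match to 0 in this sketch would also return id 9, which mirrors the "should only scores, does not filter" case described above.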
- terms
terms matches any of several exact values, for example articles whose id is 4 or 75:
GET /news/_search
{
  "query": {
    "terms": {"id": ["4", "75"]}
  }
}
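A terms query behaves like a set-membership test on one field. A minimal Python sketch that both builds the request body and evaluates it against in-memory documents; terms_query and terms_match are hypothetical names, and comparing values as strings is an assumption that loosely imitates how ES accepts "4" for an integer field.

```python
import json
from typing import Any, Dict, Iterable


def terms_query(field: str, values: Iterable[Any]) -> Dict[str, Any]:
    """Build the terms clause used in the request above."""
    return {"query": {"terms": {field: list(values)}}}


def terms_match(doc: Dict[str, Any], field: str, values: Iterable[Any]) -> bool:
    """Toy evaluation: the document matches if its value equals any listed value."""
    wanted = {str(v) for v in values}
    return field in doc and str(doc[field]) in wanted


body = terms_query("id", ["4", "75"])
print(json.dumps(body))  # {"query": {"terms": {"id": ["4", "75"]}}}
docs = [{"id": 4}, {"id": 5}, {"id": 75}]
print([d["id"] for d in docs if terms_match(d, "id", ["4", "75"])])  # [4, 75]
```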
- range
For range conditions use range, which sits in the same position as term. The available operators are:
gt: > (greater than)
lt: < (less than)
gte: >= (greater than or equal to)
lte: <= (less than or equal to)
For example, to find articles whose id is between 1 and 10 inclusive:
GET /news/_search
{
  "query": {
    "range": {
      "id": {"gte": 1, "lte": 10}
    }
  }
}
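Each range operator corresponds to one comparison, and every bound given must hold. A short Python sketch of that rule; in_range and RANGE_OPS are hypothetical names, and real range queries also handle dates, formats and time zones, which this ignores.

```python
import operator
from typing import Any, Callable, Dict

# Each range keyword maps to the comparison it performs.
RANGE_OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def in_range(value: Any, bounds: Dict[str, Any]) -> bool:
    """True if value satisfies every bound, e.g. {"gte": 1, "lte": 10}."""
    return all(RANGE_OPS[op](value, limit) for op, limit in bounds.items())


ids = [0, 1, 5, 10, 11]
print([i for i in ids if in_range(i, {"gte": 1, "lte": 10})])  # [1, 5, 10]
print([i for i in ids if in_range(i, {"gt": 1, "lt": 10})])    # [5]
```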
- exists
exists finds documents in which a field has a value, for example documents that have an author field. Wrapping exists in a bool must_not finds documents where the field is missing, and combining it with should expresses a field that may or may not exist.
GET /news/_search
{
  "query": {"exists": {"field": "author"}}
}
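The "field is missing" case is usually written by negating exists inside bool must_not. A minimal Python sketch that builds both bodies, plus a rough toy model of which values count as present; the helper names are hypothetical, and treating null and an empty array as missing is an assumption based on how exists is documented.

```python
import json
from typing import Any, Dict


def exists(field: str) -> Dict[str, Any]:
    return {"exists": {"field": field}}


def missing(field: str) -> Dict[str, Any]:
    """Documents without an indexed value for field: negate exists with must_not."""
    return {"bool": {"must_not": [exists(field)]}}


def has_value(doc: Dict[str, Any], field: str) -> bool:
    """Toy model of exists (assumption): absent, null and [] all count as missing."""
    value = doc.get(field)
    return value is not None and value != []


print(json.dumps({"query": missing("author")}))
docs = [{"author": "Li"}, {"author": None}, {"author": []}, {}]
print([has_value(d, "author") for d in docs])  # [True, False, False, False]
```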
2.3. Full-text queries
Unlike the exact matching of filters, queries can run full-text searches on fields and score the results by relevance. In ES, only text fields are analyzed, that is, split into individual terms. A keyword field is also a string, but it is indexed as one whole value, suits enumerations, and is never split.
- match
Use match to run a full-text search on a field, for example to find articles whose summary mentions sports:
GET /news/_search
{
  "query": {"match": {"summary": "Sports"}}
}
match also accepts an analyzer. For example, specifying ik_smart splits the input into coarse-grained words, and the query below then returns documents containing "imported red wine". (The news index defined above has no name field; substitute title or summary to run it against that index.)
GET /news/_search
{
  "query": {
    "match": {
      "name": {"query": "Imported wine", "analyzer": "ik_smart"}
    }
  }
}
The query text may be analyzed into several terms. With "operator": "and", a document is returned only if it contains all of them. With "or", the terms behave like should clauses, and minimum_should_match controls how many of them must match. For example, the following query for "Sports news" returns every document whose summary contains sports or news:
GET /news/_search
{
  "query": {
    "match": {
      "summary": {
        "query": "Sports news",
        "operator": "or",
        "minimum_should_match": 1
      }
    }
  }
}
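The and/or behaviour can be illustrated with a toy matcher. This sketch assumes a simple lowercase-and-whitespace analyzer standing in for IK (which segments Chinese text very differently); analyze and match_text are hypothetical names, and the function only decides whether a document matches, computing no score.

```python
from typing import List, Optional


def analyze(text: str) -> List[str]:
    """Stand-in analyzer (assumption): lowercase, then split on whitespace."""
    return text.lower().split()


def match_text(doc_text: str, query: str, operator: str = "or",
               minimum_should_match: Optional[int] = None) -> bool:
    """Toy match query: "and" needs every query term, "or" needs at least N of them."""
    doc_terms = set(analyze(doc_text))
    query_terms = analyze(query)
    found = sum(t in doc_terms for t in query_terms)
    if operator == "and":
        return found == len(query_terms)
    needed = 1 if minimum_should_match is None else minimum_should_match
    return found >= needed


print(match_text("Local sports results", "Sports news"))                          # True
print(match_text("Local sports results", "Sports news", operator="and"))          # False
print(match_text("Evening sports news", "Sports news", operator="and"))           # True
print(match_text("Local sports results", "Sports news", minimum_should_match=2))  # False
```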
- multi_match
Use multi_match to search several fields at once, for example to find documents whose title or summary contains the keyword news:
GET /news/_search
{
  "query": {
    "multi_match": {"query": "News", "fields": ["title", "summary"]}
  }
}
2.4. Combined queries
With full-text queries and filters available, bool can combine them into complex queries:
GET /news/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {"summary": {"boost": 1, "query": "Changan"}}},
        {"term": {"source": {"value": "New Meridian", "boost": 2}}}
      ],
      "filter": {
        "bool": {
          "must": [
            {"term": {"id": 75}}
          ]
        }
      }
    }
  }
}
The must, must_not and should clauses of bool can hold term, range, match and other queries. To keep a condition from affecting the score, put it in the bool's filter clause: clauses in filter do not contribute to the score, and their results can be cached. boost multiplies a clause's contribution to the score; above, the source clause has boost 2 and the summary clause boost 1.
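The difference between scoring clauses and filter clauses can be sketched with a toy model. Here each scoring clause simply adds its boost to the score; real Elasticsearch scores come from BM25 relevance multiplied by boost, so only the shape of the behaviour carries over. The run helper, the predicates and the sample documents are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Doc = Dict[str, Any]
Pred = Callable[[Doc], bool]


def run(docs: List[Doc], must: List[Tuple[Pred, float]],
        filters: List[Pred]) -> List[Tuple[int, float]]:
    """Toy bool: every clause must match, but only must clauses add to the score."""
    results = []
    for doc in docs:
        if all(f(doc) for f in filters) and all(p(doc) for p, _ in must):
            score = sum(boost for _, boost in must)  # filters contribute nothing
            results.append((doc["id"], score))
    return results


def from_meridian(doc: Doc) -> bool:
    return doc["source"] == "New Meridian"


def is_75(doc: Doc) -> bool:
    return doc["id"] == 75


docs = [{"id": 75, "source": "New Meridian"}, {"id": 76, "source": "New Meridian"}]
print(run(docs, must=[(from_meridian, 2.0), (is_75, 1.0)], filters=[]))  # [(75, 3.0)]
print(run(docs, must=[(from_meridian, 2.0)], filters=[is_75]))           # [(75, 2.0)]
```

Both variants return the same document; moving the id condition into filter only removes its contribution to the score (and, in Elasticsearch, makes it cacheable).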
3. Aggregations
The basic form of an aggregation request is:
GET /news/_search
{
  "size": 0,
  "aggs": {
    "NAME": {"AGG_TYPE": {}}
  }
}
NAME is the name of the aggregation and can be any valid string; AGG_TYPE is the aggregation type. Aggregations are commonly divided into single-value and multi-value aggregations.
3.1. An example of aggregation
GET /news/_search
{
  "size": 0,
  "aggs": {
    "sum_all": {
      "sum": {"field": "replyCount"}
    }
  }
}
This computes the sum of replyCount across the index. The response:
{
  "took" : 8,
  "timed_out" : false,
  "_shards" : {"total" : 3, "successful" : 3, "skipped" : 0, "failed" : 0},
  "hits" : {"total" : 204, "max_score" : 0.0, "hits" : []},
  "aggregations" : {
    "sum_all" : {"value" : 390011.0}
  }
}
By default the response also includes the matching documents, so size is set to 0 to leave them out. sum_all in the response is the name given in the request.
The two main families of aggregations in Elasticsearch are metrics and bucket aggregations.
3.2. Metrics
Metrics aggregations compute numeric values from the matching documents. Most return a single value, such as avg, max, min and sum; stats returns several values at once.
- max
For example, to find the largest reply count in the index:
GET /news/_search
{
  "size": 0,
  "aggs": {
    "max_replay": {
      "max": {"field": "replyCount"}
    }
  }
}
- stats
stats returns the count, minimum, maximum, average and sum of a field in one request. For example, to get basic statistics on the number of replies:
GET /news/_search
{
  "size": 0,
  "aggs": {
    "cate": {
      "stats": {"field": "replyCount"}
    }
  }
}
The response:
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {"total" : 3, "successful" : 3, "skipped" : 0, "failed" : 0},
  "hits" : {"total" : 204, "max_score" : 0.0, "hits" : []},
  "aggregations" : {
    "cate" : {
      "count" : 202,
      "min" : 0.0,
      "max" : 32534.0,
      "avg" : 1930.7475247524753,
      "sum" : 390011.0
    }
  }
}
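stats skips documents that have no value for the field, which most likely explains why count is 202 while hits.total is 204 above. A toy Python version of the computation under that assumption; the stats function and sample documents are hypothetical, and the empty case mirrors the null min/max/avg that Elasticsearch reports when no values are found.

```python
from typing import Any, Dict, List


def stats(docs: List[Dict[str, Any]], field: str) -> Dict[str, Any]:
    """Toy stats aggregation: documents without a value for field are skipped."""
    values = [d[field] for d in docs if d.get(field) is not None]
    if not values:
        return {"count": 0, "min": None, "max": None, "avg": None, "sum": 0.0}
    total = float(sum(values))
    return {
        "count": len(values),
        "min": float(min(values)),
        "max": float(max(values)),
        "avg": total / len(values),
        "sum": total,
    }


docs = [{"replyCount": 0}, {"replyCount": 10}, {"replyCount": 5}, {"title": "no replyCount"}]
print(stats(docs, "replyCount"))  # {'count': 3, 'min': 0.0, 'max': 10.0, 'avg': 5.0, 'sum': 15.0}
```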
3.3. Buckets
Bucket aggregations are similar to GROUP BY in SQL: they divide documents into buckets.
- terms
Bucketing with terms shows how the data is distributed, for example which sources appear in the index and how many articles each one has. size sets how many buckets to return; the buckets with the most documents come first.
GET /news/_search
{
  "size": 0,
  "aggs": {
    "myterms": {
      "terms": {"field": "source", "size": 100}
    }
  }
}
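On a single shard, a terms aggregation is essentially a count per distinct value, ordered by count. A toy Python sketch, assuming Elasticsearch's default ordering (doc_count descending, ties broken by key ascending); terms_agg and the sample data are hypothetical. Across several shards the real counts can be approximate, which is what doc_count_error_upper_bound reports.

```python
from collections import Counter
from typing import Any, Dict, List


def terms_agg(docs: List[Dict[str, Any]], field: str, size: int = 10) -> Dict[str, Any]:
    """Toy terms aggregation: count docs per value, keep the `size` largest buckets."""
    counts = Counter(d[field] for d in docs if field in d)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    buckets = [{"key": k, "doc_count": n} for k, n in ordered[:size]]
    shown = sum(b["doc_count"] for b in buckets)
    return {"sum_other_doc_count": sum(counts.values()) - shown, "buckets": buckets}


docs = [{"source": s} for s in ["China News Network", "New Meridian", "China News Network", "Other"]]
print(terms_agg(docs, "source", size=2))
```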
3.4. Combined aggregations
GET /news/_search
{
  "size": 0,
  "aggs": {
    "myterms": {
      "terms": {
        "field": "source",
        "size": 100
      },
      "aggs": {
        "replay": {
          "terms": {
            "field": "replyCount",
            "size": 10
          }
        },
        "avg_price": {
          "avg": {
            "field": "voteCount"
          }
        }
      }
    }
  }
}
The request above first buckets documents by source, then, within each source, buckets them by replyCount and computes the average voteCount.
One of the returned source buckets looks like this:
{
  "key" : "China News Network",
  "doc_count" : 16,
  "avg_price" : {"value" : 1195.0},
  "replay" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 4,
    "buckets" : [
      {"key" : 0, "doc_count" : 3},
      {"key" : 1, "doc_count" : 1},
      {"key" : 5, "doc_count" : 1},
      {"key" : 32, "doc_count" : 1},
      {"key" : 97, "doc_count" : 1},
      {"key" : 106, "doc_count" : 1},
      {"key" : 133, "doc_count" : 1},
      {"key" : 155, "doc_count" : 1},
      {"key" : 156, "doc_count" : 1},
      {"key" : 248, "doc_count" : 1}
    ]
  }
}
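The nested request is a group-by followed by per-group computations. A toy Python sketch of the same shape (bucket by source, then count replyCount values and average voteCount inside each bucket); the breakdown function and the sample documents are hypothetical. In the bucket above, the replyCount sub-buckets account for all 16 documents: 12 in the ten buckets shown plus sum_other_doc_count 4.

```python
from collections import Counter, defaultdict
from typing import Any, Dict, List


def breakdown(docs: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Toy version of terms(source) -> {terms(replyCount), avg(voteCount)}."""
    groups: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for doc in docs:
        if "source" in doc:
            groups[doc["source"]].append(doc)
    result = {}
    for source, members in groups.items():
        votes = [m["voteCount"] for m in members if "voteCount" in m]
        result[source] = {
            "doc_count": len(members),
            "replies": Counter(m["replyCount"] for m in members if "replyCount" in m),
            "avg_votes": sum(votes) / len(votes) if votes else None,
        }
    return result


docs = [
    {"source": "China News Network", "replyCount": 0, "voteCount": 1000},
    {"source": "China News Network", "replyCount": 0, "voteCount": 1400},
    {"source": "New Meridian", "replyCount": 5, "voteCount": 30},
]
print(breakdown(docs))
```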
4. Combining queries and aggregations
Aggregations run over the documents matched by the query, so the two can be combined. For example, to see which sources published news whose summary mentions sports:
GET /news/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [{"match": {"summary": "Sports"}}]
    }
  },
  "aggs": {
    "cate": {"terms": {"field": "source"}}
  }
}
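Because the aggregation only sees documents the query matched, the request behaves like filtering first and bucketing second. A toy Python sketch of that order, assuming a lowercase whitespace split in place of the IK analyzer; query_then_count and the sample documents are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple


def query_then_count(docs: List[Dict[str, Any]], word: str) -> Tuple[int, Counter]:
    """Toy query + terms aggregation: match on summary first, then count per source."""
    matched = [d for d in docs if word.lower() in d.get("summary", "").lower().split()]
    return len(matched), Counter(d["source"] for d in matched)


docs = [
    {"source": "China News Network", "summary": "Sports results from the weekend"},
    {"source": "China News Network", "summary": "Local weather"},
    {"source": "New Meridian", "summary": "Sports sponsorship deals"},
]
total, per_source = query_then_count(docs, "Sports")
print(total, dict(per_source))  # 2 {'China News Network': 1, 'New Meridian': 1}
```

As with size 0 in the request, only the match count and the per-source buckets come back, not the documents.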
5. Summary
Elasticsearch's query syntax is rich and varied. For details, see the official documentation and Elasticsearch: The Definitive Guide. The Definitive Guide is written for version 2.x but is easy to read.
Elasticsearch: The Definitive Guide
Elasticsearch 6.5 official documentation