Basic syntax for Elasticsearch query and aggregation

1. Overview

Elasticsearch offers two main query syntaxes: URI queries and body queries. A URI query is lightweight and quick to write, while a body query is a JSON-formatted request that can express many more conditions. This article mainly introduces structured queries, filters, and aggregations. The ES version used is 6.5.4, and Chinese word segmentation uses the IK analyzer.
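To make the two styles concrete, here is a minimal Python sketch that only builds both request forms and does not send them. The q and size URI parameters are part of the URI search API; the field name and search text are illustrative choices, and the two forms are only roughly equivalent.

```python
import json
from urllib.parse import urlencode

# URI query: everything is packed into the query string.
uri_query = "/news/_search?" + urlencode({"q": "title:news", "size": 2})

# Body query: roughly the same search expressed as a JSON request body.
body_query = {"query": {"match": {"title": "news"}}, "size": 2}

print(uri_query)               # /news/_search?q=title%3Anews&size=2
print(json.dumps(body_query))  # {"query": {"match": {"title": "news"}}, "size": 2}
```

The URI form is convenient for quick checks, while the body form has room for nested conditions such as the bool queries shown later.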

Install and use Elasticsearch

Using the IK analyzer in Elasticsearch

Create the following index in ES and import the data

PUT /news
{
  "aliases": {
    "test.chixiao.news": {}
  },
  "mappings": {
    "news": {
      "dynamic": "false",
      "properties": {
        "id": { "type": "integer" },
        "title": { "analyzer": "ik_max_word", "type": "text" },
        "summary": { "analyzer": "ik_max_word", "type": "text" },
        "author": { "type": "keyword" },
        "publishTime": { "type": "date" },
        "modifiedTime": { "type": "date" },
        "createTime": { "type": "date" },
        "docId": { "type": "keyword" },
        "voteCount": { "type": "integer" },
        "replyCount": { "type": "integer" }
      }
    }
  },
  "settings": {
    "index": {
      "refresh_interval": "1s",
      "number_of_shards": 3,
      "max_result_window": "10000000",
      "mapper": {
        "dynamic": "false"
      },
      "number_of_replicas": 1
    },
    "analysis": {
      "normalizer": {
        "lowercase": {
          "type": "custom",
          "char_filter": [],
          "filter": ["lowercase", "asciifolding"]
        }
      },
      "analyzer": {
        "1gram": {
          "type": "custom",
          "tokenizer": "ngram_tokenizer"
        }
      },
      "tokenizer": {
        "ngram_tokenizer": {
          "type": "nGram",
          "min_gram": "1",
          "max_gram": "1",
          "token_chars": ["letter", "digit"]
        }
      }
    }
  }
}

2. Queries

2.1 A query example

A simple query example follows. Queries come in two kinds, query and filter, and both are written inside the top-level query element of the request.

GET /news/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "publishTime": { "order": "desc" } }
  ],
  "size": 2,
  "from": 0,
  "_source": ["title", "id", "summary"]
}

Return result:

{
  "took": 7,
  "timed_out": false,
  "_shards": { "total": 3, "successful": 3, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 204,
    "max_score": null,
    "hits": [
      {
        "_index": "news",
        "_type": "news",
        "_id": "228",
        "_score": null,
        "_source": {
          "summary": "According to Shaanxi High People's Court, the Second instance of Xi'an Intermediate People's Court on the morning of June 11 ruled that Han and others in Shaanxi province were involved in illegal lending. The court rejected the appeal and upheld the original verdict. Intermediate person in Xi'an",
          "id": 228,
          "title": "Shaanxi's first routine loan case verdict: Gang sprayed pepper water on borrowers."
        },
        "sort": [1560245097000]
      },
      {
        "_index": "news",
        "_type": "news",
        "_id": "214",
        "_score": null,
        "_source": {
          "summary": "Netease Entertainment reported on June 11, June 11, gossip media exposure Cao Yunjin and his wife Tang Wan appeared in Tianjin Civil Affairs Bureau for divorce. In this regard, Netease Entertainment asked Cao Yunjin's agent for confirmation and got an exclusive response: \"It is indeed a divorce.\"",
          "id": 214,
          "title": "Cao Yunjin admits divorce: Amicable Divorce was malicious and malicious."
        },
        "sort": [1560244657000]
      }
    ]
  }
}

In the response, took is the time the search took in milliseconds and _shards reports shard information: the index has three shards, and all three responded successfully. The outer hits object holds the results: total is the total number of matching documents, max_score is the highest score, and the inner hits array contains the matched documents themselves.
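As a small illustration of reading these fields, the following Python sketch works on a trimmed copy of the response above. It assumes the 6.x response shape, where hits.total is a plain integer, and the helper name summarize is hypothetical.

```python
# A trimmed copy of the response above (only the fields that are read).
response = {
    "took": 7,
    "timed_out": False,
    "_shards": {"total": 3, "successful": 3, "skipped": 0, "failed": 0},
    "hits": {
        "total": 204,
        "max_score": None,
        "hits": [
            {"_id": "228", "_source": {"id": 228}},
            {"_id": "214", "_source": {"id": 214}},
        ],
    },
}


def summarize(resp):
    """Return (total matches, ids on this page, whether every shard answered)."""
    shards = resp["_shards"]
    all_shards_ok = shards["successful"] == shards["total"] and shards["failed"] == 0
    ids = [hit["_source"]["id"] for hit in resp["hits"]["hits"]]
    return resp["hits"]["total"], ids, all_shards_ok


print(summarize(response))  # (204, [228, 214], True)
```

Note that total counts every match in the index, while the inner hits array only holds the page selected by size and from.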

Queries come in two kinds: filter and query. A filter does exact matching without scoring, and it is fast because its results are easy to cache.

2.2 Filter queries

term

A term query finds the records that exactly match a value, where FIELD is the name of the field in the index and VALUE is the value to look for.

{
  "term": {
    "FIELD": {
      "value": "VALUE"
    }
  }
}

For example, to find the news whose source is New Meridian, use:

GET /news/_search
{
  "query": {
    "term": {
      "source": {
        "value": "New Meridian"
      }
    }
  }
}

bool

When several conditions need to be combined logically, use bool to group them. A bool query can contain:

{
  "bool": {
    "must": [],
    "should": [],
    "must_not": []
  }
}

must: the results must match every clause. must_not: the results must not match any clause. should: the results may match these clauses; minimum_should_match sets how many should clauses have to match. When it is 0, the should clauses only contribute to the score and do not filter anything out. If it is not set, it defaults to 0 when the bool query also has a must or filter clause, and to 1 otherwise.
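The clause semantics can be sketched with a toy in-memory model in Python. This is not how Elasticsearch evaluates queries and it ignores scoring entirely; matches_term and matches_bool are hypothetical helpers, and each clause is simplified to a (field, value) pair.

```python
def matches_term(doc, field, value):
    """Toy stand-in for a term clause: exact equality on one field."""
    return doc.get(field) == value


def matches_bool(doc, must=(), should=(), must_not=(), minimum_should_match=None):
    """Decide whether doc passes a bool query made of (field, value) term clauses."""
    if minimum_should_match is None:
        # Default used here: 1 when only should clauses are present, otherwise 0.
        minimum_should_match = 1 if should and not must else 0
    if not all(matches_term(doc, f, v) for f, v in must):
        return False
    if any(matches_term(doc, f, v) for f, v in must_not):
        return False
    matched = sum(matches_term(doc, f, v) for f, v in should)
    return matched >= minimum_should_match


docs = [
    {"id": 4, "source": "New Meridian"},
    {"id": 75, "source": "New Meridian"},
    {"id": 9, "source": "New Meridian"},
    {"id": 4, "source": "Other"},
]
clauses = dict(must=[("source", "New Meridian")], should=[("id", 4), ("id", 75)])

strict = [d["id"] for d in docs if matches_bool(d, minimum_should_match=1, **clauses)]
loose = [d["id"] for d in docs if matches_bool(d, minimum_should_match=0, **clauses)]
print(strict)  # [4, 75]
print(loose)   # [4, 75, 9]
```

With minimum_should_match set to 0, the document with id 9 is kept because should no longer filters; in a real query it would simply score lower than the other two.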

GET /news/_search
{
  "query": {
    "bool": {
      "must": [
        { "term": { "source": { "value": "New Meridian" } } }
      ],
      "should": [
        { "term": { "id": { "value": "4" } } },
        { "term": { "id": { "value": "75" } } }
      ],
      "minimum_should_match": 1
    }
  }
}

terms

To match any of several exact values, as the should clauses above do, use terms. For example, to find the articles with id 4 or 75:

GET /news/_search
{
  "query": {
    "terms": {
      "id": ["4", "75"]
    }
  }
}

range

For range conditions, use range, which sits at the same level as term, for example to find articles with ids from 1 to 10. The bounds are expressed with the following operators:

gt: > (greater than)
lt: < (less than)
gte: >= (greater than or equal to)
lte: <= (less than or equal to)
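The four operators map directly onto ordinary comparisons. The following Python sketch shows that mapping on plain integers; in_range and RANGE_OPS are hypothetical names used only for illustration.

```python
import operator

# Each range keyword corresponds to one comparison operator.
RANGE_OPS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def in_range(value, bounds):
    """Return True if value satisfies every bound, e.g. {"gte": 1, "lte": 10}."""
    return all(RANGE_OPS[name](value, limit) for name, limit in bounds.items())


print([i for i in range(0, 13) if in_range(i, {"gte": 1, "lte": 10})])
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(in_range(10, {"gt": 1, "lt": 10}))  # False: lt excludes the upper bound
```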

GET /news/_search
{
  "query": {
    "range": {
      "id": {
        "gte": 1,
        "lte": 10
      }
    }
  }
}

exists

In ES, exists finds the documents in which a field is present. Combined with the must_not or should clause of bool, it can also express that a field is missing or only optionally present. For example, to find the documents that have an author field:

GET /news/_search
{
  "query": {
    "exists": { "field": "author" }
  }
}

2.3 Query (full-text) queries

Unlike the exact matching of filter, query can run a full-text search on a field and score the results. In ES, only fields of type text are analyzed, that is, split into tokens. A keyword field is also a string, but it is treated as a single exact value, like an enumeration, and is not tokenized.
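The difference between the two field types can be sketched in a few lines of Python. This is a toy model under a loud assumption: the analyze function lowercases and splits on whitespace as a stand-in for a real analyzer such as IK, which is not modeled here, and all function names are hypothetical.

```python
def analyze(text):
    """Stand-in analyzer: lowercase and split on whitespace (IK is not modeled)."""
    return text.lower().split()


def term_match(field_value, query):
    """keyword-style matching: the whole stored value must equal the query."""
    return field_value == query


def text_match(field_value, query, operator="or"):
    """text-style matching: compare the analyzed tokens of the field and the query."""
    field_tokens = set(analyze(field_value))
    found = [token in field_tokens for token in analyze(query)]
    return all(found) if operator == "and" else any(found)


summary = "Local sports club wins the cup"
print(term_match(summary, "sports"))              # False
print(text_match(summary, "Sports news"))         # True
print(text_match(summary, "Sports news", "and"))  # False
```

The operator argument mirrors the or and and options of the match query: with or a single shared token is enough, with and every query token has to be present.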

match

Use match to run a full-text search on a field, for example to find articles whose summary mentions sports:

GET /news/_search
{
  "query": {
    "match": {
      "summary": "Sports"
    }
  }
}

A match query can also specify an analyzer. For example, ik_smart splits the input into coarse-grained tokens, so the following query recalls the documents that contain imported red wine:

GET /news/_search
{
  "query": {
    "match": {
      "name": {
        "query": "Imported wine",
        "analyzer": "ik_smart"
      }
    }
  }
}

The query text may be split into several tokens. With the operator and, a document is recalled only if all tokens match. With or, the behaviour resembles should: minimum_should_match controls how many tokens must match before a document is recalled. For example, when searching for sports news, the following query recalls any document whose summary contains either sports or news:

GET /news/_search
{
  "query": {
    "match": {
      "summary": {
        "query": "Sports news",
        "operator": "or",
        "minimum_should_match": 1
      }
    }
  }
}

multi_match

Use multi_match to search several fields at once, for example to find the documents whose title or summary contains the keyword news:

GET /news/_search
{
  "query": {
    "multi_match": {
      "query": "News",
      "fields": ["title", "summary"]
    }
  }
}

2.4 Combined queries

With full-text search and filtering in place, bool can combine them into complex queries:

GET /news/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "summary": { "boost": 1, "query": "Changan" } } },
        { "term": { "source": { "value": "New Meridian", "boost": 2 } } }
      ],
      "filter": {
        "bool": {
          "must": [
            { "term": { "id": 75 } }
          ]
        }
      }
    }
  }
}

The must, must_not, and should clauses of bool can each contain term, range, match, and other queries. If some conditions should not take part in scoring, put them in the filter clause of bool: nothing inside filter contributes to the score, and its results can be cached.
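One way to keep the two roles apart when building requests in code is a small helper that takes scored clauses and filter clauses separately. The Python sketch below only assembles the request body and does not talk to a cluster; bool_query is a hypothetical helper, and the id range is an illustrative value.

```python
import json


def bool_query(scored, filters):
    """Build a bool query: scored clauses affect _score, filters only include or exclude."""
    return {"query": {"bool": {"must": list(scored), "filter": list(filters)}}}


body = bool_query(
    scored=[{"match": {"summary": "Changan"}}],
    filters=[
        {"term": {"source": "New Meridian"}},
        {"range": {"id": {"gte": 1, "lte": 100}}},
    ],
)
print(json.dumps(body, indent=2))
```

Moving the term on source from must into filter, as done here, keeps the same documents but stops that clause from influencing the ranking.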

3. Aggregations

The basic format of aggregation is:

GET /news/_search
{
  "size": 0,
  "aggs": {
    "NAME": {
      "AGG_TYPE": {}
    }
  }
}

Here NAME is the name of the aggregation, which can be any valid string, and AGG_TYPE is the aggregation type. Aggregations are commonly divided into single-value and multi-value aggregations.

3.1 An aggregation example

GET /news/_search
{
  "size": 0,
  "aggs": {
    "sum_all": {
      "sum": {
        "field": "replyCount"
      }
    }
  }
}

This computes the sum of replyCount over the whole index. The result:

{
  "took": 8,
  "timed_out": false,
  "_shards": { "total": 3, "successful": 3, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 204,
    "max_score": 0.0,
    "hits": []
  },
  "aggregations": {
    "sum_all": {
      "value": 390011.0
    }
  }
}

By default the response also contains the matching documents, so size is set to 0 to suppress them. sum_all in the result is the name given to the aggregation in the request.

The aggregation types covered here are Metrics and Bucket.

3.2 Metrics

Metrics aggregations mainly return single values, such as avg, max, min, and sum; stats returns several of these at once.
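What these metrics compute can be sketched over in-memory documents in Python. The stats function below is a hypothetical toy, not the Elasticsearch implementation. It skips documents that lack the field, which is consistent with the stats response later in this article, where count is 202 while hits.total is 204.

```python
def stats(docs, field):
    """Toy version of the stats aggregation over in-memory documents."""
    # Documents without a value for the field are skipped.
    values = [d[field] for d in docs if d.get(field) is not None]
    if not values:
        # Nothing to aggregate; placeholder result chosen for this sketch.
        return {"count": 0, "min": None, "max": None, "avg": None, "sum": 0.0}
    total = float(sum(values))
    return {
        "count": len(values),
        "min": float(min(values)),
        "max": float(max(values)),
        "avg": total / len(values),
        "sum": total,
    }


docs = [{"replyCount": 0}, {"replyCount": 12}, {"replyCount": 3}, {"title": "no replyCount"}]
print(stats(docs, "replyCount"))
# {'count': 3, 'min': 0.0, 'max': 12.0, 'avg': 5.0, 'sum': 15.0}
```

The average is simply sum divided by count, so the figures in a real response can be cross-checked the same way: 390011.0 / 202 gives the avg of about 1930.75 reported below.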

For example, to find the maximum number of replies in the index:

GET /news/_search
{
  "size": 0,
  "aggs": {
    "max_replay": {
      "max": {
        "field": "replyCount"
      }
    }
  }
}

stats

stats returns the count, minimum, maximum, average, and sum of a field in one request. For example, to view the basic statistics of the number of replies per news item:

GET /news/_search
{
  "size": 0,
  "aggs": {
    "cate": {
      "stats": {
        "field": "replyCount"
      }
    }
  }
}

The return result is:

{
  "took": 3,
  "timed_out": false,
  "_shards": { "total": 3, "successful": 3, "skipped": 0, "failed": 0 },
  "hits": {
    "total": 204,
    "max_score": 0.0,
    "hits": []
  },
  "aggregations": {
    "cate": {
      "count": 202,
      "min": 0.0,
      "max": 32534.0,
      "avg": 1930.7475247524753,
      "sum": 390011.0
    }
  }
}

3.3 Buckets

Bucket aggregations are similar to GROUP BY in SQL: they divide the documents into groups, called buckets.

terms

Bucketing with terms shows how the data is distributed, for example how many sources the index contains and how many articles each source has. size specifies how many buckets to return, starting with the ones that contain the most documents.
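The idea behind terms bucketing can be sketched with a counter in Python. This toy ignores sharding and the approximation that comes with it; terms_agg is a hypothetical helper and the sample sources are made up.

```python
from collections import Counter


def terms_agg(docs, field, size=10):
    """Toy terms aggregation: count documents per distinct value, keep the top `size`."""
    counts = Counter(d[field] for d in docs if field in d)
    top = counts.most_common(size)
    buckets = [{"key": key, "doc_count": n} for key, n in top]
    # Documents that fell into buckets beyond the requested size.
    other = sum(counts.values()) - sum(n for _, n in top)
    return {"sum_other_doc_count": other, "buckets": buckets}


docs = [{"source": s} for s in ["A", "B", "A", "C", "A", "B"]]
print(terms_agg(docs, "source", size=2))
# {'sum_other_doc_count': 1, 'buckets': [{'key': 'A', 'doc_count': 3}, {'key': 'B', 'doc_count': 2}]}
```

A size that is too small hides the remaining buckets, and their documents only show up in sum_other_doc_count.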

GET /news/_search
{
  "size": 0,
  "aggs": {
    "myterms": {
      "terms": {
        "field": "source",
        "size": 100
      }
    }
  }
}

3.4 Combined aggregations

GET /news/_search
{
  "size": 0,
  "aggs": {
    "myterms": {
      "terms": { "field": "source", "size": 100 },
      "aggs": {
        "replay": {
          "terms": { "field": "replyCount", "size": 10 }
        },
        "avg_price": {
          "avg": { "field": "voteCount" }
        }
      }
    }
  }
}

The request above first buckets the documents by source, then buckets the replyCount values within each source, and also computes the average voteCount for each source.
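The nesting can be pictured as a group-by followed by per-group calculations. The Python sketch below reproduces that shape on made-up documents; nested_agg is a hypothetical helper, and the keys replay and avg_price simply reuse the aggregation names from the request.

```python
from collections import Counter, defaultdict


def nested_agg(docs):
    """Group by source, then per group: bucket replyCount and average voteCount."""
    groups = defaultdict(list)
    for d in docs:
        groups[d["source"]].append(d)
    result = {}
    for source, members in groups.items():
        replies = Counter(m["replyCount"] for m in members)
        result[source] = {
            "doc_count": len(members),
            "replay": dict(replies),
            "avg_price": sum(m["voteCount"] for m in members) / len(members),
        }
    return result


docs = [
    {"source": "A", "replyCount": 0, "voteCount": 10},
    {"source": "A", "replyCount": 0, "voteCount": 20},
    {"source": "A", "replyCount": 5, "voteCount": 30},
    {"source": "B", "replyCount": 1, "voteCount": 4},
]
print(nested_agg(docs)["A"])
# {'doc_count': 3, 'replay': {0: 2, 5: 1}, 'avg_price': 20.0}
```

Each sub-aggregation only sees the documents of its parent bucket, which is why the counts inside replay never exceed the doc_count of the source.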

One of the results returned is as follows

{
  "key": "China News Network",
  "doc_count": 16,
  "avg_price": { "value": 1195.0 },
  "replay": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 4,
    "buckets": [
      { "key": 0, "doc_count": 3 },
      { "key": 1, "doc_count": 1 },
      { "key": 5, "doc_count": 1 },
      { "key": 32, "doc_count": 1 },
      { "key": 97, "doc_count": 1 },
      { "key": 106, "doc_count": 1 },
      { "key": 133, "doc_count": 1 },
      { "key": 155, "doc_count": 1 },
      { "key": 156, "doc_count": 1 },
      { "key": 248, "doc_count": 1 }
    ]
  }
}

4. Combining queries and aggregations

Queries and aggregations can be combined so that the aggregation runs over the results of the query. For example, to see how the news whose summary mentions sports is distributed across sources:

GET /news/_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        { "match": { "summary": "Sports" } }
      ]
    }
  },
  "aggs": {
    "cate": {
      "terms": { "field": "source" }
    }
  }
}

5. Summary

Elasticsearch's query syntax is rich and varied. For details, see the official documentation and the Definitive Guide. The Definitive Guide covers version 2.x, but it is easy to read.

Elasticsearch: The Definitive Guide

Elasticsearch 6.5 official documentation
