Elasticsearch core queries that cover 90% of query scenarios

Preface

Reference: the official Elasticsearch documentation

The Elasticsearch query language is called the Query DSL (Domain Specific Language).

All queries in this article use the following index as their example.

PUT /dying_gq_bookstore
{
  "mappings": {
    "properties": {
      "book_id": {
        "type": "long"
      },
      "title": {
        "type": "text"
      },
      "tag": {
        "type": "keyword"
      },
      "book_type": {
        "type": "keyword"
      },
      "content": {
        "type": "text"
      },
      "status": {
        "type": "keyword"
      },
      "price": {
        "type": "long"
      },
      "stock": {
        "type": "long"
      }
    }
  }
}

POST /dying_gq_bookstore/_bulk
{"index": {}}
{"book_id": 1, "title": "Please wake me up when you leave.", "tag": "Emotional", "book_type": "Best-selling literature.", "content": "Anyway, let me know when you leave.", "status": 1, "price": "48", "stock": 20000}
{"index": {}}
{"book_id": 1, "title": "I'd like two grapefruits, please.", "tag": "Emotional", "book_type": "Best-selling literature.", "content": "i want to see you", "status": 1, "price": "39", "stock": 6000}
{"index": {}}
{"book_id": 1, "title": "A gentle breeze", "tag": "Emotional", "book_type": "Best-selling literature.", "content": "Go with the flow", "status": 1, "price": "36", "stock": 18622}
{"index": {}}
{"book_id": 1, "title": "Lost", "tag": "Mystery", "book_type": "Best-selling literature.", "content": "Life is like a lamb to the slaughter.", "status": 2, "price": "54", "stock": 543}
{"index": {}}
{"book_id": 1, "title": "Eight million ways to die.", "tag": "Mystery", "book_type": "Foreign literature", "content": "Eight million people in this city, eight million ways to die.", "status": 2, "price": "69", "stock": 888}
{"index": {}}
{"book_id": 1, "title": "Holmes and his Dog.", "tag": "Mystery", "book_type": "Foreign literature", "content": "This dog has been old for a long time.", "status": 2, "price": "198", "stock": 88932}
{"index": {}}
{"book_id": 1, "title": "I can only walk you one way.", "tag": "Cure", "book_type": "Healing literature.", "content": "I'm sorry I had to walk you this far.", "status": 2, "price": "38", "stock": 765}
{"index": {}}
{"book_id": 1, "title": "A little bit", "tag": "Traffic", "book_type": "Traffic policing", "content": "Yes, but only a little.", "status": 1, "price": "99", "stock": 18833}
{"index": {}}
{"book_id": 1, "title": "Chenghua Avenue", "tag": "Traffic", "book_type": "Traffic policing", "content": "Should I take Chenghua Avenue?", "status": 1, "price": "23", "stock": 2334}
{"index": {}}
{"book_id": 1, "title": "Huaxian Bridge", "tag": "Traffic", "book_type": "Traffic policing", "content": "Wah Sin Bridge Guidebook", "status": 1, "price": "19", "stock": 210}

Core types

String: there are two string types, text and keyword.
- text: used for full-text values such as long passages. At index time the value is analyzed (tokenized) into individual terms, and ES searches against those terms. By default, text fields cannot be used for sorting or aggregations.
- keyword: not analyzed; the whole value is indexed as a single term. It can be used for exact-match filtering, sorting, and aggregations, but it does not support full-text (analyzed) search the way text does.
Numeric types: long, integer, short, byte, double, float
Date type: date
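To make the text/keyword difference concrete, here is a rough Python sketch of what each type turns a value into at index time. `analyze_text` is a hypothetical stand-in for the standard analyzer: it only splits on non-word characters and lowercases, while the real analyzer follows Unicode segmentation rules and differs in details (for example, it keeps "I'm" as one token).

```python
import re


def analyze_text(value: str) -> list:
    """Hypothetical stand-in for a text field's analyzer: split into words, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", value)]


def analyze_keyword(value: str) -> list:
    """A keyword field is not analyzed: the whole value becomes a single term."""
    return [value]


print(analyze_text("Go with the flow"))             # ['go', 'with', 'the', 'flow']
print(analyze_keyword("Best-selling literature."))  # ['Best-selling literature.']
```

This is why the same string can be found by a single word in a text field but only by its complete, exact value in a keyword field.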

1. Single-condition queries

Note: some of the queries below come with an SQL analogy to aid understanding. Because matching terms in an inverted index is only roughly comparable to SQL, these analogies are approximate, not equivalent.

Fuzzy matching

The fuzzy-matching queries below all work on the terms that the analyzer produced from the original field content. (All of the queries that follow can also be combined into more complex queries with the appropriate syntax.)

match: full-text match

The search text is analyzed into terms, and those terms are matched against the terms of the target field in the inverted index. For example, with the standard analyzer "Hello world" is split into the terms "hello" and "world", and a document matches if its content contains either term. The exact terms depend on the analyzer.

POST /dying_gq_bookstore/_search
{
  "from": 0, // pagination offset, starting at 0
  "size": 20, // page size; together with from, like MySQL LIMIT 0,20
  "query": {
    "match": { // the search text is analyzed, and its terms are matched against the field's terms
      "content": "Hello world." // field: search text
    }
  }
}

Similar to SQL: SELECT * FROM dying_gq_bookstore WHERE content LIKE '%hello%' OR content LIKE '%world%' LIMIT 0,20
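As a toy mental model of match (not how Lucene actually stores postings or scores hits), the Python sketch below builds an inverted index over the sample `content` values and ORs the posting lists of the analyzed query terms. `analyze` is a hypothetical, simplified analyzer, and relevance scoring is ignored.

```python
import re
from collections import defaultdict

# content values of the 10 sample documents, in indexing order (doc ids 0-9)
CONTENTS = [
    "Anyway, let me know when you leave.",
    "i want to see you",
    "Go with the flow",
    "Life is like a lamb to the slaughter.",
    "Eight million people in this city, eight million ways to die.",
    "This dog has been old for a long time.",
    "I'm sorry I had to walk you this far.",
    "Yes, but only a little.",
    "Should I take Chenghua Avenue?",
    "Wah Sin Bridge Guidebook",
]


def analyze(text):
    # Rough stand-in for the standard analyzer: split into words and lowercase them.
    return [token.lower() for token in re.findall(r"\w+", text)]


def build_inverted_index(docs):
    # term -> set of doc ids whose content contains that term
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def match(index, query_text):
    # match analyzes the query text too, then ORs the posting lists (default operator: or)
    hits = set()
    for term in analyze(query_text):
        hits |= index.get(term, set())
    return sorted(hits)


index = build_inverted_index(CONTENTS)
print(match(index, "dog flow"))  # [2, 5]
print(match(index, "DOG"))       # [5]: the query text is lowercased as well
```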

prefix: prefix match

POST /dying_gq_bookstore/_search
{
  "from": 0, // pagination offset, starting at 0
  "size": 20, // page size, like MySQL LIMIT 0,20
  "query": {
    "prefix": { // the value is not analyzed; it is compared with the start of each term in the field
      "content": "hello" // field: prefix to match
    }
  }
}

Roughly similar to SQL: SELECT * FROM dying_gq_bookstore WHERE content LIKE 'hello%' LIMIT 0,20 (except that ES checks the prefix against each individual term, not only the start of the whole field)
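Extending the same toy model, the sketch below shows why prefix behaves differently from match: the prefix value is not analyzed and is compared with the start of every indexed term. `prefix` and `analyze` are hypothetical helpers; ES does this on its term dictionary rather than by scanning a Python dict.

```python
import re
from collections import defaultdict

# content values of the 10 sample documents (doc ids 0-9)
CONTENTS = [
    "Anyway, let me know when you leave.",
    "i want to see you",
    "Go with the flow",
    "Life is like a lamb to the slaughter.",
    "Eight million people in this city, eight million ways to die.",
    "This dog has been old for a long time.",
    "I'm sorry I had to walk you this far.",
    "Yes, but only a little.",
    "Should I take Chenghua Avenue?",
    "Wah Sin Bridge Guidebook",
]


def analyze(text):
    # Rough stand-in for the standard analyzer: split into words and lowercase them.
    return [token.lower() for token in re.findall(r"\w+", text)]


index = defaultdict(set)  # term -> doc ids
for doc_id, text in enumerate(CONTENTS):
    for term in analyze(text):
        index[term].add(doc_id)


def prefix(index, value):
    # Term-level query: `value` is used as-is (no lowercasing) and compared
    # with the beginning of every indexed term.
    hits = set()
    for term, doc_ids in index.items():
        if term.startswith(value):
            hits |= doc_ids
    return sorted(hits)


print(prefix(index, "wa"))  # [1, 4, 6, 9]: want, ways, walk, wah
print(prefix(index, "Wa"))  # []: indexed terms are lowercase, the prefix is not
```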

regexp: regular-expression match

POST /dying_gq_bookstore/_search
{
  "from": 0, // pagination offset, starting at 0
  "size": 20, // page size, like MySQL LIMIT 0,20
  "query": {
    "regexp": { // the pattern is matched against each term of the field
      "content": "[1-9]" // field: pattern; it must match the whole term, here a single digit 1-9
    }
  }
}
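The toy sketch below models the key property of regexp: the pattern has to match an entire indexed term, not a substring of it. It uses Python's `re.fullmatch` as an assumed approximation; ES actually uses Lucene's regular-expression syntax, which differs from Python's in several operators.

```python
import re
from collections import defaultdict

# content values of the 10 sample documents (doc ids 0-9)
CONTENTS = [
    "Anyway, let me know when you leave.",
    "i want to see you",
    "Go with the flow",
    "Life is like a lamb to the slaughter.",
    "Eight million people in this city, eight million ways to die.",
    "This dog has been old for a long time.",
    "I'm sorry I had to walk you this far.",
    "Yes, but only a little.",
    "Should I take Chenghua Avenue?",
    "Wah Sin Bridge Guidebook",
]

index = defaultdict(set)  # term -> doc ids, using a simplified lowercase word analyzer
for doc_id, text in enumerate(CONTENTS):
    for term in re.findall(r"\w+", text.lower()):
        index[term].add(doc_id)


def regexp(index, pattern):
    # The pattern must cover the whole term (anchored), like an ES regexp query.
    compiled = re.compile(pattern)
    hits = set()
    for term, doc_ids in index.items():
        if compiled.fullmatch(term):
            hits |= doc_ids
    return sorted(hits)


print(regexp(index, "[1-9]"))  # []: no term in the sample content is a single digit
print(regexp(index, "d.g"))    # [5]: matches "dog"
print(regexp(index, "do"))     # []: "dog" does not match, the pattern must cover the whole term
```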

Exact matching

term: exact match of a single value in a single field

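term compares a value exactly against the indexed terms. The difference from match is that term does not analyze the search value: it is compared as-is. For that reason, term is best suited to keyword, numeric, and date fields rather than analyzed text fields.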

POST /dying_gq_bookstore/_search
{
  "from": 0, // pagination offset, starting at 0
  "size": 20, // page size, like MySQL LIMIT 0,20
  "query": {
    "term": { // the value is not analyzed; it must equal an indexed term exactly
      "tag": "Emotional" // field: exact value
    }
  }
}

Similar to SQL: SELECT * FROM dying_gq_bookstore WHERE tag = 'Emotional' LIMIT 0,20
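To illustrate that a term query compares the raw value as-is, here is a toy Python model over the `tag` (keyword) values of the sample data, where each document holds exactly one term. `term_query` is a hypothetical helper, not an ES client call.

```python
# tag values of the 10 sample documents (doc ids 0-9)
TAGS = ["Emotional", "Emotional", "Emotional", "Mystery", "Mystery",
        "Mystery", "Cure", "Traffic", "Traffic", "Traffic"]


def term_query(field_values, value):
    # term is not analyzed: a document matches only if its indexed term equals value exactly
    return [doc_id for doc_id, term in enumerate(field_values) if term == value]


print(term_query(TAGS, "Emotional"))  # [0, 1, 2]
print(term_query(TAGS, "emotional"))  # []: the comparison is exact and case-sensitive
```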

terms: exact match of any of several values in a single field

The values in the array are combined with OR.

POST /dying_gq_bookstore/_search
{
  "from": 0, // pagination offset, starting at 0
  "size": 20, // page size, like MySQL LIMIT 0,20
  "query": {
    "terms": {
      "tag": [
        "Emotional",
        "Mystery",
        "Cure"
      ]
    }
  }
}

Similar to SQL: SELECT * FROM dying_gq_bookstore WHERE tag = 'Emotional' OR tag = 'Mystery' OR tag = 'Cure' LIMIT 0,20
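In the same toy model, terms is simply "equals any of these values". The sketch below assumes a keyword field with one term per document; `terms_query` is a hypothetical helper.

```python
# tag values of the 10 sample documents (doc ids 0-9)
TAGS = ["Emotional", "Emotional", "Emotional", "Mystery", "Mystery",
        "Mystery", "Cure", "Traffic", "Traffic", "Traffic"]


def terms_query(field_values, values):
    # a document matches if its (unanalyzed) term equals ANY of the given values
    wanted = set(values)
    return [doc_id for doc_id, term in enumerate(field_values) if term in wanted]


print(terms_query(TAGS, ["Emotional", "Cure"]))  # [0, 1, 2, 6]
```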

range: matching values within a range

POST /dying_gq_bookstore/_search
{
  "from": 0, // pagination offset, starting at 0
  "size": 20, // page size, like MySQL LIMIT 0,20
  "query": {
    "range": {
      "book_id": {
        "gte": 10, // greater than or equal to
        "lte": 20 // less than or equal to
      }
    }
  }
}

Similar to SQL: SELECT * FROM dying_gq_bookstore WHERE book_id BETWEEN 10 AND 20 LIMIT 0,20
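The bound parameters of range are easy to mix up, so here is a small Python sketch of their semantics: gte/lte are inclusive, gt/lt are exclusive. It runs on the sample prices (every sample document has book_id 1, so book_id would not show much); `range_query` is a hypothetical helper.

```python
# price of each of the 10 sample documents (doc ids 0-9)
PRICES = [48, 39, 36, 54, 69, 198, 38, 99, 23, 19]


def range_query(values, gte=None, gt=None, lte=None, lt=None):
    def in_range(v):
        if gte is not None and v < gte:   # gte: inclusive lower bound
            return False
        if gt is not None and v <= gt:    # gt: exclusive lower bound
            return False
        if lte is not None and v > lte:   # lte: inclusive upper bound
            return False
        if lt is not None and v >= lt:    # lt: exclusive upper bound
            return False
        return True

    return [doc_id for doc_id, v in enumerate(values) if in_range(v)]


print(range_query(PRICES, gte=36, lte=48))  # [0, 1, 2, 6]: both bounds inclusive
print(range_query(PRICES, gt=36, lt=48))    # [1, 6]: both bounds exclusive
```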

2. Compound queries (combining conditions)

Boolean query

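The bool query combines conditions with the following clauses:
- must: the document must match; the clause contributes to the relevance score (_score).
- should: the document should match; each matching clause raises the score. If the bool query has no must or filter clause, at least one should clause has to match.
- must_not: the document must not match. It runs in filter context, so no score is computed.
- filter: the document must match, but the clause runs in filter context: no relevance score (_score) is computed, which makes it more efficient, and its results can be cached.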

POST /dying_gq_bookstore/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "Search" } },
        { "match": { "content": "Elasticsearch" } }
      ],
      "filter": [
        { "term": { "status": "published" } },
        { "range": { "book_id": { "gte": "200" } } }
      ]
    }
  }
}
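The clause semantics listed above can be sketched as predicates in Python. This is a toy model under clear simplifications: each clause is a Python function, and the "score" is just +1 per matching must/should clause, whereas real ES sums BM25 relevance scores. `bool_query` and its `filter_` parameter are hypothetical names.

```python
# the fields of the 10 sample documents that this sketch needs (doc ids 0-9)
DOCS = [
    {"tag": "Emotional", "book_type": "Best-selling literature.", "price": 48, "stock": 20000},
    {"tag": "Emotional", "book_type": "Best-selling literature.", "price": 39, "stock": 6000},
    {"tag": "Emotional", "book_type": "Best-selling literature.", "price": 36, "stock": 18622},
    {"tag": "Mystery", "book_type": "Best-selling literature.", "price": 54, "stock": 543},
    {"tag": "Mystery", "book_type": "Foreign literature", "price": 69, "stock": 888},
    {"tag": "Mystery", "book_type": "Foreign literature", "price": 198, "stock": 88932},
    {"tag": "Cure", "book_type": "Healing literature.", "price": 38, "stock": 765},
    {"tag": "Traffic", "book_type": "Traffic policing", "price": 99, "stock": 18833},
    {"tag": "Traffic", "book_type": "Traffic policing", "price": 23, "stock": 2334},
    {"tag": "Traffic", "book_type": "Traffic policing", "price": 19, "stock": 210},
]


def bool_query(docs, must=(), should=(), must_not=(), filter_=()):
    """Toy bool query: clauses are predicates; returns (doc_id, toy_score) pairs."""
    results = []
    for doc_id, doc in enumerate(docs):
        if not all(clause(doc) for clause in must):
            continue
        if not all(clause(doc) for clause in filter_):  # filter context: no score
            continue
        if any(clause(doc) for clause in must_not):  # filter context: no score
            continue
        should_hits = sum(1 for clause in should if clause(doc))
        # with no must/filter clause, at least one should clause has to match
        if should and not must and not filter_ and should_hits == 0:
            continue
        # toy score: +1 per matching must/should clause (real ES sums BM25 scores)
        score = float(len(must) + should_hits)
        results.append((doc_id, score))
    return sorted(results, key=lambda pair: -pair[1])


print(bool_query(
    DOCS,
    must=[lambda d: d["tag"] == "Mystery"],
    filter_=[lambda d: d["price"] <= 100],
    must_not=[lambda d: d["book_type"] == "Best-selling literature."],
    should=[lambda d: d["stock"] > 500],
))  # [(4, 2.0)]
```

Note that a query with only filter clauses still returns hits, but every hit gets a score of 0.0 in this model, which mirrors why filter is the cheaper choice for yes/no conditions.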

3. Aggregations

Aggregations rest on two concepts. Bucket: a group of documents, similar to a group in SQL GROUP BY. Metric: a statistic (such as count, maximum, or minimum) computed over the documents in each bucket.

Note that text fields cannot be aggregated (by default).

Example 1: Basic aggregation

The query semantics of Example 1 are as follows:

Group the documents of the dying_gq_bookstore index by the tag field, count the documents for each tag, and sort the groups by that count in ascending order.

POST /dying_gq_bookstore/_search
{
  "size": 0, // size = 0: return no documents, only the aggregation results
  "aggs": { // aggregations
    "group_by_tag": { // aggregation name, can be anything
      "terms": { // bucket by distinct field values
        "field": "tag", // field name
        "order": { // bucket ordering
          "_count": "asc" // sort by document count, ascending; _count is a built-in key
        }
      }
    }
  }
}
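Conceptually, this terms aggregation is a count per distinct value followed by a sort. The Python sketch below reproduces it on the sample tags. Breaking ties between equal counts by key is this sketch's own choice, not something stated above; `terms_agg_by_count_asc` is a hypothetical helper.

```python
from collections import Counter

# tag values of the 10 sample documents
TAGS = ["Emotional", "Emotional", "Emotional", "Mystery", "Mystery",
        "Mystery", "Cure", "Traffic", "Traffic", "Traffic"]


def terms_agg_by_count_asc(values):
    counts = Counter(values)  # one bucket per distinct value, with its doc count
    # order by doc count ascending; ties broken by key (this sketch's choice)
    return sorted(counts.items(), key=lambda kv: (kv[1], kv[0]))


print(terms_agg_by_count_asc(TAGS))
# [('Cure', 1), ('Emotional', 3), ('Mystery', 3), ('Traffic', 3)]
```

On a multi-shard index the real aggregation counts per shard and merges, so counts can be approximate; the terms aggregation documentation discourages ordering by ascending _count for that reason.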

Example 2: Nested aggregation (drill-down analysis)

The query semantics of Example 2 are as follows:

Bucket the dying_gq_bookstore index by the tag field, then compute a metric for each bucket: the average price of the books with that tag. Sort the buckets by that average price in ascending order. Nesting one aggregation inside another like this is also called drill-down analysis.

POST /dying_gq_bookstore/_search
{
  "size": 0,
  "aggs": { // aggregations
    "group_by_tag": { // aggregation name, can be anything
      "terms": { // bucket by distinct field values
        "field": "tag", // field name
        "order": { // bucket ordering
          "avg_by_price": "asc" // sort buckets by the avg_by_price sub-aggregation, ascending
        }
      },
      "aggs": { // nested aggregation for drill-down analysis
        "avg_by_price": { // aggregation name
          "avg": { // compute the average
            "field": "price" // field to average
          }
        }
      }
    }
  }
}
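The same drill-down can be reproduced in plain Python as "group, average, sort by the average", which is a handy way to sanity-check the bucket values ES returns. `avg_price_by_tag` is a hypothetical helper over the sample data.

```python
from collections import defaultdict

# (tag, price) of the 10 sample documents
BOOKS = [("Emotional", 48), ("Emotional", 39), ("Emotional", 36),
         ("Mystery", 54), ("Mystery", 69), ("Mystery", 198),
         ("Cure", 38), ("Traffic", 99), ("Traffic", 23), ("Traffic", 19)]


def avg_price_by_tag(books):
    prices = defaultdict(list)  # bucket: tag -> prices of its documents
    for tag, price in books:
        prices[tag].append(price)
    # metric per bucket: average price
    averages = {tag: sum(ps) / len(ps) for tag, ps in prices.items()}
    # order buckets by the metric, ascending
    return sorted(averages.items(), key=lambda kv: kv[1])


print(avg_price_by_tag(BOOKS))
# [('Cure', 38.0), ('Emotional', 41.0), ('Traffic', 47.0), ('Mystery', 107.0)]
```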

Example 3: Multi-level nested aggregation

The query semantics of Example 3 are as follows:

Bucket by tag and compute the average price for each tag. Then, inside each tag bucket, create sub-buckets by book_type and compute the average price of each. The book_type sub-buckets are sorted by their own average price in ascending order, and the outer tag buckets are sorted by the tag's average price in ascending order.

POST /dying_gq_bookstore/_search
{
  "size": 0,
  "aggs": {
    "group_by_tag": {
      "terms": {
        "field": "tag",
        "order": {
          "avg_by_price_tag": "asc"
        }
      },
      "aggs": {
        "avg_by_price_tag": {
          "avg": {
            "field": "price"
          }
        },
        "group_by_book_type": {
          "terms": {
            "field": "book_type",
            "order": {
              "avg_by_price_book_type": "asc"
            }
          },
          "aggs": {
            "avg_by_price_book_type": {
              "avg": {
                "field": "price"
              }
            }
          }
        }
      }
    }
  }
}

For reference, the response to this query is:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_tag" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "Cure",
          "doc_count" : 1,
          "avg_by_price_tag" : {
            "value" : 38.0
          },
          "group_by_book_type" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "Healing literature.",
                "doc_count" : 1,
                "avg_by_price_book_type" : {
                  "value" : 38.0
                }
              }
            ]
          }
        },
        {
          "key" : "Emotional",
          "doc_count" : 3,
          "avg_by_price_tag" : {
            "value" : 41.0
          },
          "group_by_book_type" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "Best-selling literature.",
                "doc_count" : 3,
                "avg_by_price_book_type" : {
                  "value" : 41.0
                }
              }
            ]
          }
        },
        {
          "key" : "Traffic",
          "doc_count" : 3,
          "avg_by_price_tag" : {
            "value" : 47.0
          },
          "group_by_book_type" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "Traffic policing",
                "doc_count" : 3,
                "avg_by_price_book_type" : {
                  "value" : 47.0
                }
              }
            ]
          }
        },
        {
          "key" : "Mystery",
          "doc_count" : 3,
          "avg_by_price_tag" : {
            "value" : 107.0
          },
          "group_by_book_type" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
              {
                "key" : "Best-selling literature.",
                "doc_count" : 1,
                "avg_by_price_book_type" : {
                  "value" : 54.0
                }
              },
              {
                "key" : "Foreign literature",
                "doc_count" : 2,
                "avg_by_price_book_type" : {
                  "value" : 133.5
                }
              }
            ]
          }
        }
      ]
    }
  }
}
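To see where numbers like 107.0 and 133.5 come from, the Python sketch below recomputes the two-level aggregation from the sample data: an outer bucket per tag with its average price, and inner buckets per book_type, each level sorted by its own average. `nested_avg` is a hypothetical helper, not part of any ES client.

```python
from collections import defaultdict

# (tag, book_type, price) of the 10 sample documents
BOOKS = [
    ("Emotional", "Best-selling literature.", 48),
    ("Emotional", "Best-selling literature.", 39),
    ("Emotional", "Best-selling literature.", 36),
    ("Mystery", "Best-selling literature.", 54),
    ("Mystery", "Foreign literature", 69),
    ("Mystery", "Foreign literature", 198),
    ("Cure", "Healing literature.", 38),
    ("Traffic", "Traffic policing", 99),
    ("Traffic", "Traffic policing", 23),
    ("Traffic", "Traffic policing", 19),
]


def nested_avg(books):
    groups = defaultdict(lambda: defaultdict(list))  # tag -> book_type -> prices
    for tag, book_type, price in books:
        groups[tag][book_type].append(price)
    result = []
    for tag, by_type in groups.items():
        all_prices = [p for prices in by_type.values() for p in prices]
        # inner buckets: average per book_type, sorted ascending
        inner = sorted(((bt, sum(ps) / len(ps)) for bt, ps in by_type.items()),
                       key=lambda kv: kv[1])
        result.append((tag, sum(all_prices) / len(all_prices), inner))
    # outer buckets: sorted by the tag's average price, ascending
    return sorted(result, key=lambda bucket: bucket[1])


for bucket in nested_avg(BOOKS):
    print(bucket)
# last line: ('Mystery', 107.0, [('Best-selling literature.', 54.0), ('Foreign literature', 133.5)])
```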

Example 4: Multiple metrics per bucket

The query semantics of Example 4 are as follows:

Bucket by book_type and compute the maximum, minimum, and sum of the prices in each bucket; the buckets are sorted by minimum price in ascending order.

POST /dying_gq_bookstore/_search
{
  "size": 0,
  "aggs": {
    "group_by_book_type": {
      "terms": {
        "field": "book_type",
        "order": {
          "price_min": "asc"
        }
      },
      "aggs": {
        "price_max": {
          "max": {
            "field": "price"
          }
        },
        "price_min": {
          "min": {
            "field": "price"
          }
        },
        "price_sum": {
          "sum": {
            "field": "price"
          }
        }
      }
    }
  }
}
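Several sibling metrics on one bucket simply means computing several statistics over the same group. The Python sketch below does that for the sample data and orders buckets by `price_min`, as the query does. `price_stats_by_type` is a hypothetical helper; ES would report these values as floats (for example 54.0), while the sketch keeps integers.

```python
from collections import defaultdict

# (book_type, price) of the 10 sample documents
BOOKS = [("Best-selling literature.", 48), ("Best-selling literature.", 39),
         ("Best-selling literature.", 36), ("Best-selling literature.", 54),
         ("Foreign literature", 69), ("Foreign literature", 198),
         ("Healing literature.", 38), ("Traffic policing", 99),
         ("Traffic policing", 23), ("Traffic policing", 19)]


def price_stats_by_type(books):
    prices = defaultdict(list)  # bucket: book_type -> prices
    for book_type, price in books:
        prices[book_type].append(price)
    buckets = [{"key": bt, "price_max": max(ps), "price_min": min(ps), "price_sum": sum(ps)}
               for bt, ps in prices.items()]
    return sorted(buckets, key=lambda b: b["price_min"])  # order by price_min, ascending


for bucket in price_stats_by_type(BOOKS):
    print(bucket)
```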

Example 5: Top document per bucket

The query semantics of Example 5 are as follows:

Bucket by book_type and return the most expensive book in each bucket.

POST /dying_gq_bookstore/_search
{
  "size": 0,
  "aggs": {
    "group_by_book_type": {
      "terms": {
        "field": "book_type"
      },
      "aggs": {
        "top_price": {
          "top_hits": {
            "size": 1,
            "sort": [
              {
                "price": {
                  "order": "desc"
                }
              }
            ]
          }
        }
      }
    }
  }
}
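With `size: 1` and a descending price sort, top_hits keeps the single most expensive document per bucket. The Python sketch below computes the same thing on the sample data so you know what to expect in the response; `most_expensive_by_type` is a hypothetical helper.

```python
# (title, book_type, price) of the 10 sample documents
BOOKS = [
    ("Please wake me up when you leave.", "Best-selling literature.", 48),
    ("I'd like two grapefruits, please.", "Best-selling literature.", 39),
    ("A gentle breeze", "Best-selling literature.", 36),
    ("Lost", "Best-selling literature.", 54),
    ("Eight million ways to die.", "Foreign literature", 69),
    ("Holmes and his Dog.", "Foreign literature", 198),
    ("I can only walk you one way.", "Healing literature.", 38),
    ("A little bit", "Traffic policing", 99),
    ("Chenghua Avenue", "Traffic policing", 23),
    ("Huaxian Bridge", "Traffic policing", 19),
]


def most_expensive_by_type(books):
    best = {}  # book_type -> (title, price) of the highest-priced book seen so far
    for title, book_type, price in books:
        if book_type not in best or price > best[book_type][1]:
            best[book_type] = (title, price)
    return best


for book_type, top in most_expensive_by_type(BOOKS).items():
    print(book_type, top)
```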

Example 6: Histogram (fixed-interval) buckets

The query semantics of Example 6 are as follows:

Bucket the books by price with an interval of 20: [0,20), [20,40), [40,60), ..., and compute the maximum book price in each bucket.

POST /dying_gq_bookstore/_search
{
  "size": 0,
  "aggs": {
    "histogram_by_price": {
      "histogram": {
        "field": "price",
        "interval": 20 // bucket width
      },
      "aggs": {
        "max_by_price": {
          "max": {
            "field": "price"
          }
        }
      }
    }
  }
}

For reference, the response to this query is:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "histogram_by_price" : {
      "buckets" : [
        { "key" : 0.0, "doc_count" : 1, "max_by_price" : { "value" : 19.0 } },
        { "key" : 20.0, "doc_count" : 4, "max_by_price" : { "value" : 39.0 } },
        { "key" : 40.0, "doc_count" : 2, "max_by_price" : { "value" : 54.0 } },
        { "key" : 60.0, "doc_count" : 1, "max_by_price" : { "value" : 69.0 } },
        { "key" : 80.0, "doc_count" : 1, "max_by_price" : { "value" : 99.0 } },
        { "key" : 100.0, "doc_count" : 0, "max_by_price" : { "value" : null } },
        { "key" : 120.0, "doc_count" : 0, "max_by_price" : { "value" : null } },
        { "key" : 140.0, "doc_count" : 0, "max_by_price" : { "value" : null } },
        { "key" : 160.0, "doc_count" : 0, "max_by_price" : { "value" : null } },
        { "key" : 180.0, "doc_count" : 1, "max_by_price" : { "value" : 198.0 } }
      ]
    }
  }
}
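Each value lands in the bucket whose key is floor((value - offset) / interval) * interval + offset (offset defaults to 0), and by default the histogram also returns the empty buckets between the lowest and highest key, which is why keys 100 to 160 appear with doc_count 0. The Python sketch below reproduces the response above from the sample prices; `histogram` is a hypothetical helper.

```python
import math

# price of each of the 10 sample documents
PRICES = [48, 39, 36, 54, 69, 198, 38, 99, 23, 19]


def histogram(values, interval, offset=0):
    # bucket key = floor((value - offset) / interval) * interval + offset
    keys = [math.floor((v - offset) / interval) * interval + offset for v in values]
    buckets = []
    key = min(keys)
    # emit every bucket between the lowest and highest key, including empty ones
    while key <= max(keys):
        members = [v for v, k in zip(values, keys) if k == key]
        # (key, doc_count, max price or None for an empty bucket)
        buckets.append((key, len(members), max(members) if members else None))
        key += interval
    return buckets


for bucket in histogram(PRICES, 20):
    print(bucket)
```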

Bonus: deduplicating results with collapse

Return the 5 most expensive books, keeping only one book (the most expensive) per tag.

POST /dying_gq_bookstore/_search
{
  "size": 5,
  "collapse": { // field to deduplicate on
    "field": "tag"
  },
  "sort": [
    {
      "price": {
        "order": "desc"
      }
    }
  ]
}
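The effect of collapse can be pictured as "walk the hits in sort order and keep only the first one per collapse key, up to size". The Python sketch below does that on the sample data; since there are only 4 distinct tags, it returns 4 books even though size is 5. `collapse_top` is a hypothetical helper and ignores how ES actually executes collapse across shards.

```python
# (title, tag, price) of the 10 sample documents
BOOKS = [
    ("Please wake me up when you leave.", "Emotional", 48),
    ("I'd like two grapefruits, please.", "Emotional", 39),
    ("A gentle breeze", "Emotional", 36),
    ("Lost", "Mystery", 54),
    ("Eight million ways to die.", "Mystery", 69),
    ("Holmes and his Dog.", "Mystery", 198),
    ("I can only walk you one way.", "Cure", 38),
    ("A little bit", "Traffic", 99),
    ("Chenghua Avenue", "Traffic", 23),
    ("Huaxian Bridge", "Traffic", 19),
]


def collapse_top(books, size):
    seen = set()  # collapse keys (tags) already represented in the result
    hits = []
    for title, tag, price in sorted(books, key=lambda b: -b[2]):  # sort by price desc
        if tag in seen:
            continue  # a more expensive book with this tag was already kept
        seen.add(tag)
        hits.append((title, tag, price))
        if len(hits) == size:
            break
    return hits


for hit in collapse_top(BOOKS, 5):
    print(hit)
```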

Conclusion

Overall, the ES Query DSL offers a rich query syntax and rich aggregation semantics.

Basic queries go under query: match, prefix, regexp, term, terms, and range.

Compound conditions: bool combines the queries above through its must, should, must_not, and filter clauses.

Aggregations (aggs): terms, avg, sum, max, min, top_hits, histogram, and date_histogram.

These three categories can be nested within each other and cover most everyday scenarios. For more advanced usage, see the official Elasticsearch documentation.

I'm dying stranded. I watched a 97-hour movie and still didn't get your like. I don't think that's because you don't like me enough; the movie just wasn't long enough...
