This is day 20 of my participation in the November Gwen Challenge. Event details: The Last Gwen Challenge 2021.

Problem: an Elasticsearch grouped query fails unexpectedly

While using ES aggregation queries I ran into an interesting case, which I am recording here.

In some scenarios, running a grouped query (a terms aggregation) directly on a field returns no result. Instead, ES reports: "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default." What causes this, and how can it be solved?

1. Data preparation

Initialize an index and write some test data

POST second-index/_doc
{
  "url": "/test",
  "execute": {
    "args": "id=10&age=20",
    "cost": 10,
    "res": "test result"
  },
  "response_code": 200,
  "app": "yhh_demo"
}

POST second-index/_doc
{
  "url": "/test",
  "execute": {
    "args": "id=20&age=20",
    "cost": 11,
    "res": "test result2"
  },
  "response_code": 200,
  "app": "yhh_demo"
}

POST second-index/_doc
{
  "url": "/test",
  "execute": {
    "args": "id=10&age=20",
    "cost": 12,
    "res": "test result2"
  },
  "response_code": 200,
  "app": "yhh_demo"
}

POST second-index/_doc
{
  "url": "/hello",
  "execute": {
    "args": "tip=welcome",
    "cost": 2,
    "res": "welcome"
  },
  "response_code": 200,
  "app": "yhh_demo"
}

POST second-index/_doc
{
  "url": "/404",
  "execute": {
    "args": "tip=welcome",
    "cost": 2,
    "res": "xxxxxxxx"
  },
  "response_code": 404,
  "app": "yhh_demo"
}

2. Basics of grouped queries

A grouped query is the equivalent of GROUP BY in SQL and is commonly used for aggregate statistics.

In ES, this is done with aggs, using the following syntax:

"aggs": {
    "agg-name": { // custom name for this aggregation
        "terms": { // aggregation type: terms groups documents by the value of a field
            "field": "",
            "size": 10
        }
    }
}

For example, to count requests per URL, the corresponding query could be:

GET second-index/_search
{
  "query": {
    "match_all": {}
  },
  "size": 1,
  "aggs": {
    "my-agg": {
      "terms": {
        "field": "url",
        "size": 2
      }
    }
  }
}

Executing the grouped query above as-is triggers the problem. ES throws the following exception:

Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [url] in order to load field data by uninverting the inverted index. Note that this can use significant memory.

3. Solutions

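With dynamic mapping, url is mapped as a text field. A text field is analyzed into tokens and has fielddata disabled by default, so it does not support aggregations or sorting. If you need them, either set fielddata=true on the field or use the url.keyword sub-field: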

GET second-index/_search
{
  "query": {
    "match_all": {}
  },
  "size": 1,
  "aggs": {
    "my-agg": {
      "terms": {
        "field": "url.keyword",
        "size": 2
      }
    }
  }
}
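For context, the url.keyword sub-field used above exists because dynamic mapping, by default, maps a new string field as text plus a keyword multi-field. Assuming the index was created implicitly by the writes in section 1, with no explicit mapping or custom template, the url entry returned by GET second-index/_mapping should look roughly like this fragment:

```json
"url": {
  "type": "text",
  "fields": {
    "keyword": {
      "type": "keyword",
      "ignore_above": 256
    }
  }
}
```

If the index was created with an explicit mapping that declares url as plain text, this sub-field will not exist, and aggregating on url.keyword would simply return no buckets.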

Notes

  • Although we mainly care about the grouped results, the response still includes the matching documents under hits. To return only the grouped results, add "size": 0 to the request.

  • Aggregations can be combined with query conditions, for example to count only the documents for a single URL:

GET second-index/_search
{
  "query": {
    "term": {
      "url.keyword": {
        "value": "/test"
      }
    }
  },
  "size": 1,
  "aggs": {
    "my-agg": {
      "terms": {
        "field": "url.keyword",
        "size": 2
      }
    }
  }
}
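For the first note above, a request that returns only the buckets and no documents could look like the sketch below. It is the same aggregation as before, with the top-level size set to 0 (the size inside terms still limits the number of buckets):

```json
GET second-index/_search
{
  "size": 0,
  "aggs": {
    "my-agg": {
      "terms": {
        "field": "url.keyword",
        "size": 2
      }
    }
  }
}
```

In the response, hits.hits is then empty and the grouped counts appear under aggregations.my-agg.buckets.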

The other approach is to set fielddata=true on the field, as follows. Note that a text field is then aggregated by its analyzed tokens (the result of word segmentation), not by the original string.

PUT second-index/_mapping
{
  "properties": {
    "url": {
      "type": "text",
      "fielddata": true
    }
  }
}

After this change, grouping by url no longer throws an exception.
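To see which bucket keys a fielddata-based aggregation on url would produce, one option is to ask ES how the field's analyzer tokenizes a value. The request below is a sketch using the analyze API against the url field, assuming it still uses the default standard analyzer:

```json
GET second-index/_analyze
{
  "field": "url",
  "text": "/test"
}
```

Under that assumption I would expect a single token, test, without the leading slash, so the buckets would be keyed by tokens rather than by the full URL. When the exact original value matters, url.keyword is usually the better fit.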

4. Summary

To summarize, keep the following in mind when grouping on an ES field.

When the field type is text, grouping is not supported by default. If you must group on such a field, there are two options:

  • Use its keyword sub-field, such as url.keyword
  • Set fielddata: true on the field in the index mapping

Author contact information

Believing everything in books is worse than having no books at all. The above is purely my own view. Given my limited ability, omissions and mistakes are hard to avoid; if you find a bug or have a better suggestion, criticism and corrections are welcome and much appreciated.

  • Personal site: blog.hhui.top
  • Weibo: Small Gray Blog
  • QQ: a gray /3302797840
  • Wechat official account: One Grey Blog