Elasticsearch series - Aggregate Query (1)

The profile

Elasticsearch’s aggregated query has the same effect as the database’s aggregated query.

Basic concept

bucket

Data grouping: Some data is divided into buckets based on a specific field. The data with the same field value is grouped into a bucket. Java Map<String, List> structure, similar to Mysql group by query result.

Metric:

Group by: group by: Max (),min(),avg();

case

We still take English children’s songs as the case background, review the index structure:

PUT /music
{
  "mappings": {
      "children": {
        "properties": {
          "id": {
            "type": "keyword"
          },
          "author_first_name": {
            "type": "text"."analyzer": "english"
          },
          "author_last_name": {
            "type": "text"."analyzer": "english"
          },
          "author": {
            "type": "text"."analyzer": "english"."fields": {
              "keyword": {
                "type": "keyword"."ignore_above": 256}}},"name": {
            "type": "text"."fields": {
              "keyword": {
                "type": "keyword"."ignore_above": 256}}},"content": {
            "type": "text"."fields": {
              "keyword": {
                "type": "keyword"."ignore_above": 256}}},"language": {
            "type": "text"."analyzer": "english"."fielddata": true
          },
          "tags": {
            "type": "text"."analyzer": "english"
          },
          "length": {
            "type": "long"
          },
          "likes": {
            "type": "long"
          },
          "isRelease": {
            "type": "boolean"
          },
          "releaseDate": {
            "type": "date"
          }
        }
      }
  }
}
Copy the code

Figure out which languages have the most songs

GET /music/children/_search
{
  "size": 0."aggs": {
    "song_qty_by_language": {
      "terms": {
        "field": "language"}}}}Copy the code

Grammar explanation:

Size :0 indicates that the original data will not be displayed as long as the statistical results are obtained
Aggs: Fixed syntax, aggregation analysis must declare AGGs
Song_qty_by_language: aggregation name. You can write it freely
Terms: What field are the groups based on
Field: Indicates the name of a field

The response results are as follows:

{
  "took": 2."timed_out": false."_shards": {
    "total": 5."successful": 5."skipped": 0."failed": 0
  },
  "hits": {
    "total": 5."max_score": 0."hits": []},"aggregations": {
    "song_qty_by_language": {
      "doc_count_error_upper_bound": 0."sum_other_doc_count": 0."buckets": [{"key": "english"."doc_count": 5}]}}}Copy the code

Grammar explanation:

Hits: hits is empty because size:0 was set at request time
Aggregations: aggregates the query results
Song_qty_by_language: The name declared at request time
Buckets: a collection of data groups obtained by querying buckets according to specified fields. [] is a data group, where key is the value of the specified field for each bucket and doc_count is the statistical quantity.

The default sort is doc_count descending.

Count the average duration of each song by language

GET /music/children/_search
{
  "size": 0."aggs": {
    "lang": {
      "terms": {
        "field": "language"
      },
      "aggs": {
        "length_avg": {
          "avg": {
            "field": "length"
          }
        }
      }
    }
  }
}
Copy the code

What is demonstrated here is a two-layer AGGS aggregation query. The data group is obtained according to the species statistics first, and then the average time is calculated in the data group.

The same is true for multiple AGGS nested syntax, just note the location of the AGGS code block.

Count the songs with the longest and shortest duration, etc

The most common statistics: count, avg, Max, min, sum.

GET /music/children/_search
{
  "size": 0."aggs": {
    "color": {
      "terms": {
        "field": "language"
      },
      "aggs": {
        "length_avg": {
          "avg": {
            "field": "length"}},"length_max": {
          "max": {
            "field": "length"}},"length_min": {
          "min": {
            "field": "length"}},"length_sum": {
          "sum": {
            "field": "length"
          }
        }
      }
    }
  }
}
Copy the code

Count the average song length by time and length

Look at the average of each 30-second interval.

Histogram syntax position is the same as terms, as a range partition; interval is used together with interval parameter :30 indicates that the interval of segmentation is [0,30),[30,60),[60,90),[90,120)

The closure relation of the segment is left open and right closed. If the data is not in a certain segment, the empty segment will also be returned.

GET /music/children/_search
{
  "size": 0."aggs": {
    "sales_price_range": {
      "histogram": {
        "field": "length"."interval": 30
      },
      "aggs": {
        "length_avg": {
          "avg": {
            "field": "length"
          }
        }
      }
    }
  }
}
Copy the code

The results of such data can be used to generate bar charts or line charts.

Break down the number of new songs by release date

Monthly statistics

Similar to the histogram syntax, date interval specifies the maximum time range extended_bounds with date interval.

GET /music/children/_search
{
  "size": 0."aggs": {
    "sales": {
      "date_histogram": {
        "field": "releaseDate"."interval": "month"."format": "yyyy-MM-dd"."min_doc_count": 0."extended_bounds": {
          "min": "2019-10-01"."max": "2019-12-31"
        }
      }
    }
  }
}
Copy the code

Interval can be day, week, month, quarter, year, etc. We can stretch it a little bit, like counting the likes of every new song released in each quarter of this year

GET /music/children/_search
{
  "size": 0."aggs": {
    "sales": {
      "date_histogram": {
        "field": "releaseDate"."interval": "quarter"."format": "yyyy-MM-dd"."min_doc_count": 0."extended_bounds": {
          "min": "2019-01-01"."max": "2019-12-31"}},"aggs": {
        "lang_qty": {
          "terms": {
            "field": "language"
          },
          "aggs": {
            "like_sum": {
              "sum": {
                "field": "likes"}}}},"total": {"sum": {
            "field": "likes"
          }
        }
      }
    }
  }
}
Copy the code

Belt filtration condition

Aggregate queries can be used with query, which is equivalent to where and group by in mysql

Query conditions

GET /music/children/_search
{
  "size": 0."query": {
    "match": {
      "language": "english"}},"aggs": {
    "sales": {
      "terms": {
        "field": "language"}}}}Copy the code

Filter conditions

GET /music/children/_search
{
  "size": 0."query": {
    "constant_score": {
      "filter": {
        "term": {
          "language": "english"}}}},"aggs": {
    "sales": {
      "terms": {
        "field": "language"}}}}Copy the code

Global bucket query

Global: is the global bucket, which puts all the data into the aggregate scope, regardless of the previous Query or filter.

The global bucket is used to compare the data of specified conditions with all data at the same time, such as the scenario we created: comparing the number of likes of songs by specified author with all songs.

GET /music/children/_search
{
  "size": 0."query": {
    "match": {
      "author": "Jean Ritchie"}},"aggs": {
    "likes": {
      "sum": {
        "field": "likes"}},"all": {
      "global": {},
      "aggs": {
        "all_likes": {
          "sum": {
            "field": "likes"
          }
        }
      }
    }
  }
}
Copy the code

Statistics of the number of likes in recent 2 months and in recent 1 month

Aggs. filter is for the data in the aggregate

Bucket filter: Filters agGs under different buckets

Similar to mysql’s having syntax

GET /music/children/_search
{
  "size": 0."aggs": {
    "recent_60d": {
      "filter": {
        "range": {
          "releaseDate": {
            "gte": "now-60d"}}},"aggs": {
        "recent_60d_likes_sum": {
          "sum": {
            "field": "likes"}}}},"recent_30d": {
      "filter": {
        "range": {
          "releaseDate": {
            "gte": "now-30d"}}},"aggs": {
        "recent_30d_likes_sum": {
          "avg": {
            "field": "likes"
          }
        }
      }
    }
  }
}
Copy the code

Statistical sorting

Default order by doc_count descending, the sorting rule can be changed, order can specify agGS alias, such as length_avg, similar to mysql order by CNT asC.

GET /music/children/_search
{
  "size": 0."aggs": {
    "group_by_lang": {
      "terms": {
        "field": "language"."order": {
          "length_avg": "desc"}},"aggs": {
        "length_avg": {
          "avg": {
            "field": "length"
          }
        }
      }
    }
  }
}
Copy the code

summary

This paper mainly introduces the common aggregation query, are given priority to by example, after understanding the basic writing method can be read quickly, there are not good to understand the place, and we are familiar with the database query SQL for comparison, thank you.

Focus on Java high concurrency, distributed architecture, more technical dry products to share and experience, please pay attention to the public account: Java architecture community can scan the left QR code to add friends, invite you to join the Java architecture community wechat group to discuss technology

mo4tech.com (Moment For Technology) is a global community with thousands techies from across the global hang out!Passionate technologists, be it gadget freaks, tech enthusiasts, coders, technopreneurs, or CIOs, you would find them all here.

Elasticsearch series – Aggregate Query (1)

The profile

Basic concept

bucket

Metric:

case

Figure out which languages have the most songs

Count the average duration of each song by language

Count the songs with the longest and shortest duration, etc

Count the average song length by time and length

Break down the number of new songs by release date

Belt filtration condition

Query conditions

Filter conditions

Global bucket query

Statistics of the number of likes in recent 2 months and in recent 1 month

Statistical sorting

summary

Elasticsearch series – Aggregate Query (1)

The profile

Basic concept

bucket

Metric:

case

Figure out which languages have the most songs

Count the average duration of each song by language

Count the songs with the longest and shortest duration, etc

Count the average song length by time and length

Break down the number of new songs by release date

Belt filtration condition

Query conditions

Filter conditions

Global bucket query

Statistics of the number of likes in recent 2 months and in recent 1 month

Statistical sorting

summary

Related Posts

Computer network learning: packet forwarding and routing, ARP protocol

RSA Encryption Algorithm

Wrote a console for Raspberry PI in Node.js