The profile
Elasticsearch’s aggregated query has the same effect as the database’s aggregated query.
Basic concept
bucket
Data grouping: Some data is divided into buckets based on a specific field. The data with the same field value is grouped into a bucket. Java Map<String, List> structure, similar to Mysql group by query result.
Metric:
Group by: group by: Max (),min(),avg();
case
We still take English children’s songs as the case background, review the index structure:
PUT /music
{
"mappings": {
"children": {
"properties": {
"id": {
"type": "keyword"
},
"author_first_name": {
"type": "text"."analyzer": "english"
},
"author_last_name": {
"type": "text"."analyzer": "english"
},
"author": {
"type": "text"."analyzer": "english"."fields": {
"keyword": {
"type": "keyword"."ignore_above": 256}}},"name": {
"type": "text"."fields": {
"keyword": {
"type": "keyword"."ignore_above": 256}}},"content": {
"type": "text"."fields": {
"keyword": {
"type": "keyword"."ignore_above": 256}}},"language": {
"type": "text"."analyzer": "english"."fielddata": true
},
"tags": {
"type": "text"."analyzer": "english"
},
"length": {
"type": "long"
},
"likes": {
"type": "long"
},
"isRelease": {
"type": "boolean"
},
"releaseDate": {
"type": "date"
}
}
}
}
}
Copy the code
Figure out which languages have the most songs
GET /music/children/_search
{
"size": 0."aggs": {
"song_qty_by_language": {
"terms": {
"field": "language"}}}}Copy the code
Grammar explanation:
- Size :0 indicates that the original data will not be displayed as long as the statistical results are obtained
- Aggs: Fixed syntax, aggregation analysis must declare AGGs
- Song_qty_by_language: aggregation name. You can write it freely
- Terms: What field are the groups based on
- Field: Indicates the name of a field
The response results are as follows:
{
"took": 2."timed_out": false."_shards": {
"total": 5."successful": 5."skipped": 0."failed": 0
},
"hits": {
"total": 5."max_score": 0."hits": []},"aggregations": {
"song_qty_by_language": {
"doc_count_error_upper_bound": 0."sum_other_doc_count": 0."buckets": [{"key": "english"."doc_count": 5}]}}}Copy the code
Grammar explanation:
- Hits: hits is empty because size:0 was set at request time
- Aggregations: aggregates the query results
- Song_qty_by_language: The name declared at request time
- Buckets: a collection of data groups obtained by querying buckets according to specified fields. [] is a data group, where key is the value of the specified field for each bucket and doc_count is the statistical quantity.
The default sort is doc_count descending.
Count the average duration of each song by language
GET /music/children/_search
{
"size": 0."aggs": {
"lang": {
"terms": {
"field": "language"
},
"aggs": {
"length_avg": {
"avg": {
"field": "length"
}
}
}
}
}
}
Copy the code
What is demonstrated here is a two-layer AGGS aggregation query. The data group is obtained according to the species statistics first, and then the average time is calculated in the data group.
The same is true for multiple AGGS nested syntax, just note the location of the AGGS code block.
Count the songs with the longest and shortest duration, etc
The most common statistics: count, avg, Max, min, sum.
GET /music/children/_search
{
"size": 0."aggs": {
"color": {
"terms": {
"field": "language"
},
"aggs": {
"length_avg": {
"avg": {
"field": "length"}},"length_max": {
"max": {
"field": "length"}},"length_min": {
"min": {
"field": "length"}},"length_sum": {
"sum": {
"field": "length"
}
}
}
}
}
}
Copy the code
Count the average song length by time and length
Look at the average of each 30-second interval.
Histogram syntax position is the same as terms, as a range partition; interval is used together with interval parameter :30 indicates that the interval of segmentation is [0,30),[30,60),[60,90),[90,120)
The closure relation of the segment is left open and right closed. If the data is not in a certain segment, the empty segment will also be returned.
GET /music/children/_search
{
"size": 0."aggs": {
"sales_price_range": {
"histogram": {
"field": "length"."interval": 30
},
"aggs": {
"length_avg": {
"avg": {
"field": "length"
}
}
}
}
}
}
Copy the code
The results of such data can be used to generate bar charts or line charts.
Break down the number of new songs by release date
Monthly statistics
Similar to the histogram syntax, date interval specifies the maximum time range extended_bounds with date interval.
GET /music/children/_search
{
"size": 0."aggs": {
"sales": {
"date_histogram": {
"field": "releaseDate"."interval": "month"."format": "yyyy-MM-dd"."min_doc_count": 0."extended_bounds": {
"min": "2019-10-01"."max": "2019-12-31"
}
}
}
}
}
Copy the code
Interval can be day, week, month, quarter, year, etc. We can stretch it a little bit, like counting the likes of every new song released in each quarter of this year
GET /music/children/_search
{
"size": 0."aggs": {
"sales": {
"date_histogram": {
"field": "releaseDate"."interval": "quarter"."format": "yyyy-MM-dd"."min_doc_count": 0."extended_bounds": {
"min": "2019-01-01"."max": "2019-12-31"}},"aggs": {
"lang_qty": {
"terms": {
"field": "language"
},
"aggs": {
"like_sum": {
"sum": {
"field": "likes"}}}},"total": {"sum": {
"field": "likes"
}
}
}
}
}
}
Copy the code
Belt filtration condition
Aggregate queries can be used with query, which is equivalent to where and group by in mysql
Query conditions
GET /music/children/_search
{
"size": 0."query": {
"match": {
"language": "english"}},"aggs": {
"sales": {
"terms": {
"field": "language"}}}}Copy the code
Filter conditions
GET /music/children/_search
{
"size": 0."query": {
"constant_score": {
"filter": {
"term": {
"language": "english"}}}},"aggs": {
"sales": {
"terms": {
"field": "language"}}}}Copy the code
Global bucket query
Global: is the global bucket, which puts all the data into the aggregate scope, regardless of the previous Query or filter.
The global bucket is used to compare the data of specified conditions with all data at the same time, such as the scenario we created: comparing the number of likes of songs by specified author with all songs.
GET /music/children/_search
{
"size": 0."query": {
"match": {
"author": "Jean Ritchie"}},"aggs": {
"likes": {
"sum": {
"field": "likes"}},"all": {
"global": {},
"aggs": {
"all_likes": {
"sum": {
"field": "likes"
}
}
}
}
}
}
Copy the code
Statistics of the number of likes in recent 2 months and in recent 1 month
Aggs. filter is for the data in the aggregate
Bucket filter: Filters agGs under different buckets
Similar to mysql’s having syntax
GET /music/children/_search
{
"size": 0."aggs": {
"recent_60d": {
"filter": {
"range": {
"releaseDate": {
"gte": "now-60d"}}},"aggs": {
"recent_60d_likes_sum": {
"sum": {
"field": "likes"}}}},"recent_30d": {
"filter": {
"range": {
"releaseDate": {
"gte": "now-30d"}}},"aggs": {
"recent_30d_likes_sum": {
"avg": {
"field": "likes"
}
}
}
}
}
}
Copy the code
Statistical sorting
Default order by doc_count descending, the sorting rule can be changed, order can specify agGS alias, such as length_avg, similar to mysql order by CNT asC.
GET /music/children/_search
{
"size": 0."aggs": {
"group_by_lang": {
"terms": {
"field": "language"."order": {
"length_avg": "desc"}},"aggs": {
"length_avg": {
"avg": {
"field": "length"
}
}
}
}
}
}
Copy the code
summary
This paper mainly introduces the common aggregation query, are given priority to by example, after understanding the basic writing method can be read quickly, there are not good to understand the place, and we are familiar with the database query SQL for comparison, thank you.
Focus on Java high concurrency, distributed architecture, more technical dry products to share and experience, please pay attention to the public account: Java architecture community can scan the left QR code to add friends, invite you to join the Java architecture community wechat group to discuss technology