The aggregation framework has been an important part of Elasticsearch since version 1.0, and over the years it has been optimized, fixed, and even given some major overhauls. Many new aggregations have been added to Elasticsearch since version 7.0, such as the rare_terms, top_metrics, or auto_date_histogram aggregations. In this blog post we’ll explore some of them and take a closer look at what they can do for you. To test these new aggregations, let’s set up a sample data set in an Elasticsearch 7.9 deployment.
The following documents represent an e-commerce use case in which a user clicks on a product and retrieves its details. The data set naturally lacks many details, such as per-user session IDs, which would let you track sessions and, further down the road, leverage transforms to gain additional insights into the data. We keep things simple here so all of the concepts are easy to follow.
Import the sample data using Kibana Dev Tools or cURL from the command line:
PUT website-analytics/_bulk?refresh
{"index":{}}
{"product_id": "123", "@timestamp": "2020-10-01T11:00:00.000Z", "price": 12.34, "response_time_ms": 242}
{"index":{}}
{"product_id": "456", "@timestamp": "2020-10-02T12:00:00.000Z", "price": 20.58, "response_time_ms": 98}
{"index":{}}
{"product_id": "789", "@timestamp": "2020-10-03T13:15:00.000Z", "price": 34.16, "response_time_ms": 123}
{"index":{}}
{"product_id": "123", "@timestamp": "2020-10-02T14:16:00.000Z", "price": 12.34, "response_time_ms": 465}
{"index":{}}
{"product_id": "123", "@timestamp": "2020-10-02T14:18:00.000Z", "price": 12.34, "response_time_ms": 158}
{"index":{}}
{"product_id": "123", "@timestamp": "2020-10-03T15:17:00.000Z", "price": 12.34, "response_time_ms": 168}
{"index":{}}
{"product_id": "789", "@timestamp": "2020-10-06T15:17:00.000Z", "price": 34.16, "response_time_ms": 220}
{"index":{}}
{"product_id": "789", "@timestamp": "2020-10-10T15:17:00.000Z", "price": 34.16, "response_time_ms": 99}
The data above resembles time series data, even though there isn’t a document for every day.
Auto-bucketing aggregations
These aggregations automatically change how buckets are defined. When you run a date-based aggregation, you usually define buckets based on a fixed time interval (such as 1d). Sometimes, however, you don’t know the nature of the data up front, and it is easier to simply specify the expected number of buckets from the user’s perspective.
This is where the following two new aggregations come in.
auto_date_histogram Aggregation
The auto_date_histogram aggregation runs on a date field and lets you configure the number of buckets you expect to get back. Let’s try it on our small data set:
POST website-analytics/_search?size=0
{
  "aggs": {
    "views_over_time": {
      "auto_date_histogram": {
        "field": "@timestamp",
        "buckets": 3
      }
    }
  }
}
Running the above aggregation produces the result:
{ "took" : 1, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 8, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : {"views_over_time" : {"buckets" : [{"key_as_string" : "2020-10-01t00:00:00.000z ", "key" : 1601510400000, "doc_count" : 7}, {" key_AS_string ":" 2020-10-08t00:00:00.000z ", "key" : 1602115200000, "doc_count" : 7}, {" key_AS_string ":" 2020-10-08t00:00:00.000z ", "key_AS_string" : "2020-10-08t00:00:00.000z ", "key" : 1602115200000, "doc_count" : 1 } ], "interval" : "7d" } } }Copy the code
Now let’s request ten buckets instead:
POST website-analytics/_search?size=0
{
  "aggs": {
    "views_over_time": {
      "auto_date_histogram": {
        "field": "@timestamp",
        "buckets": 10
      }
    }
  }
}
The command above produces the result:
{ "took" : 0, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 8, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : {"views_over_time" : {"buckets" : [{"key_as_string" : "2020-10-01t00:00:00.000z ", "key" : 1601510400000, "doc_count" : 1}, {" key_AS_string ":" 2020-10-02t00:00:00.000z ", "key" : 1601596800000, "doc_count" : 1}, {" key_AS_string ":" 2020-10-02t00:00:00.000z ", "key" : 1601596800000, "doc_count" : 3}, {" key_AS_string ":" 2020-10-03T00:00:00.000z ", "key" : 1601683200000, "doc_count" : 2}, {" key_AS_string ":" 2020-10-03T00:00:00.000z ", "doc_count" : 2}, {" key_AS_string ":" 2020-10-03T00:00:00.000z ", "doc_count" : 2}, {" key_AS_string ": 2020-10-04T00:00:00.000z ", "doc_count" :0}, {"key_as_string" :0} 2020-10-05T00:00.000z ", "doc_count" :0}, {"key_as_string" :0} 2020-10-06T00:00:00.000z ", "doc_count" : 1}, {"key_as_string" : "2020-10-07T00:00:00.000z ", "key" : 160202028800000, "doc_count" :0}, {"key_as_string" : 2020-10-08T00:00.000z ", "doc_count" :0}, {"key_as_string" :0} 2020-10-09T00:00.000z ", "doc_count" :0}, {"key_as_string" :0} Doc_count: 1}], "interval" : "1d"}}Copy the code
Comparing the two responses shows that the returned interval depends on the number of buckets requested: asking for 3 buckets results in one bucket per week, while asking for 10 buckets results in one bucket per day.
If a minimum interval is required, you can configure it as well, as shown in the auto_date_histogram documentation.
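For example, a minimal sketch that keeps the aggregation from choosing anything finer than daily buckets via the minimum_interval parameter:

POST website-analytics/_search?size=0
{
  "aggs": {
    "views_over_time": {
      "auto_date_histogram": {
        "field": "@timestamp",
        "buckets": 10,
        "minimum_interval": "day"
      }
    }
  }
}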
variable_width_histogram Aggregation
The variable-width histogram allows you to dynamically create a preconfigured number of buckets. Most importantly, those buckets have variable widths, in contrast to the fixed width of the regular histogram aggregation. Let’s bucket the prices in our data set:
POST website-analytics/_search?size=0
{
  "aggs": {
    "prices": {
      "variable_width_histogram": {
        "field": "price",
        "buckets": 3
      }
    }
  }
}
Because there are only three distinct prices in our data set, each bucket’s min, key, and max values are the same. If you try two buckets instead, you will see that one bucket then spans different values. Running the above aggregation returns:
{ "took" : 0, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 8, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : { "prices" : Buckets: [{"min" : 12.34000015258789, "key" : 12.34000015258789, "Max" : 12.34000015258789, "doc_count" : 4}, {"min" : 20.579999923706055, "key" : 20.579999923706055, "Max" : 20.579999923706055, "doc_count" : 1}, {"min" : 20.579999923706055, "doc_count" : 1}, {"min" : 34.15999984741211, Max: 34.15999984741211, doc_count: 3}]}}Copy the code
If we set buckets to 2, this is the result:
{ "took" : 0, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 8, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : { "prices" : Buckets: [{"min" : 12.34000015258789, "key" : 13.988000106811523, "Max" : 20.579999923706055, "doc_count" : 5}, {"min" : 34.15999984741211, "key" : 34.15999984741211, "Max" : 34.15999984741211, "doc_count" : 3}]}Copy the code
Also, remember: bucket bounds are approximate.
A use case might be an e-commerce application where you want to display price ranges as part of faceted navigation. However, this approach can make your site navigation quite sensitive to outliers, so consider applying category filters before doing so.
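As a sketch of that idea, you could restrict the price buckets to a single category before aggregating; note that the category field below is hypothetical and not part of our sample data:

POST website-analytics/_search?size=0
{
  "query": {
    "term": { "category.keyword": "accessories" }
  },
  "aggs": {
    "prices": {
      "variable_width_histogram": {
        "field": "price",
        "buckets": 3
      }
    }
  }
}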
Aggregations on strings
rare_terms Aggregation
As an Elastic Stack user, you’ve probably already used the terms aggregation and are aware that its counts can contain errors. By default, the terms aggregation returns the most frequent terms in the data set. You could reverse the sort order to find the least frequent terms; however, that leads to unbounded errors, because the results are approximated from data collected across multiple shards of the cluster. Elasticsearch deliberately avoids copying all the data from different shards to a single node, as doing so would be expensive and slow.
In contrast to the terms aggregation, the rare_terms aggregation circumvents these problems by using a different implementation. Even though the counts are still approximate, the rare_terms aggregation has a well-defined, bounded error.
To find the product IDs that occur least often in the above data set, try the following:
POST website-analytics/_search?size=0
{
  "aggs": {
    "rarest_product_ids": {
      "rare_terms": {
        "field": "product_id.keyword"
      }
    }
  }
}
The command above returns:
{
"took" : 6,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 8,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"rarest_product_ids" : {
"buckets" : [
{
"key" : "456",
"doc_count" : 1
}
]
}
}
}
You can also configure max_doc_count (the counterpart of the terms aggregation’s min_doc_count) to change which terms count as rare and thus how many buckets are returned.
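For example, a sketch that treats every product ID occurring at most twice as rare (max_doc_count defaults to 1):

POST website-analytics/_search?size=0
{
  "aggs": {
    "rarest_product_ids": {
      "rare_terms": {
        "field": "product_id.keyword",
        "max_doc_count": 2
      }
    }
  }
}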
string_stats Aggregation
What if you want statistics about the values of a string field in your data? Let’s try the string_stats aggregation:
POST website-analytics/_search?size=0
{
  "aggs": {
    "rarest_product_ids": {
      "string_stats": {
        "field": "product_id.keyword",
        "show_distribution": true
      }
    }
  }
}
The command above shows the result:
{ "took" : 1, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 8, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : {"rarest_product_ids" : {"count" : 8, "min_length" : 3, "max_length" : 3, "avg_length" : 3.0, "entropy" : 2.9906015629507223, "Distribution" : {"1" : 0.166666666666666, "2" : 0.16666666666, "3" : 0.1666666666666, "7" : 0.125, "8" : 0.125, "9" : 0.125, "4" : 0.04166666666664, "5" : 0.04166666666664, "6" : 0.04166666666664}}}}Copy the code
This returns statistics about the minimum, maximum, and average length of the strings in that field. Because we set the show_distribution parameter, you also see the distribution of every character that was found.
This is an easy way to take a quick look at your data and spot outliers, such as incorrectly indexed values like product IDs that are too long or too short. Similarly, the returned Shannon entropy can be used, for example, to detect DNS data exfiltration attempts.
Metrics based aggregations
Let’s dive into the second group of aggregations: metric calculations on numeric fields, which in the following examples run within bucket aggregations.
top_metrics Aggregation
You probably already know the top_hits aggregation, which returns full hits, including their _source. However, if you are only interested in a single value and want to sort by another field, take a look at the top_metrics aggregation. When the whole document is not needed, it is significantly faster than top_hits, and it is typically used to retrieve the latest value from each bucket.
In our clickstream data set, you may be interested in the price of the latest click event.
POST website-analytics/_search
{
"size": 0,
"aggs": {
"tm": {
"top_metrics": {
"metrics": {"field": "price"},
"sort": {"@timestamp": "desc"}
}
}
}
}
The following result is returned:
{ "took" : 12, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 8, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : {" tm ": {" top" : [{" sort ": [" of" the 2020-10-10 T15:17:00. 000 z], "metrics" : {" price ", 34.15999984741211}}]}}}Copy the code
Sorting also supports _score or sorting by geo distance. In addition, you can request several metrics at once: just specify more than one field and turn the metrics parameter into an array.
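For example, a sketch that retrieves both price and response_time_ms from the latest click event:

POST website-analytics/_search
{
  "size": 0,
  "aggs": {
    "tm": {
      "top_metrics": {
        "metrics": [
          { "field": "price" },
          { "field": "response_time_ms" }
        ],
        "sort": { "@timestamp": "desc" }
      }
    }
  }
}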
boxplot Aggregation
The boxplot aggregation does exactly what its name says: it provides a box plot:
GET website-analytics/_search
{
"size": 0,
"aggs": {
"by_date": {
"date_histogram": {
"field": "@timestamp",
"calendar_interval": "day",
"min_doc_count": 1
},
"aggs": {
"load_time_boxplot": {
"boxplot": {
"field": "price"
}
}
}
}
}
}
The above query returns the following result:
{ "took" : 6, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 8, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : { "by_date" : {"buckets" : [{"key_as_string" : "2020-10-01t00:00:00.000z ", "key" : 1601510400000, "doc_count" : 1, "load_time_boxplot" : {"min" : 12.34000015258789, "Max" : 12.34000015258789, "Q1" : 12.34000015258789, "q2" : 12.34000015258789, "Q3" : 12.34000015258789}}, {" key_AS_STRING ":" 2020-10-02T00:00:00.000z ", "key" : 1601596800000, "doc_count" : 3, "load_time_boxplot" : {"min" : 12.34000015258789, "Max" : 20.579999923706055, "Q1" : 12.34000015258789, "Q2" : 12.34000015258789, "Q3" : 18.519999980926514}}, {" key_AS_string ": "2020-10-03T00:00.000z ", "key" : 1601683200000, "doc_count" : 2, "load_time_boxplot" : {"min" : 12.34000015258789, "Max" : 34.15999984741211, "Q1" : 12.34000015258789, "Q2" : 23.25, "Q3" : }, {" key_AS_string ":" 2020-10-06T00:00:00.000z ", "key_as_string" : "2020-10-06T00:00:00.000z ", "key" : 1601942400000, "doc_count" : 1, "load_time_boxplot" : {"min" : 34.15999984741211, "Max" : 34.15999984741211, "Q1" : 34.15999984741211, "q2" : 34.15999984741211, "Q3" : 34.15999984741211}}, {" key_AS_string ": 1602288000000, "doc_count" : 1, "load_time_boxplot" : {"min" : 34.15999984741211, "Max" : 34.15999984741211, "Q1" : 34.15999984741211, "Q2" : 34.15999984741211, "Q3" : 34.15999984741211}}]}Copy the code
The above query returns one box plot per daily bucket that contains data.
We will skip the t_test aggregation, because our tiny data set does not allow for any meaningful requests. To see the value of that aggregation, you need a data set where you assume a change in behavior that can be verified with a statistical hypothesis test.
Pipeline aggregations
Next up: aggregations that run on top of the results of other aggregations, known as pipeline aggregations. The past year has brought quite a few additions here.
cumulative_cardinality Aggregation
This is a useful aggregation for counting the number of new items showing up in a data set over time:
GET website-analytics/_search
{
"size": 0,
"aggs": {
"by_day": {
"date_histogram": {
"field": "@timestamp",
"calendar_interval": "day"
},
"aggs": {
"distinct_products": {
"cardinality": {
"field": "product_id.keyword"
}
},
"total_new_products": {
"cumulative_cardinality": {
"buckets_path": "distinct_products"
}
}
}
}
}
}
The result of the above query is:
{ "took" : 9, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 8, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : { "by_day" : {"buckets" : [{"key_as_string" : "2020-10-01t00:00:00.000z ", "key" : 1601510400000, "doc_count" : 1, "distinct_products" : { "value" : 1 }, "total_new_products" : { "value" : 1 } }, { "key_as_string" : "2020-10-02T00:00.000z ", "key" : 1601596800000, "doc_count" : 3, "distinct_products" : {"value" : 2}, "total_new_products" : {" value ": 2}}, {" key_as_string" : "the 2020-10-03 T00:00:00. 000 z", "key" : 1601683200000, "doc_count" : 2, "distinct_products" : { "value" : 2 }, "total_new_products" : { "value" : }}, {" key_AS_string ":" 2020-10-04T00:00:00.000z ", "key" : 1601769600000, "doc_count" :0, "distinct_products" : {" value ": 0}," total_new_products ": {" value" : 3}}, {" key_as_string ":" the 2020-10-05 T00:00:00. 000 z ", "key" : 1601856000000, "doc_count" : 0, "distinct_products" : { "value" : 0 }, "total_new_products" : { "value" : }}, {" key_AS_string ":" 2020-10-06T00:00:00.000z ", "key" : 1601942400000, "doc_count" : 1, "distinct_products" : {" value ": 1}," total_new_products ": {" value" : 3}}, {" key_as_string ":" the 2020-10-07 T00:00:00. 000 z ", "key" : 1602028800000, "doc_count" : 0, "distinct_products" : { "value" : 0 }, "total_new_products" : { "value" : }}, {" key_AS_string ":" 2020-10-08T00:00:00.000z ", "key" : 1602115200000, "doc_count" :0, "distinct_products" : {" value ": 0}," total_new_products ": {" value" : 3}}, {" key_as_string ":" the 2020-10-09 T00:00:00. 000 z ", "key" : 1602201600000, "doc_count" : 0, "distinct_products" : { "value" : 0 }, "total_new_products" : { "value" : }}, {" key_AS_string ":" 2020-10-10T00:00.000z ", "key" : 1602288000000, "doc_count" : 1, "distinct_products" : { "value" : 1 }, "total_new_products" : { "value" : 3 } } ] } } }Copy the code
Based on the query above, you can tell how many previously unseen products are viewed each day and keep a running total of them. In an e-commerce setting, this may help you determine whether new products are getting real attention, or whether your bestsellers keep topping the list and you might want to change the way you market them.
normalize Aggregation
Let’s figure out which day had the highest share of traffic, where 100% is all the data that matches the query:
GET website-analytics/_search
{
"size": 0,
"aggs": {
"by_day": {
"date_histogram": {
"field": "@timestamp",
"calendar_interval": "day"
},
"aggs": {
"normalize": {
"normalize": {
"buckets_path": "_count",
"method": "percent_of_sum"
}
}
}
}
}
}
The result returned above is:
{ "took" : 3, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 8, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : { "by_day" : {"buckets" : [{" key_AS_string ":" 2020-10-01t00:00:00.000z ", "key" : 1601510400000, "doc_count" : 1, "normalize" : {"value" : 0.125}}, {" key_AS_string ":" 2020-10-02t00:00:00.000z ", "key" : 1601596800000, "doc_count" : 3, "normalize" : {" value ": 0.375}}, {" key_as_string" : "the 2020-10-03 T00:00:00. 000 z", "key" : 1601683200000, "doc_count" : 2, "normalize" : {"value" : 0.25}}, {" key_AS_string ": "2020-10-04T00:00.000z ", "key" : 1601769600000, "normalize" : {"value" : 0.0}}, {" key_AS_string ":" 2020-10-05T00:00:00.000z ", "key" : 1601856000000, "doc_count" :0, "normalize" : {"value" : 0.0}}, {" key_AS_string ":" 2020-10-06t00:00:00.000z ", "key" : 1601942400000, "doc_count" : 1, "normalize" : {" value ": 0.125}}, {" key_as_string" : "the 2020-10-07 T00:00:00. 000 z", "key" : 1602028800000, "doc_count" : 0, "normalize" : {"value" : 0.0}}, {" key_AS_string ": "2020-10-08T00:00.000z ", "key" : 1602115200000, "normalize" : {"value" : 0.0}}, {" key_AS_string ":" 2020-10-09t00:00:00.000z ", "key" : 1602201600000, "doc_count" :0, "normalize" : {"value" : 0.0}}, {" key_AS_string ":" 2020-10-10t00:00:00.000z ", "key_as_string" : "2020-10-10t00:00:00.000z ", "key" : 1602288000000, "doc_count" : 1, "normalize" : {"value" : 0.125}}]}}Copy the code
This returns additional information for each bucket: the number of documents found in each bucket as a percentage of the total number of documents returned by the search.
You may want to look at the normalize aggregation documentation, as there are more method values to choose from, such as mean or rescaling to a fixed range.
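For instance, a sketch of the same request using the rescale_0_1 method, which maps the bucket counts onto values between 0 and 1:

GET website-analytics/_search
{
  "size": 0,
  "aggs": {
    "by_day": {
      "date_histogram": {
        "field": "@timestamp",
        "calendar_interval": "day"
      },
      "aggs": {
        "normalize": {
          "normalize": {
            "buckets_path": "_count",
            "method": "rescale_0_1"
          }
        }
      }
    }
  }
}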
moving_percentiles Aggregation
This pipeline aggregation works on top of a percentiles aggregation and calculates moving percentiles over a sliding window of buckets:
GET website-analytics/_search
{
"size": 0,
"aggs": {
"by_day": {
"date_histogram": {
"field": "@timestamp",
"calendar_interval": "day"
},
"aggs": {
"response_time": {
"percentiles": {
"field": "response_time_ms",
"percents": [ 75, 99 ]
}
},
"moving_pct": {
"moving_percentiles": {
"buckets_path": "response_time",
"window": 2
}
}
}
}
}
}
The above query returns:
{ "took" : 4, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 8, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : { "by_day" : {"buckets" : [{"key_as_string" : "2020-10-01t00:00:00.000z ", "key" : 1601510400000, "doc_count" : 1, "response_time" : {" values ": {" 75.0", 242.0, "99.0", 242.0}}}, {" key_as_string ": "2020-10-02T00:00.000z ", "doc_count" : 3, "response_time" : {"values" : {"75.0" : 388.25, the "99.0", 465.0}}, "moving_pct" : {" values ": {" 75.0", 242.0, "99.0", 242.0}}}, {" key_as_string ": "2020-10-03T00:00.000z ", "doc_count" : 2, "response_time" : {"values" : {"75.0" : 168.0, the "99.0", 168.0}}, "moving_pct" : {" values ": {" 75.0", 353.5, "99.0", 465.0}}}, {" key_as_string ": "2020-10-04T00:00.000z ", "doc_count" :0, "response_time" : {"values" : {"75.0" : Null, "99.0" : null}}, "moving_pct" : {" values ": {" 75.0", 242.25, "99.0", 465.0}}}, {" key_as_string ": "2020-10-05T00:00.000z ", "doc_count" :0, "response_time" : {"values" : {"75.0" : Null, "99.0" : null}}, "moving_pct" : {" values ": {" 75.0", 168.0, "99.0", 168.0}}}, {" key_as_string ": "2020-10-06T00:00:00.000z ", "doc_count" : 1, "response_time" : {"values" : {"75.0" : 220.0, the "99.0", 220.0}}, "moving_pct" : {" values ": {" 75.0", null, "99.0" : null}}}, {" key_as_string ": "2020-10-07T00:00.000z ", "doc_count" :0, "response_time" : {"values" : {"75.0" : Null, "99.0" : null}}, "moving_pct" : {" values ": {" 75.0", 220.0, "99.0", 220.0}}}, {" key_as_string ": "2020-10-08T00:00.000z ", "doc_count" :0, "response_time" : {"values" : {"75.0" : Null, "99.0" : null}}, "moving_pct" : {" values ": {" 75.0", 220.0, "99.0", 220.0}}}, {" key_as_string ": "2020-10-09T00:00.000z ", "doc_count" :0, "response_time" : {"values" : {"75.0" : Null, "99.0" : null}}, "moving_pct" : {" values ": {" 75.0", null, "99.0" : null}}}, {" key_as_string ": "2020-10-10T00:00.000z ", "doc_count" : 1, "response_time" : {"values" : {"75.0" : 99.0, the "99.0", 99.0}}, "moving_pct" : {" values ": {" 75.0", null, "99.0" : null}}}}}}]Copy the code
Let’s unpack this a little. After bucketing by day, the percentiles aggregation computes the percentiles within each daily bucket. The moving_percentiles pipeline aggregation then takes the previous two buckets and calculates moving percentiles from them. Note that you can change which buckets are used, for example to include the current bucket, via the shift parameter.
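As a sketch, shifting the window so that the current bucket is included could look like the request below; treat the exact shift semantics as something to verify against the documentation for your version:

GET website-analytics/_search
{
  "size": 0,
  "aggs": {
    "by_day": {
      "date_histogram": {
        "field": "@timestamp",
        "calendar_interval": "day"
      },
      "aggs": {
        "response_time": {
          "percentiles": {
            "field": "response_time_ms",
            "percents": [ 75, 99 ]
          }
        },
        "moving_pct": {
          "moving_percentiles": {
            "buckets_path": "response_time",
            "window": 2,
            "shift": 1
          }
        }
      }
    }
  }
}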
inference Aggregation
We will skip the inference bucket aggregation because we plan to publish a dedicated blog post about it soon. To whet your appetite: it lets you run a pre-trained model against the results of its parent bucket aggregation. Stay tuned!
Support for the histogram field type
Strictly speaking, this is not an aggregation, but aggregations are affected by this data type, so it’s worth mentioning.
You may have missed the histogram field type, which allows you to store pre-aggregated numerical data; it is used heavily within Elastic Observability, for example. This special field type supports a subset of aggregations, and you will still find some aggregations that are not yet supported, but support keeps improving.
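As a minimal sketch of what that looks like (the index name and numbers below are purely illustrative), a histogram field stores pre-aggregated values and counts side by side:

PUT response-time-summaries
{
  "mappings": {
    "properties": {
      "response_time_histo": { "type": "histogram" }
    }
  }
}

PUT response-time-summaries/_doc/1
{
  "response_time_histo": {
    "values": [ 0.1, 0.2, 0.5, 1.0 ],
    "counts": [ 8, 17, 5, 2 ]
  }
}

Supported aggregations such as sum or percentiles then run directly on this pre-aggregated data.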
Support for geo_shape in geo aggregations
Again, strictly speaking this is not a single aggregation, but it is a big step forward and worth mentioning.
A significant investment has been made to make the geo_bounds, geotile_grid, geohash_grid, and geo_centroid aggregations work with the geo_shape field type in addition to geo_point.
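As a sketch, assuming a hypothetical index named places with a geo_shape field called geometry, computing a bounding box now works directly on the shapes:

POST places/_search?size=0
{
  "aggs": {
    "viewport": {
      "geo_bounds": {
        "field": "geometry"
      }
    }
  }
}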