Metric aggregations for ElasticSearch
Preface
This article introduces how to run ElasticSearch aggregation queries with DSL statements and with the Java API.
ElasticSearch Aggregation
The aggregation framework provides aggregated data based on a search query. It is built from simple building blocks called aggregations, which can be composed to produce complex summaries of the data. An aggregation can be seen as a unit of work that builds analytic information over a set of documents. The execution context defines what this set of documents is (for example, a top-level aggregation executes in the context of the query/filters of the search request). There are many different types of aggregations, each with its own purpose and output. To better understand these types, it is often easier to divide them into four main families:
- Metric:
Aggregations that track and compute metrics over a set of documents. The values are typically extracted from document fields (using field data), but can also be generated by a script.
- Bucketing:
Aggregations that build a set of buckets, each associated with a key and a document criterion. When the aggregation runs, every bucket criterion is evaluated against every document in the context, and when a criterion matches, the document is considered to "fall into" the relevant bucket. At the end of the aggregation process we have a list of buckets, each with the set of documents that "belong" to it.
- Matrix:
Aggregations that operate on multiple fields and produce a matrix result based on the values extracted from the requested document fields. Unlike metric and bucket aggregations, this family does not support scripting.
- Pipeline:
Aggregations that aggregate the output of other aggregations and their associated metrics.
Because each bucket effectively defines a set of documents (all the documents that belong to the bucket), aggregations can be associated at the bucket level, and they will execute within the context of that bucket. This is the real power of aggregations: they can be nested!
Bucket aggregations can have sub-aggregations (bucket or metric). A sub-aggregation is computed for each bucket generated by its parent aggregation. There is no hard limit on the level/depth of nesting (you can nest an aggregation under a "parent" aggregation that is itself a sub-aggregation of another, higher-level aggregation). Aggregations operate on the double representation of the data. As a consequence, the result may be approximate when running on long values whose absolute value is greater than 2^53.
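Why 2^53 matters: a double has a 53-bit significand, so above 2^53 not every integer can be represented exactly. The following is a minimal plain-Java sketch of that effect, not ElasticSearch code; the class and method names are hypothetical and chosen only for illustration.

```java
class DoublePrecisionDemo {

    /** Sums the values as doubles, the way a double-based aggregation would. */
    static double sumAsDouble(long[] values) {
        double sum = 0;
        for (long v : values) {
            sum += v; // each long is widened to double before it is added
        }
        return sum;
    }

    /** Sums the values with exact long arithmetic, for comparison. */
    static long sumAsLong(long[] values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        long limit = 1L << 53; // 2^53
        long[] values = {limit, 1L};
        System.out.println("exact long sum : " + sumAsLong(values));
        System.out.println("double sum     : " + (long) sumAsDouble(values));
    }
}
```

The two printed sums differ by one, because 2^53 + 1 has no exact double representation and is rounded back to 2^53.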
Metric aggregation
Numeric metrics aggregations are a special type of metrics aggregation that output numeric values. Some aggregations output a single numeric metric (such as avg) and are called single-value numeric metrics aggregations; others generate multiple metrics (such as stats) and are called multi-value numeric metrics aggregations. The distinction between single-value and multi-value numeric metrics aggregations matters when they serve as direct sub-aggregations of certain bucket aggregations, which let you sort the returned buckets by a numeric metric in each bucket.
The ElasticSearch documentation describes many metric aggregations, so I'll list only some of the most common ones here.
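The single-value versus multi-value distinction can be pictured with an in-memory analogy. The sketch below uses only the Java standard library and does not call ElasticSearch; the class name, method names, and sample grades are hypothetical.

```java
import java.util.DoubleSummaryStatistics;
import java.util.stream.DoubleStream;

class StatsAnalogy {

    /** Single-value metric: one number comes back, like avg. */
    static double average(double[] grades) {
        return DoubleStream.of(grades).average().orElse(Double.NaN);
    }

    /** Multi-value metric: count, min, max, sum and average together, like stats. */
    static DoubleSummaryStatistics stats(double[] grades) {
        return DoubleStream.of(grades).summaryStatistics();
    }

    public static void main(String[] args) {
        double[] grades = {60, 75, 90};
        System.out.println("avg   = " + average(grades));
        DoubleSummaryStatistics s = stats(grades);
        System.out.println("count = " + s.getCount());
        System.out.println("min   = " + s.getMin());
        System.out.println("max   = " + s.getMax());
        System.out.println("sum   = " + s.getSum());
    }
}
```

With a multi-value result, anything that orders by the metric has to name which of the values to use (for example the max), whereas a single-value result can be used directly.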
Avg aggregation
A single-value metrics aggregation that computes the average of the numeric values extracted from the aggregated documents. These values can be extracted from specific numeric fields in the documents or generated by a provided script.
Here we use an example that computes the average grade of a class.
Example DSL statements:
POST /student/_search?size=0
{
  "aggs": {
    "avg_grade": {
      "avg": {
        "field": "grade"
      }
    }
  }
}
Note: the grade field must be a numeric type (for example, integer).
If the documents also contain a weight field, we can compute a weighted average instead. In a regular average every data point has the same weight and contributes equally to the final value. In a weighted average each data point is scaled by its weight, so the larger the weight, the more that value influences the result. The weighting formula is: ∑(value * weight) / ∑(weight).
Example DSL statements:
POST /student/_search
{
  "size": 0,
  "aggs": {
    "weighted_grade": {
      "weighted_avg": {
        "value": {
          "field": "grade"
        },
        "weight": {
          "field": "weight"
        }
      }
    }
  }
}
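To make the formula ∑(value * weight) / ∑(weight) concrete, here is a minimal plain-Java sketch of the same arithmetic. It runs in memory and is not how ElasticSearch implements weighted_avg; the class name, method name, and sample numbers are hypothetical.

```java
class WeightedAverage {

    /** Computes sum(value * weight) / sum(weight) over parallel arrays. */
    static double weightedAverage(double[] values, double[] weights) {
        if (values.length != weights.length) {
            throw new IllegalArgumentException("values and weights must have the same length");
        }
        double weightedSum = 0;
        double weightSum = 0;
        for (int i = 0; i < values.length; i++) {
            weightedSum += values[i] * weights[i];
            weightSum += weights[i];
        }
        // With no weight at all the average is undefined.
        return weightSum == 0 ? Double.NaN : weightedSum / weightSum;
    }

    public static void main(String[] args) {
        double[] grades = {80, 90, 100};
        double[] weights = {1, 1, 2};
        // (80*1 + 90*1 + 100*2) / (1 + 1 + 2) = 370 / 4
        System.out.println(weightedAverage(grades, weights)); // prints 92.5
    }
}
```

With equal weights the same three grades average to 90; giving the grade of 100 twice the weight pulls the result up to 92.5.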
Max/Min aggregation
Here we use an example that gets the highest and the lowest grade in the class.
Example DSL statements:
POST /student/_search?size=0
{
  "aggs": {
    "max_grade": {
      "max": {
        "field": "grade"
      }
    }
  }
}

POST /student/_search?size=0
{
  "aggs": {
    "min_grade": {
      "min": {
        "field": "grade"
      }
    }
  }
}
Sum aggregation
Computes the sum of a field's values.
Example DSL statements:
POST /student/_search?size=0
{
  "aggs": {
    "sum_grade": {
      "sum": {
        "field": "grade"
      }
    }
  }
}
Top hits aggregation
A top_hits metric aggregator keeps track of the most relevant documents being aggregated. It is intended to be used as a sub-aggregator, so that the best-matching documents can be collected per bucket. The top_hits aggregator can effectively be used to group result sets by certain fields via a bucket aggregator. One or more bucket aggregators determine which properties the result set is sliced by.
Options
- from – The offset from the first result you want to fetch.
- size – The maximum number of top matching hits to return per bucket. By default, the top three matching hits are returned.
- sort – How the top matching hits are sorted. By default, hits are sorted by the score of the main query.
Again, let's use an example to illustrate. Group the documents by grade and keep two buckets; for each bucket, return the top hit sorted by grade in descending order, with only the grade and name fields included in the result.
Example DSL statements:
POST /student/_search?size=0
{
  "aggs": {
    "top_tags": {
      "terms": {
        "field": "grade",
        "size": 2
      },
      "aggs": {
        "top_sales_hits": {
          "top_hits": {
            "sort": [
              {
                "grade": {
                  "order": "desc"
                }
              }
            ],
            "_source": {
              "includes": [ "grade", "name" ]
            },
            "size": 1
          }
        }
      }
    }
  }
}
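The bucket-then-top-hits pattern can be sketched in memory with plain Java. This is only an analogy under stated assumptions, not the ElasticSearch implementation: the Student class, the className bucket key, and the sample data are all hypothetical (the DSL request buckets by grade instead).

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class TopHitsAnalogy {

    /** Hypothetical document with the two returned fields plus a bucket key. */
    static final class Student {
        final String name;
        final String className;
        final int grade;

        Student(String name, String className, int grade) {
            this.name = name;
            this.className = className;
            this.grade = grade;
        }
    }

    /** Buckets students by className, then keeps the `size` highest grades per bucket. */
    static Map<String, List<Student>> topHitsPerBucket(List<Student> students, int size) {
        // Step 1: the bucket aggregation (like "terms").
        Map<String, List<Student>> buckets = students.stream()
                .collect(Collectors.groupingBy(s -> s.className, TreeMap::new, Collectors.toList()));
        // Step 2: the sub-aggregation (like "top_hits" with sort desc and size).
        Map<String, List<Student>> top = new TreeMap<>();
        for (Map.Entry<String, List<Student>> bucket : buckets.entrySet()) {
            top.put(bucket.getKey(), bucket.getValue().stream()
                    .sorted(Comparator.comparingInt((Student s) -> s.grade).reversed())
                    .limit(size)
                    .collect(Collectors.toList()));
        }
        return top;
    }

    public static void main(String[] args) {
        List<Student> students = List.of(
                new Student("Ann", "A", 90), new Student("Bob", "A", 75),
                new Student("Cid", "B", 88), new Student("Dee", "B", 95));
        topHitsPerBucket(students, 1).forEach((key, hits) ->
                System.out.println(key + " -> " + hits.get(0).name + " (" + hits.get(0).grade + ")"));
    }
}
```

The sort and the limit are applied inside each bucket, which is what distinguishes top_hits as a sub-aggregation from sorting the whole result set.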
Java code examples
// Imports assume the 6.x high-level REST client. In 7.x the metric interfaces
// (Avg, Max, Min, Sum, TopHits) sit directly under
// org.elasticsearch.search.aggregations.metrics.
import java.io.IOException;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.rest.RestStatus;
import org.elasticsearch.search.aggregations.AggregationBuilder;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.Aggregations;
import org.elasticsearch.search.aggregations.metrics.avg.Avg;
import org.elasticsearch.search.aggregations.metrics.max.Max;
import org.elasticsearch.search.aggregations.metrics.min.Min;
import org.elasticsearch.search.aggregations.metrics.sum.Sum;
import org.elasticsearch.search.aggregations.metrics.tophits.TopHits;
import org.elasticsearch.search.builder.SearchSourceBuilder;

// The methods below belong to a class that provides the static `client`
// (RestHighLevelClient) and `logger` fields.

/**
 * @Author pancm
 * @Description Average aggregation query test case
 * @Date 2019/4/1
 * @Param []
 * @return void
 **/
private static void avgSearch() throws IOException {
    String buk = "t_grade_avg";
    // Compute the average directly
    AggregationBuilder aggregation = AggregationBuilders.avg(buk).field("grade");
    logger.info("Find the average grade of the class:");
    agg(aggregation, buk);
}

private static void maxSearch() throws IOException {
    // The name must contain "max" so that agg() picks the right branch
    String buk = "t_grade_max";
    AggregationBuilder aggregation = AggregationBuilders.max(buk).field("grade");
    logger.info("Get the highest grade in the class:");
    agg(aggregation, buk);
}

private static void sumSearch() throws IOException {
    // The name must contain "sum" so that agg() picks the right branch
    String buk = "t_grade_sum";
    AggregationBuilder aggregation = AggregationBuilders.sum(buk).field("grade");
    logger.info("Find the total grade of the class:");
    agg(aggregation, buk);
}

private static SearchResponse search(AggregationBuilder aggregation) throws IOException {
    SearchRequest searchRequest = new SearchRequest();
    searchRequest.indices("student");
    searchRequest.types("_doc");
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    // No explanation needed
    searchSourceBuilder.explain(false);
    // No source documents needed
    searchSourceBuilder.fetchSource(false);
    // No version numbers needed
    searchSourceBuilder.version(false);
    searchSourceBuilder.aggregation(aggregation);
    logger.info("Query statement: " + searchSourceBuilder.toString());
    searchRequest.source(searchSourceBuilder);
    // Synchronous query
    SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
    return searchResponse;
}

protected static void agg(AggregationBuilder aggregation, String buk) throws IOException {
    SearchResponse searchResponse = search(aggregation);
    if (RestStatus.OK.equals(searchResponse.status())) {
        // Get the aggregation results
        Aggregations aggregations = searchResponse.getAggregations();
        // Pick the result type from the aggregation name
        if (buk.contains("avg")) {
            Avg ba = aggregations.get(buk);
            logger.info(buk + ":" + ba.getValue());
        } else if (buk.contains("max")) {
            Max ba = aggregations.get(buk);
            logger.info(buk + ":" + ba.getValue());
        } else if (buk.contains("min")) {
            Min ba = aggregations.get(buk);
            logger.info(buk + ":" + ba.getValue());
        } else if (buk.contains("sum")) {
            Sum ba = aggregations.get(buk);
            logger.info(buk + ":" + ba.getValue());
        } else if (buk.contains("top")) {
            TopHits ba = aggregations.get(buk);
            logger.info(buk + ":" + ba.getHits().totalHits);
        }
        logger.info("------------------------------------");
    }
}
Other
Reference: www.elastic.co/guide/en/el…
The code for this article is included in my java-study project. If you are interested, you are welcome to star it, fork it, or open issues. Project address: github.com/xuwujing/ja…
ElasticSearch Combat Series
- Kibana for ElasticSearch
- ElasticSearch DSL statement for ElasticSearch
- ElasticSearch: JAVA API for ElasticSearch
- ElasticSearch: ElasticSearch
Writing original content is not easy. If you found this article useful, I would appreciate a recommendation! Your support is my biggest motivation for writing. Copyright: www.cnblogs.com/xuwujing CSDN: blog.csdn.net/qazwsxpcm Personal blog: www.panchengming.com