6. In-depth aggregation analysis
1. Bucket & Metric aggregation analysis and nested aggregation
1.1 Understanding Bucket & Metric Aggregation in SQL terms
1.2 Aggregation syntax
Aggregation is part of Search. In general, it is advisable to set the size of the search to 0, so that only the aggregation results are returned and the hits are left out.
1.3 Metric Aggregation
1.3.1 Understanding Metric Aggregation
- Single-value analysis: outputs only one result
  - min, max, avg, sum
  - cardinality (similar to distinct count)
- Multi-value analysis: outputs multiple results
  - stats, extended stats
  - percentiles, percentile ranks (used when you are looking for percentiles)
  - top hits (the top-ranked documents)
1.3.2 Metric Aggregation demos
1.3.2.1 Data Preparation
# Mapping: create the employees index
PUT /employees
{
  "mappings": {
    "properties": {
      "age": { "type": "integer" },
      "gender": { "type": "keyword" },
      "job": {
        "type": "text",
        "fields": {
          "keyword": { "type": "keyword", "ignore_above": 50 }
        }
      },
      "name": { "type": "keyword" },
      "salary": { "type": "integer" }
    }
  }
}

# Add some data to the employees index
PUT /employees/_bulk
{ "index": { "_id": "1" } }
{ "name": "Emma", "age": 32, "job": "Product Manager", "gender": "female", "salary": 35000 }
{ "index": { "_id": "2" } }
{ "name": "Underwood", "age": 41, "job": "Dev Manager", "gender": "male", "salary": 50000 }
{ "index": { "_id": "3" } }
{ "name": "Tran", "age": 25, "job": "Web Designer", "gender": "male", "salary": 18000 }
{ "index": { "_id": "4" } }
{ "name": "Rivera", "age": 26, "job": "Web Designer", "gender": "female", "salary": 22000 }
{ "index": { "_id": "5" } }
{ "name": "Rose", "age": 25, "job": "QA", "gender": "female", "salary": 18000 }
{ "index": { "_id": "6" } }
{ "name": "Lucy", "age": 31, "job": "QA", "gender": "female", "salary": 25000 }
{ "index": { "_id": "7" } }
{ "name": "Byrd", "age": 27, "job": "QA", "gender": "male", "salary": 20000 }
{ "index": { "_id": "8" } }
{ "name": "Foster", "age": 27, "job": "Java Programmer", "gender": "male", "salary": 20000 }
{ "index": { "_id": "9" } }
{ "name": "Gregory", "age": 32, "job": "Java Programmer", "gender": "male", "salary": 22000 }
{ "index": { "_id": "10" } }
{ "name": "Bryant", "age": 20, "job": "Java Programmer", "gender": "male", "salary": 9000 }
{ "index": { "_id": "11" } }
{ "name": "Jenny", "age": 36, "job": "Java Programmer", "gender": "female", "salary": 38000 }
{ "index": { "_id": "12" } }
{ "name": "Mcdonald", "age": 31, "job": "Java Programmer", "gender": "male", "salary": 32000 }
{ "index": { "_id": "13" } }
{ "name": "Jonthna", "age": 30, "job": "Java Programmer", "gender": "female", "salary": 30000 }
{ "index": { "_id": "14" } }
{ "name": "Marshall", "age": 32, "job": "Javascript Programmer", "gender": "male", "salary": 25000 }
{ "index": { "_id": "15" } }
{ "name": "King", "age": 33, "job": "Java Programmer", "gender": "male", "salary": 28000 }
{ "index": { "_id": "16" } }
{ "name": "Mccarthy", "age": 21, "job": "Javascript Programmer", "gender": "male", "salary": 16000 }
{ "index": { "_id": "17" } }
{ "name": "Goodwin", "age": 25, "job": "Javascript Programmer", "gender": "male", "salary": 16000 }
{ "index": { "_id": "18" } }
{ "name": "Catherine", "age": 29, "job": "Javascript Programmer", "gender": "female", "salary": 20000 }
{ "index": { "_id": "19" } }
{ "name": "Boone", "age": 30, "job": "DBA", "gender": "male", "salary": 30000 }
{ "index": { "_id": "20" } }
{ "name": "Kathy", "age": 29, "job": "DBA", "gender": "female", "salary": 20000 }
1.3.2.2 View the lowest salary
POST employees/_search
{
  "size": 0,
  "aggs": {
    "min_salary": { "min": { "field": "salary" } }
  }
}
1.3.2.3 View the highest salary
POST employees/_search
{
  "size": 0,
  "aggs": {
    "max_salary": { "max": { "field": "salary" } }
  }
}
1.3.2.4 Outputting multiple values in one request
- The first way: define several single-value aggregations side by side
POST employees/_search
{
  "size": 0,
  "aggs": {
    "max_salary": { "max": { "field": "salary" } },
    "min_salary": { "min": { "field": "salary" } },
    "avg_salary": { "avg": { "field": "salary" } }
  }
}
- The second way: use a single stats aggregation, which outputs several values at once
POST employees/_search
{
  "size": 0,
  "aggs": {
    "stats_salary": { "stats": { "field": "salary" } }
  }
}
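As a cross-check on the requests above, here is a minimal plain-Python sketch of the numbers a stats aggregation reports (count, min, max, avg, sum) over the salary values of the 20 sample documents. It only mimics the arithmetic and says nothing about how Elasticsearch executes the aggregation; the helper name stats_agg is hypothetical.

```python
# Salaries of the 20 sample employees, in _id order.
SALARIES = [35000, 50000, 18000, 22000, 18000, 25000, 20000, 20000, 22000, 9000,
            38000, 32000, 30000, 25000, 28000, 16000, 16000, 20000, 30000, 20000]


def stats_agg(values):
    """Return the fields a stats aggregation reports (hypothetical helper)."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": sum(values) / len(values),
        "sum": sum(values),
    }


salary_stats = stats_agg(SALARIES)
print(salary_stats)
```

The min and max entries correspond to what the separate min_salary and max_salary requests return, which is why one stats request can replace several single-value aggregations.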
1.4 Bucket Aggregation
- Documents are assigned to different buckets according to certain rules, which classifies them.
- ES offers some common Bucket Aggregations:
  - Terms
  - Numeric types
    - Range / Date Range
    - Histogram / Date Histogram
- Nesting is supported: buckets can be created inside buckets
1.4.1 Terms Aggregation
- A field needs fielddata (or doc_values) available before a Terms Aggregation can be run on it
  - keyword fields support doc_values by default
  - text fields need fielddata enabled in the Mapping; buckets are then built from the analyzed tokens, not from the whole value
1.4.2 Terms Aggregation demos
1.4.2.1 Aggregating on job and job.keyword
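# Aggregate on the keyword sub-field
POST employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": { "terms": { "field": "job.keyword" } }
  }
}
# Aggregate on the text field itself; this fails until fielddata is enabled
POST employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": { "terms": { "field": "job" } }
  }
}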
If you want to run aggregations on a field of type text, you need to enable fielddata in the Mapping.
# Enable fielddata on the text field so it supports terms aggregation
PUT employees/_mapping
{
  "properties": {
    "job": { "type": "text", "fielddata": true }
  }
}
# Count distinct job values with cardinality
POST employees/_search
{
  "size": 0,
  "aggs": {
    "cardinate": { "cardinality": { "field": "job.keyword" } }
  }
}
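To see why aggregating on job and on job.keyword produces different buckets, the following plain-Python sketch counts the sample job values both ways. It assumes that lowercasing and splitting on whitespace is a good enough stand-in for the default analyzer on this data; the function names are hypothetical, and the real aggregation runs inside Elasticsearch.

```python
from collections import Counter

# job values of the 20 sample documents (order does not affect the counts).
JOBS = (["Product Manager", "Dev Manager"] + ["Web Designer"] * 2 + ["QA"] * 3
        + ["Java Programmer"] * 7 + ["Javascript Programmer"] * 4 + ["DBA"] * 2)


def keyword_buckets(values):
    """Bucket on the exact, un-analyzed value, as job.keyword does."""
    return Counter(values)


def text_buckets(values):
    """Bucket on tokens; lowercase + whitespace split stands in for the analyzer."""
    counts = Counter()
    for value in values:
        # A document counts once for each distinct token it contains.
        counts.update(set(value.lower().split()))
    return counts


by_keyword = keyword_buckets(JOBS)
by_text = text_buckets(JOBS)
distinct_jobs = len(by_keyword)  # the number that cardinality on job.keyword estimates

print(by_keyword.most_common(3))
print(by_text.most_common(3))
print(distinct_jobs)
```

With the keyword field each job title is one bucket, while with the text field a token such as programmer collects documents from both Java Programmer and Javascript Programmer.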
1.4.2.2 Terms aggregation on gender
POST employees/_search
{
  "size": 0,
  "aggs": {
    "gender": { "terms": { "field": "gender" } }
  }
}
1.4.2.3 Specifying the bucket size
# Specify the bucket size
POST employees/_search
{
  "size": 0,
  "aggs": {
    "ages_5": { "terms": { "field": "age", "size": 3 } }
  }
}
This returns only three buckets.
1.4.3 Bucket Size & Top Hits demo
- Application scenario: after bucketing, get the list of the best-matching top documents inside each bucket
- Size: bucket by age and return only the specified number of buckets
- Top Hits: look at the three oldest employees in each job category
POST employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": { "field": "job.keyword" },
      "aggs": {
        "old_employee": {
          "top_hits": {
            "size": 3,
            "sort": [ { "age": { "order": "desc" } } ]
          }
        }
      }
    }
  }
}
1.4.4 Optimizing Terms aggregation performance
When aggregation queries run very frequently, a preloading option can be turned on in the field mapping (in Elasticsearch this is most likely eager_global_ordinals). It can greatly improve performance.
1.4.5 Range & Histogram aggregation
- Buckets are divided by numeric ranges
- In a Range Aggregation, the key of each bucket can be customized
Demo:
- Bucket by salary with Range
- Bucket by salary interval with Histogram
# Bucket by salary range
POST employees/_search
{
  "size": 0,
  "aggs": {
    "salary_range": {
      "range": {
        "field": "salary",
        "ranges": [
          { "to": 10000 },
          { "from": 10000, "to": 20000 },
          { "key": ">20000", "from": 20000 }
        ]
      }
    }
  }
}
# Bucket by salary interval with a histogram
POST employees/_search
{
  "size": 0,
  "aggs": {
    "salary_histogram": {
      "histogram": {
        "field": "salary",
        "interval": 5000,
        "extended_bounds": { "min": 0, "max": 100000 }
      }
    }
  }
}
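The bucketing rules behind these two requests can be sketched in plain Python. The sketch relies on two rules from the Elasticsearch documentation: a range bucket includes its from value and excludes its to value, and a histogram bucket key is the value rounded down to a multiple of the interval. The function names and the first two range keys are hypothetical, and the empty buckets that extended_bounds would add are not generated here.

```python
from collections import Counter

# Salaries of the 20 sample employees, in _id order.
SALARIES = [35000, 50000, 18000, 22000, 18000, 25000, 20000, 20000, 22000, 9000,
            38000, 32000, 30000, 25000, 28000, 16000, 16000, 20000, 30000, 20000]


def range_buckets(values, ranges):
    """Count values per range; 'from' is inclusive and 'to' is exclusive."""
    counts = {}
    for key, low, high in ranges:
        counts[key] = sum(
            1 for v in values
            if (low is None or v >= low) and (high is None or v < high)
        )
    return counts


def histogram_buckets(values, interval):
    """Key each value by floor(value / interval) * interval and count per key."""
    return Counter((v // interval) * interval for v in values)


salary_range = range_buckets(
    SALARIES,
    [("<10000", None, 10000), ("10000-20000", 10000, 20000), (">20000", 20000, None)],
)
salary_histogram = histogram_buckets(SALARIES, 5000)

print(salary_range)
print(sorted(salary_histogram.items()))
```

Note that the bucket labelled >20000 also contains the employees earning exactly 20000, because from is inclusive.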
1.5 Bucket Aggregation + Metric Aggregation
A Bucket aggregation can be analyzed further by adding sub-aggregations, and a sub-aggregation can be either a Bucket or a Metric aggregation.
Demo:
- Bucket by job type and compute salary statistics
- Bucket first by job type, then by gender, and compute salary statistics
1.5.1 Nested aggregation demo
# One level of nesting: salary statistics per job type
POST employees/_search
{
  "size": 0,
  "aggs": {
    "job": {
      "terms": { "field": "job.keyword" },
      "aggs": {
        "salary": { "stats": { "field": "salary" } }
      }
    }
  }
}
# Multiple levels of nesting: job type, then gender, then salary statistics
POST employees/_search
{
  "size": 0,
  "aggs": {
    "job": {
      "terms": { "field": "job.keyword" },
      "aggs": {
        "gender": {
          "terms": { "field": "gender" },
          "aggs": {
            "stat_salary": { "stats": { "field": "salary" } }
          }
        }
      }
    }
  }
}
2. Pipeline aggregation analysis (aggregating again)
Basically, you run an aggregation analysis, then run another aggregation analysis on its results.
2.1 Pipeline example: min_bucket
Among the job types with the most employees, find the one with the lowest average salary.
The buckets_path keyword specifies the path to the values being aggregated. Whenever you see buckets_path in a request, it is a Pipeline aggregation.
2.2 Understanding the Pipeline concept
- Pipeline concept: supports running an aggregation analysis on the results of another aggregation analysis
- The Pipeline results are written into the original results. Depending on where they are placed, pipelines fall into two categories:
  - Sibling: the result sits at the same level as the existing aggregation results (the min_bucket example is a Sibling type)
    - Max, Min, Avg & Sum Bucket
    - Stats, Extended Stats Bucket
    - Percentiles Bucket
  - Parent: the result is embedded in the existing aggregation results
    - Derivative
    - Cumulative Sum
    - Moving Function (moving window)
2.3 Examples
Note that the examples below use the same sample data prepared above.
2.3.1 Find the job type with the lowest average salary
POST /employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": { "field": "job.keyword", "size": 10 },
      "aggs": {
        "avg_salary": { "avg": { "field": "salary" } }
      }
    },
    "min_salary_by_job": {
      "min_bucket": { "buckets_path": "jobs>avg_salary" }
    }
  }
}
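The two steps of this request can be sketched in plain Python: first bucket by job and average the salaries, then let the pipeline step pick the lowest of those per-bucket averages. This is only an illustration over the sample data, not how Elasticsearch executes it; the helper names avg_salary_by_job and min_bucket are hypothetical.

```python
from collections import defaultdict

# (job, salary) pairs of the 20 sample documents, in _id order.
EMPLOYEES = [
    ("Product Manager", 35000), ("Dev Manager", 50000), ("Web Designer", 18000),
    ("Web Designer", 22000), ("QA", 18000), ("QA", 25000), ("QA", 20000),
    ("Java Programmer", 20000), ("Java Programmer", 22000), ("Java Programmer", 9000),
    ("Java Programmer", 38000), ("Java Programmer", 32000), ("Java Programmer", 30000),
    ("Javascript Programmer", 25000), ("Java Programmer", 28000),
    ("Javascript Programmer", 16000), ("Javascript Programmer", 16000),
    ("Javascript Programmer", 20000), ("DBA", 30000), ("DBA", 20000),
]


def avg_salary_by_job(rows):
    """Step 1: the terms aggregation with its avg sub-aggregation."""
    salaries = defaultdict(list)
    for job, salary in rows:
        salaries[job].append(salary)
    return {job: sum(vals) / len(vals) for job, vals in salaries.items()}


def min_bucket(bucket_values):
    """Step 2: the sibling pipeline over the values selected by buckets_path."""
    lowest = min(bucket_values.values())
    keys = [key for key, value in bucket_values.items() if value == lowest]
    return {"value": lowest, "keys": keys}


jobs = avg_salary_by_job(EMPLOYEES)
min_salary_by_job = min_bucket(jobs)
print(min_salary_by_job)
```

The path jobs>avg_salary in the request plays the role of the dictionary passed to min_bucket here: it names which metric of which bucket aggregation the pipeline reads.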
2.3.2 Percentiles of the average salary
POST employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": { "field": "job.keyword", "size": 10 },
      "aggs": {
        "avg_salary": { "avg": { "field": "salary" } }
      }
    },
    "percentiles_salary_by_job": {
      "percentiles_bucket": { "buckets_path": "jobs>avg_salary" }
    }
  }
}
2.3.3 Derivative of the average salary by age
POST employees/_search
{
  "size": 0,
  "aggs": {
    "age": {
      "histogram": { "field": "age", "min_doc_count": 1, "interval": 1 },
      "aggs": {
        "avg_salary": { "avg": { "field": "salary" } },
        "derivative_avg_salary": { "derivative": { "buckets_path": "avg_salary" } }
      }
    }
  }
}
3. Scope and sorting
What if we want to aggregate only over the result set of a query?
That is the scope of the aggregation.
- By default, the scope of an ES aggregation is the result set of the query
- ES also supports the following ways to change the scope of an aggregation:
  - Filter
  - Post Filter
  - Global
3.1 Scope of Query
POST employees/_search
{
  "size": 0,
  "query": {
    "range": { "age": { "gte": 40 } }
  },
  "aggs": {
    "jobs": { "terms": { "field": "job.keyword" } }
  }
}
3.2 Scope of Filter
A Filter can restrict the scope of one specific aggregation inside the request, without affecting the others.
# Filter: old_person only aggregates employees aged 35 or older; all_jobs still covers every document
POST employees/_search
{
  "size": 0,
  "aggs": {
    "old_person": {
      "filter": {
        "range": { "age": { "gte": 35 } }
      },
      "aggs": {
        "jobs": { "terms": { "field": "job.keyword" } }
      }
    },
    "all_jobs": { "terms": { "field": "job.keyword" } }
  }
}
3.3 Scope of Post Filter
- Once the aggregation is done, Post Filter lets us show only the documents that match a specific condition
# post_filter: the aggregation still finds all job types; only the returned hits are filtered
POST employees/_search
{
  "aggs": {
    "jobs": { "terms": { "field": "job.keyword" } }
  },
  "post_filter": {
    "match": { "job.keyword": "Dev Manager" }
  }
}
3.4 Scope of Global
Global lets an aggregation ignore the restriction imposed by the Query.
# Global: salary_avg is computed over all documents, ignoring the age query
POST employees/_search
{
  "size": 0,
  "query": {
    "range": { "age": { "gte": 40 } }
  },
  "aggs": {
    "jobs": { "terms": { "field": "job.keyword" } },
    "all": {
      "global": {},
      "aggs": {
        "salary_avg": { "avg": { "field": "salary" } }
      }
    }
  }
}
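The difference between the query, filter, and global scopes can be sketched in plain Python as a choice of which documents each aggregation gets to see. This is an illustration over the sample data only; the helper names terms and avg_salary are hypothetical.

```python
from collections import Counter

# (age, job, salary) of the 20 sample documents, in _id order.
EMPLOYEES = [
    (32, "Product Manager", 35000), (41, "Dev Manager", 50000),
    (25, "Web Designer", 18000), (26, "Web Designer", 22000),
    (25, "QA", 18000), (31, "QA", 25000), (27, "QA", 20000),
    (27, "Java Programmer", 20000), (32, "Java Programmer", 22000),
    (20, "Java Programmer", 9000), (36, "Java Programmer", 38000),
    (31, "Java Programmer", 32000), (30, "Java Programmer", 30000),
    (32, "Javascript Programmer", 25000), (33, "Java Programmer", 28000),
    (21, "Javascript Programmer", 16000), (25, "Javascript Programmer", 16000),
    (29, "Javascript Programmer", 20000), (30, "DBA", 30000), (29, "DBA", 20000),
]


def terms(rows):
    """Count documents per job, like a terms aggregation on job.keyword."""
    return Counter(job for _, job, _ in rows)


def avg_salary(rows):
    """Average salary, like an avg aggregation on salary."""
    return sum(salary for _, _, salary in rows) / len(rows)


# Query scope: the range query (age >= 40) narrows what ordinary aggregations see.
query_hits = [row for row in EMPLOYEES if row[0] >= 40]
jobs = terms(query_hits)

# Global scope: steps outside the query and aggregates over every document.
all_salary_avg = avg_salary(EMPLOYEES)

# Filter scope: narrows only its own sub-aggregation (age >= 35, no query involved).
old_person_jobs = terms([row for row in EMPLOYEES if row[0] >= 35])

print(jobs, all_salary_avg, old_person_jobs)
```

Only one sample employee is 40 or older, so the query-scoped jobs aggregation has a single bucket, while the global average still reflects all 20 documents.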
3.5 Sorting
# Order buckets by document count (ascending), then by key (descending)
POST employees/_search
{
  "size": 0,
  "query": {
    "range": { "age": { "gte": 20 } }
  },
  "aggs": {
    "jobs": {
      "terms": {
        "field": "job.keyword",
        "order": [ { "_count": "asc" }, { "_key": "desc" } ]
      }
    }
  }
}
# Order buckets by the value of the avg_salary sub-aggregation (descending)
POST employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": {
        "field": "job.keyword",
        "order": [ { "avg_salary": "desc" } ]
      },
      "aggs": {
        "avg_salary": { "avg": { "field": "salary" } }
      }
    }
  }
}
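The two orderings used in this section can be reproduced in plain Python to see which bucket order to expect on the sample data. This is a sketch only: it assumes plain string comparison matches the key ordering for these ASCII job titles, and the helper name bucketize is hypothetical.

```python
from collections import defaultdict

# (job, salary) pairs of the 20 sample documents, in _id order.
EMPLOYEES = [
    ("Product Manager", 35000), ("Dev Manager", 50000), ("Web Designer", 18000),
    ("Web Designer", 22000), ("QA", 18000), ("QA", 25000), ("QA", 20000),
    ("Java Programmer", 20000), ("Java Programmer", 22000), ("Java Programmer", 9000),
    ("Java Programmer", 38000), ("Java Programmer", 32000), ("Java Programmer", 30000),
    ("Javascript Programmer", 25000), ("Java Programmer", 28000),
    ("Javascript Programmer", 16000), ("Javascript Programmer", 16000),
    ("Javascript Programmer", 20000), ("DBA", 30000), ("DBA", 20000),
]


def bucketize(rows):
    """Group salaries by job, one list per terms bucket."""
    buckets = defaultdict(list)
    for job, salary in rows:
        buckets[job].append(salary)
    return buckets


buckets = bucketize(EMPLOYEES)

# "order": [{"_count": "asc"}, {"_key": "desc"}]
# Python's sort is stable, so apply the secondary criterion first.
by_count_then_key = sorted(buckets, reverse=True)           # _key descending
by_count_then_key.sort(key=lambda job: len(buckets[job]))   # _count ascending

# "order": [{"avg_salary": "desc"}]
by_avg_salary = sorted(
    buckets, key=lambda job: sum(buckets[job]) / len(buckets[job]), reverse=True
)

print(by_count_then_key)
print(by_avg_salary)
```

The second ordering shows why the avg_salary sub-aggregation must be defined inside the jobs bucket: the order clause refers to a value that exists once per bucket.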