6. In-depth aggregation analysis

1. Bucket & Metric aggregation analysis and nested aggregation

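1.1 Understanding Bucket & Metric Aggregation in SQL terms

In SQL terms, a Bucket roughly corresponds to GROUP BY, and a Metric to an aggregate function such as COUNT, AVG, or MAX.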

1.2 Aggregation syntax

Aggregation is part of the Search API. In general, set size to 0 so that the response contains only the aggregation results and no hits.

1.3 Metric Aggregation

1.3.1 Understanding Metric aggregations

  • Single-value analysis: outputs a single result
    • min, max, avg, sum
    • cardinality (similar to a distinct count)
  • Multi-value analysis: outputs multiple results
    • stats, extended_stats
    • percentiles, percentile_ranks
    • top_hits (returns the top-ranked documents)

1.3.2 Metric aggregation demo

1.3.2.1 Data Preparation
# Create the employees index with an explicit mapping
PUT /employees/
{
  "mappings": {
    "properties": {
      "age": { "type": "integer" },
      "gender": { "type": "keyword" },
      "job": {
        "type": "text",
        "fields": {
          "keyword": { "type": "keyword", "ignore_above": 50 }
        }
      },
      "name": { "type": "keyword" },
      "salary": { "type": "integer" }
    }
  }
}

# Add some data to the employees index
PUT /employees/_bulk
{ "index" : { "_id" : "1" } }
{ "name" : "Emma", "age" : 32, "job" : "Product Manager", "gender" : "female", "salary" : 35000 }
{ "index" : { "_id" : "2" } }
{ "name" : "Underwood", "age" : 41, "job" : "Dev Manager", "gender" : "male", "salary" : 50000 }
{ "index" : { "_id" : "3" } }
{ "name" : "Tran", "age" : 25, "job" : "Web Designer", "gender" : "male", "salary" : 18000 }
{ "index" : { "_id" : "4" } }
{ "name" : "Rivera", "age" : 26, "job" : "Web Designer", "gender" : "female", "salary" : 22000 }
{ "index" : { "_id" : "5" } }
{ "name" : "Rose", "age" : 25, "job" : "QA", "gender" : "female", "salary" : 18000 }
{ "index" : { "_id" : "6" } }
{ "name" : "Lucy", "age" : 31, "job" : "QA", "gender" : "female", "salary" : 25000 }
{ "index" : { "_id" : "7" } }
{ "name" : "Byrd", "age" : 27, "job" : "QA", "gender" : "male", "salary" : 20000 }
{ "index" : { "_id" : "8" } }
{ "name" : "Foster", "age" : 27, "job" : "Java Programmer", "gender" : "male", "salary" : 20000 }
{ "index" : { "_id" : "9" } }
{ "name" : "Gregory", "age" : 32, "job" : "Java Programmer", "gender" : "male", "salary" : 22000 }
{ "index" : { "_id" : "10" } }
{ "name" : "Bryant", "age" : 20, "job" : "Java Programmer", "gender" : "male", "salary" : 9000 }
{ "index" : { "_id" : "11" } }
{ "name" : "Jenny", "age" : 36, "job" : "Java Programmer", "gender" : "female", "salary" : 38000 }
{ "index" : { "_id" : "12" } }
{ "name" : "Mcdonald", "age" : 31, "job" : "Java Programmer", "gender" : "male", "salary" : 32000 }
{ "index" : { "_id" : "13" } }
{ "name" : "Jonthna", "age" : 30, "job" : "Java Programmer", "gender" : "female", "salary" : 30000 }
{ "index" : { "_id" : "14" } }
{ "name" : "Marshall", "age" : 32, "job" : "Javascript Programmer", "gender" : "male", "salary" : 25000 }
{ "index" : { "_id" : "15" } }
{ "name" : "King", "age" : 33, "job" : "Java Programmer", "gender" : "male", "salary" : 28000 }
{ "index" : { "_id" : "16" } }
{ "name" : "Mccarthy", "age" : 21, "job" : "Javascript Programmer", "gender" : "male", "salary" : 16000 }
{ "index" : { "_id" : "17" } }
{ "name" : "Goodwin", "age" : 25, "job" : "Javascript Programmer", "gender" : "male", "salary" : 16000 }
{ "index" : { "_id" : "18" } }
{ "name" : "Catherine", "age" : 29, "job" : "Javascript Programmer", "gender" : "female", "salary" : 20000 }
{ "index" : { "_id" : "19" } }
{ "name" : "Boone", "age" : 30, "job" : "DBA", "gender" : "male", "salary" : 30000 }
{ "index" : { "_id" : "20" } }
{ "name" : "Kathy", "age" : 29, "job" : "DBA", "gender" : "female", "salary" : 20000 }
1.3.2.2 View the lowest salary
POST employees/_search
{
  "size": 0,
  "aggs": {
    "min_salary": {
      "min": { "field": "salary" }
    }
  }
}

1.3.2.3 View the highest salary
POST employees/_search
{
  "size": 0,
  "aggs": {
    "max_salary": {
      "max": { "field": "salary" }
    }
  }
}

1.3.2.4 Outputting multiple values from one request
  • The first way: define several single-value aggregations side by side
POST employees/_search
{
  "size": 0,
  "aggs": {
    "max_salary": {
      "max": { "field": "salary" }
    },
    "min_salary": {
      "min": { "field": "salary" }
    },
    "avg_salary": {
      "avg": { "field": "salary" }
    }
  }
}

  • The second way: use one stats aggregation (stats_salary is just the name given to the result)
POST employees/_search
{
  "size": 0,
  "aggs": {
    "stats_salary": {
      "stats": { "field": "salary" }
    }
  }
}
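
To make the numbers concrete, here is a small pure-Python sketch of what min, max, avg, sum, and stats compute over the salary field of the 20 sample documents. It does not talk to Elasticsearch; the salaries list and the stats helper are illustrative names, not part of any ES client.

```python
from statistics import mean

# Salaries of the 20 sample employees, in _id order (copied from the bulk request).
salaries = [
    35000, 50000, 18000, 22000, 18000, 25000, 20000, 20000, 22000, 9000,
    38000, 32000, 30000, 25000, 28000, 16000, 16000, 20000, 30000, 20000,
]


def stats(values):
    """Return the same five fields that the stats aggregation reports."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": mean(values),
        "sum": sum(values),
    }


result = stats(salaries)
print(result)
```

A single stats request therefore covers everything the three separate min, max, and avg aggregations return, plus count and sum.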

1.4 Bucket Aggregation

  • Documents are assigned to different buckets according to given rules, which classifies them. ES provides several common Bucket Aggregations:
    • Terms
    • Numeric types
      • Range / Date Range
      • Histogram / Date Histogram
  • Nesting is supported: buckets can be created inside buckets

1.4.1 Terms Aggregation

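  • A field must have doc_values or fielddata available before a Terms Aggregation can run on it
    • keyword fields support doc_values by default
    • text fields need fielddata enabled in the mapping; the buckets are then built from the analyzed tokens, not from the whole field value

1.4.2 Terms Aggregation demo

1.4.2.1 Aggregating on job and job.keyword
# Terms aggregation on the keyword sub-field: one bucket per complete job title
POST employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": { "field": "job.keyword" }
    }
  }
}

# Terms aggregation on the text field itself: returns an error unless fielddata is enabled for job
POST employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": { "field": "job" }
    }
  }
}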

To aggregate on a field of type text, you need to enable fielddata in the mapping.

# Enable fielddata on the text field so that it supports terms aggregation
PUT employees/_mapping
{
  "properties": {
    "job": {
      "type": "text",
      "fielddata": true
    }
  }
}

# Count distinct job titles with cardinality
POST employees/_search
{
  "size": 0,
  "aggs": {
    "cardinate": {
      "cardinality": { "field": "job.keyword" }
    }
  }
}
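
The difference between aggregating on job.keyword and on the analyzed job field can be sketched in plain Python. This is only an approximation: lowercasing and splitting on whitespace stands in for the standard analyzer, and job_counts, keyword_buckets, and text_buckets are illustrative names.

```python
from collections import Counter

# Job titles from the sample data and the number of employees holding each.
job_counts = {
    "Product Manager": 1, "Dev Manager": 1, "Web Designer": 2, "QA": 3,
    "Java Programmer": 7, "Javascript Programmer": 4, "DBA": 2,
}

keyword_buckets = Counter()  # what job.keyword sees: the whole title
text_buckets = Counter()     # what job sees with fielddata: one term per token

for title, docs in job_counts.items():
    keyword_buckets[title] += docs
    # Rough stand-in for the standard analyzer: lowercase, split on whitespace.
    # set() ensures a document is counted once per distinct token.
    for token in set(title.lower().split()):
        text_buckets[token] += docs

print(keyword_buckets.most_common(3))
print(text_buckets.most_common(3))
```

With the text field, "Java Programmer" and "Javascript Programmer" both feed a shared programmer bucket, which is why job.keyword is usually the field you want for bucketing.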
1.4.2.2 Terms aggregation on gender
POST employees/_search
{
  "size": 0,
  "aggs": {
    "gender": {
      "terms": { "field": "gender" }
    }
  }
}

1.4.2.3 Specifying the bucket size
# Specify the bucket size
POST employees/_search
{
  "size": 0,
  "aggs": {
    "ages_5": {
      "terms": {
        "field": "age",
        "size": 3
      }
    }
  }
}

This returns only three buckets: the three ages with the most documents.
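
As a rough illustration of what size does in a terms aggregation, the sketch below buckets the sample ages and keeps the three most frequent. The terms helper is a hypothetical name, and the tie-break on ascending key is an assumption about the default ordering.

```python
from collections import Counter

# Ages of the 20 sample employees, in _id order.
ages = [32, 41, 25, 26, 25, 31, 27, 27, 32, 20,
        36, 31, 30, 32, 33, 21, 25, 29, 30, 29]


def terms(values, size=10):
    """Bucket values by exact term and keep the `size` most frequent buckets."""
    counts = Counter(values)
    # Assumption: ties on doc_count are broken by ascending key.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": n} for key, n in ordered[:size]]


top3 = terms(ages, size=3)
print(top3)
```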

1.4.3 Bucket Size & Top Hits demo

  • Use case: after bucketing, show the top-matching documents within each bucket
  • Size: bucket by age and return only the specified number of buckets
  • Top Hits: list the three oldest employees for each job type
POST employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": { "field": "job.keyword" },
      "aggs": {
        "old_employee": {
          "top_hits": {
            "size": 3,
            "sort": [
              { "age": { "order": "desc" } }
            ]
          }
        }
      }
    }
  }
}
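
The terms plus top_hits combination above amounts to "group, sort inside each group, keep the first N". A minimal Python sketch over the sample data, with employees, by_job, and old_employee as illustrative names:

```python
from collections import defaultdict

# (name, age, job) for the 20 sample employees.
employees = [
    ("Emma", 32, "Product Manager"), ("Underwood", 41, "Dev Manager"),
    ("Tran", 25, "Web Designer"), ("Rivera", 26, "Web Designer"),
    ("Rose", 25, "QA"), ("Lucy", 31, "QA"), ("Byrd", 27, "QA"),
    ("Foster", 27, "Java Programmer"), ("Gregory", 32, "Java Programmer"),
    ("Bryant", 20, "Java Programmer"), ("Jenny", 36, "Java Programmer"),
    ("Mcdonald", 31, "Java Programmer"), ("Jonthna", 30, "Java Programmer"),
    ("Marshall", 32, "Javascript Programmer"), ("King", 33, "Java Programmer"),
    ("Mccarthy", 21, "Javascript Programmer"),
    ("Goodwin", 25, "Javascript Programmer"),
    ("Catherine", 29, "Javascript Programmer"),
    ("Boone", 30, "DBA"), ("Kathy", 29, "DBA"),
]

# Bucket step: group employees by job title.
by_job = defaultdict(list)
for name, age, job in employees:
    by_job[job].append((name, age))

# Top-hits step: inside each bucket keep the three oldest employees.
old_employee = {
    job: sorted(members, key=lambda m: m[1], reverse=True)[:3]
    for job, members in by_job.items()
}
print(old_employee["QA"])
```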

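1.4.4 Optimizing Terms aggregation performance

The configuration referred to here is most likely eager_global_ordinals on the field mapping. Turn it on when aggregation queries are very frequent: global ordinals are then built ahead of time at refresh rather than during the first aggregation, which can greatly improve aggregation performance at the cost of slower refreshes.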

1.4.5 Range & Histogram aggregation

  • Buckets are divided according to numeric ranges
  • In a Range Aggregation, the key of each bucket can be customized
  • Demo:
    • Bucket salaries with Range
    • Bucket salaries by a fixed interval with Histogram
# Salary ranges; the last bucket uses a custom key
POST employees/_search
{
  "size": 0,
  "aggs": {
    "salary_range": {
      "range": {
        "field": "salary",
        "ranges": [
          { "to": 10000 },
          { "from": 10000, "to": 20000 },
          { "key": ">20000", "from": 20000 }
        ]
      }
    }
  }
}

# Salary histogram with one bucket per 5000
POST employees/_search
{
  "size": 0,
  "aggs": {
    "salary_histogram": {
      "histogram": {
        "field": "salary",
        "interval": 5000,
        "extended_bounds": {
          "min": 0,
          "max": 100000
        }
      }
    }
  }
}
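
The bucketing rules behind these two requests can be sketched in Python. In a range bucket, from is inclusive and to is exclusive; a histogram assigns each value to the bucket whose key is the value rounded down to a multiple of the interval. The helper names and the simplified default key format ("*-10000") are assumptions made for this illustration.

```python
# Salaries of the 20 sample employees, in _id order.
salaries = [
    35000, 50000, 18000, 22000, 18000, 25000, 20000, 20000, 22000, 9000,
    38000, 32000, 30000, 25000, 28000, 16000, 16000, 20000, 30000, 20000,
]


def bucket_range(values, ranges):
    """Count values per range; `from` is inclusive, `to` is exclusive."""
    out = {}
    for r in ranges:
        low = r.get("from", float("-inf"))
        high = r.get("to", float("inf"))
        # Use the custom key if given, else a simplified "from-to" label.
        key = r.get("key", f"{r.get('from', '*')}-{r.get('to', '*')}")
        out[key] = sum(1 for v in values if low <= v < high)
    return out


def histogram(values, interval, bounds_min, bounds_max):
    """Fixed-width buckets keyed by their lower edge, empty buckets included."""
    out = {key: 0 for key in range(bounds_min, bounds_max + 1, interval)}
    for v in values:
        key = (v // interval) * interval
        out[key] = out.get(key, 0) + 1
    return out


salary_range = bucket_range(
    salaries,
    [{"to": 10000}, {"from": 10000, "to": 20000}, {"key": ">20000", "from": 20000}],
)
salary_histogram = histogram(salaries, 5000, 0, 100000)
print(salary_range)
print({k: n for k, n in salary_histogram.items() if n})
```

Note that the bucket labelled ">20000" also contains salaries of exactly 20000, because from is inclusive.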

1.5 Bucket Aggregation + Metric Aggregation

  • A Bucket aggregation allows further analysis by adding sub-aggregations, which can be
    • Bucket
    • Metric
  • Demo
    • Bucket by job type and collect salary statistics
    • Bucket first by job type, then by gender, and collect salary statistics

1.5.1 Nested aggregation demo

# Bucket by job type, then compute salary statistics per bucket
POST employees/_search
{
  "size": 0,
  "aggs": {
    "job": {
      "terms": { "field": "job.keyword" },
      "aggs": {
        "salary": {
          "stats": { "field": "salary" }
        }
      }
    }
  }
}

# Multiple levels of nesting: job type, then gender, then salary statistics
POST employees/_search
{
  "size": 0,
  "aggs": {
    "job": {
      "terms": { "field": "job.keyword" },
      "aggs": {
        "gender": {
          "terms": { "field": "gender" },
          "aggs": {
            "stat_salary": {
              "stats": { "field": "salary" }
            }
          }
        }
      }
    }
  }
}
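
The two-level nesting (job, then gender, then stats on salary) behaves like a nested group-by. A minimal Python sketch over the sample data, where rows, groups, stats, and nested are illustrative names:

```python
from collections import defaultdict

# (job, gender, salary) for the 20 sample employees.
rows = [
    ("Product Manager", "female", 35000), ("Dev Manager", "male", 50000),
    ("Web Designer", "male", 18000), ("Web Designer", "female", 22000),
    ("QA", "female", 18000), ("QA", "female", 25000), ("QA", "male", 20000),
    ("Java Programmer", "male", 20000), ("Java Programmer", "male", 22000),
    ("Java Programmer", "male", 9000), ("Java Programmer", "female", 38000),
    ("Java Programmer", "male", 32000), ("Java Programmer", "female", 30000),
    ("Javascript Programmer", "male", 25000), ("Java Programmer", "male", 28000),
    ("Javascript Programmer", "male", 16000),
    ("Javascript Programmer", "male", 16000),
    ("Javascript Programmer", "female", 20000),
    ("DBA", "male", 30000), ("DBA", "female", 20000),
]

# Outer bucket: job. Inner bucket: gender. Leaf: list of salaries.
groups = defaultdict(lambda: defaultdict(list))
for job, gender, salary in rows:
    groups[job][gender].append(salary)


def stats(values):
    """Metric computed on the innermost bucket."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": sum(values) / len(values),
        "sum": sum(values),
    }


nested = {
    job: {gender: stats(vals) for gender, vals in genders.items()}
    for job, genders in groups.items()
}
print(nested["QA"])
```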

2. Pipeline aggregation analysis (aggregating again)

Put simply, a Pipeline aggregation runs a further aggregation on the results of an earlier aggregation.

2.1 Pipeline example: min_bucket

Among the job types with the most employees, find the one with the lowest average salary.

The buckets_path keyword specifies which aggregation result is used as input. Whenever you see buckets_path, you are looking at a Pipeline aggregation.

2.2 Understanding the Pipeline concept

  • Pipeline concept: supports running aggregation analysis on the results of another aggregation analysis
  • The results of a Pipeline analysis are written into the original results; depending on where they are written, pipelines fall into two categories
    • Sibling: the result sits at the same level as the existing aggregation result (the min_bucket example is of the Sibling type)
      • Max, Min, Avg & Sum Bucket
      • Stats, Extended Stats Bucket
      • Percentiles Bucket
    • Parent: the result is embedded inside the existing aggregation result
      • Derivative
      • Cumulative Sum
      • Moving Function (moving window)

2.3 Examples

The following examples use the same sample data prepared above.

2.3.1 Find the job type with the lowest average salary

POST /employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": {
        "field": "job.keyword",
        "size": 10
      },
      "aggs": {
        "avg_salary": {
          "avg": { "field": "salary" }
        }
      }
    },
    "min_salary_by_job": {
      "min_bucket": {
        "buckets_path": "jobs>avg_salary"
      }
    }
  }
}

2.3.2 Percentiles of average salary

POST employees/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": {
        "field": "job.keyword",
        "size": 10
      },
      "aggs": {
        "avg_salary": {
          "avg": { "field": "salary" }
        }
      }
    },
    "percentiles_salary_by_job": {
      "percentiles_bucket": {
        "buckets_path": "jobs>avg_salary"
      }
    }
  }
}

2.3.3 Derivative of average salary by age

# Histogram on age (one bucket per year), average salary per bucket, then the derivative of that average
POST employees/_search
{
  "size": 0,
  "aggs": {
    "age": {
      "histogram": {
        "field": "age",
        "min_doc_count": 1,
        "interval": 1
      },
      "aggs": {
        "avg_salary": {
          "avg": { "field": "salary" }
        },
        "derivative_avg_salary": {
          "derivative": { "buckets_path": "avg_salary" }
        }
      }
    }
  }
}
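
The two stages of this request can be followed step by step in plain Python: first one avg_salary per job bucket, then a minimum taken across those buckets. The sketch starts from salaries already grouped by job, and salaries_by_job, avg_salary, and lowest are illustrative names.

```python
from statistics import mean

# Sample salaries grouped by job title (the result of the terms step).
salaries_by_job = {
    "Product Manager": [35000],
    "Dev Manager": [50000],
    "Web Designer": [18000, 22000],
    "QA": [18000, 25000, 20000],
    "Java Programmer": [20000, 22000, 9000, 38000, 32000, 30000, 28000],
    "Javascript Programmer": [25000, 16000, 16000, 20000],
    "DBA": [30000, 20000],
}

# Stage 1 (terms + avg): one avg_salary value per job bucket.
avg_salary = {job: mean(vals) for job, vals in salaries_by_job.items()}

# Stage 2 (min_bucket over jobs>avg_salary): the smallest value
# and the keys of the buckets that hold it.
lowest = min(avg_salary.values())
min_salary_by_job = {
    "value": lowest,
    "keys": [job for job, avg in avg_salary.items() if avg == lowest],
}
print(min_salary_by_job)
```

Because min_bucket is a Sibling pipeline, its result sits next to the jobs aggregation in the response rather than inside each job bucket.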
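
A derivative over a histogram is the difference between the metric of one bucket and that of the previous bucket. The sketch below assumes the intended request is a histogram on age with an interval of 1, and that the derivative is taken between consecutive returned (non-empty) buckets; by_age and buckets are illustrative names.

```python
from collections import defaultdict
from statistics import mean

# Ages and salaries of the 20 sample employees, both in _id order.
ages = [32, 41, 25, 26, 25, 31, 27, 27, 32, 20,
        36, 31, 30, 32, 33, 21, 25, 29, 30, 29]
salaries = [35000, 50000, 18000, 22000, 18000, 25000, 20000, 20000, 22000,
            9000, 38000, 32000, 30000, 25000, 28000, 16000, 16000, 20000,
            30000, 20000]

# Histogram step: one bucket per age that has at least one document.
by_age = defaultdict(list)
for age, salary in zip(ages, salaries):
    by_age[age].append(salary)

# Metric step plus parent pipeline: average salary per bucket, then the
# change against the previous bucket (no derivative for the first one).
buckets = []
previous_avg = None
for age in sorted(by_age):
    avg = mean(by_age[age])
    derivative = None if previous_avg is None else avg - previous_avg
    buckets.append({"key": age, "avg_salary": avg, "derivative": derivative})
    previous_avg = avg

for bucket in buckets[:3]:
    print(bucket)
```

Because derivative is a Parent pipeline, its value is embedded in each age bucket alongside avg_salary.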

3. Scope and sorting

What if we want to aggregate only over the result set of a query?

That is a question of aggregation scope.

  • By default, ES aggregations run over the result set of the query
  • ES also supports the following ways to change the scope of an aggregation
    • Filter
    • Post Filter
    • Global

3.1 Query scope

# The aggregation only sees employees aged 40 or older
POST employees/_search
{
  "size": 0,
  "query": {
    "range": {
      "age": { "gte": 40 }
    }
  },
  "aggs": {
    "jobs": {
      "terms": { "field": "job.keyword" }
    }
  }
}

3.2 Filter scope

  • A filter can narrow the scope of one specific aggregation without affecting the others
# Filter
POST employees/_search
{
  "size": 0,
  "aggs": {
    "old_person": {
      "filter": {
        "range": {
          "age": {
            "from": 35  
          }
        }
      },
      "aggs": {
        "jobs": {
          "terms": {
            "field": "job.keyword"
          }
        }
      }
    },
    "all_jobs": {
      "terms": {
        "field": "job.keyword"
      }
    }
  }
}

3.3 Post Filter scope

  • When the aggregation should cover every matching document but the returned hits should show only those that meet a condition, use Post Filter. It filters the hits after the aggregation has been computed
# Post filter: the aggregation still finds all job types; only the hits are filtered
POST employees/_search
{
  "aggs": {
    "jobs": {
      "terms": { "field": "job.keyword" }
    }
  },
  "post_filter": {
    "match": { "job.keyword": "Dev Manager" }
  }
}

3.4 Global scope

  • A global aggregation ignores the restriction imposed by the query
# global
POST employees/_search
{
    "size": 0,
    "query": {
        "range": {
            "age": {
                "gte": 40
            }
        }
    },
    "aggs": {
        "jobs": {
            "terms": {
                "field": "job.keyword"
            }
        },
        "all": {
            "global": {},
            "aggs": {
                "salary_avg": {
                    "avg": {
                        "field": "salary"
                    }
                }
            }
        }
    }
}
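
The four scopes from 3.1 to 3.4 differ only in which documents each aggregation gets to see. The following Python sketch mirrors them over the sample data; it is an illustration rather than ES behaviour in full detail, and employees and the terms helper are hypothetical names.

```python
from collections import Counter
from statistics import mean

# (age, job, salary) for the 20 sample employees.
employees = [
    (32, "Product Manager", 35000), (41, "Dev Manager", 50000),
    (25, "Web Designer", 18000), (26, "Web Designer", 22000),
    (25, "QA", 18000), (31, "QA", 25000), (27, "QA", 20000),
    (27, "Java Programmer", 20000), (32, "Java Programmer", 22000),
    (20, "Java Programmer", 9000), (36, "Java Programmer", 38000),
    (31, "Java Programmer", 32000), (30, "Java Programmer", 30000),
    (32, "Javascript Programmer", 25000), (33, "Java Programmer", 28000),
    (21, "Javascript Programmer", 16000), (25, "Javascript Programmer", 16000),
    (29, "Javascript Programmer", 20000), (30, "DBA", 30000), (29, "DBA", 20000),
]


def terms(docs):
    """Count documents per job title."""
    return dict(Counter(job for _, job, _ in docs))


# 3.1 Query scope: aggregations only see documents matching the query.
matched = [d for d in employees if d[0] >= 40]
query_scoped_jobs = terms(matched)

# 3.2 Filter scope: one aggregation is narrowed, its sibling sees everything.
old_person_jobs = terms([d for d in employees if d[0] >= 35])
all_jobs = terms(employees)

# 3.3 Post filter: aggregate first, then filter only the returned hits.
post_filter_aggs = terms(employees)
hits = [d for d in employees if d[1] == "Dev Manager"]

# 3.4 Global scope: ignore the query and average over every document.
global_salary_avg = mean([salary for _, _, salary in employees])

print(query_scoped_jobs, old_person_jobs, len(hits), global_salary_avg)
```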

3.5 Sorting

# Sort buckets by document count (ascending), then by key (descending)
POST employees/_search
{
  "size": 0,
  "query": {
    "range": {
      "age": { "gte": 20 }
    }
  },
  "aggs": {
    "jobs": {
      "terms": {
        "field": "job.keyword",
        "order": [
          { "_count": "asc" },
          { "_key": "desc" }
        ]
      }
    }
  }
}

POST employees/_search
{
    "size": 0,
    "aggs": {
        "jobs": {
            "terms": {
                "field": "job.keyword",
                "order": [
                    {
                        "avg_salary": "desc"
                    }
                ]
            },
            "aggs": {
                "avg_salary": {
                    "avg": {
                        "field": "salary"
                    }
                }
            }
        }
    }
}
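
Both orderings used in this section can be reproduced with ordinary sorting. The sketch below starts from the sample salaries grouped by job; salaries_by_job, by_count_then_key, and by_avg_salary are illustrative names.

```python
from statistics import mean

# Sample salaries grouped by job title.
salaries_by_job = {
    "Product Manager": [35000],
    "Dev Manager": [50000],
    "Web Designer": [18000, 22000],
    "QA": [18000, 25000, 20000],
    "Java Programmer": [20000, 22000, 9000, 38000, 32000, 30000, 28000],
    "Javascript Programmer": [25000, 16000, 16000, 20000],
    "DBA": [30000, 20000],
}
counts = {job: len(vals) for job, vals in salaries_by_job.items()}

# order: [{"_count": "asc"}, {"_key": "desc"}]
# Python's sort is stable, so apply the secondary criterion first.
by_count_then_key = sorted(counts, reverse=True)      # _key desc
by_count_then_key.sort(key=lambda job: counts[job])   # _count asc

# order: [{"avg_salary": "desc"}], i.e. sort by a sub-aggregation value.
by_avg_salary = sorted(
    salaries_by_job,
    key=lambda job: mean(salaries_by_job[job]),
    reverse=True,
)

print(by_count_then_key)
print(by_avg_salary)
```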
