1. Basic concepts of Elasticsearch

1.1 Document

1. Elasticsearch is document-oriented; a document is the smallest unit of searchable data
  - A log entry in a log file
  - The details of a movie or of an album
  - The details of a song in an MP3 player, or of a PDF file
2. A document is serialized as a JSON object, and a JSON object consists of fields
  - Each field has a corresponding field type (string / numeric / boolean / date / binary / range)
3. Each document has a unique ID
  - You can specify the ID yourself or let Elasticsearch generate it automatically

1.2 JSON document

A document contains a series of fields, similar to a record in a database table
JSON documents are flexible; the format does not have to be defined in advance
- Field types can be specified explicitly or inferred automatically by Elasticsearch
- Arrays and nested objects are supported

1.3 Metadata of documents

{
        "_index": "my_test_index",
        "_type": "test_idnex",
        "_id": "AXcpGrIeEQcMCfQJ7Gc5",
        "_score": 1,
        "_source": {
          "testId": "4",
          "testName": "zhaoliu"
        }
 }     

Metadata annotates a document with information about it

_index The name of the index the document belongs to
_type The type the document belongs to
_id The unique ID of the document
_source The original JSON data of the document
_all Consolidates the contents of all fields into one field; now deprecated
_version The version of the document
_score The relevance score
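To make the split between metadata and document body concrete, here is a minimal Python sketch that parses the hit shown above and separates the underscore-prefixed metadata from `_source`. The helper name `split_hit` is hypothetical and only serves the illustration.

```python
import json

# The hit shown above, copied as a JSON string.
hit_json = """
{
  "_index": "my_test_index",
  "_type": "test_idnex",
  "_id": "AXcpGrIeEQcMCfQJ7Gc5",
  "_score": 1,
  "_source": {"testId": "4", "testName": "zhaoliu"}
}
"""

def split_hit(hit):
    """Return (metadata, source): metadata keys start with '_', except _source."""
    metadata = {k: v for k, v in hit.items() if k.startswith("_") and k != "_source"}
    return metadata, hit.get("_source", {})

metadata, source = split_hit(json.loads(hit_json))
print(metadata["_index"], metadata["_id"])
print(source)
```

Everything the application stored lives under `_source`; the other keys are added by Elasticsearch.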

1.4 Index

{
  "my_test_index": {
    "settings": {
      "index": {
        "search": {
          "slowlog": {
            "level": "info",
            "threshold": {
              "fetch": { "warn": "200ms", "trace": "50ms", "debug": "80ms", "info": "100ms" },
              "query": { "warn": "200ms", "trace": "50ms", "debug": "80ms", "info": "100ms" }
            }
          }
        },
        "indexing": {
          "slowlog": {
            "level": "info",
            "threshold": {
              "index": { "warn": "200ms", "trace": "20ms", "debug": "50ms", "info": "100ms" }
            },
            "source": "1000"
          }
        },
        "number_of_shards": "5",
        "provided_name": "my_test_index",
        "creation_date": "1611301841428",
        "unassigned": { "node_left": { "delayed_timeout": "5m" } },
        "number_of_replicas": "1",
        "uuid": "e5B65ySmQ-GE8Tj9gUHIPw",
        "version": { "created": "5050399" }
      }
    }
  }
}

An index is a container for documents; it groups documents of the same kind
- An index is a logical-space concept: each index has its own mapping, which defines the field names and field types of the documents it contains
- A shard is a physical-space concept: the data in an index is spread across shards
Mapping and settings of an index
- The mapping defines the types of the document fields
- The settings define how the data is distributed

1.5 Type

Indices created before 6.0 could contain multiple types; from 6.0 on, a newly created index can contain only one type
Types are deprecated (not recommended), and as of 7.0 the only type name an index can use is "_doc"

1.6 Abstraction and analogy

RDBMS	ElasticSearch
Table	Index(Type)
Row	Document
Column	Field
Schema	Mapping
SQL	DSL

The difference between traditional relational databases and Elasticsearch

- Elasticsearch: schemaless / relevance scoring / high-performance full-text search
- RDBMS: transactions / joins

2. Nodes, clusters, shards, and replicas

2.1 Distributed Features

Benefits of Elasticsearch's distributed architecture
- Storage capacity can be expanded horizontally
- Improved availability: the cluster as a whole keeps serving when some nodes stop
Elasticsearch's distributed architecture
- Clusters are distinguished by name; the default name is "elasticsearch"
- The cluster name can be changed in the configuration file or on the command line with -E cluster.name=geektime
- A cluster can have one or more nodes

2.2 Node

A node is an instance of Elasticsearch

Multiple Elasticsearch processes can run on one machine, but in production it is generally recommended to run only one instance per machine

Each node has a name, set in the configuration file or at startup with -E node.name=node1
Each node is assigned a UID after startup, which is stored in the data directory

2.2.1 Master-eligible Nodes and the Master Node

Every node is master-eligible by default after startup
- Set node.master: false to disable this
A master-eligible node can take part in the master election and become the master node
When the first node starts, it elects itself as the master node
Every node stores the cluster state, but only the master node can change it
- The cluster state holds the information the cluster needs
  - Information about all nodes
  - All indices and their mapping and settings
  - Shard routing information
- If any node could modify the cluster state, the data would become inconsistent

2.2.2 Data Node & Coordinating Node

Data Node
- A node that can store data is called a data node; it holds the shard data and plays a crucial role in scaling data
Coordinating Node
- Receives client requests, distributes them to the appropriate nodes, and finally gathers the results together
- Every node acts as a coordinating node by default

2.2.3 Other Nodes

Hot & Warm Node

- Data nodes with different hardware configurations, used to implement a Hot & Warm architecture and reduce the cost of deploying the cluster

Machine Learning Node

- Runs machine learning jobs, which are used for anomaly detection

Tribe Node (coordinating nodes that act as federated clients across multiple clusters)

- A tribe node connects to different Elasticsearch clusters and lets them be treated as a single cluster

2.2.4 Configuring the Node Type

A node can play multiple roles in a development environment
In a production environment, each node should have a single role (dedicated node)

Node type	Configuration parameter	Default value
master eligible	node.master	true
data	node.data	true
ingest	node.ingest	true
coordinating only	none	Every node is a coordinating node by default; set all other types to false
machine learning	node.ml	true (requires X-Pack to be enabled)

2.3 Shard (Primary Shard & Replica Shard)

Primary shards solve the problem of scaling data horizontally: with primary shards, data can be distributed across all nodes in the cluster
- A shard is a running instance of Lucene
- The number of primary shards is specified when the index is created and cannot be changed afterwards, except by reindexing
Replica shards solve the problem of high availability of data: a replica shard is a copy of a primary shard
- The number of replica shards can be adjusted dynamically
- Increasing the number of replicas also improves service availability to some extent (read throughput)
Distribution of the blogs index in a three-node cluster
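One way to see why the primary shard count is fixed at index creation: a document is routed to a shard by hashing its routing value (the document ID by default) modulo the number of primary shards. The sketch below is a simplification under stated assumptions: Elasticsearch uses a Murmur3 hash and an extra routing-shards factor, while `zlib.crc32` here is only a stand-in, and `pick_shard` is a hypothetical helper.

```python
import zlib

def pick_shard(doc_id: str, number_of_primary_shards: int) -> int:
    """Simplified routing: hash(routing value) % number_of_primary_shards."""
    # CRC32 is a stand-in hash for illustration, not the hash Elasticsearch uses.
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_primary_shards

# Hypothetical document IDs.
ids = [f"doc-{i}" for i in range(1000)]

# Count how many documents would land on a different shard if the
# primary shard count changed from 5 to 6.
moved = sum(1 for d in ids if pick_shard(d, 5) != pick_shard(d, 6))
print(f"{moved} of {len(ids)} documents would map to a different shard")
```

Because most documents would resolve to a different shard after such a change, existing data could no longer be found where it was written, which is why a reindex is required.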

2.3.1 Shard settings

Capacity planning is required for production environments
- If the number of shards is too small
  - Nodes cannot be added later to scale horizontally
  - Each shard holds a large amount of data, so relocating it takes a long time
- If the number of shards is too large (since 7.0 the default number of primary shards is 1, which addresses the over-sharding problem)
  - Shards are also resources, and too many of them can affect cluster stability: more shards mean more metadata, which consumes heap memory
  - Too many shards can also hurt read/write performance, because each read/write request needs a thread; an index without much data does not need many shards
  - Relevance scoring of search results and the accuracy of statistics are affected
  - Too many shards on a single node waste resources and hurt performance

2.4 Checking the Cluster Health Status

GET _cluster/health
{
  "cluster_name": "es-cn-zz11rb9fv000fj1pe",
  "status": "green",
  "timed_out": false,
  "number_of_nodes": 6,
  "number_of_data_nodes": 3,
  "active_primary_shards": 766,
  "active_shards": 1507,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 0,
  "delayed_unassigned_shards": 0,
  "number_of_pending_tasks": 0,
  "number_of_in_flight_fetch": 0,
  "task_max_waiting_in_queue_millis": 0,
  "active_shards_percent_as_number": 100
}

GET _cat/nodes
172.17.25.39 45 91 3 0.23 0.08 0.06 di - 3Ja7gZv
172.17.25.53 55 79 1 0.00 0.01 0.05 mi * H1guebi
172.17.25.52 22 78 0 0.01 0.02 0.05 mi - rdjzfmG
172.17.25.51 24 78 0 0.00 0.01 0.05 mi - uaU255o
172.17.25.38 54 91 2 0.23 0.26 0.16 di - wQwmOos
172.17.25.40 65 89 1 0.01 0.17 0.26 di - 4mZ8XK7

GET _cat/shards
companyinfo                     4 r STARTED 31408061  38.5gb 172.17.25.38 wQwmOos
companyinfo                     4 p STARTED 31408061  40.2gb 172.17.25.39 3Ja7gZv
companyinfo                     1 p STARTED 31412834  43.2gb 172.17.25.38 wQwmOos
companyinfo                     1 r STARTED 31412834  41.7gb 172.17.25.39 3Ja7gZv
companyinfo                     3 r STARTED 31407535  37.6gb 172.17.25.40 4mZ8XK7
companyinfo                     3 p STARTED 31407535  36.8gb 172.17.25.39 3Ja7gZv
companyinfo                     2 r STARTED 31412927  41.8gb 172.17.25.40 4mZ8XK7
companyinfo                     2 p STARTED 31412927  41.2gb 172.17.25.39 3Ja7gZv
companyinfo                     0 p STARTED 31400572  40.4gb 172.17.25.40 4mZ8XK7
companyinfo                     0 r STARTED 31400572  43.1gb 172.17.25.38 wQwmOos

Green: all primary and replica shards are allocated normally
Yellow: all primary shards are allocated normally, but some replica shards are not
Red: at least one primary shard could not be allocated
- For example, a new index is created when the disk usage of the server exceeds 85%
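The three colors follow mechanically from the shard states. As a rough sketch (a simplification that treats any shard not in the STARTED state as unallocated; `cluster_status` is a hypothetical helper, not an Elasticsearch API), the rule can be written as:

```python
def cluster_status(shards):
    """shards: iterable of (kind, state) pairs, kind 'p' (primary) or 'r' (replica)."""
    shards = list(shards)
    # Red: at least one primary shard is not allocated.
    if any(kind == "p" and state != "STARTED" for kind, state in shards):
        return "red"
    # Yellow: all primaries are fine, but some replica is not allocated.
    if any(kind == "r" and state != "STARTED" for kind, state in shards):
        return "yellow"
    # Green: every primary and replica is allocated.
    return "green"

print(cluster_status([("p", "STARTED"), ("r", "STARTED")]))
print(cluster_status([("p", "STARTED"), ("r", "UNASSIGNED")]))
print(cluster_status([("p", "UNASSIGNED"), ("r", "UNASSIGNED")]))
```

The `(kind, state)` pairs correspond to the `p`/`r` and `STARTED` columns of the `_cat/shards` output.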

3. CRUD & batch operations on documents

3.1 CRUD of documents

By convention the type name is _doc
Create: fails if the ID already exists
Index: if the ID does not exist, a new document is created; otherwise the existing document is deleted and a new one is indexed, and the version increases
Update: the document must already exist; only the specified fields are changed (incremental update)
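The difference between Create, Index, and Update is easiest to see in a toy model. The following Python sketch is an in-memory stand-in, not a client for a real cluster; the class `TinyIndex` is hypothetical, and it simplifies versioning by bumping the version on every write.

```python
class TinyIndex:
    """In-memory stand-in for one index: maps _id -> (version, source)."""

    def __init__(self):
        self.docs = {}

    def create(self, doc_id, source):
        # Create fails when the ID already exists.
        if doc_id in self.docs:
            raise ValueError(f"document {doc_id!r} already exists")
        self.docs[doc_id] = (1, dict(source))
        return 1

    def index(self, doc_id, source):
        # Index replaces the whole document and increases the version.
        version = self.docs[doc_id][0] + 1 if doc_id in self.docs else 1
        self.docs[doc_id] = (version, dict(source))
        return version

    def update(self, doc_id, partial):
        # Update requires an existing document and changes only the given fields.
        if doc_id not in self.docs:
            raise KeyError(f"document {doc_id!r} not found")
        version, source = self.docs[doc_id]
        self.docs[doc_id] = (version + 1, {**source, **partial})
        return version + 1


idx = TinyIndex()
idx.index("1", {"user": "mike", "comment": "You know,for search"})
idx.index("1", {"user": "mike"})                        # whole document replaced
idx.update("1", {"comment": "You know,ElasticSearch"})  # only one field changed
print(idx.docs["1"])
```

After the second `index` call the `comment` field is gone, because Index replaces the document; `update` then adds it back without touching `user`.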

3.1.1 Index

PUT my_test_index/_doc/1
{
  "user": "mike",
  "comment": "You know,for search"
}

Index differs from Create: if the document does not exist, the new document is indexed; otherwise the existing document is deleted, the new document is indexed, and the version is incremented by 1

3.1.2 Create

PUT my_test_index/_create/1
{
  "user": "mike",
  "comment": "You know,for search"
}

# The ID is generated automatically when it is not specified
POST my_test_index/_doc
{
  "user": "mike",
  "comment": "You know,for search"
}

The document ID can be specified or generated automatically
- With "POST /my_test_index/_doc" and no ID, the system generates the document ID automatically

3.1.3 Read

GET my_test_index/_doc/1

{
  "took": 1,
  "timed_out": false,
  "_shards": { "total": 1, "successful": 1, "failed": 0 },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "my_store",
        "_type": "products",
        "_id": "5",
        "_score": 1,
        "_source": {
          "price": 10,
          "productName": "ZHANGSAN",
          "productID": "XHDK-A-1293-#fJ3"
        }
      }
    ]
  }
}

If the document is found, HTTP 200 is returned
- Document metadata
  - _index / _type
  - Version information: even if a document with the same ID has been deleted, the version number keeps increasing
  - _source contains all the original data of the document by default
If the document is not found, HTTP 404 is returned

3.1.4 Update

POST my_test_index/_update/1
{
  "doc": {
    "user": "mike",
    "comment": "You know,ElasticSearch"
  }
}

Update does not delete the original document; it performs a real data update
Use the POST method, and wrap the payload in "doc"

3.1.5 Delete

DELETE my_test_index/_doc/1

3.2 Bulk API

Supports operations on different indices in a single API call
Four types of operations are supported
- Index
- Create
- Update
- Delete
The index can be specified in the URI or in the payload
The failure of a single operation does not affect the other operations
The response includes the result of each operation
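The Bulk API takes a newline-delimited JSON body: one action line per operation, followed by a source line for every action except delete, with a final newline at the end. A minimal Python sketch of building such a body (the helper `bulk_body` and the sample documents are hypothetical; sending it to `POST _bulk` is left out):

```python
import json

def bulk_body(operations):
    """Build an NDJSON bulk body from (action, metadata, payload) triples."""
    lines = []
    for action, meta, payload in operations:
        lines.append(json.dumps({action: meta}))   # action and metadata line
        if payload is not None:                    # delete has no source line
            lines.append(json.dumps(payload))
    return "\n".join(lines) + "\n"                 # body must end with a newline

ops = [
    ("index",  {"_index": "my_test_index", "_id": "1"}, {"user": "mike"}),
    ("create", {"_index": "my_test_index", "_id": "2"}, {"user": "jack"}),
    ("update", {"_index": "my_test_index", "_id": "1"}, {"doc": {"comment": "You know,for search"}}),
    ("delete", {"_index": "my_test_index", "_id": "2"}, None),
]
body = bulk_body(ops)
print(body, end="")
```

Each action carries its own `_index`, which is how one call can touch several indices.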

3.3 Batch read: mget

Batch operations reduce the cost of network connections and improve performance

GET /_mget
{
  "docs": [
    { "_index": "my_store", "_id": 1 },
    { "_index": "companyinfo", "_id": "cd5b8daadc31482e84715da912a604f4" }
  ]
}

result
{
  "docs": [
    {
      "_index": "my_store", "_type": "products", "_id": "1", "_version": 4, "found": true,
      "_source": { "price": 12, "productID": "XHDK-A-1293-#fJ3" }
    },
    {
      "_index": "companyinfo", "_type": "companyinfo", "_id": "cd5b8daadc31482e84715da912a604f4", "_version": 1, "found": true,
      "_source": {
        "entName": "Guangxi Golden Friend Haoyou Investment Co., LTD.",
        "orgLogo": "", "regCapital": "RMB 5 million", "city": "Guangxi Zhuang Autonomous Region", "regDate": "2017-05-17",
        "industry": "Business Services", "taxpayerIdNo": "91450800MA5L585X6F", "creditCode": "91450800MA5L585X6F",
        "registrationAuthority": "Guigang Market Supervision Administration", "staffSize": "", "orgCode": "MA5L585X-6",
        "enterpriseStatus": "To be continued (in operation, in operation, on the books)",
        "id": "cd5b8daadc31482e84715da912a604f4", "businessRegCode": "450800000151505", "email": "", "introduction": "",
        "regCapitalNumber": 500, "website": "",
        "address": "1 / F, Longsheng New Village, Jiefang North Road, Guigang City", "town": "",
        "bossId": "4b12e1b8d1ef11-p-4b12e276d1ef1", "corporation": "Snow covered",
        "businessScope": "Investment in cultural industry, tourism and tourist commodities; Investment in construction; Corporate image planning, marketing planning, event planning, stage modeling planning, wedding celebration planning; Exhibition services, conference services, etiquette services, photography services; Network information technology development, consulting, transfer services; Film and television planning consulting, enterprise management consulting, investment information consulting (the above items except the special provisions of the state); Television program production services (specific projects subject to the approval of the examination and approval department); Animation design; Retail of publications (specific projects subject to the approval of the examination and approval department), indoor and outdoor decoration engineering, architectural engineering design, municipal engineering, landscape engineering design (the above projects with the credit card operation); Catering services (specific items subject to the approval of the examination and approval department); Design, production, agency, release all kinds of domestic advertising; Performance broker (subject to the specific project approved by the examination and approval department); Government procurement, bidding agency, engineering consulting, land evaluation, real estate evaluation, assets evaluation, real estate evaluation audit, project settlement.",
        "businessTerm": "Long term", "contributedcapital": "", "checkDate": "2017-05-17",
        "enterpriseType": "Limited liability Company (sole natural person)", "orgNameEn": "", "taxpayerQualification": "",
        "telphone": "", "district": "", "sameEnterprise": "< Associated Enterprise 3>", "oldOrgName": "",
        "readAddress": "1 / F, Longsheng New Village, Jiefang North Road, Guigang City", "contributors": ""
      }
    }
  ]
}

3.4 Batch query: msearch

3.5 Common errors

Problem	Cause
Unable to connect	The network or the cluster is faulty
Connection cannot be closed	The network or a node is faulty
429	The cluster is too busy
4xx	The request body is malformed
500	Internal cluster error

4. Inverted index

4.1 Forward and inverted indexes

The inverted index contains two parts
- Term Dictionary: records every term that appears in the documents and maps each term to its posting list
  - The term dictionary is generally large; it can be implemented with a B+ tree or a hash table with chaining to support high-performance inserts and lookups
- Posting List: records the set of documents that contain a term; it is made up of posting entries
  - A posting entry contains
    - Document ID
    - Term frequency (TF): the number of times the term appears in the document, used for relevance scoring
    - Position: the position of the term among the document's tokens, used for phrase queries
    - Offset: the start and end character positions of the term, used for highlighting
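The structure described above can be sketched in a few lines of Python. This is a teaching model only: a plain dict plays the role of the term dictionary, the tokenizer is a simple `\w+` regex with lowercasing, and the two sample documents are hypothetical.

```python
import re
from collections import defaultdict

def build_inverted_index(docs):
    """Map term -> {doc_id: {"tf", "positions", "offsets"}}."""
    index = defaultdict(dict)                       # term dictionary
    for doc_id, text in docs.items():
        for position, match in enumerate(re.finditer(r"\w+", text)):
            term = match.group().lower()
            posting = index[term].setdefault(
                doc_id, {"tf": 0, "positions": [], "offsets": []}
            )
            posting["tf"] += 1                                        # term frequency
            posting["positions"].append(position)                     # for phrase queries
            posting["offsets"].append((match.start(), match.end()))   # for highlighting
    return index

docs = {
    1: "Mastering Elasticsearch",
    2: "Elasticsearch in Action",
}
index = build_inverted_index(docs)
print(sorted(index["elasticsearch"]))   # IDs of documents containing the term
print(index["elasticsearch"][1])
```

Looking up a term is then a single dictionary access that returns its posting list, instead of a scan over every document.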

4.2 Inverted index in Elasticsearch

Each field of an Elasticsearch JSON document has its own inverted index
You can specify that certain fields are not indexed
- Advantage: saves storage space
- Disadvantage: the field cannot be searched

5. Word segmentation

5.1 Analysis and Analyzer

Analysis (text analysis) is the process of converting full text into a series of terms (term / token), also known as word segmentation
Analysis is carried out by an analyzer
- You can use Elasticsearch's built-in analyzers or define a custom analyzer
Besides converting text into terms when data is written, the same analyzer must be applied to the query string at search time

5.2 Composition of an analyzer

An analyzer consists of three parts
- Character Filters: process the raw text, for example stripping HTML
- Tokenizer: splits the text into words according to rules
- Token Filters: process the resulting words, for example lowercasing, removing stop words, adding synonyms
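The three stages run in a fixed order: character filters first, then the tokenizer, then the token filters. A minimal Python sketch of that pipeline, with one illustrative function per stage (all function names and the stop-word list are assumptions made for the example, not Elasticsearch components):

```python
import re

def strip_html(text):
    """Character filter: remove HTML tags from the raw text."""
    return re.sub(r"<[^>]+>", "", text)

def word_tokenizer(text):
    """Tokenizer: split the text into word tokens."""
    return re.findall(r"\w+", text)

def lowercase(tokens):
    """Token filter: lowercase every token."""
    return [t.lower() for t in tokens]

def remove_stop_words(tokens, stop_words=frozenset({"the", "a", "is"})):
    """Token filter: drop stop words."""
    return [t for t in tokens if t not in stop_words]

def analyze(text):
    text = strip_html(text)                       # 1. character filters
    tokens = word_tokenizer(text)                 # 2. tokenizer
    return remove_stop_words(lowercase(tokens))   # 3. token filters

print(analyze("<b>Elasticsearch</b> is a Search Engine"))
# ['elasticsearch', 'search', 'engine']
```

The order matters: lowercasing must happen before stop-word removal, otherwise "Is" or "The" would slip through.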

5.3 Built-in analyzers of Elasticsearch

Standard Analyzer: the default analyzer; splits by word and lowercases
Simple Analyzer: splits on non-letters (symbols are filtered out) and lowercases
Stop Analyzer: lowercases and filters stop words (the, a, is)
Whitespace Analyzer: splits on whitespace; does not lowercase
Keyword Analyzer: no segmentation; the input is output as a single term
Pattern Analyzer: splits by regular expression; the default is \W+ (split on non-word characters)
Language: analyzers for more than 30 common languages
Custom Analyzer: a user-defined analyzer

5.3.1 Standard Analyzer

The default analyzer
Splits text by word
Lowercases the terms

5.3.2 Simple Analyzer

Splits on non-letters; all non-letter characters are removed
Lowercases the terms

5.3.3 Whitespace Analyzer

Splits text on whitespace

5.3.4 Stop Analyzer

Compared with the Simple Analyzer, it adds a stop filter
- Removes stop words such as the, a, is

5.3.5 Keyword Analyzer

No segmentation; the input is output as a single term

5.3.6 Pattern Analyzer

Splits text with a regular expression
The default is \W+, i.e. split on non-word characters
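The behaviour of these analyzers can be approximated with a few lines of Python. These are rough stand-ins built on the `re` module, not the Lucene implementations, and they ignore offsets and positions; the sample sentences are the ones used in the `_analyze` tests in section 5.4.

```python
import re

def simple(text):
    """Split on anything that is not a letter, then lowercase."""
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]

def whitespace(text):
    """Split on whitespace only; case is preserved."""
    return text.split()

def keyword(text):
    """No segmentation: the whole input is one term."""
    return [text]

def pattern(text):
    r"""Split on the default pattern \W+ and lowercase."""
    return [t.lower() for t in re.split(r"\W+", text) if t]

print(simple("2 run 。Maste-ring ElasticSearch,elasticsearch in Action"))
print(whitespace("Maste-ring ElasticSearch,elasticsearch in Action"))
print(pattern("this is a Elastic-Search,elasticsearch in Action"))
```

Note how `simple` drops the digit `2` while `pattern` would keep it, since digits are word characters.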

5.3.7 Language Analyzers

5.4 Using the _analyze API

Specify an analyzer directly for testing

GET /_analyze
{
  "analyzer": "standard",
  "text": "Mastering ElasticSearch,elasticsearch in Action"
}

result
{
  "tokens": [
    { "token": "mastering", "start_offset": 0, "end_offset": 9, "type": "<ALPHANUM>", "position": 0 },
    { "token": "elasticsearch", "start_offset": 10, "end_offset": 23, "type": "<ALPHANUM>", "position": 1 },
    { "token": "elasticsearch", "start_offset": 24, "end_offset": 37, "type": "<ALPHANUM>", "position": 2 },
    { "token": "in", "start_offset": 38, "end_offset": 40, "type": "<ALPHANUM>", "position": 3 },
    { "token": "action", "start_offset": 41, "end_offset": 47, "type": "<ALPHANUM>", "position": 4 }
  ]
}

Test with the Simple Analyzer

GET /_analyze
{
  "analyzer": "simple",
  "text": "2 run 。Maste-ring ElasticSearch,elasticsearch in Action"
}

result
{
  "tokens": [
    { "token": "run", "start_offset": 2, "end_offset": 5, "type": "word", "position": 0 },
    { "token": "maste", "start_offset": 7, "end_offset": 12, "type": "word", "position": 1 },
    { "token": "ring", "start_offset": 13, "end_offset": 17, "type": "word", "position": 2 },
    { "token": "elasticsearch", "start_offset": 18, "end_offset": 31, "type": "word", "position": 3 },
    { "token": "elasticsearch", "start_offset": 32, "end_offset": 45, "type": "word", "position": 4 },
    { "token": "in", "start_offset": 46, "end_offset": 48, "type": "word", "position": 5 },
    { "token": "action", "start_offset": 49, "end_offset": 55, "type": "word", "position": 6 }
  ]
}

Test with the Whitespace Analyzer

GET /_analyze
{
  "analyzer": "whitespace",
  "text": "Maste-ring ElasticSearch,elasticsearch in Action"
}

result
{
  "tokens": [
    { "token": "Maste-ring", "start_offset": 0, "end_offset": 10, "type": "word", "position": 0 },
    { "token": "ElasticSearch,elasticsearch", "start_offset": 11, "end_offset": 38, "type": "word", "position": 1 },
    { "token": "in", "start_offset": 39, "end_offset": 41, "type": "word", "position": 2 },
    { "token": "Action", "start_offset": 42, "end_offset": 48, "type": "word", "position": 3 }
  ]
}

Test with the Stop Analyzer

GET /_analyze
{
  "analyzer": "stop",
  "text": "this is a ElasticSearch,elasticsearch in Action"
}

result
{
  "tokens": [
    { "token": "elasticsearch", "start_offset": 10, "end_offset": 23, "type": "word", "position": 3 },
    { "token": "elasticsearch", "start_offset": 24, "end_offset": 37, "type": "word", "position": 4 },
    { "token": "action", "start_offset": 41, "end_offset": 47, "type": "word", "position": 6 }
  ]
}

Test with the Keyword Analyzer

GET /_analyze
{
  "analyzer": "keyword",
  "text": "this is a ElasticSearch,elasticsearch in Action"
}

result
{
  "tokens": [
    { "token": "this is a ElasticSearch,elasticsearch in Action", "start_offset": 0, "end_offset": 47, "type": "word", "position": 0 }
  ]
}

Test with the Pattern Analyzer

GET /_analyze
{
  "analyzer": "pattern",
  "text": "this is a Elastic-Search,elasticsearch in Action"
}

result
{
  "tokens": [
    { "token": "this", "start_offset": 0, "end_offset": 4, "type": "word", "position": 0 },
    { "token": "is", "start_offset": 5, "end_offset": 7, "type": "word", "position": 1 },
    { "token": "a", "start_offset": 8, "end_offset": 9, "type": "word", "position": 2 },
    { "token": "elastic", "start_offset": 10, "end_offset": 17, "type": "word", "position": 3 },
    { "token": "search", "start_offset": 18, "end_offset": 24, "type": "word", "position": 4 },
    { "token": "elasticsearch", "start_offset": 25, "end_offset": 38, "type": "word", "position": 5 },
    { "token": "in", "start_offset": 39, "end_offset": 41, "type": "word", "position": 6 },
    { "token": "action", "start_offset": 42, "end_offset": 48, "type": "word", "position": 7 }
  ]
}

Test with a language analyzer

GET /_analyze
{
  "analyzer": "english",
  "text": "this is a Elastic-Search,elasticsearch in Action"
}

result
{
  "tokens": [
    { "token": "elast", "start_offset": 10, "end_offset": 17, "type": "<ALPHANUM>", "position": 3 },
    { "token": "search", "start_offset": 18, "end_offset": 24, "type": "<ALPHANUM>", "position": 4 },
    { "token": "elasticsearch", "start_offset": 25, "end_offset": 38, "type": "<ALPHANUM>", "position": 5 },
    { "token": "action", "start_offset": 42, "end_offset": 48, "type": "<ALPHANUM>", "position": 7 }
  ]
}

Test against a field of an index

POST my_store/_analyze
{
  "field": "productName",
  "text": "XHDK-A-1293-#fJ3"
}

result
{
  "tokens": [
    { "token": "xhdk", "start_offset": 0, "end_offset": 4, "type": "<ALPHANUM>", "position": 0 },
    { "token": "a", "start_offset": 5, "end_offset": 6, "type": "<ALPHANUM>", "position": 1 },
    { "token": "1293", "start_offset": 7, "end_offset": 11, "type": "<NUM>", "position": 2 },
    { "token": "fj3", "start_offset": 13, "end_offset": 16, "type": "<ALPHANUM>", "position": 3 }
  ]
}

Test a custom analyzer

POST /_analyze
{
  "tokenizer": "standard",
  "filter": ["lowercase"],
  "text": "Hello ElasticSearch"
}

result
{
  "tokens": [
    { "token": "hello", "start_offset": 0, "end_offset": 5, "type": "<ALPHANUM>", "position": 0 },
    { "token": "elasticsearch", "start_offset": 6, "end_offset": 19, "type": "<ALPHANUM>", "position": 1 }
  ]
}

5.5 Difficulties in Chinese word segmentation

A Chinese sentence has to be cut into words, not into individual characters
In English, words are naturally separated by spaces
The same Chinese sentence can be read differently in different contexts
- This apple is not very tasty / This apple is not big, but tasty!
Some examples
- There is a point in what he says
Use the default Elasticsearch analyzer on a Chinese sentence: it is split into single characters (the sentence and its tokens are shown here in English translation)

GET /_analyze
{
  "analyzer": "standard",
  "text": "It's not certain."
}

result
{
  "tokens": [
    { "token": "This", "start_offset": 0, "end_offset": 1, "type": "<IDEOGRAPHIC>", "position": 0 },
    { "token": "Things", "start_offset": 1, "end_offset": 2, "type": "<IDEOGRAPHIC>", "position": 1 },
    { "token": "Really", "start_offset": 2, "end_offset": 3, "type": "<IDEOGRAPHIC>", "position": 2 },
    { "token": "Set", "start_offset": 3, "end_offset": 4, "type": "<IDEOGRAPHIC>", "position": 3 },
    { "token": "No", "start_offset": 4, "end_offset": 5, "type": "<IDEOGRAPHIC>", "position": 4 },
    { "token": "Under", "start_offset": 5, "end_offset": 6, "type": "<IDEOGRAPHIC>", "position": 5 },
    { "token": "To", "start_offset": 6, "end_offset": 7, "type": "<IDEOGRAPHIC>", "position": 6 }
  ]
}

5.6 IK Chinese analyzer

5.6.1 Basic use of the IK analyzer

See the official IK analyzer documentation on GitHub

5.6.2 Segmentation with ik_max_word

GET /_analyze
{
  "analyzer": "ik_max_word",
  "text": "It's not certain."
}

result
{
  "tokens": [
    { "token": "This thing", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 0 },
    { "token": "Sure", "start_offset": 2, "end_offset": 4, "type": "CN_WORD", "position": 1 },
    { "token": "Not coming down.", "start_offset": 4, "end_offset": 7, "type": "CN_WORD", "position": 2 },
    { "token": "不下", "start_offset": 4, "end_offset": 6, "type": "CN_WORD", "position": 3 },
    { "token": "Down", "start_offset": 5, "end_offset": 7, "type": "CN_WORD", "position": 4 }
  ]
}

5.6.3 Segmentation with ik_smart

GET /_analyze
{
  "analyzer": "ik_smart",
  "text": "It's not certain."
}

result
{
  "tokens": [
    { "token": "This thing", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 0 },
    { "token": "Sure", "start_offset": 2, "end_offset": 4, "type": "CN_WORD", "position": 1 },
    { "token": "Not coming down.", "start_offset": 4, "end_offset": 7, "type": "CN_WORD", "position": 2 }
  ]
}

5.6.4 Highlighting queries with word segmentation

GET companyinfo/_search
{
  "query": { "match": { "entName": "Beijing Letter Check" } },
  "highlight": {
    "pre_tags": ["<tag1>", "<tag2>"],
    "post_tags": ["</tag1>", "</tag2>"],
    "fields": { "entName": {} }
  },
  "from": 0,
  "size": 1
}

result
{
  "took": 2357,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 4665655,
    "max_score": 21.237017,
    "hits": [
      {
        "_index": "companyinfo", "_type": "companyinfo", "_id": "5083796b34f940698d9cb0ce2984f314", "_score": 21.237017,
        "_source": {
          "id": "5083796b34f940698d9cb0ce2984f314", "bossId": "05232caa241311-p-05232d3624131",
          "orgLogo": "https://static.xinchacha.com/companyLogo/5083796b34f940698d9cb0ce2984f314.png?Expires=1609503319&OSSAccessKeyId=LTAI4GF jBCimq7VBgRQ5LKfq&Signature=2DppCS5yzYTZvtMT45GYqdHtjkM%3D",
          "entName": "Beijing Xinchacha Credit Management Co., LTD.",
          "telphone": "400-900-6808", "website": "http://www.xcc.com", "email": "[email protected]",
          "introduction": "The main product of Beijing Xinchacha Credit Management Co., Ltd. is credit communication and encryption protection.",
          "readAddress": "Room 516, Floor 5, Building 1, Yard 5, Longyu North Street, Changping District, Beijing",
          "corporation": "Liu", "sameEnterprise": "< Associated Enterprise 1>", "enterpriseStatus": "Open",
          "regCapitalNumber": 1000, "regCapital": "10 million RMB", "contributedcapital": "",
          "regDate": "2019-08-02", "checkDate": "2019-08-02", "creditCode": "91110114MA01LTMB1Y", "orgCode": "MA01LTMB-1",
          "taxpayerIdNo": "91110114MA01LTMB1Y", "taxpayerQualification": "", "businessRegCode": "",
          "industry": "Information Transmission, Software and Information Technology Services",
          "enterpriseType": "Limited liability Company (sole natural person)",
          "businessTerm": "2019-08-02 to unlimited term", "staffSize": "", "contributors": "",
          "registrationAuthority": "Changping Branch of Beijing Administration for Industry and Commerce",
          "city": "Beijing", "town": "Beijing", "district": "", "oldOrgName": "", "orgNameEn": "",
          "address": "Room 516, Floor 5, Building 1, Yard 5, Longyu North Street, Changping District, Beijing",
          "businessScope": "Collection and evaluation of enterprise credit (excluding financial credit investigation); Software development; Computer system services; Enterprise management; Market research; Economic information consultation (excluding intermediary); Basic software services; Application software services (excluding medical software); To undertake exhibitions and exhibitions; Conference services; Technology development, technology consultation, technology exchange, technology transfer and technology popularization; Technical services; Design, production, agency, advertising; Educational consulting. (Enterprises independently choose business projects and carry out business activities in accordance with the law; For projects subject to approval according to law, business activities shall be carried out according to the approved contents after approval by relevant departments; Shall not engage in business activities of projects prohibited or restricted by the municipal industrial policies.)"
        },
        "highlight": {
          "entName": [
            "<tag1>Beijing</tag1><tag1>letter</tag1><tag1>check</tag1> credit management co., LTD."
          ]
        }
      }
    ]
  }
}

5.6.5 Fields of the analysis result

{
  "tokens": [
    { "token": "This thing", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 0 }
  ]
}

token: the term itself
start_offset: start position
end_offset: end position
type: token type
position: position of the token (index)

6 Search API

URI Search
- Uses query parameters in the URL
Request Body Search
- Uses the more complete, JSON-based Query Domain Specific Language (DSL) provided by Elasticsearch

6.1 URI query

Use "q" to specify the query string
"Query string syntax": key-value pairs

6.2 Request Body Query

6.3 Response parsing

6.3.1 Relevance

Search is a conversation between the user and the search engine
Users care about the relevance of the search results
- Was all the relevant content found?
- How much irrelevant content was returned?
- Are the documents scored reasonably?
- Does the ranking balance relevance with business requirements?

6.3.2 Measuring relevance

Information Retrieval
- Precision: return as few irrelevant documents as possible
  - Precision = True Positives / all results returned (True Positives + False Positives)
- Recall: return as many relevant documents as possible
  - Recall = True Positives / all results that should have been returned (True Positives + False Negatives)
- Ranking: whether results can be sorted by relevance
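The two ratios are easy to compute once the returned and the relevant document sets are known. A small Python sketch with hypothetical document IDs:

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for a set of returned and relevant doc IDs."""
    returned, relevant = set(returned), set(relevant)
    true_positives = len(returned & relevant)
    # Precision: share of the returned documents that are relevant.
    precision = true_positives / len(returned) if returned else 0.0
    # Recall: share of the relevant documents that were returned.
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

# The search returned documents 1-4; documents 2, 4 and 5 are actually relevant.
p, r = precision_recall(returned=[1, 2, 3, 4], relevant=[2, 4, 5])
print(f"precision={p:.2f} recall={r:.2f}")
```

Here two of the four returned documents are relevant (precision 0.50) and two of the three relevant documents were found (recall about 0.67).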

Note: see the Elasticsearch documentation on how relevance is calculated

7. URI search in detail

7.1 Searching through URI Query

GET /companyinfo/_search?q=company&df=entName&from=0&size=1&timeout=1s
{
  "profile": "true"
}

q: specifies the query, using Query String Syntax
df: the default field; if it is not specified, all fields are queried
sort: sorting; from and size are used for paging
profile: shows how the query is executed
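When such a URI is built in code, the parameters should be URL-encoded rather than concatenated by hand. A minimal Python sketch using only the standard library (the helper `uri_search_path` is hypothetical; it builds the path and leaves sending the request to whatever HTTP client is in use):

```python
from urllib.parse import urlencode

def uri_search_path(index, params):
    """Build the path and query string for a URI search on one index."""
    return f"/{index}/_search?{urlencode(params)}"

path = uri_search_path(
    "companyinfo",
    {"q": "company", "df": "entName", "from": 0, "size": 1, "timeout": "1s"},
)
print(path)
# /companyinfo/_search?q=company&df=entName&from=0&size=1&timeout=1s
```

The parameters are passed as a dict because `from` is a reserved word in Python and cannot be used as a keyword argument.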

7.2 Query String Syntax (1)

Specified field vs. generic query
- q=title:2012 / q=2012

# Query on a specified field
GET /companyinfo/_search?q=entName:company&from=0&size=1&timeout=1s
{
  "profile": "true"
}
result
  "took": 2."timed_out": false."_shards": {
    "total": 5."successful": 5."failed": 0
  },
  "hits": {
    "total": 3807."max_score": 17.098007."hits": [{"_index": "companyinfo"."_type": "companyinfo"."_id": "6355d4063b5311eb925000163e350731"."_score": 17.098007."_source": {
          "entName": "MEDSENTIAL,L.L.C"."orgLogo": ""."regCapital": ""."city": ""."regDate": ""."industry": ""."taxpayerIdNo": ""."creditCode": ""."registrationAuthority": ""."staffSize": ""."orgCode": ""."enterpriseStatus": ""."id": "6355d4063b5311eb925000163e350731"."businessRegCode": ""."email": ""."introduction": ""."regCapitalNumber": 0."website": ""."address": ""."town": ""."bossId": ""."corporation": "No."."businessScope": ""."businessTerm": "- up to unlimited term"."contributedcapital": ""."checkDate": ""."enterpriseType": ""."orgNameEn": ""."taxpayerQualification": ""."telphone": ""."district": ""."sameEnterprise": "< Associated enterprise >"."oldOrgName": ""."readAddress": ""."contributors": "0"}}},"profile": {
    "shards": [{"id": "[3Ja7gZvNRfSLKQ4iGlsUgg][companyinfo][2]"."searches": [{"query": [{"type": "TermQuery"."description": "entName:l"."time": "0.5806670000 ms"."time_in_nanos": 580667."breakdown": {
                  "score": 192605."build_scorer_count": 63."match_count": 0."create_weight": 204501."next_doc": 100255."match": 0."create_weight_count": 1."next_doc_count": 996."score_count": 792."build_scorer": 81454."advance": 0."advance_count": 0}}]."rewrite_time": 1644."collector": [{"name": "CancellableCollector"."reason": "search_cancelled"."time": "0.5750680000 ms"."time_in_nanos": 575068."children": [{"name": "TimeLimitingCollector"."reason": "search_timeout"."time": "0.4419630000 ms"."time_in_nanos": 441963."children": [{"name": "SimpleTopScoreDocCollector"."reason": "search_top_hits"."time": "0.3136840000 ms"."time_in_nanos": 313684}]}]}],"aggregations": []}, {"id": "[3Ja7gZvNRfSLKQ4iGlsUgg][companyinfo][3]"."searches": [{"query": [{"type": "TermQuery"."description": "entName:l"."time": "0.6785800000 ms"."time_in_nanos": 678580."breakdown": {
                  "score": 209943."build_scorer_count": 61."match_count": 0."create_weight": 266078."next_doc": 107535."match": 0."create_weight_count": 1."next_doc_count": 908."score_count": 759."build_scorer": 93295."advance": 0."advance_count": 0}}]."rewrite_time": 1855."collector": [{"name": "CancellableCollector"."reason": "search_cancelled"."time": "0.6240210000 ms"."time_in_nanos": 624021."children": [{"name": "TimeLimitingCollector"."reason": "search_timeout"."time": "0.4900560000 ms"."time_in_nanos": 490056."children": [{"name": "SimpleTopScoreDocCollector"."reason": "search_top_hits"."time": "0.3464840000 ms"."time_in_nanos": 346484}]}]}],"aggregations": []}, {"id": "[3Ja7gZvNRfSLKQ4iGlsUgg][companyinfo][4]"."searches": [{"query": [{"type": "TermQuery"."description": "entName:l"."time": "0.5367190000 ms"."time_in_nanos": 536719."breakdown": {
                  "score": 198601."build_scorer_count": 40."match_count": 0."create_weight": 167458."next_doc": 110958."match": 0."create_weight_count": 1."next_doc_count": 878."score_count": 742."build_scorer": 58041."advance": 0."advance_count": 0}}]."rewrite_time": 6874."collector": [{"name": "CancellableCollector"."reason": "search_cancelled"."time": "0.5998340000 ms"."time_in_nanos": 599834."children": [{"name": "TimeLimitingCollector"."reason": "search_timeout"."time": "0.4730040000 ms"."time_in_nanos": 473004."children": [{"name": "SimpleTopScoreDocCollector"."reason": "search_top_hits"."time": "0.3354350000 ms"."time_in_nanos": 335435}]}]}],"aggregations": []}, {"id": "[wQwmOosAQjSSL7-qjOg7Pw][companyinfo][0]"."searches": [{"query": [{"type": "TermQuery"."description": "entName:l"."time": "0.5707300000 ms"."time_in_nanos": 570730."breakdown": {
                  "score": 193154."build_scorer_count": 60."match_count": 0."create_weight": 206270."next_doc": 92610."match": 0."create_weight_count": 1."next_doc_count": 993."score_count": 744."build_scorer": 76898."advance": 0."advance_count": 0}}]."rewrite_time": 1685."collector": [{"name": "CancellableCollector"."reason": "search_cancelled"."time": "0.7637310000 ms"."time_in_nanos": 763731."children": [{"name": "TimeLimitingCollector"."reason": "search_timeout"."time": "0.6415250000 ms"."time_in_nanos": 641525."children": [{"name": "SimpleTopScoreDocCollector"."reason": "search_top_hits"."time": "0.3118390000 ms"."time_in_nanos": 311839}]}]}],"aggregations": []}, {"id": "[wQwmOosAQjSSL7-qjOg7Pw][companyinfo][1]"."searches": [{"query": [{"type": "TermQuery"."description": "entName:l"."time": "0.6177530000 ms"."time_in_nanos": 617753."breakdown": {
                  "score": 215893."build_scorer_count": 55."match_count": 0."create_weight": 165227."next_doc": 107813."match": 0."create_weight_count": 1."next_doc_count": 980."score_count": 770."build_scorer": 127014."advance": 0."advance_count": 0}}]."rewrite_time": 1355."collector": [{"name": "CancellableCollector"."reason": "search_cancelled"."time": "0.9852530000 ms"."time_in_nanos": 985253."children": [{"name": "TimeLimitingCollector"."reason": "search_timeout"."time": "0.8321480000 ms"."time_in_nanos": 832148."children": [{"name": "SimpleTopScoreDocCollector"."reason": "search_top_hits"."time": "0.3572100000 ms"."time_in_nanos": 357210}]}]}],"aggregations": []}]}}Copy the code

Term vs. Phrase (PhraseQuery)
- Beautiful Mind is equivalent to Beautiful OR Mind
- "Beautiful Mind" is equivalent to Beautiful AND Mind; a Phrase query also requires the terms to appear in the same order

GET /companyinfo/_search?q=entName:ALM INTERNATIONAL
{
  "profile": "true"
}

Grouping and quotation marks
- title:(Beautiful AND Mind)
- title:"Beautiful Mind"

Grouping

GET /companyinfo/_search?q=entName:(ALM INTERNATIONAL)
{
  "profile": "true"
}

Quotes

GET /companyinfo/_search?q=entName:"ALM INTERNATIONAL"
{
  "profile": "true"
}

7.3 Query String Syntax (2)

Boolean operations
- AND / OR / NOT, or && / || / !
- title:(matrix NOT reloaded)
Grouping
- + means must
- - means must_not
- title:(+matrix -reloaded)

AND example

GET /companyinfo/_search?q=entName:(ALM AND INTERNATIONAL)
{
  "profile": "true"
}

OR Operation Example

GET /companyinfo/_search?q=entName:(ALM OR INTERNATIONAL)
{
  "profile": "true"
}

NOT Operation Example

GET /companyinfo/_search?q=entName:(ALM NOT INTERNATIONAL)
{
  "profile": "true"
}

+ Operation Example

GET /companyinfo/_search?q=entName:(ALM %2BINTERNATIONAL)
{
  "profile": "true"
}
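
The + example writes the operator as %2B. A likely reason is that form-style URL decoding treats a literal + as a space, so the operator would be lost before the query string is parsed. The short Python sketch below only demonstrates that encoding behaviour with the standard library; it does not talk to ElasticSearch.

```python
from urllib.parse import quote, unquote_plus

query = "entName:(ALM +INTERNATIONAL)"

# Decoding the raw text form-style turns "+" into a space: the operator is gone.
decoded_raw = unquote_plus(query)

# Percent-encoding the query first turns "+" into %2B and the space into %20.
encoded = quote(query, safe=":()")

# After decoding, the "+" operator is still there.
round_trip = unquote_plus(encoded)

print(encoded)
print(round_trip)
```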

7.4 Query String Syntax (3)

Range queries
- Interval notation: [] is a closed interval, {} is an open interval
  - year:{2019 TO 2018}
  - year:[* TO 2018]
Math symbols
- year:>2010
- year:(>2010 && <=2018)
- year:(+>2010 +<=2018)

GET /companyinfo/_search?q=regCapitalNumber:[* TO 2018]
{
  "profile": "true"
}

7.5 Query String Syntax (4)

Wildcard query (wildcard queries are inefficient and use a lot of memory, so they are not recommended, especially with the wildcard at the beginning of the term)
- ? represents exactly 1 character, and * represents 0 or more characters
  - title:mi?d
  - title:be*
Regular expression
- title:/[bt]oy/
Fuzzy matching and proximity query
- title:but~1
- title:"but"~2

Wildcard query example

GET /companyinfo/_search?q=entName:b*&from=0&size=1&timeout=1s
{
  "profile": "true"
}

Fuzzy matching query example

GET /companyinfo/_search?q=entName:b~1&from=0&size=1&timeout=1s
{
  "profile": "true"
}

Proximity query example

GET /companyinfo/_search?q=entName:"B"~2
{
  "profile": "true"
}
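
To get a feel for what the number after ~ means in a fuzzy query such as title:but~1, here is a small Python sketch of edit distance. It uses the plain Levenshtein distance (insert, delete, substitute) as a stand-in; it is an illustration of the idea, not ElasticSearch's own implementation, and the candidate terms are made up.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                      # delete from a
                current[j - 1] + 1,                   # insert into a
                previous[j - 1] + (char_a != char_b)  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


# Hypothetical terms in an index; keep those within 1 edit of "but".
terms = ["bat", "boat", "but", "cut", "bought"]
matches = [term for term in terms if edit_distance("but", term) <= 1]
print(matches)
```

With a maximum distance of 1, "bat", "but" and "cut" are kept, while "boat" (2 edits) and "bought" (3 edits) are not.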

8 Introduction to the Request Body & Query DSL

8.1 Request Body Search

Send the query to ElasticSearch via the HTTP Request Body
Query DSL

POST /my_test_index,my_store/_search?ignore_unavailable=true
{
  "profile": true,
  "query": {
    "match_all": {}
  }
}

8.1.1 Paging

POST /my_store/_search
{
  "from": 0,
  "size": 20,
  "query": {
    "match_all": {}
  }
}

from starts at 0, and 10 results are returned by default
The further back the page, the higher the cost of fetching it
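
A small Python sketch of how a page number maps to from and size, and why later pages cost more. The cost model is a simplification under the assumption that every shard has to collect from + size candidates before the results are merged; both function names are hypothetical.

```python
def page_params(page, page_size):
    """Translate a 1-based page number into the from/size parameters."""
    return {"from": (page - 1) * page_size, "size": page_size}


def candidates_to_merge(from_, size, shards):
    """Simplified cost model: each shard contributes from + size candidates."""
    return shards * (from_ + size)


first_page = page_params(page=1, page_size=20)
deep_page = page_params(page=500, page_size=20)

print(first_page, candidates_to_merge(first_page["from"], first_page["size"], shards=5))
print(deep_page, candidates_to_merge(deep_page["from"], deep_page["size"], shards=5))
```

Under this model, page 1 on 5 shards merges 100 candidates, while page 500 merges 50,000, although both return only 20 documents.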

8.1.2 Sorting

GET /my_store/_search
{
  "sort": [{"price": "desc"}],
  "from": 0,
  "size": 20,
  "query": {
    "match_all": {}
  }
}

It is best to sort on numeric and date fields
When sorting on multi-valued or analyzed fields, the system picks one value to sort by, and you cannot know which one

8.1.3 _source filtering

GET /my_store/_search
{
  "_source": ["price", "productAge"],
  "from": 0,
  "size": 20,
  "query": {
    "match_all": {}
  }
}

If _source is not stored, only the metadata of the matching documents is returned
_source supports wildcards, for example "_source": ["name*", "desc*"]

8.1.4 Script Fields

GET my_store/_search
{
  "script_fields": {
    "new_field": {
      "script": {
        "lang": "painless",
        "source": "doc['productName'].value+'hello'"
      }
    }
  },
  "query": {
    "match_all": {}
  }
}

8.1.5 Using query expressions: match

GET /my_store/_search
{
  "query": {
    "match": {
      "productID": 30
    }
  }
}

GET /my_store/_search
{
  "query": {
    "match": {
      "productName": {
        "query": "ZHANGSAN",
        "operator": "and"
      }
    }
  }
}

8.1.6 Phrase search: match_phrase

GET my_store/_search
{
  "query": {
    "match_phrase": {
      "content": {
        "query": "wang san",
        "slop": 1
      }
    }
  }
}

Result
{
  "took": 0, "timed_out": false,
  "_shards": {"total": 1, "successful": 1, "failed": 0},
  "hits": {
    "total": 1, "max_score": 1.0942618,
    "hits": [{"_index": "my_store", "_type": "products", "_id": "AXdIzcDtomOanSvnaKZX", "_score": 1.0942618, "_source": {
          "content": "my name is wang san"}}]}}

9 Query String and Simple Query String

9.1 Query String Query

Similar to the URI Query

POST my_store/_search
{
  "query": {
    "query_string": {
      "default_field": "content",
      "query": "my name is"
    }
  }
}

POST my_store/_search
{
  "query": {
    "query_string": {
      "fields": ["content", "productName"],
      "query": "(my name is) OR (ZHANGSAN)"
    }
  }
}

9.2 Simple Query String Query

Similar to Query String, but it ignores invalid syntax and supports only part of the query syntax
AND, OR and NOT are not supported as operators and are treated as plain strings
The default relationship between terms is OR; default_operator can be specified
Supports partial logic
- + replaces AND
- | replaces OR
- - replaces NOT
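
As a minimal sketch, a simple_query_string request body using the +, | and - operators can be built as a plain Python dictionary and serialized to JSON. The helper name and the query text are made up for illustration; the "content" field is the one used elsewhere in this article.

```python
import json


def simple_query_body(query, fields, default_operator="OR"):
    """Build a simple_query_string request body (hypothetical helper)."""
    return {
        "query": {
            "simple_query_string": {
                "query": query,
                "fields": fields,
                "default_operator": default_operator,
            }
        }
    }


# "+" acts as AND and "-" negates a term; a literal "AND" would be treated as a word.
body = simple_query_body("wang +san -si", ["content"])
payload = json.dumps(body, indent=2)
print(payload)
```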

POST my_store/_search
{
  "query": {
    "simple_query_string": {
      "query": "my name is",
      "fields": ["content"],
      "default_operator": "AND"
    }
  }
}

10 Dynamic Mapping and common field types

10.1 What is Mapping

Mapping is similar to the definition of schema in a database
- Define the field types in the index
- Define the data types of fields, such as strings, numbers, booleans…
- Configure how each field is handled in the inverted index (Analyzed or Not Analyzed, Analyzer)
Mapping maps JSON documents into the flat format that Lucene needs
A Mapping belongs to the Type of an index
- Each document belongs to a Type
- A Type has a Mapping definition
- Starting with 7.0, you no longer need to specify type information in the Mapping definition

10.2 Data type of the field

Simple types
- Text/Keyword
- Date
- Integer/Floating point
- Boolean
- IPv4 & IPv6
Complex types
- Object type/nested type
Special types
- geo_point & geo_shape/percolator

10.3 What is Dynamic Mapping

When a document is written, an index is automatically created if it does not exist
The Dynamic Mapping mechanism eliminates the need to define Mappings manually. ElasticSearch automatically infers the field type from the document content
But sometimes the inference is wrong, for example for geographical location information
When the type is not set correctly, some features, such as Range queries, will not work properly
View the mapping information of my_store

GET my_store/_mapping

{
  "embranchment_v1": {
    "mappings": {
      "embranchment_v1": {
        "_all": {
          "enabled": false
        },
        "date_detection": false,
        "properties": {
          "companyId": {
            "type": "keyword"
          },
          "embranchmentName": {
            "type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}
          },
          "id": {
            "type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}
          },
          "principal": {
            "type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}
          },
          "regDate": {
            "type": "text"
          },
          "relation": {
            "type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}
          },
          "status": {
            "type": "keyword"
          }
        }
      },
      "_default_": {
        "_all": {
          "enabled": false
        }
      }
    }
  }
}

10.4 Automatic Type Identification

JSON type	ElasticSearch type
String	date if it matches a date format; float or long if it is a number (this option is off by default); otherwise text, with a keyword sub-field added
Boolean	boolean
Floating-point number	float
Integer	long
Object	object
Array	Determined by the type of the first non-null value
Null value	Ignored
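
The rules in the table can be sketched as a small Python function. This is a simplified model for illustration only: the date check accepts just one format (yyyy-MM-dd) as a stand-in for the real date detection, and the numeric_detection flag mirrors the "off by default" number option in the table.

```python
from datetime import datetime


def looks_like_date(text):
    """Stand-in for date detection: accepts only the yyyy-MM-dd format."""
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def infer_type(value, numeric_detection=False):
    """Return the inferred field type, or None when the value is ignored."""
    if value is None:
        return None                      # null values are ignored
    if isinstance(value, bool):          # check bool before int (bool is an int in Python)
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):          # type of the first non-null element
        for item in value:
            item_type = infer_type(item, numeric_detection)
            if item_type is not None:
                return item_type
        return None
    if looks_like_date(value):
        return "date"
    if numeric_detection:
        try:
            int(value)
            return "long"
        except ValueError:
            try:
                float(value)
                return "float"
            except ValueError:
                pass
    return "text"                        # plus a keyword sub-field


doc = {"testId": "4", "testName": "zhaoliu", "created": "2021-01-22", "tags": [None, 3]}
inferred = {field: infer_type(value) for field, value in doc.items()}
print(inferred)
```

Note that "4" is inferred as text here because number detection is off, which is the kind of result that later breaks Range queries.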

10.5 Can I change the field type of a Mapping

Two cases
- Newly added fields
  - When dynamic is set to true, the Mapping is updated as soon as a document with a new field is written
  - When dynamic is set to false, the Mapping is not updated; the new field's data is not indexed, but it still appears in _source
  - When dynamic is set to strict, writing the document fails
- Existing fields: once data has been written, the field definition can no longer be changed
  - Lucene's inverted index cannot be modified once it has been generated
- To change a field type, you must rebuild the index with the Reindex API
Why
- If the data type of a field were changed, data that has already been indexed could no longer be searched
- Adding a new field does not cause this problem

10.6 Controlling Dynamic Mappings

When dynamic is set to false, a document containing new fields can still be written, but the new fields are not indexed
When dynamic is set to strict, the write fails

	true	false	strict
Document can be indexed	YES	YES	NO
New field is indexed	YES	NO	NO
Mapping is updated	YES	NO	NO
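
The three columns of the table can be mimicked with a toy model in Python. It only reproduces the decision logic described above (accept or reject the document, which fields become searchable, whether the mapping grows); the function name and the placeholder type are invented for the sketch.

```python
def write_document(mapping, doc, dynamic="true"):
    """Toy model of the dynamic setting.

    Returns (accepted, indexed_fields) and updates `mapping` in place
    when dynamic is "true".
    """
    new_fields = [field for field in doc if field not in mapping]
    if new_fields and dynamic == "strict":
        return False, []                       # the whole write is rejected
    if dynamic == "true":
        for field in new_fields:
            mapping[field] = "inferred"        # placeholder for the inferred type
    indexed = [field for field in doc if field in mapping]
    return True, indexed


doc = {"name": "zhaoliu", "age": 30}

mapping_true = {"name": "text"}
result_true = write_document(mapping_true, doc, dynamic="true")

mapping_false = {"name": "text"}
result_false = write_document(mapping_false, doc, dynamic="false")

mapping_strict = {"name": "text"}
result_strict = write_document(mapping_strict, doc, dynamic="strict")

print(result_true, result_false, result_strict)
```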

PUT my_store
{
  "mappings": {
    "_doc": {
      "dynamic": "false"
    }
  }
}

11 Explicit Mapping settings and common parameters

11.1 Suggestions for Customizing the Mapping

You can refer to the API manual and write it by hand
To reduce typing and the chance of errors, you can follow these steps
- Create a temporary index and write some sample data
- Get the dynamic Mapping definition of this temporary index through the Mapping API
- Modify it, then use this configuration to create your index
- Delete the temporary index

11.2 Control whether the current field is indexed

index controls whether the current field is indexed. The default is true. If set to false, the field is not searchable
This avoids creating an inverted index for the field and saves disk space

11.3 Index Options

There are four levels of index_options that control what is recorded in the inverted index
- docs: records the doc id
- freqs: records the doc id and term frequencies
- positions: records the doc id, term frequencies and term positions
- offsets: records the doc id, term frequencies, term positions and character offsets
The default for the text type is positions, and the default for the other types is docs
The more that is recorded, the more storage space is used

11.4 null_value

Use it when you need to search for null values
Only the keyword type supports null_value

11.5 copy_to setting

_all was replaced by copy_to in 7.x
It meets some specific search requirements
copy_to copies the value of a field to a target field, similar to _all
The target field of copy_to does not appear in _source
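
The parameters from 11.2 to 11.5 can be put together in one mapping definition. The sketch below builds such a request body as a Python dictionary; all field names are hypothetical, and the layout assumes the typeless 7.x format (on older versions the properties would be nested under a type name).

```python
import json

# Hypothetical fields, one per parameter discussed above.
mapping_body = {
    "mappings": {
        "properties": {
            # index: false -> stored in _source but not searchable
            "internalNote": {"type": "text", "index": False},
            # index_options: record everything down to character offsets
            "bio": {"type": "text", "index_options": "offsets"},
            # null_value: lets you search for documents where the value is null
            "mobile": {"type": "keyword", "null_value": "NULL"},
            # copy_to: both values are copied into fullName for searching
            "firstName": {"type": "text", "copy_to": "fullName"},
            "lastName": {"type": "text", "copy_to": "fullName"},
        }
    }
}

payload = json.dumps(mapping_body, indent=2)
print(payload)
```

The payload would be sent as the body of an index-creation request such as PUT on a new index name.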

11.6 Array Types

ElasticSearch does not provide a dedicated array type, but any field can contain multiple values of the same type

12 Multi-field feature and configuring a custom Analyzer in the Mapping

12.1 Multi-field type

Exact matching of manufacturer names
- Add a keyword field
Use a different Analyzer
- Different languages
- Pinyin field search
- It also supports specifying different Analyzers for search and index

12.2 Exact Values vs. Full Text

Exact Values vs. Full Text
- Exact Value: numbers, dates, or a specific string (for example, “Apple Store”)
  - keyword in ElasticSearch
- Full Text: unstructured text data
  - text in ElasticSearch

12.3 Exact Values do not need to be analyzed

ElasticSearch creates an inverted index for each field
- An Exact Value does not need any special analysis (word segmentation) when it is indexed

12.4 Custom analyzers

When the analyzers built into ElasticSearch do not meet your needs, you can build a custom analyzer by combining different components
- Character Filter
- Tokenizer
- Token Filter

12.4.1 Character Filter

Processes the text before the Tokenizer, for example by adding, deleting, or replacing characters. Multiple Character Filters can be configured. They affect the Tokenizer's position and offset information
Some built-in Character Filters
- HTML strip: removes HTML tags
- Mapping: string replacement
- Pattern replace: regular-expression replacement

12.4.2 Tokenizer

The original text is segmented into terms or tokens according to certain rules.
Tokenizers built into ElasticSearch
- whitespace/standard/uax_url_email/pattern/keyword/path hierarchy
You can develop plug-ins in Java to implement your own Tokenizer

12.4.3 Token Filters

Add, modify, or delete the terms output by the Tokenizer
Built-in Token Filters
- lowercase/stop/synonym
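
The three stages can be imitated in plain Python to show how they chain together. This is a toy pipeline, not how ElasticSearch implements its analyzers: the HTML stripping is a naive regular expression, the tokenizer simply splits on whitespace, and the stop-word list is a made-up sample.

```python
import re

STOP_WORDS = {"a", "an", "the", "is", "of"}   # sample list for the sketch


def char_filter_html_strip(text):
    """Character Filter stage: replace HTML tags with a space."""
    return re.sub(r"<[^>]+>", " ", text)


def tokenizer_whitespace(text):
    """Tokenizer stage: split the text on whitespace."""
    return text.split()


def token_filter_lowercase(tokens):
    """Token Filter stage: lowercase every token."""
    return [token.lower() for token in tokens]


def token_filter_stop(tokens):
    """Token Filter stage: drop stop words."""
    return [token for token in tokens if token not in STOP_WORDS]


def analyze(text):
    """Run Character Filter -> Tokenizer -> Token Filters, in that order."""
    text = char_filter_html_strip(text)
    tokens = tokenizer_whitespace(text)
    tokens = token_filter_lowercase(tokens)
    return token_filter_stop(tokens)


tokens = analyze("<b>The Quick</b> Brown Fox")
print(tokens)
```

The order matters: lowercasing runs before the stop filter, so "The" is removed as well.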

13 Index Template and Dynamic Template

13.1 What is Index Template?

An Index Template helps you define Mappings and Settings, which are automatically applied to newly created indexes that match certain rules
- Templates only take effect when an index is newly created. Modifying a template does not affect indexes that have already been created
- You can define multiple index templates, and their settings are merged
- You can control the merge process by specifying the value of “order”

13.2 How Index Template works

When an index is newly created
- ElasticSearch's default Settings and Mappings are applied
- The settings from Index Templates with a low order value are applied
- The settings from Index Templates with a higher order value are applied, overwriting the earlier ones
- The Settings and Mappings specified by the user when creating the index are applied last and override the settings from the templates
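
The order in which settings are layered can be sketched in Python. This is a simplified model of the steps above that only merges flat settings dictionaries; the template structure (index_patterns, order, settings) and all names and values are assumptions made for the example.

```python
from fnmatch import fnmatch


def resolve_settings(index_name, defaults, templates, user_settings):
    """Layer defaults, matching templates (low order first) and user settings."""
    settings = dict(defaults)
    matching = [
        template for template in templates
        if any(fnmatch(index_name, pattern) for pattern in template["index_patterns"])
    ]
    for template in sorted(matching, key=lambda template: template["order"]):
        settings.update(template["settings"])   # higher order overwrites lower order
    settings.update(user_settings)              # explicit settings win
    return settings


defaults = {"number_of_shards": 1, "number_of_replicas": 1}
templates = [
    {"index_patterns": ["test*"], "order": 1, "settings": {"number_of_replicas": 2}},
    {"index_patterns": ["*"], "order": 0, "settings": {"number_of_replicas": 0}},
]

resolved = resolve_settings("testmy", defaults, templates, {"number_of_shards": 3})
print(resolved)
```

For the index "testmy", the order-0 template sets the replicas to 0, the order-1 template raises them to 2, and the user's shard count overrides the default.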

13.3 What is Dynamic Template

Dynamically set the field type based on the data type identified by ElasticSearch, combined with the field name
- All string types are set to keyword, or the keyword sub-field is turned off
- Fields whose names start with is are set to boolean
- Fields whose names start with long_ are set to long

A Dynamic Template is defined in the Mapping of a specific index
Each template has a name
The matching rules are an array
Each rule sets the Mapping for the fields it matches
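
A Dynamic Template definition covering the three rules above might look like the following, written here as a Python dictionary and serialized to JSON. The template names are invented, and the layout assumes the typeless 7.x mapping format. The more specific rules are listed first, on the assumption that the first matching template is the one applied.

```python
import json

mapping_body = {
    "mappings": {
        "dynamic_templates": [
            {   # string fields whose names start with "is" become boolean
                "is_prefix_as_boolean": {
                    "match_mapping_type": "string",
                    "match": "is*",
                    "mapping": {"type": "boolean"},
                }
            },
            {   # fields whose names start with "long_" become long
                "long_prefix_as_long": {
                    "match": "long_*",
                    "mapping": {"type": "long"},
                }
            },
            {   # every other string field becomes keyword
                "strings_as_keyword": {
                    "match_mapping_type": "string",
                    "mapping": {"type": "keyword"},
                }
            },
        ]
    }
}

template_names = [name for entry in mapping_body["mappings"]["dynamic_templates"] for name in entry]
payload = json.dumps(mapping_body, indent=2)
print(template_names)
```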

14 Overview of ElasticSearch Aggregation Analysis

14.1 What is Aggregation?

In addition to search, ElasticSearch provides statistical analysis of the data stored in ES
- High real-time performance
- Hadoop (T+1)
By aggregating, we get an overview of the data, analyzing and summarizing the whole set of data rather than looking for individual documents
- The number of rooms in an area
- The number of bookable hotels in different price ranges
High performance: a single statement is enough to get analysis results from ElasticSearch
- You don’t need to implement the analysis logic yourself on the client side

Aggregation analysis in Kibana

14.2 Classification of aggregations

Bucket Aggregation: a collection of documents that meet specific criteria
Metric Aggregation: mathematical operations that perform statistical analysis on document fields
Pipeline Aggregation: performs a second aggregation on the results of other aggregations
Matrix Aggregation: supports operations on multiple fields and returns a result matrix

14.2.1 Bucket & Metric

Metric: a series of statistical methods
Bucket: a set of documents that meet certain criteria
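
The difference between the two can be shown on a handful of in-memory documents. The Python sketch below groups documents into buckets by a field and then computes metrics per bucket; it is a conceptual illustration with made-up data, not what ElasticSearch does internally. The field names follow the companyinfo examples in this article.

```python
from collections import defaultdict

# Made-up sample documents.
docs = [
    {"city": "Beijing", "regCapitalNumber": 500},
    {"city": "Beijing", "regCapitalNumber": 1200},
    {"city": "Shanghai", "regCapitalNumber": 300},
]


def terms_buckets(documents, field):
    """Bucket: group the documents by the value of a field."""
    buckets = defaultdict(list)
    for document in documents:
        buckets[document[field]].append(document)
    return buckets


def metric(documents, field, func):
    """Metric: apply a statistical function to one field of the documents."""
    return func(document[field] for document in documents)


buckets = terms_buckets(docs, "city")
summary = {
    key: {
        "doc_count": len(members),
        "max": metric(members, "regCapitalNumber", max),
        "min": metric(members, "regCapitalNumber", min),
    }
    for key, members in buckets.items()
}
print(summary)
```

Here the metrics are computed inside each bucket; a metric can also be computed over all matching documents at once.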

14.2.1.1 Bucket

Some examples
- Hangzhou belongs to Zhejiang / an actor is either male or female
- Nested relationships: Hangzhou belongs to Zhejiang, which belongs to China, which belongs to Asia
ElasticSearch provides many types of buckets to help you partition documents in a variety of ways
- Term & Range (time/age/geographical location)

Aggregate the number of enterprises by region

GET companyinfo/_search
{
  "size": 0,
  "aggs": {
    "flight_dest": {
      "terms": {
        "field": "city"
      }
    }
  }
}

Result
{
  "took": 6485, "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "failed": 0},
  "hits": {"total": 157377743, "max_score": 0, "hits": []},
  "aggregations": {
    "flight_dest": {
      "doc_count_error_upper_bound": 1243691,
      "sum_other_doc_count": 97419767,
      "buckets": [
        {"key": "", "doc_count": 25140413},
        {"key": "Guangdong province", "doc_count": 6211429},
        {"key": "Jiangsu Province", "doc_count": 4535255},
        {"key": "Shandong Province", "doc_count": 4360040},
        {"key": "Beijing", "doc_count": 4061179},
        {"key": "Shanghai", "doc_count": 3813461},
        {"key": "Zhejiang Province", "doc_count": 3713236},
        {"key": "Sichuan Province", "doc_count": 2834499},
        {"key": "Henan Province", "doc_count": 2747657},
        {"key": "Hebei Province", "doc_count": 2540804}
      ]
    }
  }
}

14.2.1.2 Adding Metrics

Aggregate enterprises by city and get the maximum and minimum registered capital

GET companyinfo/_search
{
  "size": 0,
  "aggs": {
    "flight_dest": {
      "terms": {
        "field": "city"
      }
    },
    "max_price": {
      "max": {
        "field": "regCapitalNumber"
      }
    },
    "min_price": {
      "min": {
        "field": "regCapitalNumber"
      }
    }
  }
}

Result
{
  "took": 26141, "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "failed": 0},
  "hits": {"total": 157377743, "max_score": 0, "hits": []},
  "aggregations": {
    "max_price": {
      "value": 18944818812
    },
    "min_price": {
      "value": 0
    },
    "flight_dest": {
      "doc_count_error_upper_bound": 1243691,
      "sum_other_doc_count": 97419767,
      "buckets": [
        {"key": "", "doc_count": 25140413},
        {"key": "Guangdong province", "doc_count": 6211429},
        {"key": "Jiangsu Province", "doc_count": 4535255},
        {"key": "Shandong Province", "doc_count": 4360040},
        {"key": "Beijing", "doc_count": 4061179},
        {"key": "Shanghai", "doc_count": 3813461},
        {"key": "Zhejiang Province", "doc_count": 3713236},
        {"key": "Sichuan Province", "doc_count": 2834499},
        {"key": "Henan Province", "doc_count": 2747657},
        {"key": "Hebei Province", "doc_count": 2540804}
      ]
    }
  }
}

ElasticSearch (Getting Started)