1. Basic concepts of ElasticSearch
1.1 Document
- ElasticSearch is document-oriented; a document is the smallest unit of searchable data. Examples:
  - A log entry in a log file
  - The details of a movie or of an album
  - The details of a song in an MP3 player, or of a PDF file
- A document is serialized as a JSON object made up of fields, and each field has a field type (string, numeric, boolean, date, binary, range)
- Each document has a unique ID. You can specify the ID yourself or let ElasticSearch generate it automatically
1.2 JSON document
A document contains a series of fields, similar to a record in a database table. The JSON format is flexible and does not require a predefined schema:
- Field types can be specified explicitly or inferred automatically by ElasticSearch
- Arrays and nested objects are supported
1.3 Metadata of documents
{
"_index": "my_test_index",
"_type": "test_idnex",
"_id": "AXcpGrIeEQcMCfQJ7Gc5",
"_score": 1,
"_source": {
"testId": "4",
"testName": "zhaoliu"
}
}
Metadata is the information that describes the document itself:
- _index: the name of the index the document belongs to
- _type: the type the document belongs to
- _id: the unique ID of the document
- _source: the original JSON data of the document
- _all: consolidates the contents of all fields into one field; deprecated
- _version: the version of the document
- _score: the relevance score
1.4 Index
{
  "my_test_index": {
    "settings": {
      "index": {
        "search": {
          "slowlog": {
            "level": "info",
            "threshold": {
              "fetch": { "warn": "200ms", "trace": "50ms", "debug": "80ms", "info": "100ms" },
              "query": { "warn": "200ms", "trace": "50ms", "debug": "80ms", "info": "100ms" }
            }
          }
        },
        "indexing": {
          "slowlog": {
            "level": "info",
            "threshold": {
              "index": { "warn": "200ms", "trace": "20ms", "debug": "50ms", "info": "100ms" }
            },
            "source": "1000"
          }
        },
        "number_of_shards": "5",
        "provided_name": "my_test_index",
        "creation_date": "1611301841428",
        "unassigned": { "node_left": { "delayed_timeout": "5m" } },
        "number_of_replicas": "1",
        "uuid": "e5B65ySmQ-GE8Tj9gUHIPw",
        "version": { "created": "5050399" }
      }
    }
  }
}
- An index is a container for documents: a collection of documents of the same kind
- An index is a logical namespace: each index has its own mapping, which defines the field names and field types of the documents it contains
- A shard is the physical counterpart: the data of an index is spread across shards
- Each index has a Mapping and Settings
  - The mapping defines the types of the document fields
  - The settings define how the data is distributed (for example the number of shards and replicas)
1.5 Type
- Prior to 6.0, multiple types could be set for an index
- Since 6.0, types have been deprecated (not recommended) and a newly created index can hold only one type. As of 7.0 the only type is "_doc"
1.6 Abstraction and analogy
RDBMS | ElasticSearch |
---|---|
Table | Index(Type) |
Row | Document |
Column | Field |
Schema | Mapping |
SQL | DSL |
- The difference between a traditional relational database and ElasticSearch
  - ElasticSearch: schemaless, relevance scoring, high-performance full-text search
  - RDBMS: transactions, joins
2. Nodes, clusters, shards, and replicas
2.1 Distributed Features
- Benefits of ElasticSearch's distributed architecture
  - Storage capacity scales horizontally
  - Availability improves: the cluster as a whole keeps serving when some nodes stop
- ElasticSearch's distributed architecture
  - Clusters are distinguished by name; the default name is "elasticsearch"
  - The cluster name can be changed in the configuration file or on the command line with -E cluster.name=geektime
  - A cluster can have one or more nodes
2.2 Node
- A node is an instance of ElasticSearch
  - Multiple ElasticSearch processes can run on one machine, but in production it is generally recommended to run only one instance per machine
- Each node has a name, set in the configuration file or at startup with -E node.name=node1
- Each node is assigned a UID after startup, which is stored in the data directory
2.2.1 Master-eligible Nodes and Master Nodes
- Every node is master-eligible by default when it starts
  - Set node.master: false to disable this
- Master-eligible nodes can take part in the master election and become the master node
- When the first node starts, it elects itself as the master node
- Every node stores the cluster state, but only the master node can change it
  - The cluster state holds the essential information about the cluster
    - Information about all nodes
    - All indexes and their mapping and settings
    - Shard routing information
  - If any node could modify this information, the data would become inconsistent
2.2.2 Data Node & Coordinating Node
- Data Node
  - A node that can store data is called a data node. It holds shard data and plays a crucial role in scaling data capacity
- Coordinating Node
  - Receives client requests, distributes them to the appropriate nodes, and finally gathers the results together
  - Every node acts as a coordinating node by default
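The "distribute, then gather" role of a coordinating node can be illustrated with a small sketch. This is a toy model under stated assumptions, not ElasticSearch's implementation: the function name `coordinate_search` and the `(score, doc_id)` hit shape are hypothetical, and each shard is assumed to have already returned its own hits.

```python
import heapq


def coordinate_search(shard_results, size):
    """Merge per-shard hits into one global top-`size` list.

    shard_results: list of hit lists, one per shard; each hit is a
    (score, doc_id) tuple. This shape is an assumption for illustration.
    """
    # Flatten the hits of every shard into one stream ...
    all_hits = (hit for hits in shard_results for hit in hits)
    # ... and keep only the best `size` hits by score, highest first.
    return heapq.nlargest(size, all_hits, key=lambda hit: hit[0])


shard_results = [
    [(3.2, "a"), (1.1, "b")],   # hits from shard 0
    [(2.5, "c")],               # hits from shard 1
    [(4.0, "d"), (0.3, "e")],   # hits from shard 2
]
print(coordinate_search(shard_results, size=3))
```

The point of the sketch is only that the merge happens on the node that received the request, which is why every node can act as a coordinating node.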
2.2.3 Other Nodes
- Hot & Warm Node (reference link: Hot & Warm Node)
  - Data nodes with different hardware configurations, used to implement a Hot & Warm architecture and reduce the cost of cluster deployment
- Machine Learning Node
  - Runs machine learning jobs, which are used for anomaly detection
- Tribe Node (a coordinating node that acts as a federated client across multiple clusters)
  - A tribe node connects to different ElasticSearch clusters and lets them be treated as a single cluster
2.2.4 Configuring the Node Type
- In a development environment, a node can play multiple roles
- In a production environment, each node should have a single role (dedicated node)
Node type | Configuration parameter | Default value |
---|---|---|
master eligible | node.master | true |
data | node.data | true |
ingest | node.ingest | true |
coordinating only | None | Every node is a coordinating node by default; to make a node coordinating only, set all other types to false |
machine learning | node.ml | true (requires X-Pack to be enabled) |
2.3 Shard (Primary Shard & Replica Shard)
- Primary shard: solves the problem of scaling data horizontally. With primary shards, data can be distributed across all nodes in the cluster
  - A shard is a running instance of Lucene
  - The number of primary shards is specified when the index is created and cannot be changed afterwards, except by reindexing
- Replica shard: solves the problem of high availability of data. A replica shard is a copy of a primary shard
  - The number of replica shards can be adjusted dynamically
  - Increasing the number of replicas also improves the availability of the service to some extent (read throughput)
- The distribution of the blogs index in a three-node cluster
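Why the number of primary shards cannot simply be changed becomes clearer with a sketch of document routing. In simplified form, a document is placed on shard `hash(routing) % number_of_primary_shards`, where the routing value defaults to the document ID. The code below is an illustration under assumptions: `zlib.crc32` is only a stand-in for the real hash function, and `route_to_shard` is a hypothetical helper name.

```python
import zlib


def route_to_shard(routing: str, number_of_primary_shards: int) -> int:
    """Pick a primary shard for a routing value (by default the document ID)."""
    # crc32 is a stand-in hash chosen for the sketch, not the real one.
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


doc_ids = [f"doc-{i}" for i in range(1000)]
with_5 = {d: route_to_shard(d, 5) for d in doc_ids}
with_6 = {d: route_to_shard(d, 6) for d in doc_ids}

# Count the documents whose shard would change if the index went from 5 to 6 primaries.
moved = sum(1 for d in doc_ids if with_5[d] != with_6[d])
print(f"{moved} of {len(doc_ids)} documents would map to a different shard")
```

Because most documents would land on a different shard after the change, existing data could no longer be found where it was written, which is why a reindex is needed instead.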
2.3.1 Shard settings
- Capacity planning is required in a production environment
- If the number of shards is too small
  - Nodes cannot be added later to scale horizontally
  - Each shard holds a large amount of data, so reallocating data takes a long time
- If the number of shards is too large (since 7.0 the default number of primary shards is 1, which addresses the over-sharding problem)
  - It affects the relevance scoring of search results and the accuracy of statistical results
  - Too many shards on a single node waste resources and hurt performance
Shards are also resources, and too many of them can affect cluster stability: more shards mean more metadata, which consumes heap memory. Too many shards can also affect read/write performance, since each read/write request requires one thread. If an index does not hold much data, there is no need to give it many shards.
2.4 Checking the Cluster Health Status
GET _cluster/health
{
"cluster_name": "es-cn-zz11rb9fv000fj1pe",
"status": "green",
"timed_out": false,
"number_of_nodes": 6,
"number_of_data_nodes": 3,
"active_primary_shards": 766,
"active_shards": 1507,
"relocating_shards": 0,
"initializing_shards": 0,
"unassigned_shards": 0,
"delayed_unassigned_shards": 0,
"number_of_pending_tasks": 0,
"number_of_in_flight_fetch": 0,
"task_max_waiting_in_queue_millis": 0,
"active_shards_percent_as_number": 100
}
GET _cat/nodes
172.1725.. 39 45 91 3 0.23 0.08 0.06 di - 3Ja7gZv
172.1725.53. 55 79 1 0.00 0.01 0.05 mi * H1guebi
172.1725.52. 22 78 0 0.01 0.02 0.05 mi - rdjzfmG
172.1725.51. 24 78 0 0.00 0.01 0.05 mi - uaU255o
172.1725.38. 54 91 2 0.23 0.26 0.16 di - wQwmOos
172.1725.40. 65 89 1 0.01 0.17 0.26 di - 4mZ8XK7
GET _cat/shards
companyinfo 4 r STARTED 31408061 38.5gb 172.1725.38. wQwmOos
companyinfo 4 p STARTED 31408061 40.2gb 172.1725.. 39 3Ja7gZv
companyinfo 1 p STARTED 31412834 43.2gb 172.1725.38. wQwmOos
companyinfo 1 r STARTED 31412834 41.7gb 172.1725.. 39 3Ja7gZv
companyinfo 3 r STARTED 31407535 37.6gb 172.1725.40. 4mZ8XK7
companyinfo 3 p STARTED 31407535 36.8gb 172.1725.. 39 3Ja7gZv
companyinfo 2 r STARTED 31412927 41.8gb 172.1725.40. 4mZ8XK7
companyinfo 2 p STARTED 31412927 41.2gb 172.1725.. 39 3Ja7gZv
companyinfo 0 p STARTED 31400572 40.4gb 172.1725.40. 4mZ8XK7
companyinfo 0 r STARTED 31400572 43.1gb 172.1725.38. wQwmOos
- Green: all primary shards and replica shards are allocated normally
- Yellow: all primary shards are allocated normally, but some replica shards are not
- Red: at least one primary shard could not be allocated
  - For example, a new index was created while the server's disk usage exceeded 85%
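The three colours follow directly from which kind of shard is unallocated. As a minimal sketch, assuming each shard is described by a hypothetical `(is_primary, is_allocated)` pair rather than the real cluster state:

```python
from typing import Iterable, Tuple


def cluster_status(shards: Iterable[Tuple[bool, bool]]) -> str:
    """Derive the health colour from (is_primary, is_allocated) pairs."""
    shards = list(shards)
    # Any unallocated primary shard makes the cluster red.
    if any(is_primary and not allocated for is_primary, allocated in shards):
        return "red"
    # All primaries are fine, but at least one replica is unallocated.
    if any(not allocated for _, allocated in shards):
        return "yellow"
    return "green"


print(cluster_status([(True, True), (False, True)]))    # everything allocated
print(cluster_status([(True, True), (False, False)]))   # a replica is missing
print(cluster_status([(True, False), (False, False)]))  # a primary is missing
```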
3. CRUD & batch operation of documents
3.1 CRUD of documents
- By convention the type name is _doc
- Create: fails if the ID already exists
- Index: if the ID does not exist, a new document is created; otherwise the existing document is deleted and a new one is indexed, and the version number increases
- Update: the document must already exist; only the fields given in the request are changed
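The difference between Create, Index and Update is easiest to see side by side. The class below is an in-memory toy model of the semantics listed above; `TinyIndex` and its methods are hypothetical names and say nothing about how ElasticSearch stores data.

```python
class TinyIndex:
    """In-memory toy model of Create / Index / Update semantics."""

    def __init__(self):
        self.docs = {}      # _id -> _source
        self.versions = {}  # _id -> _version

    def index(self, doc_id, source):
        # Index: create the document or replace it entirely; the version goes up.
        self.docs[doc_id] = dict(source)
        self.versions[doc_id] = self.versions.get(doc_id, 0) + 1
        return self.versions[doc_id]

    def create(self, doc_id, source):
        # Create: refuse to overwrite an existing ID.
        if doc_id in self.docs:
            raise KeyError(f"document {doc_id!r} already exists")
        return self.index(doc_id, source)

    def update(self, doc_id, partial):
        # Update: the document must exist; only the given fields change.
        if doc_id not in self.docs:
            raise KeyError(f"document {doc_id!r} does not exist")
        self.docs[doc_id].update(partial)
        self.versions[doc_id] += 1
        return self.versions[doc_id]


idx = TinyIndex()
idx.create("1", {"user": "mike", "comment": "You know,for search"})
idx.index("1", {"user": "jack"})           # replaces the whole document
idx.update("1", {"comment": "partial"})    # merges into the existing document
print(idx.docs["1"], idx.versions["1"])
```

Note how `index` drops the old `comment` field because it replaces the document, while `update` keeps `user` and only adds the new field.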
3.1.1 Index
PUT my_test_index/_doc/1
{
"user": "mike",
"comment": "You know,for search"
}
- Index differs from Create: if the document does not exist yet, the new document is indexed; otherwise the existing document is deleted, the new document is indexed, and the version is incremented by 1
3.1.2 Create
PUT my_test_index/_create/1
{
"user": "mike",
"comment": "You know,for search"
}
POST my_test_index/_doc
{
"user": "mike",
"comment": "You know,for search"
}
- Both automatically generated and explicitly specified document IDs are supported
  - Calling "POST /my_test_index/_doc" without an ID lets the system generate the document ID automatically
3.1.3 Read
GET my_test_index/_doc/1
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "my_store",
"_type": "products",
"_id": "5",
"_score": 1,
"_source": {
"price": 10,
"productName": "ZHANGSAN",
"productID": "XHDK-A-1293-#fJ3"
}
}
]
}
}
- If the document is found, HTTP 200 is returned
  - Document metadata
    - _index / _type
    - Version information: even if a document with the same ID has been deleted, the version number keeps increasing
    - _source contains all the original data of the document by default
- If the document cannot be found, HTTP 404 is returned
3.1.4 Update
POST my_test_index/_update/1
{
"doc": {
"user": "mike",
"comment": "You know,ElasticSearch"
}
}
- Update does not replace the whole document; it merges the supplied fields into the existing one
- Update uses the POST method, and the payload must be wrapped in "doc"
3.1.5 Delete
DELETE my_test_index/_doc/1
3.2 Bulk API
- Supports operation on different indexes in a single API call
- Four types of operations are supported
- Index
- Create
- Update
- Delete
- The index can be specified in the URI or in the payload
- The failure of a single operation does not affect other operations
- The return result includes the result of each operation
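A bulk request body is newline-delimited JSON: one action line per operation, followed by a source line for every action except delete, with a trailing newline at the end. The helper below is a sketch that only builds such a body as a string; `build_bulk_body` and the tuple shape of its input are assumptions made for illustration, and sending the body to the `_bulk` endpoint is left out.

```python
import json


def build_bulk_body(operations):
    """Build an NDJSON bulk body.

    operations: iterable of (action, metadata, payload) tuples, where
    payload is None for "delete". This input shape is hypothetical.
    """
    lines = []
    for action, meta, payload in operations:
        if action not in ("index", "create", "update", "delete"):
            raise ValueError(f"unsupported bulk action: {action}")
        # Action line, e.g. {"index": {"_index": "...", "_id": "..."}}
        lines.append(json.dumps({action: meta}))
        if action != "delete":
            if payload is None:
                raise ValueError(f"{action} needs a payload")
            # Source line; for "update" the payload is wrapped in "doc".
            lines.append(json.dumps(payload))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body([
    ("index", {"_index": "my_test_index", "_id": "1"}, {"user": "mike"}),
    ("update", {"_index": "my_test_index", "_id": "1"}, {"doc": {"user": "jack"}}),
    ("delete", {"_index": "my_test_index", "_id": "2"}, None),
])
print(body)
```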
3.3 Batch Reading -mget
Batch operations can reduce the cost of network connections and improve performance
GET /_mget
{
"docs": [
{
"_index": "my_store",
"_id": 1
},
{
"_index": "companyinfo",
"_id": "cd5b8daadc31482e84715da912a604f4"
}
]
}
result
{
"docs": [
{
"_index": "my_store",
"_type": "products",
"_id": "1",
"_version": 4,
"found": true,
"_source": {
"price": 12,
"productID": "XHDK-A-1293-#fJ3"
}
},
{
"_index": "companyinfo",
"_type": "companyinfo",
"_id": "cd5b8daadc31482e84715da912a604f4",
"_version": 1,
"found": true,
"_source": {
"entName": "Guangxi Golden Friend Haoyou Investment Co., LTD.",
"orgLogo": "",
"regCapital": "RMB 5 million",
"city": "Guangxi Zhuang Autonomous Region",
"regDate": "2017-05-17",
"industry": "Business Services",
"taxpayerIdNo": "91450800MA5L585X6F",
"creditCode": "91450800MA5L585X6F",
"registrationAuthority": "Guigang Market Supervision Administration",
"staffSize": "",
"orgCode": "MA5L585X-6",
"enterpriseStatus": "To be continued (in operation, in operation, on the books)",
"id": "cd5b8daadc31482e84715da912a604f4",
"businessRegCode": "450800000151505",
"email": "",
"introduction": "",
"regCapitalNumber": 500,
"website": "",
"address": "1 / F, Longsheng New Village, Jiefang North Road, Guigang City",
"town": "",
"bossId": "4b12e1b8d1ef11-p-4b12e276d1ef1",
"corporation": "Snow covered",
"businessScope": "Investment in cultural industry, tourism and tourist commodities; Investment in construction; Corporate image planning, marketing planning, event planning, stage modeling planning, wedding celebration planning; Exhibition services, conference services, etiquette services, photography services; Network information technology development, consulting, transfer services; Film and television planning consulting, enterprise management consulting, investment information consulting (the above items except the special provisions of the state); Television program production services (specific projects subject to the approval of the examination and approval department); Animation design; Retail of publications (specific projects subject to the approval of the examination and approval department), indoor and outdoor decoration engineering, architectural engineering design, municipal engineering, landscape engineering design (the above projects with the credit card operation); Catering services (specific items subject to the approval of the examination and approval department); Design, production, agency, release all kinds of domestic advertising; Performance broker (subject to the specific project approved by the examination and approval department); Government procurement, bidding agency, engineering consulting, land evaluation, real estate evaluation, assets evaluation, real estate evaluation audit, project settlement.",
"businessTerm": "Long term",
"contributedcapital": "",
"checkDate": "2017-05-17",
"enterpriseType": "Limited liability Company (sole natural person)",
"orgNameEn": "",
"taxpayerQualification": "",
"telphone": "",
"district": "",
"sameEnterprise": "< Associated Enterprise 3>",
"oldOrgName": "",
"readAddress": "1 / F, Longsheng New Village, Jiefang North Road, Guigang City",
"contributors": ""
}
}
]
}
3.4 Batch Query -msearch
3.5 Common errors
Problem | Cause |
---|---|
Unable to connect | The network or the cluster is faulty |
Connection cannot be closed | The network or a node is faulty |
429 | The cluster is too busy |
4XX | The request body is malformed |
500 | Cluster internal error |
4. Inverted indexes
4.1 Forward and inverted indexes
- An inverted index consists of two parts
  - Term Dictionary: records the terms of all documents and the association from each term to its posting list
    - The term dictionary is generally large; it can be implemented with a B+ tree or a hash table with chaining to support high-performance inserts and queries
  - Posting List: records the set of documents that contain a term; it is made up of posting entries
    - A posting entry contains
      - Document ID
      - Term frequency (TF): the number of times the term appears in the document, used for relevance scoring
      - Position: the position of the term among the document's tokens, used for phrase queries
      - Offset: the start and end character offsets of the term, used for highlighting
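The structure described above can be sketched in a few lines. This is a simplified illustration, not Lucene's on-disk format: a plain dict plays the role of the term dictionary, the tokenizer is just a `\w+` regular expression with lowercasing, and `build_inverted_index` is a hypothetical helper name.

```python
import re
from collections import defaultdict


def build_inverted_index(docs):
    """docs: {doc_id: text}. Returns {term: {doc_id: posting entry}}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # enumerate() gives the token position; the match gives character offsets.
        for position, match in enumerate(re.finditer(r"\w+", text)):
            term = match.group().lower()
            posting = index[term].setdefault(
                doc_id, {"tf": 0, "positions": [], "offsets": []}
            )
            posting["tf"] += 1                                       # term frequency
            posting["positions"].append(position)                    # for phrase queries
            posting["offsets"].append((match.start(), match.end()))  # for highlighting
    return dict(index)


docs = {
    1: "Mastering Elasticsearch",
    2: "Elasticsearch in Action, Elasticsearch Server",
}
index = build_inverted_index(docs)
print(sorted(index["elasticsearch"]))  # IDs of the documents containing the term
print(index["elasticsearch"][2])       # posting entry for document 2
```

Looking a term up is then a dictionary access followed by a walk over its posting entries, which is what makes term-based search fast compared with scanning every document.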
4.2 Inverted index of ElasticSearch
- Each field of a JSON document in ElasticSearch has its own inverted index
- A field can be configured not to be indexed
  - Advantage: saves storage space
  - Disadvantage: the field cannot be searched
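Excluding a field from indexing is done in the mapping with the `index` parameter set to `false`. The sketch below only builds the JSON body for creating an index; the field names `title` and `internal_note` are hypothetical, and the typeless `mappings.properties` layout assumes ElasticSearch 7.x (earlier versions nest the properties under a type name).

```python
import json

# Request body for "PUT <index name>"; the field names are made up for the example.
create_index_body = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},                           # indexed, searchable
            "internal_note": {"type": "text", "index": False},   # stored in _source only
        }
    }
}

print(json.dumps(create_index_body, indent=2))
```

With this mapping `internal_note` still comes back in `_source`, but a query against it would not be served from an inverted index.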
5. Word segmentation
5.1 Analysis and Analyzer
- Analysis (text analysis) is the process of converting full text into a series of terms (term/token); it is also known as word segmentation
- Analysis is carried out by an analyzer
  - You can use ElasticSearch's built-in analyzers or define a custom analyzer
- Besides converting text into terms when data is written, the same analyzer must also be used to analyze the query string at search time
5.2 Composition of the Analyzer
- An analyzer consists of three parts
  - Character Filters: process the raw text, for example stripping HTML
  - Tokenizer: splits the text into terms according to rules
  - Token Filters: post-process the terms, for example lowercasing, removing stop words, adding synonyms
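The three stages form a simple pipeline, which the following sketch mimics in plain Python. It is an illustration only: the function names are hypothetical, the HTML stripping is a crude regular expression, and the stop-word list is limited to the three words mentioned in this article.

```python
import re


def strip_html(text):
    """Character filter: replace HTML tags with a space."""
    return re.sub(r"<[^>]+>", " ", text)


def whitespace_tokenizer(text):
    """Tokenizer: split the text on whitespace."""
    return text.split()


def lowercase(tokens):
    """Token filter: lowercase every token."""
    return [t.lower() for t in tokens]


def remove_stop_words(tokens, stop_words=frozenset({"the", "a", "is"})):
    """Token filter: drop stop words."""
    return [t for t in tokens if t not in stop_words]


def analyze(text, char_filters, tokenizer, token_filters):
    # 1. character filters work on the raw text
    for char_filter in char_filters:
        text = char_filter(text)
    # 2. exactly one tokenizer turns the text into tokens
    tokens = tokenizer(text)
    # 3. token filters transform the token list, in order
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


print(analyze(
    "<b>The</b> Quick fox is HERE",
    char_filters=[strip_html],
    tokenizer=whitespace_tokenizer,
    token_filters=[lowercase, remove_stop_words],
))
# ['quick', 'fox', 'here']
```

The order of the token filters matters: lowercasing runs first so that "The" is recognised as the stop word "the".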
5.3 Built-in analyzers of ElasticSearch
- Standard Analyzer: the default analyzer; splits on word boundaries and lowercases
- Simple Analyzer: splits on non-letter characters (symbols are filtered out) and lowercases
- Stop Analyzer: lowercases and filters stop words (the, a, is)
- Whitespace Analyzer: splits on whitespace and does not lowercase
- Keyword Analyzer: no segmentation; the input is output unchanged as a single term
- Pattern Analyzer: splits with a regular expression; the default is \W+ (split on non-word characters)
- Language Analyzers: analyzers for more than 30 common languages
- Custom Analyzer: a user-defined analyzer
5.3.1 Standard Analyzer
- The default analyzer
- Splits on word boundaries
- Lowercases
5.3.2 Simple Analyzer
- Splits on non-letter characters; all non-letter characters are removed
- Lowercases
5.3.3 Whitespace Analyzer
- Splits on whitespace
5.3.4 Stop Analyzer
- Compared with the Simple Analyzer, it adds a stop filter
  - Removes stop words such as the, a, is
5.3.5 Keyword Analyzer
- Outputs the whole input as a single term, without segmentation
5.3.6 Pattern Analyzer
- Splits with a regular expression
- The default is \W+, which splits on non-word characters
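The splitting rules of several built-in analyzers can be approximated with regular expressions. The functions below are rough stand-ins written for illustration, not the Lucene implementations; they return only the token text, without offsets or positions.

```python
import re


def simple_analyzer(text):
    """Keep runs of letters only, lowercased (digits and symbols are dropped)."""
    return [m.group().lower() for m in re.finditer(r"[^\W\d_]+", text)]


def whitespace_analyzer(text):
    """Split on whitespace; case and punctuation are left untouched."""
    return text.split()


def keyword_analyzer(text):
    """Return the whole input as a single term."""
    return [text]


def pattern_analyzer(text, pattern=r"\W+"):
    """Split on a regular expression (default: non-word characters), lowercased."""
    return [t.lower() for t in re.split(pattern, text) if t]


sample = "Maste-ring ElasticSearch,elasticsearch in Action"
print(simple_analyzer(sample))
print(whitespace_analyzer(sample))
print(keyword_analyzer(sample))
print(pattern_analyzer(sample))
```

Running the four functions on the same sentence makes the differences visible at a glance: only the whitespace and keyword variants preserve case and punctuation.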
5.3.7 Language Analyzers
5.4 Using the _analyze API
- Specify the analyzer directly for testing
GET /_analyze
{
"analyzer": "standard",
"text": "Mastering ElasticSearch,elasticsearch in Action"
}
result
{
"tokens": [
{ "token": "mastering", "start_offset": 0, "end_offset": 9, "type": "<ALPHANUM>", "position": 0 },
{ "token": "elasticsearch", "start_offset": 10, "end_offset": 23, "type": "<ALPHANUM>", "position": 1 },
{ "token": "elasticsearch", "start_offset": 24, "end_offset": 37, "type": "<ALPHANUM>", "position": 2 },
{ "token": "in", "start_offset": 38, "end_offset": 40, "type": "<ALPHANUM>", "position": 3 },
{ "token": "action", "start_offset": 41, "end_offset": 47, "type": "<ALPHANUM>", "position": 4 }
]
}
- Use the Simple Analyzer for testing
GET /_analyze
{
"analyzer": "simple",
"text": "2 run 。Maste-ring ElasticSearch,elasticsearch in Action"
}
result
{
"tokens": [
{ "token": "run", "start_offset": 2, "end_offset": 5, "type": "word", "position": 0 },
{ "token": "maste", "start_offset": 7, "end_offset": 12, "type": "word", "position": 1 },
{ "token": "ring", "start_offset": 13, "end_offset": 17, "type": "word", "position": 2 },
{ "token": "elasticsearch", "start_offset": 18, "end_offset": 31, "type": "word", "position": 3 },
{ "token": "elasticsearch", "start_offset": 32, "end_offset": 45, "type": "word", "position": 4 },
{ "token": "in", "start_offset": 46, "end_offset": 48, "type": "word", "position": 5 },
{ "token": "action", "start_offset": 49, "end_offset": 55, "type": "word", "position": 6 }
]
}
- Use the Whitespace Analyzer for testing
GET /_analyze
{
"analyzer": "whitespace",
"text": "Maste-ring ElasticSearch,elasticsearch in Action"
}
result
{
"tokens": [
{ "token": "Maste-ring", "start_offset": 0, "end_offset": 10, "type": "word", "position": 0 },
{ "token": "ElasticSearch,elasticsearch", "start_offset": 11, "end_offset": 38, "type": "word", "position": 1 },
{ "token": "in", "start_offset": 39, "end_offset": 41, "type": "word", "position": 2 },
{ "token": "Action", "start_offset": 42, "end_offset": 48, "type": "word", "position": 3 }
]
}
- Use the Stop Analyzer for testing
GET /_analyze
{
"analyzer": "stop",
"text": "this is a ElasticSearch,elasticsearch in Action"
}
result
{
"tokens": [
{ "token": "elasticsearch", "start_offset": 10, "end_offset": 23, "type": "word", "position": 3 },
{ "token": "elasticsearch", "start_offset": 24, "end_offset": 37, "type": "word", "position": 4 },
{ "token": "action", "start_offset": 41, "end_offset": 47, "type": "word", "position": 6 }
]
}
- Use the Keyword Analyzer for testing
GET /_analyze
{
"analyzer": "keyword",
"text": "this is a ElasticSearch,elasticsearch in Action"
}
result
{
"tokens": [
{ "token": "this is a ElasticSearch,elasticsearch in Action", "start_offset": 0, "end_offset": 47, "type": "word", "position": 0 }
]
}
- Use the Pattern Analyzer for testing
GET /_analyze
{
"analyzer": "pattern",
"text": "this is a Elastic-Search,elasticsearch in Action"
}
result
{
"tokens": [
{ "token": "this", "start_offset": 0, "end_offset": 4, "type": "word", "position": 0 },
{ "token": "is", "start_offset": 5, "end_offset": 7, "type": "word", "position": 1 },
{ "token": "a", "start_offset": 8, "end_offset": 9, "type": "word", "position": 2 },
{ "token": "elastic", "start_offset": 10, "end_offset": 17, "type": "word", "position": 3 },
{ "token": "search", "start_offset": 18, "end_offset": 24, "type": "word", "position": 4 },
{ "token": "elasticsearch", "start_offset": 25, "end_offset": 38, "type": "word", "position": 5 },
{ "token": "in", "start_offset": 39, "end_offset": 41, "type": "word", "position": 6 },
{ "token": "action", "start_offset": 42, "end_offset": 48, "type": "word", "position": 7 }
]
}
- Use a Language Analyzer for testing
GET /_analyze
{
"analyzer": "english",
"text": "this is a Elastic-Search,elasticsearch in Action"
}
result
{
"tokens": [
{ "token": "elast", "start_offset": 10, "end_offset": 17, "type": "<ALPHANUM>", "position": 3 },
{ "token": "search", "start_offset": 18, "end_offset": 24, "type": "<ALPHANUM>", "position": 4 },
{ "token": "elasticsearch", "start_offset": 25, "end_offset": 38, "type": "<ALPHANUM>", "position": 5 },
{ "token": "action", "start_offset": 42, "end_offset": 48, "type": "<ALPHANUM>", "position": 7 }
]
}
- Test against a specific field of an index
POST my_store/_analyze
{
"field": "productName",
"text": "XHDK-A-1293-#fJ3"
}
result
{
"tokens": [
{ "token": "xhdk", "start_offset": 0, "end_offset": 4, "type": "<ALPHANUM>", "position": 0 },
{ "token": "a", "start_offset": 5, "end_offset": 6, "type": "<ALPHANUM>", "position": 1 },
{ "token": "1293", "start_offset": 7, "end_offset": 11, "type": "<NUM>", "position": 2 },
{ "token": "fj3", "start_offset": 13, "end_offset": 16, "type": "<ALPHANUM>", "position": 3 }
]
}
- Test a custom analyzer
POST /_analyze
{
"tokenizer": "standard",
"filter": ["lowercase"],
"text": "Hello ElasticSearch"
}
result
{
"tokens": [
{ "token": "hello", "start_offset": 0, "end_offset": 5, "type": "<ALPHANUM>", "position": 0 },
{ "token": "elasticsearch", "start_offset": 6, "end_offset": 19, "type": "<ALPHANUM>", "position": 1 }
]
}
5.5 Difficulties in Chinese word segmentation
- A Chinese sentence has to be cut into words, not into single characters
- In English, words are naturally separated by spaces
- The same Chinese sentence can be segmented, and understood, differently in different contexts
- This apple is not very good/this apple is not very good!
- Some examples
- There is a point in what he says
- Use the default ElasticSearch analyzer (standard) on Chinese text
GET /_analyze
{
"analyzer": "standard",
"text": "It's not certain."
}
result
{
"tokens": [
{ "token": "This", "start_offset": 0, "end_offset": 1, "type": "<IDEOGRAPHIC>", "position": 0 },
{ "token": "Things", "start_offset": 1, "end_offset": 2, "type": "<IDEOGRAPHIC>", "position": 1 },
{ "token": "Really", "start_offset": 2, "end_offset": 3, "type": "<IDEOGRAPHIC>", "position": 2 },
{ "token": "Set", "start_offset": 3, "end_offset": 4, "type": "<IDEOGRAPHIC>", "position": 3 },
{ "token": "No", "start_offset": 4, "end_offset": 5, "type": "<IDEOGRAPHIC>", "position": 4 },
{ "token": "Under", "start_offset": 5, "end_offset": 6, "type": "<IDEOGRAPHIC>", "position": 5 },
{ "token": "To", "start_offset": 6, "end_offset": 7, "type": "<IDEOGRAPHIC>", "position": 6 }
]
}
5.6 Chinese word segmentation IK
5.6.1 Basic use of the IK analyzer
IK analyzer: official documentation on GitHub
5.6.2 Segmentation with ik_max_word
GET /_analyze
{
"analyzer": "ik_max_word",
"text": "It's not certain."
}
result
{
"tokens": [
{ "token": "This thing", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 0 },
{ "token": "Sure", "start_offset": 2, "end_offset": 4, "type": "CN_WORD", "position": 1 },
{ "token": "Not coming down.", "start_offset": 4, "end_offset": 7, "type": "CN_WORD", "position": 2 },
{ "token": "不下", "start_offset": 4, "end_offset": 6, "type": "CN_WORD", "position": 3 },
{ "token": "Down", "start_offset": 5, "end_offset": 7, "type": "CN_WORD", "position": 4 }
]
}
5.6.3 Segmentation with ik_smart
GET /_analyze
{
"analyzer": "ik_smart",
"text": "It's not certain."
}
result
{
"tokens": [
{ "token": "This thing", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 0 },
{ "token": "Sure", "start_offset": 2, "end_offset": 4, "type": "CN_WORD", "position": 1 },
{ "token": "Not coming down.", "start_offset": 4, "end_offset": 7, "type": "CN_WORD", "position": 2 }
]
}
5.6.4 Use word segmentation for highlighting queries
GET companyinfo/_search
{
"query": { "match": { "entName": "Beijing Letter Check" } },
"highlight": {
"pre_tags": ["<tag1>", "<tag2>"],
"post_tags": ["</tag1>", "</tag2>"],
"fields": {
"entName": {}
}
},
"from": 0,
"size": 1
}
result
{
"took": 2357,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 4665655,
"max_score": 21.237017,
"hits": [
{
"_index": "companyinfo",
"_type": "companyinfo",
"_id": "5083796b34f940698d9cb0ce2984f314",
"_score": 21.237017,
"_source": {
"id": "5083796b34f940698d9cb0ce2984f314",
"bossId": "05232caa241311-p-05232d3624131",
"orgLogo": "https://static.xinchacha.com/companyLogo/5083796b34f940698d9cb0ce2984f314.png?Expires=1609503319&OSSAccessKeyId=LTAI4GF jBCimq7VBgRQ5LKfq&Signature=2DppCS5yzYTZvtMT45GYqdHtjkM%3D",
"entName": "Beijing Xinchacha Credit Management Co., LTD.",
"telphone": "400-900-6808",
"website": "http://www.xcc.com",
"email": "[email protected]",
"introduction": "The main product of Beijing Xinchacha Credit Management Co., Ltd. is credit communication and encryption protection.",
"readAddress": "Room 516, Floor 5, Building 1, Yard 5, Longyu North Street, Changping District, Beijing",
"corporation": "Liu",
"sameEnterprise": "< Associated Enterprise 1>",
"enterpriseStatus": "Open",
"regCapitalNumber": 1000,
"regCapital": "10 million RMB",
"contributedcapital": "",
"regDate": "2019-08-02",
"checkDate": "2019-08-02",
"creditCode": "91110114MA01LTMB1Y",
"orgCode": "MA01LTMB-1",
"taxpayerIdNo": "91110114MA01LTMB1Y",
"taxpayerQualification": "",
"businessRegCode": "",
"industry": "Information Transmission, Software and Information Technology Services",
"enterpriseType": "Limited liability Company (sole natural person)",
"businessTerm": "2019-08-02 to unlimited term",
"staffSize": "",
"contributors": "",
"registrationAuthority": "Changping Branch of Beijing Administration for Industry and Commerce",
"city": "Beijing",
"town": "Beijing",
"district": "",
"oldOrgName": "",
"orgNameEn": "",
"address": "Room 516, Floor 5, Building 1, Yard 5, Longyu North Street, Changping District, Beijing",
"businessScope": "Collection and evaluation of enterprise credit (excluding financial credit investigation); Software development; Computer system services; Enterprise management; Market research; Economic information consultation (excluding intermediary); Basic software services; Application software services (excluding medical software); To undertake exhibitions and exhibitions; Conference services; Technology development, technology consultation, technology exchange, technology transfer and technology popularization; Technical services; Design, production, agency, advertising; Educational consulting. (Enterprises independently choose business projects and carry out business activities in accordance with the law; For projects subject to approval according to law, business activities shall be carried out according to the approved contents after approval by relevant departments; Shall not engage in business activities of projects prohibited or restricted by the municipal industrial policies.) )"
},
"highlight": {
"entName": [
"<tag1>Beijing</tag1> <tag1>letter</tag1> <tag1>check</tag1> credit management co., LTD."
]
}
}
]
}
}
5.6.5 Description of the fields returned by the analyzer
{
"tokens": [
{ "token": "This thing", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 0 }
]
}
- token: the term itself
- start_offset: start position
- end_offset: end position
- type: token type
- position: position of the term in the token sequence (index)
6. Search API
- URI Search
  - Uses query parameters in the URL
- Request Body Search
  - Uses the more complete, JSON-based Query Domain Specific Language (DSL) provided by ElasticSearch
6.1 URI query
- Use "q" to specify the query string
- Uses "Query String Syntax", in key:value form
6.2 Request Body Query
6.3 Response parsing
6.3.1 Relevance
- Search is a conversation between the user and the search engine
- Users care about the relevance of the search results
  - Has all the relevant content been found?
  - How much irrelevant content was returned?
  - Are the documents scored reasonably?
  - Does the ranking balance relevance with business requirements?
6.3.2 Measuring relevance
- In Information Retrieval, relevance is measured with
  - Precision: return as few irrelevant documents as possible
    - Precision = True Positives / all results returned (True Positives + False Positives)
  - Recall: return as many relevant documents as possible
    - Recall = True Positives / all results that should have been returned (True Positives + False Negatives)
  - Ranking: whether the results can be sorted by relevance
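The two formulas can be checked with a small worked example. The sketch below assumes that both the returned results and the truly relevant documents are given as collections of document IDs; `precision_recall` is a hypothetical helper name.

```python
def precision_recall(returned, relevant):
    """Compute (precision, recall) from two collections of document IDs."""
    returned, relevant = set(returned), set(relevant)
    true_positives = len(returned & relevant)
    # Precision: share of the returned documents that are relevant.
    precision = true_positives / len(returned) if returned else 0.0
    # Recall: share of the relevant documents that were returned.
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# 4 documents returned, 3 documents are actually relevant, 2 of them overlap.
p, r = precision_recall(returned=[1, 2, 3, 4], relevant=[2, 4, 5])
print(f"precision={p:.2f} recall={r:.2f}")
# precision=0.50 recall=0.67
```

Here documents 1 and 3 are false positives (they lower precision) and document 5 is a false negative (it lowers recall).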
Note: for details, refer to how ElasticSearch calculates relevance
7. URI Search in detail
7.1 Searching through URI Query
GET /companyinfo/_search?q=company&df=entName&from=0&size=1&timeout=1s
{
"profile":"true"
}
- q: specifies the query, using Query String Syntax
- df: the default field; if it is not specified, all fields are queried
- sort: sorting; from and size are used for paging
- profile: shows how the query is executed
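When such a URL is assembled in code, the parameters should be URL-encoded rather than concatenated by hand. A minimal sketch using only the standard library follows; `uri_search_path` is a hypothetical helper, and it only builds the path, without sending a request.

```python
from urllib.parse import urlencode


def uri_search_path(index, q, df=None, sort=None, from_=0, size=10, timeout=None):
    """Build the path and query string of a URI search request."""
    params = {"q": q}
    if df:
        params["df"] = df        # default field for terms without a field prefix
    if sort:
        params["sort"] = sort
    params["from"] = from_       # paging: offset of the first hit
    params["size"] = size        # paging: number of hits to return
    if timeout:
        params["timeout"] = timeout
    return f"/{index}/_search?{urlencode(params)}"


print(uri_search_path("companyinfo", "company", df="entName", from_=0, size=1, timeout="1s"))
# /companyinfo/_search?q=company&df=entName&from=0&size=1&timeout=1s
```

Characters with a special meaning in URLs are escaped automatically, so a field query such as `entName:company` is sent with the colon encoded as `%3A`.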
7.2 Query String Syntax (1)
- Field query vs. generic query
  - q=title:2012 / q=2012
# Query a specified field
GET /companyinfo/_search?q=entName:company&from=0&size=1&timeout=1s
{
"profile":"true"
}
result
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 3807,
"max_score": 17.098007,
"hits": [
{
"_index": "companyinfo",
"_type": "companyinfo",
"_id": "6355d4063b5311eb925000163e350731",
"_score": 17.098007,
"_source": {
"entName": "MEDSENTIAL,L.L.C",
"orgLogo": "", "regCapital": "", "city": "", "regDate": "", "industry": "",
"taxpayerIdNo": "", "creditCode": "", "registrationAuthority": "", "staffSize": "",
"orgCode": "", "enterpriseStatus": "",
"id": "6355d4063b5311eb925000163e350731",
"businessRegCode": "", "email": "", "introduction": "",
"regCapitalNumber": 0,
"website": "", "address": "", "town": "", "bossId": "",
"corporation": "No.",
"businessScope": "",
"businessTerm": "- up to unlimited term",
"contributedcapital": "", "checkDate": "", "enterpriseType": "", "orgNameEn": "",
"taxpayerQualification": "", "telphone": "", "district": "",
"sameEnterprise": "< Associated enterprise >",
"oldOrgName": "", "readAddress": "",
"contributors": "0"
}
}
]
},
"profile": {
"shards": [{"id": "[3Ja7gZvNRfSLKQ4iGlsUgg][companyinfo][2]"."searches": [{"query": [{"type": "TermQuery"."description": "entName:l"."time": "0.5806670000 ms"."time_in_nanos": 580667."breakdown": {
"score": 192605."build_scorer_count": 63."match_count": 0."create_weight": 204501."next_doc": 100255."match": 0."create_weight_count": 1."next_doc_count": 996."score_count": 792."build_scorer": 81454."advance": 0."advance_count": 0}}]."rewrite_time": 1644."collector": [{"name": "CancellableCollector"."reason": "search_cancelled"."time": "0.5750680000 ms"."time_in_nanos": 575068."children": [{"name": "TimeLimitingCollector"."reason": "search_timeout"."time": "0.4419630000 ms"."time_in_nanos": 441963."children": [{"name": "SimpleTopScoreDocCollector"."reason": "search_top_hits"."time": "0.3136840000 ms"."time_in_nanos": 313684}]}]}],"aggregations": []}, {"id": "[3Ja7gZvNRfSLKQ4iGlsUgg][companyinfo][3]"."searches": [{"query": [{"type": "TermQuery"."description": "entName:l"."time": "0.6785800000 ms"."time_in_nanos": 678580."breakdown": {
"score": 209943."build_scorer_count": 61."match_count": 0."create_weight": 266078."next_doc": 107535."match": 0."create_weight_count": 1."next_doc_count": 908."score_count": 759."build_scorer": 93295."advance": 0."advance_count": 0}}]."rewrite_time": 1855."collector": [{"name": "CancellableCollector"."reason": "search_cancelled"."time": "0.6240210000 ms"."time_in_nanos": 624021."children": [{"name": "TimeLimitingCollector"."reason": "search_timeout"."time": "0.4900560000 ms"."time_in_nanos": 490056."children": [{"name": "SimpleTopScoreDocCollector"."reason": "search_top_hits"."time": "0.3464840000 ms"."time_in_nanos": 346484}]}]}],"aggregations": []}, {"id": "[3Ja7gZvNRfSLKQ4iGlsUgg][companyinfo][4]"."searches": [{"query": [{"type": "TermQuery"."description": "entName:l"."time": "0.5367190000 ms"."time_in_nanos": 536719."breakdown": {
"score": 198601."build_scorer_count": 40."match_count": 0."create_weight": 167458."next_doc": 110958."match": 0."create_weight_count": 1."next_doc_count": 878."score_count": 742."build_scorer": 58041."advance": 0."advance_count": 0}}]."rewrite_time": 6874."collector": [{"name": "CancellableCollector"."reason": "search_cancelled"."time": "0.5998340000 ms"."time_in_nanos": 599834."children": [{"name": "TimeLimitingCollector"."reason": "search_timeout"."time": "0.4730040000 ms"."time_in_nanos": 473004."children": [{"name": "SimpleTopScoreDocCollector"."reason": "search_top_hits"."time": "0.3354350000 ms"."time_in_nanos": 335435}]}]}],"aggregations": []}, {"id": "[wQwmOosAQjSSL7-qjOg7Pw][companyinfo][0]"."searches": [{"query": [{"type": "TermQuery"."description": "entName:l"."time": "0.5707300000 ms"."time_in_nanos": 570730."breakdown": {
"score": 193154."build_scorer_count": 60."match_count": 0."create_weight": 206270."next_doc": 92610."match": 0."create_weight_count": 1."next_doc_count": 993."score_count": 744."build_scorer": 76898."advance": 0."advance_count": 0}}]."rewrite_time": 1685."collector": [{"name": "CancellableCollector"."reason": "search_cancelled"."time": "0.7637310000 ms"."time_in_nanos": 763731."children": [{"name": "TimeLimitingCollector"."reason": "search_timeout"."time": "0.6415250000 ms"."time_in_nanos": 641525."children": [{"name": "SimpleTopScoreDocCollector"."reason": "search_top_hits"."time": "0.3118390000 ms"."time_in_nanos": 311839}]}]}],"aggregations": []}, {"id": "[wQwmOosAQjSSL7-qjOg7Pw][companyinfo][1]"."searches": [{"query": [{"type": "TermQuery"."description": "entName:l"."time": "0.6177530000 ms"."time_in_nanos": 617753."breakdown": {
"score": 215893."build_scorer_count": 55."match_count": 0."create_weight": 165227."next_doc": 107813."match": 0."create_weight_count": 1."next_doc_count": 980."score_count": 770."build_scorer": 127014."advance": 0."advance_count": 0}}]."rewrite_time": 1355."collector": [{"name": "CancellableCollector"."reason": "search_cancelled"."time": "0.9852530000 ms"."time_in_nanos": 985253."children": [{"name": "TimeLimitingCollector"."reason": "search_timeout"."time": "0.8321480000 ms"."time_in_nanos": 832148."children": [{"name": "SimpleTopScoreDocCollector"."reason": "search_top_hits"."time": "0.3572100000 ms"."time_in_nanos": 357210}]}]}],"aggregations": []}]}}Copy the code
- Term vs. Phrase (PhraseQuery)
- Beautiful Mind is equivalent to Beautiful OR Mind
- "Beautiful Mind" is equivalent to Beautiful AND Mind; a phrase query also requires the terms to appear in the same order
GET /companyinfo/_search?q=entName:ALM INTERNATIONAL
{
  "profile":"true"
}
- Grouping and quotation marks
- title:(Beautiful AND Mind)
- title:"Beautiful Mind"
- Grouping
GET /companyinfo/_search?q=entName:(ALM INTERNATIONAL)
{
  "profile":"true"
}
- Quotes
GET /companyinfo/_search?q=entName:"ALM INTERNATIONAL"
{
"profile":"true"
}
7.3 Query String Syntax (2)
- Boolean operations
- AND / OR / NOT, or && / || / !
- title:(matrix NOT reloaded)
- Grouping
- + means must
- - means must_not
- title:(+matrix -reloaded)
AND operation example
GET /companyinfo/_search?q=entName:(ALM AND INTERNATIONAL)
{
  "profile":"true"
}
OR operation example
GET /companyinfo/_search?q=entName:(ALM OR INTERNATIONAL)
{
  "profile":"true"
}
NOT operation example
GET /companyinfo/_search?q=entName:(ALM NOT INTERNATIONAL)
{
  "profile":"true"
}
+ operation example (the + sign is URL-encoded as %2B)
GET /companyinfo/_search?q=entName:(ALM %2BINTERNATIONAL)
{
  "profile":"true"
}
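The + example uses %2B, presumably because a literal + in a URL query string is conventionally decoded as a space. A small sketch with Python's standard `urllib.parse` shows both directions; the query text is the one from the example above, everything else is illustration only.

```python
from urllib.parse import quote, unquote_plus

query = "entName:(ALM +INTERNATIONAL)"

# Percent-encode the query; ':' and parentheses are kept readable.
encoded = quote(query, safe=":()")
print(encoded)  # entName:(ALM%20%2BINTERNATIONAL)

# A raw '+' is turned into a space by form-style decoding,
# so the "must" operator would be lost without encoding.
print(unquote_plus("ALM+INTERNATIONAL"))    # ALM INTERNATIONAL
print(unquote_plus("ALM%20%2BINTERNATIONAL"))  # ALM +INTERNATIONAL
```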
7.4 Query String Syntax (3)
- Range queries
- Interval notation: [] is a closed interval, {} is an open interval
- year:{2018 TO 2019}
- year:[* TO 2018]
- Comparison operators
- year:>2010
- year:(>2010 && <=2018)
- year:(+>2010 +<=2018)
GET /companyinfo/_search?q=regCapitalNumber:[* TO 2018]
{
"profile":"true"
}
7.5 Query String Syntax (4)
- Wildcard queries (they are inefficient and use a lot of memory, so they are not recommended, especially with the wildcard in the leading position)
- ? matches exactly 1 character, * matches 0 or more characters
- title:mi?d
- title:be*
- Regular expressions (wrapped in forward slashes)
- title:/[bt]oy/
- Fuzzy matching and proximity queries
- title:but~1
- title:"but"~2
Wildcard query example
GET /companyinfo/_search?q=entName:b*&from=0&size=1&timeout=1s
{
"profile":"true"
}
Fuzzy matching query example
GET /companyinfo/_search?q=entName:b~1&from=0&size=1&timeout=1s
{
"profile":"true"
}
Proximity query example
GET /companyinfo/_search?q=entName:"B"~2
{
"profile":"true"
}
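To get a feel for what the number after ~ means in a fuzzy query such as title:but~1, the sketch below computes a classic Levenshtein edit distance between terms. It is a simplification: as far as I know ElasticSearch also counts a transposition of two adjacent characters as a single edit by default, which this toy version does not. The term list is made up.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def fuzzy_match(term, candidates, max_edits):
    """Return the candidate terms within max_edits of the query term."""
    return [c for c in candidates if edit_distance(term, c) <= max_edits]


terms = ["but", "bat", "boat", "butter", "cut"]  # hypothetical indexed terms
print(fuzzy_match("but", terms, 1))  # ['but', 'bat', 'cut']
print(fuzzy_match("but", terms, 2))  # ['but', 'bat', 'boat', 'cut']
```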
8 Introduction to Request Body Search & Query DSL
8.1 Request Body Search
- Send the query to ElasticSearch in the HTTP request body
- Query DSL
POST /my_test_index,my_store/_search?ignore_unavailable=true
{
  "profile": true,
  "query": {
    "match_all": {}
  }
}
8.1.1 Paging
POST /my_store/_search
{
  "from": 0,
  "size": 20,
  "query": {
    "match_all": {}
  }
}
- from starts at 0; 10 results are returned by default
- The deeper the page, the higher the cost of fetching it
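A small sketch of how a page number maps to from and size, and why deep pages get expensive: each shard has to collect the top from + size hits before the coordinating node can merge them. The helper names are hypothetical and the cost model is a deliberately rough upper bound.

```python
def page_to_from_size(page, page_size=10):
    """Translate a 1-based page number into from/size parameters."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    return {"from": (page - 1) * page_size, "size": page_size}


def docs_collected(from_, size, shards):
    """Rough upper bound of hits the coordinating node has to merge."""
    return (from_ + size) * shards


body = {**page_to_from_size(3, 20), "query": {"match_all": {}}}
print(body)  # {'from': 40, 'size': 20, 'query': {'match_all': {}}}

print(docs_collected(0, 20, 5))     # 100    (first page, 5 shards)
print(docs_collected(9980, 20, 5))  # 50000  (page 500, 5 shards)
```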
8.1.2 Sorting
GET /my_store/_search
{
  "sort": [{"price": "desc"}],
  "from": 0,
  "size": 20,
  "query": {
    "match_all": {}
  }
}
- It is best to sort on numeric and date fields
- When sorting on multi-valued or analyzed fields, the system picks one of the values to sort by, and you cannot know which one it picks
8.1.3 _source filtering
GET /my_store/_search
{
  "_source": ["price", "productAge"],
  "from": 0,
  "size": 20,
  "query": {
    "match_all": {}
  }
}
- If _source is not stored, only the metadata of the matching documents is returned
- _source supports wildcards: "_source": ["name*", "desc*"]
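The effect of wildcard patterns in _source can be imitated on a plain dictionary. This sketch only handles top-level field names, and the sample document is invented; it is meant to show the idea, not how ElasticSearch implements it.

```python
from fnmatch import fnmatchcase


def filter_source(source, patterns):
    """Keep only the top-level fields whose name matches one of the patterns."""
    return {
        field: value
        for field, value in source.items()
        if any(fnmatchcase(field, pattern) for pattern in patterns)
    }


# Hypothetical document
doc = {"name": "phone", "nameEn": "phone", "description": "a phone", "price": 30}
print(filter_source(doc, ["name*", "desc*"]))
# {'name': 'phone', 'nameEn': 'phone', 'description': 'a phone'}
```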
8.1.4 Script Fields
GET my_store/_search
{
  "script_fields": {
    "new_field": {
      "script": {
        "lang": "painless",
        "source": "doc['productName'].value+'hello'"
      }
    }
  },
  "query": {
    "match_all": {}
  }
}
8.1.5 Using query expressions: match
GET /my_store/_search
{
  "query": {
    "match": {
      "productID": 30
    }
  }
}
GET /my_store/_search
{
  "query": {
    "match": {
      "productName": {
        "query": "ZHANGSAN",
        "operator": "and"
      }
    }
  }
}
8.1.6 Phrase search: match_phrase
GET my_store/_search
{
  "query": {
    "match_phrase": {
      "content": {
        "query": "wang san",
        "slop": 1
      }
    }
  }
}
Result:
{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1.0942618,
    "hits": [{
      "_index": "my_store",
      "_type": "products",
      "_id": "AXdIzcDtomOanSvnaKZX",
      "_score": 1.0942618,
      "_source": {
        "content": "my name is wang san"
      }
    }]
  }
}
9 Query String and Simple Query String
9.1 Query String Query
- Similar to the URI Query
POST my_store/_search
{
  "query": {
    "query_string": {
      "default_field": "content",
      "query": "my name is"
    }
  }
}
POST my_store/_search
{
  "query": {
    "query_string": {
      "fields": ["content", "productName"],
      "query": "(my name is) OR (ZHANGSAN)"
    }
  }
}
9.2 Simple Query String Query
- Similar to Query String, but it ignores incorrect syntax and supports only part of the query syntax
- AND, OR and NOT are not supported; they are treated as plain strings
- The default relationship between terms is OR; a default operator can be specified
- Some logical operators are supported
- + replaces AND
- | replaces OR
- - replaces NOT
POST my_store/_search
{
  "query": {
    "simple_query_string": {
      "query": "my name is",
      "fields": ["content"],
      "default_operator": "AND"
    }
  }
}
10 Dynamic Mapping and common field types
10.1 What is Mapping
- A Mapping is similar to a schema definition in a database
- It defines the names of the fields in the index
- It defines the data types of the fields, such as string, number, boolean…
- It configures how fields are handled in the inverted index (analyzed or not analyzed, which Analyzer)
- A Mapping maps JSON documents into the flat format that Lucene needs
- A Mapping belongs to a Type of an index
- Each document belongs to a Type
- A Type has one Mapping definition
- Starting with 7.0, you no longer need to specify type information in the Mapping definition
10.2 Field data types
- Simple types
- Text / Keyword
- Date
- Integer / Floating point
- Boolean
- IPv4 & IPv6
- Complex types
- Object type / nested type
- Special types
- geo_point & geo_shape / percolator
10.3 What is Dynamic Mapping
- When a document is written to an index that does not exist, the index is created automatically
- With Dynamic Mapping you do not have to define Mappings by hand: ElasticSearch infers the field types from the document content
- The inference is sometimes wrong, for example for geo-location data
- When a type is set incorrectly, some features, such as Range queries, do not work properly
- View the mapping of my_store
GET my_store/_mapping
{
  "embranchment_v1": {
    "mappings": {
      "embranchment_v1": {
        "_all": {
          "enabled": false
        },
        "date_detection": false,
        "properties": {
          "companyId": {
            "type": "keyword"
          },
          "embranchmentName": {
            "type": "text",
            "fields": {
              "keyword": {"type": "keyword", "ignore_above": 256}
            }
          },
          "id": {
            "type": "text",
            "fields": {
              "keyword": {"type": "keyword", "ignore_above": 256}
            }
          },
          "principal": {
            "type": "text",
            "fields": {
              "keyword": {"type": "keyword", "ignore_above": 256}
            }
          },
          "regDate": {
            "type": "text"
          },
          "relation": {
            "type": "text",
            "fields": {
              "keyword": {"type": "keyword", "ignore_above": 256}
            }
          },
          "status": {
            "type": "keyword"
          }
        }
      },
      "_default_": {
        "_all": {
          "enabled": false
        }
      }
    }
  }
}
10.4 Automatic Type Identification
JSON type | ElasticSearch type |
---|---|
String | Matches a date format: set to date. Looks like a number: set to float or long (this detection is off by default). Otherwise: set to text, with a keyword sub-field |
Boolean | boolean |
Floating-point number | float |
Integer | long |
Object | object |
Array | Determined by the type of the first non-null value |
Null | Ignored |
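The rules in the table can be written down as a small function. This is only a sketch of the table, not ElasticSearch code: the date check is a simplified regular expression standing in for the real date formats, and the `date_detection` / `numeric_detection` arguments of this hypothetical helper merely mirror the idea that the two string detections can be switched on or off.

```python
import re

# Simplified stand-in for the date formats that dynamic mapping recognises.
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2})?)?$")


def infer_field_type(value, date_detection=True, numeric_detection=False):
    """Return the field type the table above would assign to a JSON value."""
    if value is None:
        return None                      # null values are ignored
    if isinstance(value, bool):          # must be checked before int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        for item in value:               # first non-null element decides
            item_type = infer_field_type(item, date_detection, numeric_detection)
            if item_type is not None:
                return item_type
        return None
    if isinstance(value, str):
        if date_detection and DATE_PATTERN.match(value):
            return "date"
        if numeric_detection:
            try:
                int(value)
                return "long"
            except ValueError:
                pass
            try:
                float(value)
                return "float"
            except ValueError:
                pass
        return "text"                    # plus a keyword sub-field
    raise TypeError(f"unsupported JSON value: {value!r}")


for sample in ["2018-07-24", "hello", "42", True, 42, 4.2, {"a": 1}, [None, 3], None]:
    print(repr(sample), "->", infer_field_type(sample))
```

With the defaults, the string "42" is mapped as text; calling `infer_field_type("42", numeric_detection=True)` returns "long" instead.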
10.5 Can the field type in a Mapping be changed?
- Two cases
- Newly added fields
- When dynamic is set to true, the Mapping is updated as soon as a document with a new field is written
- When dynamic is set to false, the Mapping is not updated; the new field is not indexed, but its value still appears in _source
- When dynamic is set to strict, writing the document fails
- Existing fields: once data has been written, the field definition can no longer be changed
- Lucene's inverted index cannot be modified once it has been generated
- To change a field type, you must rebuild the index with the Reindex API
- Why
- If the data type of a field were changed, data that is already indexed could no longer be searched
- Adding a new field has no such effect
10.6 Controlling Dynamic Mappings
- When dynamic is set to false, a document containing new fields can still be indexed, but the new fields themselves are not indexed (they only remain in _source)
- When dynamic is set to strict, writing the document fails outright
dynamic | true | false | strict |
---|---|---|---|
Document can be indexed | YES | YES | NO |
New field is indexed | YES | NO | NO |
Mapping is updated | YES | NO | NO |
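The three rows of the table can be simulated on plain dictionaries. This is a toy model under obvious assumptions: the mapping is a dict of field name to type, `index_document` and `StrictMappingError` are hypothetical names, and the Python type name is used as a placeholder for the inferred field type.

```python
class StrictMappingError(Exception):
    """Raised when an unknown field arrives and dynamic is 'strict'."""


def index_document(mapping, doc, dynamic="true"):
    """Return (stored_source, indexed_fields); may add new fields to mapping."""
    unknown = [field for field in doc if field not in mapping]
    if unknown and dynamic == "strict":
        raise StrictMappingError(f"unknown fields: {unknown}")
    if dynamic not in ("true", "false", "strict"):
        raise ValueError(f"invalid dynamic setting: {dynamic}")
    indexed = {}
    for field, value in doc.items():
        if field in mapping:
            indexed[field] = value
        elif dynamic == "true":
            mapping[field] = type(value).__name__  # placeholder for the inferred type
            indexed[field] = value
        # dynamic == "false": the field stays in _source but is not indexed
    return dict(doc), indexed


mapping = {"name": "text"}
source, indexed = index_document(mapping, {"name": "a", "age": 3}, dynamic="false")
print(source)   # {'name': 'a', 'age': 3}
print(indexed)  # {'name': 'a'}
print(mapping)  # {'name': 'text'}

index_document(mapping, {"name": "a", "age": 3}, dynamic="true")
print(mapping)  # {'name': 'text', 'age': 'int'}
```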
PUT my_store
{
  "mappings": {
    "_doc": {
      "dynamic": "false"
    }
  }
}
11 Explicit Mapping settings and common parameters
11.1 Suggestions for customizing a Mapping
- You can consult the API documentation and write the Mapping entirely by hand
- To reduce typing and the chance of mistakes, you can follow these steps
- Create a temporary index and write some sample data
- Get the dynamic Mapping definition of this temporary index through the Mapping API
- Modify it, then use this configuration to create your index
- Delete the temporary index
11.2 Control whether the current field is indexed
- index: controls whether the current field is indexed. The default is true. If set to false, the field is not searchable
- Setting it to false avoids building the inverted index for that field and saves disk space
11.3 Index Options
- Four levels of index_options control what is recorded in the inverted index
- docs: records the doc id
- freqs: records the doc id and term frequencies
- positions: records the doc id / term frequencies / term positions
- offsets: records the doc id / term frequencies / term positions / character offsets
- The default for the text type is positions; the default for other types is docs
- The more is recorded, the more storage space is used
11.4 null_value
- Needed when you want to search for Null values
- For string fields, only the keyword type supports null_value (text does not)
11.5 copy_to setting
- _all was replaced by copy_to in 7.0
- It serves certain specific search needs
- copy_to copies the value of a field into a target field, with an effect similar to _all
- The target field of copy_to does not appear in _source
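The parameters from 11.2 to 11.5 can be combined in one Mapping. The sketch below builds such a request body as a Python dictionary and prints it as JSON. All field names (firstName, lastName, fullName, mobile, internalNote, bio) are hypothetical, and the body is written in the typeless 7.0+ style; on older versions the "properties" block would sit under a type name.

```python
import json

mappings_body = {
    "mappings": {
        "properties": {
            # copy_to: both name parts are copied into one searchable field
            "firstName": {"type": "text", "copy_to": "fullName"},
            "lastName": {"type": "text", "copy_to": "fullName"},
            "fullName": {"type": "text"},
            # null_value: an explicit null is indexed as the string "NULL"
            "mobile": {"type": "keyword", "null_value": "NULL"},
            # index: false -> kept in _source but not searchable
            "internalNote": {"type": "text", "index": False},
            # index_options: also record character offsets for this field
            "bio": {"type": "text", "index_options": "offsets"},
        }
    }
}

print(json.dumps(mappings_body, indent=2))
```

The printed JSON could be used as the body of a PUT request that creates the index.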
11.6 Array Types
- ElasticSearch does not provide a dedicated array type, but any field can contain multiple values of the same type
12 Multi-field feature and configuring a custom Analyzer in a Mapping
12.1 Multi-field feature
- Exact matching of manufacturer names
- Add a keyword sub-field
- Use different Analyzers
- Different languages
- Pinyin field search
- Different Analyzers can also be specified for search and for indexing
12.2 Exact Values vs. Full Text
- Exact Values vs. Full Text
- Exact Value: numbers / dates / a specific string (for example, "Apple Store")
- keyword in ElasticSearch
- Full Text: unstructured text data
- text in ElasticSearch
12.3 Exact Values do not need to be analyzed
- ElasticSearch creates an inverted index for each field
- Exact Values need no special tokenization when they are indexed
12.4 Custom analyzers
- When the analyzers that ship with ElasticSearch do not meet your needs, you can build a custom analyzer by combining different components
- Character Filter
- Tokenizer
- Token Filter
12.4.1 Character Filter
- Processes the text before the Tokenizer, for example adding, deleting or replacing characters. Multiple Character Filters can be configured. They affect the position and offset information seen by the Tokenizer
- Some built-in Character Filters
- HTML strip: removes HTML tags
- Mapping: string replacement
- Pattern replace: regular-expression replacement
12.4.2 Tokenizer
- Splits the original text into terms (tokens) according to certain rules
- Tokenizers built into ElasticSearch
- whitespace / standard / uax_url_email / pattern / keyword / path hierarchy
- You can implement your own Tokenizer by developing a plugin in Java
12.4.3 Token Filters
- Adds, modifies or deletes the terms output by the Tokenizer
- Built-in Token Filters
- lowercase / stop / synonym
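The three stages (Character Filter, Tokenizer, Token Filter) form a simple pipeline. The toy functions below only imitate the idea of html_strip, whitespace, lowercase and stop; they are not the ElasticSearch implementations, and the stop word list is made up.

```python
import re


def html_strip(text):
    """Character filter: drop anything that looks like an HTML tag."""
    return re.sub(r"<[^>]+>", "", text)


def whitespace_tokenizer(text):
    """Tokenizer: split the text on whitespace."""
    return text.split()


def lowercase_filter(tokens):
    """Token filter: lowercase every token."""
    return [token.lower() for token in tokens]


def stop_filter(tokens, stop_words=frozenset({"a", "an", "the", "is", "of"})):
    """Token filter: remove stop words."""
    return [token for token in tokens if token not in stop_words]


def analyze(text, char_filters, tokenizer, token_filters):
    """Run the stages in the order: char filters -> tokenizer -> token filters."""
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


tokens = analyze(
    "<b>The Quick</b> Brown Fox",
    char_filters=[html_strip],
    tokenizer=whitespace_tokenizer,
    token_filters=[lowercase_filter, stop_filter],
)
print(tokens)  # ['quick', 'brown', 'fox']
```

The order of the token filters matters: running the stop filter before lowercasing would leave "The" in the output, because the stop list is lowercase.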
13 Index Template and Dynamic Template
13.1 What is Index Template?
- An Index Template lets you define Mappings and Settings that are automatically applied to newly created indexes according to matching rules
- A template only takes effect when an index is created. Modifying a template does not affect indexes that already exist
- You can define multiple index templates; their settings are merged
- You can control the merge process by specifying the value of "order"
13.2 How Index Template works
- When an index is created:
- ElasticSearch's default Settings and Mappings are applied
- The settings of Index Templates with a lower order are applied
- The settings of Index Templates with a higher order are applied next and override the earlier ones
- Settings and Mappings specified by the user when creating the index are applied last and override those from the templates
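The order of application described above can be sketched as a sequence of dictionary merges. This is an illustration of the precedence only: the template structure (index_patterns, order, settings) is simplified, and the default values and index names are hypothetical.

```python
from fnmatch import fnmatchcase


def deep_merge(base, override):
    """Return base updated with override, merging nested dicts recursively."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_settings(index_name, defaults, templates, request_settings):
    """Defaults, then matching templates by ascending order, then the request."""
    result = dict(defaults)
    matching = [
        t for t in templates
        if any(fnmatchcase(index_name, pattern) for pattern in t["index_patterns"])
    ]
    for template in sorted(matching, key=lambda t: t["order"]):
        result = deep_merge(result, template["settings"])
    return deep_merge(result, request_settings)


defaults = {"number_of_shards": 5, "number_of_replicas": 1}  # assumed defaults
templates = [
    {"index_patterns": ["*"], "order": 0,
     "settings": {"number_of_shards": 1, "number_of_replicas": 1}},
    {"index_patterns": ["test*"], "order": 1,
     "settings": {"number_of_replicas": 2}},
]

print(resolve_settings("testmy", defaults, templates, {}))
# {'number_of_shards': 1, 'number_of_replicas': 2}
print(resolve_settings("testmy", defaults, templates, {"number_of_replicas": 5}))
# {'number_of_shards': 1, 'number_of_replicas': 5}
print(resolve_settings("prod", defaults, templates, {}))
# {'number_of_shards': 1, 'number_of_replicas': 1}
```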
13.3 What is Dynamic Template
- Sets field types dynamically, based on the data type identified by ElasticSearch combined with the field name
- All string types are set to keyword, or the keyword sub-field is turned off
- Fields starting with is are set to boolean
- Fields starting with long_ are set to long
- A Dynamic Template is defined in the Mapping of a specific index
- Each template has a name
- The matching rules are an array
- Each rule sets the Mapping for the fields it matches
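The matching logic can be sketched as follows: the rules are tried in array order and the first one that matches wins. The template names and the `apply_dynamic_templates` helper are hypothetical; the keys match_mapping_type, match and mapping follow the shape of a dynamic_templates entry, but the matching here is a simplification.

```python
from fnmatch import fnmatchcase

# More specific rules come first, because the first match wins.
dynamic_templates = [
    {"is_as_boolean": {"match_mapping_type": "string", "match": "is*",
                       "mapping": {"type": "boolean"}}},
    {"long_prefix": {"match": "long_*", "mapping": {"type": "long"}}},
    {"strings_as_keywords": {"match_mapping_type": "string",
                             "mapping": {"type": "keyword"}}},
]


def apply_dynamic_templates(field_name, detected_type, templates):
    """Return (template_name, mapping) of the first matching rule, or (None, None)."""
    for template in templates:
        (name, rule), = template.items()
        wanted_type = rule.get("match_mapping_type")
        if wanted_type is not None and wanted_type not in (detected_type, "*"):
            continue
        pattern = rule.get("match")
        if pattern is not None and not fnmatchcase(field_name, pattern):
            continue
        return name, rule["mapping"]
    return None, None


print(apply_dynamic_templates("isVip", "string", dynamic_templates))
# ('is_as_boolean', {'type': 'boolean'})
print(apply_dynamic_templates("long_count", "string", dynamic_templates))
# ('long_prefix', {'type': 'long'})
print(apply_dynamic_templates("firstName", "string", dynamic_templates))
# ('strings_as_keywords', {'type': 'keyword'})
print(apply_dynamic_templates("age", "long", dynamic_templates))
# (None, None)
```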
14 Overview of ElasticSearch Aggregation Analysis
14.1 What is Aggregation?
- In addition to search, ElasticSearch provides statistical analysis of the data stored in ES
- Highly real-time
- Hadoop (T+1)
- Through aggregations we get an overview of the data: they analyze and summarize the whole data set rather than look for individual documents
- The number of rooms in an area
- The number of bookable hotels in different price ranges
- High performance: a single statement is enough to get the analysis result from ElasticSearch
- You do not need to implement the analysis logic yourself on the client side
Aggregation analysis in Kibana
14.2 Types of aggregations
- Bucket Aggregation: a collection of documents that meet specific criteria
- Metric Aggregation: mathematical operations that compute statistics over document fields
- Pipeline Aggregation: aggregates the results of other aggregations again
- Matrix Aggregation: operates on multiple fields and returns a result matrix
14.2.1 Bucket & Metric
- Metric: a set of statistical methods
- Bucket: a set of documents that meet given criteria
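The difference between the two can be imitated in plain Python: a bucket groups documents by a key and counts them, while a metric computes a number over a field. The sample documents below are invented; the field names city and regCapitalNumber are borrowed from the companyinfo examples in this article.

```python
from collections import Counter

# Hypothetical documents
companies = [
    {"city": "Beijing", "regCapitalNumber": 500},
    {"city": "Shanghai", "regCapitalNumber": 1200},
    {"city": "Beijing", "regCapitalNumber": 0},
    {"city": "Hangzhou", "regCapitalNumber": 300},
    {"city": "Beijing", "regCapitalNumber": 80},
    {"city": "Shanghai", "regCapitalNumber": 50},
]


def terms_buckets(docs, field, size=10):
    """Bucket: group documents by field value, largest groups first."""
    counts = Counter(doc[field] for doc in docs)
    return [{"key": key, "doc_count": count} for key, count in counts.most_common(size)]


def max_metric(docs, field):
    """Metric: maximum value of a numeric field."""
    return max(doc[field] for doc in docs)


def min_metric(docs, field):
    """Metric: minimum value of a numeric field."""
    return min(doc[field] for doc in docs)


print(terms_buckets(companies, "city"))
# [{'key': 'Beijing', 'doc_count': 3}, {'key': 'Shanghai', 'doc_count': 2}, {'key': 'Hangzhou', 'doc_count': 1}]
print(max_metric(companies, "regCapitalNumber"))  # 1200
print(min_metric(companies, "regCapitalNumber"))  # 0
```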
14.2.1.1 Bucket
- Some examples
- Hangzhou belongs to Zhejiang / an actor is male or female
- Nested relationships: Hangzhou belongs to Zhejiang, which belongs to China, which belongs to Asia
- ElasticSearch provides many types of Buckets that let you partition documents in a variety of ways
- Terms & Range (time / age / geographical location)
Aggregate the number of enterprises by region
GET companyinfo/_search
{
  "size": 0,
  "aggs": {
    "flight_dest": {
      "terms": {
        "field": "city"
      }
    }
  }
}
Result:
{
  "took": 6485,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 157377743,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "flight_dest": {
      "doc_count_error_upper_bound": 1243691,
      "sum_other_doc_count": 97419767,
      "buckets": [
        {"key": "", "doc_count": 25140413},
        {"key": "Guangdong province", "doc_count": 6211429},
        {"key": "Jiangsu Province", "doc_count": 4535255},
        {"key": "Shandong Province", "doc_count": 4360040},
        {"key": "Beijing", "doc_count": 4061179},
        {"key": "Shanghai", "doc_count": 3813461},
        {"key": "Zhejiang Province", "doc_count": 3713236},
        {"key": "Sichuan Province", "doc_count": 2834499},
        {"key": "Henan Province", "doc_count": 2747657},
        {"key": "Hebei Province", "doc_count": 2540804}
      ]
    }
  }
}
14.2.1.2 Adding Metrics
Aggregate enterprises by city and get the maximum and minimum registered capital
GET companyinfo/_search
{
  "size": 0,
  "aggs": {
    "flight_dest": {
      "terms": {
        "field": "city"
      }
    },
    "max_price": {
      "max": {
        "field": "regCapitalNumber"
      }
    },
    "min_price": {
      "min": {
        "field": "regCapitalNumber"
      }
    }
  }
}
Result:
{
  "took": 26141,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 157377743,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "max_price": {
      "value": 18944818812
    },
    "min_price": {
      "value": 0
    },
    "flight_dest": {
      "doc_count_error_upper_bound": 1243691,
      "sum_other_doc_count": 97419767,
      "buckets": [
        {"key": "", "doc_count": 25140413},
        {"key": "Guangdong province", "doc_count": 6211429},
        {"key": "Jiangsu Province", "doc_count": 4535255},
        {"key": "Shandong Province", "doc_count": 4360040},
        {"key": "Beijing", "doc_count": 4061179},
        {"key": "Shanghai", "doc_count": 3813461},
        {"key": "Zhejiang Province", "doc_count": 3713236},
        {"key": "Sichuan Province", "doc_count": 2834499},
        {"key": "Henan Province", "doc_count": 2747657},
        {"key": "Hebei Province", "doc_count": 2540804}
      ]
    }
  }
}