1. What is ES
Elasticsearch (ES) is a distributed search and analytics engine built on top of Apache Lucene, using JSON as its data interchange format.
2. Basic Syntax
2.1 Creating an Index
PUT juejin_hr_data_v1
{
  "settings": {
    "index": {
      "routing": {
        "allocation": {
          "enable": "all"
        }
      },
      "refresh_interval": "60s",
      "number_of_shards": "3"
    }
  },
  "mappings": {
    "dynamic": false,
    "properties": {
      "week": {
        "type": "keyword"
      },
      "team": {
        "type": "keyword"
      },
      "school": {
        "type": "integer"
      },
      "nowcoder": {
        "type": "integer"
      },
      "boss": {
        "type": "integer"
      },
      "maimai": {
        "type": "integer"
      }
    }
  }
}
2.2 Basic CRUD (create, delete, update, query)
- Create
POST juejin_hr_data_v1/_bulk // bulk API, batch write
{"index": {"_id": "client_1"}}
{"week": 1, "team": "client", "school": 0, "nowcoder": 0, "boss": 0, "maimai": 0}
- Delete
POST juejin_hr_data_v1/_delete_by_query
{
  "query": {
    "match_all": {}
  }
}
- Update
// Full overwrite; if the doc does not exist, it is created
PUT juejin_hr_data_v1/_doc/1
{"week": 1, "team": "client", "school": 0, "nowcoder": 0, "boss": 0, "maimai": 0}

// Partial update of selected fields
POST juejin_hr_data_v1/_update/1
{"doc": {"week": 2}}
- Query
GET juejin_hr_data_v1/_search
{
  "query": {
    "match_all": {}
  }
}
3. Overall Structure
3.1 Cluster
- An Elasticsearch cluster consists of one or more nodes that share a common cluster name.
- A node is an instance of Elasticsearch. A single machine can run multiple instances, but ideally each instance should be deployed on a different machine.
3.1.1 Cluster Properties
The health status of an Elasticsearch cluster is expressed as one of three values: Green, Yellow, or Red.
- Green: all primary and replica shards are available
- Yellow: all primary shards are available, but at least one replica shard is not
- Red: at least one primary shard is unavailable
Even when the cluster status is Red, it still serves requests, executing them against the surviving shards. The failed shards should be repaired as soon as possible to avoid losing query results.
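The current status can be checked with the cluster health API; a minimal sketch (the index name is the one from section 2, used only as an example):

GET _cluster/health

// Drill down to shard level for a single index
GET _cluster/health/juejin_hr_data_v1?level=shards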
Unlike Kafka and HBase, which rely on ZooKeeper, ES has its own cluster-maintenance mechanism.
3.1.2 Service Discovery
ES implements its own service discovery mechanism, known as Zen Discovery. Service discovery is the process by which individual nodes form a cluster; it is triggered when a new ES node starts or when the master node fails.
The starting point for service discovery is a list of seed addresses (hostnames resolved via DNS, or IPs) supplied by seed hosts providers (settings, file, or cloud), plus the master-eligible nodes the node already knows about. The process has two phases:
Phase 1. Each node tries to connect to the seed addresses and verifies that the nodes it reaches are master-eligible.
Phase 2. If phase 1 succeeds, the current node shares all the master-eligible nodes it knows about with the nodes it has connected to, and the remote nodes respond with the master-eligible nodes they know about. The current node thereby discovers new nodes and keeps looping these requests until the whole cluster forms a connected graph and service discovery completes.
If the current node is not master-eligible, it continues the discovery process until it discovers an elected master node. If no master is discovered, the node retries after the discovery.find_peers_interval interval (1 second by default).
If the current node is master-eligible, it continues the discovery process until it either finds an elected master node or has found enough master-eligible nodes to complete an election itself. If neither condition is met, it retries after discovery.find_peers_interval.
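For reference, a hedged elasticsearch.yml sketch of the discovery settings mentioned above (host names are placeholders; discovery.seed_hosts and cluster.initial_master_nodes apply to ES 7.x):

discovery.seed_hosts: ["es-node-1", "es-node-2", "es-node-3"]         # seed addresses tried in phase 1
cluster.initial_master_nodes: ["es-node-1", "es-node-2", "es-node-3"] # master-eligible nodes used to bootstrap the first election
discovery.find_peers_interval: 1s                                     # retry interval for the discovery loop (default)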
3.1.3 Master Election
ES uses a peer-to-peer (P2P) distributed architecture: every node in the cluster can communicate with any other node. This is unlike master-slave distributed systems such as Hadoop.
ES still has a master role, but its main job is to maintain the cluster state. When the cluster state changes on any node, the change is synchronized to the other nodes, so every node holds a complete copy of the cluster state.
Since ES 7.0, master election is implemented with an adaptation of the Raft algorithm. Raft is a distributed consensus protocol that is widely used in practice; it allows multiple nodes to reach agreement even in the presence of node failures, network delays, or network partitions (split brain). Raft itself is beyond the scope of this article; see "Deep Parsing of the Raft Distributed Consistency Protocol".
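The node that currently holds the elected master role can be checked with the cat API (it returns the id, host, ip, and name of the elected master):

GET _cat/master?v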
3.2 Node
3.2.1 Node Attributes
The node type is configured with the node.master and node.data settings. ES has several node types:
- Master node + Data Node (default)
node.master: true
node.data: true
The node is master-eligible and also stores data. This is the default configuration for every Elasticsearch node.
- Master node
node.master: true
node.data: false
Stores no data but is master-eligible: it can participate in elections and may become the actual master node. An ordinary server is sufficient (CPU and memory consumption are moderate).
- Data node
node.master: false
node.data: true
Not master-eligible and does not take part in elections; it only stores data. A cluster needs several such nodes to hold the data and serve storage and query requests. They mainly consume disk and memory.
- Client node
node.master: false
node.data: false
Neither master-eligible nor a data node; it mainly acts as a coordinating node, receiving requests and distributing them across the cluster.
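Once nodes are configured, their roles can be verified with the cat nodes API; in the node.role column, m marks a master-eligible node, d a data node, and a coordinating-only node shows just a dash:

GET _cat/nodes?v&h=name,node.role,master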
3.3 Shard
- Sharding is the key for Elasticsearch to distribute data in clusters. Think of shards as containers for data. Documents are stored in shards, which are then distributed to nodes in the cluster. Elasticsearch will migrate shards between nodes to balance the cluster as it expands or shrinks.
- A shard can be a primary shard or replica shard.
As data storage units, shards exist only on data nodes.
A replica shard is simply a copy of a primary shard. It guards against data loss from hardware failure and can also serve read requests.
3.3.1 Shard Replication
(Diagram omitted: a three-node cluster, Node0/Node1/Node2, with primary and replica shards distributed across the nodes.)
Within an index, primary shards are distributed as evenly as possible across the nodes, and a replica shard is never placed on the same node as its primary.
If Node2 goes offline, the cluster redistributes the shards within a short time, still keeping each primary and its replica on different nodes. If Node1 then goes offline as well, all primary shards converge on Node0 and the cluster health becomes disconnected, because the number of available master-eligible nodes (1) is less than discovery.zen.minimum_master_nodes (default 2).
If you set discovery.zen.minimum_master_nodes to 1 and start only one node, the cluster health is Yellow: all primary shards are available, but the 5 replica shards cannot be allocated to any node. The cluster remains usable, but every operation falls on the primary shards, with the risk of a single point of failure.
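Shard allocation can be inspected with the cat shards API, and the replica count is a dynamic index setting; a sketch using the index from section 2:

// Which shard (primary p / replica r) sits on which node
GET _cat/shards/juejin_hr_data_v1?v

// Change the number of replicas at any time
PUT juejin_hr_data_v1/_settings
{
  "index": {
    "number_of_replicas": 1
  }
}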
3.3.2 Index Writing Process
- Every node in an ES cluster knows, via routing, where each document lives in the cluster, so every node can handle read and write requests.
- When a write request reaches a node, that node acts as the coordinating node: it computes the target shard with the routing formula and forwards the request to the node holding that shard's primary. Assuming shard = hash(_routing) % number_of_primary_shards, the process is as follows:
- The client sends a write request to node ES1; the routing formula evaluates to 0, so the data should be written to primary shard S0.
- ES1 forwards the request to ES3, the node holding the primary of S0, and ES3 performs the write.
- The data is then copied concurrently to the two replica shards R0, with conflicts controlled through optimistic concurrency control. Once all replica shards report success, ES3 reports success to the coordinating node, which in turn reports success to the client.
By default the doc _id is used as the _routing value, but you can also specify a routing string manually; for an article index, for example, the article tag could be used as the routing value. Specifying routing directly reduces the number of shards that have to be queried (by default an ES query hits every shard and then merges the results).
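A sketch of custom routing, reusing the index from section 2 (the routing value "server" and the document below are just illustrative):

// The target shard is chosen by hash(_routing) % number_of_primary_shards
PUT juejin_hr_data_v1/_doc/server_1?routing=server
{"week": 1, "team": "server", "school": 1, "nowcoder": 2, "boss": 0, "maimai": 0}

// Only the shard(s) that "server" maps to are queried, instead of all shards
GET juejin_hr_data_v1/_search?routing=server
{
  "query": {
    "match_all": {}
  }
}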
3.4 Segment
3.4.1 Immutable Indexes
- The inverted index written to disk is immutable, which has the following advantages:
  - No locks required. Because the index never changes, there is no need to worry about inconsistencies from concurrent access.
  - Once the index is read into the kernel's filesystem cache, it stays there. Thanks to its immutability, as long as the filesystem cache has enough space, most read requests are served directly from memory without hitting disk, which gives a significant performance boost.
  - Other caches remain valid for the lifetime of the index and never need to be rebuilt, because the data never changes.
- Writing a single large inverted index also lets the data be compressed, reducing disk I/O and the amount of memory needed to cache the index.
- However, because indexes are immutable, problems arise when old data is edited:
  - When old data is deleted, it is not removed immediately; it is only marked as deleted in a .del file. The space is reclaimed only when segments are merged, so a lot of space is wasted.
  - If data is updated frequently, every update writes a new version and marks the old one as deleted, which also wastes a lot of space.
  - Every time data is added, a new segment is needed to store it. When the number of segments grows too large, the consumption of server resources such as file handles becomes very high.
  - Every query has to cover all segments and then exclude the old data marked as deleted, which adds to the query burden.
Therefore, to keep the efficiency of immutable indexes while avoiding the problems they cause as much as possible, the segment is introduced.
3.4.2 Segments
- An index file is split into multiple subfiles, each called a segment; each segment is itself an inverted index. Segments are immutable: once their data is written to disk, it cannot be modified.
3.4.3 Segment Updates
- Add: new data is easy to handle; since it is new, it is simply written into a new segment.
- Delete: because segments cannot be modified, a delete does not remove the document from its old segment. Instead, a .del file records which documents in which segments have been deleted. A deleted document is only marked in the .del file; it can still match a query, but it is filtered out of the results before they are returned.
- Update: an old segment cannot be modified to reflect a document update, so an update is really a delete plus an add. The old version of the document is marked as deleted in the .del file, and the new version is indexed into a new segment. Both versions may match a query, but the deleted old version is removed before the result set is returned.
3.4.4 Segment Merging
- Because of this special handling of adds, deletes, and updates, segments pile up. Besides consuming file handles, memory, and CPU, too many segments slow down searches, since every search has to visit every segment.
- So when the number of segments grows too large, small segments are merged into a single large segment. The main steps of a segment merge are:
- The new merged segment is flushed to disk.
- A new commit point is written that includes the new segment and excludes the old segments.
- The new segment is opened for search.
- The old segments are deleted.
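Merging runs automatically in the background, but it can also be triggered manually with the force merge API, typically on an index that is no longer written to; a hedged example:

// Merge the index down to a single segment per shard
POST juejin_hr_data_v1/_forcemerge?max_num_segments=1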
3.4.5 Refresh
How does ES achieve near-real-time full-text search?
- Disk is the bottleneck: committing a new segment to disk is expensive and would seriously hurt performance. With heavy write traffic, ES would stall and queries would stop responding quickly.
- So instead of triggering persistence every time a document is indexed, there needs to be a lighter-weight way to make new documents searchable. To improve write performance, ES uses a delayed-write strategy rather than writing a new segment to disk for every piece of new data.
- New data is first written to an in-memory buffer. When the default interval (1 second) elapses or the buffered data reaches a certain size, a refresh is triggered: the buffered data is written as a new segment into the filesystem cache, which sits between memory and disk. Only later is the segment flushed to disk and a commit point generated.
- New data keeps flowing into the in-memory buffer, but buffered data is not yet in a segment and therefore cannot be searched. Refreshing from memory into the filesystem cache creates a new segment and opens it for search, without waiting for a flush to disk.
- In Elasticsearch, this lightweight process of writing and opening a new segment is called refresh. By default, each shard is automatically refreshed once per second.
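The refresh interval is a dynamic index setting, and a refresh can also be requested explicitly; a minimal sketch with the index from section 2:

// Relax the refresh interval to trade search freshness for write throughput
PUT juejin_hr_data_v1/_settings
{
  "index": {
    "refresh_interval": "30s"
  }
}

// Force an immediate refresh so the latest writes become searchable
POST juejin_hr_data_v1/_refresh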
3.4.6 Flush
Although periodic refresh provides near-real-time search, refresh only moves data into the filesystem cache and does not persist it. To avoid data loss, Elasticsearch adds the translog, a transaction log that records every operation not yet persisted to disk. The flush process is as follows:
- When a document is indexed, it is added to the in-memory buffer and also appended to the translog. New documents keep being written to the buffer and the translog; at this point the new data cannot yet be searched.
- When the default refresh interval elapses or the buffered data reaches a certain size, a refresh is triggered:
  - The documents in the memory buffer are written to a new segment, but without an fsync.
  - The segment is opened, making the new documents searchable.
  - The memory buffer is cleared.
- This process repeats as more documents are added to the buffer and appended to the log.
- As indexing continues, when the translog grows beyond 512 MB or becomes older than 30 minutes, a full commit (flush) is performed:
  - All documents in the memory buffer are written to a new segment.
  - The buffer is cleared.
  - A commit point is written to disk.
  - The filesystem cache is flushed to disk with fsync.
  - The translog is cleared.
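A flush can be triggered manually, and translog behaviour is controlled by index settings; a hedged sketch (defaults vary between ES versions):

// Commit in-memory segments to disk and truncate the translog
POST juejin_hr_data_v1/_flush

// Translog tuning: by default every request is fsynced (durability: request)
PUT juejin_hr_data_v1/_settings
{
  "index": {
    "translog.durability": "async",
    "translog.flush_threshold_size": "512mb"
  }
}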
3.5 Document (doc)
3.5.1 Schema
ES exchanges data as JSON, so docs can be used out of the box. If no mapping has been defined before a doc is written, the type of each field is inferred from the JSON that is sent. The default dynamic field mapping rules are as follows:
JSON type | ES type
---|---
null | No field is added
boolean | boolean
string | date (if date detection passes), double/long (if numeric detection passes), otherwise text with a keyword sub-field
number | float / long
object | object
array | depends on the type of the first non-null element
In addition, ES supports dynamic_templates to extend or override the default rules. The following example overrides the default string mapping, so that a field matching the string type is mapped to text:
{
  "mappings": {
    "dynamic_templates": [
      {
        "strings_as_keywords": {
          "match_mapping_type": "string",
          "mapping": {
            "type": "text"
          }
        }
      }
    ]
  }
}
However, if dynamic fields are not needed, dynamic mapping is not recommended: used carelessly, it pollutes the mapping. You can set dynamic to false to disable it.
Of course, you can use the put mapping API to pre-define the mapping of an index, including the field types, the analyzer to use (for text fields), whether a field is indexed, and so on.
ES officially recommends indexing the same field in different ways when needed: for example, a string value can be indexed as text for full-text retrieval and also as keyword for sorting and aggregations.
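A sketch of such a multi-field mapping, using a hypothetical index and title field: the same value is indexed as text for full-text search and as a keyword sub-field (title.raw) for sorting and aggregations:

PUT juejin_article_v1
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "fields": {
          "raw": { "type": "keyword" }
        }
      }
    }
  }
}

// Full-text match on title, terms aggregation on the un-analyzed title.raw
GET juejin_article_v1/_search
{
  "query": { "match": { "title": "elasticsearch" } },
  "aggs": { "by_title": { "terms": { "field": "title.raw" } } }
}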
Aliases are recommended. ES allows a mapping to be extended but not modified: you can add a field to a mapping, but you cannot delete or change an existing field. So point clients at an alias rather than the real index. When a field does need to change, use the reindex API to rebuild the index and the aliases API to switch the alias to the new index, making the change seamless.
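A hedged sketch of that workflow, with juejin_hr_data as the hypothetical alias and v1/v2 as the real indices:

// 1. Create juejin_hr_data_v2 with the corrected mapping (mapping omitted here)

// 2. Copy the documents across
POST _reindex
{
  "source": { "index": "juejin_hr_data_v1" },
  "dest": { "index": "juejin_hr_data_v2" }
}

// 3. Switch the alias atomically from the old index to the new one
POST _aliases
{
  "actions": [
    { "remove": { "index": "juejin_hr_data_v1", "alias": "juejin_hr_data" } },
    { "add": { "index": "juejin_hr_data_v2", "alias": "juejin_hr_data" } }
  ]
}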
3.5.2 Metadata
Each doc in ES has some associated metadata:
- _index: the index the doc belongs to
- _type: the mapping type of the doc
- _id: the doc's unique identifier, unique within the index
- _source: the original JSON of the doc
- _size: the length of the doc
- _routing: the custom routing value described above
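For example, fetching the doc written in section 2.2 returns the metadata alongside _source (response abbreviated; _size only appears when the mapper-size plugin is installed):

GET juejin_hr_data_v1/_doc/client_1

// Abbreviated response:
// {
//   "_index": "juejin_hr_data_v1",
//   "_type": "_doc",
//   "_id": "client_1",
//   "found": true,
//   "_source": { "week": 1, "team": "client", ... }
// }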
3.6 Field
Field is the smallest unit of data in ES. By default, ES indexes the data in every field, and each field type has its own optimized index structure: string types (text, keyword) are stored in inverted indexes, while numeric types (integer, float, and so on) are indexed with BKD trees. Using a different index structure for each field type is one of the reasons ES is so fast.
3.6.1 Field Types
ES supports the following field types:
Category | Type | Description
---|---|---
string | text | The original string is processed by the configured analyzer and written to the inverted index
 | keyword | The whole string is stored as a single term in the inverted index
number | long | Range: -2^63 to 2^63-1
 | integer | Range: -2^31 to 2^31-1
 | short | Range: -32,768 to 32,767
 | byte | Range: -128 to 127
 | double | Double-precision floating point
 | float | Single-precision floating point
 | half_float | 16-bit floating point
date | string | Time as a formatted string
 | number | Time as a number, typically a Unix timestamp (ms/s)
boolean | boolean | true / false
binary | binary | Binary data
object | array | An array of values
 | range | A range of values
 | object | A JSON object
 | nested | JSON objects whose internal relationships are preserved; by default, object fields are flattened
 | geo | Geographic coordinates
 | ip | An IPv4 or IPv6 address
 | completion | Auto-completion type, backed by a dictionary-tree index
... | ... | ...
3.6.2 Inverted Index
An inverted index (also called a postings file or inverted file) is an index structure that stores, for every word, a mapping to its locations in a document or a set of documents. It is the most commonly used data structure in full-text retrieval systems. (From Wikipedia.)
The most basic index structure in ES is the inverted index. For example, given three documents:
Doc 0: It is what it is
Doc 1: What is it
Doc 2: It's a banana
The corresponding inverted index is:
"a": {2}
"banana": {2}
"is": {0, 1, 2}
"it": {0, 1, 2}
"what": {0, 1}
In ES, the key of the inverted index is the term (a text string), and the value is the list of doc IDs containing that term.
3.6.3 FST
An FST (finite state transducer) is a data structure similar to a trie.
ES uses an FST to store the term dictionary of the inverted index.
3.6.4 Skip List
ES stores the doc ID lists (the postings) as skip lists, which makes advancing through doc IDs efficient.
For an AND condition, a query has to intersect multiple postings lists. Assuming three postings lists, the basic intersection procedure is as follows:
- Take the first element from termA's postings list, docId = $docId.
- Set currentDocId = $docId.
- Loop over the other postings lists, calling advance(currentDocId) on each (it returns the first doc greater than or equal to currentDocId):
  - If the returned docId equals currentDocId, move on to the next postings list.
  - If the returned docId is greater than currentDocId, set currentDocId to that value and restart the loop with the new currentDocId.
  - If the number of lists whose returned docId equals currentDocId reaches (number of postings lists - 1), currentDocId matches every term: add it to the result set, then take the next docId from the current postings list as the new currentDocId and continue.
- Stop as soon as any postings list is exhausted.
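At the query level, this intersection happens whenever several term conditions are combined with AND; a sketch against the index from section 2:

// Both conditions must hold, so the postings lists of the two terms are intersected
GET juejin_hr_data_v1/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "week": "1" } },
        { "term": { "team": "client" } }
      ]
    }
  }
}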
3.6.5 BKD Tree Index
The BKD tree is a dynamic index structure that can index large multidimensional data sets efficiently and scalably. It offers (1) very high space utilization, (2) excellent query performance, and (3) excellent update performance, and all three properties hold even under intensive updates.
4. Summary
ES covers a lot of ground. This article split it up by physical and logical granularity and walked through it from cluster down to field.
Reference Documents:
www.cnblogs.com/caoweixiong…
www.cnblogs.com/duanxz/p/52…
www.elastic.co/guide/en/el…
www.cppblog.com/mysileng/ar…