Elasticsearch
How Elasticsearch transparently hides complex distributed mechanisms
Elasticsearch is a distributed system designed to handle large amounts of data
- Hiding complex distributed mechanisms
- Sharding: when we inserted documents into the ES cluster earlier, we never had to care how the data was sharded or which shard each document went to.
- Cluster discovery: in the earlier experiment that changed the cluster status from yellow to green, we simply started a second ES process. It discovered the cluster automatically, joined it as a node, and took over part of the data (some of the shards).
- Shard load balancing: for example, if 25 shards are to be allocated across three nodes, ES distributes them evenly by itself so that each node carries a balanced share of read and write requests.
- Shard replicas, request routing, cluster expansion, and shard redistribution.
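The 25-shards-on-three-nodes example can be sketched in a few lines of Python. This is only a toy round-robin model for illustration; the function name `distribute_shards` and the node names are made up here, and the real Elasticsearch allocator weighs more factors than a simple count.

```python
def distribute_shards(num_shards, nodes):
    """Assign shard ids to nodes round-robin, so counts differ by at most one."""
    allocation = {node: [] for node in nodes}
    for shard_id in range(num_shards):
        node = nodes[shard_id % len(nodes)]
        allocation[node].append(shard_id)
    return allocation


# Hypothetical node names, used only for this illustration.
allocation = distribute_shards(25, ["node-1", "node-2", "node-3"])
counts = {node: len(shards) for node, shards in allocation.items()}
print(counts)  # {'node-1': 9, 'node-2': 8, 'node-3': 8}
```

No node ends up with more than one shard above any other, which is the "even distribution" the bullet describes.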
Vertical and horizontal scaling in Elasticsearch
Vertical scaling: buying more powerful servers is very expensive and eventually hits a ceiling. Suppose the most powerful server in the world holds 10 TB; when your total data volume reaches 5,000 TB, how many of those servers would you have to buy?
Horizontal scaling: the approach most commonly used in the industry is to buy more and more commodity servers. Each one has ordinary performance, but many of them organized together provide powerful computing and storage capacity.
Commodity server: 1 TB capacity, 10,000 per machine; scaling out costs 1 million in total.
Powerful server: 10 TB capacity, 500,000 per machine; scaling out costs 5 million in total.
(Both totals are consistent with roughly 100 TB of data: 100 commodity machines versus 10 powerful ones.)
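The cost comparison can be checked with a little arithmetic. The sketch below assumes a total data volume of 100 TB, which is inferred from the stated totals of 1 million and 5 million rather than given explicitly; the helper name `scale_cost` is made up for this example.

```python
import math


def scale_cost(total_tb, capacity_tb, unit_price):
    """Return (servers needed, total purchase cost) for a given data volume."""
    servers = math.ceil(total_tb / capacity_tb)
    return servers, servers * unit_price


# Assumption: 100 TB of data, inferred from the stated totals.
commodity = scale_cost(100, capacity_tb=1, unit_price=10_000)
powerful = scale_cost(100, capacity_tb=10, unit_price=500_000)
print(commodity)  # (100, 1000000)
print(powerful)   # (10, 5000000)

# The vertical-scaling question: 5,000 TB on 10 TB machines.
servers_needed, _ = scale_cost(5000, capacity_tb=10, unit_price=500_000)
print(servers_needed)  # 500
```

Under these assumptions the commodity fleet costs one fifth as much for the same capacity, which is the argument for horizontal scaling.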
Scaling is transparent to applications
Rebalancing when nodes are added or removed
This is illustrated in the picture
- Maintains load balancing
The master node
This is illustrated in the picture
- Creates or deletes indexes
- Adds or removes nodes
Distributed architecture with peer nodes
Peer nodes means that every server can act as the receiving server for a request.
Example: we request a document, and the request happens to be sent to node 5. Node 5 does not hold that data, so it forwards the request to node 2, which does. Node 2 returns the data to node 5, and node 5 sends it back to the requester. The requester never learns that node 5 received the request or that node 2 actually served it. From its point of view, it sent a request to Elasticsearch, got a response, and received the data it needed.
- Peer nodes: each node can receive any request
- Automatic request routing
- Response collection
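The node 5 / node 2 example can be modeled as a small simulation. Everything here (`Node`, `Cluster`, the `locations` lookup table) is a hypothetical toy model, not Elasticsearch code; in particular, real Elasticsearch derives the target shard from the document's routing value instead of keeping a per-document lookup table.

```python
class Node:
    def __init__(self, name, cluster):
        self.name = name
        self.cluster = cluster
        self.documents = {}  # documents stored locally on this node

    def handle(self, doc_id):
        """Act as the receiving node: serve locally or forward to the owner."""
        owner = self.cluster.owner_of(doc_id)
        hops = [self.name]
        if owner is not self:
            hops.append(owner.name)  # request is forwarded to the owner
            hops.append(self.name)   # data comes back to the receiving node
        return owner.documents[doc_id], hops


class Cluster:
    def __init__(self, node_names):
        self.nodes = {name: Node(name, self) for name in node_names}
        self.locations = {}  # doc_id -> node name (toy stand-in for routing)

    def index(self, doc_id, body, node_name):
        self.nodes[node_name].documents[doc_id] = body
        self.locations[doc_id] = node_name

    def owner_of(self, doc_id):
        return self.nodes[self.locations[doc_id]]


cluster = Cluster([f"node-{i}" for i in range(1, 6)])
cluster.index("doc-1", {"title": "hello"}, "node-2")

# The client talks to node 5; node 5 fetches the data from node 2.
body, hops = cluster.nodes["node-5"].handle("doc-1")
print(body)  # {'title': 'hello'}
print(hops)  # ['node-5', 'node-2', 'node-5']
```

The caller only sees `body`; the `hops` list is returned here purely to make the hidden forwarding visible.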
Elasticsearch (10)
The shard and replica mechanism, revisited
- An index contains multiple shards.
- Each shard is the smallest unit of work. It carries part of the data and is a Lucene instance with complete indexing and request-processing capabilities (important).
- When nodes are added or removed, shards are automatically rebalanced across the nodes.
- There are primary shards and replica shards. Each document exists in exactly one primary shard and in that shard's replica shards; it cannot exist in multiple primary shards.
- A replica shard is a copy of a primary shard. It provides fault tolerance and takes on part of the read load.
- The number of primary shards is fixed when the index is created. The number of replica shards can be changed at any time, so we can add one or more replica shards to each primary shard.
- By default there are 5 primary shards with 1 replica each, so 10 shards in total: 5 primary shards and 5 replica shards. (These defaults apply to Elasticsearch versions before 7.0; from 7.0 on, the default is 1 primary shard.)
- A primary shard cannot be placed on the same node as its own replica shard; otherwise, if that node goes down, both the primary and the replica are lost and there is no fault tolerance. A primary shard can, however, share a node with the replica shard of a different primary shard.
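The placement rule in the last bullet can be illustrated with a toy allocator. This is a minimal sketch under simplifying assumptions (the `allocate` function and the `P0`/`R0` labels are invented for this example, and the "least loaded node" heuristic is not Elasticsearch's actual algorithm); it only enforces that two copies of the same shard never share a node.

```python
def allocate(num_primaries, replicas_per_primary, nodes):
    """Toy allocator. Returns (assignment, unassigned).

    assignment maps node -> list of labels such as 'P0' (primary 0)
    or 'R0' (a replica of primary 0).
    """
    assignment = {node: [] for node in nodes}
    holders = {shard: set() for shard in range(num_primaries)}
    unassigned = []

    def place(shard, label):
        # A node may hold at most one copy of a given shard.
        candidates = [n for n in nodes if n not in holders[shard]]
        if not candidates:
            unassigned.append(label)
            return
        target = min(candidates, key=lambda n: len(assignment[n]))
        assignment[target].append(label)
        holders[shard].add(target)

    for shard in range(num_primaries):
        place(shard, f"P{shard}")
    for shard in range(num_primaries):
        for _ in range(replicas_per_primary):
            place(shard, f"R{shard}")
    return assignment, unassigned


one_node = allocate(3, 1, ["node-1"])
two_nodes = allocate(3, 1, ["node-1", "node-2"])
print(one_node)   # ({'node-1': ['P0', 'P1', 'P2']}, ['R0', 'R1', 'R2'])
print(two_nodes)  # ({'node-1': ['P0', 'P2', 'R1'], 'node-2': ['P1', 'R0', 'R2']}, [])
```

With a single node every replica stays unassigned, because the only available node already holds the matching primary. With two nodes each replica lands on the node that does not hold its primary, alongside primaries of other shards.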
What an index looks like in a single-node environment
- In a single-node environment, create an index with 3 primary shards and 3 replica shards.
- The cluster status is yellow.
- Only the 3 primary shards are allocated to the single node; the 3 replica shards cannot be allocated.
- The cluster works normally, but if the node goes down, all data is lost and the cluster cannot handle any requests.
PUT /test_index
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}
Here number_of_shards sets the number of primary shards, and number_of_replicas sets the number of replica shards for each primary shard.
Elasticsearch (11)
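How replica shards are distributed across 2 nodes (illustrated)
- Replica shard distribution: 3 primary shards, 3 replica shards, and 2 nodes
- Synchronization: data is synchronized from each primary shard to its replica shard
- Read requests: can be served by either the primary shard or a replica shard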
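The last two bullets can be sketched as a toy model of one primary shard and its replicas. The `ShardGroup` class is hypothetical, and the round-robin choice of which copy serves a read is an assumption made for illustration; the text above does not say which selection strategy Elasticsearch uses.

```python
from itertools import cycle


class ShardGroup:
    """One primary shard plus its replica copies (toy model)."""

    def __init__(self, replica_count):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        # Reads rotate over all copies: the primary first, then each replica.
        self._readers = cycle([self.primary] + self.replicas)

    def write(self, doc_id, body):
        # Writes go to the primary and are then copied to every replica.
        self.primary[doc_id] = body
        for replica in self.replicas:
            replica[doc_id] = body

    def read(self, doc_id):
        # Any copy can answer, since all copies hold the same data.
        return next(self._readers).get(doc_id)


group = ShardGroup(replica_count=1)
group.write("doc-1", {"title": "hello"})
first = group.read("doc-1")   # served by the primary
second = group.read("doc-1")  # served by the replica
print(first == second)  # True
```

Because every write reaches both copies, it does not matter which copy answers a read, and adding replicas adds read capacity.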