1. Description of the ElasticSearch cluster
- Master node (master-eligible node)
The master node is responsible for creating and deleting indexes, allocating shards, and tracking the status of the nodes in the cluster. Its load is relatively light. Client requests can be sent directly to any node, and that node is responsible for distributing the request and returning the processed result.
After a node starts, the Zen Discovery mechanism is used to find the other nodes in the cluster and establish connections with them. The cluster elects one master node from the master-eligible nodes, and a cluster can have only one elected master. In some cases, due to problems such as network packet loss, a cluster may end up with multiple master nodes, which is called "split brain". Split brain can cause data loss, because the master node has the highest authority: it decides when indexes can be created and how shards are moved. To avoid this problem, set the minimum number of master-eligible nodes with discovery.zen.minimum_master_nodes. The recommended value is (number of master-eligible nodes / 2) + 1. For example, with three master-eligible nodes, set it to (3/2) + 1 = 2, so that more than half of the master-eligible nodes must be present in the cluster. If there are not enough master-eligible nodes, no election takes place, which reduces the possibility of split brain. (Note that in Elasticsearch 7.x this setting is ignored; the cluster coordination layer handles the quorum automatically.)
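On a running cluster, a quick way to verify that exactly one master has been elected is the _cat API; a minimal example (run from Kibana Dev Tools or with curl against any node):

```
GET _cat/master?v
```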
Parameter settings for the master node:

```yaml
node.master: true
node.data: false
```
- Data node
Data nodes are responsible for data storage and CRUD operations. They have the highest hardware requirements: enough disk space is needed to store the data, and data operations consume a lot of CPU, memory, and I/O. As a cluster grows, more data nodes usually need to be added to improve availability.
Parameter settings for the data node:

```yaml
node.master: false
node.data: true
```
- Client node

Client nodes are neither master-eligible nor data nodes. They are only responsible for distributing requests and aggregating the results; client nodes are added mainly for load balancing.
```yaml
node.master: false
node.data: false
```
- Ingest node (pre-processing node)

Ingest nodes execute preprocessing pipelines and can transform documents before the data is indexed. They are not responsible for data storage or for cluster-level tasks.
Parameter Settings:

```yaml
node.ingest: true
```
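As a minimal sketch of what an ingest pipeline looks like (the pipeline name set_received_at, the orders index, and the fields are illustrative, not from the original article): the pipeline adds a timestamp to each document before it is indexed.

```
PUT _ingest/pipeline/set_received_at
{
  "description": "Add a received_at timestamp before indexing",
  "processors": [
    { "set": { "field": "received_at", "value": "{{_ingest.timestamp}}" } }
  ]
}

PUT /orders/_doc/1?pipeline=set_received_at
{
  "status": "NEW"
}
```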
- Coordinating node
A coordinating node is a role rather than a dedicated Elasticsearch node type, and it cannot be assigned through a configuration item: any node in the cluster can act as a coordinating node. When a node A receives a query request from a client, it distributes the query to the other relevant nodes, merges the results returned by each node, and finally returns a complete result set to the client. In this process, node A plays the role of the coordinating node.
A request in ES is very similar to a Map-Reduce operation. In ES the two phases are called scatter-gather. The client sends a request to any node in the cluster; that node becomes the coordinating node. It forwards the request to the nodes holding the relevant data (the scatter phase), each of which executes the request locally and returns its results to the coordinating node. The coordinating node then reduces these results into a single global result set (the gather phase).
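For example, the request below can be sent to any of the cluster's nodes; whichever node receives it acts as the coordinating node and performs the scatter and gather phases (the orders index is used here only for illustration):

```
GET /orders/_search
{
  "query": { "match_all": {} }
}
```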
- Tribe node

A tribe node acts as a federated client across clusters: it is a special client that can connect to multiple clusters and perform searches and other operations on all of them. The tribe node retrieves the cluster state from every connected cluster and merges them into a global cluster state. With this information it can read from and write to the nodes of all clusters as if they were local. Note that a tribe node must be able to connect to every single node in each configured cluster. (In Elasticsearch 7.x, tribe nodes have been removed in favour of cross-cluster search.)
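Once a cluster is running, a quick way to verify which role each node actually has is the _cat/nodes API; a small example using standard _cat columns:

```
GET _cat/nodes?v&h=name,node.role,master
```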
2. How the ElasticSearch cluster works
2.1 Principle of Cluster Distribution
An ES cluster dynamically redistributes shards and replica copies according to the number of nodes, effectively balancing the load across the whole cluster.
Single node:
Two nodes, number of replicas = 1:
Three nodes, number of replicas = 1:
Three nodes, number of replicas = 2:
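The number of replicas can be changed at any time on an existing index (the primary shard count, by contrast, is fixed when the index is created). A minimal sketch, assuming an index named orders already exists:

```
PUT /orders/_settings
{
  "number_of_replicas": 2
}
```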
2.2 Shard Allocation Mechanism
Plan capacity before deciding the number of shards. If there are too many nodes for too few shards, new nodes cannot receive any shards and horizontal scaling becomes impossible; in addition, the data held by a single shard grows too large, which makes data redistribution slow.
Suppose there are two data nodes in the cluster and the orders index is created with the following shard settings:
```
PUT /orders
{
  "settings": {
    "number_of_shards": 2,    // number of primary shards
    "number_of_replicas": 2   // number of replica copies per primary shard
  }
}
```
The cluster then has two primary shards, P0 and P1; P0 has two replica shards (R0) and P1 has two replica shards (R1).
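The resulting shard distribution can be inspected with the _cat API; a small example:

```
GET _cat/shards/orders?v
```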
2.3 Index Write Process
- The write request is handled by the primary shard. If the request arrives at a node that holds only a replica (for example NODE2), it is forwarded to the node holding the primary shard.
- After receiving the request, the node hashes the document id (if no document id is supplied, an automatically generated id is used) and takes the modulo against the number of primary shards to choose the target shard (see the routing sketch after this list). If the result is P0, the write request is forwarded to NODE3, which holds P0.
- After the write is processed on NODE3, the data is replicated to the corresponding replica shards on NODE1 and NODE2.
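A note on the routing step above: the target primary shard is computed as hash(_routing) % number_of_primary_shards, where _routing defaults to the document _id. A minimal illustrative write (the document id and fields are made up); with two primary shards, hash("1001") % 2 selects either P0 or P1:

```
PUT /orders/_doc/1001
{
  "status": "NEW",
  "total": 25.50
}
```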
2.4 Index Read Process
- The read request enters a coordinating node and is forwarded to the node holding the target shard, based on the same modulo routing.
- If the routing result is shard 0, an internal load-balancing mechanism chooses among its copies: if the previous read was served by R0 on NODE1, the current request is forwarded to R0 on NODE2, so that the shard copies handle requests evenly.
- If the read request lands directly on a replica node, that node checks whether it holds the requested data; if it does, it returns the result, otherwise it forwards the request to a node that does (see the example below).
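A minimal read by id, routed the same way as the write sketch above; any copy of the target shard, primary or replica, may serve it:

```
GET /orders/_doc/1001
```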
3. Plan the ElasticSearch cluster deployment
Preparing a VM (all three nodes run on the same host, 10.10.20.28):
10.10.20.28: node-1, ports 9200 (HTTP) and 9300 (transport)
10.10.20.28: node-2, ports 9201 (HTTP) and 9301 (transport)
10.10.20.28: node-3, ports 9202 (HTTP) and 9302 (transport)
4. Configure the ElasticSearch cluster
- Decompress the installation package:
```bash
cd /usr/local/cluster
tar -xvf elasticsearch-7.10.2-linux-x86_64.tar.gz
```
Decompress the installation package to the /usr/local/cluster directory.
- Modifying the cluster configuration file:
```bash
vi /usr/local/cluster/elasticsearch-7.10.2-node1/config/elasticsearch.yml
```
10.10.20.28 configuration for the first node:
```yaml
# Cluster name
cluster.name: my-application
# Node name
node.name: node-1
# Bind IP address
network.host: 10.10.20.28
# HTTP service port
http.port: 9200
# Transport port for node-to-node communication
transport.tcp.port: 9300
# Cluster discovery addresses
discovery.seed_hosts: ["10.10.20.28:9300", "10.10.20.28:9301", "10.10.20.28:9302"]
# Master-eligible nodes used when the cluster forms for the first time
cluster.initial_master_nodes: ["10.10.20.28:9300", "10.10.20.28:9301", "10.10.20.28:9302"]
# Enable cross-origin (CORS) support; default is false
http.cors.enabled: true
# Origins allowed for cross-origin access; "*" allows all
http.cors.allow-origin: "*"
```
Modify directory permissions:
```bash
chown -R elsearch:elsearch /usr/local/cluster/elasticsearch-7.10.2-node1
```
- Copy the ElasticSearch installation directory to create the remaining two nodes:
```bash
cd /usr/local/cluster
cp -r elasticsearch-7.10.2-node1 elasticsearch-7.10.2-node2
cp -r elasticsearch-7.10.2-node1 elasticsearch-7.10.2-node3
```
- Modify the configurations of the other nodes:
10.10.20.28 Configuration of the Second Node:
```yaml
# Cluster name
cluster.name: my-application
# Node name
node.name: node-2
# Bind IP address
network.host: 10.10.20.28
# HTTP service port
http.port: 9201
# Transport port for node-to-node communication
transport.tcp.port: 9301
# Cluster discovery addresses
discovery.seed_hosts: ["10.10.20.28:9300", "10.10.20.28:9301", "10.10.20.28:9302"]
# Master-eligible nodes used when the cluster forms for the first time
cluster.initial_master_nodes: ["10.10.20.28:9300", "10.10.20.28:9301", "10.10.20.28:9302"]
# Enable cross-origin (CORS) support; default is false
http.cors.enabled: true
# Origins allowed for cross-origin access; "*" allows all
http.cors.allow-origin: "*"
```
10.10.20.28 Configuration of the Third NODE:
```yaml
# Cluster name
cluster.name: my-application
# Node name
node.name: node-3
# Bind IP address
network.host: 10.10.20.28
# HTTP service port
http.port: 9202
# Transport port for node-to-node communication
transport.tcp.port: 9302
# Cluster discovery addresses
discovery.seed_hosts: ["10.10.20.28:9300", "10.10.20.28:9301", "10.10.20.28:9302"]
# Master-eligible nodes used when the cluster forms for the first time
cluster.initial_master_nodes: ["10.10.20.28:9300", "10.10.20.28:9301", "10.10.20.28:9302"]
# Enable cross-origin (CORS) support; default is false
http.cors.enabled: true
# Origins allowed for cross-origin access; "*" allows all
http.cors.allow-origin: "*"
```
- Starting the cluster nodes
First switch to the elsearch user, then start the service on each of the three nodes in sequence:
```bash
su elsearch
/usr/local/cluster/elasticsearch-7.10.2-node1/bin/elasticsearch -d
/usr/local/cluster/elasticsearch-7.10.2-node2/bin/elasticsearch -d
/usr/local/cluster/elasticsearch-7.10.2-node3/bin/elasticsearch -d
```
Note: If an error occurs during startup, clear the data directory on each node and restart the service.
- Viewing Cluster Status
After the cluster is successfully installed and started, execute the following request: http://10.10.20.28:9200/_cat/nodes?pretty
You can see the information of the three nodes, and the nodes elect a master among themselves (ES is an improved implementation based on the Bully election algorithm):
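As an additional check (run with curl against http://10.10.20.28:9200, or later from the Kibana console), the overall cluster health can be queried; a green status means all primary and replica shards are allocated:

```
GET /_cluster/health?pretty
```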
5. Test ElasticSearch cluster sharding
Modify the Kibana configuration file to point to the cluster nodes just created:
```yaml
elasticsearch.hosts: ["http://10.10.20.28:9200", "http://10.10.20.28:9201", "http://10.10.20.28:9202"]
```
Restart the Kibana service and go to the console:
http://10.10.20.28:5601/app/home#/
Create an index whose replica count the cluster can accommodate:
```
PUT /orders
{
  "settings": {
    "index": {
      "number_of_shards": 2,
      "number_of_replicas": 2
    }
  }
}
```
As you can see, this time the result is normal:
A cluster cannot allocate replicas freely beyond what its nodes can hold. Create an index whose replica count exceeds the number of available nodes:
```
PUT /orders
{
  "settings": {
    "index": {
      "number_of_shards": 2,
      "number_of_replicas": 5
    }
  }
}
```
You can see that the cluster health turns yellow, warning that some replica shards cannot be allocated:
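To see why the status is yellow, the index-level health and the unassigned replica shards can be listed; a small example:

```
GET /_cluster/health/orders

GET _cat/shards/orders?v
```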
This article was created and shared by Mirson. For further communication, please add to QQ group 19310171 or visit www.softart.cn