Preface
ElasticSearch provides many features out of the box, so we can use it without worrying about the low-level details. However, a shallow understanding of ElasticSearch is not enough. If you want to do more than just use ElasticSearch, I recommend you read on.
In this article, you will find answers to the following questions.
- How does a new ElasticSearch node join the cluster?
- How does the ElasticSearch cluster determine whether a node is alive or not?
- How are documents distributed to a particular shard?
How the ElasticSearch cluster works
Node discovery
If a new ElasticSearch node is started while an ElasticSearch cluster already exists on the network, the node is automatically added to that cluster. So how does this node join the cluster?
Let me give you an example. Suppose the network already has a master node and a data node. What happens when another ElasticSearch node is started on the network? The situation is shown in the figure below.
The newly started ElasticSearch node performs a multicast operation: it sends a ping request to all the machines on the network. Most of these machines are not running ElasticSearch and will not respond. When the request reaches the master node of the ElasticSearch cluster, however, the master returns a response. Once the new node's cluster name is confirmed to be the same as the master's, the master node records the machine on which the new node is located and performs the related data transfer. The new node then becomes a member of the ElasticSearch cluster.
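The ping-and-match step above can be sketched as a small in-memory simulation. This is only an illustration, not ElasticSearch's actual discovery code: the `Machine` class, `handle_ping`, `discover_master`, the addresses, and the cluster name "demo" are all made up for the example, and a plain loop stands in for the multicast.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Machine:
    """Any host on the network; only some of them run ElasticSearch."""
    address: str
    cluster_name: Optional[str] = None  # None: not an ElasticSearch machine
    is_master: bool = False
    known_nodes: List[str] = field(default_factory=list)

    def handle_ping(self, sender_address: str, sender_cluster: Optional[str]) -> bool:
        """Answer a discovery ping; only a master of the same cluster replies."""
        if not self.is_master or self.cluster_name != sender_cluster:
            return False  # every other machine stays silent
        if sender_address not in self.known_nodes:
            self.known_nodes.append(sender_address)  # master records the new node
        return True


def discover_master(new_node: Machine, network: List[Machine]) -> Optional[Machine]:
    """Ping every machine on the network and return the master that answered."""
    for machine in network:
        if machine is new_node:
            continue
        if machine.handle_ping(new_node.address, new_node.cluster_name):
            return machine
    return None


# Hypothetical network: one master, one data node, one unrelated machine.
network = [
    Machine("10.0.0.1", cluster_name="demo", is_master=True, known_nodes=["10.0.0.2"]),
    Machine("10.0.0.2", cluster_name="demo"),
    Machine("10.0.0.9"),
]
new_node = Machine("10.0.0.3", cluster_name="demo")
network.append(new_node)

master = discover_master(new_node, network)
print(master.address if master else "no master found")
print(network[0].known_nodes)
```

A node started with a different cluster name would get no reply from this master, so it would not join the cluster.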
Node probe
This is how the cluster works when a new ElasticSearch node is added. What if a node in a working cluster goes down? How does the ElasticSearch cluster sense that a node is down, remove the failed node, and move the data?
Suppose there are already three ElasticSearch nodes, as shown in the figure below.
To ensure the normal operation of the cluster, the master node performs probing work periodically: it pings the other known ElasticSearch nodes at regular intervals.
If one of the nodes goes down at this point, the master node will not receive a response from it. The master node then promotes the replica shards of the lost primary shards to primary shards and sets the cluster status to Yellow. Yellow indicates that some shard copies are missing, but all data still exists in the current cluster. The search service continues, but the performance of the cluster is not guaranteed.
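One probing round and the replica promotion that follows can be sketched like this. It is a simplified model under my own assumptions, not the real implementation: `probe_round`, the node names, and the shard layout are hypothetical, and the `is_alive` callback stands in for the actual ping.

```python
from typing import Callable, Dict, List


def probe_round(
    nodes: List[str],
    is_alive: Callable[[str], bool],
    primaries: Dict[int, str],
    replicas: Dict[int, List[str]],
    replicas_per_shard: int = 1,
) -> str:
    """Ping every node once, promote replicas where needed, return the status."""
    dead = {node for node in nodes if not is_alive(node)}
    status = "green"

    for shard, primary in list(primaries.items()):
        # Replica copies on dead nodes are gone.
        survivors = [n for n in replicas.get(shard, []) if n not in dead]
        if primary in dead:
            if not survivors:
                # No copy of this shard is left anywhere in the cluster.
                replicas[shard] = survivors
                status = "red"
                continue
            # Promote a surviving replica to primary.
            primaries[shard] = survivors.pop(0)
        replicas[shard] = survivors
        if len(survivors) < replicas_per_shard and status != "red":
            status = "yellow"  # all data is present, but a copy is missing
    return status


# Three nodes, three shards, one replica each (made-up layout).
nodes = ["node-a", "node-b", "node-c"]
primaries = {0: "node-a", 1: "node-b", 2: "node-c"}
replicas = {0: ["node-b"], 1: ["node-c"], 2: ["node-a"]}

# node-c stops answering pings.
status = probe_round(nodes, lambda n: n != "node-c", primaries, replicas)
print(status)
print(primaries)
```

In this run the replica of shard 2 on node-a is promoted, and shard 1 loses its replica, so every document is still searchable while the cluster has less redundancy than it was configured for.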
Document distribution
How does ElasticSearch distribute a new document when indexing it?
Take the example above again. There are three nodes in the cluster, and the three shards of an index are evenly distributed across them. If a new document with id 2 is indexed, which node will it be routed to?
In fact, the distribution of documents follows this formula.
shard_num = hash(_routing) % num_primary_shards
Let's assume that the hash of the new document's routing value is still 2. With three primary shards, 2 % 3 = 2, so the document ends up on shard 2, as shown below.
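The formula can be tried out with a few lines of Python. This is a rough sketch: `shard_for` is a name I made up, and `zlib.crc32` is only a stand-in hash, so the shard numbers printed here will not match what a real ElasticSearch cluster computes.

```python
import zlib


def shard_for(routing: str, num_primary_shards: int) -> int:
    """shard_num = hash(_routing) % num_primary_shards"""
    # Stand-in hash for illustration; ElasticSearch uses its own hash internally.
    hashed = zlib.crc32(routing.encode("utf-8"))
    return hashed % num_primary_shards


# Route a few document ids across three primary shards.
for doc_id in ["1", "2", "3", "4"]:
    print(doc_id, "-> shard", shard_for(doc_id, 3))
```

The same routing value always lands on the same shard as long as the number of primary shards stays the same. If that number changed, the same id could map to a different shard, and documents indexed earlier could no longer be found where the formula points.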
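Summary
The multicast discovery described above is only one option (it was the default in early ElasticSearch releases). ElasticSearch can also be configured with other discovery policies, such as unicast with a list of known hosts. Document distribution can also use the _routing field to specify which value to use as the input parameter to the hash function. I'll talk about that next time.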
About writing
From now on, I will write an article here every day, with no limit on subject matter, content or word count. Try to put your daily thoughts into it.
If this article has helped you, please give it a thumbs up, or even better, follow me.
If you would rather do neither, write down what you want to say after you finish reading. Useful feedback and your encouragement are the biggest help to me.
I'm also going to pick my blog back up. You are welcome to drop by and look around.
I’m shane. Today is September 8, 2019. Forty-sixth day of the hundred day Writing project, 46/100.