This ES series is based on Elasticsearch 6.4.x. In Elasticsearch, each index is divided into multiple shards (by default, 5 primary shards per index), and each shard can have multiple copies. When a document is added or deleted on the primary shard, the corresponding replica shards must be kept in sync; otherwise, retrieval requests for the same document would return inconsistent results. The process of keeping shard copies synchronized and serving reads from them is what we call the data replication model. Elasticsearch's data replication model is based on the primary-backup model. In this model, each replication group has one primary shard, and the other shards are replica shards. The primary shard is the main entry point for all indexing operations (index, update, and delete): it is responsible for validating them and making sure they are correct. Once an indexing operation has been accepted by the primary, the primary is also responsible for replicating the operation to the other copies. The purpose of this section is to provide a high-level overview of the Elasticsearch replication model and discuss its implications for the various interactions between write and read operations.

Each indexing operation in Elasticsearch is first resolved to a replication group using routing, by default based on the document ID: shard = hash(routing) % number_of_primary_shards (see the sketch after the list below). Once the replication group is identified, the operation is forwarded to the primary shard of that group. The primary shard is responsible for validating the operation and forwarding it to the other copies. Because copies can be offline, the primary is not required to replicate to all copies. Instead, Elasticsearch maintains a list of shard copies that should receive the operation. This list is called the in-sync copies set and is maintained by the master node. As the name implies, these are the "good" shard copies that are guaranteed to have processed every index and delete operation that has been acknowledged to the user. The primary is responsible for maintaining this invariant, so it must replicate every operation to each copy in the set. The primary shard processes an operation as follows:

  1. Validate that the request complies with the Elasticsearch interface specification; reject it if it does not.
  2. Execute the operation (indexing, updating, or deleting a document) locally on the primary shard. If execution fails, return an error.
  3. Forward the operation to every replica in the current in-sync set; if there are multiple copies, this is done in parallel. The in-sync set contains the currently available, active copies.
  4. Once all in-sync replicas have successfully executed the operation and responded to the primary, the primary acknowledges the successful completion of the request to the client. The figure in Elasticsearch: The Definitive Guide illustrates this write-request flow.
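To make the routing step concrete, below is a minimal sketch of how a routing value maps to a primary shard. It is not Elasticsearch's actual implementation: Elasticsearch hashes the routing value (the document ID by default) with Murmur3, whereas this sketch uses String.hashCode() purely as a stand-in.

```java
// Minimal sketch of Elasticsearch-style routing: shard = hash(routing) % primary count.
// String.hashCode() is only a stand-in for the Murmur3 hash Elasticsearch really uses.
public class RoutingSketch {

    static int shardId(String routing, int numberOfPrimaryShards) {
        // floorMod keeps the result non-negative even when the hash is negative
        return Math.floorMod(routing.hashCode(), numberOfPrimaryShards);
    }

    public static void main(String[] args) {
        int primaries = 5; // the 6.x default of 5 primary shards per index
        System.out.println("doc-1 -> shard " + shardId("doc-1", primaries));
        System.out.println("doc-2 -> shard " + shardId("doc-2", primaries));
    }
}
```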



Error handling:

Many things can go wrong during indexing: disks can get corrupted, nodes can become disconnected from each other, or a configuration error can cause an operation to fail on a replica even though it succeeded on the primary. These failures are rare, but the primary has to respond to them.

Alternatively, if the primary shard itself becomes unavailable, the node hosting it sends a message to the master node. The indexing operation then waits (up to 1 minute by default) for the master to promote one of the replicas to be the new primary, after which the operation is forwarded to the new primary for processing. Note that the master also monitors the health of the nodes and may decide to proactively demote a primary; this typically happens when the node hosting the primary is isolated from the cluster by a network problem.
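This wait corresponds to the write request's timeout and can be tuned per operation. Below is a minimal sketch using the 6.4 Java high-level REST client; the host, index, type, and field names are illustrative only.

```java
import org.apache.http.HttpHost;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.unit.TimeValue;

public class WriteTimeoutSketch {
    public static void main(String[] args) throws Exception {
        try (RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")))) {

            // Shorten the default 1-minute wait (e.g. for a replica to be promoted
            // to primary) to 10 seconds for this particular write.
            IndexRequest request = new IndexRequest("posts", "doc", "1")
                    .source("title", "replication model")
                    .timeout(TimeValue.timeValueSeconds(10));

            IndexResponse response = client.index(request, RequestOptions.DEFAULT);
            System.out.println("result: " + response.getResult());
        }
    }
}
```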



As an example, consider a cluster in which NODE1 is the master node, the primary shard of the first replication group (P0, R0, R0) resides on NODE3, and the primary shard of the second replication group (P1, R1, R1) resides on NODE1.

Once an operation has been performed successfully on the primary shard, the primary must ensure that the data eventually becomes consistent, even if the operation cannot reach a replica (or the replica's response is lost) because of a failure on the replica or a network problem. To avoid inconsistencies within a replication group (for example, the operation succeeds on the primary but fails on one of its two replica shards), if the primary does not receive a successful response from a replica within the specified time (1 minute by default), or receives an error response, it sends a request to the master node asking for the problematic shard copy to be removed from the in-sync set. Once the master confirms that the problematic copy has been removed, that copy is discarded, and the master instructs another node to start building a new shard copy to restore the system to a healthy state.
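To make this failure handling concrete, here is a deliberately simplified, conceptual sketch of the flow in plain Java. It is not Elasticsearch's actual implementation; the ShardCopy and MasterNode interfaces are invented purely for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Conceptual sketch only, NOT Elasticsearch code: the primary applies an operation
// locally, forwards it to every in-sync replica in parallel, and asks the master to
// drop any replica that fails or does not answer in time.
public class PrimaryReplicationSketch {

    interface ShardCopy { void apply(String operation) throws Exception; }
    interface MasterNode { void removeFromInSyncSet(ShardCopy replica); }

    private final ExecutorService pool = Executors.newCachedThreadPool();

    void index(String operation, ShardCopy primary, List<ShardCopy> inSyncReplicas,
               MasterNode master, long timeoutSeconds) throws Exception {
        primary.apply(operation); // 1. execute on the primary first

        // 2. forward to all in-sync replicas in parallel
        Map<ShardCopy, Future<?>> pending = new LinkedHashMap<>();
        for (ShardCopy replica : inSyncReplicas) {
            pending.put(replica, pool.submit(() -> { replica.apply(operation); return null; }));
        }

        // 3. a replica that errors or times out is reported to the master, which
        //    removes it from the in-sync set and rebuilds a copy elsewhere
        for (Map.Entry<ShardCopy, Future<?>> entry : pending.entrySet()) {
            try {
                entry.getValue().get(timeoutSeconds, TimeUnit.SECONDS);
            } catch (ExecutionException | TimeoutException replicaFailure) {
                master.removeFromInSyncSet(entry.getKey());
            }
        }
        // 4. only now is the write acknowledged to the client
    }
}
```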

When a primary shard forwards an operation to a replica, the operation is also used to verify that the sender is still the active primary. If the primary has been isolated by a network partition (or a long GC pause), it may keep processing incoming indexing operations and forwarding them to the replicas before realizing that it has been demoted. Operations coming from a stale primary are rejected by the replicas. When the old primary receives these rejection responses from the replicas, it contacts the master node and eventually learns that it has been replaced; subsequent operations are then routed to the new primary shard.

What happens if there are no replicas?

This is a valid scenario, either because of configuration or because all replicas have failed. In this case, the primary shard handles operations without any external validation, which might seem problematic. On the other hand, the primary cannot fail other shard copies on its own; it must ask the master node to do so on its behalf. This means the master node knows that the primary is the only available copy of the replication group. It is therefore guaranteed that the master will not promote any other (stale) shard copy to primary, and that any operation indexed into the primary will not be lost. Of course, this also means we are running with only a single copy of the data, and a physical hardware problem can cause data loss. See the wait_for_active_shards parameter for some mitigation options (it is described in more detail in the next section).
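As one mitigation, a write can require a minimum number of active shard copies before Elasticsearch attempts it. A minimal sketch with the Java high-level REST client request builder (index, type, and field names are illustrative); note that this is a pre-flight check, not a guarantee about how many copies end up holding the write.

```java
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.support.ActiveShardCount;

public class WaitForActiveShardsSketch {
    // Require at least two active shard copies (primary + 1 replica) before the
    // write is attempted.
    static IndexRequest buildRequest() {
        return new IndexRequest("posts", "doc", "1")
                .source("title", "replication model")
                .waitForActiveShards(ActiveShardCount.from(2));
        // ActiveShardCount.ALL would require every configured copy to be active.
    }
}
```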

Note: in an Elasticsearch cluster there are two dimensions of election: electing the master node of the cluster, and choosing the primary shard of each replication group. A detailed theoretical analysis will be given in the source code analysis section.

Reads in Elasticsearch can be a very lightweight lookup by ID or a heavy search request with complex aggregations that consume non-trivial CPU power. One of the advantages of the primary-backup model is that it keeps all shard copies identical (failure recovery aside), so a single in-sync copy is normally sufficient to serve a read request. When a node receives a read request, that node is responsible for forwarding it to the appropriate data nodes according to the routing rules, collating the responses, and replying to the client. We call this node the coordinating node for the request. The basic flow is as follows:

  1. Resolve the read request to the relevant shards. Note that since most searches do not specify a routing value, they typically have to read from multiple replication groups, each representing a different subset of the data (5 groups by default, since Elasticsearch defaults to 5 primary shards per index).
  2. Select one copy from each relevant replication group; the read can be served by either the primary shard or a replica. By default, Elasticsearch balances read requests across the copies of a shard group in round-robin fashion.
  3. Send a shard-level request to each copy selected in step 2.
  4. Aggregate the data returned by each shard and return the combined result to the client. Note that this step can be skipped when a query carries a routing value and is therefore forwarded to only a single shard group (see the sketch below).
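To illustrate steps 1 and 4, the sketch below builds two search requests with the 6.x Java high-level REST client (index, field, and routing values are illustrative): the first fans out to one copy of every replication group; the second carries a routing value, so the coordinating node only has to query the single shard group that value hashes to.

```java
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.builder.SearchSourceBuilder;

public class ReadPathSketch {

    // A search without routing: the coordinating node queries one copy of every
    // replication group and merges the shard responses.
    static SearchRequest searchAllShardGroups() {
        return new SearchRequest("posts")
                .source(new SearchSourceBuilder()
                        .query(QueryBuilders.matchQuery("title", "replication")));
    }

    // The same search with a routing value: only the shard group that
    // hash(routing) resolves to is queried, so there is little to merge.
    static SearchRequest searchSingleShardGroup() {
        return searchAllShardGroups().routing("user-42");
    }
}
```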

Exception handling: when a shard copy fails to respond to a read request, the coordinating node selects another copy from the same replication group and sends the shard-level search request to it. Repeated failures can leave no shard copy available. In some cases, such as search, Elasticsearch prefers to respond quickly: it returns the data from the successful shards to the client and indicates in the response which shards failed.
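A minimal sketch of how a caller can detect such partial results from a search response (client setup omitted; the types and accessors follow the 6.x high-level REST client):

```java
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.ShardSearchFailure;

public class PartialResultCheck {

    // A search can succeed overall while some shards failed; inspect the shard
    // counts and per-shard failures that the coordinating node puts into the response.
    static void reportShardFailures(SearchResponse response) {
        System.out.println("successful shards: " + response.getSuccessfulShards()
                + " / total: " + response.getTotalShards());
        for (ShardSearchFailure failure : response.getShardFailures()) {
            System.out.println("failed shard " + failure.shardId()
                    + ", reason: " + failure.reason());
        }
    }
}
```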

4. Implications of the Elasticsearch primary-backup model

  • Under normal operation, each read operation is executed only once per relevant replication group; only under failure conditions do multiple copies of the same replication group execute the same search.
  • Since data is first indexed on the primary shard and the request is only then forwarded to the replicas, the data has already changed on the primary before forwarding. A concurrent read that is routed to the primary shard may therefore see the change before it has been acknowledged (the primary still has to wait for all in-sync replicas to succeed). This is similar to read-uncommitted in databases.
  • With the default settings, the primary-backup model keeps only two copies of the data per replication group (1 primary shard and 1 replica), so its fault tolerance is limited.

In case of failure, the following are possible:

  • A single slow shard can slow down indexing for the whole cluster. During each (indexing) operation, the primary shard executes the operation locally, forwards the request to the replica nodes, and then has to wait for responses from all in-sync replicas; one slow shard copy therefore slows down the entire replication group. Of course, a slow shard also slows down the searches that are routed to it.
  • Dirty reads: an isolated primary can expose writes that will never be acknowledged. An isolated primary only realizes it is isolated when it sends requests to its replicas or to the master node; by that point the operation has already been indexed into the primary and can be seen by concurrent reads. Elasticsearch reduces this risk by pinging the master node every second (by default) and rejecting indexing operations if no master node is known.

This article has described the design of the Elasticsearch document read/write model in detail, including the write model and its exception handling, the read model and its exception handling, and the implications hidden behind the primary-backup model under both normal and failure conditions. The next section begins the study of the Document APIs, starting with the Index API.


Thank you for reading. I am Wei Ge, and I focus on systematic analysis of mainstream Java middleware. Follow the public account "middleware interest circle"; reply "column" to get the full column navigation, or reply "information" to get the author's learning mind maps.