Overview

This article introduces how Elasticsearch implements concurrency control with optimistic locking. It covers a common e-commerce scenario, concurrency control in relational databases, and concurrency control in practice with ES.

Concurrent scenarios

Whether we use a relational database or Elasticsearch for search acceleration, concurrency control is a recurring topic wherever data is updated.

When we update a document in ES, we first read the original document, modify it, and then re-index it. If several clients do this at the same time without concurrency control, modifications are very likely to be lost. In some scenarios, losing one or two updates does not matter (for example, an article's read count or comment count). In scenarios with strict data requirements, however, a single lost update can cause a serious production problem: if an e-commerce system loses an update to an item's inventory, the item may be oversold.

Let’s take the ordering process of an e-commerce system as an example. An item has 100 units in stock, and two users each place an order for it. Both requests go through the usual steps for placing an order and deducting inventory:

  1. The client completes the order data verification and is ready to execute the order transaction.
  2. The client gets the inventory quantity of the item from ES.
  3. The client submits the order transaction and deducts the inventory quantity.
  4. The client writes the updated inventory quantity back to ES.

An example flow chart is as follows:

Without concurrency control, the inventory of the item would be updated to 99 (the correct value is 98), which would result in overselling. Assume that HTTP-1 runs one step ahead of HTTP-2. The problem arises because HTTP-2 reads the inventory before HTTP-1 has finished placing its order and writing the reduced inventory back to ES. The value HTTP-2 holds is therefore already expired, and its subsequent update is necessarily wrong.

In this scenario, the more frequent the updates, the higher the concurrency, and the longer the read-modify-write section takes, the higher the probability of data errors.
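The lost update can be reproduced with a small sketch. It assumes a plain in-memory dictionary as a stand-in for the inventory document in ES; `store`, `read_inventory`, and `write_inventory` are hypothetical names used only for illustration, and the interleaving is written out by hand rather than produced by real threads.

```python
# Hypothetical in-memory stand-in for the inventory document stored in ES.
store = {"inventory": 100}


def read_inventory():
    """Step 2: the client reads the inventory quantity."""
    return store["inventory"]


def write_inventory(value):
    """Step 4: the client writes the updated quantity back."""
    store["inventory"] = value


# Interleaving from the example: both requests read before either writes back.
http1_seen = read_inventory()      # HTTP-1 reads 100
http2_seen = read_inventory()      # HTTP-2 reads 100, which is about to be stale
write_inventory(http1_seen - 1)    # HTTP-1 writes 99
write_inventory(http2_seen - 1)    # HTTP-2 also writes 99, overwriting HTTP-1

final_inventory = read_inventory()
print(final_inventory)  # 99, although two units were sold
```

Nothing in the write path checks whether the value being overwritten is still the one that was read, which is exactly the gap the locking schemes below are meant to close.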

Common locking schemes

Concurrency control is therefore essential. There are two general schemes for keeping data correct under concurrent updates.

Pessimistic concurrency control

Pessimistic locking takes this view: every update may conflict, and concurrent updates are extremely unreliable. The only safe approach is to update strictly serially at a granularity I define: one thread updates while the other threads wait, and the next thread proceeds only after the first has finished.

This scheme is widely used in relational databases. Table locks, row locks, read locks, write locks, and distributed locks that rely on Redis or memcache are all pessimistic locks. Their defining feature is that later threads are suspended while they wait, so performance is generally lower. A self-implemented distributed lock with self-controlled granularity (by row, by customer, by business type, and so on) can still provide a good trade-off between data correctness and concurrency performance.
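As a rough illustration of the "one thread updates while the others wait" idea, here is a minimal sketch that uses an in-process `threading.Lock`. This is an assumption made to keep the example small: a real deployment would use a database lock or a distributed lock, and `inventory_lock` and `place_order` are hypothetical names.

```python
import threading

inventory = 100
inventory_lock = threading.Lock()  # hypothetical lock guarding one item's stock


def place_order():
    """Deduct one unit; the whole read-modify-write section runs under the lock."""
    global inventory
    with inventory_lock:          # other threads block here until the lock is free
        current = inventory       # read
        inventory = current - 1   # modify and write back


threads = [threading.Thread(target=place_order) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(inventory)  # 98
```

The lock makes the two orders run one after the other, so the second order always sees the result of the first, at the cost of making it wait.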

Optimistic concurrency control

Optimistic locking takes this view: conflicts do not happen very often, and I want higher concurrency. If a conflict does occur, the conflicting thread simply retries a few times.

Applications that use a relational database also often implement their own optimistic locking scheme. It performs well and is not difficult to implement, which makes it quite attractive.

By default, Elasticsearch records a document's version in the _version field, which increases each time the document is updated. An update succeeds only if it is based on the latest version. If it is based on expired data, the update fails, and the client decides how to handle the failure.
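The version check can be sketched with a small in-memory model. This is not Elasticsearch code: `VersionedDoc` and `VersionConflict` are hypothetical names, and the class only imitates the rule that an update must be based on the current version.

```python
class VersionConflict(Exception):
    """Raised when an update is based on an expired version."""


class VersionedDoc:
    """Hypothetical in-memory document that carries a version number."""

    def __init__(self, source):
        self.source = dict(source)
        self.version = 1

    def read(self):
        # Return a copy of the data together with the version it belongs to.
        return dict(self.source), self.version

    def update(self, source, expected_version):
        # Accept the write only if the caller saw the current version.
        if expected_version != self.version:
            raise VersionConflict(
                f"current version [{self.version}] is different "
                f"than the one provided [{expected_version}]"
            )
        self.source = dict(source)
        self.version += 1
        return self.version


doc = VersionedDoc({"inventory": 100})
src1, v1 = doc.read()   # HTTP-1 reads inventory 100 at version 1
src2, v2 = doc.read()   # HTTP-2 reads the same data at version 1

doc.update({"inventory": src1["inventory"] - 1}, v1)  # accepted, version becomes 2
try:
    doc.update({"inventory": src2["inventory"] - 1}, v2)  # based on expired data
    second_update_rejected = False
except VersionConflict:
    second_update_rejected = True

print(second_update_rejected, doc.source, doc.version)
```

The second writer is told that its data is expired instead of silently overwriting the first update; what it does next (re-read and retry, or give up) is left to the client.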

ES optimistic locking scheme

When HTTP-2 submits its updated data, ES compares the submitted version number with the document's current version number, which increases monotonically. If the submitted version is smaller than the document's current version, the data is expired and the update request returns an error. The process diagram is as follows:

Optimistic lock control in practice with the built-in _version

We simulate two threads modifying the same document in Kibana by opening two browser tabs. We reuse the existing sample data:

{
  "_index": "music",
  "_type": "children",
  "_id": "2",
  "_version": 2,
  "found": true,
  "_source": {
    "name": "wake me, shark me",
    "content": "don't let me sleep too late, gonna get up brightly early in the morning",
    "language": "english",
    "length": "55"
  }
}

The current version is 2. In one browser tab, we issue an update request that carries the current version:

POST /music/children/2/_update?version=2
{
  "doc": {
    "length": 56
  }
}

Update successful

{
  "_index": "music",
  "_type": "children",
  "_id": "2",
  "_version": 3,
  "result": "updated",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 2,
  "_primary_term": 2
}

At the same time, the other tab also sends an update with version=2 and gets the following error:

{
  "error": {
    "root_cause": [
      {
        "type": "version_conflict_engine_exception",
        "reason": "[children][2]: version conflict, current version [3] is different than the one provided [2]",
        "index_uuid": "9759yb44TFuJSejo6boy4A",
        "shard": "2",
        "index": "music"
      }
    ],
    "type": "version_conflict_engine_exception",
    "reason": "[children][2]: version conflict, current version [3] is different than the one provided [2]",
    "index_uuid": "9759yb44TFuJSejo6boy4A",
    "shard": "2",
    "index": "music"
  },
  "status": 409
}

Key error message: version_conflict_engine_exception. The document's version has already been raised to 3, so the update based on version 2 fails and the client has to retry.

In a real scenario, the number of retries depends on the number of concurrent threads and on how frequently they update: the more threads there are and the more frequent the updates, the more retries are needed.

Optimistic lock control in practice with an external version
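A client-side retry loop might look like the following sketch. It runs against a hypothetical in-memory `Store` rather than a real cluster; the store's internal lock only makes the version check atomic, the way a server would, and `deduct_with_retry` and `max_retries` are illustrative names and values.

```python
import threading


class VersionConflict(Exception):
    """Stands in for the 409 version conflict response."""


class Store:
    """Hypothetical in-memory store with an atomic version check."""

    def __init__(self, inventory):
        self._lock = threading.Lock()  # server-side atomicity, not a client lock
        self.inventory = inventory
        self.version = 1

    def get(self):
        with self._lock:
            return self.inventory, self.version

    def put(self, inventory, version):
        with self._lock:
            if version != self.version:
                raise VersionConflict()
            self.inventory = inventory
            self.version += 1


def deduct_with_retry(store, max_retries=50):
    """Re-read and retry until the update is accepted or retries run out."""
    for attempt in range(max_retries):
        inventory, version = store.get()       # re-read the latest data
        try:
            store.put(inventory - 1, version)  # submit with the version just read
            return attempt                     # number of conflicts encountered
        except VersionConflict:
            continue                           # someone else won; try again
    raise RuntimeError("gave up after too many version conflicts")


store = Store(100)
threads = [threading.Thread(target=deduct_with_retry, args=(store,)) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(store.inventory, store.version)  # 90 11
```

Every conflict means another thread's update was accepted, so with ten writers each thread needs at most nine retries and no update is lost.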

Elasticsearch also supports version numbers that are maintained outside of it. A typical case is a primary database that synchronizes its data to Elasticsearch: the primary database's concurrency control has its own version generation mechanism, and reusing those versions directly is more convenient when synchronizing data to ES.

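The request adds the version_type parameter. External versioning is used with index requests, so the body is the whole document:

POST /music/children/2?version=2&version_type=external
{
  "name": "wake me, shark me",
  "content": "don't let me sleep too late, gonna get up brightly early in the morning",
  "language": "english",
  "length": 56
}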

The only difference:
  • Built-in version: the update succeeds only if the version you provide is the same as the version in ES; otherwise an error is reported.
  • External version: the update succeeds only if the version you provide is greater than the version in ES.
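The two rules can be written as two small predicates. This is only an illustration of the comparison described in the list above; `internal_accepts` and `external_accepts` are hypothetical helper names, not part of any Elasticsearch client.

```python
def internal_accepts(provided, current):
    """Built-in version: the provided version must equal the version in ES."""
    return provided == current


def external_accepts(provided, current):
    """External version: the provided version must be greater than the version in ES."""
    return provided > current


current_version = 3
for provided in (2, 3, 4):
    print(
        provided,
        internal_accepts(provided, current_version),
        external_accepts(provided, current_version),
    )
```

With the document at version 3, only a request carrying version 3 passes the built-in check, while only a request carrying a version above 3 passes the external check.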

Replica Shard data synchronization and concurrency control

Within Elasticsearch, whenever the primary shard receives new data, it synchronizes that data to its replica shards. There are many of these synchronization requests, and they are asynchronous. If the same document is modified several times, the synchronization requests can arrive out of order, so a request sent later may be processed first. Without any concurrency control mechanism, the result could differ greatly from what is expected.

Shard data synchronization is also based on the built-in _version for optimistic lock concurrency control.

For example, suppose a document is updated three times in order, to text1, text2, and text3, and the primary shard of Elasticsearch contains the correct result, while the synchronization requests reach the replica shard in a different order.

Without concurrency control inside Elasticsearch, the replica's copy of this document could end up as text2, which does not match the value on the primary shard.

The expected update order is text1 -> text2 -> text3, and the correct final result is text3. How does Elasticsearch handle this internally?

When a synchronization request is applied, its version is compared with the document's current version. If the request's version is not newer, the data has already been updated by a subsequent request, so the current request is discarded and the result stays text3. The effective update order is then text1 -> text3, and the final result is still correct.
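The discard rule can be sketched as follows. This is a simplified model under stated assumptions, not Elasticsearch's actual replication code: `Replica` is a hypothetical class, and each synchronization request is reduced to a value plus a version number.

```python
class Replica:
    """Hypothetical replica copy of a single document."""

    def __init__(self):
        self.value = None
        self.version = 0

    def apply(self, value, version):
        """Apply a synchronization request unless a newer one was already applied."""
        if version <= self.version:
            return False          # expired request: discard it
        self.value = value
        self.version = version
        return True


replica = Replica()

# The requests were sent as text1, text2, text3 but arrive out of order.
arrivals = [("text1", 1), ("text3", 3), ("text2", 2)]
applied = [replica.apply(value, version) for value, version in arrivals]

print(applied)                          # [True, True, False]
print(replica.value, replica.version)   # text3 3
```

The late text2 request is dropped because version 3 has already been applied, so the replica ends at text3 even though it never stored text2.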

Summary

This article introduced the cause of data errors in concurrent scenarios, how optimistic locking works in Elasticsearch in practice, and the concurrency control used in ES internal data synchronization. If anything is incorrect or not detailed enough, please let us know so that we can fix it. Thanks.

For more technical articles and shared experience on Java high concurrency and distributed architecture, follow the public account: Java architecture community