Introduction
This article introduces partial updates in Elasticsearch, how to perform them with scripts, and the benefits of incremental updates.
Incremental update process and principle
A simple review
We briefly covered the partial update syntax earlier, so let's review the request example:
POST /music/children/1/_update
{
"doc": {
"length": "76"
}
}
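The merge step can be sketched in Java to show how the fields under doc could be combined with the existing _source. This is a minimal sketch under two assumptions: nested objects are merged field by field, and any other value overwrites the old one. The class name PartialUpdateMerge and the sample values are hypothetical, and this is not Elasticsearch's own code.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: merges a partial "doc" into an existing _source.
// Assumption: nested objects merge field by field; other values overwrite.
class PartialUpdateMerge {

    @SuppressWarnings("unchecked")
    static Map<String, Object> merge(Map<String, Object> source, Map<String, Object> doc) {
        // Copy first: the stored _source itself is never modified.
        Map<String, Object> result = new HashMap<>(source);
        for (Map.Entry<String, Object> entry : doc.entrySet()) {
            Object oldValue = result.get(entry.getKey());
            Object newValue = entry.getValue();
            if (oldValue instanceof Map && newValue instanceof Map) {
                // Both sides are objects: merge them recursively.
                result.put(entry.getKey(),
                        merge((Map<String, Object>) oldValue, (Map<String, Object>) newValue));
            } else {
                // Scalars, arrays, or new fields: the partial value wins.
                result.put(entry.getKey(), newValue);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical current _source of document 1.
        Map<String, Object> source = new HashMap<>(Map.of("length", "75", "likes", 0));
        // The partial doc from the request above.
        Map<String, Object> doc = Map.of("length", "76");
        System.out.println(merge(source, doc).get("length")); // prints 76
    }
}
```

Fields absent from doc, such as likes here, survive untouched. That property lets the client send only the changed field.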
Compare this with the flow of a full replacement, from the client to Elasticsearch:
- The client first sends a GET request to fetch the document and displays it on a front-end page for the user to edit.
- After editing the data, the user clicks Submit.
- The back-end system processes the modified data and assembles the complete document.
- The back-end sends a PUT request to ES to replace the document in full.
- ES marks the old document as deleted and then creates a new document.
Elasticsearch documents are immutable: every update creates a new document and marks the old one as deleted, and incremental updates are no exception. The difference is that in a partial update, three steps all happen inside a single shard within milliseconds: fetching the old document, merging in the new fields, and replacing the old document.
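The immutability described above can be modeled with a toy in-memory shard in Java. The class ImmutableShard is hypothetical and greatly simplified: real deletions are tracked per Lucene segment and cleaned up during segment merges. It only shows that an update appends a new copy with a higher version and flags the old one as deleted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model: a shard that never edits a stored document in place.
class ImmutableShard {

    /** One physical copy of a document; only the deleted flag ever changes. */
    static final class StoredDoc {
        final String id;
        final long version;
        final Map<String, Object> source;
        boolean deleted;

        StoredDoc(String id, long version, Map<String, Object> source) {
            this.id = id;
            this.version = version;
            this.source = Map.copyOf(source); // the body itself is immutable
        }
    }

    private final List<StoredDoc> copies = new ArrayList<>();

    /** Returns the live (not deleted) copy of a document, or null if none exists. */
    StoredDoc get(String id) {
        for (StoredDoc d : copies) {
            if (d.id.equals(id) && !d.deleted) {
                return d;
            }
        }
        return null;
    }

    /** Indexes a full document: marks the old copy deleted and appends a new one. */
    long index(String id, Map<String, Object> source) {
        StoredDoc old = get(id);
        long nextVersion = 1;
        if (old != null) {
            old.deleted = true;
            nextVersion = old.version + 1;
        }
        copies.add(new StoredDoc(id, nextVersion, source));
        return nextVersion;
    }

    /** Number of physical copies, including the ones marked as deleted. */
    int physicalCount() {
        return copies.size();
    }
}
```

A full replacement and a partial update both end in a call like index(). The partial update simply builds the new source inside the shard instead of on the client.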
Incremental update interactions between shards
The steps of an incremental update are:
- The Java client sends an update request to the ES cluster.
- The coordinating node receives the request, but the document is not on that node, so it forwards the request to primary shard P0 on Node2.
- Node2 retrieves the document, modifies the JSON under _source, and re-indexes the document. Another thread may have modified the document in the meantime, causing a version conflict. In that case Node2 retries the update up to retry_on_conflict times and gives up once that limit is exceeded.
- If the previous step succeeds, Node2 asynchronously forwards the complete document to the replica shards on Node1 and Node3, which re-index it. Once all replicas report success, Node2 returns a success response to the coordinating node.
- The coordinating node returns the success response to the client. At this point both the primary shard and the replica shards in the cluster have been updated.
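The conflict handling in the third step amounts to an optimistic read-modify-write loop. Below is a minimal Java sketch under stated assumptions:
- One in-memory document stands in for the primary shard.
- A Java function plays the role of the script.
- The version check is done with AtomicReference.compareAndSet.

Class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Sketch of the primary shard's optimistic read-modify-write loop.
class PrimaryShardUpdater {

    /** Immutable (version, _source) pair; replaced as a whole on every write. */
    static final class Snapshot {
        final long version;
        final Map<String, Object> source;

        Snapshot(long version, Map<String, Object> source) {
            this.version = version;
            this.source = Map.copyOf(source);
        }
    }

    static final class VersionConflictException extends RuntimeException {
        VersionConflictException(String message) {
            super(message);
        }
    }

    private final AtomicReference<Snapshot> current;

    PrimaryShardUpdater(Map<String, Object> initialSource) {
        current = new AtomicReference<>(new Snapshot(1, initialSource));
    }

    Snapshot get() {
        return current.get();
    }

    /** Runs the script, retrying up to retryOnConflict extra times on a version conflict. */
    Snapshot update(UnaryOperator<Map<String, Object>> script, int retryOnConflict) {
        for (int attempt = 0; attempt <= retryOnConflict; attempt++) {
            Snapshot seen = current.get();                              // latest doc and version
            Map<String, Object> changed = script.apply(new HashMap<>(seen.source));
            Snapshot next = new Snapshot(seen.version + 1, changed);
            if (current.compareAndSet(seen, next)) {                    // write only if nobody else did
                return next;
            }
            // Someone else wrote first: loop to re-read and run the script again.
        }
        throw new VersionConflictException(
                "gave up after " + (retryOnConflict + 1) + " attempts");
    }
}
```

With retryOnConflict set to 0, a single conflicting write makes the update fail immediately. A larger value lets the loop re-read the latest version and rerun the script, which is why retrying is safe for a counter.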
A few points to note:
- When the primary shard synchronizes document data to the replica shards, it sends the complete document. Asynchronous requests do not guarantee ordering, so if only the incremental changes were sent and they arrived out of order, the replica's document content would be wrong.
- Once the coordinating node returns success to the Java client, the primary shard has finished propagating the update to all replica shards. At that point the data in the ES cluster is consistent and the update is safe.
- Retry policy: on a conflict, ES fetches the document and its latest version number again. It updates the document if that succeeds and tries again if it fails. The maximum number of retries is configurable, for example retry_on_conflict=5.
- The retry strategy works best for incremental operations where order does not matter, such as counting. It does not matter which operation runs first, as long as the final result is correct. Other updates set a value directly, such as writing a new inventory level or account balance, and these cannot use the retry strategy. They can, however, be converted into additions and subtractions. When an order is placed, instead of writing a precomputed inventory value, update it as "currently available inventory = inventory quantity - order quantity". For an account, add or subtract the amount of the change. To some extent this turns order-dependent updates into order-independent ones, which makes conflicts easy to resolve with the retry policy.
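A deterministic replay in Java shows why the retry policy suits relative changes but not absolute ones. The replay rests on these assumptions:
- Two orders, for 3 and 2 units, both read an inventory of 10.
- Order A commits first.
- Order B hits a version conflict and is retried against A's result.

The class and method names are hypothetical.

```java
// Replays two orders that read the same stale inventory value.
// Assumption: a retry re-runs the same update logic on the latest stored value.
class InventoryRetryDemo {

    /** An update maps the latest stored quantity to the new quantity. */
    interface Update {
        int apply(int latest);
    }

    /** Order A commits first; order B conflicts once and is retried on A's result. */
    static int replay(Update orderA, Update orderB, int initial) {
        int afterA = orderA.apply(initial);   // A read `initial` and commits
        return orderB.apply(afterA);          // B's retry runs against the newer value
    }

    /** Absolute write: the client computed the target value from its stale read. */
    static Update setTo(int value) {
        return latest -> value;
    }

    /** Relative write: subtract the order quantity from whatever is stored now. */
    static Update subtract(int quantity) {
        return latest -> latest - quantity;
    }

    public static void main(String[] args) {
        int initial = 10;
        int absolute = replay(setTo(initial - 3), setTo(initial - 2), initial);
        int relative = replay(subtract(3), subtract(2), initial);
        // prints "absolute writes: 8, relative writes: 5"
        System.out.println("absolute writes: " + absolute + ", relative writes: " + relative);
    }
}
```

The absolute version loses order A's update, ending at 8 instead of 5. The relative version reaches the correct 5 no matter which order commits first.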
Advantages of incremental updates
- The query, modification, and write-back all happen inside ES. This saves two network transfers of the document (the client's GET and PUT) and improves performance.
- The interval between reading and writing back shrinks to milliseconds, versus seconds or more for a full replacement. This effectively reduces concurrency conflicts.
Use scripts to implement incremental updates
Elasticsearch supports scripts for more flexible logic. Painless is the default scripting language, and since version 6.0 Groovy is no longer supported. Groovy compilation could occasionally fail to free memory and trigger a full GC.
Continuing with the English nursery rhyme example, assume the document's data is as follows:
{
"_index": "music",
"_type": "children",
"_id": "2",
"_version": 6,
"found": true,
"_source": {
"name": "wake me, shark me",
"content": "don't let me sleep too late, gonna get up brightly early in the morning",
"language": "english",
"length": "55",
"likes": 0
}
}
Inline scripts
Suppose the document's likes field should increase by 1 every time someone clicks on a song. A simple script does this:
POST /music/children/2/_update
{
"script" : "ctx._source.likes++"
}
After running the request, querying the document shows that likes has become 1. Each further execution increments likes by 1.
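From Java, the same inline-script request can be sent with the JDK's built-in java.net.http client (Java 11+). This is a sketch, not the official Elasticsearch Java client. It assumes an unsecured cluster at http://localhost:9200 that still accepts the typed endpoint used in this article, and the class LikeClicker is hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch: sends the inline-script update with the JDK HTTP client.
class LikeClicker {

    /** Same body as the request above: increment likes by one. */
    static final String BODY = "{\"script\": \"ctx._source.likes++\"}";

    static HttpRequest likeRequest(String baseUrl, String id) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/music/children/" + id + "/_update"))
                .header("Content-Type", "application/json") // ES 6.x rejects bodies without it
                .POST(HttpRequest.BodyPublishers.ofString(BODY))
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response = client.send(
                likeRequest("http://localhost:9200", "2"), HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

Building the request is separate from sending it, so the URL and headers can be checked without a running cluster.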
External scripts
Now change the requirement slightly: likes are increased in batches, and the increment is passed in as a parameter. Scripts can also be precompiled and stored in ES in advance, then referenced when used.
Create a script
POST _scripts/music-likes
{
"script": {
"lang": "painless",
"source": "ctx._source.likes += params.new_likes"
}
}
The script ID is music-likes and its parameter is new_likes, whose value is passed in at call time.
Using a script
When updating, we invoke the script we just created with the following request:
POST /music/children/2/_update
{
"script": {
"id": "music-likes",
"params": {
"new_likes": 2
}
}
}
Here id selects the stored script music-likes, and params supplies new_likes = 2. The script adds that value to the document's likes field.
View a script
Command:
GET _scripts/music-likes
The argument after the slash is the script ID
Delete the script
Command:
DELETE _scripts/music-likes
The argument after the slash is the script ID
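The stored-script lifecycle above (create, invoke, view, delete) can be sketched as plain HTTP requests built with java.net.http. The class and method names are hypothetical helpers that only assemble the URLs and JSON bodies shown in this section. Sending them assumes a reachable cluster, for example via HttpClient.send.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helpers that build the stored-script requests from this section.
class StoredScriptRequests {

    static final String SCRIPT_ID = "music-likes";

    private static HttpRequest.Builder json(String url) {
        return HttpRequest.newBuilder(URI.create(url)).header("Content-Type", "application/json");
    }

    /** POST _scripts/music-likes: store the Painless source under an ID. */
    static HttpRequest create(String base) {
        String body = "{\"script\": {\"lang\": \"painless\","
                + " \"source\": \"ctx._source.likes += params.new_likes\"}}";
        return json(base + "/_scripts/" + SCRIPT_ID)
                .POST(HttpRequest.BodyPublishers.ofString(body)).build();
    }

    /** Update body that calls the stored script by ID and supplies the parameter. */
    static String invokeBody(int newLikes) {
        return "{\"script\": {\"id\": \"" + SCRIPT_ID + "\", \"params\": {\"new_likes\": "
                + newLikes + "}}}";
    }

    /** POST /music/children/{id}/_update using the stored script. */
    static HttpRequest invoke(String base, String docId, int newLikes) {
        return json(base + "/music/children/" + docId + "/_update")
                .POST(HttpRequest.BodyPublishers.ofString(invokeBody(newLikes))).build();
    }

    /** GET _scripts/music-likes: view the stored script. */
    static HttpRequest view(String base) {
        return HttpRequest.newBuilder(URI.create(base + "/_scripts/" + SCRIPT_ID)).GET().build();
    }

    /** DELETE _scripts/music-likes: remove the stored script. */
    static HttpRequest delete(String base) {
        return HttpRequest.newBuilder(URI.create(base + "/_scripts/" + SCRIPT_ID)).DELETE().build();
    }
}
```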
Script considerations
- When ES encounters a new script, it compiles it and stores the result in the cache. Compilation takes time.
- Parameterize scripts instead of hard-coding values, so that they can be reused.
- ES limits how many scripts can be compiled in a short period. If the compilation rate exceeds that limit, ES reports a circuit_breaking_exception error. By default the limit is 15 compilations per minute.
- By default the script cache holds 100 entries and has no expiration time, and each script can be at most 65535 bytes. To configure these yourself, change the script.cache.expire, script.cache.max_size, and script.max_size_in_bytes parameters.
In short, make scripts as reusable as possible.
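The effect of these considerations can be illustrated with a toy Java model. Compiled scripts are cached by source text in an LRU map, and every cache miss counts against a compilation rate limit. The limits passed in are the defaults quoted above. The sliding-window limiter, the class name, and the exception class are assumptions for illustration, not how ES implements its limiter.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model: LRU cache of compiled scripts plus a compilation rate limit.
class ScriptCompileCache {

    static final class CircuitBreakingException extends RuntimeException {
        CircuitBreakingException(String message) {
            super(message);
        }
    }

    private final int maxCompilationsPerWindow;
    private final long windowMillis;
    private final Deque<Long> recentCompilations = new ArrayDeque<>();
    private final Map<String, String> cache;
    private int compilations = 0;

    ScriptCompileCache(int maxSize, int maxCompilationsPerWindow, long windowMillis) {
        this.maxCompilationsPerWindow = maxCompilationsPerWindow;
        this.windowMillis = windowMillis;
        // Access-ordered LinkedHashMap: evicts the least recently used entry beyond maxSize.
        this.cache = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxSize;
            }
        };
    }

    /** Returns a "compiled" script, compiling only on a cache miss. */
    String compile(String source, long nowMillis) {
        String compiled = cache.get(source);
        if (compiled != null) {
            return compiled; // cache hit: no compilation, no rate-limit cost
        }
        // Forget compilations that have left the sliding window.
        while (!recentCompilations.isEmpty()
                && nowMillis - recentCompilations.peekFirst() >= windowMillis) {
            recentCompilations.pollFirst();
        }
        if (recentCompilations.size() >= maxCompilationsPerWindow) {
            throw new CircuitBreakingException("too many script compilations");
        }
        recentCompilations.addLast(nowMillis);
        compilations++;
        compiled = "compiled:" + source; // stand-in for real compilation
        cache.put(source, compiled);
        return compiled;
    }

    int compilations() {
        return compilations;
    }
}
```

A parameterized script such as ctx._source.likes += params.new_likes compiles once and is then served from the cache. Hard-coding the increment into the source produces a new script per value and quickly trips the limit.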
Upsert syntax
So far the counter has been stored together with the song's content. Suppose the counter were stored in a separate document. A newly released song might not have a counter document yet, and updating it would fail with a document_missing_exception error. For this scenario we use the upsert syntax:
POST /music/children/3/_update
{
"script" : "ctx._source.likes++",
"upsert": {
"likes": 0
}
}
If no document with ID 3 exists, the first request indexes the JSON in upsert as a new document with ID 3 and likes set to 0. On the second request the document already exists, so the script runs and increments likes.
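The upsert behavior can be sketched in Java with an in-memory index. The class and exception names are hypothetical. As described above, the script is not run when the document is missing; the upsert body is stored instead. Elasticsearch also has a scripted_upsert option that runs the script in that case, which this sketch does not model.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Sketch of update-with-upsert semantics on an in-memory index.
class UpsertIndex {

    static final class DocumentMissingException extends RuntimeException {
        DocumentMissingException(String id) {
            super("document missing: " + id);
        }
    }

    private final Map<String, Map<String, Object>> docs = new HashMap<>();

    /** Update without upsert: fails when the document does not exist. */
    void update(String id, Consumer<Map<String, Object>> script) {
        Map<String, Object> source = docs.get(id);
        if (source == null) {
            throw new DocumentMissingException(id);
        }
        Map<String, Object> copy = new HashMap<>(source); // new version, old one replaced
        script.accept(copy);
        docs.put(id, copy);
    }

    /** Update with upsert: create from the upsert body if missing, else run the script. */
    void update(String id, Consumer<Map<String, Object>> script, Map<String, Object> upsert) {
        if (!docs.containsKey(id)) {
            docs.put(id, new HashMap<>(upsert)); // script is not run on creation
            return;
        }
        update(id, script);
    }

    Map<String, Object> get(String id) {
        return docs.get(id);
    }
}
```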
Summary
This article briefly introduced the process and principles of incremental updates, compared them with full replacement, and showed how scripts handle simple counting scenarios. Scripts can do much more; for details, see the Painless documentation on the official Elasticsearch website.
The author focuses on Java high concurrency and distributed architecture. For more hands-on technical articles and experience, follow the public account: Java architecture community