This article describes how to add, delete, modify, and query Elasticsearch documents, both one at a time and in batches. It also explains the meaning of the status codes returned by the REST APIs.
Let’s take a look at this table:
This table contains five methods: Index, Create, Read, Update, and Delete. Let’s take a look at HTTP requests for CRUD operations.
Each request consists of an HTTP method followed by a path that starts with the index name. Since 7.0, every type is represented by _doc, which is followed by the document ID.
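The request shapes described above can be sketched as a small lookup table. This is only an illustrative helper, not part of any Elasticsearch client: the name `crud_request` and the index name `users` are assumptions, and the paths assume the 7.x REST conventions (`_doc`, `_create`, `_update`).

```python
# Maps each document operation to its HTTP method and path template (7.x style).
CRUD_REQUESTS = {
    "index":  ("PUT",    "/{index}/_doc/{id}"),
    "create": ("PUT",    "/{index}/_create/{id}"),
    "read":   ("GET",    "/{index}/_doc/{id}"),
    "update": ("POST",   "/{index}/_update/{id}"),
    "delete": ("DELETE", "/{index}/_doc/{id}"),
}


def crud_request(operation: str, index: str, doc_id: str) -> str:
    """Return the request line for one operation on one document."""
    method, template = CRUD_REQUESTS[operation]
    return f"{method} {template.format(index=index, id=doc_id)}"


if __name__ == "__main__":
    # "users" and "1" are hypothetical values used only for illustration.
    for op in CRUD_REQUESTS:
        print(crud_request(op, "users", "1"))
```

Keeping the method and path in one table makes it easy to see that Index, Read, and Delete share the same path and differ only in the HTTP method.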
With that brief look at the HTTP requests for CRUD operations, let’s first look at how to create a document:
Create a document
There are two ways to create a document. One is to specify the document ID yourself, as shown in the image above. The other is to let ES generate the document ID automatically by calling POST users/_doc.
When you assign document IDs yourself, consider how well the IDs are balanced, so that documents are not allocated unevenly. ES routes each document to a shard by hashing its ID, and the hash function is what spreads documents evenly across the shards.
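The idea of hash-based shard allocation can be illustrated with a toy model. Note the assumptions: `zlib.crc32` is only a stand-in for the hash ES actually uses, the function names are hypothetical, and the shard count of 5 is arbitrary.

```python
import zlib
from collections import Counter


def shard_for(doc_id: str, num_shards: int) -> int:
    """Pick a shard by hashing the ID. crc32 is a stand-in, not the hash ES uses."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def shard_distribution(doc_ids, num_shards: int) -> Counter:
    """Count how many of the given IDs land on each shard."""
    return Counter(shard_for(doc_id, num_shards) for doc_id in doc_ids)


if __name__ == "__main__":
    ids = [str(i) for i in range(10_000)]  # hypothetical document IDs
    for shard, count in sorted(shard_distribution(ids, 5).items()):
        print(f"shard {shard}: {count} documents")
```

Because the shard is derived from the ID alone, the same ID always maps to the same shard, which is also how a later read finds the document.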
Executing the command returns the following result:
Every write to a document increments _version by 1. The version serves as an optimistic locking mechanism: when a document is modified concurrently, a write that carries a version older than the current one is rejected with an error.
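The version check can be pictured with a small in-memory model. This is a sketch of the behavior described above, not how Elasticsearch is implemented: `ToyStore`, `VersionConflict`, and the sample field values are all hypothetical.

```python
class VersionConflict(Exception):
    """Raised when a write carries a version older than the stored one."""


class ToyStore:
    """In-memory stand-in for an index; not how Elasticsearch is implemented."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def write(self, doc_id, source, version=None):
        """Store a document and return its new version.

        If `version` is given, it is the version the writer last saw. The
        write is rejected when that version is older than the stored one.
        """
        current_version, _ = self._docs.get(doc_id, (0, None))
        if version is not None and version < current_version:
            raise VersionConflict(
                f"document {doc_id}: version {version} < current {current_version}"
            )
        new_version = current_version + 1
        self._docs[doc_id] = (new_version, source)
        return new_version


if __name__ == "__main__":
    store = ToyStore()
    store.write("1", {"user": "mike"})                # first write
    store.write("1", {"user": "mike2"}, version=1)    # writer saw the latest version
    try:
        store.write("1", {"user": "mike3"}, version=1)  # stale writer
    except VersionConflict as exc:
        print("rejected:", exc)
```

The second writer loses because its view of the document is out of date, so it has to re-read the document and retry rather than silently overwrite the newer data.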
When creating a document, ES automatically creates the corresponding index and type if the index does not exist.
Now let’s look at another way to create a document. Instead of specifying an ID to create a document, the HTTP request is changed to POST, as follows:
The result is as follows:
With the Index operation, if the document does not exist, the new document is indexed. Otherwise, the existing document is deleted and the new document is indexed in its place, with _version increased by 1.
Query the document
The Get method is relatively simple: send GET followed by index name/_doc/document ID, and the response contains the details of the document.
When the document exists, HTTP 200 is returned with the following result:
_index indicates the index, _type indicates the type, _id indicates the document ID, _version indicates the version information, and _source stores the complete original data of the document.
If the document id does not exist, HTTP 404 is returned and found is false.
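A client usually has to tell these two outcomes apart. The following is a minimal sketch, assuming the response has already been received and decoded into a status code and a dict; the function name `extract_source` and the sample bodies are hypothetical, while the `found` and `_source` fields are the ones described above.

```python
def extract_source(status: int, body: dict):
    """Return the document's _source on HTTP 200, or None on a 404 miss."""
    if status == 200 and body.get("found"):
        return body["_source"]
    if status == 404 and not body.get("found", False):
        return None
    raise RuntimeError(f"unexpected response: HTTP {status}")


if __name__ == "__main__":
    # Hypothetical, abbreviated response bodies.
    hit = {"_index": "users", "_id": "1", "_version": 1, "found": True,
           "_source": {"user": "mike"}}
    miss = {"_index": "users", "_id": "99", "found": False}
    print(extract_source(200, hit))
    print(extract_source(404, miss))
```

Treating 404 as "no such document" rather than as a failure keeps the calling code simple, while any other status is surfaced as an error.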
Update the document
The Update method uses HTTP POST. The fields to change must be wrapped in a doc object in the request body. Unlike the Index method, the Update method does not delete the original document. It performs a true in-place update of the data.
For example, add a field to the original document whose ID is 1.
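Since the original request is not reproduced here, the following is a sketch of what such an update could look like. It assumes the 7.x `_update` endpoint; the index name `users`, the field names, and both function names are hypothetical. The merge function is a toy model of the outcome and only handles top-level fields.

```python
import json


def build_update_request(index: str, doc_id: str, fields: dict):
    """Return (method, path, body) for a partial update of one document."""
    body = json.dumps({"doc": fields})  # changed fields go under "doc"
    return "POST", f"/{index}/_update/{doc_id}", body


def apply_partial_update(source: dict, fields: dict) -> dict:
    """Toy model of the result: existing fields are kept, new ones are added."""
    merged = dict(source)
    merged.update(fields)  # shallow merge only, for illustration
    return merged


if __name__ == "__main__":
    method, path, body = build_update_request("users", "1", {"age": 30})
    print(method, path)
    print(body)
    print(apply_partial_update({"user": "mike"}, {"age": 30}))
```

The contrast with Index is visible in the toy merge: the existing `user` field survives, whereas an Index request carrying only the new field would replace the whole document.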
After execution, _version is increased by 1. Let’s query the document again:
As you can see, the new field has been successfully added.
Delete the document
The Delete method is also very simple: send DELETE followed by index name/_doc/document ID. I won’t demonstrate the code here.
Now that we have covered the basic CRUD operations of the document, let’s look at batch operations:
Bulk API
Re-establishing a network connection for every REST request is costly. Therefore, ES provides the Bulk API, which can operate on different indexes in a single API call, reducing network overhead and increasing the write rate.
It supports four types of operations: Index, Create, Update, and Delete. The index name can be specified in the URI or in the request body.
If one of multiple operations fails, other operations are not affected and the returned results include the results of each operation.
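A Bulk request body is newline-delimited JSON: one action line per operation, followed by a payload line for every operation except Delete. The sketch below builds such a body; the helper name `build_bulk_body`, the index names `test` and `test2`, and the field values are assumptions made for illustration.

```python
import json


def build_bulk_body(operations) -> str:
    """Serialize (action, metadata, payload) tuples into a newline-delimited body.

    `payload` is None for delete, which has no second line.
    """
    lines = []
    for action, metadata, payload in operations:
        lines.append(json.dumps({action: metadata}))
        if payload is not None:
            lines.append(json.dumps(payload))
    return "\n".join(lines) + "\n"  # the body must end with a newline


if __name__ == "__main__":
    # Hypothetical operations spanning two indexes.
    body = build_bulk_body([
        ("index",  {"_index": "test",  "_id": "1"}, {"field1": "value1"}),
        ("delete", {"_index": "test",  "_id": "2"}, None),
        ("create", {"_index": "test2", "_id": "3"}, {"field1": "value3"}),
        ("update", {"_index": "test",  "_id": "1"}, {"doc": {"field2": "value2"}}),
    ])
    print(body, end="")
```

The resulting string would be sent with POST to the `_bulk` endpoint. Each action line names its own index here, which is what allows one call to touch several indexes.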
For example, enter the following code:
When we execute the command, the result is as follows:
In this result, took is 93 ms and errors is true: the update operation failed because document 2 does not exist.
When using the Bulk API, if errors is true, the failed operations must be corrected and resubmitted so that no data is missing from ES.
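Finding the failed operations means walking the per-item results in the response. The sketch below assumes the response has been decoded into a dict with `errors` and `items` keys; `failed_items` is a hypothetical helper, and the sample response is an abbreviated illustration of the failure described above, not captured output.

```python
def failed_items(bulk_response: dict):
    """Return (action, doc_id, status, reason) for each failed bulk item."""
    if not bulk_response.get("errors"):
        return []
    failures = []
    for item in bulk_response.get("items", []):
        for action, result in item.items():  # each item has a single action key
            if "error" in result:
                reason = result["error"].get("reason", "")
                failures.append(
                    (action, result.get("_id"), result.get("status"), reason)
                )
    return failures


if __name__ == "__main__":
    # Illustrative, abbreviated response: the update of document 2 fails.
    response = {
        "took": 93,
        "errors": True,
        "items": [
            {"index": {"_index": "test", "_id": "1", "status": 201}},
            {"update": {"_index": "test", "_id": "2", "status": 404,
                        "error": {"type": "document_missing_exception",
                                  "reason": "document missing"}}},
        ],
    }
    for failure in failed_items(response):
        print(failure)
```

Checking `errors` first avoids scanning the items of a fully successful request, and the returned tuples carry enough detail to fix and resubmit only the failed operations.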
Batch Querying Documents
For a batch query, you specify the IDs of the documents to retrieve. A single _mget operation can query data from different indexes, which reduces the cost of network connections and improves performance.
To do this, enter the following code to get the documents with IDs 1 and 3.
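As a sketch of such a request, the helper below builds an `_mget` body from (index, ID) pairs. The function name `build_mget_request` and the index name `test` are assumptions; the body shape with a top-level `docs` list follows the `_mget` API.

```python
import json


def build_mget_request(targets):
    """Return (method, path, body) for a multi-get over (index, doc_id) pairs."""
    docs = [{"_index": index, "_id": doc_id} for index, doc_id in targets]
    return "GET", "/_mget", json.dumps({"docs": docs})


if __name__ == "__main__":
    # Hypothetical index name; the IDs 1 and 3 match the example in the text.
    method, path, body = build_mget_request([("test", "1"), ("test", "3")])
    print(method, path)
    print(body)
```

Because every entry names its own index, the same request could mix documents from several indexes.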
The running results are as follows:
Having covered the document operations, let’s conclude with a look at some common REST API error returns.
Common REST API error returns
In the demo, a 404 error was returned when the document ID did not exist. ES returns various other status codes as well, as shown in the following table:
Conclusion
This article covered document CRUD operations, the Bulk API, and the _mget API. The batch APIs improve call performance, but do not send too much data in one request, or the ES cluster may come under excessive pressure and performance will degrade. The general recommendation is 1,000 to 5,000 documents per request. If your documents are large, reduce the batch size accordingly. The recommended request size is 5 to 15 MB, and by default a request cannot exceed 100 MB.
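One way to stay within these limits is to split the documents into batches before sending them. The sketch below is a hypothetical helper: the name `chunk_documents` is an assumption, and the defaults of 1,000 documents and 5 MB simply take the lower ends of the recommended ranges. Sizes are measured on the JSON-encoded documents only, so the real request is slightly larger once action lines are added.

```python
import json


def chunk_documents(docs, max_docs=1000, max_bytes=5 * 1024 * 1024):
    """Yield lists of docs, each within max_docs and roughly max_bytes of JSON."""
    batch, batch_bytes = [], 0
    for doc in docs:
        size = len(json.dumps(doc).encode("utf-8"))
        # Start a new batch when adding this doc would exceed either limit.
        if batch and (len(batch) >= max_docs or batch_bytes + size > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(doc)
        batch_bytes += size
    if batch:
        yield batch


if __name__ == "__main__":
    documents = [{"n": i} for i in range(2500)]  # hypothetical small documents
    for batch in chunk_documents(documents):
        print(len(batch), "documents in this batch")
```

A single document larger than `max_bytes` still goes out alone in its own batch, so the byte limit is a target rather than a hard guarantee.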