The main purpose of this operation is to understand the new data structure of ES and prepare for the migration from 2.x to 7.x.

I use Docker for convenience

Prepare the environment

First, create a Docker network to ensure communication. The host mode is not adopted here

docker network create elastic

Pull the ES7.4 and Kibana images and start

docker pull elasticsearch:7.4.2
docker run -d --name elasticsearch --network=elastic -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" elasticsearch:7.4.2

Since the elasticsearch-head plugin can no longer be installed as a site plugin in ES (you can still use a standalone third-party version or a Chrome extension), install Kibana directly instead.

docker pull kibana:7.4.2
docker run -d --name kibana --network=elastic -e ELASTICSEARCH_URL=http://192.168.123.107:9200 -p 5601:5601 kibana:7.4.2

Kibana is also very convenient when following the demos in the official documentation, because the official site can send the sample requests to the Kibana Console with one click, saving the trouble of copying and pasting.

Index some documents | Elasticsearch Reference [7.4] | Elastic

I created an index named customer with the type _doc. In 7.x, types have been removed (documents of different types sharing one index produce sparse data, which hurts Lucene's performance), although the _type field itself still exists, fixed to _doc.
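To make the change concrete, here is a minimal sketch (hypothetical helper functions, Python standard library only) of how document request paths differ between the two versions: 2.x allowed an arbitrary type segment, while 7.x accepts only _doc.

```python
# Sketch: how document request paths change between ES 2.x and 7.x.
# In 2.x an index could hold multiple types; in 7.x the type is fixed to "_doc".

def doc_path_2x(index: str, doc_type: str, doc_id: str) -> str:
    # 2.x style: /{index}/{type}/{id} -- multiple types per index were allowed
    return f"/{index}/{doc_type}/{doc_id}"

def doc_path_7x(index: str, doc_id: str) -> str:
    # 7.x style: the only valid type segment is "_doc"
    return f"/{index}/_doc/{doc_id}"

print(doc_path_2x("customer", "order", "1"))  # /customer/order/1
print(doc_path_7x("customer", "1"))           # /customer/_doc/1
```

Creating a second type on the same index (as tried below with _doc2) fails in 7.x precisely because only the single _doc path is accepted.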

I tried to create a second type, _doc2, and sure enough it was rejected. My guess is that the remaining _type field exists only for compatibility with old data. Let's read on.

Batch indexing

Indexing documents in bulk

If you have a lot of documents to index, you can submit them in batches with the bulk API. Using bulk to batch document operations is significantly faster than submitting requests individually as it minimizes network roundtrips.

In short: if you have many documents to index, use the bulk API to submit them in batches; it is much faster because it reduces network round-trips.
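The bulk API expects an NDJSON body: an action line followed by the document source, one JSON object per line, ending with a final newline. A minimal sketch of building such a body (build_bulk_body is a hypothetical helper; the result would be POSTed to /_bulk with Content-Type: application/x-ndjson):

```python
import json

def build_bulk_body(index: str, docs: list) -> str:
    """Build an NDJSON body for the _bulk API: for each document,
    one action line ({"index": ...}) followed by one source line.
    Elasticsearch requires the body to end with a newline."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"

body = build_bulk_body("customer", [
    {"name": "Alice", "age": 30},
    {"name": "Bob", "age": 25},
])
print(body)
```

Each document costs two lines in the payload, which is why a single bulk request replaces many individual index requests.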

The optimal batch size depends on a number of factors: the document size and complexity, the indexing and search load, and the resources available to your cluster. A good place to start is with batches of 1,000 to 5,000 documents and a total payload between 5MB and 15MB. From there, you can experiment to find the sweet spot.

In short: send 1,000-5,000 documents per batch, with a total payload of 5 MB-15 MB.
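A simple way to honor both limits is to cut a new batch whenever either the document count or the serialized payload size would be exceeded. A sketch (chunk_for_bulk is a hypothetical helper; the suggested defaults come from the guideline above):

```python
import json

def chunk_for_bulk(docs, max_docs=1000, max_bytes=5 * 1024 * 1024):
    """Yield batches of documents, starting a new batch when either the
    document-count limit or the serialized-payload-size limit is reached."""
    batch, size = [], 0
    for doc in docs:
        doc_bytes = len(json.dumps(doc).encode("utf-8"))
        if batch and (len(batch) >= max_docs or size + doc_bytes > max_bytes):
            yield batch
            batch, size = [], 0
        batch.append(doc)
        size += doc_bytes
    if batch:
        yield batch

batches = list(chunk_for_bulk([{"n": i} for i in range(2500)], max_docs=1000))
print([len(b) for b in batches])  # [1000, 1000, 500]
```

Each yielded batch would then be serialized to an NDJSON body and sent as one _bulk request; tune max_docs and max_bytes experimentally for your cluster.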

The official docs provide a large sample data set and then show how to load it with curl, to demonstrate the efficiency.

Combined with previous blogs:

ES Preparations: 2.2 to 7.8 Upgrade by version

Together with the changes to string, bool, and types, this gives a clearer picture of the upcoming migration.

Github Archive: github.com/pkwenda/Blo…