Abstract
I wrote about ElasticSearch for the first time, but it’s been two years. Well, time gets old. Recently, I used ES again. I tried to find out the summary document of the past, but there was only one. After half a year of ES, I encountered so many problems and only produced so little. I had to pick up ES again, and found that there were still many ES slots, too many incompatible updates, and the differences between various versions were not small. I felt that ES was designed by people who preferred theoretical algorithms, rather than written by engineers. Very much like in the company, algorithm engineers ridicule back-end application development algorithm ability is weak, back-end application development ridicule algorithm engineer engineering ability is poor. That’s pretty much how ES feels as an application developer. But if you need a search, you can’t do without him. Since you can’t refuse, you have to enjoy it.
Write analysis
Why do I write analysis? I’m curious. For example, the following questions have always puzzled me
- Why does ES lose data
- What kind of node can be a coordinate node
- What are refresh index and Flush index operations
- The memory buffer and filesystem cache are stored somewhere.
- How do nodes in the cluster coordinate writes
- How is the data stored
- Why does writing to filesystem cache allow indexing
Write an overview
First of all, we write from the perspective of distributed cluster analysis, using the system default parameters to explain
The cluster has three nodes, all of which store data, and indexA has five shards and two replication sets. Data distribution is as follows: Node1: shard1 Node2: shard2,shard3,shard1-R1(replication set of SharD1) Node3: shard4,shard5,shard-R2(replication set of Shard1)
To simplify the problem, shard2,shard5 and other shard copy sets are ignored. Now take writing shard1 as an example.
- First, the client connects to a coordinate node through polling according to the configured connection node.
Coordinate node is not a dimensional description of the master/client/data node, it’s the node that handles the client request. This description is the same concept as Cassandra’s coordinate node. All the nodes in the cluster can be coordinate nodes.
- The Coodinate node computes the data on Shard1 through the hash algorithm
shard = hash(document_id) % (num_of_primary_shards)
, and sends the request to Node1 based on the shard information maintained on the node. - Node1 validates the index data and writes it to the shard. See the next section for details
Written to the shard
. - After data is successfully written on the primary node, data is sent to replica set nodes Node2 and Node3 in parallel.
- After data is successfully written to nodes 2 and 3, an ACK signal is sent to shard1 primary node Node1.
- Node1 sends ack to coordinate Node
- Coordinate Node sends the ACK to the client.
The whole process of coordinate node is similar to Cassandra. The master shard node and replica set are affected by master-slave mode, and master must decide whether the writing is successful or not, similar to mysql.
Write shard
The third step above, write inside the SHard, needs to be analyzed in detail
- Data is written to memory buffer
- Simultaneously write data to the Translog buffer
- Data is refreshed from buffer to FileSystemCache every 1s, and the segment file is generated. Once the segment file is generated, it can be queried by index
- After refresh, the memory buffer is cleared.
- Every 5s, translog is flushed from buffer to disk
- Periodically/quantitatively from FileSystemCache, combined with translog content
flush index
To disk. Do incremental flush.
The single-node write process for various databases is similar. It is generally written to memory, logged (in case of node failure and data loss in memory) and flushed to disk, with a continuous merge data block. But the data format is different.
In addition, the distributed or master-slave deployment structure requires the data to be copied to different nodes. This process is complicated, and each database processing has different logic.
Elastic Search also has a layer of buffers in the middle of writing, and we know that buffers and caches work differently,
1. Buffer is used when the processing speed at both ends of the system is balanced (from a long time scale). It is introduced to reduce the impact of sudden I/ OS in a short period of time and play the role of traffic shaping. Take the producer-consumer problem, where resources are generated and consumed at roughly the same rate, and a buffer can counteract the sudden change in resource creation/consumption. 2. Cache is a tradeoff between processing speed mismatches at both ends of the system. As the speed difference between CPU and memory becomes larger and larger, the locality feature of data is taken full advantage of to reduce the impact of the difference through the strategy of memory hierarchy.
Therefore, the data written into buffer is still raw data, which has no index and cannot be searched. This works only in Cache.
Compare to MySQL,Cassandra,Mongo writes
Operation logs and replication set logs need to be written during database writing. Different databases have different processing methods. Some databases are shared and some are separate.
Operation logs are written directly to disks to recover data in case of memory loss caused by process and machine downtime. Writing to buffer may cause data loss. So operation log writing buffers like Elastic Search mysql Innodb will also provide configuration items to ensure that the operation log will be flushed if the transaction succeeds. However, the minimum flush of es operation logs cannot be less than 100ms.
Here is a comparison of the logs for each database, the same functionality, but each creator has his own grid, which needs to be named differently.
The database | Log and swipe | Copy the log | note |
---|---|---|---|
cassandra | commit log | commit log | Commit log Directly writes data to the disk |
mongo | journal | oplog | Journal log Writes data to a disk |
mysql | redo logs | bin log | Redo logs are used to write buffers. |
elastic search | translog | translog | Write buffer |
Those of you who are interested can write mongo, Cassandra in the analysis
Mongo write analysis
Cassandra writes analysis
Follow the public account [Abbot’s Temple], receive the update of the article in the first time, and start the road of technical practice with Abbot
reference
www.elastic.co/guide/en/el…
www.elastic.co/pdf/archite…
Lalitvc.files.wordpress.com/2018/05/mys…
www.infoq.cn/article/ana…
Blog.insightdatascience.com/anatomy-of-…