This article examines the architecture of Elasticsearch (ES) at the model level rather than the code level. Starting from the models used in ES, we can understand how ES delivers the features it officially advertises. Once familiar with these models, we can build similar basic frameworks or middleware in other languages.
1. Characteristics
Start with the characteristics, because they are the reason to choose this framework for a given scenario.
- A distributed real-time document store where each field can be indexed and searched
- A distributed search engine with real-time analytics
- Capable of scaling out to hundreds of server nodes and supporting petabytes of structured or unstructured data
2. Concepts and model
First, a brief introduction to each concept in ES. For details, refer to the official documentation or to blogs.
From fine-grained to coarse-grained:
Document: ES is document-oriented. The document is the smallest unit of data for indexing and searching, and corresponds to a record (row) in a relational database.
Type: The type is the container for documents. If a document corresponds to a record in a relational database, then a type corresponds to a table. Unlike a table, a type in ES does not have to be created before it is used. (Mapping types were deprecated in Elasticsearch 6.x and removed in later versions.)
Index: An index is a collection of documents and can be thought of as a database. The documents of one index can be spread across several shards. An index is a logical namespace that maps to one or more primary shards. Each time a document enters ES, ES hashes its ID and takes the result modulo the number of primary shards. The resulting value selects the primary shard that stores the document, so documents are spread evenly across the primary shards and no single shard is heavily loaded. If related documents need to be joined, you can set _routing to place them on a specific shard.
Shard: The shard is the smallest functional unit in ES. It stores and indexes data and handles read and write requests.
Primary Shard: Documents are stored in primary shards. When a document is indexed, it is indexed first on the primary shard and then on the replicas of that primary shard.
Replica Shard: A copy of a primary shard that increases fault tolerance and serves read requests. One primary shard can have multiple replica shards (1, 2, 3, ...). To ensure fault tolerance, a replica is placed on a different node from its primary shard and from the other copies. If a node goes down, the primary or replica shards on the other nodes continue to work normally.
Node: Within a cluster, a node is the unit that manages primary and replica shards. It provides data creation, storage, access, and management. Generally, only one node is deployed per server. The nodes in a cluster are divided into the master node and the other nodes. The master node manages the cluster, including index management, shard allocation, and management of the other nodes. A node can be configured with different roles to suit different service scenarios.
Cluster: The concept of a cluster is relatively easy to understand. Any highly available architecture has the notion of a cluster.
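The routing rule described for Index and the placement rule described for Replica Shard can be sketched as follows. This is a minimal illustration and not the ES implementation: `zlib.crc32` is a stand-in hash chosen so that results are stable across runs, and `route_to_primary`, `place_shards`, and the node names are hypothetical.

```python
import zlib


def route_to_primary(routing_key, num_primary_shards):
    """Pick a primary shard as hash(routing_key) % num_primary_shards.

    zlib.crc32 is only a stand-in for this sketch; it is an assumption,
    not the hash function that ES itself uses.
    """
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


def place_shards(nodes, num_primary_shards, num_replicas):
    """Assign each primary and its replicas to distinct nodes (round-robin)."""
    if num_replicas + 1 > len(nodes):
        raise ValueError("need at least num_replicas + 1 nodes to separate copies")
    layout = {}
    for shard in range(num_primary_shards):
        # Copy 0 is the primary; copies 1..num_replicas are its replicas.
        copies = [nodes[(shard + i) % len(nodes)] for i in range(num_replicas + 1)]
        layout[shard] = {"primary": copies[0], "replicas": copies[1:]}
    return layout


if __name__ == "__main__":
    shard = route_to_primary("doc-42", num_primary_shards=3)
    layout = place_shards(["node-a", "node-b", "node-c"], 3, 1)
    print("doc-42 -> shard", shard, layout[shard])
```

Because the shard number depends on the number of primary shards, changing that number would send existing IDs to different shards. Under this sketch's assumptions, the count has to stay fixed once documents have been routed.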
Finer-grained concepts:
Source: The fields of each document are stored in _source. You can configure which fields are stored when issuing the PUT request.
Inverted Index: During indexing, each field of an incoming document is analyzed and split into terms by the tokenizer. ES then builds a one-to-many mapping from each term to the documents and positions where it occurs, and counts the occurrences. With this mapping, documents can be found quickly by the terms they contain. An inverted index is somewhat like an index in a relational database, where a field is indexed so that data can be located quickly.
Doc_values: An on-disk, column-oriented structure built at index time. It maps each document to its field values, the opposite direction of the inverted index, and is used for operations such as sorting and aggregations.
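The two lookup directions can be compared in a small sketch. It is a simplification under stated assumptions: `tokenize` is a whitespace tokenizer standing in for a real analyzer, the dictionaries live in memory rather than on disk, and `DOCS`, `build_inverted_index`, and `build_doc_values` are hypothetical names.

```python
from collections import defaultdict

# Hypothetical sample documents, keyed by document id.
DOCS = {
    1: {"title": "quick brown fox", "price": 30},
    2: {"title": "quick red fox jumps", "price": 10},
    3: {"title": "lazy brown dog", "price": 20},
}


def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for a real analyzer."""
    return text.lower().split()


def build_inverted_index(docs, field):
    """Map term -> {doc_id: [positions]} for one text field."""
    index = defaultdict(dict)
    for doc_id, doc in docs.items():
        for position, term in enumerate(tokenize(doc[field])):
            index[term].setdefault(doc_id, []).append(position)
    return dict(index)


def build_doc_values(docs, field):
    """Map doc_id -> field value: a column-style, document-to-value lookup."""
    return {doc_id: doc[field] for doc_id, doc in docs.items() if field in doc}


if __name__ == "__main__":
    inverted = build_inverted_index(DOCS, "title")
    prices = build_doc_values(DOCS, "price")
    hits = inverted.get("fox", {})          # term -> matching documents
    ranked = sorted(hits, key=prices.get)   # document -> value, used for sorting
    print(ranked)
```

The search step goes from a term to documents, while the sort step goes from each matching document to a value. That is why the two structures complement each other.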
3. Analysis of conceptual model
Official description: “Elasticsearch is a distributed, RESTful search and analytics engine.” The word "distributed" stands out in this description, but what does distributed mean, and why does Elasticsearch use a distributed architecture? Let’s start with the concept of a cluster.
Cluster
A cluster is the opposite of a single standalone instance. In general, a cluster is a group of service units that together provide a service. These units may have the same function or different functions. The following are some common cluster models.
Distributed cluster: A cluster composed of multiple services with different functions that together present one integrated system to the outside. The services are decoupled from each other, so a fault in one function does not stop the whole system from running. Decoupled services are also more flexible and easier to scale and update.
HA cluster: A cluster composed of multiple services with the same function that present a unified service to the outside. If one of the services in the cluster goes down, the others can take over immediately, and the services can share the load between them.
During iterative updates, an HA cluster also supports rolling releases and gray releases. In a rolling release, services are updated one at a time. While one service is being updated, the others continue to serve requests, so most requests do not fail because of the service being updated. In a gray release, one service is updated first and the requests that need verification are routed to it. If verification succeeds, the remaining services are updated. If it fails, the updated service is rolled back to the previous version. Services that have not been updated are unaffected during verification.
For example, in an ES cluster the nodes back each other up. Although their functions differ slightly, they provide the same service to the outside, which ensures high availability.
Distributed high-availability cluster: A distributed cluster provides services with different functions, and each internal service can itself be deployed as an HA cluster, so that every function stays stable and highly available.
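A minimal sketch of the HA behavior and the rolling release described above, under stated assumptions: instances are plain names, health is tracked by a manual down list instead of real health checks, and `HaCluster` and `rolling_update` are hypothetical names.

```python
import itertools


class HaCluster:
    """Round-robin over identical instances, skipping any marked as down."""

    def __init__(self, instances):
        self.instances = list(instances)
        self.down = set()
        self._counter = itertools.count()

    def mark_down(self, name):
        self.down.add(name)

    def mark_up(self, name):
        self.down.discard(name)

    def pick(self):
        """Return the next healthy instance, trying each instance at most once."""
        for _ in range(len(self.instances)):
            candidate = self.instances[next(self._counter) % len(self.instances)]
            if candidate not in self.down:
                return candidate
        raise RuntimeError("no healthy instance available")


def rolling_update(cluster, versions, new_version):
    """Update one instance at a time so the others keep serving requests."""
    served_by = []
    for name in list(cluster.instances):
        cluster.mark_down(name)            # take only this instance out of rotation
        served_by.append(cluster.pick())   # a request arriving during the update
        versions[name] = new_version
        cluster.mark_up(name)
    return served_by


if __name__ == "__main__":
    cluster = HaCluster(["svc-1", "svc-2", "svc-3"])
    versions = {name: "v1" for name in cluster.instances}
    print(rolling_update(cluster, versions, "v2"), versions)
```

With a single instance, `pick` raises during the update. That matches the point of the text: a rolling release only works when other instances remain available.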
Master-slave
The master-slave structure is generally either single-master with slaves or multi-master with slaves. The master can typically provide all functions or services, while a slave is typically a copy of the master that provides only secondary capabilities. The way ES defines nodes, primary shards, and replica shards follows this model.
For example, in a MySQL database the master library handles both reads and writes, while a slave library handles only reads. Generally, read requests far outnumber write requests. Multiple slave libraries can share the read load of the master node so that writes are not affected.
The biggest difficulty of the master-slave structure is data synchronization. When data on the master changes, it must be synchronized to every slave so that the data read from each node is as complete as possible and stale reads are reduced. When the business requires strong consistency, reads should go to the master library. An example is reading inventory when placing an order, because inventory changes in real time and must be neither oversold nor undersold. If the business does not require strong consistency and the real-time requirements are low, as with order history queries, reads can go to a slave library.
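The read/write split and the synchronization gap can be sketched as follows. This is a toy model under stated assumptions: in-memory dictionaries stand in for databases, `replicate` is called explicitly to stand in for asynchronous synchronization, and `MasterSlaveStore` and its `strong` flag are hypothetical names.

```python
class MasterSlaveStore:
    """Toy master-slave store: writes go to the master, reads may use slaves."""

    def __init__(self, num_slaves):
        self.master = {}
        self.slaves = [{} for _ in range(num_slaves)]
        self._next_slave = 0

    def write(self, key, value):
        """All writes go to the master only."""
        self.master[key] = value

    def replicate(self):
        """Copy the master's state to every slave (stands in for async sync)."""
        for slave in self.slaves:
            slave.clear()
            slave.update(self.master)

    def read(self, key, strong=False):
        """Read from the master when strong=True, else round-robin over slaves."""
        if strong or not self.slaves:
            return self.master.get(key)
        slave = self.slaves[self._next_slave % len(self.slaves)]
        self._next_slave += 1
        return slave.get(key)


if __name__ == "__main__":
    store = MasterSlaveStore(num_slaves=2)
    store.write("stock:sku-1", 5)
    print(store.read("stock:sku-1"))               # slave read before sync
    print(store.read("stock:sku-1", strong=True))  # master read
    store.replicate()
    print(store.read("stock:sku-1"))               # slave read after sync
```

An inventory check would use `strong=True`, while an order history query could accept the slave read and the delay that comes with it.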
Election
In the master-slave structure, reads and writes are separated. This creates a new problem: if the master node goes down, the service becomes unavailable again. Ideally the system adjusts itself to the situation, for example by designating a new master node. The master and slave nodes can be regarded as ordinary nodes with different permissions, where the master simply has one additional permission.
Election means that the remaining available nodes vote to select one of themselves as the new master node, which keeps the service system stable.
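A minimal voting sketch, assuming two rules that the text does not specify: every available node votes for the smallest available node ID, and a winner needs a majority of the full cluster. `elect_master` is a hypothetical name, and this is not the election algorithm that ES uses.

```python
from collections import Counter


def elect_master(all_nodes, alive_nodes):
    """Return the new master, or None if the alive nodes lack a majority."""
    alive = sorted(set(alive_nodes) & set(all_nodes))
    quorum = len(set(all_nodes)) // 2 + 1
    if len(alive) < quorum:
        # Too few voters: electing anyway could yield two masters.
        return None
    # Assumed voting rule: each alive node votes for the smallest alive node ID.
    votes = Counter(min(alive) for _ in alive)
    winner, count = votes.most_common(1)[0]
    return winner if count >= quorum else None


if __name__ == "__main__":
    nodes = ["node-1", "node-2", "node-3"]
    print(elect_master(nodes, ["node-2", "node-3"]))  # old master node-1 is down
    print(elect_master(nodes, ["node-3"]))            # no majority left
```

The majority requirement is a common design choice: if the cluster splits into two groups, at most one group can reach a majority, so at most one master is elected.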
References
Elastic: A Beginner's Guide. Elastic Chinese community blog. elasticstack.blog.csdn.net/article/det…