This article walks through eight core concepts of Elasticsearch:
- NRT
- Cluster
- Node
- Document & Field
- Index
- Type
- Shard
- Replica
Near Realtime (NRT)
Near Real Time (NRT) is a core benefit of Elasticsearch. NRT has two meanings:
- There is a small delay (about one second) between writing index data and when the data becomes searchable.
For example, when an e-commerce platform lists a new product, users can search for it about one second later. That is near real time.
- Searches and analyses based on Elasticsearch return with second-level latency.
As another example, suppose you ask Taobao how many items you bought in the past year, how much you spent in total, which item was the most expensive, in which month you bought the most, and which category you bought most often. If Taobao told you to wait ten minutes for the results, you would be exasperated; that latency is nowhere near real time. If it returns the answers within seconds, that is near real time.
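The first point can be sketched with a toy model: writes land in an in-memory buffer and only become searchable after a refresh, which real Elasticsearch runs roughly once per second. This is an illustration of the concept, not the Elasticsearch API; all names here are made up.

```python
# Toy illustration of near-real-time search: newly written documents
# only become searchable after a refresh (in real Elasticsearch the
# refresh happens automatically, about once per second).
class ToyIndex:
    def __init__(self):
        self._buffer = []      # freshly written docs, not yet searchable
        self._segments = []    # searchable docs

    def write(self, doc):
        self._buffer.append(doc)

    def refresh(self):
        # Move buffered docs into the searchable segments.
        self._segments.extend(self._buffer)
        self._buffer = []

    def search(self, keyword):
        return [d for d in self._segments if keyword in d]

idx = ToyIndex()
idx.write("new product: mechanical keyboard")
before = idx.search("keyboard")   # written, but not yet visible
idx.refresh()                     # simulates the ~1 second refresh
after = idx.search("keyboard")    # now visible
```

The ~1 second delay in the product example above is exactly the gap between `write` and the next `refresh`.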
Let's use a diagram to illustrate the next three basic concepts.
Cluster
A cluster contains multiple nodes, and the cluster each node belongs to is determined by a configuration setting (the cluster name, "elasticsearch" by default). For small to medium applications, starting with one node per cluster is normal. The purpose of clustering is high availability, large-scale data storage, and faster cross-node queries.
Node
A node in a cluster also has a name (randomly assigned by default). The node name is important when performing O&M (operations and maintenance) tasks. By default a node joins the cluster named "elasticsearch", so a single node can also form an Elasticsearch cluster by itself.
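Both names can be set explicitly in elasticsearch.yml; the values below are illustrative:

```yaml
# elasticsearch.yml -- illustrative values
cluster.name: my-application   # nodes with the same cluster.name join the same cluster
node.name: node-1              # a stable, meaningful name helps during O&M
```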
Document & Field
A document is the smallest unit of data in ES. A document can be a customer record, a product-category record, or an order record, and is usually represented as a JSON structure. Each type under an index can store multiple documents. A document contains multiple fields, and each field is one data attribute.
A document is roughly the equivalent of a row in MySQL, which makes it easy to picture. For example, the document for a product might look like this:
```json
{
  "product_id": "1000",
  "product_name": "MAC Pro 2019 Laptop",
  "product_desc": "High performance, high resolution, essential for programming.",
  "category_id": "2",
  "category_name": "Electronic products"
}
```
Index
An index contains a collection of documents with a similar structure and has a name: for example a customer index, a category index, or an order index. An index represents one class of similar or identical documents. For example, a product index would contain all the product documents.
Type
Each index can have one or more types. A type is a logical classification of data within an index, and all documents under a type have the same fields. For example, a blog system could have one index with a user data type, a blog-post data type, and a comment data type.
Take a commodity index that stores all commodity documents. There are many kinds of goods, and the document fields of each kind may differ. For example, electrical goods may contain special fields such as a service (after-sales) period, while fresh goods may contain special fields such as an expiration date. So the commodity index could define a daily-chemical goods type, an electrical goods type, and a fresh goods type:
- daily-chemical goods type: product_id, product_name, product_desc, category_id, category_name
- electrical goods type: product_id, product_name, product_desc, category_id, category_name, service_period
- fresh goods type: product_id, product_name, product_desc, category_id, category_name, eat_period
Each type contains its own set of documents, for example:
```json
{
  "product_id": "2",
  "product_name": "Changhong TV",
  "product_desc": "4K HD",
  "category_id": "3",
  "category_name": "Electrical",
  "service_period": "One year"
}
{
  "product_id": "3",
  "product_name": "Shrimps",
  "product_desc": "All natural, made in Iceland.",
  "category_id": "4",
  "category_name": "Fresh",
  "eat_period": "Seven days"
}
```
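To make the index / type / document hierarchy concrete, here is a minimal in-memory sketch in Python. The structure and the `search` helper are purely illustrative, not the Elasticsearch API:

```python
# Minimal in-memory model of the index -> type -> document hierarchy.
# Purely an illustration of the concepts, not the Elasticsearch API.
index = {
    "product_index": {
        "electrical": [
            {"product_id": "2", "product_name": "Changhong TV", "service_period": "One year"},
        ],
        "fresh": [
            {"product_id": "3", "product_name": "Shrimps", "eat_period": "Seven days"},
        ],
    }
}

def search(idx, index_name, type_name, field, value):
    """Return all documents of a given type whose field matches value."""
    docs = idx.get(index_name, {}).get(type_name, [])
    return [d for d in docs if d.get(field) == value]

hits = search(index, "product_index", "fresh", "product_name", "Shrimps")
```

Note how documents of different types can carry different special fields (`service_period` vs `eat_period`) while living in the same index.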
Shard (also known as Primary Shard)
A single machine cannot store an unlimited amount of data. ES can split the data in an index into multiple shards and store them across multiple servers. With shards you can scale horizontally, store more data, distribute search and analysis operations across multiple servers, and improve throughput and performance.
Each shard is a Lucene index.
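Which primary shard a document lands on is decided by a routing formula; in Elasticsearch it is essentially `shard = hash(routing) % number_of_primary_shards`, where the routing value defaults to the document id. A sketch of the idea (using md5 for illustration; Elasticsearch actually uses a murmur3 hash):

```python
import hashlib

def route_to_shard(doc_id: str, num_primary_shards: int) -> int:
    """Pick a primary shard for a document, mimicking
    shard = hash(routing) % number_of_primary_shards.
    (md5 is used here only for illustration; ES uses murmur3.)"""
    h = int(hashlib.md5(doc_id.encode()).hexdigest(), 16)
    return h % num_primary_shards

shard = route_to_shard("1000", 5)   # always the same shard for the same id
```

This is also why the number of primary shards cannot change after index creation: changing the modulus would route existing document ids to different shards.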
Replica (also known as Replica Shard)
Any server may fail or go down at any time, and the shards on it could be lost. Therefore each shard can have multiple replica copies. A replica provides backup when a primary shard fails, ensuring data is not lost, and multiple replicas also improve the throughput and performance of search operations.
- Primary shards: the number is set once at index creation and cannot be changed afterwards;
- Replica shards: the number can be changed at any time (default is 1 per primary).
By default an index has 10 shards: 5 primary shards and 5 replica shards (these were the defaults before Elasticsearch 7.0; since 7.0 an index defaults to 1 primary shard). The minimum highly available configuration is 2 servers.
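The arithmetic behind the default of 10 shards, as a quick check (an illustrative helper, not an ES API):

```python
def total_shards(primaries: int, replicas_per_primary: int) -> int:
    """Total shard count: each primary plus its replica copies."""
    return primaries * (1 + replicas_per_primary)

# Pre-7.0 defaults: 5 primaries with 1 replica each -> 10 shards in total.
default_total = total_shards(5, 1)
```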
Key points about shards:
- An index contains multiple shards
- Each shard is a minimal unit of work carrying part of the data; it is a full Lucene instance with complete indexing and request-handling capabilities
- When nodes are added or removed, shards automatically rebalance across the nodes
- Each document lives in exactly one primary shard (and that shard's replicas); it never exists in multiple primary shards
- A replica shard is a copy of a primary shard; it provides fault tolerance and serves read requests
- The number of primary shards is fixed when the index is created; the number of replica shards can be changed at any time
- The default number of primary shards is 5 and the default number of replicas is 1, so by default an index has 10 shards: 5 primary and 5 replica
- A primary shard cannot be placed on the same node as its own replica shard (otherwise, if that node goes down, both the primary and the replica are lost and fault tolerance fails), but it can share a node with replica shards of other primary shards
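The last rule can be sketched as a toy allocator that never co-locates a replica with its own primary. This is illustrative only, not Elasticsearch's real allocation algorithm:

```python
from itertools import cycle

def allocate(num_primaries: int, nodes: list[str]) -> dict[int, tuple[str, str]]:
    """Assign each primary shard and its replica to nodes, never placing
    a replica on the same node as its own primary (toy sketch, not the
    real Elasticsearch allocator)."""
    assert len(nodes) >= 2, "need at least 2 nodes for fault tolerance"
    placement = {}
    node_cycle = cycle(nodes)  # spread primaries round-robin
    for shard in range(num_primaries):
        primary = next(node_cycle)
        # Put the replica on any node other than the primary's node.
        replica = next(n for n in nodes if n != primary)
        placement[shard] = (primary, replica)
    return placement

# The minimum highly available setup from above: 5 primaries on 2 servers.
plan = allocate(5, ["node1", "node2"])
```

If either node fails, every shard still has one surviving copy on the other node, which is why 2 servers is the minimum highly available configuration.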
The following diagram shows how an index's shards are allocated across a cluster: