Elasticsearch related concepts

The document

1. Elasticsearch is document-oriented: a document is the smallest unit of searchable data. For example:
	1. A log entry in a log file
	2. The details of a movie, or a single record
	3. A song in an MP3 player, or a PDF document
2. Documents are serialized to JSON and stored in Elasticsearch
	1. A JSON object is made up of fields
	2. Each field has a type (string, numeric, boolean, date, binary, range)
3. Each document has a unique ID
	1. You can specify the ID yourself
	2. Or let Elasticsearch generate it automatically
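
As a minimal sketch of the points above, the Python snippet below builds a document, serializes it to JSON, and produces a stand-in for an auto-generated ID. The movie fields are hypothetical, and the random UUID is only a placeholder: Elasticsearch generates IDs in its own format.

```python
import json
import uuid

# A hypothetical movie document; the field names and values are illustrative only.
movie = {
    "title": "The Matrix",         # string
    "year": 1999,                  # numeric
    "available": True,             # boolean
    "release_date": "1999-03-31",  # a date string; ES can map it as a date field
}

# Documents are sent to and stored by Elasticsearch as JSON text.
body = json.dumps(movie)
print(body)

# When no ID is supplied, Elasticsearch generates one itself.
# A random UUID merely stands in for that here; ES uses its own ID format.
auto_id = uuid.uuid4().hex
```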

Metadata for the document

1. Metadata can be thought of as system fields attached to each JSON document; they annotate the document with related information. The metadata fields are:
	1. _index: the name of the index the document belongs to
	2. _type: the name of the type the document belongs to
	3. _id: the document's unique ID
	4. _source: the original JSON data of the document
	5. _version: the document's version
	6. _score: the relevance score
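
To make the split between metadata and the original data concrete, here is a small sketch working on a hand-written search hit. The hit dict is illustrative, not a captured response; real responses carry a different set of fields depending on the API and the version.

```python
# A hand-written search hit for illustration; real responses vary by API and version.
hit = {
    "_index": "movies",
    "_type": "_doc",
    "_id": "1",
    "_version": 1,
    "_score": 1.2,
    "_source": {"title": "The Matrix", "year": 1999},
}

def split_hit(hit):
    """Separate the metadata fields (underscore-prefixed) from the original JSON in _source."""
    metadata = {k: v for k, v in hit.items() if k.startswith("_") and k != "_source"}
    return metadata, hit["_source"]

metadata, source = split_hit(hit)
print(sorted(metadata))  # ['_id', '_index', '_score', '_type', '_version']
print(source)            # {'title': 'The Matrix', 'year': 1999}
```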

Index

1. An index is a container for documents, a collection of documents
	1. An index represents a logical space: each index has its own mapping, which defines the names and types of the fields in the documents it contains
	2. The data of each index is distributed across shards, which represent the physical space
2. Index mappings and settings
	1. Mappings define the types of the document fields
	2. Settings define how the data is distributed (for example, the number of shards and replicas)
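
The mapping/settings split can be seen in the JSON body sent when creating an index (`PUT /<index>`). The sketch below builds such a body; `create_index_body` is a hypothetical helper and the field names are made up, while `number_of_shards`, `number_of_replicas`, and `mappings.properties` are the standard keys.

```python
import json

def create_index_body(shards, replicas, fields):
    """Build the JSON body for PUT /<index> (hypothetical helper).

    settings control how data is distributed; mappings control field names and types.
    """
    return {
        "settings": {
            "number_of_shards": shards,      # primary shard count, fixed at creation
            "number_of_replicas": replicas,  # replica count, can be changed later
        },
        "mappings": {
            "properties": {name: {"type": ftype} for name, ftype in fields.items()}
        },
    }

body = create_index_body(3, 1, {"title": "text", "year": "integer", "release_date": "date"})
print(json.dumps(body, indent=2))
```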
The different meanings of "index"
1. Noun: an Elasticsearch cluster can hold many different indices
2. Verb: indexing is the process of saving a document into Elasticsearch, during which ES builds an inverted index
3. Noun (a data structure): for example, a B-tree index in a relational database, or an inverted index
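
An inverted index maps each term to the documents that contain it. The toy version below only lowercases and splits on whitespace; real Elasticsearch analyzers tokenize far more carefully, and Lucene's postings also store positions and frequencies, so treat this as a sketch of the idea only.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercase term to the sorted list of document IDs that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():  # naive tokenizer: whitespace only
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

docs = {1: "Elasticsearch is fast", 2: "Lucene is a search library", 3: "Elasticsearch uses Lucene"}
index = build_inverted_index(docs)
print(index["lucene"])  # [2, 3]
print(index["is"])      # [1, 2]
```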

Type

A type is a logical grouping of documents within an index
1. In older versions, an index could define many types
2. Types have been deprecated since 6.0 (indices created in 6.x allow only a single type); as of 7.0, each index has exactly one type, _doc
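
Because 7.x fixes the type to `_doc`, document endpoints always carry that segment. The sketch below builds such paths; `doc_path` is a hypothetical helper, and the ID is URL-encoded so that characters like `/` do not break the path.

```python
from urllib.parse import quote

def doc_path(index, doc_id=None):
    """Return a 7.x-style document endpoint; the type segment is always _doc.

    doc_path is a hypothetical helper written for illustration.
    """
    if doc_id is None:
        return f"/{index}/_doc"  # POST here and Elasticsearch generates the ID
    return f"/{index}/_doc/{quote(str(doc_id), safe='')}"  # PUT here with your own ID

print(doc_path("movies"))     # /movies/_doc
print(doc_path("movies", 1))  # /movies/_doc/1
```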

Comparison with a traditional relational database (RDBMS)

1. ES: loosely structured data, relevance scoring, high-performance full-text search
2. RDBMS: transactions and joins
RDBMS     ES
Table     Index (Type)
Row       Document
Column    Field
Schema    Mapping
SQL       DSL
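
As a sketch of the SQL-to-DSL row, the function below builds a Query DSL body for a hypothetical `movies` index, with the rough SQL counterpart in its docstring. The `bool`, `match`, and `range` clauses are standard DSL; the mapping to SQL is approximate, since `match` is relevance-scored full-text search rather than pattern matching.

```python
import json

def title_and_year_query(text, min_year, size=10):
    """Query DSL body for GET /movies/_search (a hypothetical 'movies' index).

    Rough SQL counterpart:
        SELECT * FROM movies WHERE title LIKE '%matrix%' AND year >= 1990 LIMIT 10
    The match clause is full-text and relevance-scored, which LIKE only approximates.
    """
    return {
        "size": size,
        "query": {
            "bool": {
                "must": [{"match": {"title": text}}],                # scored full-text clause
                "filter": [{"range": {"year": {"gte": min_year}}}],  # exact filter, not scored
            }
        },
    }

print(json.dumps(title_and_year_query("matrix", 1990), indent=2))
```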

REST API

ES exposes a REST API, so it can be called easily from a wide range of languages.
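
Any HTTP client can talk to that API. The sketch below uses only Python's standard library to build (but not send) REST requests; it assumes a local node on the default HTTP port 9200, and `es_request` is a hypothetical helper.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: a local node on the default HTTP port

def es_request(method, path, body=None):
    """Build (but do not send) a REST call to Elasticsearch (hypothetical helper)."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return urllib.request.Request(ES_URL + path, data=data, headers=headers, method=method)

health = es_request("GET", "/_cluster/health")
put_doc = es_request("PUT", "/movies/_doc/1", {"title": "x"})
# urllib.request.urlopen(health) would actually send the request to a running cluster.
```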

Distributed cluster

Distributed clusters are the mainstream approach today, for an obvious reason: a distributed system offers high availability and scalability that a single machine cannot match.
1. High availability
	1. Service availability: the service keeps running even when some nodes stop
	2. Data availability: no data is lost when some nodes are lost
2. Scalability
	1. You can scale horizontally by adding nodes
Node
1. A node is an Elasticsearch instance
	1. Essentially, it is a Java process
	2. Several ES processes can run on one machine, but in production it is generally recommended to run only one ES instance per machine
2. Each node has a name, set in the configuration file or at startup with -E node.name=node1
3. After a node starts, it is assigned a UID, which is stored in the data directory
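
Each `-E key=value` flag overrides one setting at startup. The sketch below turns a settings dict into such an argument list; `startup_command` is a hypothetical helper, and `bin/elasticsearch` assumes you run it from the Elasticsearch install directory.

```python
def startup_command(settings, binary="bin/elasticsearch"):
    """Turn a settings dict into command-line arguments using -E key=value overrides.

    startup_command is a hypothetical helper; the binary path assumes the ES install directory.
    """
    args = [binary]
    for key, value in settings.items():
        args += ["-E", f"{key}={value}"]
    return args

cmd = startup_command({"node.name": "node1", "path.data": "node1_data"})
print(cmd)  # ['bin/elasticsearch', '-E', 'node.name=node1', '-E', 'path.data=node1_data']
```

Passing the list to `subprocess.run(cmd)` would start the node; giving each local node its own `path.data` is what lets several instances share one development machine.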
Master-Eligible Nodes and the Master Node
1. By default, every node is master-eligible when it starts
	1. Set node.master: false to prevent a node from becoming master
2. Master-eligible nodes can take part in the master election and become the master node
3. When the first node of a cluster starts, it elects itself as the master node
4. Every node stores the cluster state, but only the master node can modify it
	1. The cluster state holds the information a cluster needs, including:
		1. Information about all nodes
		2. All indices, with their mappings and settings
		3. Shard routing information
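
Election safety rests on a majority of master-eligible nodes. Before 7.0 this was configured by hand with `discovery.zen.minimum_master_nodes`, recommended to be `(master_eligible_nodes / 2) + 1`; from 7.0 Elasticsearch manages the voting configuration itself. The sketch below only computes that majority rule and is not the actual election algorithm.

```python
def master_quorum(master_eligible):
    """Majority of master-eligible nodes (the pre-7.0 minimum_master_nodes recommendation)."""
    return master_eligible // 2 + 1

def can_elect(master_eligible, reachable):
    """A master can be elected only if a majority of eligible nodes can see each other."""
    return reachable >= master_quorum(master_eligible)

print(master_quorum(1))  # 1: a lone first node can elect itself
print(master_quorum(3))  # 2
print(can_elect(3, 1))   # False: a single isolated node cannot form a majority
```

Requiring a majority is what prevents a network partition from producing two masters (split brain).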
Data Node
Nodes that can store data are called data nodes. They hold shard data and play a crucial role in scaling out data.
Coordinating Node
1. Receives client requests, forwards them to the appropriate nodes, and finally merges the results
2. Every node acts as a coordinating node by default
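
The "distribute and merge" step can be sketched as a scatter-gather: each shard returns its local top hits, and the coordinating node keeps the global top `size`. This is a simplification of the query phase; real ES then runs a separate fetch phase for the winning documents. The shard results below are made up.

```python
import heapq

def coordinate_search(shard_results, size):
    """Merge per-shard (doc_id, score) lists into one global top-`size` list by score."""
    all_hits = (hit for hits in shard_results for hit in hits)
    return heapq.nlargest(size, all_hits, key=lambda hit: hit[1])

shards = [
    [("a", 3.1), ("b", 1.0)],  # top hits from shard 0
    [("c", 2.5), ("d", 0.7)],  # top hits from shard 1
    [("e", 4.0)],              # top hits from shard 2
]
print(coordinate_search(shards, 3))  # [('e', 4.0), ('a', 3.1), ('c', 2.5)]
```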
Hot & Warm Node
Data nodes with different hardware configurations are used to build a hot-warm architecture, lowering the cost of deploying the cluster.
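
A common way to build this is to tag nodes with a custom attribute at startup (for example `-E node.attr.box_type=hot`) and pin indices to them with `index.routing.allocation.require.box_type`. The attribute name `box_type` is only a convention; any name works. The helpers and node names below are hypothetical, and `eligible_nodes` is a toy filter, not the real allocator.

```python
def hot_warm_settings(tier):
    """Index setting requiring shards to live on nodes whose box_type attribute equals `tier`."""
    return {"index.routing.allocation.require.box_type": tier}

def eligible_nodes(nodes, index_settings):
    """Toy filter: which nodes satisfy the require.box_type rule."""
    required = index_settings["index.routing.allocation.require.box_type"]
    return sorted(name for name, attrs in nodes.items() if attrs.get("box_type") == required)

nodes = {
    "node-hot-1": {"box_type": "hot"},    # e.g. an SSD-backed machine
    "node-warm-1": {"box_type": "warm"},  # e.g. a larger, cheaper HDD machine
    "node-hot-2": {"box_type": "hot"},
}
print(eligible_nodes(nodes, hot_warm_settings("hot")))  # ['node-hot-1', 'node-hot-2']
```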
Machine Learning Node
As the name suggests, these nodes run machine learning jobs, for example for anomaly detection.
Tribe Node
(Since 5.3, Cross-Cluster Search is used instead.) A tribe node connects to different ES clusters and lets you treat them as a single cluster.
Configuring the Node Type
1. In a development environment, a node can play multiple roles
2. In a production environment, each node should be dedicated to a single role
Node type            Setting        Default
master eligible      node.master    true
data                 node.data      true
ingest               node.ingest    true
coordinating only    (none)         Every node is coordinating by default; set all other role settings to false
machine learning     node.ml        true (requires X-Pack)
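
The table can be turned into a small helper that emits the boolean settings for a dedicated node. `role_settings` is hypothetical and covers only the legacy settings listed above; later releases replace these booleans with a single `node.roles` list.

```python
# Legacy boolean role settings from the table above (hypothetical helper, for illustration).
ROLE_TO_SETTING = {
    "master": "node.master",
    "data": "node.data",
    "ingest": "node.ingest",
    "ml": "node.ml",
}

def role_settings(*roles):
    """Enable only the listed roles; calling with no roles yields a coordinating-only node."""
    unknown = set(roles) - set(ROLE_TO_SETTING)
    if unknown:
        raise ValueError(f"unknown roles: {sorted(unknown)}")
    return {setting: role in roles for role, setting in ROLE_TO_SETTING.items()}

print(role_settings())        # coordinating only: every role setting is False
print(role_settings("data"))  # dedicated data node
```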
Shard
1. Primary shards solve the problem of scaling data horizontally: through primary shards, data is distributed across all nodes in the cluster
	1. A shard is a running instance of Lucene
	2. The number of primary shards is specified when the index is created and cannot be changed afterwards, except by reindexing
2. Replica shards solve the problem of data high availability: a replica is a copy of a primary shard
	1. The number of replica shards can be adjusted dynamically
	2. Adding replicas can also improve service availability to some extent (read throughput)
3. In production, the number of shards needs capacity planning
	1. Too few shards
		1. You cannot scale horizontally by adding nodes
		2. Each shard holds too much data, so reallocating data takes a long time
	2. Too many shards
		1. Relevance scoring of search results and the accuracy of statistical results suffer
		2. Too many shards on a single node waste resources and hurt performance
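
Why can't the primary shard count change without a reindex? Documents are routed with the classic formula `shard = hash(_routing) % number_of_primary_shards`, where `_routing` defaults to the document ID. The sketch below uses `zlib.crc32` purely as a stand-in for the Murmur3 hash Elasticsearch actually uses, and counts how many documents would land elsewhere if the count went from 3 to 4.

```python
import zlib

def route(doc_id, number_of_primary_shards):
    """Pick a primary shard: hash(_routing) % number_of_primary_shards.

    _routing defaults to the document ID. zlib.crc32 is only a stand-in for
    the Murmur3 hash Elasticsearch actually uses.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_primary_shards

ids = [f"doc-{i}" for i in range(1000)]
moved = sum(route(i, 3) != route(i, 4) for i in ids)
print(f"{moved} of {len(ids)} documents would land on a different shard")
```

Since most documents would move, existing data could no longer be found where the formula says it lives, which is why changing the primary count means reindexing.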
Checking cluster health
1. Green: all primary and replica shards are allocated normally
2. Yellow: all primary shards are allocated normally, but some replica shards cannot be allocated
3. Red: at least one primary shard failed to be allocated
	1. For example, creating a new index when the server's disk usage exceeds 85%
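
The three colors follow directly from shard allocation, as this sketch shows. The input format (a list of `(is_primary, is_assigned)` pairs) is an assumption made for illustration; on a real cluster, `GET /_cluster/health` reports the color in its `status` field.

```python
def cluster_health(shards):
    """shards: list of (is_primary, is_assigned) pairs. Returns 'green', 'yellow' or 'red'."""
    if any(primary and not assigned for primary, assigned in shards):
        return "red"     # some primary shard is unassigned
    if any(not assigned for _, assigned in shards):
        return "yellow"  # all primaries assigned, some replica is not
    return "green"       # everything assigned

# A single-node cluster with one replica: the replica cannot sit on the primary's node.
print(cluster_health([(True, True), (False, False)]))  # yellow
```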

For more content, follow my personal WeChat official account “Han Elder brother has words”, where 100 GB of AI learning materials and plenty of back-end learning materials are waiting for you.