Elasticsearch is a search engine that can store large volumes of data and return the information you want very quickly. The first step in running it is to ensure high availability. What is high availability? It usually means designing the system to reduce the time during which it cannot provide service. A system that can always provide service is 100% available; a system that goes down at some point, such as a website that stops responding, is temporarily unavailable. To keep Elasticsearch highly available, we therefore need to minimize the time during which it is unavailable.
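To make the idea of availability concrete, here is a minimal sketch that expresses it as the share of a period during which the service was up. The function name `availability_pct` and the sample figures are illustrative assumptions, not something Elasticsearch provides.

```shell
# availability_pct TOTAL_SECONDS DOWN_SECONDS
# Prints the share of the period during which the service was up, as a percentage.
availability_pct() {
  awk -v total="$1" -v down="$2" 'BEGIN { printf "%.3f\n", (total - down) / total * 100 }'
}

availability_pct 31536000 0      # a 365-day year with no downtime
availability_pct 31536000 3600   # the same year with one hour of downtime
```

The second call shows how even a single hour of downtime in a year already moves the figure below 100%.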

Elasticsearch reports a health status for each index, at one of three levels:

  • Green. All primary and replica shards have been allocated. Your cluster is 100% available.

  • Yellow. All primary shards have been allocated, but at least one replica is missing. No data is lost, so search results remain complete. However, high availability is weakened: if more shards disappear, you may lose data. Treat yellow as a warning that should be investigated promptly.

  • Red. At least one primary shard and all of its replicas are missing. This means data is missing: a search returns only partial results, and a write request routed to that shard returns an exception.
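The three levels can be sketched as a small decision rule over shard counts. This is a simplified illustration only: Elasticsearch computes the status itself, and the function name `index_health` and its argument order are hypothetical.

```shell
# index_health PRIMARIES_TOTAL PRIMARIES_ASSIGNED REPLICAS_TOTAL REPLICAS_ASSIGNED
# Prints red, yellow or green following the three levels described above.
index_health() {
  if [ "$2" -lt "$1" ]; then
    echo red       # at least one primary shard is missing
  elif [ "$4" -lt "$3" ]; then
    echo yellow    # all primaries allocated, at least one replica missing
  else
    echo green     # every primary and replica is allocated
  fi
}

index_health 3 3 6 6   # everything allocated
index_health 3 3 6 0   # primaries only, no replica allocated
index_health 3 2 6 4   # one primary missing
```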

If you have only one host, the index health is yellow: with a single host there is no other node on which to place the replicas, so they stay unallocated. This is an unhealthy state, which is why a cluster is necessary.

In addition, a cluster pools the storage of its hosts. If each host has a fixed amount of storage, the cluster has more storage space than any single host and can hold more data.

Learn more about the Elasticsearch cluster

Let’s take a look at the structure of the cluster.

First of all, we should know that multiple hosts form a cluster, and each host is called a Node.

Here is a three-node cluster:

In the figure, each node holds three shards. Names starting with P are primary shards, and names starting with R are replica shards. Node 1 stores primary shards 1 and 2 and replica shard 0; node 2 stores replica shards 0, 1 and 2; node 3 stores primary shard 0 and replica shards 1 and 2. In total there are 3 primary shards and 6 replica shards. Node 1 also carries a MASTER label, which means it is the master node. Compared with the other nodes it has a special role: it has the authority to manage the whole cluster, for example allocating resources and handling node changes.

This brings us to node types, of which there are four:
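The layout described for the figure can be written down as plain text and checked with a few lines of shell. This is only a sketch: the `layout` variable and both helper functions are hypothetical names, and the check relies on the fact that Elasticsearch does not place two copies of the same shard on one node.

```shell
# Shard layout from the description above: one line per node,
# P<n> = primary shard n, R<n> = replica of shard n.
layout='node-1 P1 P2 R0
node-2 R0 R1 R2
node-3 P0 R1 R2'

# count_shards P|R : counts primary or replica shards in $layout.
count_shards() {
  printf '%s\n' "$layout" | tr ' ' '\n' | grep -c "^$1[0-9]"
}

# same_node_copies : prints "node shard" for every shard number that
# appears more than once on the same node (no output means none found).
same_node_copies() {
  printf '%s\n' "$layout" | awk '{
    split("", seen)                 # reset the per-node lookup table
    for (i = 2; i <= NF; i++) {
      id = substr($i, 2)            # strip the leading P or R
      if (seen[id]++) print $1, id
    }
  }'
}

echo "primaries: $(count_shards P), replicas: $(count_shards R)"
same_node_copies
```

Losing any single node in this layout still leaves at least one copy of every shard on the remaining nodes, which is the point of spreading primaries and replicas this way.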

  • Master node: The master node is mainly responsible for cluster-level operations, such as creating or deleting indexes, keeping track of which nodes are part of the cluster, and deciding which shards are assigned to which nodes. A stable master node is very important to the health of the cluster. By default, any node in a cluster can be elected master. Indexing and searching consume a large amount of CPU, memory, and I/O, so to keep the cluster stable it is a good idea to separate master nodes from data nodes. Although a master node can also act as a coordinating node, routing searches and forwarding client data to data nodes, it is best not to use dedicated master nodes for this. An important rule is that a master node should do as little work as possible.

  • Data node: A data node stores index data and mainly handles adding, deleting, updating, searching and aggregating documents. Data nodes have high requirements for CPU, memory, and I/O, so monitor their status when tuning the cluster. If resources are insufficient, add new nodes to the cluster.

  • Load balancing node: Also known as a client node. When a node is configured as neither a master node nor a data node, it can only route requests, handle searches, and distribute indexing operations. In essence, a client node acts as an intelligent load balancer. A dedicated client node is very useful in a large cluster because it coordinates between the master and data nodes. Once it joins the cluster it receives the cluster state and can route requests directly based on that state.

  • Preprocessing node: Also known as an ingest node; it can preprocess data before indexing. All nodes support ingest by default, and a node can also be configured as a dedicated ingest node.

Those are the node types. A node can in fact have more than one type: for example, it can be a master node, a data node and a preprocessing node at the same time. If a node is neither a master node nor a data node, it is a load balancing node. The types are set in the node's configuration file.
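As a rough illustration of how these types map to configuration, the sketch below prints `docker run` environment flags for three role combinations. It assumes the `node.master`, `node.data` and `node.ingest` boolean settings of Elasticsearch 7.x releases such as 7.5; the role labels and the function name `role_flags` are hypothetical, and the default distribution has further role settings (such as `node.ml`) that this sketch leaves out.

```shell
# role_flags ROLE : prints docker -e flags for one of three illustrative roles.
role_flags() {
  case "$1" in
    dedicated-master) echo "-e node.master=true -e node.data=false -e node.ingest=false" ;;
    data)             echo "-e node.master=false -e node.data=true -e node.ingest=false" ;;
    coordinating)     echo "-e node.master=false -e node.data=false -e node.ingest=false" ;;
    *) echo "unknown role: $1" >&2; return 1 ;;
  esac
}

role_flags dedicated-master
role_flags coordinating
```

Leaving all three settings at their defaults gives a node that is master-eligible, holds data and can run ingest pipelines at the same time.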

Set up the cluster

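Environment

Three servers are used, one node each: 192.168.31.149 (node-1), 192.168.31.181 (node-2) and 192.168.31.233 (node-3), all running the elasticsearch:7.5.1 Docker image.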

Modifying kernel parameters

Log in to each server and modify the kernel parameters

vi /etc/sysctl.conf

Modify the following parameter; add it if it is not present:

vm.max_map_count=262144

Apply the change:

sysctl -p
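After applying the kernel parameter it can help to confirm the live value. A minimal sketch, assuming `sysctl -n` is available on the server; the function names `map_count_ok` and `report_map_count` are hypothetical.

```shell
# map_count_ok VALUE : succeeds if VALUE meets the 262144 minimum set above.
map_count_ok() {
  [ "$1" -ge 262144 ]
}

# report_map_count : reads the live kernel value and reports whether it is enough.
report_map_count() {
  current=$(sysctl -n vm.max_map_count) || return 1
  if map_count_ok "$current"; then
    echo "vm.max_map_count=$current is sufficient"
  else
    echo "vm.max_map_count=$current is too low, expected at least 262144"
  fi
}
```

Running `report_map_count` on each server before starting the containers gives a quick yes/no answer.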

Start Elasticsearch

Run on node-1:

docker run -d \
  --name=elasticsearch \
  --restart=always \
  -p 9200:9200 \
  -p 9300:9300 \
  -e node.name=node-1 \
  -e network.publish_host=192.168.31.149 \
  -e network.host=0.0.0.0 \
  -e discovery.seed_hosts=192.168.31.149,192.168.31.181,192.168.31.233 \
  -e cluster.initial_master_nodes=192.168.31.149,192.168.31.181,192.168.31.233 \
  -e cluster.name=es-cluster \
  -e "ES_JAVA_OPTS=-Xms512m -Xmx512m" \
  elasticsearch:7.5.1

Description of environment variables:

  • node.name: the node name. In cluster mode, each node name must be unique.

  • network.publish_host: the address this node advertises to the other nodes in the cluster so that they can reach its ES service; it is generally the IP address of the local host.

  • network.host: the IP address to bind to; it can be IPv4 or IPv6.

  • discovery.seed_hosts: new in ES 7.0. Lists the addresses of the master-eligible nodes, which can be elected master once the service is running, for example if the current master fails.

  • cluster.initial_master_nodes: new in ES 7.0. Required when bootstrapping a new cluster, to elect the first master.

  • cluster.name: the cluster name. Nodes with the same cluster name form one cluster, so all three ES nodes must use the same value.

  • ES_JAVA_OPTS: sets the JVM heap size. If memory is limited, set it to a lower value.

Run on node-2:

docker run -d \
  --name=elasticsearch \
  --restart=always \
  -p 9200:9200 \
  -p 9300:9300 \
  -e node.name=node-2 \
  -e network.publish_host=192.168.31.181 \
  -e network.host=0.0.0.0 \
  -e discovery.seed_hosts=192.168.31.149,192.168.31.181,192.168.31.233 \
  -e cluster.initial_master_nodes=192.168.31.149,192.168.31.181,192.168.31.233 \
  -e cluster.name=es-cluster \
  -e "ES_JAVA_OPTS=-Xms512m -Xmx512m" \
  elasticsearch:7.5.1

Note: compared with node-1, only the node.name and network.publish_host parameters change.

Run on node-3:

docker run -d \
  --name=elasticsearch \
  --restart=always \
  -p 9200:9200 \
  -p 9300:9300 \
  -e node.name=node-3 \
  -e network.publish_host=192.168.31.233 \
  -e network.host=0.0.0.0 \
  -e discovery.seed_hosts=192.168.31.149,192.168.31.181,192.168.31.233 \
  -e cluster.initial_master_nodes=192.168.31.149,192.168.31.181,192.168.31.233 \
  -e cluster.name=es-cluster \
  -e "ES_JAVA_OPTS=-Xms512m -Xmx512m" \
  elasticsearch:7.5.1

Note: compared with node-1, only the node.name and network.publish_host parameters change.
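Because the three commands differ only in the node name and the published IP, they could be generated from one template instead of being edited by hand. The sketch below only prints the command; the function name `es_run_cmd` and the `SEEDS` variable are hypothetical, and all settings are taken from the commands above.

```shell
# Addresses shared by all three nodes (taken from the commands above).
SEEDS=192.168.31.149,192.168.31.181,192.168.31.233

# es_run_cmd NODE_NAME PUBLISH_IP : prints the docker run command for one node.
es_run_cmd() {
  cat <<EOF
docker run -d \\
  --name=elasticsearch \\
  --restart=always \\
  -p 9200:9200 \\
  -p 9300:9300 \\
  -e node.name=$1 \\
  -e network.publish_host=$2 \\
  -e network.host=0.0.0.0 \\
  -e discovery.seed_hosts=$SEEDS \\
  -e cluster.initial_master_nodes=$SEEDS \\
  -e cluster.name=es-cluster \\
  -e "ES_JAVA_OPTS=-Xms512m -Xmx512m" \\
  elasticsearch:7.5.1
EOF
}

es_run_cmd node-2 192.168.31.181
```

Printing rather than executing keeps the step reviewable; on the target server the output could be piped to `sh` once it looks right.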

Modifying a Configuration File

By default, Elasticsearch does not allow cross-origin requests (CORS), so the elasticsearch-head plugin cannot connect. To fix this, modify the configuration file.

Copying data files

Log in to node-1, node-2, and node-3 in turn and run:

mkdir -p /data/elk7
docker cp elasticsearch:/usr/share/elasticsearch /data/elk7/

Editing a Configuration File

vi /data/elk7/elasticsearch/config/elasticsearch.yml

Set the contents as follows:

cluster.name: "docker-cluster"
network.host: 0.0.0.0
http.cors.enabled: true
http.cors.allow-origin: "*"
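Before restarting, it may be worth checking that the edited file really contains the two CORS settings. Note also that the docker run commands shown earlier do not mount /data/elk7 into the container, so the edited copy presumably has to be copied back with `docker cp` for the restart to pick it up; that is an assumption about the setup, not something the steps above state. Both function names below are hypothetical, and the second one only prints the command.

```shell
# has_cors_settings FILE : succeeds if FILE enables CORS and sets an allowed origin.
has_cors_settings() {
  grep -Eq '^http\.cors\.enabled:[[:space:]]*true[[:space:]]*$' "$1" &&
    grep -Eq '^http\.cors\.allow-origin:' "$1"
}

# copy_back_cmd FILE [CONTAINER] : prints (does not run) the docker cp command
# that would copy FILE over the configuration file inside the container.
copy_back_cmd() {
  printf 'docker cp %s %s:%s\n' "$1" "${2:-elasticsearch}" \
    /usr/share/elasticsearch/config/elasticsearch.yml
}
```

For example, `has_cors_settings /data/elk7/elasticsearch/config/elasticsearch.yml && copy_back_cmd /data/elk7/elasticsearch/config/elasticsearch.yml` prints the copy command only when both settings are present.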

Restart Elasticsearch

docker restart elasticsearch

Test the cluster

Check the cluster health status

http://192.168.31.149:9200/_cluster/health?pretty

The effect is as follows:

You can see that the cluster has three nodes.
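The same check can be scripted instead of opened in a browser. The sketch below pulls a single field out of the health response with `sed`; the function name `health_field` is hypothetical, and the sample response is an abbreviated illustration that uses real field names (`status`, `number_of_nodes`) rather than captured output.

```shell
# health_field NAME : reads a _cluster/health JSON response on stdin and
# prints the value of the top-level field NAME (string or number).
health_field() {
  sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\"\{0,1\}\([^\",}]*\)\"\{0,1\}.*/\1/p" | head -n 1
}

# Abbreviated, illustrative response (not captured from a real cluster).
sample='{
  "cluster_name" : "es-cluster",
  "status" : "green",
  "number_of_nodes" : 3
}'

printf '%s\n' "$sample" | health_field status
printf '%s\n' "$sample" | health_field number_of_nodes
```

Against the live cluster this would be used as `curl -s 'http://192.168.31.149:9200/_cluster/health?pretty' | health_field status`. A JSON tool such as jq would be more robust if it is installed; plain `sed` is used here only to avoid an extra dependency.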

Check the cluster node status

http://192.168.31.149:9200/_cat/nodes?pretty

The effect is as follows:

node-1 is marked with an asterisk, indicating that it is the master node.

Connect with elasticsearch-head

As you can see, the three machines form an ES cluster, and the cluster is healthy (green). els-node1, marked with a star, is the elected master node. From here you can also add and delete indexes, run queries, and so on.
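Finding the starred node can also be scripted. A minimal sketch, assuming the cat nodes API is asked for just the `name` and `master` columns via its `h` parameter; the function name `elected_master` is hypothetical and the sample rows are illustrative, not captured output.

```shell
# elected_master : reads "name master" rows on stdin and prints the name
# of the row whose master column is "*".
elected_master() {
  awk '$2 == "*" { print $1 }'
}

# Illustrative rows in the shape returned by _cat/nodes?h=name,master
printf 'node-1 *\nnode-2 -\nnode-3 -\n' | elected_master
```

Against the live cluster: `curl -s 'http://192.168.31.149:9200/_cat/nodes?h=name,master' | elected_master`.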
