Preface

In the previous part on ES basics, we briefly introduced the core concepts and common APIs of ES. This article focuses on ES authentication and shard control in a cluster.

The environment is as follows: CentOS 7.6, Elasticsearch 7.9.3, Kibana 7.9.3

Elasticsearch authentication

Install ES and Kibana

For details, see the link in the preface.

ES password setup

Modify the ES configuration file:

vim config/elasticsearch.yml
# ES security configuration
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true

Run the setup script to set passwords for the built-in users:

bin/elasticsearch-setup-passwords interactive

Test access to ES

The test result is as follows: we are prompted to enter a password.
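As a quick command-line check, a minimal sketch, assuming ES runs on localhost:9200 and the elastic password set above is 123456:

# Without credentials the request is rejected (HTTP 401)
curl http://localhost:9200

# With the elastic user and its password, the cluster info is returned
curl -u elastic:123456 http://localhost:9200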

Kibana password setup

Modify the Kibana configuration file:

vim config/kibana.yml
elasticsearch.username: "elastic"
elasticsearch.password: "123456"

Restart Kibana:

nohup ./kibana &

Log in and test

Customize users and authorization

We can also add new users directly in Kibana and grant them the corresponding permissions; we will not go into much detail here.
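The same can also be done through the security REST API instead of the Kibana UI. A minimal sketch, assuming the elastic password from above; the user name, password, and roles below are only examples:

# Create a user and assign two of the built-in roles
curl -u elastic:123456 -X POST "localhost:9200/_security/user/zhangsan" \
  -H 'Content-Type: application/json' -d'
{
  "password": "zhangsan123",
  "roles": ["kibana_admin", "monitoring_user"]
}'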

Cluster introduction

Single node

We first create a single-node cluster and define an index named user with three primary shards and one replica per primary:

PUT user
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}

The current node distribution is:

  • All primary shards in the current cluster are running normally, but not all replica shards are in a normal state.
  • All three replica shards are Unassigned: none of them have been allocated to a node. It makes no sense to keep the primary data and its copies on the same node, because if we lost that node, we would also lose all the copies on it. We can verify this state with the check below.
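A quick way to confirm this is the cluster health and cat shards APIs. A minimal sketch, again assuming localhost:9200 and the elastic password set earlier:

# Cluster health: with a single node the status is yellow
# (all primary shards active, replica shards unassigned)
curl -u elastic:123456 "localhost:9200/_cluster/health?pretty"

# Per-shard view of the user index: primaries STARTED, replicas UNASSIGNED
curl -u elastic:123456 "localhost:9200/_cat/shards/user?v"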

Cluster expansion

One obvious disadvantage of a single-node cluster is the single point of failure. Fortunately, to prevent data loss we only need to start one more node: start another ES instance with the same cluster.name, and it will discover the cluster and join it automatically. Of course, this only works out of the box when the nodes run on the same machine; across machines, configure discovery.seed_hosts: ["xxx:9300", "xxx:9300"] to tell a node which hosts it can contact to join the cluster.
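A minimal sketch of the second node's configuration; the cluster name and node name are examples, and xxx stands for the host addresses as in the original setting:

vim config/elasticsearch.yml
# Same cluster name as the first node, so this node discovers and joins that cluster
cluster.name: es-cluster
node.name: node-2
# Hosts to contact during discovery (required when nodes run on different machines)
discovery.seed_hosts: ["xxx:9300", "xxx:9300"]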

After we start the second node, Node 2:

When the second node joins the cluster, the three replica shards are allocated to it, one replica for each primary shard. This means that if any single node in the cluster goes down, our data remains intact. All newly indexed documents are first stored on a primary shard and then copied in parallel to the corresponding replica shard, which ensures that we can retrieve a document from either the primary shard or the replica shard.
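To confirm that the second node has joined and the replicas are now allocated, a quick check with the same assumed credentials:

# List the nodes in the cluster: both nodes should appear
curl -u elastic:123456 "localhost:9200/_cat/nodes?v"

# Cluster health should now be green: all primaries and replicas active
curl -u elastic:123456 "localhost:9200/_cluster/health?pretty"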

After we start the third node, Node 3:

With two nodes, our cluster no longer has a single point of failure. But how do we scale as the demands of our application grow? When we start the third node, we have a three-node cluster, and the cluster redistributes the shards in order to spread the load.

One shard from Node 1 and one shard from Node 2 have been migrated to the new Node 3, and each node now has two shards instead of three. This means that each node's hardware resources (CPU, RAM, I/O) are shared by fewer shards, so the performance of each shard improves.
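The shard count per node can be checked with the cat allocation API; a minimal sketch:

# Shows how many shards each node holds (two per node at this point)
curl -u elastic:123456 "localhost:9200/_cat/allocation?v"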

Each shard is a fully functional search engine in its own right, capable of using all the resources of its node. Our index has six shards (three primary shards and three replica shards), so it can be scaled out to a maximum of six nodes. With one shard per node, each shard has all the resources of its node to itself, which maximizes the throughput of the system.

Coping with failure

We manually shut down the first node:

The node we shut down, Node 1, was the master node. A cluster must have a master node to work properly, so the first thing that happens is the election of a new master node: Node 3. When we shut down Node 1, we also lost primary shards 0 and 2, and an index cannot work properly while primary shards are missing. If we check the cluster state at this point, we will see that the status is red: not all primary shards are working properly.
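The status can be confirmed with the cat health API, using the same assumed credentials:

# The status column reflects the missing primary shards
curl -u elastic:123456 "localhost:9200/_cat/health?v"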

If we restart Node 1, the cluster can allocate the missing shards again and its state will recover. If Node 1 still has its previous shards on disk, it will try to reuse them and copy only the data files that have changed from the primary shards. Compared with the previous cluster, only the master node has changed.
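Shard recovery after the restart can be observed with the cat recovery API; a minimal sketch:

# Shows ongoing and completed shard recoveries for the user index
curl -u elastic:123456 "localhost:9200/_cat/recovery/user?v"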

Routing rules

When indexing a document, the document is stored in a primary shard. How does Elasticsearch know which shard to place a document in? When we create a document, how does it decide whether the document should be stored in shard 1 or shard 2? First of all, it can’t be random, or we won’t know where to look when we want to retrieve documents in the future. In fact, the process is determined by the following formula:

 shard = hash(routing) % number_of_primary_shards

routing is a variable value; by default it is the document's id, but it can also be set to a custom value. routing is run through a hash function to produce a number, which is then divided by number_of_primary_shards (the number of primary shards) to get a remainder. The remainder, which always falls between 0 and number_of_primary_shards - 1, is the shard where the document lives.
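As a sketch of using a custom routing value (the document id, routing value, and fields below are made up for illustration):

# Index a document with an explicit routing value instead of the default (its id)
curl -u elastic:123456 -X PUT "localhost:9200/user/_doc/1?routing=tom" \
  -H 'Content-Type: application/json' -d'
{ "name": "tom", "age": 20 }'

# The same routing value must be supplied when getting the document by id
curl -u elastic:123456 "localhost:9200/user/_doc/1?routing=tom"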

This explains why we must decide on the number of primary shards when the index is created and never change it afterwards: if the number changed, all previously computed routing values would become invalid and documents could no longer be found.