In this tutorial, we cover some common issues in Elasticsearch shard management, their solutions, and some best practices. In some use cases, we combine several techniques to accomplish the task.

 

Move a shard from one node to another

This is one of the most common use cases when dealing with clusters of any size. A typical scenario is that too many shards sit on one node and all of them are heavily used for queries or indexing.

This situation is a potential risk to node and cluster health, so it is a good habit to move some of those shards to another node. Elasticsearch may not do this automatically in every case, which means we sometimes need to intervene manually. How do you do this?

Elasticsearch provides a cluster-level API, the reroute API, for moving shards from one node to another. A few points about its behavior are worth knowing before using it.

Note that after processing any reroute command, Elasticsearch performs rebalancing as normal (respecting the values of settings such as cluster.routing.rebalance.enable) in order to stay balanced. For example, if the requested allocation moves a shard from node1 to node2, this may cause another shard to move from node2 back to node1 to even things out.

The cluster can be set to disable allocation with the cluster.routing.allocation.enable setting. If allocation is disabled, the only allocations performed are the explicit ones given through reroute commands and the subsequent allocations caused by rebalancing.
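As a minimal sketch of the setting described above, the Python helper below builds the body for a PUT _cluster/settings request. The function name allocation_enable_body is made up for illustration; the accepted values (all, primaries, new_primaries, none) are the ones documented for cluster.routing.allocation.enable. Sending the request is left to whatever HTTP client you use.

```python
import json

# Values accepted by cluster.routing.allocation.enable.
ALLOWED_VALUES = {"all", "primaries", "new_primaries", "none"}


def allocation_enable_body(value, persistent=False):
    """Build the body for PUT _cluster/settings (hypothetical helper)."""
    if value not in ALLOWED_VALUES:
        raise ValueError(f"unsupported value: {value!r}")
    scope = "persistent" if persistent else "transient"
    return {scope: {"cluster.routing.allocation.enable": value}}


if __name__ == "__main__":
    # Disable automatic allocation before issuing explicit reroute commands.
    print(json.dumps(allocation_enable_body("none"), indent=2))
```

Setting the value back to all re-enables normal allocation once the manual moves are done.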

Reroute commands can be run in dry-run mode, either with the ?dry_run URI query parameter or by passing "dry_run": true in the request body. This computes the result of applying the commands to the current cluster state and returns the resulting cluster state after the commands (and rebalancing) have been applied, without actually performing the requested changes.
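To make the dry-run idea concrete, here is a small Python sketch that assembles a reroute request with the dry_run query parameter. The helper names move_command and reroute_request are hypothetical, and the index and node names are placeholders; the code only builds the method, path, and body and does not contact a cluster.

```python
import json


def move_command(index, shard, from_node, to_node):
    """Return one 'move' command for the reroute API."""
    return {
        "move": {
            "index": index,
            "shard": shard,
            "from_node": from_node,
            "to_node": to_node,
        }
    }


def reroute_request(commands, dry_run=True):
    """Return (method, path, body) for a reroute call (hypothetical helper)."""
    path = "/_cluster/reroute"
    if dry_run:
        # With dry_run the cluster only simulates the commands.
        path += "?dry_run=true"
    return "POST", path, {"commands": list(commands)}


if __name__ == "__main__":
    command = move_command("test", 0, "node1", "node2")
    method, path, body = reroute_request([command])
    print(method, path)
    print(json.dumps(body, indent=2))
```

Defaulting dry_run to True is a design choice of this sketch: a real move has to be requested explicitly with dry_run=False.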

We can use the Reroute API to move a shard from one node to another. Here’s an example:

POST /_cluster/reroute
{
  "commands": [
    {
      "move": {
        "index": "test",
        "shard": 0,
        "from_node": "node1",
        "to_node": "node2"
      }
    },
    {
      "allocate_replica": {
        "index": "test",
        "shard": 1,
        "node": "node3"
      }
    }
  ]
}

Above, we force shard 0 of index test to move from node1 to node2. We also allocate an unassigned replica of shard 1 of index test to node3.

 

Decommission a node

Another use case is to decommission a node in an active cluster. The main challenge is to take the node out of service without bringing the cluster down or restarting it. Fortunately, Elasticsearch provides a way to remove a node gracefully without losing data or causing an outage. Let's see how to do it:

PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.exclude._ip": "IP of the node"
  }
}

The request above tells the cluster to stop allocating shards to the specified node. At the same time, the shards on this node are migrated to the non-excluded nodes. The data transfer takes place in the background; once it finishes, the node holds no shards and can be shut down and removed from the cluster safely.

Before decommissioning a node, make sure that the free disk space on the other nodes is greater than the size of the data to be transferred. Otherwise, some shards may remain unassigned and the cluster health may turn yellow or red, which can cause an outage.

It is often helpful to have other ways to identify the node to decommission. In the example above, we identified the node by its IP address. We can do the same with its unique node ID or its node name.

Exclude by node ID

PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.exclude._id": "unique id of the node"
  }
}

Exclude by node name

PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.exclude._name": "name of the node"
  }
}
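The three requests differ only in the attribute at the end of the setting name. The Python sketch below captures that pattern; exclude_setting and SELECTORS are names invented here, and the node names in the usage lines are placeholders. It relies on two documented behaviors: the exclude setting accepts a comma-separated list of values, and setting a cluster setting to null resets it.

```python
import json

# Node attributes used in this section for allocation filtering.
SELECTORS = {"ip": "_ip", "id": "_id", "name": "_name"}


def exclude_setting(selector, values):
    """Build the settings body that drains the given nodes.

    `values` may list several nodes; they are joined with commas.
    An empty list clears the exclusion by setting the value to null.
    """
    key = "cluster.routing.allocation.exclude." + SELECTORS[selector]
    value = ",".join(values) if values else None
    return {"transient": {key: value}}


if __name__ == "__main__":
    # Drain two nodes by name, then clear the exclusion again.
    print(json.dumps(exclude_setting("name", ["node1", "node2"])))
    print(json.dumps(exclude_setting("name", [])))
```

Clearing the exclusion matters if the node is meant to rejoin the cluster later; otherwise it would come back and still receive no shards.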

How do we check whether the node has been fully drained? There are two methods:

Method 1

We use the following request:

GET _cluster/health?pretty

The results are as follows:

{
  "cluster_name" : "elasticsearch",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 26,
  "active_shards" : 26,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 19,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 57.77777777777777
}

The value of relocating_shards above is 0, indicating that no shards are currently being relocated.
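The same check can be scripted. The sketch below assumes the health response has already been fetched and holds its JSON text in a string; relocation_finished is a hypothetical helper name, and the sample is a trimmed copy of the response shown above.

```python
import json

# Trimmed copy of the cluster health response from this section.
SAMPLE_HEALTH = """
{
  "cluster_name": "elasticsearch",
  "status": "yellow",
  "relocating_shards": 0,
  "unassigned_shards": 19
}
"""


def relocation_finished(health):
    """Return True when the health response reports no relocating shards."""
    return health["relocating_shards"] == 0


if __name__ == "__main__":
    print(relocation_finished(json.loads(SAMPLE_HEALTH)))
```

In practice this would be polled until it returns True. Note that it says nothing about the unassigned shards in the sample, which are a separate concern.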

Method 2

Check the state of the specific node. First list the nodes with the following request:

GET _cat/nodes?v

We get the node name from the API above:

ip        heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
127.0.0.1           42          50   2    1.94                  dilm      *      liuxg-2.local

Then use the following API:

GET _nodes/liuxg-2.local/stats/indices

Here liuxg-2.local is the name of our node. The result is:

"nodes" : { "Zs0Uy-9mTDOifm5Ef8U6FA" : { "timestamp" : 1581585326681, "name" : "liuxg-2.local", "transport_address" : "127.0.0.1:9300", "the host" : "127.0.0.1", "IP" : "127.0.0.1:9300", "roles" : [ "ingest", "master", "data", "ml" ], "attributes" : { "ml.machine_memory" : "34359738368", "xpack.installed" : "true", "ml.max_open_jobs" : "20" }, "indices" : { "docs" : { "count" : 0, "deleted" : 8 }, ...Copy the code

If the value of indices.docs.count above is 0, the migration is complete.
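A short Python sketch of this check follows. It assumes the node stats response has already been parsed into a dictionary; node_doc_count and node_is_drained are illustrative names, and the sample mirrors only the fields of the response shown above that the check needs.

```python
# Reduced copy of the node stats response from this section.
SAMPLE_STATS = {
    "nodes": {
        "Zs0Uy-9mTDOifm5Ef8U6FA": {
            "name": "liuxg-2.local",
            "indices": {"docs": {"count": 0, "deleted": 8}},
        }
    }
}


def node_doc_count(stats, node_name):
    """Return indices.docs.count for the node with the given name."""
    # Entries under "nodes" are keyed by node ID, so match on the name field.
    for node in stats["nodes"].values():
        if node["name"] == node_name:
            return node["indices"]["docs"]["count"]
    raise KeyError(f"node not found: {node_name}")


def node_is_drained(stats, node_name):
    """Return True when the node no longer holds any documents."""
    return node_doc_count(stats, node_name) == 0


if __name__ == "__main__":
    print(node_is_drained(SAMPLE_STATS, "liuxg-2.local"))
```

Raising an error for an unknown node name, rather than returning True, avoids mistaking a typo for a drained node.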

 

Rename index

Another use case is to rename an index. This can be done in a number of ways, depending on usage.

Aliasing

If we want to rename an index without losing any data, the most common method is an alias.

For example, suppose we want to rename the index “testindex” to “testindex-1” (index names must be lowercase). We can add the alias “testindex-1” to the index “testindex” so that all requests referencing “testindex-1” are routed to “testindex”. You can do this as follows:

POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "testindex",
        "alias": "testindex-1"
      }
    }
  ]
}

This approach allows us to rename indexes with zero downtime.

Reindex API

Sometimes an alias is not the best choice for renaming. In that case, the remaining option is reindexing, which copies all documents from the source index to the target index. To do this effectively, two things need to be checked:

  • Whether there is enough disk space on the machine.
  • Whether the target index has the correct mapping.

If the above two conditions are met, we can use the reindex API as follows:

POST _reindex
{
  "source": {
    "index": "testindex"
  },
  "dest": {
    "index": "testindex-1"
  }
}
