Scenario

Consider a system that stores data such as user-uploaded images. These images live on a file server's disk, but a single server's storage is limited, so once the number of images exceeds what one disk can hold, we typically need to scale out and use multiple servers to store the image data.

At this point, we are faced with two problems:

  • How do we decide which server an image should be stored on, and which server to read from when someone wants to view a particular image?
  • How do we ensure that image data is not lost if a server crashes or a disk is corrupted?

These are the two core problems to consider when designing a scalable storage system: the first is, in essence, data partitioning, and the second is data replication.

Data partitioning and replication strategies are at the heart of distributed systems. A well-designed partitioning and replication scheme improves the system's performance, availability, and reliability, and largely determines how efficiently the system can scale and be managed.

Data partitioning

The first thing we need to decide is how to determine which server node an image should be stored on.

Ordinary hash algorithm

A simple method is to hash the image ID to a number, then take that number modulo the total number of servers to find the corresponding server. For example:
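As a rough sketch of this placement rule (the server names here are hypothetical):

```python
import hashlib

SERVERS = ["server-1", "server-2", "server-3", "server-4"]

def pick_server(image_id: str) -> str:
    # Hash the image ID to a stable integer, then take it modulo the
    # number of servers to decide where the image lives.
    h = int(hashlib.md5(image_id.encode()).hexdigest(), 16)
    return SERVERS[h % len(SERVERS)]

print(pick_server("cat.png"))  # same ID always maps to the same server,
                               # as long as len(SERVERS) never changes
```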

This scheme solves the problem of determining which server stores or serves a given image, but it has a serious flaw.

As the volume of image data grows, new server nodes will be added; likewise, when a server fails or its disk is corrupted, it must be removed. Either change alters the total number of servers, which breaks the mapping for all existing images and makes them impossible to locate. We would have to remap every image against the new server count and move the image data accordingly, which is very costly and certainly not what we want.
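A quick experiment with hypothetical image IDs shows how disruptive this is: growing from 4 to 5 servers under modulo placement remaps the vast majority of images.

```python
import hashlib

def slot(key: str, n: int) -> int:
    # Modulo placement: which of n servers a key lands on.
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % n

keys = [f"img-{i}.png" for i in range(10_000)]
moved = sum(slot(k, 4) != slot(k, 5) for k in keys)
print(f"{moved / len(keys):.0%} of images change servers")  # roughly 80%
```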

Consistent hash algorithm

We can use a consistent hashing algorithm to solve the above problem.

Consistent hashing maps both data items and server nodes onto the same space, and ensures that only a small amount of data is moved when server nodes are added or removed.

A consistent hash algorithm logically arranges the data of a distributed system on a ring, and each server node on the ring is assigned a range of data. Here is an example of a consistent hash ring:

The consistent hash algorithm divides the ring into smaller predefined ranges, each of which is assigned to a server node. The starting point of a range is called a token, so in effect each node is assigned a token. The range assigned to each node is calculated as follows:

Range start: the node's token value

Range end: the next token value - 1

Here are the tokens and data ranges for the four server nodes in the figure above:

  • Server 1: token 1, range 1-25
  • Server 2: token 26, range 26-50
  • Server 3: token 51, range 51-75
  • Server 4: token 76, range 76-100

Whenever the system needs to read or save a file, the first step is to compute the hash of the file name (or ID).

The hash value determines which range the file falls into, and therefore which node the file should be stored on.
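Here is a minimal sketch of that lookup, using the token values from the example above (the class, names, and the 1-100 hash space are illustrative):

```python
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes: dict[str, int]):
        # nodes maps node name -> token (the start of its data range)
        self._tokens = sorted(nodes.values())
        self._owner = {token: name for name, token in nodes.items()}

    def _hash(self, key: str) -> int:
        # Map the key into the example's 1-100 ring space.
        return int(hashlib.md5(key.encode()).hexdigest(), 16) % 100

    def lookup(self, key: str) -> str:
        h = self._hash(key)
        # The owning node is the one with the greatest token <= h;
        # index -1 wraps around to the last token on the ring.
        i = bisect.bisect_right(self._tokens, h) - 1
        return self._owner[self._tokens[i]]

ring = HashRing({"server-1": 1, "server-2": 26, "server-3": 51, "server-4": 76})
print(ring.lookup("cat.png"))
```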

With consistent hashing, adding or removing a server node from the ring works well, because only the node adjacent to the added or removed position is affected.

For example, when a node is removed, the next node on the ring becomes responsible for all the data that was stored on the removed node.
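A quick check against the same four-node example (again with hypothetical keys): removing one node remaps only the keys that node owned, about a quarter of them here, rather than the ~80% we saw with modulo placement.

```python
import bisect
import hashlib

def owner(h: int, tokens: list[int]) -> int:
    # Greatest token <= h owns the key (index -1 wraps the ring).
    return tokens[bisect.bisect_right(tokens, h) - 1]

hashes = [int(hashlib.md5(f"img-{i}".encode()).hexdigest(), 16) % 100
          for i in range(10_000)]
before = [owner(h, [1, 26, 51, 76]) for h in hashes]
after = [owner(h, [1, 26, 76]) for h in hashes]  # node with token 51 removed
moved = sum(b != a for b, a in zip(before, after))
print(f"{moved / len(hashes):.0%} of keys moved")  # about 25%, not ~80%
```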

However, this scheme still has a problem: data and load may be distributed unevenly. For example, if a large share of hash values fall between 1 and 25, Server 1 will store most of the data and carry most of the request load.

In the extreme case, if all hash values fall in the range 76-100, all data will be saved on Server 4 and none on Servers 1, 2, and 3.

Consistent hashing solves this problem with virtual nodes.

Virtual nodes

In the consistent hashing scheme above, each server node is assigned a single token, i.e. one relatively large data range. This is a static division of the ring, with the ranges computed from the given number of nodes.

In this scheme, adding or removing nodes requires a lot of data movement if each server node is to stay balanced in data and load. The main problems are:

  • Adding or removing nodes: the data ranges must be recalculated, which imposes a large administrative overhead on the cluster.
  • Hot data: because each node owns one large range, a node can become a hotspot if the data is not evenly distributed.

To deal with these problems, consistent hashing introduces virtual nodes: the original large data ranges are divided into many small ranges, which are then distributed across the server nodes. Each of these small ranges is called a virtual node. A physical server node thus no longer manages a single data range, but many small ones.

In addition, virtual nodes are distributed randomly across the cluster, so that a contiguous data range does not sit entirely on one physical server node; this keeps hotspot data within a contiguous range from concentrating on a single node.

Each node also keeps replicas of data belonging to other nodes, for fault tolerance.

And because the servers in a cluster may differ in capacity, some servers may host more virtual nodes than others.
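Here is a sketch of a ring built with virtual nodes (the vnode counts and node names are made up); each physical node receives many tokens, and more powerful servers receive more of them:

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class VNodeRing:
    def __init__(self, weights: dict[str, int]):
        # weights: physical node -> number of virtual nodes; a stronger
        # server gets more vnodes and therefore a larger share of keys.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node)
            for node, vnodes in weights.items()
            for i in range(vnodes)
        )
        self._tokens = [token for token, _ in self._ring]

    def lookup(self, key: str) -> str:
        # Greatest vnode token <= hash owns the key (index -1 wraps).
        i = bisect.bisect_right(self._tokens, _hash(key)) - 1
        return self._ring[i][1]

ring = VNodeRing({"A": 100, "B": 100, "C": 150, "D": 50})  # C is the beefiest
print(ring.lookup("cat.png"))
```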

The following figure shows how server nodes A, B, C, D, and E use consistent hashing with virtual nodes. Each physical node is assigned a set of virtual nodes, and each virtual node is replicated once and stored on another node.

Advantages of virtual nodes

Combining virtual nodes with consistent hashing has many advantages:

  • Because virtual nodes split the hash range into many smaller sub-ranges, they spread load more evenly across the physical nodes in the cluster, and make rebalancing faster when nodes are added or removed.
  • When a new server node is added, it takes over some virtual nodes from the existing nodes, keeping the cluster balanced.
  • Similarly, when a node needs to be rebuilt, many nodes participate in the rebuild, rather than the data coming from a fixed small number of replicas.
  • Virtual nodes also make clusters with heterogeneous hardware easier to manage: we can assign many virtual nodes to powerful servers and fewer to weaker ones.
  • Because virtual nodes give each physical node many smaller ranges, they also reduce the likelihood of hotspots.

Data replication

To ensure high availability and durability, a distributed system replicates each node's data to several other nodes, so the data remains available even if a node crashes unexpectedly.

With consistent hashing, each data item is replicated to a fixed number of nodes in the system; suppose we need to keep three copies of each item.

After its hash value is computed, each data item is assigned to a coordinator node, which first stores the data locally and then replicates it to the next two nodes clockwise on the ring.

Each node therefore holds the data for the region of the ring between itself and its third predecessor.
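A sketch of this replication rule, reusing the token convention from the earlier examples (node names and token values are illustrative):

```python
import bisect
import hashlib

NODES = {"A": 10, "B": 35, "C": 60, "D": 85}  # node name -> token

def preference_list(key: str, n_copies: int = 3) -> list[str]:
    tokens = sorted(NODES.values())
    owner = {token: name for name, token in NODES.items()}
    h = int(hashlib.md5(key.encode()).hexdigest(), 16) % 100
    i = bisect.bisect_right(tokens, h) - 1  # coordinator's token index
    # The coordinator stores the item, then it is copied to the next
    # n_copies - 1 nodes clockwise on the ring.
    return [owner[tokens[(i + k) % len(tokens)]] for k in range(n_copies)]

print(preference_list("cat.png"))  # e.g. ['B', 'C', 'D']
```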

Conclusion

The consistent hash algorithm helps partition and replicate data efficiently. Any distributed system that needs to scale, or that achieves high availability through data replication, can make use of it.

For example, a caching system that dynamically adds or removes cache servers based on traffic load, or a file storage system built on a fleet of storage servers that must grow or shrink with usage.

I am Xiao Hei, a programmer "muddling along" in the Internet industry.

Water does not strive to be first; it strives to flow without end.