Redis series directory

Redis series – distributed lock

Redis series – Cache penetration, cache breakdown, cache avalanche

Why Is Redis so fast?

Redis series — Data Persistence (RDB and AOF)

Redis series – consistent hash algorithm

Redis series – High Availability (Master slave, Sentinel, Cluster)

Redis series – Transactions and Optimistic Lock

Redis series — Geospatial: Do you have Lao Wang next door?

Bitmaps: Did you check in today?

What is a Bloom filter?!

What do you know about the consistent hashing algorithm? When is it used, and what problem does it solve? Does Redis Cluster mode use consistent hashing?

Data sharding

In distributed data storage, data is often sharded to avoid putting a large amount of data into a single table or database, where operations such as queries would take too long. For example, suppose three MySQL databases (numbered 0, 1, 2) store order data. When an order comes in, hash the orderId and take the remainder by the number of machines: hash(orderId) % 3. If the result is 2, the order is stored in the database numbered 2. In general, when splitting data across tables and databases, the data is hashed by the table's primary key or a unique key, and the result modulo the number of database machines decides which database the data goes to.
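A minimal sketch of this modulo routing, assuming three databases numbered 0, 1, 2 (the class and method names are illustrative, not from a real project):

```java
// Modulo-based sharding sketch: route an order to one of three databases.
public class ModuloSharding {

    private static final int DB_COUNT = 3; // number of MySQL databases (0, 1, 2)

    // Decide which database an order goes to by hashing its id and taking the remainder.
    public static int route(String orderId) {
        // floorMod keeps the result non-negative even when hashCode() is negative
        return Math.floorMod(orderId.hashCode(), DB_COUNT);
    }

    public static void main(String[] args) {
        // Prints 0, 1 or 2: the number of the database this order is stored in.
        System.out.println("order-10086 -> mysql-" + route("order-10086"));
    }
}
```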

When capacity runs out and the cluster must be scaled out, or when a machine goes down, the number of machines changes, so most keys no longer map to the machine that actually holds their data and the hit ratio drops. All existing data then has to be re-hashed and re-sharded, which makes the service unavailable for some time, and the same problem recurs every time the cluster is scaled out or in.
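To see the scale of the problem, the sketch below (hypothetical key names, plain modulo sharding as above) counts how many keys map to a different node after the node count changes from 3 to 4; with plain modulo, roughly three quarters of the keys move:

```java
// Rough illustration of the re-sharding problem: count how many keys map to a
// different node after the node count grows from 3 to 4 under plain modulo sharding.
public class ReshardDemo {
    public static void main(String[] args) {
        int total = 100_000;
        int moved = 0;
        for (int i = 0; i < total; i++) {
            String key = "order-" + i;
            int before = Math.floorMod(key.hashCode(), 3);
            int after  = Math.floorMod(key.hashCode(), 4);
            if (before != after) {
                moved++;
            }
        }
        // Expect roughly 3/4 of the keys to end up on a different node.
        System.out.println("moved " + moved + " of " + total + " keys");
    }
}
```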

Consistent hashing

The consistent hashing algorithm is mainly used in distributed storage systems. It overcomes the poor scalability of the plain remainder (modulo) hashing commonly used in distributed storage architectures, and ensures that as many requests as possible still hit their original machine nodes when nodes are dynamically added or removed.

Hash ring

Consistent hashing also uses the modulo operation, but instead of taking the remainder by the number of servers as described above, it takes the remainder by 2^32. In short, consistent hashing organizes the entire hash value space into a virtual circle. Assuming the hash function H has a value space of 0 to 2^32-1 (i.e., the hash value is a 32-bit unsigned integer), the hash ring looks like this:

The space is organized clockwise: the point at the top of the circle represents 0, the first point to its right represents 1, and so on (2, 3, 4, 5, 6, ...) all the way up to 2^32-1, so the first point to the left of 0 is 2^32-1, and 0 and 2^32-1 meet at the top of the circle. We call this circle of 2^32 positions the hash ring.

Next, each server is mapped onto the ring with the same hash function. Specifically, the server's host name can be used as the hash key (the IP may change, so the IP is not used), so that every machine gets a fixed position on the hash ring. Suppose the three master nodes above end up at the following positions on the ring after hashing:

Keys are then mapped onto the ring in the same way: use the same hash function to compute the hash of a data key, which determines the key's position on the ring. From that position, walk clockwise until the first server node is encountered; that node is the server on which the key is stored.

For example, suppose we have three keys, key-a, key-b, and key-c. After hashing, their positions on the ring are as follows: key-a is stored on Node 1, key-b on Node 2, and key-c on Node 3.
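A minimal sketch of the ring described above, using Java's TreeMap as the sorted ring. The node and key names are the ones from the example; the MD5-based hash into the 0..2^32-1 space is just one common choice, not something prescribed by the algorithm.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.SortedMap;
import java.util.TreeMap;

// Minimal consistent-hash ring: nodes and keys are hashed onto a 0..2^32-1 ring,
// and a key is served by the first node found clockwise from its position.
public class ConsistentHashRing {

    private final TreeMap<Long, String> ring = new TreeMap<>(); // position -> node name

    public void addNode(String nodeName) {
        ring.put(hash(nodeName), nodeName);
    }

    public void removeNode(String nodeName) {
        ring.remove(hash(nodeName));
    }

    // Walk clockwise: the first node at a position >= the key's hash; wrap around if none.
    // Assumes at least one node has been added.
    public String locate(String key) {
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        Long pos = tail.isEmpty() ? ring.firstKey() : tail.firstKey();
        return ring.get(pos);
    }

    // Hash into the 0..2^32-1 space using the first 4 bytes of an MD5 digest (one common choice).
    private long hash(String key) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(key.getBytes(StandardCharsets.UTF_8));
            return (((long) (d[0] & 0xFF) << 24)
                    | ((long) (d[1] & 0xFF) << 16)
                    | ((long) (d[2] & 0xFF) << 8)
                    | (d[3] & 0xFF)) & 0xFFFFFFFFL;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ConsistentHashRing ring = new ConsistentHashRing();
        ring.addNode("node1");
        ring.addNode("node2");
        ring.addNode("node3");
        // Each key is routed to the first node clockwise from its hash position.
        for (String key : new String[]{"key-a", "key-b", "key-c"}) {
            System.out.println(key + " -> " + ring.locate(key));
        }
    }
}
```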

Fault tolerance and scalability

If Node 2 fails, key-a and key-c are not affected; only key-b is relocated to Node 3. In general, with consistent hashing, if a server becomes unavailable, the only data affected is the data between that server and the previous server on the ring (i.e., in the figure below, the data between Node 1 and Node 2, which is key-b). Nothing else is affected.

Similarly, if a new Node 4 is added to the cluster, only the data between Node 1 and Node 4 needs to move to the new node; the rest of the data is not affected.

To sum up, when nodes are added or removed, consistent hashing only needs to relocate a small portion of the data on the ring, which gives it good fault tolerance and scalability.
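Building on the ConsistentHashRing sketch above, the following example shows the fault-tolerance claim in code: removing a node only changes the owner of the keys that node used to hold.

```java
import java.util.HashMap;
import java.util.Map;

// Reuses the ConsistentHashRing sketch above. Removing a node only changes the
// owner of the keys that node used to hold; every other key keeps its node.
public class FailoverDemo {
    public static void main(String[] args) {
        ConsistentHashRing ring = new ConsistentHashRing();
        ring.addNode("node1");
        ring.addNode("node2");
        ring.addNode("node3");

        String[] keys = {"key-a", "key-b", "key-c"};
        Map<String, String> ownerBefore = new HashMap<>();
        for (String key : keys) {
            ownerBefore.put(key, ring.locate(key));
        }

        ring.removeNode("node2"); // simulate node2 going down

        for (String key : keys) {
            // Only keys whose owner was node2 are relocated (clockwise to the next node).
            System.out.println(key + ": " + ownerBefore.get(key) + " -> " + ring.locate(key));
        }
    }
}
```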

Data skew

When there are too few service nodes, consistent hashing easily suffers from data skew caused by an uneven distribution of nodes on the ring (most of the cached objects end up on one server). For example, with only two servers in the system, a large amount of data may land on Node 2 while only a small amount lands on Node 1. The ring distribution looks like this:

To solve the data-skew problem, consistent hashing introduces the virtual node mechanism: compute several hashes for each service node and place a node on the ring at each resulting position; these are the virtual nodes. This can be done by appending a number to the host name. For example, we can create three virtual nodes per server by hashing Node 1#1, Node 1#2, Node 1#3, Node 2#1, Node 2#2, and Node 2#3, which gives six nodes on the ring:

In the figure above, the virtual nodes Node 1#1, Node 1#2, and Node 1#3 all belong to the real node Node 1, and the virtual nodes Node 2#1, Node 2#2, and Node 2#3 all belong to the real node Node 2.
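A sketch of the virtual-node variant, assuming three virtual copies per real node as in the example above (the "#n" naming and the MD5-based hash are just one way to do it):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.SortedMap;
import java.util.TreeMap;

// Consistent-hash ring with virtual nodes: each physical node is placed on the
// ring several times under names like "node1#1", "node1#2", ... so keys spread
// more evenly even when there are only a few physical nodes.
public class VirtualNodeRing {

    private static final int VIRTUAL_COPIES = 3; // virtual nodes per real node
    private final TreeMap<Long, String> ring = new TreeMap<>(); // position -> real node name

    public void addNode(String nodeName) {
        for (int i = 1; i <= VIRTUAL_COPIES; i++) {
            // Each virtual node maps back to its real node.
            ring.put(hash(nodeName + "#" + i), nodeName);
        }
    }

    public String locate(String key) {
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    private long hash(String key) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(key.getBytes(StandardCharsets.UTF_8));
            return (((long) (d[0] & 0xFF) << 24)
                    | ((long) (d[1] & 0xFF) << 16)
                    | ((long) (d[2] & 0xFF) << 8)
                    | (d[3] & 0xFF)) & 0xFFFFFFFFL;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        VirtualNodeRing ring = new VirtualNodeRing();
        ring.addNode("node1");
        ring.addNode("node2");
        System.out.println("key-a -> " + ring.locate("key-a"));
    }
}
```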

Use in real projects

**Redis Cluster does not use consistent hashing; instead it introduces the concept of hash slots.** For details, see my other article in this series: Redis series – High Availability (Master slave, Sentinel, Cluster).

The consistent hashing discussed here is never a feature of the cache machines themselves; it is implemented by a proxy in front of the cluster or by the client. The official Redis Cluster, by contrast, shards data itself using slots.
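For contrast with the ring approach, here is a sketch of how Redis Cluster maps a key to one of its 16384 slots: slot = CRC16(key) mod 16384. The CRC16 below is the XMODEM variant described in the cluster spec, and the hash-tag rule for keys containing `{...}` is omitted for brevity.

```java
import java.nio.charset.StandardCharsets;

// Redis Cluster does not hash onto a ring; it maps every key to one of 16384
// slots via CRC16(key) mod 16384, and each master owns a range of slots.
public class ClusterSlot {

    public static int slot(String key) {
        return crc16(key.getBytes(StandardCharsets.UTF_8)) % 16384;
    }

    // Bitwise CRC16 with polynomial 0x1021 and initial value 0x0000 (XMODEM variant).
    private static int crc16(byte[] bytes) {
        int crc = 0x0000;
        for (byte b : bytes) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = ((crc & 0x8000) != 0) ? ((crc << 1) ^ 0x1021) : (crc << 1);
            }
            crc &= 0xFFFF;
        }
        return crc;
    }

    public static void main(String[] args) {
        System.out.println("foo -> slot " + slot("foo"));
    }
}
```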

Redis Cluster arrived relatively late, in version 3.0. Before it existed, many companies built their own Redis clusters, and these self-built clusters were implemented in various ways: for example, the Jedis client JAR implements consistent hashing on the client side (client mode), and proxies placed in front of the Redis cluster, such as Twemproxy, also implement consistent hashing (proxy mode). Twemproxy is an open-source Redis and Memcached proxy from Twitter. With the proxy mode, the client only needs to connect to the proxy server rather than the individual Redis machines behind it, and the hash algorithm Twemproxy uses can be specified in its configuration file.

That's all, we're done!

[Spread knowledge, share value]. Thank you all for your attention and support. I am [Zhuge small ape], an Internet migrant worker still struggling and hesitating along the way!