In front of us, we introduced the Redis cluster solution developed by the Chinese — Codis, Codis friendly management interface and powerful automatic balance slot function by the majority of developers love. Today we are going to talk about the Redis author’s own Cluster solution – Cluster. Hopefully, after reading this article, you will be able to fully understand the advantages and disadvantages of Codis and Cluster, so that you can make a leisurely choice in different application scenarios.

The Redis Cluster is decentralized, which is fundamentally different from Codis. The Redis Cluster divides 16384 slots, and each node is responsible for a portion of the data. Slot information is stored on each node, and the node persists slot information in the configuration file. Therefore, ensure that the configuration file is writable. When a client connects, it gets a copy of slot information. In this way, when a client needs to access a key, it can directly locate the node based on the slot information cached locally. In this case, the slot information cached by the client is inconsistent with the slot information cached by the server. How to solve this problem? I’m going to keep it in suspense and explain it later.

features

First, let’s take a look at the official introduction of Redis Cluster.

  • High performance and linear scalability up to 1000 nodes. There are no proxies, asynchronous replication is used, and no merge operations are performed on values.
  • Acceptable degree of write safety: the system tries (in a best-effort way) to retain all the writes originating from clients connected with the majority of the master nodes. Usually there are small windows where acknowledged writes can be lost. Windows to lose acknowledged writes are larger when clients are in a minority partition.
  • Availability: Redis Cluster is able to survive partitions where the majority of the master nodes are reachable and there is at least one reachable slave for every master node that is no longer reachable. Moreover using replicas migration, masters no longer replicated by any slave will receive one from a master which is covered by multiple slaves.

Don’t (kan) want (bu) to (dong)? That’s okay. I’ll break it down for you.

Write a security

Redis Cluster uses asynchronous primary/secondary synchronization to ensure final consistency. This can cause some data loss problems. Before you continue reading, you can first think about the circumstances in which data is lost.

Let’s start with a more common case of lost write:

The client sends a write request to a master. The master succeeds in writing the request and notifies the client. Before synchronizing to the slave, the master hangs and its slave becomes the new master in its place. The previously written data is lost.

In addition, there is another situation.

The master node was unable to communicate with most nodes, and after a period of time, the master was considered offline and replaced by its slaves. After a period of time, the original master node was overwritten to restore the connection. If a client has an expired routing table, it will send write requests to the old master node (which has become a slave), resulting in write data loss.

This generally doesn’t happen, though, because when a master is disconnected long enough to be considered offline, it starts rejecting write requests. When it recovers, it still rejects write requests for a short period of time to allow other nodes to update their routing table configurations.

In order to ensure write security, the Redis Cluster will try to connect clients to the part of the majority of nodes when partitioning occurs, because if the connection to a few parts, when the master is replaced, all the write requests will be rejected because the majority of the master is unreachable, so the loss of data will be much higher.

The Redis Cluster maintains a NODE_TIMEOUT variable. If the master restores the connection after NODE_TIMEOUT, no data will be lost.

availability

If most of the cluster’s masters are reachable and each unreachable master has at least one slave, failover starts after NODE_TIMEOUT (usually 1 to 2 seconds) and the cluster is still available after failover.

If all the N master nodes in the cluster have one slave, the cluster must be available when one node fails. If two nodes fail, the probability of the cluster becoming unavailable is 1/(N*2-1).

Redis Cluster has added a new feature called Replicas Migration to improve availability. In fact, this feature is to rearrange the slaves of the cluster after each fault, and to equip the master that does not have a slave, so as to better deal with the next fault.

performance

Redis Cluster does not provide a proxy, but lets clients redirect directly to the correct node.

A copy of the cluster state is kept in the client and the correct node is normally connected directly.

Because the Redis Cluster is backed up asynchronously, the node does not need to WAIT for other nodes to confirm the write success before returning, unless the WAIT command is explicitly used.

For commands that operate on multiple keys, the keys must be on the same node because the data will not move. (Unless it’s Resharding)

The primary goal of the Redis Cluster design is to improve performance and scalability, providing only weak (but reasonable) data security and availability.

Key allocation model

The Redis Cluster is divided into 16,384 slots. This also means that a cluster can have up to 16,384 masters, although the official recommendation is for a maximum of 1,000 masters.

If the Cluster is not in a reconfiguration process, it reaches a stable state. In a stable state, a slot is serviced by only one master, but a master node may have one or more slaves to relieve the pressure of the master’s read requests.

The Redis Cluster hashes the key using the CRC16 algorithm and then moulds 16384 to determine the slot to which the key belongs (hash tag breaks this rule).

Keys hash tags

Tags are implementations that break the above calculation rules, and Hash tags are a way of ensuring that multiple keys are assigned to the same slot.

The hash tag is computed by taking the characters between braces {} and the characters between the first open bracket and the first close bracket if there are multiple pairs of braces. If the braces are preceded by no characters, the entire string is evaluated.

After all this, you may still be confused. Don’t worry, let’s eat some chestnuts.

  1. {Jackeyzhe}. Following and {Jackeyzhe}. Follower calculate the hash value of the Jackey
  2. Foo {{bar}} the key will hash {bar
  3. Follow {}{Jackey} evaluates the entire string

redirect

As mentioned earlier in the performance discussion, Redis Cluster does not provide proxies, but instead uses redirects to connect clients to the correct nodes. Let’s take a look at how Redis Cluster redirects.

Version redirection

The Redis client can send a query request to any node in the cluster. After receiving the query request, the node will parse it. If it is a command that operates on a single key or contains multiple keys in the same slot, the node will find out which slot the key belongs to.

If the slot to which the key belongs is serviced by the node, the result is returned directly. Otherwise a ‘MOVED’ error will be returned:

GET X-Moved 3999 127.0.0.1:6381Copy the code

This error includes which slot (3999) the key belongs to and the IP address and port number of the node where the slot is located. After receiving the error message, the client stores it so that it can find the correct node more accurately.

When a client receives a MOVED error, it can use the CLUSTER NODES or CLUSTER SLOTS command to update the information for the entire CLUSTER, because when redirects occur, it is rarely a single slot change, but rather multiple SLOTS at once. Therefore, when a MOVED error is received, the client should update the cluster distribution information as early as possible. When the cluster reaches the stable state, the slot and node information saved by the client is correct, and the cluster performance reaches the highly efficient state.

In addition to the MOVED redirection, a complete cluster should also support ASK redirection.

ASK a redirect

For a Redis Cluster, a MOVED redirect means that the requested slot is always served by another node, while an ASK redirect only means that the next request needs to be sent to the specified node. In the Redis Cluster migration will use the ASK redirect, so what is the process of Redis Cluster migration?

The migration of A Redis Cluster is in slot units. The migration process is divided into 3 steps (similar to loading an elephant into A refrigerator). Let’s take A look at the steps required to migrate A slot from node A to node B:

  1. The refrigerator door is opened, which is to obtain the list of keys in each slot from node A, and then migrate key by key. Before migrating, node A is set in the migrating state, and node B is set in the importing slot (using the CLUSTER SETSLOT command).
  2. For each key, serialize it with dump on node A and then run restore on node B on the client.
  3. The third step is to close the refrigerator door and delete the corresponding key from node A.

Some students will ASK, said to use ASK redirect? What we have described above is the process of migration, during which Redis will still provide services to the outside world. Imagine if, in the process of migration, I ask node A to query the value of X, and node A says: I don’t have it, I don’t know whether it went to node B or I haven’t saved it, you should ask node B first. We are then returned with an -ask targetNodeAddr error to ASK B. But if we go to B directly, B will say: This is not my responsibility, you have to ask A. (-Moved redirection). Because the migration hasn’t been completed yet, so B is right, x really isn’t his responsibility at this point. B: Let me ask you the value of a key. You have to treat it as your own key. You can’t say you don’t know. If x has migrated to B, the result will be returned directly. If B cannot find the whereabouts of X, x does not exist.

Fault tolerance

The Redis Cluster, like most clusters, uses heartbeat to determine whether a node is alive or not.

Heartbeat and gossip messages

The nodes in the cluster will constantly exchange Ping Pong packets with each other. Ping Pong packets have the same structure but different types. Together, ping Pong packets are called heartbeat packets.

Usually the node will send the ping packet and receive the pong packet returned by the receiver, although this is not always the case. It is also possible for the node to send only the PONG packet without having the receiver send the return packet, which is usually used to broadcast a new configuration.

The node sends a certain number of pings every few seconds. If a node does not receive a ping or pong package from a node for more than half of NODE_TIME, the ping package will be sent to the node before NODE_TIMEOUT. Before NODE_TIMEOUT, the node will attempt TCP reconnection. Avoid the error that a node is unreachable due to TCP connection problems.

Heartbeat Packet Contents

As mentioned above, the ping and Pong packages have the same structure, so let’s take a look at the contents of the packages.

The contents of the ping and Pong packages can be divided into header and gossip messages. The header contains the following information:

  • NODE ID is a 160-bit pseudo-random string that uniquely identifies a NODE in the cluster
  • CurrentEpoch and configEpoch fields
  • Node flag, which identifies whether a node is a master or slave, and other flag bits
  • The node provides a bitmap of the hash slot of the service
  • TCP port of the sender
  • The cluster status considered by the sender (Down or OK)
  • If it is slave, the NODE ID of the master is included

Gossip contains what this node thinks is the state of other nodes, but not all of the nodes in the cluster. The specific information is as follows:

  • NODE ID
  • IP address and port of the node
  • NODE flags

The gossip message plays an important role in error detection and node discovery.

Error detection

Error detection is used to identify whether unreachable nodes in the cluster have gone offline. If a master goes offline, its slave is promoted to master. If it cannot be promoted, the cluster is in an error state. In the Gossip message, there are two values for NODE flags: PFAIL and FAIL.

PFAIL flag

If a node detects that another node is unreachable for more than NODE_TIMEOUT, the node is marked as PFAIL, or Possible failure. Node unreachable indicates that a node sends a ping packet but does not receive any response after the NODE_TIMEOUT period. This means that NODE_TIMEOUT must be greater than the round-trip time of a network packet.

FAIL flag

The PFAIL flag is local information of a node. To promote the slave to master, you need to upgrade the PFAIL flag to FAIL. To upgrade PFAIL to FAIL, the following conditions must be met:

  • Node A marks node B as PFAIL
  • Node A uses the gossip message to collect the status of node B identified by most other master nodes
  • Most master nodes identify node B as PFAIL or FAIL during NODE_TIMEOUT * FAIL_REPORT_VALIDITY_MULT

If the preceding conditions are met, node A identifies node B as FAIL and sends A message indicating that node B fails to all nodes. Nodes that receive messages also mark B as FAIL.

The FAIL status is unidirectional and can only be upgraded from PFAIL to FAIL, but not from FAIL to PFAIL. However, there are some cases where the FAIL state can be cleared:

  • The node is reachable again, and it is the slave node
  • The node is reachable and is the master node, but does not provide any slot services
  • The node is reachable and master, but no slave has been promoted to master to replace it for a long time

PFAIL is promoted to FAIL using a weak protocol:

  • If the states of the nodes are not collected at the same time, we will discard the information reported earlier, but it can only ensure that the states of the nodes are agreed by most masters within a period of time
  • When a FAIL is detected, all nodes need to be notified, but there is no way to guarantee that each node will receive the message successfully

Because it is a weak protocol, Redis Cluster only requires that all nodes ultimately have the same state for a node. If most masters think a node fails, then eventually all nodes will mark it as FAIL. If only a small number of master nodes consider a node to FAIL, the slave will not be promoted to master, and therefore the FAIL status will be cleared.

Set up

Principle said so much, we must come to personally build a Redis Cluster, the following demonstration on a machine simulation to build 3 master 3 slave Redis Cluster. Of course, if you want to learn more about the other principles of Redis Cluster, you can check out the website.

Redis environment

First of all, we need to set up the Redis environment, here we start 6 Instances of Redis, port numbers are 6379, 6380, 6479, 6480, 6579, 6580 respectively

Copy six Redis configuration files and modify them as follows (for example, 6379, the port number and configuration file are modified as required) :

port 6379
cluster-enabled yes
cluster-config-file nodes6379.conf
appendonly yes
Copy the code

The name of the configuration file also needs to be changed. After the modification, six instances will be started respectively. .

Create a Redis Cluster

After the instance is started, you can create the Redis Cluster. If the Redis version is 3.x or 4.x, you need to use a tool called redis-trib. For versions after Redis5.0, the Redis Cluster command is integrated into the redis-CLI. I used Redis5 here, so I didn’t install the Redis-trib tool separately.

Then execute the command

Redis-cli --cluster create 127.0.0.1:6379 127.0.0.1:6380 127.0.0.1:6479 127.0.0.1:6480 127.0.0.1:6579 127.0.0.1:6580 --cluster-replicas 1Copy the code

When you see the output

[OK] All 16384 slots covered
Copy the code

The Redis Cluster has been created successfully.

Viewing Node Information

Use the cluster nodes command to view the node information of Redis Cluster.

You can see that nodes 6379, 6380, and 6479 are configured as master nodes.

reshard

Let’s try the Reshard operation again

As shown, enter the command

Redis - cli - cluster reshard 127.0.0.1:6380Copy the code

The Redis Cluster will ask you how many slots you want to move, in this case 1000 slots, and then ask you which NODE you want to move to. In this case, we will enter the NODE ID of 6479

After reshard is complete, you can enter commands to view the status of the node

Redis - cli - cluster check 127.0.0.1:6480Copy the code

You can see that node 6479 has 1000 more slots, 0-498 and 5461-5961 respectively.

Adding a Master Node

Adding slave node

We can also add slave nodes using the add-node command, but we need to add –cluster-slave and use –cluster-master-id to specify which master the new slave belongs to.

conclusion

To conclude, we introduced

Redis Cluster features: Write security, availability, and performance

Key allocation model: Using the CRC16 algorithm, if the same slot needs to be allocated, the tag can be used

Two redirects: MOVED and ASK

Fault tolerance mechanism: PFAIL and FAIL states

Finally, I set up an experimental Redis Cluster.