
1. Design objectives of Redis cluster

Redis Cluster is the server-side sharding technology officially supported since Redis 3.0. What did its designers have in mind? See the Redis Cluster Specification.

1.1 Redis Cluster goals

High performance and linear scalability up to 1000 nodes. There are no proxies in the cluster, replication between nodes is asynchronous, and no merge operations are performed on values.

Acceptable write safety: the system makes a best effort to preserve all writes initiated by clients connected to the majority of master nodes. There is usually a small time window in which acknowledged writes can be lost (that is, a write acknowledged just before a failover may be lost in that failover). The window for losing writes is larger for writes sent to masters on the minority side of a partition.

Availability: Redis Cluster remains available as long as the majority of master nodes are reachable and every unreachable master has at least one reachable slave. Furthermore, thanks to replica migration, a master that no longer has any slaves will receive a new slave from a master that has several, preserving availability.

1.2 Clients and Servers roles in the Redis Cluster protocol

The nodes of a Redis Cluster are responsible for holding the data and maintaining the cluster state, which includes mapping keys to the correct nodes. Cluster nodes also automatically discover other nodes, detect non-working nodes, and promote slaves to master when faults occur.

All cluster nodes are connected via the Redis Cluster Bus, a binary protocol over TCP, which they use to discover nodes, detect failed nodes, and promote slaves to master. Each node connects to every other node through the cluster bus. Nodes use a gossip protocol to spread cluster information: to discover new nodes, to send ping packets confirming that a peer is healthy, and to send cluster messages marking specific states. The cluster bus is also used to propagate Pub/Sub messages across the cluster and to orchestrate manual failovers when requested by users.

1.3 Write safety

Redis Cluster uses asynchronous replication between nodes, with a last-failover-wins implicit merge: the data held by the last elected master eventually replaces the data of all its replicas. When a partition occurs, there is always a window in which writes can be lost, but this window is very different for clients connected to the majority of masters and for clients connected to the minority of masters.

Redis Cluster tries harder to preserve writes initiated by clients connected to the majority of masters than writes from clients on the minority side. The following scenario can cause acknowledged writes on the majority side to be lost during a failure:

A write reaches the master, and the master completes it and replies to the client, but the write may not yet have been propagated to its slaves via asynchronous replication. If the master dies before the write reaches a slave, and stays unreachable long enough for a slave to be promoted, the write is lost forever. This is usually hard to observe directly, because the master's reply to the client (the write acknowledgement) and the propagation of the write to the slaves happen almost simultaneously; nevertheless, this is a real-world failure mode. (We ignore the case of a crash after the reply, because write loss caused by a crash also exists in standalone Redis and is not specific to Redis Cluster.)

Another mode in which write loss could theoretically occur is:

  • Master unavailable due to partition (unreachable)
  • The master is replaced by a slave.
  • After some time, the master is available again
  • Before the old master is reconfigured as a slave, a client using a stale routing table writes to it.

The second failure scenario is unlikely in practice, because:

  • If a minority master cannot communicate with the majority of masters for a certain amount of time, it starts rejecting writes, and when the partition heals and it re-establishes contact with the majority, it continues to reject writes for a short period while it learns of cluster configuration changes. The time window left for such client writes is small.
  • It also requires the client to still be using a stale routing table (when in fact the cluster has already promoted a replica via failover).

Writes to masters on the minority side of a partition have a longer window for data loss: if a failover occurs, data written to the minority masters is overwritten by the majority side (once those masters rejoin the cluster as slaves).

In particular, for a failover to happen, a master must be unreachable from the majority of masters for at least NODE_TIMEOUT, so if the partition heals within that time, no writes are lost. If the partition lasts longer than NODE_TIMEOUT, all writes accepted by the minority masters during that window may be lost. However, a minority master starts rejecting writes once it has been out of contact with the majority side for NODE_TIMEOUT, so there is a maximum window after which the minority side becomes unavailable; beyond that window, no further writes are accepted or lost.

1.4 Availability

Redis Cluster is not available on the minority side of a partition. On the majority side, assuming every unreachable master has at least one slave there, the cluster becomes available again after NODE_TIMEOUT plus the short time needed for re-election (usually 1~2 seconds).

This means Redis Cluster is designed to tolerate the failure of a small number of nodes, but it is not designed to remain available across large network splits. For example, do not try to deploy a Redis Cluster across multiple machine rooms; that is not what Redis Cluster is meant to do.

If a cluster consists of N masters, each with exactly one slave, the majority side stays available whenever a single node is partitioned away, and stays available with probability 1-(1/(N*2-1)) when two nodes are partitioned away. That is, a cluster of 5 masters and 5 slaves has a 1/(5*2-1) = 11.11% chance of becoming unavailable when two nodes fail: after the first node fails, exactly one of the remaining 2*5-1 = 9 nodes is a master without a slave, and the cluster becomes unavailable only if the second failed node is that one.

Redis Cluster provides a useful feature called replica migration, which improves availability in common real-world scenarios by automatically moving spare replicas to masters that have none left. After every successful failover, the cluster reconfigures the distribution of slaves to be as resilient as possible to the next failure.

1.5 Performance (Performance)

Instead of proxying commands to the node where the key lives, Redis Cluster sends the client a redirection (-MOVED) pointing at the correct node. The client thereby learns the up-to-date hash slot distribution and which node serves the keys in the command, so it can contact the right node directly for subsequent commands.

Because masters replicate to slaves asynchronously, a node does not need to wait for other nodes to acknowledge a write before replying to the client (unless the WAIT command is used). Also, because multi-key commands are limited to nearby keys (keys in the same hash slot, commonly grouped with a hash tag into a set of keys sharing the same tag and a related business meaning), data is never moved between nodes except during resharding.

Normal operations are executed exactly as on a single Redis instance, so a Redis Cluster with N masters can be thought of as delivering N times the performance of a single Redis. Queries are usually completed in a single round trip, because clients keep persistent connections (connection pools) to all nodes, so latency is no different from a client talking to a single Redis instance.

The main goal of Redis Cluster is to provide extremely high performance and scalability while providing weaker, but still reasonable, guarantees of data safety and availability.

1.6 Avoiding Merge Operations

Redis Cluster is designed to avoid conflicting versions of the same key-value pair on multiple nodes (and the merging this would require), as this is not needed under the Redis data model. Redis values can be very large; lists or sorted sets with millions of elements are common, and the data type semantics are complex. Transferring and merging such values would create significant bottlenecks and might require significant changes to application-side logic, such as more memory to hold metadata.

There is no strict technical limitation preventing merges: CRDTs or synchronously replicated state machines can model complex data types similar to Redis. However, the runtime behavior of such systems would differ from that of Redis Cluster, which is designed to cover the exact use cases of the non-clustered Redis version.

2. Main modules

The Redis Cluster Specification also introduces the main modules of Redis Cluster, which include many basic concepts we need to understand first.

2.1 Hash Slot

Instead of consistent hashing, Redis Cluster introduces the concept of hash slots. There are 16384 (2^14) hash slots in the cluster. Each key is hashed with CRC16 and the result is taken modulo 16384 to determine which slot it lives in (a short sketch of this computation follows the example below). Each node in the cluster is responsible for a portion of the hash slots.

For example, if there are three nodes in the cluster, one of the possible assignments is as follows:

  • Node A contains hash slots 0 to 5500;
  • Node B contains hash slots 5501 to 11000;
  • Node C contains hash slots 11001 through 16383.
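The computation is easy to reproduce. Below is a minimal Python sketch, assuming the CRC16 variant Redis uses (CRC16-CCITT/XMODEM); the example key and the node layout refer to the assignment above.

def crc16(data: bytes) -> int:
    # CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0 -- the variant Redis uses
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def hash_slot(key: str) -> int:
    # modulo 16384 is equivalent to the bitmask (& 16383) Redis applies
    return crc16(key.encode()) % 16384

print(hash_slot("foo"))  # 12182 -> served by node C in the layout above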

The clusterState.slots array records the slot assignments of the whole cluster, so the node responsible for any slot can be found in O(1) time, while the clusterNode.slots array records only the slot assignments of the single node that the clusterNode structure represents. Thus, when exchanging slot information, a node only needs to send its clusterNode.slots array, and the receiver updates the corresponding clusterNode in its node dictionary. Without this, a node would have to traverse the whole clusterState.slots array to collect the slots assigned to it before transmitting them, at O(N) cost. The following figure shows the clusterState structure for node 7000.

2.2 Keys hash tags

Hash tags provide a way to assign multiple (related) keys to the same Hash slot. This is the basis of multi-key operation in Redis Cluster.

The hash tag rules are as follows. When all of them hold, only the characters between { and } are used to compute HASH_SLOT, which ensures that such keys are stored in the same slot:

  • the key contains a { character,
  • there is a } character to the right of that {,
  • and there is at least one character between { and }.

Such as:

  • The keys {user1000}.following and {user1000}.followers hash to the same slot, since only user1000 is used to compute the hash slot.
  • The key foo{}{bar} does not use a hash tag, because there are no characters between the first { and }; the whole key is hashed.
  • For the key foo{{bar}}zap, the substring {bar (everything between the first { and the first }) is used to compute the hash slot.
  • For the key foo{bar}{zap}, bar is used to compute the hash slot, while zap is not, because only the first {...} pair counts.
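The rules are mechanical enough to sketch in a few lines of Python. This mirrors the rules above (it is not Redis's C implementation); the extracted substring is what gets fed into the slot computation from section 2.1.

def extract_hash_tag(key: str) -> str:
    # Return the substring that will be hashed, per the three rules above
    s = key.find("{")
    if s == -1:
        return key                # rule 1 fails: no '{' at all
    e = key.find("}", s + 1)
    if e == -1 or e == s + 1:
        return key                # rule 2 or 3 fails: no '}', or '{}' is empty
    return key[s + 1:e]           # only what is inside the first {...}

print(extract_hash_tag("{user1000}.following"))  # user1000
print(extract_hash_tag("foo{}{bar}"))            # foo{}{bar} (whole key is hashed)
print(extract_hash_tag("foo{{bar}}zap"))         # {bar
print(extract_hash_tag("foo{bar}{zap}"))         # bar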

2.3 Cluster Nodes Properties

Each node has a unique name in the cluster. The name is a 160-bit random number in hexadecimal representation, obtained when the node first starts (usually from /dev/urandom). The node saves its ID in its configuration file and keeps it forever, until an administrator resets it with the CLUSTER RESET HARD command.

Node IDs identify each node in the cluster. A node can change its IP address without changing its ID; the cluster detects IP/port changes and reconfigures itself via the gossip protocol running over the cluster bus.

The node ID is not the only information bound to the node, but it is the only field that is always globally consistent. Each node has a set of related information. Some of the information is about the details of how this node is configured in the cluster and ultimately remains consistent across the cluster. Other information, such as when the node was last pinged, is local to the node.

Each node maintains the following information about the other nodes in the cluster: node ID, IP address and port, flags, master node ID (if it is a slave), the time the last pending ping was sent (0 if there is no pending ping), the time the last pong was received, the node's current configuration epoch, the link state, and finally the hash slots served by the node.

For details about node fields, see the description of CLUSTER NODES.

The CLUSTER NODES command can be sent to any node in the cluster and provides the state of the cluster and of every node as seen from the queried node.

Here is example CLUSTER NODES output sent to a master node in a small cluster of three nodes.

$ redis-cli cluster nodes
d1861060fe6a534d42d8a19aeb36600e18785e04 127.0.0.1:6379 myself - 0 1318428930 1 connected 0-1364
3886e65cc906bfd9b1f7e7bde468726a052d1dae 127.0.0.1:6380 master - 1318428930 1318428931 2 connected 1365-2729
d289c575dcbc4bdd2931585fd4339089e461a27d 127.0.0.1:6381 master - 1318428931 1318428931 3 connected 2730-4095

In the example above, the different fields are listed in order:

node id, address:port, flags, last ping sent, last pong received, configuration epoch, link state, slots.

2.4 the Cluster bus

Each Redis Cluster node has an additional TCP port for accepting connections from other nodes, at a fixed offset from the normal TCP port used for client commands: the cluster bus port equals the command port plus 10000. For example, if a Redis node serves clients on port 6379, its cluster bus port 16379 will also be open.

Node-to-node communication happens exclusively over the cluster bus, using the cluster bus protocol: a binary protocol composed of frames of different types and sizes. This binary protocol is not publicly documented, because it is not intended for external software to talk to cluster nodes with it.

2.5 Cluster Topology

Redis Cluster is a full mesh in which every node keeps a TCP connection to every other node: in a cluster of N nodes, each node has N-1 outgoing and N-1 incoming TCP connections, and these connections are kept alive permanently. When a node has sent a ping over the cluster bus and is expecting a pong, rather than waiting long enough to declare the peer unreachable, it will try to refresh the connection by reconnecting. Although the nodes form a full mesh, they use the gossip protocol and a configuration-update mechanism to avoid exchanging too many messages under normal conditions, so the number of messages exchanged is not exponential in the number of nodes.

2.6 Node Handshake

A node always accepts connections on its cluster bus port, and always replies to pings, even if the ping comes from an untrusted node. However, all other packets are discarded if the sending node is not considered part of the current cluster.

A node can identify other nodes as part of the current cluster in two ways:

  1. If a node presents itself in a MEET message. A MEET message is exactly like a PING message, but it forces the recipient to accept the sender as part of the cluster. A node sends a MEET message to another node only after the system administrator issues the following command:

CLUSTER MEET ip port

  2. If a trusted node gossips about a node, the receiver of the gossip message also marks that node as part of the cluster. That is, if A knows B, and B knows C, B will eventually gossip to A about C. When that happens, A registers C as part of the cluster and tries to establish a connection with C.

This means that once nodes join the connected graph, they eventually form a fully connected graph: the cluster automatically discovers all other nodes as soon as the system administrator forces a single trust relationship (by issuing MEET for the new node on any existing node).

3. Request redirection

Redis Cluster uses a decentralized architecture, with every master node responsible for part of the slots. How does the client find the node a key maps to? That is the subject of request redirection.

In cluster mode, a node processes a request as follows:

  • Check whether the key's slot belongs to the current node:

    • Compute the slot as CRC16(key) mod 16384
    • Look up the node responsible for that slot, obtaining a node pointer
    • Compare that pointer with the current node

  • If the slot is not owned by the current node, return a MOVED redirection

  • If the slot is owned by the current node and the key is present, return the result

  • If the key is absent from a slot owned by the current node, check whether the slot is MIGRATING:

    • If it is, and the key has already been moved, return an ASK error redirecting the client to the destination node

  • If the slot is not being migrated out, check whether it is IMPORTING:

    • If it is IMPORTING and the client has sent ASKING, execute the command directly
    • Otherwise, return a MOVED redirection

There are two things to understand about this process: the MOVED redirection and the ASK redirection.

3.1 Moved Redirection

  • Slot hit: the node returns the result directly.
  • Slot miss: if the slot of the requested key is not served by the current node, the node replies with a MOVED redirection; the client locates the target node from the MOVED reply and resends the command.

For example, if the key php maps to slot 9244, which does not belong to the current node, the request is redirected to node 192.168.2.23:7001. redis-cli follows such redirects automatically, but only when started in cluster mode with the -c flag; without -c it does not redirect. In application code, the logic of finding the target node is left to the programmer.

cluster keyslot keyName   # get the slot for keyName

3.2 ASK Redirection

ASK redirection occurs during cluster scaling, which triggers slot migration: when we access the source node, the data may already have moved to the destination node. ASK redirection resolves this situation. A sketch of how a client follows both kinds of redirection is shown below.
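To make both redirections concrete, here is a hedged Python sketch of manual redirect handling with redis-py. It leans on redis-py's AskError/MovedError exception classes (which carry the redirect target as err.host/err.port); the starting address is a placeholder, and a real smart client would also refresh its routing table on MOVED.

import redis
from redis.exceptions import AskError, MovedError

def execute_with_redirects(node, *command, max_redirects=5):
    # Follow MOVED/ASK replies by hand, the way a cluster client does internally
    for _ in range(max_redirects):
        try:
            return node.execute_command(*command)
        except AskError as err:                    # MovedError subclasses AskError
            target = redis.Redis(host=err.host, port=err.port)
            if not isinstance(err, MovedError):    # a plain ASK needs ASKING first
                target.execute_command("ASKING")
            node = target
    raise RuntimeError("too many redirections")

# usage (placeholder address of any cluster node):
# r = redis.Redis(host="127.0.0.1", port=7000)
# print(execute_with_redirects(r, "GET", "php"))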

3.3 Smart Client

The two redirection mechanisms above make client implementations more complex, so smart clients (such as JedisCluster) exist to hide this complexity and pursue better performance. A smart client computes and maintains the key -> slot -> node mapping itself, so it can quickly locate the target node.

Implementation principle:

  • Pick a reachable node in the cluster and fetch the slot-to-node mapping with CLUSTER SLOTS (sketched below)

  • Cache the mapping locally and talk to the target node directly (CRC16(key) -> slot -> node), neatly avoiding MOVED redirections, and create a connection pool (e.g. a JedisPool) for each node
  • Then execute commands as usual
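A minimal Python sketch of that bootstrap step, assuming redis-py and a placeholder node address; a real smart client refreshes this table whenever it receives a MOVED reply.

import redis

startup = redis.Redis(host="127.0.0.1", port=7000, decode_responses=True)

slot_table = {}
for start, end, master, *replicas in startup.execute_command("CLUSTER SLOTS"):
    host, port = master[0], master[1]     # each master entry is [ip, port, node-id, ...]
    for slot in range(start, end + 1):
        slot_table[slot] = (host, int(port))

# afterwards: compute slot = CRC16(key) % 16384 and send the command
# straight to slot_table[slot], bypassing MOVED redirections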

4. State detection and maintenance

How is node state maintained in Redis Cluster? This section covers the states themselves, the underlying gossip protocol, and the concrete communication (heartbeat) mechanism.

Each node maintains, from its own point of view, the state of the entire cluster, including:

  • The current cluster state
  • The slots and migration state of every node in the cluster
  • The master/slave state of every node in the cluster
  • The liveness state of every node and the unreachability votes against it

When the cluster state changes, such as a new node joining, a slot migration, a node going down, or a slave being promoted to master, we want the change to be detected as soon as possible, propagated to every node in the cluster, and agreed upon. The heartbeats between nodes (PING, PONG, MEET) and the data they carry are the most important vehicle of cluster state propagation.

4.1 Gossip protocols

Redis Cluster communicates using the Gossip protocol, so some understanding of Gossip is needed.

The Gossip Protocol, also known as an Epidemic Protocol, is a protocol for exchanging information between nodes or processes based on how an epidemic spreads. Widely used in distributed systems, for example we can use the Gossip protocol to ensure that all nodes in the network have the same data.

The gossip protocol is a mature protocol in P2P networks. Its biggest benefit is that the load on each node stays nearly constant even as the cluster grows; this is, for example, what allows Consul to scale clusters horizontally to thousands of nodes.

The gossip algorithm is also known as anti-entropy. Entropy means disorder, and anti-entropy means reaching consistency out of disorder: in a bounded network, each node communicates randomly with other nodes, and after enough of these disorderly exchanges, all nodes eventually agree on the state. Each node may know all other nodes or only a few neighbors; as long as they are connected through the network, their states eventually converge, which is also how epidemics spread. (Source: www.backendcloud.cn/2017/11/12/…)

In fact, the gossip protocol is no stranger to us: it is also known as the rumor protocol, and as everyone has experienced, rumors spread far and fast. Many computer algorithms are drawn from life, and then improve on it.

4.1.1 Using the Gossip protocol

Redis clusters are decentralized and communicate through the gossip protocol. The cluster messages come in several types:

  • MEET: via the CLUSTER MEET ip port command, an existing cluster node invites a new node to join the cluster.
  • PING: every second, a node sends ping messages to other nodes in the cluster, carrying its own information plus gossip about some of the nodes it knows (addresses, slots, state, last communication time).
  • PONG: on receiving a ping, a node replies with a pong message, which likewise carries information about nodes it knows.
  • FAIL: when a node decides another node is down, it broadcasts a fail message to the whole cluster; the other nodes mark that node offline when they receive it.

4.1.2 Fault detection based on Gossip protocol

Every node in the cluster periodically sends PING messages to other nodes to exchange state and checks the state of each node: online, suspected offline (PFAIL), or offline (FAIL).

Recording reports: when master A learns from master B that B considers master D suspected offline (PFAIL), A finds D's clusterNode structure in its clusterState.nodes dictionary, appends B's offline report to that structure's fail_reports list, and spreads D's suspected-offline state to other nodes through gossip.

Deciding together: if more than half of the masters in the cluster report master D as suspected offline, D is marked offline (FAIL). The node that marks D offline broadcasts a FAIL message about D to the cluster, and every node that receives it immediately marks D offline in its nodes table.

Final decision: to mark a node FAIL, both of the following conditions must hold (a small sketch of the vote arithmetic follows the list):

  • More than half of the primary nodes mark Node as PFAIL.
  • The current node also marks Node with the PFAIL state.
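For illustration, a tiny Python sketch of that final decision, assuming reports holds one entry per master currently flagging the node as PFAIL:

def should_mark_fail(pfail_locally: bool, reports: list, num_masters: int) -> bool:
    quorum = num_masters // 2 + 1          # more than half of the masters
    return pfail_locally and len(reports) >= quorum

print(should_mark_fail(True, ["B", "C", "E"], 5))  # True: 3 of 5 masters agree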

4.2 Communication status and Maintenance

Once we understand the Gossip protocol basics, we can further understand the implementation and maintenance of PING, PONG, MEET communication between Redis nodes. Let’s go through a couple of questions.

4.2.1 When Are Heartbeats Sent?

Each Redis node records, for every peer, when it last sent a ping and when it last received a pong; heartbeat scheduling is driven by these two values, keeping cluster state fresh without generating excessive traffic (a small sketch of the selection rules follows the list):

  • On every cron run, send a ping or meet to every node that has no link yet
  • Every second, pick 5 random known nodes and ping the one whose last pong is oldest
  • On every cron run, ping any node whose last pong arrived more than timeout/2 ago
  • On receiving a ping or meet, reply with a pong immediately
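These selection rules can be sketched in Python as follows; the nodes bookkeeping structure and its field names are illustrative, not Redis internals.

import random
import time

def pick_ping_targets(nodes: dict, now: float, node_timeout: float) -> set:
    targets = {n for n, s in nodes.items() if not s["linked"]}           # rule 1
    linked = [n for n, s in nodes.items() if s["linked"]]
    if linked:
        sample = random.sample(linked, min(5, len(linked)))              # rule 2
        targets.add(min(sample, key=lambda n: nodes[n]["pong_received"]))
    targets |= {n for n in linked
                if now - nodes[n]["pong_received"] > node_timeout / 2}   # rule 3
    return targets

nodes = {"a": {"linked": True, "pong_received": time.time() - 20},
         "b": {"linked": False, "pong_received": 0}}
print(pick_ping_targets(nodes, time.time(), node_timeout=30))  # {'a', 'b'}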

4.2.2 What Heartbeat Data is Sent?

  • Header: the sender's own information

    • The slots it serves
    • Master/slave information
    • IP and port information
    • State information
  • Gossip: information about some of the other nodes the sender knows

    • ping_sent and pong_received timestamps
    • IP and port information
    • State information; for example, a node the sender considers unreachable is flagged PFAIL or FAIL here

4.2.3 How Is Heartbeat Communication Handled?

  1. Adding a new node
  • Send a meet packet to join the cluster
  • Learn about other, previously unknown nodes from the gossip in the pong replies
  • Repeat the process until the node has fully joined the cluster

  2. Slot information
  • Check whether the slot information declared by the sender differs from the local record
  • If it differs and the sender's configuration epoch is larger, update the local record
  • If it differs and the sender's epoch is smaller, send an UPDATE message to correct the sender

  3. Master/slave information

The sender's master/slave information is examined and the local state is updated accordingly.

  4. Node FAIL detection (failure discovery)
  • A node that has not returned a pong within the timeout is marked PFAIL by the current node
  • The PFAIL flag propagates along with gossip
  • Every received heartbeat is scanned for PFAIL flags on other nodes; these are kept locally as votes toward marking those nodes FAIL
  • When the PFAIL votes for a node reach a majority, the node is marked FAIL and a FAIL message is broadcast

Note: gossip makes cluster state changes propagate quickly across the cluster. Redis gossips about roughly N/10 nodes per heartbeat (N being the cluster size), which ensures that within the validity window of a PFAIL report, a node will have received gossip about the failed node from about 80% of the machines, allowing the transition to FAIL.

4.2.4 When Is Information Broadcast to Other Nodes?

Some information is so important that it must be delivered immediately, so it is broadcast to every machine in the cluster rather than gossiped. Broadcast is used for:

  • Node FAIL messages: when a node is found unreachable, the detecting node marks it PFAIL and spreads this via heartbeats; a node that then observes that more than half of the masters agree flips the flag to FAIL and broadcasts it.
  • Failover request messages: when a slave attempts a failover, it broadcasts its request for votes.
  • New master messages: the node that wins a failover broadcasts its new role to the entire cluster.

5. Failover

How does the cluster recover when a master node fails?

When a slave finds that its master has failed, it attempts a failover so that it becomes the new master. Because the dead master may have several slaves, there may be several candidates. The failover must reach consensus across the cluster via a Raft-style election (a sketch of the vote arithmetic follows the list):

  • The slave discovers that its master is in FAIL state
  • It increments the cluster currentEpoch by 1 and broadcasts a failover request
  • The other nodes receive the message; only masters respond: they check the requester's legitimacy and reply FAILOVER_AUTH_ACK, sending at most one ACK per epoch
  • The slave attempting the failover collects FAILOVER_AUTH_ACKs
  • Once it holds ACKs from more than half of the masters, it becomes the new master
  • It broadcasts a pong to inform the rest of the cluster
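The vote arithmetic itself is simple; here is a Python sketch under the assumption that each master ACKs at most once per epoch:

def election_outcome(current_epoch: int, num_masters: int, acks: int):
    new_epoch = current_epoch + 1     # the epoch at which the failover request is broadcast
    quorum = num_masters // 2 + 1     # a majority of masters must ACK
    return new_epoch, acks >= quorum  # True -> the slave promotes itself to master

print(election_outcome(7, 5, 3))  # (8, True): 3 of 5 masters voted for the slave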

6. Capacity expansion & reduction

How does Redis Cluster expand and shrink?

6.1 Expansion

Redis Cluster provides an elegant solution for expanding a cluster when capacity is limited or needs to be expanded for other reasons.

  1. Add a new node: run CLUSTER MEET <new-node-ip> <port> on any node in the cluster, or use redis-trib's add-node command. The new node is a master in the cluster by default.
  2. Migrate data: the overall flow is to decide which slots to move to the target node, fetch the keys in each slot, migrate those keys to the target node, and finally broadcast to all masters in the cluster that the slots now belong to the target node. The redis-trib tool does this conveniently. Suppose slot 10 of node A is to be migrated to node B; the steps are as follows (a scripted version of the whole flow is sketched at the end of this section):
B: cluster setslot 10 importing A.nodeId
A: cluster setslot 10 migrating B.nodeId

Fetch the keys in the slot and migrate them to node B:

A: cluster getkeysinslot 10 100
A: migrate B.ip B.port "" 0 5000 KEYS key1 [key2 ...]

Finally, broadcast to the cluster that the slot has moved to node B:

cluster setslot 10 node B.nodeId
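The manual steps above can be scripted. Below is a hedged Python sketch of the whole flow with redis-py; the addresses are placeholders, error handling is omitted, and in a real cluster the final SETSLOT ... NODE should be sent to every master (redis-trib does all of this for you).

import redis

src = redis.Redis(host="127.0.0.1", port=7000)   # node A (placeholder address)
dst = redis.Redis(host="127.0.0.1", port=7001)   # node B (placeholder address)
slot = 10

src_id = src.execute_command("CLUSTER MYID")
dst_id = dst.execute_command("CLUSTER MYID")

# 1. mark the slot importing on B and migrating on A
dst.execute_command("CLUSTER SETSLOT", slot, "IMPORTING", src_id)
src.execute_command("CLUSTER SETSLOT", slot, "MIGRATING", dst_id)

# 2. move the keys in batches of 100 until the slot is empty
while True:
    keys = src.execute_command("CLUSTER GETKEYSINSLOT", slot, 100)
    if not keys:
        break
    src.execute_command("MIGRATE", "127.0.0.1", 7001, "", 0, 5000, "KEYS", *keys)

# 3. announce the new owner of the slot (repeat on every master in practice)
for node in (src, dst):
    node.execute_command("CLUSTER SETSLOT", slot, "NODE", dst_id)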

6.2 Shrinking

Shrinking mirrors expansion: determine whether the node going offline is a master and whether it holds slots; if it does, migrate its slots to the other masters in the cluster; finally, repoint the slaves of the offline master to other masters. It is best to take the slaves offline first.

7. Understand more

Let's take a closer look at Redis Cluster through a few questions.

7.1 Why is Hash Slot 16384 in the Redis Cluster?

We know that CRC16 produces 2^16 possible values, so why is the hash slot count only 2^14? The author's original answer:

When a Redis node sends a heartbeat packet, it must include its full slot configuration as a bitmap so that peers know the current cluster layout. With 16384 slots, the bitmap occupies 16384 / 8 = 2048 bytes = 2 KB; in other words, 16K slots are encoded in 2 KB of heartbeat space.

Although CRC16 could address a full 65536 (2^16) slot space, the bitmap for 65536 slots would be 65536 / 8 = 8192 bytes = 8 KB per heartbeat, which the author considered not worth it. Moreover, a Redis cluster will rarely exceed 1000 master nodes, so 16384 slots is an appropriate choice.
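The bitmap arithmetic behind this choice, spelled out in Python:

print(16384 / 8 / 1024)  # 2.0 -> 16384 slots cost a 2 KB bitmap per heartbeat
print(65536 / 8 / 1024)  # 8.0 -> a full 2^16 slot space would cost 8 KB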

7.2 Why is publishing subscription not recommended in Redis Cluster?

In cluster mode, every PUBLISH command is broadcast to all nodes (including slaves), so each published message is forwarded once to every node in the cluster, which increases the bandwidth burden. Frequent publishing in a cluster with many nodes consumes significant bandwidth, so Pub/Sub is not recommended. (The official documentation notes that this could sometimes be optimized with Bloom filters or other algorithms.)

8. Comparison of Redis cluster schemes

To ensure high availability, Redis needs the following:

  • Data persistence
  • Master-slave replication
  • Automatic failure recovery
  • Clustering

If most of our traffic is read requests, read/write separation can improve performance. But what about heavy write traffic? In today's big-data era, companies like Alibaba and Tencent ingest enormous volumes of writes at all times. If a single master node cannot keep up, what do we do?

This calls for clustering! In simple terms, multiple primary and secondary nodes form a cluster, and each node stores part of the data. In this way, write requests can also be distributed to multiple primary nodes to solve the problem of heavy write pressure. In addition, when node capacity and performance are insufficient, new nodes can be dynamically added to expand cluster capacity and improve performance.

The mainstream Redis clustering schemes in the industry mainly include the following:

  • Client Sharding
  • Codis
  • Twemproxy
  • Redis Cluster

They can also be divided by whether they are centralized. Client sharding and Redis Cluster are decentralized schemes, while Codis and Twemproxy are centralized schemes.

Centralization refers to whether the client accesses multiple Redis nodes directly or through a Proxy of the middle layer. Direct access belongs to the decentralized solution, while access through Proxy of the middle layer belongs to the centralized solution. They have their own advantages and disadvantages, and will be introduced separately below.

8.1 Client Sharding

Client sharding basically means that we only need to deploy multiple Redis nodes, and how to use these nodes is mainly on the client side. The client uses a fixed Hash algorithm to calculate Hash values for different keys and then reads and writes data to different Redis nodes.

(Figure: client-side sharding cluster mode)

Client sharding requires the business developer to assess the volume of requests and data for the business in advance, and then have the DBA deploy enough nodes for the developer to use.

The advantage of this scheme is easy deployment: the DBA simply deploys and hands over as many nodes as the business needs. The rest is up to the business developers, who write the key routing logic according to the number of nodes, typically a fixed hash rule that writes different keys to different nodes and reads them back by the same rule.

Its drawback is the high cost for developers using Redis: they must write routing code across multiple nodes, and if the upfront estimate of data volume is inaccurate, later scaling and migration are very expensive, because changing the number of nodes changes the hash mapping, and keys no longer correspond to their former nodes.

Therefore, the consistent hashing algorithm was derived to minimize data migration and performance problems when the number of nodes changes.

This client fragmentation scheme is generally used in business scenarios where the amount of business data is relatively stable and will not increase significantly in the later period. It only needs to evaluate the amount of business data in the early stage.

8.2 Twemproxy

Twemproxy is an open source clustering solution from Twitter that can be used as both Redis Proxy and Memcached Proxy.

Its functionality is relatively simple: it only implements request routing and forwarding, without the online scaling that Codis offers. Its focus is to pull the client-side sharding logic up into a unified proxy layer; beyond that, it does nothing else.

Twemproxy has been around the longest; in the early days, when no good server-side sharding solution existed, it was widely used, and its performance is extremely stable.

However, its pain point is that it cannot scale in or out online, which makes operations very inconvenient, and it offers no friendly operations UI. Codis emerged in exactly this context.

8.3 Codis

When using Redis, we do not want to care how many nodes sit behind the cluster: we want Redis to appear as one large cluster to which new nodes can be added as business volume grows, solving capacity and performance problems.

This is the server-side sharding approach: the client does not need to know how many nodes the cluster has, and operates the cluster just like a single Redis. This greatly reduces the cost for developers, who can focus on business logic without worrying about Redis resource issues.

How can a cluster of many nodes be used by developers as if it were a single Redis? This comes down to how the service nodes are organized. Typically a proxy layer is inserted between client and server: the client only talks to the proxy, and the proxy implements the concrete forwarding rules and forwards requests to the nodes behind it. This is why such schemes are called centralized cluster schemes, and Codis is a cluster solution built this way. (Figure: the proxy cluster mode.) Codis is a centralized cluster solution developed in China by former Wandoujia (Pea Pod) engineers. Since every request must pass through the proxy layer, proxy performance matters a great deal; Codis is written in Go, balancing development efficiency with performance.

Codis consists of several components:

  • codis-proxy: forwards read and write requests
  • codis-dashboard: the unified control center, integrating data forwarding rules, automatic failure recovery, online data migration, node scaling, and automatic operations APIs
  • codis-group: a Redis server re-developed on top of Redis 3.2.8, with added asynchronous data migration
  • codis-fe: a UI for managing multiple clusters

As you can see, Codis has quite a few components, and its functionality is very complete: beyond request forwarding, it implements online data migration, node scaling, and automatic failure recovery.

Codis Proxy is the component responsible for request forwarding and holds the concrete forwarding rules. Codis divides the whole cluster into 1024 slots; when handling a read or write request, it computes a CRC32 hash of the key, takes the value modulo 1024 to find the slot, and from the slot finds the concrete Redis node (a one-line illustration follows).
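That routing rule fits in one line of Python (zlib.crc32 is a standard CRC32 implementation; the key is arbitrary):

import zlib

print(zlib.crc32(b"user:1000") % 1024)  # Codis slot id, in the range [0, 1023]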

The biggest feature of Codis is that it can be expanded online without affecting client access during expansion, that is, no downtime is required. This is very convenient for service users. When the cluster performance is insufficient, nodes can be dynamically added to improve the cluster performance.

To scale online while keeping migrated data safe, Codis modified Redis and added commands for asynchronous data migration. It is based on Redis 3.2.8, with the dashboard and proxy components cooperating on top to migrate data and scale without impacting the business.

Therefore, to use Codis you must use its bundled Redis, which means there is no guarantee that it will keep up with the latest official Redis features; that depends on the Codis maintainers, and Codis is in fact no longer maintained. With Codis you can only use Redis 3.2.8, which is a pain point.

In addition, because the cluster spans multiple nodes, operating the cluster cannot match every capability of a single Redis: commands that would misbehave across multiple nodes are disabled or restricted. For details, see the list of commands not supported by Codis.

8.4 Redis Cluster

In the centralized model with an extra proxy layer in the middle, heavy demands are placed on the proxy: once it fails, no client that goes through it can proceed. Making the proxy highly available requires yet more machinery, such as Keepalived.

Moreover, the extra forwarding hop through the proxy inevitably costs some performance. Besides client sharding and the centralized schemes above, is there a better solution?

Redis Cluster, officially launched by Redis, takes a different path: instead of the centralized proxy approach, it puts part of the request-forwarding logic in the client and part in the server, and the two cooperate to complete each request.

Redis Cluster was released with Redis 3.0. Early versions, lacking rigorous testing and production validation, were not widely adopted, and it is exactly in that context that the centralized cluster schemes above, Codis and Twemproxy, arose.

However, as Redis has iterated, the official Cluster has become more and more stable, and more people are adopting the official clustering scheme. And because it is official, continued maintenance is guaranteed, an advantage over third-party open-source solutions.

Redis Cluster has no intermediate Proxy layer, so how to forward requests?

Redis puts the request-forwarding logic into a smart client: to use Redis Cluster, the client SDK must be upgraded, and the forwarding logic is built into the SDK, so business developers do not need to write forwarding rules themselves. Redis Cluster uses 16384 slots for its routing rules, as follows:

Without a proxy layer in between, clients talk to the appropriate Redis node directly, eliminating the performance cost of proxy forwarding.

Redis Cluster also provides online data migration, node scaling in and out, and built-in Sentinel-like automatic failure recovery: everything rolled into one. Deployment is therefore very simple, no extra components are needed, and it is extremely operations-friendly.

Redis Cluster keeps serving client requests during node data migration and scaling. If a client accesses data that is in the middle of migrating, server and client follow an agreed protocol (the MOVED/ASK redirects described above) to point the client at the correct node and help it update its routing table.

Although Redis Cluster offers online data migration, its migration performance is not high: a large key can block both nodes involved in the migration for a long time. In this respect, Codis's data migration performs better.

8.5 Summary

After comparing these clustering scenarios, let’s summarize:

| | Client sharding | Codis | Twemproxy | Redis Cluster |
| --- | --- | --- | --- | --- |
| Cluster mode | Decentralized | Centralized | Centralized | Decentralized |
| Usage | Client code implements routing rules and connects to Redis directly | Access via Proxy | Access via Proxy | Smart client connects to Redis directly, routing rules built into the client |
| Performance | High | Proxy adds overhead | Proxy adds overhead | High |
| Databases supported | Multiple | Multiple | Multiple | One |
| Pipeline | Supported | Supported | Supported | Only single-node pipelines; no cross-node pipelines |
| Client upgrade needed | No | No | No | Yes |
| Online horizontal scaling | Not supported | Supported | Not supported | Supported |
| Redis version | Latest | Only 3.2.8, hard to upgrade | Latest | Latest |
| Maintainability | Simple to operate, high cost for developers | Many components, complex deployment | Only the Proxy component, easy deployment | Simple to operate, officially maintained |
| Automatic failure recovery | Requires Sentinel deployment | Requires Sentinel deployment | Requires Sentinel deployment | Built-in failover logic, no Sentinel needed |

Reference documentation

  • High scalability: sharding technology (Redis Cluster)
  • A detailed explanation of Redis clustering
  • Three Redis cluster schemes compared
  • Redis Cluster practice