This article appeared in:
Talk about the cluster version Redis and Gossip protocol


Wechat official Account: Back-end technology compass

Yesterday’s article wrote about the problem of consistent hash algorithm in distributed system. At the end of the article, I mentioned the implementation scheme of redis-cluster for consistent hash algorithm. Today, I will look at Redis-cluster and the important concept Gossip protocol.

1. Basic concepts of Redis Cluster

The cluster version of Redis sounds very grand, and it is indeed a lot more complex than the single instance one master one slave or one master many slave model. The architecture of the Internet is always evolving with the development of the business.

  • Single instance Redis architecture

Starting with a primary N slave plus read/write separation, Redis looks good as an instance of a cache, and has a Sentinel mechanism that allows for primary/secondary failover.

Single instance with one master and two slaves + read/write separation structure:

Note: Pictures from the Internet

In the case of single instance, there is only one Master as storage in essence. Even if the memory of the machine is 128GB, it is generally recommended that the utilization rate should not exceed 70%-80%. Therefore, 100GB data is a lot to use at most, and 50% is good in practice. Because a large amount of data means a high persistence cost, it can severely block the service, or even eventually switch the master.

If the single instance is used only as a cache, then in addition to the cache breakdown problem that occurs when the service fails or blocks, there could be a lot of requests that would kill MySQL together.

If the single instance is used as the main storage, then the problem is larger, because of the persistence issue, whether bgSave or Aof will cause the disk flushing blocking, resulting in a service request success rate decline, this is not a single instance can solve, because as the main storage, persistence is necessary.

Therefore, we expect a multi-master and multi-slave Redis system, which will increase the pressure and stability both as the main memory and as the cache. Nevertheless, I suggest:

  • Redis try not to be the master storage!
  • Redis try not to be the master storage!
  • Redis try not to be the master storage!

If you go your own way, you’ll either be hurting yourself or someone else.

  • Clustering and sharding

To support clustering, one of the first things to overcome is the sharding problem, also known as the consistent hash problem. There are three common solutions:

  • Client sharding:

This situation is mainly similar to the practice of hashing modulus, when the client has complete control of the number of servers, can be used simply.

  • Middle layer sharding:

In this case, a middle layer is added between the client and the server, acting as the administrator and scheduler. The client request is sent to the middle layer, and the middle layer realizes the forwarding and recycling of the request. Of course, the most important role of the middle layer is the dynamic management of multiple servers.

  • Server fragment:

Do not use the intermediation is decentralized management mode, arbitrary nodes in the client to the server directly request, if the requested Node without the required data, is a client reply version, and tell the client the required data storage location, this process is actually a client and server to cooperate, together to request redirection.

  • Cluster version of Redis for mid-tier sharding

As mentioned earlier, changing to N principal N slave can improve processing power and stability, but it also faces the problem of consistent hashing, which is the data problem when the capacity is dynamically expanded.

Before the official release of Redis cluster version, there are some solutions in the industry can’t wait to use the self-developed version of Redis cluster, including domestic Wandoujia Codis, foreign Twiter TwemProxy.

The core idea is to add a sharding layer between multiple Redis servers and clients, and the sharding layer completes the consistent hash and sharding of data. The practices of each company are different to some extent. However, the core problems to be solved are the expansion capacity, failover, data integrity, data consistency, request processing latency and other issues in the scenario of multiple Redis.

Industry Codis with LVS and other practices to achieve Redis cluster solutions have many are applied to the generation environment, the performance is also good, mainly the official cluster version in Redis3.0, how to its stability, many companies are not willing to do mice, but in fact after iteration has been to Redis5.x version, The official cluster version is pretty good, at least in my opinion.

  • Official cluster version of the server shard

The official version is different from the above Codis and Twemproxy in that it implements Sharding Sharding technology at the server layer. In other words, there is no middle layer in the official version, but multiple service nodes themselves realize Sharding. Of course, it can also be considered that these functions of realizing Sharding are integrated into the Redis service itself. There is no separate Sharding module.

The previous article also mentioned that the concept of slot was introduced into the official cluster for data sharding, and then data slots were allocated to multiple Master nodes, which were then configured with N slave nodes, thus the official cluster architecture of multi-instance Sharding version was formed.

Redis Cluster is a distributed Cluster that can share data among multiple Redis nodes. On the server side, the communication between nodes is carried out through a special protocol, which acts as the communication protocol for the management part of the middle layer. This protocol is called Gossip protocol.

The purpose of distributed system conformance protocol is to solve the problem of multi-node status notification in cluster, which is the basis of cluster management.

The figure shows the official cluster architecture diagram based on the Gossip protocol:

Note: Pictures from the Internet

2. Basic operating principle of Redis Cluster

  • Node state information structure

Each node in a Cluster maintains a copy of the current state of the entire Cluster in its view, including:

  1. Current Cluster Status
  2. Information about slots that each node in the cluster is responsible for, and their migrate status
  3. The status of each node in the cluster is master-slave
  4. The survival status and unreachable vote of each node in the cluster

In other words, the above information is the content and theme of gossip spread among nodes in the cluster, and it is relatively comprehensive. In this way, both the information of oneself and that of others are passed on to each other, and the final information is comprehensive and accurate, which is different from the problem of Byzantine Empire and has high credibility.

Based on the Gossip protocol, when the cluster status changes, such as new node joining, slot migration, node breakdown, slave promoted to new Master, we hope these changes will be discovered as soon as possible and spread to all nodes in the whole cluster and reach an agreement. The heartbeat (PING, PONG, MEET) between nodes and the data they carry are the most important way of cluster state propagation.

  • Concept of the Gossip protocol

The Gossip Protocol, also known as the Epidemic Protocol, is a protocol for exchanging information between nodes or processes based on how an epidemic spreads.


It is widely used in distributed systems. For example, we can use the Gossip protocol to ensure that all nodes in the network have the same data.


The Gossip Protocol was originally created in 1987 by Alan Demers, a researcher at Xerox’s Palo Alto Research Center.
www.iteblog.com/archives/25…

The Gossip protocol is a mature protocol in P2P networks. The great benefit of The Gossip protocol is that even if the number of nodes in the cluster increases, the load on each node does not increase much, and is almost constant. This allows the cluster that Consul manages to scale out to thousands of nodes.

The Gossip algorithm is also known as anti-entropy. Entropy is a physical concept that represents chaos, and anti-entropy is the search for consistency in chaos. This fully illustrates the characteristics of Gossip: In a bounded network, each node communicates randomly with other nodes, and after some chaotic communication, the state of all nodes eventually reaches an agreement. Each node may know all other nodes or only a few neighbor nodes. As long as these nodes can be connected through the network, their status is consistent eventually. Of course, this is also the characteristic of epidemic transmission.
www.backendcloud.cn/2017/11/12/…

The above description is more academic, in fact, the Gossip protocol for our people to eat melon is not unfamiliar, the Gossip protocol is also known as the Gossip protocol, to put it simply, the Gossip protocol, the spread of the scale and speed are very fast, you can experience. So many algorithms in computers are derived from life, but higher than life.

  • The use of the Gossip protocol

Redis clusters are decentralized and communicate with each other using the Gossip protocol. Messages in the cluster can be of the following types:

  • Meet Using the cluster Meet IP port command, a node in an existing cluster sends an invitation to a new node to join the existing cluster.
  • The Ping node sends a Ping message to other nodes in the cluster every second. The message contains the known addresses, slots, status information, and last communication time of the two nodes.
  • After receiving the ping message, the Pong node replies a Pong message containing the information of the two known nodes.
  • If node Fail fails to ping a node, a message indicating that the node is down is broadcast to all nodes in the cluster. After receiving the message, other nodes mark themselves offline.

Due to the decentralization and communication mechanism, Redis Cluster chose ultimate consistency and basic availability.

For example, when a new node is added (meet), only the invited node and the invited node know about it, and the rest of the nodes have to wait for the ping message to spread layer by layer. In addition to Fail, the entire network is notified immediately. Other nodes, such as new nodes, nodes coming online again, nodes being elected as primary nodes, slot changes, etc., need to wait to be notified. In other words, the Gossip protocol is the final consistency protocol.

The Gossip protocol has a high requirement on the server time. Otherwise, an inaccurate timestamp may affect the validity of the node’s message determination. In addition, the network overhead after the number of nodes increases will also exert pressure on the server. At the same time, the number of nodes is too many, which means that the time to reach the final consistency is relatively longer. Therefore, the official recommendation of the maximum number of nodes is about 1000. The figure shows the communication interaction diagram when the node server is newly added:

Note: Pictures from the Internet

Redis official cluster is a decentralized peer-to-peer (P2P) network, which was very popular in the early years, such as Edm, BT and so on. In The Redis cluster, Gossip protocol plays the role of decentralized communication protocol, which realizes the autonomous behavior of the whole cluster without central management node according to the formulated communication rules.

  • Fault detection based on Gossip protocol

Each node in the cluster periodically sends PING messages to other nodes in the cluster to exchange status information of each node and check the status of each node: online, suspected offline PFAIL, offline FAIL.

Save information for yourself: When primary node A receives A message from primary node B that primary node D is in the PFAIL state, primary node A finds the clusterNode structure corresponding to primary node D in its clusterState. Nodes dictionary. Add the offline report of primary node B to the Fail_Reports link list in the clusterNode structure, and inform other nodes of the suspected offline status of node D through the Gossip protocol.

To rule together: If more than half of the primary nodes in the cluster report that the primary node D is suspected to be offline, then the primary node D will be marked as offline (FAIL), and the nodes marked as offline will broadcast the FAIL message of the primary node D to the cluster. All nodes that receive the FAIL message will immediately update the status of the primary node D in the Nodes and mark it offline.

Final decision: Marking Node as FAIL requires the following two conditions:

  • More than half of the primary nodes have Node in the PFAIL state.
  • The current node also marks node as a PFAIL state.

That is to say the current node found suspected hangs up the other nodes, then write them in your own little, waiting for the notice to the other good gay friend, let them look at myself, finally and half or more of the good gay friends think hang up and the node and the node oneself also think you hang up and it’s really hung up, be careful.

3. References

  1. Cloud.tencent.com/developer/a…
  2. www.zhihu.com/question/21…
  3. www.backendcloud.cn/2017/11/12/…
  4. www.cnblogs.com/zhoujinyi/p…
  5. Catkang. Making. IO / 2016/05/08 /…

Like to see to form a habit

Love life you and I are better than yesterday!

Welcome to follow the wechat official account: Back-end Technology Compass