Along the way, we have seen that Redis supports manual failover through master-slave replication, and monitoring plus automatic failover through Sentinel mode. Thanks to Redis's high performance, a Sentinel deployment handles applications of ordinary scale with ease.
However, as business systems grow in features, modules and scale, our requirements on Redis keep rising, especially the ability to scale dynamically under peak load. For example, an e-commerce platform sees low, steady traffic on ordinary days but several times that during a Double 11 promotion, and the capacity each system needs differs greatly between the two cases. If hardware and middleware are always provisioned for peak hours, a great deal of resources is wasted.
As an excellent caching product, Redis has become indispensable middleware for all kinds of systems. Sentinel mode is excellent, but because it has no ability to scale horizontally on demand, it cannot keep up with increasingly complex application scenarios. Before an official cluster mode was released, the community had already produced several good practices, such as Codis and Twemproxy.
To make up for this deficiency, Redis officially introduced a new deployment mode, Redis Cluster, in version 3.0.
Redis Cluster uses a decentralized architecture, shards data automatically across multiple nodes, supports adding and removing nodes dynamically, and fails over automatically when some nodes become unavailable, keeping the system highly available. According to the official description, Redis Cluster has the following design goals:
- High performance and linear scalability up to 1000 nodes: data is sharded across nodes, master-slave synchronization uses asynchronous replication, and clients are redirected without a proxy.
- An acceptable degree of write safety: the system does its best to retain all writes from clients connected to the partition containing the majority of master nodes, but there is usually a small window in which acknowledged writes can be lost. This window is larger when a client is connected to a minority partition.
- Availability: a Redis Cluster keeps working in a network partition as long as the majority of master nodes are reachable and every unreachable master has at least one reachable slave. Moreover, through replica migration, if master A ends up with no slaves while some master B has more than one, one of B's slaves can be moved over to serve A.
To briefly summarize the three goals above: in my view the most distinctive feature of Redis Cluster is its scalability. Multiple master nodes hold the full data set through the sharding mechanism, that is, each master-slave replication unit manages a subset of the keys. By contrast, with plain master-slave replication or Sentinel mode, read requests can be spread out by adding slave nodes once the system grows large enough, but write requests can only go through the single master node, which carries the following risks:
- All write requests are concentrated on one Redis instance, and as the request volume grows write latency may appear on that single master.
- Each node stores the full data set. When the data volume is large, RDB backups and AOF rewrites take longer to fork, and master-slave replication transfers and data recovery become slower or may even fail.
- If the single master fails, the whole service may be temporarily unavailable during failover, and some data may be lost.
Therefore, dynamic scaling is Redis Cluster's most dazzling capability. Now let's get down to business: this article introduces Redis Cluster as a whole through examples, and later articles will analyze how it works in depth.
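Before diving in, it helps to know how a key is assigned to a shard: Redis Cluster hashes the key with CRC16 and takes the result modulo 16384, and a {hash tag} inside the key lets related keys land in the same slot. Below is a minimal sketch of that mapping; it is not the Redis source itself (the real code is keyHashSlot() in cluster.c), and crc16() here stands in for Redis's own CRC16 routine.
/* Simplified sketch of how Redis Cluster maps a key to one of the
 * 16384 hash slots; modeled on keyHashSlot() in cluster.c. */
unsigned int crc16(const char *buf, int len); /* provided by Redis's crc16.c */

unsigned int key_hash_slot(const char *key, int keylen) {
    int s, e; /* start and end of a {hash tag}, if any */

    for (s = 0; s < keylen; s++)
        if (key[s] == '{') break;

    /* No '{' found: hash the whole key. */
    if (s == keylen) return crc16(key, keylen) & 16383;

    for (e = s + 1; e < keylen; e++)
        if (key[e] == '}') break;

    /* No matching '}' or an empty "{}": hash the whole key. */
    if (e == keylen || e == s + 1) return crc16(key, keylen) & 16383;

    /* Otherwise hash only the tag content, so user:{42}:name and
     * user:{42}:age always map to the same slot. */
    return crc16(key + s + 1, e - s - 1) & 16383;
}
For example, keys such as user:{1000}.following and user:{1000}.followers share a slot because only "1000" is hashed, which is what makes multi-key operations across them possible.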
The cluster structure
Continuing the style of previous articles, I will first build and demonstrate a cluster to give you an intuitive feel for its structure, then sort out the logical relationships based on the source code, and finally walk through the cluster creation process step by step.
Hands-on practice
According to the official documentation, setting up a cluster is fairly easy with Redis 5 and later. This article uses six Redis instances (version 6.2.0): three master nodes and three slave nodes, with one replica per master.
- Prepare the configuration files: under a directory named cluster-demo, create 6 folders named after the port numbers Redis will listen on: 7000, 7001, 7002, 7003, 7004, 7005. Place a minimal Redis Cluster configuration file named cluster.conf in each directory, with the following content (remember to change the port for each directory):
port 7000
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
appendonly yes
- Start the Redis instances: switch into each of the six directories and run redis-server cluster.conf to start the instance in cluster mode. Take 7000 as an example, as shown below:
- Create the cluster: I am using Redis 6.2.0, so redis-cli can do this directly. Open a terminal and run the --cluster create command against the six instances we just started to build a cluster with three masters and three slaves.
redis-cli --cluster create 127.0.0.1:7000 127.0.0.1:7001 127.0.0.1:7002 \
  127.0.0.1:7003 127.0.0.1:7004 127.0.0.1:7005 \
  --cluster-replicas 1
In the terminal you will see output like the figure below; the lines starting with >>> mark the core operations redis-cli performs while creating the cluster, and of course some steps are not printed in the log at all. In the end, the cluster relationships shown in the following figure are established. That figure describes the node relationships from two perspectives: the left side shows the physical structure without considering node roles, with the bidirectional arrows between nodes representing the cluster bus; the right side takes node roles and master-slave groups into account, reflecting the master-slave replication relationships and the cluster bus (the bus is drawn only between master nodes, since drawing every link would be far too messy).
With the help of redis-cli, setting up a Redis Cluster is fairly simple: a single command does it all. From the process above we can see that during cluster creation redis-cli acts as a manager, responsible for checking node status, establishing master-slave relationships, sharding data, and coordinating the nodes into a cluster through handshakes; none of this would work without the cluster capabilities built into Redis itself.
To understand the clustering process in depth and lay the groundwork for what follows, I will first introduce some Redis Cluster concepts and data structures, and then explain the cluster creation process in detail.
Cluster data structure
In the example above, six Redis instances form a three-master, three-slave cluster, and each master-slave group is assigned a range of hash slots. How does Redis Cluster describe these relationships? With that question in mind, let's go back to the data structures and see how Redis expresses them. Based on the relationships in the Redis source code, I have drawn the organization of the important Redis Cluster data structures, shown below from the perspective of node A:
clusterState
As we know, Redis Cluster is one of Redis's deployment modes, and everything hangs off redisServer, the most central data structure in Redis. Only the fields related to cluster mode are shown below.
struct redisServer {
/* Cluster */
// Whether to run in cluster mode
int cluster_enabled; /* Is cluster enabled? */
// Timeout parameter for cluster node communication
mstime_t cluster_node_timeout; /* Cluster node timeout. */
// An automatically generated configuration file (nodes.conf), which cannot be modified by users, stores cluster status
char *cluster_configfile; /* Cluster auto-generated config file name. */
// Cluster status. View the current cluster status from the perspective of the current Redis instance
struct clusterState *cluster; /* State of the cluster */
};
Therefore, in cluster mode, each redisServer uses clusterState to describe the information and status of every node in the cluster as it sees them. clusterState contains not only the state of the current node but also the state of all the other nodes in the cluster.
The key phrase is "as it sees them": because the cluster is a decentralized distributed system, nodes spread information over the network, and the network is not 100% reliable (partitions or disconnections can happen), so the cluster state maintained by each node may be inaccurate or not updated in time.
Below is the complete clusterState structure; a quick read-through is enough for now, and we will keep coming back to it in later chapters.
// This structure stores the state of the cluster from the point of view of the current node
typedef struct clusterState {
// Current node information
clusterNode *myself; /* This node */
// The current epoch of the cluster
uint64_t currentEpoch;
// Cluster status
int state; /* CLUSTER_OK, CLUSTER_FAIL, ... */
// Number of master nodes serving at least one hash slot
int size; /* Num of master nodes with at least one slot */
// Node dictionary: name->clusterNode
dict *nodes; /* Hash table of name -> clusterNode structures */
// Blacklist of nodes
dict *nodes_black_list; /* Nodes we don't re-add for a few seconds. */
// The hash slot and destination node being migrated
clusterNode *migrating_slots_to[CLUSTER_SLOTS];
// The hash slot and source node that are importing
clusterNode *importing_slots_from[CLUSTER_SLOTS];
// The mapping between hash slots and nodes
clusterNode *slots[CLUSTER_SLOTS];
// The number of keys stored in each hash slot
uint64_t slots_keys_count[CLUSTER_SLOTS];
rax *slots_to_keys;
/* The following fields are used to take the slave state on elections. */
// Time of the previous or next failover election
mstime_t failover_auth_time; /* Time of previous or next election. */
// Number of votes received so far in the failover election
int failover_auth_count; /* Number of votes received so far. */
// Whether we have already asked other nodes for votes
int failover_auth_sent; /* True if we already asked for votes. */
// This slave's rank for the current election
int failover_auth_rank; /* This slave rank for current auth request. */
// Epoch of the current failover election
uint64_t failover_auth_epoch; /* Epoch of the current election. */
int cant_failover_reason; /* Why a slave is currently not able to failover. See the CANT_FAILOVER_* macros. */
/* Manual failover state in common. */
mstime_t mf_end; /* Manual failover time limit (ms unixtime). It is zero if there is no MF in progress. */
/* Manual failover state of master. */
clusterNode *mf_slave; /* Slave performing the manual failover. */
/* Manual failover state of slave. */
long long mf_master_offset; /* Master offset the slave needs to start MF or zero if still not received. */
int mf_can_start; /* If non-zero signal that the manual failover can start requesting masters vote. */
/* The following fields are used by masters to take state on elections. */
// Epoch of the last vote we granted
uint64_t lastVoteEpoch; /* Epoch of the last vote granted. */
int todo_before_sleep; /* Things to do in clusterBeforeSleep(). */
/* Messages received and sent by type. */
long long stats_bus_messages_sent[CLUSTERMSG_TYPE_COUNT];
long long stats_bus_messages_received[CLUSTERMSG_TYPE_COUNT];
// The number of nodes that reach PFAIL
long long stats_pfail_nodes; /* Number of nodes in PFAIL status, excluding nodes without address. */
} clusterState;
Here are a few key fields to help you understand the basics of the cluster:
- currentEpoch: the current epoch of the cluster, which is bumped by events such as resharding and failover;
- myself: of type clusterNode, stores the status of the current node (explained later);
- nodes: a dictionary that stores information about every node in the cluster as key-value pairs, where the key is the node name (also called the node ID) and the value is a clusterNode;
- slots: the mapping between hash slots and nodes, an array of clusterNode pointers indexed by hash slot number, each pointing to the node responsible for that slot.
The last three fields describe the node's own state, record the other nodes in the cluster, and store the allocation of hash slots within the cluster. When a node starts for the first time it only knows about itself; information about other nodes and slot allocation appears only after other nodes join it, or it joins an existing cluster. All three fields involve the clusterNode structure, and the short sketch below shows how they come together when routing a key.
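As an illustration only (the real decision logic is getNodeByQuery() in cluster.c and handles many more cases), here is a minimal sketch of how a node could use slots and myself, together with the key_hash_slot() helper shown earlier, to decide whether it owns an incoming key:
/* Sketch of slot-based routing built on the clusterState fields above;
 * not the actual Redis code (see getNodeByQuery() in cluster.c). */
clusterNode *lookup_slot_owner(clusterState *cs, const char *key, int keylen) {
    unsigned int slot = key_hash_slot(key, keylen); /* CRC16(key) mod 16384 */
    return cs->slots[slot]; /* NULL if the slot is not assigned yet */
}

/* A node serves the command itself only when it owns the key's slot;
 * otherwise it replies with a redirection such as MOVED 3999 127.0.0.1:7001. */
int should_serve_locally(clusterState *cs, const char *key, int keylen) {
    clusterNode *owner = lookup_slot_owner(cs, key, keylen);
    return owner != NULL && owner == cs->myself;
}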
Node Properties (clusterNode)
Redis Cluster describes the information and status of a Cluster node through the data structure clusterNode. From different perspectives, it can be used to describe the state of the node itself or other nodes.
- When Redis starts in cluster mode, it initializes a clusterNode object to maintain its own state.
- When a node discovers another node through the handshake or heartbeat process, it also creates a clusterNode to record that node's information.
Both the node itself and the other nodes are stored in the clusterState maintained by redisServer, and they are updated continuously as the cluster state changes.
Some of the information maintained by clusterNode is stable or static, such as the node ID, IP address and port; the rest changes with the cluster state, such as the range of hash slots the node is responsible for, the node's status flags, and so on. Let's look at this data structure in the form of source code plus comments:
// This is the description of the cluster node, which is the basis of the cluster operation
typedef struct clusterNode {
// Node creation time
mstime_t ctime; /* Node object creation time. */
// The node name, also called the node ID, is stored in nodes.conf after startup and will not change unless that file is deleted
char name[CLUSTER_NAMELEN]; /* Node name, hex string, sha1-size */
// Node status. The cluster is driven by the state machine
int flags; /* CLUSTER_NODE_... */
// The configuration era of the node
uint64_t configEpoch; /* Last configEpoch observed for this node */
// Represents the hash slot that the node is responsible for
unsigned char slots[CLUSTER_SLOTS/8]; /* slots handled by this node */
// The node is responsible for the number of hash slots
int numslots; /* Number of slots handled by this node */
// If the current node is the primary node, the number of secondary nodes is stored
int numslaves; /* Number of slave nodes, if this is a master */
// If the current node is the primary node, store a list of secondary nodes (array)
struct clusterNode **slaves; /* pointers to slave nodes */
// If the current node is a slave node, the primary node of its master/slave replication is stored
struct clusterNode *slaveof; /* pointer to the master node. Note that it may be NULL even if the node is a slave if we don't have the master node in our tables. */
// The time when the ping request was last sent
mstime_t ping_sent; /* Unix time we sent latest ping */
// The last time I received pong's reply
mstime_t pong_received; /* Unix time we received the pong */
// The last time data was received
mstime_t data_received; /* Unix time we received any data */
// The time when the node reaches the FAIL state
mstime_t fail_time; /* Unix time when FAIL flag was set */
// The time of the last vote during failover
mstime_t voted_time; /* Last time we voted for a slave of this master */
// Update time of replication offset
mstime_t repl_offset_time; /* Unix time we received offset for this node */
mstime_t orphaned_time; /* Starting time of orphaned master condition */
// The replication offset of the node
long long repl_offset; /* Last known repl offset for this node. */
// Node IP address
char ip[NET_IP_STR_LEN]; /* Latest known IP address of this node */
// Node port number
int port; /* Latest known clients port of this node */
// Cluster bus port number
int cport; /* Latest known cluster port of this node. */
// Network links to nodes
clusterLink *link; /* TCP/IP link with this node */
// List of nodes reporting that this node is down
list *fail_reports; /* List of nodes signaling this as failing */
} clusterNode;
Let’s focus on a few key fields.
- Node name/ID: name. Each node has a unique ID, which is the only reliable basis for identifying a node.
- Node status: flags. If you have read other parts of the Redis source you will have noticed that many processes are driven by state machines; in Redis Cluster each node uses flags to describe its own state or that of other nodes, and these states drive the cluster's processes. Although the field is an int, only the low 10 bits are used, and each flag value sets exactly one of those bits. Their meanings, which will come up again later, are:
  - CLUSTER_NODE_NULL_NAME, 0: all bits clear; a node that has just joined the cluster via handshake has no name yet, which also means it has no unique ID;
  - CLUSTER_NODE_MASTER, 1: the node is a master;
  - CLUSTER_NODE_SLAVE, 2: the node is a slave;
  - CLUSTER_NODE_PFAIL, 4: the node may be down and needs confirmation from other nodes;
  - CLUSTER_NODE_FAIL, 8: the node is considered down;
  - CLUSTER_NODE_MYSELF, 16: this clusterNode object describes the node itself;
  - CLUSTER_NODE_HANDSHAKE, 32: the node is in the first PING exchange of the handshake process;
  - CLUSTER_NODE_NOADDR, 64: the node's network address is not yet known;
  - CLUSTER_NODE_MEET, 128: a MEET message needs to be sent to this node;
  - CLUSTER_NODE_MIGRATE_TO, 256: this master is eligible for replica migration;
  - CLUSTER_NODE_NOFAILOVER, 512: this slave will not take part in failover.
- Config epoch: configEpoch. Similar to the epoch concept in Sentinel mode, configEpoch is this node's own epoch and may differ from the cluster's currentEpoch.
- Hash slots owned by the node: slots, a bitmap of the hash slots served by this node (or by its master, if it is a slave).
- Master-slave relationship: if the node is a master, slaves stores its replicas; if it is a slave, slaveof stores its master.
- Cluster link: link maintains the network connection between the current node and this node; together with nodes in clusterState, these links tie otherwise isolated nodes into a network and form the basis of the cluster bus.
With the clusterState and clusterNode structures understood, we can now see at the code level how the master-slave replication relationships among cluster nodes are expressed, and the structure diagram at the beginning of this section should make much more sense. A small sketch of how the flags bitmask and the slots bitmap are typically queried follows.
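This sketch is modeled on the helper macros and functions in the Redis source (cluster.h and cluster.c); treat it as an illustration under those assumptions rather than the exact code.
/* Role and health checks are plain bitwise tests on flags, and slot
 * ownership is one bit per slot in the slots bitmap. */
#define CLUSTER_NODE_MASTER 1   /* the node is a master */
#define CLUSTER_NODE_SLAVE  2   /* the node is a slave */
#define CLUSTER_NODE_PFAIL  4   /* possible failure, needs confirmation */

#define nodeIsMaster(n)  ((n)->flags & CLUSTER_NODE_MASTER)
#define nodeIsSlave(n)   ((n)->flags & CLUSTER_NODE_SLAVE)
#define nodeTimedOut(n)  ((n)->flags & CLUSTER_NODE_PFAIL)

/* slots holds CLUSTER_SLOTS (16384) bits, one per hash slot, set when
 * this node (or its master, if it is a slave) is responsible for it. */
static int bitmap_test_bit(unsigned char *bitmap, int pos) {
    return (bitmap[pos / 8] & (1 << (pos & 7))) != 0;
}

int cluster_node_owns_slot(clusterNode *n, int slot) {
    return bitmap_test_bit(n->slots, slot);
}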
Cluster Bus
The cluster bus is a dedicated channel inside Redis Cluster used for cluster governance, made up of TCP connections between nodes. Every node in the cluster actively connects to every other node, and is in turn connected to by every other node.
So a cluster with N nodes has N*(N-1) cluster bus connections; in graph-theory terms this forms a complete directed graph with N vertices. Taking three nodes as an example, the relationship between the nodes and the cluster bus is shown below. From what we have seen of cluster building, node attributes and cluster state, Redis Cluster is a decentralized distributed system, so nodes must keep exchanging information with each other to reach a consistent view of the cluster, and the cluster bus is the channel for that exchange. The "channel" here is the link field inside clusterNode, whose data type is clusterLink.
/* clusterLink encapsulates everything needed to talk with a remote node. */
typedef struct clusterLink {
mstime_t ctime; /* Link creation time */
// Network links with remote nodes
connection *conn; /* Connection to remote node */
sds sndbuf; /* Packet send buffer */
char *rcvbuf; /* Packet reception buffer */
size_t rcvbuf_len; /* Used size of rcvbuf */
size_t rcvbuf_alloc; /* Allocated size of rcvbuf */
struct clusterNode *node; /* Node related to this link if any, or NULL */
} clusterLink;
clusterLink encapsulates the remote node and its network connection, together with the send and receive buffers, on top of which two nodes keep communicating in real time.
Note that the cluster bus does not use the client-facing port (such as 6379) but a dedicated one. It is not set manually; it is derived from the client port by a fixed offset (+10000). For example, if the client port is 6379, the cluster bus listens on 16379.
Therefore, when deploying a Redis instance in cluster mode, make sure both ports are free on the host, otherwise the instance will fail to start.
Communication protocol
So far we have looked at cluster nodes, cluster state and the cluster bus. They provide the foundation for the cluster to run; the next step is to get the nodes "moving": meeting each other, introducing their friends, and so on. All of this requires communication, and the cluster bus provides the channel. Now let's look at the "language" they speak.
Message structure
A cluster message consists of a header and a body. Every message type shares the same header; the header carries a message type field, and a different body is appended according to that type. The source code and comments below show the message structure:
typedef struct {
// Fixed header, magic number "RCmb"
char sig[4];
// Total length of message: header + body
uint32_t totlen;
// Message version, currently 1
uint16_t ver;
// The port that provides services to external clients, such as 6379
uint16_t port;
// Message type, such as PING, PONG, MEET, etc. The node needs to append or parse the message body based on this value
uint16_t type;
// Number of entries carried in the message body; only used by some message types (for PING/PONG it is the number of gossip sections)
uint16_t count;
// The current cluster era from the node that sent the message
uint64_t currentEpoch;
// The configuration era of the sending node or its master node
uint64_t configEpoch;
// Replication offset: for the master node, is the replication offset for command propagation; For slave nodes, it is the replication offset from their master node that has been processed
uint64_t offset;
// The name /ID of the sending node
char sender[CLUSTER_NAMELEN];
// The hash slot that the sending message node is responsible for
unsigned char myslots[CLUSTER_SLOTS/8];
// If a slave node, this field places the node name /ID of its master node
char slaveof[CLUSTER_NAMELEN];
// The IP address of the sending node
char myip[NET_IP_STR_LEN];
char notused1[34];
// Cluster bus listening port
uint16_t cport;
// The status of the sending node
uint16_t flags;
// From the perspective of the sending node, the state of the current cluster is OK or FAIL
unsigned char state;
unsigned char mflags[3];
// The body of the message, according to the above message type type, determines the contents of this field
union clusterMsgData data;
} clusterMsg;
The message header mainly carries the state of the sending node, so that the receiver can parse the message and update that node's information in its local cluster state. The body is determined by the type field in the header; in the message structure the body is the union type clusterMsgData.
clusterMsgData is a union whose member is filled in or parsed according to type. A union is a C construct; you can loosely think of this field as something like a Java generic whose concrete type is decided at run time.
union clusterMsgData {
/* PING, MEET and PONG */
struct {
/* Array of N clusterMsgDataGossip structures */
clusterMsgDataGossip gossip[1];
} ping;
/* For broadcast node failure FAIL */
struct {
clusterMsgDataFail about;
} fail;
/* PUBLISH */
struct {
clusterMsgDataPublish msg;
} publish;
/* Used to broadcast the latest status of the node hash slot UPDATE */
struct {
clusterMsgDataUpdate nodecfg;
} update;
/* MODULE */
struct {
clusterMsgModule msg;
} module;
};
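To make the relationship between type and the data union concrete, here is a conceptual sketch of how a receiver dispatches on the header. The real logic lives in clusterProcessPacket() in cluster.c and does far more validation, the CLUSTERMSG_TYPE_* constants come from cluster.h, and handle_gossip_section(), mark_node_failed() and update_slot_owner() are hypothetical placeholders for the corresponding processing steps.
/* Conceptual dispatch on the message type; a sketch, not the Redis source. */
void handle_gossip_section(clusterMsgDataGossip *g, int count);
void mark_node_failed(const char *nodename);
void update_slot_owner(const char *nodename, unsigned char *slots, uint64_t configEpoch);

void process_cluster_message(clusterMsg *hdr) {
    uint16_t type = ntohs(hdr->type); /* multi-byte fields travel in network byte order */

    switch (type) {
    case CLUSTERMSG_TYPE_PING:
    case CLUSTERMSG_TYPE_PONG:
    case CLUSTERMSG_TYPE_MEET:
        /* Body: hdr->count clusterMsgDataGossip entries. */
        handle_gossip_section(hdr->data.ping.gossip, ntohs(hdr->count));
        break;
    case CLUSTERMSG_TYPE_FAIL:
        /* Body: the name of a node the sender considers failed. */
        mark_node_failed(hdr->data.fail.about.nodename);
        break;
    case CLUSTERMSG_TYPE_UPDATE:
        /* Body: a newer slot bitmap and config epoch for some master. */
        update_slot_owner(hdr->data.update.nodecfg.nodename,
                          hdr->data.update.nodecfg.slots,
                          hdr->data.update.nodecfg.configEpoch);
        break;
    }
}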
Message type
Redis Cluster defines several message types and combines them to implement functions such as heartbeat, handshake and configuration update. Let's look at a few important message types and their corresponding data structures; note that these structures describe only the data part of the overall message.
- PING: used for heartbeat requests between nodes.
- PONG: Reply to PING heartbeat requests between nodes;
- MEET: node handshake request, which is a special PING type;
All three of these message types share the same message body, clusterMsgDataGossip:
typedef struct {
// The node name/ID
char nodename[CLUSTER_NAMELEN];
// Indicates the time when the sending node sent the last ping request to the node
uint32_t ping_sent;
// The time when the message sending node last received pong reply from the node
uint32_t pong_received;
// Node IP
char ip[NET_IP_STR_LEN]; /* IP address last time it was seen */
// External service port
uint16_t port; /* base port last time it was seen */
// Cluster bus port
uint16_t cport; /* cluster port last time it was seen */
// The state of the node in the view of the sending node
uint16_t flags; /* node->flags copy */
// Reserved field
uint32_t notused1;
} clusterMsgDataGossip;
The MEET message is only used when a node joins the cluster by shaking hands (detailed in the cluster establishment process later), and the PING PONG combination is used for heartbeat interaction between nodes (detailed in the cluster fault tolerance section).
- FAIL: used to tell other nodes that I (the message sender) have found the node identified by nodename to be faulty. When a node is determined to have failed, the detecting node broadcasts this message to all other reachable nodes in the cluster.
typedef struct {
// The name of the faulty node
char nodename[CLUSTER_NAMELEN];
} clusterMsgDataFail;
- UPDATE: indicates that the hash slot allocation of some cluster node has changed; after receiving it, a node updates the slot-to-node mapping in its local cluster state.
typedef struct {
uint64_t configEpoch; /* Config epoch of the specified instance. */
char nodename[CLUSTER_NAMELEN]; /* Name of the slots owner. */
unsigned char slots[CLUSTER_SLOTS/8]; /* Slots bitmap. */
} clusterMsgDataUpdate;
- FAILOVER_AUTH_REQUEST: sent by a slave node to request votes for a failover;
- FAILOVER_AUTH_ACK: sent by a master node to grant its vote to the requesting slave.
Together these two implement the slave election exchange and are the basis of failover in cluster mode.
Cluster Establishment Process
We have already set up a simple example cluster and, from the console output, gained a rough picture of how cluster creation works; the basic clustering concepts, from theory down to the code structures, have also been covered. Next, let's summarize the whole process and explain the node handshake in detail.
The overall process
Combining the source code (the clusterManagerCommandCreate function in redis-cli) with the console output, the main steps of cluster creation can be summarized as follows:
- Based on the input parameters, redis-cli creates its internal cluster-manager representation of each node in turn, establishes a network connection to each one, and gathers node and existing-cluster information;
- it then checks each input node, for example whether it already belongs to a cluster and whether it is empty;
- it allocates master and slave roles and hash slots, and verifies that the nodes meet the requirements for creating a cluster (at least three masters);
- it prints the proposed hash slot sharding and master/slave assignment and, once the user agrees, configures the nodes:
  - for master nodes: the CLUSTER ADDSLOTS command assigns the range of hash slots each master is responsible for;
  - for slave nodes: the CLUSTER REPLICATE command sets up the master-slave replication relationship;
  - for all nodes: the CLUSTER SET-CONFIG-EPOCH command sets the config epoch, increasing it by 1 for each node;
- redis-cli then uses the CLUSTER MEET command to trigger the handshake process; over the cluster bus the nodes exchange MEET, PING and PONG messages and gradually establish the cluster relationships;
- finally, it checks the cluster nodes and the hash slot allocation through the node on port 7000.
- Another important point not shown in the figure above: once the cluster is up, nodes keep "gossiping" over the cluster bus using a binary gossip protocol to accomplish node discovery, health checking, failover, configuration update and replica migration.
Next, we further analyze the core steps of data sharding, master-slave assignment, epoch configuration and node handshake.
Master/slave allocation and data sharding
- Calculate the number of master nodes.
Based on the input nodes and the required number of replicas per master, redis-cli works out how many masters there will be and then groups the nodes into units of one master plus its slaves. If N nodes are supplied and each master must have R replicas, the cluster can theoretically be divided into m = floor(N / (R + 1)) master groups, where m is the right-hand side rounded down. Cluster mode requires at least three masters, so if m is less than 3 an error is reported and creation fails.
- Assign masters and slaves.
Based on the IP distribution of the input nodes, redis-cli prefers to place masters on different hosts: it picks the m masters and then assigns slaves to each of them according to the replica count R. Once every master has its R slaves, any remaining nodes are handed out as extra slaves.
A freshly started node is a master by default, so once the assignment is decided redis-cli only needs to record which master each slave belongs to and deliver that configuration later.
- Data sharding.
When the data is sharded, the 16384 hash slots are by default divided evenly among the master nodes, as sketched below.
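A minimal sketch of that even split, assuming m masters and contiguous slot ranges; redis-cli's real allocation in clusterManagerCommandCreate() also interleaves masters across hosts and may place the boundaries slightly differently.
/* Divide the 16384 hash slots into m contiguous ranges, one per master. */
#include <stdio.h>

#define CLUSTER_SLOTS 16384

void assign_slot_ranges(int m) {
    int per_master = CLUSTER_SLOTS / m;
    int first = 0;

    for (int i = 0; i < m; i++) {
        /* The last master also takes any remainder slots. */
        int last = (i == m - 1) ? CLUSTER_SLOTS - 1 : first + per_master - 1;
        printf("master %d: slots %d-%d\n", i, first, last);
        first = last + 1;
    }
}

int main(void) {
    assign_slot_ranges(3); /* prints 0-5460, 5461-10921, 10922-16383 */
    return 0;
}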
Delivering the configuration
Once master/slave allocation and data sharding are complete, redis-cli holds the master/slave and sharding configuration for every node locally. After the administrator confirms, it walks the node list and delivers the configuration to each node.
For a master node, the data sharding is configured: redis-cli uses the CLUSTER ADDSLOTS command to set the node's hash slots. On receiving the command, the master does the following (a small sketch follows the list):
- If clusterState->importing_slots_from is set for a slot being added, clear it (set it to NULL);
- update the slots field of myself so the bitmap reflects the range of hash slots the node is now responsible for.
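A compressed sketch of what adding one slot amounts to, modeled on clusterAddSlot() and the ADDSLOTS handling in cluster.c; bitmap_set_bit() is a hypothetical helper mirroring the bitmap_test_bit() shown earlier.
/* Sketch of handling CLUSTER ADDSLOTS for a single slot; not the exact
 * Redis code, which does more bookkeeping around this. */
static void bitmap_set_bit(unsigned char *bitmap, int pos) {
    bitmap[pos / 8] |= (1 << (pos & 7));
}

int cluster_add_slot(clusterState *cs, clusterNode *myself, int slot) {
    if (cs->slots[slot] != NULL) return -1;      /* already owned by someone */
    if (cs->importing_slots_from[slot] != NULL)
        cs->importing_slots_from[slot] = NULL;   /* we are now the real owner */
    bitmap_set_bit(myself->slots, slot);         /* mark it in my bitmap */
    myself->numslots++;
    cs->slots[slot] = myself;                    /* global slot -> node mapping */
    return 0;
}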
For a slave node, its master is set: redis-cli sends the CLUSTER REPLICATE command to the slave, and on receiving it the slave starts master-slave replication.
Updating the epoch
After the configuration above, each node's state has moved on from what it was right after startup. To reflect this change, redis-cli updates the epoch of each node with the CLUSTER SET-CONFIG-EPOCH command.
When a node receives the command, it sets the configEpoch of myself and makes sure the cluster's currentEpoch is not lower than that value.
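In essence that behavior boils down to the following sketch; the actual handling in clusterCommand() in cluster.c performs additional validation first.
/* Sketch of handling CLUSTER SET-CONFIG-EPOCH <epoch>, based only on the
 * behavior described above; the real code validates the request first. */
#include <stdint.h>

void cluster_set_config_epoch(clusterState *cs, uint64_t epoch) {
    cs->myself->configEpoch = epoch;     /* this node's own config epoch */
    if (cs->currentEpoch < epoch)
        cs->currentEpoch = epoch;        /* the cluster epoch never goes backwards */
}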
Node handshake
At this point every node has been configured individually, and redis-cli then issues handshake commands so the nodes join the cluster one by one, starting from nothing. For security reasons, the handshake between nodes can only be initiated by an administrator, and it is completed over the cluster bus.
After a node starts, it listens on its cluster bus port, accepts every incoming connection and receives whatever messages are sent; however, if the sender is not a known member of the cluster, those messages are simply discarded. An existing node accepts a new node into the cluster in only one of two ways:
- MEET request: the new node sends a MEET message, which means an administrator explicitly asked for the expansion, and the handshake brings it into the cluster;
- Automatic discovery: once one node in the cluster has accepted a node as valid, it spreads that knowledge to the other nodes through the heartbeat, and they accept the node too. For example, with a known cluster of A, B and C, if A accepts D through a MEET request, then after a few rounds of heartbeats B and C will also accept D as a member of the cluster.
Combining the two mechanisms, we only need each node from the second one onward to shake hands with the first node; automatic discovery then brings every node into the cluster. A simplified sketch of how gossip drives this automatic discovery is shown next.
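The sketch below is loosely modeled on clusterProcessGossipSection() in cluster.c; lookup_node_by_name() and start_handshake() are hypothetical placeholders for the real lookup and handshake-start logic, and the real function also refreshes addresses, flags and failure reports.
/* Gossip-driven node discovery, in sketch form. */
clusterNode *lookup_node_by_name(clusterState *cs, const char *name);
void start_handshake(clusterState *cs, const char *ip, int port, int cport);

void process_gossip_section(clusterState *cs, clusterMsgDataGossip *g, int count) {
    for (int i = 0; i < count; i++) {
        clusterNode *node = lookup_node_by_name(cs, g[i].nodename);
        if (node != NULL) {
            /* Known node: update what we already know about it. */
            continue;
        }
        /* Unknown node gossiped about by a trusted sender: start a handshake
         * with it so it eventually ends up in our node table too. */
        start_handshake(cs, g[i].ip, g[i].port, g[i].cport);
    }
}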
Ok, let’s look at how the handshake process is implemented. For simplicity, let’s describe the process using just two nodes. Assume the node information is as follows:
- Node A: 127.0.0.1 7000
- Node B: 127.0.0.1 7001
Connect redis-cli to node B and run cluster meet 127.0.0.1 7000 to make node B shake hands with node A.
After receiving the command, node B starts shaking hands with node A. The figure below illustrates how the node states change during the handshake: the two nodes complete it through two exchanges, MEET-PONG followed by PING-PONG, and the state transitions are clear at a glance. In words, the process is:
- Node B creates a clusterNode entry for handshake target A; when initialized it has no name and carries the MEET and HANDSHAKE flags.
- Node B actively opens a cluster bus connection to node A and sends a MEET request, and node A replies with PONG. At this point:
  - for B: the MEET flag on A is cleared and A's name is obtained;
  - for A: a clusterNode for B in the HANDSHAKE state is created and added to its node list.
- Node A then opens a cluster bus connection to node B and actively sends a PING request, and node B replies with PONG. At this point:
  - for B: the HANDSHAKE flag on A is cleared;
  - for A: the HANDSHAKE flag on B is cleared and B's name is set.
- At this point, nodes A and B complete the handshake, and then enter the normal heartbeat maintenance process.
Summary of Cluster Structure
This part mainly lays the groundwork: it introduced the basic concepts of Redis Cluster along with its physical and logical structure, and used a hands-on example plus the cluster creation process to give you a more intuitive picture.
More exciting content is on the way in the next articles!