Consistent hashing algorithm
Solve the problem of data migration when nodes change dynamically in distributed environment. This mode is used in distributed cache, distributed storage, and load balancing scenarios. For example, the common hash algorithm in distributed cache maps data to corresponding nodes after modulating data according to the number of nodes. When nodes need to be expanded, the number of nodes changes, resulting in the change of 郔 data modulating results. When searching cached data, the original cache cannot be found, resulting in a large number of cache failures. To ensure that the cache does not become invalid, you need to migrate old data based on the new hash result during capacity expansion. As a result, most data will be migrated and the cache will be unavailable for a long time. Even if you multiply, you have to migrate half of your data. The consistent hash algorithm can greatly reduce the amount of data migration.
The principle of
Consistent hashing organizes the entire Hash space into a virtual ring, and then hashes each server using the IP or host name of the server as the key so that each machine can determine its position on the Hash ring. Then use the following algorithm to locate the data access to the corresponding server: Calculate the Hash value of the data key using the same function Hash, and determine the position of the data on the ring. From this position, “walk” along the ring clockwise, and the first encountered server is the server to which it should be located.
- Data skew problem
When the number of server nodes is small, the amount of data in each interval may differ greatly, resulting in data skew. Consistent hash algorithm introduces virtual node mechanism, each node corresponds to multiple virtual nodes, each key finds the corresponding virtual node, and then calculates the real node corresponding to the virtual node.
- Hash conflict problem
Hash conflicts generated when the key computes the hash have no impact. If hash conflicts occur when nodes are hashing, the node on which data is stored is unknown. Generally, nodes do not generate hash conflicts. Virtual nodes are introduced to reduce the probability of hash conflicts.
Consistent hashing algorithm principle www.cnblogs.com/lpfuture/p/…
Zab agreement
Zab protocol is to solve the final data consistency problem of each node in distributed system.
The ZAB protocol has three roles:
- leader
Process data requests from clients and synchronize the proposal to followers and observers
- follower
Receives read and write requests from the client, forwards transaction requests to the leader, and votes in the leader election
- observer
Receives read and write requests from clients and forwards the transaction requests to the leader without participating in voting. This is mainly to increase the concurrent read capability of the cluster. Zab protocol has two modes: crash recovery mode and atomic broadcast mode.
- Crash recovery mode
This mode is used when the cluster is initialized or the leader node crashes to generate a new leader and synchronize data
- Atomic broadcast mode
Use this mode during normal cluster operation. After the leader is elected and the cluster synchronization is complete, switch from crash recovery mode to message broadcast mode. The message broadcast pattern is similar to two-phase commit.
Raft algorithm
Raft algorithm also addresses the final consistency of data on nodes in distributed systems. It is a decentralized, highly available protocol. www.cnblogs.com/xybaby/p/10…
Raft and Zab
- Leader Election Process
Raft algorithm, each candidate can only vote once in a certain term round, first come, first served, which may lead to split vote. Raft set different timeout time for each candidate, so that the candidate who timed out first can initiate a new round of voting and get more than half of the votes first. Avoid endless loops. In ZAB protocol, each follower can vote multiple times in each electonEpoch round, and update the vote as long as it meets a larger one, and then synchronize the vote to all servers. In this case, there is no split vote, and it is also conducive to selecting the server with more updated logs. But election time is theoretically more expensive than Raft.
- A new server is added to the cluster
Raft algorithm: When a new server joins, the leader sends appendEntries RPC. This server can recognize the Leader Zab algorithm. When a new server joins, the leader zab algorithm initiates an election in the advanced LOOKING state. Other followers then send him a poll telling him who the current leader is. When he receives more than half of the followers pointing to the same leader, he changes his status to following and synchronizes the data from the leader.
See blog.csdn.net/qq_40994017…
Gossip protocols
Gossip protocol is a message propagation protocol in distributed system. It is a decentralized distributed protocol, which solves two problems: state propagation in cluster and state consistency. Use a random way to spread the information to the whole network, and in a certain period of time to make all nodes in the system data consistent.
advantage
- scalability
The gossip protocol generally requires an O(logN) round to propagate information to all nodes, where N represents the number of nodes. As the number of nodes increases, the propagation rounds change little. Moreover, the node does not wait for an ACK of the message when the data is transmitted, so it does not matter if the message fails because the message can be passed to the node that failed earlier through other nodes. The system can easily scale to millions of processes.
- Fault tolerance
The gossip protocol is not affected by the restart or downtime of any node on the network.
- Robustness,
The Gossip protocol is decentralized, so all nodes in the cluster are peer and there are no special nodes, so the failure of any node does not prevent other nodes from continuing to send messages. Any node can join or leave at any time without affecting the overall quality of service (QOS) of the system.
- Final consistency
The Gossip protocol implements exponential and fast information transmission. Therefore, when new information needs to be transmitted, messages can be quickly sent to the global nodes, so that all nodes have the latest data in a limited time.
Learn more about the Gossip protocol in this article