The Kademlia algorithm, for Ethereum source analysis

Overview of the Kad algorithm

Kademlia is a peer-to-peer distributed hash table (DHT) with provable consistency and performance in failure-prone environments. It routes queries and locates nodes using an XOR-metric-based topology, which both simplifies the algorithm and aids the proofs. This topology has a notable property: every message exchanged conveys or reinforces useful contact information. The system exploits that information to issue concurrent, asynchronous queries, and it tolerates node failures without forcing timeouts on users.

Problems the Kad algorithm addresses

  1. How is stored content allocated to each node? How is content added or removed?
  2. How is the node (address/path) where a file is stored located?

Node status

The basic attributes of a node are as follows:

  • Node ID
  • Node IP address and port number

In a Kad network, every node is treated as a leaf of a binary tree, and each node's position is uniquely determined by the shortest unique prefix of its ID value.

For any node, the binary tree can be decomposed into a series of successively lower subtrees, none of which contains the node itself. The topmost subtree is the half of the tree that does not contain the node; the next subtree is the half of the remainder that does not contain it; and so on, until the whole tree has been split. Figure 1 shows the subtrees relative to node 0011:

The dotted lines enclose the subtrees, whose prefixes from top to bottom are 1, 01, 000, and 0010.

The Kad protocol ensures that every node knows at least one node in each of its subtrees, as long as that subtree is non-empty. Under this guarantee, any node can locate any other node by its ID value. The routing is driven by the XOR distance, described below.

Figure 2 illustrates how node 0011 finds node 1110 through successive queries. By repeatedly learning about and querying the best candidate in ever lower subtrees, node 0011 obtains closer and closer nodes and finally converges on the target.

Note that node 0011 initially knows only node 101, the one queried in the first step. The nodes queried in subsequent steps are those returned by the previous step as being closer to the target; the whole process is recursive.


Node distance

Each node in the Kad network has a 160-bit ID as its identifier, and every key is also a 160-bit identifier. Each computer joining the Kad network is assigned a node ID in the 160-bit key space (the ID can be thought of as randomly generated), and `<key, value>` pairs are stored on the nodes whose IDs are "closest" to the key.

The distance between two nodes x and y is defined by the bitwise XOR operation, d(x, y) = x ⊕ y, where matching bit positions yield 0 and differing positions yield 1. For example:

    010101
XOR 110001
----------
    100100

So the distance between these two nodes is 100100 in binary, i.e. 32 + 4 = 36.
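The distance computation above is a one-liner; the following Python sketch (function name is my own, not from any particular Kad implementation) reproduces the worked example:

```python
def xor_distance(x: int, y: int) -> int:
    """Kademlia distance between two node IDs: the bitwise XOR, read as an integer."""
    return x ^ y

# The worked example above: 010101 XOR 110001 = 100100 (binary) = 36
print(xor_distance(0b010101, 0b110001))  # 36
```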

Obviously, the difference in values at high levels has a greater impact on the results.

The XOR metric has the following mathematical properties:

  • Identity: the distance from a node to itself is 0
  • Non-degeneracy: the distance between two distinct nodes is greater than 0
  • Symmetry: the distance from A to B equals the distance from B to A
  • Triangle inequality: distance(A,B) + distance(B,C) >= distance(A,C)

For any given node x and distance δ ≥ 0, there is exactly one ID y such that d(x, y) = δ. This unidirectionality ensures that all lookups for the same key converge along the same path, regardless of where the lookup starts. Consequently, caching the `<key, value>` pair on the nodes along the lookup path relieves pressure on nodes holding hot keys and speeds up query responses.

K-buckets

The k-bucket concept

The Kad routing table is built from lists called k-buckets.

For each 0 ≤ i < 160, every node keeps a list of the nodes whose distance from itself lies in the range [2^i, 2^(i+1)). Each entry records an IP address, a UDP port, and a node ID (Kad networks exchange messages over UDP). Each such list is called a k-bucket and is kept sorted by the time the node was last seen: the least recently seen entry at the head, the most recently seen at the tail. Each bucket holds at most k entries.

The list of all k-buckets of a node is as follows:

When i is small, the corresponding k-bucket is usually empty (there simply are not enough nodes at such small distances; for i = 0 there can be at most one entry). When i is large, the number of candidate nodes can greatly exceed k (the wider the covered range, the more nodes are likely to fall into it). Here k is a system-wide constant chosen to balance performance and network load; the Kademlia paper uses k = 20, and the BitTorrent DHT implementation uses k = 8.

Because the distance range covered by successive k-buckets grows exponentially, a node holds more information about nearby nodes and less about distant ones, which guarantees that route lookups converge. Since the intervals are divided exponentially, it can be shown that in a Kad network of N nodes the target node is located in at most log N query steps.
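Because bucket i covers distances in [2^i, 2^(i+1)), the bucket a contact belongs in can be read directly off the highest set bit of the XOR distance. A minimal sketch (the helper name is my own):

```python
def bucket_index(x: int, y: int) -> int:
    """Index i of the k-bucket on node x that covers node y,
    i.e. the i with 2**i <= (x ^ y) < 2**(i + 1)."""
    d = x ^ y
    if d == 0:
        raise ValueError("a node never stores itself in a k-bucket")
    return d.bit_length() - 1  # position of the highest differing bit

# Node 0011 and node 1110 differ in the top bit of a 4-bit ID space,
# so node 1110 lands in node 0011's highest bucket:
print(bucket_index(0b0011, 0b1110))  # 3
```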

K bucket update mechanism

When node x receives an RPC message, it uses sender y's information to update the corresponding k-bucket. The specific steps are as follows:

  1. Compute the distance between itself and the sender: d(x, y) = x ⊕ y. Note that x and y are node ID values, not IP addresses
  2. Select the k-bucket corresponding to d and update it as follows
  3. If y is already present in the k-bucket, move its entry to the tail
  4. If y is not yet recorded in the k-bucket
    1. If the bucket holds fewer than k entries, insert y's (IP address, UDP port, node ID) at the tail of the queue
    2. If the bucket already holds k entries, send RPC_PING to the entry at the head (say, node z)
      1. If z does not respond, remove z's entry from the bucket and insert y's information at the tail
      2. If z responds, move z's entry to the tail and discard y's information
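The steps above amount to a least-recently-seen eviction policy for one bucket. A single-bucket sketch in Python, assuming an OrderedDict ordered oldest-first and a caller-supplied ping callback (all names are illustrative):

```python
from collections import OrderedDict

K = 20  # bucket capacity (the Kademlia paper's value)

def update_bucket(bucket: "OrderedDict[int, tuple]", node_id: int,
                  contact: tuple, ping) -> None:
    """Apply the k-bucket update rules for a message from `node_id`.

    `bucket` maps node ID -> (ip, port), ordered oldest-first.
    `ping` is a callable returning True if a node answers RPC_PING.
    """
    if node_id in bucket:
        bucket.move_to_end(node_id)          # seen again: move to the tail
    elif len(bucket) < K:
        bucket[node_id] = contact            # room left: append at the tail
    else:
        oldest_id = next(iter(bucket))       # bucket full: probe the head entry
        if ping(oldest_id):
            bucket.move_to_end(oldest_id)    # old node alive: keep it, drop newcomer
        else:
            del bucket[oldest_id]            # old node dead: replace with newcomer
            bucket[node_id] = contact
```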

The k-bucket update mechanism is a very efficient implementation of a least-recently-seen eviction policy, with the twist that a node is never evicted while it is still online. In other words, nodes that have been online for a long time are very likely to stay in the k-bucket list.

Therefore, by retaining long-lived nodes in its k-buckets, Kad significantly increases the probability that those entries are still online in the next period, **which greatly benefits the stability of the Kad network and reduces maintenance cost (nodes do not need to rebuild their routing tables frequently)**.

Another advantage of this mechanism is that it defends against DoS attacks to a certain extent: Kad only replaces k-bucket entries when an old node fails, so flooding the network with new nodes cannot displace existing routing state.

To prevent k-buckets from going stale, a node picks a random entry from every k-bucket it has not updated within a certain period and performs an RPC_PING on it.

These k-bucket mechanisms let Kad ease traffic bottlenecks (not all nodes perform large numbers of updates at the same time) and respond quickly to node failures.


Protocol messages

Kademlia includes four types of remote RPC operations: PING, STORE, FIND_NODE, and FIND_VALUE.

  1. The PING operation probes a node to determine whether it is still online.

  2. The STORE operation instructs a node to store a `<key, value>` pair for future queries.

  3. The FIND_NODE operation takes a 160-bit ID as an argument. The recipient of this operation returns information (IP address, UDP port, Node ID) of K nodes that it knows are closer to the destination ID.

    The entries for these nodes can come from a single k-bucket or from several (when the k-bucket closest to the target ID is not full). In either case, the receiver returns k node entries to the initiator; if all of its k-buckets together hold fewer than k entries, it returns every node it knows about.

  4. The FIND_VALUE operation behaves like FIND_NODE, returning (IP address, UDP port, Node ID) entries for the k closest nodes, with one exception: if the recipient has previously received a STORE operation for the same key, it returns the stored value directly.

    Note: on the Kad network, data is stored as `<key, value>` pairs. According to the author's analysis, in BitSpirit's DHT implementation the key is the info_hash string of the Torrent file, and the value is closely related to the Torrent file itself.

To prevent address forgery, every RPC carries a random 160-bit ID value that the receiver must echo in its reply. In addition, to verify the sender's network address, the receiver can piggyback a PING on its RPC reply (in any of the four operations above, the receiver's reply can carry a PING to check whether the sender is still alive).
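The random echo ID can be sketched as a simple request/reply envelope. The message layout below is invented for illustration and is not Kademlia's actual wire format:

```python
import enum
import secrets

class Rpc(enum.Enum):
    PING = "PING"
    STORE = "STORE"
    FIND_NODE = "FIND_NODE"
    FIND_VALUE = "FIND_VALUE"

def make_request(op: Rpc, payload: dict) -> dict:
    # The sender attaches a random 160-bit (20-byte) echo ID; a forged
    # reply would have to guess it.
    return {"op": op.value, "echo": secrets.token_bytes(20), "payload": payload}

def make_reply(request: dict, result: dict) -> dict:
    # The receiver must copy the echo ID back unchanged.
    return {"op": request["op"], "echo": request["echo"], "result": result}

req = make_request(Rpc.PING, {})
rep = make_reply(req, {"alive": True})
assert rep["echo"] == req["echo"]  # the initiator checks this before trusting the reply
```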


Routing lookup

One of the most important features of Kad technology is its ability to provide a fast node lookup mechanism, which can also be adjusted by parameters.

If node x wants to find the node whose ID value is t, Kad recursively searches the route as follows:

  1. Compute the distance to the target: d = x ⊕ t
  2. Select α nodes from the k-bucket corresponding to d and send each of them a FIND_NODE operation in parallel. If that bucket has fewer than α entries, pick the α nodes closest to d from neighboring buckets instead
  3. Each queried node checks whether it is t itself; if so, it answers that it is the closest to t. Otherwise it measures its own distance to t and returns the α closest nodes it knows from the corresponding k-bucket to x
  4. x performs FIND_NODE again on each newly received node, and the process repeats until every branch yields a node that answers it is the closest to t
  5. Through the above lookups, x obtains the entries of the k nodes closest to t

Note: The term “closest” is used here because the node with ID t does not necessarily exist in the network, that is, t is not assigned to any computer.

Here α, like k, is a parameter chosen for system optimization. BitTorrent implementations use α = 3.
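The lookup steps above can be sketched as a single-threaded loop. Real implementations issue the α queries concurrently; here `find_node` is a caller-supplied callback and all names are illustrative:

```python
ALPHA = 3   # lookup parallelism (the BitTorrent value)
K = 20      # bucket size / result-set size

def lookup(target: int, seeds: list, find_node) -> list:
    """Iterative FIND_NODE: repeatedly query the closest unqueried nodes
    until the k-closest shortlist stops changing.

    `find_node(peer, target)` asks `peer` for the closest nodes it knows
    to `target` and returns their IDs.
    """
    seen = set(seeds)
    queried = set()
    while True:
        # k-closest candidates learned so far, ordered by XOR distance
        shortlist = sorted(seen, key=lambda n: n ^ target)[:K]
        pending = [n for n in shortlist if n not in queried][:ALPHA]
        if not pending:              # every shortlist member queried: converged
            return shortlist
        for peer in pending:
            queried.add(peer)
            seen.update(find_node(peer, target))
```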

When α=1, the query process is similar to Chord’s hop by hop query process, as shown in Figure 4.

The whole route lookup is a recursive process, which can be expressed as:

N0 = x (the initiator of the lookup)

N1 = FIND_NODE_N0(t)

N2 = FIND_NODE_N1(t)

…

Nl = FIND_NODE_N(l-1)(t)

The recursion continues until Nl = t, or until Nl's routing table holds no information about t, in which case the lookup fails.

Since each query obtains nodes from a k-bucket closer to t, every recursive step at least halves the distance (equivalently, removes one bit), so the whole lookup converges in O(log N) steps, where N is the total number of nodes in the network.

When x wants to query a `<key, value>` pair, the procedure resembles a node lookup: x selects the k nodes whose IDs are closest to the key, performs FIND_VALUE on them, and repeats FIND_VALUE on each newly returned node, until some node returns the value.

Once a FIND_VALUE operation succeeds, the `<key, value>` pair is cached on the nearest node that did not return the value, so the next query for the same key completes much faster. In this way, the cache range of a popular `<key, value>` gradually expands, giving the system excellent response speed. (Cached entries live for up to 24 hours, while the target node republishes its contents to its nearest neighbors every hour to refresh the data's expiry timer; the lifetime of a cached entry is inversely proportional to its distance from the target node.)


Data storage

The process of storing a `<key, value>` pair is as follows:

  1. The initiator first locates the k nodes whose IDs are closest to the key
  2. The initiator issues STORE operations to those k nodes
  3. Each of the k nodes that performed the STORE republishes all of its own `<key, value>` data every hour
  4. To limit stale information, all `<key, value>` data expires 24 hours after its initial publication
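The republish interval and 24-hour expiry from the steps above can be sketched as a node-local store. The class and explicit time handling here are my own illustration, not Kad's actual storage layer:

```python
import time

REPUBLISH_INTERVAL = 3600      # step 3: republish every hour
EXPIRY = 24 * 3600             # step 4: data expires 24 h after publication

class KeyValueStore:
    """Node-local <key, value> storage with Kademlia-style expiry."""

    def __init__(self):
        self._data = {}        # key -> (value, published_at)

    def put(self, key, value, now=None):
        # A fresh STORE (or an hourly republish) resets the publication time.
        self._data[key] = (value, time.time() if now is None else now)

    def get(self, key, now=None):
        now = time.time() if now is None else now
        entry = self._data.get(key)
        if entry is None or now - entry[1] > EXPIRY:
            return None        # missing, or expired and treated as gone
        return entry[0]
```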

In addition, to keep data publishing and lookup consistent, it is stipulated that whenever a node w finds a new node u that is closer to some of w's `<key, value>` data than w itself, w replicates that data to u without deleting it from its own store.


Node joining and leaving

If node U wants to join the Kad network, it must contact a node that is already on the Kad network, such as w.

U first inserts w into its own appropriate k-bucket, then performs a FIND_NODE operation on its own node ID (that is, it asks w to look up U itself), and then updates its k-buckets based on the information received. By gradually querying its neighbors from near to far, U fills in its initially empty k-buckets while simultaneously publishing its own information into the k-buckets of other nodes.

For example, the routing table of node U is generated as follows:

  1. Initially, U's routing table consists of a single k-bucket covering the entire 160-bit ID space, as shown in Figure 6.
  2. When U learns of a new node, it tries to insert the node into the appropriate k-bucket according to its prefix:
    1. If that k-bucket is not full, the new node is inserted directly.
    2. If that k-bucket is full,
      1. If the bucket's coverage range includes U's own ID, the bucket is split into two new k-buckets of equal size, and the entries of the original bucket are redistributed according to the new buckets' prefixes
      2. If the bucket's coverage range does not include U's ID, the new node's information is simply discarded
  3. Repeating this process yields a routing table with the structure shown in Table 1: more information about nearby nodes and less about distant ones, so route lookups converge quickly.
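The split rule in step 2 can be sketched as follows; `bits` is lowered to 4 in the example so it mirrors Figure 7, and all names are illustrative:

```python
def split_bucket(bucket, prefix, prefix_len, bits=160):
    """Split a full k-bucket into two buckets one bit deeper,
    redistributing entries by the next bit of their IDs.

    `bucket` holds node IDs (integers of `bits` bits) that all share the
    top `prefix_len` bits given by `prefix`.
    """
    children = {(prefix << 1): [], (prefix << 1) | 1: []}
    for node_id in bucket:
        next_bit = (node_id >> (bits - prefix_len - 1)) & 1
        children[(prefix << 1) | next_bit].append(node_id)
    return children

# Figure 7's example with 4-bit IDs: full bucket 010 splits into 0100 and 0101.
children = split_bucket([0b0100, 0b0101], prefix=0b010, prefix_len=3, bits=4)
print(children)  # {4: [4], 5: [5]}, i.e. {0b0100: [0b0100], 0b0101: [0b0101]}
```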

Figure 7 demonstrates how a k-bucket is progressively split when its coverage range includes the node's own ID.

When k-bucket 010 becomes full, its coverage range includes the ID of node 0100, so it is split into two new k-buckets, 0100 and 0101, and the entries of bucket 010 are redistributed between them according to their prefixes. Note that 160-bit IDs are not used here; 4-bit IDs keep the illustration simple, while a real Kad network uses 160-bit ID values.

A node leaves the Kad network without publishing anything, and one of the Kademlia protocol's goals is to keep working even when arbitrary nodes fail at any time. To this end, Kad requires every node to republish all of its `<key, value>` pairs periodically (typically every hour) and cache the data on its k closest neighbors, so that data stored on a failed node is quickly restored onto other live nodes. Thus a node that leaves simply leaves, and the k-bucket refresh mechanism on the remaining nodes ensures that entries for nodes no longer online are removed from local k-buckets.



