About Ethereum

Structure of the peer-to-peer communication protocol family

It is divided into three layers:

  • 1. The first layer is pkg eth. eth.peer{} represents the remote communication object and all of its communication operations; it wraps the lower-layer p2p.Peer{} object and the read/write channel. eth.peerSet{} is the set type for peers. Both are used by higher-level managers such as eth.Ethereum and eth.ProtocolManager.
  • 2. The second layer, pkg p2p, can be regarded as a general-purpose P2P communication framework. p2p.Peer{} represents the remote communication object, conn{} wraps the lower-level connection object, protoRW{} is the channel object used for communication, and p2p.Server{} is the object that starts listening and handles new connections and disconnections. p2p.Protocol{} is a type designed for upper-layer applications: it holds the callback functions the upper layer needs, and p2p.Server{} passes them to the peer when a new connection is established.
  • 3. The third layer is Go's networking packages, in two parts: pkg net, which contains the interfaces representing network connections and network addresses together with their implementations; and pkg syscall, which contains the lower-level network system calls and can be seen as the system-level wrapper around the network-layer (IP) and transport-layer (TCP) protocols.

Classified by logical function, there are also three layers:

1. UDP-based neighbor discovery layer;

The UDP-based neighbor discovery layer uses the Kad (Kademlia P2P network protocol) node discovery mechanism. It discovers neighbors by exchanging ping/pong handshakes with other nodes on the network, calculates the distance between nodes, and dynamically maintains the neighbor table.

2. TCP-based encrypted communication layer;

The TCP-based encrypted communication layer performs a handshake with the nodes found by the discovery layer to establish secure encrypted connections. It is responsible for encoding and decoding, encryption and decryption, and secure transmission of the data supplied by the core protocol layer. RLPx is the protocol actually used at this layer.

3. Core protocol layer.

The core protocol layer is responsible for passing the business data that needs to be sent to the encrypted communication layer and processing the business data received from the encrypted communication layer.

In theory, TCP or UDP communication is possible as long as you know the other party's IP address and port number. There are two ways to find that address: 1. an initial connection (seed nodes); 2. address propagation and discovery (flooding, distributed hash tables).

C/S mode

In C/S mode, clients obtain services through the server. The server can offer a rich variety of services, but the drawback is obvious: centralization. The security of all information rests with the server. Its network topology is as follows:

As noted, any two hosts can communicate once they know each other's IP address and port number. Take chat software as an example: you normally establish a connection with the server rather than directly with the person you are chatting with. You therefore do not need to know the other side's IP address and port number; only the server does, and everything you do has to pass through the server.

Blockchain P2P network classification: centralized P2P network, fully distributed unstructured P2P network, fully distributed structured P2P network, semi-distributed P2P network.

In terms of network topology, C/S closely resembles the centralized P2P model (the network topology is shown below).

The difference is that the centralized P2P model is peer to peer: both parties can provide services to each other and consume each other's services. The central server in P2P also differs from the central server in C/S in that it only provides an index, while a traditional server provides the full service. A typical exchange runs as follows: the IP address and port of the central server are known in advance; a new node joins the network and reports its own IP address and port number to the central server; other nodes can then query the server for that address and communicate with the new node directly.
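
The register-then-query flow described above can be sketched in a few lines. This is only an in-memory illustration: the IndexServer class and its register() and lookup() methods are hypothetical names, the addresses are made up, and no real sockets are opened.

```python
class IndexServer:
    """Hypothetical central index: it stores addresses only, never the data itself."""

    def __init__(self):
        self.peers = {}  # node name -> (ip, port)

    def register(self, name, ip, port):
        # A joining node reports its own address to the index.
        self.peers[name] = (ip, port)

    def lookup(self, name):
        # Other nodes query the index, then connect to the peer directly.
        return self.peers.get(name)


index = IndexServer()
index.register("alice", "10.0.0.1", 30303)
index.register("bob", "10.0.0.2", 30303)
bob_addr = index.lookup("bob")  # alice would now dial bob_addr herself
```

The server never relays traffic between alice and bob; it only answers "where is bob?", which is the difference from the C/S server.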

The network topology of the fully distributed model is as follows. This model is truly decentralized, with nodes joining and leaving freely. That raises a question: how do you obtain the target's address? There are two parts to the answer. The initial connection goes through seed nodes; after that, addresses are found by propagation, using flooding in unstructured networks and a distributed hash table in structured networks.

The fully distributed unstructured model appeared first. A relatively successful example is Gnutella, which discovers other nodes by flooding combined with a random forwarding mechanism, and decrements a TTL (Time to Live) value on each hop to limit how far a message propagates. However, finding nodes by brute force in this way only works once the network has accumulated enough nodes. When the network has just been established, how does a new node learn about the other nodes? Gnutella also uses directory servers to find the addresses of other nodes.
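
A minimal sketch of TTL-limited flooding, under simplifying assumptions: the flood() function and the five-node line topology are invented for illustration, every node forwards to all of its neighbors (the random forwarding mentioned above is left out), and duplicate messages are dropped.

```python
from collections import deque


def flood(graph, start, ttl):
    """Return the set of nodes reached by flooding from `start` with hop limit `ttl`."""
    seen = {start}
    queue = deque([(start, ttl)])
    while queue:
        node, remaining = queue.popleft()
        if remaining == 0:
            continue  # TTL exhausted: this node does not forward the message
        for neighbor in graph[node]:
            if neighbor not in seen:  # drop duplicates so the message is not re-flooded
                seen.add(neighbor)
                queue.append((neighbor, remaining - 1))
    return seen


# Hypothetical line topology: A - B - C - D - E
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
reached = flood(graph, "A", ttl=2)  # the message dies after two hops
```

With ttl=2 the message from A stops at C; D and E never see it, which is how the TTL keeps flooding bounded.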

The fully distributed unstructured model with seed nodes is the solution used in Bitcoin. Seed nodes are stable nodes that provide the initial entry point to the network at first startup. A new node connects to other nodes through them and can continuously obtain lists of node addresses in the blockchain network. Once a new node has joined, it can in turn act as an intermediary through which other nodes obtain addresses; this is called address broadcast.

A) Actively pushing its own address. Bitcoin advertises its node information to other nodes through net.AdvertiseLocal(). This is an active, one-way process: after receiving the address, node B saves it locally and does not respond. Node B can likewise push its own address to the nodes it is connected to, so that connected nodes learn each other's addresses by pushing.

B) Actively fetching (pulling) the addresses of other nodes. Pushing alone is not enough, because a node's address only reaches the nodes it is already connected to. Bitcoin therefore has a second address broadcast mechanism, which actively retrieves addresses through net.GetAddresses(): node A requests addresses from node B, and node B returns the addresses it holds. To avoid wasting network resources, each request is answered only once. Node B can also request addresses from other nodes to expand its own address store.

C) Address database. To avoid being limited by the connection count and bandwidth of the seed nodes, a Bitcoin node that has joined the network collects information about other nodes through address broadcast and saves the results for use the next time it connects. The Bitcoin client stores the collected node information in a local file named peers.dat, serialized in a custom format.
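
The push and pull mechanisms can be contrasted with a toy model. Everything here is an assumption made for illustration: the Node class, push_to() and pull_from() are hypothetical names rather than Bitcoin functions, and the `known` set stands in for the on-disk address database.

```python
class Node:
    """Hypothetical node holding a local address book (stand-in for peers.dat)."""

    def __init__(self, addr):
        self.addr = addr
        self.known = set()  # addresses learned so far

    def push_to(self, other):
        # Push: one-way announcement of our own address; the receiver stores it and does not reply.
        other.known.add(self.addr)

    def pull_from(self, other):
        # Pull: ask a peer for the addresses it holds and merge them, excluding our own.
        self.known |= (other.known | {other.addr}) - {self.addr}


seed = Node("seed:8333")
a, b = Node("a:8333"), Node("b:8333")
a.push_to(seed)    # seed learns a
b.push_to(seed)    # seed learns b
b.pull_from(seed)  # b learns seed and a
```

After these three calls b knows about a without ever having connected to it, while a, which only pushed, has learned nothing. That asymmetry is why pushing alone is not sufficient.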

The fully distributed structured model addresses the weaknesses of the unstructured model: node addresses are hard to manage, no fixed rules constrain the relationships between nodes, node information cannot be located precisely, and lookups rely on flooding queries that consume a great deal of network traffic. A structured network uses a distributed hash table (DHT), which maps the addresses of different nodes to fixed-length data with a hash function. Relatively successful examples include Chord and Pastry. The network topology is as random and unstructured as in the unstructured model, but node management follows a fixed structure.

Ethereum turns each node's elliptic-curve public key into a 64-byte NodeID that serves as the unique identifier distinguishing nodes, which lets Ethereum perform precise node address lookups without a central server.

A DHT hashes the nodes of a P2P network to fixed-length data, so the whole network forms one huge hash table. Each participating node holds part of the table and stores and maintains its own data; the table is spread across every node in the P2P network. Every node that joins has its own ID positioned in the hash table, can find more nodes through the DHT, and can be located precisely by other nodes from its ID. Although a DHT lets nodes join and leave freely, its complex maintenance mechanism makes it poorly suited to high-frequency node churn. Ethereum uses the Kademlia protocol (Kad), a DHT protocol with which addresses can be looked up quickly and accurately.

Introduction to the Kad algorithm

Compared with traditional DHT, Kad has the following advantages:

A) The query request of Kad is parallel and asynchronous, which can avoid the query failure caused by node exit or failure.

B) Kad reduces the number of configuration messages that nodes must send to learn about each other, and exchanges configuration information automatically whenever a lookup occurs.

C) Kad arranges nodes in a binary tree and divides them into multiple Kad buckets to simplify the query structure. Distance is calculated with the unidirectional XOR metric, which ensures that all queries for the same TargetID converge along the same path. This reduces the network traffic spent on queries between nodes and improves query efficiency.

D) The algorithm used by nodes to record whether other nodes are available can prevent some common denial of service (DoS) attacks.
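
The properties of the XOR metric that point C) relies on can be checked exhaustively on a tiny ID space. The sketch below uses 4-bit IDs and a hypothetical xor_distance() helper; real NodeIDs are far longer, but the same properties hold for any bit length.

```python
def xor_distance(a, b):
    """Kademlia-style distance: the XOR of two IDs, read as an integer."""
    return a ^ b


ids = range(16)  # every 4-bit ID
x = 0b1011

# Unidirectional: from x, each distance value is taken by exactly one ID.
distances = [xor_distance(x, y) for y in ids]
unidirectional = len(set(distances)) == len(ids)

# Symmetric: the distance is the same in both directions.
symmetric = all(xor_distance(a, b) == xor_distance(b, a) for a in ids for b in ids)

# Triangle inequality: going via a third node is never shorter.
triangle = all(
    xor_distance(a, c) <= xor_distance(a, b) + xor_distance(b, c)
    for a in ids for b in ids for c in ids
)
```

Because only one ID lies at any given distance from x, lookups for the same target started from different nodes end up on the same final path.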

Ethereum's Kad converges differently from traditional Kad. Traditional Kad uses only the node's own selfID as the convergence target, whereas Ethereum uses both its own selfID and randomly generated TargetIDs as convergence targets, in order to populate a wider range of the hash table.

How is accurate address lookup achieved?

NodeID = node identifier (bit length of the node's elliptic-curve public key / 8) = 64 bytes of data, which fixes the ID length of every node.

nBuckets = number of Kad buckets = len(common.Hash{}) * 8 / 15 = 32 * 8 / 15 = 17 (integer division). This is the number of buckets each node maintains in its routing table.

len(common.Hash{}) * 8 - nBuckets = 256 - 17 = 239, which is the log distance of the nearest Kad bucket.

bucketSize = 16: each bucket holds up to 16 nodes. The more nodes each bucket holds, the faster lookups become, but the more resources are consumed.

refreshInterval = 30 * time.Minute: the refresh timer, which sets the refresh frequency.

revalidateInterval = 10 * time.Second: the revalidation timer.

copyNodesInterval = 30 * time.Second: the persistence timer, which sets the persistence frequency.
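
The arithmetic behind these parameters is easy to verify. The snippet below simply recomputes the numbers quoted above; the variable names are chosen for this example, and table_capacity is a derived figure (buckets times bucket size) that the text does not state explicitly.

```python
HASH_BYTES = 32                    # len(common.Hash{}) as given in the text
hash_bits = HASH_BYTES * 8         # 256 bits
n_buckets = hash_bits // 15        # integer division, as in Go: 17
bucket_min_distance = hash_bits - n_buckets  # 256 - 17 = 239
node_id_bytes = 512 // 8           # a 512-bit public key gives a 64-byte NodeID
bucket_size = 16
table_capacity = n_buckets * bucket_size     # upper bound on routing table entries
```

With 17 buckets of 16 entries, a node's routing table holds at most 272 peers regardless of how large the network grows.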

How does the Kad algorithm find the target node exactly?

Each node maintains an ID that uniquely identifies it. Given such identifiers, how is a node found? The key idea of the algorithm is to use the XOR operation to measure the logical distance between two nodes. XOR the targetID with the selfID: the position of the highest 1 bit in the result gives the order of magnitude of the distance between them, and this determines the Kad bucket. This property of XOR naturally yields a binary tree. Reading from left to right, from the high bits to the low bits, every bit on which the two IDs agree XORs to 0; at the first bit where they differ the result is 1, and the paths fork there. Since each bit is either 0 or 1, the structure is a binary tree.
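
The "highest 1 bit of the XOR" rule can be written down directly. In this sketch, log_distance() and bucket_index() are hypothetical helpers, IDs are plain Python integers, and the mapping from distance to bucket number is an assumption made for illustration: all distances up to 239 share bucket 0, and each larger distance gets its own bucket, which yields exactly 17 buckets for 256-bit IDs.

```python
def log_distance(a, b):
    """1-based position of the highest differing bit; 0 means the IDs are equal."""
    return (a ^ b).bit_length()


def bucket_index(a, b, hash_bits=256, n_buckets=17):
    """Map a log distance to a bucket number in 0..n_buckets-1 (assumed mapping)."""
    min_distance = hash_bits - n_buckets  # 239 with the defaults
    d = log_distance(a, b)
    return 0 if d <= min_distance else d - min_distance - 1


# 8-bit example: the IDs first differ at bit 5 (counting from the low end).
d_small = log_distance(0b10110000, 0b10100110)  # XOR is 0b00010110

far = bucket_index(0, 1 << 255)   # differs in the top bit: last bucket
near = bucket_index(0, 1)         # differs only in the lowest bit: first bucket
```

Half of all possible IDs differ from a given ID in the top bit, so the farthest bucket covers half the ID space, the next one a quarter, and so on. That halving is the binary tree described above.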

As long as the nodes agree that ID numbers determine the ordering of entries in each node's “address book”, the target can always be found. After locating the corresponding bucket, ask the node in it that is closest to the target ID (a node already in your “address book”) to search in the same way, and to return the target's “contact information” once it is found. In this way the target node is found quickly, in only about log2 N steps.

Query steps:

(a) Use the closest() method on the local buckets to obtain the known nodes closest to the target node.

(b) If the target is not among those nodes, use Lookup() to search further.

(c) Lookup() asks the known nodes for their neighbors with findNode(), querying the alpha (3) nearest nodes at a time and repeating recursively on the nodes they return.
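
The three steps can be sketched as an iterative search over a toy network. This is not the go-ethereum implementation: closest() and lookup() here are simplified stand-ins, the `tables` dictionary replaces real findnode messages, queries run sequentially rather than in parallel, and the 4-bit IDs are invented.

```python
ALPHA = 3  # number of nodes queried per round, from the text
K = 16     # bucket size, from the text


def closest(known, target, k=K):
    """Return up to k IDs from `known`, nearest to `target` by XOR distance."""
    return sorted(known, key=lambda n: n ^ target)[:k]


def lookup(tables, start, target):
    """Repeatedly ask the nearest known nodes for nodes even nearer to `target`."""
    result = closest(tables[start], target)
    asked = {start}
    while True:
        pending = [n for n in result if n not in asked][:ALPHA]
        if not pending:
            return result  # every candidate has been queried: the search has converged
        for n in pending:  # a real client would send these requests concurrently
            asked.add(n)
            reply = closest(tables[n], target)  # stands in for a findnode reply
            result = closest(set(result) | set(reply), target)


# Hypothetical 4-bit network in which each node knows only a few others.
tables = {
    0b0001: {0b0100},
    0b0100: {0b0001, 0b1000},
    0b1000: {0b0100, 0b1100},
    0b1100: {0b1000, 0b1110},
    0b1110: {0b1100},
}
found = lookup(tables, start=0b0001, target=0b1110)
```

Starting from node 0001, which knows only 0100, each round brings in a node closer to the target until 1110 itself appears at the head of the result list.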
