Introduction

This series of articles records the process of building a distributed KV storage engine on top of LevelDB (RocksDB). The corresponding source code is in DSTORE.

This article analyzes how the client side of the network framework is implemented. It is divided into two parts: the functional requirements of the client, and the client-side implementation.

Client function requirements

In a distributed system, a server acts as both a server and a client when it interacts with other servers, so the client implementation in the network framework is just as important. Generally, the client in a network framework provides at least the following interfaces:

  • Connect: connects to another server. Because the caller is itself a server program that requires high performance, this operation must be non-blocking.
  • Read: reads the data sent by the server.
  • Write: writes data to the server.

Client-side implementation

This section discusses the implementation of various interfaces required by clients in the network framework.

connect

Let’s first look at the blocking connect, which is typically used as follows:

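/* Blocks until the TCP handshake completes or fails. */
int ret = connect(fd, (struct sockaddr *)&server_addr, server_len);
/* ret == 0: connected, so the socket can be used right away. */
send(fd, data, data_len, 0);
recv(fd, data, data_len, 0);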

The blocking connect call and its TCP state transitions proceed as follows:

  1. When the client calls the blocking connect, the operating system sends a SYN packet to the server, and the client’s TCP state changes to SYN_SENT.
  2. The server, which is listening and typically waiting in accept, receives the SYN. Its operating system sets the connection’s TCP state to SYN_RCVD and replies with a SYN+ACK packet that acknowledges the client’s SYN.
  3. When the client receives the server’s SYN+ACK, the operating system sets the TCP state to ESTABLISHED and sends an ACK packet to the server. The blocked connect call then returns.
  4. When the server receives the client’s ACK, its operating system sets the TCP connection state to ESTABLISHED, and the server’s accept call returns.

As the call flow shows, a blocking connect does not return until the server’s acknowledgement packet reaches the client, so the call waits for at least one network round trip. A server program cannot afford to block while a connection is being established, so a non-blocking connect must be used.

With a non-blocking connect, the call flow and TCP state transitions are as follows:

  1. When the client calls connect, the operating system sends a SYN packet to the server and the client’s TCP state changes to SYN_SENT. The connect call then returns immediately, without waiting for the handshake to finish.
  2. The server’s operating system receives the SYN, sets the connection’s TCP state to SYN_RCVD, and replies with a SYN+ACK packet that acknowledges the client’s SYN.
  3. When the client receives the server’s SYN+ACK, the operating system sets the TCP state to ESTABLISHED and sends an ACK packet to the server.
  4. When the server receives the ACK, its operating system sets the TCP state to ESTABLISHED, and the server’s accept call returns.
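
The first step of this flow, where connect returns before the handshake finishes, can be sketched in C roughly as follows. This is a minimal illustration and not the DSTORE code: the helper name start_nonblocking_connect and its in_progress output parameter are assumptions made for this example.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not part of DSTORE): starts a non-blocking connect
 * to ip:port. Returns the socket fd (>= 0), or -1 on immediate failure.
 * *in_progress is set to 1 if the handshake is still running and to 0 if
 * the connection was already established when connect() returned. */
int start_nonblocking_connect(const char *ip, unsigned short port,
                              int *in_progress)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Switch the socket to non-blocking mode before calling connect(). */
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fd);
        return -1;
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1) {
        close(fd);
        return -1;
    }

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0) {
        *in_progress = 0;   /* established immediately */
        return fd;
    }
    if (errno == EINPROGRESS) {
        *in_progress = 1;   /* SYN sent, handshake continues in the kernel */
        return fd;
    }

    close(fd);              /* any other errno is a real failure */
    return -1;
}
```

When in_progress is 1, the caller would register the fd with its event loop and wait for a writable event before treating the connection as usable.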

With a non-blocking connect, the caller does not wait for the server’s acknowledgement packet, but the network framework must then detect when the connection is actually established and notify the application.

The Linux manual page for connect(2) describes this case as follows:

EINPROGRESS The socket is nonblocking and the connection cannot be completed immediately. It is possible to select(2) or poll(2) for completion by selecting the socket for writing. After select(2) indicates writability, use getsockopt(2) to read the SO_ERROR option at level SOL_SOCKET to determine whether connect() completed successfully (SO_ERROR is zero) or unsuccessfully (SO_ERROR is one of the usual error codes listed here, explaining the reason for the failure).

As described above, if a non-blocking connect returns -1 with errno set to EINPROGRESS, you need to use select, poll, or epoll to wait for a writable event on the socket, and then call getsockopt to check whether an error occurred. If there is no error, the connection has been established successfully. If connect returns 0, the connection was already established by the time the call returned, just as with a blocking connect.
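
The completion check from the manual excerpt can be sketched as follows, given a socket on which a non-blocking connect has already been started. The function name finish_nonblocking_connect and the timeout parameter are assumptions for illustration. The sketch uses poll(2) to stay short, whereas an event-driven framework would more likely get the writable event from its epoll loop.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>

/* Hypothetical helper: waits until a non-blocking connect() on fd finishes.
 * Returns 0 if the connection is established, -1 otherwise (errno is set). */
int finish_nonblocking_connect(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLOUT, .revents = 0 };

    /* Wait for the socket to become writable, retrying if interrupted. */
    int n;
    do {
        n = poll(&pfd, 1, timeout_ms);
    } while (n < 0 && errno == EINTR);

    if (n < 0)
        return -1;
    if (n == 0) {
        errno = ETIMEDOUT;  /* no event within timeout_ms */
        return -1;
    }

    /* Writability alone does not mean success: read SO_ERROR to find out. */
    int err = 0;
    socklen_t len = sizeof(err);
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        return -1;
    if (err != 0) {
        errno = err;        /* e.g. ECONNREFUSED */
        return -1;
    }
    return 0;
}
```

Reading SO_ERROR is the important step here, because a failed connection attempt also wakes up the writability wait.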

The entire non-blocking connect logic is implemented in tcp_client, where the non-blocking connect call itself is made in connect.

read

Read is implemented the same way as on the server side, so it is not described again here.

write

The implementation of write differs from the server side, because a writable event has to be handled according to the state of the connection:

  • If the connection is in the connecting state, then when a writable event is received, getsockopt is called to check whether the connection was established properly. If so, the connection is set to the connected state.
  • If the connection is in the connected state, then when a writable event is received, write is called to try to write data to the kernel buffer.
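
The two cases above can be sketched as a small state machine driven by the writable event. The struct conn type, the conn_state enum, and the on_writable function are hypothetical names chosen for this example, not the types used in DSTORE.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical connection states and bookkeeping for this sketch. */
enum conn_state { CONN_CONNECTING, CONN_CONNECTED, CONN_FAILED };

struct conn {
    int fd;
    enum conn_state state;
    const char *out;    /* bytes still waiting to be written */
    size_t out_len;     /* number of pending bytes */
};

/* Called by the event loop when c->fd reports a writable event. */
void on_writable(struct conn *c)
{
    if (c->state == CONN_CONNECTING) {
        /* The writable event signals that connect() has finished;
         * SO_ERROR tells whether it succeeded. */
        int err = 0;
        socklen_t len = sizeof(err);
        if (getsockopt(c->fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0 ||
            err != 0) {
            c->state = CONN_FAILED;
            return;
        }
        c->state = CONN_CONNECTED;
        /* Fall through so that queued data is flushed right away. */
    }

    if (c->state == CONN_CONNECTED && c->out_len > 0) {
        ssize_t n = write(c->fd, c->out, c->out_len);
        if (n > 0) {
            /* Partial writes are normal: keep the rest for the next event. */
            c->out += n;
            c->out_len -= (size_t)n;
        } else if (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK &&
                   errno != EINTR) {
            c->state = CONN_FAILED;
        }
    }
}
```

Flushing pending data in the same call that completes the connection is a design choice made here so the sketch behaves the same under level-triggered and edge-triggered notification. A framework could equally wait for the next writable event.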

The first three articles in this series described how to design and implement the network framework. The following articles will focus on how to design and implement an RPC library on top of that framework. Stay tuned.
