Original: Little sister taste (WeChat public account ID: xjjdog). Feel free to share; if you reprint, please keep the source.

Today I'll introduce a feature worth bragging about: migrating socket handles between processes!

We run a large number of service instances on our servers. Each instance carries hundreds of thousands of connections and very busy network traffic. Working with that many connections and that much traffic is every Internet programmer's dream.

But software is always being updated, and an upgrade means stopping the old instance and starting a new one. Tens of seconds pass between the stop and the start, and that is before counting Java's startup time, which can feel as long as having a baby.

The traditional way is to remove the instance from the load balancer and add it back once it has started. For microservices, you isolate the instance first and restore it after startup. With a large number of applications, these operations are a nightmare.

1. Zero downtime update

Is there a way to transfer the sockets held by one process to another process? Then, when I upgrade, I can start the upgraded version of the process and transfer the old process's sockets to it, one by one.

That would achieve a zero downtime update.

It is possible. Facebook did something similar, and they called it Socket Takeover. Don't search for this keyword on Baidu; you'll probably get a bunch of garbage.

It's such a great technology and so useful, so why does nobody know about it? Don't ask me. I don't know. Maybe people are too busy studying the different ways to write the character "hui" in "fennel beans".

So today, let xjjdog introduce it to you and give you something to brag about in the future.

This awesome feature is implemented by a pair of low-level Linux system calls: sendmsg() and recvmsg(). The send function also sends data over a socket, but only when the socket is connected. In contrast, sendmsg works on both connected and unconnected sockets, and it can also carry ancillary (control) data alongside the payload.

2. Technical points

In C network programming, you first bind the listening address with the bind function, start listening with the listen function, and then accept new connections with the accept function. For example:

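int listen_fd = socket(addr->ss_family, SOCK_STREAM, 0);
...
bind(listen_fd, (struct sockaddr *) addr, addrlen);
listen(listen_fd, SOMAXCONN);
...
/* accept on the listening fd; peer_addr receives the client's address */
int accept_fd = accept(listen_fd, (struct sockaddr *) &peer_addr, &peer_addrlen);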
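For readers who want something they can compile, here is a minimal sketch of the same bind/listen sequence. The helper name make_tcp_listener, the loopback address, and the backlog value are assumptions made for illustration; they are not from the original service.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket listening on 127.0.0.1:port.
 * Returns the listening fd, or -1 on error. Port 0 lets the kernel pick. */
int make_tcp_listener(unsigned short port)
{
    struct sockaddr_in addr;
    int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (listen_fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    /* bind registers the address, listen marks the socket as passive */
    if (bind(listen_fd, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
        listen(listen_fd, SOMAXCONN) < 0) {
        close(listen_fd);
        return -1;
    }
    return listen_fd;
}
```

A caller would then loop on accept(listen_fd, NULL, NULL) to obtain one accept_fd per incoming connection. It is this listen_fd, and later the accepted fds, that the rest of the article moves between processes.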

So the first thing we need to do is pass the listen_fd from one process to another. How do we send it? There has to be a channel. On Linux, that channel is UDS, Unix Domain Sockets.

2.1 Unix Domain Sockets Listening

On Linux, a UDS (Unix Domain Socket) is represented as a file. Instead of listening on a port, a process can listen on a UDS file, such as /tmp/xjjdog.sock. Data transferred through this file never touches physical devices such as network adapters, so transfer over a UDS is very fast.

But today we don't care how fast it is; we care how useful it is. Using the bind function, we can receive connections through this file, just as we receive connections on a port.

struct sockaddr_un addr;
char *path = "/tmp/xjjdog.sock";
int err, fd, accept_fd;
socklen_t addrlen;
fd = socket(AF_UNIX, SOCK_STREAM, 0);
memset(&addr, 0, sizeof(struct sockaddr_un));
addr.sun_family = AF_UNIX;
/* leave room for the terminating NUL; addr was zeroed above */
strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
addrlen = sizeof(addr);
err = bind(fd, (struct sockaddr *) &addr, addrlen);
...
accept_fd = accept(fd, (struct sockaddr *) &addr, &addrlen);
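The fragment above can be fleshed out into a small, compilable helper. This is only a sketch: the name make_uds_listener, the backlog of 16, and the choice to unlink a stale socket file before binding are assumptions for illustration.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: listen on a Unix domain socket file at `path`.
 * Returns the listening fd, or -1 on error. */
int make_uds_listener(const char *path)
{
    struct sockaddr_un addr;
    int fd;

    /* sun_path is a fixed-size array; reject paths that do not fit */
    if (strlen(path) >= sizeof(addr.sun_path))
        return -1;

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);   /* length was checked above */

    unlink(path);                  /* assumption: a stale file may be left over */
    if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
        listen(fd, 16) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The old process would call something like make_uds_listener("/tmp/xjjdog.sock") and accept on the returned fd; the upgraded process connects to the same path with connect().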

Like this. Other processes can then connect to our service in two different ways.

  1. Through the port: normal service. Business requests are handled and business data is returned as usual
  2. Through the UDS: the peer starts receiving listen_fd and the accept_fds, so the sockets are migrated without stopping the service

2.2 Key points of FD migration technology

How do you migrate? Let's focus on the second way.

When the newly upgraded service connects through the UDS, we start using the sendmsg function to transfer listen_fd to it.

Let’s take a look at the arguments to the sendmsg function.

ssize_t sendmsg(
    int socket,
    const struct msghdr *message,
    int flags
);

The socket argument is our UDS connection. The key is the msghdr structure.

struct msghdr {
    void            *msg_name;      /* optional address */
    socklen_t       msg_namelen;    /* size of address */
    struct          iovec *msg_iov; /* scatter/gather array */
    int             msg_iovlen;     /* # elements in msg_iov */
    void            *msg_control;   /* ancillary data, see below */
    socklen_t       msg_controllen; /* ancillary data buffer len */
    int             msg_flags;      /* flags on received message */
};

Here, msg_iov holds the data to be sent normally, such as HelloWorld. In addition, two fields carry ancillary data and provide the extra functionality: msg_control and msg_controllen. msg_control points to another structure, cmsghdr.

struct cmsghdr {
    socklen_t cmsg_len;    /* data byte count, including header */
    int       cmsg_level;  /* originating protocol */
    int       cmsg_type;   /* protocol-specific type */
    /* followed by */
    unsigned char cmsg_data[];
};

Within this structure, there is a member variable called cmsg_type, which is the key to socket migration.

It has three types.

  • SCM_RIGHTS
  • SCM_CREDENTIALS
  • SCM_SECURITY

SCM_RIGHTS is the one we need. It allows us to send file handles from one process to another.

struct msghdr msg;
...
struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
cmsg->cmsg_level = SOL_SOCKET;
cmsg->cmsg_type = SCM_RIGHTS;

// The socket fd list is set on cmsg_data
int *fds = (int *) CMSG_DATA(cmsg);

With sendmsg, the socket handle is sent to the other process.
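To make the sending side concrete, here is a minimal sketch that passes a single descriptor with SCM_RIGHTS. The helper name send_fd and the one-byte payload 'F' are assumptions for illustration; the original service may frame its messages differently.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Hypothetical helper: send one descriptor over a connected Unix domain
 * socket. Returns 0 on success, -1 on error. */
int send_fd(int uds, int fd_to_send)
{
    /* On stream sockets, ancillary data should travel with at least
     * one byte of ordinary payload. */
    char payload = 'F';
    struct iovec iov = { .iov_base = &payload, .iov_len = 1 };

    /* The union keeps the control buffer aligned for struct cmsghdr. */
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&msg, 0, sizeof(msg));
    memset(&ctrl, 0, sizeof(ctrl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));

    /* Exactly one payload byte should have been written. */
    return sendmsg(uds, &msg, 0) == 1 ? 0 : -1;
}
```

The sender still holds its own copy of the descriptor after the call, so the old process can keep serving until it decides to close it.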

3. Receive and restore

On the other side, the recvmsg function receives this data and restores it into a cmsghdr structure. We can then get the handle list from cmsg_data.

Why is that possible? Because a socket handle inside a process is really just a reference. The actual socket object lives in the kernel. Migration simply adds a reference to that object in the receiving process; once the old process closes its own copy, only the new process holds it.
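A receiving-side sketch shows where the new reference comes from. The helper name recv_fd is an assumption, and the code expects the peer to send exactly one descriptor together with one byte of payload.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Hypothetical helper: receive one descriptor sent with SCM_RIGHTS.
 * Returns the new fd number in this process, or -1 on error. */
int recv_fd(int uds)
{
    char payload;
    struct iovec iov = { .iov_base = &payload, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd = -1;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(uds, &msg, 0) <= 0)
        return -1;

    /* Walk the control messages and pick out the SCM_RIGHTS one. */
    for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL;
         cmsg = CMSG_NXTHDR(&msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET &&
            cmsg->cmsg_type == SCM_RIGHTS &&
            cmsg->cmsg_len == CMSG_LEN(sizeof(int))) {
            memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
            break;
        }
    }
    return fd;
}
```

The number returned is usually different from the number the sender used. The kernel installs a fresh descriptor in the receiver that refers to the same underlying socket, much like dup() across process boundaries.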

There are two kinds of FD handles.

  • Listening FD: call the accept function on it directly
  • Normal FD: needs to be restored to a normal socket

Pictures from the paper: Zero Downtime Release: Disruption-free Load Balancing of a Multi-Billion User Website.

For a normal FD, you must run the same code path that runs when a new connection originally arrives. Therefore, a general migration process includes:

  1. First, migrate the listening FD to the new process and start listening there, so that the new process can accept new requests quickly. If we turn on the SO_REUSEADDR option, the old and new services can even serve together
  2. Wait for the new process to warm up, then stop listening in the original process
  3. Migrate the sockets held by the original process, which may number in the tens of thousands. Ideally the code should report the migration progress
  4. The new process receives these sockets and gradually restores them to normal connections. It skips the accept phase and gets the list of sockets directly
  5. After the migration, the old process is idle and can be safely stopped
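Step 3 can be sketched as a loop that ships descriptors in batches and reports progress. Everything named here (BATCH, send_fd_batch, migrate_fds, the progress message on stderr) is hypothetical. The batch size of 64 is an arbitrary assumption, chosen to stay well under the per-message limit Linux places on SCM_RIGHTS (253 descriptors).

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define BATCH 64   /* assumed batch size, below the Linux per-message cap */

/* Send n (1..BATCH) descriptors in a single SCM_RIGHTS message. */
static int send_fd_batch(int uds, const int *fds, size_t n)
{
    char payload = 'B';   /* one byte of ordinary data accompanies the fds */
    struct iovec iov = { .iov_base = &payload, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int) * BATCH)];
        struct cmsghdr align;
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&msg, 0, sizeof(msg));
    memset(&ctrl, 0, sizeof(ctrl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = CMSG_SPACE(sizeof(int) * n);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int) * n);
    /* The fd list is stored in cmsg_data */
    memcpy(CMSG_DATA(cmsg), fds, sizeof(int) * n);

    return sendmsg(uds, &msg, 0) == 1 ? 0 : -1;
}

/* Migrate `total` descriptors over the UDS connection in batches.
 * Returns how many were handed to the kernel successfully. */
size_t migrate_fds(int uds, const int *fds, size_t total)
{
    size_t done = 0;
    while (done < total) {
        size_t n = total - done < BATCH ? total - done : BATCH;
        if (send_fd_batch(uds, fds + done, n) != 0)
            break;
        done += n;
        fprintf(stderr, "migrated %zu/%zu sockets\n", done, total);
    }
    return done;
}
```

In a real rollout the old process would probably wait for an acknowledgement from the new process before closing its own copies, so that a failed transfer does not drop live connections. That handshake is left out of this sketch.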

4. End

This is a piece of "black technology" that some mainstream applications already use. You'll recognize some very familiar software here, and for them this feature is a big selling point. For example, HAProxy, a load balancer that runs at layer 4, and Envoy, Istio's default data plane software, both use a similar technique to complete a hot restart.

In fact, projects such as SOFA also use similar techniques to swap out the proxy while rolling out service mesh. In Golang and C, this feature is easy to implement because the API is exposed. In Java it is much harder, because Java's cross-platform design does not expose this kind of Linux-specific API.

As you can see, the sendmsg and recvmsg functions can do some pretty cool things. The technique suits stateless proxy services best; if the service keeps persistent state, the migration is not safe, though you can still try applying it to some middleware. Either way, this black technology, with its own kind of violent beauty, is sure to make Windows Server users cry.

Author introduction: Little sister taste (xjjdog), a public account that keeps programmers from taking detours. It focuses on infrastructure and Linux. With ten years of architecture experience and 10 billion requests of daily traffic, I discuss the high-concurrency world with you and offer a different taste. My personal WeChat is xjjdog0; feel free to add me for further discussion.