Smooth restart:

In current software systems, deploying a new version or changing configuration without shutting down the service has become a mandatory requirement. This article looks at different ways to restart an application gracefully and works through examples to examine the details. The examples come from Teleport, which is designed for Kubernetes access control; if you are unfamiliar with it, see https://gravitational.com/teleport/.


SO_REUSEPORT vs Duplicating Sockets:

To make Teleport more highly available, we recently spent some time on how to gracefully restart Teleport's TLS and SSH listeners. Our goal is to upgrade Teleport in place without spinning up a new instance.

Two general approaches are described in https://blog.cloudflare.com/the-sad-state-of-linux-socket-balancing. Roughly, they work like this:

You can set SO_REUSEPORT on the socket, which allows multiple processes to bind to the same port. With this approach, each process gets its own accept queue.

You can also duplicate the socket and pass it to child processes, which then share a single accept queue.

SO_REUSEPORT has some downsides. Our engineers have used this approach before, and its multiple accept queues sometimes caused TCP connections to be dropped. In addition, at the time Go did not make it easy to set the SO_REUSEPORT option.
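As a rough illustration of the first approach: since Go 1.11 the net package has a `Control` hook on `net.ListenConfig` that runs before the socket is bound, which is one way to set the option. The sketch below is Linux-only and rests on assumptions that are not part of the article: the helper name `listenReusePort` is made up, and the option value is hard-coded as 0xf (its value on most Linux architectures) instead of coming from a portability package.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"os"
	"syscall"
)

// soReusePort is the value of SO_REUSEPORT on most Linux architectures.
// Assumption: this sketch targets Linux only.
const soReusePort = 0xf

// listenReusePort (hypothetical helper) opens a TCP listener with
// SO_REUSEPORT set on the socket before it is bound.
func listenReusePort(addr string) (net.Listener, error) {
	lc := net.ListenConfig{
		Control: func(network, address string, c syscall.RawConn) error {
			var sockErr error
			err := c.Control(func(fd uintptr) {
				sockErr = syscall.SetsockoptInt(int(fd), syscall.SOL_SOCKET, soReusePort, 1)
			})
			if err != nil {
				return err
			}
			return sockErr
		},
	}
	return lc.Listen(context.Background(), "tcp", addr)
}

func main() {
	// Bind once to an ephemeral port, then bind a second listener to the
	// very same address. Without SO_REUSEPORT the second bind would fail.
	first, err := listenReusePort("127.0.0.1:0")
	if err != nil {
		fmt.Println("first listen failed:", err)
		os.Exit(1)
	}
	defer first.Close()

	second, err := listenReusePort(first.Addr().String())
	if err != nil {
		fmt.Println("second listen failed:", err)
		os.Exit(1)
	}
	defer second.Close()

	fmt.Printf("two listeners bound to %v\n", first.Addr())
}
```

In a real deployment the two listeners would live in different processes; each one has its own accept queue, which is the property discussed above.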

The second approach is more appealing because most developers are familiar with the simple Unix fork/exec model, in which all file descriptors are passed to the child process. Go's os/exec package does not do this by default, possibly for security reasons: only stdin, stdout, and stderr are passed to the child. The os package, however, has lower-level primitives that let you pass file descriptors to a child process, and that is exactly what we are going to use.
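To make the descriptor-passing idea concrete, here is a minimal sketch that uses `os.StartProcess` with a pipe instead of a socket. It assumes a Unix system with `/bin/sh`; the helper name `readFromChild` is invented for this illustration. The point it shows is that index i of `ProcAttr.Files` becomes file descriptor i in the child.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// readFromChild (hypothetical helper) starts /bin/sh with the write end of a
// pipe installed as file descriptor 3 and returns what the child writes to it.
func readFromChild() (string, error) {
	r, w, err := os.Pipe()
	if err != nil {
		return "", err
	}
	defer r.Close()

	// Entries 0, 1, 2 are stdin, stdout, stderr; entry 3 becomes fd 3.
	p, err := os.StartProcess("/bin/sh", []string{"sh", "-c", "echo hello >&3"}, &os.ProcAttr{
		Files: []*os.File{os.Stdin, os.Stdout, os.Stderr, w},
	})
	// Close the parent's copy of the write end so the read below sees EOF
	// as soon as the child exits.
	w.Close()
	if err != nil {
		return "", err
	}

	out, err := io.ReadAll(r)
	if err != nil {
		return "", err
	}
	if _, err := p.Wait(); err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	got, err := readFromChild()
	if err != nil {
		fmt.Println("error:", err)
		os.Exit(1)
	}
	fmt.Printf("child wrote %q on fd 3\n", got)
}
```

A listener is handed over the same way: the `*os.File` obtained from the listener takes the place of the pipe's write end.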


Signal control process switching:

Before we get to the source code, let's go over how this works.

When a new Teleport process starts, it creates a listener socket that receives all traffic sent to the destination port. We added a signal handler for SIGUSR2, which causes Teleport to duplicate the listener socket and then start a new process, passing it the listener's file descriptor along with metadata about the listener in an environment variable. Once the new process has started, it uses the file descriptor and metadata it was passed to rebuild the socket and begin handling traffic.

Note that once the socket has been duplicated, the two sockets handle traffic in round-robin fashion, as shown in the following figure. This means that the Teleport processes take turns accepting new connections.



Figure 1: Teleport can fork itself and share incoming traffic with the forked process.

The parent process (PID2) shuts down in the same way, but in reverse. Once a Teleport process receives SIGQUIT, it begins to shut down: it stops accepting new connections and waits for all existing connections to finish. The parent process then closes its own listener socket and exits. From then on, the kernel sends traffic only to the new process.




Figure 2: Once the first process has shut down, all traffic goes to the new process.

Example:

We wrote a small application that uses this method. The source code is at the bottom. First, compile and run it:

$ go build restart.go
$ ./restart &
[1] 95147
$ Created listener file descriptor for :8080.

$ curl http://localhost:8080/hello
Hello from 95147!

Send the USR2 signal to the original process. Now, when you send HTTP requests, the PIDs of both processes are returned:

$ kill -SIGUSR2 95147
user defined signal 2 signal received.
Forked child 95170.
$ Imported listener file descriptor for :8080.

$ curl http://localhost:8080/hello
Hello from 95170!
$ curl http://localhost:8080/hello
Hello from 95147!

Kill the original process and you will find that only the new PID is returned:

$ kill -SIGTERM 95147
signal: killed
[1]+  Exit 1                  go run restart.go
$ curl http://localhost:8080/hello
Hello from 95170!
$ curl http://localhost:8080/hello
Hello from 95170!

Finally, kill the new process. Nothing is left to accept connections, so requests are refused:

$ kill -SIGTERM 95170
$ curl http://localhost:8080/hello
curl: (7) Failed to connect to localhost port 8080: Connection refused

As you can see, writing a service that restarts gracefully in Go is easy once you understand how it works, and it can greatly improve the availability of your service.

Golang Graceful Restart Source Example

package main

import (
	"context"
	"encoding/json"
	"flag"
	"fmt"
	"net"
	"net/http"
	"os"
	"os/signal"
	"path/filepath"
	"syscall"
	"time"
)

type listener struct {
	Addr     string `json:"addr"`
	FD       int    `json:"fd"`
	Filename string `json:"filename"`
}

func importListener(addr string) (net.Listener, error) {
	// Extract the encoded listener metadata from the environment.
	listenerEnv := os.Getenv("LISTENER")
	if listenerEnv == "" {
		return nil, fmt.Errorf("unable to find LISTENER environment variable")
	}

	// Unmarshal the listener metadata.
	var l listener
	err := json.Unmarshal([]byte(listenerEnv), &l)
	if err != nil {
		return nil, err
	}
	if l.Addr != addr {
		return nil, fmt.Errorf("unable to find listener for %v", addr)
	}

	// The file has already been passed to this process, extract the file
	// descriptor and name from the metadata to rebuild/find the *os.File for
	// the listener.
	listenerFile := os.NewFile(uintptr(l.FD), l.Filename)
	if listenerFile == nil {
		return nil, fmt.Errorf("unable to create listener file for fd %d", l.FD)
	}
	defer listenerFile.Close()

	// Create a net.Listener from the *os.File.
	ln, err := net.FileListener(listenerFile)
	if err != nil {
		return nil, err
	}

	return ln, nil
}

func createListener(addr string) (net.Listener, error) {
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return nil, err
	}

	return ln, nil
}

func createOrImportListener(addr string) (net.Listener, error) {
	// Try and import a listener for addr. If it's found, use it.
	ln, err := importListener(addr)
	if err == nil {
		fmt.Printf("Imported listener file descriptor for %v.\n", addr)
		return ln, nil
	}

	// No listener was imported, that means this process has to create one.
	ln, err = createListener(addr)
	if err != nil {
		return nil, err
	}

	fmt.Printf("Created listener file descriptor for %v.\n", addr)
	return ln, nil
}

func getListenerFile(ln net.Listener) (*os.File, error) {
	switch t := ln.(type) {
	case *net.TCPListener:
		return t.File()
	case *net.UnixListener:
		return t.File()
	}
	return nil, fmt.Errorf("unsupported listener: %T", ln)
}

func forkChild(addr string, ln net.Listener) (*os.Process, error) {
	// Get the file descriptor for the listener and marshal the metadata to pass
	// to the child in the environment.
	lnFile, err := getListenerFile(ln)
	if err != nil {
		return nil, err
	}
	defer lnFile.Close()

	// The listener is the fourth entry in files below, so it becomes file
	// descriptor 3 in the child.
	l := listener{
		Addr:     addr,
		FD:       3,
		Filename: lnFile.Name(),
	}
	listenerEnv, err := json.Marshal(l)
	if err != nil {
		return nil, err
	}

	// Pass stdin, stdout, and stderr along with the listener to the child.
	files := []*os.File{
		os.Stdin,
		os.Stdout,
		os.Stderr,
		lnFile,
	}

	// Get current environment and add in the listener to it.
	environment := append(os.Environ(), "LISTENER="+string(listenerEnv))

	// Get current process name and directory.
	execName, err := os.Executable()
	if err != nil {
		return nil, err
	}
	execDir := filepath.Dir(execName)

	// Forward the command line flags so the child listens on the same address.
	args := append([]string{execName}, os.Args[1:]...)

	// Spawn child process.
	p, err := os.StartProcess(execName, args, &os.ProcAttr{
		Dir:   execDir,
		Env:   environment,
		Files: files,
		Sys:   &syscall.SysProcAttr{},
	})
	if err != nil {
		return nil, err
	}

	return p, nil
}

func waitForSignals(addr string, ln net.Listener, server *http.Server) error {
	signalCh := make(chan os.Signal, 1024)
	signal.Notify(signalCh, syscall.SIGHUP, syscall.SIGUSR2, syscall.SIGINT, syscall.SIGQUIT)
	for {
		select {
		case s := <-signalCh:
			fmt.Printf("%v signal received.\n", s)
			switch s {
			case syscall.SIGHUP:
				// Fork a child process.
				p, err := forkChild(addr, ln)
				if err != nil {
					fmt.Printf("Unable to fork child: %v.\n", err)
					continue
				}
				fmt.Printf("Forked child %v.\n", p.Pid)

				// Create a context that will expire in 5 seconds and use this as a
				// timeout to Shutdown.
				ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
				defer cancel()

				// Return any errors during shutdown.
				return server.Shutdown(ctx)
			case syscall.SIGUSR2:
				// Fork a child process.
				p, err := forkChild(addr, ln)
				if err != nil {
					fmt.Printf("Unable to fork child: %v.\n", err)
					continue
				}

				// Print the PID of the forked process and keep waiting for more signals.
				fmt.Printf("Forked child %v.\n", p.Pid)
			case syscall.SIGINT, syscall.SIGQUIT:
				// Create a context that will expire in 5 seconds and use this as a
				// timeout to Shutdown.
				ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
				defer cancel()

				// Return any errors during shutdown.
				return server.Shutdown(ctx)
			}
		}
	}
}

func handler(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintf(w, "Hello from %v!\n", os.Getpid())
}

func startServer(addr string, ln net.Listener) *http.Server {
	http.HandleFunc("/hello", handler)

	httpServer := &http.Server{
		Addr: addr,
	}
	go httpServer.Serve(ln)

	return httpServer
}

func main() {
	// Parse command line flags for the address to listen on.
	var addr string
	flag.StringVar(&addr, "addr", ":8080", "Address to listen on.")
	flag.Parse()

	// Create (or import) a net.Listener and start a goroutine that runs
	// a HTTP server on that net.Listener.
	ln, err := createOrImportListener(addr)
	if err != nil {
		fmt.Printf("Unable to create or import a listener: %v.\n", err)
		os.Exit(1)
	}
	server := startServer(addr, ln)

	// Wait for signals to either fork or quit.
	err = waitForSignals(addr, ln, server)
	if err != nil {
		fmt.Printf("Exiting: %v\n", err)
		return
	}
	fmt.Printf("Exiting.\n")
}

Note: this requires Go 1.8 or later, because graceful shutdown through http.Server.Shutdown was added in Go 1.8.
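To see what Shutdown provides, the following sketch starts a server with a deliberately slow handler, calls Shutdown while a request is still in flight, and returns the response body that the request receives. The helper name `drainDemo`, the `/slow` path, and the 200 ms delay are choices made for this illustration only.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net"
	"net/http"
	"os"
	"time"
)

// drainDemo (hypothetical helper) shuts a server down while a request is in
// flight and returns the body that request received.
func drainDemo() (string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}

	started := make(chan struct{}, 1)
	mux := http.NewServeMux()
	mux.HandleFunc("/slow", func(w http.ResponseWriter, r *http.Request) {
		select { // tell the caller that the request has reached the handler
		case started <- struct{}{}:
		default:
		}
		time.Sleep(200 * time.Millisecond)
		fmt.Fprint(w, "done")
	})
	srv := &http.Server{Handler: mux}
	go srv.Serve(ln)

	type result struct {
		body string
		err  error
	}
	resCh := make(chan result, 1)
	go func() {
		resp, err := http.Get("http://" + ln.Addr().String() + "/slow")
		if err != nil {
			resCh <- result{err: err}
			return
		}
		defer resp.Body.Close()
		b, err := io.ReadAll(resp.Body)
		resCh <- result{body: string(b), err: err}
	}()

	// Wait until the request is in flight, or bail out if it failed early.
	select {
	case <-started:
	case res := <-resCh:
		return res.body, res.err
	}

	// Shutdown stops accepting new connections and waits, up to the context
	// deadline, for active requests to finish.
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		return "", err
	}

	res := <-resCh
	return res.body, res.err
}

func main() {
	body, err := drainDemo()
	if err != nil {
		fmt.Println("error:", err)
		os.Exit(1)
	}
	fmt.Printf("in-flight request completed with body %q\n", body)
}
```

If the handler took longer than the context deadline, Shutdown would return the context's error instead of waiting indefinitely, which is why the example source uses a 5 second timeout.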

Original English article: https://gravitational.com/blog/golang-ssh-bastion-graceful-restarts/