Why do you need timeout control?

  • If the request time is too long, the user may have left the page and the server is consuming resources to process the request. The result is meaningless
  • A long time of server processing consumes too many resources, resulting in reduced concurrency and even unavailability

Go timeout control is necessary

Go is normally used to write back-end services. Generally, a request is completed by multiple serial or parallel sub-tasks, and each sub-task may be another internal request. Therefore, when the request times out, we need to quickly return to release the occupied resources, such as Goroutine, file descriptor, etc.

Timeout control common to the server

  • Logical processing within a process
  • Read and write client requests, such as HTTP or RPC requests
  • Invoke other server requests, including RPC calls or DB access

What if there is no timeout control?

To simplify this article, let’s take a request function, hardWork, as an example. It doesn’t matter what it does, and as the name suggests, it can be slow to process.

func hardWork(job interface{}) error {
	time.Sleep(time.Minute)
	return nil
}

func requestWork(ctx context.Context, job interface{}) error {
  return hardWork(job)
}
Copy the code

At this point, the client will always see the familiar screen

The vast majority of users will not wait a minute and leave you early, leaving a bunch of resources occupied throughout the call link. This article will not go into other details but focus on timeout implementation.

Let’s take a look at how to implement timeouts and what the pits are.

First version implementation

Instead of reading down, try to figure out how to implement a timeout for this function. First try:

func requestWork(ctx context.Context, job interface{}) error {
	ctx, cancel := context.WithTimeout(ctx, time.Second*2)
	defer cancel()

	done := make(chan error)
	go func(a) {
		done <- hardWork(job)
	}()

	select {
	case err := <-done:
		return err
	case <-ctx.Done():
		return ctx.Err()
	}
}
Copy the code

Let’s test that by writing a main function

func main(a) {
	const total = 1000
	var wg sync.WaitGroup
	wg.Add(total)
	now := time.Now()
	for i := 0; i < total; i++ {
		go func(a) {
			defer wg.Done()
			requestWork(context.Background(), "any")
		}()
	}
	wg.Wait()
	fmt.Println("elapsed:", time.Since(now))
}
Copy the code

Run for results

➜ go run timeout.go
elapsed: 2.005725931s
Copy the code

The timeout has taken effect. But is that enough?

Goroutine leaked

Let’s add a line at the end of our main function to see how many goroutines we have

time.Sleep(time.Minute*2)
fmt.Println("number of goroutines:", runtime.NumGoroutine())
Copy the code

Sleep for 2 minutes to wait for all tasks to finish, and then print the current number of Goroutines. So let’s do it and see what happens

➜ go run timeout. Go Elapsed: 2.005725931s number of goroutines: 1001Copy the code

Goroutine leaked, so let’s see why. First, the requestWork exits after 2 seconds. Once the requestWork exits, the Done Channel has no goroutine to receive. The done < -hardwork (job) line is stuck, causing each timeout request to consume a Goroutine. This is a big bug, and the entire service becomes unresponsive when resources run out.

So how to fix? When making chan, set the buffer size to 1, as follows:

done := make(chan error, 1)
Copy the code

This allows done < -hardwork (job) to write regardless of timeout without jamming the Goroutine. In Go, a channel doesn’t have to be closed, it’s just an object. Close (channel) is just used to tell the receiver that there’s nothing left to write. No other use.

After changing this line of code let’s test again:

➜ go run timeout.go
elapsed: 2.005655146s
number of goroutines: 1
Copy the code

Goroutine leak problem solved!

Panic cannot catch

Let’s change the hardWork function implementation to

panic("oops")
Copy the code

The code to modify the main function and catch the exception is as follows:

go func(a) {
  defer func(a) {
    if p := recover(a); p ! =nil {
      fmt.Println("oops, panic")}} ()defer wg.Done()
  requestWork(context.Background(), "any")
}()
Copy the code

Panic cannot be caught because other goroutines of panic created in the Goroutine in the requestWork cannot be caught.

Buffer size = 1; buffer size = 1; buffer size = 1;

func requestWork(ctx context.Context, job interface{}) error {
	ctx, cancel := context.WithTimeout(ctx, time.Second*2)
	defer cancel()

	done := make(chan error, 1)
	panicChan := make(chan interface{}, 1)
	go func(a) {
		defer func(a) {
			if p := recover(a); p ! =nil {
				panicChan <- p
			}
		}()

		done <- hardWork(job)
	}()

	select {
	case err := <-done:
		return err
	case p := <-panicChan:
		panic(p)
	case <-ctx.Done():
		return ctx.Err()
	}
}
Copy the code

Panic can be handled in the requestWork caller.

Does the timeout have to be right?

The requestWork implementation above ignores the CTX parameter passed in. If CTX has a timeout set, we must pay attention to whether the timeout is less than 2 seconds. If it is less than 2 seconds, we need to use the timeout. Go-zero /core/contextx already provides a method to do this for us in one line of code, just change it as follows:

ctx, cancel := contextx.ShrinkDeadline(ctx, time.Second*2)
Copy the code

Data race

The requestWork returns only one error parameter. If multiple parameters need to be returned, then we need to pay attention to the data race, which can be solved by locking. Specific implementation reference go – zero/ZRPC/internal/serverinterceptors timeoutinterceptor. Go, there is no need to do.

Complete sample

package main

import (
	"context"
	"fmt"
	"runtime"
	"sync"
	"time"

	"github.com/tal-tech/go-zero/core/contextx"
)

func hardWork(job interface{}) error {
	time.Sleep(time.Second * 10)
	return nil
}

func requestWork(ctx context.Context, job interface{}) error {
	ctx, cancel := contextx.ShrinkDeadline(ctx, time.Second*2)
	defer cancel()

	done := make(chan error, 1)
	panicChan := make(chan interface{}, 1)
	go func(a) {
		defer func(a) {
			if p := recover(a); p ! =nil {
				panicChan <- p
			}
		}()

		done <- hardWork(job)
	}()

	select {
	case err := <-done:
		return err
	case p := <-panicChan:
		panic(p)
	case <-ctx.Done():
		return ctx.Err()
	}
}

func main(a) {
	const total = 10
	var wg sync.WaitGroup
	wg.Add(total)
	now := time.Now()
	for i := 0; i < total; i++ {
		go func(a) {
			defer func(a) {
				if p := recover(a); p ! =nil {
					fmt.Println("oops, panic")}} ()defer wg.Done()
			requestWork(context.Background(), "any")
		}()
	}
	wg.Wait()
	fmt.Println("elapsed:", time.Since(now))
	time.Sleep(time.Second * 20)
	fmt.Println("number of goroutines:", runtime.NumGoroutine())
}
Copy the code

For more details

Please refer to the go-Zero source code:

  • go-zero/core/fx/timeout.go
  • go-zero/zrpc/internal/clientinterceptors/timeoutinterceptor.go
  • go-zero/zrpc/internal/serverinterceptors/timeoutinterceptor.go

The project address

Github.com/tal-tech/go…

Welcome to Go-Zero and star support us!

WeChat communication

Follow the “micro service practice” public account and reply to the group to obtain the community qr code.